Tag Archives: thread-safety

Are global variables thread safe in Flask? How do I share data between requests?

Question: Are global variables thread safe in Flask? How do I share data between requests?

In my application, the state of a common object is changed by making requests, and the response depends on the state.

class SomeObj():
    def __init__(self, param):
        self.param = param
    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')

If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:

  1. Client 1 queries. self.param is incremented by 1.
  2. Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
  3. The thread switches back to client 1, and the client is returned the number 2, say.
  4. Now the thread moves to client 2 and returns him/her the number 3.

Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.

Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?


Answer 0

You can’t use global variables to hold this sort of data. Not only is it not thread safe, it’s not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.

Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
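For example, the counter could live in storage outside the process. Here is a minimal sketch using SQLite from the standard library as a stand-in for a real database (the table and column names are invented for illustration):

```python
import sqlite3

# Use a file path (or a real database server) in deployment;
# ":memory:" is per-connection and only suitable for a demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS counter (id INTEGER PRIMARY KEY, value INTEGER)"
)
conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
conn.commit()

def next_value(conn):
    # The transaction makes the read-modify-write atomic,
    # even with multiple worker processes sharing the database file.
    with conn:
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        return conn.execute(
            "SELECT value FROM counter WHERE id = 1"
        ).fetchone()[0]

print(next_value(conn), next_value(conn))  # 1 2
```

Each request handler would call next_value() instead of touching a module-level object, so the count survives restarts of individual workers and is consistent across processes.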


The development server runs in a single thread and process by default. You won’t see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (In Flask 1.0 the server is threaded by default.)


Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there’s still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
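Within a single process, the read-modify-write sequence can at least be made atomic with a lock. A minimal stdlib sketch (not Flask-specific, and it does nothing to help across worker processes):

```python
import threading

counter = 0
lock = threading.Lock()

def query():
    global counter
    with lock:  # the increment and the read happen as one atomic step
        counter += 1
        return counter

threads = [threading.Thread(target=query) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 100
```

Without the lock, two workers could both read the old value, both increment, and one update would be lost, which is exactly the interleaving described in the question.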


If you need to store some global data during a request, you may use Flask’s g object. Another common case is some top-level object that manages database connections. The distinction for this type of “global” is that it’s unique to each request, not used between requests, and there’s something managing the set up and teardown of the resource.


Answer 1

This is not really an answer to thread safety of globals.

But I think it is important to mention sessions here. You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a threadsafe way.

This is possible with server-side sessions, and they are available in a very neat flask plugin: https://pythonhosted.org/Flask-Session/

If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.

Here is a short demo:

from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"] = 0

    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0

    session["counter"] += 1

    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0

    session["counter"] -= 1

    return "counter is {}".format(session["counter"])


if __name__ == '__main__':
    app.run()

After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers, you’ll see that the counter is not shared between them.


Answer 2

While totally accepting the previous upvoted answers, and discouraging the use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers running under the Flask development server…

The Python built-in data types (I personally used and tested the global dict) are thread-safe, as per the Python documentation (https://docs.python.org/3/glossary.html#term-global-interpreter-lock). They are not process-safe.

Insertions, lookups, and reads from such a (server-global) dict will be OK from each (possibly concurrent) Flask session running under the development server.

When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session-specific data that otherwise does not fit into the cookie (max size 4 kB).

Of course, such a server-global dict should be carefully guarded against growing too large, since it lives in memory. Some sort of expiry of the ‘old’ key/value pairs can be coded during request processing.

Again, this is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
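The expiry idea could be sketched like this (a hypothetical helper, names invented; a lock guards the compound check-and-delete):

```python
import threading
import time

class ExpiringDict:
    """A tiny server-global store whose entries expire after max_age seconds."""

    def __init__(self, max_age):
        self._max_age = max_age
        self._data = {}  # key -> (value, insertion time)
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = (value, time.monotonic())

    def get(self, key, default=None):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return default
            value, born = item
            if time.monotonic() - born > self._max_age:
                del self._data[key]  # expire the stale entry on access
                return default
            return value

store = ExpiringDict(max_age=0.05)
store.set("session-key", {"cart": [1, 2]})
print(store.get("session-key"))  # {'cart': [1, 2]}
time.sleep(0.1)
print(store.get("session-key"))  # None
```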


Thread safety in Python dictionaries

Question: Thread safety in Python dictionaries

I have a class which holds a dictionary

class OrderBook:
    orders = {'Restaurant1': None,
              'Restaurant2': None,
              'Restaurant3': None,
              'Restaurant4': None}

    @staticmethod
    def addOrder(restaurant_name, orders):
        OrderBook.orders[restaurant_name] = orders

And I am running 4 threads (one for each restaurant) that call the method OrderBook.addOrder. Here is the function ran by each thread:

def addOrders(restaurant_name):

    #creates orders
    ...

    OrderBook.addOrder(restaurant_name, orders)

Is this safe, or do I have to use a lock before calling addOrder?


Answer 0

Python’s built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations.

Your code should be safe. Keep in mind: a lock here will add almost no overhead, and will give you peace of mind.

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm has more details.
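A sketch of what “a lock for peace of mind” would look like for the OrderBook class from the question (the _lock attribute name is invented):

```python
import threading

class OrderBook:
    orders = {'Restaurant1': None, 'Restaurant2': None,
              'Restaurant3': None, 'Restaurant4': None}
    _lock = threading.Lock()  # one lock shared by all threads

    @staticmethod
    def addOrder(restaurant_name, orders):
        with OrderBook._lock:  # serialize the writers
            OrderBook.orders[restaurant_name] = orders

# One thread per restaurant, as in the question.
threads = [threading.Thread(target=OrderBook.addOrder, args=(name, [name]))
           for name in list(OrderBook.orders)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(v is not None for v in OrderBook.orders.values()))  # True
```

Since each thread writes to a different key, the lock is not strictly required here; it becomes important the moment a handler does a read-modify-write on the same entry.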


Answer 1

Yes, built-in types are inherently thread-safe: http://docs.python.org/glossary.html#term-global-interpreter-lock

This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access.


Answer 2

Google’s style guide advises against relying on dict atomicity

Explained in further detail at: Is Python variable assignment atomic?

Do not rely on the atomicity of built-in types.

While Python’s built-in data types such as dictionaries appear to have atomic operations, there are corner cases where they aren’t atomic (e.g. if __hash__ or __eq__ are implemented as Python methods) and their atomicity should not be relied upon. Neither should you rely on atomic variable assignment (since this in turn depends on dictionaries).

Use the Queue module’s Queue data type as the preferred way to communicate data between threads. Otherwise, use the threading module and its locking primitives. Learn about the proper use of condition variables so you can use threading.Condition instead of using lower-level locks.
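A minimal sketch of the condition-variable pattern the guide refers to (names invented), where a consumer blocks until a producer has appended something:

```python
import collections
import threading

buf = collections.deque()
cond = threading.Condition()

def produce(item):
    with cond:
        buf.append(item)
        cond.notify()  # wake up one waiting consumer

def consume():
    with cond:
        while not buf:  # re-check the predicate after every wakeup
            cond.wait()
        return buf.popleft()

producer = threading.Thread(target=produce, args=(42,))
producer.start()
value = consume()
producer.join()
print(value)  # 42
```

The while loop around cond.wait() is the essential part: a wakeup only means “something may have changed”, so the predicate must be re-checked under the lock.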

And I agree with this one: there is already the GIL in CPython, so the performance hit of using a Lock will be negligible. Much more costly will be the hours spent bug hunting in a complex codebase when those CPython implementation details change one day.


Queue.Queue vs collections.deque

Question: Queue.Queue vs collections.deque

I need a queue which multiple threads can put stuff into, and multiple threads may read from.

Python has at least two queue classes, Queue.Queue and collections.deque, with the former seemingly using the latter internally. Both claim to be thread-safe in the documentation.

However, the Queue docs also state:

collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking.

Which I guess I don’t quite understand: does this mean deque isn’t fully thread-safe after all?

If it is, I may not fully understand the difference between the two classes. I can see that Queue adds blocking functionality. On the other hand, it loses some deque features like support for the in-operator.

Accessing the internal deque object directly, is

x in Queue().deque

thread-safe?

Also, why does Queue employ a mutex for its operations when deque is thread-safe already?


Answer 0

Queue.Queue and collections.deque serve different purposes. Queue.Queue is intended for allowing different threads to communicate using queued messages/data, whereas collections.deque is simply intended as a data structure. That’s why Queue.Queue has methods like put_nowait(), get_nowait(), and join(), whereas collections.deque doesn’t. Queue.Queue isn’t intended to be used as a collection, which is why it lacks the likes of the in operator.

It boils down to this: if you have multiple threads and you want them to be able to communicate without the need for locks, you’re looking for Queue.Queue; if you just want a queue or a double-ended queue as a data structure, use collections.deque.

Finally, accessing and manipulating the internal deque of a Queue.Queue is playing with fire – you really don’t want to be doing that.
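For instance, the communication use case Queue.Queue is built for looks like this (Python 3 module names; a sketch using a sentinel value to stop the consumer):

```python
import queue
import threading

q = queue.Queue()
results = []

def producer():
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = q.get()  # blocks until an item is available -- no manual locking
        if item is None:
            break
        results.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 2, 3, 4]
```

The blocking get() is exactly what a bare deque does not give you: with a deque the consumer would have to poll, or coordinate with a condition variable of its own.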


Answer 1

If all you’re looking for is a thread-safe way to transfer objects between threads, then both would work (both for FIFO and LIFO). For FIFO:

Note:

  • Other operations on deque might not be thread safe, I’m not sure.
  • deque does not block on pop() or popleft() so you can’t base your consumer thread flow on blocking till a new item arrives.

However, it seems that deque has a significant efficiency advantage. Here are some benchmark results in seconds using CPython 2.7.3 for inserting and removing 100k items

deque 0.0747888759791
Queue 1.60079066852

Here’s the benchmark code:

import time
import Queue
import collections

q = collections.deque()
t0 = time.clock()
for i in xrange(100000):
    q.append(1)
for i in xrange(100000):
    q.popleft()
print 'deque', time.clock() - t0

q = Queue.Queue(200000)
t0 = time.clock()
for i in xrange(100000):
    q.put(1)
for i in xrange(100000):
    q.get()
print 'Queue', time.clock() - t0
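The benchmark above is Python 2. A rough Python 3 port might look like this (time.clock() was removed, so time.perf_counter() is used; absolute numbers will vary by machine and interpreter version):

```python
import collections
import time
from queue import Queue

N = 100000

# Bare deque: append then popleft N times.
q = collections.deque()
t0 = time.perf_counter()
for _ in range(N):
    q.append(1)
for _ in range(N):
    q.popleft()
deque_time = time.perf_counter() - t0
print('deque', deque_time)

# Queue: put then get N times.
q = Queue(2 * N)
t0 = time.perf_counter()
for _ in range(N):
    q.put(1)
for _ in range(N):
    q.get()
queue_time = time.perf_counter() - t0
print('Queue', queue_time)
```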

Answer 2

For information there is a Python ticket referenced for deque thread-safety (https://bugs.python.org/issue15329). Title “clarify which deque methods are thread-safe”

Bottom line here: https://bugs.python.org/issue15329#msg199368

The deque’s append(), appendleft(), pop(), popleft(), and len(d) operations are thread-safe in CPython. The append methods have a DECREF at the end (for cases where maxlen has been set), but this happens after all of the structure updates have been made and the invariants have been restored, so it is okay to treat these operations as atomic.

Anyway, if you are not 100% sure and you prefer reliability over performance, just put a Lock around it ;)


Answer 3

All single-element methods on deque are atomic and thread-safe. All other methods are thread-safe too. Things like len(dq) and dq[4] yield momentarily correct values. But think e.g. about dq.extend(mylist): you don’t get a guarantee that all elements in mylist are filed in a row when other threads also append elements on the same side – but that’s usually not a requirement in inter-thread communication or for the task in question.

So a deque is ~20x faster than Queue (which uses a deque under the hood). Unless you need the “comfortable” synchronization API (blocking / timeout), strict maxsize obeyance, or the “override these methods (_put, _get, ..) to implement other queue organizations” sub-classing behavior – or when you take care of such things yourself – a bare deque is a good and efficient deal for high-speed inter-thread communication.

In fact, the heavy usage of an extra mutex and extra method calls like ._get() in Queue.py is due to backwards-compatibility constraints, past over-design, and lack of care for providing an efficient solution for this important speed-bottleneck issue in inter-thread communication. A list was used in older Python versions – but even list.append()/.pop(0) was and is atomic and thread-safe…


Answer 4

Adding notify_all() to each deque append and popleft gives far worse results for deque than the 20x improvement achieved by the default deque behavior:

deque + notify_all: 0.469802
Queue:              0.667279

I modified @Jonathan’s code a little, ran the benchmark with CPython 3.6.2, and added a condition in the deque loop to simulate the behaviour Queue has.

import time
from queue import Queue
import threading
import collections

mutex = threading.Lock()
condition = threading.Condition(mutex)
q = collections.deque()
t0 = time.clock()
for i in range(100000):
    with condition:
        q.append(1)
        condition.notify_all()
for _ in range(100000):
    with condition:
        q.popleft()
        condition.notify_all()
print('deque', time.clock() - t0)

q = Queue(200000)
t0 = time.clock()
for _ in range(100000):
    q.put(1)
for _ in range(100000):
    q.get()
print('Queue', time.clock() - t0)

And it seems the performance is limited by this function: condition.notify_all()

collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking. — Queue docs


Answer 5

deque is thread-safe. “operations that do not require locking” means that you don’t have to do the locking yourself, the deque takes care of it.

Taking a look at the Queue source, the internal deque is called self.queue and uses a mutex for accessors and mutations, so Queue().queue is not thread-safe to use.

If you’re looking for an “in” operator, then a deque or queue is possibly not the most appropriate data structure for your problem.


Answer 6

(seems I don’t have reputation to comment…) You need to be careful which methods of the deque you use from different threads.

Individual deque operations such as append() and popleft() appear to be thread-safe, but I have found that doing

for item in a_deque:
   process(item)

can fail if another thread is adding items at the same time. I got a RuntimeError complaining “deque mutated during iteration”.

Check collectionsmodule.c to see which operations are affected by this
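The iteration failure is easy to reproduce even single-threaded, since the deque iterator detects any mutation; iterating over a snapshot (copied under a lock if other threads may be writing) avoids it. A small sketch:

```python
import collections

dq = collections.deque([1, 2, 3])

# Mutating while iterating raises RuntimeError: "deque mutated during iteration".
try:
    for item in dq:
        dq.append(item)  # mutation invalidates the active iterator
except RuntimeError as exc:
    print(exc)

# Iterating over a snapshot is safe; take the copy under a lock if
# other threads may be mutating the deque at the same time.
for item in list(dq):
    pass
```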