Question: Queue.Queue vs collections.deque
I need a queue which multiple threads can put stuff into, and multiple threads may read from.
Python has at least two queue classes, Queue.Queue and collections.deque, with the former seemingly using the latter internally. Both claim to be thread-safe in the documentation.
However, the Queue docs also state:
collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking.
Which I guess I don't quite understand: Does this mean deque isn't fully thread-safe after all?
If it is, I may not fully understand the difference between the two classes. I can see that Queue adds blocking functionality. On the other hand, it loses some deque features like support for the in-operator.
Accessing the internal deque object directly: is x in Queue().deque thread-safe?
Also, why does Queue employ a mutex for its operations when deque is thread-safe already?
Answer 0
Queue.Queue and collections.deque serve different purposes. Queue.Queue is intended for allowing different threads to communicate using queued messages/data, whereas collections.deque is simply intended as a data structure. That's why Queue.Queue has methods like put_nowait(), get_nowait(), and join(), whereas collections.deque doesn't. Queue.Queue isn't intended to be used as a collection, which is why it lacks the likes of the in operator.
It boils down to this: if you have multiple threads and you want them to be able to communicate without the need for locks, you're looking for Queue.Queue; if you just want a queue or a double-ended queue as a data structure, use collections.deque.
Finally, accessing and manipulating the internal deque of a Queue.Queue is playing with fire; you really don't want to be doing that.
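To make the "communication" use concrete, here is a minimal sketch of a producer/consumer pair built on Queue.Queue (shown with the Python 3 module name queue; the sentinel value and thread setup are illustrative, not from the original answer):
import threading
from queue import Queue  # Queue.Queue in Python 2

q = Queue()

def producer():
    for i in range(5):
        q.put(i)          # hand data to the consumer thread
    q.put(None)           # sentinel: tell the consumer to stop

def consumer():
    while True:
        item = q.get()    # blocks until an item is available
        if item is None:
            break
        print('got', item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()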
Answer 1
If all you’re looking for is a thread-safe way to transfer objects between threads, then both would work (both for FIFO and LIFO). For FIFO:
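A minimal sketch of the FIFO calls with each class (using the Python 3 module name queue; it is Queue in Python 2):
import collections
from queue import Queue  # Queue.Queue in Python 2

q = Queue()
q.put("item")            # blocks only if a maxsize is set and reached
item = q.get()           # blocks until an item is available

d = collections.deque()
d.append("item")         # atomic, never blocks
item = d.popleft()       # atomic, raises IndexError if the deque is empty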
Note:
- Other operations on deque might not be thread-safe, I'm not sure.
- deque does not block on pop() or popleft(), so you can't base your consumer thread flow on blocking till a new item arrives.
However, it seems that deque has a significant efficiency advantage. Here are some benchmark results, in seconds, using CPython 2.7.3 for inserting and removing 100k items:
deque 0.0747888759791
Queue 1.60079066852
Here’s the benchmark code:
import time
import Queue
import collections

q = collections.deque()
t0 = time.clock()
for i in xrange(100000):
    q.append(1)
for i in xrange(100000):
    q.popleft()
print 'deque', time.clock() - t0

q = Queue.Queue(200000)
t0 = time.clock()
for i in xrange(100000):
    q.put(1)
for i in xrange(100000):
    q.get()
print 'Queue', time.clock() - t0
Answer 2
For reference, there is a Python ticket about deque thread-safety (https://bugs.python.org/issue15329), titled "clarify which deque methods are thread-safe".
The bottom line is here: https://bugs.python.org/issue15329#msg199368
The deque’s append(), appendleft(), pop(), popleft(), and len(d)
operations are thread-safe in CPython. The append methods have a
DECREF at the end (for cases where maxlen has been set), but this
happens after all of the structure updates have been made and the
invariants have been restored, so it is okay to treat these operations
as atomic.
Anyway, if you are not 100% sure and you prefer reliability over performance, just put a Lock around it ;)
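A minimal sketch of that suggestion: keep an explicit threading.Lock next to the deque and hold it around any operation you are not sure is atomic (the deque, lock, and helper names here are illustrative):
import collections
import threading

d = collections.deque()
d_lock = threading.Lock()

def extend_safely(items):
    # extend() is a sequence of appends; serialize it explicitly
    # instead of relying on it being atomic.
    with d_lock:
        d.extend(items)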
Answer 3
All single-element methods on deque are atomic and thread-safe. All other methods are thread-safe too. Things like len(dq) and dq[4] yield momentarily correct values. But think e.g. about dq.extend(mylist): you don't get a guarantee that all elements of mylist are filed in a row when other threads also append elements on the same side, but that's usually not a requirement in inter-thread communication or for the task in question.
So a deque is ~20x faster than Queue (which uses a deque under the hood). If you don't need the "comfortable" synchronization API (blocking / timeout), strict maxsize obeyance, or the "override these methods (_put, _get, ...) to implement other queue organizations" sub-classing behavior, or if you take care of such things yourself, then a bare deque is a good and efficient deal for high-speed inter-thread communication.
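A minimal sketch of that bare-deque approach, relying only on append() and popleft() being atomic in CPython. Because popleft() does not block, the consumer polls instead (the names, item count, and yield strategy are illustrative):
import collections
import threading
import time

channel = collections.deque()   # bare deque used as a FIFO channel
N = 100000

def producer():
    for i in range(N):
        channel.append(i)       # atomic single-element append

def consumer():
    received = 0
    while received < N:
        try:
            channel.popleft()   # atomic, but raises instead of blocking
        except IndexError:
            time.sleep(0)       # yield to other threads and retry
            continue
        received += 1

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()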
In fact, the heavy use of an extra mutex and extra ._get() etc. method calls in Queue.py is due to backwards-compatibility constraints, past over-design, and a lack of care for providing an efficient solution for this important speed bottleneck in inter-thread communication. A list was used in older Python versions, and even list.append() / list.pop(0) was and is atomic and thread-safe.
Answer 4
Adding notify_all() to each deque append and popleft results in far worse results for deque than the 20x improvement achieved by the default deque behavior:
deque + notify_all: 0.469802
Queue: 0.667279
I modified @Jonathan's code a little: this is the benchmark with CPython 3.6.2, with a condition added in the deque loop to simulate the behaviour Queue provides.
import time
from queue import Queue
import threading
import collections

mutex = threading.Lock()
condition = threading.Condition(mutex)

q = collections.deque()
t0 = time.clock()
for i in range(100000):
    with condition:
        q.append(1)
        condition.notify_all()
for _ in range(100000):
    with condition:
        q.popleft()
        condition.notify_all()
print('deque', time.clock() - t0)

q = Queue(200000)
t0 = time.clock()
for _ in range(100000):
    q.put(1)
for _ in range(100000):
    q.get()
print('Queue', time.clock() - t0)
And it seems the performance is limited by this function, condition.notify_all().
collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking.
(Queue docs)
Answer 5
deque is thread-safe. "Operations that do not require locking" means that you don't have to do the locking yourself; the deque takes care of it.
Taking a look at the Queue source, the internal deque is called self.queue and Queue uses a mutex around accesses and mutations, so Queue().queue is not thread-safe to use.
If you're looking for an "in" operator, then a deque or queue is possibly not the most appropriate data structure for your problem.
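If you really do need a membership test, one sketch is to hold the queue's own lock while touching the internal deque. Note that q.queue and q.mutex are CPython implementation details rather than public API; this is shown only to illustrate why the unguarded x in Queue().queue is unsafe:
from queue import Queue  # Queue.Queue in Python 2

q = Queue()
q.put("needle")

# q.queue is the internal deque; q.mutex is the lock Queue itself uses.
# Holding the mutex keeps put()/get() in other threads from racing with us.
with q.mutex:
    found = "needle" in q.queue

print(found)  # True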
Answer 6
(seems I don’t have reputation to comment…)
You need to be careful which methods of the deque you use from different threads.
The single-element deque operations (append(), popleft()) appear to be thread-safe, but I have found that doing
for item in a_deque:
    process(item)
can fail if another thread is adding items at the same time. I got a RuntimeError complaining "deque mutated during iteration".
Check collectionsmodule.c to see which operations are affected by this.
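A sketch of one way around that: instead of iterating the live deque, drain it one atomic popleft() at a time (process() here is a stand-in for your own handler):
import collections

a_deque = collections.deque(range(10))

def process(item):
    print(item)

# Each popleft() is a single atomic operation, so this loop does not
# trip the "deque mutated during iteration" check even if another
# thread keeps appending while we drain.
while True:
    try:
        item = a_deque.popleft()
    except IndexError:
        break
    process(item)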