




I have a multi-threaded Python program and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resulting log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?

Answer 0

threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).
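
As a minimal sketch of how this could slot into the logger from the question (write_log is a hypothetical stand-in for the asker’s writeLog, and the line format is an assumption), the helper can look up the calling thread itself, so callers pass nothing extra. Note that threading.get_ident() exists from Python 3.3 on; Python 2 exposes it as thread.get_ident().

```python
import threading
import time

def write_log(message):
    """Hypothetical writeLog: timestamp, calling thread's name and ident, message."""
    thread = threading.current_thread()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = "%s [%s:%d] %s" % (stamp, thread.name, threading.get_ident(), message)
    print(line)
    return line

def worker():
    write_log("hello from a worker")

t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()
```

Because the lookup happens inside write_log, existing call sites do not need to change.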

Answer 1

Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:
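
The setup step might look like the following sketch. The format string, the LOG_FORMAT name and the worker function are illustrative assumptions; basicConfig and the two mapping keys come from the standard logging module.

```python
import logging
import threading

# Hypothetical format: timestamp, thread name, numeric thread ident, message
LOG_FORMAT = "%(asctime)s %(threadName)s (%(thread)d): %(message)s"
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)

def worker():
    logging.info("hello from a worker thread")

t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()
```

Every record emitted through the root logger then carries the name and ident of the thread that produced it.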


Answer 2

The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.

I use this method to really get the thread id on Linux:

import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent (186 is the x86_64 value), see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
    """Returns OS thread id - Specific to Linux"""
    return libc.syscall(SYS_gettid)
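
On Python 3.8 or newer, the standard library exposes the kernel-level id directly through threading.get_native_id(), which avoids hard-coding a syscall number. A minimal sketch, assuming that Python version (get_os_thread_id and report are hypothetical names):

```python
import threading

def get_os_thread_id():
    """Return the OS-assigned id of the calling thread (Python 3.8+)."""
    return threading.get_native_id()

def report():
    print(threading.current_thread().name, get_os_thread_id())

t = threading.Thread(target=report, name="worker-1")
t.start()
t.join()
```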

Answer 3

Answer 4

I saw examples of thread IDs like this:

import threading

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self, name=name)  # initialise the base Thread first
        self.threadID = threadID

The threading module docs list a name attribute as well:

A thread has a name. The name can be passed to the constructor, and read or changed through the name attribute.

A string used for identification purposes only. It has no semantics. Multiple threads may be given the same name. The initial name is set by the constructor.

Answer 5

You can get the ident of the currently running thread. The ident may be reused for other threads once the current thread ends.

When you create an instance of Thread, a name is implicitly given to the thread, following the pattern Thread-number.

The name has no meaning and does not have to be unique. The idents of all running threads are unique.

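import threading

def worker():
    # Report the name and ident of the thread running this function
    print(threading.current_thread().name)
    print(threading.get_ident())

threading.Thread(target=worker, name='foo').start()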

The function threading.current_thread() returns the currently running thread. This object holds all of the thread’s information.

Answer 6

I created multiple threads in Python, printed the thread objects, and printed the id using the ident variable. All the ids are the same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

Answer 7


Similarly to @brucexin, I needed to get the OS-level thread identifier (which != thread.get_ident()), and I use something like the code below so as not to depend on particular syscall numbers or be amd64-only:

---- 8< ---- (xos.pyx)
"""module xos complements standard module os"""

cdef extern from "<sys/syscall.h>":
    long syscall(long number, ...)
    const int SYS_gettid

# gettid returns current OS thread identifier.
def gettid():
    return syscall(SYS_gettid)


---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos


print('my tid: %d' % xos.gettid())

This depends on Cython, though.

Question: Python multithreading: wait until all threads have finished


This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.

I have written a Python script (let’s say scriptA.py) and a second script (let’s say scriptB.py).

In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it’s a huge script that does lots of stuff; don’t worry about it), and I want to run scriptA with all the different arguments simultaneously, but I need to wait until ALL of them are done before continuing. My code:

import subprocess


#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)


I want to run all the subprocess.call() calls at the same time and then wait until they are all done. How should I do this?

I tried to use threading like the example here:

from threading import Thread
import subprocess

def call_script(args):
    subprocess.call(args)

#run scriptA
t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))
t1.start()
t2.start()
t3.start()

But I do not think this is right.

How do I know they have all finished running before going to my do_finish()?

Answer 0

You need to call the join method of each Thread object at the end of the script.

t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))

for t in (t1, t2, t3):
    t.start()

# Block until each thread has finished
for t in (t1, t2, t3):
    t.join()


Thus the main thread will wait till t1, t2 and t3 finish execution.

Answer 1

Put the threads in a list and then use the join method:

 threads = []

 t = Thread(...)
 threads.append(t)

 # ...repeat as often as necessary...

 # Start all threads
 for x in threads:
     x.start()

 # Wait for all of them to finish
 for x in threads:
     x.join()
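
A runnable sketch of this pattern, under the assumption that each job is a command line handed to subprocess.call. The commands list and the results dictionary are made up for illustration; the real scriptA invocations would go in their place.

```python
import subprocess
import sys
from threading import Thread

results = {}

def call_script(index, args):
    # Runs one child process and records its exit code
    results[index] = subprocess.call(args)

# Stand-in commands: each child just sleeps briefly
commands = [[sys.executable, "-c", "import time; time.sleep(0.1)"] for _ in range(3)]

threads = [Thread(target=call_script, args=(i, cmd)) for i, cmd in enumerate(commands)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # returns only once that thread's child process has exited

print("all done:", results)
```

Anything placed after the second loop, such as a do_finish() call, runs only when every child has exited.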

Answer 2

Since Python 3.2 there is a new approach that achieves the same result, which I personally prefer to the traditional thread creation/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html

Using a ThreadPoolExecutor the code would be:

from concurrent.futures import ThreadPoolExecutor
import time

def call_script(ordinal, arg):
    print('Thread', ordinal, 'argument:', arg)
    time.sleep(2)  # stand-in for the real work
    print('Thread', ordinal, 'Finished')

args = ['argumentsA', 'argumentsB', 'argumentsC']

# Leaving the with block waits for all submitted tasks to complete
with ThreadPoolExecutor(max_workers=2) as executor:
    ordinal = 1
    for arg in args:
        executor.submit(call_script, ordinal, arg)
        ordinal += 1
print('All tasks has been finished')

The output of the previous code is something like:

Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks has been finished

One of the advantages is that you can control throughput by setting the maximum number of concurrent workers.

Answer 3

I prefer using list comprehensions based on an input list:

inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
# args must be a tuple, hence the trailing comma
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]

Answer 4

You can use a class like the one below, to which you can add any number of functions or console scripts that you want to execute in parallel, then start the execution and wait for all jobs to complete.

from multiprocessing import Process

class ProcessParallel(object):
    """Runs the given functions in parallel processes."""

    def __init__(self, *jobs):
        self.jobs = jobs
        self.processes = []

    def fork_processes(self):
        """Creates the process objects for the given function delegates."""
        for job in self.jobs:
            proc = Process(target=job)
            self.processes.append(proc)

    def start_all(self):
        """Starts all the function processes together."""
        for proc in self.processes:
            proc.start()

    def join_all(self):
        """Waits until all the functions have executed."""
        for proc in self.processes:
            proc.join()

def two_sum(a=2, b=2):
    return a + b

def multiply(a=2, b=2):
    return a * b

#How to run:
if __name__ == '__main__':
    #note: two_sum, multiply can be replaced with any Python console scripts
    #you want to run in parallel.
    procs = ProcessParallel(two_sum, multiply)
    #Add all the processes to the list
    procs.fork_processes()
    #start process execution
    procs.start_all()
    #wait until all the processes have finished
    procs.join_all()

Answer 5

From the threading module documentation

There is a “main thread” object; this corresponds to the initial thread of control in the Python program. It is not a daemon thread.

There is the possibility that “dummy thread objects” are created. These are thread objects corresponding to “alien threads”, which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be join()ed. They are never deleted, since it is impossible to detect the termination of alien threads.

So, to catch those two cases when you are not interested in keeping a list of the threads you create:

import threading as thrd

def alter_data(data, index):
    data[index] *= 2

data = [0, 2, 6, 20]

for i, value in enumerate(data):
    thrd.Thread(target=alter_data, args=[data, i]).start()

for thread in thrd.enumerate():
    if thread.daemon:
        continue
    try:
        thread.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            # catches main thread
            continue
        else:
            raise


>>> print(data)
[0, 4, 12, 40]

Answer 6

Maybe, something like

for t in threading.enumerate():
    if t.daemon:
        t.join()

Answer 7

I just came across the same problem, where I needed to wait for all the threads that were created in a for loop. I tried the following piece of code. It may not be the perfect solution, but I thought it would be a simple one to test:

for t in threading.enumerate():
    try:
        t.join()
    except RuntimeError as err:
        # Joining the current (main) thread raises; ignore only that case
        if 'cannot join current thread' not in str(err):
            raise











I’m writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I’d like to outsource the web connection to a separate worker thread.

[Yes, I know, now I have two problems.]

Anyway, the application uses PyQt4, so I’d like to know what the better choice is: Use Qt’s threads or use the Python threading module? What are advantages / disadvantages of each? Or do you have a totally different suggestion?

Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request like Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I’d still like a more in-depth answer to the general question:

What are advantages and disadvantages of using PyQt4’s (i.e. Qt’s) threads over native Python threads (from the threading module)?

Edit 2: Thanks all for your answers. Although there’s no 100% agreement, there seems to be widespread consensus that the answer is “use Qt”, since the advantage of that is integration with the rest of the library, while causing no real disadvantages.

For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.

There were several answers I considered for the bounty; in the end I chose abbot’s for the very relevant external reference; it was, however, a close call.

Thanks again.

Answer 0




This was discussed not too long ago on the PyQt mailing list. Quoting Giovanni Bajo’s comments on the subject:

It’s mostly the same. The main difference is that QThreads are better integrated with Qt (asynchronous signals/slots, event loop, etc.). Also, you can’t use Qt from a Python thread (you can’t for instance post event to the main thread through QApplication.postEvent): you need a QThread for that to work.

A general rule of thumb might be to use QThreads if you’re going to interact somehow with Qt, and use Python threads otherwise.

And some earlier comment on this subject from PyQt’s author: “they are both wrappers around the same native thread implementations”. And both implementations use GIL in the same way.

Answer 1

Python’s threads will be simpler and safer, and since it is for an I/O-based application, they are able to bypass the GIL. That said, have you considered non-blocking I/O using Twisted or non-blocking sockets/select?

EDIT: more on threads

Python threads

Python’s threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter is only ever executing a certain size block of byte-code instructions at a time. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.

Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.
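
To make the caveat concrete, here is a small sketch (the Counter class and add_many helper are hypothetical) in which several threads increment a shared value. The += is a read-modify-write sequence, so it is guarded with a threading.Lock:

```python
import threading

class Counter:
    """Hypothetical shared counter whose increment is protected by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        with self._lock:
            self.value += 1

def add_many(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000, since every increment ran under the lock
```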

Qt threads

When Python hands off control to a 3rd party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, Python will use the GIL. This can make using 3rd party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library because it adds uncertainty as to where and when control is in the hands of the module vs the interpreter.

Qt threads operate with the GIL released. Qt threads are able to execute Qt library code (and other compiled module code that does not acquire the GIL) concurrently. However, the Python code executed within the context of a Qt thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.

In the end, both Qt threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies).

Non-blocking I/O

Threads add extraordinary complexity to your application, especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much less difficult to reason about than threads.

With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.

One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.

It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).

Answer 2



The advantage of QThread is that it’s integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.

If you are accessing an HTTP server, you should consider QNetworkAccessManager.

Answer 3



I asked myself the same question when I was working on PyTalk.

If you are using Qt, you need to use QThread to be able to use the Qt framework, and especially the signal/slot system.

With the signal/slot engine, you will be able to talk from one thread to another and with every part of your project.

Moreover, there is not much of a performance concern with this choice, since both are C++ bindings.

Here is my experience of PyQt and thread.

I encourage you to use QThread.

Answer 4


Jeff has some good points. Only the main thread can do any GUI updates. If you do need to update the GUI from within the thread, Qt 4’s queued connection signals make it easy to send data across threads, and they will be used automatically if you’re using QThread; I’m not sure whether they will be if you’re using Python threads, although it’s easy to add a parameter to connect().

Answer 5






I can’t really recommend either, but I can try describing differences between CPython and Qt threads.

First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread, however only the thread currently holding Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while thread doesn’t hold GIL).

On the other hand, we have Qt threads, which are basically common layer over system threads, don’t have Global Interpreter Lock, and thus are capable of running concurrently. I’m not sure how PyQt deals with it, however unless your Qt threads call Python code, they should be able to run concurrently (bar various extra locks that might be implemented in various structures).

For extra fine-tuning, you can modify the number of bytecode instructions that are interpreted before switching ownership of the GIL. Lower values mean more context switching (and possibly higher responsiveness) but lower performance per individual thread (context switches have their cost; if you try switching every few instructions, it doesn’t help speed).
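
In Python 3.2 and later this setting is expressed as a time interval rather than an instruction count, via sys.getswitchinterval() and sys.setswitchinterval(); Python 2 used sys.setcheckinterval() with an instruction count. A brief sketch, where the 1 ms value is an arbitrary illustration rather than a recommendation:

```python
import sys

original = sys.getswitchinterval()   # seconds between forced thread switches
sys.setswitchinterval(0.001)         # ask for more frequent switching
shortened = sys.getswitchinterval()
sys.setswitchinterval(original)      # restore the previous setting
print(original, shortened)
```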

Hope it helps with your problems :)

Answer 6


I can’t comment on the exact differences between Python and PyQt threads, but I’ve been doing what you’re attempting to do using QThread and QNetworkAccessManager, and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you’re trying to solve, the latter will help.











I’m slightly confused about whether multithreading works in Python or not.

I know there have been a lot of questions about this and I’ve read many of them, but I’m still confused. I know from my own experience, and have seen others post their own answers and examples here on StackOverflow, that multithreading is indeed possible in Python. So why is it that everyone keeps saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I’m not getting here?

Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?


Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.

I’m not entirely sure what this means for me in practical terms, so I’ll just give an example of the kind of task I’d like to multithread. For instance, let’s say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?

Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I’m really wondering about rather than what the correct terminology is.

I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.

Answer 0

The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.

What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.

This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.

What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.

Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).

To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations), any C code can run in parallel with one active Python thread.

Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
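
The I/O case can be seen in a small sketch that uses time.sleep as a stand-in for a blocking network call (an assumption for illustration; sleeping releases the GIL much as blocking I/O does). Four 0.2-second waits overlap instead of adding up:

```python
import threading
import time

def fake_io(seconds):
    # Stand-in for a blocking read; the GIL is released while sleeping
    time.sleep(seconds)

def run_threaded(count, seconds):
    """Run `count` fake I/O waits in parallel threads and return the wall time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threaded(4, 0.2)
print("4 waits of 0.2 s took %.2f s in threads" % elapsed)
```

Replacing fake_io with a pure-Python computation would not show the same overlap on CPython, for the reasons given above.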

Answer 1

Yes. :)

You have the low-level thread module and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.

Quote from the docs:

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

Answer 2



Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).

So basically, if you want to multi-thread the code to speed up a calculation, it won’t help, as just one thread is executed at a time; but if you use it to interact with a database, for example, it will.






I’d like to parallelize my Python program so that it can make use of multiple processors on the machine that it runs on. My parallelization is very simple, in that all the parallel “threads” of the program are independent and write their output to separate files. I don’t need the threads to exchange information but it is imperative that I know when the threads finish since some steps of my pipeline depend on their output.

Portability is important, in that I’d like this to run on any Python version on Mac, Linux, and Windows. Given these constraints, which is the most appropriate Python module for implementing this? I am trying to decide between thread, subprocess, and multiprocessing, which all seem to provide related functionality.

Any thoughts on this? I’d like the simplest solution that’s portable.

Answer 0

multiprocessing is a great Swiss-army knife type of module. It is more general than threads, as you can even perform remote computations. This is therefore the module I would suggest you use.

The subprocess module would also allow you to launch multiple processes, but I found it to be less convenient to use than the new multiprocessing module.

Threads are notoriously subtle, and with CPython you are often limited to one core with them (even though, as noted in one of the comments, the Global Interpreter Lock (GIL) can be released in C code called from Python code).

I believe that most of the functions of the three modules you cite can be used in a platform-independent way. On the portability side, note that multiprocessing only comes in standard since Python 2.6 (a version for some older versions of Python does exist, though). But it’s a great module!
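
A minimal sketch of that suggestion for the pipeline described in the question, assuming each job writes one output file (write_result, the file names and the squaring are placeholders). Pool.map returns only after every job has finished, which tells the caller when all the output is ready:

```python
import os
import tempfile
from multiprocessing import Pool

def write_result(job):
    """Placeholder job: write n squared to its own file and return the path."""
    path, n = job
    with open(path, "w") as fh:
        fh.write(str(n * n))
    return path

def main():
    out_dir = tempfile.mkdtemp()
    jobs = [(os.path.join(out_dir, "out_%d.txt" % n), n) for n in range(4)]
    with Pool(processes=2) as pool:
        finished = pool.map(write_result, jobs)  # blocks until every job is done
    print("finished:", finished)

if __name__ == "__main__":
    main()
```

The __main__ guard matters on platforms that start worker processes by re-importing the script, such as Windows.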

Answer 1



For me this is actually pretty simple:

The subprocess option:

subprocess is for running other executables. On POSIX systems it is basically a wrapper around os.fork() and os.execve() with some support for optional plumbing (setting up PIPEs to and from the subprocesses). Obviously you could use other inter-process communication (IPC) mechanisms, such as sockets, or POSIX or SysV shared memory, but you are going to be limited to whatever interfaces and IPC channels are supported by the programs you are calling.

Commonly, one uses any subprocess synchronously — simply calling some external utility and reading back its output or awaiting its completion (perhaps reading its results from a temporary file, or after it’s posted them to some database).
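A minimal sketch of that synchronous pattern, assuming Python 3.7 or later (for capture_output). The child command is just the current interpreter printing one line, a stand-in for whatever external utility you would really call, and run_and_capture is a hypothetical helper name:

```python
import subprocess
import sys

def run_and_capture(args):
    """Run an external command, wait for it to finish, and return its stdout."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout

# Stand-in child process: the current Python interpreter printing one line.
output = run_and_capture([sys.executable, "-c", "print('hello from child')"])
print(output.strip())
```

The call blocks until the child exits, which is exactly the "call a utility and read back its output" usage described above.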

However one can spawn hundreds of subprocesses and poll them. My own personal favorite utility classh does exactly that. The biggest disadvantage of the subprocess module is that I/O support is generally blocking. There is a draft PEP-3145 to fix that in some future version of Python 3.x and an alternative asyncproc (Warning that leads right to the download, not to any sort of documentation nor README). I’ve also found that it’s relatively easy to just import fcntl and manipulate your Popen PIPE file descriptors directly — though I don’t know if this is portable to non-UNIX platforms.

(Update: 7 August 2019: Python 3 support for asyncio subprocesses: asyncio Subprocesses)

subprocess has almost no event handling support, though you can use the signal module and plain old-school UNIX/Linux signals, killing your processes softly, as it were.

The multiprocessing option:

multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among this family of processes. In particular it’s best to build your multiprocessing IPC around the module’s Queue objects where possible, but you can also use Event objects and various other features (some of which are, presumably, built around mmap support on the platforms where that support is sufficient).

Python’s multiprocessing module is intended to provide interfaces and features which are very similar to threading while allowing CPython to scale your processing among multiple CPUs/cores despite the GIL (Global Interpreter Lock). It leverages all the fine-grained SMP locking and coherency effort that was done by developers of your OS kernel.

The threading option:

threading is for a fairly narrow range of applications which are I/O bound (don’t need to scale across multiple CPU cores) and which benefit from the extremely low latency and switching overhead of thread switching (with shared core memory) vs. process/context switching. On Linux this is almost the empty set (Linux process switch times are extremely close to its thread-switches).

threading suffers from two major disadvantages in Python.

One, of course, is implementation specific — mostly affecting CPython. That’s the GIL. For the most part, most CPython programs will not benefit from the availability of more than two CPUs (cores) and often performance will suffer from the GIL locking contention.

The larger issue which is not implementation specific, is that threads share the same memory, signal handlers, file descriptors and certain other OS resources. Thus the programmer must be extremely careful about object locking, exception handling and other aspects of their code which are both subtle and which can kill, stall, or deadlock the entire process (suite of threads).

By comparison the multiprocessing model gives each process its own memory, file descriptors, etc. A crash or unhandled exception in any one of them will only kill that resource and robustly handling the disappearance of a child or sibling process can be considerably easier than debugging, isolating and fixing or working around similar issues in threads.

  • (Note: use of threading with major Python systems, such as NumPy, may suffer considerably less from GIL contention than most of your own Python code would. That’s because they’ve been specifically engineered to do so; the native/binary portions of NumPy, for example, will release the GIL when that’s safe).

The twisted option:

It’s also worth noting that Twisted offers yet another alternative which is both elegant and very challenging to understand. Basically, at the risk of over simplifying to the point where fans of Twisted may storm my home with pitchforks and torches, Twisted provides event-driven co-operative multi-tasking within any (single) process.

To understand how this is possible one should read about the features of select() (which can be built around the select() or poll() or similar OS system calls). Basically it’s all driven by the ability to make a request of the OS to sleep pending any activity on a list of file descriptors or some timeout.

Awakening from each of these calls to select() is an event — either one involving input available (readable) on some number of sockets or file descriptors, or buffering space becoming available on some other (writable) descriptors or sockets, some exceptional conditions (TCP out-of-band PUSH’d packets, for example), or a TIMEOUT.

Thus the Twisted programming model is built around handling these events then looping on the resulting “main” handler, allowing it to dispatch the events to your handlers.
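To make the "sleep until something happens, then dispatch" idea concrete, here is a small sketch using only the standard library selectors module. This is not Twisted's reactor, just the same underlying mechanism; run_once and on_readable are hypothetical names, and a socketpair stands in for a real network peer:

```python
import selectors
import socket

def run_once(selector, timeout=1.0):
    """Wait for events, dispatch each to its registered handler, return results."""
    results = []
    for key, _mask in selector.select(timeout=timeout):
        handler = key.data              # the callback stored at registration time
        results.append(handler(key.fileobj))
    return results

def on_readable(sock):
    """Hypothetical handler: read whatever arrived on the socket."""
    return sock.recv(1024)

selector = selectors.DefaultSelector()
reader, writer = socket.socketpair()    # connected pair, stand-in for a network peer
reader.setblocking(False)
selector.register(reader, selectors.EVENT_READ, data=on_readable)

writer.sendall(b"ping")                 # makes `reader` readable
received = run_once(selector)           # wakes up and dispatches to on_readable
idle = run_once(selector, timeout=0.05) # nothing pending: this is the TIMEOUT case

selector.unregister(reader)
selector.close()
reader.close()
writer.close()
print(received, idle)
```

A real event loop simply calls something like run_once() forever; everything else is registering handlers.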

I personally think of the name, Twisted as evocative of the programming model … since your approach to the problem must be, in some sense, “twisted” inside out. Rather than conceiving of your program as a series of operations on input data and outputs or results, you’re writing your program as a service or daemon and defining how it reacts to various events. (In fact the core “main loop” of a Twisted program is (usually? always?) a reactor()).

The major challenges to using Twisted involve twisting your mind around the event driven model and also eschewing the use of any class libraries or toolkits which are not written to co-operate within the Twisted framework. This is why Twisted supplies its own modules for SSH protocol handling, for curses, and its own subprocess/Popen functions, and many other modules and protocol handlers which, at first blush, would seem to duplicate things in the Python standard libraries.

I think it’s useful to understand Twisted on a conceptual level even if you never intend to use it. It may give insights into performance, contention, and event handling in your threading, multiprocessing and even subprocess handling as well as any distributed processing you undertake.

(Note: Newer versions of Python 3.x include asyncio (asynchronous I/O) features such as async def, the await keyword, yield from future support, and the @asyncio.coroutine decorator (the latter was deprecated in Python 3.8 and removed in 3.11). All of these are roughly similar to Twisted from a process (co-operative multitasking) perspective.) (For the current status of Twisted support for Python 3, check out: https://twistedmatrix.com/documents/current/core/howto/python3.html)

The distributed option:

Yet another realm of processing you haven’t asked about, but which is worth considering, is that of distributed processing. There are many Python tools and frameworks for distributed processing and parallel computation. Personally I think the easiest to use is one which is least often considered to be in that space.

It is almost trivial to build distributed processing around Redis. The entire key store can be used to store work units and results, Redis LISTs can be used as Queue() like object, and the PUB/SUB support can be used for Event-like handling. You can hash your keys and use values, replicated across a loose cluster of Redis instances, to store the topology and hash-token mappings to provide consistent hashing and fail-over for scaling beyond the capacity of any single instance for co-ordinating your workers and marshaling data (pickled, JSON, BSON, or YAML) among them.

Of course, as you start to build a larger-scale and more sophisticated solution around Redis, you are re-implementing many of the features that have already been solved by Celery, Apache Spark and Hadoop, Zookeeper, etcd, Cassandra and so on. Those all have modules for Python access to their services.

[Update: A couple of resources to consider if you’re thinking of using Python for computationally intensive work across distributed systems: IPython Parallel and PySpark. While these are general-purpose distributed computing systems, they are particularly accessible and popular in data science and analytics.]


There you have the gamut of processing alternatives for Python, from single threaded, with simple synchronous calls to sub-processes, pools of polled subprocesses, threaded and multiprocessing, event-driven co-operative multi-tasking, and out to distributed processing.

Answer 2

In a similar case I opted for separate processes and the little bit of necessary communication through network sockets. It is highly portable and quite simple to do using Python, but probably not the simplest option (in my case I also had another constraint: communication with other processes written in C++).

In your case I would probably go for multiprocessing, as Python threads, at least when using CPython, are not real threads. Well, they are native system threads, but C modules called from Python may or may not release the GIL and allow other threads to run when calling blocking code.

Answer 3

To use multiple processors in CPython your only choice is the multiprocessing module. CPython keeps a lock on its internals (the GIL) which prevents threads on other CPUs from working in parallel. The multiprocessing module creates new processes (like subprocess) and manages communication between them.

Answer 4

Shell out and let Unix do your jobs:

use iterpipes to wrap subprocess and then:

From Ted Ziuba’s site

INPUTS_FROM_YOU | xargs -n1 -0 -P NUM ./process #NUM parallel processes


GNU Parallel will also serve.

You hang out with GIL while you send the backroom boys out to do your multicore work.

Multiprocessing vs. multithreading vs. asyncio in Python 3

Question: Multiprocessing vs. multithreading vs. asyncio in Python 3

I found that in Python 3.4 there are a few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.

But I don’t know which one to use or is the “recommended one”. Do they do the same thing, or are different? If so, which one is used for what? I want to write a program that uses multicores in my computer. But I don’t know which library I should learn.

Answer 0

They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal. That’s why multiprocessing may be preferred over threading. But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That’s why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python, other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources w/o need of parallel code execution. So it’s just a solution (a good one indeed!) for a particular task, not for parallel processing in general.

Answer 1

[Quick Answer]


Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudocode:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
  • CPU Bound => Multi Processing
  • I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
  • I/O Bound, Slow I/O, Many connections => Asyncio



[NOTE]:

  • If you have a long-running call (i.e. a method that spends its time sleeping or waiting on slow I/O), the best choice is a coroutine-based approach such as asyncio, Twisted or Tornado, which provides concurrency within a single thread.
  • asyncio works on Python 3.4 and later.
  • Tornado and Twisted have been available since Python 2.7.
  • uvloop is an ultra-fast asyncio event loop (uvloop makes asyncio 2-4x faster).

[UPDATE (2019)]:

  • Japronto (GitHub) is a very fast pipelining HTTP server based on uvloop.

Answer 2

This is the basic idea:

Is it IO-BOUND ? ———> USE asyncio

IS IT CPU-HEAVY ? —–> USE multiprocessing

ELSE ? ———————-> USE threading

So basically stick to threading unless you have IO/CPU problems.

Answer 3

In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you’re effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be trying to calculate the sum of all elements of a huge list. If your machine has 8 cores, you can “cut” the list into 8 smaller lists, calculate the sum of each of those lists on a separate core, and then just add up those numbers. You can get up to a ~8x speedup by doing that (less in practice, because of the cost of starting processes and moving data between them).
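The list-summing example above could be sketched roughly like this. split_into_chunks and parallel_sum are hypothetical helper names, and for something as cheap as sum() the cost of shipping the chunks to the workers may well outweigh the gain, so treat it as an illustration of the pattern rather than a benchmark:

```python
from multiprocessing import Pool

def split_into_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:             # skip empty slices when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks

def parallel_sum(items, n_workers=4):
    """Sum each chunk in a separate worker process, then add up the partial sums."""
    chunks = split_into_chunks(items, n_workers)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum, chunks)
    return sum(partial_sums)

if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    numbers = list(range(1_000_000))
    print(parallel_sum(numbers, n_workers=8))
```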

In (multi)threading you don’t need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once received a response. The problem here is that your CPU isn’t really doing work while waiting for some external server to do the job; it could have actually done some useful work in the meantime! The fix is to use threads – you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the CPU from time to time “freezes” the execution of one thread and jumps to executing the other one (it’s called context switching and it happens constantly at non-deterministic intervals). So if your task is I/O bound – use threading.
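Here is a small sketch of that idea, assuming the "HTTP request" is simulated with time.sleep() so the example needs no network access. fake_request, fetch_all and the example.com URLs are placeholders:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(url, delay=0.2):
    """Stand-in for an HTTP request: the thread just blocks for `delay` seconds."""
    time.sleep(delay)           # blocking wait; the GIL is released while sleeping
    return f"response from {url}"

def fetch_all(urls, max_workers=8):
    """Issue all requests concurrently and return the responses in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_request, urls))

urls = [f"https://example.com/page/{i}" for i in range(8)]  # placeholder URLs
started = time.perf_counter()
responses = fetch_all(urls)
elapsed = time.perf_counter() - started
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

Run one after another, the eight simulated requests would take about 1.6 seconds; with eight threads the waits overlap, so the total should be close to a single delay.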

asyncio is essentially threading where not the CPU but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
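A minimal sketch of those explicit switch points, assuming Python 3.7+ (for asyncio.run). fake_request is a placeholder coroutine that only sleeps; the log shows that control moves between the two coroutines exactly at the await:

```python
import asyncio

async def fake_request(name, delay, log):
    log.append(f"{name} started")
    await asyncio.sleep(delay)   # explicit switch point: control returns to the event loop
    log.append(f"{name} finished")
    return name

async def main():
    log = []
    # Both coroutines run in one thread; gather returns results in argument order.
    results = await asyncio.gather(
        fake_request("slow", 0.2, log),
        fake_request("fast", 0.05, log),
    )
    return results, log

results, log = asyncio.run(main())
print(results)
print(log)
```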





I have a class which holds a dictionary

class OrderBook:
    orders = {'Restaurant1': None,
              'Restaurant2': None,
              'Restaurant3': None,
              'Restaurant4': None}

    def addOrder(restaurant_name, orders):
        OrderBook.orders[restaurant_name] = orders

And I am running 4 threads (one for each restaurant) that call the method OrderBook.addOrder. Here is the function run by each thread:

def addOrders(restaurant_name):

    #creates orders

    OrderBook.addOrder(restaurant_name, orders)

Is this safe, or do I have to use a lock before calling addOrder?

Answer 0

Python’s built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations.

Your code should be safe. Keep in mind: a lock here will add almost no overhead, and will give you peace of mind.

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm has more details.
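If you do want the peace of mind, a lock-protected variant of the question's class could look like the sketch below. The snake_case names, the _lock attribute and the placeholder order strings are my own additions, not part of the original code:

```python
import threading

class OrderBook:
    """Order storage guarded by a single class-level lock."""
    orders = {"Restaurant1": None, "Restaurant2": None,
              "Restaurant3": None, "Restaurant4": None}
    _lock = threading.Lock()

    @staticmethod
    def add_order(restaurant_name, orders):
        with OrderBook._lock:   # only one thread mutates the dict at a time
            OrderBook.orders[restaurant_name] = orders

def add_orders(restaurant_name):
    # Placeholder for the "creates orders" step in the question.
    orders = [f"{restaurant_name}-order-{i}" for i in range(3)]
    OrderBook.add_order(restaurant_name, orders)

threads = [threading.Thread(target=add_orders, args=(name,))
           for name in list(OrderBook.orders)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(OrderBook.orders)
```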

Answer 1


Yes, built-in types are inherently thread-safe: http://docs.python.org/glossary.html#term-global-interpreter-lock

This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access.

Answer 2







Google’s style guide advises against relying on dict atomicity

Explained in further detail at: Is Python variable assignment atomic?

Do not rely on the atomicity of built-in types.

While Python’s built-in data types such as dictionaries appear to have atomic operations, there are corner cases where they aren’t atomic (e.g. if __hash__ or __eq__ are implemented as Python methods) and their atomicity should not be relied upon. Neither should you rely on atomic variable assignment (since this in turn depends on dictionaries).

Use the Queue module’s Queue data type as the preferred way to communicate data between threads. Otherwise, use the threading module and its locking primitives. Learn about the proper use of condition variables so you can use threading.Condition instead of using lower-level locks.

And I agree with this one: there is already the GIL in CPython, so the performance hit of using a Lock will be negligible. Much more costly will be the hours spent bug hunting in a complex codebase when those CPython implementation details change one day.
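The Queue-based hand-off recommended in the quoted guide might look like this in Python 3, where the module is called queue (it was Queue in Python 2). The producer/consumer functions and the None sentinel are illustrative choices, not something the guide prescribes:

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)
    q.put(None)                      # sentinel: tells the consumer to stop

def consumer(q, results):
    while True:
        item = q.get()               # blocks until an item is available
        if item is None:
            break
        results.append(item * 2)     # only this thread touches `results`

q = queue.Queue()
results = []
workers = [
    threading.Thread(target=producer, args=(q, range(5))),
    threading.Thread(target=consumer, args=(q, results)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(results)
```

All synchronization lives inside the Queue, so neither function needs an explicit lock.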







In Python specifically, how do variables get shared between threads?

Although I have used threading.Thread before I never really understood or saw examples of how variables got shared. Are they shared between the main thread and the children or only among the children? When would I need to use thread local storage to avoid this sharing?

I have seen many warnings about synchronizing access to shared data among threads by using locks but I have yet to see a really good example of the problem.

Thanks in advance!

Answer 0



In Python, everything is shared, except for function-local variables (because each function call gets its own set of locals, and threads are always separate function calls.) And even then, only the variables themselves (the names that refer to objects) are local to the function; objects themselves are always global, and anything can refer to them. The Thread object for a particular thread is not a special object in this regard. If you store the Thread object somewhere all threads can access (like a global variable) then all threads can access that one Thread object. If you want to atomically modify anything that another thread has access to, you have to protect it with a lock. And all threads must of course share this very same lock, or it wouldn’t be very effective.

If you want actual thread-local storage, that’s where threading.local comes in. Attributes of threading.local are not shared between threads; each thread sees only the attributes it itself placed in there. If you’re curious about its implementation, the source is in _threading_local.py in the standard library.

Answer 1

Consider the following code:

#!/usr/bin/env python

from time import sleep
from random import random
from threading import Thread, local

data = local()

def bar():
    print("I'm called from", data.v)

def foo():
    bar()

class T(Thread):
    def run(self):
        sleep(random())
        data.v = self.name   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()

>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-1

Here threading.local() is used as a quick and dirty way to pass some data from run() to bar() without changing the interface of foo().

Note that using global variables won’t do the trick:

#!/usr/bin/env python

from time import sleep
from random import random
from threading import Thread

def bar():
    global v
    print("I'm called from", v)

def foo():
    bar()

class T(Thread):
    def run(self):
        global v
        sleep(random())
        v = self.name   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()

>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-2

Meanwhile, if you could afford passing this data through as an argument of foo() – it would be a more elegant and well-designed way:

from threading import Thread

def bar(v):
    print("I'm called from", v)

def foo(v):
    bar(v)

class T(Thread):
    def run(self):
        foo(self.name)

But this is not always possible when using third-party or poorly designed code.

Answer 2

You can create thread local storage using threading.local().

>>> import threading
>>> tls = threading.local()
>>> tls.x = 4
>>> tls.x
4

Data stored to the tls will be unique to each thread which will help ensure that unintentional sharing does not occur.
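To see that isolation in action, here is a short sketch in which a worker thread looks at the same tls object. The seen_in_worker dict is just a hypothetical way to report back what the worker observed:

```python
import threading

tls = threading.local()
tls.x = 4                         # set in the main thread
seen_in_worker = {}

def worker():
    # The attribute set by the main thread is not visible here.
    seen_in_worker["has_x_before"] = hasattr(tls, "x")
    tls.x = 99                    # this value belongs to the worker thread only
    seen_in_worker["x_in_worker"] = tls.x

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen_in_worker, tls.x)      # the main thread still sees its own value
```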

Answer 3



Just like in every other language, every thread in Python has access to the same variables. There’s no distinction between the ‘main thread’ and child threads.

One difference with Python is that the Global Interpreter Lock means that only one thread can be running Python code at a time. This isn’t much help when it comes to synchronising access, however, as all the usual pre-emption issues still apply, and you have to use threading primitives just like in other languages. It does mean you need to reconsider if you were using threads for performance, however.

Answer 4

I may be wrong here. If you know otherwise please expound as this would help explain why one would need to use thread local().

This statement seems off, not wrong: “If you want to atomically modify anything that another thread has access to, you have to protect it with a lock.” I think this statement is ->effectively<- right but not entirely accurate. I thought the term “atomic” meant that the Python interpreter created a byte-code chunk that left no room for an interrupt signal to the CPU.

I thought atomic operations are chunks of Python byte code that do not give access to interrupts. Python statements like “running = True” are atomic. You do not need to lock the CPU from interrupts in this case (I believe). The Python byte code breakdown is safe from thread interruption.

Python code like “threads_running[5] = True” is not atomic. There are two chunks of Python byte code here: one to de-reference the list() for an object and another byte code chunk to assign a value to an object, in this case a “place” in a list. An interrupt can be raised ->between<- the two byte-code ->chunks<-. That is where bad stuff happens.

How does thread local() relate to “atomic”? This is why the statement seems misdirecting to me. If not, can you explain?





I have a problem with Python threading and sending a string in the arguments.

def processLine(line) :
    print "hello";


dRecieved = connFile.readline();
processThread = threading.Thread(target=processLine, args=(dRecieved));

Where dRecieved is the string of one line read by a connection. It calls a simple function which as of right now has only one job of printing “hello”.

However I get the following error

Traceback (most recent call last):
File "C:\Python25\lib\threading.py", line 486, in __bootstrap_inner
File "C:\Python25\lib\threading.py", line 446, in run
self.__target(*self.__args, **self.__kwargs)
TypeError: processLine() takes exactly 1 arguments (232 given)

232 is the length of the string that I am trying to pass, so I guess it’s breaking it up into each character and trying to pass the arguments like that. It works fine if I just call the function normally, but I would really like to set it up as a separate thread.

Answer 0

You’re trying to create a tuple, but you’re just parenthesizing a string :)

Add an extra ‘,’:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=(dRecieved,))  # <- note extra ','

Or use brackets to make a list:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=[dRecieved])  # <- 1 element list

If you notice, from the stack trace: self.__target(*self.__args, **self.__kwargs)

The *self.__args turns your string into a list of characters, passing them to the processLine function. If you pass it a one element list, it will pass that element as the first argument – in your case, the string.
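A quick way to convince yourself of the difference, as a self-contained Python 3 sketch. The line variable is a placeholder for connFile.readline(), and process_line records its argument instead of printing so the result can be inspected:

```python
import threading

received = []

def process_line(line):
    received.append(line)

line = "a line read from a connection\n"   # placeholder for connFile.readline()

not_a_tuple = (line)      # parentheses alone: still a str
one_tuple = (line,)       # trailing comma: a 1-element tuple

thread = threading.Thread(target=process_line, args=one_tuple)
thread.start()
thread.join()
print(type(not_a_tuple).__name__, type(one_tuple).__name__, received)
```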

Answer 1

I hope to provide more background knowledge here.

First, the constructor signature of threading.Thread:

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

args is the argument tuple for the target invocation. Defaults to ().

Second, A quirk in Python about tuple:

Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses).

On the other hand, a string is a sequence of characters, like 'abc'[1] == 'b'. So if you send a string to args, even in parentheses (still a string), each character will be treated as a single parameter.

However, Python is strict and is not like JavaScript, where extra arguments are tolerated. Instead, it raises a TypeError to complain.




I notice that it is often suggested to use queues with multiple threads, instead of lists and .pop(). Is this because lists are not thread-safe, or for some other reason?

Answer 0

Lists themselves are thread-safe. In CPython the GIL protects against concurrent accesses to them, and other implementations take care to use a fine-grained lock or a synchronized datatype for their list implementations. However, while lists themselves can’t be corrupted by attempts at concurrent access, the lists’ data is not protected. For example:

L[0] += 1

is not guaranteed to actually increase L[0] by one if another thread does the same thing, because += is not an atomic operation. (Very, very few operations in Python are actually atomic, because most of them can cause arbitrary Python code to be called.) You should use Queues because if you just use an unprotected list, you may get or delete the wrong item because of race conditions.
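If you do need a shared counter like L[0], one way to make the increment safe is to wrap it in a lock, as in this sketch (increment_many and the thread and iteration counts are arbitrary choices for illustration):

```python
import threading

L = [0]
lock = threading.Lock()

def increment_many(times):
    for _ in range(times):
        with lock:        # makes the read-add-store sequence indivisible
            L[0] += 1

threads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(L[0])
```

Without the lock the final value may come out lower than expected, because two threads can read the same old value before either writes back.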

Answer 1

To clarify a point in Thomas’ excellent answer, it should be mentioned that append() is thread safe.

This is because there is no concern that data being read will be in the same place once we go to write to it. The append() operation does not read data, it only writes data to the list.

Answer 2

Here’s a comprehensive yet non-exhaustive list of examples of list operations and whether or not they are thread safe. Hoping to get an answer regarding the obj in a_list language construct here.

Answer 3

I recently had a case where I needed to append to a list continuously in one thread while another thread looped through the items, checked whether each item was ready (it was an AsyncResult in my case), and removed it from the list only if it was ready. I could not find any examples that demonstrated my problem clearly, so here is one that continuously adds to a list in one thread and continuously removes from the same list in another. The flawed version often runs cleanly with smaller numbers, but keep the numbers big enough and run it a few times and you will see the error.

The FLAWED version

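import threading
import time

# Change this number as you please, bigger numbers will get the error quickly
count = 1000
l = []

def add():
    for i in range(count):
        l.append(i)          # loop body presumed: append each item
        time.sleep(0.0001)   # short pause (assumed value) so the threads interleave

def remove():
    for i in range(count):
        l.remove(i)          # raises ValueError if i has not been appended yet
        time.sleep(0.0001)

t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)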


Output when ERROR

Exception in thread Thread-63:
Traceback (most recent call last):
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 916, in _bootstrap_inner
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-30-ecfbac1c776f>", line 13, in remove
ValueError: list.remove(x): x not in list

Version that uses locks

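import threading
import time
count = 1000
l = []
lock = threading.RLock()
def add():
    # The lock is held for the whole loop, so add() finishes before remove() starts
    with lock:
        for i in range(count):
            l.append(i)          # loop body presumed: append each item
            time.sleep(0.0001)   # short pause (assumed value)

def remove():
    with lock:
        for i in range(count):
            l.remove(i)          # every item is already present by now
            time.sleep(0.0001)

t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)

Output

[] # Empty list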


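As mentioned in the earlier answers, the act of appending to or popping from the list is itself thread safe. What is not thread safe is a sequence that spans threads: when you append in one thread and remove in another, the remove can run before the matching append unless you synchronize the two.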
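For this append-in-one-thread, remove-in-another pattern, the queue module suggested in the question avoids the problem because get() blocks until an item is available. The sketch below is one possible arrangement, not the only one: the names run_pipeline and DONE are invented for illustration, and it assumes a single producer and a single consumer.

```python
import queue
import threading

def run_pipeline(count=1000):
    """Hand items from a producer thread to a consumer thread through a Queue."""
    q = queue.Queue()
    consumed = []
    DONE = object()  # sentinel telling the consumer that no more items will come

    def producer():
        for i in range(count):
            q.put(i)
        q.put(DONE)

    def consumer():
        while True:
            item = q.get()  # blocks until the producer has put something
            if item is DONE:
                break
            consumed.append(item)

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return consumed

items = run_pipeline()
print(len(items))  # all 1000 items arrive, in the order they were produced
```

Unlike l.remove(i) on a bare list, the consumer never asks for an item that has not been produced yet, so there is no ValueError to race into, and the two threads still run concurrently instead of one after the other.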