Tag archive: multithreading

How to find a thread ID in Python

Question: How to find a thread ID in Python

I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?


Answer 0

threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).
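
The asker's writeLog() could then prepend this itself. A minimal sketch (the writeLog signature is the asker's; the formatting is illustrative, and it assumes Python 3):

import threading
import time

def writeLog(message):
    # prefix each entry with the current thread's name and ident
    name = threading.current_thread().name
    ident = threading.get_ident()
    print('%s [%s/%d] %s' % (time.strftime('%H:%M:%S'), name, ident, message))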


Answer 1

Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:

logging.basicConfig(format="%(threadName)s:%(message)s")
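
A runnable sketch of the whole pattern, assuming the default (stderr) handler is fine:

import logging
import threading

logging.basicConfig(format="%(threadName)s:%(message)s", level=logging.INFO)

def worker():
    logging.info("doing work")  # emits e.g. "Thread-1:doing work"

threading.Thread(target=worker).start()
threading.Thread(target=worker, name="fetcher").start()  # emits "fetcher:doing work"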

Answer 2

The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.

I use this method to really get the thread id on Linux:

import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
    """Returns OS thread id - Specific to Linux"""
    return libc.syscall(SYS_gettid)
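
For what it's worth, Python 3.8 added a portable way to get the same number without hard-coding a syscall constant (a standard-library fact, not part of the original answer):

import threading

# Python 3.8+: kernel-assigned thread ID, no hard-coded syscall number
print(threading.get_native_id())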

Answer 4

I saw examples of thread IDs like this:

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        self.threadID = threadID
        ...

The threading module docs list the name attribute as well:

...

A thread has a name. The name can be passed to the constructor, and read or changed through the name attribute.

...

Thread.name

A string used for identification purposes only. It has no semantics. Multiple threads may be given the same name. The initial name is set by the constructor.

Answer 5

You can get the ident of the currently running thread. The ident could be reused by other threads if the current thread ends.

When you create an instance of Thread, a name is implicitly given to the thread, following the pattern Thread-number.

The name has no meaning and the name doesn't have to be unique. The idents of all running threads are unique.

import threading


def worker():
    print(threading.current_thread().name)
    print(threading.get_ident())


threading.Thread(target=worker).start()
threading.Thread(target=worker, name='foo').start()

The function threading.current_thread() returns the currently running thread. This object holds all of the thread's information.


Answer 6

I created multiple threads in Python, printed the thread objects, and printed the ID using the ident variable. I see that all the IDs are the same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

(Note that Thread-1 is already stopped: an ident is only guaranteed to be unique among threads that are alive at the same time, so it can be reused once a thread finishes.)

Answer 7

Similarly to @brucexin, I needed to get the OS-level thread identifier (which != thread.get_ident()) and used something like the following, so as not to depend on particular numbers or be amd64-only:

---- 8< ---- (xos.pyx)
"""module xos complements standard module os""" 

cdef extern from "<sys/syscall.h>":                                                             
    long syscall(long number, ...)                                                              
    const int SYS_gettid                                                                        

# gettid returns current OS thread identifier.                                                  
def gettid():                                                                                   
    return syscall(SYS_gettid)                                                                  

and

---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos

...

print('my tid: %d' % xos.gettid())

This depends on Cython, though.


Python multithreading: wait till all threads finished

Question: Python multithreading: wait till all threads finished

This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.

I have written a Python script (let's say scriptA.py) and another script (let's say scriptB.py).

In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it's a huge script that does lots of stuff; don't worry about it), and I want to be able to run scriptA with all the different arguments simultaneously, but I need to wait till ALL of them are done before continuing. My code:

import subprocess

#setup
do_setup()

#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)

#finish
do_finish()

I want to run all the subprocess.call() calls at the same time and then wait till they are all done; how should I do this?

I tried to use threading like the example here:

from threading import Thread
import subprocess

def call_script(args):
    subprocess.call(args)

#run scriptA   
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()

But I do not think this is right.

How do I know they have all finished running before going to my do_finish()?


Answer 0

You need to use the join method of the Thread objects at the end of the script.

# note the trailing commas: args must be a tuple of arguments
t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))

t1.start()
t2.start()
t3.start()

t1.join()
t2.join()
t3.join()

Thus the main thread will wait till t1, t2 and t3 finish execution.


Answer 1

Put the threads in a list and then use the join method:

threads = []

t = Thread(...)
threads.append(t)

...repeat as often as necessary...

# Start all threads
for x in threads:
    x.start()

# Wait for all of them to finish
for x in threads:
    x.join()

Answer 2

Since Python 3.2 there has been a new approach to reach the same result, which I personally prefer to the traditional thread create/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html

Using a ThreadPoolExecutor the code would be:

from concurrent.futures.thread import ThreadPoolExecutor
import time

def call_script(ordinal, arg):
    print('Thread', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Thread', ordinal, 'Finished')

args = ['argumentsA', 'argumentsB', 'argumentsC']

with ThreadPoolExecutor(max_workers=2) as executor:
    ordinal = 1
    for arg in args:
        executor.submit(call_script, ordinal, arg)
        ordinal += 1
# leaving the with block implicitly waits for all submitted tasks to complete
print('All tasks have been finished')

The output of the previous code is something like:

Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks have been finished

One of the advantages is that you can control the throughput by setting the maximum number of concurrent workers.


Answer 3

I prefer using a list comprehension based on an input list:

inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]  # (i,), not (i): args must be a tuple
[t.start() for t in threads]
[t.join() for t in threads]

Answer 4

You can have a class like the one below, to which you can add any number of functions or console scripts to be executed in parallel fashion, then start the execution and wait for all jobs to complete.

from multiprocessing import Process

class ProcessParallel(object):
    """
    To process the functions in parallel
    """
    def __init__(self, *jobs):
        self.jobs = jobs
        self.processes = []

    def fork_processes(self):
        """
        Creates the process objects for the given function delegates
        """
        for job in self.jobs:
            proc = Process(target=job)
            self.processes.append(proc)

    def start_all(self):
        """
        Starts all the function processes together.
        """
        for proc in self.processes:
            proc.start()

    def join_all(self):
        """
        Waits until all the functions have executed.
        """
        for proc in self.processes:
            proc.join()


def two_sum(a=2, b=2):
    return a + b

def multiply(a=2, b=2):
    return a * b


# How to run:
if __name__ == '__main__':
    # note: two_sum and multiply can be replaced with any Python console scripts
    # you want to run in parallel
    procs = ProcessParallel(two_sum, multiply)
    # Create all the process objects
    procs.fork_processes()
    # Start process execution
    procs.start_all()
    # Wait until all the processes have finished
    procs.join_all()

Answer 5

From the threading module documentation:

There is a “main thread” object; this corresponds to the initial thread of control in the Python program. It is not a daemon thread.

There is the possibility that “dummy thread objects” are created. These are thread objects corresponding to “alien threads”, which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be join()ed. They are never deleted, since it is impossible to detect the termination of alien threads.

So, to catch those two cases when you are not interested in keeping a list of the threads you create:

import threading as thrd


def alter_data(data, index):
    data[index] *= 2


data = [0, 2, 6, 20]

for i, value in enumerate(data):
    thrd.Thread(target=alter_data, args=[data, i]).start()

for thread in thrd.enumerate():
    if thread.daemon:
        continue
    try:
        thread.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            # catches the main thread
            continue
        else:
            raise

Whereupon:

>>> print(data)
[0, 4, 12, 40]

Answer 6

Maybe something like:

for t in threading.enumerate():
    if t.daemon:
        t.join()

Answer 7

I just came across the same problem, where I needed to wait for all the threads that were created using a for loop. I tried out the following piece of code. It may not be the perfect solution, but I thought it would be a simple one to test:

for t in threading.enumerate():
    try:
        t.join()
    except RuntimeError as err:
        # the membership test must be on the message string, not the exception object
        if 'cannot join current thread' in str(err):
            continue
        else:
            raise

Threads in a PyQt application: Use Qt threads or Python threads?

Question: Threads in a PyQt application: Use Qt threads or Python threads?

I’m writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I’d like to outsource the web connection to a separate worker thread.

[Yes, I know, now I have two problems.]

Anyway, the application uses PyQt4, so I’d like to know what the better choice is: Use Qt’s threads or use the Python threading module? What are advantages / disadvantages of each? Or do you have a totally different suggestion?

Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request like Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I’d still like a more in-depth answer to the general question:

What are advantages and disadvantages of using PyQt4’s (i.e. Qt’s) threads over native Python threads (from the threading module)?


Edit 2: Thanks all for you answers. Although there’s no 100% agreement, there seems to be widespread consensus that the answer is “use Qt”, since the advantage of that is integration with the rest of the library, while causing no real disadvantages.

For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.

There were several answers I considered for the bounty; in the end I chose abbot’s for the very relevant external reference; it was, however, a close call.

Thanks again.


Answer 0

This was discussed not too long ago on the PyQt mailing list. Quoting Giovanni Bajo's comments on the subject:

It's mostly the same. The main difference is that QThreads are better integrated with Qt (asynchronous signals/slots, event loop, etc.). Also, you can't use Qt from a Python thread (you can't, for instance, post an event to the main thread through QApplication.postEvent): you need a QThread for that to work.

A general rule of thumb might be to use QThreads if you’re going to interact somehow with Qt, and use Python threads otherwise.

And an earlier comment on this subject from PyQt's author: "they are both wrappers around the same native thread implementations". And both implementations use the GIL in the same way.
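
For concreteness, a minimal sketch of the QThread plus signal/slot pattern described above (PyQt4's new-style signals; the Worker class, do_slow_network_call() and window.on_data are illustrative placeholders, not part of the original answer):

from PyQt4.QtCore import QThread, pyqtSignal

class Worker(QThread):
    # signal used to hand the result back to the GUI thread
    result = pyqtSignal(str)

    def run(self):
        data = do_slow_network_call()  # hypothetical blocking retrieval
        self.result.emit(data)         # delivered to the main thread via a queued connection

# In the GUI code (window.on_data is a hypothetical slot that updates the UI):
worker = Worker()
worker.result.connect(window.on_data)
worker.start()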


Answer 1

Python's threads will be simpler and safer, and since this is an I/O-based application, they are able to bypass the GIL. That said, have you considered non-blocking I/O using Twisted or non-blocking sockets/select?

EDIT: more on threads

Python threads

Python's threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter only ever executes a certain-size block of byte-code instructions at a time. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.

Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.

QT threads

When Python hands off control to a 3rd party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, Python will use the GIL. This can make using 3rd party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library because it adds uncertainty as to where and when control is in the hands of the module vs the interpreter.

QT threads operate with the GIL released. QT threads are able to execute QT library code (and other compiled module code that does not acquire the GIL) concurrently. However, the Python code executed within the context of a QT thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.

In the end, both QT threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies.)

Non-blocking I/O

Threads add extraordinary complexity to your application, especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much less difficult to reason about than threads.

With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.

One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.

It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).


Answer 2

The advantage of QThread is that it’s integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.

If you are accessing an HTTP server, you should consider QNetworkAccessManager.
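
A sketch of the QNetworkAccessManager route (PyQt4 modules; this assumes a QApplication event loop is already running, and the URL is a placeholder):

from PyQt4.QtCore import QUrl
from PyQt4.QtNetwork import QNetworkAccessManager, QNetworkRequest

def on_finished(reply):
    # invoked on the GUI thread when the request completes; no worker thread needed
    print(reply.readAll())
    reply.deleteLater()

manager = QNetworkAccessManager()
manager.finished.connect(on_finished)
manager.get(QNetworkRequest(QUrl('http://example.com/data')))  # placeholder URL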


Answer 3

PyTalk上工作时,我问了同样的问题。

如果您使用的是Qt,则需要使用QThread能够使用Qt框架,尤其是信号/插槽系统。

使用信号/插槽引擎,您将能够从一个线程与另一个线程以及项目的每个部分进行对话。

而且,由于这两者都是C ++绑定,因此对于此选择没有太大的性能问题。

这是我对PyQt和线程的经验。

我鼓励你使用QThread

I asked myself the same question when I was working on PyTalk.

If you are using Qt, you need to use QThread to be able to use the Qt framework, and especially the signal/slot system.

With the signal/slot engine, you will be able to talk from one thread to another and with every part of your project.

Moreover, there is no real performance concern with this choice, since both are C++ bindings.

Here is my experience with PyQt and threads.

I encourage you to use QThread.


Answer 4

Jeff has some good points. Only one main thread can do any GUI updates. If you do need to update the GUI from within the thread, Qt-4’s queued connection signals make it easy to send data across threads and will automatically be invoked if you’re using QThread; I’m not sure if they will be if you’re using Python threads, although it’s easy to add a parameter to connect().


Answer 5

I can’t really recommend either, but I can try describing differences between CPython and Qt threads.

First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread; however, only the thread currently holding the Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while a thread doesn't hold the GIL).

On the other hand, we have Qt threads, which are basically a common layer over system threads, have no Global Interpreter Lock, and are thus capable of running concurrently. I'm not sure how PyQt deals with it; however, unless your Qt threads call Python code, they should be able to run concurrently (bar various extra locks that might be implemented in various structures).

For extra fine-tuning, you can modify the number of bytecode instructions that are interpreted before switching ownership of the GIL – lower values mean more context switching (and possibly higher responsiveness) but lower performance per individual thread (context switches have their cost – if you try switching every few instructions, it doesn't help speed).

Hope it helps with your problems :)


Answer 6

I can't comment on the exact differences between Python and PyQt threads, but I've been doing what you're attempting to do using QThread and QNetworkAccessManager, and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you're trying to solve, the latter will help.


Does Python support multithreading? Can it speed up execution time?

Question: Does Python support multithreading? Can it speed up execution time?

I’m slightly confused about whether multithreading works in Python or not.

I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on StackOverflow, that multithreading is indeed possible in Python. So why is it that everyone keeps saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?

Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say threads are still useful, because they work simultaneously and thus get the combined workload done faster. I mean, why would there even be a Python thread module otherwise?

Update:

Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some I/O tasks, while CPU-bound tasks can only run one at a time, regardless of how many cores there are.

I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly, will this theoretically speed up the time it takes to run the script?

Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I’m really wondering about rather than what the correct terminology is.

I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.


Answer 0

The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.

What the GIL prevents, then, is making use of more than one CPU core or separate CPUs to run threads in parallel.

This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.

What many server deployments then do is run more than one Python process, to let the OS handle the scheduling between processes and utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.

Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).

To address your update directly: any task that tries to get a speed boost from parallel execution using pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations), then that C code can run in parallel with one active Python thread.

Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks, or delegate to a dedicated external library.
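
To make the update's string-processing question concrete, here is a small benchmark sketch (the workloads and sizes are illustrative, not from the original answer): the CPU-bound string work gains nothing from threads, while the I/O-style waits overlap almost perfectly.

import threading
import time

def cpu_work(strings):
    # pure-Python, CPU-bound: holds the GIL, so threads gain nothing here
    return [s.upper() * 10 for s in strings]

def io_work(_):
    # sleeping releases the GIL, much like a blocking socket read would
    time.sleep(1)

def timed(func, chunks):
    threads = [threading.Thread(target=func, args=(c,)) for c in chunks]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

chunks = [['spam'] * 200000 for _ in range(4)]
print('cpu-bound, 4 threads: %.2fs' % timed(cpu_work, chunks))  # roughly the sum of the work
print('io-bound, 4 threads: %.2fs' % timed(io_work, chunks))    # ~1s, not ~4s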


Answer 1

Yes. :)

You have the low-level thread module and the higher-level threading module. But if you simply want to make use of multicore machines, the multiprocessing module is the way to go.

Quote from the docs:

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.


Answer 2

Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).

So basically, if you want to multi-thread the code to speed up calculation, it won't speed it up, as just one thread is executed at a time; but if you use it to interact with a database, for example, it will.


Deciding between subprocess, multiprocessing, and thread in Python?

Question: Deciding between subprocess, multiprocessing, and thread in Python?

I’d like to parallelize my Python program so that it can make use of multiple processors on the machine that it runs on. My parallelization is very simple, in that all the parallel “threads” of the program are independent and write their output to separate files. I don’t need the threads to exchange information but it is imperative that I know when the threads finish since some steps of my pipeline depend on their output.

Portability is important, in that I’d like this to run on any Python version on Mac, Linux, and Windows. Given these constraints, which is the most appropriate Python module for implementing this? I am trying to decide between thread, subprocess, and multiprocessing, which all seem to provide related functionality.

Any thoughts on this? I’d like the simplest solution that’s portable.


Answer 0

multiprocessing is a great Swiss-army knife type of module. It is more general than threads, as you can even perform remote computations. This is therefore the module I would suggest you use.

The subprocess module would also allow you to launch multiple processes, but I found it to be less convenient to use than the new multiprocessing module.

Threads are notoriously subtle, and, with CPython, you are often limited to one core, with them (even though, as noted in one of the comments, the Global Interpreter Lock (GIL) can be released in C code called from Python code).

I believe that most of the functions of the three modules you cite can be used in a platform-independent way. On the portability side, note that multiprocessing has only been in the standard library since Python 2.6 (a version for some older versions of Python does exist, though). But it's a great module!
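
For the asker's exact shape of problem (independent workers, and needing to know when they have all finished), a minimal multiprocessing sketch (the worker body and file names are illustrative; Pool as a context manager needs Python 3.3+):

from multiprocessing import Pool

def run_pipeline_step(n):
    # illustrative independent worker: writes its output to its own file
    with open('out_%d.txt' % n, 'w') as f:
        f.write('result %d\n' % n)

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        pool.map(run_pipeline_step, range(4))  # blocks until every worker is done
    print('all workers finished')  # safe to run the dependent pipeline steps now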


Answer 1

For me this is actually pretty simple:

The subprocess option:

subprocess is for running other executables — it's basically a wrapper around os.fork() and os.execve(), with some support for optional plumbing (setting up PIPEs to and from the subprocesses). Obviously you could use other inter-process communication (IPC) mechanisms, such as sockets, or POSIX or SysV shared memory. But you're going to be limited to whatever interfaces and IPC channels are supported by the programs you're calling.

Commonly, one uses subprocess synchronously — simply calling some external utility and reading back its output or awaiting its completion (perhaps reading its results from a temporary file, or after it's posted them to some database).

However one can spawn hundreds of subprocesses and poll them. My own personal favorite utility classh does exactly that. The biggest disadvantage of the subprocess module is that I/O support is generally blocking. There is a draft PEP-3145 to fix that in some future version of Python 3.x and an alternative asyncproc (Warning that leads right to the download, not to any sort of documentation nor README). I’ve also found that it’s relatively easy to just import fcntl and manipulate your Popen PIPE file descriptors directly — though I don’t know if this is portable to non-UNIX platforms.

(Update: 7 August 2019: Python 3 support for asyncio subprocesses: asyncio Subprocesses)

subprocess has almost no event handling support, though you can use the signal module and plain old-school UNIX/Linux signals, killing your processes softly, as it were.

The multiprocessing option:

multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among this family of processes. In particular it’s best to build your multiprocessing IPC around the module’s Queue objects where possible, but you can also use Event objects and various other features (some of which are, presumably, built around mmap support on the platforms where that support is sufficient).

Python’s multiprocessing module is intended to provide interfaces and features which are very similar to threading while allowing CPython to scale your processing among multiple CPUs/cores despite the GIL (Global Interpreter Lock). It leverages all the fine-grained SMP locking and coherency effort that was done by developers of your OS kernel.

The threading option:

threading is for a fairly narrow range of applications which are I/O bound (don’t need to scale across multiple CPU cores) and which benefit from the extremely low latency and switching overhead of thread switching (with shared core memory) vs. process/context switching. On Linux this is almost the empty set (Linux process switch times are extremely close to its thread-switches).

threading suffers from two major disadvantages in Python.

One, of course, is implementation specific — mostly affecting CPython. That’s the GIL. For the most part, most CPython programs will not benefit from the availability of more than two CPUs (cores) and often performance will suffer from the GIL locking contention.

The larger issue, which is not implementation specific, is that threads share the same memory, signal handlers, file descriptors and certain other OS resources. Thus the programmer must be extremely careful about object locking, exception handling and other aspects of their code which are both subtle and which can kill, stall, or deadlock the entire process (suite of threads).

By comparison, the multiprocessing model gives each process its own memory, file descriptors, etc. A crash or unhandled exception in any one of them will only kill that resource, and robustly handling the disappearance of a child or sibling process can be considerably easier than debugging, isolating, and fixing or working around similar issues in threads.

  • (Note: use of threading with major Python systems, such as NumPy, may suffer considerably less from GIL contention than most of your own Python code would. That’s because they’ve been specifically engineered to do so; the native/binary portions of NumPy, for example, will release the GIL when that’s safe).

The twisted option:

It's also worth noting that Twisted offers yet another alternative which is both elegant and very challenging to understand. Basically, at the risk of oversimplifying to the point where fans of Twisted may storm my home with pitchforks and torches, Twisted provides event-driven co-operative multi-tasking within any (single) process.

To understand how this is possible one should read about the features of select() (which can be built around the select() or poll() or similar OS system calls). Basically it’s all driven by the ability to make a request of the OS to sleep pending any activity on a list of file descriptors or some timeout.

Awakening from each of these calls to select() is an event — either one involving input available (readable) on some number of sockets or file descriptors, or buffering space becoming available on some other (writable) descriptors or sockets, some exceptional conditions (TCP out-of-band PUSH’d packets, for example), or a TIMEOUT.

Thus the Twisted programming model is built around handling these events then looping on the resulting “main” handler, allowing it to dispatch the events to your handlers.

I personally think of the name Twisted as evocative of the programming model … since your approach to the problem must be, in some sense, "twisted" inside out. Rather than conceiving of your program as a series of operations on input data and outputs or results, you're writing your program as a service or daemon and defining how it reacts to various events. (In fact the core "main loop" of a Twisted program is (usually? always?) a reactor()).

The major challenges to using Twisted involve twisting your mind around the event driven model and also eschewing the use of any class libraries or toolkits which are not written to co-operate within the Twisted framework. This is why Twisted supplies its own modules for SSH protocol handling, for curses, and its own subprocess/Popen functions, and many other modules and protocol handlers which, at first blush, would seem to duplicate things in the Python standard libraries.

I think it’s useful to understand Twisted on a conceptual level even if you never intend to use it. It may give insights into performance, contention, and event handling in your threading, multiprocessing and even subprocess handling as well as any distributed processing you undertake.

(Note: Newer versions of Python 3.x include asyncio (asynchronous I/O) features such as async def, the @asyncio.coroutine decorator, the await keyword, and yield from futures support. All of these are roughly similar to Twisted from a process (co-operative multitasking) perspective. For the current status of Twisted support for Python 3, check out: https://twistedmatrix.com/documents/current/core/howto/python3.html)

The distributed option:

Yet another realm of processing you haven’t asked about, but which is worth considering, is that of distributed processing. There are many Python tools and frameworks for distributed processing and parallel computation. Personally I think the easiest to use is one which is least often considered to be in that space.

It is almost trivial to build distributed processing around Redis. The entire key store can be used to store work units and results, Redis LISTs can be used as Queue()-like objects, and the PUB/SUB support can be used for Event-like handling. You can hash your keys and use values, replicated across a loose cluster of Redis instances, to store the topology and hash-token mappings, providing consistent hashing and fail-over for scaling beyond the capacity of any single instance, for co-ordinating your workers and marshaling data (pickled, JSON, BSON, or YAML) among them.

Of course, as you start to build a larger-scale and more sophisticated solution around Redis, you are re-implementing many of the features that have already been solved using Celery, Apache Spark and Hadoop, Zookeeper, etcd, Cassandra and so on. Those all have modules for Python access to their services.

[Update: A couple of resources for consideration if you're considering Python for computationally intensive work across distributed systems: IPython Parallel and PySpark. While these are general-purpose distributed computing systems, they are particularly accessible and popular subsystems for data science and analytics].

Conclusion

There you have the gamut of processing alternatives for Python, from single threaded, with simple synchronous calls to sub-processes, pools of polled subprocesses, threaded and multiprocessing, event-driven co-operative multi-tasking, and out to distributed processing.


Answer 2

In a similar case I opted for separate processes and the little bit of necessary communication through network sockets. It is highly portable and quite simple to do using Python, but probably not the simplest (in my case I also had another constraint: communication with other processes written in C++).

In your case I would probably go for multiprocessing, as Python threads, at least when using CPython, are not real threads. Well, they are native system threads, but C modules called from Python may or may not release the GIL and allow other threads to run when calling blocking code.


Answer 3

To use multiple processors in CPython, your only choice is the multiprocessing module. CPython keeps a lock on its internals (the GIL) which prevents threads on other CPUs from working in parallel. The multiprocessing module creates new processes (like subprocess) and manages communication between them.


Answer 4

Shell out and let Unix do your jobs:

use iterpipes to wrap subprocess and then:

From Ted Ziuba’s site

INPUTS_FROM_YOU | xargs -n1 -0 -P NUM ./process #NUM parallel processes

OR

GNU Parallel will also serve

You hang out with the GIL while you send the backroom boys out to do your multicore work.


Multiprocessing vs multithreading vs asyncio in Python 3

Question: Multiprocessing vs multithreading vs asyncio in Python 3

I found that in Python 3.4 there are a few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.

But I don't know which one to use, or which is the "recommended one". Do they do the same thing, or are they different? If so, which one is used for what? I want to write a program that uses multiple cores in my computer. But I don't know which library I should learn.


Answer 0

They are intended for (slightly) different purposes and/or requirements. CPython (a typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal. That's why multiprocessing may be preferred over threading. But not every problem may be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python; other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's just a solution (a good one indeed!) for a particular task, not for parallel processing in general.


Answer 1

[Quick Answer]

TL;DR

Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains – when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudo-code:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
  • CPU Bound => Multi Processing
  • I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
  • I/O Bound, Slow I/O, Many connections => Asyncio

Reference


[NOTE]:

  • If you have a long-call method (i.e. a method that contains a sleep or lazy I/O), the best choice is the asyncio, Twisted or Tornado approach (coroutine methods), which works with a single thread for concurrency.
  • asyncio works on Python3.4 and later.
  • Tornado and Twisted have been ready since Python 2.7
  • uvloop is an ultra-fast asyncio event loop (uvloop makes asyncio 2-4x faster).

[UPDATE (2019)]:

  • Japranto (GitHub) is a very fast pipelining HTTP server based on uvloop.

Answer 2

This is the basic idea:

Is it IO-BOUND ? ———> USE asyncio

IS IT CPU-HEAVY ? —–> USE multiprocessing

ELSE ? ———————-> USE threading

So basically stick to threading unless you have IO/CPU problems.


Answer 3

多处理中,您利用多个CPU来分配您的计算。由于每个CPU并行运行,因此您可以有效地同时运行多个任务。您可能希望对CPU绑定的任务使用多处理。一个示例将尝试计算巨大列表中所有元素的总和。如果您的计算机具有8个核心,则可以将列表“切割”为8个较小的列表,并分别在单独的核心上计算每个列表的总和,然后将这些数字相加即可。这样您将获得约8倍的加速。

穿线您不需要多个CPU。想象一个程序向网络发送大量HTTP请求。如果使用单线程程序,它将在每个请求处停止执行(块),等待响应,然后在收到响应后继续执行。这里的问题是,在等待某些外部服务器执行任务时,您的CPU并未真正在工作。同时,它实际上可以做一些有用的工作!解决方法是使用线程-您可以创建多个线程,每个线程负责从Web请求一些内容。关于线程的好处是,即使它们在一个CPU上运行,CPU也会不时地“冻结”一个线程的执行并跳转到执行另一个线程(这称为上下文切换,并且它在不确定性下不断发生)间隔)。 -使用线程。

asyncio本质上是线程化,而不是CPU,而是由您(作为程序员(或实际上是您的应用程序))决定上下文切换的时间和地点。在Python中,您可以使用await关键字来暂停协程的执行(使用async关键字定义)。

In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you're effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be trying to calculate the sum of all elements of a huge list. If your machine has 8 cores, you can "cut" the list into 8 smaller lists, calculate the sum of each of those lists separately on a separate core, and then just add up those numbers. You'll get a ~8x speedup by doing that.

In (multi)threading you don’t need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once it has received the response. The problem here is that your CPU isn’t really doing work while waiting for some external server to do the job; it could have actually done some useful work in the meantime! The fix is to use threads – you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the CPU from time to time “freezes” the execution of one thread and jumps to executing another one (it’s called context switching, and it happens constantly at non-deterministic intervals). So if your task is I/O bound – use threading.

asyncio is essentially threading where not the CPU but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
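A minimal sketch of that idea: await marks exactly where this coroutine hands control back to the event loop (asyncio.sleep stands in for real lazy I/O):

import asyncio

async def fetch(name):
    print(name, "started")
    await asyncio.sleep(1)   # suspend here; the loop switches to another coroutine
    print(name, "finished")

async def main():
    await asyncio.gather(fetch("a"), fetch("b"))  # both run concurrently, one thread

asyncio.run(main())          # finishes in ~1 second, not ~2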


Python字典中的线程安全

问题:Python字典中的线程安全

我有一个类,其中保存着一个字典

class OrderBook:
    orders = {'Restaurant1': None,
              'Restaurant2': None,
              'Restaurant3': None,
              'Restaurant4': None}

    @staticmethod
    def addOrder(restaurant_name, orders):
        OrderBook.orders[restaurant_name] = orders

我正在运行4个线程(每个餐厅一个线程)来调用方法OrderBook.addOrder。这是每个线程运行的函数:

def addOrders(restaurant_name):

    #creates orders
    ...

    OrderBook.addOrder(restaurant_name, orders)

这样做安全吗,还是在调用 addOrder 之前必须加锁?

I have a class which holds a dictionary

class OrderBook:
    orders = {'Restaurant1': None,
              'Restaurant2': None,
              'Restaurant3': None,
              'Restaurant4': None}

    @staticmethod
    def addOrder(restaurant_name, orders):
        OrderBook.orders[restaurant_name] = orders

And I am running 4 threads (one for each restaurant) that call the method OrderBook.addOrder. Here is the function run by each thread:

def addOrders(restaurant_name):

    #creates orders
    ...

    OrderBook.addOrder(restaurant_name, orders)

Is this safe, or do I have to use a lock before calling addOrder?


回答 0

Python的内置结构对于单个操作是线程安全的,但是有时很难看到一条语句真正变成了多个操作。

您的代码应该是安全的。请记住:这里的锁几乎不会增加任何开销,并且让您高枕无忧。

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm 具有更多详细信息。

Python’s built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations.

Your code should be safe. Keep in mind: a lock here will add almost no overhead, and will give you peace of mind.

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm has more details.
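If you do want that peace of mind, a minimal sketch of the same class with a lock added (the _lock attribute is an illustrative addition, not part of the original code):

import threading

class OrderBook:
    orders = {'Restaurant1': None, 'Restaurant2': None,
              'Restaurant3': None, 'Restaurant4': None}
    _lock = threading.Lock()  # one lock shared by every thread

    @staticmethod
    def addOrder(restaurant_name, orders):
        with OrderBook._lock:  # only one thread updates the dict at a time
            OrderBook.orders[restaurant_name] = orders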


回答 1

是的,内置类型本质上是线程安全的:http://docs.python.org/glossary.html#term-global-interpreter-lock

这通过使对象模型(包括 dict 等关键内置类型)对并发访问隐式安全,简化了 CPython 的实现。

Yes, built-in types are inherently thread-safe: http://docs.python.org/glossary.html#term-global-interpreter-lock

This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access.


回答 2

Google 的风格指南建议不要依赖 dict 的原子性。

在以下位置进一步详细解释:Python变量赋值是原子的吗?

不要依赖内置类型的原子性。

尽管 Python 的内置数据类型(如字典)看起来具有原子操作,但在某些边界情况下它们并不是原子的(例如,当__hash__或__eq__被实现为 Python 方法时),因此不应依赖其原子性。您也不应依赖原子的变量赋值(因为这反过来又依赖于字典)。

使用Queue模块的Queue数据类型作为在线程之间传递数据的首选方式。否则,请使用线程模块及其锁定原语。了解如何正确使用条件变量,以便可以使用threading.Condition而不是使用较低级别的锁。

我同意这一观点:CPython 中已经存在 GIL,因此使用 Lock 的性能影响可以忽略不计。而当这些 CPython 实现细节有朝一日发生变化时,在复杂代码库中排查错误所花费的时间成本会高得多。

Google’s style guide advises against relying on dict atomicity

Explained in further detail at: Is Python variable assignment atomic?

Do not rely on the atomicity of built-in types.

While Python’s built-in data types such as dictionaries appear to have atomic operations, there are corner cases where they aren’t atomic (e.g. if __hash__ or __eq__ are implemented as Python methods) and their atomicity should not be relied upon. Neither should you rely on atomic variable assignment (since this in turn depends on dictionaries).

Use the Queue module’s Queue data type as the preferred way to communicate data between threads. Otherwise, use the threading module and its locking primitives. Learn about the proper use of condition variables so you can use threading.Condition instead of using lower-level locks.

And I agree with this one: there is already the GIL in CPython, so the performance hit of using a Lock will be negligible. Much more costly will be the hours spent bug hunting in a complex codebase when those CPython implementation details change one day.
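A minimal sketch of the Queue-based hand-off the style guide recommends; the sentinel convention and order payloads are illustrative assumptions:

import queue
import threading

orders_q = queue.Queue()              # thread-safe channel between threads

def restaurant_worker(name):
    orders_q.put((name, ['pizza']))   # producers hand data over instead of sharing a dict

def collector():
    while True:
        item = orders_q.get()
        if item is None:              # sentinel: all producers are done
            break
        name, orders = item
        print(name, orders)

t = threading.Thread(target=collector)
t.start()
for name in ('Restaurant1', 'Restaurant2', 'Restaurant3', 'Restaurant4'):
    restaurant_worker(name)
orders_q.put(None)                    # tell the collector to stop
t.join()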


Python中的“线程本地存储”是什么,为什么需要它?

问题:Python中的“线程本地存储”是什么,为什么需要它?

特别是在Python中,变量如何在线程之间共享?

尽管我以前使用过 threading.Thread,但我从未真正理解或见过变量如何共享的示例。它们是在主线程和子线程之间共享,还是仅在子线程之间共享?我何时需要使用线程本地存储来避免这种共享?

我已经看到许多关于通过使用锁在线程之间同步访问共享数据的警告,但是我还没有看到一个很好的问题示例。

提前致谢!

In Python specifically, how do variables get shared between threads?

Although I have used threading.Thread before I never really understood or saw examples of how variables got shared. Are they shared between the main thread and the children or only among the children? When would I need to use thread local storage to avoid this sharing?

I have seen many warnings about synchronizing access to shared data among threads by using locks but I have yet to see a really good example of the problem.

Thanks in advance!


回答 0

在 Python 中,除了函数局部变量之外,所有内容都是共享的(因为每个函数调用都有自己的一组局部变量,而线程始终是独立的函数调用)。即便如此,也只有变量本身(即引用对象的名称)是函数局部的;对象本身始终是全局的,任何东西都可以引用它们。就这一点而言,特定线程的 Thread 对象并不特殊。如果将 Thread 对象存放在所有线程都能访问的地方(例如全局变量),那么所有线程都能访问这个 Thread 对象。如果要原子地修改另一个线程可以访问的任何内容,就必须用锁来保护它。当然,所有线程必须共享同一把锁,否则锁就起不到什么作用。

如果您需要真正的线程本地存储,这正是 threading.local 的用武之地。threading.local 的属性不在线程之间共享;每个线程只能看到自己放进去的属性。如果您对它的实现感到好奇,可以在标准库的 _threading_local.py 中找到源码。

In Python, everything is shared, except for function-local variables (because each function call gets its own set of locals, and threads are always separate function calls.) And even then, only the variables themselves (the names that refer to objects) are local to the function; objects themselves are always global, and anything can refer to them. The Thread object for a particular thread is not a special object in this regard. If you store the Thread object somewhere all threads can access (like a global variable) then all threads can access that one Thread object. If you want to atomically modify anything that another thread has access to, you have to protect it with a lock. And all threads must of course share this very same lock, or it wouldn’t be very effective.

If you want actual thread-local storage, that’s where threading.local comes in. Attributes of threading.local are not shared between threads; each thread sees only the attributes it itself placed in there. If you’re curious about its implementation, the source is in _threading_local.py in the standard library.


回答 1

考虑以下代码:

#/usr/bin/env python

from time import sleep
from random import random
from threading import Thread, local

data = local()

def bar():
    print("I'm called from", data.v)

def foo():
    bar()

class T(Thread):
    def run(self):
        sleep(random())
        data.v = self.getName()   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()
>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-1

在这里,threading.local()是一种快速而又肮脏的方法,用于将一些数据从run()传递到bar(),而无需更改foo()的接口。

请注意,使用全局变量无法解决问题:

#/usr/bin/env python

from time import sleep
from random import random
from threading import Thread

def bar():
    global v
    print("I'm called from", v)

def foo():
    bar()

class T(Thread):
    def run(self):
        global v
        sleep(random())
        v = self.getName()   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()
>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-2

同时,如果您可以负担得起将这些数据作为foo()的参数传递的话,这将是一种更为优雅且设计合理的方法:

from threading import Thread

def bar(v):
    print("I'm called from", v)

def foo(v):
    bar(v)

class T(Thread):
    def run(self):
        foo(self.getName())

但是,在使用第三方代码或设计不良的代码时,这并不总是可能的。

Consider the following code:

#/usr/bin/env python

from time import sleep
from random import random
from threading import Thread, local

data = local()

def bar():
    print("I'm called from", data.v)

def foo():
    bar()

class T(Thread):
    def run(self):
        sleep(random())
        data.v = self.getName()   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()
>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-1 

Here threading.local() is used as a quick and dirty way to pass some data from run() to bar() without changing the interface of foo().

Note that using global variables won’t do the trick:

#/usr/bin/env python

from time import sleep
from random import random
from threading import Thread

def bar():
    global v
    print("I'm called from", v)

def foo():
    bar()

class T(Thread):
    def run(self):
        global v
        sleep(random())
        v = self.getName()   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()
>>> T().start(); T().start()
I'm called from Thread-2
I'm called from Thread-2 

Meanwhile, if you could afford passing this data through as an argument of foo() – it would be a more elegant and well-designed way:

from threading import Thread

def bar(v):
    print("I'm called from", v)

def foo(v):
    bar(v)

class T(Thread):
    def run(self):
        foo(self.getName())

But this is not always possible when using third-party or poorly designed code.


回答 2

您可以使用 threading.local() 创建线程本地存储:

>>> tls = threading.local()
>>> tls.x = 4 
>>> tls.x
4

存储到tls中的数据对于每个线程都是唯一的,这将有助于确保不会发生意外共享。

You can create thread local storage using threading.local().

>>> tls = threading.local()
>>> tls.x = 4 
>>> tls.x
4

Data stored to the tls will be unique to each thread, which will help ensure that unintentional sharing does not occur.
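A small sketch of that isolation, using only the standard library; the attribute set in the main thread is simply absent in the worker:

import threading

tls = threading.local()
tls.x = 4  # set in the main thread only

def worker():
    # each thread gets its own namespace; 'x' was never set here
    print(hasattr(tls, 'x'))  # prints False

t = threading.Thread(target=worker)
t.start()
t.join()
print(tls.x)  # still 4 in the main thread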


回答 3

就像其他每种语言一样,Python中的每个线程都可以访问相同的变量。“主线程”和子线程之间没有区别。

Python 的一个不同之处在于,全局解释器锁(GIL)意味着一次只能有一个线程在执行 Python 代码。但这对同步访问并没有太大帮助,因为所有常见的抢占问题仍然存在,您必须像在其他语言中一样使用线程原语。不过,这确实意味着,如果您原本打算用线程来提升性能,需要重新考虑。

Just like in every other language, every thread in Python has access to the same variables. There’s no distinction between the ‘main thread’ and child threads.

One difference with Python is that the Global Interpreter Lock means that only one thread can be running Python code at a time. This isn’t much help when it comes to synchronising access, however, as all the usual pre-emption issues still apply, and you have to use threading primitives just like in other languages. It does mean you need to reconsider whether you were using threads for performance, though.


回答 4

我在这里可能是错的。如果您另有见解,请详细说明,因为这有助于解释为什么需要使用 threading.local()。

这句话似乎有些偏差,但并非错误:“如果要原子地修改另一个线程可以访问的任何内容,则必须使用锁来保护它。”我认为这句话->实际上<-是对的,但并不完全准确。我理解的“原子”一词,是指 Python 解释器生成了一段字节码,其中没有给发往 CPU 的中断信号留下空隙。

我认为原子操作是不会被中断插入的 Python 字节码块。像 running = True 这样的 Python 语句是原子的。在这种情况下,(我相信)您无需为防中断而锁定 CPU。这样的字节码分解不会被线程中断破坏。

诸如 threads_running[5] = True 之类的 Python 代码不是原子的。这里有两段 Python 字节码:一段解引用 list() 得到对象,另一段给对象赋值,此处是列表中的一个“位置”。中断可能恰好发生在这两段字节码->之间<-。那正是出问题的地方。

线程 local() 与“原子”有何关系?这就是那句话让我觉得有误导性的原因。如果不是这样,您能解释一下吗?

I may be wrong here. If you know otherwise please expound as this would help explain why one would need to use thread local().

This statement seems off, not wrong: “If you want to atomically modify anything that another thread has access to, you have to protect it with a lock.” I think this statement is ->effectively<- right but not entirely accurate. I thought the term “atomic” meant that the Python interpreter created a byte-code chunk that left no room for an interrupt signal to the CPU.

I thought atomic operations are chunks of Python byte code that do not give access to interrupts. Python statements like “running = True” are atomic. You do not need to lock the CPU from interrupts in this case (I believe). The Python byte-code breakdown is safe from thread interruption.

Python code like “threads_running[5] = True” is not atomic. There are two chunks of Python byte code here: one to de-reference the list() for an object and another byte-code chunk to assign a value to an object, in this case a “place” in a list. An interrupt can be raised –>between<- the two byte-code ->chunks<-. That is where bad stuff happens.

How does thread local() relate to “atomic”? This is why the statement seems misdirecting to me. If not can you explain?
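One way to inspect the difference being discussed is the dis module; exact opcodes vary between Python versions, so treat the comments as indicative:

import dis

dis.dis(compile('running = True', '<demo>', 'exec'))
# essentially one store opcode: nothing to interleave with

dis.dis(compile('threads_running[5] = True', '<demo>', 'exec'))
# several opcodes (loads followed by STORE_SUBSCR): a thread
# switch can happen between them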


Python线程字符串参数

问题:Python线程字符串参数

我在使用Python线程并在参数中发送字符串时遇到问题。

def processLine(line) :
    print "hello";
    return;

dRecieved = connFile.readline();
processThread = threading.Thread(target=processLine, args=(dRecieved));
processThread.start();

其中 dRecieved 是从连接中读取的一行字符串。它调用一个简单的函数,该函数目前只有一项工作:打印 “hello”。

但是我收到以下错误

Traceback (most recent call last):
File "C:\Python25\lib\threading.py", line 486, in __bootstrap_inner
self.run()
File "C:\Python25\lib\threading.py", line 446, in run
self.__target(*self.__args, **self.__kwargs)
TypeError: processLine() takes exactly 1 arguments (232 given)

232是我尝试传递的字符串的长度,因此我猜想它会将其分解成每个字符并尝试传递类似的参数。如果我只是正常调用该函数,它将很好用,但是我真的想将其设置为单独的线程。

I have a problem with Python threading and sending a string in the arguments.

def processLine(line) :
    print "hello";
    return;


dRecieved = connFile.readline();
processThread = threading.Thread(target=processLine, args=(dRecieved));
processThread.start();

Where dRecieved is the string of one line read by a connection. It calls a simple function which as of right now has only one job of printing “hello”.

However I get the following error

Traceback (most recent call last):
File "C:\Python25\lib\threading.py", line 486, in __bootstrap_inner
self.run()
File "C:\Python25\lib\threading.py", line 446, in run
self.__target(*self.__args, **self.__kwargs)
TypeError: processLine() takes exactly 1 arguments (232 given)

232 is the length of the string that I am trying to pass, so I guess its breaking it up into each character and trying to pass the arguments like that. It works fine if I just call the function normally but I would really like to set it up as a separate thread.


回答 0

您试图创建一个元组,但实际上只是给字符串加了括号 :)

添加一个额外的’,’:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=(dRecieved,))  # <- note extra ','
processThread.start()

或使用方括号列出:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=[dRecieved])  # <- 1 element list
processThread.start()

请注意堆栈跟踪中的这一行: self.__target(*self.__args, **self.__kwargs)

*self.__args 会把您的字符串展开成一个个字符,并把它们传给 processLine 函数。如果传给它一个单元素列表,它会把该元素作为第一个参数传递,在您的情况下就是那个字符串。

You’re trying to create a tuple, but you’re just parenthesizing a string :)

Add an extra ‘,’:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=(dRecieved,))  # <- note extra ','
processThread.start()

Or use brackets to make a list:

dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=[dRecieved])  # <- 1 element list
processThread.start()

If you notice, from the stack trace: self.__target(*self.__args, **self.__kwargs)

The *self.__args turns your string into a list of characters, passing them to the processLine function. If you pass it a one element list, it will pass that element as the first argument – in your case, the string.
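You can reproduce the mistake directly; this tiny sketch (with an illustrative starred parameter) shows how a bare string fans out into one argument per character:

def count_args(*args):
    print(len(args), 'argument(s) received')

count_args(*'hello')      # 5 argument(s) received -- the OP's situation
count_args(*('hello',))   # 1 argument(s) received -- what the trailing comma fixes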


回答 1

我希望在这里提供更多背景知识。

首先,threading::Thread 的构造函数签名:

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

args是目标调用的参数元组。默认为()。

其次,Python 中关于 tuple 的一个怪癖:

空元组由一对空括号组成;一个带有一个项目的元组是通过在值后面加上逗号来构造的(将单个值括在括号中是不够的)。

另一方面,字符串是字符序列,例如'abc'[1] == 'b'。因此,如果将字符串发送到args,即使在括号中(仍然是一个字符串),每个字符也将被视为单个参数。

但是,Python 不像 JavaScript 那样可以容忍多余的参数,而是抛出 TypeError 来报错。

I hope to provide more background knowledge here.

First, the constructor signature of the method threading::Thread:

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

args is the argument tuple for the target invocation. Defaults to ().

Second, A quirk in Python about tuple:

Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses).

On the other hand, a string is a sequence of characters, like 'abc'[1] == 'b'. So if you send a string to args, even in parentheses (it is still a string), each character will be treated as a single parameter.

However, Python is not like JavaScript, where extra arguments can be tolerated; instead, it throws a TypeError to complain.
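The quirk is easy to verify in the interpreter:

>>> type(('abc'))    # parentheses alone do not make a tuple
<class 'str'>
>>> type(('abc',))   # the trailing comma does
<class 'tuple'>
>>> len(('abc'))     # hence 3 potential "arguments", not 1
3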


列表是线程安全的吗?

问题:列表是线程安全的吗?

我注意到,人们经常建议在多线程场景中使用队列,而不是列表加 .pop()。这是因为列表不是线程安全的,还是出于其他原因?

I notice that it is often suggested to use queues with multiple threads, instead of lists and .pop(). Is this because lists are not thread-safe, or for some other reason?


回答 0

列表本身是线程安全的。在CPython中,GIL防止对它们的并发访问,而其他实现则要小心地为它们的列表实现使用细粒度锁或同步数据类型。但是,虽然列表本身不会因尝试并发访问而损坏,但是列表的数据不受保护。例如:

L[0] += 1

如果另一个线程同时做同样的事情,则不能保证真的把 L[0] 加一,因为 += 不是原子操作。(实际上,Python 中极少有操作真正是原子的,因为其中大多数都可能导致任意 Python 代码被调用。)您应该使用 Queue,因为如果只用一个不受保护的列表,竞态条件(race condition)可能使您取到或删除错误的条目。

Lists themselves are thread-safe. In CPython the GIL protects against concurrent accesses to them, and other implementations take care to use a fine-grained lock or a synchronized datatype for their list implementations. However, while lists themselves can’t go corrupt by attempts to concurrently access, the lists’s data is not protected. For example:

L[0] += 1

is not guaranteed to actually increase L[0] by one if another thread does the same thing, because += is not an atomic operation. (Very, very few operations in Python are actually atomic, because most of them can cause arbitrary Python code to be called.) You should use Queues because if you just use an unprotected list, you may get or delete the wrong item because of race conditions.
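A minimal sketch of protecting that read-modify-write with a lock; the counts are illustrative:

import threading

L = [0]
lock = threading.Lock()

def bump(times):
    for _ in range(times):
        with lock:         # read, add and write happen as one critical section
            L[0] += 1

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(L[0])                # reliably 400000 with the lock in place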


回答 1

为了澄清 Thomas 出色答案中的一点,应该提到 append() 是线程安全的。

这是因为不必担心在我们写入时,正被读取的数据会位于同一位置。append() 操作不读取数据,只向列表写入数据。

To clarify a point in Thomas’ excellent answer, it should be mentioned that append() is thread safe.

This is because there is no concern that data being read will be in the same place once we go to write to it. The append() operation does not read data, it only writes data to the list.


回答 2

这里有一份关于 list 各种操作是否线程安全的全面但不详尽的示例列表。希望能在这里得到关于 obj in a_list 这一语言结构的答案。

Here’s a comprehensive yet non-exhaustive list of examples of list operations and whether or not they are thread safe. Hoping to get an answer regarding the obj in a_list language construct here.


回答 3

我最近遇到这样的情况:需要在一个线程中持续向列表追加元素,同时遍历这些项目并检查其是否就绪(在我的情况下是 AsyncResult),只有就绪时才将其从列表中删除。我找不到能清楚说明我的问题的示例。下面的示例演示了在一个线程中持续向列表添加元素,并在另一个线程中持续从同一列表删除元素。有缺陷的版本在较小的数字上很容易正常运行,但只要把数字调得足够大并多运行几次,您就会看到错误。

FLAWED版本

import threading
import time

# Change this number as you please, bigger numbers will get the error quickly
count = 1000
l = []

def add():
    for i in range(count):
        l.append(i)
        time.sleep(0.0001)

def remove():
    for i in range(count):
        l.remove(i)
        time.sleep(0.0001)


t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)

错误时输出

Exception in thread Thread-63:
Traceback (most recent call last):
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-30-ecfbac1c776f>", line 13, in remove
    l.remove(i)
ValueError: list.remove(x): x not in list

使用锁的版本

import threading
import time
count = 1000
l = []
lock = threading.RLock()
def add():
    with lock:  # 锁在整个循环期间被持有:add() 与 remove() 实际上串行执行
        for i in range(count):
            l.append(i)
            time.sleep(0.0001)

def remove():
    with lock:  # 依赖先 start 的 add() 先拿到锁;若 remove() 先拿到,对空列表调用 remove 会抛 ValueError
        for i in range(count):
            l.remove(i)
            time.sleep(0.0001)


t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)

输出

[] # Empty list

结论

如前面的回答所述,虽然向列表追加元素或从中弹出元素这个动作本身是线程安全的,但在一个线程中追加、在另一个线程中删除的组合并不是线程安全的。

I recently had this case where I needed to append to a list continuously in one thread, loop through the items to check whether each item was ready (an AsyncResult in my case), and remove it from the list only once it was ready. I could not find any examples that demonstrated my problem clearly. Here is an example demonstrating adding to a list in one thread continuously and removing from the same list in another thread continuously. The flawed version runs fine on smaller numbers, but keep the numbers big enough, run it a few times, and you will see the error.

The FLAWED version

import threading
import time

# Change this number as you please, bigger numbers will get the error quickly
count = 1000
l = []

def add():
    for i in range(count):
        l.append(i)
        time.sleep(0.0001)

def remove():
    for i in range(count):
        l.remove(i)
        time.sleep(0.0001)


t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)

Output when ERROR

Exception in thread Thread-63:
Traceback (most recent call last):
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-30-ecfbac1c776f>", line 13, in remove
    l.remove(i)
ValueError: list.remove(x): x not in list

Version that uses locks

import threading
import time
count = 1000
l = []
lock = threading.RLock()
def add():
    with lock:  # the lock is held for the whole loop, so add() and remove() run serially
        for i in range(count):
            l.append(i)
            time.sleep(0.0001)

def remove():
    with lock:  # relies on add() (started first) winning the race; if remove() acquired
                # the lock first, l.remove(0) on the empty list would raise ValueError
        for i in range(count):
            l.remove(i)
            time.sleep(0.0001)


t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()

print(l)

Output

[] # Empty list

Conclusion

As mentioned in the earlier answers, while the act of appending or popping elements from the list is itself thread safe, what is not thread safe is when you append in one thread and pop in another.
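For this producer/consumer pattern, a Queue sidesteps the race entirely; a minimal sketch (the None sentinel is an illustrative convention):

import queue
import threading

count = 1000
q = queue.Queue()

def add():
    for i in range(count):
        q.put(i)       # thread-safe hand-off, no shared list
    q.put(None)        # sentinel: the producer is done

def remove():
    while True:
        i = q.get()
        if i is None:  # producer finished, stop consuming
            break

t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()
print(q.qsize())       # 0: everything produced was consumed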