What is “thread local storage” in Python, and why do I need it?

In Python specifically, how do variables get shared between threads?

Although I have used threading.Thread before, I never really understood or saw examples of how variables got shared. Are they shared between the main thread and the children, or only among the children? When would I need to use thread-local storage to avoid this sharing?

I have seen many warnings about synchronizing access to shared data among threads by using locks, but I have yet to see a really good example of the problem.

Thanks in advance!


Answer 0

In Python, everything is shared, except for function-local variables (because each function call gets its own set of locals, and threads are always separate function calls.) And even then, only the variables themselves (the names that refer to objects) are local to the function; objects themselves are always global, and anything can refer to them. The Thread object for a particular thread is not a special object in this regard. If you store the Thread object somewhere all threads can access (like a global variable) then all threads can access that one Thread object. If you want to atomically modify anything that another thread has access to, you have to protect it with a lock. And all threads must of course share this very same lock, or it wouldn’t be very effective.
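A minimal sketch of the points above, using only the standard `threading` module: a module-level counter is visible to the main thread and to every child thread, a function-local tally is private to each call, and one shared `Lock` guards the read-modify-write on the counter. The names `counter`, `counter_lock`, `per_call` and `worker` are illustrative, not from the original answer.

```python
import threading

counter = 0                      # module-level global: every thread sees the same name
counter_lock = threading.Lock()  # ONE Lock object, shared by all threads
per_call = {}                    # shared dict used to report each thread's local tally

def worker(name, n):
    global counter
    local_count = 0  # function-local: each call (and so each thread) gets its own
    for _ in range(n):
        local_count += 1
        with counter_lock:  # only one thread at a time runs this read-modify-write
            counter += 1
    with counter_lock:
        per_call[name] = local_count

threads = [threading.Thread(target=worker, args=(f"t{i}", 10_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread sees the same global that the child threads updated.
print(counter)   # 40000
print(per_call)  # each thread counted its own 10000 in a private local variable
```

If each thread created its own `threading.Lock()` inside `worker`, the `with` blocks would no longer exclude each other, which is why the lock has to be shared.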

If you want actual thread-local storage, that’s where threading.local comes in. Attributes of threading.local are not shared between threads; each thread sees only the attributes it itself placed in there. If you’re curious about its implementation, the source is in _threading_local.py in the standard library.
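A small sketch of that behaviour, assuming nothing beyond the standard library: one `threading.local()` object is shared by all threads, but an attribute set in one thread is invisible to the others, and a brand-new thread starts with no attributes at all. The `worker` and `fresh_thread` functions are hypothetical helpers for the demonstration.

```python
import threading
import time

data = threading.local()  # a single object whose attributes are per-thread
seen = {}                 # an ordinary shared dict used to collect results
seen_lock = threading.Lock()

def worker(name):
    data.value = name   # visible only to the thread that set it
    time.sleep(0.01)    # give the other workers time to set their own data.value
    with seen_lock:
        seen[name] = data.value  # still this thread's own value

def fresh_thread():
    # A new thread sees no attributes, even though the main thread set data.value.
    with seen_lock:
        seen["fresh"] = hasattr(data, "value")

data.value = "main"
threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
threads.append(threading.Thread(target=fresh_thread))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data.value)     # main
print(seen["fresh"])  # False
```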


Answer 1

Consider the following code:

#!/usr/bin/env python

from time import sleep
from random import random
from threading import Thread, local

data = local()  # each thread sees only the attributes it set itself

def bar():
    print("I'm called from", data.v)

def foo():
    bar()  # foo() does not need to know about data.v

class T(Thread):
    def run(self):
        sleep(random())
        data.v = self.name   # Thread-1 and Thread-2 accordingly
        sleep(1)
        foo()

T().start()
T().start()
I'm called from Thread-2
I'm called from Thread-1 

Here threading.local() is used as a quick and dirty way to pass some data from run() to bar() without changing the interface of foo().

Note that using global variables won’t do the trick:

#!/usr/bin/env python

from time import sleep
from random import random
from threading import Thread

v = None  # one module-level global, shared by all threads

def bar():
    print("I'm called from", v)  # reading a global needs no "global" statement

def foo():
    bar()

class T(Thread):
    def run(self):
        global v
        sleep(random())
        v = self.name   # Thread-1 and Thread-2 accordingly; the later write wins
        sleep(1)
        foo()

T().start()
T().start()
I'm called from Thread-2
I'm called from Thread-2 

On the other hand, if you can afford to pass this data through as an argument to foo(), that is a more elegant and better-designed approach:

from threading import Thread

def bar(v):
    print("I'm called from", v)

def foo(v):
    bar(v)  # the value travels explicitly through the call chain

class T(Thread):
    def run(self):
        foo(self.name)

T().start()
T().start()

But this is not always possible when using third-party or poorly designed code.


Answer 2

You can create thread-local storage using threading.local().

>>> import threading
>>> tls = threading.local()
>>> tls.x = 4
>>> tls.x
4

Data stored in tls is unique to each thread, which helps ensure that unintentional sharing does not occur.
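A related pattern, sketched here as an illustration: subclassing `threading.local` and defining `__init__` gives every thread its own default state, because `__init__` is called again the first time each thread uses the object. The class `RequestState` and the function `handle` are hypothetical names.

```python
import threading

class RequestState(threading.local):
    # Hypothetical subclass: __init__ runs once per thread, the first time that
    # thread uses the object, so every thread starts with its own empty list.
    def __init__(self):
        self.log = []

state = RequestState()
results = {}

def handle(name):
    state.log.append(name)         # appends to this thread's private list
    results[name] = list(state.log)

threads = [threading.Thread(target=handle, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.log)      # []  (the main thread's list was never touched)
print(results["t0"])  # ['t0']
```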


Answer 3

Just like in most other languages with shared-memory threads, every thread in Python has access to the same variables. There’s no distinction between the ‘main thread’ and child threads.

One difference with Python (specifically CPython) is that the Global Interpreter Lock (GIL) means that only one thread can be running Python bytecode at a time. This isn’t much help when it comes to synchronising access, however, as all the usual pre-emption issues still apply, and you have to use threading primitives just like in other languages. It does mean, however, that you need to reconsider your design if you were using threads to speed up CPU-bound Python code.
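A sketch of why the GIL does not replace locking, assuming CPython: `sys.getswitchinterval()` and `sys.setswitchinterval()` are standard functions that control how often the interpreter asks the running thread to release the GIL (0.005 seconds by default). The unprotected `counter += 1` below is a read, an add and a write, so a switch in between can lose updates. Whether lost updates actually appear depends on the CPython version and on timing; a run that happens to print the expected total does not prove the code is safe.

```python
import sys
import threading

N_THREADS = 4
N_INCREMENTS = 100_000

counter = 0  # shared and deliberately NOT protected by a lock

def unsafe_worker():
    global counter
    for _ in range(N_INCREMENTS):
        # Load, add and store are separate steps; if another thread runs between
        # them, one thread's update can overwrite another's.
        counter += 1

old_interval = sys.getswitchinterval()
sys.setswitchinterval(1e-5)  # request much more frequent thread switches
try:
    threads = [threading.Thread(target=unsafe_worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
finally:
    sys.setswitchinterval(old_interval)  # restore the previous setting

print(f"expected {N_THREADS * N_INCREMENTS}, got {counter}")
```

Wrapping the increment in a shared `threading.Lock` (with `lock: counter += 1`) makes the total reliable regardless of the switch interval.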


Answer 4

I may be wrong here. If you know otherwise, please expound, as this would help explain why one would need to use threading.local().

This statement seems off, though not wrong: “If you want to atomically modify anything that another thread has access to, you have to protect it with a lock.” I think this statement is effectively right but not entirely accurate. I thought the term “atomic” meant that the Python interpreter created a bytecode chunk that left no room for an interrupt signal to the CPU.

I thought atomic operations were chunks of Python bytecode that cannot be interrupted. A Python statement like “running = True” is atomic, so (I believe) you do not need a lock to guard it against interruption in this case: its bytecode breakdown is safe from thread interruption.

Python code like “threads_running[5] = True” is not atomic. There are two chunks of Python bytecode here: one to look up the list object and another to assign a value to an object, in this case a “place” in the list. An interrupt can be raised between the two bytecode chunks. That is where bad stuff happens.
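To see how a given statement actually breaks down into bytecode, the standard-library `dis` module can disassemble it. This is a minimal sketch; the functions `set_running`, `set_flag` and `increment` are hypothetical wrappers around the statements discussed above, and the exact instruction names differ between CPython versions, so compare the listings on your own interpreter rather than relying on any particular output.

```python
import dis

running = False

def set_running():
    global running
    running = True              # plain assignment to a global name

def set_flag(threads_running):
    threads_running[5] = True   # item assignment into an existing list

def increment(counters):
    counters[0] += 1            # read the item, add 1, then write it back

# Print the bytecode listing for each statement.
for func in (set_running, set_flag, increment):
    print(f"--- {func.__name__} ---")
    dis.dis(func)
```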

How does threading.local() relate to “atomic”? This is why the statement seems misleading to me. If I am wrong, can you explain?