Are global variables thread safe in Flask? How do I share data between requests?

Question

In my application, the state of a common object is changed by making requests, and the response depends on the state.

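from flask import Flask, flash, render_template

app = Flask(__name__)
app.secret_key = "dev"  # placeholder: flash() stores messages in the session, which needs a secret key

class SomeObj():
    def __init__(self, param):
        self.param = param
    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')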

If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:

  1. Client 1 queries. self.param is incremented by 1.
  2. Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
  3. The thread switches back to client 1, and the client is returned the number 2, say.
  4. Now the thread moves to client 2 and returns him/her the number 3.

Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
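The interleaving described in the steps above can be forced deterministically with two threads and a pair of threading.Event objects. This is a minimal sketch: the after_increment hook and the demo() function are hypothetical and exist only to pause client 1 right after its increment. Under exactly this interleaving both clients actually see 2: the value 1 is skipped and 2 is handed out twice.

```python
import threading


class SomeObj:
    def __init__(self, param):
        self.param = param

    def query(self, after_increment=None):
        self.param += 1
        # Hypothetical hook: lets the demo force a thread switch between
        # the increment and the return, as in step 2 of the question.
        if after_increment is not None:
            after_increment()
        return self.param


def demo():
    obj = SomeObj(0)
    incremented = threading.Event()   # client 1 has incremented
    client2_done = threading.Event()  # client 2 has incremented and returned
    results = {}

    def client1():
        def pause():
            incremented.set()     # let client 2 run now
            client2_done.wait()   # resume only after client 2 has finished
        results["client1"] = obj.query(after_increment=pause)

    def client2():
        incremented.wait()        # start only after client 1's increment
        results["client2"] = obj.query()
        client2_done.set()

    t1 = threading.Thread(target=client1)
    t2 = threading.Thread(target=client2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results, obj.param


results, final = demo()
print(results["client1"], results["client2"], final)  # 2 2 2
```

In a real server nobody forces the switch, so this happens only occasionally, which is what makes such bugs hard to reproduce.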

Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?


Answer 0

You can’t use global variables to hold this sort of data. Not only is it not thread safe, it’s not process safe, and production WSGI servers typically spawn multiple worker processes. If you were using threads to handle requests, your counts would be wrong; with multiple processes, they would also vary depending on which process handled the request, because each process has its own copy of the global.
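A lock fixes the thread half of the problem, but only within one process. This sketch (the LockedCounter name is hypothetical) makes "increment, then read" a single step for all threads of the current process; a multi-process WSGI server would still run one independent counter per worker, which is why the answer recommends storage outside Flask.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCounter:
    """Counter whose increment-and-read is atomic across threads (not processes)."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def query(self):
        # While the lock is held, no other thread in this process can
        # interleave between the increment and the read.
        with self._lock:
            self._value += 1
            return self._value


counter = LockedCounter()
with ThreadPoolExecutor(max_workers=20) as pool:
    numbers = list(pool.map(lambda _: counter.query(), range(100)))

print(sorted(numbers) == list(range(1, 101)))  # True
```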

Use a data source outside of Flask to hold global data. A database, memcached, or Redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple per-user data.
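As a sketch of "keep the state outside Flask", here is a shared counter stored in a database. Redis or memcached need a running server, so this version swaps in SQLite from the standard library; with redis-py the equivalent would be a single atomic r.incr("counter") call. The table and function names are hypothetical. Because SQLite locks the database file, the increment-and-read below is atomic across threads and processes on one machine, but not across several hosts.

```python
import os
import sqlite3
import tempfile


def init_counter(path):
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS counter "
                "(id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
            )
            # Idempotent: keeps the current value if the row already exists.
            conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
    finally:
        conn.close()


def next_value(path):
    # One connection per call, as a separate request or worker process would have.
    conn = sqlite3.connect(path, isolation_level=None)  # we manage the transaction
    try:
        # BEGIN IMMEDIATE takes the write lock up front, so the UPDATE and
        # SELECT form one atomic unit for every connection to this file.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        conn.execute("COMMIT")
        return value
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "counter.db")
    init_counter(db_path)
    print([next_value(db_path) for _ in range(3)])  # [1, 2, 3]
```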


The development server may run in a single thread and process. In that case you won’t see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (Since Flask 1.0, the development server is threaded by default.)


Some WSGI servers may support gevent or another async worker. Global variables are still not safe there, because nothing protects against most race conditions. You can still have a scenario where one worker reads a value and yields, another modifies it and yields, and then the first worker modifies it too, overwriting the second worker’s change.


If you need to store some global data during a request, you can use Flask’s g object. Another common case is a top-level object that manages database connections. What distinguishes this type of “global” is that it’s unique to each request, not shared between requests, and something manages the setup and teardown of the resource.
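A minimal sketch of the per-request "global" pattern described above: a database connection is created lazily on g and closed by a teardown_appcontext handler, so it never outlives the request. The in-memory SQLite database and the route are assumptions for illustration; a real app would connect to a file or database server.

```python
import sqlite3

from flask import Flask, g

app = Flask(__name__)
DATABASE = ":memory:"  # assumption: replace with a real database path or server


def get_db():
    # Create the connection at most once per application context (i.e. per request).
    if "db" not in g:
        g.db = sqlite3.connect(DATABASE)
    return g.db


@app.teardown_appcontext
def close_db(exception):
    # Runs when the context ends, whether or not the request raised an error.
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route("/")
def index():
    (value,) = get_db().execute("SELECT 1 + 1").fetchone()
    return str(value)
```

Because g is reset for every application context, two concurrent requests each get their own connection instead of sharing one.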


Answer 1

This is not really an answer to thread safety of globals.

But I think it is important to mention sessions here. You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a thread-safe way.

This is possible with server-side sessions, which are available in a very neat Flask extension, Flask-Session: https://pythonhosted.org/Flask-Session/

If you set up sessions, a session object is available in all your routes, and it behaves like a dictionary. The data stored in this dictionary is separate for each connecting client.

Here is a short demo:

from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# See the Configuration section of the Flask-Session docs for other SESSION_TYPE values
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"]=0

    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"]=0

    session["counter"]+=1

    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0

    session["counter"] -= 1

    return "counter is {}".format(session["counter"])


if __name__ == '__main__':
    app.run()

After running pip install Flask-Session, you should be able to run this. Try accessing it from different browsers; you’ll see that the counter is not shared between them.


Answer 2

I fully accept the previous upvoted answers and discourage using global variables for production, scalable Flask storage. For prototyping or really simple servers running under the Flask development server, however:

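Individual operations on Python’s built-in data types, such as a single dict insertion or lookup, are thread safe in CPython because of the global interpreter lock (https://docs.python.org/3/glossary.html#term-global-interpreter-lock). I personally used and tested a global dict this way. They are not process safe, though, and compound read-modify-write operations such as d[key] += 1 are not atomic.

Individual insertions, lookups, and reads on such a (server-global) dict will be OK from each (possibly concurrent) Flask session running under the development server.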

When such a global dict is keyed by a unique per-session key, it can be rather useful for server-side storage of session-specific data that would not otherwise fit into the cookie (max size 4 kB).

Of course, because it lives in memory, such a server-global dict should be guarded against growing too large. Some form of expiry for ‘old’ key/value pairs can be coded into request processing.
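A minimal sketch of such a store with expiry, under stated assumptions: the ExpiringStore name, the TTL value, and purging on every access are all hypothetical choices suited to prototyping, not a production design. A lock is used because get() reads and then rewrites an entry, a compound operation that the GIL alone does not make atomic. Flask’s default cookie session has no built-in id, so in practice you would generate one yourself (for example with uuid.uuid4()) and store it in the session to use as the key.

```python
import threading
import time


class ExpiringStore:
    """Per-process, in-memory store keyed by a session id, with a time-to-live.

    Hypothetical helper for prototyping: data is lost on restart and is not
    shared between worker processes.
    """

    def __init__(self, ttl_seconds=1800.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # session_id -> (last_access_time, value)
        self._lock = threading.Lock()

    def get(self, session_id, default=None):
        with self._lock:
            self._purge_expired()
            entry = self._data.get(session_id)
            if entry is None:
                return default
            # Refresh the timestamp so active sessions stay alive.
            self._data[session_id] = (self._clock(), entry[1])
            return entry[1]

    def set(self, session_id, value):
        with self._lock:
            self._purge_expired()
            self._data[session_id] = (self._clock(), value)

    def __len__(self):
        with self._lock:
            return len(self._data)

    def _purge_expired(self):
        # Caller must hold self._lock. O(n) per call, acceptable for small stores.
        cutoff = self._clock() - self._ttl
        expired = [k for k, (ts, _) in self._data.items() if ts < cutoff]
        for k in expired:
            del self._data[k]


# Usage with a fake clock so expiry is easy to observe:
now = [0.0]
store = ExpiringStore(ttl_seconds=10, clock=lambda: now[0])
store.set("abc", {"big": "payload"})
now[0] = 5
print(store.get("abc"))  # {'big': 'payload'}
now[0] = 20
print(store.get("abc"))  # None
```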

Again, this is not recommended for production or scalable deployments, but it may be OK for local, task-oriented servers where a separate database would be overkill.