Concurrent.futures vs. Multiprocessing in Python 3

Question 1

Python 3.2 introduced concurrent.futures, which appears to be a more advanced combination of the older threading and multiprocessing modules.

What are the advantages and disadvantages of using this for CPU bound tasks over the older multiprocessing module?

This article suggests they’re much easier to work with – is that the case?

Answer

I wouldn’t call concurrent.futures more “advanced” – it’s a simpler interface that works very much the same regardless of whether you use multiple threads or multiple processes as the underlying parallelization gimmick.
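To make the "same interface" point concrete, here is a minimal sketch. The task count_primes_below and the helper run_with are hypothetical names invented for this illustration; only the executor classes and their map method come from concurrent.futures.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes_below(limit):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, max_workers=2):
    # The calling code is identical for both executor classes;
    # only the class passed in decides threads vs. processes.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(count_primes_below, limits))


if __name__ == "__main__":
    limits = [1000, 2000, 3000]
    print(run_with(ThreadPoolExecutor, limits))
    print(run_with(ProcessPoolExecutor, limits))
```

Both calls return the same results; whether the process version is actually faster depends on the factors discussed below.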

So, like virtually all instances of “simpler interface”, much the same trade-offs are involved: it has a shallower learning curve, in large part just because there’s so much less available to be learned; but, because it offers fewer options, it may eventually frustrate you in ways the richer interfaces won’t.
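As a rough illustration of how little there is to learn: apart from map, much everyday use of concurrent.futures comes down to submit, the Future objects it returns, and as_completed. The sketch below uses a thread pool only to stay lightweight, and square and squares_as_completed are made-up names for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n):
    # Hypothetical stand-in for real work.
    return n * n


def squares_as_completed(nums, max_workers=2):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() schedules one call and returns a Future immediately.
        future_to_num = {executor.submit(square, n): n for n in nums}
        # as_completed() yields each Future as soon as it has finished.
        for future in as_completed(future_to_num):
            results[future_to_num[future]] = future.result()
    return results


if __name__ == "__main__":
    print(squares_as_completed([1, 2, 3]))
```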

So far as CPU-bound tasks go, that’s way too under-specified to say much meaningful. For CPU-bound tasks under CPython, you need multiple processes rather than multiple threads to have any chance of getting a speedup. But how much (if any) of a speedup you get depends on the details of your hardware, your OS, and especially on how much inter-process communication your specific tasks require. Under the covers, all inter-process parallelization gimmicks rely on the same OS primitives – the high-level API you use to get at those isn’t a primary factor in bottom-line speed.
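One way to get a feel for the communication cost: arguments and results that cross a process boundary are pickled, so the size of the pickle is a rough proxy for how much data has to move. The sketch below compares two hypothetical task designs that compute the same thing; all function names are invented for this illustration, and no worker processes are started.

```python
import pickle


def sum_squares_of_list(numbers):
    # Design A: the parent ships every number to the worker.
    return sum(n * n for n in numbers)


def sum_squares_of_range(bounds):
    # Design B: the parent ships only (start, stop); the worker rebuilds the range.
    start, stop = bounds
    return sum(n * n for n in range(start, stop))


def payload_size(argument):
    # Arguments for a worker process are pickled before being sent;
    # the length of the pickle is a rough proxy for communication cost.
    return len(pickle.dumps(argument))


if __name__ == "__main__":
    start, stop = 0, 100_000
    print("list payload bytes:  ", payload_size(list(range(start, stop))))
    print("bounds payload bytes:", payload_size((start, stop)))
```

Design B sends a far smaller payload for the same answer, and that kind of difference matters whichever high-level API does the sending.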

Edit: example

Here’s the final code shown in the article you referenced, but I’m adding an import statement needed to make it work:

from concurrent.futures import ProcessPoolExecutor

def pool_factorizer_map(nums, nprocs):
    # factorize_naive is defined in the referenced article.
    # Let the executor divide the work among processes by using 'map'.
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return {num: factors for num, factors in
                zip(nums, executor.map(factorize_naive, nums))}

Here’s exactly the same thing using multiprocessing instead:

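import multiprocessing as mp

def mp_factorizer_map(nums, nprocs):
    # factorize_naive is defined in the referenced article.
    # Pool.map divides the work among the worker processes.
    with mp.Pool(nprocs) as pool:
        return {num: factors for num, factors in
                zip(nums, pool.map(factorize_naive, nums))}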

Note that the ability to use multiprocessing.Pool objects as context managers was added in Python 3.3.
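If you want to run the two versions side by side, something like the following should work. Note that factorize_naive here is a hypothetical trial-division stand-in, not the article's implementation, and the dict(zip(...)) form is just a shorter spelling of the comprehensions above.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def factorize_naive(n):
    # Hypothetical stand-in, not the article's code: plain trial division.
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors


def pool_factorizer_map(nums, nprocs):
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return dict(zip(nums, executor.map(factorize_naive, nums)))


def mp_factorizer_map(nums, nprocs):
    with mp.Pool(nprocs) as pool:
        return dict(zip(nums, pool.map(factorize_naive, nums)))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    nums = [12, 97, 360, 1001]
    assert pool_factorizer_map(nums, 2) == mp_factorizer_map(nums, 2)
    print(pool_factorizer_map(nums, 2))
```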

As for which one is easier to work with, they’re essentially identical.

One difference is that Pool supports so many different ways of doing things that you may not realize how easy it can be until you’ve climbed quite a way up the learning curve.
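To show what "so many different ways" looks like in practice, here is a sketch that computes the same squares through five Pool methods. It uses multiprocessing.pool.ThreadPool, which exposes the same methods as mp.Pool, purely to keep the example light; square and pool_styles are hypothetical names.

```python
from multiprocessing.pool import ThreadPool


def square(n):
    return n * n


def pool_styles(pool, nums):
    # Five spellings of the same computation on one pool object.
    pending = [pool.apply_async(square, (n,)) for n in nums]
    return {
        "map": pool.map(square, nums),
        "imap": list(pool.imap(square, nums)),
        # imap_unordered yields results in completion order, so sort them.
        "imap_unordered": sorted(pool.imap_unordered(square, nums)),
        "starmap": pool.starmap(pow, [(n, 2) for n in nums]),
        "apply_async": [result.get() for result in pending],
    }


if __name__ == "__main__":
    with ThreadPool(2) as pool:
        for name, values in pool_styles(pool, [1, 2, 3]).items():
            print(name, values)
```

Each style has its uses, but none of them is obviously "the" way until you know the module fairly well.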

Again, all those different ways are both a strength and a weakness. They’re a strength because the flexibility may be required in some situations. They’re a weakness because of “preferably only one obvious way to do it”. A project sticking exclusively (if possible) to concurrent.futures will probably be easier to maintain over the long run, due to the lack of gratuitous novelty in how its minimal API can be used.
