multiprocessing.Pool: When to use apply, apply_async or map?

Question: multiprocessing.Pool: When to use apply, apply_async or map?

I have not seen clear examples with use-cases for Pool.apply, Pool.apply_async and Pool.map. I am mainly using Pool.map; what are the advantages of the others?


Answer 0

Back in the old days of Python, to call a function with arbitrary arguments, you would use apply:

apply(f,args,kwargs)

apply still exists in Python 2.7, though not in Python 3, and is generally not used anymore. Nowadays,

f(*args,**kwargs)

is preferred. The multiprocessing.Pool class tries to provide a similar interface.

Pool.apply is like Python apply, except that the function call is performed in a separate process. Pool.apply blocks until the function is completed.

Pool.apply_async is also like Python’s built-in apply, except that the call returns immediately instead of waiting for the result. An AsyncResult object is returned. You call its get() method to retrieve the result of the function call. The get() method blocks until the function is completed. Thus, pool.apply(func, args, kwargs) is equivalent to pool.apply_async(func, args, kwargs).get().
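To make that equivalence concrete, here is a minimal sketch that computes the same call both ways. The worker `power` and the timeout value are hypothetical, chosen only for illustration:

```python
import multiprocessing as mp


def power(base, exponent=2):
    # Hypothetical worker; runs inside one of the pool's processes.
    return base ** exponent


def compare_apply_forms():
    with mp.Pool(processes=2) as pool:
        # Blocks until power(3, exponent=3) has finished in a worker.
        blocking = pool.apply(power, (3,), {"exponent": 3})
        # Returns an AsyncResult immediately; get() then blocks for the value.
        async_result = pool.apply_async(power, (3,), {"exponent": 3})
        non_blocking_then_get = async_result.get(timeout=10)
    return blocking, non_blocking_then_get


if __name__ == "__main__":
    print(compare_apply_forms())  # (27, 27)
```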

In contrast to Pool.apply, the Pool.apply_async method also accepts a callback which, if supplied, is called in the main process when the function completes. This can be used instead of calling get().

For example:

import multiprocessing as mp
import time

def foo_pool(x):
    time.sleep(2)
    return x*x

result_list = []
def log_result(result):
    # This is called whenever foo_pool(i) returns a result.
    # result_list is modified only by the main process, not the pool workers.
    result_list.append(result)

def apply_async_with_callback():
    pool = mp.Pool()
    for i in range(10):
        pool.apply_async(foo_pool, args = (i, ), callback = log_result)
    pool.close()
    pool.join()
    print(result_list)

if __name__ == '__main__':
    apply_async_with_callback()

may yield a result such as

[1, 0, 4, 9, 25, 16, 49, 36, 81, 64]

Notice that, unlike with pool.map, the order of the results may not correspond to the order in which the pool.apply_async calls were made.
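If you want the non-blocking submission of apply_async but still need the results in submission order, one option is to skip the callback and keep the AsyncResult handles, calling get() on them in the order they were created. A sketch reusing foo_pool with a shorter, arbitrary sleep:

```python
import multiprocessing as mp
import time


def foo_pool(x):
    time.sleep(0.1)  # shortened delay, arbitrary
    return x * x


def apply_async_in_order():
    with mp.Pool() as pool:
        # Keep the AsyncResult handles in submission order...
        handles = [pool.apply_async(foo_pool, (i,)) for i in range(10)]
        # ...and read them back in that same order; each get() blocks as needed.
        return [h.get(timeout=30) for h in handles]


if __name__ == "__main__":
    print(apply_async_in_order())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```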


So, if you need to run a function in a separate process, but want the current process to block until that function returns, use Pool.apply. Like Pool.apply, Pool.map blocks until the complete result is returned.

If you want the Pool of worker processes to perform many function calls asynchronously, use Pool.apply_async. The order of the results is not guaranteed to be the same as the order of the calls to Pool.apply_async.

Notice also that you could call a number of different functions with Pool.apply_async (not all calls need to use the same function).
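For example, a sketch with two hypothetical worker functions (square and greet) plus the built-in sum, all submitted to the same pool:

```python
import multiprocessing as mp


def square(x):
    # Hypothetical worker.
    return x * x


def greet(name):
    # Hypothetical worker.
    return "hello " + name


def mixed_calls():
    with mp.Pool(processes=2) as pool:
        # Each apply_async call may target a different function.
        r1 = pool.apply_async(square, (7,))
        r2 = pool.apply_async(greet, ("pool",))
        r3 = pool.apply_async(sum, ([1, 2, 3],))
        return r1.get(timeout=10), r2.get(timeout=10), r3.get(timeout=10)


if __name__ == "__main__":
    print(mixed_calls())  # (49, 'hello pool', 6)
```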

In contrast, Pool.map applies the same function to many arguments. However, unlike Pool.apply_async, the results are returned in an order corresponding to the order of the arguments.
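A small sketch of that ordering guarantee, assuming a hypothetical slow_square worker that sleeps for a random time so that tasks finish out of order:

```python
import multiprocessing as mp
import random
import time


def slow_square(x):
    # Random delay so tasks complete in an unpredictable order.
    time.sleep(random.random() / 10)
    return x * x


def ordered_map():
    with mp.Pool(processes=4) as pool:
        # map blocks and returns results in argument order,
        # regardless of which task finished first.
        return pool.map(slow_square, range(8))


if __name__ == "__main__":
    print(ordered_map())  # [0, 1, 4, 9, 16, 25, 36, 49]
```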


Answer 1

Regarding apply vs map:

pool.apply(f, args): f is only executed in ONE of the workers of the pool. So ONE of the processes in the pool will run f(args).

pool.map(f, iterable): This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. So you take advantage of all the processes in the pool.
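The chunking can be observed by tagging each item with the PID of the worker that handled it. In this sketch, tag_with_pid is hypothetical and chunksize=5 is set explicitly so that range(20) becomes 4 tasks of 5 items each. A single call such as pool.apply(tag_with_pid, (0,)) would, by contrast, occupy exactly one worker.

```python
import multiprocessing as mp
import os


def tag_with_pid(x):
    # Hypothetical worker: report which process handled this item.
    return x, os.getpid()


def map_in_chunks():
    with mp.Pool(processes=2) as pool:
        # chunksize=5 splits range(20) into 4 tasks; each task
        # (all 5 items in it) is handled by a single worker.
        return pool.map(tag_with_pid, range(20), chunksize=5)


if __name__ == "__main__":
    pairs = map_in_chunks()
    print(sorted({pid for _, pid in pairs}))  # the worker PIDs that were used
```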


Answer 2

Here is an overview in table format showing the differences between Pool.map, Pool.map_async, Pool.apply, Pool.apply_async, Pool.starmap and Pool.starmap_async. When choosing one, you have to take multiple arguments, concurrency, blocking, and result ordering into account:

                  | Multi-args   Concurrency    Blocking     Ordered-results
---------------------------------------------------------------------
Pool.map          | no           yes            yes          yes
Pool.map_async    | no           yes            no           yes
Pool.apply        | yes          no             yes          no
Pool.apply_async  | yes          yes            no           no
Pool.starmap      | yes          yes            yes          yes
Pool.starmap_async| yes          yes            no           yes

Notes:

  • Pool.imap and Pool.imap_unordered: lazier versions of map that return an iterator. imap yields results in input order; imap_unordered yields them as they complete.

  • Pool.starmap is very similar to map, except that it accepts multiple arguments: each item of the iterable is unpacked into the function's arguments.

  • Async methods submit all the tasks at once and immediately return a result object. Call its get method to obtain the results; get blocks until they are ready.

  • Pool.map (or Pool.apply) is very similar to the Python built-in map (or apply). Both block the main process until all the tasks complete and return the result.
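A sketch contrasting the two lazy variants, with cube as a hypothetical worker. Because imap_unordered's order depends on scheduling, only its sorted output is predictable:

```python
import multiprocessing as mp


def cube(x):
    # Hypothetical worker.
    return x ** 3


def lazy_variants():
    with mp.Pool(processes=2) as pool:
        # imap returns an iterator that yields results in input order
        # as they become available.
        in_order = list(pool.imap(cube, range(6)))
        # imap_unordered yields each result as soon as its task finishes,
        # so the order can differ from the input order.
        any_order = list(pool.imap_unordered(cube, range(6)))
    return in_order, any_order


if __name__ == "__main__":
    in_order, any_order = lazy_variants()
    print(in_order)  # [0, 1, 8, 27, 64, 125]
    print(sorted(any_order))  # [0, 1, 8, 27, 64, 125]
```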

Examples:

map

Submits a whole list of jobs in one call

results = pool.map(func, [1, 2, 3])

apply

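Submits a single job per call, so a loop is needed for several jobs

results = []
for x, y in [[1, 1], [2, 2]]:
    results.append(pool.apply(func, (x, y)))

The async examples below collect their results with this callback:

def collect_result(result):
    results.append(result)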

map_async

Submits a whole list of jobs in one call; the callback is called once, with the complete list of results

pool.map_async(func, jobs, callback=collect_result)
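Because the map_async callback receives the whole result list at once, collect_result as written above appends a single nested list. A self-contained sketch (double is a hypothetical worker) that shows this and also retrieves the values with get():

```python
import multiprocessing as mp


def double(x):
    # Hypothetical worker.
    return 2 * x


def run_map_async():
    collected = []
    with mp.Pool(processes=2) as pool:
        # The callback runs once, in the main process, with the full result list.
        async_result = pool.map_async(double, [1, 2, 3], callback=collected.append)
        values = async_result.get(timeout=10)
    return values, collected


if __name__ == "__main__":
    print(run_map_async())  # ([2, 4, 6], [[2, 4, 6]])
```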

apply_async

Submits a single job per call; the job runs in the background, in parallel with other submitted jobs

for x, y in [[1, 1], [2, 2]]:
    pool.apply_async(func, (x, y), callback=collect_result)

starmap

A variant of pool.map that supports multiple arguments

pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
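A runnable sketch with a hypothetical two-argument add function; each tuple is unpacked into add's positional arguments:

```python
import multiprocessing as mp


def add(a, b):
    # Hypothetical two-argument worker.
    return a + b


def run_starmap():
    with mp.Pool(processes=2) as pool:
        # Calls add(1, 1), add(2, 1), add(3, 1) and returns results in input order.
        return pool.starmap(add, [(1, 1), (2, 1), (3, 1)])


if __name__ == "__main__":
    print(run_starmap())  # [2, 3, 4]
```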

starmap_async

A combination of starmap() and map_async() that iterates over an iterable of iterables and calls func with each inner iterable unpacked. Returns a result object.

pool.starmap_async(func, [(1, 1), (2, 1), (3, 1)], callback=collect_result)
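A self-contained sketch with a hypothetical multiply worker. Instead of a callback, it reads the values through get(), which returns them in the order of the input tuples:

```python
import multiprocessing as mp


def multiply(a, b):
    # Hypothetical two-argument worker.
    return a * b


def run_starmap_async():
    with mp.Pool(processes=2) as pool:
        async_result = pool.starmap_async(multiply, [(1, 1), (2, 1), (3, 1)])
        # The main process is free to do other work here before calling get().
        return async_result.get(timeout=10)


if __name__ == "__main__":
    print(run_starmap_async())  # [1, 2, 3]
```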

Reference:

Find complete documentation here: https://docs.python.org/3/library/multiprocessing.html