When should we call multiprocessing.Pool.join?

I am using 'multiprocessing.Pool.imap_unordered' as follows:

from multiprocessing import Pool
pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    # do some additional processing on mapped_result
    ...

Do I need to call pool.close or pool.join after the for loop?


Answer 0


No, you don’t, but it’s probably a good idea if you aren’t going to use the pool anymore.

Reasons for calling pool.close or pool.join are well said by Tim Peters in this SO post:

As to Pool.close(), you should call that when – and only when – you’re never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It’s also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there’s often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you’d otherwise never see.


Answer 1


I had the same memory issue described in "Memory usage keep growing with Python's multiprocessing.pool" when I didn't use pool.close() and pool.join() after calling pool.map() with a function that calculated Levenshtein distance. The function worked fine, but the worker processes weren't garbage collected properly on a Win7 64-bit machine, and memory usage kept growing out of control on every call until it took the whole operating system down. Here's the code that fixed the leak:

stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)
results = pool.map(myLevenshteinFunction, stringList)
pool.close()
pool.join()

After closing and joining the pool the memory leak went away.
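The fix above can be packaged as a self-contained sketch. Since `myLevenshteinFunction` is not shown in the answer, a trivial length-difference comparison stands in for it here; the `close()`/`join()` pair in the `finally` block guarantees the workers have exited (and released their memory) before the next call creates a new pool.

```python
from multiprocessing import Pool

def length_difference(pair):
    # Hypothetical stand-in for myLevenshteinFunction (not shown in the
    # answer): compares the lengths of the two strings in the pair.
    search_string, possible_string = pair
    return abs(len(search_string) - len(possible_string))

def match_all(search_string, candidates):
    # Build the (searchString, possible_string) pairs as in the answer.
    string_list = [(search_string, s) for s in candidates]
    pool = Pool(5)
    try:
        return pool.map(length_difference, string_list)
    finally:
        pool.close()  # stop accepting new work
        pool.join()   # wait for all workers to exit before returning

if __name__ == "__main__":
    print(match_all("abc", ["a", "abcd", "abc"]))
```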