标签归档:multithreading

线程池类似于多处理池?

问题:线程池类似于多处理池?

是否有用于工作线程的Pool类,类似于多处理模块的Pool类

我喜欢例如并行化地图功能的简单方法

def long_running_func(p):
    c_func_no_gil(p)

p = multiprocessing.Pool(4)
xs = p.map(long_running_func, range(100))

但是,我希望这样做而不会产生新流程的开销。

我知道GIL。但是,在我的用例中,该函数将是IO绑定的C函数,python包装程序将在实际函数调用之前为其释放GIL。

我必须编写自己的线程池吗?

Is there a Pool class for worker threads, similar to the multiprocessing module’s Pool class?

I like for example the easy way to parallelize a map function

def long_running_func(p):
    c_func_no_gil(p)

p = multiprocessing.Pool(4)
xs = p.map(long_running_func, range(100))

however I would like to do it without the overhead of creating new processes.

I know about the GIL. However, in my usecase, the function will be an IO-bound C function for which the python wrapper will release the GIL before the actual function call.

Do I have to write my own threading pool?


回答 0

我刚刚发现模块中实际上 一个基于线程的Pool接口multiprocessing,但是它有些隐藏并且没有正确记录。

可以通过导入

from multiprocessing.pool import ThreadPool

它是使用包装Python线程的虚拟Process类实现的。可以找到基于线程的Process类multiprocessing.dummy,在docs中对其进行了简要介绍。该虚拟模块应该提供基于线程的整个多处理接口。

I just found out that there actually is a thread-based Pool interface in the multiprocessing module, however it is hidden somewhat and not properly documented.

It can be imported via

from multiprocessing.pool import ThreadPool

It is implemented using a dummy Process class wrapping a python thread. This thread-based Process class can be found in multiprocessing.dummy which is mentioned briefly in the docs. This dummy module supposedly provides the whole multiprocessing interface based on threads.


回答 1

在Python 3中,您可以使用concurrent.futures.ThreadPoolExecutor,即:

executor = ThreadPoolExecutor(max_workers=10)
a = executor.submit(my_function)

有关更多信息和示例,请参阅文档

In Python 3 you can use concurrent.futures.ThreadPoolExecutor, i.e.:

executor = ThreadPoolExecutor(max_workers=10)
a = executor.submit(my_function)

See the docs for more info and examples.


回答 2

是的,它似乎(或多或少)具有相同的API。

import multiprocessing

def worker(lnk):
    ....    
def start_process():
    .....
....

if(PROCESS):
    pool = multiprocessing.Pool(processes=POOL_SIZE, initializer=start_process)
else:
    pool = multiprocessing.pool.ThreadPool(processes=POOL_SIZE, 
                                           initializer=start_process)

pool.map(worker, inputs)
....

Yes, and it seems to have (more or less) the same API.

import multiprocessing

def worker(lnk):
    ....    
def start_process():
    .....
....

if(PROCESS):
    pool = multiprocessing.Pool(processes=POOL_SIZE, initializer=start_process)
else:
    pool = multiprocessing.pool.ThreadPool(processes=POOL_SIZE, 
                                           initializer=start_process)

pool.map(worker, inputs)
....

回答 3

对于非常简单和轻巧的东西(从此处稍作修改):

from Queue import Queue
from threading import Thread


class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception, e:
                print e
            finally:
                self.tasks.task_done()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

if __name__ == '__main__':
    from random import randrange
    from time import sleep

    delays = [randrange(1, 10) for i in range(100)]

    def wait_delay(d):
        print 'sleeping for (%d)sec' % d
        sleep(d)

    pool = ThreadPool(20)

    for i, d in enumerate(delays):
        pool.add_task(wait_delay, d)

    pool.wait_completion()

要支持完成任务的回调,您只需将回调添加到任务元组即可。

For something very simple and lightweight (slightly modified from here):

from Queue import Queue
from threading import Thread


class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception, e:
                print e
            finally:
                self.tasks.task_done()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

if __name__ == '__main__':
    from random import randrange
    from time import sleep

    delays = [randrange(1, 10) for i in range(100)]

    def wait_delay(d):
        print 'sleeping for (%d)sec' % d
        sleep(d)

    pool = ThreadPool(20)

    for i, d in enumerate(delays):
        pool.add_task(wait_delay, d)

    pool.wait_completion()

To support callbacks on task completion you can just add the callback to the task tuple.


回答 4

嗨,在Python中使用线程池可以使用以下库:

from multiprocessing.dummy import Pool as ThreadPool

然后使用,这个库就是这样的:

pool = ThreadPool(threads)
results = pool.map(service, tasks)
pool.close()
pool.join()
return results

线程是所需的线程数,任务是大多数映射到服务的任务列表。

Hi to use the thread pool in Python you can use this library :

from multiprocessing.dummy import Pool as ThreadPool

and then for use, this library do like that :

pool = ThreadPool(threads)
results = pool.map(service, tasks)
pool.close()
pool.join()
return results

The threads are the number of threads that you want and tasks are a list of task that most map to the service.


回答 5

这是我最终使用的结果。它是上述dgorissen类的修改版本。

文件: threadpool.py

from queue import Queue, Empty
import threading
from threading import Thread


class Worker(Thread):
    _TIMEOUT = 2
    """ Thread executing tasks from a given tasks queue. Thread is signalable, 
        to exit
    """
    def __init__(self, tasks, th_num):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon, self.th_num = True, th_num
        self.done = threading.Event()
        self.start()

    def run(self):       
        while not self.done.is_set():
            try:
                func, args, kwargs = self.tasks.get(block=True,
                                                   timeout=self._TIMEOUT)
                try:
                    func(*args, **kwargs)
                except Exception as e:
                    print(e)
                finally:
                    self.tasks.task_done()
            except Empty as e:
                pass
        return

    def signal_exit(self):
        """ Signal to thread to exit """
        self.done.set()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads, tasks=[]):
        self.tasks = Queue(num_threads)
        self.workers = []
        self.done = False
        self._init_workers(num_threads)
        for task in tasks:
            self.tasks.put(task)

    def _init_workers(self, num_threads):
        for i in range(num_threads):
            self.workers.append(Worker(self.tasks, i))

    def add_task(self, func, *args, **kwargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kwargs))

    def _close_all_threads(self):
        """ Signal all threads to exit and lose the references to them """
        for workr in self.workers:
            workr.signal_exit()
        self.workers = []

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

    def __del__(self):
        self._close_all_threads()


def create_task(func, *args, **kwargs):
    return (func, args, kwargs)

使用游泳池

from random import randrange
from time import sleep

delays = [randrange(1, 10) for i in range(30)]

def wait_delay(d):
    print('sleeping for (%d)sec' % d)
    sleep(d)

pool = ThreadPool(20)
for i, d in enumerate(delays):
    pool.add_task(wait_delay, d)
pool.wait_completion()

Here’s the result I finally ended up using. It’s a modified version of the classes by dgorissen above.

File: threadpool.py

from queue import Queue, Empty
import threading
from threading import Thread


class Worker(Thread):
    _TIMEOUT = 2
    """ Thread executing tasks from a given tasks queue. Thread is signalable, 
        to exit
    """
    def __init__(self, tasks, th_num):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon, self.th_num = True, th_num
        self.done = threading.Event()
        self.start()

    def run(self):       
        while not self.done.is_set():
            try:
                func, args, kwargs = self.tasks.get(block=True,
                                                   timeout=self._TIMEOUT)
                try:
                    func(*args, **kwargs)
                except Exception as e:
                    print(e)
                finally:
                    self.tasks.task_done()
            except Empty as e:
                pass
        return

    def signal_exit(self):
        """ Signal to thread to exit """
        self.done.set()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads, tasks=[]):
        self.tasks = Queue(num_threads)
        self.workers = []
        self.done = False
        self._init_workers(num_threads)
        for task in tasks:
            self.tasks.put(task)

    def _init_workers(self, num_threads):
        for i in range(num_threads):
            self.workers.append(Worker(self.tasks, i))

    def add_task(self, func, *args, **kwargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kwargs))

    def _close_all_threads(self):
        """ Signal all threads to exit and lose the references to them """
        for workr in self.workers:
            workr.signal_exit()
        self.workers = []

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

    def __del__(self):
        self._close_all_threads()


def create_task(func, *args, **kwargs):
    return (func, args, kwargs)

To use the pool

from random import randrange
from time import sleep

delays = [randrange(1, 10) for i in range(30)]

def wait_delay(d):
    print('sleeping for (%d)sec' % d)
    sleep(d)

pool = ThreadPool(20)
for i, d in enumerate(delays):
    pool.add_task(wait_delay, d)
pool.wait_completion()

回答 6

创建新流程的开销非常小,尤其是其中只有4个时。我怀疑这是您应用程序的性能热点。保持简单,优化您必须去的地方以及分析结果指向的地方。

The overhead of creating the new processes is minimal, especially when it’s just 4 of them. I doubt this is a performance hot spot of your application. Keep it simple, optimize where you have to and where profiling results point to.


回答 7

没有基于线程的内置池。但是,用Queue该类实现生产者/消费者队列可能很快。

来自:https : //docs.python.org/2/library/queue.html

from threading import Thread
from Queue import Queue
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
     t = Thread(target=worker)
     t.daemon = True
     t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done

There is no built in thread based pool. However, it can be very quick to implement a producer/consumer queue with the Queue class.

From: https://docs.python.org/2/library/queue.html

from threading import Thread
from Queue import Queue
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
     t = Thread(target=worker)
     t.daemon = True
     t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done

如何从python中的线程获取返回值?

问题:如何从python中的线程获取返回值?

foo下面的函数返回一个字符串'foo'。如何获取'foo'从线程目标返回的值?

from threading import Thread

def foo(bar):
    print('hello {}'.format(bar))
    return 'foo'

thread = Thread(target=foo, args=('world!',))
thread.start()
return_value = thread.join()

上面显示的“一种显而易见的方法”不起作用:thread.join()return None

The function foo below returns a string 'foo'. How can I get the value 'foo' which is returned from the thread’s target?

from threading import Thread

def foo(bar):
    print('hello {}'.format(bar))
    return 'foo'

thread = Thread(target=foo, args=('world!',))
thread.start()
return_value = thread.join()

The “one obvious way to do it”, shown above, doesn’t work: thread.join() returned None.


回答 0

在Python 3.2+中,stdlib concurrent.futures模块向提供了更高级别的API threading,包括将返回值或异常从工作线程传递回主线程:

import concurrent.futures

def foo(bar):
    print('hello {}'.format(bar))
    return 'foo'

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(foo, 'world!')
    return_value = future.result()
    print(return_value)

In Python 3.2+, stdlib concurrent.futures module provides a higher level API to threading, including passing return values or exceptions from a worker thread back to the main thread:

import concurrent.futures

def foo(bar):
    print('hello {}'.format(bar))
    return 'foo'

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(foo, 'world!')
    return_value = future.result()
    print(return_value)

回答 1

FWIW,该multiprocessing模块为此类提供了一个不错的接口Pool。而且,如果您要坚持使用线程而不是进程,则可以只使用multiprocessing.pool.ThreadPool该类作为替代品。

def foo(bar, baz):
  print 'hello {0}'.format(bar)
  return 'foo' + baz

from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=1)

async_result = pool.apply_async(foo, ('world', 'foo')) # tuple of args for foo

# do some other stuff in the main process

return_val = async_result.get()  # get the return value from your function.

FWIW, the multiprocessing module has a nice interface for this using the Pool class. And if you want to stick with threads rather than processes, you can just use the multiprocessing.pool.ThreadPool class as a drop-in replacement.

def foo(bar, baz):
  print 'hello {0}'.format(bar)
  return 'foo' + baz

from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=1)

async_result = pool.apply_async(foo, ('world', 'foo')) # tuple of args for foo

# do some other stuff in the main process

return_val = async_result.get()  # get the return value from your function.

回答 2

我见过的一种方法是将可变对象(例如列表或字典)与索引或某种其他标识符一起传递给线程的构造函数。然后,线程可以将其结果存储在该对象的专用插槽中。例如:

def foo(bar, result, index):
    print 'hello {0}'.format(bar)
    result[index] = "foo"

from threading import Thread

threads = [None] * 10
results = [None] * 10

for i in range(len(threads)):
    threads[i] = Thread(target=foo, args=('world!', results, i))
    threads[i].start()

# do some other stuff

for i in range(len(threads)):
    threads[i].join()

print " ".join(results)  # what sound does a metasyntactic locomotive make?

如果您确实想join()返回被调用函数的返回值,则可以使用如下所示的Thread子类来实现:

from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return "foo"

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs, Verbose)
        self._return = None
    def run(self):
        if self._Thread__target is not None:
            self._return = self._Thread__target(*self._Thread__args,
                                                **self._Thread__kwargs)
    def join(self):
        Thread.join(self)
        return self._return

twrv = ThreadWithReturnValue(target=foo, args=('world!',))

twrv.start()
print twrv.join()   # prints foo

由于名称修改,这有点麻烦,并且它访问特定于Thread实现的“私有”数据结构…但是它可以工作。

对于python3

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs)
        self._return = None
    def run(self):
        print(type(self._target))
        if self._target is not None:
            self._return = self._target(*self._args,
                                                **self._kwargs)
    def join(self, *args):
        Thread.join(self, *args)
        return self._return

One way I’ve seen is to pass a mutable object, such as a list or a dictionary, to the thread’s constructor, along with a an index or other identifier of some sort. The thread can then store its results in its dedicated slot in that object. For example:

def foo(bar, result, index):
    print 'hello {0}'.format(bar)
    result[index] = "foo"

from threading import Thread

threads = [None] * 10
results = [None] * 10

for i in range(len(threads)):
    threads[i] = Thread(target=foo, args=('world!', results, i))
    threads[i].start()

# do some other stuff

for i in range(len(threads)):
    threads[i].join()

print " ".join(results)  # what sound does a metasyntactic locomotive make?

If you really want join() to return the return value of the called function, you can do this with a Thread subclass like the following:

from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return "foo"

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs, Verbose)
        self._return = None
    def run(self):
        if self._Thread__target is not None:
            self._return = self._Thread__target(*self._Thread__args,
                                                **self._Thread__kwargs)
    def join(self):
        Thread.join(self)
        return self._return

twrv = ThreadWithReturnValue(target=foo, args=('world!',))

twrv.start()
print twrv.join()   # prints foo

That gets a little hairy because of some name mangling, and it accesses “private” data structures that are specific to Thread implementation… but it works.

For python3

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs)
        self._return = None
    def run(self):
        print(type(self._target))
        if self._target is not None:
            self._return = self._target(*self._args,
                                                **self._kwargs)
    def join(self, *args):
        Thread.join(self, *args)
        return self._return

回答 3

Jake的答案很好,但是如果您不想使用线程池(您不知道需要多少线程,而是根据需要创建它们),那么内置的一种在线程之间传输信息的好方法队列类,因为它提供线程安全性。

我创建了以下装饰器,以使其与线程池类似:

def threaded(f, daemon=False):
    import Queue

    def wrapped_f(q, *args, **kwargs):
        '''this function calls the decorated function and puts the 
        result in a queue'''
        ret = f(*args, **kwargs)
        q.put(ret)

    def wrap(*args, **kwargs):
        '''this is the function returned from the decorator. It fires off
        wrapped_f in a new thread and returns the thread object with
        the result queue attached'''

        q = Queue.Queue()

        t = threading.Thread(target=wrapped_f, args=(q,)+args, kwargs=kwargs)
        t.daemon = daemon
        t.start()
        t.result_queue = q        
        return t

    return wrap

然后,将其用作:

@threaded
def long_task(x):
    import time
    x = x + 5
    time.sleep(5)
    return x

# does not block, returns Thread object
y = long_task(10)
print y

# this blocks, waiting for the result
result = y.result_queue.get()
print result

装饰函数每次调用时都会创建一个新线程,并返回一个Thread对象,该对象包含将接收结果的队列。

更新

自从我发布这个答案已经有一段时间了,但是它仍然得到视图,所以我想我将对其进行更新以反映我在较新版本的Python中执行此操作的方式:

concurrent.futures模块中添加了Python 3.2,该模块为并行任务提供了高级接口。它提供ThreadPoolExecutorProcessPoolExecutor,因此您可以使用具有相同api的线程或进程池。

此API的一个好处是将任务提交给Executor返回值Future return会对象,该对象将以您提交的可调用对象的返回值完成。

这使得queue不需要附加对象,从而大大简化了装饰器:

_DEFAULT_POOL = ThreadPoolExecutor()

def threadpool(f, executor=None):
    @wraps(f)
    def wrap(*args, **kwargs):
        return (executor or _DEFAULT_POOL).submit(f, *args, **kwargs)

    return wrap

这将使用默认模块如果未传入,线程池执行程序。

用法与之前非常相似:

@threadpool
def long_task(x):
    import time
    x = x + 5
    time.sleep(5)
    return x

# does not block, returns Future object
y = long_task(10)
print y

# this blocks, waiting for the result
result = y.result()
print result

如果您使用的是Python 3.4+,则使用此方法(通常是Future对象)的一个非常不错的功能是可以包装返回的future并将其转换为asyncio.Futurewith asyncio.wrap_future。这使得它很容易与协程一起工作:

result = await asyncio.wrap_future(long_task(10))

如果不需要访问基础concurrent.Future对象,则可以在包装器中包含自动换行:

_DEFAULT_POOL = ThreadPoolExecutor()

def threadpool(f, executor=None):
    @wraps(f)
    def wrap(*args, **kwargs):
        return asyncio.wrap_future((executor or _DEFAULT_POOL).submit(f, *args, **kwargs))

    return wrap

然后,每当需要将cpu密集型代码或阻塞代码从事件循环线程中推出时,都可以将其放入经过修饰的函数中:

@threadpool
def some_long_calculation():
    ...

# this will suspend while the function is executed on a threadpool
result = await some_long_calculation()

Jake’s answer is good, but if you don’t want to use a threadpool (you don’t know how many threads you’ll need, but create them as needed) then a good way to transmit information between threads is the built-in Queue.Queue class, as it offers thread safety.

I created the following decorator to make it act in a similar fashion to the threadpool:

def threaded(f, daemon=False):
    import Queue

    def wrapped_f(q, *args, **kwargs):
        '''this function calls the decorated function and puts the 
        result in a queue'''
        ret = f(*args, **kwargs)
        q.put(ret)

    def wrap(*args, **kwargs):
        '''this is the function returned from the decorator. It fires off
        wrapped_f in a new thread and returns the thread object with
        the result queue attached'''

        q = Queue.Queue()

        t = threading.Thread(target=wrapped_f, args=(q,)+args, kwargs=kwargs)
        t.daemon = daemon
        t.start()
        t.result_queue = q        
        return t

    return wrap

Then you just use it as:

@threaded
def long_task(x):
    import time
    x = x + 5
    time.sleep(5)
    return x

# does not block, returns Thread object
y = long_task(10)
print y

# this blocks, waiting for the result
result = y.result_queue.get()
print result

The decorated function creates a new thread each time it’s called and returns a Thread object that contains the queue that will receive the result.

UPDATE

It’s been quite a while since I posted this answer, but it still gets views so I thought I would update it to reflect the way I do this in newer versions of Python:

Python 3.2 added in the concurrent.futures module which provides a high-level interface for parallel tasks. It provides ThreadPoolExecutor and ProcessPoolExecutor, so you can use a thread or process pool with the same api.

One benefit of this api is that submitting a task to an Executor returns a Future object, which will complete with the return value of the callable you submit.

This makes attaching a queue object unnecessary, which simplifies the decorator quite a bit:

_DEFAULT_POOL = ThreadPoolExecutor()

def threadpool(f, executor=None):
    @wraps(f)
    def wrap(*args, **kwargs):
        return (executor or _DEFAULT_POOL).submit(f, *args, **kwargs)

    return wrap

This will use a default module threadpool executor if one is not passed in.

The usage is very similar to before:

@threadpool
def long_task(x):
    import time
    x = x + 5
    time.sleep(5)
    return x

# does not block, returns Future object
y = long_task(10)
print y

# this blocks, waiting for the result
result = y.result()
print result

If you’re using Python 3.4+, one really nice feature of using this method (and Future objects in general) is that the returned future can be wrapped to turn it into an asyncio.Future with asyncio.wrap_future. This makes it work easily with coroutines:

result = await asyncio.wrap_future(long_task(10))

If you don’t need access to the underlying concurrent.Future object, you can include the wrap in the decorator:

_DEFAULT_POOL = ThreadPoolExecutor()

def threadpool(f, executor=None):
    @wraps(f)
    def wrap(*args, **kwargs):
        return asyncio.wrap_future((executor or _DEFAULT_POOL).submit(f, *args, **kwargs))

    return wrap

Then, whenever you need to push cpu intensive or blocking code off the event loop thread, you can put it in a decorated function:

@threadpool
def some_long_calculation():
    ...

# this will suspend while the function is executed on a threadpool
result = await some_long_calculation()

回答 4

另一个不需要更改现有代码的解决方案:

import Queue
from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return 'foo'

que = Queue.Queue()

t = Thread(target=lambda q, arg1: q.put(foo(arg1)), args=(que, 'world!'))
t.start()
t.join()
result = que.get()
print result

还可以轻松地将其调整为多线程环境:

import Queue
from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return 'foo'

que = Queue.Queue()
threads_list = list()

t = Thread(target=lambda q, arg1: q.put(foo(arg1)), args=(que, 'world!'))
t.start()
threads_list.append(t)

# Add more threads here
...
threads_list.append(t2)
...
threads_list.append(t3)
...

# Join all the threads
for t in threads_list:
    t.join()

# Check thread's return value
while not que.empty():
    result = que.get()
    print result

Another solution that doesn’t require changing your existing code:

import Queue
from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return 'foo'

que = Queue.Queue()

t = Thread(target=lambda q, arg1: q.put(foo(arg1)), args=(que, 'world!'))
t.start()
t.join()
result = que.get()
print result

It can be also easily adjusted to a multi-threaded environment:

import Queue
from threading import Thread

def foo(bar):
    print 'hello {0}'.format(bar)
    return 'foo'

que = Queue.Queue()
threads_list = list()

t = Thread(target=lambda q, arg1: q.put(foo(arg1)), args=(que, 'world!'))
t.start()
threads_list.append(t)

# Add more threads here
...
threads_list.append(t2)
...
threads_list.append(t3)
...

# Join all the threads
for t in threads_list:
    t.join()

# Check thread's return value
while not que.empty():
    result = que.get()
    print result

回答 5

Parris / kindall的答案 join / return移植到Python 3 的答案

from threading import Thread

def foo(bar):
    print('hello {0}'.format(bar))
    return "foo"

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, *, daemon=None):
        Thread.__init__(self, group, target, name, args, kwargs, daemon=daemon)

        self._return = None

    def run(self):
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self):
        Thread.join(self)
        return self._return


twrv = ThreadWithReturnValue(target=foo, args=('world!',))

twrv.start()
print(twrv.join())   # prints foo

请注意,Thread该类在Python 3中的实现方式有所不同。

Parris / kindall’s answer join/return answer ported to Python 3:

from threading import Thread

def foo(bar):
    print('hello {0}'.format(bar))
    return "foo"

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, *, daemon=None):
        Thread.__init__(self, group, target, name, args, kwargs, daemon=daemon)

        self._return = None

    def run(self):
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self):
        Thread.join(self)
        return self._return


twrv = ThreadWithReturnValue(target=foo, args=('world!',))

twrv.start()
print(twrv.join())   # prints foo

Note, the Thread class is implemented differently in Python 3.


回答 6

我偷了kindall的答案并整理了一下。

关键部分是将* args和** kwargs添加到join()中以处理超时

class threadWithReturn(Thread):
    def __init__(self, *args, **kwargs):
        super(threadWithReturn, self).__init__(*args, **kwargs)

        self._return = None

    def run(self):
        if self._Thread__target is not None:
            self._return = self._Thread__target(*self._Thread__args, **self._Thread__kwargs)

    def join(self, *args, **kwargs):
        super(threadWithReturn, self).join(*args, **kwargs)

        return self._return

下面的更新的答案

这是我最受欢迎的答案,因此我决定使用将同时在py2和py3上运行的代码进行更新。

另外,我看到这个问题的许多答案表明对Thread.join()缺乏理解。有些人完全无法处理timeoutarg。但是,当您拥有(1)可以返回的目标函数None并且(2)您还传递了(timeout arg给join()。请参阅“测试4”以了解这种极端情况。

与py2和py3一起使用的ThreadWithReturn类:

import sys
from threading import Thread
from builtins import super    # https://stackoverflow.com/a/30159479

if sys.version_info >= (3, 0):
    _thread_target_key = '_target'
    _thread_args_key = '_args'
    _thread_kwargs_key = '_kwargs'
else:
    _thread_target_key = '_Thread__target'
    _thread_args_key = '_Thread__args'
    _thread_kwargs_key = '_Thread__kwargs'

class ThreadWithReturn(Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._return = None

    def run(self):
        target = getattr(self, _thread_target_key)
        if not target is None:
            self._return = target(
                *getattr(self, _thread_args_key),
                **getattr(self, _thread_kwargs_key)
            )

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        return self._return

一些示例测试如下所示:

import time, random

# TEST TARGET FUNCTION
def giveMe(arg, seconds=None):
    if not seconds is None:
        time.sleep(seconds)
    return arg

# TEST 1
my_thread = ThreadWithReturn(target=giveMe, args=('stringy',))
my_thread.start()
returned = my_thread.join()
# (returned == 'stringy')

# TEST 2
my_thread = ThreadWithReturn(target=giveMe, args=(None,))
my_thread.start()
returned = my_thread.join()
# (returned is None)

# TEST 3
my_thread = ThreadWithReturn(target=giveMe, args=('stringy',), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=2)
# (returned is None) # because join() timed out before giveMe() finished

# TEST 4
my_thread = ThreadWithReturn(target=giveMe, args=(None,), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=random.randint(1, 10))

您能否确定在测试4中可能遇到的特殊情况?

问题在于,我们希望GiveMe()返回None(请参见测试2),但是我们也希望join()如果超时则返回None。

returned is None 意味着:

(1)这就是GiveMe()返回的结果,或者

(2)join()超时

这个例子很简单,因为我们知道GiveMe()将始终返回None。但是在实际情况下(目标可能合法返回None或其他),我们希望显式检查发生了什么。

以下是解决这种情况的方法:

# TEST 4
my_thread = ThreadWithReturn(target=giveMe, args=(None,), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=random.randint(1, 10))

if my_thread.isAlive():
    # returned is None because join() timed out
    # this also means that giveMe() is still running in the background
    pass
    # handle this based on your app's logic
else:
    # join() is finished, and so is giveMe()
    # BUT we could also be in a race condition, so we need to update returned, just in case
    returned = my_thread.join()

I stole kindall’s answer and cleaned it up just a little bit.

The key part is adding *args and **kwargs to join() in order to handle the timeout

class threadWithReturn(Thread):
    def __init__(self, *args, **kwargs):
        super(threadWithReturn, self).__init__(*args, **kwargs)

        self._return = None

    def run(self):
        if self._Thread__target is not None:
            self._return = self._Thread__target(*self._Thread__args, **self._Thread__kwargs)

    def join(self, *args, **kwargs):
        super(threadWithReturn, self).join(*args, **kwargs)

        return self._return

UPDATED ANSWER BELOW

This is my most popularly upvoted answer, so I decided to update with code that will run on both py2 and py3.

Additionally, I see many answers to this question that show a lack of comprehension regarding Thread.join(). Some completely fail to handle the timeout arg. But there is also a corner-case that you should be aware of regarding instances when you have (1) a target function that can return None and (2) you also pass the timeout arg to join(). Please see “TEST 4” to understand this corner case.

ThreadWithReturn class that works with py2 and py3:

import sys
from threading import Thread
from builtins import super    # https://stackoverflow.com/a/30159479

if sys.version_info >= (3, 0):
    _thread_target_key = '_target'
    _thread_args_key = '_args'
    _thread_kwargs_key = '_kwargs'
else:
    _thread_target_key = '_Thread__target'
    _thread_args_key = '_Thread__args'
    _thread_kwargs_key = '_Thread__kwargs'

class ThreadWithReturn(Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._return = None

    def run(self):
        target = getattr(self, _thread_target_key)
        if not target is None:
            self._return = target(
                *getattr(self, _thread_args_key),
                **getattr(self, _thread_kwargs_key)
            )

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        return self._return

Some sample tests are shown below:

import time, random

# TEST TARGET FUNCTION
def giveMe(arg, seconds=None):
    if not seconds is None:
        time.sleep(seconds)
    return arg

# TEST 1
my_thread = ThreadWithReturn(target=giveMe, args=('stringy',))
my_thread.start()
returned = my_thread.join()
# (returned == 'stringy')

# TEST 2
my_thread = ThreadWithReturn(target=giveMe, args=(None,))
my_thread.start()
returned = my_thread.join()
# (returned is None)

# TEST 3
my_thread = ThreadWithReturn(target=giveMe, args=('stringy',), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=2)
# (returned is None) # because join() timed out before giveMe() finished

# TEST 4
my_thread = ThreadWithReturn(target=giveMe, args=(None,), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=random.randint(1, 10))

Can you identify the corner-case that we may possibly encounter with TEST 4?

The problem is that we expect giveMe() to return None (see TEST 2), but we also expect join() to return None if it times out.

returned is None means either:

(1) that’s what giveMe() returned, or

(2) join() timed out

This example is trivial since we know that giveMe() will always return None. But in real-world instance (where the target may legitimately return None or something else) we’d want to explicitly check for what happened.

Below is how to address this corner-case:

# TEST 4
my_thread = ThreadWithReturn(target=giveMe, args=(None,), kwargs={'seconds': 5})
my_thread.start()
returned = my_thread.join(timeout=random.randint(1, 10))

if my_thread.isAlive():
    # returned is None because join() timed out
    # this also means that giveMe() is still running in the background
    pass
    # handle this based on your app's logic
else:
    # join() is finished, and so is giveMe()
    # BUT we could also be in a race condition, so we need to update returned, just in case
    returned = my_thread.join()

回答 7

使用队列:

import threading, queue

def calc_square(num, out_queue1):
  l = []
  for x in num:
    l.append(x*x)
  out_queue1.put(l)


arr = [1,2,3,4,5,6,7,8,9,10]
out_queue1=queue.Queue()
t1=threading.Thread(target=calc_square, args=(arr,out_queue1))
t1.start()
t1.join()
print (out_queue1.get())

Using Queue :

import threading, queue

def calc_square(num, out_queue1):
  l = []
  for x in num:
    l.append(x*x)
  out_queue1.put(l)


arr = [1,2,3,4,5,6,7,8,9,10]
out_queue1=queue.Queue()
t1=threading.Thread(target=calc_square, args=(arr,out_queue1))
t1.start()
t1.join()
print (out_queue1.get())

回答 8

我对这个问题的解决方案是将函数和线程包装在一个类中。不需要使用池,队列或c类型变量传递。这也是非阻塞的。您改为查看状态。请参阅代码末尾有关如何使用它的示例。

import threading

class ThreadWorker():
    '''
    The basic idea is given a function create an object.
    The object can then run the function in a thread.
    It provides a wrapper to start it,check its status,and get data out the function.
    '''
    def __init__(self,func):
        self.thread = None
        self.data = None
        self.func = self.save_data(func)

    def save_data(self,func):
        '''modify function to save its returned data'''
        def new_func(*args, **kwargs):
            self.data=func(*args, **kwargs)

        return new_func

    def start(self,params):
        self.data = None
        if self.thread is not None:
            if self.thread.isAlive():
                return 'running' #could raise exception here

        #unless thread exists and is alive start or restart it
        self.thread = threading.Thread(target=self.func,args=params)
        self.thread.start()
        return 'started'

    def status(self):
        if self.thread is None:
            return 'not_started'
        else:
            if self.thread.isAlive():
                return 'running'
            else:
                return 'finished'

    def get_results(self):
        if self.thread is None:
            return 'not_started' #could return exception
        else:
            if self.thread.isAlive():
                return 'running'
            else:
                return self.data

def add(x,y):
    return x +y

add_worker = ThreadWorker(add)
print add_worker.start((1,2,))
print add_worker.status()
print add_worker.get_results()

My solution to the problem is to wrap the function and thread in a class. Does not require using pools,queues, or c type variable passing. It is also non blocking. You check status instead. See example of how to use it at end of code.

import threading

class ThreadWorker():
    '''
    The basic idea is given a function create an object.
    The object can then run the function in a thread.
    It provides a wrapper to start it,check its status,and get data out the function.
    '''
    def __init__(self,func):
        self.thread = None
        self.data = None
        self.func = self.save_data(func)

    def save_data(self,func):
        '''modify function to save its returned data'''
        def new_func(*args, **kwargs):
            self.data=func(*args, **kwargs)

        return new_func

    def start(self,params):
        self.data = None
        if self.thread is not None:
            if self.thread.isAlive():
                return 'running' #could raise exception here

        #unless thread exists and is alive start or restart it
        self.thread = threading.Thread(target=self.func,args=params)
        self.thread.start()
        return 'started'

    def status(self):
        if self.thread is None:
            return 'not_started'
        else:
            if self.thread.isAlive():
                return 'running'
            else:
                return 'finished'

    def get_results(self):
        if self.thread is None:
            return 'not_started' #could return exception
        else:
            if self.thread.isAlive():
                return 'running'
            else:
                return self.data

def add(x,y):
    return x +y

add_worker = ThreadWorker(add)
print add_worker.start((1,2,))
print add_worker.status()
print add_worker.get_results()

回答 9

join总是返回None,我想你应该子类Thread来处理返回码等等。

join always return None, i think you should subclass Thread to handle return codes and so.


回答 10

考虑到@iman@JakeBiesinger答案的评论,我将其重新组成为具有多个线程:

from multiprocessing.pool import ThreadPool

def foo(bar, baz):
    print 'hello {0}'.format(bar)
    return 'foo' + baz

numOfThreads = 3 
results = []

pool = ThreadPool(numOfThreads)

for i in range(0, numOfThreads):
    results.append(pool.apply_async(foo, ('world', 'foo'))) # tuple of args for foo)

# do some other stuff in the main process
# ...
# ...

results = [r.get() for r in results]
print results

pool.close()
pool.join()

干杯,

伙计

Taking into consideration @iman comment on @JakeBiesinger answer I have recomposed it to have various number of threads:

from multiprocessing.pool import ThreadPool

def foo(bar, baz):
    print 'hello {0}'.format(bar)
    return 'foo' + baz

numOfThreads = 3 
results = []

pool = ThreadPool(numOfThreads)

for i in range(0, numOfThreads):
    results.append(pool.apply_async(foo, ('world', 'foo'))) # tuple of args for foo)

# do some other stuff in the main process
# ...
# ...

results = [r.get() for r in results]
print results

pool.close()
pool.join()

Cheers,

Guy.


回答 11

您可以在线程函数的作用域之上定义一个可变变量,并将结果添加到该变量中。(我也将代码修改为与python3兼容)

returns = {}
def foo(bar):
    print('hello {0}'.format(bar))
    returns[bar] = 'foo'

from threading import Thread
t = Thread(target=foo, args=('world!',))
t.start()
t.join()
print(returns)

这返回 {'world!': 'foo'}

如果使用函数输入作为结果字典的键,则保证每个唯一的输入都会在结果中给出一个条目

You can define a mutable above the scope of the threaded function, and add the result to that. (I also modified the code to be python3 compatible)

returns = {}
def foo(bar):
    print('hello {0}'.format(bar))
    returns[bar] = 'foo'

from threading import Thread
t = Thread(target=foo, args=('world!',))
t.start()
t.join()
print(returns)

This returns {'world!': 'foo'}

If you use the function input as the key to your results dict, every unique input is guaranteed to give an entry in the results


回答 12

我正在使用此包装器,该包装器可以轻松地打开任何函数以在其中运行Thread-照顾其返回值或异常。它不会增加Queue开销。

def threading_func(f):
    """Decorator for running a function in a thread and handling its return
    value or exception"""
    def start(*args, **kw):
        def run():
            try:
                th.ret = f(*args, **kw)
            except:
                th.exc = sys.exc_info()
        def get(timeout=None):
            th.join(timeout)
            if th.exc:
                raise th.exc[0], th.exc[1], th.exc[2] # py2
                ##raise th.exc[1] #py3                
            return th.ret
        th = threading.Thread(None, run)
        th.exc = None
        th.get = get
        th.start()
        return th
    return start

使用范例

def f(x):
    return 2.5 * x
th = threading_func(f)(4)
print("still running?:", th.is_alive())
print("result:", th.get(timeout=1.0))

@threading_func
def th_mul(a, b):
    return a * b
th = th_mul("text", 2.5)

try:
    print(th.get())
except TypeError:
    print("exception thrown ok.")

threading模块注意事项

线程函数的舒适的返回值和异常处理是“ Pythonic”的常见需求,并且确实应该已经由threading模块提供-可能直接在标准Thread类中提供。ThreadPool对于简单的任务来说有太多的开销-3个管理线程,很多官僚作风。不幸Thread的是,其布局最初是从Java复制的-例如,您仍然可以从仍然无效的1st(!)构造函数参数中看到该布局group

I’m using this wrapper, which comfortably turns any function for running in a Thread – taking care of its return value or exception. It doesn’t add Queue overhead.

def threading_func(f):
    """Decorator for running a function in a thread and handling its return
    value or exception"""
    def start(*args, **kw):
        def run():
            try:
                th.ret = f(*args, **kw)
            except:
                th.exc = sys.exc_info()
        def get(timeout=None):
            th.join(timeout)
            if th.exc:
                raise th.exc[0], th.exc[1], th.exc[2] # py2
                ##raise th.exc[1] #py3                
            return th.ret
        th = threading.Thread(None, run)
        th.exc = None
        th.get = get
        th.start()
        return th
    return start

Usage Examples

def f(x):
    return 2.5 * x
th = threading_func(f)(4)
print("still running?:", th.is_alive())
print("result:", th.get(timeout=1.0))

@threading_func
def th_mul(a, b):
    return a * b
th = th_mul("text", 2.5)

try:
    print(th.get())
except TypeError:
    print("exception thrown ok.")

Notes on threading module

Comfortable return value & exception handling of a threaded function is a frequent “Pythonic” need and should indeed already be offered by the threading module – possibly directly in the standard Thread class. ThreadPool has way too much overhead for simple tasks – 3 managing threads, lots of bureaucracy. Unfortunately Thread‘s layout was copied from Java originally – which you see e.g. from the still useless 1st (!) constructor parameter group.


回答 13

将目标定义为
1)接受参数q
2)将任何语句替换return fooq.put(foo); return

所以一个功能

def func(a):
    ans = a * a
    return ans

会成为

def func(a, q):
    ans = a * a
    q.put(ans)
    return

然后您将照此进行

from Queue import Queue
from threading import Thread

ans_q = Queue()
arg_tups = [(i, ans_q) for i in xrange(10)]

threads = [Thread(target=func, args=arg_tup) for arg_tup in arg_tups]
_ = [t.start() for t in threads]
_ = [t.join() for t in threads]
results = [q.get() for _ in xrange(len(threads))]

而且,您可以使用函数装饰器/包装器来制作它,这样就可以target不修改而使用现有功能,而是遵循此基本方案。

Define your target to
1) take an argument q
2) replace any statements return foo with q.put(foo); return

so a function

def func(a):
    ans = a * a
    return ans

would become

def func(a, q):
    ans = a * a
    q.put(ans)
    return

and then you would proceed as such

from Queue import Queue
from threading import Thread

ans_q = Queue()
arg_tups = [(i, ans_q) for i in xrange(10)]

threads = [Thread(target=func, args=arg_tup) for arg_tup in arg_tups]
_ = [t.start() for t in threads]
_ = [t.join() for t in threads]
results = [q.get() for _ in xrange(len(threads))]

And you can use function decorators/wrappers to make it so you can use your existing functions as target without modifying them, but follow this basic scheme.


回答 14

如上所述,多处理池比基本线程慢得多。使用一些答案中提出的队列是一种非常有效的选择。我将它与字典配合使用,以便能够运行许多小线程并通过将它们与字典结合来调理多个答案:

#!/usr/bin/env python3

import threading
# use Queue for python2
import queue
import random

LETTERS = 'abcdefghijklmnopqrstuvwxyz'
LETTERS = [ x for x in LETTERS ]

NUMBERS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def randoms(k, q):
    result = dict()
    result['letter'] = random.choice(LETTERS)
    result['number'] = random.choice(NUMBERS)
    q.put({k: result})

threads = list()
q = queue.Queue()
results = dict()

for name in ('alpha', 'oscar', 'yankee',):
    threads.append( threading.Thread(target=randoms, args=(name, q)) )
    threads[-1].start()
_ = [ t.join() for t in threads ]
while not q.empty():
    results.update(q.get())

print(results)

As mentioned multiprocessing pool is much slower than basic threading. Using queues as proposeded in some answers here is a very effective alternative. I have use it with dictionaries in order to be able run a lot of small threads and recuperate multiple answers by combining them with dictionaries:

#!/usr/bin/env python3

import threading
# use Queue for python2
import queue
import random

LETTERS = 'abcdefghijklmnopqrstuvwxyz'
LETTERS = [ x for x in LETTERS ]

NUMBERS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def randoms(k, q):
    result = dict()
    result['letter'] = random.choice(LETTERS)
    result['number'] = random.choice(NUMBERS)
    q.put({k: result})

threads = list()
q = queue.Queue()
results = dict()

for name in ('alpha', 'oscar', 'yankee',):
    threads.append( threading.Thread(target=randoms, args=(name, q)) )
    threads[-1].start()
_ = [ t.join() for t in threads ]
while not q.empty():
    results.update(q.get())

print(results)

回答 15

GuySoft的想法很棒,但是我认为对象不一定必须继承自Thread,并且可以从接口中删除start():

from threading import Thread
import queue
class ThreadWithReturnValue(object):
    def __init__(self, target=None, args=(), **kwargs):
        self._que = queue.Queue()
        self._t = Thread(target=lambda q,arg1,kwargs1: q.put(target(*arg1, **kwargs1)) ,
                args=(self._que, args, kwargs), )
        self._t.start()

    def join(self):
        self._t.join()
        return self._que.get()


def foo(bar):
    print('hello {0}'.format(bar))
    return "foo"

twrv = ThreadWithReturnValue(target=foo, args=('world!',))

print(twrv.join())   # prints foo

GuySoft’s idea is great, but I think the object does not necessarily have to inherit from Thread and start() could be removed from interface:

from threading import Thread
import queue
class ThreadWithReturnValue(object):
    def __init__(self, target=None, args=(), **kwargs):
        self._que = queue.Queue()
        self._t = Thread(target=lambda q,arg1,kwargs1: q.put(target(*arg1, **kwargs1)) ,
                args=(self._que, args, kwargs), )
        self._t.start()

    def join(self):
        self._t.join()
        return self._que.get()


def foo(bar):
    print('hello {0}'.format(bar))
    return "foo"

twrv = ThreadWithReturnValue(target=foo, args=('world!',))

print(twrv.join())   # prints foo

回答 16

一种常见的解决方案是foo使用类似这样的装饰器包装函数

result = queue.Queue()

def task_wrapper(*args):
    result.put(target(*args))

然后整个代码可能像这样

result = queue.Queue()

def task_wrapper(*args):
    result.put(target(*args))

threads = [threading.Thread(target=task_wrapper, args=args) for args in args_list]

for t in threads:
    t.start()
    while(True):
        if(len(threading.enumerate()) < max_num):
            break
for t in threads:
    t.join()
return result

注意

一个重要的问题是返回值可能是无序的。(实际上,return value不一定将保存到queue,因为您可以选择任意线程安全的数据结构)

One usual solution is to wrap your function foo with a decorator like

result = queue.Queue()

def task_wrapper(*args):
    result.put(target(*args))

Then the whole code may looks like that

result = queue.Queue()

def task_wrapper(*args):
    result.put(target(*args))

threads = [threading.Thread(target=task_wrapper, args=args) for args in args_list]

for t in threads:
    t.start()
    while(True):
        if(len(threading.enumerate()) < max_num):
            break
for t in threads:
    t.join()
return result

Note

One important issue is that the return values may be unorderred. (In fact, the return value is not necessarily saved to the queue, since you can choose arbitrary thread-safe data structure )


回答 17

为什么不只使用全局变量?

import threading


class myThread(threading.Thread):
    def __init__(self, ind, lock):
        threading.Thread.__init__(self)
        self.ind = ind
        self.lock = lock

    def run(self):
        global results
        with self.lock:
            results.append(self.ind)



results = []
lock = threading.Lock()
threads = [myThread(x, lock) for x in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)

Why don’t just use global variable?

import threading


class myThread(threading.Thread):
    def __init__(self, ind, lock):
        threading.Thread.__init__(self)
        self.ind = ind
        self.lock = lock

    def run(self):
        global results
        with self.lock:
            results.append(self.ind)



results = []
lock = threading.Lock()
threads = [myThread(x, lock) for x in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)

回答 18

Kindall在Python3中的答案

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, *, daemon=None):
        Thread.__init__(self, group, target, name, args, kwargs, daemon)
        self._return = None 

    def run(self):
        try:
            if self._target:
                self._return = self._target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs 

    def join(self,timeout=None):
        Thread.join(self,timeout)
        return self._return

Kindall’s answer in Python3

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, *, daemon=None):
        Thread.__init__(self, group, target, name, args, kwargs, daemon)
        self._return = None 

    def run(self):
        try:
            if self._target:
                self._return = self._target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs 

    def join(self,timeout=None):
        Thread.join(self,timeout)
        return self._return

回答 19

如果仅要从函数调用中验证True或False,我发现一个更简单的解决方案是更新全局列表。

import threading

lists = {"A":"True", "B":"True"}

def myfunc(name: str, mylist):
    for i in mylist:
        if i == 31:
            lists[name] = "False"
            return False
        else:
            print("name {} : {}".format(name, i))

t1 = threading.Thread(target=myfunc, args=("A", [1, 2, 3, 4, 5, 6], ))
t2 = threading.Thread(target=myfunc, args=("B", [11, 21, 31, 41, 51, 61], ))
t1.start()
t2.start()
t1.join()
t2.join()

for value in lists.values():
    if value == False:
        # Something is suspicious 
        # Take necessary action 

如果您想查找任何一个线程是否返回了错误的状态以采取必要的操作,这将对您有所帮助。

If only True or False is to be validated from a function’s call, a simpler solution I find is updating a global list.

import threading

lists = {"A":"True", "B":"True"}

def myfunc(name: str, mylist):
    for i in mylist:
        if i == 31:
            lists[name] = "False"
            return False
        else:
            print("name {} : {}".format(name, i))

t1 = threading.Thread(target=myfunc, args=("A", [1, 2, 3, 4, 5, 6], ))
t2 = threading.Thread(target=myfunc, args=("B", [11, 21, 31, 41, 51, 61], ))
t1.start()
t2.start()
t1.join()
t2.join()

for value in lists.values():
    if value == False:
        # Something is suspicious 
        # Take necessary action 

This is more helpful where you want to find if any one of the threads had returned a false status to take the necessary action.


将模块“子进程”与超时一起使用

问题:将模块“子进程”与超时一起使用

这是运行任意命令以返回其stdout数据或在非零退出代码上引发异常的Python代码:

proc = subprocess.Popen(
    cmd,
    stderr=subprocess.STDOUT,  # Merge stdout and stderr
    stdout=subprocess.PIPE,
    shell=True)

communicate 用于等待进程退出:

stdoutdata, stderrdata = proc.communicate()

subprocess模块不支持超时-可以杀死运行时间超过X秒的进程-因此,communicate可能需要永远运行。

在打算在Windows和Linux上运行的Python程序中实现超时的最简单方法是什么?

Here’s the Python code to run an arbitrary command returning its stdout data, or raise an exception on non-zero exit codes:

proc = subprocess.Popen(
    cmd,
    stderr=subprocess.STDOUT,  # Merge stdout and stderr
    stdout=subprocess.PIPE,
    shell=True)

communicate is used to wait for the process to exit:

stdoutdata, stderrdata = proc.communicate()

The subprocess module does not support timeout–ability to kill a process running for more than X number of seconds–therefore, communicate may take forever to run.

What is the simplest way to implement timeouts in a Python program meant to run on Windows and Linux?


回答 0

在Python 3.3+中:

from subprocess import STDOUT, check_output

output = check_output(cmd, stderr=STDOUT, timeout=seconds)

output 是一个字节字符串,其中包含命令的合并标准输出,标准错误数据。

check_output提出CalledProcessError问题文本中指定的非零退出状态,这与proc.communicate()的方法。

我已删除,shell=True因为它经常不必要地使用。如果cmd确实需要,您可以随时将其添加回去。如果添加,shell=True即子进程是否产生了自己的后代;check_output()可以比超时指示晚得多返回,请参阅子进程超时失败

超时功能可在Python 2.x上通过subprocess323.2+子进程模块的反向端口使用。

In Python 3.3+:

from subprocess import STDOUT, check_output

output = check_output(cmd, stderr=STDOUT, timeout=seconds)

output is a byte string that contains command’s merged stdout, stderr data.

check_output raises CalledProcessError on non-zero exit status as specified in the question’s text unlike proc.communicate() method.

I’ve removed shell=True because it is often used unnecessarily. You can always add it back if cmd indeed requires it. If you add shell=True i.e., if the child process spawns its own descendants; check_output() can return much later than the timeout indicates, see Subprocess timeout failure.

The timeout feature is available on Python 2.x via the subprocess32 backport of the 3.2+ subprocess module.


回答 1

我对底层细节了解不多;但是,鉴于python 2.6中的API提供了等待线程并终止进程的能力,那么如何在单独的线程中运行进程呢?

import subprocess, threading

class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)

我的计算机中此代码段的输出为:

Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15

从中可以看出,在第一次执行中,进程正确完成了(返回代码0),而在第二次执行中,进程终止了(返回代码-15)。

我没有在Windows中进行测试;但是,除了更新示例命令外,我认为它应该可以工作,因为我在文档中没有发现任何不支持thread.join或process.terminate的内容。

I don’t know much about the low level details; but, given that in python 2.6 the API offers the ability to wait for threads and terminate processes, what about running the process in a separate thread?

import subprocess, threading

class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)

The output of this snippet in my machine is:

Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15

where it can be seen that, in the first execution, the process finished correctly (return code 0), while the in the second one the process was terminated (return code -15).

I haven’t tested in windows; but, aside from updating the example command, I think it should work since I haven’t found in the documentation anything that says that thread.join or process.terminate is not supported.


回答 2

可以使用threading.Timer类简化jcollado的答案:

import shlex
from subprocess import Popen, PIPE
from threading import Timer

def run(cmd, timeout_sec):
    proc = Popen(shlex.split(cmd), stdout=PIPE, stderr=PIPE)
    timer = Timer(timeout_sec, proc.kill)
    try:
        timer.start()
        stdout, stderr = proc.communicate()
    finally:
        timer.cancel()

# Examples: both take 1 second
run("sleep 1", 5)  # process ends normally at 1 second
run("sleep 5", 1)  # timeout happens at 1 second

jcollado’s answer can be simplified using the threading.Timer class:

import shlex
from subprocess import Popen, PIPE
from threading import Timer

def run(cmd, timeout_sec):
    proc = Popen(shlex.split(cmd), stdout=PIPE, stderr=PIPE)
    timer = Timer(timeout_sec, proc.kill)
    try:
        timer.start()
        stdout, stderr = proc.communicate()
    finally:
        timer.cancel()

# Examples: both take 1 second
run("sleep 1", 5)  # process ends normally at 1 second
run("sleep 5", 1)  # timeout happens at 1 second

回答 3

如果您使用的是Unix,

import signal
  ...
class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(5*60)  # 5 minutes
try:
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print "Oops, taking too long!"
    # whatever else

If you’re on Unix,

import signal
  ...
class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(5*60)  # 5 minutes
try:
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print "Oops, taking too long!"
    # whatever else

回答 4

这是Alex Martelli作为具有适当过程终止功能的模块的解决方案。其他方法不起作用,因为它们不使用proc.communicate()。因此,如果您有一个产生大量输出的进程,它将填充其输出缓冲区,然后阻塞直到您从中读取内容。

from os import kill
from signal import alarm, signal, SIGALRM, SIGKILL
from subprocess import PIPE, Popen

def run(args, cwd = None, shell = False, kill_tree = True, timeout = -1, env = None):
    '''
    Run a command with a timeout after which it will be forcibly
    killed.
    '''
    class Alarm(Exception):
        pass
    def alarm_handler(signum, frame):
        raise Alarm
    p = Popen(args, shell = shell, cwd = cwd, stdout = PIPE, stderr = PIPE, env = env)
    if timeout != -1:
        signal(SIGALRM, alarm_handler)
        alarm(timeout)
    try:
        stdout, stderr = p.communicate()
        if timeout != -1:
            alarm(0)
    except Alarm:
        pids = [p.pid]
        if kill_tree:
            pids.extend(get_process_children(p.pid))
        for pid in pids:
            # process might have died before getting to this line
            # so wrap to avoid OSError: no such process
            try: 
                kill(pid, SIGKILL)
            except OSError:
                pass
        return -9, '', ''
    return p.returncode, stdout, stderr

def get_process_children(pid):
    p = Popen('ps --no-headers -o pid --ppid %d' % pid, shell = True,
              stdout = PIPE, stderr = PIPE)
    stdout, stderr = p.communicate()
    return [int(p) for p in stdout.split()]

if __name__ == '__main__':
    print run('find /', shell = True, timeout = 3)
    print run('find', shell = True)

Here is Alex Martelli’s solution as a module with proper process killing. The other approaches do not work because they do not use proc.communicate(). So if you have a process that produces lots of output, it will fill its output buffer and then block until you read something from it.

from os import kill
from signal import alarm, signal, SIGALRM, SIGKILL
from subprocess import PIPE, Popen

def run(args, cwd = None, shell = False, kill_tree = True, timeout = -1, env = None):
    '''
    Run a command with a timeout after which it will be forcibly
    killed.
    '''
    class Alarm(Exception):
        pass
    def alarm_handler(signum, frame):
        raise Alarm
    p = Popen(args, shell = shell, cwd = cwd, stdout = PIPE, stderr = PIPE, env = env)
    if timeout != -1:
        signal(SIGALRM, alarm_handler)
        alarm(timeout)
    try:
        stdout, stderr = p.communicate()
        if timeout != -1:
            alarm(0)
    except Alarm:
        pids = [p.pid]
        if kill_tree:
            pids.extend(get_process_children(p.pid))
        for pid in pids:
            # process might have died before getting to this line
            # so wrap to avoid OSError: no such process
            try: 
                kill(pid, SIGKILL)
            except OSError:
                pass
        return -9, '', ''
    return p.returncode, stdout, stderr

def get_process_children(pid):
    p = Popen('ps --no-headers -o pid --ppid %d' % pid, shell = True,
              stdout = PIPE, stderr = PIPE)
    stdout, stderr = p.communicate()
    return [int(p) for p in stdout.split()]

if __name__ == '__main__':
    print run('find /', shell = True, timeout = 3)
    print run('find', shell = True)

回答 5

我修改了sussudio答案。现在函数返回:( ,returncodestdoutstderrtimeoutstdoutstderr被解码为UTF-8字符串

def kill_proc(proc, timeout):
  timeout["value"] = True
  proc.kill()

def run(cmd, timeout_sec):
  proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  timeout = {"value": False}
  timer = Timer(timeout_sec, kill_proc, [proc, timeout])
  timer.start()
  stdout, stderr = proc.communicate()
  timer.cancel()
  return proc.returncode, stdout.decode("utf-8"), stderr.decode("utf-8"), timeout["value"]

I’ve modified sussudio answer. Now function returns: (returncode, stdout, stderr, timeout) – stdout and stderr is decoded to utf-8 string

def kill_proc(proc, timeout):
  timeout["value"] = True
  proc.kill()

def run(cmd, timeout_sec):
  proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  timeout = {"value": False}
  timer = Timer(timeout_sec, kill_proc, [proc, timeout])
  timer.start()
  stdout, stderr = proc.communicate()
  timer.cancel()
  return proc.returncode, stdout.decode("utf-8"), stderr.decode("utf-8"), timeout["value"]

回答 6

惊讶没人提到使用 timeout

timeout 5 ping -c 3 somehost

显然,这不适用于每个用例,但是如果您处理的是简单脚本,那么这是很难克服的。

homebrew对于Mac用户,也可以通过coreutils中的gtimeout 使用。

surprised nobody mentioned using timeout

timeout 5 ping -c 3 somehost

This won’t for work for every use case obviously, but if your dealing with a simple script, this is hard to beat.

Also available as gtimeout in coreutils via homebrew for mac users.


回答 7

timeout现在由subprocess模块支持call()communicate()在其中(在Python3.3中):

import subprocess

subprocess.call("command", timeout=20, shell=True)

这将调用命令并引发异常

subprocess.TimeoutExpired

如果20秒后命令仍未完成。

然后,您可以处理异常以继续执行代码,例如:

try:
    subprocess.call("command", timeout=20, shell=True)
except subprocess.TimeoutExpired:
    # insert code here

希望这可以帮助。

timeout is now supported by call() and communicate() in the subprocess module (as of Python3.3):

import subprocess

subprocess.call("command", timeout=20, shell=True)

This will call the command and raise the exception

subprocess.TimeoutExpired

if the command doesn’t finish after 20 seconds.

You can then handle the exception to continue your code, something like:

try:
    subprocess.call("command", timeout=20, shell=True)
except subprocess.TimeoutExpired:
    # insert code here

Hope this helps.


回答 8

另一种选择是写入临时文件以防止stdout阻塞,而不是需要使用communication()进行轮询。这对我有用,而其他答案却没有。例如在Windows上。

    outFile =  tempfile.SpooledTemporaryFile() 
    errFile =   tempfile.SpooledTemporaryFile() 
    proc = subprocess.Popen(args, stderr=errFile, stdout=outFile, universal_newlines=False)
    wait_remaining_sec = timeout

    while proc.poll() is None and wait_remaining_sec > 0:
        time.sleep(1)
        wait_remaining_sec -= 1

    if wait_remaining_sec <= 0:
        killProc(proc.pid)
        raise ProcessIncompleteError(proc, timeout)

    # read temp streams from start
    outFile.seek(0);
    errFile.seek(0);
    out = outFile.read()
    err = errFile.read()
    outFile.close()
    errFile.close()

Another option is to write to a temporary file to prevent the stdout blocking instead of needing to poll with communicate(). This worked for me where the other answers did not; for example on windows.

    outFile =  tempfile.SpooledTemporaryFile() 
    errFile =   tempfile.SpooledTemporaryFile() 
    proc = subprocess.Popen(args, stderr=errFile, stdout=outFile, universal_newlines=False)
    wait_remaining_sec = timeout

    while proc.poll() is None and wait_remaining_sec > 0:
        time.sleep(1)
        wait_remaining_sec -= 1

    if wait_remaining_sec <= 0:
        killProc(proc.pid)
        raise ProcessIncompleteError(proc, timeout)

    # read temp streams from start
    outFile.seek(0);
    errFile.seek(0);
    out = outFile.read()
    err = errFile.read()
    outFile.close()
    errFile.close()

回答 9

我不知道为什么它不mentionned但是因为Python 3.5,有一个新的subprocess.run通用指令(即意味着取代check_callcheck_output……),并且其具有timeout参数也是如此。

subprocess.run(args,*,stdin = None,input = None,stdout = None,stderr = None,shell = False,cwd = None,timeout = None,check = False,encoding = None,errors = None)

Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

subprocess.TimeoutExpired超时到期时会引发异常。

I don’t know why it isn’t mentionned but since Python 3.5, there’s a new subprocess.run universal command (that is meant to replace check_call, check_output …) and which has the timeout parameter as well.

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None)

Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

It raises a subprocess.TimeoutExpired exception when the timeout is expired.


回答 10

这是我的解决方案,我正在使用线程和事件:

import subprocess
from threading import Thread, Event

def kill_on_timeout(done, timeout, proc):
    if not done.wait(timeout):
        proc.kill()

def exec_command(command, timeout):

    done = Event()
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    watcher = Thread(target=kill_on_timeout, args=(done, timeout, proc))
    watcher.daemon = True
    watcher.start()

    data, stderr = proc.communicate()
    done.set()

    return data, stderr, proc.returncode

实际上:

In [2]: exec_command(['sleep', '10'], 5)
Out[2]: ('', '', -9)

In [3]: exec_command(['sleep', '10'], 11)
Out[3]: ('', '', 0)

Here is my solution, I was using Thread and Event:

import subprocess
from threading import Thread, Event

def kill_on_timeout(done, timeout, proc):
    if not done.wait(timeout):
        proc.kill()

def exec_command(command, timeout):

    done = Event()
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    watcher = Thread(target=kill_on_timeout, args=(done, timeout, proc))
    watcher.daemon = True
    watcher.start()

    data, stderr = proc.communicate()
    done.set()

    return data, stderr, proc.returncode

In action:

In [2]: exec_command(['sleep', '10'], 5)
Out[2]: ('', '', -9)

In [3]: exec_command(['sleep', '10'], 11)
Out[3]: ('', '', 0)

回答 11

我使用的解决方案是给shell命令加上时间限制。如果命令花费的时间太长,则时间限制将停止它,并且Popen将有一个由时间限制设置的返回码。如果大于128,则表示时间限制终止了该进程。

另请参阅具有超时和大输出(> 64K)的python子进程

The solution I use is to prefix the shell command with timelimit. If the comand takes too long, timelimit will stop it and Popen will have a returncode set by timelimit. If it is > 128, it means timelimit killed the process.

See also python subprocess with timeout and large output (>64K)


回答 12

我将带有线程自的解决方案添加jcollado到了我的Python模块easyprocess中

安装:

pip install easyprocess

例:

from easyprocess import Proc

# shell is not supported!
stdout=Proc('ping localhost').call(timeout=1.5).stdout
print stdout

I added the solution with threading from jcollado to my Python module easyprocess.

Install:

pip install easyprocess

Example:

from easyprocess import Proc

# shell is not supported!
stdout=Proc('ping localhost').call(timeout=1.5).stdout
print stdout

回答 13

如果您使用的是python 2,请尝试一下

import subprocess32

try:
    output = subprocess32.check_output(command, shell=True, timeout=3)
except subprocess32.TimeoutExpired as e:
    print e

if you are using python 2, give it a try

import subprocess32

try:
    output = subprocess32.check_output(command, shell=True, timeout=3)
except subprocess32.TimeoutExpired as e:
    print e

回答 14

前置Linux命令timeout不是一个坏的解决方法,它对我有用。

cmd = "timeout 20 "+ cmd
subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(output, err) = p.communicate()

Prepending the Linux command timeout isn’t a bad workaround and it worked for me.

cmd = "timeout 20 "+ cmd
subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(output, err) = p.communicate()

回答 15

我已经实现了我可以从其中一些中学到的东西。这在Windows中有效,并且由于这是社区Wiki,因此我想我也将共享我的代码:

class Command(threading.Thread):
    def __init__(self, cmd, outFile, errFile, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.process = None
        self.outFile = outFile
        self.errFile = errFile
        self.timed_out = False
        self.timeout = timeout

    def run(self):
        self.process = subprocess.Popen(self.cmd, stdout = self.outFile, \
            stderr = self.errFile)

        while (self.process.poll() is None and self.timeout > 0):
            time.sleep(1)
            self.timeout -= 1

        if not self.timeout > 0:
            self.process.terminate()
            self.timed_out = True
        else:
            self.timed_out = False

然后从另一个类或文件:

        outFile =  tempfile.SpooledTemporaryFile()
        errFile =   tempfile.SpooledTemporaryFile()

        executor = command.Command(c, outFile, errFile, timeout)
        executor.daemon = True
        executor.start()

        executor.join()
        if executor.timed_out:
            out = 'timed out'
        else:
            outFile.seek(0)
            errFile.seek(0)
            out = outFile.read()
            err = errFile.read()

        outFile.close()
        errFile.close()

I’ve implemented what I could gather from a few of these. This works in Windows, and since this is a community wiki, I figure I would share my code as well:

class Command(threading.Thread):
    def __init__(self, cmd, outFile, errFile, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.process = None
        self.outFile = outFile
        self.errFile = errFile
        self.timed_out = False
        self.timeout = timeout

    def run(self):
        self.process = subprocess.Popen(self.cmd, stdout = self.outFile, \
            stderr = self.errFile)

        while (self.process.poll() is None and self.timeout > 0):
            time.sleep(1)
            self.timeout -= 1

        if not self.timeout > 0:
            self.process.terminate()
            self.timed_out = True
        else:
            self.timed_out = False

Then from another class or file:

        outFile =  tempfile.SpooledTemporaryFile()
        errFile =   tempfile.SpooledTemporaryFile()

        executor = command.Command(c, outFile, errFile, timeout)
        executor.daemon = True
        executor.start()

        executor.join()
        if executor.timed_out:
            out = 'timed out'
        else:
            outFile.seek(0)
            errFile.seek(0)
            out = outFile.read()
            err = errFile.read()

        outFile.close()
        errFile.close()

回答 16

一旦您了解了* unix中运行全过程的机器,您将轻松找到更简单的解决方案:

考虑这个简单的示例,如何使用select.select()使超时的communication()方法(现在几乎在* nix上几乎所有可用)。这也可以用epoll / poll / kqueue编写,但是select.select()变体可能是一个很好的例子。而且select.select()的主要限制(速度和最大1024 fds)不适用于您的任务。

这可以在* nix下工作,不创建线程,不使用信号,可以从任何线程(不仅是主线程)启动,并且速度足够快,可以从我的计算机上的stdout(i5 2.3ghz)读取250mb / s的数据。

在通信结束时加入stdout / stderr存在问题。如果您的程序输出很大,可能会导致占用大量内存。但是您可以在较小的超时时间内多次调用communication()。

class Popen(subprocess.Popen):
    def communicate(self, input=None, timeout=None):
        if timeout is None:
            return subprocess.Popen.communicate(self, input)

        if self.stdin:
            # Flush stdio buffer, this might block if user
            # has been writing to .stdin in an uncontrolled
            # fashion.
            self.stdin.flush()
            if not input:
                self.stdin.close()

        read_set, write_set = [], []
        stdout = stderr = None

        if self.stdin and input:
            write_set.append(self.stdin)
        if self.stdout:
            read_set.append(self.stdout)
            stdout = []
        if self.stderr:
            read_set.append(self.stderr)
            stderr = []

        input_offset = 0
        deadline = time.time() + timeout

        while read_set or write_set:
            try:
                rlist, wlist, xlist = select.select(read_set, write_set, [], max(0, deadline - time.time()))
            except select.error as ex:
                if ex.args[0] == errno.EINTR:
                    continue
                raise

            if not (rlist or wlist):
                # Just break if timeout
                # Since we do not close stdout/stderr/stdin, we can call
                # communicate() several times reading data by smaller pieces.
                break

            if self.stdin in wlist:
                chunk = input[input_offset:input_offset + subprocess._PIPE_BUF]
                try:
                    bytes_written = os.write(self.stdin.fileno(), chunk)
                except OSError as ex:
                    if ex.errno == errno.EPIPE:
                        self.stdin.close()
                        write_set.remove(self.stdin)
                    else:
                        raise
                else:
                    input_offset += bytes_written
                    if input_offset >= len(input):
                        self.stdin.close()
                        write_set.remove(self.stdin)

            # Read stdout / stderr by 1024 bytes
            for fn, tgt in (
                (self.stdout, stdout),
                (self.stderr, stderr),
            ):
                if fn in rlist:
                    data = os.read(fn.fileno(), 1024)
                    if data == '':
                        fn.close()
                        read_set.remove(fn)
                    tgt.append(data)

        if stdout is not None:
            stdout = ''.join(stdout)
        if stderr is not None:
            stderr = ''.join(stderr)

        return (stdout, stderr)

Once you understand full process running machinery in *unix, you will easily find simplier solution:

Consider this simple example how to make timeoutable communicate() meth using select.select() (available alsmost everythere on *nix nowadays). This also can be written with epoll/poll/kqueue, but select.select() variant could be a good example for you. And major limitations of select.select() (speed and 1024 max fds) are not applicapable for your task.

This works under *nix, does not create threads, does not uses signals, can be lauched from any thread (not only main), and fast enought to read 250mb/s of data from stdout on my machine (i5 2.3ghz).

There is a problem in join’ing stdout/stderr at the end of communicate. If you have huge program output this could lead to big memory usage. But you can call communicate() several times with smaller timeouts.

class Popen(subprocess.Popen):
    def communicate(self, input=None, timeout=None):
        if timeout is None:
            return subprocess.Popen.communicate(self, input)

        if self.stdin:
            # Flush stdio buffer, this might block if user
            # has been writing to .stdin in an uncontrolled
            # fashion.
            self.stdin.flush()
            if not input:
                self.stdin.close()

        read_set, write_set = [], []
        stdout = stderr = None

        if self.stdin and input:
            write_set.append(self.stdin)
        if self.stdout:
            read_set.append(self.stdout)
            stdout = []
        if self.stderr:
            read_set.append(self.stderr)
            stderr = []

        input_offset = 0
        deadline = time.time() + timeout

        while read_set or write_set:
            try:
                rlist, wlist, xlist = select.select(read_set, write_set, [], max(0, deadline - time.time()))
            except select.error as ex:
                if ex.args[0] == errno.EINTR:
                    continue
                raise

            if not (rlist or wlist):
                # Just break if timeout
                # Since we do not close stdout/stderr/stdin, we can call
                # communicate() several times reading data by smaller pieces.
                break

            if self.stdin in wlist:
                chunk = input[input_offset:input_offset + subprocess._PIPE_BUF]
                try:
                    bytes_written = os.write(self.stdin.fileno(), chunk)
                except OSError as ex:
                    if ex.errno == errno.EPIPE:
                        self.stdin.close()
                        write_set.remove(self.stdin)
                    else:
                        raise
                else:
                    input_offset += bytes_written
                    if input_offset >= len(input):
                        self.stdin.close()
                        write_set.remove(self.stdin)

            # Read stdout / stderr by 1024 bytes
            for fn, tgt in (
                (self.stdout, stdout),
                (self.stderr, stderr),
            ):
                if fn in rlist:
                    data = os.read(fn.fileno(), 1024)
                    if data == '':
                        fn.close()
                        read_set.remove(fn)
                    tgt.append(data)

        if stdout is not None:
            stdout = ''.join(stdout)
        if stderr is not None:
            stderr = ''.join(stderr)

        return (stdout, stderr)

回答 17

您可以使用 select

import subprocess
from datetime import datetime
from select import select

def call_with_timeout(cmd, timeout):
    started = datetime.now()
    sp = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while True:
        p = select([sp.stdout], [], [], timeout)
        if p[0]:
            p[0][0].read()
        ret = sp.poll()
        if ret is not None:
            return ret
        if (datetime.now()-started).total_seconds() > timeout:
            sp.kill()
            return None

You can do this using select

import subprocess
from datetime import datetime
from select import select

def call_with_timeout(cmd, timeout):
    started = datetime.now()
    sp = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while True:
        p = select([sp.stdout], [], [], timeout)
        if p[0]:
            p[0][0].read()
        ret = sp.poll()
        if ret is not None:
            return ret
        if (datetime.now()-started).total_seconds() > timeout:
            sp.kill()
            return None

回答 18

我已经在Windows,Linux和Mac上成功使用killableprocess。如果您使用的是Cygwin Python,则需要OSAF的killableprocess版本,因为否则本机Windows进程将不会被杀死。

I’ve used killableprocess successfully on Windows, Linux and Mac. If you are using Cygwin Python, you’ll need OSAF’s version of killableprocess because otherwise native Windows processes won’t get killed.


回答 19

尽管我没有广泛研究它,但我在ActiveState上发现的这种装饰器似乎对这种事情很有用。与一起subprocess.Popen(..., close_fds=True),至少我已经准备好使用Python编写shell脚本了。

Although I haven’t looked at it extensively, this decorator I found at ActiveState seems to be quite useful for this sort of thing. Along with subprocess.Popen(..., close_fds=True), at least I’m ready for shell-scripting in Python.


回答 20

如果shell = True,此解决方案将杀死进程树,将参数传递给进程(或不传递参数),具有超时并获取回调的stdout,stderr和进程输出(它将psutil用于kill_proc_tree)。这是基于SO中发布的几种解决方案,包括jcollado的解决方案。在jcollado的回答中张贴对Anson和jradice的评论的回应。已在Windows Srvr 2012和Ubuntu 14.04中测试。请注意,对于Ubuntu,您需要将parent.children(…)调用更改为parent.get_children(…)。

def kill_proc_tree(pid, including_parent=True):
  parent = psutil.Process(pid)
  children = parent.children(recursive=True)
  for child in children:
    child.kill()
  psutil.wait_procs(children, timeout=5)
  if including_parent:
    parent.kill()
    parent.wait(5)

def run_with_timeout(cmd, current_dir, cmd_parms, timeout):
  def target():
    process = subprocess.Popen(cmd, cwd=current_dir, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)

    # wait for the process to terminate
    if (cmd_parms == ""):
      out, err = process.communicate()
    else:
      out, err = process.communicate(cmd_parms)
    errcode = process.returncode

  thread = Thread(target=target)
  thread.start()

  thread.join(timeout)
  if thread.is_alive():
    me = os.getpid()
    kill_proc_tree(me, including_parent=False)
    thread.join()

This solution kills the process tree in case of shell=True, passes parameters to the process (or not), has a timeout and gets the stdout, stderr and process output of the call back (it uses psutil for the kill_proc_tree). This was based on several solutions posted in SO including jcollado’s. Posting in response to comments by Anson and jradice in jcollado’s answer. Tested in Windows Srvr 2012 and Ubuntu 14.04. Please note that for Ubuntu you need to change the parent.children(…) call to parent.get_children(…).

def kill_proc_tree(pid, including_parent=True):
  parent = psutil.Process(pid)
  children = parent.children(recursive=True)
  for child in children:
    child.kill()
  psutil.wait_procs(children, timeout=5)
  if including_parent:
    parent.kill()
    parent.wait(5)

def run_with_timeout(cmd, current_dir, cmd_parms, timeout):
  def target():
    process = subprocess.Popen(cmd, cwd=current_dir, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)

    # wait for the process to terminate
    if (cmd_parms == ""):
      out, err = process.communicate()
    else:
      out, err = process.communicate(cmd_parms)
    errcode = process.returncode

  thread = Thread(target=target)
  thread.start()

  thread.join(timeout)
  if thread.is_alive():
    me = os.getpid()
    kill_proc_tree(me, including_parent=False)
    thread.join()

回答 21

有一个想法可以继承Popen类并使用一些简单的方法装饰器对其进行扩展。我们称之为ExpirablePopen。

from logging import error
from subprocess import Popen
from threading import Event
from threading import Thread


class ExpirablePopen(Popen):

    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop('timeout', 0)
        self.timer = None
        self.done = Event()

        Popen.__init__(self, *args, **kwargs)

    def __tkill(self):
        timeout = self.timeout
        if not self.done.wait(timeout):
            error('Terminating process {} by timeout of {} secs.'.format(self.pid, timeout))
            self.kill()

    def expirable(func):
        def wrapper(self, *args, **kwargs):
            # zero timeout means call of parent method
            if self.timeout == 0:
                return func(self, *args, **kwargs)

            # if timer is None, need to start it
            if self.timer is None:
                self.timer = thr = Thread(target=self.__tkill)
                thr.daemon = True
                thr.start()

            result = func(self, *args, **kwargs)
            self.done.set()

            return result
        return wrapper

    wait = expirable(Popen.wait)
    communicate = expirable(Popen.communicate)


if __name__ == '__main__':
    from subprocess import PIPE

    print ExpirablePopen('ssh -T git@bitbucket.org', stdout=PIPE, timeout=1).communicate()

There’s an idea to subclass the Popen class and extend it with some simple method decorators. Let’s call it ExpirablePopen.

from logging import error
from subprocess import Popen
from threading import Event
from threading import Thread


class ExpirablePopen(Popen):

    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop('timeout', 0)
        self.timer = None
        self.done = Event()

        Popen.__init__(self, *args, **kwargs)

    def __tkill(self):
        timeout = self.timeout
        if not self.done.wait(timeout):
            error('Terminating process {} by timeout of {} secs.'.format(self.pid, timeout))
            self.kill()

    def expirable(func):
        def wrapper(self, *args, **kwargs):
            # zero timeout means call of parent method
            if self.timeout == 0:
                return func(self, *args, **kwargs)

            # if timer is None, need to start it
            if self.timer is None:
                self.timer = thr = Thread(target=self.__tkill)
                thr.daemon = True
                thr.start()

            result = func(self, *args, **kwargs)
            self.done.set()

            return result
        return wrapper

    wait = expirable(Popen.wait)
    communicate = expirable(Popen.communicate)


if __name__ == '__main__':
    from subprocess import PIPE

    print ExpirablePopen('ssh -T git@bitbucket.org', stdout=PIPE, timeout=1).communicate()

回答 22

我遇到的问题是,如果花费的时间比给定的超时时间长,我想终止多线程子进程。我想在中设置一个超时Popen(),但是没有用。然后,我意识到这Popen().wait()等于call(),因此我有了在该.wait(timeout=xxx)方法中设置超时的想法,该方法终于奏效了。因此,我通过以下方式解决了问题:

import os
import sys
import signal
import subprocess
from multiprocessing import Pool

cores_for_parallelization = 4
timeout_time = 15  # seconds

def main():
    jobs = [...YOUR_JOB_LIST...]
    with Pool(cores_for_parallelization) as p:
        p.map(run_parallel_jobs, jobs)

def run_parallel_jobs(args):
    # Define the arguments including the paths
    initial_terminal_command = 'C:\\Python34\\python.exe'  # Python executable
    function_to_start = 'C:\\temp\\xyz.py'  # The multithreading script
    final_list = [initial_terminal_command, function_to_start]
    final_list.extend(args)

    # Start the subprocess and determine the process PID
    subp = subprocess.Popen(final_list)  # starts the process
    pid = subp.pid

    # Wait until the return code returns from the function by considering the timeout. 
    # If not, terminate the process.
    try:
        returncode = subp.wait(timeout=timeout_time)  # should be zero if accomplished
    except subprocess.TimeoutExpired:
        # Distinguish between Linux and Windows and terminate the process if 
        # the timeout has been expired
        if sys.platform == 'linux2':
            os.kill(pid, signal.SIGTERM)
        elif sys.platform == 'win32':
            subp.terminate()

if __name__ == '__main__':
    main()

I had the problem that I wanted to terminate a multithreading subprocess if it took longer than a given timeout length. I wanted to set a timeout in Popen(), but it did not work. Then, I realized that Popen().wait() is equal to call() and so I had the idea to set a timeout within the .wait(timeout=xxx) method, which finally worked. Thus, I solved it this way:

import os
import sys
import signal
import subprocess
from multiprocessing import Pool

cores_for_parallelization = 4
timeout_time = 15  # seconds

def main():
    jobs = [...YOUR_JOB_LIST...]
    with Pool(cores_for_parallelization) as p:
        p.map(run_parallel_jobs, jobs)

def run_parallel_jobs(args):
    # Define the arguments including the paths
    initial_terminal_command = 'C:\\Python34\\python.exe'  # Python executable
    function_to_start = 'C:\\temp\\xyz.py'  # The multithreading script
    final_list = [initial_terminal_command, function_to_start]
    final_list.extend(args)

    # Start the subprocess and determine the process PID
    subp = subprocess.Popen(final_list)  # starts the process
    pid = subp.pid

    # Wait until the return code returns from the function by considering the timeout. 
    # If not, terminate the process.
    try:
        returncode = subp.wait(timeout=timeout_time)  # should be zero if accomplished
    except subprocess.TimeoutExpired:
        # Distinguish between Linux and Windows and terminate the process if 
        # the timeout has been expired
        if sys.platform == 'linux2':
            os.kill(pid, signal.SIGTERM)
        elif sys.platform == 'win32':
            subp.terminate()

if __name__ == '__main__':
    main()

回答 23

不幸的是,我受雇主披露源代码的非常严格的政策约束,因此我无法提供实际的代码。但按我的喜好,最好的解决方案是创建一个重写的子类Popen.wait()以轮询而不是无限期地等待,并Popen.__init__接受超时参数。完成后,所有其他Popen方法(调用wait)将按预期工作,包括communicate

Unfortunately, I’m bound by very strict policies on the disclosure of source code by my employer, so I can’t provide actual code. But for my taste the best solution is to create a subclass overriding Popen.wait() to poll instead of wait indefinitely, and Popen.__init__ to accept a timeout parameter. Once you do that, all the other Popen methods (which call wait) will work as expected, including communicate.


回答 24

https://pypi.python.org/pypi/python-subprocess2提供了子流程模块的扩展,使您可以等待一段时间,否则终止。

因此,要等待10秒钟才能终止进程,否则请终止:

pipe  = subprocess.Popen('...')

timeout =  10

results = pipe.waitOrTerminate(timeout)

这与Windows和UNIX兼容。“结果”是一个字典,它包含“ returnCode”和“ actionTaken”,returnCode是应用程序的返回值(如果必须终止,则为None)。如果该过程正常完成,则显示为“ SUBPROCESS2_PROCESS_COMPLETED”,或者根据执行的操作显示“ SUBPROCESS2_PROCESS_TERMINATED”和SUBPROCESS2_PROCESS_KILLED的掩码(有关详细信息,请参阅文档)

https://pypi.python.org/pypi/python-subprocess2 provides extensions to the subprocess module which allow you to wait up to a certain period of time, otherwise terminate.

So, to wait up to 10 seconds for the process to terminate, otherwise kill:

pipe  = subprocess.Popen('...')

timeout =  10

results = pipe.waitOrTerminate(timeout)

This is compatible with both windows and unix. “results” is a dictionary, it contains “returnCode” which is the return of the app (or None if it had to be killed), as well as “actionTaken”. which will be “SUBPROCESS2_PROCESS_COMPLETED” if the process completed normally, or a mask of “SUBPROCESS2_PROCESS_TERMINATED” and SUBPROCESS2_PROCESS_KILLED depending on action taken (see documentation for full details)


回答 25

对于python 2.6+,请使用gevent

 from gevent.subprocess import Popen, PIPE, STDOUT

 def call_sys(cmd, timeout):
      p= Popen(cmd, shell=True, stdout=PIPE)
      output, _ = p.communicate(timeout=timeout)
      assert p.returncode == 0, p. returncode
      return output

 call_sys('./t.sh', 2)

 # t.sh example
 sleep 5
 echo done
 exit 1

for python 2.6+, use gevent

 from gevent.subprocess import Popen, PIPE, STDOUT

 def call_sys(cmd, timeout):
      p= Popen(cmd, shell=True, stdout=PIPE)
      output, _ = p.communicate(timeout=timeout)
      assert p.returncode == 0, p. returncode
      return output

 call_sys('./t.sh', 2)

 # t.sh example
 sleep 5
 echo done
 exit 1

回答 26

python 2.7

import time
import subprocess

def run_command(cmd, timeout=0):
    start_time = time.time()
    df = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while timeout and df.poll() == None:
        if time.time()-start_time >= timeout:
            df.kill()
            return -1, ""
    output = '\n'.join(df.communicate()).strip()
    return df.returncode, output

python 2.7

import time
import subprocess

def run_command(cmd, timeout=0):
    start_time = time.time()
    df = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while timeout and df.poll() == None:
        if time.time()-start_time >= timeout:
            df.kill()
            return -1, ""
    output = '\n'.join(df.communicate()).strip()
    return df.returncode, output

回答 27

import subprocess, optparse, os, sys, re, datetime, threading, time, glob, shutil, xml.dom.minidom, traceback

class OutputManager:
    def __init__(self, filename, mode, console, logonly):
        self.con = console
        self.logtoconsole = True
        self.logtofile = False

        if filename:
            try:
                self.f = open(filename, mode)
                self.logtofile = True
                if logonly == True:
                    self.logtoconsole = False
            except IOError:
                print (sys.exc_value)
                print ("Switching to console only output...\n")
                self.logtofile = False
                self.logtoconsole = True

    def write(self, data):
        if self.logtoconsole == True:
            self.con.write(data)
        if self.logtofile == True:
            self.f.write(data)
        sys.stdout.flush()

def getTimeString():
        return time.strftime("%Y-%m-%d", time.gmtime())

def runCommand(command):
    '''
    Execute a command in new thread and return the
    stdout and stderr content of it.
    '''
    try:
        Output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
    except Exception as e:
        print ("runCommand failed :%s" % (command))
        print (str(e))
        sys.stdout.flush()
        return None
    return Output

def GetOs():
    Os = ""
    if sys.platform.startswith('win32'):
        Os = "win"
    elif sys.platform.startswith('linux'):
        Os = "linux"
    elif sys.platform.startswith('darwin'):
        Os = "mac"
    return Os


def check_output(*popenargs, **kwargs):
    try:
        if 'stdout' in kwargs: 
            raise ValueError('stdout argument not allowed, it will be overridden.') 

        # Get start time.
        startTime = datetime.datetime.now()
        timeoutValue=3600

        cmd = popenargs[0]

        if sys.platform.startswith('win32'):
            process = subprocess.Popen( cmd, stdout=subprocess.PIPE, shell=True) 
        elif sys.platform.startswith('linux'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 
        elif sys.platform.startswith('darwin'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 

        stdoutdata, stderrdata = process.communicate( timeout = timeoutValue )
        retcode = process.poll()

        ####################################
        # Catch crash error and log it.
        ####################################
        OutputHandle = None
        try:
            if retcode >= 1:
                OutputHandle = OutputManager( 'CrashJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
                print (stdoutdata)
                print (stderrdata)
                sys.stdout.flush()
        except Exception as e:
            print (str(e))

    except subprocess.TimeoutExpired:
            ####################################
            # Catch time out error and log it.
            ####################################
            Os = GetOs()
            if Os == 'win':
                killCmd = "taskkill /FI \"IMAGENAME eq {0}\" /T /F"
            elif Os == 'linux':
                killCmd = "pkill {0)"
            elif Os == 'mac':
                # Linux, Mac OS
                killCmd = "killall -KILL {0}"

            runCommand(killCmd.format("java"))
            runCommand(killCmd.format("YouApp"))

            OutputHandle = None
            try:
                OutputHandle = OutputManager( 'KillJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
            except Exception as e:
                print (str(e))
    except Exception as e:
            for frame in traceback.extract_tb(sys.exc_info()[2]):
                        fname,lineno,fn,text = frame
                        print "Error in %s on line %d" % (fname, lineno)
import subprocess, optparse, os, sys, re, datetime, threading, time, glob, shutil, xml.dom.minidom, traceback

class OutputManager:
    def __init__(self, filename, mode, console, logonly):
        self.con = console
        self.logtoconsole = True
        self.logtofile = False

        if filename:
            try:
                self.f = open(filename, mode)
                self.logtofile = True
                if logonly == True:
                    self.logtoconsole = False
            except IOError:
                print (sys.exc_value)
                print ("Switching to console only output...\n")
                self.logtofile = False
                self.logtoconsole = True

    def write(self, data):
        if self.logtoconsole == True:
            self.con.write(data)
        if self.logtofile == True:
            self.f.write(data)
        sys.stdout.flush()

def getTimeString():
        return time.strftime("%Y-%m-%d", time.gmtime())

def runCommand(command):
    '''
    Execute a command in new thread and return the
    stdout and stderr content of it.
    '''
    try:
        Output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
    except Exception as e:
        print ("runCommand failed :%s" % (command))
        print (str(e))
        sys.stdout.flush()
        return None
    return Output

def GetOs():
    Os = ""
    if sys.platform.startswith('win32'):
        Os = "win"
    elif sys.platform.startswith('linux'):
        Os = "linux"
    elif sys.platform.startswith('darwin'):
        Os = "mac"
    return Os


def check_output(*popenargs, **kwargs):
    try:
        if 'stdout' in kwargs: 
            raise ValueError('stdout argument not allowed, it will be overridden.') 

        # Get start time.
        startTime = datetime.datetime.now()
        timeoutValue=3600

        cmd = popenargs[0]

        if sys.platform.startswith('win32'):
            process = subprocess.Popen( cmd, stdout=subprocess.PIPE, shell=True) 
        elif sys.platform.startswith('linux'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 
        elif sys.platform.startswith('darwin'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 

        stdoutdata, stderrdata = process.communicate( timeout = timeoutValue )
        retcode = process.poll()

        ####################################
        # Catch crash error and log it.
        ####################################
        OutputHandle = None
        try:
            if retcode >= 1:
                OutputHandle = OutputManager( 'CrashJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
                print (stdoutdata)
                print (stderrdata)
                sys.stdout.flush()
        except Exception as e:
            print (str(e))

    except subprocess.TimeoutExpired:
            ####################################
            # Catch time out error and log it.
            ####################################
            Os = GetOs()
            if Os == 'win':
                killCmd = "taskkill /FI \"IMAGENAME eq {0}\" /T /F"
            elif Os == 'linux':
                killCmd = "pkill {0)"
            elif Os == 'mac':
                # Linux, Mac OS
                killCmd = "killall -KILL {0}"

            runCommand(killCmd.format("java"))
            runCommand(killCmd.format("YouApp"))

            OutputHandle = None
            try:
                OutputHandle = OutputManager( 'KillJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
            except Exception as e:
                print (str(e))
    except Exception as e:
            for frame in traceback.extract_tb(sys.exc_info()[2]):
                        fname,lineno,fn,text = frame
                        print "Error in %s on line %d" % (fname, lineno)

回答 28

只是想写一些简单的东西。

#!/usr/bin/python

from subprocess import Popen, PIPE
import datetime
import time 

popen = Popen(["/bin/sleep", "10"]);
pid = popen.pid
sttime = time.time();
waittime =  3

print "Start time %s"%(sttime)

while True:
    popen.poll();
    time.sleep(1)
    rcode = popen.returncode
    now = time.time();
    if [ rcode is None ]  and  [ now > (sttime + waittime) ] :
        print "Killing it now"
        popen.kill()

Was just trying to write something simpler.

#!/usr/bin/python

from subprocess import Popen, PIPE
import datetime
import time 

popen = Popen(["/bin/sleep", "10"]);
pid = popen.pid
sttime = time.time();
waittime =  3

print "Start time %s"%(sttime)

while True:
    popen.poll();
    time.sleep(1)
    rcode = popen.returncode
    now = time.time();
    if [ rcode is None ]  and  [ now > (sttime + waittime) ] :
        print "Killing it now"
        popen.kill()

有什么办法可以杀死线程吗?

问题:有什么办法可以杀死线程吗?

是否可以在不设置/检查任何标志/信号灯/等的情况下终止正在运行的线程?

Is it possible to terminate a running thread without setting/checking any flags/semaphores/etc.?


回答 0

用Python和任何语言突然终止线程通常是一个错误的模式。考虑以下情况:

  • 线程持有必须正确关闭的关键资源
  • 该线程创建了其他几个必须同时终止的线程。

如果您负担得起的话(如果您要管理自己的线程),处理此问题的一种好方法是有一个exit_request标志,每个线程定期检查一次,以查看是否该退出。

例如:

import threading

class StoppableThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self,  *args, **kwargs):
        super(StoppableThread, self).__init__(*args, **kwargs)
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

在此代码中,您应该stop()在希望退出线程时调用该线程,然后使用来等待线程正确退出join()。线程应定期检查停止标志。

但是,在某些情况下,您确实需要杀死线程。一个示例是当您包装一个忙于长时间调用的外部库并且想要中断它时。

以下代码允许(有一些限制)在Python线程中引发Exception:

def _async_raise(tid, exctype):
    '''Raises an exception in the threads with id tid'''
    if not inspect.isclass(exctype):
        raise TypeError("Only types can be raised (not instances)")
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid),
                                                     ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # "if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class ThreadWithExc(threading.Thread):
    '''A thread class that supports raising exception in the thread from
       another thread.
    '''
    def _get_my_tid(self):
        """determines this (self's) thread id

        CAREFUL : this function is executed in the context of the caller
        thread, to get the identity of the thread represented by this
        instance.
        """
        if not self.isAlive():
            raise threading.ThreadError("the thread is not active")

        # do we have it cached?
        if hasattr(self, "_thread_id"):
            return self._thread_id

        # no, look for it in the _active dict
        for tid, tobj in threading._active.items():
            if tobj is self:
                self._thread_id = tid
                return tid

        # TODO: in python 2.6, there's a simpler way to do : self.ident

        raise AssertionError("could not determine the thread's id")

    def raiseExc(self, exctype):
        """Raises the given exception type in the context of this thread.

        If the thread is busy in a system call (time.sleep(),
        socket.accept(), ...), the exception is simply ignored.

        If you are sure that your exception should terminate the thread,
        one way to ensure that it works is:

            t = ThreadWithExc( ... )
            ...
            t.raiseExc( SomeException )
            while t.isAlive():
                time.sleep( 0.1 )
                t.raiseExc( SomeException )

        If the exception is to be caught by the thread, you need a way to
        check that your thread has caught it.

        CAREFUL : this function is executed in the context of the
        caller thread, to raise an excpetion in the context of the
        thread represented by this instance.
        """
        _async_raise( self._get_my_tid(), exctype )

(基于Tomer Filiba的Killable Threads。有关return值的引用PyThreadState_SetAsyncExc似乎来自旧版本的Python。)

如文档中所述,这不是灵丹妙药,因为如果线程在Python解释器之外很忙,它将无法捕获中断。

此代码的一个很好的用法模式是让线程捕获特定的异常并执行清理。这样,您可以中断任务并仍然进行适当的清理。

It is generally a bad pattern to kill a thread abruptly, in Python and in any language. Think of the following cases:

  • the thread is holding a critical resource that must be closed properly
  • the thread has created several other threads that must be killed as well.

The nice way of handling this if you can afford it (if you are managing your own threads) is to have an exit_request flag that each threads checks on regular interval to see if it is time for it to exit.

For example:

import threading

class StoppableThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self,  *args, **kwargs):
        super(StoppableThread, self).__init__(*args, **kwargs)
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

In this code, you should call stop() on the thread when you want it to exit, and wait for the thread to exit properly using join(). The thread should check the stop flag at regular intervals.

There are cases however when you really need to kill a thread. An example is when you are wrapping an external library that is busy for long calls and you want to interrupt it.

The following code allows (with some restrictions) to raise an Exception in a Python thread:

def _async_raise(tid, exctype):
    '''Raises an exception in the threads with id tid'''
    if not inspect.isclass(exctype):
        raise TypeError("Only types can be raised (not instances)")
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid),
                                                     ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # "if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class ThreadWithExc(threading.Thread):
    '''A thread class that supports raising exception in the thread from
       another thread.
    '''
    def _get_my_tid(self):
        """determines this (self's) thread id

        CAREFUL : this function is executed in the context of the caller
        thread, to get the identity of the thread represented by this
        instance.
        """
        if not self.isAlive():
            raise threading.ThreadError("the thread is not active")

        # do we have it cached?
        if hasattr(self, "_thread_id"):
            return self._thread_id

        # no, look for it in the _active dict
        for tid, tobj in threading._active.items():
            if tobj is self:
                self._thread_id = tid
                return tid

        # TODO: in python 2.6, there's a simpler way to do : self.ident

        raise AssertionError("could not determine the thread's id")

    def raiseExc(self, exctype):
        """Raises the given exception type in the context of this thread.

        If the thread is busy in a system call (time.sleep(),
        socket.accept(), ...), the exception is simply ignored.

        If you are sure that your exception should terminate the thread,
        one way to ensure that it works is:

            t = ThreadWithExc( ... )
            ...
            t.raiseExc( SomeException )
            while t.isAlive():
                time.sleep( 0.1 )
                t.raiseExc( SomeException )

        If the exception is to be caught by the thread, you need a way to
        check that your thread has caught it.

        CAREFUL : this function is executed in the context of the
        caller thread, to raise an excpetion in the context of the
        thread represented by this instance.
        """
        _async_raise( self._get_my_tid(), exctype )

(Based on Killable Threads by Tomer Filiba. The quote about the return value of PyThreadState_SetAsyncExc appears to be from an old version of Python.)

As noted in the documentation, this is not a magic bullet because if the thread is busy outside the Python interpreter, it will not catch the interruption.

A good usage pattern of this code is to have the thread catch a specific exception and perform the cleanup. That way, you can interrupt a task and still have proper cleanup.


回答 1

没有官方的API可以这样做。

您需要使用平台API杀死线程,例如pthread_kill或TerminateThread。您可以通过pythonwin或ctypes访问此类API。

请注意,这本质上是不安全的。如果被杀死的线程在被杀死时具有GIL,则可能会导致无法收集的垃圾(来自成为垃圾的堆栈帧的局部变量),并可能导致死锁。

There is no official API to do that, no.

You need to use platform API to kill the thread, e.g. pthread_kill, or TerminateThread. You can access such API e.g. through pythonwin, or through ctypes.

Notice that this is inherently unsafe. It will likely lead to uncollectable garbage (from local variables of the stack frames that become garbage), and may lead to deadlocks, if the thread being killed has the GIL at the point when it is killed.


回答 2

一个multiprocessing.Process罐头p.terminate()

在我想杀死一个线程,但又不想使用标志/锁/信号/信号量/事件/任何东西的情况下,我会将线程提升为完整进程。对于仅使用几个线程的代码,开销并不是那么糟糕。

例如,这样做很方便,可以轻松终止执行阻塞I / O的助手“线程”

转换是微不足道的:在相关代码中,将全部替换threading.Threadmultiprocessing.Process和全部替换queue.Queuemultiprocessing.Queue,并将所需的调用添加p.terminate()到要杀死其子级的父进程中p

请参阅Python文档multiprocessing

A multiprocessing.Process can p.terminate()

In the cases where I want to kill a thread, but do not want to use flags/locks/signals/semaphores/events/whatever, I promote the threads to full blown processes. For code that makes use of just a few threads the overhead is not that bad.

E.g. this comes in handy to easily terminate helper “threads” which execute blocking I/O

The conversion is trivial: In related code replace all threading.Thread with multiprocessing.Process and all queue.Queue with multiprocessing.Queue and add the required calls of p.terminate() to your parent process which wants to kill its child p

See the Python documentation for multiprocessing.


回答 3

如果试图终止整个程序,则可以将线程设置为“守护程序”。参见 Thread.daemon

If you are trying to terminate the whole program you can set the thread as a “daemon”. see Thread.daemon


回答 4

正如其他人提到的那样,规范是设置停止标志。对于轻量级的东西(没有Thread的子类,没有全局变量),可以选择使用lambda回调。(请注意中的括号if stop()。)

import threading
import time

def do_work(id, stop):
    print("I am thread", id)
    while True:
        print("I am thread {} doing something".format(id))
        if stop():
            print("  Exiting loop.")
            break
    print("Thread {}, signing off".format(id))


def main():
    stop_threads = False
    workers = []
    for id in range(0,3):
        tmp = threading.Thread(target=do_work, args=(id, lambda: stop_threads))
        workers.append(tmp)
        tmp.start()
    time.sleep(3)
    print('main: done sleeping; time to stop the threads.')
    stop_threads = True
    for worker in workers:
        worker.join()
    print('Finis.')

if __name__ == '__main__':
    main()

替换print()pr()始终刷新(sys.stdout.flush())的函数可以提高Shell输出的精度。

(仅在Windows / Eclipse / Python3.3上测试过)

As others have mentioned, the norm is to set a stop flag. For something lightweight (no subclassing of Thread, no global variable), a lambda callback is an option. (Note the parentheses in if stop().)

import threading
import time

def do_work(id, stop):
    print("I am thread", id)
    while True:
        print("I am thread {} doing something".format(id))
        if stop():
            print("  Exiting loop.")
            break
    print("Thread {}, signing off".format(id))


def main():
    stop_threads = False
    workers = []
    for id in range(0,3):
        tmp = threading.Thread(target=do_work, args=(id, lambda: stop_threads))
        workers.append(tmp)
        tmp.start()
    time.sleep(3)
    print('main: done sleeping; time to stop the threads.')
    stop_threads = True
    for worker in workers:
        worker.join()
    print('Finis.')

if __name__ == '__main__':
    main()

Replacing print() with a pr() function that always flushes (sys.stdout.flush()) may improve the precision of the shell output.

(Only tested on Windows/Eclipse/Python3.3)


回答 5

这基于thread2-可杀死的线程(Python配方)

您需要调用PyThreadState_SetasyncExc(),该方法仅可通过ctypes使用。

该功能仅在Python 2.7.3上进行了测试,但可能与其他最新的2.x版本一起使用。

import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance
    """
    if not thread.isAlive():
        return

    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(thread.ident, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

This is based on thread2 — killable threads (Python recipe)

You need to call PyThreadState_SetasyncExc(), which is only available through ctypes.

This has only been tested on Python 2.7.3, but it is likely to work with other recent 2.x releases.

import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance
    """
    if not thread.isAlive():
        return

    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(thread.ident, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

回答 6

如果不配合使用线程,切勿强行杀死它。

杀死线程会删除所有尝试/最终阻止设置的保证,因此您可能会锁定锁,打开文件等。

唯一可以断言强行杀死线程是个好主意的方法是快速杀死程序,但绝不要杀死单个线程。

You should never forcibly kill a thread without cooperating with it.

Killing a thread removes any guarantees that try/finally blocks set up so you might leave locks locked, files open, etc.

The only time you can argue that forcibly killing threads is a good idea is to kill a program fast, but never single threads.


回答 7

在Python中,您根本无法直接杀死线程。

如果您实际上并不需要Thread(!),则可以使用 multiprocessing软件包,而不是使用threading软件包来做。在这里,要杀死一个进程,您可以简单地调用方法:

yourProcess.terminate()  # kill the process!

Python将杀死您的进程(在Unix上通过SIGTERM信号,而在Windows上通过TerminateProcess()调用)。在使用队列或管道时,请注意使用它!(它可能会损坏队列/管道中的数据)

请注意,multiprocessing.Event和的multiprocessing.Semaphore工作方式与threading.Event和的完全相同threading.Semaphore。实际上,第一个是后者的克隆。

如果您确实需要使用线程,则无法直接杀死它。但是,您可以使用“守护程序线程”。实际上,在Python中,可以将Thread标记为守护程序

yourThread.daemon = True  # set the Thread as a "daemon thread"

当没有活动的非守护线程时,主程序将退出。换句话说,当您的主线程(当然,这是一个非守护程序线程)完成其操作时,即使仍有一些守护程序线程在工作,程序也会退出。

请注意,有必要daemonstart()调用该方法之前设置一个线程!

当然,daemon甚至可以使用multiprocessing。在这里,当主进程退出时,它将尝试终止其所有守护进程。

最后,请注意,sys.exit()os.kill()不是选择。

In Python, you simply cannot kill a Thread directly.

If you do NOT really need to have a Thread (!), what you can do, instead of using the threading package , is to use the multiprocessing package . Here, to kill a process, you can simply call the method:

yourProcess.terminate()  # kill the process!

Python will kill your process (on Unix through the SIGTERM signal, while on Windows through the TerminateProcess() call). Pay attention to use it while using a Queue or a Pipe! (it may corrupt the data in the Queue/Pipe)

Note that the multiprocessing.Event and the multiprocessing.Semaphore work exactly in the same way of the threading.Event and the threading.Semaphore respectively. In fact, the first ones are clones of the latters.

If you REALLY need to use a Thread, there is no way to kill it directly. What you can do, however, is to use a “daemon thread”. In fact, in Python, a Thread can be flagged as daemon:

yourThread.daemon = True  # set the Thread as a "daemon thread"

The main program will exit when no alive non-daemon threads are left. In other words, when your main thread (which is, of course, a non-daemon thread) will finish its operations, the program will exit even if there are still some daemon threads working.

Note that it is necessary to set a Thread as daemon before the start() method is called!

Of course you can, and should, use daemon even with multiprocessing. Here, when the main process exits, it attempts to terminate all of its daemonic child processes.

Finally, please, note that sys.exit() and os.kill() are not choices.


回答 8

您可以通过在将退出线程的线程中安装跟踪来杀死线程。请参阅附件链接以获取一种可能的实现。

在Python中杀死线程

You can kill a thread by installing trace into the thread that will exit the thread. See attached link for one possible implementation.

Kill a thread in Python


回答 9

如果你是显式调用time.sleep()为你的线程(比如查询一些外部的服务)的一部分,在菲利普的方法的改进是使用了超时的eventwait()方法,无论你sleep()

例如:

import threading

class KillableThread(threading.Thread):
    def __init__(self, sleep_interval=1):
        super().__init__()
        self._kill = threading.Event()
        self._interval = sleep_interval

    def run(self):
        while True:
            print("Do Something")

            # If no kill signal is set, sleep for the interval,
            # If kill signal comes in while sleeping, immediately
            #  wake up and handle
            is_killed = self._kill.wait(self._interval)
            if is_killed:
                break

        print("Killing Thread")

    def kill(self):
        self._kill.set()

然后运行

t = KillableThread(sleep_interval=5)
t.start()
# Every 5 seconds it prints:
#: Do Something
t.kill()
#: Killing Thread

使用wait()而不是sleep()ing并定期检查事件的优点是您可以在更长的睡眠间隔内进行编程,线程几乎立即停止(原本应该是sleep()ing),并且我认为处理退出的代码明显更简单。

If you are explicitly calling time.sleep() as part of your thread (say polling some external service), an improvement upon Phillipe’s method is to use the timeout in the event‘s wait() method wherever you sleep()

For example:

import threading

class KillableThread(threading.Thread):
    def __init__(self, sleep_interval=1):
        super().__init__()
        self._kill = threading.Event()
        self._interval = sleep_interval

    def run(self):
        while True:
            print("Do Something")

            # If no kill signal is set, sleep for the interval,
            # If kill signal comes in while sleeping, immediately
            #  wake up and handle
            is_killed = self._kill.wait(self._interval)
            if is_killed:
                break

        print("Killing Thread")

    def kill(self):
        self._kill.set()

Then to run it

t = KillableThread(sleep_interval=5)
t.start()
# Every 5 seconds it prints:
#: Do Something
t.kill()
#: Killing Thread

The advantage of using wait() instead of sleep()ing and regularly checking the event is that you can program in longer intervals of sleep, the thread is stopped almost immediately (when you would otherwise be sleep()ing) and in my opinion, the code for handling exit is significantly simpler.


回答 10

如果您不杀死线程,那就更好了。一种方法可能是在线程的循环中引入“ try”块,并在您要停止线程时引发异常(例如,break / return / …会停止for / while / …)。我已经在我的应用程序上使用了它,并且可以使用…

It is better if you don’t kill a thread. A way could be to introduce a “try” block into the thread’s cycle and to throw an exception when you want to stop the thread (for example a break/return/… that stops your for/while/…). I’ve used this on my app and it works…


回答 11

绝对有可能实现Thread.stop以下示例代码中所示的方法:

import sys
import threading
import time


class StopThread(StopIteration):
    pass

threading.SystemExit = SystemExit, StopThread


class Thread2(threading.Thread):

    def stop(self):
        self.__stop = True

    def _bootstrap(self):
        if threading._trace_hook is not None:
            raise ValueError('Cannot run thread with tracing!')
        self.__stop = False
        sys.settrace(self.__trace)
        super()._bootstrap()

    def __trace(self, frame, event, arg):
        if self.__stop:
            raise StopThread()
        return self.__trace


class Thread3(threading.Thread):

    def _bootstrap(self, stop_thread=False):
        def stop():
            nonlocal stop_thread
            stop_thread = True
        self.stop = stop

        def tracer(*_):
            if stop_thread:
                raise StopThread()
            return tracer
        sys.settrace(tracer)
        super()._bootstrap()

###############################################################################


def main():
    test1 = Thread2(target=printer)
    test1.start()
    time.sleep(1)
    test1.stop()
    test1.join()
    test2 = Thread2(target=speed_test)
    test2.start()
    time.sleep(1)
    test2.stop()
    test2.join()
    test3 = Thread3(target=speed_test)
    test3.start()
    time.sleep(1)
    test3.stop()
    test3.join()


def printer():
    while True:
        print(time.time() % 1)
        time.sleep(0.1)


def speed_test(count=0):
    try:
        while True:
            count += 1
    except StopThread:
        print('Count =', count)

if __name__ == '__main__':
    main()

Thread3类似乎比快大约33%的运行代码Thread2类。

It is definitely possible to implement a Thread.stop method as shown in the following example code:

import sys
import threading
import time


class StopThread(StopIteration):
    pass

threading.SystemExit = SystemExit, StopThread


class Thread2(threading.Thread):

    def stop(self):
        self.__stop = True

    def _bootstrap(self):
        if threading._trace_hook is not None:
            raise ValueError('Cannot run thread with tracing!')
        self.__stop = False
        sys.settrace(self.__trace)
        super()._bootstrap()

    def __trace(self, frame, event, arg):
        if self.__stop:
            raise StopThread()
        return self.__trace


class Thread3(threading.Thread):

    def _bootstrap(self, stop_thread=False):
        def stop():
            nonlocal stop_thread
            stop_thread = True
        self.stop = stop

        def tracer(*_):
            if stop_thread:
                raise StopThread()
            return tracer
        sys.settrace(tracer)
        super()._bootstrap()

###############################################################################


def main():
    test1 = Thread2(target=printer)
    test1.start()
    time.sleep(1)
    test1.stop()
    test1.join()
    test2 = Thread2(target=speed_test)
    test2.start()
    time.sleep(1)
    test2.stop()
    test2.join()
    test3 = Thread3(target=speed_test)
    test3.start()
    time.sleep(1)
    test3.stop()
    test3.join()


def printer():
    while True:
        print(time.time() % 1)
        time.sleep(0.1)


def speed_test(count=0):
    try:
        while True:
            count += 1
    except StopThread:
        print('Count =', count)

if __name__ == '__main__':
    main()

The Thread3 class appears to run code approximately 33% faster than the Thread2 class.


回答 12

from ctypes import *
pthread = cdll.LoadLibrary("libpthread-2.15.so")
pthread.pthread_cancel(c_ulong(t.ident))

t是您的Thread对象。

阅读python源代码(Modules/threadmodule.cPython/thread_pthread.h),您可以看到的Thread.ident是一种pthread_t类型,因此您可以pthread在python use中做任何事情libpthread

from ctypes import *
pthread = cdll.LoadLibrary("libpthread-2.15.so")
pthread.pthread_cancel(c_ulong(t.ident))

t is your Thread object.

Read the python source (Modules/threadmodule.c and Python/thread_pthread.h) you can see the Thread.ident is an pthread_t type, so you can do anything pthread can do in python use libpthread.


回答 13

可以使用以下解决方法杀死线程:

kill_threads = False

def doSomething():
    global kill_threads
    while True:
        if kill_threads:
            thread.exit()
        ......
        ......

thread.start_new_thread(doSomething, ())

这甚至可以用于终止从主线程终止其代码在另一个模块中编写的线程。我们可以在该模块中声明一个全局变量,并使用它终止该模块中产生的线程。

我通常使用它在程序出口处终止所有线程。这可能不是终止线程的理想方法,但可能会有所帮助。

Following workaround can be used to kill a thread:

kill_threads = False

def doSomething():
    global kill_threads
    while True:
        if kill_threads:
            thread.exit()
        ......
        ......

thread.start_new_thread(doSomething, ())

This can be used even for terminating threads, whose code is written in another module, from main thread. We can declare a global variable in that module and use it to terminate thread/s spawned in that module.

I usually use this to terminate all the threads at the program exit. This might not be the perfect way to terminate thread/s but could help.


回答 14

我玩这个游戏很晚,但是我一直在努力解决类似的问题,以下内容似乎可以为我很好地解决问题,并且让守护子线程退出时让我做一些基本的线程状态检查和清理工作:

import threading
import time
import atexit

def do_work():

  i = 0
  @atexit.register
  def goodbye():
    print ("'CLEANLY' kill sub-thread with value: %s [THREAD: %s]" %
           (i, threading.currentThread().ident))

  while True:
    print i
    i += 1
    time.sleep(1)

t = threading.Thread(target=do_work)
t.daemon = True
t.start()

def after_timeout():
  print "KILL MAIN THREAD: %s" % threading.currentThread().ident
  raise SystemExit

threading.Timer(2, after_timeout).start()

Yield:

0
1
KILL MAIN THREAD: 140013208254208
'CLEANLY' kill sub-thread with value: 2 [THREAD: 140013674317568]

I’m way late to this game, but I’ve been wrestling with a similar question and the following appears to both resolve the issue perfectly for me AND lets me do some basic thread state checking and cleanup when the daemonized sub-thread exits:

import threading
import time
import atexit

def do_work():

  i = 0
  @atexit.register
  def goodbye():
    print ("'CLEANLY' kill sub-thread with value: %s [THREAD: %s]" %
           (i, threading.currentThread().ident))

  while True:
    print i
    i += 1
    time.sleep(1)

t = threading.Thread(target=do_work)
t.daemon = True
t.start()

def after_timeout():
  print "KILL MAIN THREAD: %s" % threading.currentThread().ident
  raise SystemExit

threading.Timer(2, after_timeout).start()

Yields:

0
1
KILL MAIN THREAD: 140013208254208
'CLEANLY' kill sub-thread with value: 2 [THREAD: 140013674317568]

回答 15

我想补充的一件事是,如果您在线程lib Python中阅读了官方文档,建议您避免使用“恶魔”线程,如果您不希望线程突然结束,请使用Paolo Rovelli 提到的标志。

根据官方文档:

守护程序线程在关闭时突然停止。它们的资源(例如打开的文件,数据库事务等)可能无法正确释放。如果您希望线程正常停止,请将它们设置为非守护进程,并使用适当的信令机制(例如事件)。

我认为创建守护线程取决于您的应用程序,但总的来说(我认为)最好避免杀死它们或使其成为守护线程。在多处理中,您可以is_alive()用来检查过程状态并“终止”以完成过程(还可以避免GIL问题)。但是,有时在Windows中执行代码时会发现更多问题。

永远记住,如果您有“活动线程”,Python解释器将运行以等待它们。(因为这个守护进程可以帮助您,如果没关系突然结束)。

One thing I want to add is that if you read official documentation in threading lib Python, it’s recommended to avoid use of “demonic” threads, when you don’t want threads end abruptly, with the flag that Paolo Rovelli mentioned.

From official documentation:

Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signaling mechanism such as an Event.

I think that creating daemonic threads depends of your application, but in general (and in my opinion) it’s better to avoid killing them or making them daemonic. In multiprocessing you can use is_alive() to check process status and “terminate” for finish them (Also you avoid GIL problems). But you can find more problems, sometimes, when you execute your code in Windows.

And always remember that if you have “live threads”, the Python interpreter will be running for wait them. (Because of this daemonic can help you if don’t matter abruptly ends).


回答 16

有一个为此目的而建立的库stopit。尽管此处列出的某些注意事项仍然适用,但至少此库提供了一种常规的,可重复的技术来实现所述目标。

There is a library built for this purpose, stopit. Although some of the same cautions listed herein still apply, at least this library presents a regular, repeatable technique for achieving the stated goal.


回答 17

虽然它已经很老了,对于某些人来说可能是一个方便的解决方案:

一个扩展了线程的模块功能的小模块-允许一个线程在另一个线程的上下文中引发异常。通过提高SystemExit,您最终可以杀死python线程。

import threading
import ctypes     

def _async_raise(tid, excobj):
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(excobj))
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble, 
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, 0)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class Thread(threading.Thread):
    def raise_exc(self, excobj):
        assert self.isAlive(), "thread must be started"
        for tid, tobj in threading._active.items():
            if tobj is self:
                _async_raise(tid, excobj)
                return

        # the thread was alive when we entered the loop, but was not found 
        # in the dict, hence it must have been already terminated. should we raise
        # an exception here? silently ignore?

    def terminate(self):
        # must raise the SystemExit type, instead of a SystemExit() instance
        # due to a bug in PyThreadState_SetAsyncExc
        self.raise_exc(SystemExit)

因此,它允许“线程在另一个线程的上下文中引发异常”,这样,终止的线程可以处理终止,而无需定期检查中止标志。

但是,根据其原始来源,此代码存在一些问题。

  • 仅在执行python字节码时才会引发异常。如果您的线程调用了本机/内置阻止函数,则仅当执行返回到python代码时才会引发异常。
    • 如果内置函数在内部调用PyErr_Clear()也会存在一个问题,这将有效地取消您的未决异常。您可以尝试再次提高它。
  • 只有异常类型可以安全地引发。异常实例可能会导致意外行为,因此受到限制。
  • 我要求在内置线程模块中公开此功能,但是由于ctypes已成为标准库(从2.5版开始),并且此
    功能不太可能与实现无关,因此可以不公开

While it’s rather old, this might be a handy solution for some:

A little module that extends the threading’s module functionality — allows one thread to raise exceptions in the context of another thread. By raising SystemExit, you can finally kill python threads.

import threading
import ctypes     

def _async_raise(tid, excobj):
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(excobj))
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble, 
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, 0)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class Thread(threading.Thread):
    def raise_exc(self, excobj):
        assert self.isAlive(), "thread must be started"
        for tid, tobj in threading._active.items():
            if tobj is self:
                _async_raise(tid, excobj)
                return

        # the thread was alive when we entered the loop, but was not found 
        # in the dict, hence it must have been already terminated. should we raise
        # an exception here? silently ignore?

    def terminate(self):
        # must raise the SystemExit type, instead of a SystemExit() instance
        # due to a bug in PyThreadState_SetAsyncExc
        self.raise_exc(SystemExit)

So, it allows a “thread to raise exceptions in the context of another thread” and in this way, the terminated thread can handle the termination without regularly checking an abort flag.

However, according to its original source, there are some issues with this code.

  • The exception will be raised only when executing python bytecode. If your thread calls a native/built-in blocking function, the exception will be raised only when execution returns to the python code.
    • There is also an issue if the built-in function internally calls PyErr_Clear(), which would effectively cancel your pending exception. You can try to raise it again.
  • Only exception types can be raised safely. Exception instances are likely to cause unexpected behavior, and are thus restricted.
  • I asked to expose this function in the built-in thread module, but since ctypes has become a standard library (as of 2.5), and this
    feature is not likely to be implementation-agnostic, it may be kept
    unexposed.

回答 18

这似乎与Windows 7上的pywin32一起使用

my_thread = threading.Thread()
my_thread.start()
my_thread._Thread__stop()

This seems to work with pywin32 on windows 7

my_thread = threading.Thread()
my_thread.start()
my_thread._Thread__stop()

回答 19

ØMQ项目的创始人之一Pieter Hintjens 说,使用ØMQ并避免使用同步原语(例如锁,互斥体,事件等),是编写多线程程序的最安全的方法:

http://zguide.zeromq.org/py:all#Multithreading-with-ZeroMQ

这包括告诉子线程应该取消其工作。这可以通过为该线程配备一个ØMQ套接字并在该套接字上轮询以获取一条消息指出该线程应取消的信息来完成。

该链接还提供了有关带有ØMQ的多线程python代码的示例。

Pieter Hintjens — one of the founders of the ØMQ-project — says, using ØMQ and avoiding synchronization primitives like locks, mutexes, events etc., is the sanest and securest way to write multi-threaded programs:

http://zguide.zeromq.org/py:all#Multithreading-with-ZeroMQ

This includes telling a child thread, that it should cancel its work. This would be done by equipping the thread with a ØMQ-socket and polling on that socket for a message saying that it should cancel.

The link also provides an example on multi-threaded python code with ØMQ.


回答 20

假设您要具有多个具有相同功能的线程,这是恕我直言,最简单的实现是通过id停止一个线程:

import time
from threading import Thread

def doit(id=0):
    doit.stop=0
    print("start id:%d"%id)
    while 1:
        time.sleep(1)
        print(".")
        if doit.stop==id:
            doit.stop=0
            break
    print("end thread %d"%id)

t5=Thread(target=doit, args=(5,))
t6=Thread(target=doit, args=(6,))

t5.start() ; t6.start()
time.sleep(2)
doit.stop =5  #kill t5
time.sleep(2)
doit.stop =6  #kill t6

不错的是,您可以拥有多个相同和不同的功能,并通过以下方式将其全部停止 functionname.stop

如果您只想使用该函数的一个线程,则无需记住ID。如果doit.stop> 0 ,则停止。

Asuming, that you want to have multiple threads of the same function, this is IMHO the easiest implementation to stop one by id:

import time
from threading import Thread

def doit(id=0):
    doit.stop=0
    print("start id:%d"%id)
    while 1:
        time.sleep(1)
        print(".")
        if doit.stop==id:
            doit.stop=0
            break
    print("end thread %d"%id)

t5=Thread(target=doit, args=(5,))
t6=Thread(target=doit, args=(6,))

t5.start() ; t6.start()
time.sleep(2)
doit.stop =5  #kill t5
time.sleep(2)
doit.stop =6  #kill t6

The nice thing is here, you can have multiple of same and different functions, and stop them all by functionname.stop

If you want to have only one thread of the function then you don’t need to remember the id. Just stop, if doit.stop > 0.


回答 21

只是基于@SCB的想法(这正是我所需要的)来创建具有自定义函数的KillableThread子类:

from threading import Thread, Event

class KillableThread(Thread):
    def __init__(self, sleep_interval=1, target=None, name=None, args=(), kwargs={}):
        super().__init__(None, target, name, args, kwargs)
        self._kill = Event()
        self._interval = sleep_interval
        print(self._target)

    def run(self):
        while True:
            # Call custom function with arguments
            self._target(*self._args)

        # If no kill signal is set, sleep for the interval,
        # If kill signal comes in while sleeping, immediately
        #  wake up and handle
        is_killed = self._kill.wait(self._interval)
        if is_killed:
            break

    print("Killing Thread")

def kill(self):
    self._kill.set()

if __name__ == '__main__':

    def print_msg(msg):
        print(msg)

    t = KillableThread(10, print_msg, args=("hello world"))
    t.start()
    time.sleep(6)
    print("About to kill thread")
    t.kill()

自然,就像@SBC一样,线程不必等待运行新循环来停止。在此示例中,您将在“即将杀死线程”之后看到“杀死线程”消息,而不是再等待4秒钟才能完成线程(因为我们已经睡了6秒钟)。

KillableThread构造函数中的第二个参数是您的自定义函数(此处为print_msg)。Args参数是在此处调用函数((“ hello world”))时将使用的参数。

Just to build up on @SCB’s idea (which was exactly what I needed) to create a KillableThread subclass with a customized function:

from threading import Thread, Event

class KillableThread(Thread):
    def __init__(self, sleep_interval=1, target=None, name=None, args=(), kwargs={}):
        super().__init__(None, target, name, args, kwargs)
        self._kill = Event()
        self._interval = sleep_interval
        print(self._target)

    def run(self):
        while True:
            # Call custom function with arguments
            self._target(*self._args)

        # If no kill signal is set, sleep for the interval,
        # If kill signal comes in while sleeping, immediately
        #  wake up and handle
        is_killed = self._kill.wait(self._interval)
        if is_killed:
            break

    print("Killing Thread")

def kill(self):
    self._kill.set()

if __name__ == '__main__':

    def print_msg(msg):
        print(msg)

    t = KillableThread(10, print_msg, args=("hello world"))
    t.start()
    time.sleep(6)
    print("About to kill thread")
    t.kill()

Naturally, like with @SBC, the thread doesn’t wait to run a new loop to stop. In this example, you would see the “Killing Thread” message printed right after the “About to kill thread” instead of waiting for 4 more seconds for the thread to complete (since we have slept for 6 seconds already).

Second argument in KillableThread constructor is your custom function (print_msg here). Args argument are the arguments that will be used when calling the function ((“hello world”)) here.


回答 22

如@Kozyarchuk的答案中所述,安装跟踪有效。由于此答案不包含任何代码,因此下面是一个可以使用的示例:

import sys, threading, time 

class TraceThread(threading.Thread): 
    def __init__(self, *args, **keywords): 
        threading.Thread.__init__(self, *args, **keywords) 
        self.killed = False
    def start(self): 
        self._run = self.run 
        self.run = self.settrace_and_run
        threading.Thread.start(self) 
    def settrace_and_run(self): 
        sys.settrace(self.globaltrace) 
        self._run()
    def globaltrace(self, frame, event, arg): 
        return self.localtrace if event == 'call' else None
    def localtrace(self, frame, event, arg): 
        if self.killed and event == 'line': 
            raise SystemExit() 
        return self.localtrace 

def f(): 
    while True: 
        print('1') 
        time.sleep(2)
        print('2') 
        time.sleep(2)
        print('3') 
        time.sleep(2)

t = TraceThread(target=f) 
t.start() 
time.sleep(2.5) 
t.killed = True

打印1并打印后停止23不打印。

As mentioned in @Kozyarchuk’s answer, installing trace works. Since this answer contained no code, here is a working ready-to-use example:

import sys, threading, time 

class TraceThread(threading.Thread): 
    def __init__(self, *args, **keywords): 
        threading.Thread.__init__(self, *args, **keywords) 
        self.killed = False
    def start(self): 
        self._run = self.run 
        self.run = self.settrace_and_run
        threading.Thread.start(self) 
    def settrace_and_run(self): 
        sys.settrace(self.globaltrace) 
        self._run()
    def globaltrace(self, frame, event, arg): 
        return self.localtrace if event == 'call' else None
    def localtrace(self, frame, event, arg): 
        if self.killed and event == 'line': 
            raise SystemExit() 
        return self.localtrace 

def f(): 
    while True: 
        print('1') 
        time.sleep(2)
        print('2') 
        time.sleep(2)
        print('3') 
        time.sleep(2)

t = TraceThread(target=f) 
t.start() 
time.sleep(2.5) 
t.killed = True

It stops after having printed 1 and 2. 3 is not printed.


回答 23

您可以在进程中执行命令,然后使用进程ID将其杀死。我需要在两个线程之间进行同步,其中一个线程不会自行返回。

processIds = []

def executeRecord(command):
    print(command)

    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    processIds.append(process.pid)
    print(processIds[0])

    #Command that doesn't return by itself
    process.stdout.read().decode("utf-8")
    return;


def recordThread(command, timeOut):

    thread = Thread(target=executeRecord, args=(command,))
    thread.start()
    thread.join(timeOut)

    os.kill(processIds.pop(), signal.SIGINT)

    return;

You can execute your command in a process and then kill it using the process id. I needed to sync between two thread one of which doesn’t return by itself.

processIds = []

def executeRecord(command):
    print(command)

    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    processIds.append(process.pid)
    print(processIds[0])

    #Command that doesn't return by itself
    process.stdout.read().decode("utf-8")
    return;


def recordThread(command, timeOut):

    thread = Thread(target=executeRecord, args=(command,))
    thread.start()
    thread.join(timeOut)

    os.kill(processIds.pop(), signal.SIGINT)

    return;

回答 24

使用setDaemon(True)启动子线程。

def bootstrap(_filename):
    mb = ModelBootstrap(filename=_filename) # Has many Daemon threads. All get stopped automatically when main thread is stopped.

t = threading.Thread(target=bootstrap,args=('models.conf',))
t.setDaemon(False)

while True:
    t.start()
    time.sleep(10) # I am just allowing the sub-thread to run for 10 sec. You can listen on an event to stop execution.
    print('Thread stopped')
    break

Start the sub thread with setDaemon(True).

def bootstrap(_filename):
    mb = ModelBootstrap(filename=_filename) # Has many Daemon threads. All get stopped automatically when main thread is stopped.

t = threading.Thread(target=bootstrap,args=('models.conf',))
t.setDaemon(False)

while True:
    t.start()
    time.sleep(10) # I am just allowing the sub-thread to run for 10 sec. You can listen on an event to stop execution.
    print('Thread stopped')
    break

回答 25

这是一个错误的答案,请参阅评论

方法如下:

from threading import *

...

for thread in enumerate():
    if thread.isAlive():
        try:
            thread._Thread__stop()
        except:
            print(str(thread.getName()) + ' could not be terminated'))

给它几秒钟,然后应该停止线程。还要检查thread._Thread__delete()方法。

thread.quit()为了方便起见,我建议使用一种方法。例如,如果您的线程中有一个套接字,建议quit()您在套接字句柄类中创建一个方法,终止该套接字,然后在的thread._Thread__stop()内部运行quit()

This is a bad answer, see the comments

Here’s how to do it:

from threading import *

...

for thread in enumerate():
    if thread.isAlive():
        try:
            thread._Thread__stop()
        except:
            print(str(thread.getName()) + ' could not be terminated'))

Give it a few seconds then your thread should be stopped. Check also the thread._Thread__delete() method.

I’d recommend a thread.quit() method for convenience. For example if you have a socket in your thread, I’d recommend creating a quit() method in your socket-handle class, terminate the socket, then run a thread._Thread__stop() inside of your quit().


回答 26

如果您确实需要杀死子任务的能力,请使用替代实现。multiprocessing并且gevent都支持滥杀“线程”。

Python的线程不支持取消。想都别想。您的代码很可能会死锁,损坏或泄漏内存,或者具有其他意想不到的“有趣”难以调试的效果,这种情况很少且不确定地发生。

If you really need the ability to kill a sub-task, use an alternate implementation. multiprocessing and gevent both support indiscriminately killing a “thread”.

Python’s threading does not support cancellation. Do not even try. Your code is very likely to deadlock, corrupt or leak memory, or have other unintended “interesting” hard-to-debug effects which happen rarely and nondeterministically.


如何在Python中使用线程?

问题:如何在Python中使用线程?

我试图了解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我难以理解它们。

您如何清楚地显示为多线程而划分的任务?

I am trying to understand threading in Python. I’ve looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I’m having trouble understanding them.

How do you clearly show tasks being divided for multi-threading?


回答 0

自2010年提出这个问题以来,如何使用带有mappool的 Python进行简单的多线程处理已经有了真正的简化。

下面的代码来自于一篇文章/博客文章,您绝对应该检出(没有从属关系)- 并行显示在一行中:更好的日常线程任务模型。我将在下面进行总结-最终仅是几行代码:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

这是以下内容的多线程版本:

results = []
for item in my_array:
    results.append(my_function(item))

描述

Map是一个很棒的小功能,是轻松将并行性注入Python代码的关键。对于那些不熟悉的人来说,地图是从Lisp之类的功能语言中提炼出来的。它是将另一个功能映射到序列上的功能。

Map为我们处理序列上的迭代,应用函数,并将所有结果存储在最后的方便列表中。

在此处输入图片说明


实作

map函数的并行版本由以下两个库提供:multiprocessing,以及鲜为人知但同样出色的继子child:multiprocessing.dummy。

multiprocessing.dummy与多处理模块完全相同,但是使用线程代替一个重要的区别 -使用多个进程来执行CPU密集型任务;用于I / O的线程)。

multiprocessing.dummy复制了多处理的API,但仅不过是线程模块的包装器。

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

以及计时结果:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

传递多个参数仅在Python 3.3和更高版本中才这样):

要传递多个数组:

results = pool.starmap(function, zip(list_a, list_b))

或传递一个常量和一个数组:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

如果您使用的是Python的早期版本,则可以通过此变通方法()传递多个参数。

(感谢user136036的有用评论。)

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) – Parallelism in one line: A Better Model for Day to Day Threading Tasks. I’ll summarize below – it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Enter image description here


Implementation

Parallel versions of the map function are provided by two libraries:multiprocessing, and also its little known, but equally fantastic step child:multiprocessing.dummy.

multiprocessing.dummy is exactly the same as multiprocessing module, but uses threads instead (an important distinction – use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via this workaround).

(Thanks to user136036 for the helpful comment.)


回答 1

这是一个简单的示例:您需要尝试一些备用URL并返回第一个URL的内容以进行响应。

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

在这种情况下,线程被用作简单的优化:每个子线程都在等待URL解析和响应,以便将其内容放入队列中。每个线程都是一个守护进程(如果主线程结束,则不会使进程继续运行-这比不常见);主线程启动所有子线程,get在队列中执行a ,以等待直到其中一个完成a put,然后发出结果并终止(由于它们是守护线程,因此将取消可能仍在运行的所有子线程)。

正确使用Python中的线程总是会与I / O操作相关联(因为CPython无论如何都不会使用多个内核来运行受CPU约束的任务,因此,线程的唯一原因是在等待某些I / O时不会阻塞进程)。顺便说一句,队列几乎总是将工作分配到线程和/或收集工作结果的最佳方法,并且它们本质上是线程安全的,因此它们使您不必担心锁,条件,事件,信号量以及其他相互之间的关系。线程协调/通信概念。

Here’s a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won’t keep the process up if main thread ends — that’s more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they’re daemon threads).

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn’t use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there’s a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work’s results, by the way, and they’re intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.


回答 2

注意:对于Python中的实际并行化,您应该使用多处理模块来分叉多个并行执行的进程(由于全局解释器锁,Python线程提供了交织,但实际上它们是串行执行的,而不是并行执行的,并且仅仅是在交错I / O操作时很有用)。

但是,如果您只是在寻找交织(或者正在进行尽管可以使用全局解释器锁而可以并行化的I / O操作),那么就可以从线程模块开始。作为一个非常简单的示例,让我们考虑通过并行求和子范围来求和一个大范围的问题:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

请注意,以上示例是一个非常愚蠢的示例,因为它完全不执行任何I / O操作,并且由于全局解释器锁定,尽管在CPython中是交错执行的(带有上下文切换的额外开销),但仍将串行执行。

NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).

However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let’s consider the problem of summing a large range by summing subranges in parallel:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially albeit interleaved (with the added overhead of context switching) in CPython due to the global interpreter lock.


回答 3

像其他提到的一样,由于GIL,CPython只能将线程用于I / O等待。

如果您想从多个内核中受益于CPU绑定任务,请使用multiprocessing

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Like others mentioned, CPython can use threads only for I/O waits due to GIL.

If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

回答 4

仅需注意:线程不需要队列。

这是我能想到的最简单的示例,其中显示了10个进程同时运行。

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

Just a note: A queue is not required for threading.

This is the simplest example I could imagine that shows 10 processes running concurrently.

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

回答 5

Alex Martelli的回答对我有所帮助。但是,这是我认为更有用的修改版本(至少对我而言)。

更新:在Python 2和Python 3中均可使用

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).

Updated: works in both Python 2 and Python 3

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

回答 6

给定一个函数,将f其像这样进行线程化:

import threading
threading.Thread(target=f).start()

将参数传递给 f

threading.Thread(target=f, args=(a,b,c)).start()

Given a function, f, thread it like this:

import threading
threading.Thread(target=f).start()

To pass arguments to f

threading.Thread(target=f, args=(a,b,c)).start()

回答 7

我发现这非常有用:创建与内核一样多的线程,并让它们执行(大量)任务(在这种情况下,调用Shell程序):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done

I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done

回答 8

Python 3具有启动并行任务的功能。这使我们的工作更加轻松。

它具有线程池进程池

以下提供了一个见解:

ThreadPoolExecutor示例

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

Python 3 has the facility of launching parallel tasks. This makes our work easier.

It has thread pooling and process pooling.

The following gives an insight:

ThreadPoolExecutor Example (source)

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor (source)

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

回答 9

使用新的并发模块

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

对于所有以前接触过Java的人来说,执行者方法似乎都很熟悉。

另外请注意:为了使Universe保持理智,如果您不使用with上下文,请不要忘记关闭池/执行器(它非常强大,它可以为您完成此工作)

Using the blazing new concurrent.futures module

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

The executor approach might seem familiar to all those who have gotten their hands dirty with Java before.

Also on a side note: To keep the universe sane, don’t forget to close your pools/executors if you don’t use with context (which is so awesome that it does it for you)


回答 10

对我而言,线程的完美示例是监视异步事件。看这段代码。

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

您可以通过打开IPython会话并执行以下操作来处理此代码:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>>a[0] = 2
Mon = 2

等一下

>>> a[0] = 2
Mon = 2

For me, the perfect example for threading is monitoring asynchronous events. Look at this code.

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

You can play with this code by opening an IPython session and doing something like:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>>a[0] = 2
Mon = 2

Wait a few minutes

>>> a[0] = 2
Mon = 2

回答 11

大多数文档和教程都使用Python ThreadingQueue模块,对于初学者来说,它们似乎不胜枚举。

也许考虑使用concurrent.futures.ThreadPoolExecutorPython 3 的模块。

结合with子句和列表理解,这可能是一个真正的魅力。

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()

Most documentation and tutorials use Python’s Threading and Queue module, and they could seem overwhelming for beginners.

Perhaps consider the concurrent.futures.ThreadPoolExecutor module of Python 3.

Combined with with clause and list comprehension it could be a real charm.

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()

回答 12

我在这里看到了很多没有执行任何实际工作的示例,这些示例主要是CPU约束的。这是一个CPU限制任务的示例,该任务计算1000万到10.05百万之间的所有素数。我在这里使用了所有四种方法:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if the any number from 2 to the square root + 1 divides the current numnber under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible we have found a factor, hence this is not a prime number, lets move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than using pure threading as the pools manage threads more efficiently.
    This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

main()

这是我的Mac OS X四核计算机上的结果

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds

I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I have used all four methods here:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if the any number from 2 to the square root + 1 divides the current numnber under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible we have found a factor, hence this is not a prime number, lets move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than using pure threading as the pools manage threads more efficiently.
    This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

main()

Here are the results on my Mac OS X four-core machine

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds

回答 13

这是使用线程导入CSV的非常简单的示例。(图书馆收录的目的可能有所不同。)

辅助功能:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query

驱动功能:

import_handler(csv_file_name)

Here is the very simple example of CSV import using threading. (Library inclusion may differ for different purpose.)

Helper Functions:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query

Driver Function:

import_handler(csv_file_name)

回答 14

我想举一个简单的例子,当我不得不自己解决这个问题时,我发现这些解释很有用。

在此答案中,您将找到有关Python的GIL(全局解释器锁)的一些信息,以及使用multiprocessing.dummy和一些简单的基准编写的简单的日常示例。

全局翻译锁定(GIL)

Python不允许真正意义上的多线程。它具有一个多线程程序包,但是如果您想使用多线程来加快代码速度,那么使用它通常不是一个好主意。

Python具有称为全局解释器锁(GIL)的构造。GIL确保您的“线程”只能在任何一次执行。线程获取GIL,做一些工作,然后将GIL传递到下一个线程。

这发生得非常快,以至于人眼似乎您的线程正在并行执行,但是实际上它们只是使用相同的CPU内核轮流执行。

所有这些GIL传递都会增加执行开销。这意味着,如果您想使代码运行更快,那么使用线程包通常不是一个好主意。

有理由使用Python的线程包。如果您想同时运行某些东西,而效率不是问题,那么它就很好而且很方便。或者,如果您正在运行的代码需要等待某些东西(例如某些I / O),那么这很有意义。但是线程库不允许您使用额外的CPU内核。

多线程可以外包给操作系统(通过执行多处理),某些外部应用程序可以调用Python代码(例如SparkHadoop),也可以外包给Python代码调用的某些代码(例如:让您的Python代码调用C函数来执行昂贵的多线程任务)。

为什么如此重要

因为很多人在学习GIL是什么之前,会花费大量时间试图在他们喜欢的Python多线程代码中找到瓶颈。

清除此信息后,这是我的代码:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create a quite a few more Pool processes than cores, since the processes
# will probably spend most their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)

I would like to contribute with a simple example and the explanations I’ve found useful when I had to tackle this problem myself.

In this answer you will find some information about Python’s GIL (global interpreter lock) and a simple day-to-day example written using multiprocessing.dummy plus some simple benchmarks.

Global Interpreter Lock (GIL)

Python doesn’t allow multi-threading in the truest sense of the word. It has a multi-threading package, but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it.

Python has a construct called the global interpreter lock (GIL). The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.

This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core.

All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn’t a good idea.

There are reasons to use Python’s threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it’s totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won’t let you use extra CPU cores.

Multi-threading can be outsourced to the operating system (by doing multi-processing), and some external application that calls your Python code (for example, Spark or Hadoop), or some code that your Python code calls (for example: you could have your Python code call a C function that does the expensive multi-threaded stuff).

Why This Matters

Because lots of people spend a lot of time trying to find bottlenecks in their fancy Python multi-threaded code before they learn what the GIL is.

Once this information is clear, here’s my code:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create a quite a few more Pool processes than cores, since the processes
# will probably spend most their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)

回答 15

这是带有一个简单示例的多线程,将很有帮助。您可以运行它并轻松了解Python中多线程的工作方式。在以前的线程完成其工作之前,我使用了一个锁来防止访问其他线程。通过使用这一行代码,

tLock = threading.BoundedSemaphore(值= 4)

您可以一次允许多个进程,并保留其余线程,这些线程将在以后的进程或之前的进程完成后运行。

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print  "\r\nTimer: ", name, " Started"
    tLock.acquire()
    print "\r\n", name, " has the acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print "\r\n", name, ": ", str(time.ctime(time.time()))
        repeat -= 1

    print "\r\n", name, " is releaseing the lock"
    tLock.release()
    print "\r\nTimer: ", name, " Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "\r\nMain Complete"

if __name__ == "__main__":
    Main()

Here is multi threading with a simple example which will be helpful. You can run it and understand easily how multi threading is working in Python. I used a lock for preventing access to other threads until the previous threads finished their work. By the use of this line of code,

tLock = threading.BoundedSemaphore(value=4)

you can allow a number of processes at a time and keep hold to the rest of the threads which will run later or after finished previous processes.

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print  "\r\nTimer: ", name, " Started"
    tLock.acquire()
    print "\r\n", name, " has the acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print "\r\n", name, ": ", str(time.ctime(time.time()))
        repeat -= 1

    print "\r\n", name, " is releaseing the lock"
    tLock.release()
    print "\r\nTimer: ", name, " Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "\r\nMain Complete"

if __name__ == "__main__":
    Main()

回答 16

通过从这篇文章中借用,我们知道在多线程,多处理和异步/ asyncio及其用法之间进行选择。

Python 3具有新的内置库以实现并发性和并行性:current.futures

因此,我将通过一个实验来演示如何通过以下方式运行四个任务(即.sleep()方法)Threading-Pool

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker=1):
    futures = []

    tick = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))

        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())

    print('Total elapsed time by {} workers:'.format(max_worker), time()-tick)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

输出:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[ 注意 ]:

  • 从以上结果中可以看出,最好的情况是这四个任务需要3个工作人员。
  • 如果您有流程任务而不是I / O绑定或阻止(multiprocessingthreading),则可以将更ThreadPoolExecutor改为ProcessPoolExecutor

With borrowing from this post we know about choosing between the multithreading, multiprocessing, and async/asyncio and their usage.

Python 3 has a new built-in library in order to concurrency and parallelism: concurrent.futures

So I’ll demonstrate through an experiment to run four tasks (i.e. .sleep() method) by Threading-Pool manner:

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker=1):
    futures = []

    tick = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))

        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())

    print('Total elapsed time by {} workers:'.format(max_worker), time()-tick)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

Output:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[NOTE]:

  • As you can see in the above results, the best case was 3 workers for those four tasks.
  • If you have a process task instead of I/O bound or blocking (multiprocessing vs threading) you could change the ThreadPoolExecutor to ProcessPoolExecutor.

回答 17

先前的解决方案均未在我的GNU / Linux服务器(我没有管理员权限)上实际使用多个内核。他们只是在一个核心上运行。

我使用了较低级别的os.fork界面来生成多个进程。这是对我有用的代码:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break

None of the previous solutions actually used multiple cores on my GNU/Linux server (where I don’t have administrator rights). They just ran on a single core.

I used the lower level os.fork interface to spawn multiple processes. This is the code that worked for me:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break

回答 18

import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send())
thread.append(t)
t.start()
import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send())
thread.append(t)
t.start()