I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.
I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?
Using the logging module, you can automatically add the current thread identifier to each log entry. Just use one of these LogRecord mapping keys in your logger format string: %(thread)d for the thread ID, or %(threadName)s for the thread name.
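As a minimal sketch of that setup (the format string and the thread name are arbitrary choices for illustration):
import logging
import threading

logging.basicConfig(
    format='%(asctime)s [%(threadName)s] %(message)s',
    level=logging.INFO,
)

def writeLog(message):
    # the logging module fills in %(threadName)s for the calling thread automatically
    logging.info(message)

threading.Thread(target=writeLog, args=('hello from a worker',), name='worker-1').start()
writeLog('hello from the main thread')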
The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.
I use this method to really get the thread id on Linux:
import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')
# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186
def getThreadId():
"""Returns OS thread id - Specific to Linux"""
return libc.syscall(SYS_gettid)
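A quick way to see it in action, reusing getThreadId() from the snippet above (the worker count and print format are arbitrary):
import threading

def work():
    # each OS thread should report a distinct kernel-level id
    print('%s has OS tid %d' % (threading.current_thread().name, getThreadId()))

for _ in range(3):
    threading.Thread(target=work).start()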
A thread has a name.
The name can be passed to the constructor,
and read or changed through the name attribute.
...
Thread.name
A string used for identification purposes only.
It has no semantics. Multiple threads may
be given the same name. The initial name is set by the constructor.
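So writeLog() could look the name up itself; a minimal sketch (writeLog is the function name from the question, the timestamp format is arbitrary):
import threading
import time

def writeLog(message):
    # current_thread().name identifies the caller without the threads passing anything in
    name = threading.current_thread().name
    print('%s [%s] %s' % (time.strftime('%H:%M:%S'), name, message))

writeLog('works from any thread')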
Similarly to @brucexin, I needed to get the OS-level thread identifier (which != thread.get_ident()) and to use something like the below, so as not to depend on particular syscall numbers or be amd64-only:
---- 8< ---- (xos.pyx)
"""module xos complements standard module os"""
cdef extern from "<sys/syscall.h>":
long syscall(long number, ...)
const int SYS_gettid
# gettid returns current OS thread identifier.
def gettid():
    return syscall(SYS_gettid)
and (test.py):
---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos
...
print 'my tid: %d' % xos.gettid()
I notice that it is often suggested to use queues with multiple threads, instead of lists and .pop(). Is this because lists are not thread-safe, or for some other reason?
Lists themselves are thread-safe. In CPython the GIL protects against concurrent accesses to them, and other implementations take care to use a fine-grained lock or a synchronized datatype for their list implementations. However, while lists themselves won't become corrupted by concurrent access, the list's data is not protected. For example:
L[0] += 1
is not guaranteed to actually increase L[0] by one if another thread does the same thing, because += is not an atomic operation. (Very, very few operations in Python are actually atomic, because most of them can cause arbitrary Python code to be called.) You should use Queues because if you just use an unprotected list, you may get or delete the wrong item because of race conditions.
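A minimal sketch of protecting that increment with a lock (the thread count is arbitrary):
import threading

L = [0]
lock = threading.Lock()

def safe_increment():
    # the lock makes the read-modify-write of += effectively atomic
    with lock:
        L[0] += 1

threads = [threading.Thread(target=safe_increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(L[0])  # reliably 100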
To clarify a point in Thomas' excellent answer, it should be mentioned that append() is thread-safe.
This is because there is no concern that data being read will be in the same place once we go to write to it. The append() operation does not read data, it only writes data to the list.
I recently had this case where I needed to append to a list continuously in one thread, loop through the items in another thread, check whether each item was ready (an AsyncResult in my case), and remove it from the list only if it was ready.
I could not find any examples that demonstrated my problem clearly.
Here is an example demonstrating adding to a list in one thread continuously and removing from the same list in another thread continuously.
The flawed version runs fine on small numbers, but keep the numbers big enough, run it a few times, and you will see the error.
The FLAWED version
import threading
import time
# Change this number as you please, bigger numbers will get the error quickly
count = 1000
l = []
def add():
for i in range(count):
l.append(i)
time.sleep(0.0001)
def remove():
for i in range(count):
l.remove(i)
time.sleep(0.0001)
t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()
print(l)
Output when ERROR
Exception in thread Thread-63:
Traceback (most recent call last):
File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Users/zup/.pyenv/versions/3.6.8/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "<ipython-input-30-ecfbac1c776f>", line 13, in remove
l.remove(i)
ValueError: list.remove(x): x not in list
Version that uses locks
import threading
import time
count = 1000
l = []
lock = threading.RLock()
def add():
with lock:
for i in range(count):
l.append(i)
time.sleep(0.0001)
def remove():
with lock:
for i in range(count):
l.remove(i)
time.sleep(0.0001)
t1 = threading.Thread(target=add)
t2 = threading.Thread(target=remove)
t1.start()
t2.start()
t1.join()
t2.join()
print(l)
Output
[] # Empty list
Conclusion
As mentioned in the earlier answers, while the act of appending or popping elements from the list itself is thread-safe, what is not thread-safe is when you append in one thread and pop in another.
A somewhat clumsy ascii-art to demonstrate the mechanism:
The join() is presumably called by the main-thread. It could also be called by another thread, but would needlessly complicate the diagram.
join-calling should be placed in the track of the main-thread, but to express thread-relation and keep it as simple as possible, I chose to place it in the child-thread instead.
without join:
+---+---+------------------ main-thread
| |
| +........... child-thread(short)
+.................................. child-thread(long)
with join:
+---+---+------------------***********+### main-thread
| | |
| +...........join() | child-thread(short)
+......................join()...... child-thread(long)
with join and daemon thread:
+-+--+---+------------------***********+### parent-thread
| | | |
| | +...........join() | child-thread(short)
| +......................join()...... child-thread(long)
+,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, child-thread(long + daemonized)
'-' main-thread/parent-thread/main-program execution
'.' child-thread execution
'#' optional parent-thread execution after join()-blocked parent-thread could
continue
'*' main-thread 'sleeping' in join-method, waiting for child-thread to finish
',' daemonized thread - 'ignores' lifetime of other threads;
terminates when main-programs exits; is normally meant for
join-independent tasks
So the reason you don’t see any changes is because your main-thread does nothing after your join.
You could say join is (only) relevant for the execution-flow of the main-thread.
If, for example, you want to concurrently download a bunch of pages to concatenate them into a single large page, you may start concurrent downloads using threads, but need to wait until the last page/thread is finished before you start assembling a single page out of many. That’s when you use join().
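A bare-bones sketch of that shape (the URL list and the fetch body are placeholders, not a real download):
import threading

pages = {}

def fetch(url):
    pages[url] = '<html>...</html>'  # placeholder for a real download

threads = [threading.Thread(target=fetch, args=(u,)) for u in ('url1', 'url2')]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until the last page/thread is finished
combined = ''.join(pages.values())  # only now is it safe to assemble the single page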
join([timeout])
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
This means that the main thread, which spawns t and d, waits for t to finish before it itself finishes.
Depending on the logic your program employs, you may want to wait until a thread finishes before your main thread continues.
Also from the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.
Here the master thread explicitly waits for the t thread to finish before it calls print the second time.
Alternatively if we had this:
print 'Test one'
print 'Test two'
t.join()
We’ll get this output:
Test one
Test two
Test non-daemon
Here we do our job in the main thread and then we wait for the t thread to finish. In this case we might even remove the explicit joining t.join() and the program will implicitly wait for t to finish.
Thanks for this topic – it has helped me a lot too.
I learned something about .join() today.
These threads run in parallel:
d.start()
t.start()
d.join()
t.join()
And these run sequentially (not what I wanted):
d.start()
d.join()
t.start()
t.join()
In particular, I was trying to be clever and tidy:
class Kiki(threading.Thread):
def __init__(self, time):
super(Kiki, self).__init__()
self.time = time
self.start()
self.join()
This works! But it runs sequentially. I can put the self.start() in __init__, but not the self.join(). That has to be done after every thread has been started.
join() is what causes the main thread to wait for your thread to finish. Otherwise, your thread runs all by itself.
So one way to think of join() is as a "hold" on the main thread – it sort of de-threads your thread and executes it sequentially in the main thread, before the main thread can continue. It assures that your thread is complete before the main thread moves forward. Note that this means it's also ok if your thread is already finished before you call the join() – the main thread is simply released immediately when join() is called.
In fact, it just now occurs to me that the main thread waits at d.join() until thread d finishes before it moves on to t.join().
In fact, to be very clear, consider this code:
import threading
import time
class Kiki(threading.Thread):
def __init__(self, time):
super(Kiki, self).__init__()
self.time = time
self.start()
def run(self):
print self.time, " seconds start!"
for i in range(0,self.time):
time.sleep(1)
print "1 sec of ", self.time
print self.time, " seconds finished!"
t1 = Kiki(3)
t2 = Kiki(2)
t3 = Kiki(1)
t1.join()
print "t1.join() finished"
t2.join()
print "t2.join() finished"
t3.join()
print "t3.join() finished"
It produces this output (note how the print statements are threaded into each other.)
$ python test_thread.py
32 seconds start! seconds start!1
seconds start!
1 sec of 1
1 sec of 1 seconds finished!
21 sec of
3
1 sec of 3
1 sec of 2
2 seconds finished!
1 sec of 3
3 seconds finished!
t1.join() finished
t2.join() finished
t3.join() finished
$
The t1.join() is holding up the main thread. All three threads complete before the t1.join() finishes and the main thread moves on to execute the print then t2.join() then print then t3.join() then print.
Corrections welcome. I’m also new to threading.
(Note: in case you’re interested, I’m writing code for a DrinkBot, and I need threading to run the ingredient pumps concurrently rather than sequentially — less time to wait for each drink.)
With join – the interpreter will wait until your thread gets completed or terminated
>>> from threading import Thread
>>> import time
>>> def sam():
... print 'started'
... time.sleep(10)
... print 'waiting for 10sec'
...
>>> t = Thread(target=sam)
>>> t.start()
started
>>> t.join() # with join interpreter will wait until your process get completed or terminated
done? # this line printed after thread execution stopped i.e after 10sec
waiting for 10sec
>>> done?
Without join – the interpreter won't wait until the thread gets terminated,
>>> t = Thread(target=sam)
>>> t.start()
started
>>> print 'yes done' #without join interpreter wont wait until process get terminated
yes done
>>> waiting for 10sec
When calling join(t) for both a non-daemon thread and a daemon thread, the main thread (or main process) waits t seconds, and can then go further to work on its own. During the t seconds of waiting, both of the child threads should do what they can, such as printing out some text. After the t seconds, if the non-daemon thread still didn't finish its job, it can still finish it after the main process finishes its own; but the daemon thread just missed its opportunity window and will eventually die when the Python program exits. Please correct me if there is something wrong.
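A minimal sketch of that timing (the sleep durations and the 1-second join timeouts are arbitrary):
import threading
import time

def worker(label, delay):
    time.sleep(delay)
    print(label + ' finished')

t = threading.Thread(target=worker, args=('non-daemon', 2))
d = threading.Thread(target=worker, args=('daemon', 5))
d.daemon = True
t.start()
d.start()
t.join(1)  # wait at most 1 second for each child
d.join(1)
print('main done')  # printed after about 2 seconds of waiting
# The non-daemon thread still gets to print afterwards; the daemon is killed
# at interpreter exit and never reaches its print.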
In Python 3.x, join() is used to join a thread with the calling thread (usually the main thread), i.e. when join() is called on a particular thread, the calling thread will stop executing until the execution of the joined thread is complete.
#1 - Without Join():
import threading
import time
def loiter():
print('You are loitering!')
time.sleep(5)
print('You are not loitering anymore!')
t1 = threading.Thread(target = loiter)
t1.start()
print('Hey, I do not want to loiter!')
'''
Output without join()-->
You are loitering!
Hey, I do not want to loiter!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
'''
#2 - With Join():
import threading
import time
def loiter():
print('You are loitering!')
time.sleep(5)
print('You are not loitering anymore!')
t1 = threading.Thread(target = loiter)
t1.start()
t1.join()
print('Hey, I do not want to loiter!')
'''
Output with join() -->
You are loitering!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
Hey, I do not want to loiter!
'''
This example demonstrates the .join() operation:
import threading
import time
def threaded_worker():
for r in range(10):
print('Other: ', r)
time.sleep(2)
thread_ = threading.Timer(1, threaded_worker)
thread_.daemon = True  # If the main thread exits, this thread will be killed too.
thread_.start()
flag = True
for i in range(10):
print('Main: ', i)
time.sleep(2)
if flag and i > 4:
print(
'''
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
''')
thread_.join()
flag = False
Out:
Main: 0
Other: 0
Main: 1
Other: 1
Main: 2
Other: 2
Main: 3
Other: 3
Main: 4
Other: 4
Main: 5
Other: 5
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
Other: 6
Other: 7
Other: 8
Other: 9
Main: 6
Main: 7
Main: 8
Main: 9
“What’s the use of using join()?” you say. Really, it’s the same answer as “what’s the use of closing files, since python and the OS will close my file for me when my program exits?”.
It’s simply a matter of good programming. You should join() your threads at the point in the code that the thread should not be running anymore, either because you positively have to ensure the thread is not running to interfere with your own code, or that you want to behave correctly in a larger system.
You might say “I don’t want my code to delay giving an answer” just because of the additional time that the join() might require. This may be perfectly valid in some scenarios, but you now need to take into account that your code is “leaving cruft around for python and the OS to clean up”. If you do this for performance reasons, I strongly encourage you to document that behavior. This is especially true if you’re building a library/package that others are expected to utilize.
There’s no reason to not join(), other than performance reasons, and I would argue that your code does not need to perform that well.
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread.
Does anyone have a clearer explanation of what that means or a practical example showing where you would set threads as daemonic?
Clarify it for me: so the only situation you wouldn’t set threads as daemonic, is when you want them to continue running after the main thread exits?
Some threads do background tasks, like sending keepalive packets, or performing periodic garbage collection, or whatever. These are only useful when the main program is running, and it’s okay to kill them off once the other, non-daemon, threads have exited.
Without daemon threads, you’d have to keep track of them, and tell them to exit, before your program can completely quit. By setting them as daemon threads, you can let them run and forget about them, and when your program quits, any daemon threads are killed automatically.
Let’s say you’re making some kind of dashboard widget. As part of this, you want it to display the unread message count in your email box. So you make a little thread that will:
Connect to the mail server and ask how many unread messages you have.
Signal the GUI with the updated count.
Sleep for a little while.
When your widget starts up, it would create this thread, designate it a daemon, and start it. Because it’s a daemon, you don’t have to think about it; when your widget exits, the thread will stop automatically.
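A hedged sketch of that mail-checker thread (every name and the 10-second interval are invented for illustration):
import threading
import time

def check_mail_loop():
    while True:  # an infinite loop is fine in a daemon: it dies with the program
        unread = 3  # placeholder for asking the mail server
        print('unread messages: %d' % unread)  # placeholder for signaling the GUI
        time.sleep(10)

t = threading.Thread(target=check_mail_loop)
t.daemon = True  # when the widget exits, the thread stops automatically
t.start()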
Other posters gave some examples for situations in which you’d use daemon threads. My recommendation, however, is never to use them.
It’s not because they’re not useful, but because there are some bad side effects you can experience if you use them. Daemon threads can still execute after the Python runtime starts tearing down things in the main thread, causing some pretty bizarre exceptions.
A simpler way to think about it, perhaps: when main returns, your process will not exit if there are non-daemon threads still running.
A bit of advice: Clean shutdown is easy to get wrong when threads and synchronization are involved – if you can avoid it, do so. Use daemon threads whenever possible.
Chris already explained what daemon threads are, so let's talk about practical usage. Many thread pool implementations use daemon threads for task workers. Workers are threads which execute tasks from the task queue.
Workers need to keep waiting for tasks in the task queue indefinitely, as they don't know when a new task will appear. The thread which assigns tasks (say, the main thread) only knows when the tasks are over. The main thread waits on the task queue to get empty and then exits. If the workers are user threads, i.e. non-daemon, the program won't terminate. It will keep waiting for these indefinitely running workers, even though the workers aren't doing anything useful. Mark the workers daemon threads, and the main thread will take care of killing them as soon as it's done handling tasks.
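A minimal sketch of that worker pattern (queue contents and worker count are arbitrary):
import queue  # 'Queue' in Python 2
import threading

q = queue.Queue()

def worker():
    while True:  # waits for tasks forever; the daemon flag lets the program exit anyway
        item = q.get()
        print('processed %d' % item)
        q.task_done()

for _ in range(3):
    t = threading.Thread(target=worker)
    t.daemon = True  # the main thread won't be held back by these at exit
    t.start()

for i in range(10):
    q.put(i)
q.join()  # the main thread waits until the queue is drained, then exits, killing the daemons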
Quoting Chris: "… when your program quits, any daemon threads are killed automatically." I think that sums it up. You should be careful when you use them, as they terminate abruptly when the main program executes to completion.
When your second thread is non-daemon, your application's primary main thread cannot quit, because its exit criterion is also tied to the exit of the non-daemon thread(s). Threads cannot be forcibly killed in Python, therefore your app will have to really wait for the non-daemon thread(s) to exit. If this behavior is not what you want, then set your second thread as daemon so that it won't hold back your application from exiting.
You may use the signal package if you are running on UNIX:
In [1]: import signal
# Register an handler for the timeout
In [2]: def handler(signum, frame):
...: print("Forever is over!")
...: raise Exception("end of time")
...:
# This function *may* run for an indetermined time...
In [3]: def loop_forever():
...: import time
...: while 1:
...: print("sec")
...: time.sleep(1)
...:
...:
# Register the signal function handler
In [4]: signal.signal(signal.SIGALRM, handler)
Out[4]: 0
# Define a timeout for your function
In [5]: signal.alarm(10)
Out[5]: 0
In [6]: try:
...: loop_forever()
...: except Exception, exc:
...: print(exc)
....:
sec
sec
sec
sec
sec
sec
sec
sec
Forever is over!
end of time
# Cancel the timer if the function returned before timeout
# (ok, mine won't but yours maybe will :)
In [7]: signal.alarm(0)
Out[7]: 0
10 seconds after the call signal.alarm(10), the handler is called. This raises an exception that you can intercept from the regular Python code.
This module doesn’t play well with threads (but then, who does?)
Note that since we raise an exception when timeout happens, it may end up caught and ignored inside the function, for example of one such function:
def loop_forever():
while 1:
print('sec')
try:
time.sleep(10)
except:
continue
回答 1
您可以multiprocessing.Process用来精确地做到这一点。
码
import multiprocessing
import time
# bardef bar():for i in range(100):print"Tick"
time.sleep(1)if __name__ =='__main__':# Start bar as a process
p = multiprocessing.Process(target=bar)
p.start()# Wait for 10 seconds or until process finishes
p.join(10)# If thread is still activeif p.is_alive():print"running... let's kill it..."# Terminate
p.terminate()
p.join()
You can use multiprocessing.Process to do exactly that.
Code
import multiprocessing
import time
# bar
def bar():
for i in range(100):
print "Tick"
time.sleep(1)
if __name__ == '__main__':
# Start bar as a process
p = multiprocessing.Process(target=bar)
p.start()
# Wait for 10 seconds or until process finishes
p.join(10)
# If thread is still active
if p.is_alive():
print "running... let's kill it..."
# Terminate - may not work if process is stuck for good
p.terminate()
# OR Kill - will work for sure, no chance for process to finish nicely however
# p.kill()
p.join()
import sys
import threading
from time import sleep
try:
    import thread
except ImportError:
    import _thread as thread  # use the Python 3 module name if the Python 2 one is missing

Now we have imported our functionality from the standard library.
exit_after decorator
Next we need a function to terminate the main() from the child thread:
def quit_function(fn_name):
# print to stderr, unbuffered in Python 2.
print('{0} took too long'.format(fn_name), file=sys.stderr)
sys.stderr.flush() # Python 3 stderr is likely buffered.
thread.interrupt_main() # raises KeyboardInterrupt
And here is the decorator itself:
def exit_after(s):
'''
use as decorator to exit process if
function takes longer than s seconds
'''
def outer(fn):
def inner(*args, **kwargs):
timer = threading.Timer(s, quit_function, args=[fn.__name__])
timer.start()
try:
result = fn(*args, **kwargs)
finally:
timer.cancel()
return result
return inner
return outer
Usage
And here’s the usage that directly answers your question about exiting after 5 seconds!:
@exit_after(5)
def countdown(n):
print('countdown started', flush=True)
for i in range(n, -1, -1):
print(i, end=', ', flush=True)
sleep(1)
print('countdown finished')
Demo:
>>> countdown(3)
countdown started
3, 2, 1, 0, countdown finished
>>> countdown(10)
countdown started
10, 9, 8, 7, 6, countdown took too long
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 11, in inner
File "<stdin>", line 6, in countdown
KeyboardInterrupt
The second function call will not finish, instead the process should exit with a traceback!
KeyboardInterrupt does not always stop a sleeping thread
Note that sleep will not always be interrupted by a keyboard interrupt, on Python 2 on Windows, e.g.:
@exit_after(1)
def sleep10():
sleep(10)
print('slept 10 seconds')
>>> sleep10()
sleep10 took too long # Note that it hangs here about 9 more seconds
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 11, in inner
File "<stdin>", line 3, in sleep10
KeyboardInterrupt

And if you catch the KeyboardInterrupt in the caller, you can handle the timeout and carry on:
>>> try:
...     countdown(10)
... except KeyboardInterrupt:
...     print('do something else')
...
countdown started
10, 9, 8, 7, 6, countdown took too long
do something else
I have a different proposal: a pure function (with the same API as the threading suggestion) that seems to work fine (based on suggestions on this thread).
def timeout(func, args=(), kwargs={}, timeout_duration=1, default=None):
import signal
class TimeoutError(Exception):
pass
def handler(signum, frame):
raise TimeoutError()
# set the timeout handler
signal.signal(signal.SIGALRM, handler)
signal.alarm(timeout_duration)
try:
result = func(*args, **kwargs)
except TimeoutError as exc:
result = default
finally:
signal.alarm(0)
return result
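A hedged usage sketch of the function above (the values are arbitrary; note that signal.alarm is Unix-only and must run in the main thread):
import time

# time.sleep(3) exceeds the 1-second timeout, so the default is returned
print(timeout(time.sleep, args=(3,), timeout_duration=1, default='too slow'))
# a fast call returns its own result (None for time.sleep)
print(timeout(time.sleep, args=(0,), timeout_duration=1, default='too slow'))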
I ran across this thread when searching for a timeout call on unit tests. I didn’t find anything simple in the answers or 3rd party packages so I wrote the decorator below you can drop right into code:
import multiprocessing.pool
import functools
def timeout(max_timeout):
"""Timeout decorator, parameter in seconds."""
def timeout_decorator(item):
"""Wrap the original function."""
@functools.wraps(item)
def func_wrapper(*args, **kwargs):
"""Closure for function."""
pool = multiprocessing.pool.ThreadPool(processes=1)
async_result = pool.apply_async(item, args, kwargs)
# raises a TimeoutError if execution exceeds max_timeout
return async_result.get(max_timeout)
return func_wrapper
return timeout_decorator
Then it’s as simple as this to timeout a test or any function you like:
@timeout(5.0) # if execution takes longer than 5 seconds, raise a TimeoutError
def test_base_regression(self):
...
The stopit package, found on pypi, seems to handle timeouts well.
I like the @stopit.threading_timeoutable decorator, which adds a timeout parameter to the decorated function, which does what you expect, it stops the function.
There are a lot of suggestions, but none using concurrent.futures, which I think is the most legible way to handle this.
from concurrent.futures import ProcessPoolExecutor
# Warning: this does not terminate function if timeout
def timeout_five(fnc, *args, **kwargs):
with ProcessPoolExecutor() as p:
f = p.submit(fnc, *args, **kwargs)
return f.result(timeout=5)
Super simple to read and maintain.
We make a pool, submit a single process and then wait up to 5 seconds before raising a TimeoutError that you could catch and handle however you needed.
Native to python 3.2+ and backported to 2.7 (pip install futures).
Switching between threads and processes is as simple as replacing ProcessPoolExecutor with ThreadPoolExecutor.
If you want to terminate the Process on timeout I would suggest looking into Pebble.
import time
import timeout_decorator
@timeout_decorator.timeout(5)
def mytest():
print "Start"
for i in range(1,10):
time.sleep(1)
print "%d seconds have passed" % i
if __name__ == '__main__':
mytest()
Most of the solutions presented here work wonderfully under Linux at first glance – because we have fork() and signals – but on Windows things look a bit different.
And when it comes to subthreads on Linux, you can't use signals anymore.
In order to spawn a process under Windows, it needs to be picklable – and many decorated functions or class methods are not.
So you need to use a better pickler like dill and multiprocess (not pickle and multiprocessing) – that's why you can't use ProcessPoolExecutor (or only with limited functionality).
For the timeout itself – you need to define what timeout means, because on Windows it will take considerable (and not determinable) time to spawn the process. This can be tricky on short timeouts. Let's assume spawning the process takes about 0.5 seconds (easily!). If you give a timeout of 0.2 seconds, what should happen?
Should the function time out after 0.5 + 0.2 seconds (so let the method run for 0.2 seconds)?
Or should the called process time out after 0.2 seconds (in that case, the decorated function will ALWAYS time out, because in that time it is not even spawned)?
Also, nested decorators can be nasty, and you can't use signals in a subthread. If you want to create a truly universal, cross-platform decorator, all this needs to be taken into consideration (and tested).
Other issues are passing exceptions back to the caller, as well as logging issues (if used in the decorated function – logging to files in another process is NOT supported).
I tried to cover all the edge cases; you might look into the package wrapt_timeout_decorator, or at least test your own solutions inspired by the unittests used there.
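A small sketch of the pickling point (the decorator here is invented purely for illustration):
import pickle

def with_wrapper(fn):
    def wrapper(*args, **kwargs):  # a closure, defined inside another function
        return fn(*args, **kwargs)
    return wrapper

@with_wrapper
def task():
    return 42

# pickle serializes functions by module-level name; the closure 'wrapper'
# cannot be looked up that way, so serialization fails
try:
    pickle.dumps(task)
except Exception as exc:
    print('cannot pickle: %s' % exc)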
@Alexis Eggermont – unfortunately I don't have enough points to comment – maybe someone else can notify you – I think I solved your import issue.
import time
from wrapt_timeout_decorator import *
@timeout(5)
def mytest(message):
print(message)
for i in range(1,10):
time.sleep(1)
print('{} seconds have passed'.format(i))
def main():
mytest('starting')
if __name__ == '__main__':
main()
Gives the following exception:
TimeoutError: Function mytest timed out after 5 seconds
We can use signals for the same. I think the below example will be useful for you. It is very simple compared to threads.
import signal

class myException(Exception):
    pass

def timeout(signum, frame):
    raise myException
#this is an infinite loop, never ending under normal circumstances
def main():
print 'Starting Main ',
while 1:
print 'in main ',
#SIGALRM is only usable on a unix platform
signal.signal(signal.SIGALRM, timeout)
#change 5 to however many seconds you need
signal.alarm(5)
try:
main()
except myException:
print "whoops"
#!/usr/bin/python
# lightly modified version of http://code.activestate.com/recipes/577600-queue-for-managing-multiple-sigalrm-alarms-concurr/
"""alarm.py: Permits multiple SIGALRM events to be queued.
Uses a `heapq` to store the objects to be called when an alarm signal is
raised, so that the next alarm is always at the top of the heap.
"""
import heapq
import signal
from time import time
__version__ = '$Revision: 2539 $'.split()[1]
alarmlist = []
__new_alarm = lambda t, f, a, k: (t + time(), f, a, k)
__next_alarm = lambda: int(round(alarmlist[0][0] - time())) if alarmlist else None
__set_alarm = lambda: signal.alarm(max(__next_alarm(), 1))
class TimeoutError(Exception):
def __init__(self, message, id_=None):
self.message = message
self.id_ = id_
class Timeout:
''' id_ allows for nested timeouts. '''
def __init__(self, id_=None, seconds=1, error_message='Timeout'):
self.seconds = seconds
self.error_message = error_message
self.id_ = id_
def handle_timeout(self):
raise TimeoutError(self.error_message, self.id_)
def __enter__(self):
self.this_alarm = alarm(self.seconds, self.handle_timeout)
def __exit__(self, type, value, traceback):
try:
cancel(self.this_alarm)
except ValueError:
pass
def __clear_alarm():
"""Clear an existing alarm.
If the alarm signal was set to a callable other than our own, queue the
previous alarm settings.
"""
oldsec = signal.alarm(0)
oldfunc = signal.signal(signal.SIGALRM, __alarm_handler)
if oldsec > 0 and oldfunc != __alarm_handler:
heapq.heappush(alarmlist, (__new_alarm(oldsec, oldfunc, [], {})))
def __alarm_handler(*zargs):
"""Handle an alarm by calling any due heap entries and resetting the alarm.
Note that multiple heap entries might get called, especially if calling an
entry takes a lot of time.
"""
try:
nextt = __next_alarm()
while nextt is not None and nextt <= 0:
(tm, func, args, keys) = heapq.heappop(alarmlist)
func(*args, **keys)
nextt = __next_alarm()
finally:
if alarmlist: __set_alarm()
def alarm(sec, func, *args, **keys):
"""Set an alarm.
When the alarm is raised in `sec` seconds, the handler will call `func`,
passing `args` and `keys`. Return the heap entry (which is just a big
tuple), so that it can be cancelled by calling `cancel()`.
"""
__clear_alarm()
try:
newalarm = __new_alarm(sec, func, args, keys)
heapq.heappush(alarmlist, newalarm)
return newalarm
finally:
__set_alarm()
def cancel(alarm):
"""Cancel an alarm by passing the heap entry returned by `alarm()`.
It is an error to try to cancel an alarm which has already occurred.
"""
__clear_alarm()
try:
alarmlist.remove(alarm)
heapq.heapify(alarmlist)
finally:
if alarmlist: __set_alarm()
and a usage example:
import alarm
from time import sleep
try:
with alarm.Timeout(id_='a', seconds=5):
try:
with alarm.Timeout(id_='b', seconds=2):
sleep(3)
except alarm.TimeoutError as e:
print 'raised', e.id_
sleep(30)
except alarm.TimeoutError as e:
print 'raised', e.id_
else:
print 'nope.'
I am trying to understand threading in Python. I’ve looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I’m having trouble understanding them.
How do you clearly show tasks being divided for multi-threading?
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)
Which is the multithreaded version of:
results = []
for item in my_array:
results.append(my_function(item))
Description
Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.
Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
Implementation
Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little-known, but equally fantastic stepchild: multiprocessing.dummy.
multiprocessing.dummy is exactly the same as multiprocessing module, but uses threads instead (an important distinction – use multiple processes for CPU-intensive tasks; threads for (and during) I/O):
multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
urls = [
'http://www.python.org',
'http://www.python.org/about/',
'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
'http://www.python.org/doc/',
'http://www.python.org/download/',
'http://www.python.org/getit/',
'http://www.python.org/community/',
'https://wiki.python.org/moin/',
]
# Make the Pool of workers
pool = ThreadPool(4)
# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)
# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread: 14.4 seconds
4 Pool: 3.1 seconds
8 Pool: 1.4 seconds
13 Pool: 1.3 seconds
Here’s a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.
import Queue
import threading
import urllib2
# Called by each thread
def get_url(q, url):
q.put(urllib2.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com"]
q = Queue.Queue()
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
print s
This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won’t keep the process up if main thread ends — that’s more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they’re daemon threads).
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn’t use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there’s a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work’s results, by the way, and they’re intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).
However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let’s consider the problem of summing a large range by summing subranges in parallel:
import threading
class SummingThread(threading.Thread):
def __init__(self,low,high):
super(SummingThread, self).__init__()
self.low=low
self.high=high
self.total=0
def run(self):
for i in range(self.low,self.high):
self.total+=i
thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join() # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result
Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially albeit interleaved (with the added overhead of context switching) in CPython due to the global interpreter lock.
Like others mentioned, CPython can use threads only for I/O waits due to GIL.
If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:
from multiprocessing import Process
def f(name):
print 'hello', name
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()
p.join()
Just a note: A queue is not required for threading.
This is the simplest example I could imagine that shows multiple threads running concurrently.
import threading
from random import randint
from time import sleep
def print_number(number):
# Sleeps a random 1 to 10 seconds
rand_int_var = randint(1, 10)
sleep(rand_int_var)
print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"
thread_list = []
for i in range(1, 10):
# Instantiates the thread
# (i) does not make a sequence, so (i,)
t = threading.Thread(target=print_number, args=(i,))
# Sticks the thread in a list so that it remains accessible
thread_list.append(t)
# Starts threads
for thread in thread_list:
thread.start()
# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
thread.join()
# Demonstrates that the main process waited for threads to complete
print "Done"
The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).
Updated: works in both Python 2 and Python 3
try:
# For Python 3
import queue
from urllib.request import urlopen
except:
# For Python 2
import Queue as queue
from urllib2 import urlopen
import threading
worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
q.put(url)
# Define a worker function
def worker(url_queue):
queue_full = True
while queue_full:
try:
# Get your data off the queue, and do some work
url = url_queue.get(False)
data = urlopen(url).read()
print(len(data))
except queue.Empty:
queue_full = False
# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
t = threading.Thread(target=worker, args = (q,))
t.start()
I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):
import Queue
import threading
import multiprocessing
import subprocess
q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
q.put(i)
def worker():
while True:
item = q.get()
# Execute a task: call a shell program and wait until it completes
subprocess.call("echo " + str(item), shell=True)
q.task_done()
cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
t = threading.Thread(target=worker)
t.daemon = True
t.start()
q.join() # Block until all tasks are done
import concurrent.futures
import urllib.request
URLS = ['http://www.foxnews.com/',
'http://www.cnn.com/',
'http://europe.wsj.com/',
'http://www.bbc.co.uk/',
'http://some-made-up-domain.com/']
# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
with urllib.request.urlopen(url, timeout=timeout) as conn:
return conn.read()
# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
# Start the load operations and mark each future with its URL
future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
except Exception as exc:
print('%r generated an exception: %s' % (url, exc))
else:
print('%r page is %d bytes' % (url, len(data)))
import concurrent.futures
import math
PRIMES = [
112272535095293,
112582705942171,
112272535095293,
115280095190773,
115797848077099,
1099726899285419]
def is_prime(n):
if n % 2 == 0:
return False
sqrt_n = int(math.floor(math.sqrt(n)))
for i in range(3, sqrt_n + 1, 2):
if n % i == 0:
return False
return True
def main():
with concurrent.futures.ProcessPoolExecutor() as executor:
for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
print('%d is prime: %s' % (number, prime))
if __name__ == '__main__':
main()
def sqr(val):
import time
time.sleep(0.1)
return val * val
def process_result(result):
print(result)
def process_these_asap(tasks):
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor() as executor:
futures = []
for task in tasks:
futures.append(executor.submit(sqr, task))
for future in concurrent.futures.as_completed(futures):
process_result(future.result())
# Or instead of all this just do:
# results = executor.map(sqr, tasks)
# list(map(process_result, results))
def main():
tasks = list(range(10))
print('Processing {} tasks'.format(len(tasks)))
process_these_asap(tasks)
print('Done')
return 0
if __name__ == '__main__':
import sys
sys.exit(main())
The executor approach might seem familiar to all those who have gotten their hands dirty with Java before.
Also on a side note: To keep the universe sane, don’t forget to close your pools/executors if you don’t use with context (which is so awesome that it does it for you)
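A tiny sketch of the manual variant (the worker count and the submitted call are arbitrary):
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(pow, 2, 10)
print(future.result())  # 1024
executor.shutdown(wait=True)  # what the with-block would otherwise do for you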
Most documentation and tutorials use Python's threading and Queue modules, and they can seem overwhelming for beginners.
Perhaps consider the concurrent.futures.ThreadPoolExecutor module of Python 3.
Combined with the with clause and a set comprehension, it can be a real charm.
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_url(url):
# Your actual program here. Using threading.Lock() if necessary
return ""
# List of URLs to fetch
urls = ["url1", "url2"]
with ThreadPoolExecutor(max_workers=5) as executor:
# Create threads
futures = {executor.submit(get_url, url) for url in urls}
    # as_completed() yields each future as soon as it finishes
for f in as_completed(futures):
# Get the results
rs = f.result()
I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I compare a sequential baseline and four parallel methods here:
import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def time_stuff(fn):
"""
Measure time of execution of a function
"""
def wrapper(*args, **kwargs):
t0 = timeit.default_timer()
fn(*args, **kwargs)
t1 = timeit.default_timer()
print("{} seconds".format(t1 - t0))
return wrapper
def find_primes_in(nmin, nmax):
"""
Compute a list of prime numbers between the given minimum and maximum arguments
"""
primes = []
# Loop from minimum to maximum
for current in range(nmin, nmax + 1):
# Take the square root of the current number
sqrt_n = int(math.sqrt(current))
found = False
        # Check whether any number from 2 to the square root + 1 divides the current number under consideration
for number in range(2, sqrt_n + 1):
            # If divisible, we have found a factor, hence this is not a prime number; let's move on to the next one
if current % number == 0:
found = True
break
# If not divisible, add this number to the list of primes that we have found so far
if not found:
primes.append(current)
# I am merely printing the length of the array containing all the primes, but feel free to do what you want
print(len(primes))
@time_stuff
def sequential_prime_finder(nmin, nmax):
"""
Use the main process and main thread to compute everything in this case
"""
find_primes_in(nmin, nmax)
@time_stuff
def threading_prime_finder(nmin, nmax):
"""
If the minimum is 1000 and the maximum is 2000 and we have four workers,
1000 - 1250 to worker 1
1250 - 1500 to worker 2
1500 - 1750 to worker 3
1750 - 2000 to worker 4
so let’s split the minimum and maximum values according to the number of workers
"""
nrange = nmax - nmin
threads = []
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
# Start the thread with the minimum and maximum split up to compute
# Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target=find_primes_in, args=(start, end))
threads.append(t)
t.start()
# Don’t forget to wait for the threads to finish
for t in threads:
t.join()
@time_stuff
def processing_prime_finder(nmin, nmax):
"""
Split the minimum, maximum interval similar to the threading method above, but use processes this time
"""
nrange = nmax - nmin
processes = []
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target=find_primes_in, args=(start, end))
processes.append(p)
p.start()
for p in processes:
p.join()
@time_stuff
def thread_executor_prime_finder(nmin, nmax):
"""
Split the min max interval similar to the threading method, but use a thread pool executor this time.
This method is slightly faster than using pure threading as the pools manage threads more efficiently.
This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
"""
nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers=8) as e:
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
e.submit(find_primes_in, start, end)
@time_stuff
def process_executor_prime_finder(nmin, nmax):
"""
Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far, as it manages processes efficiently and overcomes GIL limitations.
RECOMMENDED METHOD FOR CPU-BOUND TASKS
"""
nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers=8) as e:
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
e.submit(find_primes_in, start, end)
def main():
nmin = int(1e7)
nmax = int(1.05e7)
print("Sequential Prime Finder Starting")
sequential_prime_finder(nmin, nmax)
print("Threading Prime Finder Starting")
threading_prime_finder(nmin, nmax)
print("Processing Prime Finder Starting")
processing_prime_finder(nmin, nmax)
print("Thread Executor Prime Finder Starting")
thread_executor_prime_finder(nmin, nmax)
print("Process Executor Finder Starting")
process_executor_prime_finder(nmin, nmax)
main()
Here are the results on my Mac OS X four-core machine:
Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds
Here is a very simple example of CSV import using threading. (Library imports may differ for different purposes.)
Helper Functions:
from threading import Thread
from project import app
import csv
def import_handler(csv_file_name):
thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
thr.start()
def dump_async_csv_data(csv_file_name):
with app.app_context():
        with open(csv_file_name) as csv_file:
            reader = csv.DictReader(csv_file)
            for row in reader:
# DB operation/query
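A hypothetical call site (the file path is illustrative): the handler returns immediately while the CSV is processed in a background thread.
import_handler('uploads/data.csv')  # returns right away; dump_async_csv_data runs in the background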
I would like to contribute a simple example and the explanations I've found useful when I had to tackle this problem myself.
In this answer you will find some information about Python's GIL (global interpreter lock) and a simple day-to-day example written using multiprocessing.dummy, plus some simple benchmarks. Note that, despite its name, multiprocessing.dummy replicates the multiprocessing API as a thin wrapper around the threading module, so the Pool used below is actually a thread pool.
Global Interpreter Lock (GIL)
Python doesn’t allow multi-threading in the truest sense of the word. It has a multi-threading package, but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it.
Python has a construct called the global interpreter lock (GIL).
The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.
This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core.
All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading
package often isn’t a good idea.
There are reasons to use Python’s threading package. If you want to run some things simultaneously, and efficiency is not a concern,
then it’s totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won’t let you use extra CPU cores.
Multi-threading can be outsourced to the operating system (by doing multi-processing), to some external application that calls your Python code (for example, Spark or Hadoop), or to some code that your Python code calls (for example: you could have your Python code call a C function that does the expensive multi-threaded stuff).
Why This Matters
Because lots of people spend a lot of time trying to find bottlenecks in their fancy Python multi-threaded code before they learn what the GIL is.
Once this information is clear, here’s my code:
#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE, Popen
import time
import os
# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create quite a few more Pool processes than cores, since the processes
# will probably spend most of their time blocked (waiting for I/O to complete).
pool_size = 8
def do_ping(ip):
    if os.name == 'nt':
        print("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]
os.system('cls' if os.name == 'nt' else 'clear')
print("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com", "www.facebook.com", "www.pinterest.com", "www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print("\n--- Execution took {} seconds ---".format((time.time() - start_time)))
# Now we do the same without threading, just to compare time
print("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print("\n--- Execution took {} seconds ---".format((time.time() - start_time)))
# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print("\nOutput aggregated in a Dictionary:")
print(output)
print("\n")
print("\nPretty printed output: ")
for key, value in output.items():
    print(key + "\n")
    print(value)
Here is multi-threading with a simple example which will be helpful. You can run it and easily see how multi-threading works in Python. I used a bounded semaphore to prevent other threads from proceeding until the previous threads have finished their work. With this line of code,
tLock = threading.BoundedSemaphore(value=4)
you can allow up to four threads at a time; the rest are held back and run later, after the previous threads have finished.
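A minimal sketch of such a semaphore-gated worker (the worker function, its sleep, and the thread count are illustrative assumptions, not the author's original code):
import threading
import time

tLock = threading.BoundedSemaphore(value=4)  # at most four threads proceed at once

def worker(n):
    with tLock:               # blocks while four other threads hold the semaphore
        print('thread %d working' % n)
        time.sleep(1)         # simulate some work; the semaphore is released on exit

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()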
Borrowing from this post, we know about choosing between multithreading, multiprocessing, and async/asyncio, and their usage.
Python 3 has a new built-in library for concurrency and parallelism: concurrent.futures.
So I'll demonstrate, through an experiment, running four tasks (i.e., sleep() calls) in a thread-pool manner:
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time
def concurrent(max_worker=1):
futures = []
tick = time()
with ThreadPoolExecutor(max_workers=max_worker) as executor:
futures.append(executor.submit(sleep, 2)) # Two seconds sleep
futures.append(executor.submit(sleep, 1))
futures.append(executor.submit(sleep, 7))
futures.append(executor.submit(sleep, 3))
for future in as_completed(futures):
if future.result() is not None:
print(future.result())
print('Total elapsed time by {} workers:'.format(max_worker), time()-tick)
concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)
Output:
Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507
[NOTE]:
As you can see in the above results, the best case was 3 workers for those four tasks: the total time is bounded by the longest task (7 seconds), and with 3 workers the remaining tasks fit alongside it. With 2 workers, one worker ends up running the 1-second and then the 7-second task back to back (8 seconds), and with 1 worker all four run sequentially (2 + 1 + 7 + 3 = 13 seconds).
If you have a CPU-bound task instead of an I/O-bound or blocking one (multiprocessing vs threading), you can change the ThreadPoolExecutor to ProcessPoolExecutor, as sketched below.
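A minimal sketch of that swap (the CPU-bound task, summing a large range, is illustrative); only the executor class changes:
from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':  # required for process pools on platforms that spawn workers
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(sum, range(10**7)) for _ in range(4)]
        for future in futures:
            print(future.result())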
None of the previous solutions actually used multiple cores on my GNU/Linux server (where I don’t have administrator rights). They just ran on a single core.
I used the lower-level os.fork interface to spawn multiple processes. This is the code that worked for me:
import os

values = ['different', 'values', 'for', 'threads']
for i in range(len(values)):
    pid = os.fork()
    if pid == 0:                 # we are in a child process
        my_function(values[i])   # the real work (defined elsewhere)
        os._exit(0)              # the child must exit here, or it keeps running the loop itself
# the parent waits for all of its children to finish
for _ in values:
    os.wait()
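Since os.fork is Unix-only, a portable near-equivalent (a sketch reusing the same values list and my_function) uses multiprocessing:
from multiprocessing import Process

processes = [Process(target=my_function, args=(v,)) for v in values]
for p in processes:
    p.start()
for p in processes:
    p.join()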
import threading
import requests

def send():
    r = requests.get('https://www.stackoverflow.com')

threads = []
t = threading.Thread(target=send)  # pass the function itself; target=send() would run it in the main thread
threads.append(t)
t.start()
t.join()
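The list in the original suggests more than one thread was intended; a sketch of that pattern (the URL list is illustrative, and send() is reworked here to take the URL as an argument):
def send(url):
    print(url, requests.get(url).status_code)

threads = []
for url in ['https://www.stackoverflow.com', 'https://www.python.org']:
    t = threading.Thread(target=send, args=(url,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()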