Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey – i.e., adding timing code to __main__.
What is a good way to profile how long a Python program takes to run?
Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.
You can call it from within your code, or from the interpreter, like this:
import cProfile
cProfile.run('foo()')
Even more usefully, you can invoke cProfile when running a script:
python -m cProfile myscript.py
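As a quick sketch (the script path and its contents below are just a stand-in), the module also accepts an -s flag to sort the report, e.g. by cumulative time:

```shell
# Create a stand-in script (any script path works), then profile it,
# sorting the report by cumulative time with -s
printf 'for i in range(10000):\n    pass\n' > /tmp/cprofile_demo.py
python3 -m cProfile -s cumtime /tmp/cprofile_demo.py
```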
To make it even easier, I made a little batch file called ‘profile.bat’:
A while ago I made pycallgraph which generates a visualisation from your Python code. Edit: I’ve updated the example to work with 3.3, the latest release as of this writing.
After a pip install pycallgraph and installing GraphViz you can run it from the command line:
pycallgraph graphviz -- ./mypythonscript.py
Or, you can profile particular parts of your code:
from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput
with PyCallGraph(output=GraphvizOutput()):
code_to_profile()
Either of these will generate a pycallgraph.png file similar to the image below:
It’s worth pointing out that using the profiler only works (by default) on the main thread, and you won’t get any information from other threads if you use them. This can be a bit of a gotcha as it is completely unmentioned in the profiler documentation.
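One way around it, sketched here (the class name and per-thread dump filename are illustrative, not from the profiler docs), is to subclass threading.Thread and profile its run method:

```python
import cProfile
import threading

class ProfiledThread(threading.Thread):
    # Wrap the thread's run() in its own profiler and dump the stats
    # to a per-thread file when the thread finishes.
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('profile-%d.out' % self.ident)
```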
Using such a ProfiledThread class instead of the standard one might give you more flexibility, but I'm not sure it's worth it, especially if you are using third-party code which wouldn't use your class.
@Maxy's comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn't want to re-run things with pycallgraph, so I used gprof2dot and got pretty SVGs:
I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do analysis in SnakeViz.
The main visualization technique used is the sunburst chart, as shown below, in which the hierarchy of function calls is arranged as layers of arcs and the time info is encoded in their angular widths.
The best thing is you can interact with the chart. For example, to zoom in one can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.
I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.
Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture are proportional to the time taken. If you mouse over a rectangle it highlights that call in the table and everywhere on the map. When you double-click on a rectangle it zooms in on that portion. It will show you who calls that portion and what that portion calls.
The descriptive information is very helpful. It shows you the code for that bit which can be helpful when you are dealing with built-in library calls. It tells you what file and what line to find the code.
Also want to point out that the OP said 'profiling' but it appears he meant 'timing'. Keep in mind programs will run slower when profiled.
A nice profiling module is the line_profiler (called using the script kernprof.py). It can be downloaded here.
My understanding is that cProfile only gives information about total time spent in each function. So individual lines of code are not timed. This is an issue in scientific computing since often one single line can take a lot of time. Also, as I remember, cProfile didn’t catch the time I was spending in say numpy.dot.
line_profiler (already presented here) also inspired pprofile, which is described as:
Line-granularity, thread-aware deterministic and statistic pure-python
profiler
It provides line-granularity like line_profiler, is pure Python, can be used as a standalone command or a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.
vprof
There is also vprof, a Python package described as:
[…] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.
There are a lot of great answers, but they either use the command line or some external program for profiling and/or sorting the results.
I really missed a way I could use in my IDE (eclipse-PyDev) without touching the command line or installing anything. So here it is.
Profiling without command line
def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)
Following Joe Shaw's answer about multi-threaded code not working as expected, I figured that the runcall method in cProfile merely does self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and have whatever code you want in-between, with minimal interference with existing code.
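A minimal sketch of that pattern (the work function is a made-up stand-in for your own code):

```python
import cProfile
import io
import pstats

def work():
    # stand-in for the code you actually want to measure
    return sum(i * i for i in range(50000))

profiler = cProfile.Profile()
profiler.enable()        # equivalent to what runcall does before the call
result = work()          # ...any amount of existing code in between...
profiler.disable()       # ...and after it

# inspect the collected stats, top 5 entries by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```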
In Virtaal’s source there’s a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.
cProfile is great for quick profiling, but most of the time it ended with errors for me. The function runctx solves this problem by correctly initializing the environment and variables; I hope it can be useful for someone.
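As a sketch (the compute function and its argument are made up): runctx accepts explicit globals and locals, so the profiled statement can resolve names that plain cProfile.run, which executes in __main__'s namespace, would not find:

```python
import cProfile

def compute(n):
    # stand-in workload
    return sum(range(n))

# runctx takes explicit globals/locals, so names defined here are found
# even when the statement would not resolve in __main__'s namespace
cProfile.runctx('compute(n)', globals(), {'n': 100000})
```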
If you want to make a cumulative profiler, meaning to run the function several times in a row and watch the sum of the results, you can use this cumulative_profiler decorator. It is Python >= 3.6 specific, but you can remove nonlocal for it to work on older versions.
import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func = func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()
        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row
            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # in case you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator
Example
profiling the function baz
import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1
baz()
baz ran 5 times and printed this:
20 function calls in 15.003 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10 15.003 1.500 15.003 1.500 {built-in method time.sleep}
5 0.000 0.000 15.003 3.001 <ipython-input-9-c89afe010372>:3(baz)
5 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The terminal-only (and simplest) solution, in case all those fancy UIs fail to install or run: ignore cProfile completely and replace it with pyinstrument, which will collect and display the tree of calls right after execution.
Install:
$ pip install pyinstrument
Profile and display result:
$ python -m pyinstrument ./prog.py
Works with python2 and 3.
[EDIT]
The documentation of the API, for profiling only a part of the code, can be found here.
My way is to use yappi (https://github.com/sumerc/yappi). It's especially useful combined with an RPC server where (even just for debugging) you register methods to start, stop and print profiling information, e.g. in this way:
@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space
        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)
Then, when your program is running, you can start the profiler at any time by calling the startProfiler RPC method, and dump the profiling information to a log file by calling printProfiler (or modify the RPC method to return it to the caller) to get output like this:
It may not be very useful for short scripts, but it helps to optimize server-type processes, especially given that the printProfiler method can be called multiple times over time to profile and compare, e.g., different program usage scenarios.
In newer versions of yappi, the following code will work:
Ever want to know what the hell that python script is doing? Enter the
Inspect Shell. Inspect Shell lets you print/alter globals and run
functions without interrupting the running script. Now with
auto-complete and command history (only on linux).
It would depend on what you want to see out of profiling. Simple time metrics can be given by (bash):
time python python_prog.py
Even '/usr/bin/time' can output detailed metrics by using the '--verbose' flag.
To check time metrics given by each function and to better understand how much time is spent on functions, you can use the inbuilt cProfile in python.
Going into more detailed metrics like performance, time is not the only metric. You can worry about memory, threads etc.
Profiling options:
1. line_profiler is another profiler used commonly to find out timing metrics line-by-line.
2. memory_profiler is a tool to profile memory usage.
3. heapy (from project Guppy) Profile how objects in the heap are used.
These are some of the common ones I tend to use. But if you want to find out more, try reading this book
It is a pretty good book on starting out with performance in mind. You can move onto advanced topics on using Cython and JIT(Just-in-time) compiled python.
With a statistical profiler like austin, no instrumentation is required, meaning that you can get profiling data out of a Python application simply with
austin python3 my_script.py
The raw output isn't very useful, but you can pipe it to flamegraph.pl to get a flame graph representation of that data, which gives you a breakdown of where the time (measured in microseconds of real time) is being spent.
There's also a statistical profiler called statprof. It's a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It's more suited to soft real-time applications like games, but may have less precision than cProfile.
Adding a decorator will show a tree of time-consuming functions:
@Profiler(depth=4, on_disable=show)
Install by: pip install auto_profiler
Example
import time  # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6 + random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if i == 1:
        return 1
    return i * fact(i - 1)

def show(p):
    print('Time [Hits * PerHit] Function name [Called from] [Function Location]\n' +
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()
    fact(3)

if __name__ == '__main__':
    main()
Example Output
Time [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974] main [auto-profiler/profiler.py:267] [/test/t2.py:30]
├── 5.954s [5 * 1.191] f1 [/test/t2.py:34] [/test/t2.py:14]
│ └── 5.954s [5 * 1.191] mysleep [/test/t2.py:15] [/test/t2.py:17]
│ └── 5.954s [5 * 1.191] <time.sleep>
|
|
| # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020] fact [/test/t2.py:36] [/test/t2.py:20]
├── 0.849s [1 * 0.849] f1 [/test/t2.py:21] [/test/t2.py:14]
│ └── 0.849s [1 * 0.849] mysleep [/test/t2.py:15] [/test/t2.py:17]
│ └── 0.849s [1 * 0.849] <time.sleep>
└── 2.171s [1 * 2.171] fact [/test/t2.py:24] [/test/t2.py:20]
├── 1.552s [1 * 1.552] f1 [/test/t2.py:21] [/test/t2.py:14]
│ └── 1.552s [1 * 1.552] mysleep [/test/t2.py:15] [/test/t2.py:17]
└── 0.619s [1 * 0.619] fact [/test/t2.py:24] [/test/t2.py:20]
└── 0.619s [1 * 0.619] f1 [/test/t2.py:21] [/test/t2.py:14]
For getting quick profile stats for your code snippets in an IPython notebook, one can embed line_profiler and memory_profiler straight into the notebook.
%timeit -r 7 -n 1000 print('Outputs execution time of the snippet')
# 1000 loops, best of 7: 7.46 ns per loop
Gives the best time out of the given number of runs (-r), each looping (-n) times.
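Outside IPython, the standard timeit module gives equivalent numbers; a small sketch (the measured snippet is arbitrary):

```python
import timeit

# 7 repeats of 1000 loops each, mirroring %timeit -r 7 -n 1000
times = timeit.repeat("sum(range(100))", repeat=7, number=1000)
print(min(t / 1000 for t in times))  # best per-loop time of the 7 runs
```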
Outputs details on system caching: when code snippets are executed multiple times, the system caches a few operations and doesn't execute them again, which may hamper the accuracy of the profile reports.