My memory usage increases over time and restarting Django is not kind to users.
I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.
I have a feeling that there are some simple steps that could produce big gains. Ensuring ‘debug’ is set to ‘False’ is an obvious biggie.
Can anyone suggest others? How much improvement would caching bring on low-traffic sites?
In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.
Edit: Thanks for the tips so far. Any suggestions how to discover what’s using up the memory? Are there any guides to Python memory profiling?
Also as mentioned there’s a few things that will make it tricky to switch to mod_wsgi so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.
Edit: Graham Dumpleton's article is the best I've found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.
Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:
“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”
So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It's a well-known maxim that you don't optimize without testing to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.
Thanks for everyone’s assistance but I think this question is still open!
Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.
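For illustration, here is a minimal sketch (names are made up) of the kind of module-level cache that quietly holds memory forever, because the global dict keeps every entry alive:
# Hypothetical example: _cache grows without bound; nothing ever evicts old results,
# so the garbage collector can never reclaim them.
_cache = {}

def expensive_build(key):
    return [key] * 10**5          # stand-in for real work

def get_report(key):
    if key not in _cache:
        _cache[key] = expensive_build(key)
    return _cache[key]
Bounding the cache (for example with functools.lru_cache or explicit eviction) lets old entries be reclaimed.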
Don’t use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch. It is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.
If you can remove apache from your requirements, that would be even better to your memory. spawning seems to be the new fast scalable way to run python web applications.
EDIT: I don't see how switching to mod_wsgi could be "tricky". It should be a very easy task. Please elaborate on the problem you are having with the switch.
I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it’s a no-brainer :)
Additionally, check that you do not use any of the known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in unicode handling. Other than that, the Django Debug Toolbar might help you to track down the hogs.
In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.
Switch to mod_wsgi in daemon mode, and use Apache’s worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.
But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
Answer 7
Here is the script I use for mod_wsgi (called wsgi.py, placed in the root of the django project):
import os
import sys
import django.core.handlers.wsgi
from os import path
sys.stdout = open('/dev/null','a+')
sys.stderr = open('/dev/null','a+')
sys.path.append(path.join(path.dirname(__file__),'..'))
os.environ['DJANGO_SETTINGS_MODULE']='myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()
Adjust myproject.settings and the paths as needed. I redirect all output to /dev/null because mod_wsgi by default prevents printing; use logging instead.
For Apache:
<VirtualHost *>
    ServerName myhost.com
    ErrorLog /var/log/apache2/error-myhost.log
    CustomLog /var/log/apache2/access-myhost.log common
    DocumentRoot "/var/www"
    WSGIScriptAlias / /path/to/my/wsgi.py
</VirtualHost>
We stumbled over a bug in Django with big sitemaps (10,000 items). It seems Django tries to load them all into memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 – this effectively kills the Apache process when Google pays a visit to the site.
I’ve been using cProfile to profile my code, and it’s been working great. I also use gprof2dot.py to visualize the results (makes it a little clearer).
However, cProfile (and most other Python profilers I’ve seen so far) seem to only profile at the function-call level. This causes confusion when certain functions are called from different places – I have no idea if call #1 or call #2 is taking up the majority of the time. This gets even worse when the function in question is six levels deep, called from seven other places.
How do I get a line-by-line profiling?
Instead of this:
function #12, total time: 2.0s
I’d like to see something like this:
function #12 (called from somefile.py:102) 0.5s
function #12 (called from main.py:12) 1.5s
cProfile does show how much of the total time “transfers” to the parent, but again this connection is lost when you have a bunch of layers and interconnected calls.
Ideally, I’d love to have a GUI that would parse through the data, then show me my source file with a total time given to each line. Something like this:
main.py:
a = 1 # 0.0s
result = func(a) # 0.4s
c = 1000 # 0.0s
result = func(c) # 5.0s
Then I’d be able to click on the second “func(c)” call to see what’s taking up time in that call, separate from the “func(a)” call.
Does that make sense? Is there any profiling library that collects this type of information? Is there some awesome tool I’ve missed?
You could also use pprofile (pypi).
If you want to profile the entire execution, it does not require source code modification.
You can also profile a subset of a larger program in two ways:
toggle profiling when reaching a specific point in the code, such as:
import pprofile
profiler = pprofile.Profile()
with profiler:
some_code
# Process profile content: generate a cachegrind file and send it to user.
# You can also write the result to the console:
profiler.print_stats()
# Or to a file:
profiler.dump_stats("/tmp/profiler_stats.txt")
toggle profiling asynchronously from call stack (requires a way to trigger this code in considered application, for example a signal handler or an available worker thread) by using statistical profiling:
import pprofile
profiler = pprofile.StatisticalProfile()
statistical_profiler_thread = pprofile.StatisticalThread(
profiler=profiler,
)
with statistical_profiler_thread:
sleep(n)
# Likewise, process profile content
The code-annotation output format is much like line_profiler's.
Note that because pprofile does not rely on code modification, it can profile top-level module statements, allowing you to profile program startup time (how long it takes to import modules, initialise globals, …).
It can generate cachegrind-formatted output, so you can use kcachegrind to browse large results easily.
Disclosure: I am pprofile author.
Answer 2
You can use the line_profiler package for this.
1. First, install the package:
pip install line_profiler
2. Load the package into your Python/notebook environment using the magic command:
%load_ext line_profiler
3. To profile the code of a function, do the following:
%lprun -f demo_func demo_func(arg1, arg2)
If you follow these steps, you will get a formatted output with all the details :)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def demo_func(a, b):
     2         1        248.0    248.0     64.8      print(a + b)
     3         1         40.0     40.0     10.4      print(a)
     4         1         94.0     94.0     24.5      print(a * b)
     5         1          1.0      1.0      0.3      return a / b
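For reference, a demo_func consistent with the line contents in that report is simply:
def demo_func(a, b):
    print(a + b)
    print(a)
    print(a * b)
    return a / b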
Suppose you have the program main.py and within it, functions fun_a() and fun_b() that you want to profile with respect to time; you will need to use the decorator @profile just before the function definitions. For example:
@profile
def fun_a():
#do something
@profile
def fun_b():
#do something more
if __name__ == '__main__':
fun_a()
fun_b()
The program can be profiled by executing the shell command:
$ kernprof -l -v main.py
The arguments can be fetched using $ kernprof -h
Usage: kernprof [-s setupfile] [-o output_file_path] scriptfile [arg] ...
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-l, --line-by-line Use the line-by-line profiler from the line_profiler
module instead of Profile. Implies --builtin.
-b, --builtin Put 'profile' in the builtins. Use 'profile.enable()'
and 'profile.disable()' in your code to turn it on and
off, or '@profile' to decorate a single function, or
'with profile:' to profile a single section of code.
-o OUTFILE, --outfile=OUTFILE
Save stats to <outfile>
-s SETUP, --setup=SETUP
Code to execute before the code to profile
-v, --view View the results of the profile in addition to saving
it.
The results will be printed on the console as:
Total time: 17.6699 s
File: main.py
Function: fun_a at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def fun_a():
...
EDIT: The results from the profilers can be parsed using the TAMPPA package. Using it, we can get the desired line-by-line plots.
If you want to measure CPU time, can use time.process_time() for Python 3.3 and above:
import time
start = time.process_time()
# your code here
print(time.process_time() - start)
First call turns the timer on, and second call tells you how many seconds have elapsed.
There is also a function time.clock(), but it is deprecated since Python 3.3 and will be removed in Python 3.8.
There are better profiling tools, like timeit and profile; however, time.process_time() will measure the CPU time and this is what you're asking about.
If you want to measure wall clock time instead, use time.time().
Answer 1
You can also use the time library:
import time
start = time.time()
# your code
# end
print(f'Time: {time.time() - start}')
You could then name the code blocks you want to measure:
with CodeTimer('loop 1'):
for i in range(100000):
pass
with CodeTimer('loop 2'):
for i in range(100000):
pass
Code block 'loop 1' took: 4.991 ms
Code block 'loop 2' took: 3.666 ms
And nest them:
with CodeTimer('Outer'):
for i in range(100000):
pass
with CodeTimer('Inner'):
for i in range(100000):
pass
for i in range(100000):
pass
Code block 'Inner' took: 2.382 ms
Code block 'Outer' took: 10.466 ms
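For context, CodeTimer here typically comes from the linetimer package (pip install linetimer). A minimal context-manager sketch with the same behaviour, in case you would rather not add a dependency, could look like this:
from time import perf_counter

class CodeTimer:
    # Minimal stand-in for linetimer's CodeTimer: prints how long a named block took, in ms.
    def __init__(self, name='code block'):
        self.name = name

    def __enter__(self):
        self.start = perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        took_ms = (perf_counter() - self.start) * 1000.0
        print("Code block '%s' took: %.3f ms" % (self.name, took_ms))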
Regarding timeit.default_timer(), it uses the best timer based on OS and Python version, see this answer.
Answer 3
I always like to check the time in hours, minutes and seconds (%H:%M:%S) format:
from datetime import datetime
start = datetime.now()
# your code
end = datetime.now()
time_taken = end - start
print('Time: ',time_taken)
I was looking for a way to output a formatted time with minimal code, so here is my solution. Many people use Pandas anyway, so in some cases this can save an additional library import.
import pandas as pd
start = pd.Timestamp.now()
# code
print(pd.Timestamp.now()-start)
Output:
0 days 00:05:32.541600
I would recommend using this if time precision is not the most important; otherwise use the time library:
%timeit pd.Timestamp.now() outputs 3.29 µs ± 214 ns per loop
%timeit time.time() outputs 154 ns ± 13.3 ns per loop
Putting the code in a function, then using a decorator for timing is another option. (Source) The advantage of this method is that you define timer once and use it with a simple additional line for every function.
First, define timer decorator:
import functools
import time
def timer(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
start_time = time.perf_counter()
value = func(*args, **kwargs)
end_time = time.perf_counter()
run_time = end_time - start_time
print("Finished {} in {} secs".format(repr(func.__name__), round(run_time, 3)))
return value
return wrapper
Then, use the decorator while defining the function:
@timer
def doubled_and_add(num):
res = sum([i*2 for i in range(num)])
print("Result : {}".format(res))
Let’s try:
doubled_and_add(100000)
doubled_and_add(1000000)
Output:
Result : 9999900000
Finished 'doubled_and_add' in 0.0119 secs
Result : 999999000000
Finished 'doubled_and_add' in 0.0897 secs
Note: I’m not sure why to use time.perf_counter instead of time.time. Comments are welcome.
Answer 6
You can also try the following:
from time import perf_counter
t0 = perf_counter()
...
t1 = perf_counter()
time_taken = t1 - t0
A friend and I have written a Python profile viewer called SnakeViz that runs in a web browser. If you are already successfully using RunSnakeRun SnakeViz may not add that much value, but SnakeViz is much easier to install.
Edit: SnakeViz supports Python 2 and 3 and works on all major systems.
Python Call Graph generates pics very similar to those in maxy’s answer. It also shows total time for each function, for some reason it’s not reflected in the example graphs.
I’ve written a browser-based visualization tool, profile_eye, which operates on the output of gprof2dot.
gprof2dot is great at grokking many profiling-tool outputs, and does a great job at graph-element placement. The final rendering is a static graphic, which is often very cluttered.
Using d3.js it’s possible to remove much of that clutter, through relative fading of unfocused elements, tooltips, and a fisheye distortion.
I’ve recently become interested in algorithms and have begun exploring them by writing a naive implementation and then optimizing it in various ways.
I’m already familiar with the standard Python module for profiling runtime (for most things I’ve found the timeit magic function in IPython to be sufficient), but I’m also interested in memory usage so I can explore those tradeoffs as well (e.g. the cost of caching a table of previously computed values versus recomputing them as needed). Is there a module that will profile the memory usage of a given function for me?
>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  25773  53  1612820  49   1612820  49 str
     1  11699  24   483960  15   2096780  64 tuple
     2    174   0   241584   7   2338364  72 dict of module
     3   3478   7   222592   7   2560956  78 types.CodeType
     4   3296   7   184576   6   2745532  84 function
     5    401   1   175112   5   2920644  89 dict of class
     6    108   0    81888   3   3002532  92 dict (no owner)
     7    114   0    79632   2   3082164  94 dict of type
     8    117   0    51336   2   3133500  96 type
     9    667   1    24012   1   3157512  97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33      136  77       136  77 dict (no owner)
     1      1  33       28  16       164  93 list
     2      1  33       12   7       176 100 int
>>> x=[]
>>> h.iso(x).sp
 0: h.Root.i0_modules['__main__'].__dict__['x']
Python 3.4 includes a new module: tracemalloc. It provides detailed statistics about which code is allocating the most memory. Here’s an example that displays the top three lines allocating memory.
from collections import Counter
import linecache
import os
import tracemalloc
def display_top(snapshot, key_type='lineno', limit=3):
snapshot = snapshot.filter_traces((
tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
tracemalloc.Filter(False, "<unknown>"),
))
top_stats = snapshot.statistics(key_type)
print("Top %s lines" % limit)
for index, stat in enumerate(top_stats[:limit], 1):
frame = stat.traceback[0]
# replace "/path/to/module/file.py" with "module/file.py"
filename = os.sep.join(frame.filename.split(os.sep)[-2:])
print("#%s: %s:%s: %.1f KiB"
% (index, filename, frame.lineno, stat.size / 1024))
line = linecache.getline(frame.filename, frame.lineno).strip()
if line:
print(' %s' % line)
other = top_stats[limit:]
if other:
size = sum(stat.size for stat in other)
print("%s other: %.1f KiB" % (len(other), size / 1024))
total = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f KiB" % (total / 1024))
tracemalloc.start()
counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
words = list(words)
for word in words:
prefix = word[:3]
counts[prefix] += 1
print('Top prefixes:', counts.most_common(3))
snapshot = tracemalloc.take_snapshot()
display_top(snapshot)
That example is great when the memory is still being held at the end of the calculation, but sometimes you have code that allocates a lot of memory and then releases it all. It’s not technically a memory leak, but it’s using more memory than you think it should. How can you track memory usage when it all gets released? If it’s your code, you can probably add some debugging code to take snapshots while it’s running. If not, you can start a background thread to monitor memory usage while the main thread runs.
Here’s the previous example where the code has all been moved into the count_prefixes() function. When that function returns, all the memory is released. I also added some sleep() calls to simulate a long-running calculation.
from collections import Counter
import linecache
import os
import tracemalloc
from time import sleep
def count_prefixes():
sleep(2) # Start up time.
counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
words = list(words)
for word in words:
prefix = word[:3]
counts[prefix] += 1
sleep(0.0001)
most_common = counts.most_common(3)
sleep(3) # Shut down time.
return most_common
def main():
tracemalloc.start()
most_common = count_prefixes()
print('Top prefixes:', most_common)
snapshot = tracemalloc.take_snapshot()
display_top(snapshot)
def display_top(snapshot, key_type='lineno', limit=3):
snapshot = snapshot.filter_traces((
tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
tracemalloc.Filter(False, "<unknown>"),
))
top_stats = snapshot.statistics(key_type)
print("Top %s lines" % limit)
for index, stat in enumerate(top_stats[:limit], 1):
frame = stat.traceback[0]
# replace "/path/to/module/file.py" with "module/file.py"
filename = os.sep.join(frame.filename.split(os.sep)[-2:])
print("#%s: %s:%s: %.1f KiB"
% (index, filename, frame.lineno, stat.size / 1024))
line = linecache.getline(frame.filename, frame.lineno).strip()
if line:
print(' %s' % line)
other = top_stats[limit:]
if other:
size = sum(stat.size for stat in other)
print("%s other: %.1f KiB" % (len(other), size / 1024))
total = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f KiB" % (total / 1024))
main()
When I run that version, the memory usage has gone from 6MB down to 4KB, because the function released all its memory when it finished.
Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: collections/__init__.py:537: 0.7 KiB
self.update(*args, **kwds)
#2: collections/__init__.py:555: 0.6 KiB
return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
#3: python3.6/heapq.py:569: 0.5 KiB
result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
10 other: 2.2 KiB
Total allocated size: 4.0 KiB
Now here’s a version inspired by another answer that starts a second thread to monitor memory usage.
from collections import Counter
import linecache
import os
import tracemalloc
from datetime import datetime
from queue import Queue, Empty
from resource import getrusage, RUSAGE_SELF
from threading import Thread
from time import sleep
def memory_monitor(command_queue: Queue, poll_interval=1):
tracemalloc.start()
old_max = 0
snapshot = None
while True:
try:
command_queue.get(timeout=poll_interval)
if snapshot is not None:
print(datetime.now())
display_top(snapshot)
return
except Empty:
max_rss = getrusage(RUSAGE_SELF).ru_maxrss
if max_rss > old_max:
old_max = max_rss
snapshot = tracemalloc.take_snapshot()
print(datetime.now(), 'max RSS', max_rss)
def count_prefixes():
sleep(2) # Start up time.
counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
words = list(words)
for word in words:
prefix = word[:3]
counts[prefix] += 1
sleep(0.0001)
most_common = counts.most_common(3)
sleep(3) # Shut down time.
return most_common
def main():
queue = Queue()
poll_interval = 0.1
monitor_thread = Thread(target=memory_monitor, args=(queue, poll_interval))
monitor_thread.start()
try:
most_common = count_prefixes()
print('Top prefixes:', most_common)
finally:
queue.put('stop')
monitor_thread.join()
def display_top(snapshot, key_type='lineno', limit=3):
snapshot = snapshot.filter_traces((
tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
tracemalloc.Filter(False, "<unknown>"),
))
top_stats = snapshot.statistics(key_type)
print("Top %s lines" % limit)
for index, stat in enumerate(top_stats[:limit], 1):
frame = stat.traceback[0]
# replace "/path/to/module/file.py" with "module/file.py"
filename = os.sep.join(frame.filename.split(os.sep)[-2:])
print("#%s: %s:%s: %.1f KiB"
% (index, filename, frame.lineno, stat.size / 1024))
line = linecache.getline(frame.filename, frame.lineno).strip()
if line:
print(' %s' % line)
other = top_stats[limit:]
if other:
size = sum(stat.size for stat in other)
print("%s other: %.1f KiB" % (len(other), size / 1024))
total = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f KiB" % (total / 1024))
main()
The resource module lets you check the current memory usage, and save the snapshot from the peak memory usage. The queue lets the main thread tell the memory monitor thread when to print its report and shut down. When it runs, it shows the memory being used by the list() call:
2018-05-29 10:34:34.441334 max RSS 10188
2018-05-29 10:34:36.475707 max RSS 23588
2018-05-29 10:34:36.616524 max RSS 38104
2018-05-29 10:34:36.772978 max RSS 45924
2018-05-29 10:34:36.929688 max RSS 46824
2018-05-29 10:34:37.087554 max RSS 46852
Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
2018-05-29 10:34:56.281262
Top 3 lines
#1: scratches/scratch.py:36: 6527.0 KiB
words = list(words)
#2: scratches/scratch.py:38: 16.4 KiB
prefix = word[:3]
#3: scratches/scratch.py:39: 10.1 KiB
counts[prefix] += 1
19 other: 10.8 KiB
Total allocated size: 6564.3 KiB
If you’re on Linux, you may find /proc/self/statm more useful than the resource module.
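As a minimal sketch of that approach (Linux only; the first two fields of /proc/self/statm are total program size and resident set size, in pages):
import os

with open('/proc/self/statm') as f:
    size, resident = (int(v) for v in f.read().split()[:2])   # total size, resident set size (pages)

page_size = os.sysconf('SC_PAGE_SIZE')                          # bytes per page
print('VMS: %.1f MiB, RSS: %.1f MiB' % (size * page_size / 2**20,
                                        resident * page_size / 2**20))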
>>> help(asizeof.asizeof)
Help on function asizeof in module pympler.asizeof:
asizeof(*objs, **opts)
Return the combined size in bytes of all objects passed as positional arguments.
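A quick usage sketch (the objects here are arbitrary examples; reported sizes will vary by platform and Python version):
from pympler import asizeof

data = {'numbers': list(range(1000)), 'text': 'hello' * 100}
print(asizeof.asizeof(data))            # deep size of the dict and everything it references, in bytes
print(asizeof.asizeof([], {}, 'abc'))   # combined size of several objects passed at once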
Since the accepted answer and also the next highest voted answer have, in my opinion, some problems, I’d like to offer one more answer that is based closely on Ihor B.’s answer with some small but important modifications.
This solution allows you to run profiling either by wrapping a function call with the profile function and calling it, or by decorating your function/method with the @profile decorator.
The first technique is useful when you want to profile some third-party code without messing with its source, whereas the second technique is a bit "cleaner" and works better when you don't mind modifying the source of the function/method you want to profile.
I’ve also modified the output, so that you get RSS, VMS, and shared memory. I don’t care much about the “before” and “after” values, but only the delta, so I removed those (if you’re comparing to Ihor B.’s answer).
Example usage, assuming the above code is saved as profile.py:
from profile import profile
from time import sleep
from sklearn import datasets # Just an example of 3rd party function call
# Method 1
run_profiling = profile(datasets.load_digits)
data = run_profiling()
# Method 2
@profile
def my_function():
# do some stuff
a_list = []
for i in range(1,100000):
a_list.append(i)
return a_list
res = my_function()
This should result in output similar to the below:
Keep in mind, this method of profiling is only going to be approximate, since lots of other stuff might be happening on the machine. Due to garbage collection and other factors, the deltas might even be zero.
For some unknown reason, very short function calls (e.g. 1 or 2 ms)
show up with zero memory usage. I suspect this is some limitation of
the hardware/OS (tested on basic laptop with Linux) on how often
memory statistics are updated.
To keep the examples simple, I didn’t use any function arguments, but they should work as one would expect, i.e.
profile(my_function, arg) to profile my_function(arg)
Below is a simple function decorator which allows you to track how much memory the process consumed before the function call, after the function call, and what the difference is:
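The decorator itself did not survive in this copy; a sketch along those lines, using psutil (an assumption — the original may use a different mechanism; shared memory is omitted because psutil only reports it on Linux), could look like this:
import functools
import os

import psutil

def profile(func):
    # Print the change in RSS and VMS caused by a single call to func.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        process = psutil.Process(os.getpid())
        mem_before = process.memory_info()
        result = func(*args, **kwargs)
        mem_after = process.memory_info()
        print('%s: RSS delta %+.1f KiB, VMS delta %+.1f KiB' % (
            func.__name__,
            (mem_after.rss - mem_before.rss) / 1024,
            (mem_after.vms - mem_before.vms) / 1024))
        return result
    return wrapper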
This piece of Python code runs in the time shown below it. (Note: the timing is done with the time function in Bash on Linux.)
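For reference (reconstructed; based on the answers below, the snippet wraps the same loop in a function):
def main():
    for i in xrange(10**8):   # xrange: the question targets Python 2
        pass

main()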
real 0m1.841s
user 0m1.828s
sys 0m0.012s
However, if the for loop isn’t placed within a function,
for i in xrange(10**8):
pass
then it runs for a much longer time:
real 0m4.543s
user 0m4.524s
sys 0m0.012s
Why is this?
Answer 0
You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.
Remember that CPython is compiled to bytecode, which the interpreter runs. When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes. This is possible because you can’t dynamically add local variables to a function. Then retrieving a local variable is literally a pointer lookup into the list and a refcount increase on the PyObject which is trivial.
Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on. Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to.
By the way, global lookups are still pretty optimised. Attribute lookups foo.bar are the really slow ones!
Here is a small illustration of local variable efficiency. The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local, but at the top level it is a global.
To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the top-level code I had to use the compile builtin; see the sketch below.
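A small sketch of that comparison (the loop function here is just an example):
import dis

def loop():
    for i in range(10**8):
        pass

dis.dis(loop)                                      # inside a function: the loop variable uses STORE_FAST
dis.dis(compile('for i in range(10**8): pass',     # at top level: the same loop uses STORE_NAME
                '<module>', 'exec'))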
Answer 2
Aside from local/global variable store times, opcode prediction makes the function faster.
As the other answers explain, the function uses the STORE_FAST opcode in the loop. Here’s the bytecode for the function’s loop:
>> 13 FOR_ITER 6 (to 22) # get next value from iterator
16 STORE_FAST 0 (x) # set local variable
19 JUMP_ABSOLUTE 13 # back to FOR_ITER
Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.
In this case, every time Python sees FOR_ITER (the top of the loop), it will “predict” that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.
On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation-loop which has obvious implications for the speed at which the loop is executed.
To give some more technical detail about this optimization, here’s a quote from the ceval.c file (the “engine” of Python’s virtual machine):
Some opcodes tend to come in pairs thus making it possible to
predict the second code when the first is run. For example,
GET_ITER is often followed by FOR_ITER. And FOR_ITER is often
followed by STORE_FAST or UNPACK_SEQUENCE.
Verifying the prediction costs a single high-speed test of a register
variable against a constant. If the pairing was good, then the
processor’s own internal branch predication has a high likelihood of
success, resulting in a nearly zero-overhead transition to the
next opcode. A successful prediction saves a trip through the eval-loop
including its two unpredictable branches, the HAS_ARG test and the
switch-case. Combined with the processor’s internal branch prediction,
a successful PREDICT has the effect of making the two opcodes run as if
they were a single new opcode with the bodies combined.
We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:
case FOR_ITER: // the FOR_ITER opcode case
v = TOP();
x = (*v->ob_type->tp_iternext)(v); // x is the next value from iterator
if (x != NULL) {
PUSH(x); // put x on top of the stack
PREDICT(STORE_FAST); // predict STORE_FAST will follow - success!
PREDICT(UNPACK_SEQUENCE); // this and everything below is skipped
continue;
}
// error-checking and more code for when the iterator ends normally
The PREDICT function expands to if (*next_instr == op) goto PRED_##op i.e. we just jump to the start of the predicted opcode. In this case, we jump here:
PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
v = POP(); // pop x back off the stack
SETLOCAL(oparg, v); // set it as the new local variable
goto fast_next_opcode;
The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.
The Python wiki page has more information about how CPython’s virtual machine works.
I want to know the memory usage of my Python application and specifically want to know what code blocks/portions or objects are consuming most memory.
Google search shows a commercial one is Python Memory Validator (Windows only).
Heapy/guppy, for example, partitions the heap by object type:
Partition of a set of 132527 objects. Total size = 8301532 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  35144  27  2140412  26   2140412  26 str
     1  38397  29  1309020  16   3449432  42 tuple
     2    530   0   739856   9   4189288  50 dict (no owner)
Since nobody has mentioned it, I'll point to my module memory_profiler, which is capable of printing a line-by-line report of memory usage and works on Unix and Windows (psutil is needed on the latter). The output is not very detailed, but the goal is to give you an overview of where the code is consuming more memory, not an exhaustive analysis of allocated objects.
After decorating your function with @profile and running your code with the -m memory_profiler flag, it will print a line-by-line report like this:
Line # Mem usage Increment Line Contents
==============================================
3 @profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
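For completeness, a script that produces a report like the one above (the file name is an example; run it with python -m memory_profiler example.py, or import profile explicitly as shown):
from memory_profiler import profile   # optional when running under -m memory_profiler

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()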
I recommend Dowser. It is very easy to setup, and you need zero changes to your code. You can view counts of objects of each type through time, view list of live objects, view references to live objects, all from the simple web interface.
You import memdebug, and call memdebug.start. That’s all.
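The memdebug module referenced here is missing from this copy; a sketch consistent with the CherryPy 2.x API mentioned in the update below (the port is a parameter you pass in; Dowser provides the web UI) would be roughly:
import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port,
    })
    cherrypy.server.quickstart()           # CherryPy 2.x; removed in 3.x (see the update below)
    cherrypy.engine.start(blocking=False)  # CherryPy 2.x; 3.x drops the blocking flag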
I haven’t tried PySizer or Heapy. I would appreciate others’ reviews.
UPDATE: The above code is for CherryPy 2.x. In CherryPy 3.x, the server.quickstart method has been removed and engine.start does not take the blocking flag, so adjust accordingly if you are using CherryPy 3.x.
Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is laid on the identification of memory leaks.
Muppy tries to help developers identify memory leaks in Python applications. It enables the tracking of memory usage during runtime and the identification of objects which are leaking. Additionally, tools are provided which allow you to locate the source of not-released objects.
It allows you to log and plot the memory usage of your variables during the execution of the decorated methods; you just have to import the library.
I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a WSGI webapp, then Dozer is a nice middleware wrapper of Dowser.
Getting the memory address of the Python interpreter can be a little tricky because of Address Space Layout Randomization. If the target Python interpreter ships with symbols, it is fairly easy to find by dereferencing the interp_head or _PyRuntime variable, depending on the Python version. However, many Python versions ship with stripped binaries, or on Windows without the corresponding PDB symbol files. In those cases, py-spy scans the BSS section for addresses that look like they might point to a valid PyInterpreterState and checks whether the layout at that address matches what is expected.
On Linux, the default configuration requires root permissions when attaching to a process that isn't a child. For py-spy this means you can profile without root access by letting py-spy create the process (py-spy record -- python myprogram.py), but attaching to an existing process by specifying a PID usually requires root (sudo py-spy record --pid 123456). You can get around this on Linux by setting the ptrace_scope sysctl variable.
How does py-spy detect whether a thread is idle?
py-spy tries to include stack traces only from threads that are actively running code, and to exclude threads that are sleeping or otherwise idle. Where possible, py-spy gets this thread-activity information from the OS: by reading /proc/PID/stat on Linux, by using the mach thread_basic_info call on OSX, and by looking at whether the current system call is known to be idle on Windows.
Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey – i.e., adding timing code to __main__.
What is a good way to profile how long a Python program takes to run?
Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.
You can call it from within your code, or from the interpreter, like this:
import cProfile
cProfile.run('foo()')
Even more usefully, you can invoke the cProfile when running a script:
python -m cProfile myscript.py
To make it even easier, I made a little batch file called ‘profile.bat’:
A while ago I made pycallgraph which generates a visualisation from your Python code. Edit: I’ve updated the example to work with 3.3, the latest release as of this writing.
After a pip install pycallgraph and installing GraphViz you can run it from the command line:
pycallgraph graphviz -- ./mypythonscript.py
Or, you can profile particular parts of your code:
from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput
with PyCallGraph(output=GraphvizOutput()):
code_to_profile()
Either of these will generate a pycallgraph.png file similar to the image below:
It’s worth pointing out that using the profiler only works (by default) on the main thread, and you won’t get any information from other threads if you use them. This can be a bit of a gotcha as it is completely unmentioned in the profiler documentation.
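One workaround is to give each worker thread its own profiler; a sketch of such a wrapper (the dump-file naming is just an example) could be:
import cProfile
import threading

class ProfiledThread(threading.Thread):
    # Overrides run() so each thread writes its own profile dump.
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('profile-%d.out' % self.ident)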
Use that ProfiledThread class instead of the standard threading.Thread. It might give you more flexibility, but I'm not sure it's worth it, especially if you are using third-party code which wouldn't use your class.
@Maxy's comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn't want to re-run things with pycallgraph, so I used gprof2dot and got pretty SVGs.
I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do analysis in SnakeViz.
The main viz technique used is Sunburst chart as shown below, in which the hierarchy of function calls is arranged as layers of arcs and time info encoded in their angular widths.
The best thing is you can interact with the chart. For example, to zoom in one can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.
I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.
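As a rough sketch of that workflow (pyprof2calltree exposes convert and visualize helpers; the statement and file name below are examples):
import cProfile

from pyprof2calltree import convert, visualize

profiler = cProfile.Profile()
profiler.run('sum(range(10 ** 6))')                    # profile any statement you like
convert(profiler.getstats(), 'callgrind.out.example')  # browse this file with kcachegrind
# visualize(profiler.getstats())                       # or launch kcachegrind directly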
Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture is proportional to the time taken. If you mouse over a rectangle it highlights that call in the table and everywhere on the map. When you double-click on a rectangle it zooms in on that portion. It will show you who calls that portion and what that portion calls.
The descriptive information is very helpful. It shows you the code for that bit which can be helpful when you are dealing with built-in library calls. It tells you what file and what line to find the code.
Also want to point out that the OP said 'profiling' but it appears he meant 'timing'. Keep in mind programs will run slower when profiled.
A nice profiling module is the line_profiler (called using the script kernprof.py). It can be downloaded here.
My understanding is that cProfile only gives information about total time spent in each function. So individual lines of code are not timed. This is an issue in scientific computing since often one single line can take a lot of time. Also, as I remember, cProfile didn’t catch the time I was spending in say numpy.dot.
line_profiler (already presented here) also inspired pprofile, which is described as:
Line-granularity, thread-aware deterministic and statistic pure-python
profiler
It provides line-granularity as line_profiler, is pure Python, can be used as a standalone command or a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.
vprof
There is also vprof, a Python package described as:
[…] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.
There are a lot of great answers, but they either use the command line or some external program for profiling and/or sorting the results.
I really missed some way I could use in my IDE (eclipse-PyDev) without touching the command line or installing anything. So here it is.
Profiling without command line
def count():
from math import sqrt
for x in range(10**5):
sqrt(x)
if __name__ == '__main__':
import cProfile, pstats
cProfile.run("count()", "{}.profile".format(__file__))
s = pstats.Stats("{}.profile".format(__file__))
s.strip_dirs()
s.sort_stats("time").print_stats(10)
Following Joe Shaw's answer about multi-threaded code not working as expected, I figured that the runcall method in cProfile merely does self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and put whatever code you want in between, with minimal interference with existing code.
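A minimal sketch of that enable()/disable() pattern:
import cProfile
import pstats

profiler = cProfile.Profile()

# ... setup you do not want profiled ...
profiler.enable()
result = sum(x * x for x in range(10 ** 6))   # the code you do want profiled
profiler.disable()
# ... teardown you do not want profiled ...

pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)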
In Virtaal’s source there’s a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.
cProfile is great for quick profiling, but most of the time it ended for me with errors. The runctx function solves this problem by correctly initializing the environment and variables; hope it can be useful for someone:
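The original snippet did not survive in this copy; a sketch of how runctx is typically used, passing globals()/locals() explicitly so the profiled statement can see your names (the work function and file name are examples), looks like:
import cProfile
import pstats

def work(n):
    return sum(i * i for i in range(n))

cProfile.runctx('work(n)', globals(), {'n': 10 ** 6}, filename='work.profile')
pstats.Stats('work.profile').sort_stats('time').print_stats(10)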
If you want to make a cumulative profiler, meaning running the function several times in a row and watching the sum of the results, you can use this cumulative_profiler decorator. It is Python >= 3.6 specific, but you can remove nonlocal for it to work on older versions.
import cProfile, pstats
class _ProfileFunc:
def __init__(self, func, sort_stats_by):
self.func = func
self.profile_runs = []
self.sort_stats_by = sort_stats_by
def __call__(self, *args, **kwargs):
pr = cProfile.Profile()
pr.enable() # this is the profiling section
retval = self.func(*args, **kwargs)
pr.disable()
self.profile_runs.append(pr)
ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
return retval, ps
def cumulative_profiler(amount_of_times, sort_stats_by='time'):
def real_decorator(function):
def wrapper(*args, **kwargs):
nonlocal function, amount_of_times, sort_stats_by # for python 2.x remove this row
profiled_func = _ProfileFunc(function, sort_stats_by)
for i in range(amount_of_times):
retval, ps = profiled_func(*args, **kwargs)
ps.print_stats()
return retval # returns the results of the function
return wrapper
if callable(amount_of_times): # incase you don't want to specify the amount of times
func = amount_of_times # amount_of_times is the function in here
amount_of_times = 5 # the default amount
return real_decorator(func)
return real_decorator
Example
profiling the function baz
import time
@cumulative_profiler
def baz():
time.sleep(1)
time.sleep(2)
return 1
baz()
baz ran 5 times and printed this:
20 function calls in 15.003 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10 15.003 1.500 15.003 1.500 {built-in method time.sleep}
5 0.000 0.000 15.003 3.001 <ipython-input-9-c89afe010372>:3(baz)
5 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The terminal-only (and simplest) solution, in case all those fancy UI’s fail to install or to run:
ignore cProfile completely and replace it with pyinstrument, that will collect and display the tree of calls right after execution.
Install:
$ pip install pyinstrument
Profile and display result:
$ python -m pyinstrument ./prog.py
Works with python2 and 3.
[EDIT]
The documentation of the API, for profiling only a part of the code, can be found here.
My way is to use yappi (https://github.com/sumerc/yappi). It’s especially useful combined with an RPC server where (even just for debugging) you register method to start, stop and print profiling information, e.g. in this way:
@staticmethod
def startProfiler():
yappi.start()
@staticmethod
def stopProfiler():
yappi.stop()
@staticmethod
def printProfiler():
stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
statPrint = '\n'
namesArr = [len(str(stat[0])) for stat in stats.func_stats]
log.debug("namesArr %s", str(namesArr))
maxNameLen = max(namesArr)
log.debug("maxNameLen: %s", maxNameLen)
for stat in stats.func_stats:
nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
log.debug('nameAppendSpaces: %s', nameAppendSpaces)
blankSpace = ''
for space in nameAppendSpaces:
blankSpace += space
log.debug("adding spaces: %s", len(nameAppendSpaces))
statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"
log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
log.log(1000, statPrint)
Then, while your program is running, you can start the profiler at any time by calling the startProfiler RPC method, dump the profiling information to a log file by calling printProfiler (or modify the RPC method to return it to the caller), and get such output.
It may not be very useful for short scripts but helps to optimize server-type processes especially given the printProfiler method can be called multiple times over time to profile and compare e.g. different program usage scenarios.
In newer versions of yappi, the following code will work:
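The updated snippet is missing from this copy; with the current yappi API the equivalent is roughly as follows (the workload, sort key, and printing are illustrative, not the original code):
import yappi

def work():
    return sum(i * i for i in range(10 ** 6))

yappi.start()
work()
yappi.stop()

stats = yappi.get_func_stats()
stats.sort('ttot', 'desc')   # sort by total time
stats.print_all()            # per-function table: name, ncall, tsub, ttot, tavg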
Ever want to know what the hell that python script is doing? Enter the
Inspect Shell. Inspect Shell lets you print/alter globals and run
functions without interrupting the running script. Now with
auto-complete and command history (only on linux).
It would depend on what you want to see out of profiling. Simple time metrics can be given (in Bash) by:
time python python_prog.py
Even '/usr/bin/time' can output detailed metrics by using the '--verbose' flag.
To check time metrics given by each function and to better understand how much time is spent on functions, you can use the inbuilt cProfile in python.
Going into more detailed metrics like performance, time is not the only metric. You can worry about memory, threads etc.
Profiling options:
1. line_profiler is another profiler used commonly to find out timing metrics line-by-line.
2. memory_profiler is a tool to profile memory usage.
3. heapy (from the Guppy project) profiles how objects in the heap are used.
These are some of the common ones I tend to use. But if you want to find out more, try reading this book
It is a pretty good book on starting out with performance in mind. You can move onto advanced topics on using Cython and JIT(Just-in-time) compiled python.
With a statistical profiler like austin, no instrumentation is required, meaning that you can get profiling data out of a Python application simply with
austin python3 my_script.py
The raw output isn’t very useful, but you can pipe that to flamegraph.pl
to get a flame graph representation of that data that gives you a breakdown of where the time (measured in microseconds of real time) is being spent.
There's also a statistical profiler called statprof. It's a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It's more suited to soft real-time applications like games, but may have less precision than cProfile.
By adding a decorator it will show a tree of time-consuming functions
@Profiler(depth=4, on_disable=show)
Install by: pip install auto_profiler
Example
import time # line number 1
import random
from auto_profiler import Profiler, Tree
def f1():
mysleep(.6+random.random())
def mysleep(t):
time.sleep(t)
def fact(i):
f1()
if(i==1):
return 1
return i*fact(i-1)
def show(p):
print('Time [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
'-----------------------------------------------------------------------')
print(Tree(p.root, threshold=0.5))
@Profiler(depth=4, on_disable=show)
def main():
for i in range(5):
f1()
fact(3)
if __name__ == '__main__':
main()
Example Output
Time [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974] main [auto-profiler/profiler.py:267] [/test/t2.py:30]
├── 5.954s [5 * 1.191] f1 [/test/t2.py:34] [/test/t2.py:14]
│ └── 5.954s [5 * 1.191] mysleep [/test/t2.py:15] [/test/t2.py:17]
│ └── 5.954s [5 * 1.191] <time.sleep>
|
|
| # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020] fact [/test/t2.py:36] [/test/t2.py:20]
├── 0.849s [1 * 0.849] f1 [/test/t2.py:21] [/test/t2.py:14]
│ └── 0.849s [1 * 0.849] mysleep [/test/t2.py:15] [/test/t2.py:17]
│ └── 0.849s [1 * 0.849] <time.sleep>
└── 2.171s [1 * 2.171] fact [/test/t2.py:24] [/test/t2.py:20]
├── 1.552s [1 * 1.552] f1 [/test/t2.py:21] [/test/t2.py:14]
│ └── 1.552s [1 * 1.552] mysleep [/test/t2.py:15] [/test/t2.py:17]
└── 0.619s [1 * 0.619] fact [/test/t2.py:24] [/test/t2.py:20]
└── 0.619s [1 * 0.619] f1 [/test/t2.py:21] [/test/t2.py:14]
To get quick profile stats for your code snippets in an IPython notebook, you can embed line_profiler and memory_profiler straight into the notebook.
%timeit -r 7 -n 1000 print('Outputs execution time of the snippet')
#1000 loops, best of 7: 7.46 ns per loop
Gives the best time out of the given number of runs (r), looping (n) times each.
Outputs details on system caching: when code snippets are executed multiple times, the system caches a few operations and doesn't execute them again, which may hamper the accuracy of profile reports.
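Putting the pieces together in a notebook might look like this (build is just an example function; %lprun comes from line_profiler and %memit from memory_profiler once the extensions are loaded):
%load_ext line_profiler
%load_ext memory_profiler

def build(n):
    return [i * i for i in range(n)]

%lprun -f build build(100000)   # line-by-line timings for build()
%memit build(100000)            # peak memory used by the statement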
# install pyroscope
brew install pyroscope-io/brew/pyroscope
# start pyroscope server:
pyroscope server
# in a separate tab, start profiling your app:
pyroscope exec python manage.py runserver # If using Python
pyroscope exec rails server # If using Ruby

# If using Pyroscope cloud add flags for server address and auth token
# pyroscope exec -server-address "https://your_company.pyroscope.cloud" -auth-token "ps-key-1234567890" python manage.py runserver