Tag archive: profiling

Reducing Django memory usage. Low-hanging fruit?

Question: Reducing Django memory usage. Low-hanging fruit?


My memory usage increases over time and restarting Django is not kind to users.

I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.

I have a feeling that there are some simple steps that could produce big gains. Ensuring ‘debug’ is set to ‘False’ is an obvious biggie.

Can anyone suggest others? How much improvement would caching give on low-traffic sites?

In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.

Edit: Thanks for the tips so far. Any suggestions how to discover what’s using up the memory? Are there any guides to Python memory profiling?

Also as mentioned there’s a few things that will make it tricky to switch to mod_wsgi so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.

Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache’s Overhead

Edit: Graham Dumpleton’s article is the best I’ve found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.

Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:

“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”

So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It’s a well-known maxim that you don’t optimize without testing to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.

Thanks for everyone’s assistance but I think this question is still open!

Another final edit ;-)

I asked this on the django-users list and got some very helpful replies

Honestly the last update ever!

This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler


Answer 0


Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.
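
A minimal sketch of the difference, with a hypothetical Report class standing in for your data (a weakref-based cache is one common way to avoid pinning objects forever):

import weakref

class Report(object):
    """Hypothetical expensive object standing in for real data."""

_cache = {}  # module-level dict: strong references, entries stay alive forever

def get_report_leaky(key):
    if key not in _cache:
        _cache[key] = Report()
    return _cache[key]  # this Report can never be garbage collected

_weak_cache = weakref.WeakValueDictionary()  # entries vanish once unreferenced

def get_report(key):
    report = _weak_cache.get(key)
    if report is None:
        report = Report()
        _weak_cache[key] = report
    return report  # collected as soon as the caller drops its reference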

Don’t use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch. It is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.

If you can remove apache from your requirements, that would be even better to your memory. spawning seems to be the new fast scalable way to run python web applications.

EDIT: I don’t see how switching to mod_wsgi could be “tricky”. It should be a very easy task. Please elaborate on the problem you are having with the switch.


Answer 1


If you are running under mod_wsgi, and presumably spawning since it is WSGI compliant, you can use Dozer to look at your memory usage.

Under mod_wsgi just add this at the bottom of your WSGI script:

from dozer import Dozer
application = Dozer(application)

Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.

I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it’s a no-brainer :)


Answer 2


These are the Python memory profiler solutions I’m aware of (not Django related):

Disclaimer: I have a stake in the latter.

The individual project’s documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.

The following is a nice “war story” that also gives some helpful pointers:


Answer 3


Additionally, check that you are not using any known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in unicode handling. Other than that, the Django Debug Toolbar might help you track down the hogs.


Answer 4


In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.

Switch to mod_wsgi in daemon mode, and use Apache’s worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
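
A hedged sketch of what that might look like in the Apache/mod_wsgi vhost (the process group name, counts, and path are placeholders, not taken from this answer); the worker mpm itself is selected when Apache is built or packaged, not in the vhost:

WSGIDaemonProcess mysite processes=2 threads=15
WSGIProcessGroup mysite
WSGIScriptAlias / /path/to/my/wsgi.py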


Answer 5


Webfaction actually has some tips for keeping django memory usage down.

The major points:

  • Make sure debug is set to false (you already know that).
  • Use “ServerLimit” in your apache config
  • Check that no big objects are being loaded in memory
  • Consider serving static content in a separate process or server.
  • Use “MaxRequestsPerChild” in your apache config
  • Find out and understand how much memory you’re using

Answer 6


Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
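
For example, a minimal sketch of such a directive (the process group name and numbers are placeholders):

WSGIDaemonProcess mysite processes=2 threads=15 maximum-requests=1000
WSGIProcessGroup mysite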


Answer 7


Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my django project):

import os
import sys
import django.core.handlers.wsgi

from os import path

sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()

Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.

For apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.


Answer 8


Caches: make sure they’re being flushed. It’s easy for something to land in a cache but never get GC’d because of the cache reference.

Swig’d code: Make sure any memory management is being done correctly; it’s really easy to miss this in python, especially with third-party libraries.

Monitoring: If you can, get data about memory usage and hits. Usually you’ll see a correlation between a certain type of request and memory usage.


Answer 9


We stumbled over a bug in Django with big sitemaps (10.000 items). Seems Django is trying to load them all in memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 – effectively kills the apache process when Google pays a visit to the site.


How to measure time taken between lines of code in python?

Question: How to measure time taken between lines of code in python?


So in Java, we can do How to measure time taken by a function to execute

But how is it done in python? How do I measure the start and end time between lines of code? Something that does this:

import some_time_library

starttime = some_time_library.some_module()
code_tobe_measured() 
endtime = some_time_library.some_module()

time_taken = endtime - starttime

Answer 0


If you want to measure CPU time, you can use time.process_time() for Python 3.3 and above:

import time
start = time.process_time()
# your code here    
print(time.process_time() - start)

The first call turns the timer on, and the second call tells you how many seconds have elapsed.

There is also a function time.clock(), but it is deprecated since Python 3.3 and will be removed in Python 3.8.

There are better profiling tools, like timeit and profile; however, time.process_time() will measure the CPU time, and that is what you’re asking about.

If you want to measure wall clock time instead, use time.time().


Answer 1


You can also use the time library:

import time

start = time.time()

# your code

# end

print(f'Time: {time.time() - start}')

Answer 2


With the help of a small convenience class, you can measure the time spent in indented lines like this:

with CodeTimer():
   line_to_measure()
   another_line()
   # etc...

This will show the following after the indented line(s) finish executing:

Code block took: x.xxx ms

UPDATE: You can now get the class with pip install linetimer and then from linetimer import CodeTimer. See this GitHub project.

The code for above class:

import timeit

class CodeTimer:
    def __init__(self, name=None):
        self.name = " '"  + name + "'" if name else ''

    def __enter__(self):
        self.start = timeit.default_timer()

    def __exit__(self, exc_type, exc_value, traceback):
        self.took = (timeit.default_timer() - self.start) * 1000.0
        print('Code block' + self.name + ' took: ' + str(self.took) + ' ms')

You could then name the code blocks you want to measure:

with CodeTimer('loop 1'):
   for i in range(100000):
      pass

with CodeTimer('loop 2'):
   for i in range(100000):
      pass

Code block 'loop 1' took: 4.991 ms
Code block 'loop 2' took: 3.666 ms

And nest them:

with CodeTimer('Outer'):
   for i in range(100000):
      pass

   with CodeTimer('Inner'):
      for i in range(100000):
         pass

   for i in range(100000):
      pass

Code block 'Inner' took: 2.382 ms
Code block 'Outer' took: 10.466 ms

Regarding timeit.default_timer(), it uses the best timer based on OS and Python version, see this answer.
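
One concrete detail worth knowing: on CPython 3.3+ timeit.default_timer is an alias for time.perf_counter (on Python 2 it was time.time or time.clock depending on the platform), which you can verify directly:

import time
import timeit

# On CPython 3.3+ this prints True: default_timer is time.perf_counter.
print(timeit.default_timer is time.perf_counter)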


Answer 3


I always prefer to check time in hours, minutes and seconds (%H:%M:%S) format:

from datetime import datetime
start = datetime.now()
# your code
end = datetime.now()
time_taken = end - start
print('Time: ',time_taken) 

Output:

Time:  0:00:00.000019

Answer 4


I was looking for a way to output formatted time with minimal code, so here is my solution. Many people use Pandas anyway, so in some cases this can save an additional library import.

import pandas as pd
start = pd.Timestamp.now()
# code
print(pd.Timestamp.now()-start)

Output:

0 days 00:05:32.541600

I would recommend using this if time precision is not the most important; otherwise, use the time library:

%timeit pd.Timestamp.now() outputs 3.29 µs ± 214 ns per loop

%timeit time.time() outputs 154 ns ± 13.3 ns per loop


Answer 5


Putting the code in a function, then using a decorator for timing, is another option. (Source) The advantage of this method is that you define the timer once and use it with a single additional line for every function.

First, define timer decorator:

import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print("Finished {} in {} secs".format(repr(func.__name__), round(run_time, 3)))
        return value

    return wrapper

Then, use the decorator while defining the function:

@timer
def doubled_and_add(num):
    res = sum([i*2 for i in range(num)])
    print("Result : {}".format(res))

Let’s try:

doubled_and_add(100000)
doubled_and_add(1000000)

Output:

Result : 9999900000
Finished 'doubled_and_add' in 0.0119 secs
Result : 999999000000
Finished 'doubled_and_add' in 0.0897 secs

Note: I’m not sure why to use time.perf_counter instead of time.time. Comments are welcome.
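
One likely reason: time.perf_counter() is documented as a monotonic clock with the highest available resolution for measuring short durations, whereas time.time() reports wall-clock time and can jump backwards or forwards when the system clock is adjusted (e.g. by NTP). A minimal sketch of the distinction:

import time

# perf_counter() only ever moves forward, so the delta below is always >= 0;
# a time.time() delta could be distorted by a system clock adjustment.
t0 = time.perf_counter()
time.sleep(0.1)
print(time.perf_counter() - t0)  # roughly 0.1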


Answer 6


You can try this as well:

from time import perf_counter

t0 = perf_counter()

...

t1 = perf_counter()
time_taken = t1 - t0

Is there a visual profiler for Python? [closed]

Question: Is there a visual profiler for Python? [closed]


I use cProfile now but I find it tedious to write pstats code just to query the statistics data.

I’m looking for a visual tool that shows me what my Python code is doing in terms of CPU time and memory allocation.

Some examples from the Java world are visualvm and JProfiler.

  • Does something like this exist?
  • Is there an IDE that does this?
  • Would dtrace help?

I know about KCachegrind for Linux, but I would prefer something that I can run on Windows/Mac without installing KDE.


Answer 0


A friend and I have written a Python profile viewer called SnakeViz that runs in a web browser. If you are already successfully using RunSnakeRun, SnakeViz may not add that much value, but SnakeViz is much easier to install.

Edit: SnakeViz supports Python 2 and 3 and works on all major systems.
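
A sketch of typical usage (my_program.py is a placeholder): profile with cProfile, then open the result in SnakeViz.

pip install snakeviz
python -m cProfile -o program.prof my_program.py
snakeviz program.prof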


Answer 1


I’m only aware of RunSnakeRun.

There was also some talk some time ago about an integrated profiler in PyDev (Eclipse), but I don’t know if that will ever see the light of day.

Update: Unfortunately it seems that RunSnakeRun is no longer maintained, and it does not support Python 3.


Answer 2


I use gprof2dot.py. The result looks like this. I use those commands:

  python -m cProfile -o profile.dat my_program.py
  gprof2dot.py -f pstats profile.dat | dot -Tpng -o profile.png

You need graphviz and gprof2dot.py installed. You might like a convenience shell script.


Answer 3


Spyder also provides a pretty nice GUI for cProfile:


Answer 4


Python Tools for Visual Studio contains a very well done graphical profiler: http://www.youtube.com/watch?v=VCx7rlPyEzE&hd=1

http://pytools.codeplex.com/


Answer 5


This person created a graphical profiler, described here. Maybe you could use that as a starting point for your own work.


Answer 6


KCacheGrind includes a version called QCacheGrind which does run on Mac OS X and on Windows.


Answer 7


Try out Snakeviz. Very easy to install (via pip) and it’s browser based.

https://jiffyclub.github.io/snakeviz/


Answer 8


Python Call Graph generates pics very similar to those in maxy’s answer. It also shows the total time for each function; for some reason this isn’t reflected in the example graphs.


Answer 9


I’ve written a browser-based visualization tool, profile_eye, which operates on the output of gprof2dot.

gprof2dot is great at grokking many profiling-tool outputs, and does a great job at graph-element placement. The final rendering is a static graphic, which is often very cluttered.

Using d3.js it’s possible to remove much of that clutter, through relative fading of unfocused elements, tooltips, and a fisheye distortion.

For comparison, see profile_eye’s visualization of the canonical example used by gprof2dot. For Python in particular, see a cProfile output example.


Answer 10


Consider pyflame + flamegraph

Pyflame: A Ptracing Profiler For Python + flamegraph

https://github.com/uber/pyflame

You can trace a running python process using pyflame.
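
A hedged sketch of that workflow, following the project’s README (the PID and output name are placeholders):

pyflame -p <PID> | flamegraph.pl > myprofile.svg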


Answer 11


I have used plop and found it to be very lightweight. It gives a quick insight into performance.


How do I profile memory usage in Python?

Question: How do I profile memory usage in Python?


I’ve recently become interested in algorithms and have begun exploring them by writing a naive implementation and then optimizing it in various ways.

I’m already familiar with the standard Python module for profiling runtime (for most things I’ve found the timeit magic function in IPython to be sufficient), but I’m also interested in memory usage so I can explore those tradeoffs as well (e.g. the cost of caching a table of previously computed values versus recomputing them as needed). Is there a module that will profile the memory usage of a given function for me?


Answer 0


This one has been answered already here: Python memory profiler

Basically you do something like that (cited from Guppy-PE):

>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  25773  53  1612820  49   1612820  49 str
     1  11699  24   483960  15   2096780  64 tuple
     2    174   0   241584   7   2338364  72 dict of module
     3   3478   7   222592   7   2560956  78 types.CodeType
     4   3296   7   184576   6   2745532  84 function
     5    401   1   175112   5   2920644  89 dict of class
     6    108   0    81888   3   3002532  92 dict (no owner)
     7    114   0    79632   2   3082164  94 dict of type
     8    117   0    51336   2   3133500  96 type
     9    667   1    24012   1   3157512  97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33      136  77       136  77 dict (no owner)
     1      1  33       28  16       164  93 list
     2      1  33       12   7       176 100 int
>>> x=[]
>>> h.iso(x).sp
 0: h.Root.i0_modules['__main__'].__dict__['x']
>>> 

Answer 1


Python 3.4 includes a new module: tracemalloc. It provides detailed statistics about which code is allocating the most memory. Here’s an example that displays the top three lines allocating memory.

from collections import Counter
import linecache
import os
import tracemalloc

def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


tracemalloc.start()

counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
    words = list(words)
    for word in words:
        prefix = word[:3]
        counts[prefix] += 1
print('Top prefixes:', counts.most_common(3))

snapshot = tracemalloc.take_snapshot()
display_top(snapshot)

And here are the results:

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: scratches/memory_test.py:37: 6527.1 KiB
    words = list(words)
#2: scratches/memory_test.py:39: 247.7 KiB
    prefix = word[:3]
#3: scratches/memory_test.py:40: 193.0 KiB
    counts[prefix] += 1
4 other: 4.3 KiB
Total allocated size: 6972.1 KiB

When is a memory leak not a leak?

That example is great when the memory is still being held at the end of the calculation, but sometimes you have code that allocates a lot of memory and then releases it all. It’s not technically a memory leak, but it’s using more memory than you think it should. How can you track memory usage when it all gets released? If it’s your code, you can probably add some debugging code to take snapshots while it’s running. If not, you can start a background thread to monitor memory usage while the main thread runs.

Here’s the previous example where the code has all been moved into the count_prefixes() function. When that function returns, all the memory is released. I also added some sleep() calls to simulate a long-running calculation.

from collections import Counter
import linecache
import os
import tracemalloc
from time import sleep


def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
            sleep(0.0001)
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common


def main():
    tracemalloc.start()

    most_common = count_prefixes()
    print('Top prefixes:', most_common)

    snapshot = tracemalloc.take_snapshot()
    display_top(snapshot)


def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


main()

When I run that version, the memory usage has gone from 6MB down to 4KB, because the function released all its memory when it finished.

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: collections/__init__.py:537: 0.7 KiB
    self.update(*args, **kwds)
#2: collections/__init__.py:555: 0.6 KiB
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
#3: python3.6/heapq.py:569: 0.5 KiB
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
10 other: 2.2 KiB
Total allocated size: 4.0 KiB

Now here’s a version inspired by another answer that starts a second thread to monitor memory usage.

from collections import Counter
import linecache
import os
import tracemalloc
from datetime import datetime
from queue import Queue, Empty
from resource import getrusage, RUSAGE_SELF
from threading import Thread
from time import sleep

def memory_monitor(command_queue: Queue, poll_interval=1):
    tracemalloc.start()
    old_max = 0
    snapshot = None
    while True:
        try:
            command_queue.get(timeout=poll_interval)
            if snapshot is not None:
                print(datetime.now())
                display_top(snapshot)

            return
        except Empty:
            max_rss = getrusage(RUSAGE_SELF).ru_maxrss
            if max_rss > old_max:
                old_max = max_rss
                snapshot = tracemalloc.take_snapshot()
                print(datetime.now(), 'max RSS', max_rss)


def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
            sleep(0.0001)
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common


def main():
    queue = Queue()
    poll_interval = 0.1
    monitor_thread = Thread(target=memory_monitor, args=(queue, poll_interval))
    monitor_thread.start()
    try:
        most_common = count_prefixes()
        print('Top prefixes:', most_common)
    finally:
        queue.put('stop')
        monitor_thread.join()


def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


main()

The resource module lets you check the current memory usage, and save the snapshot from the peak memory usage. The queue lets the main thread tell the memory monitor thread when to print its report and shut down. When it runs, it shows the memory being used by the list() call:

2018-05-29 10:34:34.441334 max RSS 10188
2018-05-29 10:34:36.475707 max RSS 23588
2018-05-29 10:34:36.616524 max RSS 38104
2018-05-29 10:34:36.772978 max RSS 45924
2018-05-29 10:34:36.929688 max RSS 46824
2018-05-29 10:34:37.087554 max RSS 46852
Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
2018-05-29 10:34:56.281262
Top 3 lines
#1: scratches/scratch.py:36: 6527.0 KiB
    words = list(words)
#2: scratches/scratch.py:38: 16.4 KiB
    prefix = word[:3]
#3: scratches/scratch.py:39: 10.1 KiB
    counts[prefix] += 1
19 other: 10.8 KiB
Total allocated size: 6564.3 KiB

If you’re on Linux, you may find /proc/self/statm more useful than the resource module.
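
For example, a minimal sketch of reading it (the statm fields are page counts, in the order size, resident, shared, text, lib, data, dt):

import os

# Read the first three /proc/self/statm fields: total, resident, shared pages.
with open('/proc/self/statm') as f:
    size, resident, shared = [int(v) for v in f.read().split()[:3]]

page_size = os.sysconf('SC_PAGE_SIZE')  # bytes per page, typically 4096
print('RSS: %.1f MB' % (resident * page_size / 1e6))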


Answer 2


If you only want to look at the memory usage of an object, (answer to other question)

There is a module called Pympler which contains the asizeof module.

Use as follows:

from pympler import asizeof
asizeof.asizeof(my_object)

Unlike sys.getsizeof, it works for your self-created objects.

>>> asizeof.asizeof(tuple('bcd'))
200
>>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'})
400
>>> asizeof.asizeof({})
280
>>> asizeof.asizeof({'foo':'bar'})
360
>>> asizeof.asizeof('foo')
40
>>> asizeof.asizeof(Bar())
352
>>> asizeof.asizeof(Bar().__dict__)
280
>>> help(asizeof.asizeof)
Help on function asizeof in module pympler.asizeof:

asizeof(*objs, **opts)
    Return the combined size in bytes of all objects passed as positional arguments.

Answer 3


Disclosure:

  • Applicable on Linux only
  • Reports memory used by the current process as a whole, not individual functions within

But nice because of its simplicity:

import resource
def using(point=""):
    usage=resource.getrusage(resource.RUSAGE_SELF)
    return '''%s: usertime=%s systime=%s mem=%s mb
           '''%(point,usage[0],usage[1],
                usage[2]/1024.0 )

Just insert using("Label") where you want to see what’s going on. For example

print(using("before"))
wrk = ["wasting mem"] * 1000000
print(using("after"))

>>> before: usertime=2.117053 systime=1.703466 mem=53.97265625 mb
>>> after: usertime=2.12023 systime=1.70708 mem=60.8828125 mb

Answer 4


Since the accepted answer and also the next highest voted answer have, in my opinion, some problems, I’d like to offer one more answer that is based closely on Ihor B.’s answer with some small but important modifications.

This solution allows you to run profiling either by wrapping a function call with the profile function and calling it, or by decorating your function/method with the @profile decorator.

The first technique is useful when you want to profile some third-party code without messing with its source, whereas the second technique is a bit “cleaner” and works better when you don’t mind modifying the source of the function/method you want to profile.

I’ve also modified the output, so that you get RSS, VMS, and shared memory. I don’t care much about the “before” and “after” values, but only the delta, so I removed those (if you’re comparing to Ihor B.’s answer).

Profiling code

# profile.py
import time
import os
import psutil
import inspect


def elapsed_since(start):
    #return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))
    elapsed = time.time() - start
    if elapsed < 1:
        return str(round(elapsed*1000,2)) + "ms"
    if elapsed < 60:
        return str(round(elapsed, 2)) + "s"
    if elapsed < 3600:
        return str(round(elapsed/60, 2)) + "min"
    else:
        return str(round(elapsed / 3600, 2)) + "hrs"


def get_process_memory():
    process = psutil.Process(os.getpid())
    mi = process.memory_info()
    return mi.rss, mi.vms, mi.shared


def format_bytes(bytes):
    if abs(bytes) < 1000:
        return str(bytes)+"B"
    elif abs(bytes) < 1e6:
        return str(round(bytes/1e3,2)) + "kB"
    elif abs(bytes) < 1e9:
        return str(round(bytes / 1e6, 2)) + "MB"
    else:
        return str(round(bytes / 1e9, 2)) + "GB"


def profile(func, *args, **kwargs):
    def wrapper(*args, **kwargs):
        rss_before, vms_before, shared_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        rss_after, vms_after, shared_after = get_process_memory()
        print("Profiling: {:>20}  RSS: {:>8} | VMS: {:>8} | SHR {"
              ":>8} | time: {:>8}"
            .format("<" + func.__name__ + ">",
                    format_bytes(rss_after - rss_before),
                    format_bytes(vms_after - vms_before),
                    format_bytes(shared_after - shared_before),
                    elapsed_time))
        return result
    if inspect.isfunction(func):
        return wrapper
    elif inspect.ismethod(func):
        return wrapper(*args,**kwargs)

Example usage, assuming the above code is saved as profile.py:

from profile import profile
from time import sleep
from sklearn import datasets # Just an example of 3rd party function call


# Method 1
run_profiling = profile(datasets.load_digits)
data = run_profiling()

# Method 2
@profile
def my_function():
    # do some stuff
    a_list = []
    for i in range(1,100000):
        a_list.append(i)
    return a_list


res = my_function()

This should result in output similar to the below:

Profiling:        <load_digits>  RSS:   5.07MB | VMS:   4.91MB | SHR  73.73kB | time:  89.99ms
Profiling:        <my_function>  RSS:   1.06MB | VMS:   1.35MB | SHR       0B | time:   8.43ms

A couple of important final notes:

  1. Keep in mind, this method of profiling is only going to be approximate, since lots of other stuff might be happening on the machine. Due to garbage collection and other factors, the deltas might even be zero.
  2. For some unknown reason, very short function calls (e.g. 1 or 2 ms) show up with zero memory usage. I suspect this is some limitation of the hardware/OS (tested on a basic laptop with Linux) on how often memory statistics are updated.
  3. To keep the examples simple, I didn’t use any function arguments, but they should work as one would expect, i.e. profile(my_function, arg) to profile my_function(arg)

Answer 5


Below is a simple function decorator which allows you to track how much memory the process consumed before the function call, after the function call, and what the difference is:

import time
import os
import psutil
 
 
def elapsed_since(start):
    return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))
 
 
def get_process_memory():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss
 
 
def profile(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        mem_after = get_process_memory()
        print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format(
            func.__name__,
            mem_before, mem_after, mem_after - mem_before,
            elapsed_time))
        return result
    return wrapper

Here is my blog which describes all the details. (archived link)


Answer 6


Maybe this helps:
<see additional>

pip install gprof2dot
sudo apt-get install graphviz

gprof2dot -f pstats profile_for_func1_001 | dot -Tpng -o profile.png

import cProfile

def profileit(name):
    """
    @profileit("profile_for_func1_001")
    """
    def inner(func):
        def wrapper(*args, **kwargs):
            prof = cProfile.Profile()
            retval = prof.runcall(func, *args, **kwargs)
            # Note use of name from outer scope
            prof.dump_stats(name)
            return retval
        return wrapper
    return inner

@profileit("profile_for_func1_001")
def func1(...)

Answer 7


A simple example to calculate the memory usage of a block of code / function using memory_profiler, while returning the result of the function:

import memory_profiler as mp

def fun(n):
    tmp = []
    for i in range(n):
        tmp.extend(list(range(i*i)))
    return "XXXXX"

Calculate memory usage before running the code, then calculate the max usage during execution:

start_mem = mp.memory_usage(max_usage=True)
res = mp.memory_usage(proc=(fun, [100]), max_usage=True, retval=True) 
print('start mem', start_mem)
print('max mem', res[0][0])
print('used mem', res[0][0]-start_mem)
print('fun output', res[1])

Calculate usage at sampling points while running the function:

res = mp.memory_usage((fun, [100]), interval=.001, retval=True)
print('min mem', min(res[0]))
print('max mem', max(res[0]))
print('used mem', max(res[0])-min(res[0]))
print('fun output', res[1])

Credits: @skeept


Why does Python code run faster in a function?

Question: Why does Python code run faster in a function?

def main():
    for i in xrange(10**8):
        pass
main()

This piece of Python code runs in the following time (note: the timing is done with the time function in bash on Linux):

real    0m1.841s
user    0m1.828s
sys     0m0.012s

However, if the for loop isn’t placed within a function,

for i in xrange(10**8):
    pass

then it runs for a much longer time:

real    0m4.543s
user    0m4.524s
sys     0m0.012s

Why is this?


Answer 0

You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.

Remember that CPython is compiled to bytecode, which the interpreter runs. When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes. This is possible because you can’t dynamically add local variables to a function. Then retrieving a local variable is literally a pointer lookup into the list and a refcount increase on the PyObject which is trivial.

Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on. Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to.

By the way, global lookups are still pretty optimised. Attribute lookups foo.bar are the really slow ones!

Here is a small illustration of local variable efficiency.
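
You can see the effect directly with a micro-benchmark; a minimal sketch using timeit (absolute numbers will vary by machine, but use_local should be noticeably faster):

import timeit

g = 0

def use_global():
    total = 0
    for _ in range(100000):
        total += g        # LOAD_GLOBAL on every iteration
    return total

def use_local():
    local_g = g           # bind the global to a local once
    total = 0
    for _ in range(100000):
        total += local_g  # LOAD_FAST on every iteration
    return total

print("global:", timeit.timeit(use_global, number=100))
print("local: ", timeit.timeit(use_local, number=100))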


Answer 1

Inside a function, the bytecode is:

  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        

At the top level, the bytecode is:

  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE        

The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local but at toplevel it is a global.

To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the toplevel code I had to use the compile builtin.
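
For reference, a minimal sketch of producing both disassemblies with dis (using range, i.e. Python 3; the question's xrange is the Python 2 equivalent):

import dis

def main():
    for i in range(10**8):
        pass

dis.dis(main)  # disassemble the function body
dis.dis(compile("for i in range(10**8): pass", "<string>", "exec"))  # top-level code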


Answer 2

Aside from local/global variable store times, opcode prediction makes the function faster.

As the other answers explain, the function uses the STORE_FAST opcode in the loop. Here’s the bytecode for the function’s loop:

    >>   13 FOR_ITER                 6 (to 22)   # get next value from iterator
         16 STORE_FAST               0 (x)       # set local variable
         19 JUMP_ABSOLUTE           13           # back to FOR_ITER

Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.

In this case, every time Python sees FOR_ITER (the top of the loop), it will “predict” that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.

On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation-loop which has obvious implications for the speed at which the loop is executed.

To give some more technical detail about this optimization, here’s a quote from the ceval.c file (the “engine” of Python’s virtual machine):

Some opcodes tend to come in pairs thus making it possible to predict the second code when the first is run. For example, GET_ITER is often followed by FOR_ITER. And FOR_ITER is often followed by STORE_FAST or UNPACK_SEQUENCE.

Verifying the prediction costs a single high-speed test of a register variable against a constant. If the pairing was good, then the processor’s own internal branch predication has a high likelihood of success, resulting in a nearly zero-overhead transition to the next opcode. A successful prediction saves a trip through the eval-loop including its two unpredictable branches, the HAS_ARG test and the switch-case. Combined with the processor’s internal branch prediction, a successful PREDICT has the effect of making the two opcodes run as if they were a single new opcode with the bodies combined.

We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:

case FOR_ITER:                         // the FOR_ITER opcode case
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v); // x is the next value from iterator
    if (x != NULL) {                     
        PUSH(x);                       // put x on top of the stack
        PREDICT(STORE_FAST);           // predict STORE_FAST will follow - success!
        PREDICT(UNPACK_SEQUENCE);      // this and everything below is skipped
        continue;
    }
    // error-checking and more code for when the iterator ends normally                                     

The PREDICT function expands to if (*next_instr == op) goto PRED_##op i.e. we just jump to the start of the predicted opcode. In this case, we jump here:

PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();                     // pop x back off the stack
    SETLOCAL(oparg, v);            // set it as the new local variable
    goto fast_next_opcode;

The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.

The Python wiki page has more information about how CPython’s virtual machine works.


Which Python memory profiler is recommended? [closed]

Question: Which Python memory profiler is recommended? [closed]

I want to know the memory usage of my Python application and specifically want to know what code blocks/portions or objects are consuming most memory. Google search shows a commercial one is Python Memory Validator (Windows only).

And open source ones are PySizer and Heapy.

I haven’t tried any of them, so I wanted to know which one is best, considering:

  1. Gives the most details.

  2. Requires the fewest (or no) changes to my code.


Answer 0

Heapy is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse.

There is a graphical browser as well, written in Tk.
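
For example, a rough sketch of drilling further into that summary; treat the exact attribute names as assumptions drawn from the sparse docs:

from guppy import hpy

h = hpy()
heap = h.heap()
largest = heap[0]     # the partition with the largest total size (str above)
print(largest)
print(largest.byrcs)  # the same objects, grouped by referrer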


Answer 1

Since nobody has mentioned it, I’ll point to my module memory_profiler, which is capable of printing a line-by-line report of memory usage and works on Unix and Windows (it needs psutil on the latter). The output is not very detailed, but the goal is to give you an overview of where the code is consuming more memory, not an exhaustive analysis of allocated objects.

After decorating your function with @profile and running your code with the -m memory_profiler flag it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
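
For completeness, the function that produces a report like the one above is the canonical example from the memory_profiler docs (line numbers in the report correspond to the lines of this file):

from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == "__main__":
    my_func()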

Answer 2

I recommend Dowser. It is very easy to set up, and you need zero changes to your code. You can view counts of objects of each type through time, view a list of live objects, and view references to live objects, all from the simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug, and call memdebug.start. That’s all.

I haven’t tried PySizer or Heapy. I would appreciate others’ reviews.

UPDATE

The above code is for CherryPy 2.X. In CherryPy 3.X the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.X:

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()
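
Either way, hooking it into your application is as small as this (the port number is arbitrary):

import memdebug

memdebug.start(8080)  # then browse the Dowser UI at http://localhost:8080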

Answer 3

Consider the objgraph library (see http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks for an example use case).
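
A minimal sketch of the kind of inspection objgraph enables (the leaky list is a made-up stand-in; show_backrefs needs graphviz to write the image):

import objgraph

leaky = [dict() for _ in range(1000)]  # pretend these are leaking

objgraph.show_most_common_types()      # table of the most numerous types on the heap
objgraph.show_backrefs(leaky[0], max_depth=3, filename="backrefs.png")  # what keeps it alive?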


Answer 4

Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is on the identification of memory leaks.

Muppy tries to help developers identify memory leaks in Python applications. It enables the tracking of memory usage during runtime and the identification of objects that are leaking. Additionally, tools are provided to locate the source of objects that are not released.
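
Muppy now ships as part of the Pympler package; a minimal sketch of taking a heap summary with it:

from pympler import muppy, summary

all_objects = muppy.get_objects()   # every object the collector can see
sum1 = summary.summarize(all_objects)
summary.print_(sum1)                # table of types with counts and total sizes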


Answer 5

I’m developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof

This is an example of what the plots look like:

The project is hosted in GitHub:

https://github.com/jmdana/memprof


Answer 6

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a wsgi webapp, then Dozer is a nice middleware wrapper of Dowser.


Answer 7

Try also the pytracemalloc project which provides the memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.
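
tracemalloc was later merged into the standard library (Python 3.4+), so a minimal sketch no longer needs the third-party package:

import tracemalloc

tracemalloc.start()
data = [bytearray(1024) for _ in range(1000)]   # stand-in allocation to measure
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:  # top 5 source lines by allocated size
    print(stat)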


py-spy: a sampling profiler for Python programs

py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. py-spy’s overhead is very low: it is written in Rust for speed and does not run in the same process as the profiled Python program. This means it is safe to use py-spy against production Python code.

py-spy works on Linux, OSX, Windows and FreeBSD, and supports profiling all recent versions of the CPython interpreter (versions 2.3-2.7 and 3.3-3.9).

Installation

Prebuilt binary wheels can be installed from PyPI with:

pip install py-spy

You can also download prebuilt binaries from the GitHub Releases Page. This includes binaries for ARM and FreeBSD, which can’t be installed with pip. If you’re a Rust user, py-spy can also be installed with cargo install py-spy. On Arch Linux, py-spy is in the AUR and can be installed with yay -S py-spy.

Usage

py-spy works from the command line and takes either the PID of the program you want to sample or the command line of the python program you want to run. py-spy has three subcommands: record, top and dump.

record

py-spy supports recording profiles to a file using the record command. For example, you can generate a flame graph of your python process by running:

py-spy record -o profile.svg --pid 12345
# OR
py-spy record -o profile.svg -- python myprogram.py

This will generate an interactive SVG file that looks like this:

You can change the file format to generate speedscope profiles or raw data with the --format parameter. See py-spy record --help for information on other options, including changing the sampling rate, filtering to only include threads that hold the GIL, profiling native C extensions, showing thread IDs, profiling subprocesses and more.

top

top shows a live view of which functions are taking the most time in your python program, similar to the Unix top command. Running py-spy with:

py-spy top --pid 12345
# OR
py-spy top -- python myprogram.py

will bring up a live, updating high-level view of your Python program:

dump

py-spy can also display the current call stack for each python thread with the dump command:

py-spy dump --pid 12345

This dumps out the call stacks for each thread, and some other basic process information, to the console:

This is useful for the case where you just need a single call stack to figure out where your Python program is hung. This command can also print out the local variables associated with each stack frame by setting the --locals flag.

FAQ

Why do we need another Python profiler?

This project aims to let you profile and debug any running Python program, even if the program is serving production traffic.

While there are many other Python profiling projects, almost all of them require modifying the profiled program in some way. Usually, the profiling code runs inside the target python process, which slows down and changes how the program operates. This means it’s generally not safe to use these profilers for debugging issues in production services, since they usually have a noticeable impact on performance.

How does py-spy work?

py-spy works by directly reading the memory of the python program, using the process_vm_readv system call on Linux, the vm_read call on OSX, or the ReadProcessMemory call on Windows.

Figuring out the call stack of the Python program is done by looking at the global PyInterpreterState variable to get all the Python threads running in the interpreter, and then iterating over each PyFrameObject in each thread to get the call stack. Since the Python ABI changes between versions, we use Rust’s bindgen to generate different Rust structs for each Python interpreter class we care about, and use these generated structs to figure out the memory layout in the Python program.

Getting the memory address of the Python interpreter state can be a little tricky due to Address Space Layout Randomization. If the target python interpreter ships with symbols, it is pretty easy to figure out the memory address of the interpreter by dereferencing the interp_head or _PyRuntime variables, depending on the Python version. However, many Python versions ship with either stripped binaries or, on Windows, without the corresponding PDB symbol files. In these cases we scan through the BSS section for addresses that look like they may point to a valid PyInterpreterState and check if the layout of that address is what we expect.

Can py-spy profile native extensions?

Yes! py-spy supports profiling native Python extensions written in languages like C/C++ or Cython, on x86_64 Linux and Windows. You can enable this mode by passing --native on the command line. For best results, you should compile your Python extension with symbols. For Cython programs it is also worth noting that py-spy needs the generated C or C++ file in order to return line numbers of the original .pyx file. Read the blog post for more information.

How do you profile subprocesses?

By passing the --subprocesses flag to either the record or top view, py-spy will also include the output of any python process that is a child process of the target program. This is useful for profiling applications that use multiprocessing or gunicorn worker pools. py-spy will monitor for new processes being created, automatically attach to them and include their samples in the output. The record view will include the PID and cmdline of each program in the call stack, with subprocesses appearing as children of their parent processes.

When do you need to run as sudo?

py-spy works by reading memory from a different python process, and this might not be allowed for security reasons depending on your OS and system settings. In many cases, running as root (with sudo or similar) gets around these security restrictions. OSX always requires running as root, but on Linux it depends on how you are launching py-spy and your system security settings.

On Linux the default configuration is to require root permissions when attaching to a non-child process. For py-spy this means you can profile without root access by letting py-spy create the process (py-spy record -- python myprogram.py), but attaching to an existing process by specifying a PID will usually require root (sudo py-spy record --pid 123456). You can remove this restriction on Linux by setting the ptrace_scope sysctl variable.

How do you detect if a thread is idle?

py-spy attempts to only include stack traces from threads that are actively running code, and to exclude threads that are sleeping or otherwise idle. When possible, py-spy tries to get this thread activity information from the OS: by reading /proc/PID/stat on Linux, by using the mach thread_basic_info call on OSX, and by looking at whether the current syscall is known to be idle on Windows.

This approach has some limitations, though, which may cause idle threads to still be marked as active. First, we have to get this thread activity information before pausing the program, because getting it from a paused program would cause it to always be reported as idle. This means there is a potential race condition, where we get the thread activity and then the thread is in a different state when we get the stack trace. Querying the OS for thread activity is also not yet implemented for FreeBSD, or for i686/ARM processors on Linux. On Windows, calls that are blocked on IO will also not be marked as idle, for instance when reading input from stdin. Finally, on some Linux calls the ptrace attach we are using may cause idle threads to wake up momentarily, causing false positives when reading from procfs. For these reasons, we also have a heuristic fallback that marks certain well-known calls in Python as idle.

You can disable this functionality by setting the --idle flag, which will include frames that py-spy considers idle.

How does GIL detection work?

We figure out which thread holds the GIL by looking at the _PyThreadState_Current symbol (for Python 3.6 and earlier) and by working out the equivalent from the _PyRuntime struct in Python 3.7 and later. These symbols may not be included in your python distribution, which will cause resolving which thread holds the GIL to fail. Current GIL usage is also shown in the top view, as %GIL.

Passing the --gil flag will only include traces for threads that are holding the Global Interpreter Lock. In some cases this can give a more accurate view of how your python program is spending its time, though you should be aware that this will miss activity in extensions that release the GIL while they are still active.

Why do I get issues profiling /usr/bin/python on OSX?

OSX has a feature called System Integrity Protection that prevents even the root user from reading memory from any binary located in /usr/bin. Unfortunately, this includes the python interpreter that ships with OSX.

There are a couple of different ways to deal with this:

How do I run py-spy inside Docker?

Running py-spy inside a docker container will usually also bring up a permission denied error, even when running as root.

This error is caused by docker restricting the process_vm_readv system call we are using. This can be overridden by setting --cap-add SYS_PTRACE when starting the docker container.

Alternatively, you can edit your docker-compose YAML file:

your_service:
   cap_add:
     - SYS_PTRACE

Note that you’ll need to restart the docker container for this setting to take effect.

You can also use py-spy from the host OS to profile a process running inside a docker container.

How do I run py-spy in Kubernetes?

py-spy needs SYS_PTRACE to be able to read process memory. Kubernetes drops that capability by default, resulting in the error:

Permission Denied: Try running again with elevated permissions by going 'sudo env "PATH=$PATH" !!'

The recommended way to deal with this is to edit the spec and add the capability. For a deployment, this is done by adding the following to Deployment.spec.template.spec.containers:

securityContext:
  capabilities:
    add:
    - SYS_PTRACE

More details on this can be found here: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container. Note that this will delete the existing pods and recreate them.

How do I install py-spy on Alpine Linux?

Alpine python opts out of manylinux wheels: pypa/pip#3969 (comment). You can override this behaviour to install py-spy on Alpine with pip by doing:

echo 'manylinux1_compatible = True' > /usr/local/lib/python3.7/site-packages/_manylinux.py

Alternatively, you can download a musl binary from the GitHub releases page.

How do I avoid pausing the Python program?

By setting the --nonblocking option, py-spy will not pause the target python program you are profiling. While the performance impact of sampling a process with py-spy is usually extremely low, setting this option avoids interrupting your running python program at all.

With this option set, py-spy will instead read the interpreter state from the python process as it is running. Since the calls we use to read memory are not atomic, and we have to issue multiple calls to get a stack trace, we occasionally get errors when sampling. This can show up as an increased error rate when sampling, or as partial stack frames being included in the output.

How are you distributing Rust executable binaries over PyPI?

OK, no one has ever actually asked me this question, but I wanted to share it since it’s a pretty terrible hack that might be useful to other people.

I really wanted to distribute this package over PyPI, since installing with pip makes it easier for most Python programmers to get it onto their systems. Unfortunately, installing executables as python scripts isn’t something that setuptools supports.

To get around this, I use the setuptools_rust package to build the py-spy binary, and then override the distutils install command to copy the built binary into the python scripts folder. By prebuilding wheels for the supported platforms, this means we can install py-spy with pip without needing a Rust compiler on the machine it’s being installed on.

Does py-spy support 32-bit Windows? Integrate with PyPy? Work with UCS2 builds of Python 2?

Not yet =)

If there is a feature you’d like to see in py-spy, please either upvote the appropriate issue or create a new one describing the missing functionality.

Credits

py-spy is heavily inspired by Julia Evans’ excellent work on rbspy. In particular, the code to generate flame graph and speedscope files is taken directly from rbspy, and the project uses the read-process-memory and proc-maps crates that were spun out of rbspy.

License

py-spy is released under the MIT License; see the LICENSE file for the full text.

How can you profile a Python script?

Question: How can you profile a Python script?

Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey – i.e., adding timing code to __main__.

What is a good way to profile how long a Python program takes to run?


Answer 0

Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.

You can call it from within your code, or from the interpreter, like this:

import cProfile
cProfile.run('foo()')

Even more usefully, you can invoke the cProfile when running a script:

python -m cProfile myscript.py

To make it even easier, I made a little batch file called ‘profile.bat’:

python -m cProfile %1

So all I have to do is run:

profile euler048.py

And I get this:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}

EDIT: Updated link to a good video resource from PyCon 2013 titled Python Profiling, also available via YouTube.


Answer 1

A while ago I made pycallgraph which generates a visualisation from your Python code. Edit: I’ve updated the example to work with 3.3, the latest release as of this writing.

After a pip install pycallgraph and installing GraphViz you can run it from the command line:

pycallgraph graphviz -- ./mypythonscript.py

Or, you can profile particular parts of your code:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    code_to_profile()

Either of these will generate a pycallgraph.png file similar to the image below:


Answer 2

It’s worth pointing out that using the profiler only works (by default) on the main thread, and you won’t get any information from other threads if you use them. This can be a bit of a gotcha as it is completely unmentioned in the profiler documentation.

If you also want to profile threads, you’ll want to look at the threading.setprofile() function in the docs.

You could also create your own threading.Thread subclass to do it:

import cProfile
import threading

class ProfiledThread(threading.Thread):
    # Overrides threading.Thread.run()
    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(threading.Thread.run, self)
        finally:
            profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

and use that ProfiledThread class instead of the standard one. It might give you more flexibility, but I’m not sure it’s worth it, especially if you are using third-party code which wouldn’t use your class.
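
A minimal usage sketch for the class above (the worker function is made up; each thread writes its own .profile file when it finishes):

def worker():
    sum(i * i for i in range(10**6))

t = ProfiledThread(target=worker)
t.start()
t.join()  # myprofile-<ident>.profile is dumped when run() returns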


Answer 3

The python wiki is a great page for profiling resources: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

as is the python docs: http://docs.python.org/library/profile.html

as shown by Chris Lawlor cProfile is a great tool and can easily be used to print to the screen:

python -m cProfile -s time mine.py <args>

or to file:

python -m cProfile -o output.file mine.py <args>

PS> If you are using Ubuntu, make sure to install python-profiler:

apt-get install python-profiler 

If you output to file you can get nice visualizations using the following tools

PyCallGraph : a tool to create call graph images
install:

 pip install pycallgraph

run:

 pycallgraph mine.py args

view:

 gimp pycallgraph.png

You can use whatever you like to view the png file, I used gimp
Unfortunately I often get

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.257079 to fit

which makes my images unusably small. So I generally create svg files:

pycallgraph -f svg -o pycallgraph.svg mine.py <args>

PS> make sure to install graphviz (which provides the dot program). On Ubuntu:

apt-get install graphviz

Alternative graphing using gprof2dot via @maxy / @quodlibetor:

pip install gprof2dot
python -m cProfile -o profile.pstats mine.py
gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg

Answer 4

@Maxy’s comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn’t want to re-run things with pycallgraph, so I used gprof2dot, and got pretty svgs:

$ sudo apt-get install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot
$ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
$ cd $PROJECT_DIR
$ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

and BLAM!

It uses dot (the same thing that pycallgraph uses) so output looks similar. I get the impression that gprof2dot loses less information though:


Answer 5

I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do analysis in SnakeViz.

The main viz technique used is Sunburst chart as shown below, in which the hierarchy of function calls is arranged as layers of arcs and time info encoded in their angular widths.

The best thing is you can interact with the chart. For example, to zoom in one can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.
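
In IPython/Jupyter the whole workflow can be as small as this; %prun’s -D flag dumps the raw stats to a file (my_slow_function is a hypothetical name):

%prun -D out.prof my_slow_function()

and then, from a shell:

snakeviz out.prof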


Answer 6

Simplest and quickest way to find where all the time is going.

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

Draws a pie chart in a browser. Biggest piece is the problem function. Very simple.


Answer 7

I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.

python -m cProfile -o script.profile script.py
pyprof2calltree -i script.profile -o script.calltree
kcachegrind script.calltree

To install the required tools (on Ubuntu, at least):

apt-get install kcachegrind
pip install pyprof2calltree

The result:


Answer 8

Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture are proportional to the time taken. If you mouse over a rectangle it highlights that call in the table and everywhere on the map. When you double-click on a rectangle it zooms in on that portion. It will show you who calls that portion and what that portion calls.

The descriptive information is very helpful. It shows you the code for that bit which can be helpful when you are dealing with built-in library calls. It tells you what file and what line to find the code.

Also want to point out that the OP said ‘profiling’ but it appears he meant ‘timing’. Keep in mind programs will run slower when profiled.


Answer 9

A nice profiling module is the line_profiler (called using the script kernprof.py). It can be downloaded here.

My understanding is that cProfile only gives information about total time spent in each function. So individual lines of code are not timed. This is an issue in scientific computing since often one single line can take a lot of time. Also, as I remember, cProfile didn’t catch the time I was spending in say numpy.dot.
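
A minimal usage sketch (the bare @profile decorator is injected by kernprof at run time, so no import is needed; slow_sum is made up):

# script.py
@profile
def slow_sum(n):
    total = 0
    for i in range(n):    # the per-line report will show this loop dominating
        total += i * i
    return total

slow_sum(10**6)

Running kernprof -l -v script.py then prints the line-by-line timings.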


Answer 10

I recently created tuna for visualizing Python runtime and import profiles; this may be helpful here.

Install with

pip install tuna

Create a runtime profile

python3 -m cProfile -o program.prof yourfile.py

or an import profile (Python 3.7+ required)

python3 -X importtime yourfile.py 2> import.log

Then just run tuna on the file

tuna program.prof

Answer 11

pprofile

line_profiler (already presented here) also inspired pprofile, which is described as:

Line-granularity, thread-aware deterministic and statistic pure-python profiler

It provides line-granularity as line_profiler, is pure Python, can be used as a standalone command or a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.

vprof

There is also vprof, a Python package described as:

[…] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.


Answer 12

There are a lot of great answers, but they either use the command line or some external program for profiling and/or sorting the results.

I really missed some way I could use in my IDE (eclipse-PyDev) without touching the command line or installing anything. So here it is.

Profiling without command line

def count():
    from math import sqrt
    for x in range(10**5):
        sqrt(x)

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("count()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

See docs or other answers for more info.


Answer 13

Following Joe Shaw’s answer about multi-threaded code not working as expected, I figured that the runcall method in cProfile is merely doing self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and have whatever code you want in between, with minimal interference with existing code.
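
A minimal sketch of that enable/disable pattern (the summation is a stand-in for the code you actually want profiled):

import cProfile

profiler = cProfile.Profile()
profiler.enable()
total = sum(i * i for i in range(10**6))  # your code here
profiler.disable()
profiler.print_stats(sort="time")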


Answer 14

In Virtaal’s source there’s a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.


Answer 15

cProfile is great for quick profiling, but most of the time it ended for me with errors. The runctx function solves this problem by correctly initializing the environment and variables; I hope it can be useful for someone:

import cProfile
cProfile.runctx('foo()', None, locals())

Answer 16

If you want to make a cumulative profiler, meaning one that runs the function several times in a row and watches the sum of the results, you can use this cumulative_profiler decorator.

It’s Python >= 3.6 specific, but you can remove nonlocal for it to work on older versions:

import cProfile, pstats

class _ProfileFunc:
    def __init__(self, func, sort_stats_by):
        self.func =  func
        self.profile_runs = []
        self.sort_stats_by = sort_stats_by

    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # this is the profiling section
        retval = self.func(*args, **kwargs)
        pr.disable()

        self.profile_runs.append(pr)
        ps = pstats.Stats(*self.profile_runs).sort_stats(self.sort_stats_by)
        return retval, ps

def cumulative_profiler(amount_of_times, sort_stats_by='time'):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            nonlocal function, amount_of_times, sort_stats_by  # for python 2.x remove this row

            profiled_func = _ProfileFunc(function, sort_stats_by)
            for i in range(amount_of_times):
                retval, ps = profiled_func(*args, **kwargs)
            ps.print_stats()
            return retval  # returns the results of the function
        return wrapper

    if callable(amount_of_times):  # in case you don't want to specify the amount of times
        func = amount_of_times  # amount_of_times is the function in here
        amount_of_times = 5  # the default amount
        return real_decorator(func)
    return real_decorator

Example

Profiling the function baz:

import time

@cumulative_profiler
def baz():
    time.sleep(1)
    time.sleep(2)
    return 1

baz()

baz ran 5 times and printed this:

         20 function calls in 15.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   15.003    1.500   15.003    1.500 {built-in method time.sleep}
        5    0.000    0.000   15.003    3.001 <ipython-input-9-c89afe010372>:3(baz)
        5    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Specifying the number of times:

@cumulative_profiler(3)
def baz():
    ...

Answer 17

The terminal-only (and simplest) solution, in case all those fancy UIs fail to install or to run:
ignore cProfile completely and replace it with pyinstrument, which will collect and display the tree of calls right after execution.

Install:

$ pip install pyinstrument

Profile and display result:

$ python -m pyinstrument ./prog.py

Works with python2 and 3.

[EDIT] The documentation of the API, for profiling only a part of the code, can be found here.


Answer 18

My way is to use yappi (https://github.com/sumerc/yappi). It’s especially useful combined with an RPC server where (even just for debugging) you register methods to start, stop and print profiling information, e.g. in this way:

@staticmethod
def startProfiler():
    yappi.start()

@staticmethod
def stopProfiler():
    yappi.stop()

@staticmethod
def printProfiler():
    stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
    statPrint = '\n'
    namesArr = [len(str(stat[0])) for stat in stats.func_stats]
    log.debug("namesArr %s", str(namesArr))
    maxNameLen = max(namesArr)
    log.debug("maxNameLen: %s", maxNameLen)

    for stat in stats.func_stats:
        nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
        log.debug('nameAppendSpaces: %s', nameAppendSpaces)
        blankSpace = ''
        for space in nameAppendSpaces:
            blankSpace += space

        log.debug("adding spaces: %s", len(nameAppendSpaces))
        statPrint = statPrint + str(stat[0]) + blankSpace + " " + str(stat[1]).ljust(8) + "\t" + str(
            round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

    log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
    log.log(1000, statPrint)

Then when your program is running you can start the profiler at any time by calling the startProfiler RPC method, and dump the profiling information to a log file by calling printProfiler (or modify the RPC method to return it to the caller), getting output like this:

2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
name                                                                                                                                      ncall     ttot    tsub
2014-02-19 16:32:24,128-|SVR-MAIN  |-(Thread-3   )-Level 1000: 
C:\Python27\lib\sched.py.run:80                                                                                                           22        0.11    0.05
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293                                                22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515                                                    22        0.11    0.0
M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66                                       1         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.date_time_string:464                                                                                    1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243     4         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537                                                                          1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4         0.0     0.0
<string>.__new__:8                                                                                                                        220       0.0     0.0
C:\Python27\lib\socket.py.close:276                                                                                                       4         0.0     0.0
C:\Python27\lib\threading.py.__init__:558                                                                                                 1         0.0     0.0
<string>.__new__:8                                                                                                                        4         0.0     0.0
C:\Python27\lib\threading.py.notify:372                                                                                                   1         0.0     0.0
C:\Python27\lib\rfc822.py.getheader:285                                                                                                   4         0.0     0.0
C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301                                                                                  1         0.0     0.0
C:\Python27\lib\xmlrpclib.py.end:816                                                                                                      3         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467                                                                                         1         0.0     0.0
C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460                                                                               1         0.0     0.0
C:\Python27\lib\SocketServer.py.close_request:475                                                                                         1         0.0     0.0
c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066               4         0.0     0.0 

It may not be very useful for short scripts but helps to optimize server-type processes especially given the printProfiler method can be called multiple times over time to profile and compare e.g. different program usage scenarios.

In newer versions of yappi, the following code will work:

@staticmethod
def printProfile():
    yappi.get_func_stats().print_all()

Answer 19

A new tool to handle profiling in Python is PyVmMonitor: http://www.pyvmmonitor.com/

It has some unique features such as

  • Attach profiler to a running (CPython) program
  • On demand profiling with Yappi integration
  • Profile on a different machine
  • Multiple processes support (multiprocessing, django…)
  • Live sampling/CPU view (with time range selection)
  • Deterministic profiling through cProfile/profile integration
  • Analyze existing PStats results
  • Open DOT files
  • Programmatic API access
  • Group samples by method or line
  • PyDev integration
  • PyCharm integration

Note: it’s commercial, but free for open source.


Answer 20

gprof2dot_magic

Magic function for gprof2dot to profile any Python statement as a DOT graph in JupyterLab or Jupyter Notebook.

GitHub repo: https://github.com/mattijn/gprof2dot_magic

installation

Make sure you have the Python package gprof2dot_magic.

pip install gprof2dot_magic

Its dependencies gprof2dot and graphviz will be installed as well.

usage

To enable the magic function, first load the gprof2dot_magic module

%load_ext gprof2dot_magic

and then profile any line statement as a DOT graph as such:

%gprof2dot print('hello world')


Answer 21

Ever want to know what the hell that python script is doing? Enter the Inspect Shell. Inspect Shell lets you print/alter globals and run functions without interrupting the running script. Now with auto-complete and command history (only on linux).

Inspect Shell is not a pdb-style debugger.

https://github.com/amoffat/Inspect-Shell

You could use that (and your wristwatch).


Answer 22

要添加到https://stackoverflow.com/a/582337/1070617

我编写了此模块,该模块使您可以使用cProfile并轻松查看其输出。此处更多内容:https//github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

另请参阅:http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html,了解如何理解收集到的统计信息。

To add on to https://stackoverflow.com/a/582337/1070617,

I wrote this module that allows you to use cProfile and view its output easily. More here: https://github.com/ymichael/cprofilev

$ python -m cprofilev /your/python/program
# Go to http://localhost:4000 to view collected statistics.

Also see: http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html on how to make sense of the collected statistics.


回答 23

这取决于您希望从分析中看到什么。简单的时间指标可以通过(bash)获得:

time python python_prog.py

甚至/usr/bin/time也可以通过--verbose标志输出详细的指标。

要检查每个函数给出的时间指标并更好地了解在函数上花费了多少时间,可以在python中使用内置的cProfile。

时间并不是唯一值得深入的指标,您还可以关注内存、线程等。
分析选项:
1. line_profiler是另一个分析器,通常用于逐行获取时序指标。
2. memory_profiler是用于分析内存使用情况的工具。
3. heapy(来自Guppy项目)用于分析堆中对象的使用情况。

这些是我常用的一些工具。如果您想了解更多,可以尝试阅读这本书,它是一本关于以性能为出发点编写代码的不错的入门书;之后您可以进入使用Cython和JIT(即时)编译的Python等高级主题。

It depends on what you want to see from profiling. Simple time metrics can be obtained with (bash):

time python python_prog.py

Even ‘/usr/bin/time’ can output detailed metrics by using the ‘--verbose’ flag.

To check time metrics given by each function and to better understand how much time is spent on functions, you can use the inbuilt cProfile in python.

Going deeper, time is not the only metric: you may also care about memory, threads, and so on.
Profiling options:
1. line_profiler is another profiler, commonly used to obtain timing metrics line by line.
2. memory_profiler is a tool for profiling memory usage.
3. heapy (from the Guppy project) profiles how objects in the heap are used.

These are some of the common ones I tend to use. If you want to learn more, try reading this book. It is a pretty good book on starting out with performance in mind, and you can move on to advanced topics such as Cython and JIT (just-in-time) compiled Python.
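
For the built-in cProfile mentioned above, a self-contained sketch of its programmatic API (the profiled function is just a stand-in):

import cProfile
import pstats

def busy():
    return sum(i * i for i in range(10**6))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

stats = pstats.Stats(profiler).sort_stats('cumulative')
stats.print_stats(10)   # top 10 entries by cumulative time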


回答 24

使用austin之类的统计分析器不需要任何插桩,这意味着您只需运行如下命令,即可从Python应用程序中获取分析数据:

austin python3 my_script.py

原始输出本身不是很有用,但是您可以将其通过管道传递给flamegraph.pl,以获得该数据的火焰图表示,从而细分时间(以微秒为单位的实际时间)花在了哪里。

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

With a statistical profiler like austin, no instrumentation is required, meaning that you can get profiling data out of a Python application simply with

austin python3 my_script.py

The raw output isn’t very useful, but you can pipe that to flamegraph.pl to get a flame graph representation of that data that gives you a breakdown of where the time (measured in microseconds of real time) is being spent.

austin python3 my_script.py | flamegraph.pl > my_script_profile.svg

回答 25

还有一个名为statprof的统计分析器。它是一个采样探查器,因此为您的代码增加的开销极小,并提供基于行(而不仅仅是基于函数)的计时。它更适合诸如游戏之类的软实时应用程序,但精度可能不如cProfile。

PyPI中的版本有点旧,因此可以通过指定git仓库用pip安装它:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

您可以像这样运行它:

import statprof

with statprof.profile():
    my_questionable_function()

另请参阅https://stackoverflow.com/a/10333592/320036

There’s also a statistical profiler called statprof. It’s a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It’s more suited to soft real-time applications like games, but may have less precision than cProfile.

The version on PyPI is a bit old, so you can install it with pip by specifying the git repository:

pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

You can run it like this:

import statprof

with statprof.profile():
    my_questionable_function()

See also https://stackoverflow.com/a/10333592/320036
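
For longer-lived code where a context manager is awkward, statprof also exposes explicit start/stop calls; a sketch based on its README (verify against the version you install):

import statprof

statprof.start()
try:
    my_questionable_function()
finally:
    statprof.stop()
    statprof.display()   # print the collected line-based samples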


回答 26

我受pypref_time的启发,刚刚开发了自己的探查器:

https://github.com/modaresimr/auto_profiler

通过添加一个装饰器,它将显示一棵包含各函数耗时的树

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

示例输出


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
│   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
│       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
    │   └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
    │       └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
        │   └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

I just developed my own profiler, inspired by pypref_time:

https://github.com/modaresimr/auto_profiler

By adding a decorator it will show a tree of time-consuming functions

@Profiler(depth=4, on_disable=show)

Install by: pip install auto_profiler

Example

import time # line number 1
import random

from auto_profiler import Profiler, Tree

def f1():
    mysleep(.6+random.random())

def mysleep(t):
    time.sleep(t)

def fact(i):
    f1()
    if(i==1):
        return 1
    return i*fact(i-1)


def show(p):
    print('Time   [Hits * PerHit] Function name [Called from] [Function Location]\n'+\
          '-----------------------------------------------------------------------')
    print(Tree(p.root, threshold=0.5))

@Profiler(depth=4, on_disable=show)
def main():
    for i in range(5):
        f1()

    fact(3)


if __name__ == '__main__':
    main()

Example Output


Time   [Hits * PerHit] Function name [Called from] [function location]
-----------------------------------------------------------------------
8.974s [1 * 8.974]  main  [auto-profiler/profiler.py:267]  [/test/t2.py:30]
├── 5.954s [5 * 1.191]  f1  [/test/t2.py:34]  [/test/t2.py:14]
│   └── 5.954s [5 * 1.191]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
│       └── 5.954s [5 * 1.191]  <time.sleep>
|
|
|   # The rest is for the example recursive function call fact
└── 3.020s [1 * 3.020]  fact  [/test/t2.py:36]  [/test/t2.py:20]
    ├── 0.849s [1 * 0.849]  f1  [/test/t2.py:21]  [/test/t2.py:14]
    │   └── 0.849s [1 * 0.849]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
    │       └── 0.849s [1 * 0.849]  <time.sleep>
    └── 2.171s [1 * 2.171]  fact  [/test/t2.py:24]  [/test/t2.py:20]
        ├── 1.552s [1 * 1.552]  f1  [/test/t2.py:21]  [/test/t2.py:14]
        │   └── 1.552s [1 * 1.552]  mysleep  [/test/t2.py:15]  [/test/t2.py:17]
        └── 0.619s [1 * 0.619]  fact  [/test/t2.py:24]  [/test/t2.py:20]
            └── 0.619s [1 * 0.619]  f1  [/test/t2.py:21]  [/test/t2.py:14]

回答 27

当我在服务器上没有root权限时,我使用lsprofcalltree.py并像这样运行我的程序:

python lsprofcalltree.py -o callgrind.1 test.py

然后,我可以使用任何与callgrind兼容的软件打开报告,例如qcachegrind

When I’m not root on the server, I use lsprofcalltree.py and run my program like this:

python lsprofcalltree.py -o callgrind.1 test.py

Then I can open the report with any callgrind-compatible software, like qcachegrind
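
If you already have a cProfile dump, the pyprof2calltree package (a maintained descendant of this approach; its exact flags are an assumption to verify) performs a similar conversion:

pip install pyprof2calltree
python -m cProfile -o out.pstats test.py
pyprof2calltree -i out.pstats -o callgrind.out.test   # then open in qcachegrind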


回答 28

要在IPython笔记本中快速获取代码段的分析统计信息,可以将line_profiler和memory_profiler直接嵌入到笔记本中。

得到它!

!pip install line_profiler
!pip install memory_profiler

加载它!

%load_ext line_profiler
%load_ext memory_profiler

用它!


%time

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

给出:

  • CPU时间:CPU级执行时间
  • 系统时间:系统级执行时间
  • 总计:CPU时间+系统时间
  • 挂钟时间:挂钟时间

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • 在给定的运行次数(r)中,每次循环(n)次,给出最佳耗时。
  • 输出有关系统缓存的详细信息:
    • 当代码片段多次执行时,系统会缓存一些操作,而不会再次执行它们,这可能会影响概要文件报告的准确性。

%prun

%prun -s cumulative 'Code to profile' 

给出:

  • 函数调用次数(ncalls)
  • 每个不同的函数调用都有单独的条目(distinct)
  • 每次调用所花费的时间(percall)
  • 到该函数调用为止的累计时间(cumtime)
  • 被调用的函数/模块的名称等…


%memit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

给出:

  • 内存使用情况

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

给出:

  • 逐行统计

To get quick profile stats for your code snippets in an IPython notebook, you can embed line_profiler and memory_profiler straight into the notebook.

Get it!

!pip install line_profiler
!pip install memory_profiler

Load it!

%load_ext line_profiler
%load_ext memory_profiler

Use it!


%time

%time print('Outputs CPU time,Wall Clock time') 
#CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.96 µs

Gives:

  • CPU times: CPU level execution time
  • sys times: system level execution time
  • total: CPU time + system time
  • Wall time: Wall Clock Time

%timeit

%timeit -r 7 -n 1000 print('Outputs execution time of the snippet') 
#1000 loops, best of 7: 7.46 ns per loop
  • Gives the best time out of a given number of runs (r), each looping (n) times.
  • Outputs details on system caching:
    • When code snippets are executed multiple times, the system caches a few operations and doesn’t execute them again, which may hamper the accuracy of the profile reports.

%prun

%prun -s cumulative 'Code to profile' 

Gives:

  • number of function calls (ncalls)
  • one entry per distinct function call
  • time taken per call (percall)
  • cumulative time up to that function call (cumtime)
  • name of the function/module called, etc.


%memit

%memit 'Code to profile'
#peak memory: 199.45 MiB, increment: 0.00 MiB

Gives:

  • Memory usage

%lprun

#Example function
def fun():
  for i in range(10):
    print(i)

#Usage: %lprun <name_of_the_function> function
%lprun -f fun fun()

Gives:

  • Line wise stats
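
Putting the magics together, a hypothetical notebook cell profiling one stand-in function with each of them:

def build_list(n):
    return [i * i for i in range(n)]

%time build_list(10**6)                   # one-off CPU/wall timing
%timeit -r 3 -n 10 build_list(10**5)      # best of repeated runs
%memit build_list(10**6)                  # peak memory via memory_profiler
%lprun -f build_list build_list(10**5)    # line-by-line via line_profiler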


回答 29

Pyroscope - 🔥 a continuous profiling platform: debug performance issues down to a single line of code 🔥

Pyroscope is an open-source continuous profiling platform. It will help you:

  • Find performance issues in your code
  • Resolve issues of high CPU utilization
  • Understand the call tree of your application
  • Track how things change over time

🔥 Pyroscope Live Demo 🔥

Features

  • Can store years of profiling data from multiple applications
  • You can look at years of data at a time or zoom in on specific events
  • Low CPU overhead
  • Efficient compression, low disk space requirements
  • Snazzy UI
  • Supports Go, Ruby and Python

Try Pyroscope locally in 3 steps:

# install pyroscope
brew install pyroscope-io/brew/pyroscope

# start pyroscope server:
pyroscope server

# in a separate tab, start profiling your app:
pyroscope exec python manage.py runserver # If using Python
pyroscope exec rails server               # If using Ruby

# If using Pyroscope cloud add flags for server address and auth token
# pyroscope exec -server-address "https://your_company.pyroscope.cloud" -auth-token "ps-key-1234567890" python manage.py runserver
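
Later versions of Pyroscope also ship a pip-installable Python agent as an alternative to pyroscope exec; a minimal sketch, assuming the pyroscope-io package and a server listening on localhost:4040:

import pyroscope

pyroscope.configure(
    application_name="my.python.app",        # hypothetical app name shown in the UI
    server_address="http://localhost:4040",  # address of the local pyroscope server
)

# from here on, the running application is profiled continuously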

Documentation

For more information on how to use Pyroscope with other programming languages, install it on Linux, or use it in production, check out our documentation.

Downloads

You can download the latest version of Pyroscope for macOS, Linux and Docker from our Downloads page.

Supported integrations

  • Ruby (via rbspy)
  • Python (via py-spy)
  • Go (via pprof)
  • Linux eBPF (via profile.py from bcc-tools)
  • PHP (via phpspy)
  • .NET (via dotnet trace)
  • Java (coming soon)

Let us know which other integrations you want to see in our issues or in our slack.

Credits

Pyroscope is possible thanks to the excellent work of many people, including but not limited to:

  • Brendan Gregg - inventor of flame graphs
  • Julia Evans - creator of rbspy, a sampling profiler for Ruby
  • Vladimir Agafonkin - creator of flamebearer, a fast flame graph renderer
  • Ben Frederickson - creator of py-spy, a sampling profiler for Python
  • Adam Saponara - creator of phpspy, a sampling profiler for PHP
  • Alexei Starovoitov, Brendan Gregg, and many others who made BPF-based profiling in the Linux kernel possible

Contributions

To start contributing, check out our Contributing Guide.

Thanks to the contributors of Pyroscope!