Tag archive: memory-management

Reducing Django memory usage. Low-hanging fruit?

Question: Reducing Django memory usage. Low-hanging fruit?

My memory usage increases over time and restarting Django is not kind to users.

I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.

I have a feeling that there are some simple steps that could produce big gains. Ensuring ‘debug’ is set to ‘False’ is an obvious biggie.

Can anyone suggest others? How much improvement would caching give on low-traffic sites?

In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.

Edit: Thanks for the tips so far. Any suggestions how to discover what’s using up the memory? Are there any guides to Python memory profiling?

Also, as mentioned, there are a few things that will make it tricky to switch to mod_wsgi, so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.

Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache’s Overhead

Edit: Graham Dumpleton’s article is the best I’ve found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.

Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:

“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”

So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It’s a well-known maxim that you don’t optimize without testing to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.

Thanks for everyone’s assistance but I think this question is still open!

Another final edit ;-)

I asked this on the django-users list and got some very helpful replies

Honestly the last update ever!

This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler


Answer 0

Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.
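
As a small illustration of the point above (the names here are hypothetical, not taken from the question), a module-level container silently keeps everything it has ever seen alive, while a bounded structure lets old entries be freed:

import collections

# Hypothetical module-level cache: it lives as long as the process does,
# so everything appended to it can never be garbage-collected.
_results_cache = []

def handle_request(data):
    result = data * 2               # stand-in for some expensive computation
    _results_cache.append(result)   # grows forever, one entry per request
    return result

# A bounded alternative: keep only the last N results so older ones can be freed.
_recent_results = collections.deque(maxlen=100)

def handle_request_bounded(data):
    result = data * 2
    _recent_results.append(result)  # the oldest entry is dropped automatically
    return result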

Don’t use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch. It is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.

If you can remove apache from your requirements, that would be even better for your memory. spawning seems to be the new fast scalable way to run python web applications.

EDIT: I don’t see how switching to mod_wsgi could be “tricky”. It should be a very easy task. Please elaborate on the problem you are having with the switch.


Answer 1

If you are running under mod_wsgi, and presumably spawning since it is WSGI compliant, you can use Dozer to look at your memory usage.

Under mod_wsgi just add this at the bottom of your WSGI script:

from dozer import Dozer
application = Dozer(application)

Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.

I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it’s a no-brainer :)


Answer 2

These are the Python memory profiler solutions I’m aware of (not Django related):

Disclaimer: I have a stake in the latter.

The individual project’s documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.

The following is a nice “war story” that also gives some helpful pointers:


Answer 3

Additionally, check that you are not using any known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in unicode handling. Other than that, Django Debug Toolbar might help you to track down the hogs.


Answer 4

In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.

Switch to mod_wsgi in daemon mode, and use Apache’s worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
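
A minimal sketch of that setup (the group name, paths and process/thread counts are assumptions, not values from the question):

# Apache built with the worker MPM; Django runs in its own mod_wsgi daemon processes
WSGIDaemonProcess mysite processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup mysite
WSGIScriptAlias / /path/to/my/wsgi.py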


Answer 5

Webfaction actually has some tips for keeping django memory usage down.

The major points:

  • Make sure debug is set to false (you already know that).
  • Use “ServerLimit” in your apache config
  • Check that no big objects are being loaded in memory
  • Consider serving static content in a separate process or server.
  • Use “MaxRequestsPerChild” in your apache config
  • Find out and understand how much memory you’re using
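
A rough httpd.conf sketch of the two Apache directives from the list above (the numbers are illustrative assumptions, not Webfaction’s recommendations):

<IfModule mpm_prefork_module>
    ServerLimit          4      # cap the number of Apache child processes
    MaxRequestsPerChild  500    # recycle each child after 500 requests
</IfModule>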

Answer 6

Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
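
For example (the process and thread counts and the 1000-request limit are placeholder values):

WSGIDaemonProcess mysite processes=2 threads=15 maximum-requests=1000
WSGIProcessGroup mysite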


Answer 7

Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my django project):

import os
import sys
import django.core.handlers.wsgi

from os import path

sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()

Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.

For apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.


Answer 8

Caches: make sure they’re being flushed. It’s easy for something to land in a cache but never be GC’d because of the cache reference.

Swig’d code: Make sure any memory management is being done correctly; it’s really easy to miss this in python, especially with third-party libraries.

Monitoring: If you can, get data about memory usage and hits. Usually you’ll see a correlation between a certain type of request and memory usage.
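
One common way to keep a cache from pinning objects in memory is to hold its values through weak references; a small sketch, with made-up names:

import weakref

class Result(object):
    def __init__(self, payload):
        self.payload = payload

# Values are only weakly referenced: once nothing else references a Result,
# it can be garbage-collected even though it is still listed in the cache.
_cache = weakref.WeakValueDictionary()

def get_result(key):
    obj = _cache.get(key)
    if obj is None:
        obj = Result(payload=key * 2)   # stand-in for expensive work
        _cache[key] = obj
    return obj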


Answer 9

We stumbled over a bug in Django with big sitemaps (10.000 items). Seems Django is trying to load them all in memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 – effectively kills the apache process when Google pays a visit to the site.


Releasing memory in Python

Question: Releasing memory in Python

I have a few related questions regarding memory usage in the following example.

  1. If I run in the interpreter,

    foo = ['bar' for _ in xrange(10000000)]
    

    the real memory used on my machine goes up to 80.9mb. I then,

    del foo
    

    real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is “planning ahead”, thinking that you may use that much memory again?

  2. Why does it release 50.5mb in particular – what is the amount that is released based on?

  3. Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

NOTE This question is different from How can I explicitly free memory in Python? because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).


Answer 0

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python’s internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes — up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this, and tell me what you get. Here’s the link for psutil.Process.memory_info.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.get_memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.get_memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.get_memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.get_memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn’t surprising if the heap shrinks by more — even a lot more — than the block that you free.

In 3.x range doesn’t create a list, so the test above won’t create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn’t implement a freelist.


Answer 1

I’m guessing the question you really care about here is:

Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

No, there is not. But there is an easy workaround: child processes.

If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won’t touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.

This isn’t completely trivial and free, but it’s pretty easy and cheap, which is usually good enough for the trade to be worthwhile.

First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()

If you need a little more control, use the multiprocessing module.
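
A rough multiprocessing equivalent of the same idea (the worker function is a made-up example): the memory-hungry work happens in a child process, and the parent only gets back the small result.

import multiprocessing

def summarize(n):
    # Build a large temporary structure in the child process...
    big = [x * x for x in range(n)]
    # ...and return only a small summary to the parent.
    return sum(big)

if __name__ == '__main__':
    with multiprocessing.Pool(processes=1) as pool:
        result = pool.apply(summarize, (10**7,))
    # The pool's worker has exited here, so its memory is back with the OS.
    print(result)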

The costs are:

  • Process startup is kind of slow on some platforms, notably Windows. We’re talking milliseconds here, not minutes, and if you’re spinning up one child to do 300 seconds’ worth of work, you won’t even notice it. But it’s not free.
  • If the amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you’re saving time in the long run, because if that memory hung around forever it would lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
  • Sending large amounts of data between processes can be slow. Again, if you’re talking about sending over 2K of arguments and getting back 64K of results, you won’t even notice it, but if you’re sending and receiving large amounts of data, you’ll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
  • Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).

Answer 2

eryksun has answered question #1, and I’ve answered question #3 (the original #4), but now let’s answer question #2:

Why does it release 50.5mb in particular – what is the amount that is released based on?

What it’s based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.

First, depending on how you’re measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as “freed”, even though it hasn’t been freed.

Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.

If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you’re asking about), and pages have really been deallocated, two circumstances in which this can happen: Either you’ve used brk or equivalent to shrink the data segment (very rare nowadays), or you’ve used munmap or similar to release a mapped segment. (There’s also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment—e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)

But most programs don’t directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There’s no way your application can reasonably predict this, or even detect that it happened in advance.

CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don’t even directly control when the top-level objects are deallocated.

So, when you release an object, how do you know whether it’s going to release memory to the OS? Well, first you have to know that you’ve released the last reference (including any internal references you didn’t know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it’s allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you’re releasing the PyString object, and the string buffer).

If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it’s implemented. (It obviously can’t happen unless you’re deallocating the last thing in the block, and even then, it may not happen.)

If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it’s implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)

If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it’s implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)

So, if you want to understand why it happened to release exactly 50.5mb, you’re going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You’d have to read your platform’s malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn’t usually a problem.) And then you have to do the same thing at the 3 levels above that.

So, the only useful answer to the question is “Because.”

Unless you’re doing resource-limited (e.g., embedded) development, you have no reason to care about these details.

And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).


Answer 3

First, you may want to install glances:

sudo apt-get install python-pip build-essential python-dev lm-sensors 
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances

Then run it in the terminal!

glances

In your Python code, add at the begin of the file, the following:

import os
import gc # Garbage Collector

After using the “Big” variable (for example: myBigVar) for which you would like to release memory, write the following in your python code:

del myBigVar
gc.collect()

In another terminal, run your python code and observe in the “glances” terminal, how the memory is managed in your system!

Good luck!

P.S. I assume you are working on a Debian or Ubuntu system


Is there a way to delete created variables, functions, etc. from the interpreter’s memory?

Question: Is there a way to delete created variables, functions, etc. from the interpreter’s memory?

I’ve been searching for the accurate answer to this question for a couple of days now but haven’t got anything good. I’m not a complete beginner in programming, but not yet even on the intermediate level.

When I’m in the shell of Python, I type: dir() and I can see all the names of all the objects in the current scope (main block), there are 6 of them:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

Then, when I declare a variable, for example x = 10, it is automatically added to that list of objects reported by dir(), and when I type dir() again, it now shows:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'x']

The same goes for functions, classes and so on.

How do I delete all those new objects without erasing the standard 6 which were available at the beginning?

I’ve read here about “memory cleaning”, “cleaning of the console”, which erases all the text from the command prompt window:

>>> import os
>>> clear = lambda: os.system('cls')
>>> clear()

But all this has nothing to do with what I’m trying to achieve, it doesn’t clean out all used objects.


Answer 0

You can delete individual names with del:

del x

or you can remove them from the globals() object:

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

This is just an example loop; it defensively only deletes names that do not start with an underscore, making a (not unreasoned) assumption that you only used names without an underscore at the start in your interpreter. You could use a hard-coded list of names to keep instead (whitelisting) if you really wanted to be thorough. There is no built-in function to do the clearing for you, other than just exit and restart the interpreter.

Modules you’ve imported (import os) are going to remain imported because they are referenced by sys.modules; subsequent imports will reuse the already imported module object. You just won’t have a reference to them in your current global namespace.


Answer 1

Yes. There is a simple way to remove everything in iPython. In iPython console, just type:

%reset

Then system will ask you to confirm. Press y. If you don’t want to see this prompt, simply type:

%reset -f

This should work.


Answer 2

You can use python garbage collector:

import gc
gc.collect()

Answer 3

If you are in an interactive environment like Jupyter or ipython, you might be interested in clearing unwanted variables if they are getting heavy.

The magic commands reset and reset_selective are available in interactive python sessions like ipython and Jupyter.

1) reset

reset Resets the namespace by removing all names defined by the user, if called without arguments.

The in and out parameters specify whether you want to flush the in/out caches. The directory history is flushed with the dhist parameter.

reset in out

Another interesting one is array that only removes numpy Arrays:

reset array

2) reset_selective

Resets the namespace by removing names defined by the user. Input/Output history are left around in case you need them.

Clean Array Example:

In [1]: import numpy as np
In [2]: littleArray = np.array([1,2,3,4,5])
In [3]: who_ls
Out[3]: ['littleArray', 'np']
In [4]: reset_selective -f littleArray
In [5]: who_ls
Out[5]: ['np']

Source: http://ipython.readthedocs.io/en/stable/interactive/magics.html


Answer 4

This worked for me.

You need to run it twice: once for globals, then once for locals.

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

for name in dir():
    if not name.startswith('_'):
        del locals()[name]

Answer 5

Actually python will reclaim memory that is not in use anymore. This is called garbage collection, which is an automatic process in python. But if you still want to do it explicitly, you can delete a variable with del variable_name. You can also do it by assigning the variable to None:

a = 10
print a 

del a       
print a      ## throws an error here because it's been deleted already.

The only way to truly reclaim memory from unreferenced Python objects is via the garbage collector. The del keyword simply unbinds a name from an object, but the object still needs to be garbage collected. You can force the garbage collector to run using the gc module, but this is almost certainly a premature optimization and it has its own risks. Using del has no real effect, since those names would have been deleted as they went out of scope anyway.


How do I check if PyTorch is using the GPU?

Question: How do I check if PyTorch is using the GPU?

I would like to know if pytorch is using my GPU. It’s possible to detect with nvidia-smi if there is any activity from the GPU during the process, but I want something written in a python script.

Is there a way to do so?


Answer 0

This is going to work :

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True

This tells me the GPU GeForce GTX 950M is being used by PyTorch.


Answer 1

As it hasn’t been proposed here, I’m adding a method using torch.device, as this is quite handy, also when initializing tensors on the correct device.

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.

Output:

Using device: cuda

Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached:    0.6 GB

As mentioned above, using device it is possible to:

  • To move tensors to the respective device:

      torch.rand(10).to(device)
    
  • To create a tensor directly on the device:

      torch.rand(10, device=device)
    

Which makes switching between CPU and GPU comfortable without changing the actual code.


Edit:

As there has been some questions and confusion about the cached and allocated memory I’m adding some additional information about it:


You can either directly hand over a device as specified further above in the post or you can leave it None and it will use the current_device().


Additional note: Old graphic cards with Cuda compute capability 3.0 or lower may be visible but cannot be used by Pytorch!
Thanks to hekimgil for pointing this out! – “Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5.”


Answer 2

After you start running the training loop, if you want to manually watch from the terminal whether your program is utilizing the GPU resources, and to what extent, then you can simply use watch as in:

$ watch -n 2 nvidia-smi

This will continuously update the usage stats every 2 seconds until you press ctrl+c


If you need more control over which GPU stats are reported, you can use a more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration of this:

$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv

which would output the stats something like:

Note: There should not be any space between the comma separated query names in --query-gpu=.... Else those values will be ignored and no stats are returned.


Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:

In [13]: import  torch

In [14]: torch.cuda.is_available()
Out[14]: True

True status means that PyTorch is configured correctly and is using the GPU although you have to move/place the tensors with necessary statements in your code.


If you want to do this inside Python code, then look into this module:

https://github.com/jonsafari/nvidia-ml-py or in pypi here: https://pypi.python.org/pypi/nvidia-ml-py/
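
If I read that library correctly, a query along these lines should work with nvidia-ml-py (treat the exact call names as an assumption and check the package’s docs):

import pynvml  # from the nvidia-ml-py / pynvml package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)         # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # total/used/free, in bytes
util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # GPU and memory utilization, in %
print('used %d MiB / %d MiB, gpu util %d%%' % (
    mem.used // 1024**2, mem.total // 1024**2, util.gpu))
pynvml.nvmlShutdown()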


Answer 3

On the official site and the Get Started page, check GPU support for PyTorch as below:

import torch
torch.cuda.is_available()

Reference: PyTorch|Get Start


Answer 4

From a practical standpoint, just one minor digression:

import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

This dev now knows if cuda or cpu.

And there is a difference how you deal with model and with tensors when moving to cuda. It is a bit strange at first.

import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t2)  # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev) 
print(t1)  # tensor([[-0.2678,  1.9252]]) 
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]], device='cuda:0') 
print(t1.is_cuda) # True

class M(nn.Module):
    def __init__(self):        
        super().__init__()        
        self.l1 = nn.Linear(1,2)

    def forward(self, x):                      
        x = self.l1(x)
        return x
model = M()   # not on cuda
model.to(dev) # is on cuda (all parameters)
print(next(model.parameters()).is_cuda) # True

This is all a bit tricky, but understanding it once helps you deal with it quickly and with less debugging.


Answer 5

To check if there is a GPU available:

torch.cuda.is_available()

If the above function returns False,

  1. you either have no GPU,
  2. or the Nvidia drivers have not been installed so the OS does not see the GPU,
  3. or the GPU is being hidden by the environmental variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, then all your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']

If the above function returns True that does not necessarily mean that you are using the GPU. In Pytorch you can allocate tensors to devices when you create them. By default, tensors get allocated to the cpu. To check where your tensor is allocated do:

# assuming that 'a' is a tensor created somewhere else
a.device  # returns the device where the tensor is allocated

Note that you cannot operate on tensors allocated in different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html


Answer 6

Almost all answers here reference torch.cuda.is_available(). However, that’s only one part of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it’s actually being used. In a typical setup, you would set your device with something like this:

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such case, whether or not the GPU is used is not only based on whether it is available or not. After the device has been set to a torch device, you can get its type property to verify whether it’s CUDA or not.

if device.type == 'cuda':
    # do something

Answer 7

Simply from command prompt or Linux environment run the following command.

python -c 'import torch; print(torch.cuda.is_available())'

The above should print True

python -c 'import torch; print(torch.rand(2,3).cuda())'

This one should print the following:

tensor([[0.7997, 0.6170, 0.7042], [0.4174, 0.1494, 0.0516]], device='cuda:0')

Answer 8

If you are here because your pytorch always gives False for torch.cuda.is_available(), that’s probably because you installed a pytorch version without GPU support (e.g. you coded it up on a laptop, then tested it on a server).

The solution is to uninstall pytorch and install it again with the right command from the pytorch downloads page. Also refer to this pytorch issue.


Answer 9

Create a tensor on the GPU as follows:

$ python
>>> import torch
>>> print(torch.rand(3,3).cuda()) 

Do not quit, open another terminal and check if the python process is using the GPU using:

$ nvidia-smi

Why do two identical lists have a different memory footprint?

Question: Why do two identical lists have a different memory footprint?

I created two lists l1 and l2, but each one with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with the list comprehension takes more memory, but otherwise the two lists are identical in Python.

Why is that? Is this some CPython internal thing, or some other explanation?


Answer 0

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn’t know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn’t have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.


Answer 1

As noted in this question the list-comprehension uses list.append under the hood, so it will call the list-resize method, which overallocates.

To demonstrate this to yourself, you can actually use the dis disassembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the function simply fills the array.


Answer 2

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

Which does not total the size of l2, but rather is less.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.


Python memory leaks

Question: Python memory leaks

I have a long-running script which, if let to run long enough, will consume all the memory on my system.

Without going into details about the script, I have two questions:

  1. Are there any “Best Practices” to follow, which will help prevent leaks from occurring?
  2. What techniques are there to debug memory leaks in Python?

Answer 0

Have a look at this article: Tracing python memory leaks

Also, note that the garbage collection module actually can have debug flags set. Look at the set_debug function. Additionally, look at this code by Gnibbler for determining the types of objects that have been created after a call.
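
A minimal sketch of using those debug flags (what ends up in gc.garbage depends entirely on your program):

import gc

gc.set_debug(gc.DEBUG_LEAK)   # report and save objects the collector finds

# ... run the suspect part of your program ...

gc.collect()
# Objects the collector saved (including uncollectable ones) are kept here for inspection.
for obj in gc.garbage:
    print(type(obj), repr(obj)[:80])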


Answer 1

I tried out most options mentioned previously but found this small and intuitive package to be the best: pympler

It’s quite straightforward to trace objects that were not garbage-collected; check this small example:

Install the package via pip install pympler

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

# ... some code you want to investigate ...

tracker.print_diff()

The output shows you all the objects that have been added, plus the memory they consumed.

Sample output:

                                 types |   # objects |   total size
====================================== | =========== | ============
                                  list |        1095 |    160.78 KB
                                   str |        1093 |     66.33 KB
                                   int |         120 |      2.81 KB
                                  dict |           3 |       840 B
      frame (codename: create_summary) |           1 |       560 B
          frame (codename: print_diff) |           1 |       480 B

This package provides a number of more features. Check pympler’s documentation, in particular the section Identifying memory leaks.


Answer 2

Let me recommend mem_top tool I created

It helped me to solve a similar issue

It just instantly shows top suspects for memory leaks in a Python program
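
Basic usage is roughly the following (going from the project’s README; confirm against the current version):

import logging
from mem_top import mem_top

# Log the biggest reference holders and object counts at this moment.
logging.debug(mem_top())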


Answer 3

Tracemalloc module was integrated as a built-in module starting from Python 3.4, and appearently, it’s also available for prior versions of Python as a third-party library (haven’t tested it though).

This module is able to output the precise files and lines that allocated the most memory. IMHO, this information is infinitely more valuable than the number of allocated instances for each type (which ends up being a lot of tuples 99% of the time, which is a clue, but barely helps in most cases).

I recommend you use tracemalloc in combination with pyrasite. 9 times out of 10, running the top 10 snippet in a pyrasite-shell will give you enough information and hints to fix the leak within 10 minutes. Yet, if you’re still unable to find the cause of the leak, pyrasite-shell in combination with the other tools mentioned in this thread will probably give you some more hints too. You should also take a look at all the extra helpers provided by pyrasite (such as the memory viewer).
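
For reference, the “top 10” snippet mentioned above follows the pattern shown in the standard tracemalloc documentation, roughly:

import tracemalloc

tracemalloc.start()

# ... run the code you want to profile ...

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[ Top 10 lines by allocated size ]")
for stat in top_stats[:10]:
    print(stat)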


回答 4

您应该特别查看全局数据或静态数据(寿命长的数据)。

当这些数据不受限制地增长时,您也会在Python中遇到麻烦。

垃圾收集器只能回收不再被引用的数据。但是您的静态数据可能会挂住一些本应被释放的数据元素。

另一个问题可能是内存循环,但是至少从理论上讲,垃圾收集器应该找到并消除循环-至少只要不将其挂接到某些长期存在的数据上即可。

什么样的长生命周期数据特别麻烦？请仔细检查所有的列表和字典，它们可以无限制地增长。对于字典，您甚至可能察觉不到麻烦的到来，因为访问字典时，字典里键的数量对您来说并不直观……

You should especially have a look at your global or static data (long-living data).

When this data grows without restriction, you can run into trouble in Python as well.

The garbage collector can only collect data that is not referenced any more. But your static data can hold on to data elements that should be freed.

Another problem can be memory cycles, but at least in theory the garbage collector should find and eliminate cycles, at least as long as they are not hooked onto some long-living data.

What kinds of long-living data are especially troublesome? Have a good look at any lists and dictionaries: they can grow without any limit. In dictionaries you might not even see the trouble coming, since when you access dicts the number of keys in the dictionary might not be very visible to you…
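
A small hedged sketch of the kind of long-living data this warns about (expensive_compute is a stand-in, not a real API): the module-level dict below grows forever, while the lru_cache variant stays bounded.

import functools

def expensive_compute(key):
    # stand-in for real work
    return key * 2

# Leaky pattern: a module-level dict that only ever grows.
_cache = {}

def lookup_leaky(key):
    if key not in _cache:
        _cache[key] = expensive_compute(key)
    return _cache[key]

# Bounded alternative: functools evicts the least-recently-used entries.
@functools.lru_cache(maxsize=10_000)
def lookup_bounded(key):
    return expensive_compute(key)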


回答 5

为了检测和定位长时间运行的进程（例如生产环境中的进程）的内存泄漏，现在可以使用stackimpact。它底层使用tracemalloc。更多信息请参见这篇文章。

To detect and locate memory leaks for long running processes, e.g. in production environments, you can now use stackimpact. It uses tracemalloc underneath. More info in this post.


回答 6

就最佳实践而言，请留意递归函数。就我而言，我遇到了递归带来的问题（其实那里并不需要递归）。以下是我当时所做事情的简化示例：

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    if my_flag:  # restart the function if a certain flag is true
        my_function()

def main():
    my_function()

以这种递归方式运行不会触发垃圾回收去清理函数的残留数据，因此每执行一轮，内存使用量都会不断增长。

我的解决方案是把递归调用从my_function()中拿出来，由main()决定何时再次调用它。这样函数可以自然结束，并在运行结束后自行清理。

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    .....
    return my_flag

def main():
    result = my_function()
    if result:
        my_function()

As far as best practices go, keep an eye out for recursive functions. In my case I ran into issues with recursion (where there didn’t need to be any). A simplified example of what I was doing:

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    if my_flag:  # restart the function if a certain flag is true
        my_function()

def main():
    my_function()

Operating in this recursive manner won’t trigger garbage collection to clear out the remains of the function, so memory usage grows and grows with every pass.

My solution was to pull the recursive call out of my_function() and have main() handle when to call it again. This way the function ends naturally and cleans up after itself.

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    .....
    return my_flag

def main():
    result = my_function()
    if result:
        my_function()

回答 7

不确定Python中内存泄漏的“最佳实践”，但Python应该会通过垃圾回收器清理自己的内存。因此，我主要会从检查某种形式的循环引用开始，因为它们不会被垃圾收集器回收。

Not sure about “Best Practices” for memory leaks in Python, but Python should clear its own memory via its garbage collector. So mainly I would start by checking for circular references of some sort, since they won’t be picked up by the garbage collector.


回答 8

这绝不是详尽的建议。但是在编写代码时，为了避免将来的内存泄漏（循环引用），要记住的头号事项是：确保任何接受回调引用的东西，都以弱引用的方式存储该回调。

This is by no means exhaustive advice. But the number one thing to keep in mind when writing with future memory leaks (loops) in mind is to make sure that anything which accepts a reference to a callback stores that callback as a weak reference.
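
A hedged sketch of that pattern with the standard weakref module (the class and method names are illustrative):

import weakref

class Publisher:
    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        # Store only a weak reference to the bound method, so subscribing
        # does not keep the subscriber object alive.
        self._callbacks.append(weakref.WeakMethod(callback))

    def publish(self, event):
        for ref in self._callbacks:
            cb = ref()  # None if the subscriber has been garbage-collected
            if cb is not None:
                cb(event)

class Subscriber:
    def on_event(self, event):
        print("got", event)

pub = Publisher()
sub = Subscriber()
pub.subscribe(sub.on_event)
pub.publish("hello")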


Python进程使用的总内存?

问题:Python进程使用的总内存?

Python程序是否有办法确定当前正在使用多少内存?我已经看到了有关单个对象的内存使用情况的讨论,但是我需要的是该过程的总内存使用情况,以便可以确定何时需要开始丢弃缓存的数据。

Is there a way for a Python program to determine how much memory it’s currently using? I’ve seen discussions about memory usage for a single object, but what I need is total memory usage for the process, so that I can determine when it’s necessary to start discarding cached data.


回答 0

这是一个适用于各种操作系统（包括Linux、Windows等）的实用解决方案：

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # in bytes 

在我当前安装的Python 2.7和psutil 5.6.3下，最后一行应改为

print(process.memory_info()[0])

（后来API发生了变化）。

注意:pip install psutil如果尚未安装,请执行此操作。

Here is a useful solution that works for various operating systems, including Linux, Windows, etc.:

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # in bytes 

With Python 2.7 and psutil 5.6.3, the last line should be

print(process.memory_info()[0])

instead (there was a change in the API later).

Note: do pip install psutil if it is not installed yet.


回答 1

对于基于Unix的系统（Linux、Mac OS X、Solaris），可以使用标准库模块resource中的getrusage()函数。返回的对象具有属性ru_maxrss，它给出了调用进程的峰值内存使用量：

>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
2656  # peak memory usage (kilobytes on Linux, bytes on OS X)

Python文档没有注明单位。请查阅您所用系统的man getrusage.2手册页以确认该值的单位。在Ubuntu 18.04上，单位为千字节；在Mac OS X上为字节。

还可以向getrusage()传入resource.RUSAGE_CHILDREN以获取子进程的使用情况，以及（在某些系统上）传入resource.RUSAGE_BOTH获取总体（自身加子进程）的使用情况。

如果您只关心Linux，也可以按照本问题其他答案中描述的方法，读取/proc/self/status或/proc/self/statm文件。

For Unix based systems (Linux, Mac OS X, Solaris), you can use the getrusage() function from the standard library module resource. The resulting object has the attribute ru_maxrss, which gives the peak memory usage for the calling process:

>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
2656  # peak memory usage (kilobytes on Linux, bytes on OS X)

The Python docs don’t make note of the units. Refer to your specific system’s man getrusage.2 page to check the unit for the value. On Ubuntu 18.04, the unit is noted as kilobytes. On Mac OS X, it’s bytes.

The getrusage() function can also be given resource.RUSAGE_CHILDREN to get the usage for child processes, and (on some systems) resource.RUSAGE_BOTH for total (self and child) process usage.

If you care only about Linux, you can alternatively read the /proc/self/status or /proc/self/statm file as described in other answers for this question and this one too.


回答 2

在Windows上，您可以使用WMI（主页、Cheeseshop）：


def memory():
    import os
    from wmi import WMI
    w = WMI('.')
    result = w.query("SELECT WorkingSet FROM Win32_PerfRawData_PerfProc_Process WHERE IDProcess=%d" % os.getpid())
    return int(result[0].WorkingSet)

在Linux上（来自Python Cookbook http://code.activestate.com/recipes/286222/）：

import os
_proc_status = '/proc/%d/status' % os.getpid()

_scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
          'KB': 1024.0, 'MB': 1024.0*1024.0}

def _VmB(VmKey):
    '''Private.
    '''
    global _proc_status, _scale
     # get pseudo file  /proc/<pid>/status
    try:
        t = open(_proc_status)
        v = t.read()
        t.close()
    except:
        return 0.0  # non-Linux?
     # get VmKey line e.g. 'VmRSS:  9999  kB\n ...'
    i = v.index(VmKey)
    v = v[i:].split(None, 3)  # whitespace
    if len(v) < 3:
        return 0.0  # invalid format?
     # convert Vm value to bytes
    return float(v[1]) * _scale[v[2]]


def memory(since=0.0):
    '''Return memory usage in bytes.
    '''
    return _VmB('VmSize:') - since


def resident(since=0.0):
    '''Return resident memory usage in bytes.
    '''
    return _VmB('VmRSS:') - since


def stacksize(since=0.0):
    '''Return stack size in bytes.
    '''
    return _VmB('VmStk:') - since

On Windows, you can use WMI (home page, cheeseshop):


def memory():
    import os
    from wmi import WMI
    w = WMI('.')
    result = w.query("SELECT WorkingSet FROM Win32_PerfRawData_PerfProc_Process WHERE IDProcess=%d" % os.getpid())
    return int(result[0].WorkingSet)

On Linux (from the Python cookbook http://code.activestate.com/recipes/286222/):

import os
_proc_status = '/proc/%d/status' % os.getpid()

_scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
          'KB': 1024.0, 'MB': 1024.0*1024.0}

def _VmB(VmKey):
    '''Private.
    '''
    global _proc_status, _scale
     # get pseudo file  /proc/<pid>/status
    try:
        t = open(_proc_status)
        v = t.read()
        t.close()
    except:
        return 0.0  # non-Linux?
     # get VmKey line e.g. 'VmRSS:  9999  kB\n ...'
    i = v.index(VmKey)
    v = v[i:].split(None, 3)  # whitespace
    if len(v) < 3:
        return 0.0  # invalid format?
     # convert Vm value to bytes
    return float(v[1]) * _scale[v[2]]


def memory(since=0.0):
    '''Return memory usage in bytes.
    '''
    return _VmB('VmSize:') - since


def resident(since=0.0):
    '''Return resident memory usage in bytes.
    '''
    return _VmB('VmRSS:') - since


def stacksize(since=0.0):
    '''Return stack size in bytes.
    '''
    return _VmB('VmStk:') - since

回答 3

在UNIX上,您可以使用该ps工具进行监视:

$ ps u -p 1347 | awk '{sum=sum+$6}; END {print sum/1024}'

其中1347是某个进程ID。此外,结果以MB为单位。

On unix, you can use the ps tool to monitor it:

$ ps u -p 1347 | awk '{sum=sum+$6}; END {print sum/1024}'

where 1347 is some process id. Also, the result is in MB.


回答 4

在Linux上获取当前进程的当前内存使用情况，适用于Python 2、Python 3和pypy，无需任何导入：

def getCurrentMemoryUsage():
    ''' Memory usage in kB '''

    with open('/proc/self/status') as f:
        memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]

    return int(memusage.strip())

在Linux 4.4和4.9上进行了测试,但即使是早期的Linux版本也可以使用。

在man proc中查找关于/proc/$PID/status文件的信息，可以看到它为某些字段标注了最低内核版本（例如“VmPTE”为Linux 2.6.10），但我在这里使用的“VmRSS”字段没有这样的标注。因此我认为它从很早的版本起就存在了。

Current memory usage of the current process on Linux, for Python 2, Python 3, and pypy, without any imports:

def getCurrentMemoryUsage():
    ''' Memory usage in kB '''

    with open('/proc/self/status') as f:
        memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]

    return int(memusage.strip())

It reads the status file of the current process, takes everything after VmRSS:, then takes everything before the first newline (isolating the value of VmRSS), and finally cuts off the last 3 bytes which are a space and the unit (kB).
To return, it strips any whitespace and returns it as a number.

Tested on Linux 4.4 and 4.9, but even an early Linux version should work: looking in man proc and searching for the info on the /proc/$PID/status file, it mentions minimum versions for some fields (like Linux 2.6.10 for “VmPTE”), but the “VmRSS” field (which I use here) has no such mention. Therefore I assume it has been in there since an early version.


回答 5

不错，谢谢@bayer。现在我有了一个按进程统计内存的小工具。

# Megabyte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum/1024 " MB"}'
87.9492 MB

# Byte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum " KB"}'
90064 KB

附上我的流程清单。

$ ps aux  | grep python
root       943  0.0  0.1  53252  9524 ?        Ss   Aug19  52:01 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root       950  0.6  0.4 299680 34220 ?        Sl   Aug19 568:52 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root      3803  0.2  0.4 315692 36576 ?        S    12:43   0:54 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
jonny    23325  0.0  0.1  47460  9076 pts/0    S+   17:40   0:00 python
jonny    24651  0.0  0.0  13076   924 pts/4    S+   18:06   0:00 grep python

参考

Nice, thank you @bayer. Now I have a process-specific counting tool.

# Megabyte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum/1024 " MB"}'
87.9492 MB

# Byte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum " KB"}'
90064 KB

Attach my process list.

$ ps aux  | grep python
root       943  0.0  0.1  53252  9524 ?        Ss   Aug19  52:01 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root       950  0.6  0.4 299680 34220 ?        Sl   Aug19 568:52 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root      3803  0.2  0.4 315692 36576 ?        S    12:43   0:54 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
jonny    23325  0.0  0.1  47460  9076 pts/0    S+   17:40   0:00 python
jonny    24651  0.0  0.0  13076   924 pts/4    S+   18:06   0:00 grep python

Reference


回答 6

对于Python 3.6和psutil 5.4.5，使用这里列出的memory_percent()函数更为简单。

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_percent())

For Python 3.6 and psutil 5.4.5 it is easier to use the memory_percent() function listed here.

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_percent())

回答 7

比/proc/self/status更容易使用的是/proc/self/statm：它只是一个用空格分隔的统计数字列表。我无法确定这两个文件是否总是同时存在。

/proc/[pid]/statm

提供有关内存使用情况的信息，以页为单位。这些列是：

  • 大小（1）程序总大小（与/proc/[pid]/status中的VmSize相同）
  • 常驻（2）常驻集大小（与/proc/[pid]/status中的VmRSS相同）
  • 共享（3）常驻共享页面（即由文件支持的页面）的数量（与/proc/[pid]/status中的RssFile+RssShmem相同）
  • 文本（4）文本（代码）
  • lib（5）库（从Linux 2.6开始不再使用；始终为0）
  • 数据（6）数据+堆栈
  • dt（7）脏页（自Linux 2.6起不再使用；始终为0）

这是一个简单的例子:

from pathlib import Path
from resource import getpagesize

PAGESIZE = getpagesize()
PATH = Path('/proc/self/statm')


def get_resident_set_size() -> int:
    """Return the current resident set size in bytes."""
    # statm columns are: size resident shared text lib data dt
    statm = PATH.read_text()
    fields = statm.split()
    return int(fields[1]) * PAGESIZE


data = []
start_memory = get_resident_set_size()
for _ in range(10):
    data.append('X' * 100000)
    print(get_resident_set_size() - start_memory)

生成的列表看起来像这样:

0
0
368640
368640
368640
638976
638976
909312
909312
909312

您可以看到在大约分配了3个100,000字节后,它跳了约300,000字节。

Even easier to use than /proc/self/status: /proc/self/statm. It’s just a space delimited list of several statistics. I haven’t been able to tell if both files are always present.

/proc/[pid]/statm

Provides information about memory usage, measured in pages. The columns are:

  • size (1) total program size (same as VmSize in /proc/[pid]/status)
  • resident (2) resident set size (same as VmRSS in /proc/[pid]/status)
  • shared (3) number of resident shared pages (i.e., backed by a file) (same as RssFile+RssShmem in /proc/[pid]/status)
  • text (4) text (code)
  • lib (5) library (unused since Linux 2.6; always 0)
  • data (6) data + stack
  • dt (7) dirty pages (unused since Linux 2.6; always 0)

Here’s a simple example:

from pathlib import Path
from resource import getpagesize

PAGESIZE = getpagesize()
PATH = Path('/proc/self/statm')


def get_resident_set_size() -> int:
    """Return the current resident set size in bytes."""
    # statm columns are: size resident shared text lib data dt
    statm = PATH.read_text()
    fields = statm.split()
    return int(fields[1]) * PAGESIZE


data = []
start_memory = get_resident_set_size()
for _ in range(10):
    data.append('X' * 100000)
    print(get_resident_set_size() - start_memory)

That produces a list that looks something like this:

0
0
368640
368640
368640
638976
638976
909312
909312
909312

You can see that it jumps by about 300,000 bytes after roughly 3 allocations of 100,000 bytes.


回答 8

下面是我的函数装饰器，它可以跟踪该进程在函数调用前消耗了多少内存、函数调用后使用了多少内存，以及函数执行了多长时间。

import time
import os
import psutil


def elapsed_since(start):
    return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))


def get_process_memory():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss


def track(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        mem_after = get_process_memory()
        print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format(
            func.__name__,
            mem_before, mem_after, mem_after - mem_before,
            elapsed_time))
        return result
    return wrapper

因此，当您用它装饰某个函数时

from utils import track

@track
def list_create(n):
    print("inside list create")
    return [1] * n

您将能够看到以下输出:

inside list create
list_create: memory before: 45,928,448, after: 46,211,072, consumed: 282,624; exec time: 00:00:00

Below is my function decorator which allows tracking how much memory this process consumed before the function call, how much memory it uses after the function call, and how long the function took to execute.

import time
import os
import psutil


def elapsed_since(start):
    return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))


def get_process_memory():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss


def track(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        mem_after = get_process_memory()
        print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format(
            func.__name__,
            mem_before, mem_after, mem_after - mem_before,
            elapsed_time))
        return result
    return wrapper

So, when you have some function decorated with it

from utils import track

@track
def list_create(n):
    print("inside list create")
    return [1] * n

You will be able to see this output:

inside list create
list_create: memory before: 45,928,448, after: 46,211,072, consumed: 282,624; exec time: 00:00:00

回答 9

import os, win32api, win32con, win32process
han = win32api.OpenProcess(win32con.PROCESS_QUERY_INFORMATION|win32con.PROCESS_VM_READ, 0, os.getpid())
process_memory = int(win32process.GetProcessMemoryInfo(han)['WorkingSetSize'])

回答 10

对于Unix系统，如果传入-v参数，time命令（/usr/bin/time）会给出这些信息。参见下面的Maximum resident set size，它是程序执行期间使用的最大（峰值）实际（非虚拟）内存：

$ /usr/bin/time -v ls /

    Command being timed: "ls /"
    User time (seconds): 0.00
    System time (seconds): 0.01
    Percent of CPU this job got: 250%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 315
    Voluntary context switches: 2
    Involuntary context switches: 0
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

For Unix systems, the time command (/usr/bin/time) gives you that info if you pass -v. See Maximum resident set size below, which is the maximum (peak) real (not virtual) memory that was used during program execution:

$ /usr/bin/time -v ls /

    Command being timed: "ls /"
    User time (seconds): 0.00
    System time (seconds): 0.01
    Percent of CPU this job got: 250%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 315
    Voluntary context switches: 2
    Involuntary context switches: 0
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

回答 11

使用sh和os模块，把Bayer的答案改写进Python。

float(sh.awk(sh.ps('u','-p',os.getpid()),'{sum=sum+$6}; END {print sum/1024}'))

答案以兆字节为单位。

Using sh and os to adapt bayer’s answer into Python.

float(sh.awk(sh.ps('u','-p',os.getpid()),'{sum=sum+$6}; END {print sum/1024}'))

Answer is in megabytes.


如何在Python中显式释放内存?

问题:如何在Python中显式释放内存?

我编写了一个Python程序,该程序作用于大型输入文件,以创建代表三角形的数百万个对象。该算法是:

  1. 读取输入文件
  2. 处理文件并创建一个三角形列表,以其顶点表示
  3. 以OFF格式输出顶点:顶点列表,后跟三角形列表。三角形由顶点列表中的索引表示

在打印出三角形之前先打印出完整的顶点列表的OFF要求意味着在将输出写入文件之前,必须将三角形的列表保留在内存中。同时,由于列表的大小,我遇到了内存错误。

告诉Python我不再需要某些数据并且可以释放它们的最佳方法是什么?

I wrote a Python program that acts on a large input file to create a few million objects representing triangles. The algorithm is:

  1. read an input file
  2. process the file and create a list of triangles, represented by their vertices
  3. output the vertices in the OFF format: a list of vertices followed by a list of triangles. The triangles are represented by indices into the list of vertices

The requirement of OFF that I print out the complete list of vertices before I print out the triangles means that I have to hold the list of triangles in memory before I write the output to file. In the meanwhile I’m getting memory errors because of the sizes of the lists.

What is the best way to tell Python that I no longer need some of the data, and it can be freed?


回答 0

根据Python官方文档,您可以使用强制垃圾回收器释放未引用的内存gc.collect()。例:

import gc
gc.collect()

According to Python Official Documentation, you can force the Garbage Collector to release unreferenced memory with gc.collect(). Example:

import gc
gc.collect()

回答 1

不幸的是（取决于您的Python版本和发行版本），某些类型的对象会使用“空闲列表”（free list），这是一种巧妙的局部优化，但可能导致内存碎片：越来越多的内存被“专门预留”给某种特定类型的对象，从而无法被“通用池”使用。

要确保大量但临时性的内存使用在结束后把所有资源都归还给系统，唯一真正可靠的方法是让这些工作在子进程中进行：由子进程完成消耗内存的工作，然后退出。在这种情况下，操作系统会完成它的本职工作，乐意回收子进程吞掉的所有资源。幸运的是，multiprocessing模块使这种操作（过去相当痛苦）在现代版本的Python中不再那么糟糕。

在您的用例中，让子进程累积一些结果并确保主进程能拿到这些结果的最佳方式，似乎是使用半临时文件（我说的半临时文件，不是那种关闭后会自动消失的文件，而是普通文件，只是您在全部用完后会显式地删除它们）。

Unfortunately (depending on your version and release of Python) some types of objects use “free lists” which are a neat local optimization but may cause memory fragmentation, specifically by making more and more memory “earmarked” for only objects of a certain type and thereby unavailable to the “general fund”.

The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it’s done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.

In your use case, it seems that the best way for the subprocesses to accumulate some results and yet ensure those results are available to the main process is to use semi-temporary files (by semi-temporary I mean, NOT the kind of files that automatically go away when closed, just ordinary files that you explicitly delete when you’re all done with them).
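
A minimal sketch of that subprocess approach using multiprocessing; the work function and the file name are illustrative, not part of the original answer:

import json
import multiprocessing

def memory_hungry_work(path):
    # Build a large intermediate structure in the child process ...
    data = [i * i for i in range(2_000_000)]
    # ... but persist only the small result the parent actually needs.
    with open(path, "w") as f:
        json.dump({"total": sum(data)}, f)

if __name__ == "__main__":
    result_path = "partial_result.json"  # the "semi-temporary" file
    p = multiprocessing.Process(target=memory_hungry_work, args=(result_path,))
    p.start()
    p.join()  # once the child exits, the OS reclaims all of its memory

    with open(result_path) as f:
        print(json.load(f))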


回答 2

del语句可能有用，但IIRC它并不能保证释放内存。相关文档在这里……至于内存为什么没有被释放，原因在这里。

我听说Linux和Unix类型系统上的人们分叉python进程来做一些工作,获得结果然后杀死它。

本文对Python垃圾收集器进行了说明,但我认为缺乏内存控制是托管内存的缺点

The del statement might be of use, but IIRC it isn’t guaranteed to free the memory. The docs are here … and a why it isn’t released is here.

I have heard people on Linux and Unix-type systems forking a python process to do some work, getting results and then killing it.

This article has notes on the Python garbage collector, but I think lack of memory control is the downside to managed memory


回答 3

Python是带垃圾回收的，因此如果您减小列表的大小，它会回收内存。您还可以使用“del”语句彻底去掉一个变量：

biglist = [blah,blah,blah]
#...
del biglist

Python is garbage-collected, so if you reduce the size of your list, it will reclaim memory. You can also use the “del” statement to get rid of a variable completely:

biglist = [blah,blah,blah]
#...
del biglist

回答 4

您不能显式释放内存。您需要做的是确保您不保留对对象的引用。然后将对它们进行垃圾回收,从而释放内存。

就您的情况而言，当您需要大型列表时，通常需要重新组织代码，改用生成器/迭代器。这样，您根本不需要把大型列表放在内存中。

http://www.prasannatech.net/2009/07/introduction-python-generators.html

You can’t explicitly free memory. What you need to do is to make sure you don’t keep references to objects. They will then be garbage collected, freeing the memory.

In your case, when you need large lists, you typically need to reorganize the code, typically using generators/iterators instead. That way you don’t need to have the large lists in memory at all.

http://www.prasannatech.net/2009/07/introduction-python-generators.html
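
A hedged sketch of that reorganization: instead of materializing every parsed record into one huge list, stream them through a generator so only one record is in memory at a time.

def parse_records(path):
    """Yield one parsed record at a time instead of building a big list."""
    with open(path) as f:
        for line in f:
            yield line.split()

def process(path):
    total = 0
    for record in parse_records(path):  # only one record lives in memory here
        total += len(record)
    return total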


回答 5

（del可以成为您的朋友，因为当没有其他引用指向某个对象时，它会把该对象标记为可删除。不过，CPython解释器通常会保留这部分内存供以后使用，因此您的操作系统可能看不到“已释放”的内存。）

通过使用更紧凑的数据结构,也许您一开始就不会遇到任何内存问题。因此,数字列表的存储效率比标准array模块或第三方numpy模块使用的格式低得多。通过将顶点放在NumPy 3xN数组中并将三角形放在N元素数组中,可以节省内存。

(del can be your friend, as it marks objects as being deletable when there no other references to them. Now, often the CPython interpreter keeps this memory for later use, so your operating system might not see the “freed” memory.)

Maybe you would not run into any memory problem in the first place by using a more compact structure for your data. Thus, lists of numbers are much less memory-efficient than the format used by the standard array module or the third-party numpy module. You would save memory by putting your vertices in a NumPy 3xN array and your triangles in an N-element array.
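
A hedged sketch of that layout with NumPy (the shapes and counts are illustrative; the original question stores triangles as indices into the vertex list):

import sys
import numpy as np

n_vertices, n_triangles = 100_000, 200_000

# One float64 x/y/z per vertex, one int32 vertex index per triangle corner.
vertices = np.zeros((3, n_vertices), dtype=np.float64)
triangles = np.zeros((n_triangles, 3), dtype=np.int32)

print(vertices.nbytes, triangles.nbytes)  # raw buffer sizes in bytes

# For comparison, a plain Python list carries per-object overhead for each
# float on top of the list's own pointer array.
one_column = [0.0] * n_vertices
print(sys.getsizeof(one_column))  # size of the list object alone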


回答 6

从文件读取一个图时，我遇到了类似的问题。处理过程包括计算一个无法放入内存的200,000×200,000浮点矩阵（一次一行）。尝试在两次计算之间使用gc.collect()来释放内存，解决了问题中与内存相关的部分，但带来了性能问题：我不知道为什么，但即使使用的内存量保持不变，每次调用gc.collect()都比上一次花费更多时间。因此，垃圾收集很快就占用了大部分计算时间。

为了同时解决内存和性能问题，我改用了曾在某处读到的多线程技巧（抱歉，我找不到相关的帖子了）。以前，我在一个大的for循环中读取文件的每一行并处理它，并每隔一段时间运行gc.collect()来释放内存。现在，我改为调用一个函数，在新线程中读取并处理文件的一个块。线程结束后，内存会自动释放，也不会再出现那个奇怪的性能问题。

实际上它是这样的:

from dask import delayed  # this module wraps the multithreading
def f(storage, index, chunk_size):  # the processing function
    # read the chunk of size chunk_size starting at index in the file
    # process it using data in storage if needed
    # append data needed for further computations  to storage 
    return storage

partial_result = delayed([])  # put into the delayed() the constructor for your data structure
# I personally use "delayed(nx.Graph())" since I am creating a networkx Graph
chunk_size = 100  # ideally you want this as big as possible while still enabling the computations to fit in memory
for index in range(0, len(file), chunk_size):
    # we indicate to dask that we want to apply f to the parameters partial_result, index, chunk_size
    partial_result = delayed(f)(partial_result, index, chunk_size)

    # no computations are done yet !
    # dask will spawn a thread to run f(partial_result, index, chunk_size) once we call partial_result.compute()
    # passing the previous "partial_result" variable in the parameters assures a chunk will only be processed after the previous one is done
    # it also allows you to use the results of the processing of the previous chunks in the file if needed

# this launches all the computations
result = partial_result.compute()

# one thread is spawned for each "delayed" one at a time to compute its result
# dask then closes the thread, which solves the memory freeing issue
# the strange performance issue with gc.collect() is also avoided

I had a similar problem in reading a graph from a file. The processing included the computation of a 200 000×200 000 float matrix (one line at a time) that did not fit into memory. Trying to free the memory between computations using gc.collect() fixed the memory-related aspect of the problem but it resulted in performance issues: I don’t know why but even though the amount of used memory remained constant, each new call to gc.collect() took some more time than the previous one. So quite quickly the garbage collecting took most of the computation time.

To fix both the memory and performance issues I switched to the use of a multithreading trick I read once somewhere (I’m sorry, I cannot find the related post anymore). Before I was reading each line of the file in a big for loop, processing it, and running gc.collect() every once and a while to free memory space. Now I call a function that reads and processes a chunk of the file in a new thread. Once the thread ends, the memory is automatically freed without the strange performance issue.

Practically it works like this:

from dask import delayed  # this module wraps the multithreading
def f(storage, index, chunk_size):  # the processing function
    # read the chunk of size chunk_size starting at index in the file
    # process it using data in storage if needed
    # append data needed for further computations  to storage 
    return storage

partial_result = delayed([])  # put into the delayed() the constructor for your data structure
# I personally use "delayed(nx.Graph())" since I am creating a networkx Graph
chunk_size = 100  # ideally you want this as big as possible while still enabling the computations to fit in memory
for index in range(0, len(file), chunk_size):
    # we indicate to dask that we want to apply f to the parameters partial_result, index, chunk_size
    partial_result = delayed(f)(partial_result, index, chunk_size)

    # no computations are done yet !
    # dask will spawn a thread to run f(partial_result, index, chunk_size) once we call partial_result.compute()
    # passing the previous "partial_result" variable in the parameters assures a chunk will only be processed after the previous one is done
    # it also allows you to use the results of the processing of the previous chunks in the file if needed

# this launches all the computations
result = partial_result.compute()

# one thread is spawned for each "delayed" one at a time to compute its result
# dask then closes the thread, which solves the memory freeing issue
# the strange performance issue with gc.collect() is also avoided

回答 7

其他人已经发布了一些方法，也许可以“哄骗”Python解释器释放内存（或者从一开始就避免内存问题）。您应该先试试他们的想法。但是，我觉得有必要直接回答您的问题。

实际上，并没有什么方法可以直接告诉Python释放内存。事实是，如果您想要这么低层的控制，就必须用C或C++编写扩展。

也就是说,有一些工具可以帮助您:

Others have posted some ways that you might be able to “coax” the Python interpreter into freeing the memory (or otherwise avoid having memory problems). Chances are you should try their ideas out first. However, I feel it important to give you a direct answer to your question.

There isn’t really any way to directly tell Python to free memory. The fact of that matter is that if you want that low a level of control, you’re going to have to write an extension in C or C++.

That said, there are some tools to help with this:


回答 8

如果您不关心顶点重用,则可以有两个输出文件-一个用于顶点,一个用于三角形。完成后,将三角形文件附加到顶点文件。

If you don’t care about vertex reuse, you could have two output files–one for vertices and one for triangles. Then append the triangle file to the vertex file when you are done.
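
A small hedged sketch of that two-file approach (the file names and record layout are illustrative, and the OFF header with the vertex/triangle counts is omitted for brevity):

import shutil

def write_mesh(vertices, triangles, out_path="mesh.off"):
    # Stream vertices and triangles to separate files as they are produced,
    # so neither list has to be kept in memory.
    with open("vertices.tmp", "w") as vf:
        for x, y, z in vertices:
            vf.write(f"{x} {y} {z}\n")
    with open("triangles.tmp", "w") as tf:
        for a, b, c in triangles:
            tf.write(f"3 {a} {b} {c}\n")

    # Concatenate: the vertex file first, then the triangle file.
    with open(out_path, "w") as out, \
         open("vertices.tmp") as vf, open("triangles.tmp") as tf:
        shutil.copyfileobj(vf, out)
        shutil.copyfileobj(tf, out)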


如何确定Python中对象的大小?

问题:如何确定Python中对象的大小?

我想知道如何在Python中获取对象的大小,例如字符串,整数等。

相关问题:Python列表(元组)中每个元素有多少个字节?

我使用的XML文件包含指定值大小的size字段。我必须解析此XML并进行编码。当我想更改某个字段的值时，会检查该值的size字段。在这里，我想比较即将输入的新值的大小是否与XML中的相同，所以我需要检查新值的大小。如果是字符串，我可以用它的长度；但如果是int、float等，我就不知道该怎么办了。

I want to know how to get size of objects like a string, integer, etc. in Python.

Related question: How many bytes per element are there in a Python list (tuple)?

I am using an XML file which contains size fields that specify the size of the value. I must parse this XML and do my coding. When I want to change the value of a particular field, I will check the size field of that value. Here I want to compare whether the new value that I’m going to enter is of the same size as in the XML. I need to check the size of the new value. In the case of a string I can say it’s the length. But in the case of int, float, etc. I am confused.


回答 0

只需使用sys模块中定义的sys.getsizeof函数即可。

sys.getsizeof(object[, default])

返回对象的大小(以字节为单位)。该对象可以是任何类型的对象。所有内置对象都将返回正确的结果,但是对于第三方扩展,这不一定成立,因为它是特定于实现的。

default参数允许定义一个默认值：如果对象类型不提供获取其大小的方法、从而会引发TypeError，则返回该默认值。

getsizeof会调用对象的__sizeof__方法；如果对象由垃圾收集器管理，还会加上额外的垃圾收集器开销。

用法示例,在python 3.0中:

>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
24
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48

如果您使用的Python版本低于2.6、没有sys.getsizeof，则可以改用这个扩展模块，不过我从未用过它。

Just use the sys.getsizeof function defined in the sys module.

sys.getsizeof(object[, default]):

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

The default argument allows to define a value which will be returned if the object type does not provide means to retrieve the size and would cause a TypeError.

getsizeof calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Usage example, in python 3.0:

>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
24
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48

If you are on Python < 2.6 and don’t have sys.getsizeof, you can use this extension module instead. Never used it though.


回答 1

如何确定Python中对象的大小?

答案“仅使用sys.getsizeof”不是一个完整的答案。

该答案对内置对象本身直接有效，但它没有考虑这些对象可能包含的内容，特别是自定义对象、元组、列表、字典和集合所包含的类型。它们之间可以互相包含实例，也可以包含数字、字符串和其他对象。

更完整的答案

使用Anaconda发行版中的64位Python 3.6和sys.getsizeof，我确定了以下对象的最小大小。请注意，set和dict会预先分配空间，因此空对象在达到一定数量（随语言实现的不同而异）之前不会再次增长：

Python 3:

Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots 
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
                   first slot grows to 48, and so on.

您该如何解读这些数字？好吧，假设您有一个包含10个元素的set。如果每个元素是100字节，那么整个数据结构有多大？set本身是736字节，因为它已经扩容过一次、增长到了736字节。然后再加上各元素的大小，总计1736字节。

有关函数和类定义的一些警告:

请注意，每个类定义都有一个用于类属性的代理__dict__结构（48字节）。每个槽（slot）在类定义中都有一个描述符（就像property一样）。

带槽（slotted）的实例从第一个元素的48字节开始，每增加一个元素就增加8字节。只有空的带槽对象是16字节，而一个不存放任何数据的实例没有多大意义。

此外，每个函数定义都有代码对象、可能的文档字符串和其他可能的属性，甚至还有__dict__。

还要注意，我们之所以使用sys.getsizeof()，是因为我们关心的是边际空间占用，其中包括对象的垃圾回收开销。摘自文档：

getsizeof()会调用对象的__sizeof__方法；如果对象由垃圾收集器管理，还会加上额外的垃圾收集器开销。

还要注意,调整列表的大小(例如重复添加到列表中)会使它们预先分配空间,类似于集合和字典。从listobj.c源代码

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

历史数据

Python 2.7的分析结果，已通过guppy.hpy和sys.getsizeof确认：

Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in 
                   mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.

请注意,字典(而非集合)在Python 3.6中得到了更紧凑的表示形式

我认为在64位机器上，每个附加元素占用8字节的引用是很合理的，这8个字节指向所包含元素在内存中的位置。如果我没记错，在Python 2中unicode的4字节是固定宽度的；而在Python 3中，str变成了宽度等于字符最大宽度的unicode。

(有关插槽的更多信息,请参见此答案

更完整的函数

我们需要一个函数，去搜索列表、元组、集合、字典、obj.__dict__和obj.__slots__中的元素，以及我们可能还没想到的其他内容。

我们希望依靠gc.get_referents来完成这个搜索，因为它在C层面工作（所以非常快）。缺点是get_referents可能返回冗余的成员，因此我们需要确保不会重复计数。

类,模块和函数是单例-它们在内存中存在一次。我们对它们的大小不太感兴趣,因为我们对此无能为力-它们是程序的一部分。因此,如果碰巧引用了它们,我们将避免计算它们。

我们将使用类型的黑名单,因此我们不将整个程序包括在我们的大小计数中。

import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size

与下面的白名单函数形成对比的是：大多数对象都知道如何为垃圾回收的目的遍历自身（当我们想知道某些对象在内存中有多昂贵时，这大致就是我们要找的东西；gc.get_referents使用的正是这一功能）。但是，如果我们不小心，这种度量的覆盖范围会比我们预期的大得多。

例如,函数对创建它们的模块非常了解。

另一个对比点是，字典中作为键的字符串通常会被驻留（intern），因此不会重复。检查id(key)也能让我们避免重复计数，我们在下一节中就是这样做的。黑名单方案则完全跳过了对字符串键的计数。

白名单类型,递归访问者(旧的实现)

为了自己覆盖其中的大多数类型，而不依赖gc模块，我编写了下面这个递归函数，尝试估算大多数Python对象的大小，包括大多数内建类型、collections模块中的类型以及自定义类型（带槽或不带槽的）。

这种函数可以让您对要统计内存使用的类型进行更细粒度的控制，但存在遗漏某些类型的风险：

import sys
from numbers import Number
from collections import Set, Mapping, deque

try: # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError: # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)

我相当随意地测试了它(我应该对其进行单元测试):

>>> getsize(['a', tuple('bcd'), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple('bcd'))
194
>>> getsize(['a', tuple('bcd'), Foo(), {'foo': 'bar', 'baz': 'bar'}])
752
>>> getsize({'foo': 'bar', 'baz': 'bar'})
400
>>> getsize({})
280
>>> getsize({'foo':'bar'})
360
>>> getsize('foo')
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280

此实现在类定义和函数定义上会失准，因为我们没有去追踪它们的全部属性；但由于它们在进程内存中应该只存在一次，它们的大小其实并不太要紧。

How do I determine the size of an object in Python?

The answer, “Just use sys.getsizeof” is not a complete answer.

That answer does work for builtin objects directly, but it does not account for what those objects may contain, specifically, what types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings and other objects.

A More Complete Answer

Using 64 bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I have determined the minimum size of the following objects, and note that sets and dicts preallocate space so empty ones don’t grow again until after a set amount (which may vary by implementation of the language):

Python 3:

Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots 
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
                   first slot grows to 48, and so on.

How do you interpret this? Well, say you have a set with 10 items in it. If each item is 100 bytes each, how big is the whole data structure? The set is 736 itself because it has sized up one time to 736 bytes. Then you add the size of the items, so that’s 1736 bytes in total.

Some caveats for function and class definitions:

Note each class definition has a proxy __dict__ (48 bytes) structure for class attrs. Each slot has a descriptor (like a property) in the class definition.

Slotted instances start out with 48 bytes on their first element, and increase by 8 each additional. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.

Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a __dict__.

Also note that we use sys.getsizeof() because we care about the marginal space usage, which includes the garbage collection overhead for the object, from the docs:

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Also note that resizing lists (e.g. repetitively appending to them) causes them to preallocate space, similarly to sets and dicts. From the listobj.c source code:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Historical data

Python 2.7 analysis, confirmed with guppy.hpy and sys.getsizeof:

Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in 
                   mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.

Note that dictionaries (but not sets) got a more compact representation in Python 3.6

I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.

(And for more on slots, see this answer )

A More Complete Function

We want a function that searches the elements in lists, tuples, sets, dicts, obj.__dict__‘s, and obj.__slots__, as well as other things we may not have yet thought of.

We want to rely on gc.get_referents to do this search because it works at the C level (making it very fast). The downside is that get_referents can return redundant members, so we need to ensure we don’t double count.

Classes, modules, and functions are singletons – they exist one time in memory. We’re not so interested in their size, as there’s not much we can do about them – they’re a part of the program. So we’ll avoid counting them if they happen to be referenced.

We’re going to use a blacklist of types so we don’t include the entire program in our size count.

import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size
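
For illustration, a quick hedged check of the gc-based getsize above on a nested structure (it relies on the function and imports defined in the previous block; exact byte counts vary by Python version and platform):

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

nested = {"points": [Point(i, i + 1) for i in range(100)], "label": "example"}

print(getsize(nested))        # dict + list + instances + ints + str, combined
print(sys.getsizeof(nested))  # only the outer dict object itself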

To contrast this with the following whitelisted function, most objects know how to traverse themselves for the purposes of garbage collection (which is approximately what we’re looking for when we want to know how expensive in memory certain objects are. This functionality is used by gc.get_referents.) However, this measure is going to be much more expansive in scope than we intended if we are not careful.

For example, functions know quite a lot about the modules they are created in.

Another point of contrast is that strings that are keys in dictionaries are usually interned so they are not duplicated. Checking for id(key) will also allow us to avoid counting duplicates, which we do in the next section. The blacklist solution skips counting keys that are strings altogether.

Whitelisted Types, Recursive visitor (old implementation)

To cover most of these types myself, instead of relying on the gc module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise).

This sort of function gives much more fine-grained control over the types we’re going to count for memory usage, but has the danger of leaving types out:

import sys
from numbers import Number
from collections import Set, Mapping, deque

try: # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError: # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)

And I tested it rather casually (I should unittest it):

>>> getsize(['a', tuple('bcd'), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple('bcd'))
194
>>> getsize(['a', tuple('bcd'), Foo(), {'foo': 'bar', 'baz': 'bar'}])
752
>>> getsize({'foo': 'bar', 'baz': 'bar'})
400
>>> getsize({})
280
>>> getsize({'foo':'bar'})
360
>>> getsize('foo')
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280

This implementation breaks down on class definitions and function definitions because we don’t go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn’t matter too much.


回答 2

Pympler包的asizeof模块可以做到这一点。

用法如下:

from pympler import asizeof
asizeof.asizeof(my_object)

与sys.getsizeof不同，它适用于您自己创建的对象，甚至可以用于numpy。

>>> asizeof.asizeof(tuple('bcd'))
200
>>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'})
400
>>> asizeof.asizeof({})
280
>>> asizeof.asizeof({'foo':'bar'})
360
>>> asizeof.asizeof('foo')
40
>>> asizeof.asizeof(Bar())
352
>>> asizeof.asizeof(Bar().__dict__)
280
>>> A = rand(10)
>>> B = rand(10000)
>>> asizeof.asizeof(A)
176
>>> asizeof.asizeof(B)
80096

正如提到的

可以通过设置选项code=True来把类、函数、方法、模块等对象的（字节）代码大小也包含进来。

如果您需要查看运行中数据的其他视图，Pympler的

muppy模块可用于在线监控Python应用程序，Class Tracker模块则提供对所选Python对象生命周期的离线分析。

The Pympler package’s asizeof module can do this.

Use as follows:

from pympler import asizeof
asizeof.asizeof(my_object)

Unlike sys.getsizeof, it works for your self-created objects. It even works with numpy.

>>> asizeof.asizeof(tuple('bcd'))
200
>>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'})
400
>>> asizeof.asizeof({})
280
>>> asizeof.asizeof({'foo':'bar'})
360
>>> asizeof.asizeof('foo')
40
>>> asizeof.asizeof(Bar())
352
>>> asizeof.asizeof(Bar().__dict__)
280
>>> A = rand(10)
>>> B = rand(10000)
>>> asizeof.asizeof(A)
176
>>> asizeof.asizeof(B)
80096

As mentioned,

The (byte)code size of objects like classes, functions, methods, modules, etc. can be included by setting option code=True.

And if you need other view on live data, Pympler’s

module muppy is used for on-line monitoring of a Python application and module Class Tracker provides off-line analysis of the lifetime of selected Python objects.


回答 3

对于numpy数组，getsizeof不起作用。对我来说，由于某种原因它总是返回40：

from pylab import *
from sys import getsizeof
A = rand(10)
B = rand(10000)

然后(在ipython中):

In [64]: getsizeof(A)
Out[64]: 40

In [65]: getsizeof(B)
Out[65]: 40

令人高兴的是:

In [66]: A.nbytes
Out[66]: 80

In [67]: B.nbytes
Out[67]: 80000

For numpy arrays, getsizeof doesn’t work – for me it always returns 40 for some reason:

from pylab import *
from sys import getsizeof
A = rand(10)
B = rand(10000)

Then (in ipython):

In [64]: getsizeof(A)
Out[64]: 40

In [65]: getsizeof(B)
Out[65]: 40

Happily, though:

In [66]: A.nbytes
Out[66]: 80

In [67]: B.nbytes
Out[67]: 80000

回答 4

这可能比看上去更复杂，取决于您想怎样统计。例如，如果您有一个整数列表，您想要的是只包含整数引用的列表本身的大小（即只算列表，不算其内容），还是要把引用所指向的实际数据也算进去？如果是后者，您就需要处理重复引用，以及在两个对象引用同一个对象时如何避免重复计数。

您可能想看看其中一种python内存分析器,例如pysizer,看看它们是否满足您的需求。

This can be more complicated than it looks depending on how you want to count things. For instance, if you have a list of ints, do you want the size of the list containing the references to the ints? (ie. list only, not what is contained in it), or do you want to include the actual data pointed to, in which case you need to deal with duplicate references, and how to prevent double-counting when two objects contain references to the same object.

You may want to take a look at one of the python memory profilers, such as pysizer to see if they meet your needs.


回答 5

正如Raymond Hettinger在此宣布的，Python 3.8（2019年第一季度）将改变sys.getsizeof的部分结果：

在64位版本中,Python容器要小8字节。

tuple ()  48 -> 40       
list  []  64 -> 56
set()    224 -> 216
dict  {} 240 -> 232

这是继问题33597以及Inada Naoki（methane）围绕Compact PyGC_Head和PR 7043所做的工作之后的结果。

这个想法是把PyGC_Head的大小减少到两个字（word）。

目前，PyGC_Head占用三个字：gc_prev、gc_next和gc_refcnt。

  • gc_refcnt 收集时用于尝试删除。
  • gc_prev 用于跟踪和取消跟踪。

因此，如果我们能在试删除（trial deletion）期间避免跟踪/取消跟踪，gc_prev和gc_refcnt就可以共享同一块内存。

参见commit d5c875b

从PyGC_Head中删除了一个Py_ssize_t成员。
所有GC跟踪的对象(例如,元组,列表,字典)的大小都减少了4或8个字节。

Python 3.8 (Q1 2019) will change some of the results of sys.getsizeof, as announced here by Raymond Hettinger:

Python containers are 8 bytes smaller on 64-bit builds.

tuple ()  48 -> 40       
list  []  64 -> 56
set()    224 -> 216
dict  {} 240 -> 232

This comes after issue 33597 and Inada Naoki (methane)‘s work around Compact PyGC_Head, and PR 7043

This idea reduces PyGC_Head size to two words.

Currently, PyGC_Head takes three words; gc_prev, gc_next, and gc_refcnt.

  • gc_refcnt is used when collecting, for trial deletion.
  • gc_prev is used for tracking and untracking.

So if we can avoid tracking/untracking while trial deletion, gc_prev and gc_refcnt can share same memory space.

See commit d5c875b:

Removed one Py_ssize_t member from PyGC_Head.
All GC tracked objects (e.g. tuple, list, dict) size is reduced 4 or 8 bytes.


回答 6

我自己也多次遇到这个问题，于是写了一个小函数（受@aaron-hall答案的启发）和相应的测试，来实现我原本期望sys.getsizeof做到的事情：

https://github.com/bosswissam/pysize

如果您对背景故事感兴趣，请看这里

编辑:附加下面的代码,以方便参考。要查看最新代码,请检查github链接。

    import sys

    def get_size(obj, seen=None):
        """Recursively finds size of objects"""
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        # Important mark as seen *before* entering recursion to gracefully handle
        # self-referential objects
        seen.add(obj_id)
        if isinstance(obj, dict):
            size += sum([get_size(v, seen) for v in obj.values()])
            size += sum([get_size(k, seen) for k in obj.keys()])
        elif hasattr(obj, '__dict__'):
            size += get_size(obj.__dict__, seen)
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum([get_size(i, seen) for i in obj])
        return size

Having run into this problem many times myself, I wrote up a small function (inspired by @aaron-hall’s answer) & tests that does what I would have expected sys.getsizeof to do:

https://github.com/bosswissam/pysize

If you’re interested in the backstory, here it is

EDIT: Attaching the code below for easy reference. To see the most up-to-date code, please check the github link.

    import sys

    def get_size(obj, seen=None):
        """Recursively finds size of objects"""
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        # Important mark as seen *before* entering recursion to gracefully handle
        # self-referential objects
        seen.add(obj_id)
        if isinstance(obj, dict):
            size += sum([get_size(v, seen) for v in obj.values()])
            size += sum([get_size(k, seen) for k in obj.keys()])
        elif hasattr(obj, '__dict__'):
            size += get_size(obj.__dict__, seen)
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum([get_size(i, seen) for i in obj])
        return size

回答 7

这是我根据先前的答案编写的一个快速脚本,用于列出所有变量的大小

import sys

for i in dir():
    print(i, sys.getsizeof(eval(i)))

Here is a quick script I wrote based on the previous answers to list sizes of all variables

import sys

for i in dir():
    print(i, sys.getsizeof(eval(i)))

回答 8

您可以序列化对象以得出与对象大小密切相关的度量:

import pickle

## let o be the object, whose size you want to measure
size_estimate = len(pickle.dumps(o))

如果您要测量无法pickle的对象(例如,因为包含lambda表达式),则可以使用cloudpickle作为解决方案。

You can serialize the object to derive a measure that is closely related to the size of the object:

import pickle

## let o be the object, whose size you want to measure
size_estimate = len(pickle.dumps(o))

If you want to measure objects that cannot be pickled (e.g. because of lambda expressions) cloudpickle can be a solution.
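
A small sketch of this approach; note that the pickled length is only a proxy for in-memory size, and cloudpickle is a third-party package you would need to install:

import pickle

data = {"words": ["hello", "world"] * 1000}
print(len(pickle.dumps(data)))  # serialized size in bytes, a rough proxy

try:
    import cloudpickle
    print(len(cloudpickle.dumps(lambda x: x + 1)))  # handles objects plain pickle rejects
except ImportError:
    pass  # cloudpickle not installed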


回答 9

如果不想包含链接(嵌套)对象的大小,请使用sys.getsizeof()

但是,如果您要计算嵌套在列表,字典,集合,元组中的子对象(通常这就是您要查找的内容),请使用递归的deep sizeof()函数,如下所示:

import sys
def sizeof(obj):
    size = sys.getsizeof(obj)
    if isinstance(obj, dict): return size + sum(map(sizeof, obj.keys())) + sum(map(sizeof, obj.values()))
    if isinstance(obj, (list, tuple, set, frozenset)): return size + sum(map(sizeof, obj))
    return size

您还可以在漂亮的工具箱中找到此功能,以及许多其他有用的单行代码:

https://github.com/mwojnars/nifty/blob/master/util.py

Use sys.getsizeof() if you DON’T want to include sizes of linked (nested) objects.

However, if you want to count sub-objects nested in lists, dicts, sets, tuples – and usually THIS is what you’re looking for – use the recursive deep sizeof() function as shown below:

import sys
def sizeof(obj):
    size = sys.getsizeof(obj)
    if isinstance(obj, dict): return size + sum(map(sizeof, obj.keys())) + sum(map(sizeof, obj.values()))
    if isinstance(obj, (list, tuple, set, frozenset)): return size + sum(map(sizeof, obj))
    return size

You can also find this function in the nifty toolbox, together with many other useful one-liners:

https://github.com/mwojnars/nifty/blob/master/util.py
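
For example, on a nested list the difference between the shallow and deep measurements is obvious (exact byte counts depend on your interpreter):

matrix = [[0] * 100 for _ in range(10)]
print(sys.getsizeof(matrix))  # shallow: the outer list object only
print(sizeof(matrix))         # deep: inner lists and their elements included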


回答 10

如果您不需要对象的确切大小,而只想大致了解它有多大,一种快速(且粗糙)的方法是让程序运行起来,休眠较长一段时间,然后查看这个Python进程的内存使用情况(例如在Mac的活动监视器中)。当您试图确定Python进程中单个大型对象的大小时,这种方法是有效的。例如,我最近想检查一个新数据结构的内存使用情况,并与Python内置的set数据结构进行比较。我先把元素(一本大型公版书中的单词)写入一个set,然后查看进程大小,再对另一个数据结构做同样的事情。我发现使用set的那个Python进程占用的内存是新数据结构的两倍。当然,您不能说进程使用的内存正好等于对象的大小;但当对象足够大时,进程其余部分消耗的内存相对于被监视的对象而言可以忽略不计,这个估计就会越来越接近真实值。

If you don’t need the exact size of the object but roughly to know how big it is, one quick (and dirty) way is to let the program run, sleep for an extended period of time, and check the memory usage (ex: Mac’s activity monitor) by this particular python process. This would be effective when you are trying to find the size of one single large object in a python process. For example, I recently wanted to check the memory usage of a new data structure and compare it with that of Python’s set data structure. First I wrote the elements (words from a large public domain book) to a set, then checked the size of the process, and then did the same thing with the other data structure. I found out the Python process with a set is taking twice as much memory as the new data structure. Again, you wouldn’t be able to exactly say the memory used by the process is equal to the size of the object. As the size of the object gets large, this becomes close as the memory consumed by the rest of the process becomes negligible compared to the size of the object you are trying to monitor.
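
The same idea can be scripted with psutil instead of an activity monitor (a rough sketch; the RSS delta also includes interpreter overhead such as allocator pools, so treat it as an estimate):

import os
import psutil

process = psutil.Process(os.getpid())

before = process.memory_info().rss
words = set(str(i) for i in range(1000000))  # the object whose footprint you want to gauge
after = process.memory_info().rss

print("approx.", after - before, "bytes")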


回答 11

您可以使用如下所示的sys.getsizeof()来确定对象的大小

import sys
str1 = "one"
int_element=5
print("Memory size of '"+str1+"' = "+str(sys.getsizeof(str1))+ " bytes")
print("Memory size of '"+ str(int_element)+"' = "+str(sys.getsizeof(int_element))+ " bytes")

You can make use of sys.getsizeof() as shown below to determine the size of an object

import sys
str1 = "one"
int_element=5
print("Memory size of '"+str1+"' = "+str(sys.getsizeof(str1))+ " bytes")
print("Memory size of '"+ str(int_element)+"' = "+str(sys.getsizeof(int_element))+ " bytes")

回答 12

我使用这个技巧……在小对象上可能不准确,但我认为对于复杂对象(比如pygame的Surface),它比sys.getsizeof()准确得多。

import pygame as pg
import os
import psutil
import time


process = psutil.Process(os.getpid())
pg.init()    
vocab = ['hello', 'me', 'you', 'she', 'he', 'they', 'we',
         'should', 'why?', 'necessarily', 'do', 'that']

font = pg.font.SysFont("monospace", 100, True)

dct = {}

newMem = process.memory_info().rss  # don't mind this line
Str = f'store ' + f'Nothing \tsurface use about '.expandtabs(15) + \
      f'0\t bytes'.expandtabs(9)  # don't mind this assignment too

usedMem = process.memory_info().rss

for word in vocab:
    dct[word] = font.render(word, True, pg.Color("#000000"))

    time.sleep(0.1)  # wait a moment

    # get total used memory of this script:
    newMem = process.memory_info().rss
    Str = f'store ' + f'{word}\tsurface use about '.expandtabs(15) + \
          f'{newMem - usedMem}\t bytes'.expandtabs(9)

    print(Str)
    usedMem = newMem

在我的Windows 10(python 3.7.3)上,输出为:

store hello          surface use about 225280    bytes
store me             surface use about 61440     bytes
store you            surface use about 94208     bytes
store she            surface use about 81920     bytes
store he             surface use about 53248     bytes
store they           surface use about 114688    bytes
store we             surface use about 57344     bytes
store should         surface use about 172032    bytes
store why?           surface use about 110592    bytes
store necessarily    surface use about 311296    bytes
store do             surface use about 57344     bytes
store that           surface use about 110592    bytes

I use this trick… It may not be accurate for small objects, but I think it’s much more accurate for a complex object (like a pygame Surface) than sys.getsizeof()

import pygame as pg
import os
import psutil
import time


process = psutil.Process(os.getpid())
pg.init()    
vocab = ['hello', 'me', 'you', 'she', 'he', 'they', 'we',
         'should', 'why?', 'necessarily', 'do', 'that']

font = pg.font.SysFont("monospace", 100, True)

dct = {}

newMem = process.memory_info().rss  # don't mind this line
Str = f'store ' + f'Nothing \tsurface use about '.expandtabs(15) + \
      f'0\t bytes'.expandtabs(9)  # don't mind this assignment too

usedMem = process.memory_info().rss

for word in vocab:
    dct[word] = font.render(word, True, pg.Color("#000000"))

    time.sleep(0.1)  # wait a moment

    # get total used memory of this script:
    newMem = process.memory_info().rss
    Str = f'store ' + f'{word}\tsurface use about '.expandtabs(15) + \
          f'{newMem - usedMem}\t bytes'.expandtabs(9)

    print(Str)
    usedMem = newMem

On my Windows 10 machine, with Python 3.7.3, the output is:

store hello          surface use about 225280    bytes
store me             surface use about 61440     bytes
store you            surface use about 94208     bytes
store she            surface use about 81920     bytes
store he             surface use about 53248     bytes
store they           surface use about 114688    bytes
store we             surface use about 57344     bytes
store should         surface use about 172032    bytes
store why?           surface use about 110592    bytes
store necessarily    surface use about 311296    bytes
store do             surface use about 57344     bytes
store that           surface use about 110592    bytes

建议使用哪个Python内存分析器?[关闭]

问题:建议使用哪个Python内存分析器?[关闭]

我想知道我的Python应用程序的内存使用情况,尤其想知道哪些代码块/部分或对象消耗了最多的内存。Google搜索显示商用的是Python Memory Validator(仅限Windows)。

开源的是PySizerHeapy

我还没有尝试过其中任何一个,所以想知道综合考虑以下两点,哪一个最好:

  1. 提供最多的细节。

  2. 对我的代码所需的更改最少,最好不需要任何更改。

I want to know the memory usage of my Python application and specifically want to know what code blocks/portions or objects are consuming most memory. Google search shows a commercial one is Python Memory Validator (Windows only).

And open source ones are PySizer and Heapy.

I haven’t tried any of them, so I wanted to know which one is best considering:

  1. Gives the most details.

  2. Requires the fewest changes to my code, ideally none.


回答 0

Heapy非常容易使用。您只需在代码中的某个位置编写如下代码:

from guppy import hpy
h = hpy()
print(h.heap())

这将为您提供如下输出:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

您还可以找出对象是从何处被引用的,并获取相关的统计信息,不过这方面的文档有些稀少。

还有一个用Tk编写的图形浏览器。

Heapy is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you some output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse.

There is a graphical browser as well, written in Tk.


回答 1

由于没有人提到它,我将介绍我自己的模块memory_profiler,它能够逐行打印内存使用报告,并且可以在Unix和Windows上运行(在后者上需要psutil)。输出不算非常详细,但目标是让您了解代码在哪里消耗了更多内存,而不是对已分配对象进行详尽分析。

在用@profile装饰您的函数并使用-m memory_profiler标志运行代码后,它将打印如下的逐行报告:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

Since nobody has mentioned it I’ll point to my module memory_profiler which is capable of printing a line-by-line report of memory usage and works on Unix and Windows (it needs psutil on the latter). Output is not very detailed but the goal is to give you an overview of where the code is consuming more memory, not an exhaustive analysis of allocated objects.

After decorating your function with @profile and running your code with the -m memory_profiler flag it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
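
For reference, a function matching the report above looks roughly like this (a sketch; with the explicit import you can also run the script directly instead of via -m memory_profiler):

from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == "__main__":
    my_func()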

回答 2

我推荐Dowser。它非常容易设置,而且无需对代码做任何更改。您可以通过一个简单的Web界面查看各类型对象数量随时间的变化、存活对象列表,以及对存活对象的引用。

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

您导入memdebug,然后调用memdebug.start。就这样。

我没有尝试过PySizer或Heapy。我会很感激别人的评论。

更新

上面的代码适用于CherryPy 2.X;在CherryPy 3.X中,server.quickstart方法已被移除,并且engine.start不再接受blocking标志。因此,如果您使用的是CherryPy 3.X:

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()

I recommend Dowser. It is very easy to set up, and you need zero changes to your code. You can view counts of objects of each type through time, view a list of live objects, and view references to live objects, all from a simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug, and call memdebug.start. That’s all.

I haven’t tried PySizer or Heapy. I would appreciate others’ reviews.

UPDATE

The above code is for CherryPy 2.X. In CherryPy 3.X the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.X:

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()
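
Starting it from your application is then just the following (the port number here is an arbitrary choice):

import memdebug

memdebug.start(8080)  # Dowser's web interface is then served on this port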

回答 3

考虑objgraph库(请参阅http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks以获得示例用例)。
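
A minimal sketch of using objgraph to spot leaking types (objgraph is a third-party package installed separately):

import objgraph

objgraph.show_most_common_types(limit=10)  # which types dominate the heap right now
objgraph.show_growth()                     # call again later to see which types grew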


回答 4

Muppy是(又一个)Python内存使用分析器。该工具集的重点在于识别内存泄漏。

Muppy试图帮助开发人员识别Python应用程序中的内存泄漏。它支持在运行时跟踪内存使用情况,并识别正在泄漏的对象。此外,它还提供了一些工具来定位未释放对象的来源。

Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is laid on the identification of memory leaks.

Muppy tries to help developers identify memory leaks in Python applications. It enables tracking of memory usage during runtime and identification of objects that are leaking. Additionally, tools are provided to locate the source of objects that were not released.
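
Muppy is distributed as part of the Pympler package these days; a minimal heap summary looks roughly like this:

from pympler import muppy, summary

all_objects = muppy.get_objects()      # every object currently tracked by the GC
sum1 = summary.summarize(all_objects)  # aggregate objects by type
summary.print_(sum1)                   # print counts and total sizes per type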


回答 5

我正在为Python开发一个名为memprof的内存分析器:

http://jmdana.github.io/memprof/

它允许您在执行装饰方法期间记录和绘制变量的内存使用情况。您只需要使用以下方法导入库:

from memprof import memprof

并使用以下方法装饰您的方法:

@memprof

这是绘图效果的一个示例:

该项目托管在GitHub中:

https://github.com/jmdana/memprof

I’m developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof

This is an example of what the plots look like:

The project is hosted in GitHub:

https://github.com/jmdana/memprof
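
Putting the two pieces together, a decorated function would look roughly like this (a sketch based on the usage described above; the function body is just illustrative):

from memprof import memprof

@memprof
def build_tables():
    rows = [list(range(1000)) for _ in range(500)]  # allocations logged by memprof
    return rows

if __name__ == "__main__":
    build_tables()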


回答 6

我发现meliae比Heapy或PySizer的功能强得多。如果您恰好在运行WSGI Web应用,那么Dozer是Dowser的一个不错的中间件封装。

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a wsgi webapp, then Dozer is a nice middleware wrapper of Dowser


回答 7

也可以试试pytracemalloc项目,它可以按Python源码行号提供内存使用情况。

编辑(2014/04):现在它具有Qt GUI来分析快照。

Try also the pytracemalloc project which provides the memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.
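
Since Python 3.4 this functionality ships in the standard library as tracemalloc; a minimal sketch:

import tracemalloc

tracemalloc.start()

data = [dict(zip("abc", range(3))) for _ in range(10000)]  # sample allocation

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:  # top 5 allocation sites by source line
    print(stat)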