Releasing memory in Python

Question: Releasing memory in Python

I have a few related questions regarding memory usage in the following example.

  1. If I run in the interpreter,

    foo = ['bar' for _ in xrange(10000000)]
    

    the real memory used on my machine goes up to 80.9mb. I then,

    del foo
    

real memory goes down, but only to 30.4mb. The interpreter uses a 4.4mb baseline, so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is “planning ahead”, thinking that you may use that much memory again?

  2. Why does it release 50.5mb in particular – what is the amount that is released based on?

  3. Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

NOTE This question is different from How can I explicitly free memory in Python? because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).


Answer 0

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python’s internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes — up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
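If you want to see this high-water-mark behaviour for yourself, CPython 3.3+ exposes the small object allocator’s internal state; this diagnostic call is just an illustration, not part of the original answer:

import sys

# Dumps arenas, pools, and per-size-class block counts to stderr,
# showing how much of each 256 KiB arena is still in use.
sys._debugmallocstats()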

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.
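If you want to clear the int freelist without a full collection, the ctypes route mentioned in the comments of the code below would look roughly like this (Python 2 only):

from ctypes import pythonapi

# Directly clear the int freelist instead of relying on gc.collect()
# to do it as a side effect.
pythonapi.PyInt_ClearFreeList()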

Try it like this, and tell me what you get. Here’s the link for psutil.Process.memory_info.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt(M_TRIM_THRESHOLD). Given this, it isn’t surprising if the heap shrinks by more — even a lot more — than the block that you free.
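As a hedged, Linux/glibc-only sketch (the constant value comes from glibc’s <malloc.h>; this tunes the C runtime, not Python itself), you could lower the trim threshold from Python via ctypes:

import ctypes

libc = ctypes.CDLL("libc.so.6")
M_TRIM_THRESHOLD = -1  # option number from glibc's <malloc.h>
# Ask glibc to return heap space to the OS once 128 KiB is free at the top.
libc.mallopt(M_TRIM_THRESHOLD, 128 * 1024)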

In 3.x range doesn’t create a list, so the test above won’t create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn’t implement a freelist.
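If you did want the test above to allocate the int objects on 3.x, a minimal tweak is to materialize the range explicitly (though, as noted, the 3.x int has no freelist to clear):

# Actually creates 10**7 int objects in 3.x.
foo = list(range(10**7))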


Answer 1

I’m guessing the question you really care about here is:

Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

No, there is not. But there is an easy workaround: child processes.

If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won’t touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.

This isn’t completely trivial and free, but it’s pretty easy and cheap, which is usually good enough for the trade to be worthwhile.

First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()

If you need a little more control, use the multiprocessing module.
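A hedged multiprocessing equivalent of the executor sketch above, where func stands for whatever memory-hungry callable you want isolated:

import multiprocessing

def run_in_child(func, *args, **kwargs):
    # A one-worker pool; func must be picklable (e.g. a module-level
    # function). The child's memory goes back to the OS when the pool
    # is torn down.
    with multiprocessing.Pool(processes=1) as pool:
        return pool.apply(func, args, kwargs)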

The costs are:

  • Process startup is kind of slow on some platforms, notably Windows. We’re talking milliseconds here, not minutes, and if you’re spinning up one child to do 300 seconds’ worth of work, you won’t even notice it. But it’s not free.
  • If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you’re saving time in the long run, because if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
  • Sending large amounts of data between processes can be slow. Again, if you’re talking about sending over 2K of arguments and getting back 64K of results, you won’t even notice it, but if you’re sending and receiving large amounts of data, you’ll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.); see the sketch after this list.
  • Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
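Here is a hedged sketch of the shared-memory option: the child writes a large numeric result directly into a multiprocessing.Array, so nothing big gets pickled back through a pipe (the worker function is illustrative only):

import multiprocessing

def worker(shared):
    # Write results straight into shared memory.
    for i in range(len(shared)):
        shared[i] = i * 0.5

if __name__ == "__main__":
    shared = multiprocessing.Array("d", 10**6, lock=False)
    p = multiprocessing.Process(target=worker, args=(shared,))
    p.start()
    p.join()
    print(shared[:5])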

Answer 2

eryksun has answered question #1, and I’ve answered question #3 (the original #4), but now let’s answer question #2:

Why does it release 50.5mb in particular – what is the amount that is released based on?

What it’s based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.

First, depending on how you’re measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as “freed”, even though it hasn’t been freed.

Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.
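To see how the choice of metric matters, here is a small psutil illustration (RSS counts pages actually resident in RAM, while VMS counts everything mapped, so the two can move independently of what Python has logically freed):

import os
import psutil

proc = psutil.Process(os.getpid())
info = proc.memory_info()
# rss: resident set size; vms: total virtual memory mapped.
print("rss: %d MiB" % (info.rss // 2**20))
print("vms: %d MiB" % (info.vms // 2**20))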

If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you’re asking about), and pages have really been deallocated, there are two circumstances in which this can happen: either you’ve used brk or equivalent to shrink the data segment (very rare nowadays), or you’ve used munmap or similar to release a mapped segment. (There’s also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment, e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)

But most programs don’t directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There’s no way your application can reasonably predict this, or even detect that it happened in advance.

CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don’t even directly control when the top-level objects are deallocated.

So, when you release an object, how do you know whether it’s going to release memory to the OS? Well, first you have to know that you’ve released the last reference (including any internal references you didn’t know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it’s allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you’re releasing the PyString object, and the string buffer).
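A small illustration of that immediate deallocation (sys.getrefcount reports one extra reference for its own argument):

import sys

s = "bar" * 1000
print(sys.getrefcount(s))  # typically 2: `s` plus the call's argument
del s  # the count drops to zero, so CPython frees the object right away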

If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it’s implemented. (It obviously can’t happen unless you’re deallocating the last thing in the block, and even then, it may not happen.)

If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it’s implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)

If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it’s implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)

So, if you want to understand why it happened to release exactly 50.5mb, you’re going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You’d have to read your platform’s malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn’t usually a problem.) And then you have to do the same thing at the 3 levels above that.

So, the only useful answer to the question is “Because.”

Unless you’re doing resource-limited (e.g., embedded) development, you have no reason to care about these details.

And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).


Answer 3

First, you may want to install glances:

sudo apt-get install python-pip build-essential python-dev lm-sensors 
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances

Then run it in the terminal!

glances

In your Python code, add the following at the beginning of the file:

import os
import gc # Garbage Collector

After using the big variable (for example: myBigVar) whose memory you would like to release, write the following in your Python code:

del myBigVar
gc.collect()

In another terminal, run your Python code and observe in the “glances” terminal how the memory is managed in your system!

Good luck!

P.S. I assume you are working on a Debian or Ubuntu system