Python 实用宝典

Question 1

My memory usage increases over time and restarting Django is not kind to users.

I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.

I have a feeling that there are some simple steps that could produce big gains. Ensuring ‘debug’ is set to ‘False’ is an obvious biggie.

Can anyone suggest others? How much improvement would caching on low-traffic sites?

In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.

Edit: Thanks for the tips so far. Any suggestions how to discover what’s using up the memory? Are there any guides to Python memory profiling?

Also as mentioned there’s a few things that will make it tricky to switch to mod_wsgi so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.

Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache’s Overhead

Edit: Graham Dumpleton’s article is the best I’ve found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.

Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:

“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”

So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problems lies? It’s a well known maxim that you don’t optimize without testing to see where you need to optimize but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.

Thanks for everyone’s assistance but I think this question is still open!

Another final edit ;-)

I asked this on the django-users list and got some very helpful replies

Honestly the last update ever!

This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler

Question 2

Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.

Don’t use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch. It is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.

If you can remove apache from your requirements, that would be even better to your memory. spawning seems to be the new fast scalable way to run python web applications.

EDIT: I don’t see how switching to mod_wsgi could be “tricky“. It should be a very easy task. Please elaborate on the problem you are having with the switch.

Question 3

If you are running under mod_wsgi, and presumably spawning since it is WSGI compliant, you can use Dozer to look at your memory usage.

Under mod_wsgi just add this at the bottom of your WSGI script:

from dozer import Dozer
application = Dozer(application)

Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.

I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it’s a no-brainer :)

Question 4

These are the Python memory profiler solutions I’m aware of (not Django related):

Heapy
pysizer (discontinued)
~~Python Memory Validator (commercial)~~
Pympler

Disclaimer: I have a stake in the latter.

The individual project’s documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.

The following is a nice “war story” that also gives some helpful pointers:

Reducing the footprint of python applications

Question 5

Additionally, check if you do not use any of known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to bug in unicode handling. Other than that, Django Debug Toolbar might help you to track the hogs.

Question 6

In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.

Switch to mod_wsgi in daemon mode, and use Apache’s worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.

Question 7

Webfaction actually has some tips for keeping django memory usage down.

The major points:

Make sure debug is set to false (you already know that).
Use “ServerLimit” in your apache config
Check that no big objects are being loaded in memory
Consider serving static content in a separate process or server.
Use “MaxRequestsPerChild” in your apache config
Find out and understand how much memory you’re using

Question 8

Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.

Question 9

Here is the script I use for mod_wsgi (called wsgi.py, and put in the root off my django project):

import os
import sys
import django.core.handlers.wsgi

from os import path

sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()

Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.

For apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.

Question 10

Caches: make sure they’re being flushed. Its easy for something to land in a cache, but never be GC’d because of the cache reference.

Swig’d code: Make sure any memory management is being done correctly, its really easy to miss these in python, especially with third party libraries

Monitoring: If you can, get data about memory usage and hits. Usually you’ll see a correlation between a certain type of request and memory usage.

Question 11

We stumbled over a bug in Django with big sitemaps (10.000 items). Seems Django is trying to load them all in memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 – effectively kills the apache process when Google pays a visit to the site.

Question 12

I have a few related questions regarding memory usage in the following example.

If I run in the interpreter,
```
foo = ['bar' for _ in xrange(10000000)]
```
the real memory used on my machine goes up to 80.9mb. I then,
```
del foo
```
real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is “planning ahead”, thinking that you may use that much memory again?
Why does it release 50.5mb in particular – what is the amount that is released based on?
Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

NOTE This question is different from How can I explicitly free memory in Python? because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).

Question 13

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python’s internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes — up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this, and tell me what you get. Here’s the link for psutil.Process.memory_info.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.get_memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.get_memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.get_memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.get_memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn’t surprising if the heap shrinks by more — even a lot more — than the block that you free.

In 3.x range doesn’t create a list, so the test above won’t create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn’t implement a freelist.

Question 14

I’m guessing the question you really care about here is:

Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

No, there is not. But there is an easy workaround: child processes.

If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won’t touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.

This isn’t completely trivial and free, but it’s pretty easy and cheap, which is usually good enough for the trade to be worthwhile.

First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()

If you need a little more control, use the multiprocessing module.

The costs are:

Process startup is kind of slow on some platforms, notably Windows. We’re talking milliseconds here, not minutes, and if you’re spinning up one child to do 300 seconds’ worth of work, you won’t even notice it. But it’s not free.
If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you’re saving time in the long run, because that if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
Sending large amounts of data between processes can be slow. Again, if you’re talking about sending over 2K of arguments and getting back 64K of results, you won’t even notice it, but if you’re sending and receiving large amounts of data, you’ll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).

Question 15

eryksun has answered question #1, and I’ve answered question #3 (the original #4), but now let’s answer question #2:

Why does it release 50.5mb in particular – what is the amount that is released based on?

What it’s based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.

First, depending on how you’re measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as “freed”, even though it hasn’t been freed.

Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.

If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you’re asking about), and pages have really been deallocated, two circumstances in which this can happen: Either you’ve used brk or equivalent to shrink the data segment (very rare nowadays), or you’ve used munmap or similar to release a mapped segment. (There’s also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment—e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)

But most programs don’t directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There’s no way your application can reasonably predict this, or even detect that it happened in advance.

CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don’t even directly control when the top-level objects are deallocated.

So, when you release an object, how do you know whether it’s going to release memory to the OS? Well, first you have to know that you’ve released the last reference (including any internal references you didn’t know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it’s allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you’re releasing the PyString object, and the string buffer).

If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it’s implemented. (It obviously can’t happen unless you’re deallocating the last thing in the block, and even then, it may not happen.)

If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it’s implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)

If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it’s implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)

So, if you want to understand why it happened to release exactly 50.5mb, you’re going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You’d have to read your platform’s malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn’t usually a problem.) And then you have to do the same thing at the 3 levels above that.

So, the only useful answer to the question is “Because.”

Unless you’re doing resource-limited (e.g., embedded) development, you have no reason to care about these details.

And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).

Question 16

First, you may want to install glances:

sudo apt-get install python-pip build-essential python-dev lm-sensors 
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances

Then run it in the terminal!

glances

In your Python code, add at the begin of the file, the following:

import os
import gc # Garbage Collector

After using the “Big” variable (for example: myBigVar) for which, you would like to release memory, write in your python code the following:

del myBigVar
gc.collect()

In another terminal, run your python code and observe in the “glances” terminal, how the memory is managed in your system!

Good luck!

P.S. I assume you are working on a Debian or Ubuntu system

Question 17

I’ve been searching for the accurate answer to this question for a couple of days now but haven’t got anything good. I’m not a complete beginner in programming, but not yet even on the intermediate level.

When I’m in the shell of Python, I type: dir() and I can see all the names of all the objects in the current scope (main block), there are 6 of them:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

Then, when I’m declaring a variable, for example x = 10, it automatically adds to that lists of objects under built-in module dir(), and when I type dir() again, it shows now:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'x']

The same goes for functions, classes and so on.

How do I delete all those new objects without erasing the standard 6 which where available at the beginning?

I’ve read here about “memory cleaning”, “cleaning of the console”, which erases all the text from the command prompt window:

>>> import sys
>>> clear = lambda: os.system('cls')
>>> clear()

But all this has nothing to do with what I’m trying to achieve, it doesn’t clean out all used objects.

Question 18

You can delete individual names with del:

del x

or you can remove them from the globals() object:

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

This is just an example loop; it defensively only deletes names that do not start with an underscore, making a (not unreasoned) assumption that you only used names without an underscore at the start in your interpreter. You could use a hard-coded list of names to keep instead (whitelisting) if you really wanted to be thorough. There is no built-in function to do the clearing for you, other than just exit and restart the interpreter.

Modules you’ve imported (import os) are going to remain imported because they are referenced by sys.modules; subsequent imports will reuse the already imported module object. You just won’t have a reference to them in your current global namespace.

Question 19

Yes. There is a simple way to remove everything in iPython. In iPython console, just type:

%reset

Then system will ask you to confirm. Press y. If you don’t want to see this prompt, simply type:

%reset -f

This should work..

Question 20

You can use python garbage collector:

import gc
gc.collect()

Question 21

If you are in an interactive environment like Jupyter or ipython you might be interested in clearing unwanted var’s if they are getting heavy.

The magic-commands reset and reset_selective is vailable on interactive python sessions like ipython and Jupyter

1) reset

reset Resets the namespace by removing all names defined by the user, if called without arguments.

in and the out parameters specify whether you want to flush the in/out caches. The directory history is flushed with the dhist parameter.

reset in out

Another interesting one is array that only removes numpy Arrays:

reset array

2) reset_selective

Resets the namespace by removing names defined by the user. Input/Output history are left around in case you need them.

Clean Array Example:

In [1]: import numpy as np
In [2]: littleArray = np.array([1,2,3,4,5])
In [3]: who_ls
Out[3]: ['littleArray', 'np']
In [4]: reset_selective -f littleArray
In [5]: who_ls
Out[5]: ['np']

Source: http://ipython.readthedocs.io/en/stable/interactive/magics.html

Question 22

This worked for me.

You need to run it twice once for globals followed by locals

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

for name in dir():
    if not name.startswith('_'):
        del locals()[name]

Question 23

Actually python will reclaim the memory which is not in use anymore.This is called garbage collection which is automatic process in python. But still if you want to do it then you can delete it by del variable_name. You can also do it by assigning the variable to None

a = 10
print a 

del a       
print a      ## throws an error here because it's been deleted already.

The only way to truly reclaim memory from unreferenced Python objects is via the garbage collector. The del keyword simply unbinds a name from an object, but the object still needs to be garbage collected. You can force garbage collector to run using the gc module, but this is almost certainly a premature optimization but it has its own risks. Using del has no real effect, since those names would have been deleted as they went out of scope anyway.

Question 24

I would like to know if pytorch is using my GPU. It’s possible to detect with nvidia-smi if there is any activity from the GPU during the process, but I want something written in a python script.

Is there a way to do so?

Question 25

This is going to work :

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True

This tells me the GPU GeForce GTX 950M is being used by PyTorch.

Question 26

As it hasn’t been proposed here, I’m adding a method using torch.device, as this is quite handy, also when initializing tensors on the correct device.

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.

Output:

Using device: cuda

Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached:    0.6 GB

As mentioned above, using device it is possible to:

To move tensors to the respective device:
```
  torch.rand(10).to(device)
```
To create a tensor directly on the device:
```
  torch.rand(10, device=device)
```

Which makes switching between CPU and GPU comfortable without changing the actual code.

Edit:

As there has been some questions and confusion about the cached and allocated memory I’m adding some additional information about it:

torch.cuda.max_memory_cached(device=None)

Returns the maximum GPU memory managed by the caching allocator in bytes for a given device.
torch.cuda.memory_allocated(device=None)

Returns the current GPU memory usage by tensors in bytes for a given device.

You can either directly hand over a device as specified further above in the post or you can leave it None and it will use the current_device().

Additional note: Old graphic cards with Cuda compute capability 3.0 or lower may be visible but cannot be used by Pytorch!
Thanks to hekimgil for pointing this out! – “Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5.”

Question 27

After you start running the training loop, if you want to manually watch it from the terminal whether your program is utilizing the GPU resources and to what extent, then you can simply use watch as in:

$ watch -n 2 nvidia-smi

This will continuously update the usage stats for every 2 seconds until you press ctrl+c

If you need more control on more GPU stats you might need, you can use more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration of this:

$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv

which would output the stats something like:

Note: There should not be any space between the comma separated query names in --query-gpu=.... Else those values will be ignored and no stats are returned.

Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:

In [13]: import  torch

In [14]: torch.cuda.is_available()
Out[14]: True

True status means that PyTorch is configured correctly and is using the GPU although you have to move/place the tensors with necessary statements in your code.

If you want to do this inside Python code, then look into this module:

https://github.com/jonsafari/nvidia-ml-py or in pypi here: https://pypi.python.org/pypi/nvidia-ml-py/

Question 28

On the office site and the get start page, check GPU for PyTorch as below:

import torch
torch.cuda.is_available()

Reference: PyTorch|Get Start

Question 29

From practical standpoint just one minor digression:

import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

This dev now knows if cuda or cpu.

And there is a difference how you deal with model and with tensors when moving to cuda. It is a bit strange at first.

import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t2)  # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev) 
print(t1)  # tensor([[-0.2678,  1.9252]]) 
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]], device='cuda:0') 
print(t1.is_cuda) # True

class M(nn.Module):
    def __init__(self):        
        super().__init__()        
        self.l1 = nn.Linear(1,2)

    def forward(self, x):                      
        x = self.l1(x)
        return x
model = M()   # not on cuda
model.to(dev) # is on cuda (all parameters)
print(next(model.parameters()).is_cuda) # True

This all is tricky and understanding it once, helps you to deal fast with less debugging.

Question 30

To check if there is a GPU available:

torch.cuda.is_available()

If the above function returns False,

you either have no GPU,
or the Nvidia drivers have not been installed so the OS does not see the GPU,
or the GPU is being hidden by the environmental variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, then all your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']

If the above function returns True that does not necessarily mean that you are using the GPU. In Pytorch you can allocate tensors to devices when you create them. By default, tensors get allocated to the cpu. To check where your tensor is allocated do:

# assuming that 'a' is a tensor created somewhere else
a.device  # returns the device where the tensor is allocated

Note that you cannot operate on tensors allocated in different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html

Question 31

Almost all answers here reference torch.cuda.is_available(). However, that’s only one part of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it’s actually being used. In a typical setup, you would set your device with something like this:

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such case, whether or not the GPU is used is not only based on whether it is available or not. After the device has been set to a torch device, you can get its type property to verify whether it’s CUDA or not.

if device.type == 'cuda':
    # do something

Question 32

Simply from command prompt or Linux environment run the following command.

python -c 'import torch; print(torch.cuda.is_available())'

The above should print True

python -c 'import torch; print(torch.rand(2,3).cuda())'

This one should print the following:

tensor([[0.7997, 0.6170, 0.7042], [0.4174, 0.1494, 0.0516]], device='cuda:0')

Question 33

If you are here because your pytorch always gives False for torch.cuda.is_available() that’s probably because you installed your pytorch version without GPU support. (Eg: you coded up in laptop then testing on server).

The solution is to uninstall and install pytorch again with the right command from pytorch downloads page. Also refer this pytorch issue.

Question 34

Create a tensor on the GPU as follows:

$ python
>>> import torch
>>> print(torch.rand(3,3).cuda())

Do not quit, open another terminal and check if the python process is using the GPU using:

$ nvidia-smi

Question 35

I created two lists l1 and l2, but each one with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with a list comprehension is a bigger size in memory, but the two lists are identical in Python otherwise.

Why is that? Is this some CPython internal thing, or some other explanation?

Question 36

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn’t know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn’t have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.

Question 37

As noted in this question the list-comprehension uses list.append under the hood, so it will call the list-resize method, which overallocates.

To demonstrate this to yourself, you can actually use the dis dissasembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the functions simply fills the array.

Question 38

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

Which does not total the size of l2, but rather is less.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.

Question 39

I have a long-running script which, if let to run long enough, will consume all the memory on my system.

Without going into details about the script, I have two questions:

Are there any “Best Practices” to follow, which will help prevent leaks from occurring?
What techniques are there to debug memory leaks in Python?

Question 40

Have a look at this article: Tracing python memory leaks

Also, note that the garbage collection module actually can have debug flags set. Look at the set_debug function. Additionally, look at this code by Gnibbler for determining the types of objects that have been created after a call.

Question 41

I tried out most options mentioned previously but found this small and intuitive package to be the best: pympler

It’s quite straight forward to trace objects that were not garbage-collected, check this small example:

install package via pip install pympler

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

# ... some code you want to investigate ...

tracker.print_diff()

The output shows you all the objects that have been added, plus the memory they consumed.

Sample output:

                                 types |   # objects |   total size
====================================== | =========== | ============
                                  list |        1095 |    160.78 KB
                                   str |        1093 |     66.33 KB
                                   int |         120 |      2.81 KB
                                  dict |           3 |       840 B
      frame (codename: create_summary) |           1 |       560 B
          frame (codename: print_diff) |           1 |       480 B

This package provides a number of more features. Check pympler’s documentation, in particular the section Identifying memory leaks.

Question 42

Let me recommend mem_top tool I created

It helped me to solve a similar issue

It just instantly shows top suspects for memory leaks in a Python program

Question 43

Tracemalloc module was integrated as a built-in module starting from Python 3.4, and appearently, it’s also available for prior versions of Python as a third-party library (haven’t tested it though).

This module is able to output the precise files and lines that allocated the most memory. IMHO, this information is infinitly more valuable than the number of allocated instances for each type (which ends up being a lot of tuples 99% of the time, which is a clue, but barely helps in most cases).

I recommend you use tracemalloc in combination with pyrasite. 9 times out of 10, running the top 10 snippet in a pyrasite-shell will give you enough information and hints to to fix the leak within 10 minutes. Yet, if you’re still unable to find the leak cause, pyrasite-shell in combination with the other tools mentioned in this thread will probably give you some more hints too. You should also take a look on all the extra helpers provided by pyrasite (such as the memory viewer).

Question 44

You should specially have a look on your global or static data (long living data).

When this data grows without restriction, you can also get troubles in Python.

The garbage collector can only collect data, that is not referenced any more. But your static data can hookup data elements that should be freed.

Another problem can be memory cycles, but at least in theory the Garbage collector should find and eliminate cycles — at least as long as they are not hooked on some long living data.

What kinds of long living data are specially troublesome? Have a good look on any lists and dictionaries — they can grow without any limit. In dictionaries you might even don’t see the trouble coming since when you access dicts, the number of keys in the dictionary might not be of big visibility to you …

Question 45

To detect and locate memory leaks for long running processes, e.g. in production environments, you can now use stackimpact. It uses tracemalloc underneath. More info in this post.

Question 46

As far as best practices, keep an eye for recursive functions. In my case I ran into issues with recursion (where there didn’t need to be). A simplified example of what I was doing:

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    if my_flag:  # restart the function if a certain flag is true
        my_function()

def main():
    my_function()

operating in this recursive manner won’t trigger the garbage collection and clear out the remains of the function, so every time through memory usage is growing and growing.

My solution was to pull the recursive call out of my_function() and have main() handle when to call it again. this way the function ends naturally and cleans up after itself.

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    .....
    return my_flag

def main():
    result = my_function()
    if result:
        my_function()

Question 47

Not sure about “Best Practices” for memory leaks in python, but python should clear it’s own memory by it’s garbage collector. So mainly I would start by checking for circular list of some short, since they won’t be picked up by the garbage collector.

Question 48

This is by no means exhaustive advice. But number one thing to keep in mind when writing with the thought of avoiding future memory leaks (loops) is to make sure that anything which accepts a reference to a call-back, should store that call-back as a weak reference.

Question 49

Is there a way for a Python program to determine how much memory it’s currently using? I’ve seen discussions about memory usage for a single object, but what I need is total memory usage for the process, so that I can determine when it’s necessary to start discarding cached data.

Question 50

Here is a useful solution that works for various operating systems, including Linux, Windows, etc.:

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # in bytes

With Python 2.7 and psutil 5.6.3, the last line should be

print(process.memory_info()[0])

instead (there was a change in the API later).

Note: do pip install psutil if it is not installed yet.

Question 51

For Unix based systems (Linux, Mac OS X, Solaris), you can use the getrusage() function from the standard library module resource. The resulting object has the attribute ru_maxrss, which gives the peak memory usage for the calling process:

>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
2656  # peak memory usage (kilobytes on Linux, bytes on OS X)

The Python docs don’t make note of the units. Refer to your specific system’s man getrusage.2 page to check the unit for the value. On Ubuntu 18.04, the unit is noted as kilobytes. On Mac OS X, it’s bytes.

The getrusage() function can also be given resource.RUSAGE_CHILDREN to get the usage for child processes, and (on some systems) resource.RUSAGE_BOTH for total (self and child) process usage.

If you care only about Linux, you can alternatively read the /proc/self/status or /proc/self/statm file as described in other answers for this question and this one too.

Question 52

On Windows, you can use WMI (home page, cheeseshop):


def memory():
    import os
    from wmi import WMI
    w = WMI('.')
    result = w.query("SELECT WorkingSet FROM Win32_PerfRawData_PerfProc_Process WHERE IDProcess=%d" % os.getpid())
    return int(result[0].WorkingSet)

On Linux (from python cookbook http://code.activestate.com/recipes/286222/:

import os
_proc_status = '/proc/%d/status' % os.getpid()

_scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
          'KB': 1024.0, 'MB': 1024.0*1024.0}

def _VmB(VmKey):
    '''Private.
    '''
    global _proc_status, _scale
     # get pseudo file  /proc/<pid>/status
    try:
        t = open(_proc_status)
        v = t.read()
        t.close()
    except:
        return 0.0  # non-Linux?
     # get VmKey line e.g. 'VmRSS:  9999  kB\n ...'
    i = v.index(VmKey)
    v = v[i:].split(None, 3)  # whitespace
    if len(v) < 3:
        return 0.0  # invalid format?
     # convert Vm value to bytes
    return float(v[1]) * _scale[v[2]]


def memory(since=0.0):
    '''Return memory usage in bytes.
    '''
    return _VmB('VmSize:') - since


def resident(since=0.0):
    '''Return resident memory usage in bytes.
    '''
    return _VmB('VmRSS:') - since


def stacksize(since=0.0):
    '''Return stack size in bytes.
    '''
    return _VmB('VmStk:') - since

Question 53

On unix, you can use the ps tool to monitor it:

$ ps u -p 1347 | awk '{sum=sum+$6}; END {print sum/1024}'

where 1347 is some process id. Also, the result is in MB.

Question 54

Current memory usage of the current process on Linux, for Python 2, Python 3, and pypy, without any imports:

def getCurrentMemoryUsage():
    ''' Memory usage in kB '''

    with open('/proc/self/status') as f:
        memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]

    return int(memusage.strip())

It reads the status file of the current process, takes everything after VmRSS:, then takes everything before the first newline (isolating the value of VmRSS), and finally cuts off the last 3 bytes which are a space and the unit (kB).
To return, it strips any whitespace and returns it as a number.

Tested on Linux 4.4 and 4.9, but even an early Linux version should work: looking in man proc and searching for the info on the /proc/$PID/status file, it mentions minimum versions for some fields (like Linux 2.6.10 for “VmPTE”), but the “VmRSS” field (which I use here) has no such mention. Therefore I assume it has been in there since an early version.

Question 55

I like it, thank you for @bayer. I get a specific process count tool, now.

# Megabyte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum/1024 " MB"}'
87.9492 MB

# Byte.
$ ps aux | grep python | awk '{sum=sum+$6}; END {print sum " KB"}'
90064 KB

Attach my process list.

$ ps aux  | grep python
root       943  0.0  0.1  53252  9524 ?        Ss   Aug19  52:01 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root       950  0.6  0.4 299680 34220 ?        Sl   Aug19 568:52 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
root      3803  0.2  0.4 315692 36576 ?        S    12:43   0:54 /usr/bin/python /usr/local/bin/beaver -c /etc/beaver/beaver.conf -l /var/log/beaver.log -P /var/run/beaver.pid
jonny    23325  0.0  0.1  47460  9076 pts/0    S+   17:40   0:00 python
jonny    24651  0.0  0.0  13076   924 pts/4    S+   18:06   0:00 grep python

Reference

Question 56

For Python 3.6 and psutil 5.4.5 it is easier to use memory_percent() function listed here.

import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_percent())

Question 57

Even easier to use than /proc/self/status: /proc/self/statm. It’s just a space delimited list of several statistics. I haven’t been able to tell if both files are always present.

/proc/[pid]/statm

Provides information about memory usage, measured in pages. The columns are:

size (1) total program size (same as VmSize in /proc/[pid]/status)

resident (2) resident set size (same as VmRSS in /proc/[pid]/status)

shared (3) number of resident shared pages (i.e., backed by a file) (same as RssFile+RssShmem in /proc/[pid]/status)

text (4) text (code)

lib (5) library (unused since Linux 2.6; always 0)

data (6) data + stack

dt (7) dirty pages (unused since Linux 2.6; always 0)

Here’s a simple example:

from pathlib import Path
from resource import getpagesize

PAGESIZE = getpagesize()
PATH = Path('/proc/self/statm')


def get_resident_set_size() -> int:
    """Return the current resident set size in bytes."""
    # statm columns are: size resident shared text lib data dt
    statm = PATH.read_text()
    fields = statm.split()
    return int(fields[1]) * PAGESIZE


data = []
start_memory = get_resident_set_size()
for _ in range(10):
    data.append('X' * 100000)
    print(get_resident_set_size() - start_memory)

That produces a list that looks something like this:

You can see that it jumps by about 300,000 bytes after roughly 3 allocations of 100,000 bytes.

Question 58

Below is my function decorator which allows to track how much memory this process consumed before the function call, how much memory it uses after the function call, and how long the function is executed.

import time
import os
import psutil


def elapsed_since(start):
    return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))


def get_process_memory():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss


def track(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        mem_after = get_process_memory()
        print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format(
            func.__name__,
            mem_before, mem_after, mem_after - mem_before,
            elapsed_time))
        return result
    return wrapper

So, when you have some function decorated with it

from utils import track

@track
def list_create(n):
    print("inside list create")
    return [1] * n

You will be able to see this output:

inside list create
list_create: memory before: 45,928,448, after: 46,211,072, consumed: 282,624; exec time: 00:00:00

Question 59

import os, win32api, win32con, win32process
han = win32api.OpenProcess(win32con.PROCESS_QUERY_INFORMATION|win32con.PROCESS_VM_READ, 0, os.getpid())
process_memory = int(win32process.GetProcessMemoryInfo(han)['WorkingSetSize'])

Question 60

For Unix systems command time (/usr/bin/time) gives you that info if you pass -v. See Maximum resident set size below, which is the maximum (peak) real (not virtual) memory that was used during program execution:

$ /usr/bin/time -v ls /

    Command being timed: "ls /"
    User time (seconds): 0.00
    System time (seconds): 0.01
    Percent of CPU this job got: 250%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 315
    Voluntary context switches: 2
    Involuntary context switches: 0
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Question 61

Using sh and os to get into python bayer’s answer.

float(sh.awk(sh.ps('u','-p',os.getpid()),'{sum=sum+$6}; END {print sum/1024}'))

Answer is in megabytes.

问题：减少Django的内存使用量。低挂水果？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：在Python中释放内存

回答 0

回答 1

回答 2

回答 3

问题：有没有办法从解释器的内存中删除创建的变量，函数等？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何检查pytorch是否正在使用GPU？

回答 0

回答 1

编辑：

Edit:

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：为什么两个相同的列表具有不同的内存占用量？

回答 0

回答 1

回答 2

问题：Python内存泄漏

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：Python进程使用的总内存？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

参考

Reference

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：如何在Python中显式释放内存？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：如何确定Python中对象的大小？

回答 0

回答 1

如何确定Python中对象的大小？

更完整的答案