Python memory usage of numpy arrays

Question: Python memory usage of numpy arrays

I’m using Python to analyse some large files and I’m running into memory issues, so I’ve been using sys.getsizeof() to try to keep track of usage, but its behaviour with numpy arrays is bizarre. Here’s an example involving a map of albedos that I have to open:

>>> import numpy as np
>>> import struct
>>> from sys import getsizeof
>>> f = open('Albedo_map.assoc', 'rb')
>>> getsizeof(f)
144
>>> albedo = struct.unpack('%df' % (7200*3600), f.read(7200*3600*4))
>>> getsizeof(albedo)
207360056
>>> albedo = np.array(albedo).reshape(3600,7200)
>>> getsizeof(albedo)
80

Well, the data’s still there, but the size of the object, a 3600×7200 pixel map, has gone from ~200 MB to 80 bytes. I’d like to hope that my memory issues are over and I can just convert everything to numpy arrays, but I feel that this behaviour, if true, would somehow violate some law of information theory or thermodynamics, so I’m inclined to believe that getsizeof() doesn’t work with numpy arrays. Any ideas?
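
Incidentally, the 207360056 reported for the tuple is exactly the shallow footprint getsizeof() measures: a fixed tuple header plus one 8-byte pointer per element, with the ~26 million float objects those pointers reference not counted at all. A quick sanity check (assuming a 64-bit build like this one, where the tuple header happens to be 56 bytes):

>>> 56 + 8 * 7200 * 3600
207360056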


Answer 0

You can use array.nbytes for numpy arrays, for example:

>>> import numpy as np
>>> from sys import getsizeof
>>> a = [0] * 1024
>>> b = np.array(a)
>>> getsizeof(a)
8264
>>> b.nbytes
8192
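
A note on why getsizeof() reported only 80 bytes above: nbytes is simply size * itemsize, the raw data buffer, while reshape() returns a view that borrows the original array’s buffer rather than owning it. A minimal sketch (shape and dtype taken from the question; how much getsizeof() reports for an array that does own its data depends on the NumPy version):

>>> import numpy as np
>>> a = np.zeros((3600, 7200), dtype=np.float32)
>>> a.nbytes                    # size * itemsize = 3600 * 7200 * 4
103680000
>>> v = a.reshape(7200, 3600)   # a view: it does not own the buffer
>>> v.flags.owndata
False
>>> v.nbytes                    # still reports the full data buffer
103680000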

Answer 1

The nbytes field will give you the size in bytes of all the elements of a numpy.array:

size_in_bytes = my_numpy_array.nbytes

Notice that this does not measure the non-element attributes of the array object, so the actual size in bytes can be a few bytes larger than this.
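
To see how much larger in practice, you can subtract nbytes from getsizeof() on an array that owns its data. A minimal sketch (the exact overhead varies with the NumPy version and platform, and very old releases may not account for the data buffer in getsizeof() at all):

import numpy as np
from sys import getsizeof

a = np.arange(1024)             # owns its 8192-byte data buffer
print(a.nbytes)                 # 8192: just the elements
print(getsizeof(a) - a.nbytes)  # the array object’s own header/metadata overhead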


Answer 2

In Python notebooks I often want to filter out ‘dangling’ numpy.ndarrays, in particular the ones stored in _1, _2, etc. that were never really meant to stay alive.

I use this code to get a listing of all of them and their sizes.

Not sure if locals() or globals() is better here.

import numpy
from humanize import naturalsize  # third-party: pip install humanize

# List every numpy array bound in the current namespace, smallest first.
for size, name in sorted(
        (value.nbytes, name)
        for name, value in locals().items()
        if isinstance(value, numpy.ndarray)):
    print("{:>30}: {:>8}".format(name, naturalsize(size)))