为什么元组在内存中的空间比列表少?

问题:为什么元组在内存中的空间比列表少?

A tuple在Python中占用更少的内存空间:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

lists占用更多的内存空间:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

Python内存管理内部会发生什么?

A tuple takes less memory space in Python:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

whereas lists takes more memory space:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

What happens internally on the Python memory management?


回答 0

我假设您正在使用CPython并使用64位(在CPython 2.7 64位上得到的结果相同)。其他Python实现可能会有所不同,或者您拥有32位Python。

无论采用哪种实现方式,lists都是可变大小的,而tuples是固定大小的。

因此tuples可以将元素直接存储在struct内部,另一方面,列表需要一层间接寻址(它存储指向元素的指针)。间接层是一个指针,在64位系统(即64位,即8字节)上。

但是还有另一件事list:它们过度分配。否则,list.append始终是一项O(n)操作-要使其摊销(快得多!!!),它会过度分配。但是现在它必须跟踪分配的大小和填充的大小(s只需要存储一个大小,因为分配的和填充的大小始终相同)。这意味着每个列表必须存储另一个“大小”,它在64位系统上是64位整数,也是8个字节。O(1)tuple

因此lists比tuples 需要至少16个字节的内存。为什么我说“至少”?由于分配过多。过度分配意味着它分配了比所需更多的空间。但是,过度分配的数量取决于创建列表的“方式”和附加/删除历史记录:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

图片

我决定创建一些图像以伴随以上说明。也许这些有帮助

在示例中,这是(示意性地)将其存储在内存中的方式。我强调了红色(徒手)循环的区别:

这实际上只是一个近似值,因为int对象也是Python对象,并且CPython甚至重用了小整数,因此内存中对象的一种可能更准确的表示形式(尽管不那么可读)将是:

有用的链接:

请注意,__sizeof__它并不会真正返回“正确”的大小!它仅返回存储值的大小。但是,使用sys.getsizeof结果不同:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

有24个“额外”字节。这些是真实的,这是方法中未考虑的垃圾收集器开销__sizeof__。这是因为您通常不应该直接使用魔术方法-在这种情况下,请使用知道如何处理魔术方法的函数:(sys.getsizeof这实际上会将GC开销加到的返回值上__sizeof__)。

I assume you’re using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that’s 64bit, hence 8bytes.

But there’s another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always – to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another “size” which on 64bit systems is a 64bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say “at least”? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on “how” you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:

That’s actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

Useful links:

Note that __sizeof__ doesn’t really return the “correct” size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 “extra” bytes. These are real, that’s the garbage collector overhead that isn’t accounted for in the __sizeof__ method. That’s because you’re generally not supposed to use magic methods directly – use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).


回答 1

我将更深入地研究CPython代码库,以便我们可以看到大小的实际计算方式。在您的特定示例中没有执行过度分配,因此我不会赘述

我将在这里使用64位值。


lists 的大小由以下函数计算得出list_sizeof

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Py_TYPE(self)是一个抓取ob_typeself(返回PyList_Type),而 _PyObject_SIZE另一种宏抓斗tp_basicsize从该类型。tp_basicsize计算为实例结构sizeof(PyListObject)在哪里PyListObject

PyListObject结构包含三个字段:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

这些内容有评论(我将它们修剪掉)以解释它们的含义,请点击上面的链接阅读它们。PyObject_VAR_HEAD扩展到3个8字节字段(ob_refcountob_typeob_size),所以一个24字节的贡献。

所以现在res是:

sizeof(PyListObject) + self->allocated * sizeof(void*)

要么:

40 + self->allocated * sizeof(void*)

如果列表实例具有已分配的元素。第二部分计算他们的贡献。self->allocated顾名思义,它保存分配的元素数。

没有任何元素,列表的大小计算为:

>>> [].__sizeof__()
40

即实例结构的大小。


tuple对象没有定义tuple_sizeof函数。而是使用它们object_sizeof来计算大小:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

lists一样,它获取tp_basicsize和,如果对象具有非零值tp_itemsize(意味着它具有可变长度的实例),它将乘以元组中的项数(通过Py_SIZEtp_itemsize

tp_basicsize再次使用sizeof(PyTupleObject)其中的 PyTupleObject结构包含

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

因此,没有任何元素(即Py_SIZEreturn 0),空元组的大小等于sizeof(PyTupleObject)

>>> ().__sizeof__()
24

?? 嗯,这里是我还没有找到一个解释,一个古怪tp_basicsizetuples的实际计算公式如下:

sizeof(PyTupleObject) - sizeof(PyObject *)

为什么8要从中删除其他字节tp_basicsize是我一直无法找到的。(有关可能的解释,请参阅MSeifert的评论)


但是,这基本上是您特定示例中的区别list还会保留许多已分配的元素,这有助于确定何时再次过度分配。

现在,当添加其他元素时,列表确实会执行此过度分配以实现O(1)追加。由于MSeifert的封面很好地覆盖了他的答案,因此尺寸更大。

I’ll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won’t touch on that.

I’m going to use 64-bit values here, as you are.


The size for lists is calculated from the following function, list_sizeof:

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.

The PyListObject structure has three fields:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

these have comments (which I trimmed) explaining what they are, follow the link above to read them. PyObject_VAR_HEAD expands into three 8 byte fields (ob_refcount, ob_type and ob_size) so a 24 byte contribution.

So for now res is:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

If the list instance has elements that are allocated. the second part calculates their contribution. self->allocated, as it’s name implies, holds the number of allocated elements.

Without any elements, the size of lists is calculated to be:

>>> [].__sizeof__()
40

i.e the size of the instance struct.


tuple objects don’t define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):

>>> ().__sizeof__()
24

huh? Well, here’s an oddity which I haven’t found an explanation for, the tp_basicsize of tuples is actually calculated as follows:

sizeof(PyTupleObject) - sizeof(PyObject *)

why an additional 8 bytes is removed from tp_basicsize is something I haven’t been able to find out. (See MSeifert’s comment for a possible explanation)


But, this is basically the difference in your specific example. lists also keep around a number of allocated elements which helps determine when to over-allocate again.

Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes as MSeifert’s covers nicely in his answer.


回答 2

MSeifert的答案涵盖了广泛的范围;为简单起见,您可以想到:

tuple是一成不变的。一旦设置,您将无法更改。因此,您预先知道需要为该对象分配多少内存。

list易变。您可以在其中添加或删除项目。它必须知道它的大小(用于内部隐含)。根据需要调整大小。

没有免费的餐点 -这些功能需要付费。因此,列表的内存开销。

MSeifert answer covers it broadly; to keep it simple you can think of:

tuple is immutable. Once it set, you can’t change it. So you know in advance how much memory you need to allocate for that object.

list is mutable. You can add or remove items to or from it. It has to know the size of it (for internal impl.). It resizes as needed.

There are no free meals – these capabilities comes with a cost. Hence the overhead in memory for lists.


回答 3

元组的大小是有前缀的,这意味着在元组初始化时,解释器会为所包含的数据分配足够的空间,这就是它的结尾,使其具有不变性(无法修改),而列表是可变对象,因此意味着动态分配内存,因此避免每次您追加或修改列表时都要分配空间(分配足够的空间来容纳已更改的数据并将数据复制到其中),它会为以后的追加,修改等分配更多的空间。总结。

The size of the tuple is prefixed, meaning at tuple initialization the interpreter allocate enough space for the contained data, and that’s the end of it, giving it’s immutable (can’t be modified), whereas a list is a mutable object hence implying dynamic allocation of memory, so to avoid allocating space each time you append or modify the list ( allocate enough space to contain the changed data and copy the data to it), it allocates additional space for future append, modifications, … that pretty much sums it up.