# 是什么导致[* a]总体化？

## 问题：是什么导致[* a]总体化？

``````0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208``````

``````from sys import getsizeof

for n in range(13):
a = [None] * n
print(n, getsizeof(list(a)),
getsizeof([x for x in a]),
getsizeof([*a]))``````

Apparently `list(a)` doesn’t overallocate, `[x for x in a]` overallocates at some points, and `[*a]` overallocates all the time?

Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:

``````0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208
``````

Computed like this, reproducable at repl.it, using Python 3.8:

``````from sys import getsizeof

for n in range(13):
a = [None] * n
print(n, getsizeof(list(a)),
getsizeof([x for x in a]),
getsizeof([*a]))
``````

So: How does this work? How does `[*a]` overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over `a` and use something like `list.append`? Where is the source code?

(Colab with data and code that produced the images.)

Zooming in to smaller n:

Zooming out to larger n:

## 回答 0

`[*a]` 在内部执行C等效于

1. 新建一个空的 `list`
2. 呼叫 `newlist.extend(a)`
3. 返回`list`

``````from sys import getsizeof

for n in range(13):
a = [None] * n
l = []
l.extend(a)
print(n, getsizeof(list(a)),
getsizeof([x for x in a]),
getsizeof([*a]),
getsizeof(l))``````

`[*a]` is internally doing the C equivalent of:

1. Make a new, empty `list`
2. Call `newlist.extend(a)`
3. Returns `list`.

So if you expand your test to:

``````from sys import getsizeof

for n in range(13):
a = [None] * n
l = []
l.extend(a)
print(n, getsizeof(list(a)),
getsizeof([x for x in a]),
getsizeof([*a]),
getsizeof(l))
``````

Try it online!

you’ll see the results for `getsizeof([*a])` and `l = []; l.extend(a); getsizeof(l)` are the same.

This is usually the right thing to do; when `extend`ing you’re usually expecting to add more later, and similarly for generalized unpacking, it’s assumed that multiple things will be added one after the other. `[*a]` is not the normal case; Python assumes there are multiple items or iterables being added to the `list` (`[*a, b, c, *d]`), so overallocation saves work in the common case.

By contrast, a `list` constructed from a single, presized iterable (with `list()`) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for `list` comprehensions, they’re effectively equivalent to repeated `append`s, so you’re seeing the final result of the normal overallocation growth pattern when adding an element at a time.

To be clear, none of this is a language guarantee. It’s just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in `list` (aside from guaranteeing amortized `O(1)` `append`s and `pop`s from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won’t affect `[*a]`, it could affect other cases where what used to be “build a temporary `tuple` of individual items and then `extend` with the `tuple`” now becomes multiple applications of `LIST_APPEND`, which can change when the overallocation occurs and what numbers go into the calculation.

## 回答 1

``````>>> import dis
>>> dis.dis('[*a]')
2 BUILD_LIST_UNPACK        1
4 RETURN_VALUE``````

``````        case TARGET(BUILD_LIST_UNPACK): {
...
PyObject *sum = PyList_New(0);
...
none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));``````

`_PyList_Extend` 用途 `list_extend`

``````_PyList_Extend(PyListObject *self, PyObject *iterable)
{
return list_extend(self, iterable);
}``````
``````list_extend(PyListObject *self, PyObject *iterable)
...
n = PySequence_Fast_GET_SIZE(iterable);
...
m = Py_SIZE(self);
...
if (list_resize(self, m + n) < 0) {``````

overallocates如下：

``````list_resize(PyListObject *self, Py_ssize_t newsize)
{
...
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);``````

``````from sys import getsizeof
for n in range(13):
a = [None] * n
expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
expected_bytesize = getsizeof([]) + expected_spots * 8
real_bytesize = getsizeof([*a])
print(n,
expected_bytesize,
real_bytesize,
real_bytesize == expected_bytesize)``````

``````0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True``````

``````        if (n == 0) {
...
Py_RETURN_NONE;
}
...
if (list_resize(self, m + n) < 0) {``````

Full picture of what happens, building on the other answers and comments (especially ShadowRanger’s answer, which also explains why it’s done like that).

Disassembling shows that `BUILD_LIST_UNPACK` gets used:

``````>>> import dis
>>> dis.dis('[*a]')
2 BUILD_LIST_UNPACK        1
4 RETURN_VALUE
``````

That’s handled in `ceval.c`, which builds an empty list and extends it (with `a`):

``````        case TARGET(BUILD_LIST_UNPACK): {
...
PyObject *sum = PyList_New(0);
...
none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));
``````

`_PyList_Extend` uses `list_extend`:

``````_PyList_Extend(PyListObject *self, PyObject *iterable)
{
return list_extend(self, iterable);
}
``````
``````list_extend(PyListObject *self, PyObject *iterable)
...
n = PySequence_Fast_GET_SIZE(iterable);
...
m = Py_SIZE(self);
...
if (list_resize(self, m + n) < 0) {
``````

And that overallocates as follows:

``````list_resize(PyListObject *self, Py_ssize_t newsize)
{
...
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
``````

Let’s check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it with 8 (as I’m using 64-bit Python here) and adding an empty list’s byte size (i.e., a list object’s constant overhead):

``````from sys import getsizeof
for n in range(13):
a = [None] * n
expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
expected_bytesize = getsizeof([]) + expected_spots * 8
real_bytesize = getsizeof([*a])
print(n,
expected_bytesize,
real_bytesize,
real_bytesize == expected_bytesize)
``````

Output:

``````0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True
``````

Matches except for `n = 0`, which `list_extend` actually shortcuts, so actually that matches, too:

``````        if (n == 0) {
...
Py_RETURN_NONE;
}
...
if (list_resize(self, m + n) < 0) {
``````

## 回答 2

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

`````` * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);``````

These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.

That said, you can see where the comprehension and `list(a)` behaviors come in here:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

Specifically for the comprehension:

`````` * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
``````

Just below those lines, there is `list_preallocate_exact` which is used when calling `list(a)`.