Python的清单是如何实现的?

问题:Python的清单是如何实现的?

它是一个链表,一个数组吗?我四处搜寻,只发现有人猜测。我的C语言知识不足以查看源代码。

Is it a linked list, an array? I searched around and only found people guessing. My C knowledge isn’t good enough to look at the source code.


回答 0

这是一个动态数组。实际证明:不管索引如何,索引都需要花费同一时间(当然,差异很小(0.0013微秒!)):

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

如果IronPython或Jython使用链接列表,我会感到惊讶-它们会破坏许多基于列表是动态数组的假设而广泛使用的库的性能。

It’s a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists – they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.


回答 1

实际上,C代码非常简单。扩展一个宏并修剪一些不相关的注释,其基本结构在中listobject.h,该结构将列表定义为:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD包含引用计数和类型标识符。因此,它是一个整体的向量/数组。用于在此类数组已满时调整其大小的代码在中listobject.c。它实际上并没有将数组加倍,而是通过分配来增长

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

每次达到最大容量时,newsize请求的大小在哪里(不一定allocated + 1是因为您可以添加extend任意数量的元素,而不是append一个一个地添加它们)。

另请参阅Python FAQ

The C code is pretty simple, actually. Expanding one macro and pruning some irrelevant comments, the basic structure is in listobject.h, which defines a list as:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD contains a reference count and a type identifier. So, it’s a vector/array that overallocates. The code for resizing such an array when it’s full is in listobject.c. It doesn’t actually double the array, but grows by allocating

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

to the capacity each time, where newsize is the requested size (not necessarily allocated + 1 because you can extend by an arbitrary number of elements instead of append‘ing them one by one).

See also the Python FAQ.


回答 2

在CPython中,列表是指针数组。Python的其他实现可能选择以不同的方式存储它们。

In CPython, lists are arrays of pointers. Other implementations of Python may choose to store them in different ways.


回答 3

这取决于实现,但是IIRC:

  • CPython使用了一个指针数组
  • Jython使用 ArrayList
  • IronPython显然也使用数组。您可以浏览源代码以找出答案。

因此,它们都具有O(1)随机访问权限。

This is implementation dependent, but IIRC:

  • CPython uses an array of pointers
  • Jython uses an ArrayList
  • IronPython apparently also uses an array. You can browse the source code to find out.

Thus they all have O(1) random access.


回答 4

我建议Laurent Luce的文章“ Python列表实现”。这对我真的很有用,因为作者解释了该列表是如何在CPython中实现的,并为此使用了出色的图表。

列出对象C的结构

CPython中的列表对象由以下C结构表示。ob_item是指向列表元素的指针的列表。已分配是在内存中分配的插槽数。

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

重要的是要注意分配的插槽和列表大小之间的差异。列表的大小与相同len(l)。分配的插槽数是已在内存中分配的数量。通常,您会看到分配的大小可能大于大小。这是为了避免realloc每次将新元素添加到列表时都需要调用。

附加

我们将一个整数附加到列表中:l.append(1)。怎么了?

我们继续添加一个元素:l.append(2)list_resize在n + 1 = 2时调用,但是由于分配的大小为4,因此无需分配更多的内存。同样的事情发生时,我们增加2个整数:l.append(3)l.append(4)。下图显示了到目前为止的内容。

让我们在位置1处插入一个新的整数(5),l.insert(1,5)看看内部发生了什么。

流行音乐

当您弹出最后一个元素:时l.pop(),将listpop()被调用。list_resize在内部调用listpop(),如果新大小小于分配大小的一半,则列表将缩小。

您可以观察到插槽4仍指向整数,但重要的是列表的大小现在为4。让我们再弹出一个元素。在中list_resize(),大小– 1 = 4 – 1 = 3小于分配的插槽的一半,因此列表缩小为6个插槽,现在列表的新大小为3。

您可以观察到插槽3和4仍指向一些整数,但重要的是列表的大小现在为3。

删除 Python列表对象具有删除特定元素的方法:l.remove(5)

I would suggest Laurent Luce’s article “Python list implementation”. Was really useful for me because the author explains how the list is implemented in CPython and uses excellent diagrams for this purpose.

List object C structure

A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list.

Append

We append an integer to the list: l.append(1). What happens?

We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.

Insert

Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally.

Pop

When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.

You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4. Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.

You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.

Remove Python list object has a method to remove a specific element: l.remove(5).


回答 5

根据文档

Python的列表实际上是可变长度数组,而不是Lisp样式的链接列表。

According to the documentation,

Python’s lists are really variable-length arrays, not Lisp-style linked lists.


回答 6

正如上面其他人所述,这些列表(当列表很大时)是通过分配固定数量的空间来实现的;如果该空间应填充,则分配更大的空间并在元素上进行复制。

为了理解为什么在不损失一般性的情况下对O(1)进行摊销,假设我们插入了a = 2 ^ n个元素,现在我们必须将表加倍为2 ^(n + 1)个大小。这意味着我们当前正在执行2 ^(n + 1)个操作。最后一个副本,我们进行了2 ^ n次操作。在此之前,我们做了2 ^(n-1)…一直到8,4,2,1。现在,如果将它们加起来,我们得到1 + 2 + 4 + 8 + … + 2 ^(n + 1)= 2 ^(n + 2)-1 <4 * 2 ^ n = O(2 ^ n)= O(a)总插入(即O(1)摊销时间)。另外,应注意,如果表允许删除,则表收缩必须以其他因子(例如3倍)完成

As others have stated above, the lists (when appreciably large) are implemented by allocating a fixed amount of space, and, if that space should fill, allocating a larger amount of space and copying over the elements.

To understand why the method is O(1) amortized, without loss of generality, assume we have inserted a = 2^n elements, and we now have to double our table to 2^(n+1) size. That means we’re currently doing 2^(n+1) operations. Last copy, we did 2^n operations. Before that we did 2^(n-1)… all the way down to 8,4,2,1. Now, if we add these up, we get 1 + 2 + 4 + 8 + … + 2^(n+1) = 2^(n+2) – 1 < 4*2^n = O(2^n) = O(a) total insertions (i.e. O(1) amortized time). Also, it should be noted that if the table allows deletions the table shrinking has to be done at a different factor (e.g 3x)


回答 7

Python中的列表类似于数组,您可以在其中存储多个值。列表是可变的,这意味着您可以更改它。您应该知道的更重要的事情是,当我们创建列表时,Python会自动为该列表变量创建reference_id。如果通过分配其他变量来更改它,则主列表将被更改。让我们尝试一个例子:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

我们追加了,my_list但是我们的主要列表已更改。这意味着没有将列表指定为副本列表,将其指定为参考。

A list in Python is something like an array, where you can store multiple values. List is mutable that means you can change it. The more important thing you should know, when we create a list, Python automatically creates a reference_id for that list variable. If you change it by assigning others variable the main list will be change. Let’s try with a example:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

We append my_list but our main list has changed. That mean’s list didn’t assign as a copy list assign as its reference.


回答 8

在CPython中,列表是作为动态数组实现的,因此,当我们在那时追加时,不仅添加了一个宏,而且还分配了更多空间,因此每次都不应添加新空间。

In CPython list is implemented as dynamic array, and therefore when we append at that time not only one macro is added but some more space is allocated so that everytime new space should not be added.