标签归档:python-internals

为什么x ** 4.0比Python 3中的x ** 4快?

问题:为什么x ** 4.0比Python 3中的x ** 4快?

为什么x**4.0要比x**4?我正在使用CPython 3.5.2。

$ python -m timeit "for x in range(100):" " x**4.0"
  10000 loops, best of 3: 24.2 usec per loop

$ python -m timeit "for x in range(100):" " x**4"
  10000 loops, best of 3: 30.6 usec per loop

我尝试更改所提高的力量以查看其作用方式,例如,如果我将x提高至10或16的力量,它会从30跳到35,但是如果我以浮点数提高10.0,那只是在移动大约24.1〜4。

我想这可能与浮点转换和2的幂有关,但我真的不知道。

我注意到在这两种情况下2的幂都更快,因为这些计算对于解释器/计算机而言更本机/更容易,所以我想。但尽管如此,浮子几乎没有动。2.0 => 24.1~4 & 128.0 => 24.1~4 2 => 29 & 128 => 62


TigerhawkT3指出,它不会在循环之外发生。我检查了一下,这种情况只发生在基地抬高时(从我所看到的情况)。有什么想法吗?

Why is x**4.0 faster than x**4? I am using CPython 3.5.2.

$ python -m timeit "for x in range(100):" " x**4.0"
  10000 loops, best of 3: 24.2 usec per loop

$ python -m timeit "for x in range(100):" " x**4"
  10000 loops, best of 3: 30.6 usec per loop

I tried changing the power I raised by to see how it acts, and for example if I raise x to the power of 10 or 16 it’s jumping from 30 to 35, but if I’m raising by 10.0 as a float, it’s just moving around 24.1~4.

I guess it has something to do with float conversion and powers of 2 maybe, but I don’t really know.

I noticed that in both cases powers of 2 are faster, I guess since those calculations are more native/easy for the interpreter/computer. But still, with floats it’s almost not moving. 2.0 => 24.1~4 & 128.0 => 24.1~4 but 2 => 29 & 128 => 62


TigerhawkT3 pointed out that it doesn’t happen outside of the loop. I checked and the situation only occurs (from what I’ve seen) when the base is getting raised. Any idea about that?

回答 0

为什么比Python 3 * x**4.0 更快x**4

Python 3 int对象是成熟的对象,旨在支持任意大小。因此,它们是在C级别上进行处理的(请参阅如何在中将所有变量声明为PyLongObject *type long_pow)。这也使它们的取幂变得更加棘手乏味,因为您需要ob_digit使用它用来表示其值的数组进行操作。(勇敢的人士来了。-有关s的更多信息,请参见:了解Python中大整数的内存分配PyLongObject。)

float相反,可以将 Python 对象转换为C double类型(通过使用PyFloat_AsDouble),并且可以使用这些本机类型执行操作。这很棒,因为在检查了相关的边缘情况之后,它允许Python 使用平台的powC pow,即)来处理实际的幂运算:

/* Now iv and iw are finite, iw is nonzero, and iv is
 * positive and not equal to 1.0.  We finally allow
 * the platform pow to step in and do the rest.
 */
errno = 0;
PyFPE_START_PROTECT("pow", return NULL)
ix = pow(iv, iw); 

其中,iviw是我们的原PyFloatObjectS作为Ç double秒。

对于它的价值:Python 2.7.13对我而言是一个2~3更快的因子,并且显示出相反的行为。

先前的事实也解释了Python 2和3之间的差异,因此,我认为我也将解决此评论,因为它很有趣。

在Python 2中,您使用的旧int对象与intPython 3中的对象不同(int3.x中的所有对象都是PyLongObject类型)。在Python 2中,有一个区别取决于对象的值(或者,如果使用后缀L/l):

# Python 2
type(30)  # <type 'int'>
type(30L) # <type 'long'>

<type 'int'>你看到这里做同样的事情float就做,它被安全地转换成C long 时,就可以进行幂(中int_pow也暗示编译器放你的歌在寄存器中,如果能够这样做,这样可以有所作为) :

static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
/* Snipped for brevity */    

这样可以提高速度。

要查看<type 'long'>s与<type 'int'>s 相比是否比较慢,如果将x名称包装long在Python 2 中的调用中(实质上是强制将其long_pow像在Python 3中那样使用),则速度增益会消失:

# <type 'int'>
(python2)  python -m timeit "for x in range(1000):" " x**2"       
10000 loops, best of 3: 116 usec per loop
# <type 'long'> 
(python2)  python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop

请注意,尽管一个代码段将转换为intlong而另一个代码段未将其转换为(如@pydsinger所指出的),但这种转换并不是减慢速度的推动力。执行long_pow是。(仅long(x)将时间与语句一起查看)。

[…]它不会在循环之外发生。[…]有什么想法吗?

这是CPython的窥孔优化器,可以为您折叠常量。无论哪种情况,您都会获得相同的精确计时,因为没有实际的计算来找到幂运算的结果,只加载值:

dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

生成相同的字节码'4 ** 4.',唯一的区别是LOAD_CONST加载的是float 256.0而不是int 256

dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE

所以时代是一样的。


*以上所有内容仅适用于CPython(Python的参考实现)。其他实现可能会有所不同。

Why is x**4.0 faster than x**4 in Python 3*?

Python 3 int objects are a full fledged object designed to support an arbitrary size; due to that fact, they are handled as such on the C level (see how all variables are declared as PyLongObject * type in long_pow). This also makes their exponentiation a lot more trickier and tedious since you need to play around with the ob_digit array it uses to represent its value to perform it. (Source for the brave. — See: Understanding memory allocation for large integers in Python for more on PyLongObjects.)

Python float objects, on the contrary, can be transformed to a C double type (by using PyFloat_AsDouble) and operations can be performed using those native types. This is great because, after checking for relevant edge-cases, it allows Python to use the platforms’ pow (C’s pow, that is) to handle the actual exponentiation:

/* Now iv and iw are finite, iw is nonzero, and iv is
 * positive and not equal to 1.0.  We finally allow
 * the platform pow to step in and do the rest.
 */
errno = 0;
PyFPE_START_PROTECT("pow", return NULL)
ix = pow(iv, iw); 

where iv and iw are our original PyFloatObjects as C doubles.

For what it’s worth: Python 2.7.13 for me is a factor 2~3 faster, and shows the inverse behaviour.

The previous fact also explains the discrepancy between Python 2 and 3 so, I thought I’d address this comment too because it is interesting.

In Python 2, you’re using the old int object that differs from the int object in Python 3 (all int objects in 3.x are of PyLongObject type). In Python 2, there’s a distinction that depends on the value of the object (or, if you use the suffix L/l):

# Python 2
type(30)  # <type 'int'>
type(30L) # <type 'long'>

The <type 'int'> you see here does the same thing floats do, it gets safely converted into a C long when exponentiation is performed on it (The int_pow also hints the compiler to put ’em in a register if it can do so, so that could make a difference):

static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
/* Snipped for brevity */    

this allows for a good speed gain.

To see how sluggish <type 'long'>s are in comparison to <type 'int'>s, if you wrapped the x name in a long call in Python 2 (essentially forcing it to use long_pow as in Python 3), the speed gain disappears:

# <type 'int'>
(python2) ➜ python -m timeit "for x in range(1000):" " x**2"       
10000 loops, best of 3: 116 usec per loop
# <type 'long'> 
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop

Take note that, though the one snippet transforms the int to long while the other does not (as pointed out by @pydsinger), this cast is not the contributing force behind the slowdown. The implementation of long_pow is. (Time the statements solely with long(x) to see).

[…] it doesn’t happen outside of the loop. […] Any idea about that?

This is CPython’s peephole optimizer folding the constants for you. You get the same exact timings either case since there’s no actual computation to find the result of the exponentiation, only loading of values:

dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

Identical byte-code is generated for '4 ** 4.' with the only difference being that the LOAD_CONST loads the float 256.0 instead of the int 256:

dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE

So the times are identical.


*All of the above apply solely for CPython, the reference implementation of Python. Other implementations might perform differently.


回答 1

如果我们查看字节码,我们可以看到表达式是完全相同的。唯一的区别是常量的类型,它将是的自变量BINARY_POWER。因此,最肯定的是由于它int被转换为浮点数。

>>> def func(n):
...    return n**4
... 
>>> def func1(n):
...    return n**4.0
... 
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE

更新:让我们看一下CPython源代码中的Objects / abstract.c

PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}

PyNumber_Power呼叫ternary_op,这太长了,无法粘贴到这里,因此这是链接

它调用的nb_power广告位xy作为参数传递。

最后,在Objects / floatobject.c的float_pow()第686行,我们看到在实际操作之前将参数转换为C :double

static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...

If we look at the bytecode, we can see that the expressions are purely identical. The only difference is a type of a constant that will be an argument of BINARY_POWER. So it’s most certainly due to an int being converted to a floating point number down the line.

>>> def func(n):
...    return n**4
... 
>>> def func1(n):
...    return n**4.0
... 
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE

Update: let’s take a look at Objects/abstract.c in the CPython source code:

PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}

PyNumber_Power calls ternary_op, which is too long to paste here, so here’s the link.

It calls the nb_power slot of x, passing y as an argument.

Finally, in float_pow() at line 686 of Objects/floatobject.c we see that arguments are converted to a C double right before the actual operation:

static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...

回答 2

因为一个是正确的,所以另一个是近似值。

>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625

Because one is correct, another is approximation.

>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625

为什么两个相同的列表具有不同的内存占用量?

问题:为什么两个相同的列表具有不同的内存占用量?

我创建了两个列表l1l2,但是每个列表都有不同的创建方法:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

但是输出使我感到惊讶:

Size of l1 = 144
Size of l2 = 192

使用列表推导创建的列表在内存中的容量更大,但是在Python中,这两个列表是相同的。

这是为什么?这是CPython内部的东西,还是其他解释?

I created two lists l1 and l2, but each one with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with a list comprehension is a bigger size in memory, but the two lists are identical in Python otherwise.

Why is that? Is this some CPython internal thing, or some other explanation?


回答 0

当您编写时[None] * 10,Python知道它将需要一个正好包含10个对象的列表,因此它会精确地分配该对象。

当您使用列表推导时,Python不知道它将需要多少。因此,随着元素的添加,列表逐渐增加。对于每个重新分配,它分配的空间都超过立即需要的空间,因此不必为每个元素重新分配。结果列表可能会比所需的要大一些。

比较具有相似大小的列表时,可以看到此行为:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

您可以看到第一种方法只分配需要的内容,而第二种则周期性地增长。在此示例中,它为16个元素分配了足够的内存,并且在达到第17个元素时不得不重新分配。

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn’t know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn’t have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.


回答 1

就像在这个问题中指出的那样,列表理解是list.append在后台使用的,因此它将调用list-resize方法,该方法将进行整体化。

为了向自己证明这一点,您实际上可以使用反dis汇编程序:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

注意LIST_APPEND反汇编<listcomp>代码对象中的操作码。从文档

LIST_APPEND(i)

来电list.append(TOS[-i], TOS)。用于实现列表推导。

现在,对于列表重复操作,如果考虑以下因素,我们就可以知道发生了什么:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

因此,它似乎能够准确分配大小。查看源代码,我们看到的就是这样:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

即,这里:size = Py_SIZE(a) * n;。其余函数仅填充数组。

As noted in this question the list-comprehension uses list.append under the hood, so it will call the list-resize method, which overallocates.

To demonstrate this to yourself, you can actually use the dis dissasembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the functions simply fills the array.


回答 2

没有一个是内存块,但是它不是预先指定的大小。除此之外,数组元素之间的数组还有一些额外的间距。您可以通过运行以下命令自己查看:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

它的总和不是l2的大小,而是更小。

print(sys.getsizeof([None]))
72

而且这远远大于的十分之一l1

您的数字应该有所不同,具体取决于您的操作系统的详细信息和操作系统中当前内存使用情况的详细信息。[None]的大小永远不能大于将变量设置为存储的可用相邻内存,并且如果以后将其动态分配为更大,则可能必须移动该变量。

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

Which does not total the size of l2, but rather is less.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.


为什么Python的数组变慢?

问题:为什么Python的数组变慢?

我期望array.array比列表更快,因为数组似乎已拆箱。

但是,我得到以下结果:

In [1]: import array

In [2]: L = list(range(100000000))

In [3]: A = array.array('l', range(100000000))

In [4]: %timeit sum(L)
1 loop, best of 3: 667 ms per loop

In [5]: %timeit sum(A)
1 loop, best of 3: 1.41 s per loop

In [6]: %timeit sum(L)
1 loop, best of 3: 627 ms per loop

In [7]: %timeit sum(A)
1 loop, best of 3: 1.39 s per loop

造成这种差异的原因是什么?

I expected array.array to be faster than lists, as arrays seem to be unboxed.

However, I get the following result:

In [1]: import array

In [2]: L = list(range(100000000))

In [3]: A = array.array('l', range(100000000))

In [4]: %timeit sum(L)
1 loop, best of 3: 667 ms per loop

In [5]: %timeit sum(A)
1 loop, best of 3: 1.41 s per loop

In [6]: %timeit sum(L)
1 loop, best of 3: 627 ms per loop

In [7]: %timeit sum(A)
1 loop, best of 3: 1.39 s per loop

What could be the cause of such a difference?


回答 0

存储是“未装箱的”,但是每次访问元素时,Python都必须对其进行“装箱”(将其嵌入到常规Python对象中),以便对其进行任何处理。例如,您sum(A)遍历数组,并将每个整数一次装在一个常规Python int对象中。那要花时间。在您的中sum(L),所有装箱操作均在创建列表时完成。

因此,最后,阵列通常较慢,但所需的内存却少得多。


这是最新版本的Python 3的相关代码,但是自Python首次发布以来,相同的基本思想也适用于所有CPython实现。

这是访问列表项的代码:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    /* error checking omitted */
    return ((PyListObject *)op) -> ob_item[i];
}

它几乎没有什么: somelist[i]只返回i列表中的第一个对象(CPython中的所有Python对象都是指向结构的指针,该结构的初始段符合a的布局struct PyObject)。

__getitem__array带有类型代码的实现l

static PyObject *
l_getitem(arrayobject *ap, Py_ssize_t i)
{
    return PyLong_FromLong(((long *)ap->ob_item)[i]);
}

原始内存被视为平台本地C long整数的向量;该i“个C long被读取起来; 然后PyLong_FromLong()调用该方法将本机包装(“框”)C long在Python long对象中(在Python 3中,该对象消除了Python 2 int和之间的区别long,实际上显示为type int)。

此装箱必须为Python int对象分配新的内存,然后将native C long的位喷入其中。在原始示例的上下文中,该对象的生存期非常短(足够长,足以sum()将内容添加到运行总计中),然后需要更多时间才能取消分配新int对象。

在CPython实现中,这就是速度差异的来源,永远都是这样,永远都是这样。

The storage is “unboxed”, but every time you access an element Python has to “box” it (embed it in a regular Python object) in order to do anything with it. For example, your sum(A) iterates over the array, and boxes each integer, one at a time, in a regular Python int object. That costs time. In your sum(L), all the boxing was done at the time the list was created.

So, in the end, an array is generally slower, but requires substantially less memory.


Here’s the relevant code from a recent version of Python 3, but the same basic ideas apply to all CPython implementations since Python was first released.

Here’s the code to access a list item:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    /* error checking omitted */
    return ((PyListObject *)op) -> ob_item[i];
}

There’s very little to it: somelist[i] just returns the i‘th object in the list (and all Python objects in CPython are pointers to a struct whose initial segment conforms to the layout of a struct PyObject).

And here’s the __getitem__ implementation for an array with type code l:

static PyObject *
l_getitem(arrayobject *ap, Py_ssize_t i)
{
    return PyLong_FromLong(((long *)ap->ob_item)[i]);
}

The raw memory is treated as a vector of platform-native C long integers; the i‘th C long is read up; and then PyLong_FromLong() is called to wrap (“box”) the native C long in a Python long object (which, in Python 3, which eliminates Python 2’s distinction between int and long, is actually shown as type int).

This boxing has to allocate new memory for a Python int object, and spray the native C long‘s bits into it. In the context of the original example, this object’s lifetime is very brief (just long enough for sum() to add the contents into a running total), and then more time is required to deallocate the new int object.

This is where the speed difference comes from, always has come from, and always will come from in the CPython implementation.


回答 1

为了增加蒂姆·彼得斯的出色答案,数组实现了缓冲协议,而列表则没有。这意味着,如果您正在编写C扩展名(或道德上的对等物,例如编写Cython模块),则可以比Python更快地访问和使用数组的元素。这将为您带来可观的速度改进,可能远远超过一个数量级。但是,它有很多缺点:

  1. 您现在正从事编写C而不是Python的工作。Cython是改善这种情况的一种方法,但是并不能消除语言之间的许多基本差异;您需要熟悉C语义并了解它在做什么。
  2. PyPy的C API 在某种程度上可以正常工作,但是速度不是很快。如果您以PyPy为目标,则可能应该只编写带有常规列表的简单代码,然后让JITter为您优化它。
  3. C扩展比纯Python代码更难分发,因为它们需要编译。编译通常取决于体系结构和操作系统,因此您需要确保要针对目标平台进行编译。

直接转到C扩展程序可能会使用大锤拍打苍蝇,具体取决于您的用例。您首先应该研究NumPy,看看它是否足够强大,可以执行您想做的任何数学运算。如果正确使用,它将比本地Python快得多。

To add to Tim Peters’ excellent answer, arrays implement the buffer protocol, while lists do not. This means that, if you are writing a C extension (or the moral equivalent, such as writing a Cython module), then you can access and work with the elements of an array much faster than anything Python can do. This will give you considerable speed improvements, possibly well over an order of magnitude. However, it has a number of downsides:

  1. You are now in the business of writing C instead of Python. Cython is one way to ameliorate this, but it does not eliminate many fundamental differences between the languages; you need to be familiar with C semantics and understand what it is doing.
  2. PyPy’s C API works to some extent, but isn’t very fast. If you are targeting PyPy, you should probably just write simple code with regular lists, and then let the JITter optimize it for you.
  3. C extensions are harder to distribute than pure Python code because they need to be compiled. Compilation tends to be architecture and operating-system dependent, so you will need to ensure you are compiling for your target platform.

Going straight to C extensions may be using a sledgehammer to swat a fly, depending on your use case. You should first investigate NumPy and see if it is powerful enough to do whatever math you’re trying to do. It will also be much faster than native Python, if used correctly.


回答 2

蒂姆·彼得斯(Tim Peters)回答了为什么这样做很慢,但让我们看看如何改善它。

坚持您的示例sum(range(...))(比示例小10倍,以适合此处的内存):

import numpy
import array
L = list(range(10**7))
A = array.array('l', L)
N = numpy.array(L)

%timeit sum(L)
10 loops, best of 3: 101 ms per loop

%timeit sum(A)
1 loop, best of 3: 237 ms per loop

%timeit sum(N)
1 loop, best of 3: 743 ms per loop

这样numpy也需要装箱/拆箱,这有额外的开销。要使其快速运行,必须停留在numpy C代码内:

%timeit N.sum()
100 loops, best of 3: 6.27 ms per loop

因此,从列表解决方案到numpy版本,这是运行时的因素16。

我们还要检查创建这些数据结构需要花费多长时间

%timeit list(range(10**7))
1 loop, best of 3: 283 ms per loop

%timeit array.array('l', range(10**7))
1 loop, best of 3: 884 ms per loop

%timeit numpy.array(range(10**7))
1 loop, best of 3: 1.49 s per loop

%timeit numpy.arange(10**7)
10 loops, best of 3: 21.7 ms per loop

明确的获胜者:Numpy

还要注意,创建数据结构所花费的时间与求和所花费的时间差不多,甚至更多。分配内存很慢。

那些的内存使用情况:

sys.getsizeof(L)
90000112
sys.getsizeof(A)
81940352
sys.getsizeof(N)
80000096

因此,每个数字占用8个字节,开销各不相同。对于此范围,我们使用32位整数就足够了,因此我们可以保护一些内存。

N=numpy.arange(10**7, dtype=numpy.int32)

sys.getsizeof(N)
40000096

%timeit N.sum()
100 loops, best of 3: 8.35 ms per loop

但是事实证明,在我的计算机上添加64位整数要快于32位整数,因此只有在受内存/带宽限制的情况下才值得这样做。

Tim Peters answered why this is slow, but let’s see how to improve it.

Sticking to your example of sum(range(...)) (factor 10 smaller than your example to fit into memory here):

import numpy
import array
L = list(range(10**7))
A = array.array('l', L)
N = numpy.array(L)

%timeit sum(L)
10 loops, best of 3: 101 ms per loop

%timeit sum(A)
1 loop, best of 3: 237 ms per loop

%timeit sum(N)
1 loop, best of 3: 743 ms per loop

This way also numpy needs to box/unbox, which has additional overhead. To make it fast one has to stay within the numpy c code:

%timeit N.sum()
100 loops, best of 3: 6.27 ms per loop

So from the list solution to the numpy version this is a factor 16 in runtime.

Let’s also check how long creating those data structures takes

%timeit list(range(10**7))
1 loop, best of 3: 283 ms per loop

%timeit array.array('l', range(10**7))
1 loop, best of 3: 884 ms per loop

%timeit numpy.array(range(10**7))
1 loop, best of 3: 1.49 s per loop

%timeit numpy.arange(10**7)
10 loops, best of 3: 21.7 ms per loop

Clear winner: Numpy

Also note that creating the data structure takes about as much time as summing, if not more. Allocating memory is slow.

Memory usage of those:

sys.getsizeof(L)
90000112
sys.getsizeof(A)
81940352
sys.getsizeof(N)
80000096

So these take 8 bytes per number with varying overhead. For the range we use 32bit ints are sufficient, so we can safe some memory.

N=numpy.arange(10**7, dtype=numpy.int32)

sys.getsizeof(N)
40000096

%timeit N.sum()
100 loops, best of 3: 8.35 ms per loop

But it turns out that adding 64bit ints is faster than 32bit ints on my machine, so this is only worth it if you are limited by memory/bandwidth.


回答 3

请注意,100000000等于10^8不等于10^7,我的结果如下所示:

100000000 == 10**8

# my test results on a Linux virtual machine:
#<L = list(range(100000000))> Time: 0:00:03.263585
#<A = array.array('l', range(100000000))> Time: 0:00:16.728709
#<L = list(range(10**8))> Time: 0:00:03.119379
#<A = array.array('l', range(10**8))> Time: 0:00:18.042187
#<A = array.array('l', L)> Time: 0:00:07.524478
#<sum(L)> Time: 0:00:01.640671
#<np.sum(L)> Time: 0:00:20.762153

please note that 100000000 equals to 10^8 not to 10^7, and my results are as the folowwing:

100000000 == 10**8

# my test results on a Linux virtual machine:
#<L = list(range(100000000))> Time: 0:00:03.263585
#<A = array.array('l', range(100000000))> Time: 0:00:16.728709
#<L = list(range(10**8))> Time: 0:00:03.119379
#<A = array.array('l', range(10**8))> Time: 0:00:18.042187
#<A = array.array('l', L)> Time: 0:00:07.524478
#<sum(L)> Time: 0:00:01.640671
#<np.sum(L)> Time: 0:00:20.762153

是否可以“破解” Python的打印功能?

问题:是否可以“破解” Python的打印功能?

注意:此问题仅供参考。我很想知道这样做有多深入到Python内部。

不久之前,在某个问题的内部开始了一个讨论,该问题是关于传递给print语句的字符串是否可以在调用to之后/期间进行修改print。例如,考虑以下功能:

def print_something():
    print('This cat was scared.')

现在,当print运行时,到终端的输出应显示:

This dog was scared.

请注意,单词“ cat”已被单词“ dog”代替。某处某种方式能够修改那些内部缓冲区以更改打印的内容。假设这样做是在未经原始代码作者明确许可的情况下进行的(因此,被黑客/劫持)。

这个评论从智者@abarnert,尤其让我思考:

有两种方法可以做到这一点,但是它们都很丑陋,绝不应该这样做。最丑陋的方法是code用一个带有不同co_consts 列表的对象替换 函数内部的对象。接下来可能是进入C API以访问str的内部缓冲区。[…]

因此,看起来这实际上是可能的。

这是我解决此问题的幼稚方法:

>>> import inspect
>>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
>>> print_something()
This dog was scared.

当然,这exec很糟糕,但这并不能真正回答问题,因为 print调用when / after之后,它实际上并未进行任何修改。

正如@abarnert解释的那样,它将如何进行?

Note: This question is for informational purposes only. I am interested to see how deep into Python’s internals it is possible to go with this.

Not very long ago, a discussion began inside a certain question regarding whether the strings passed to print statements could be modified after/during the call to print has been made. For example, consider the function:

def print_something():
    print('This cat was scared.')

Now, when print is run, then the output to the terminal should display:

This dog was scared.

Notice the word “cat” has been replaced by the word “dog”. Something somewhere somehow was able to modify those internal buffers to change what was printed. Assume this is done without the original code author’s explicit permission (hence, hacking/hijacking).

This comment from the wise @abarnert, in particular, got me thinking:

There are a couple of ways to do that, but they’re all very ugly, and should never be done. The least ugly way is to probably replace the code object inside the function with one with a different co_consts list. Next is probably reaching into the C API to access the str’s internal buffer. […]

So, it looks like this is actually possible.

Here’s my naive way of approaching this problem:

>>> import inspect
>>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
>>> print_something()
This dog was scared.

Of course, exec is bad, but that doesn’t really answer the question, because it does not actually modify anything during when/after print is called.

How would it be done as @abarnert has explained it?


回答 0

首先,实际上没有那么多hacky方式。我们要做的就是更改print打印内容,对吗?

_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)

或者,类似地,您可以选择Monkey补丁sys.stdout而不是print


同样,这个exec … getsource …想法也没有错。好吧,这当然有很多问题,但是比这里的要少…


但是,如果您确实想修改函数对象的代码常量,则可以这样做。

如果您真的想真正使用代码对象,则应该使用bytecode(完成时)或byteplay(直到那时,或者对于较旧的Python版本)之类的库,而不是手动执行。即使对于这种琐碎的事情,CodeType初始化器还是很痛苦的。如果您确实需要做一些固定的工作lnotab,那么只有疯子才会手动进行。

另外,不用说,并非所有的Python实现都使用CPython风格的代码对象。这段代码可以在CPython 3.7中使用,并且可能所有版本都可以回溯到至少2.2,但需要进行一些细微的更改(不是代码黑客的东西,而是生成器表达式之类的东西),但是不适用于任何版本的IronPython。

import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for string (which you can read by typing
    # dis.dis(string) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()

入侵代码对象可能会出什么问题?大多数情况下,segfaults RuntimeError会耗尽整个堆栈,更正常RuntimeError的segfault 会被处理,或者垃圾值可能只会引发a TypeErrorAttributeError当您尝试使用它们时。例如,尝试创建一个代码对象,该对象只带有一个RETURN_VALUE在堆栈上没有任何内容的字节码b'S\0'(3.6以上的字节码,b'S'之前),或者一个空元组(表示字节码中是否co_consts有a LOAD_CONST 0,或者varnames减1,因此最高的字节LOAD_FAST实际上加载了一个freevar) / cellvar单元格。为了获得一些真正的乐趣,如果您lnotab弄错了太多,那么只有在调试器中运行代码时,您的代码才会出现段错误。

使用bytecodebyteplay不会保护您免受所有这些问题的影响,但是它们确实具有一些基本的健全性检查,并且好的助手可以让您执行诸如插入代码块之类的事情,并使其担心更新所有偏移量和标签,以便您能够弄错了,等等。(此外,它们使您不必键入该可笑的6行构造函数,也不必调试由此产生的愚蠢的错字。)


现在进入第二。

我提到代码对象是不可变的。当然,const是一个元组,因此我们不能直接更改它。const元组中的东西是一个字符串,我们也不能直接更改它。这就是为什么我必须构建一个新的字符串来构建一个新的元组来构建一个新的代码对象的原因。

但是,如果您可以直接更改字符串怎么办?

好吧,在足够深入的内容下,所有内容都只是指向某些C数据的指针,对吗?如果您使用的是CPython,则有一个C API可以访问对象,并且可以使用ctypes它从Python本身内部访问该API,这是一个很糟糕的想法,他们将它们放在pythonapistdlib的ctypes模块中。:)您需要了解的最重要的技巧id(x)是实际指向x内存的指针(作为int)。

不幸的是,用于字符串的C API不能让我们安全地获取已经冻结的字符串的内部存储。因此,请放心,我们只需要阅读头文件并自己找到该存储即可。

如果您使用的是CPython 3.4-3.7(旧版本有所不同,谁知道未来),那么将使用紧凑ASCII格式存储由纯ASCII组成的模块中的字符串文字。提早结束,并且ASCII字节的缓冲区立即在内存中。如果您在字符串或某些非文字字符串中输入非ASCII字符,这将中断(可能在段错误中),但是您可以阅读其他4种方式来访问不同类型字符串的缓冲区。

为了使事情变得简单一些,我在superhackyinternalsGitHub上使用了该项目。(这是有意不可pip安装的,因为您真的不应该使用它,除非尝试在本地构建解释器等。)

import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'

    print_function()

main()

如果您想玩这些东西,int则比起隐藏起来要简单得多str。而且,通过更改2to 的值来猜测可以破坏什么,容易得多1,对吗?实际上,忘记想象,让我们开始吧(superhackyinternals再次使用类型):

>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
...     i *= 2
...     print(i)
10
10
10

…假设代码框具有无限长的滚动条。

我在IPython中尝试过同样的事情,并且第一次尝试2在提示符下进行评估,它陷入了某种不间断的无限循环。大概2是在REPL循环中将数字用于某物,而股票解释器不是吗?

First, there’s actually a much less hacky way. All we want to do is change what print prints, right?

_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)

Or, similarly, you can monkeypatch sys.stdout instead of print.


Also, nothing wrong with the exec … getsource … idea. Well, of course there’s plenty wrong with it, but less than what follows here…


But if you do want to modify the function object’s code constants, we can do that.

If you really want to play around with code objects for real, you should use a library like bytecode (when it’s finished) or byteplay (until then, or for older Python versions) instead of doing it manually. Even for something this trivial, the CodeType initializer is a pain; if you actually need to do stuff like fixing up lnotab, only a lunatic would do that manually.

Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably all versions back to at least 2.2 with a few minor changes (and not the code-hacking stuff, but things like generator expressions), but it won’t work with any version of IronPython.

import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for string (which you can read by typing
    # dis.dis(string) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()

What could go wrong with hacking up code objects? Mostly just segfaults, RuntimeErrors that eat up the whole stack, more normal RuntimeErrors that can be handled, or garbage values that will probably just raise a TypeError or AttributeError when you try to use them. For examples, try creating a code object with just a RETURN_VALUE with nothing on the stack (bytecode b'S\0' for 3.6+, b'S' before), or with an empty tuple for co_consts when there’s a LOAD_CONST 0 in the bytecode, or with varnames decremented by 1 so the highest LOAD_FAST actually loads a freevar/cellvar cell. For some real fun, if you get the lnotab wrong enough, your code will only segfault when run in the debugger.

Using bytecode or byteplay won’t protect you from all of those problems, but they do have some basic sanity checks, and nice helpers that let you do things like insert a chunk of code and let it worry about updating all offsets and labels so you can’t get it wrong, and so on. (Plus, they keep you from having to type in that ridiculous 6-line constructor, and having to debug the silly typos that come from doing so.)


Now on to #2.

I mentioned that code objects are immutable. And of course the consts are a tuple, so we can’t change that directly. And the thing in the const tuple is a string, which we also can’t change directly. That’s why I had to build a new string to build a new tuple to build a new code object.

But what if you could change a string directly?

Well, deep enough under the covers, everything is just a pointer to some C data, right? If you’re using CPython, there’s a C API to access the objects, and you can use ctypes to access that API from within Python itself, which is such a terrible idea that they put a pythonapi right there in the stdlib’s ctypes module. :) The most important trick you need to know is that id(x) is the actual pointer to x in memory (as an int).

Unfortunately, the C API for strings won’t let us safely get at the internal storage of an already-frozen string. So screw safely, let’s just read the header files and find that storage ourselves.

If you’re using CPython 3.4 – 3.7 (it’s different for older versions, and who knows for the future), a string literal from a module that’s made of pure ASCII is going to be stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in probably segfault) if you put a non-ASCII character in the string, or certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.

To make things slightly easier, I’m using the superhackyinternals project off my GitHub. (It’s intentionally not pip-installable because you really shouldn’t be using this except to experiment with your local build of the interpreter and the like.)

import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'

    print_function()

main()

If you want to play with this stuff, int is a whole lot simpler under the covers than str. And it’s a lot easier to guess what you can break by changing the value of 2 to 1, right? Actually, forget imagining, let’s just do it (using the types from superhackyinternals again):

>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
...     i *= 2
...     print(i)
10
10
10

… pretend that code box has an infinite-length scrollbar.

I tried the same thing in IPython, and the first time I tried to evaluate 2 at the prompt, it went into some kind of uninterruptable infinite loop. Presumably it’s using the number 2 for something in its REPL loop, while the stock interpreter isn’t?


回答 1

Monkey补丁 print

print是一个内置函数,因此它将使用模块(或Python 2)中print定义的函数。因此,无论何时要修改或更改内置函数的行为,都可以在该模块中简单地重新分配名称。builtins__builtin__

此过程称为monkey-patching

# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print  

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print

之后,即使是在外部模块中,每个print调用也都将通过。custom_printprint

但是,您实际上并不想打印其他文本,而是要更改打印的文本。一种解决方法是将其替换为要打印的字符串:

_print = print  

def custom_print(*args, **options):
    # Get the desired seperator or the default whitspace
    sep = options.pop('sep', ' ')
    # Create the final string
    printed_string = sep.join(args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print

实际上,如果您运行:

>>> def print_something():
...     print('This cat was scared.')
>>> print_something()
This dog was scared.

或者,如果您将其写入文件:

test_file.py

def print_something():
    print('This cat was scared.')

print_something()

并导入:

>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.

因此,它确实按预期工作。

但是,如果您只想临时打印Monkey补丁,可以将其包装在上下文管理器中:

import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired seperator or the default whitspace
            sep = options.pop('sep', ' ')
            # Create the final string
            printed_string = sep.join(args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)

        builtins.print = custom_print

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print

因此,当您运行时,它取决于上下文,显示的内容是:

>>> with ChangePrint() as x:
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.

这样便可以print通过Monkey补丁“破解” 。

修改目标,而不是 print

如果您看一下签名,print则会注意到默认情况下有一个file参数sys.stdout。请注意,这是一个动态默认参数(每次调用时都会真正查找),而不像Python中的普通默认参数。因此,如果您进行更改,则实际上将打印到其他目标会更加方便,因为Python还提供了一个功能(从Python 3.4开始,但是为早期的Python版本创建等效功能很容易)。sys.stdoutprintsys.stdout printredirect_stdout

缺点是它不适用于print不打印到的语句,sys.stdout并且创建自己的语句stdout并不是很简单。

import io
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

但是,这也可以:

>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.

摘要

@abarnet已经提到了其中一些观点,但是我想更详细地探讨这些选项。特别是如何跨模块(使用builtins/ __builtin__)修改它,以及如何仅临时更改(使用contextmanagers)。

Monkey-patch print

print is a builtin function so it will use the print function defined in the builtins module (or __builtin__ in Python 2). So whenever you want to modify or change the behavior of a builtin function you can simply reassign the name in that module.

This process is called monkey-patching.

# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print  

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print

After that every print call will go through custom_print, even if the print is in an external module.

However you don’t really want to print additional text, you want to change the text that is printed. One way to go about that is to replace it in the string that would be printed:

_print = print  

def custom_print(*args, **options):
    # Get the desired seperator or the default whitspace
    sep = options.pop('sep', ' ')
    # Create the final string
    printed_string = sep.join(args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print

And indeed if you run:

>>> def print_something():
...     print('This cat was scared.')
>>> print_something()
This dog was scared.

Or if you write that to a file:

test_file.py

def print_something():
    print('This cat was scared.')

print_something()

and import it:

>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.

So it really works as intended.

However, in case you only temporarily want to monkey-patch print you could wrap this in a context-manager:

import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired seperator or the default whitspace
            sep = options.pop('sep', ' ')
            # Create the final string
            printed_string = sep.join(args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)

        builtins.print = custom_print

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print

So when you run that it depends on the context what is printed:

>>> with ChangePrint() as x:
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.

So that’s how you could “hack” print by monkey-patching.

Modify the target instead of the print

If you look at the signature of print you’ll notice a file argument which is sys.stdout by default. Note that this is a dynamic default argument (it really looks up sys.stdout every time you call print) and not like normal default arguments in Python. So if you change sys.stdout print will actually print to the different target even more convenient that Python also provides a redirect_stdout function (from Python 3.4 on, but it’s easy to create an equivalent function for earlier Python versions).

The downside is that it won’t work for print statements that don’t print to sys.stdout and that creating your own stdout isn’t really straightforward.

import io
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

However this also works:

>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.

Summary

Some of these points have already be mentioned by @abarnet but I wanted to explore these options in more detail. Especially how to modify it across modules (using builtins/__builtin__) and how to make that change only temporary (using contextmanagers).


回答 2

print函数捕获所有输出然后对其进行处理的一种简单方法是将输出流更改为其他内容,例如文件。

我将使用PHP命名约定(ob_startob_get_contents,…)

from functools import partial
output_buffer = None
print_orig = print
def ob_start(fname="print.txt"):
    global print
    global output_buffer
    print = partial(print_orig, file=output_buffer)
    output_buffer = open(fname, 'w')
def ob_end():
    global output_buffer
    close(output_buffer)
    print = print_orig
def ob_get_contents(fname="print.txt"):
    return open(fname, 'r').read()

用法:

print ("Hi John")
ob_start()
print ("Hi John")
ob_end()
print (ob_get_contents().replace("Hi", "Bye"))

将打印

嗨约翰再见约翰

A simple way to capture all output from a print function and then process it, is to change the output stream to something else, e.g. a file.

I’ll use a PHP naming conventions (ob_start, ob_get_contents,…)

from functools import partial
output_buffer = None
print_orig = print
def ob_start(fname="print.txt"):
    global print
    global output_buffer
    print = partial(print_orig, file=output_buffer)
    output_buffer = open(fname, 'w')
def ob_end():
    global output_buffer
    close(output_buffer)
    print = print_orig
def ob_get_contents(fname="print.txt"):
    return open(fname, 'r').read()

Usage:

print ("Hi John")
ob_start()
print ("Hi John")
ob_end()
print (ob_get_contents().replace("Hi", "Bye"))

Would print

Hi John Bye John


回答 3

让我们将其与帧自省结合起来!

import sys

_print = print

def print(*args, **kw):
    frame = sys._getframe(1)
    _print(frame.f_code.co_name)
    _print(*args, **kw)

def greetly(name, greeting = "Hi")
    print(f"{greeting}, {name}!")

class Greeter:
    def __init__(self, greeting = "Hi"):
        self.greeting = greeting
    def greet(self, name):
        print(f"{self.greeting}, {name}!")

您会发现此技巧在调用函数或方法的每个问候语前都有序。这对于日志记录或调试可能非常有用;特别是因为它可以让您“劫持”第三方代码中的打印语句。

Let’s combine this with frame introspection!

import sys

_print = print

def print(*args, **kw):
    frame = sys._getframe(1)
    _print(frame.f_code.co_name)
    _print(*args, **kw)

def greetly(name, greeting = "Hi")
    print(f"{greeting}, {name}!")

class Greeter:
    def __init__(self, greeting = "Hi"):
        self.greeting = greeting
    def greet(self, name):
        print(f"{self.greeting}, {name}!")

You’ll find this trick prefaces every greeting with the calling function or method. This might be very useful for logging or debugging; especially as it lets you “hijack” print statements in third party code.


为什么字典和集合中的顺序是任意的?

问题:为什么字典和集合中的顺序是任意的?

我不明白如何通过“任意”顺序完成字典或在python中设置的循环。

我的意思是,这是一种编程语言,因此该语言中的所有内容都必须100%确定,对吗?Python必须具有某种算法来决定选择字典或集合的哪一部分,第一,第二等等。

我想念什么?

I don’t understand how looping over a dictionary or set in python is done by ‘arbitrary’ order.

I mean, it’s a programming language so everything in the language must be 100% determined, correct? Python must have some kind of algorithm that decides which part of the dictionary or set is chosen, 1st, second and so on.

What am I missing?


回答 0

注意:此答案是dict在Python 3.6中更改类型的实现之前编写的。此答案中的大多数实现细节仍然适用,但是字典中键的列出顺序不再由哈希值确定。设置的实现保持不变。

顺序不是任意的,而是取决于字典或集合的插入和删除历史记录以及特定的Python实现。对于该答案的其余部分,对于“字典”,您还可以阅读“设置”;集被实现为仅具有键而没有值的字典。

对键进行散列,并将散列值分配给动态表中的插槽(它可以根据需要增加或缩小)。映射过程可能导致冲突,这意味着必须根据已存在的键将密钥插入下一个插槽。

列出内容循环遍历插槽,因此键以它们当前在表中的顺序列出。

以键'foo''bar'为例,假设表的大小为8个插槽。在Python 2.7中,hash('foo')is -4177197833195190597hash('bar')is 327024216814240868。模数8,这意味着这两个键分别插入插槽3和4中,然后:

>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4

这通知了他们的上市顺序:

>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}

除3和4之外的所有插槽均为空,在表上循环先列出插槽3,然后列出插槽4,因此'foo'在之前列出'bar'

barbaz,但是散列值恰好相距8,因此映射到完全相同的插槽4

>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4

现在,他们的顺序取决于首先插入哪个密钥。第二个密钥将必须移至下一个插槽:

>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}

此处的表顺序有所不同,因为一个或另一个键先插入插槽。

CPython使用的基础结构(最常用的Python实现)的技术名称是哈希表,该哈希表使用开放式寻址。如果您感到好奇,并且对C足够了解,请查看C实现的所有(详细记录)细节。您还可以观看Brandon RhodesPycon 2010上所作的有关CPython如何dict工作的演示,或获取Beautiful Code的副本,其中包括Andrew Kuchling编写的有关实现的章节。

请注意,从Python 3.3开始,还使用了随机哈希种子,使得哈希冲突无法预测,以防止某些类型的拒绝服务(攻击者通过引起大量哈希冲突而使Python服务器无响应)。这意味着给定字典或集合的顺序取决于当前Python调用的随机哈希种子。

其他实现可以自由地为字典使用不同的结构,只要它们满足已记录的Python接口即可,但是我相信到目前为止,所有实现都使用哈希表的变体。

CPython 3.6引入了一个新的 dict实现,该实现可以维持插入顺序,并且启动起来更快,内存效率更高。新的实现没有保留一个大的稀疏表,其中的每一行都引用存储的哈希值以及键和值对象,而是添加了一个较小的哈希数组,该数组仅引用单独的“密集”表中的索引(一个表仅包含尽可能多的行) (因为有实际的键/值对),而密集表恰好按顺序列出了包含的项。有关更多详细信息,请参见Python-Dev建议。请注意,在Python 3.6中,这被视为实现细节,Python语言不会指定其他实现必须保留顺序。这在Python 3.7中有所更改,在该版本中,此详细信息已提升为一种语言规范;为了使任何实现与Python 3.7或更高版本正确兼容,必须复制此保留顺序的行为。明确地说:此更改不适用于集合,因为集合已经具有“小”哈希结构。

Python 2.7及更高版本还提供了一个OrderedDict该类的子类dict添加了额外的数据结构来记录键顺序。以某种速度和额外的内存为代价,此类会记住您按什么顺序插入键。然后列出键,值或项目将按此顺序进行。它使用存储在其他词典中的双向链接列表来使订单保持最新状态。请参阅Raymond Hettinger帖子,概述该想法OrderedDict对象还有其他优点,例如可重新排序

如果您需要订购的套装,则可以安装oset软件包;它适用于Python 2.5及更高版本。

Note: This answer was written before the implementation of the dict type changed, in Python 3.6. Most of the implementation details in this answer still apply, but the listing order of keys in dictionaries is no longer determined by hash values. The set implementation remains unchanged.

The order is not arbitrary, but depends on the insertion and deletion history of the dictionary or set, as well as on the specific Python implementation. For the remainder of this answer, for ‘dictionary’, you can also read ‘set’; sets are implemented as dictionaries with just keys and no values.

Keys are hashed, and hash values are assigned to slots in a dynamic table (it can grow or shrink based on needs). And that mapping process can lead to collisions, meaning that a key will have to be slotted in a next slot based on what is already there.

Listing the contents loops over the slots, and so keys are listed in the order they currently reside in the table.

Take the keys 'foo' and 'bar', for example, and lets assume the table size is 8 slots. In Python 2.7, hash('foo') is -4177197833195190597, hash('bar') is 327024216814240868. Modulo 8, that means these two keys are slotted in slots 3 and 4 then:

>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4

This informs their listing order:

>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}

All slots except 3 and 4 are empty, looping over the table first lists slot 3, then slot 4, so 'foo' is listed before 'bar'.

bar and baz, however, have hash values that are exactly 8 apart and thus map to the exact same slot, 4:

>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4

Their order now depends on which key was slotted first; the second key will have to be moved to a next slot:

>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}

The table order differs here, because one or the other key was slotted first.

The technical name for the underlying structure used by CPython (the most commonly used Python implemenation) is a hash table, one that uses open addressing. If you are curious, and understand C well enough, take a look at the C implementation for all the (well documented) details. You could also watch this Pycon 2010 presentation by Brandon Rhodes about how CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.

Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary or set is then also dependent on the random hash seed for the current Python invocation.

Other implementations are free to use a different structure for dictionaries, as long as they satisfy the documented Python interface for them, but I believe that all implementations so far use a variation of the hash table.

CPython 3.6 introduces a new dict implementation that maintains insertion order, and is faster and more memory efficient to boot. Rather than keep a large sparse table where each row references the stored hash value, and the key and value objects, the new implementation adds a smaller hash array that only references indices in a separate ‘dense’ table (one that only contains as many rows as there are actual key-value pairs), and it is the dense table that happens to list the contained items in order. See the proposal to Python-Dev for more details. Note that in Python 3.6 this is considered an implementation detail, Python-the-language does not specify that other implementations have to retain order. This changed in Python 3.7, where this detail was elevated to be a language specification; for any implementation to be properly compatible with Python 3.7 or newer it must copy this order-preserving behaviour. And to be explicit: this change doesn’t apply to sets, as sets already have a ‘small’ hash structure.

Python 2.7 and newer also provides an OrderedDict class, a subclass of dict that adds an additional data structure to record key order. At the price of some speed and extra memory, this class remembers in what order you inserted keys; listing keys, values or items will then do so in that order. It uses a doubly-linked list stored in an additional dictionary to keep the order up-to-date efficiently. See the post by Raymond Hettinger outlining the idea. OrderedDict objects have other advantages, such as being re-orderable.

If you wanted an ordered set, you can install the oset package; it works on Python 2.5 and up.


回答 1

这更多是对Python 3.41集的响应,该集在被关闭之前被重复了。


其他人是对的:不要依赖命令。甚至不要假装有一个。

也就是说,您可以依靠件事:

list(myset) == list(myset)

也就是说,顺序是稳定的


要了解为什么会有感知的顺序,就需要了解以下几点:

  • Python使用哈希集

  • CPython的哈希集如何存储在内存中以及

  • 数字如何散列

从顶部:

一个哈希集合是存储随机数据与真快,查找时间的方法。

它具有一个支持数组:

# A C array; items may be NULL,
# a pointer to an object, or a
# special dummy object
_ _ 4 _ _ 2 _ _ 6

我们将忽略特殊的伪对象,该伪对象的存在只是为了使移除更易于处理,因为我们不会从这些集合中移除。

为了真正快速地进行查找,您需要做一些魔术来计算对象的哈希值。唯一的规则是两个相等的对象具有相同的哈希值。(但是,如果两个对象具有相同的哈希,则它们可能不相等。)

然后,通过将模数乘以数组长度来建立索引:

hash(4) % len(storage) = index 2

这使得访问元素确实非常快。

散列只是故事的大部分,因为hash(n) % len(storage)并且hash(m) % len(storage)可以产生相同的数目。在这种情况下,几种不同的策略可以尝试解决冲突。CPython在做复杂的事情之前先使用了9次“线性探测”,因此在寻找其他位置之前,它会在插槽的左侧查找多达9个位置。

CPython的哈希集存储如下:

  • 哈希集不能超过2/3 full。如果有20个元素,并且后备数组长30个元素,则后备存储将调整为更大的大小。这是因为您与小型后备店的碰撞更为频繁,而碰撞会使一切变慢。

  • 除大型存储集(50k元素)以2的幂(8、32、128,…)调整大小外,后备存储以8的幂从4开始调整大小。

因此,当您创建阵列时,后备存储区的长度为8。当存储区的容量为5并添加一个元素时,它将短暂包含6个元素。6 > ²⁄₃·8因此这会触发调整大小,后备存储将大小增加三倍,达到32。

最后,hash(n)仅返回n数字(-1特殊情况除外)。


因此,让我们看第一个:

v_set = {88,11,1,33,21,3,7,55,37,8}

len(v_set)是10,因此在添加所有项目后,后备存储至少为15(+1)。2的相关乘方为32。因此,后备存储为:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

我们有

hash(88) % 32 = 24
hash(11) % 32 = 11
hash(1)  % 32 = 1
hash(33) % 32 = 1
hash(21) % 32 = 21
hash(3)  % 32 = 3
hash(7)  % 32 = 7
hash(55) % 32 = 23
hash(37) % 32 = 5
hash(8)  % 32 = 8

所以这些插入为:

__  1 __  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
   33 ← Can't also be where 1 is;
        either 1 or 33 has to move

因此,我们希望订单像

{[1 or 33], 3, 37, 7, 8, 11, 21, 55, 88}

与1或33不在其他地方的开始。这将使用线性探测,因此我们将具有:


__  1 33  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

要么


__ 33  1  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

您可能希望33是被替换的,因为1已经存在,但是由于在构建集合时会发生调整大小,实际上并非如此。每次重建集合时,已经添加的项目都会有效地重新排序。

现在你明白了为什么

{7,5,11,1,4,13,55,12,2,3,6,20,9,10}

可能是有秩序的。有14个元素,因此后备存储区至少为21 + 1,这意味着32:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

前13个插槽中的1到13个哈希值。20进入插槽20。

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ __ __ __ __ __ __ __ __ __

55进入插槽hash(55) % 3223

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ 55 __ __ __ __ __ __ __ __

如果我们选择50,我们期望

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ 50 __ 20 __ __ __ __ __ __ __ __ __ __ __

瞧瞧:

{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 50}
#>>> {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 50, 20}

pop 通过事物的外观非常简单地实现:遍历列表并弹出第一个列表。


这是所有实现细节。

This is more a response to Python 3.41 A set before it was closed as a duplicate.


The others are right: don’t rely on the order. Don’t even pretend there is one.

That said, there is one thing you can rely on:

list(myset) == list(myset)

That is, the order is stable.


Understanding why there is a perceived order requires understanding a few things:

  • That Python uses hash sets,

  • How CPython’s hash set is stored in memory and

  • How numbers get hashed

From the top:

A hash set is a method of storing random data with really fast lookup times.

It has a backing array:

# A C array; items may be NULL,
# a pointer to an object, or a
# special dummy object
_ _ 4 _ _ 2 _ _ 6

We shall ignore the special dummy object, which exists only to make removes easier to deal with, because we won’t be removing from these sets.

In order to have really fast lookup, you do some magic to calculate a hash from an object. The only rule is that two objects which are equal have the same hash. (But if two objects have the same hash they can be unequal.)

You then make in index by taking the modulus by the array length:

hash(4) % len(storage) = index 2

This makes it really fast to access elements.

Hashes are only most of the story, as hash(n) % len(storage) and hash(m) % len(storage) can result in the same number. In that case, several different strategies can try and resolve the conflict. CPython uses “linear probing” 9 times before doing complicated things, so it will look to the left of the slot for up to 9 places before looking elsewhere.

CPython’s hash sets are stored like this:

  • A hash set can be no more than 2/3 full. If there are 20 elements and the backing array is 30 elements long, the backing store will resize to be larger. This is because you get collisions more often with small backing stores, and collisions slow everything down.

  • The backing store resizes in powers of 4, starting at 8, except for large sets (50k elements) which resize in powers of two: (8, 32, 128, …).

So when you create an array the backing store is length 8. When it is 5 full and you add an element, it will briefly contain 6 elements. 6 > ²⁄₃·8 so this triggers a resize, and the backing store quadruples to size 32.

Finally, hash(n) just returns n for numbers (except -1 which is special).


So, let’s look at the first one:

v_set = {88,11,1,33,21,3,7,55,37,8}

len(v_set) is 10, so the backing store is at least 15(+1) after all items have been added. The relevant power of 2 is 32. So the backing store is:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

We have

hash(88) % 32 = 24
hash(11) % 32 = 11
hash(1)  % 32 = 1
hash(33) % 32 = 1
hash(21) % 32 = 21
hash(3)  % 32 = 3
hash(7)  % 32 = 7
hash(55) % 32 = 23
hash(37) % 32 = 5
hash(8)  % 32 = 8

so these insert as:

__  1 __  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
   33 ← Can't also be where 1 is;
        either 1 or 33 has to move

So we would expect an order like

{[1 or 33], 3, 37, 7, 8, 11, 21, 55, 88}

with the 1 or 33 that isn’t at the start somewhere else. This will use linear probing, so we will either have:

       ↓
__  1 33  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

or

       ↓
__ 33  1  3 __ 37 __  7  8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __

You might expect the 33 to be the one that’s displaced because the 1 was already there, but due to the resizing that happens as the set is being built, this isn’t actually the case. Every time the set gets rebuilt, the items already added are effectively reordered.

Now you can see why

{7,5,11,1,4,13,55,12,2,3,6,20,9,10}

might be in order. There are 14 elements, so the backing store is at least 21+1, which means 32:

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __

1 to 13 hash in the first 13 slots. 20 goes in slot 20.

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ __ __ __ __ __ __ __ __ __

55 goes in slot hash(55) % 32 which is 23:

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ __ __ 20 __ __ 55 __ __ __ __ __ __ __ __

If we chose 50 instead, we’d expect

__  1  2  3  4  5  6  7  8  9 10 11 12 13 __ __ __ __ 50 __ 20 __ __ __ __ __ __ __ __ __ __ __

And lo and behold:

{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 50}
#>>> {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 50, 20}

pop is implemented quite simply by the looks of things: it traverses the list and pops the first one.


This is all implementation detail.


回答 2

“任意”与“不确定”不同。

他们的意思是,没有“在公共界面中”的字典迭代顺序有用的属性。几乎可以肯定,迭代顺序的许多属性完全由当前实现字典迭代的代码确定,但是作者并没有向您保证可以使用它们。这给了他们更大的自由,可以在Python版本之间(甚至在不同的操作条件下,或者在运行时完全随机地)更改这些属性,而不必担心程序会中断。

因此,如果您编写的程序在所有字典顺序上都依赖于任何属性,那么您正在“违反使用字典类型的约定”,并且Python开发人员并不保证这将始终有效,即使它看起来可以正常工作现在,当您对其进行测试时。从根本上讲,这等效于依赖C中的“未定义行为”。

“Arbitrary” isn’t the same thing as “non-determined”.

What they’re saying is that there are no useful properties of dictionary iteration order that are “in the public interface”. There almost certainly are many properties of the iteration order that are fully determined by the code that currently implements dictionary iteration, but the authors aren’t promising them to you as something you can use. This gives them more freedom to change these properties between Python versions (or even just in different operating conditions, or completely at random at runtime) without worrying that your program will break.

Thus if you write a program that depends on any property at all of dictionary order, then you are “breaking the contract” of using the dictionary type, and the Python developers are not promising that this will always work, even if it appears to work for now when you test it. It’s basically the equivalent of relying on “undefined behaviour” in C.


回答 3

这个问题的其他答案都很好并且写得很好。OP询问“如何”,我将其解释为“他们如何摆脱”或“为什么”。

Python文档说字典没有排序,因为Python字典实现了抽象数据类型 关联数组。正如他们所说

返回绑定的顺序可以是任意的

换句话说,计算机科学专业的学生不能假设关联数组是有序的。数学中的集合也是如此

集合中元素的列出顺序无关紧要

计算机科学

集合是一种抽象数据类型,可以存储某些值,而没有任何特定顺序

使用哈希表实现字典是一个实现细节,它很有趣,因为就顺序而言,它具有与关联数组相同的属性。

The other answers to this question are excellent and well written. The OP asks “how” which I interpret as “how do they get away with” or “why”.

The Python documentation says dictionaries are not ordered because the Python dictionary implements the abstract data type associative array. As they say

the order in which the bindings are returned may be arbitrary

In other words, a computer science student cannot assume that an associative array is ordered. The same is true for sets in math

the order in which the elements of a set are listed is irrelevant

and computer science

a set is an abstract data type that can store certain values, without any particular order

Implementing a dictionary using a hash table is an implementation detail that is interesting in that it has the same properties as associative arrays as far as order is concerned.


回答 4

Python使用哈希表来存储字典,因此使用哈希表的字典或其他可迭代对象中没有顺序。

但是关于哈希对象中项目的索引,python根据以下代码在其中hashtable.c计算索引:

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

因此,因为整数的哈希值是整数本身*索引基于数字(ht->num_buckets - 1是一个常数),所以按位计算-和之间的索引(ht->num_buckets - 1)与数字本身*(预期-1的哈希值是-2 ),以及具有其哈希值的其他对象。

考虑以下set使用hash-table的示例:

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

对于数量,33我们有:

33 & (ht->num_buckets - 1) = 1

实际上是:

'0b100001' & '0b111'= '0b1' # 1 the index of 33

注意在这种情况下(ht->num_buckets - 1)8-1=70b111

对于1919

'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919

对于333

'0b101001101' & '0b111' = '0b101' # 5 the index of 333

有关python哈希函数的更多详细信息,请阅读python源代码中的以下引号:

未来的主要细节:在模拟随机性的意义上,大多数哈希方案都依赖于具有“良好”的哈希函数。Python并非如此:在最常见的情况下,它最重要的哈希函数(用于字符串和整数)非常规则:

>>> map(hash, (0, 1, 2, 3))
  [0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
  [-1658398457, -1658398460, -1658398459, -1658398462]

这不一定坏!相反,在大小为2 ** i的表中,以低序i位作为初始表索引非常快,并且对于由连续整数范围索引的字典,根本没有冲突。当键是“连续”字符串时,情况大致相同。因此,这在通常情况下会提供比随机行为更好的行为,这是非常理想的。

OTOH,当发生冲突时,填充哈希表的连续切片的趋势使得良好的冲突解决策略至关重要。仅采用哈希码的最后i位也是容易受到攻击的:例如,将列表[i << 16 for i in range(20000)]视为一组键。 由于int是它们自己的哈希码,并且适合大小为2 ** 15的字典,因此每个哈希码的最后15位均为0:它们映射到相同的表索引。

但是迎合不寻常的情况不应减慢通常的情况,因此我们无论如何都只接受最后的i个信息。剩下的事要靠冲突解决来解决。如果我们通常在第一次尝试时就找到了要寻找的密钥(事实证明,我们通常会这样做-表负载因数保持在2/3以下,那么我们的优势很明显),那么就可以了保持初始索引计算的便宜是最好的选择。


*类的哈希函数int

class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value

Python use hash table for storing the dictionaries, so there is no order in dictionaries or other iterable objects that use hash table.

But regarding the indices of items in a hash object, python calculate the indices based on following code within hashtable.c:

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

Therefor, as the hash value of integers is the integer itself* the index is based on the number (ht->num_buckets - 1 is a constant) so the index calculated by Bitwise-and between (ht->num_buckets - 1) and the number itself* (expect for -1 which it’s hash is -2) , and for other objects with their hash value.

consider the following example with set that use hash-table :

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For number 33 we have :

33 & (ht->num_buckets - 1) = 1

That actually it’s :

'0b100001' & '0b111'= '0b1' # 1 the index of 33

Note in this case (ht->num_buckets - 1) is 8-1=7 or 0b111.

And for 1919 :

'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919

And for 333 :

'0b101001101' & '0b111' = '0b101' # 5 the index of 333

For more details about python hash function its good to read the following quotes from python source code :

Major subtleties ahead: Most hash schemes depend on having a “good” hash function, in the sense of simulating randomness. Python doesn’t: its most important hash functions (for strings and ints) are very regular in common cases:

>>> map(hash, (0, 1, 2, 3))
  [0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
  [-1658398457, -1658398460, -1658398459, -1658398462]

This isn’t necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are “consecutive” strings. So this gives better-than-random behavior in common cases, and that’s very desirable.

OTOH, when collisions occur, the tendency to fill contiguous slices of the hash table makes a good collision resolution strategy crucial. Taking only the last i bits of the hash code is also vulnerable: for example, consider the list [i << 16 for i in range(20000)] as a set of keys. Since ints are their own hash codes, and this fits in a dict of size 2**15, the last 15 bits of every hash code are all 0: they all map to the same table index.

But catering to unusual cases should not slow the usual ones, so we just take the last i bits anyway. It’s up to collision resolution to do the rest. If we usually find the key we’re looking for on the first try (and, it turns out, we usually do — the table load factor is kept under 2/3, so the odds are solidly in our favor), then it makes best sense to keep the initial index computation dirt cheap.


* The hash function for class int :

class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value


回答 5

从Python 3.7(在CPython 3.6中已经开始)开始,字典项将保持其插入顺序


查找内置Python函数的源代码?

问题:查找内置Python函数的源代码?

有没有办法查看内置函数如何在python中工作?我不仅意味着如何使用它们,而且还意味着它们是如何构建的,排序枚举后的代码是什么?

Is there a way to see how built in functions work in python? I don’t mean just how to use them, but also how were they built, what is the code behind sorted or enumerate etc…?


回答 0

由于Python是开源的,因此您可以阅读源代码

要找出实现了特定模块或功能的文件,通常可以打印__file__属性。或者,您可以使用该inspect模块,请参阅的文档中的“ 检索源代码 ”部分inspect

对于内置的类和方法,这是不是这样,因为直白inspect.getfile,并inspect.getsource会返回一个类型错误,指出对象是内置。但是,可以Objects在Python源trunk子目录中找到许多内置类型。例如,看到这里的枚举类的实现或这里的执行list类型。

Since Python is open source you can read the source code.

To find out what file a particular module or function is implemented in you can usually print the __file__ attribute. Alternatively, you may use the inspect module, see the section Retrieving Source Code in the documentation of inspect.

For built-in classes and methods this is not so straightforward since inspect.getfile and inspect.getsource will return a type error stating that the object is built-in. However, many of the built-in types can be found in the Objects sub-directory of the Python source trunk. For example, see here for the implementation of the enumerate class or here for the implementation of the list type.


回答 1

这是一个补充@Chris答案的食谱答案,CPython已移至GitHub,并且Mercurial存储库将不再更新:

  1. 如有必要,安装Git。
  2. git clone https://github.com/python/cpython.git

  3. 代码将签出到名为cpython-> 的子目录cd cpython

  4. 假设我们正在寻找print()… 的定义
  5. egrep --color=always -R 'print' | less -R
  6. 啊哈!参见Python/bltinmodule.c->builtin_print()

请享用。

Here is a cookbook answer to supplement @Chris’ answer, CPython has moved to GitHub and the Mercurial repository will no longer be updated:

  1. Install Git if necessary.
  2. git clone https://github.com/python/cpython.git

  3. Code will checkout to a subdirectory called cpython -> cd cpython

  4. Let’s say we are looking for the definition of print()
  5. egrep --color=always -R 'print' | less -R
  6. Aha! See Python/bltinmodule.c -> builtin_print()

Enjoy.


回答 2

在此处输入图片说明

我不得不花点时间找到以下内容的来源,Built-in Functions因为搜索将产生数千个结果。(祝您好运,找到其中的任何来源)

总之,所有这些功能都定义bltinmodule.c的函数开始builtin_{functionname}

内置源:https : //github.com/python/cpython/blob/master/Python/bltinmodule.c

对于内置类型:https//github.com/python/cpython/tree/master/Objects

enter image description here

I had to dig a little to find the source of the following Built-in Functions as the search would yield thousands of results. (Good luck searching for any of those to find where it’s source is)

Anyway, all those functions are defined in bltinmodule.c Functions start with builtin_{functionname}

Built-in Source: https://github.com/python/cpython/blob/master/Python/bltinmodule.c

For Built-in Types: https://github.com/python/cpython/tree/master/Objects


回答 3

IPython的外壳让一切变得简单:function?给你的文档。function??同时显示代码。但是这仅适用于纯python函数。

然后,您随时可以下载(c)Python的源代码。

如果您对核心功能的pythonic实现感兴趣,请查看PyPy源代码。

The iPython shell makes this easy: function? will give you the documentation. function?? shows also the code. BUT this only works for pure python functions.

Then you can always download the source code for the (c)Python.

If you’re interested in pythonic implementations of core functionality have a look at PyPy source.


回答 4

2种方法

  1. 您可以使用查看有关代码段的用法 help()
  2. 您可以使用以下命令检查这些模块的隐藏代码 inspect

1)检查:

使用inpsect模块来浏览所需的代码… 注意:您只能浏览已导入的模块(aka)软件包的代码

例如:

  >>> import randint  
  >>> from inspect import getsource
  >>> getsource(randint) # here i am going to explore code for package called `randint`

2)help():

您只需使用help()命令即可获得有关内置函数及其代码的帮助。

例如:如果您想查看str()的代码,只需键入- help(str)

它会像这样返回

>>> help(str)
Help on class str in module __builtin__:

class str(basestring)
 |  str(object='') -> string
 |
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |
 |  Method resolution order:
 |      str
 |      basestring
 |      object
 |
 |  Methods defined here:
 |
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |
 |  __format__(...)
 |      S.__format__(format_spec) -> string
 |
 |      Return a formatted version of S as described by format_spec.
 |
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |
 |  __getattribute__(...)
-- More  --

2 methods,

  1. You can check usage about snippet using help()
  2. you can check hidden code for those modules using inspect

1) inspect:

use inpsect module to explore code you want… NOTE: you can able to explore code only for modules (aka) packages you have imported

for eg:

  >>> import randint  
  >>> from inspect import getsource
  >>> getsource(randint) # here i am going to explore code for package called `randint`

2) help():

you can simply use help() command to get help about builtin functions as well its code.

for eg: if you want to see the code for str() , simply type – help(str)

it will return like this,

>>> help(str)
Help on class str in module __builtin__:

class str(basestring)
 |  str(object='') -> string
 |
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |
 |  Method resolution order:
 |      str
 |      basestring
 |      object
 |
 |  Methods defined here:
 |
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |
 |  __format__(...)
 |      S.__format__(format_spec) -> string
 |
 |      Return a formatted version of S as described by format_spec.
 |
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |
 |  __getattribute__(...)
-- More  --

回答 5

《 Python 开发人员指南》是很多未知资源。

在最近的GH 问题(有点)中,增加了一个新章节来解决您所问的问题:CPython源代码布局。如果应该更改某些内容,该资源也将得到更新。

Quite an unknown resource is the Python Developer Guide.

In a (somewhat) recent GH issue, a new chapter was added for to address the question you’re asking: CPython Source Code Layout. If something should change, that resource will also get updated.


回答 6

如@Jim所述,此处描述了文件组织。为便于发现而复制:

对于Python模块,典型的布局为:

Lib/<module>.py
Modules/_<module>.c (if theres also a C accelerator module)
Lib/test/test_<module>.py
Doc/library/<module>.rst

对于仅扩展模块,典型布局为:

Modules/<module>module.c
Lib/test/test_<module>.py
Doc/library/<module>.rst

对于内置类型,典型的布局为:

Objects/<builtin>object.c
Lib/test/test_<builtin>.py
Doc/library/stdtypes.rst

对于内置函数,典型布局为:

Python/bltinmodule.c
Lib/test/test_builtin.py
Doc/library/functions.rst

一些exceptions:

builtin type int is at Objects/longobject.c
builtin type str is at Objects/unicodeobject.c
builtin module sys is at Python/sysmodule.c
builtin module marshal is at Python/marshal.c
Windows-only module winreg is at PC/winreg.c

As mentioned by @Jim, the file organization is described here. Reproduced for ease of discovery:

For Python modules, the typical layout is:

Lib/<module>.py
Modules/_<module>.c (if there’s also a C accelerator module)
Lib/test/test_<module>.py
Doc/library/<module>.rst

For extension-only modules, the typical layout is:

Modules/<module>module.c
Lib/test/test_<module>.py
Doc/library/<module>.rst

For builtin types, the typical layout is:

Objects/<builtin>object.c
Lib/test/test_<builtin>.py
Doc/library/stdtypes.rst

For builtin functions, the typical layout is:

Python/bltinmodule.c
Lib/test/test_builtin.py
Doc/library/functions.rst

Some exceptions:

builtin type int is at Objects/longobject.c
builtin type str is at Objects/unicodeobject.c
builtin module sys is at Python/sysmodule.c
builtin module marshal is at Python/marshal.c
Windows-only module winreg is at PC/winreg.c

元组比Python中的列表更有效吗?

问题:元组比Python中的列表更有效吗?

在元素的实例化和检索方面,元组和列表之间是否存在性能差异?

Is there any performance difference between tuples and lists when it comes to instantiation and retrieval of elements?


回答 0

dis模块反汇编函数的字节码,对于查看元组和列表之间的区别很有用。

在这种情况下,您可以看到访问元素会生成相同的代码,但是分配元组要比分配列表快得多。

>>> def a():
...     x=[1,2,3,4,5]
...     y=x[2]
...
>>> def b():
...     x=(1,2,3,4,5)
...     y=x[2]
...
>>> import dis
>>> dis.dis(a)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 LOAD_CONST               4 (4)
             12 LOAD_CONST               5 (5)
             15 BUILD_LIST               5
             18 STORE_FAST               0 (x)

  3          21 LOAD_FAST                0 (x)
             24 LOAD_CONST               2 (2)
             27 BINARY_SUBSCR
             28 STORE_FAST               1 (y)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE
>>> dis.dis(b)
  2           0 LOAD_CONST               6 ((1, 2, 3, 4, 5))
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (2)
             12 BINARY_SUBSCR
             13 STORE_FAST               1 (y)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

The dis module disassembles the byte code for a function and is useful to see the difference between tuples and lists.

In this case, you can see that accessing an element generates identical code, but that assigning a tuple is much faster than assigning a list.

>>> def a():
...     x=[1,2,3,4,5]
...     y=x[2]
...
>>> def b():
...     x=(1,2,3,4,5)
...     y=x[2]
...
>>> import dis
>>> dis.dis(a)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 LOAD_CONST               4 (4)
             12 LOAD_CONST               5 (5)
             15 BUILD_LIST               5
             18 STORE_FAST               0 (x)

  3          21 LOAD_FAST                0 (x)
             24 LOAD_CONST               2 (2)
             27 BINARY_SUBSCR
             28 STORE_FAST               1 (y)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE
>>> dis.dis(b)
  2           0 LOAD_CONST               6 ((1, 2, 3, 4, 5))
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (2)
             12 BINARY_SUBSCR
             13 STORE_FAST               1 (y)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

回答 1

通常,您可能期望元组会稍微快一些。但是,您绝对应该测试您的特定情况(如果差异可能会影响程序的性能,请记住“过早的优化是万恶之源”)。

Python非常简单:timeit是您的朋友。

$ python -m timeit "x=(1,2,3,4,5,6,7,8)"
10000000 loops, best of 3: 0.0388 usec per loop

$ python -m timeit "x=[1,2,3,4,5,6,7,8]"
1000000 loops, best of 3: 0.363 usec per loop

和…

$ python -m timeit -s "x=(1,2,3,4,5,6,7,8)" "y=x[3]"
10000000 loops, best of 3: 0.0938 usec per loop

$ python -m timeit -s "x=[1,2,3,4,5,6,7,8]" "y=x[3]"
10000000 loops, best of 3: 0.0649 usec per loop

因此,在这种情况下,元组的实例化速度几乎快一个数量级,但列表的项目访问实际上要快一些!因此,如果您要创建一些元组并多次访问它们,那么实际上使用列表可能会更快。

当然,如果您要更改一项,则列表肯定会更快,因为您需要创建一个完整的新元组来更改一项(因为元组是不可变的)。

In general, you might expect tuples to be slightly faster. However you should definitely test your specific case (if the difference might impact the performance of your program — remember “premature optimization is the root of all evil”).

Python makes this very easy: timeit is your friend.

$ python -m timeit "x=(1,2,3,4,5,6,7,8)"
10000000 loops, best of 3: 0.0388 usec per loop

$ python -m timeit "x=[1,2,3,4,5,6,7,8]"
1000000 loops, best of 3: 0.363 usec per loop

and…

$ python -m timeit -s "x=(1,2,3,4,5,6,7,8)" "y=x[3]"
10000000 loops, best of 3: 0.0938 usec per loop

$ python -m timeit -s "x=[1,2,3,4,5,6,7,8]" "y=x[3]"
10000000 loops, best of 3: 0.0649 usec per loop

So in this case, instantiation is almost an order of magnitude faster for the tuple, but item access is actually somewhat faster for the list! So if you’re creating a few tuples and accessing them many many times, it may actually be faster to use lists instead.

Of course if you want to change an item, the list will definitely be faster since you’d need to create an entire new tuple to change one item of it (since tuples are immutable).


回答 2

摘要

元组的性能往往比几乎每个类别中的列表都要好:

1)元组可以恒定折叠

2)元组可以重复使用而不是复制。

3)元组是紧凑的,并且不会过度分配。

4)元组直接引用其元素。

元组可以恒定折叠

常量元组可以通过Python的窥孔优化器或AST优化器预先计算。另一方面,列表是从头开始构建的:

    >>> from dis import dis

    >>> dis(compile("(10, 'abc')", '', 'eval'))
      1           0 LOAD_CONST               2 ((10, 'abc'))
                  3 RETURN_VALUE   

    >>> dis(compile("[10, 'abc']", '', 'eval'))
      1           0 LOAD_CONST               0 (10)
                  3 LOAD_CONST               1 ('abc')
                  6 BUILD_LIST               2
                  9 RETURN_VALUE 

元组不需要复制

运行tuple(some_tuple)立即返回本身。由于元组是不可变的,因此不必复制它们:

>>> a = (10, 20, 30)
>>> b = tuple(a)
>>> a is b
True

相反,list(some_list)要求将所有数据复制到新列表中:

>>> a = [10, 20, 30]
>>> b = list(a)
>>> a is b
False

元组不会过度分配

由于元组的大小是固定的,因此它可以比需要过度分配以使append()操作高效的列表更紧凑地存储。

这给元组提供了很好的空间优势:

>>> import sys
>>> sys.getsizeof(tuple(iter(range(10))))
128
>>> sys.getsizeof(list(iter(range(10))))
200

这是来自Objects / listobject.c的注释,解释了列表在做什么:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 * Note: new_allocated won't overflow because the largest possible value
 *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
 */

元组直接引用其元素

对对象的引用直接合并到元组对象中。相反,列表具有指向外部指针数组的额外间接层。

这使元组在索引查找和拆包方面具有较小的速度优势:

$ python3.6 -m timeit -s 'a = (10, 20, 30)' 'a[1]'
10000000 loops, best of 3: 0.0304 usec per loop
$ python3.6 -m timeit -s 'a = [10, 20, 30]' 'a[1]'
10000000 loops, best of 3: 0.0309 usec per loop

$ python3.6 -m timeit -s 'a = (10, 20, 30)' 'x, y, z = a'
10000000 loops, best of 3: 0.0249 usec per loop
$ python3.6 -m timeit -s 'a = [10, 20, 30]' 'x, y, z = a'
10000000 loops, best of 3: 0.0251 usec per loop

是元组(10, 20)的存储方式:

    typedef struct {
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
        Py_ssize_t ob_size;
        PyObject *ob_item[2];     /* store a pointer to 10 and a pointer to 20 */
    } PyTupleObject;

这里是列表如何[10, 20]存储:

    PyObject arr[2];              /* store a pointer to 10 and a pointer to 20 */

    typedef struct {
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
        Py_ssize_t ob_size;
        PyObject **ob_item = arr; /* store a pointer to the two-pointer array */
        Py_ssize_t allocated;
    } PyListObject;

注意,元组对象直接合并了两个数据指针,而列表对象又具有一个间接层,该层指向包含两个数据指针的外部数组。

Summary

Tuples tend to perform better than lists in almost every category:

1) Tuples can be constant folded.

2) Tuples can be reused instead of copied.

3) Tuples are compact and don’t over-allocate.

4) Tuples directly reference their elements.

Tuples can be constant folded

Tuples of constants can be precomputed by Python’s peephole optimizer or AST-optimizer. Lists, on the other hand, get built-up from scratch:

    >>> from dis import dis

    >>> dis(compile("(10, 'abc')", '', 'eval'))
      1           0 LOAD_CONST               2 ((10, 'abc'))
                  3 RETURN_VALUE   

    >>> dis(compile("[10, 'abc']", '', 'eval'))
      1           0 LOAD_CONST               0 (10)
                  3 LOAD_CONST               1 ('abc')
                  6 BUILD_LIST               2
                  9 RETURN_VALUE 

Tuples do not need to be copied

Running tuple(some_tuple) returns immediately itself. Since tuples are immutable, they do not have to be copied:

>>> a = (10, 20, 30)
>>> b = tuple(a)
>>> a is b
True

In contrast, list(some_list) requires all the data to be copied to a new list:

>>> a = [10, 20, 30]
>>> b = list(a)
>>> a is b
False

Tuples do not over-allocate

Since a tuple’s size is fixed, it can be stored more compactly than lists which need to over-allocate to make append() operations efficient.

This gives tuples a nice space advantage:

>>> import sys
>>> sys.getsizeof(tuple(iter(range(10))))
128
>>> sys.getsizeof(list(iter(range(10))))
200

Here is the comment from Objects/listobject.c that explains what lists are doing:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 * Note: new_allocated won't overflow because the largest possible value
 *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
 */

Tuples refer directly to their elements

References to objects are incorporated directly in a tuple object. In contrast, lists have an extra layer of indirection to an external array of pointers.

This gives tuples a small speed advantage for indexed lookups and unpacking:

$ python3.6 -m timeit -s 'a = (10, 20, 30)' 'a[1]'
10000000 loops, best of 3: 0.0304 usec per loop
$ python3.6 -m timeit -s 'a = [10, 20, 30]' 'a[1]'
10000000 loops, best of 3: 0.0309 usec per loop

$ python3.6 -m timeit -s 'a = (10, 20, 30)' 'x, y, z = a'
10000000 loops, best of 3: 0.0249 usec per loop
$ python3.6 -m timeit -s 'a = [10, 20, 30]' 'x, y, z = a'
10000000 loops, best of 3: 0.0251 usec per loop

Here is how the tuple (10, 20) is stored:

    typedef struct {
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
        Py_ssize_t ob_size;
        PyObject *ob_item[2];     /* store a pointer to 10 and a pointer to 20 */
    } PyTupleObject;

Here is how the list [10, 20] is stored:

    PyObject arr[2];              /* store a pointer to 10 and a pointer to 20 */

    typedef struct {
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
        Py_ssize_t ob_size;
        PyObject **ob_item = arr; /* store a pointer to the two-pointer array */
        Py_ssize_t allocated;
    } PyListObject;

Note that the tuple object incorporates the two data pointers directly while the list object has an additional layer of indirection to an external array holding the two data pointers.


回答 3

元组是不变的,它们的存储效率更高;为了提高效率,列表列出了整体内存,以允许没有常量reallocs的追加。因此,如果您想遍历代码中恒定的值序列(例如for direction in 'up', 'right', 'down', 'left':),则首选元组,因为此类元组是在编译时预先计算的。

访问速度应该相同(它们都作为连续数组存储在内存中)。

但是,当您处理可变数据时,它alist.append(item)是首选atuple+= (item,)。请记住,元组应被视为没有字段名称的记录。

Tuples, being immutable, are more memory efficient; lists, for speed efficiency, overallocate memory in order to allow appends without constant reallocs. So, if you want to iterate through a constant sequence of values in your code (eg for direction in 'up', 'right', 'down', 'left':), tuples are preferred, since such tuples are pre-calculated in compile time.

Read-access speeds should be the same (they are both stored as contiguous arrays in the memory).

But, alist.append(item) is much preferred to atuple+= (item,) when you deal with mutable data. Remember, tuples are intended to be treated as records without field names.


回答 4

array如果列表或元组中的所有项目都属于同一C类型,则还应该考虑标准库中的模块。它将占用更少的内存,并且速度更快。

You should also consider the array module in the standard library if all the items in your list or tuple are of the same C type. It will take less memory and can be faster.


回答 5

仅出于此目的,这是另一个小基准。

In [11]: %timeit list(range(100))
749 ns ± 2.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit tuple(range(100))
781 ns ± 3.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [1]: %timeit list(range(1_000))
13.5 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [2]: %timeit tuple(range(1_000))
12.4 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit list(range(10_000))
182 µs ± 810 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit tuple(range(10_000))
188 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit list(range(1_00_000))
2.76 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit tuple(range(1_00_000))
2.74 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit list(range(10_00_000))
28.1 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit tuple(range(10_00_000))
28.5 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

让我们对这些进行平均:

In [3]: l = np.array([749 * 10 ** -9, 13.5 * 10 ** -6, 182 * 10 ** -6, 2.76 * 10 ** -3, 28.1 * 10 ** -3])

In [2]: t = np.array([781 * 10 ** -9, 12.4 * 10 ** -6, 188 * 10 ** -6, 2.74 * 10 ** -3, 28.5 * 10 ** -3])

In [11]: np.average(l)
Out[11]: 0.0062112498000000006

In [12]: np.average(t)
Out[12]: 0.0062882362

In [17]: np.average(t) / np.average(l)  * 100
Out[17]: 101.23946713590554

您可以说它几乎没有定论。

但是可以肯定的是,与列表相比,元组花费101.239%了时间,或者花费了1.239%更多时间来完成这项工作。

Here is another little benchmark, just for the sake of it..

In [11]: %timeit list(range(100))
749 ns ± 2.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit tuple(range(100))
781 ns ± 3.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [1]: %timeit list(range(1_000))
13.5 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [2]: %timeit tuple(range(1_000))
12.4 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit list(range(10_000))
182 µs ± 810 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit tuple(range(10_000))
188 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit list(range(1_00_000))
2.76 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit tuple(range(1_00_000))
2.74 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit list(range(10_00_000))
28.1 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit tuple(range(10_00_000))
28.5 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Let’s average these out:

In [3]: l = np.array([749 * 10 ** -9, 13.5 * 10 ** -6, 182 * 10 ** -6, 2.76 * 10 ** -3, 28.1 * 10 ** -3])

In [2]: t = np.array([781 * 10 ** -9, 12.4 * 10 ** -6, 188 * 10 ** -6, 2.74 * 10 ** -3, 28.5 * 10 ** -3])

In [11]: np.average(l)
Out[11]: 0.0062112498000000006

In [12]: np.average(t)
Out[12]: 0.0062882362

In [17]: np.average(t) / np.average(l)  * 100
Out[17]: 101.23946713590554

You can call it almost inconclusive.

But sure, tuples took 101.239% the time, or 1.239% extra time to do the job compared to lists.


回答 6

元组应该比列表稍有效率,因此它比列表更快,因为它们是不可变的。

Tuples should be slightly more efficient and because of that, faster, than lists because they are immutable.


回答 7

Tuple高效阅读的主要原因是因为它是不可变的。

为什么不可变的对象易于阅读?

原因是元组可以存储在内存缓存中,与列表不同。该程序总是可变的(可以随时更改),因此总是从列表存储位置读取。

The main reason for Tuple to be very efficient in reading is because it’s immutable.

Why immutable objects are easy to read?

The reason is tuples can be stored in the memory cache, unlike lists. The program always read from the lists memory location as it is mutable (can change any time).


Python的清单是如何实现的?

问题:Python的清单是如何实现的?

它是一个链表,一个数组吗?我四处搜寻,只发现有人猜测。我的C语言知识不足以查看源代码。

Is it a linked list, an array? I searched around and only found people guessing. My C knowledge isn’t good enough to look at the source code.


回答 0

这是一个动态数组。实际证明:不管索引如何,索引都需要花费同一时间(当然,差异很小(0.0013微秒!)):

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

如果IronPython或Jython使用链接列表,我会感到惊讶-它们会破坏许多基于列表是动态数组的假设而广泛使用的库的性能。

It’s a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists – they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.


回答 1

实际上,C代码非常简单。扩展一个宏并修剪一些不相关的注释,其基本结构在中listobject.h,该结构将列表定义为:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD包含引用计数和类型标识符。因此,它是一个整体的向量/数组。用于在此类数组已满时调整其大小的代码在中listobject.c。它实际上并没有将数组加倍,而是通过分配来增长

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

每次达到最大容量时,newsize请求的大小在哪里(不一定allocated + 1是因为您可以添加extend任意数量的元素,而不是append一个一个地添加它们)。

另请参阅Python FAQ

The C code is pretty simple, actually. Expanding one macro and pruning some irrelevant comments, the basic structure is in listobject.h, which defines a list as:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD contains a reference count and a type identifier. So, it’s a vector/array that overallocates. The code for resizing such an array when it’s full is in listobject.c. It doesn’t actually double the array, but grows by allocating

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

to the capacity each time, where newsize is the requested size (not necessarily allocated + 1 because you can extend by an arbitrary number of elements instead of append‘ing them one by one).

See also the Python FAQ.


回答 2

在CPython中,列表是指针数组。Python的其他实现可能选择以不同的方式存储它们。

In CPython, lists are arrays of pointers. Other implementations of Python may choose to store them in different ways.


回答 3

这取决于实现,但是IIRC:

  • CPython使用了一个指针数组
  • Jython使用 ArrayList
  • IronPython显然也使用数组。您可以浏览源代码以找出答案。

因此,它们都具有O(1)随机访问权限。

This is implementation dependent, but IIRC:

  • CPython uses an array of pointers
  • Jython uses an ArrayList
  • IronPython apparently also uses an array. You can browse the source code to find out.

Thus they all have O(1) random access.


回答 4

我建议Laurent Luce的文章“ Python列表实现”。这对我真的很有用,因为作者解释了该列表是如何在CPython中实现的,并为此使用了出色的图表。

列出对象C的结构

CPython中的列表对象由以下C结构表示。ob_item是指向列表元素的指针的列表。已分配是在内存中分配的插槽数。

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

重要的是要注意分配的插槽和列表大小之间的差异。列表的大小与相同len(l)。分配的插槽数是已在内存中分配的数量。通常,您会看到分配的大小可能大于大小。这是为了避免realloc每次将新元素添加到列表时都需要调用。

附加

我们将一个整数附加到列表中:l.append(1)。怎么了?
在此处输入图片说明

我们继续添加一个元素:l.append(2)list_resize在n + 1 = 2时调用,但是由于分配的大小为4,因此无需分配更多的内存。同样的事情发生时,我们增加2个整数:l.append(3)l.append(4)。下图显示了到目前为止的内容。

在此处输入图片说明

让我们在位置1处插入一个新的整数(5),l.insert(1,5)看看内部发生了什么。在此处输入图片说明

流行音乐

当您弹出最后一个元素:时l.pop(),将listpop()被调用。list_resize在内部调用listpop(),如果新大小小于分配大小的一半,则列表将缩小。在此处输入图片说明

您可以观察到插槽4仍指向整数,但重要的是列表的大小现在为4。让我们再弹出一个元素。在中list_resize(),大小– 1 = 4 – 1 = 3小于分配的插槽的一半,因此列表缩小为6个插槽,现在列表的新大小为3。

您可以观察到插槽3和4仍指向一些整数,但重要的是列表的大小现在为3。在此处输入图片说明

删除 Python列表对象具有删除特定元素的方法:l.remove(5)在此处输入图片说明

I would suggest Laurent Luce’s article “Python list implementation”. Was really useful for me because the author explains how the list is implemented in CPython and uses excellent diagrams for this purpose.

List object C structure

A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list.

Append

We append an integer to the list: l.append(1). What happens?
enter image description here

We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.

enter image description here

Insert

Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally. enter image description here

Pop

When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.enter image description here

You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4. Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.

You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.enter image description here

Remove Python list object has a method to remove a specific element: l.remove(5).enter image description here


回答 5

根据文档

Python的列表实际上是可变长度数组,而不是Lisp样式的链接列表。

According to the documentation,

Python’s lists are really variable-length arrays, not Lisp-style linked lists.


回答 6

正如上面其他人所述,这些列表(当列表很大时)是通过分配固定数量的空间来实现的;如果该空间应填充,则分配更大的空间并在元素上进行复制。

为了理解为什么在不损失一般性的情况下对O(1)进行摊销,假设我们插入了a = 2 ^ n个元素,现在我们必须将表加倍为2 ^(n + 1)个大小。这意味着我们当前正在执行2 ^(n + 1)个操作。最后一个副本,我们进行了2 ^ n次操作。在此之前,我们做了2 ^(n-1)…一直到8,4,2,1。现在,如果将它们加起来,我们得到1 + 2 + 4 + 8 + … + 2 ^(n + 1)= 2 ^(n + 2)-1 <4 * 2 ^ n = O(2 ^ n)= O(a)总插入(即O(1)摊销时间)。另外,应注意,如果表允许删除,则表收缩必须以其他因子(例如3倍)完成

As others have stated above, the lists (when appreciably large) are implemented by allocating a fixed amount of space, and, if that space should fill, allocating a larger amount of space and copying over the elements.

To understand why the method is O(1) amortized, without loss of generality, assume we have inserted a = 2^n elements, and we now have to double our table to 2^(n+1) size. That means we’re currently doing 2^(n+1) operations. Last copy, we did 2^n operations. Before that we did 2^(n-1)… all the way down to 8,4,2,1. Now, if we add these up, we get 1 + 2 + 4 + 8 + … + 2^(n+1) = 2^(n+2) – 1 < 4*2^n = O(2^n) = O(a) total insertions (i.e. O(1) amortized time). Also, it should be noted that if the table allows deletions the table shrinking has to be done at a different factor (e.g 3x)


回答 7

Python中的列表类似于数组,您可以在其中存储多个值。列表是可变的,这意味着您可以更改它。您应该知道的更重要的事情是,当我们创建列表时,Python会自动为该列表变量创建reference_id。如果通过分配其他变量来更改它,则主列表将被更改。让我们尝试一个例子:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

我们追加了,my_list但是我们的主要列表已更改。这意味着没有将列表指定为副本列表,将其指定为参考。

A list in Python is something like an array, where you can store multiple values. List is mutable that means you can change it. The more important thing you should know, when we create a list, Python automatically creates a reference_id for that list variable. If you change it by assigning others variable the main list will be change. Let’s try with a example:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

We append my_list but our main list has changed. That mean’s list didn’t assign as a copy list assign as its reference.


回答 8

在CPython中,列表是作为动态数组实现的,因此,当我们在那时追加时,不仅添加了一个宏,而且还分配了更多空间,因此每次都不应添加新空间。

In CPython list is implemented as dynamic array, and therefore when we append at that time not only one macro is added but some more space is allocated so that everytime new space should not be added.


什么是CPython中的全局解释器锁(GIL)?

问题:什么是CPython中的全局解释器锁(GIL)?

什么是全局解释器锁,为什么会出现问题?

从Python删除GIL周围产生了很多噪音,我想理解为什么这是如此重要。我自己从未写过编译器或解释器,所以不要节俭,我可能需要理解它们。

What is a global interpreter lock and why is it an issue?

A lot of noise has been made around removing the GIL from Python, and I’d like to understand why that is so important. I have never written a compiler nor an interpreter myself, so don’t be frugal with details, I’ll probably need them to understand.


回答 0

Python的GIL旨在序列化从不同线程对解释器内部的访问。在多核系统上,这意味着多个线程无法有效利用多个核。(如果GIL不会导致此问题,那么大多数人就不会在意GIL-只是由于多核系统的普及而成为一个问题。)如果您想详细了解它,您可以观看此视频或查看这组幻灯片。可能信息太多,但是您确实要求提供详细信息:-)

请注意,Python的GIL实际上只是CPython(参考实现)的问题。Jython和IronPython没有GIL。作为Python开发人员,除非您正在编写C扩展,否则通常不会遇到GIL。C扩展编写者需要在其扩展确实阻止I / O时释放GIL,以便Python进程中的其他线程可以运行。

Python’s GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can’t effectively make use of multiple cores. (If the GIL didn’t lead to this problem, most people wouldn’t care about the GIL – it’s only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides. It might be too much information, but then you did ask for details :-)

Note that Python’s GIL is only really an issue for CPython, the reference implementation. Jython and IronPython don’t have a GIL. As a Python developer, you don’t generally come across the GIL unless you’re writing a C extension. C extension writers need to release the GIL when their extensions do blocking I/O, so that other threads in the Python process get a chance to run.


回答 1

假设您有多个线程,它们实际上并没有相互影响对方的数据。那些应该尽可能独立地执行。如果您有一个“全局锁”,您需要获取该“全局锁”以调用(例如)一个函数,那么这最终会成为瓶颈。首先拥有多个线程可能并没有带来太多好处。

用现实世界来类比:假设有100个开发人员在一家公司工作,而他们只有一个咖啡杯。大多数开发人员会花时间等待咖啡而不是编码。

这些都不是特定于Python的-首先,我不知道Python需要GIL的详细信息。但是,希望它使您对一般概念有了更好的了解。

Suppose you have multiple threads which don’t really touch each other’s data. Those should execute as independently as possible. If you have a “global lock” which you need to acquire in order to (say) call a function, that can end up as a bottleneck. You can wind up not getting much benefit from having multiple threads in the first place.

To put it into a real world analogy: imagine 100 developers working at a company with only a single coffee mug. Most of the developers would spend their time waiting for coffee instead of coding.

None of this is Python-specific – I don’t know the details of what Python needed a GIL for in the first place. However, hopefully it’s given you a better idea of the general concept.


回答 2

首先让我们了解python GIL提供的功能:

任何操作/指令都在解释器中执行。GIL确保在特定时间由单个线程保存解释器。您的具有多个线程的python程序可在单个解释器中工作。在任何特定时间,此解释器都由单个线程持有。这意味着在任何时刻都只有运行解释器的线程在运行

现在为什么会这样:

您的计算机可能具有多个内核/处理器。多个内核允许多个线程同时执行即多个线程可以在任何特定时间执行。但是,由于解释器由单个线程持有,因此其他线程即使可以访问核心也不会做任何事情。因此,您无法获得多个内核所提供的任何优势,因为在任何时候都只使用一个内核,即当前持有解释器的线程正在使用的内核。因此,您的程序将像执行单线程程序一样花费很长时间。

但是,潜在的阻塞或长时间运行的操作(例如I / O,图像处理和NumPy数字运算)发生在GIL之外。从这里取。因此,对于此类操作,尽管存在GIL,但多线程操作仍将比单线程操作更快。因此,GIL并不总是瓶颈。

编辑:GIL是CPython的实现细节。IronPython和Jython没有GIL,因此他们中应该有一个真正的多线程程序,以为我从未使用过PyPy和Jython,也不确定。

Let’s first understand what the python GIL provides:

Any operation/instruction is executed in the interpreter. GIL ensures that interpreter is held by a single thread at a particular instant of time. And your python program with multiple threads works in a single interpreter. At any particular instant of time, this interpreter is held by a single thread. It means that only the thread which is holding the interpreter is running at any instant of time.

Now why is that an issue:

Your machine could be having multiple cores/processors. And multiple cores allow multiple threads to execute simultaneously i.e multiple threads could execute at any particular instant of time.. But since the interpreter is held by a single thread, other threads are not doing anything even though they have access to a core. So, you are not getting any advantage provided by multiple cores because at any instant only a single core, which is the core being used by the thread currently holding the interpreter, is being used. So, your program will take as long to execute as if it were a single threaded program.

However, potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Taken from here. So for such operations, a multithreaded operation will still be faster than a single threaded operation despite the presence of GIL. So, GIL is not always a bottleneck.

Edit: GIL is an implementation detail of CPython. IronPython and Jython don’t have GIL, so a truly multithreaded program should be possible in them, thought I have never used PyPy and Jython and not sure of this.


回答 3

Python不允许真正意义上的多线程。它具有多线程程序包,但是如果您想使用多线程来加快代码速度,那么使用它通常不是一个好主意。Python具有称为全局解释器锁(GIL)的构造。

https://www.youtube.com/watch?v=ph374fJqFPE

GIL确保在任何一次只能执行您的一个“线程”。线程获取GIL,做一些工作,然后将GIL传递到下一个线程。这发生得非常快,以至于人眼似乎您的线程正在并行执行,但实际上它们只是使用相同的CPU内核轮流执行。所有这些GIL传递都会增加执行开销。这意味着,如果您想使代码运行更快,那么使用线程包通常不是一个好主意。

有理由使用Python的线程包。如果您想同时运行一些东西,而效率不是问题,那么它就很好而且很方便。或者,如果您正在运行的代码需要等待某些东西(例如某些IO),那么这很有意义。但是线程库不会让您使用额外的CPU内核。

多线程可以外包给操作系统(通过执行多处理),一些调用您的Python代码的外部应用程序(例如Spark或Hadoop)或一些您的Python代码调用的代码(例如:您可以拥有Python代码调用执行昂贵的多线程内容的C函数)。

Python doesn’t allow multi-threading in the truest sense of the word. It has a multi-threading package but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it. Python has a construct called the Global Interpreter Lock (GIL).

https://www.youtube.com/watch?v=ph374fJqFPE

The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread. This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core. All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn’t a good idea.

There are reasons to use Python’s threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it’s totally fine and convenient. Or if you are running code that needs to wait for something (like some IO) then it could make a lot of sense. But the threading library wont let you use extra CPU cores.

Multi-threading can be outsourced to the operating system (by doing multi-processing), some external application that calls your Python code (eg, Spark or Hadoop), or some code that your Python code calls (eg: you could have your Python code call a C function that does the expensive multi-threaded stuff).


回答 4

每当两个线程访问同一变量时,您就会遇到问题。例如,在C ++中,避免该问题的方法是定义一些互斥锁,以防止两个线程同时输入一个对象的setter。

在python中可以进行多线程处理,但是不能以比一条python指令更好的粒度同时执行两个线程。正在运行的线程正在获取一个名为GIL的全局锁。

这意味着,如果您开始编写一些多线程代码以利用您的多核处理器,则性能不会提高。通常的解决方法是进行多进程。

请注意,如果您位于用C语言编写的方法中,则可以释放GIL。

GIL的使用不是Python固有的,而是它的某些解释器(包括最常见的CPython)使用的。(#edited,请参阅评论)

GIL问题在Python 3000中仍然有效。

Whenever two threads have access to the same variable you have a problem. In C++ for instance, the way to avoid the problem is to define some mutex lock to prevent two thread to, let’s say, enter the setter of an object at the same time.

Multithreading is possible in python, but two threads cannot be executed at the same time at a granularity finer than one python instruction. The running thread is getting a global lock called GIL.

This means if you begin write some multithreaded code in order to take advantage of your multicore processor, your performance won’t improve. The usual workaround consists of going multiprocess.

Note that it is possible to release the GIL if you’re inside a method you wrote in C for instance.

The use of a GIL is not inherent to Python but to some of its interpreter, including the most common CPython. (#edited, see comment)

The GIL issue is still valid in Python 3000.


回答 5

Python 3.7文档

我还想强调Python threading文档中的以下引号:

CPython实现细节:在CPython中,由于使用了全局解释器锁,因此只有一个线程可以一次执行Python代码(即使某些面向性能的库可能克服了此限制)。如果您希望应用程序更好地利用多核计算机的计算资源,建议使用multiprocessingconcurrent.futures.ProcessPoolExecutor。但是,如果您要同时运行多个I / O绑定任务,则线程化仍然是合适的模型。

这链接到词汇表条目,global interpreter lock条目解释为GIL暗示Python中的线程并行性不适合CPU绑定的任务

CPython解释器用来确保每次只有一个线程执行Python字节码的机制。通过使对象模型(包括关键的内置类型,如dict)隐式地安全地防止并发访问,从而简化了CPython的实现。锁定整个解释器可以使解释器更容易进行多线程处理,但会牺牲多处理器机器提供的许多并行性。

但是,某些扩展模块(标准的或第三方的)被设计为在执行诸如压缩或散列之类的计算密集型任务时释放GIL。另外,在执行I / O时,始终释放GIL。

过去创建“自由线程”解释器(一种以更精细的粒度锁定共享数据的解释器)的努力并未成功,因为在常见的单处理器情况下性能会受到影响。相信克服该性能问题将使实施更加复杂,因此维护成本更高。

此引号还暗示,作为CPython实现的细节,字典以及变量分配也是线程安全的:

接下来,该软件包文档multiprocessing说明了如何通过生成过程同时暴露类似于以下内容的接口来克服GIL threading

multiprocessing是一个程序包,它使用类似于线程模块的API支持生成程序。多处理程序包同时提供本地和远程并发性,通过使用子进程而不是线程来有效地避开全局解释器锁。因此,多处理模块允许程序员充分利用给定机器上的多个处理器。它可以在Unix和Windows上运行。

以及用于concurrent.futures.ProcessPoolExecutor解释该文档multiprocessing用作后端的文档

ProcessPoolExecutor类是Executor子类,它使用进程池异步执行调用。ProcessPoolExecutor使用多处理模块,该模块可以使其避开全局解释器锁,但也意味着只能执行和返回可拾取对象。

应对比于其他基类ThreadPoolExecutor的是使用线程而不是进程

ThreadPoolExecutor是一个Executor子类,它使用线程池异步执行调用。

从中我们得出结论,ThreadPoolExecutor它仅适用于I / O绑定的任务,同时ProcessPoolExecutor还可以处理CPU绑定的任务。

下面的问题询问为什么GIL首先存在:为什么使用全局解释器锁定?

进程与线程实验

Multiprocessing vs Threading Python中,对Python中的进程与线程进行了实验分析。

快速预览结果:

在此处输入图片说明

Python 3.7 documentation

I would also like to highlight the following quote from the Python threading documentation:

CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

This links to the Glossary entry for global interpreter lock which explains that the GIL implies that threaded parallelism in Python is unsuitable for CPU bound tasks:

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.

This quote also implies that dicts and thus variable assignment are also thread safe as a CPython implementation detail:

Next, the docs for the multiprocessing package explain how it overcomes the GIL by spawning process while exposing an interface similar to that of threading:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

And the docs for concurrent.futures.ProcessPoolExecutor explain that it uses multiprocessing as a backend:

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

which should be contrasted to the other base class ThreadPoolExecutor that uses threads instead of processes

ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

from which we conclude that ThreadPoolExecutor is only suitable for I/O bound tasks, while ProcessPoolExecutor can also handle CPU bound tasks.

The following question asks why the GIL exists in the first place: Why the Global Interpreter Lock?

Process vs thread experiments

At Multiprocessing vs Threading Python I’ve done an experimental analysis of process vs threads in Python.

Quick preview of the results:

enter image description here


回答 6

为什么Python(CPython等)使用GIL

来自http://wiki.python.org/moin/GlobalInterpreterLock

在CPython中,全局解释器锁(即GIL)是一个互斥体,可以防止多个本机线程一次执行Python字节码。该锁是必需的,主要是因为CPython的内存管理不是线程安全的。

如何从Python中删除它?

像Lua一样,也许Python可以启动多个VM,但是python不能这样做,我想应该还有其他原因。

在Numpy或其他一些python扩展库中,有时,将GIL释放到其他线程可以提高整个程序的效率。

Why Python (CPython and others) uses the GIL

From http://wiki.python.org/moin/GlobalInterpreterLock

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe.

How to remove it from Python?

Like Lua, maybe Python could start multiple VM, But python doesn’t do that, I guess there should be some other reasons.

In Numpy or some other python extended library, sometimes, releasing the GIL to other threads could boost the efficiency of the whole programme.


回答 7

我想分享《 Visual Effects的多线程》一书中的示例。所以这是经典的死锁情况

static void MyCallback(const Context &context){
Auto<Lock> lock(GetMyMutexFromContext(context));
...
EvalMyPythonString(str); //A function that takes the GIL
...    
}

现在考虑序列中导致死锁的事件。

╔═══╦════════════════════════════════════════╦══════════════════════════════════════╗
    Main Thread                             Other Thread                         
╠═══╬════════════════════════════════════════╬══════════════════════════════════════╣
 1  Python Command acquires GIL             Work started                         
 2  Computation requested                   MyCallback runs and acquires MyMutex 
 3                                          MyCallback now waits for GIL         
 4  MyCallback runs and waits for MyMutex   waiting for GIL                      
╚═══╩════════════════════════════════════════╩══════════════════════════════════════╝

I want to share an example from the book multithreading for Visual Effects. So here is a classic dead lock situation

static void MyCallback(const Context &context){
Auto<Lock> lock(GetMyMutexFromContext(context));
...
EvalMyPythonString(str); //A function that takes the GIL
...    
}

Now consider the events in the sequence resulting a dead-lock.

╔═══╦════════════════════════════════════════╦══════════════════════════════════════╗
║   ║ Main Thread                            ║ Other Thread                         ║
╠═══╬════════════════════════════════════════╬══════════════════════════════════════╣
║ 1 ║ Python Command acquires GIL            ║ Work started                         ║
║ 2 ║ Computation requested                  ║ MyCallback runs and acquires MyMutex ║
║ 3 ║                                        ║ MyCallback now waits for GIL         ║
║ 4 ║ MyCallback runs and waits for MyMutex  ║ waiting for GIL                      ║
╚═══╩════════════════════════════════════════╩══════════════════════════════════════╝

为什么有些float <整数比较的速度慢四倍?

问题:为什么有些float <整数比较的速度慢四倍?

将浮点数与整数进行比较时,某些值对的评估时间要比其他类似幅度的值花费更长的时间。

例如:

>>> import timeit
>>> timeit.timeit("562949953420000.7 < 562949953421000") # run 1 million times
0.5387085462592742

但是,如果将float或整数变小或变大一定数量,则比较会更快地运行:

>>> timeit.timeit("562949953420000.7 < 562949953422000") # integer increased by 1000
0.1481498428446173
>>> timeit.timeit("562949953423001.8 < 562949953421000") # float increased by 3001.1
0.1459577925548956

更改比较运算符(例如使用==>代替)不会以任何明显的方式影响时间。

这不只是涉及到大小,因为采摘较大或较小的值会导致比较快,所以我怀疑它已经降到了一些不幸的方式位排队。

显然,对于大多数用例而言,比较这些值已足够快。我只是对为什么Python似乎在某些价值观上比在其他价值观上挣扎更多感到好奇。

When comparing floats to integers, some pairs of values take much longer to be evaluated than other values of a similar magnitude.

For example:

>>> import timeit
>>> timeit.timeit("562949953420000.7 < 562949953421000") # run 1 million times
0.5387085462592742

But if the float or integer is made smaller or larger by a certain amount, the comparison runs much more quickly:

>>> timeit.timeit("562949953420000.7 < 562949953422000") # integer increased by 1000
0.1481498428446173
>>> timeit.timeit("562949953423001.8 < 562949953421000") # float increased by 3001.1
0.1459577925548956

Changing the comparison operator (e.g. using == or > instead) does not affect the times in any noticeable way.

This is not solely related to magnitude because picking larger or smaller values can result in faster comparisons, so I suspect it is down to some unfortunate way the bits line up.

Clearly, comparing these values is more than fast enough for most use cases. I am simply curious as to why Python seems to struggle more with some pairs of values than with others.


回答 0

浮点对象的Python源代码中的注释确认:

比较几乎是一场噩梦

在将浮点数与整数进行比较时尤其如此,因为与浮点数不同,Python中的整数可以任意大,并且总是精确的。尝试将整数强制转换为浮点数可能会失去精度,并使比较不准确。尝试将浮点数转换为整数也不会起作用,因为任何小数部分都会丢失。

为了解决这个问题,Python执行了一系列检查,如果其中一项检查成功,则返回结果。它比较两个值的符号,然后比较整数是否“太大”而不能成为浮点数,然后将浮点的指数与整数长度进行比较。如果所有这些检查均失败,则有必要构造两个新的Python对象进行比较以获得结果。

将浮点数v与整数/长整数进行比较时w,最坏的情况是:

  • v并且w具有相同的符号(正号或负号),
  • 该整数的w位数很少,可以保存为该size_t类型(通常为32或64位),
  • 整数w至少有49位,
  • float的指数与中的v位数相同w

这正是我们对问题中的值所拥有的:

>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49

我们看到49既是浮点数的指数,也是整数的位数。这两个数字都是正数,因此符合上述四个条件。

选择一个较大或较小的值可以更改整数的位数或指数的值,因此Python能够确定比较结果,而无需执行昂贵的最终检查。

这特定于该语言的CPython实现。


比较比较详细

float_richcompare函数处理两个值v和之间的比较w

下面是该功能执行的检查的分步说明。当试图了解函数的功能时,Python来源中的注释实际上非常有帮助,因此我将其放在相关的地方。我还将这些检查总结在答案底部的列表中。

其主要思想是映射Python对象vw两个相应的C双打,i并且j,然后可以很容易地比较以得到正确的结果。Python 2和Python 3都使用相同的想法进行操作(前者只是分别处理intlong键入)。

首先要做的是检查v是否绝对是Python float并将其映射到C double i。接下来,该函数查看是否w也是float并将其映射到C double j。这是该功能的最佳情况,因为可以跳过所有其他检查。该功能还检查是否vinfnan

static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;
    assert(PyFloat_Check(v));       
    i = PyFloat_AS_DOUBLE(v);       

    if (PyFloat_Check(w))           
        j = PyFloat_AS_DOUBLE(w);   

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }

现在我们知道,如果w未通过这些检查,则不是Python浮点数。现在,该函数检查它是否为Python整数。在这种情况下,最简单的测试是提取v和的符号w0如果为零,-1则返回,如果为负,1则为正)。如果符号不同,则这是返回比较结果所需的全部信息:

    else if (PyLong_Check(w)) {
        int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
        int wsign = _PyLong_Sign(w);
        size_t nbits;
        int exponent;

        if (vsign != wsign) {
            /* Magnitudes are irrelevant -- the signs alone
             * determine the outcome.
             */
            i = (double)vsign;
            j = (double)wsign;
            goto Compare;
        }
    }   

如果此检查失败,则vw具有相同的符号。

下一个检查将计算整数中的位数w。如果它有太多位,那么就不可能将其保存为浮点数,因此其大小必须大于浮点数v

    nbits = _PyLong_NumBits(w);
    if (nbits == (size_t)-1 && PyErr_Occurred()) {
        /* This long is so large that size_t isn't big enough
         * to hold the # of bits.  Replace with little doubles
         * that give the same outcome -- w is so large that
         * its magnitude must exceed the magnitude of any
         * finite float.
         */
        PyErr_Clear();
        i = (double)vsign;
        assert(wsign != 0);
        j = wsign * 2.0;
        goto Compare;
    }

另一方面,如果整数w的位数为48个或更少,则可以安全地将C翻倍j并进行比较:

    if (nbits <= 48) {
        j = PyLong_AsDouble(w);
        /* It's impossible that <= 48 bits overflowed. */
        assert(j != -1.0 || ! PyErr_Occurred());
        goto Compare;
    }

从这一点开始,我们知道它w具有49位或更多位。将其w视为正整数会很方便,因此请根据需要更改符号和比较运算符:

    if (nbits <= 48) {
        /* "Multiply both sides" by -1; this also swaps the
         * comparator.
         */
        i = -i;
        op = _Py_SwappedOp[op];
    }

现在,该函数查看浮点数的指数。回想一下,可以将浮点数写为有效位数* 2 指数(忽略符号),并且有效位数表示0.5到1之间的数字:

    (void) frexp(i, &exponent);
    if (exponent < 0 || (size_t)exponent < nbits) {
        i = 1.0;
        j = 2.0;
        goto Compare;
    }

这检查了两件事。如果指数小于0,则浮点数小于1(因此,其大小小于任何整数)。或者,如果指数小于in的位数,wv < |w|由于有效* 2 指数小于2 nbits,我们可以得到

如果这两项检查均未通过,该函数将查看该指数是否大于中的位数w。这表明有效数* 2 指数大于2 nbit,因此v > |w|

    if ((size_t)exponent > nbits) {
        i = 2.0;
        j = 1.0;
        goto Compare;
    }

如果此检查未成功,我们将知道float的指数v与整数中的位数相同w

现在可以比较两个值的唯一方法是从v和构造两个新的Python整数w。这个想法是丢弃的小数部分v,将整数部分加倍,然后再加一个。w也会加倍,并且可以将这两个新的Python对象进行比较以提供正确的返回值。使用具有较小值的示例,4.65 < 4将由比较确定(2*4)+1 == 9 < 8 == (2*4)(返回false)。

    {
        double fracpart;
        double intpart;
        PyObject *result = NULL;
        PyObject *one = NULL;
        PyObject *vv = NULL;
        PyObject *ww = w;

        // snip

        fracpart = modf(i, &intpart); // split i (the double that v mapped to)
        vv = PyLong_FromDouble(intpart);

        // snip

        if (fracpart != 0.0) {
            /* Shift left, and or a 1 bit into vv
             * to represent the lost fraction.
             */
            PyObject *temp;

            one = PyLong_FromLong(1);

            temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
            ww = temp;

            temp = PyNumber_Lshift(vv, one);
            vv = temp;

            temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
            vv = temp;
        }
        // snip
    }
}

为简洁起见,我省略了Python创建这些新对象时必须进行的其他错误检查和垃圾跟踪。不用说,这增加了额外的开销,并解释了为什么问题中突出显示的值比其他值慢得多。


这是比较功能执行的检查的摘要。

让它v成为一个浮点数并将其转换为C的double。现在,如果w也是浮点数:

  • 检查wnan还是inf。如果是这样,请根据的类型分别处理此特殊情况w

  • 如果不是,则比较vw直接按其表示形式将C值翻倍。

如果w为整数:

  • 提取的迹象vw。如果它们不同,那么我们知道v并且w不同,那是更大的价值。

  • 符号相同。)检查是否w有太多位不能浮空(大于size_t)。如果是这样,w则其幅度大于v

  • 检查是否w有48位或更少的位。如果是这样,可以安全地将其强制转换为C double而不损失其精度,并与进行比较v

  • w具有超过48位。我们现在将w其视作已适当更改比较操作的正整数。

  • 考虑浮点数的指数v。如果指数为负,则v小于1且因此小于任何正整数。否则,如果指数小于的位数,w则它必须小于w

  • 如果的指数v大于中的位数​​,wv大于w

  • 指数与中的位数相同w

  • 最后检查。拆分v成它的整数和小数部分。将整数部分加倍并加1以补偿小数部分。现在将整数倍w。比较这两个新的整数以获得结果。

A comment in the Python source code for float objects acknowledges that:

Comparison is pretty much a nightmare

This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.

To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is “too big” to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.

When comparing a float v to an integer/long w, the worst case is that:

  • v and w have the same sign (both positive or both negative),
  • the integer w has few enough bits that it can be held in the size_t type (typically 32 or 64 bits),
  • the integer w has at least 49 bits,
  • the exponent of the float v is the same as the number of bits in w.

And this is exactly what we have for the values in the question:

>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49

We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.

Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.

This is specific to the CPython implementation of the language.


The comparison in more detail

The float_richcompare function handles the comparison between two values v and w.

Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I’ve left them in where relevant. I’ve also summarised these checks in a list at the foot of the answer.

The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).

The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan:

static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;
    assert(PyFloat_Check(v));       
    i = PyFloat_AS_DOUBLE(v);       

    if (PyFloat_Check(w))           
        j = PyFloat_AS_DOUBLE(w);   

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }

Now we know that if w failed these checks, it is not a Python float. Now the function checks if it’s a Python integer. If this is the case, the easiest test is to extract the sign of v and the sign of w (return 0 if zero, -1 if negative, 1 if positive). If the signs are different, this is all the information needed to return the result of the comparison:

    else if (PyLong_Check(w)) {
        int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
        int wsign = _PyLong_Sign(w);
        size_t nbits;
        int exponent;

        if (vsign != wsign) {
            /* Magnitudes are irrelevant -- the signs alone
             * determine the outcome.
             */
            i = (double)vsign;
            j = (double)wsign;
            goto Compare;
        }
    }   

If this check failed, then v and w have the same sign.

The next check counts the number of bits in the integer w. If it has too many bits then it can’t possibly be held as a float and so must be larger in magnitude than the float v:

    nbits = _PyLong_NumBits(w);
    if (nbits == (size_t)-1 && PyErr_Occurred()) {
        /* This long is so large that size_t isn't big enough
         * to hold the # of bits.  Replace with little doubles
         * that give the same outcome -- w is so large that
         * its magnitude must exceed the magnitude of any
         * finite float.
         */
        PyErr_Clear();
        i = (double)vsign;
        assert(wsign != 0);
        j = wsign * 2.0;
        goto Compare;
    }

On the other hand, if the integer w has 48 or fewer bits, it can safely turned in a C double j and compared:

    if (nbits <= 48) {
        j = PyLong_AsDouble(w);
        /* It's impossible that <= 48 bits overflowed. */
        assert(j != -1.0 || ! PyErr_Occurred());
        goto Compare;
    }

From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:

    if (nbits <= 48) {
        /* "Multiply both sides" by -1; this also swaps the
         * comparator.
         */
        i = -i;
        op = _Py_SwappedOp[op];
    }

Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand * 2exponent and that the significand represents a number between 0.5 and 1:

    (void) frexp(i, &exponent);
    if (exponent < 0 || (size_t)exponent < nbits) {
        i = 1.0;
        j = 2.0;
        goto Compare;
    }

This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand * 2exponent is less than 2nbits.

Failing these two checks, the function looks to see whether the exponent is greater than the number of bit in w. This shows that significand * 2exponent is greater than 2nbits and so v > |w|:

    if ((size_t)exponent > nbits) {
        i = 2.0;
        j = 1.0;
        goto Compare;
    }

If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.

The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).

    {
        double fracpart;
        double intpart;
        PyObject *result = NULL;
        PyObject *one = NULL;
        PyObject *vv = NULL;
        PyObject *ww = w;

        // snip

        fracpart = modf(i, &intpart); // split i (the double that v mapped to)
        vv = PyLong_FromDouble(intpart);

        // snip

        if (fracpart != 0.0) {
            /* Shift left, and or a 1 bit into vv
             * to represent the lost fraction.
             */
            PyObject *temp;

            one = PyLong_FromLong(1);

            temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
            ww = temp;

            temp = PyNumber_Lshift(vv, one);
            vv = temp;

            temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
            vv = temp;
        }
        // snip
    }
}

For brevity I’ve left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.


Here is a summary of the checks that are performed by the comparison function.

Let v be a float and cast it as a C double. Now, if w is also a float:

  • Check whether w is nan or inf. If so, handle this special case separately depending on the type of w.

  • If not, compare v and w directly by their representations as C doubles.

If w is an integer:

  • Extract the signs of v and w. If they are different then we know v and w are different and which is the greater value.

  • (The signs are the same.) Check whether w has too many bits to be a float (more than size_t). If so, w has greater magnitude than v.

  • Check if w has 48 or fewer bits. If so, it can be safely cast to a C double without losing its precision and compared with v.

  • (w has more than 48 bits. We will now treat w as a positive integer having changed the compare op as appropriate.)

  • Consider the exponent of the float v. If the exponent is negative, then v is less than 1 and therefore less than any positive integer. Else, if the exponent is less than the number of bits in w then it must be less than w.

  • If the exponent of v is greater than the number of bits in w then v is greater than w.

  • (The exponent is the same as the number of bits in w.)

  • The final check. Split v into its integer and fractional parts. Double the integer part and add 1 to compensate for the fractional part. Now double the integer w. Compare these two new integers instead to get the result.


回答 1

gmpy2与任意精度的浮点数和整数一起使用,可以获得更统一的比较性能:

~ $ ptipython
Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec  7 2015, 11:16:01) 
Type "copyright", "credits" or "license" for more information.

IPython 4.1.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import gmpy2

In [2]: from gmpy2 import mpfr

In [3]: from gmpy2 import mpz

In [4]: gmpy2.get_context().precision=200

In [5]: i1=562949953421000

In [6]: i2=562949953422000

In [7]: f=562949953420000.7

In [8]: i11=mpz('562949953421000')

In [9]: i12=mpz('562949953422000')

In [10]: f1=mpfr('562949953420000.7')

In [11]: f<i1
Out[11]: True

In [12]: f<i2
Out[12]: True

In [13]: f1<i11
Out[13]: True

In [14]: f1<i12
Out[14]: True

In [15]: %timeit f<i1
The slowest run took 10.15 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 441 ns per loop

In [16]: %timeit f<i2
The slowest run took 12.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 152 ns per loop

In [17]: %timeit f1<i11
The slowest run took 32.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 269 ns per loop

In [18]: %timeit f1<i12
The slowest run took 36.81 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 231 ns per loop

In [19]: %timeit f<i11
The slowest run took 78.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 156 ns per loop

In [20]: %timeit f<i12
The slowest run took 21.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 194 ns per loop

In [21]: %timeit f1<i1
The slowest run took 37.61 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 275 ns per loop

In [22]: %timeit f1<i2
The slowest run took 39.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 259 ns per loop

Using gmpy2 with arbitrary precision floats and integers it is possible to get more uniform comparison performance:

~ $ ptipython
Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec  7 2015, 11:16:01) 
Type "copyright", "credits" or "license" for more information.

IPython 4.1.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import gmpy2

In [2]: from gmpy2 import mpfr

In [3]: from gmpy2 import mpz

In [4]: gmpy2.get_context().precision=200

In [5]: i1=562949953421000

In [6]: i2=562949953422000

In [7]: f=562949953420000.7

In [8]: i11=mpz('562949953421000')

In [9]: i12=mpz('562949953422000')

In [10]: f1=mpfr('562949953420000.7')

In [11]: f<i1
Out[11]: True

In [12]: f<i2
Out[12]: True

In [13]: f1<i11
Out[13]: True

In [14]: f1<i12
Out[14]: True

In [15]: %timeit f<i1
The slowest run took 10.15 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 441 ns per loop

In [16]: %timeit f<i2
The slowest run took 12.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 152 ns per loop

In [17]: %timeit f1<i11
The slowest run took 32.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 269 ns per loop

In [18]: %timeit f1<i12
The slowest run took 36.81 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 231 ns per loop

In [19]: %timeit f<i11
The slowest run took 78.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 156 ns per loop

In [20]: %timeit f<i12
The slowest run took 21.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 194 ns per loop

In [21]: %timeit f1<i1
The slowest run took 37.61 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 275 ns per loop

In [22]: %timeit f1<i2
The slowest run took 39.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 259 ns per loop