Question: Why is x**4.0 faster than x**4 in Python 3?

Why is x**4.0 faster than x**4? I am using CPython 3.5.2.

$ python -m timeit "for x in range(100):" " x**4.0"
  10000 loops, best of 3: 24.2 usec per loop

$ python -m timeit "for x in range(100):" " x**4"
  10000 loops, best of 3: 30.6 usec per loop

I tried changing the exponent to see how it behaves; for example, if I raise x to the power of 10 or 16 the timing jumps from 30 to 35 usec, but if I raise it to 10.0 as a float, it just hovers around 24.1~4.

I guess it has something to do with float conversion and powers of 2 maybe, but I don’t really know.

I noticed that in both cases powers of 2 are faster, I guess because those calculations are more native/easy for the interpreter/computer. But still, with floats it barely moves: 2.0 => 24.1~4 & 128.0 => 24.1~4, but 2 => 29 & 128 => 62.


TigerhawkT3 pointed out that it doesn’t happen outside of the loop. I checked and the situation only occurs (from what I’ve seen) when the base is getting raised. Any idea about that?

Answer 0

Why is x**4.0 faster than x**4 in Python 3*?

Python 3 int objects are full-fledged objects designed to support arbitrary sizes; because of that, they are handled as such on the C level (see how all variables are declared as PyLongObject * type in long_pow). This also makes their exponentiation a lot trickier and more tedious, since you need to play around with the ob_digit array the object uses to represent its value. (Source for the brave. — See: Understanding memory allocation for large integers in Python for more on PyLongObjects.)
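
A quick way to observe that arbitrary-size representation from the Python side (a small illustrative check, not part of the original answer) is sys.getsizeof, whose result grows as more ob_digit slots are needed:

import sys

# The reported object size (in bytes) increases with the magnitude of
# the value, because more ob_digit entries are allocated to store it:
for exp in (1, 30, 60, 120):
    print(exp, sys.getsizeof(2 ** exp))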

Python float objects, by contrast, can be transformed to a C double type (by using PyFloat_AsDouble), and operations can be performed using those native types. This is great because, after checking for relevant edge-cases, it allows Python to use the platform’s pow (C’s pow, that is) to handle the actual exponentiation:

/* Now iv and iw are finite, iw is nonzero, and iv is
 * positive and not equal to 1.0.  We finally allow
 * the platform pow to step in and do the rest.
 */
errno = 0;
PyFPE_START_PROTECT("pow", return NULL)
ix = pow(iv, iw); 

where iv and iw are our original PyFloatObjects as C doubles.
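
As a rough illustration from pure Python (not the interpreter’s actual code path): math.pow also converts its arguments to C doubles and delegates to the platform pow, so it behaves like the float exponentiation, whereas the all-int version stays in arbitrary-precision integer land:

import math

x = 7
print(x ** 4.0)          # 2401.0 -- float_pow -> C pow()
print(math.pow(x, 4.0))  # 2401.0 -- same conversion to double, same C pow()
print(x ** 4)            # 2401   -- long_pow working on the int's digits, exact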

For what it’s worth: Python 2.7.13 is, for me, a factor of 2~3 faster and shows the inverse behaviour.

The previous fact also explains the discrepancy between Python 2 and 3, so I thought I’d address that comment too, since it is interesting.

In Python 2, you’re using the old int object, which differs from the int object in Python 3 (all int objects in 3.x are of PyLongObject type). There, the distinction depends on the value of the object (or on whether you use the L/l suffix):

# Python 2
type(30)  # <type 'int'>
type(30L) # <type 'long'>

The <type 'int'> you see here does the same thing floats do: it gets safely converted into a C long when exponentiation is performed on it (int_pow also hints to the compiler to put the values in registers if it can, which could make a difference):

static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
/* Snipped for brevity */    

this allows for a good speed gain.
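
For context, here is a minimal Python 2 sketch (illustrative only) of how values that fit in a C long stay <type 'int'>, while larger ones are transparently promoted to <type 'long'> and therefore take the slower long_pow path:

import sys

print(type(sys.maxint))      # <type 'int'>  -- fits in a C long
print(type(sys.maxint + 1))  # <type 'long'> -- promoted on overflow
print(type(2 ** 10))         # <type 'int'>
print(type(2 ** 100))        # <type 'long'>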

To see how sluggish <type 'long'>s are in comparison to <type 'int'>s, wrap the x name in a long call in Python 2 (essentially forcing it to use long_pow as in Python 3); the speed gain disappears:

# <type 'int'>
(python2) ➜ python -m timeit "for x in range(1000):" " x**2"       
10000 loops, best of 3: 116 usec per loop
# <type 'long'> 
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop

Note that, although one snippet transforms the int to a long while the other does not (as pointed out by @pydsinger), this cast is not the driving force behind the slowdown; the implementation of long_pow is. (Time the statements solely with long(x) to see.)
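
For example, timing the cast by itself along these lines (illustrative; the numbers depend on your machine) shows it is far too cheap to account for the gap above:

(python2) ➜ python -m timeit "for x in range(1000):" " long(x)"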

[…] it doesn’t happen outside of the loop. […] Any idea about that?

This is CPython’s peephole optimizer folding the constants for you. You get the exact same timings in either case, since there is no actual computation at run time to find the result of the exponentiation, only the loading of a pre-computed value:

dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

Effectively identical byte-code is generated for '4 ** 4.', the only difference being that the LOAD_CONST loads the float 256.0 instead of the int 256:

dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE

So the times are identical.
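
For contrast, when the base is a name rather than a literal there is nothing for the peephole optimizer to fold, and BINARY_POWER stays in the bytecode to be executed at run time (a quick check; the expected output is shown as comments, with byte offsets as printed by CPython 3.5, which differ on newer versions):

import dis

dis.dis(compile('x ** 4', '', 'exec'))
#   1           0 LOAD_NAME                0 (x)
#               3 LOAD_CONST               0 (4)
#               6 BINARY_POWER
#               7 POP_TOP
#               8 LOAD_CONST               1 (None)
#              11 RETURN_VALUE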


*All of the above applies solely to CPython, the reference implementation of Python. Other implementations might perform differently.


Answer 1

If we look at the bytecode, we can see that the expressions are purely identical. The only difference is the type of the constant that will be an argument of BINARY_POWER. So it’s most certainly due to an int being converted to a floating-point number down the line.

>>> def func(n):
...    return n**4
... 
>>> def func1(n):
...    return n**4.0
... 
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE

Update: let’s take a look at Objects/abstract.c in the CPython source code:

PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}

PyNumber_Power calls ternary_op, which is too long to paste here, so here’s the link.

It calls the nb_power slot of x, passing y as an argument.

Finally, in float_pow() at line 686 of Objects/floatobject.c we see that the arguments are converted to C doubles right before the actual operation:

static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...

Answer 2

Because one is exact, while the other is an approximation.

>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625
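
A quick way to see the loss of precision (an illustrative check): converting the float result back to an int does not recover the exact value, because a C double only carries about 15-17 significant decimal digits:

n = 334453647687345435634784453567231654765
exact = n ** 4     # arbitrary-precision int, exact
approx = n ** 4.0  # C double, limited precision
print(int(approx) == exact)  # False: the double cannot hold all 155 digits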
