Which is faster in Python: x**.5 or math.sqrt(x)?

Question: Which is faster in Python: x**.5 or math.sqrt(x)?

I’ve been wondering about this for some time. As the title says, which is faster: the actual function, or simply raising to the half power?

UPDATE

This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?

I sent Guido van Rossum an email because I really wanted to know the differences between these methods.

My email:

There are at least 3 ways to do a square root in Python: math.sqrt, the ‘**’ operator and pow(x,.5). I’m just curious as to the differences in the implementation of each of these. When it comes to efficiency which is better?

His response:

pow and ** are equivalent; math.sqrt doesn’t work for complex numbers, and links to the C sqrt() function. As to which one is faster, I have no idea…
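
To make that response concrete, here is a minimal sketch (assuming Python 3; the helper name `roots` is made up for illustration) that compares the three spellings on a positive input and then on a negative one, which is where the complex-number difference shows up.

```python
import cmath
import math


def roots(x):
    """Return the square root of x computed three ways (hypothetical helper)."""
    return x ** 0.5, pow(x, 0.5), math.sqrt(x)


# For a positive float, all three agree to within floating-point rounding.
by_operator, by_pow, by_sqrt = roots(2.0)
print(by_operator, by_pow, by_sqrt)

# In Python 3, ** and pow() return a complex result for a negative base...
print((-4.0) ** 0.5)

# ...while math.sqrt raises ValueError; cmath.sqrt is the complex-aware variant.
try:
    math.sqrt(-4.0)
except ValueError as exc:
    print("math.sqrt(-4.0) failed:", exc)
print(cmath.sqrt(-4.0))
```

If complex results are expected, cmath.sqrt is probably the clearer choice than relying on the operator.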


Answer 0

math.sqrt(x) is significantly faster than x**0.5.

import math
N = 1000000
%%timeit
for i in range(N):
    z=i**.5

10 loops, best of 3: 156 ms per loop

%%timeit
for i in range(N):
    z=math.sqrt(i)

10 loops, best of 3: 91.1 ms per loop

Using Python 3.6.9 (notebook).


Answer 1

  • first rule of optimization: don’t do it
  • second rule: don’t do it, yet

Here are some timings (Python 2.5.2, Windows):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.445 usec per loop

$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.574 usec per loop

$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.727 usec per loop

This test shows that x**.5 is slightly faster than sqrt(x).

For Python 3.0, the result is the opposite:

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.803 usec per loop

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.695 usec per loop

$ \Python30\python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.761 usec per loop

math.sqrt(x) is always faster than x**.5 on another machine (Ubuntu, Python 2.6 and 3.1):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.173 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.115 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.158 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.194 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.123 usec per loop
$ python3.1 -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.157 usec per loop
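
The same three measurements can also be run from a script rather than the shell. This is only a sketch using the standard timeit module; the `VARIANTS` table and the `best_times` helper are names invented here, and the numbers it prints will differ by machine and Python version.

```python
import timeit

# Each entry: label -> (setup, statement), mirroring the commands above.
VARIANTS = {
    "x**.5": ("x = 123", "x**.5"),
    "sqrt(x)": ("from math import sqrt; x = 123", "sqrt(x)"),
    "math.sqrt(x)": ("import math; x = 123", "math.sqrt(x)"),
}


def best_times(number=1_000_000, repeat=3):
    """Return {label: best seconds per call} for each variant."""
    results = {}
    for label, (setup, stmt) in VARIANTS.items():
        runs = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
        # Take the fastest run, as the timeit command line does.
        results[label] = min(runs) / number
    return results


if __name__ == "__main__":
    for label, seconds in best_times().items():
        print(f"{label:>14}: {seconds * 1e6:.3f} usec per loop")
```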

Answer 2

How many square roots are you really performing? Are you trying to write some 3D graphics engine in Python? If not, then why go with code which is cryptic over code that is easy to read? The time difference would be less than anybody could notice in just about any application I could foresee. I really don’t mean to put down your question, but it seems that you’re going a little too far with premature optimization.


Answer 3

In these micro-benchmarks, math.sqrt will be slower, because of the slight time it takes to look up sqrt in the math namespace. You can improve it slightly with

 from math import sqrt

Even then, though, running a few variations through timeit shows a slight (4-5%) performance advantage for x**.5

Interestingly, doing

 import math
 sqrt = math.sqrt

sped it up even more, to within 1% difference in speed, with very little statistical significance.
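
One way to see the extra lookup is to compare the bytecode of the two call styles. The following is a rough sketch assuming CPython; the function names `via_module`, `via_name` and `opnames` are hypothetical, and the exact instruction names vary between versions.

```python
import dis
import math
from math import sqrt


def via_module(x):
    # Looks up the name "math", then the attribute "sqrt" on it.
    return math.sqrt(x)


def via_name(x):
    # Looks up the name "sqrt" directly.
    return sqrt(x)


def opnames(func):
    """List the bytecode instruction names for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Depending on the CPython version the attribute access shows up as
# LOAD_ATTR or LOAD_METHOD; the bare-name call has neither.
print(opnames(via_module))
print(opnames(via_name))
```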


I will repeat Kibbee, and say that this is probably a premature optimization.


Answer 4

In Python 2.6, the (float).__pow__() function uses the C pow() function, and the math.sqrt() function uses the C sqrt() function.

In glibc, the implementation of pow(x,y) is quite complex and is well optimized for various special cases. For example, calling C pow(x,0.5) simply calls the sqrt() function.

The difference in speed between ** and math.sqrt is caused by the wrappers around the C functions, and it depends strongly on the optimization flags and C compiler used on the system.

Edit:

Here are the results of Claudiu’s algorithm on my machine. I got different results:

zoltan@host:~$ python2.4 p.py 
Took 0.173994 seconds
Took 0.158991 seconds
zoltan@host:~$ python2.5 p.py 
Took 0.182321 seconds
Took 0.155394 seconds
zoltan@host:~$ python2.6 p.py 
Took 0.166766 seconds
Took 0.097018 seconds

Answer 5

For what it’s worth (see Jim’s answer). On my machine, running python 2.5:

PS C:\> python -m timeit -n 100000 10000**.5
100000 loops, best of 3: 0.0543 usec per loop
PS C:\> python -m timeit -n 100000 -s "import math" math.sqrt(10000)
100000 loops, best of 3: 0.162 usec per loop
PS C:\> python -m timeit -n 100000 -s "from math import sqrt" sqrt(10000)
100000 loops, best of 3: 0.0541 usec per loop

Answer 6

Using Claudiu’s code, on my machine x**.5 is faster even with “from math import sqrt”, but with psyco.full(), sqrt(x) becomes much faster, by at least 200%.


Answer 7

Most likely math.sqrt(x), because it’s optimized for square rooting.

Benchmarks will provide you the answer you are looking for.


Answer 8

Someone commented about the “fast Newton-Raphson square root” from Quake 3… I implemented it with ctypes, but it’s super slow in comparison to the native versions. I’m going to try a few optimizations and alternate implementations.

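from ctypes import c_float, c_int32, byref, POINTER, cast

def sqrt(num):
    xhalf = 0.5*num
    x = c_float(num)
    # Reinterpret the float's 32 bits as an integer; c_int32 is used
    # because c_long is 64 bits wide on some platforms.
    i = cast(byref(x), POINTER(c_int32)).contents.value
    i = c_int32(0x5f375a86 - (i>>1))
    x = cast(byref(i), POINTER(c_float)).contents.value

    # Two Newton-Raphson steps refine the inverse square root estimate.
    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num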

Here’s another method using struct, which comes out about 3.6x faster than the ctypes version, but is still 1/10 the speed of C.

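from struct import pack, unpack

def sqrt_struct(num):
    xhalf = 0.5*num
    # Pack the input itself (not a fixed test value); 'I' is a 4-byte
    # unsigned int, matching the 4-byte float 'f'.
    i = unpack('I', pack('f', num))[0]
    i = 0x5f375a86 - (i>>1)
    x = unpack('f', pack('I', i))[0]

    # Two Newton-Raphson steps refine the inverse square root estimate.
    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num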
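
As a sanity check on the approach, here is a self-contained sketch that measures how close the bit-trick result gets to math.sqrt. It restates the struct variant under a hypothetical name, `approx_sqrt`, with explicit little-endian 4-byte formats as an assumption of this example, and it only makes sense for positive inputs.

```python
import math
from struct import pack, unpack


def approx_sqrt(num):
    """Approximate sqrt(num) for num > 0 via the bit-level inverse-sqrt trick."""
    xhalf = 0.5 * num
    # Reinterpret the 4 bytes of a float32 as an unsigned 32-bit integer.
    i = unpack("<I", pack("<f", num))[0]
    i = 0x5F375A86 - (i >> 1)
    x = unpack("<f", pack("<I", i))[0]  # rough estimate of 1/sqrt(num)
    # Two Newton-Raphson refinements of the inverse square root.
    x = x * (1.5 - xhalf * x * x)
    x = x * (1.5 - xhalf * x * x)
    return x * num


def relative_error(num):
    """Relative difference between approx_sqrt and math.sqrt."""
    exact = math.sqrt(num)
    return abs(approx_sqrt(num) - exact) / exact


for value in (0.25, 2.0, 28.0, 12345.678):
    print(value, approx_sqrt(value), relative_error(value))
```

The result is an approximation, so beyond being slower in pure Python it is also less accurate than math.sqrt.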

Answer 9

Claudiu’s results differ from mine. I’m using Python 2.6 on Ubuntu on an old P4 2.4 GHz machine… Here are my results:

>>> timeit1()
Took 0.564911 seconds
>>> timeit2()
Took 0.403087 seconds
>>> timeit1()
Took 0.604713 seconds
>>> timeit2()
Took 0.387749 seconds
>>> timeit1()
Took 0.587829 seconds
>>> timeit2()
Took 0.379381 seconds

sqrt is consistently faster for me… Even Codepad.org NOW seems to agree that sqrt, in the local context, is faster (http://codepad.org/6trzcM3j). Codepad seems to be running Python 2.5 presently. Perhaps they were using 2.4 or older when Claudiu first answered?

In fact, even using math.sqrt(i) in place of arg(i), I still get better times for sqrt. In this case timeit2() took between 0.53 and 0.55 seconds on my machine, which is still better than the 0.56-0.60 figures from timeit1.

I’d say, on modern Python, use math.sqrt and definitely bring it to local context, either with somevar=math.sqrt or with from math import sqrt.


Answer 10

The Pythonic thing to optimize for is readability. For this I think explicit use of the sqrt function is best. Having said that, let’s investigate performance anyway.

I updated Claudiu’s code for Python 3 and also made it impossible to optimize away the calculations (something a good Python compiler may do in the future):

from sys import version
from time import time
from math import sqrt, pi, e

print(version)

N = 1_000_000

def timeit1():
  z = N * e
  s = time()
  for n in range(N):
    z += (n * pi) ** .5 - z ** .5
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit2():
  z = N * e
  s = time()
  for n in range(N):
    z += sqrt(n * pi) - sqrt(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit3(arg=sqrt):
  z = N * e
  s = time()
  for n in range(N):
    z += arg(n * pi) - arg(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

timeit1()
timeit2()
timeit3()

Results vary, but a sample output is:

3.6.6 (default, Jul 19 2018, 14:25:17) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166

Try it yourself.


Answer 11

The problem SQRMINSUM, which I solved recently, requires computing square roots repeatedly on a large dataset. The oldest 2 submissions in my history, from before I made other optimizations, differ solely by replacing **0.5 with sqrt(), which reduced the runtime from 3.74s to 0.51s in PyPy. This is almost twice the already massive 400% improvement that Claudiu measured.


Answer 12

Of course, if one is dealing with literals and needs a constant value, the Python runtime can pre-calculate the value at compile time when it is written with operators, so there is no need to profile each version in this case:

In [78]: def a(): 
    ...:     return 2 ** 0.5 
    ...:                                                                                                                                  

In [79]: import dis                                                                                                                       

In [80]: dis.dis(a)                                                                                                                       
  2           0 LOAD_CONST               1 (1.4142135623730951)
              2 RETURN_VALUE
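
The same effect can be checked without reading disassembly, by looking at a function's constants table. This is a sketch assuming CPython, whose compiler performs this folding; `folded`, `not_folded` and `has_root_two` are names made up for the example.

```python
import math


def folded():
    return 2 ** 0.5


def not_folded():
    return math.sqrt(2)


def has_root_two(func):
    """True if a float close to sqrt(2) is stored among func's constants."""
    return any(
        isinstance(c, float) and math.isclose(c, 1.4142135623730951)
        for c in func.__code__.co_consts
    )


# The operator form is computed once at compile time and stored as a constant...
print(has_root_two(folded))
# ...whereas the function call is left to run time, so the result is not stored.
print(has_root_two(not_folded))
```

This only applies to literals; with a variable operand, both forms are evaluated on every call.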


Answer 13

What would be even faster is if you went into math.py and copied the function “sqrt” into your program. It takes time for your program to find math.py, then open it, find the function you are looking for, and then bring that back to your program. If that function is faster even with the “lookup” steps, then the function itself has to be awfully fast. It will probably cut your time in half. In summary:

  1. Go to math.py
  2. Find the function “sqrt”
  3. Copy it
  4. Paste function into your program as the sqrt finder.
  5. Time it.