Question: Which is faster in Python: x**.5 or math.sqrt(x)?
I’ve been wondering about this for some time. As the title says, which is faster: the actual function, or simply raising to the half power?
UPDATE
This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?
I sent Guido van Rossum an email because I really wanted to know the differences between these methods.
My email:
There are at least 3 ways to do a square root in Python: math.sqrt, the
‘**’ operator and pow(x,.5). I’m just curious as to the differences in
the implementation of each of these. When it comes to efficiency which
is better?
His response:
pow and ** are equivalent; math.sqrt doesn’t work for complex numbers,
and links to the C sqrt() function. As to which one is
faster, I have no idea…
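A small sketch of the three spellings from the email, assuming Python 3 (the question predates it). For floats, `pow(x, .5)` and `x ** .5` go through the same `float.__pow__` code and return the same value. The sketch also illustrates Guido's point about complex numbers: a negative base raised to a fractional power is promoted to `complex`, `math.sqrt` raises `ValueError`, and `cmath.sqrt` is the complex-aware alternative.

```python
import cmath
import math

x = 2.0

# pow() and ** share the same float.__pow__ implementation.
via_pow = pow(x, 0.5)
via_op = x ** 0.5
via_sqrt = math.sqrt(x)
print(via_pow, via_op, via_sqrt)

# Negative input: ** promotes the result to complex in Python 3 ...
neg_pow = (-4) ** 0.5
print(neg_pow)  # a complex number very close to 2j

# ... math.sqrt refuses the input ...
neg_sqrt_error = None
try:
    math.sqrt(-4)
except ValueError as exc:
    neg_sqrt_error = str(exc)
    print("math.sqrt(-4) raised ValueError:", exc)

# ... and cmath.sqrt handles it.
neg_cmath = cmath.sqrt(-4)
print(neg_cmath)
```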
Answer 0
math.sqrt(x) is significantly faster than x**0.5.
import math
N = 1000000
%%timeit
for i in range(N):
    z=i**.5
10 loops, best of 3: 156 ms per loop
%%timeit
for i in range(N):
    z=math.sqrt(i)
10 loops, best of 3: 91.1 ms per loop
Using Python 3.6.9 (notebook).
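The %%timeit cell magic above only works in IPython/Jupyter. A rough plain-script equivalent uses the standard-library timeit module. The `best_time` helper, the setup string, and the `number`/`repeat` counts are arbitrary illustrative choices, and absolute numbers will differ by machine and Python version.

```python
import timeit


def best_time(stmt, setup="pass", number=100_000, repeat=3):
    """Fastest of `repeat` trials, in seconds per single execution of `stmt`."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


# The operand lives in a variable created in setup, so each expression is
# evaluated at run time instead of being folded into a constant by the compiler.
SETUP = "import math; from math import sqrt; x = 123.0"

results = {
    "x ** .5": best_time("x ** .5", SETUP),
    "math.sqrt(x)": best_time("math.sqrt(x)", SETUP),
    "sqrt(x)": best_time("sqrt(x)", SETUP),
}

for name, seconds in results.items():
    print(f"{name:>12}: {seconds * 1e9:.1f} ns per evaluation")
```

Taking the minimum over several trials follows the same "best of N" convention that %%timeit and `python -m timeit` report.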
Answer 1
- first rule of optimization: don’t do it
- second rule: don’t do it, yet
Here are some timings (Python 2.5.2, Windows):
$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.445 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.574 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.727 usec per loop
This test shows that x**.5 is slightly faster than sqrt(x).
For Python 3.0 the result is the opposite:
$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.803 usec per loop
$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.695 usec per loop
$ \Python30\python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.761 usec per loop
math.sqrt(x) is always faster than x**.5 on another machine (Ubuntu, Python 2.6 and 3.1):
$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.173 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.115 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.158 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.194 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.123 usec per loop
$ python3.1 -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.157 usec per loop
Answer 2
How many square roots are you really performing? Are you trying to write some 3D graphics engine in Python? If not, then why go with code that is cryptic over code that is easy to read? The time difference would be less than anybody could notice in just about any application I could foresee. I really don’t mean to put down your question, but it seems that you’re going a little too far with premature optimization.
Answer 3
In these micro-benchmarks, math.sqrt will be slower because of the slight time it takes to look up sqrt in the math namespace. You can improve it slightly with
from math import sqrt
Even then, running a few variations through timeit shows a slight (4-5%) performance advantage for x**.5.
Interestingly, doing
import math
sqrt = math.sqrt
sped it up even more, to within 1% difference in speed, with very little statistical significance.
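The extra lookup described above can be seen in CPython's bytecode with the standard dis module. This sketch assumes CPython. Opcode names vary between versions, so it checks both LOAD_ATTR and LOAD_METHOD. It only shows *that* `math.sqrt(x)` performs an attribute lookup on every call, not how much that costs. The function names are hypothetical.

```python
import dis
import math
from math import sqrt


def with_module(x):
    # Global lookup of `math`, then an attribute lookup of `sqrt` on it.
    return math.sqrt(x)


def with_name(x):
    # A single global lookup of `sqrt`; no attribute access.
    return sqrt(x)


def attribute_loads(func):
    """Attribute names that func's bytecode fetches (LOAD_ATTR or LOAD_METHOD)."""
    return [
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in ("LOAD_ATTR", "LOAD_METHOD")
    ]


print(attribute_loads(with_module))  # contains 'sqrt'
print(attribute_loads(with_name))    # []
```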
I will repeat Kibbee, and say that this is probably a premature optimization.
Answer 4
In Python 2.6 the (float).__pow__() function uses the C pow() function, and the math.sqrt() function uses the C sqrt() function.
In glibc, the implementation of pow(x,y) is quite complex and well optimized for various exceptional cases. For example, calling C pow(x,0.5) simply calls the sqrt() function.
The difference in speed between ** and math.sqrt is caused by the wrappers used around the C functions, and the speed strongly depends on the optimization flags/C compiler used on the system.
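One visible consequence of those wrappers, assuming CPython 3: float.__pow__ (used by both ** and pow()) handles several special cases itself before it reaches the C library. For a couple of inputs it therefore does not return what C sqrt() would. This sketch checks two of them. It cannot show whether a given libm's pow() forwards to sqrt() internally.

```python
import math

# 1. Negative zero: ** returns +0.0, while math.sqrt keeps the sign (IEEE 754 sqrt).
pow_negzero = (-0.0) ** 0.5
sqrt_negzero = math.sqrt(-0.0)
print(math.copysign(1.0, pow_negzero), math.copysign(1.0, sqrt_negzero))  # 1.0 -1.0

# 2. Negative infinity: ** returns +inf, while math.sqrt raises ValueError.
pow_neginf = float("-inf") ** 0.5
try:
    math.sqrt(float("-inf"))
    sqrt_neginf_raises = False
except ValueError:
    sqrt_neginf_raises = True
print(pow_neginf, sqrt_neginf_raises)  # inf True
```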
Edit:
Here are the results of Claudiu’s algorithm on my machine. I got different results:
zoltan@host:~$ python2.4 p.py
Took 0.173994 seconds
Took 0.158991 seconds
zoltan@host:~$ python2.5 p.py
Took 0.182321 seconds
Took 0.155394 seconds
zoltan@host:~$ python2.6 p.py
Took 0.166766 seconds
Took 0.097018 seconds
Answer 5
For what it’s worth (see Jim’s answer), on my machine, running Python 2.5:
PS C:\> python -m timeit -n 100000 10000**.5
100000 loops, best of 3: 0.0543 usec per loop
PS C:\> python -m timeit -n 100000 -s "import math" math.sqrt(10000)
100000 loops, best of 3: 0.162 usec per loop
PS C:\> python -m timeit -n 100000 -s "from math import sqrt" sqrt(10000)
100000 loops, best of 3: 0.0541 usec per loop
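One factor worth checking, assuming current CPython (not verified here for Python 2.5): in the first command the operand is a literal. The compiler can then fold `10000**.5` into a constant, so the timing loop mostly measures loading that constant. `math.sqrt(10000)` still performs a call on every iteration. The sketch compiles each statement and looks for the precomputed result among the code object's constants.

```python
# Compile each timeit statement and inspect the constants stored in the code object.
literal = compile("10000**.5", "<stmt>", "eval")
call = compile("math.sqrt(10000)", "<stmt>", "eval")


def has_result_constant(code):
    """True if the square root (100.0) is already stored as a float constant."""
    return any(isinstance(c, float) and abs(c - 100.0) < 1e-9 for c in code.co_consts)


print("10000**.5 constants:", literal.co_consts)
print("math.sqrt(10000) constants:", call.co_consts)
print("power folded at compile time:", has_result_constant(literal))
print("sqrt call folded at compile time:", has_result_constant(call))
```

Putting the operand in a variable created with `-s` (for example `-s "x = 10000"` and timing `x**.5`) keeps the computation at run time.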
Answer 6
Using Claudiu’s code, on my machine x**.5 is faster even with “from math import sqrt”, but with psyco.full() sqrt(x) becomes much faster, by at least 200%.
Answer 7
Most likely math.sqrt(x), because it’s optimized for square rooting.
Benchmarks will provide you the answer you are looking for.
Answer 8
Someone commented about the “fast Newton-Raphson square root” from Quake 3… I implemented it with ctypes, but it’s super slow in comparison to the native versions. I’m going to try a few optimizations and alternate implementations.
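from ctypes import c_float, c_int32, byref, POINTER, cast
def sqrt(num):
    # Quake-style fast inverse square root; multiplying by num at the end gives sqrt(num).
    xhalf = 0.5*num
    x = c_float(num)
    # Reinterpret the 4-byte float as a 4-byte integer (c_long may be 8 bytes).
    i = cast(byref(x), POINTER(c_int32)).contents.value
    i = c_int32(0x5f375a86 - (i>>1))
    x = cast(byref(i), POINTER(c_float)).contents.value
    # Two Newton-Raphson refinements of 1/sqrt(num)
    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num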
Here’s another method using struct. It comes out about 3.6x faster than the ctypes version, but it still runs at about 1/10 the speed of C.
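from struct import pack, unpack
def sqrt_struct(num):
    xhalf = 0.5*num
    # Reinterpret the 32-bit float bits of num as an unsigned 32-bit int
    # ('=' selects standard sizes, so 'f' and 'I' are both 4 bytes).
    i = unpack('=I', pack('=f', num))[0]
    i = 0x5f375a86 - (i>>1)
    x = unpack('=f', pack('=I', i))[0]
    # Two Newton-Raphson refinements of 1/sqrt(num)
    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num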
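The two `x = x*(1.5-xhalf*x*x)` lines in both versions are Newton-Raphson steps. They refine an estimate $y \approx 1/\sqrt{a}$ (with $a$ = `num`), not $\sqrt{a}$ itself, which is why each function ends with `return x * num`. The bit manipulation with `0x5f375a86` only supplies the starting guess. A short derivation of the update:

```latex
\begin{aligned}
f(y) &= \frac{1}{y^{2}} - a, \qquad f'(y) = -\frac{2}{y^{3}} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^{3}}{2}\left(\frac{1}{y_n^{2}} - a\right)
         = y_n\left(\frac{3}{2} - \frac{a}{2}\,y_n^{2}\right) \\
\sqrt{a} &= a \cdot \frac{1}{\sqrt{a}} \approx a\,y_n
\end{aligned}
```

With `xhalf` $= a/2$, the last form of $y_{n+1}$ is exactly `x*(1.5-xhalf*x*x)`.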
Answer 9
Claudiu’s results differ from mine. I’m using Python 2.6 on Ubuntu on an old 2.4 GHz P4 machine. Here are my results:
>>> timeit1()
Took 0.564911 seconds
>>> timeit2()
Took 0.403087 seconds
>>> timeit1()
Took 0.604713 seconds
>>> timeit2()
Took 0.387749 seconds
>>> timeit1()
Took 0.587829 seconds
>>> timeit2()
Took 0.379381 seconds
sqrt is consistently faster for me… Even Codepad.org NOW seems to agree that sqrt, in the local context, is faster (http://codepad.org/6trzcM3j). Codepad seems to be running Python 2.5 presently. Perhaps they were using 2.4 or older when Claudiu first answered?
In fact, even using math.sqrt(i) in place of arg(i), I still get better times for sqrt. In this case timeit2() took between 0.53 and 0.55 seconds on my machine, which is still better than the 0.56-0.60 figures from timeit1.
I’d say: on modern Python, use math.sqrt and definitely bring it into the local context, either with somevar = math.sqrt or with from math import sqrt.
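A sketch of the "local context" idea, assuming CPython. A function parameter (the default-argument trick behind `arg(i)`) or a local alias is read with a fast local-variable load. A module-level name needs a LOAD_GLOBAL dictionary lookup each time it is used. The function names below are hypothetical.

```python
import dis
import math
from math import sqrt


def uses_global(values):
    total = 0.0
    for v in values:
        total += sqrt(v)          # module-level name: LOAD_GLOBAL on every iteration
    return total


def uses_local(values, sqrt=math.sqrt):
    total = 0.0
    for v in values:
        total += sqrt(v)          # parameter: a fast local load
    return total


def uses_local_alias(values):
    local_sqrt = math.sqrt        # one global + attribute lookup, outside the loop
    total = 0.0
    for v in values:
        total += local_sqrt(v)
    return total


def global_loads(func):
    """Names that func's bytecode fetches with LOAD_GLOBAL."""
    return {ins.argval for ins in dis.get_instructions(func) if ins.opname == "LOAD_GLOBAL"}


print(global_loads(uses_global))       # {'sqrt'}
print(global_loads(uses_local))        # set()
print(global_loads(uses_local_alias))  # {'math'}
```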
Answer 10
The Pythonic thing to optimize for is readability. For this I think explicit use of the sqrt function is best. Having said that, let’s investigate performance anyway.
I updated Claudiu’s code for Python 3 and also made it impossible to optimize away the calculations (something a good Python compiler may do in the future):
from sys import version
from time import time
from math import sqrt, pi, e
print(version)
N = 1_000_000
def timeit1():
    z = N * e
    s = time()
    for n in range(N):
        z += (n * pi) ** .5 - z ** .5
    print(f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit2():
    z = N * e
    s = time()
    for n in range(N):
        z += sqrt(n * pi) - sqrt(z)
    print(f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit3(arg=sqrt):
    z = N * e
    s = time()
    for n in range(N):
        z += arg(n * pi) - arg(z)
    print(f"Took {(time() - s):.4f} seconds to calculate {z}")
timeit1()
timeit2()
timeit3()
Results vary, but a sample output is:
3.6.6 (default, Jul 19 2018, 14:25:17)
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166
Try it yourself.
Answer 11
The problem SQRMINSUM I’ve solved recently requires computing square root repeatedly on a large dataset. The oldest 2 submissions in my history, before I’ve made other optimizations, differ solely by replacing **0.5 with sqrt(), thus reducing the runtime from 3.74s to 0.51s in PyPy. This is almost twice the already massive 400% improvement that Claudiu measured.
Answer 12
Of course, if you are dealing with literals and need a constant value, CPython can compute the value at compile time when it is written with operators, so there is no need to profile each version in this case:
In [77]: dis.dis(a)
2 0 LOAD_CONST 1 (1.4142135623730951)
2 RETURN_VALUE
In [78]: def a():
...: return 2 ** 0.5
...:
In [79]: import dis
In [80]: dis.dis(a)
2 0 LOAD_CONST 1 (1.4142135623730951)
2 RETURN_VALUE
Answer 13
What would be even faster is if you went into math.py and copied the function “sqrt” into your program. It takes time for your program to find math.py, then open it, find the function you are looking for, and then bring that back to your program. If that function is faster even with the “lookup” steps, then the function itself has to be awfully fast. It will probably cut your time in half. In summary:
- Go to math.py
- Find the function “sqrt”
- Copy it
- Paste function into your program as the sqrt finder.
- Time it.
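Before following these steps, it may be worth checking what `math.sqrt` actually is in your interpreter. In CPython, `math` is normally a C extension module or is compiled into the interpreter, so there may be no `math.py` and no Python-level `sqrt` definition to copy. The sketch below assumes CPython and reports what it finds.

```python
import inspect
import math
import types

# Is math.sqrt a Python function (with source to copy) or compiled code?
print(type(math.sqrt))
is_builtin = isinstance(math.sqrt, types.BuiltinFunctionType)
print("built-in (compiled) function:", is_builtin)

# Extension modules report a shared-library path (.so/.pyd);
# modules compiled into the interpreter have no __file__ at all.
math_file = getattr(math, "__file__", None)
print("math module file:", math_file or "none (built into the interpreter)")

# inspect.getsource raises TypeError for functions without Python source.
try:
    inspect.getsource(math.sqrt)
    has_python_source = True
except TypeError:
    has_python_source = False
print("Python source available:", has_python_source)
```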