Why shouldn’t I use PyPy over CPython if PyPy is 6.3 times faster?

Question: Why shouldn’t I use PyPy over CPython if PyPy is 6.3 times faster?

I’ve been hearing a lot about the PyPy project. Its site claims that it is 6.3 times faster than the CPython interpreter.

Whenever we talk about dynamic languages like Python, speed is one of the top issues. To solve this, they say PyPy is 6.3 times faster.

The second issue is parallelism, the infamous Global Interpreter Lock (GIL). For this, PyPy says it can give GIL-less Python.

If PyPy can solve these great challenges, what are its weaknesses that are preventing wider adoption? That is to say, what’s preventing someone like me, a typical Python developer, from switching to PyPy right now?


Answer 0

NOTE: PyPy is more mature and better supported now than it was in 2013, when this question was asked. Avoid drawing conclusions from out-of-date information.


  1. PyPy, as others have been quick to mention, has tenuous support for C extensions. It has support, but typically at slower-than-Python speeds and it’s iffy at best. Hence a lot of modules simply require CPython. PyPy now supports numpy. Some extensions are still not supported (Pandas, SciPy, etc.), so take a look at the list of supported packages before making the change.
  2. Python 3 support has just reached stable! As of 20th June 2014, PyPy3 2.3.1 – Fulcrum is out!
  3. PyPy sometimes isn’t actually faster for “scripts”, which a lot of people use Python for. These are the short-running programs that do something simple and small. Because PyPy is a JIT compiler, its main advantages come from long run times and simple types (such as numbers). Frankly, PyPy’s pre-JIT speeds are pretty bad compared to CPython.
  4. Inertia. Moving to PyPy often requires retooling, which for some people and organizations is simply too much work.

Those are the main reasons that affect me, I’d say.
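
To make point 3 a bit more concrete, here is a rough sketch of how one might observe warm-up in a single process. The names work and time_batches and the loop sizes are made up for illustration, and it assumes a Python 3 interpreter. On a JIT such as PyPy the first batch would be expected to be slower than the later ones, while on CPython the batches should look roughly alike, but the actual numbers depend entirely on your machine and interpreter version.

```python
import platform
import time


def work(n):
    """A small numeric loop, the kind of code a tracing JIT tends to handle well."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_batches(batches=5, n=200_000):
    """Run the same workload several times in one process and time each run."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Report which interpreter is running so results can be compared side by side.
    print("implementation:", platform.python_implementation())
    for index, seconds in enumerate(time_batches()):
        print(f"batch {index}: {seconds * 1000:.2f} ms")
```

Running the same file under both interpreters gives a feel for whether a short script ever runs long enough for the JIT to pay off.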


Answer 1

That site does not claim PyPy is 6.3 times faster than CPython. To quote:

The geometric average of all benchmarks is 0.16 or 6.3 times faster than CPython

This is a very different statement to the blanket statement you made, and when you understand the difference, you’ll understand at least one set of reasons why you can’t just say “use PyPy”. It might sound like I’m nit-picking, but understanding why these two statements are totally different is vital.

To break that down:

  • The statement they make only applies to the benchmarks they’ve used. It says absolutely nothing about your program (unless your program is exactly the same as one of their benchmarks).

  • The statement is about an average of a group of benchmarks. There is no claim that running PyPy will give a 6.3 times improvement even for the programs they have tested.

  • There is no claim that PyPy will even run all the programs that CPython runs at all, let alone faster.
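
A small sketch of why an average over benchmarks says little about any single program. The ratios below are invented for illustration (they are not PyPy's published results), and I am assuming the quoted 0.16 is a normalized run time whose reciprocal gives roughly the quoted 6.3.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid overflow on long lists."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized run times (PyPy time / CPython time).
# These numbers are made up; values above 1.0 mean PyPy is slower.
ratios = {
    "bench_a": 0.02,
    "bench_b": 0.10,
    "bench_c": 0.50,
    "bench_d": 1.50,
}

if __name__ == "__main__":
    mean_ratio = geometric_mean(ratios.values())
    print(f"geometric mean ratio: {mean_ratio:.2f}")
    print(f"average speedup: {1 / mean_ratio:.1f}x")
    for name, ratio in ratios.items():
        # An individual benchmark can sit far from the mean, on either side.
        print(f"{name}: {1 / ratio:.1f}x")
```

Even with a healthy-looking mean, one of the made-up benchmarks is slower under the faster interpreter, which is exactly the situation the average hides.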


Answer 2

Because PyPy is not 100% compatible, takes 8 GB of RAM to compile, is a moving target, and is highly experimental, whereas CPython is stable, has been the default target for module builders for two decades (including C extensions that don’t work on PyPy), and is already widely deployed.

PyPy will likely never be the reference implementation, but it is a good tool to have.


Answer 3

The second question is easier to answer: you basically can use PyPy as a drop-in replacement if all your code is pure Python. However, many widely used libraries (including some of the standard library) are written in C and compiled as Python extensions. Some of these can be made to work with PyPy, some can’t. PyPy provides the same “forward-facing” tool as Python — that is, it is Python — but its innards are different, so tools that interface with those innards won’t work.
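
One low-effort way to find out whether your own dependencies fall on the "works" or "doesn't work" side is to try importing them under the interpreter you are considering. This is only a sketch: check_imports is a name I made up, the module list is a placeholder for your project's real dependencies, and a successful import does not prove that a library behaves correctly or quickly.

```python
import importlib
import platform


def check_imports(module_names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Broken C extensions can fail with errors other than ImportError,
            # so any exception during import counts as "not usable here".
            results[name] = False
    return results


if __name__ == "__main__":
    # Placeholder list; replace it with the packages your project depends on.
    candidates = ["json", "sqlite3", "numpy"]
    print("implementation:", platform.python_implementation())
    for name, ok in check_imports(candidates).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```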

As for the first question, I imagine it is sort of a Catch-22 with the first: PyPy has been evolving rapidly in an effort to improve speed and enhance interoperability with other code. This has made it more experimental than official.

I think it’s possible that if PyPy gets into a stable state, it may start getting more widely used. I also think it would be great for Python to move away from its C underpinnings. But it won’t happen for a while. PyPy hasn’t yet reached the critical mass where it is almost useful enough on its own to do everything you’d want, which would motivate people to fill in the gaps.


Answer 4

I did a small benchmark on this topic. While many of the other posters have made good points about compatibility, my experience has been that PyPy isn’t that much faster for just moving around bits. For many uses of Python, it really only exists to translate bits between two or more services. For example, not many web applications are performing CPU intensive analysis of datasets. Instead, they take some bytes from a client, store them in some sort of database, and later return them to other clients. Sometimes the format of the data is changed.

The BDFL and the CPython developers are a remarkably intelligent group of people and have managed to help CPython perform excellently in such a scenario. Here’s a shameless blog plug: http://www.hydrogen18.com/blog/unpickling-buffers.html. I’m using Stackless, which is derived from CPython and retains the full C module interface. I didn’t find any advantage to using PyPy in that case.


Answer 5

Q: If PyPy can solve these great challenges (speed, memory consumption, parallelism) in comparison to CPython, what are its weaknesses that are preventing wider adoption?

A: First, there is little evidence that the PyPy team can solve the speed problem in general. Long-term evidence shows that PyPy runs certain Python code slower than CPython, and this drawback seems to be rooted very deeply in PyPy.

Secondly, the current version of PyPy consumes much more memory than CPython in a rather large set of cases. So PyPy has not yet solved the memory consumption problem.

Whether PyPy solves the mentioned great challenges and will in general be faster, less memory hungry, and more friendly to parallelism than CPython is an open question that cannot be solved in the short term. Some people are betting that PyPy will never be able to offer a general solution enabling it to dominate CPython 2.7 and 3.3 in all cases.

If PyPy succeeds in being better than CPython in general, which is questionable, the main weakness affecting its wider adoption will be its compatibility with CPython. There are also issues such as CPython running on a wider range of CPUs and OSes, but these are much less important than PyPy’s performance and CPython-compatibility goals.


Q: Why can’t I use PyPy as a drop-in replacement for CPython now?

A: PyPy isn’t 100% compatible with CPython because it isn’t simulating CPython under the hood. Some programs may still depend on features unique to CPython that are absent in PyPy, such as C bindings, C implementations of Python objects and methods, or the incremental nature of CPython’s garbage collector.


Answer 6

CPython has reference counting and garbage collection, PyPy has garbage collection only.

So objects tend to be deleted earlier and __del__ is called in a more predictable way in CPython. Some software relies on this behavior and is therefore not ready to migrate to PyPy.

Some other software works with both, but uses less memory with CPython, because unused objects are freed earlier. (I don’t have any measurements to indicate how significant this is and what other implementation details affect the memory use.)
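
Here is a sketch of the kind of code that depends on this, and of the usual portable alternative. The Resource class and both helper functions are invented for illustration. With reference counting, the finalizer of a temporary object runs as soon as the last reference goes away, while a collector without reference counts may only run it at some later collection, so code that wants to work on both should release resources explicitly.

```python
import gc


class Resource:
    """Toy resource that records when it was released."""

    closed_names = []

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # Guard against double release (close() followed later by __del__).
        if not self.closed:
            self.closed = True
            Resource.closed_names.append(self.name)

    def __del__(self):
        # Relying on this alone assumes the object is finalized promptly.
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def implicit_cleanup():
    """Create and drop a resource, reporting whether __del__ ran immediately."""
    Resource("implicit")
    ran_immediately = "implicit" in Resource.closed_names
    gc.collect()  # ask the collector to run so the finalizer gets its chance
    return ran_immediately


def explicit_cleanup():
    """Release the resource deterministically, independent of the collector."""
    with Resource("explicit"):
        pass
    return "explicit" in Resource.closed_names


if __name__ == "__main__":
    print("finalizer ran immediately:", implicit_cleanup())
    print("context manager released it:", explicit_cleanup())
```

The with block behaves the same regardless of how the interpreter manages memory, which is why it is the safer habit for files, sockets, and locks.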


Answer 7

For a lot of projects, there is actually 0% difference between the different Pythons in terms of speed. Those are the projects that are dominated by engineering time and where all Pythons have the same amount of library support.


Answer 8

To make this simple: PyPy provides the speed that CPython lacks, but sacrifices compatibility. Most people, however, choose Python for its flexibility and its “batteries included” nature (high compatibility), not for its speed (though speed is still welcome).


Answer 9

I’ve found examples where PyPy is slower than CPython, but only on Windows.

C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop

C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop

So if you are considering PyPy, forget Windows. On Linux, you can achieve impressive speedups. Example (list all primes between 1 and 1,000,000):

from sympy import sieve
primes = list(sieve.primerange(1, 10**6))

This runs 10(!) times faster on PyPy than on CPython, but not on Windows, where it is only 3x as fast.


Answer 10

PyPy has had Python 3 support for a while, but according to this HackerNoon post by Anthony Shaw from April 2nd, 2018, PyPy3 is still several times slower than PyPy (Python 2).

For many scientific calculations, particularly matrix calculations, numpy is a better choice (see FAQ: Should I install numpy or numpypy?).

PyPy does not support gmpy2. You can instead make use of gmpy_cffi, though I haven’t tested its speed and the project had one release in 2014.

For Project Euler problems, I make frequent use of PyPy, and for simple numerical calculations often from __future__ import division is sufficient for my purposes, but Python 3 support is still being worked on as of 2018, with your best bet being on 64-bit Linux. Windows PyPy3.5 v6.0, the latest as of December 2018, is in beta.


Answer 11

Supported Python Versions

To cite the Zen of Python:

Readability counts.

For example, Python 3.7 introduced dataclasses and Python 3.8 introduced the = specifier for f-strings.

There might be other features in Python 3.7 and Python 3.8 which are more important to you. The point is that PyPy does not support Python 3.7 or Python 3.8 at the moment.
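
For anyone who hasn’t used the two features named above, a minimal sketch follows. The Point class is just an illustration, and the snippet needs an interpreter that implements at least Python 3.8.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """The decorator generates __init__, __repr__, and __eq__ from the fields."""

    x: int
    y: int


point = Point(1, 2)
print(point)           # Point(x=1, y=2)
print(f"{point.x=}")   # point.x=1
```

On an interpreter that only implements an older language version, the f-string line is a syntax error, so the file will not even start.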