PyPy —如何击败CPython?

问题:PyPy —如何击败CPython?

来自Google开源博客

PyPy是Python中Python的重新实现,它使用先进的技术来尝试获得比CPython更好的性能。多年的努力终于有了回报。我们的速度结果通常要比CPython好,从稍慢一些到实际应用程序代码的速度提高2倍,再到小型基准测试的速度提高10倍。

这怎么可能?哪个Python实现用于实现PyPy?CPython的?PyPyPy或PyPyPyPy击败他们的分数的机会是什么?

(在相关说明中……为什么有人会尝试这样的方法?)

From the Google Open Source Blog:

PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.

How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?

(On a related note… why would anyone try something like this?)


回答 0

Q1。这怎么可能?

在某些情况下,手动内存管理(这是CPython对其计数的方式)可能比自动管理要慢。

CPython解释器实现的局限性排除了PyPy可以进行的某些优化(例如,细粒度的锁)。

正如Marcelo所述,JIT。能够即时确认对象的类型可以节省您进行多个指针取消引用的操作,以最终到达您要调用的方法。

Q2。哪个Python实现用于实现PyPy?

PyPy解释器在RPython中实现,RPython是Python的静态类型子集(该语言而不是CPython解释器)。- 有关详细信息,请参阅https://pypy.readthedocs.org/en/latest/architecture.html

Q3。PyPyPy或PyPyPyPy击败他们的分数的机会是什么?

那将取决于这些假设解释器的实现。例如,如果其中一个拿到了源代码,对其进行了某种分析,并在运行了一段时间后将其直接转换为目标紧密的特定汇编代码,我想它会比CPython快得多。

更新:最近,在一个精心设计的示例中,PyPy优于使用编译的类似C程序gcc -O3。这是一个人为的案例,但确实体现了一些想法。

Q4。为什么有人会尝试这样的事情?

从官方网站。https://pypy.readthedocs.org/zh_CN/latest/architecture.html#mission-statement

我们旨在提供:

  • 一个通用的翻译和支持框架,用于生成
    动态语言的实现,强调
    语言规范和实现
    方面之间的明确区分。我们称此为RPython toolchain_。

  • Python_语言的合规,灵活和快速实现,它使用上述工具链来启用新的高级高级功能,而不必对低级细节进行编码。

通过以这种方式分离关注点,我们的Python和其他动态语言的实现能够自动为任何动态语言生成即时编译器。它还允许采用混合匹配方法来实施决策,包括许多历史上在用户控制范围之外的决策,例如目标平台,内存和线程模型,垃圾回收策略以及所应用的优化,包括是否具有首先是JIT。

C编译器gcc用C实现,Haskell编译器GHC用Haskell编写。您是否有理由不使用Python编写Python解释器/编译器?

Q1. How is this possible?

Manual memory management (which is what CPython does with its counting) can be slower than automatic management in some cases.

Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (eg. fine grained locks).

As Marcelo mentioned, the JIT. Being able to on the fly confirm the type of an object can save you the need to do multiple pointer dereferences to finally arrive at the method you want to call.

Q2. Which Python implementation was used to implement PyPy?

The PyPy interpreter is implemented in RPython which is a statically typed subset of Python (the language and not the CPython interpreter). – Refer https://pypy.readthedocs.org/en/latest/architecture.html for details.

Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?

That would depend on the implementation of these hypothetical interpreters. If one of them for example took the source, did some kind of analysis on it and converted it directly into tight target specific assembly code after running for a while, I imagine it would be quite faster than CPython.

Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It’s a contrived case but does exhibit some ideas.

Q4. Why would anyone try something like this?

From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement

We aim to provide:

  • a common translation and support framework for producing
    implementations of dynamic languages, emphasizing a clean
    separation between language specification and implementation
    aspects. We call this the RPython toolchain_.

  • a compliant, flexible and fast implementation of the Python_ Language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.

By separating concerns in this way, our implementation of Python – and other dynamic languages – is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user’s control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.

The C compiler gcc is implemented in C, The Haskell compiler GHC is written in Haskell. Do you have any reason for the Python interpreter/compiler to not be written in Python?


回答 1

尽管在技术上是正确的,“ PyPy是Python在Python中的重新实现”是一种相当误导的方式来描述恕我直言的PyPy。

PyPy有两个主要部分。

  1. 翻译框架
  2. 口译员

翻译框架是编译器。它将RPython代码编译为C(或其他目标),并自动添加垃圾回收和JIT编译器等方面。它不能处理任意的Python代码,只能处理RPython。

RPython是普通Python的子集;所有RPython代码都是Python代码,但并非相反。RPython没有正式的定义,因为RPython基本上只是“可由PyPy的翻译框架翻译的Python子集”。但是要进行翻译,RPython代码必须是静态类型的(推断类型,您不必声明它们,但严格来说,它仍然是每个变量的一种类型),并且您不能执行诸如声明/修改函数/类在运行时。

然后,解释器是用RPython编写的普通Python解释器。

由于RPython代码是普通的Python代码,因此您可以在任何Python解释器上运行它。但是,PyPy的速度要求均不来自这种方式。这只是一个快速的测试周期,因为翻译解释器需要长时间。

有了这样的理解,对于PyPyPy或PyPyPyPy的猜测实际上没有任何意义。您有一个用RPython编写的解释器。您将其转换为可快速执行Python的C代码。到此过程停止;不再有RPython可以通过再次处理来加快速度。

因此“ PyPy怎么可能比CPython更快”就变得很明显。PyPy有一个更好的实现,包括一个JIT编译器(我相信,没有JIT编译器,它通常不会那么快,这意味着PyPy仅对于易受JIT编译的程序而言更快)。CPython从未被设计为Python语言的高度优化的实现(尽管您确实遵循差异,但它们确实尝试使其成为高度优化的实现)。


PyPy项目的真正创新之处在于,它们无需手动编写复杂的GC方案或JIT编译器。他们用RPython相对直接地编写解释器,并且由于所有RPython都比Python低,所以它仍然是一种面向对象的垃圾收集语言,比C高得多。然后,翻译框架自动添加GC和JIT之类的东西。所以翻译框架是一个巨大的努力,但它同样适用于PyPy python解释器,但是他们更改了实现,从而在实验中拥有更大的自由度来提高性能(而不必担心引入GC错误或更新JIT编译器以应对更改)。这也意味着当他们开始实现Python3解释器时,它将自动获得相同的好处。以及使用PyPy框架编写的任何其他解释器(其中有很多处于波兰的不同阶段)。并且所有使用PyPy框架的解释器都会自动支持该框架支持的所有平台。

因此,PyPy项目的真正好处是(尽可能)将实现动态语言的,与平台无关的高效解释器的所有部分分开。然后在一个地方提出一种很好的实现,可以在许多口译员之间重复使用。这并不是像“我的Python程序现在运行速度更快”那样的即时胜利,但它是未来的广阔前景。

而且它可以更快(也许)运行您的Python程序。

“PyPy is a reimplementation of Python in Python” is a rather misleading way to describe PyPy, IMHO, although it’s technically true.

There are two major parts of PyPy.

  1. The translation framework
  2. The interpreter

The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.

RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just “the subset of Python that can be translated by PyPy’s translation framework”. But in order to be translated, RPython code has to be statically typed (the types are inferred, you don’t declare them, but it’s still strictly one type per variable), and you can’t do things like declaring/modifying functions/classes at runtime either.

The interpreter then is a normal Python interpreter written in RPython.

Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy’s speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.

With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don’t actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there’s no more RPython to speed up by processing it again.

So “How is it possible for PyPy to be faster than CPython” also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it’s generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT-compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).


The really innovative bit of the PyPy project is that they don’t write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython, and for all RPython is lower level than Python it’s still an object-oriented garbage collected language, much more high level than C. Then the translation framework automatically adds things like GC and JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy python interpreter however they change their implementation, allowing for much more freedom in experimentation to improve performance (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means when they get around to implementing a Python3 interpreter, it will automatically get the same benefits. And any other interpreters written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.

So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language. And then come up with one good implementation of them in one place, that can be re-used across many interpreters. That’s not an immediate win like “my Python program runs faster now”, but it’s a great prospect for the future.

And it can run your Python program faster (maybe).


回答 2

PyPy是用Python实现的,但是它实现了JIT编译器来动态生成本机代码。

在Python之上实现PyPy的原因可能是它只是一种非常有生产力的语言,尤其是因为JIT编译器使宿主语言的性能变得无关紧要。

PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.

The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language’s performance somewhat irrelevant.


回答 3

PyPy用受限Python编写。据我所知,它不能在CPython解释器之上运行。受限制的Python是Python语言的子集。AFAIK将PyPy解释器编译为机器代码,因此在安装时,它在运行时不使用python解释器。

您的问题似乎希望PyPy解释器在执行代码时在CPython之上运行。 编辑:是的,要使用PyPy,您首先需要将PyPy python代码转换为C并使用gcc进行编译,转换为jvm字节代码或.Net CLI代码。请参阅入门

PyPy is written in Restricted Python. It does not run on top of the CPython interpreter, as far as I know. Restricted Python is a subset of the Python language. AFAIK, the PyPy interpreter is compiled to machine code, so when installed it does not utilize a python interpreter at runtime.

Your question seems to expect the PyPy interpreter is running on top of CPython while executing code. Edit: Yes, to use PyPy you first translate the PyPy python code, either to C and build with gcc, to jvm byte code, or to .Net CLI code. See Getting Started