Tag archive: pypy

Why is pow(a, d, n) so much faster than a**d % n?

Question: Why is pow(a, d, n) so much faster than a**d % n?

I was trying to implement a Miller-Rabin primality test, and was puzzled why it was taking so long (> 20 seconds) for midsize numbers (~7 digits). I eventually found the following line of code to be the source of the problem:

x = a**d % n

(where a, d, and n are all similar, but unequal, midsize numbers, ** is the exponentiation operator, and % is the modulo operator)

I then tried replacing it with the following:

x = pow(a, d, n)

and by comparison it is almost instantaneous.

For context, here is the original function:

from random import randint

def primalityTest(n, k):
    if n < 2:
        return False
    if n % 2 == 0:
        return False
    s = 0
    d = n - 1
    while d % 2 == 0:
        s += 1
        d >>= 1
    for i in range(k):
        rand = randint(2, n - 2)
        x = rand**d % n         # offending line
        if x == 1 or x == n - 1:
            continue
        for r in range(s):
            toReturn = True
            x = pow(x, 2, n)
            if x == 1:
                return False
            if x == n - 1:
                toReturn = False
                break
        if toReturn:
            return False
    return True

print(primalityTest(2700643,1))

An example timed calculation:

from timeit import timeit

a = 2505626
d = 1520321
n = 2700643

def testA():
    print(a**d % n)

def testB():
    print(pow(a, d, n))

print("time: %(time)fs" % {"time":timeit("testA()", setup="from __main__ import testA", number=1)})
print("time: %(time)fs" % {"time":timeit("testB()", setup="from __main__ import testB", number=1)})

Output (run with PyPy 1.9.0):

2642565
time: 23.785543s
2642565
time: 0.000030s

Output (run with Python 3.3.0, 2.7.2 returns very similar times):

2642565
time: 14.426975s
2642565
time: 0.000021s

And a related question, why is this calculation almost twice as fast when run with Python 2 or 3 than with PyPy, when usually PyPy is much faster?


Answer 0

See the Wikipedia article on modular exponentiation. Basically, when you do a**d % n, you actually have to calculate a**d, which could be quite large. But there are ways of computing a**d % n without having to compute a**d itself, and that is what pow does. The ** operator can’t do this because it can’t “see into the future” to know that you are going to immediately take the modulus.
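
A rough illustration of the size difference, as a sketch rather than a benchmark: the values of a and n are the ones from the question, but d is shortened to a hypothetical 1520 so that the full power stays cheap to compute.

```python
# a and n come from the question; d is deliberately shortened (an assumption)
# so that the full power is quick to build.
a = 2505626
d = 1520
n = 2700643

full_power = a ** d              # the complete integer that a**d % n has to build first
print(full_power.bit_length())   # tens of thousands of bits, even for this small d

# Both spellings give the same answer; only the intermediate work differs.
assert full_power % n == pow(a, d, n)
print(pow(a, d, n) < n)          # the three-argument result stays below n
```

With the question’s real exponent (1520321) the intermediate integer would be roughly a thousand times longer again, which is consistent with the multi-second timings reported in the question.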


Answer 1

BrenBarn answered your main question. For your aside:

why is it almost twice as fast when run with Python 2 or 3 than PyPy, when usually PyPy is much faster?

If you read PyPy’s performance page, this is exactly the kind of thing PyPy is not good at—in fact, the very first example they give:

Bad examples include doing computations with large longs – which is performed by unoptimizable support code.

Theoretically, turning a huge exponentiation followed by a mod into a modular exponentiation (at least after the first pass) is a transformation a JIT might be able to make… but not PyPy’s JIT.

As a side note, if you need to do calculations with huge integers, you may want to look at third-party modules like gmpy, which can be much faster than CPython’s native implementation in some cases outside the mainstream uses, and which also has a lot of additional functionality that you’d otherwise have to write yourself, at the cost of being less convenient.


Answer 2

There are shortcuts to doing modular exponentiation: for instance, you can find a**(2**i) mod n for every i from 0 to log2(d) by repeated squaring, and multiply together (mod n) the intermediate results you need. A dedicated modular-exponentiation function like 3-argument pow() can leverage such tricks because it knows you’re doing modular arithmetic. The Python parser can’t recognize this given the bare expression a**d % n, so it will perform the full calculation (which will take much longer).
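
The repeated-squaring shortcut can be sketched in a few lines. This is a minimal illustration of the general square-and-multiply technique, not CPython’s actual implementation of pow(); the function name modpow is made up for this example, and it assumes a non-negative exponent and a positive modulus.

```python
def modpow(base, exponent, modulus):
    """Square-and-multiply sketch (hypothetical helper, not the built-in).

    Assumes exponent >= 0 and modulus > 0.
    """
    result = 1 % modulus          # handles modulus == 1, where everything is 0
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # this bit of the exponent is set:
            result = (result * base) % modulus   # fold the current square in
        base = (base * base) % modulus           # next power: base**(2**i) mod n
        exponent >>= 1
    return result


# Values from the question; the loop runs about log2(d), roughly 21, times.
print(modpow(2505626, 1520321, 2700643) == pow(2505626, 1520321, 2700643))
```

Because every product is reduced modulo n straight away, no intermediate value grows much beyond n squared, however large the exponent is.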


Answer 3

The way x = a**d % n is calculated is to raise a to the d power, then take that result modulo n. If d is large, this creates a huge number which is then reduced. However, x = pow(a, d, n) is most likely optimized so that each intermediate product is reduced modulo n as it goes, so no intermediate value grows much beyond n squared; that is all that is required for calculating multiplication modulo a number.


Why wasn’t PyPy included in standard Python?

Question: Why wasn’t PyPy included in standard Python?

I was looking at PyPy and I was just wondering why it hasn’t been adopted into the mainline Python distributions. Wouldn’t things like JIT compilation and lower memory footprint greatly improve the speeds of all Python code?

In short, what are the main drawbacks of PyPy that cause it to remain a separate project?


Answer 0

PyPy is not a fork of CPython, so it could never be merged directly into CPython.

Theoretically the Python community could universally adopt PyPy, PyPy could be made the reference implementation, and CPython could be discontinued. However, PyPy has its own weaknesses:

  • CPython is easy to integrate with Python modules written in C, which is traditionally the way Python applications have handled CPU-intensive tasks (see for instance the SciPy project).
  • The PyPy JIT compilation step itself costs CPU time — it’s only through repeated running of compiled code that it becomes faster overall. This means startup times can be higher, and therefore PyPy isn’t necessarily as efficient for running glue code or trivial scripts.
  • PyPy and CPython behavior is not identical in all respects, especially when it comes to “implementation details” (behavior that is not specified by the language but is still important at a practical level).
  • CPython runs on more architectures than PyPy and has been successfully adapted to run in embedded architectures in ways that may be impractical for PyPy.
  • CPython’s reference counting scheme for memory management arguably has more predictable performance impacts than PyPy’s various GC systems, although this isn’t necessarily true of all “pure GC” strategies.
  • PyPy does not yet fully support Python 3.x, although that is an active work item.

PyPy is a great project, but runtime speed on CPU-intensive tasks isn’t everything, and in many applications it’s the least of many concerns. For instance, Django can run on PyPy and that makes templating faster, but CPython’s database drivers are faster than PyPy’s; in the end, which implementation is more efficient depends on where the bottleneck in a given application is.

Another example: you’d think PyPy would be great for games, but most GC strategies like those used in PyPy cause noticeable jitter. For CPython, most of the CPU-intensive game stuff is offloaded to the PyGame library, which PyPy can’t take advantage of since PyGame is primarily implemented as a C extension (though see: pygame-cffi). I still think PyPy can be a great platform for games, but I’ve never seen it actually used.

PyPy and CPython have radically different approaches to fundamental design questions and make different tradeoffs, so neither one is “better” than the other in every case.


Answer 1

For one, it’s not 100% compatible with Python 2.x, and has only preliminary support for 3.x.

It’s also not something that could be merged: the Python implementation that is provided by PyPy is generated using a framework they have created, which is extremely cool, but also completely different from the existing CPython implementation. It would have to be a complete replacement.

There are some very concrete differences between PyPy and CPython, a big one being how extension modules are supported – which, if you want to go beyond the standard library, is a big deal.

It’s also worth noting that PyPy isn’t universally faster.


Answer 2

See this video by Guido van Rossum. He talks about the same question you asked at 12 min 33 secs.

Highlights:

  • lack of Python 3 compatibility
  • lack of extension support
  • not appropriate as glue code
  • speed is not everything

After all, he’s the one to decide…


Answer 3

One reason might be that, according to the PyPy site, it currently runs only on 32- and 64-bit Intel x86 architectures, while CPython runs on other platforms as well. This is probably due to platform-specific speed enhancements in PyPy. While speed is a good thing, people often want language implementations to be as “platform-independent” as possible.


Answer 4

I recommend watching this keynote by David Beazley for more insights. It answers your question by clarifying the nature and intricacies of PyPy.


Answer 5

In addition to everything that’s been said here, PyPy is not nearly as rock solid as CPython in terms of bugs. With SymPy, we’ve found about a dozen bugs in PyPy over the past couple of years, both in released versions and in the nightlies.

On the other hand, we’ve only ever found one bug in CPython, and that was in a prerelease.

Plus, don’t discount the lack of Python 3 support. No one in the core Python community even cares about Python 2 any more. They are working on the next big things in Python 3.4, which will be the fifth major release of Python 3. The PyPy guys still haven’t gotten one of them. So they’ve got some catching up to do before they can start to be contenders.

Don’t get me wrong. PyPy is awesome. But it’s still far from being better than CPython in a lot of very important ways.

And by the way, if you use SymPy in PyPy, you won’t see a smaller memory footprint (or a speedup either). See https://bitbucket.org/pypy/pypy/issues/1447/.


PyPy -- How can it possibly beat CPython?

Question: PyPy -- How can it possibly beat CPython?

From the Google Open Source Blog:

PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.

How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?

(On a related note… why would anyone try something like this?)


Answer 0

Q1. How is this possible?

Manual memory management (which is what CPython does with its counting) can be slower than automatic management in some cases.

Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (eg. fine grained locks).

As Marcelo mentioned, the JIT. Being able to on the fly confirm the type of an object can save you the need to do multiple pointer dereferences to finally arrive at the method you want to call.

Q2. Which Python implementation was used to implement PyPy?

The PyPy interpreter is implemented in RPython, which is a statically typed subset of Python (the language, not the CPython interpreter). Refer to https://pypy.readthedocs.org/en/latest/architecture.html for details.

Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?

That would depend on the implementation of these hypothetical interpreters. If one of them, for example, took the source, did some kind of analysis on it and converted it directly into tight target-specific assembly code after running for a while, I imagine it would be quite a bit faster than CPython.

Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It’s a contrived case but does exhibit some ideas.

Q4. Why would anyone try something like this?

From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement

We aim to provide:

  • a common translation and support framework for producing implementations of dynamic languages, emphasizing a clean separation between language specification and implementation aspects. We call this the RPython toolchain.

  • a compliant, flexible and fast implementation of the Python Language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.

By separating concerns in this way, our implementation of Python – and other dynamic languages – is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user’s control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.

The C compiler gcc is implemented in C, and the Haskell compiler GHC is written in Haskell. Do you have any reason for the Python interpreter/compiler to not be written in Python?


Answer 1

“PyPy is a reimplementation of Python in Python” is a rather misleading way to describe PyPy, IMHO, although it’s technically true.

There are two major parts of PyPy.

  1. The translation framework
  2. The interpreter

The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.

RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just “the subset of Python that can be translated by PyPy’s translation framework”. But in order to be translated, RPython code has to be statically typed (the types are inferred, you don’t declare them, but it’s still strictly one type per variable), and you can’t do things like declaring/modifying functions/classes at runtime either.
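
To make the “one type per variable” restriction concrete, here is a small sketch based only on the description above. Both functions are ordinary Python and run on any interpreter; the function names are invented for illustration, and whether the translation framework really accepts the first and rejects the second is an assumption that has not been checked against the actual toolchain.

```python
def sum_of_squares(count):
    # Every variable keeps a single inferable type (int) throughout,
    # which is the style the paragraph above describes as translatable.
    total = 0
    for i in range(count):
        total += i * i
    return total


def describe(flag):
    # Perfectly valid Python, but 'value' is an int on one path and a str
    # on the other, so it would not satisfy "one type per variable".
    value = 1
    if flag:
        value = "one"
    return value


print(sum_of_squares(4))
print(describe(False), describe(True))
```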

The interpreter then is a normal Python interpreter written in RPython.

Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy’s speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.

With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don’t actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there’s no more RPython to speed up by processing it again.

So “How is it possible for PyPy to be faster than CPython” also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it’s generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT-compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).


The really innovative bit of the PyPy project is that they don’t write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython, and for all RPython is lower level than Python it’s still an object-oriented garbage collected language, much more high level than C. Then the translation framework automatically adds things like GC and JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy python interpreter however they change their implementation, allowing for much more freedom in experimentation to improve performance (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means when they get around to implementing a Python3 interpreter, it will automatically get the same benefits. And any other interpreters written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.

So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language. And then come up with one good implementation of them in one place, that can be re-used across many interpreters. That’s not an immediate win like “my Python program runs faster now”, but it’s a great prospect for the future.

And it can run your Python program faster (maybe).


Answer 2

PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.

The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language’s performance somewhat irrelevant.


Answer 3

PyPy is written in Restricted Python. It does not run on top of the CPython interpreter, as far as I know. Restricted Python is a subset of the Python language. AFAIK, the PyPy interpreter is compiled to machine code, so when installed it does not utilize a python interpreter at runtime.

Your question seems to expect the PyPy interpreter is running on top of CPython while executing code. Edit: Yes, to use PyPy you first translate the PyPy python code, either to C and build with gcc, to jvm byte code, or to .Net CLI code. See Getting Started


Why shouldn’t I use PyPy over CPython if PyPy is 6.3 times faster?

Question: Why shouldn’t I use PyPy over CPython if PyPy is 6.3 times faster?

I’ve been hearing a lot about the PyPy project. They claim it is 6.3 times faster than the CPython interpreter on their site.

Whenever we talk about dynamic languages like Python, speed is one of the top issues. To solve this, they say PyPy is 6.3 times faster.

The second issue is parallelism, the infamous Global Interpreter Lock (GIL). For this, PyPy says it can give GIL-less Python.

If PyPy can solve these great challenges, what are its weaknesses that are preventing wider adoption? That is to say, what’s preventing someone like me, a typical Python developer, from switching to PyPy right now?


Answer 0

NOTE: PyPy is more mature and better supported now than it was in 2013, when this question was asked. Avoid drawing conclusions from out-of-date information.


  1. PyPy, as others have been quick to mention, has tenuous support for C extensions. It has support, but typically at slower-than-Python speeds and it’s iffy at best. Hence a lot of modules simply require CPython. PyPy previously didn’t support numpy, but it now does. Some extensions are still not supported (Pandas, SciPy, etc.); take a look at the list of supported packages before making the change.
  2. Python 3 support was experimental, but has just reached stable! As of 20th June 2014, PyPy3 2.3.1 – Fulcrum is out!
  3. PyPy sometimes isn’t actually faster for “scripts”, which a lot of people use Python for. These are the short-running programs that do something simple and small. Because PyPy relies on a JIT compiler, its main advantages come from long run times and simple types (such as numbers). Frankly, PyPy’s pre-JIT speeds are pretty bad compared to CPython.
  4. Inertia. Moving to PyPy often requires retooling, which for some people and organizations is simply too much work.

Those are the main reasons that affect me, I’d say.
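
One way to check point 3 for your own workload is to time the same loop at a short and a long run length under each interpreter. The sketch below is only a starting point: hot_loop and time_once are names made up for this example, the iteration counts are arbitrary, and no particular timings are implied.

```python
import platform
import time


def hot_loop(iterations):
    # Simple integer arithmetic: the kind of code a JIT is meant to speed up.
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def time_once(iterations):
    # Wall-clock time for a single run, including any JIT warm-up.
    start = time.perf_counter()
    hot_loop(iterations)
    return time.perf_counter() - start


print(platform.python_implementation())   # e.g. "CPython" or "PyPy"
for iterations in (1_000, 200_000):
    print(iterations, time_once(iterations))
```

Run the same file with each interpreter and compare the short run with the long one; the short run is the closest stand-in for a small script.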


Answer 1

That site does not claim PyPy is 6.3 times faster than CPython. To quote:

The geometric average of all benchmarks is 0.16 or 6.3 times faster than CPython

This is a very different statement to the blanket statement you made, and when you understand the difference, you’ll understand at least one set of reasons why you can’t just say “use PyPy”. It might sound like I’m nit-picking, but understanding why these two statements are totally different is vital.

To break that down:

  • The statement they make only applies to the benchmarks they’ve used. It says absolutely nothing about your program (unless your program is exactly the same as one of their benchmarks).

  • The statement is about an average of a group of benchmarks. There is no claim that running PyPy will give a 6.3 times improvement even for the programs they have tested.

  • There is no claim that PyPy will even run all the programs that CPython runs at all, let alone faster.
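
A small worked example shows why a geometric average says little about any single program. The ratios below are invented purely for illustration (they are not PyPy’s benchmark results); each one stands for a benchmark’s PyPy time divided by its CPython time, so values below 1 mean PyPy was faster. statistics.geometric_mean requires Python 3.8 or later.

```python
from statistics import geometric_mean

# Hypothetical normalized timings (PyPy time / CPython time), NOT real data.
ratios = [0.05, 0.1, 0.2, 0.5, 1.5]

average_ratio = geometric_mean(ratios)
print(average_ratio)        # the headline "average" ratio
print(1 / average_ratio)    # the same figure expressed as "N times faster"

# The headline number hides the spread: in this made-up set one benchmark
# is 20 times faster under PyPy and another is 1.5 times slower.
print(min(ratios), max(ratios))

# The site's own figures follow the same pattern: a ratio of 0.16
# corresponds to a speedup of about 1 / 0.16.
print(1 / 0.16)
```

The site’s 0.16 and 6.3 are presumably both rounded from the same unrounded ratio.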


Answer 2

Because PyPy is not 100% compatible, takes 8 GB of RAM to compile, is a moving target, and is highly experimental, whereas CPython is stable, has been the default target for module builders for two decades (including C extensions that don’t work on PyPy), and is already widely deployed.

Pypy will likely never be the reference implementation, but it is a good tool to have.


Answer 3

The second question is easier to answer: you basically can use PyPy as a drop-in replacement if all your code is pure Python. However, many widely used libraries (including some of the standard library) are written in C and compiled as Python extensions. Some of these can be made to work with PyPy, some can’t. PyPy provides the same “forward-facing” tool as Python — that is, it is Python — but its innards are different, so tools that interface with those innards won’t work.

As for the first question, I imagine it is sort of a Catch-22 with the first: PyPy has been evolving rapidly in an effort to improve speed and enhance interoperability with other code. This has made it more experimental than official.

I think it’s possible that if PyPy gets into a stable state, it may start getting more widely used. I also think it would be great for Python to move away from its C underpinnings. But it won’t happen for a while. PyPy hasn’t yet reached the critical mass where it is almost useful enough on its own to do everything you’d want, which would motivate people to fill in the gaps.


Answer 4

I did a small benchmark on this topic. While many of the other posters have made good points about compatibility, my experience has been that PyPy isn’t that much faster for just moving around bits. For many uses of Python, it really only exists to translate bits between two or more services. For example, not many web applications are performing CPU intensive analysis of datasets. Instead, they take some bytes from a client, store them in some sort of database, and later return them to other clients. Sometimes the format of the data is changed.

The BDFL and the CPython developers are a remarkably intelligent group of people and have managed to help CPython perform excellently in such a scenario. Here’s a shameless blog plug: http://www.hydrogen18.com/blog/unpickling-buffers.html . I’m using Stackless, which is derived from CPython and retains the full C module interface. I didn’t find any advantage to using PyPy in that case.


Answer 5

Q: If PyPy can solve these great challenges (speed, memory consumption, parallelism) in comparison to CPython, what are its weaknesses that are preventing wider adoption?

A: First, there is little evidence that the PyPy team can solve the speed problem in general. Long-term evidence shows that PyPy runs some Python code slower than CPython, and this drawback seems to be rooted very deeply in PyPy.

Secondly, the current version of PyPy consumes much more memory than CPython in a rather large set of cases. So PyPy has not yet solved the memory consumption problem.

Whether PyPy solves the mentioned great challenges and will in general be faster, less memory hungry, and more friendly to parallelism than CPython is an open question that cannot be solved in the short term. Some people are betting that PyPy will never be able to offer a general solution enabling it to dominate CPython 2.7 and 3.3 in all cases.

If PyPy succeeds in being better than CPython in general, which is questionable, the main weakness affecting its wider adoption will be its compatibility with CPython. There are also issues such as the fact that CPython runs on a wider range of CPUs and OSes, but these are much less important than PyPy's performance and CPython-compatibility goals.


Q: Why can't I use PyPy as a drop-in replacement for CPython now?

A: PyPy isn't 100% compatible with CPython because it isn't simulating CPython under the hood. Some programs may still depend on CPython's unique features that are absent in PyPy, such as C bindings, C implementations of Python objects and methods, or the incremental nature of CPython's garbage collector.


Answer 6

CPython has reference counting and garbage collection; PyPy has garbage collection only.

So in CPython objects tend to be deleted earlier and __del__ is called in a more predictable way. Some software relies on this behavior and is therefore not ready to migrate to PyPy.

Some other software works with both, but uses less memory with CPython, because unused objects are freed earlier. (I don’t have any measurements to indicate how significant this is and what other implementation details affect the memory use.)
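To make the difference concrete, here is a minimal sketch. The Tracked class and the events list are hypothetical names invented for this illustration. It records when __del__ runs after the last reference is dropped, and then shows a context manager as one way to avoid depending on finalizer timing at all.

```python
import gc

events = []  # hypothetical log of what happened, in order


class Tracked:
    """Hypothetical resource that logs when it is finalized or closed."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs whenever the interpreter decides to finalize the object.
        events.append("finalized " + self.name)

    def close(self):
        events.append("closed " + self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Explicit cleanup that does not depend on finalizer timing.
        self.close()


obj = Tracked("a")
del obj                              # drop the only reference
seen_right_after_del = list(events)  # snapshot before any forced collection

gc.collect()                         # ask the collector to run
seen_after_collect = list(events)

with Tracked("b") as res:
    pass                             # close() runs when the block exits

print(seen_right_after_del)
print(seen_after_collect)
print(events)
```

With reference counting, the first snapshot should already contain "finalized a". On an implementation without reference counting it will typically still be empty until a collection has run. The with block behaves the same way on both, which is why code that needs prompt cleanup is usually written that way.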


Answer 7

For a lot of projects, there is actually 0% difference between the different Python implementations in terms of speed. These are the projects that are dominated by engineering time and for which all implementations have the same amount of library support.


Answer 8

To put it simply: PyPy provides the speed that CPython lacks, but sacrifices compatibility. Most people, however, choose Python for its flexibility and its "batteries included" philosophy (high compatibility), not for its speed (though speed is still welcome).


Answer 9

I've found examples where PyPy is slower than Python, but only on Windows.

C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop

C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop

So if you are considering PyPy, forget Windows. On Linux, you can achieve impressive speedups. Example (list all primes between 1 and 1,000,000):

from sympy import sieve
primes = list(sieve.primerange(1, 10**6))

This runs 10(!) times faster on PyPy than on Python, but not on Windows, where it is only 3x as fast.
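If you want to repeat this kind of comparison without installing sympy, a pure-Python sieve is a reasonable stand-in. The sketch below is not the sympy implementation. The function name primes_below is made up for this example, and the timings it prints will vary by machine, interpreter, and operating system.

```python
import sys
import time


def primes_below(limit):
    """Return all primes p with 2 <= p < limit (sieve of Eratosthenes)."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off every multiple of p, starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


start = time.perf_counter()
primes = primes_below(10 ** 6)
elapsed = time.perf_counter() - start

# Report which interpreter produced the timing.
print(sys.version.split()[0], len(primes), "primes in %.3f s" % elapsed)
```

Run the same file once with python and once with pypy to compare. Note that this sieve leans heavily on bytearray slice assignment, which is already implemented in C on CPython, so the gap between the two interpreters may be smaller than for the sympy example.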


Answer 10

PyPy has had Python 3 support for a while, but according to this HackerNoon post by Anthony Shaw from April 2nd, 2018, PyPy3 is still several times slower than PyPy (Python 2).

For many scientific calculations, particularly matrix calculations, numpy is a better choice (see FAQ: Should I install numpy or numpypy?).

PyPy does not support gmpy2. You can use gmpy_cffi instead, though I haven't tested its speed and the project had one release in 2014.

For Project Euler problems, I make frequent use of PyPy, and for simple numerical calculations from __future__ import division is often sufficient for my purposes. Python 3 support is still being worked on as of 2018, and your best bet is 64-bit Linux. On Windows, PyPy3.5 v6.0, the latest as of December 2018, is in beta.


Answer 11

Supported Python Versions

To cite the Zen of Python:

Readability counts.

For example, Python 3.7 introduced dataclasses and Python 3.8 introduced the = specifier for f-strings.

There might be other features in Python 3.7 and Python 3.8 which are more important to you. The point is that PyPy does not support Python 3.7 or Python 3.8 at the moment.
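For readers who haven't used these two features, here is a short sketch of what they look like. The Point class is a made-up example, and the code needs an interpreter that implements Python 3.8 or later.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # The decorator generates __init__, __repr__ and __eq__ from these fields.
    x: int
    y: int


p = Point(1, 2)

# The "=" specifier prints the expression text followed by its repr.
label = f"{p=}"
print(label)
```

Without these features you would write the __init__ and __repr__ methods by hand and spell out "p=" in the format string yourself, which is the readability cost this answer is pointing at.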