Tag Archives: python-internals

Accessing class variables from a list comprehension within the class definition

Question: Accessing class variables from a list comprehension within the class definition


How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:

class Foo:
    x = 5
    y = [x for i in range(1)]

Python 3.2 gives the error:

NameError: global name 'x' is not defined

Trying Foo.x doesn’t work either. Any ideas on how to do this in Python 3?

A slightly more complicated motivating example:

from collections import namedtuple
class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for args in [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ]]

In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.


Answer 0


Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.

The why; or, the official word on this

In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see Python list comprehension rebind names even after scope of comprehension. Is this right?). That’s great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.

This is documented in pep 227:

Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions.

and in the class compound statement documentation:

The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.

Emphasis mine; the execution frame is the temporary scope.

Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.

Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
     a = 42
     b = list(a + i for i in range(10))

So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.

class Foo:
    x = 5
    # Same error, in Python 2 or 3
    y = {x: x for i in range(1)}

The (small) exception; or, why one part may still work

There’s one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it’s the range(1):

y = [x for i in range(1)]
#               ^^^^^^^^

Thus, using x in that expression would not throw an error:

# Runs fine
y = [i for i in range(x)]

This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension’s scope:

# NameError
y = [i for i in range(1) for j in range(x)]

This design decision was made so that, when evaluating the outermost iterable of a generator expression throws an error, or when that iterable turns out not to be iterable, the error is raised at genexp creation time instead of at iteration time. Comprehensions share this behavior for consistency.
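
As a quick illustration of that creation-time behavior (a small sketch, not part of the original answer):

# The outermost iterable is evaluated, and passed through iter(), as soon as
# the generator expression is created, so the error surfaces on this line:
gen = (x for x in 42)   # TypeError: 'int' object is not iterable
next(gen)               # never reached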

Looking under the hood; or, way more detail than you ever wanted

You can see this all in action using the dis module. I’m using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.

To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:

>>> import dis
>>> def foo():
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo)
  2           0 LOAD_BUILD_CLASS     
              1 LOAD_CONST               1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>) 
              4 LOAD_CONST               2 ('Foo') 
              7 MAKE_FUNCTION            0 
             10 LOAD_CONST               2 ('Foo') 
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair) 
             16 STORE_FAST               0 (Foo) 

  5          19 LOAD_FAST                0 (Foo) 
             22 RETURN_VALUE         

The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.

The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.

The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.

Let’s inspect that code object that creates the class body itself; code objects have a co_consts structure:

>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (__locals__) 
              3 STORE_LOCALS         
              4 LOAD_NAME                0 (__name__) 
              7 STORE_NAME               1 (__module__) 
             10 LOAD_CONST               0 ('foo.<locals>.Foo') 
             13 STORE_NAME               2 (__qualname__) 

  3          16 LOAD_CONST               1 (5) 
             19 STORE_NAME               3 (x) 

  4          22 LOAD_CONST               2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>) 
             25 LOAD_CONST               3 ('foo.<locals>.Foo.<listcomp>') 
             28 MAKE_FUNCTION            0 
             31 LOAD_NAME                4 (range) 
             34 LOAD_CONST               4 (1) 
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             40 GET_ITER             
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             44 STORE_NAME               5 (y) 
             47 LOAD_CONST               5 (None) 
             50 RETURN_VALUE         

The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y, is used to create the class (except that it doesn’t work, because x isn’t defined as a global). Note that after storing 5 in x, another code object is loaded; that’s the list comprehension. It is wrapped in a function object just like the class body was; the created function takes one positional argument: the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.

From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.

Python 2.x uses inline bytecode there instead; here is the output from Python 2.7:

  2           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  3           6 LOAD_CONST               0 (5)
              9 STORE_NAME               2 (x)

  4          12 BUILD_LIST               0
             15 LOAD_NAME                3 (range)
             18 LOAD_CONST               1 (1)
             21 CALL_FUNCTION            1
             24 GET_ITER            
        >>   25 FOR_ITER                12 (to 40)
             28 STORE_NAME               4 (i)
             31 LOAD_NAME                2 (x)
             34 LIST_APPEND              2
             37 JUMP_ABSOLUTE           25
        >>   40 STORE_NAME               5 (y)
             43 LOAD_LOCALS         
             44 RETURN_VALUE        

No code object is loaded, instead a FOR_ITER loop is run inline. So in Python 3.x, the list generator was given a proper code object of its own, which means it has its own scope.

However, the comprehension was compiled together with the rest of the Python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any variables referenced in a list comprehension must be looked up in the scopes surrounding the class definition, recursively. If the compiler doesn’t find the variable there, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:

>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
  4           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_GLOBAL              0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.

Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):

>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
  5           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_DEREF               0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

The LOAD_DEREF will indirectly load x from the code object cell objects:

>>> foo.__code__.co_cellvars               # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars  # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars  # Refers to `x` in foo
('x',)
>>> foo().y
[2]

The actual referencing looks the value up from the current frame data structures, which were initialized from a function object’s .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function’s closure. To see a closure in action, we’d have to inspect a nested function instead:

>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
... 
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5

So, to summarize:

  • List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
  • Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
  • When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn’t include the class body as a scope, the temporary function namespace is not considered.

A workaround; or, what to do about it

If you create an explicit scope for the x variable, as you would in a function, you can use class-scope variables in a list comprehension:

>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
... 
>>> Foo.y
[5]

The ‘temporary’ y function can be called directly; when we do, we replace it with its return value. Its scope is considered when resolving x:

>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)

Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.

The best work-around is to just use __init__ to create an instance variable instead:

def __init__(self):
    self.y = [self.x for i in range(1)]

and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don’t store the generated class at all), or use a global:

from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
    db = [State(*args) for args in [
       ('Alabama', 'Montgomery'),
       ('Alaska', 'Juneau'),
       # ...
    ]]
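
For reference, here is the __init__ work-around from above written out as a complete class (an illustrative sketch; the class name is made up):

class Foo:
    x = 5
    def __init__(self):
        # self.x is an ordinary attribute lookup at runtime, so the
        # comprehension's own scope is not a problem here
        self.y = [self.x for i in range(1)]

print(Foo().y)  # [5]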

Answer 1


In my opinion it is a flaw in Python 3. I hope they change it.

Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):

class A:
    x = 4
    y = [x+i for i in range(1)]

NOTE: simply scoping it with A.x would not solve it

New Way (works in 3+):

class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()

Because the syntax is so ugly, I typically just initialize all my class variables in the constructor.
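
A sketch of that constructor-based alternative (hypothetical class, not from the original answer):

class A:
    def __init__(self):
        self.x = 4
        # attribute access through self works fine inside the comprehension
        self.y = [self.x + i for i in range(1)]

print(A().y)  # [4]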


Answer 2


The accepted answer provides excellent information, but there appear to be a few other wrinkles here — differences between list comprehension and generator expressions. A demo that I played around with:

class Foo:

    # A class-level variable.
    X = 10

    # I can use that variable to define another class-level variable.
    Y = sum((X, X))

    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None

    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None

    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))

    # Then use that function.
    Z3 = compute_z3(X)

    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None

print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)

Answer 3


This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.

To illustrate why this is a bug, let’s return to the original example. This fails:

class Foo:
    x = 5
    y = [x for i in range(1)]

But this works:

def Foo():
    x = 5
    y = [x for i in range(1)]

The limitation is stated at the end of this section in the reference guide.
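
A hedged illustration of that limit, in the same class-level try/except style as the demo in Answer 2 (the names here are made up, and it assumes no global x exists):

class Bar:
    x = 5

    # Works: the outermost iterable, range(x), is evaluated in the class scope.
    ok = [i for i in range(x)]

    # Fails: x is not visible inside the comprehension's own scope.
    try:
        bad = [x + i for i in range(3)]
    except NameError:
        bad = None

print(Bar.ok, Bar.bad)  # [0, 1, 2, 3, 4] None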


Answer 4


Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension’s scope:

import itertools as it

class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]

One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:

class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]

For the specific example of the OP:

from collections import namedtuple
import itertools as it

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]

What makes [*a] overallocate?

Question: What makes [*a] overallocate?


Apparently list(a) doesn’t overallocate, [x for x in a] overallocates at some points, and [*a] overallocates all the time?

Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:

0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208

Computed like this, reproducible at repl.it, using Python 3.8:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))

So: How does this work? How does [*a] overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over a and use something like list.append? Where is the source code?

(The original post links a Colab notebook with the data and code that produced plots of these sizes, zoomed in to smaller n and zoomed out to larger n; the plots are not reproduced here.)


Answer 0


[*a] is internally doing the C equivalent of:

  1. Make a new, empty list
  2. Call newlist.extend(a)
  3. Return the list

So if you expand your test to:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

Try it online!

you’ll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.

This is usually the right thing to do; when extending you’re usually expecting to add more later, and similarly for generalized unpacking, it’s assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.

By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for list comprehensions, they’re effectively equivalent to repeated appends, so you’re seeing the final result of the normal overallocation growth pattern when adding an element at a time.
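
A quick way to see that equivalence (a sketch relying on CPython behavior: both the comprehension and an explicit append loop start from an empty list and grow it one element at a time):

from sys import getsizeof

for n in range(13):
    a = [None] * n
    appended = []
    for x in a:
        appended.append(x)   # mirrors the LIST_APPEND the comprehension performs
    # both end up with the same allocation
    print(n, getsizeof([x for x in a]), getsizeof(appended))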

To be clear, none of this is a language guarantee. It’s just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won’t affect [*a], it could affect other cases where what used to be “build a temporary tuple of individual items and then extend with the tuple” now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.
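
If you want to see what your own interpreter emits for the unpacking, a small dis check (the opcodes differ by version, as described above: BUILD_LIST_UNPACK on 3.5 through 3.8, LIST_EXTEND and friends from 3.9 on):

import dis

dis.dis('[*a]')
dis.dis('[*a, b, c]')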


Answer 1


Full picture of what happens, building on the other answers and comments (especially ShadowRanger’s answer, which also explains why it’s done like that).

Disassembling shows that BUILD_LIST_UNPACK gets used:

>>> import dis
>>> dis.dis('[*a]')
  1           0 LOAD_NAME                0 (a)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

That’s handled in ceval.c, which builds an empty list and extends it (with a):

        case TARGET(BUILD_LIST_UNPACK): {
            ...
            PyObject *sum = PyList_New(0);
              ...
                none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));

_PyList_Extend uses list_extend:

_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}

Which calls list_resize with the sum of the sizes:

list_extend(PyListObject *self, PyObject *iterable)
    ...
        n = PySequence_Fast_GET_SIZE(iterable);
        ...
        m = Py_SIZE(self);
        ...
        if (list_resize(self, m + n) < 0) {

And that overallocates as follows:

list_resize(PyListObject *self, Py_ssize_t newsize)
{
  ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Let’s check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it with 8 (as I’m using 64-bit Python here) and adding an empty list’s byte size (i.e., a list object’s constant overhead):

from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)

Output:

0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True

That matches for every size except n = 0; but list_extend actually shortcuts the empty case, so that one matches, too:

        if (n == 0) {
            ...
            Py_RETURN_NONE;
        }
        ...
        if (list_resize(self, m + n) < 0) {
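
A one-liner to confirm that n = 0 shortcut from Python (assuming a 64-bit CPython build, where an empty list is 56 bytes):

from sys import getsizeof

a = []
print(getsizeof([*a]), getsizeof([]))  # 56 56: no resize happens for n == 0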

Answer 2


These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.

That said, you can see where the comprehension and list(a) behaviors come in here:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

Specifically for the comprehension:

 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Just below those lines, there is list_preallocate_exact which is used when calling list(a).
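
To watch that growth pattern from Python rather than from the C source, a rough sketch (this leans on CPython implementation details and assumes a 64-bit build where each list slot is 8 bytes):

from sys import getsizeof

lst, seen = [], set()
base = getsizeof([])                       # fixed per-list overhead
for _ in range(60):
    capacity = (getsizeof(lst) - base) // 8
    if capacity not in seen:               # report each new capacity once
        seen.add(capacity)
        print(len(lst), capacity)
    lst.append(None)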


Why is iterating over a small string slower than iterating over a small list?

Question: Why is iterating over a small string slower than iterating over a small list?


I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single character strings. Any explanation? It’s almost 1.35 times as much time.

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

What’s happening on a lower level that’s causing this?


Answer 0


TL;DR

  • The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.

  • Object creation is not at fault. Neither method creates a new object, as one-character strings are cached (a quick check of this follows the list).

  • The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.

  • List indexing is remarkably fast.
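
A quick check of the caching point above (a CPython implementation detail: single ASCII characters are shared, so neither iteration creates new string objects):

s = "abc"
l = ["a", "b", "c"]
print(s[0] is l[0])                             # True on CPython
print([c for c in s][0] is [c for c in l][0])   # True as well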



>>> python3 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.388 usec per loop

>>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.436 usec per loop

This disagrees with what you’ve found…

You must be using Python 2, then.

>>> python2 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.212 usec per loop

Let’s explain the difference between the versions. I’ll examine the compiled code.

For Python 3:

import dis

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   4           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
#>>>               3 LOAD_CONST               2 ('list_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('a')
#>>>              12 LOAD_CONST               4 ('b')
#>>>              15 LOAD_CONST               5 ('c')
#>>>              18 BUILD_LIST               3
#>>>              21 GET_ITER
#>>>              22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              25 POP_TOP
#>>>              26 LOAD_CONST               0 (None)
#>>>              29 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>  21           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('abc')
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

You see here that the list variant is likely to be slower due to the building of the list each time.

This is the

 9 LOAD_CONST   3 ('a')
12 LOAD_CONST   4 ('b')
15 LOAD_CONST   5 ('c')
18 BUILD_LIST   3

part. The string variant only has

 9 LOAD_CONST   3 ('abc')

You can check that this does seem to make a difference:

def string_iterate():
    [item for item in ("a", "b", "c")]

dis.dis(string_iterate)
#>>>  35           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               6 (('a', 'b', 'c'))
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

This produces just

 9 LOAD_CONST               6 (('a', 'b', 'c'))

as tuples are immutable. Test:

>>> python3 -m timeit '[x for x in ("a", "b", "c")]'
1000000 loops, best of 3: 0.369 usec per loop

Great, back up to speed.

For Python 2:

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('a')
#>>>               6 LOAD_CONST               2 ('b')
#>>>               9 LOAD_CONST               3 ('c')
#>>>              12 BUILD_LIST               3
#>>>              15 GET_ITER            
#>>>         >>   16 FOR_ITER                12 (to 31)
#>>>              19 STORE_FAST               0 (item)
#>>>              22 LOAD_FAST                0 (item)
#>>>              25 LIST_APPEND              2
#>>>              28 JUMP_ABSOLUTE           16
#>>>         >>   31 POP_TOP             
#>>>              32 LOAD_CONST               0 (None)
#>>>              35 RETURN_VALUE        

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('abc')
#>>>               6 GET_ITER            
#>>>         >>    7 FOR_ITER                12 (to 22)
#>>>              10 STORE_FAST               0 (item)
#>>>              13 LOAD_FAST                0 (item)
#>>>              16 LIST_APPEND              2
#>>>              19 JUMP_ABSOLUTE            7
#>>>         >>   22 POP_TOP             
#>>>              23 LOAD_CONST               0 (None)
#>>>              26 RETURN_VALUE        

The odd thing is that we have the same building of the list, but it’s still faster for this. Python 2 is acting strangely fast.

Let’s remove the comprehensions and re-time. The _ = is to prevent it getting optimised out.

>>> python3 -m timeit '_ = ["a", "b", "c"]'
10000000 loops, best of 3: 0.0707 usec per loop

>>> python3 -m timeit '_ = "abc"'
100000000 loops, best of 3: 0.0171 usec per loop

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense as Python 3 changed comprehensions to have safer scoping.

Well, now improve the benchmark (I’m just removing overhead that isn’t iteration). This removes the building of the iterable by pre-assigning it:

>>> python3 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.387 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
1000000 loops, best of 3: 0.368 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
10000000 loops, best of 3: 0.164 usec per loop

We can check if calling iter is the overhead:

>>> python3 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.099 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.1 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.0913 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.0854 usec per loop

No. No it is not. The difference is too small, especially for Python 3.

So let’s remove yet more unwanted overhead… by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 3.12 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.77 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 2.32 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.09 msec per loop

This hasn’t actually changed much, but it’s helped a little.

So remove the comprehension. It’s overhead that’s not part of the question:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.71 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 1.36 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.27 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 935 usec per loop

That’s more like it! We can get slightly faster still by using deque to iterate. It’s basically the same, but it’s faster:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop
>>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 805 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying bytes and unicode in both:

  • bytes

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)'                                                                    :(
    1000 loops, best of 3: 571 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127))                 for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 757 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127))                 for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 438 usec per loop
    

    Here you see Python 3 actually faster than Python 2.

  • unicode

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(   chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 800 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [   chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 1.07 msec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 469 usec per loop
    

    Again, Python 3 is faster, although this is to be expected (str has had a lot of attention in Python 3).

In fact, this unicodebytes difference is very small, which is impressive.

So let’s analyse this one case, seeing as it’s fast and convenient for me:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

We can actually rule out Tim Peters’ 10-times-upvoted answer!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

These are not new objects!
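
As a side note (and a CPython-specific detail, which the get_latin1_char source further down confirms): this caching only applies to code points below 256, so characters outside Latin-1 are rebuilt on each access. A small sketch:

s = "abc\U0001F602"
print(s[0] is s[0])   # True on CPython: 'a' comes from the latin-1 cache
print(s[3] is s[3])   # typically False: the emoji is a fresh object each time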

But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

The difference seems small, but at least half of the cost is overhead:

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

so the speed difference is sufficient to decide to blame it. I think.

So why is indexing a list so much faster?

Well, I’ll come back to you on that, but my guess is that it’s down to the check for interned strings (or cached characters if it’s a separate mechanism). This will be less fast than optimal. But I’ll go check the source (although I’m not comfortable in C…) :).


So here’s the source:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

Walking from the top, we’ll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

ch = PyUnicode_READ(kind, data, index);

but we’d hope that is fast, as we’re reading from a contiguous C array by indexing it. The result, ch, will be less than 256 so we’ll return the cached character in get_latin1_char(ch).

So we’ll run (dropping the first checks)

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

Where

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)),            \
     ((PyASCIIObject *)(op))->state.kind)

(which is boring because the asserts get compiled out in non-debug builds [so I can check that they’re fast] and ((PyASCIIObject *)(op))->state.kind) is (I think) an indirection and a C-level cast);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) :   \
     _PyUnicode_NONCOMPACT_DATA(op))

(which is also boring for similar reasons, assuming the macros (Something_CAPITALIZED) are all fast),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
    ((kind) == PyUnicode_1BYTE_KIND ? \
        ((const Py_UCS1 *)(data))[(index)] : \
        ((kind) == PyUnicode_2BYTE_KIND ? \
            ((const Py_UCS2 *)(data))[(index)] : \
            ((const Py_UCS4 *)(data))[(index)] \
        ) \
    ))

(which involves indexes but really isn’t slow at all) and

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

Which confirms my suspicion that:

  • This is cached:

    PyObject *unicode = unicode_latin1[ch];
    
  • This should be fast. The if (!unicode) is not run, so it’s literally equivalent in this case to

    PyObject *unicode = unicode_latin1[ch];
    Py_INCREF(unicode);
    return unicode;
    

Honestly, after testing the asserts are fast (by disabling them [I think it works on the C-level asserts…]), the only plausibly-slow parts are:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

Which are:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(fast, as before),

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(fast if the macro IS_ASCII is fast), and

#define _PyUnicode_NONCOMPACT_DATA(op)                  \
    (assert(((PyUnicodeObject*)(op))->data.any),        \
     ((((PyUnicodeObject *)(op))->data.any)))

(also fast as it’s an assert plus an indirection plus a cast).

So we’re down (the rabbit hole) to:

PyUnicode_IS_ASCII

which is

#define PyUnicode_IS_ASCII(op)                   \
    (assert(PyUnicode_Check(op)),                \
     assert(PyUnicode_IS_READY(op)),             \
     ((PyASCIIObject*)op)->state.ascii)

Hmm… that seems fast too…


Well, OK, but let’s compare it to PyList_GetItem. (Yeah, thanks Tim Peters for giving me more work to do :P.)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

We can see that on non-error cases this is just going to run:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op) -> ob_item[i]

Where PyList_Check is

#define PyList_Check(op) \
     PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)

(TABS! TABS!!!) (issue21587) That got fixed and merged in 5 minutes. Like… yeah. Damn. They put Skeet to shame.

#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
#define PyType_FastSubclass(t,f)  PyType_HasFeature(t,f)
#ifdef Py_LIMITED_API
#define PyType_HasFeature(t,f)  ((PyType_GetFlags(t) & (f)) != 0)
#else
#define PyType_HasFeature(t,f)  (((t)->tp_flags & (f)) != 0)
#endif

So this is normally really trivial (two indirections and a couple of boolean checks) unless Py_LIMITED_API is on, in which case… ???

Then there’s the indexing and a cast (((PyListObject *)op) -> ob_item[i]) and we’re done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.


I think in general, there’s just more type-checking and indirection (->) for Unicode. It seems I’m missing a point, but what?


回答 1

当您遍历大多数容器对象(列表、元组、字典……)时,迭代器会把容器中已有的对象传递出来。

但是,当您遍历字符串时,必须为传递出的每个字符创建一个对象。字符串并不是列表那种意义上的“容器”:在迭代创建这些对象之前,字符串中的各个字符并不作为独立的对象存在。

When you iterate over most container objects (lists, tuples, dicts, …), the iterator delivers the objects in the container.

But when you iterate over a string, a new object has to be created for each character delivered – a string is not “a container” in the same sense a list is a container. The individual characters in a string don’t exist as distinct objects before iteration creates those objects.
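
A minimal sketch of that point (the identity check is CPython-specific, since short strings may be interned or cached):

container = ["spam", "eggs"]
for stored, yielded in zip(container, iter(container)):
    assert stored is yielded        # the list iterator hands out the stored objects

for ch in "spam":
    print(type(ch), len(ch))        # each item is a length-1 str built during iteration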


回答 2

为字符串创建迭代器可能会带来额外的开销。而数组在实例化时就已经带有一个迭代器。

编辑:

>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

这是在我的 MacBook Pro i7 上使用 2.7 运行的。差异可能是系统配置不同的结果。

You could be incurring an overhead for creating the iterator for the string, whereas the array already contains an iterator upon instantiation.

EDIT:

>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

This was run using 2.7, but on my MacBook Pro i7. This could be the result of a system configuration difference.


为什么在Python 3.5中str.translate比Python 3.4更快?

问题:为什么在Python 3.5中str.translate比Python 3.4更快?

我试图在 Python 3.4 中使用 text.translate() 从给定的字符串中删除不需要的字符。

最小的代码是:

import sys 
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))

它按预期工作。但是,在Python 3.4和Python 3.5中执行相同的程序会产生很大的不同。

计算时间的代码是

python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"

Python 3.4程序花费1.3毫秒,而Python 3.5中的同一程序仅花费26.4μs

Python 3.5中有哪些改进使其比Python 3.4更快?

I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.

The minimal code is:

import sys 
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))

It works as expected. However the same program when executed in Python 3.4 and Python 3.5 gives a large difference.

The code to calculate timings is

python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"

The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.

What has improved in Python 3.5 that makes it faster compared to Python 3.4?


回答 0

TL; DR- 问题21118


长篇故事

Josh Rosenberg 发现 str.translate() 函数与 bytes.translate 相比非常慢,于是他提交了一个 issue,并指出:

在Python 3中,str.translate()通常是性能悲观,而不是优化。

为什么str.translate()慢呢?

str.translate()速度很慢的主要原因是查找曾经在Python字典中进行。

使用 maketrans 使这个问题变得更糟。bytes 采用的类似做法是构建一个包含256个条目的C数组来做快速查表。因此,使用更高层的 Python dict 使得 Python 3.4 中的 str.translate() 非常慢。

现在发生什么事?

第一种方法是添加一个小的补丁,translate_writer,但是速度的提高并不令人满意。很快又测试了另一个补丁fast_translate,它产生了非常好的结果,加速了55%。

从文件中可以看到的主要变化是Python字典查找已更改为C级查找。

现在的速度几乎与 bytes 相同:

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797

这里需要注意的一点是,性能增强仅在ASCII字符串中突出。

正如 J.F.Sebastian 在下面的评论中提到的,在3.5之前,translate 对 ASCII 和非 ASCII 的情况以相同的方式工作。但是从3.5起,ASCII 的情况要快得多。

早期的ASCII与非ASCII几乎相同,但是现在我们可以看到性能有了很大的变化。

如该答案所示,它可以从71.6μs改善到2.33μs。

以下代码演示了这一点

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

结果列表:

         Python 3.4    Python 3.5  
Ascii     91.2          2.3 
Unicode   101           117

TL;DR – ISSUE 21118


The long Story

Josh Rosenberg found out that the str.translate() function is very slow compared to the bytes.translate, he raised an issue, stating that:

In Python 3, str.translate() is usually a performance pessimization, not optimization.

Why was str.translate() slow?

The main reason for str.translate() to be very slow was that the lookup used to be in a Python dictionary.

The usage of maketrans made this problem worse. The similar approach using bytes builds a C array of 256 items to fast table lookup. Hence the usage of higher level Python dict makes the str.translate() in Python 3.4 very slow.

What happened now?

The first approach was to add a small patch, translate_writer, However the speed increase was not that pleasing. Soon another patch fast_translate was tested and it yielded very nice results of up to 55% speedup.

The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.

The speeds now are almost the same as bytes

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797

A small note here is that the performance enhancement is only prominent in ASCII strings.

As J.F. Sebastian mentions in a comment below, before 3.5, translate used to work in the same way for both ASCII and non-ASCII cases. However, from 3.5 the ASCII case is much faster.

Earlier, ASCII vs non-ASCII used to be almost the same, however now we can see a great change in the performance.

It can be an improvement from 71.6μs to 2.33μs as seen in this answer.

The following code demonstrates this

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

Tabulation of the results:

         Python 3.4    Python 3.5  
Ascii     91.2          2.3 
Unicode   101           117
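
As a practical aside (not part of the original answer), the same kind of removal is often written with str.maketrans, which builds the mapping without scanning the whole Unicode range; a small sketch:

s = 'abcde12345@#@$#%$'
table = str.maketrans('', '', '@#$')   # third argument lists characters to delete
print(s.translate(table))              # abcde12345%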

为什么在启用哈希随机化时,tuple(set([1, "a", "b", "c", "z", "f"])) == tuple(set(["a", "b", "c", "z", "f", 1])) 在85%的时间里为True?

问题:为什么在启用哈希随机化时,tuple(set([1, "a", "b", "c", "z", "f"])) == tuple(set(["a", "b", "c", "z", "f", 1])) 在85%的时间里为True?

鉴于 Zero Piraeus 对另一个问题的回答,我们有

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

在启用哈希随机化的情况下,大约有85%的时间会打印 True。为什么是85%?

Given Zero Piraeus’ answer to another question, we have that

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

Prints True about 85% of the time with hash randomization enabled. Why 85%?


回答 0

我假设这个问题的所有读者都读过:

首先要注意的是,哈希随机化是在解释器启动时决定的。

两组字母的哈希值都相同,因此唯一重要的是是否发生冲突(顺序会受到影响)。


通过第二个链接的推论,我们知道这些集合的支持数组从长度8开始:

_ _ _ _ _ _ _ _

在第一种情况下,我们插入1

_ 1 _ _ _ _ _ _

然后插入其余部分:

α 1 ? ? ? ? ? ?

然后它被重新散列(rehash)到大小为32:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

在第二种情况下,我们插入其余部分:

? β ? ? ? ? ? ?

然后尝试插入1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

然后它将被重新散列:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

因此,两者的迭代顺序是否不同,仅取决于β是否存在。


出现β的几率,是这5个字母中任何一个哈希到1模8且哈希到1模32的概率。

由于任何哈希到1模32的值也会哈希到1模8,我们要求的是:在32个插槽中,这五个字母之一恰好落在插槽1中的概率:

5 (number of letters) / 32 (number of slots)

5/32等于0.15625,因此两次集合构造之间顺序不同的概率¹为15.625%。


这一点也不奇怪,这正是 Zero Piraeus 所测得的结果。


¹从技术上讲,这并不明显。我们可以假装5个散列中的每一个都是唯一的,因为需要重新哈希处理,但是由于线性探测,实际上更有可能发生“成束的”结构……但是因为我们只查看是否占用了一个插槽,所以实际上不会影响我们。

I’m going to assume any readers of this question to have read both:

The first thing to note is that hash randomization is decided on interpreter start-up.

The hash of each letter will be the same for both sets, so the only thing that can matter is if there is a collision (where order will be affected).


By the deductions of that second link we know the backing array for these sets starts at length 8:

_ _ _ _ _ _ _ _

In the first case, we insert 1:

_ 1 _ _ _ _ _ _

and then insert the rest:

α 1 ? ? ? ? ? ?

Then it is rehashed to size 32:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

In the second case, we insert the rest:

? β ? ? ? ? ? ?

And then try to insert 1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

And then it will be rehashed:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

So whether the iteration orders are different depends solely on whether β exists.


The chance of a β is the chance that any of the 5 letters will hash to 1 modulo 8 and hash to 1 modulo 32.

Since anything that hashes to 1 modulo 32 also hashes to 1 modulo 8, we want to find the chance that of the 32 slots, one of the five is in slot 1:

5 (number of letters) / 32 (number of slots)

5/32 is 0.15625, so there is a 15.625% chance¹ of the orders being different between the two set constructions.


Not very strangely at all, this is exactly what Zero Piraeus measured.


¹Technically even this isn’t obvious. We can pretend every one of the 5 hashes uniquely because of rehashing, but because of linear probing it’s actually more likely for “bunched” structures to occur… but because we’re only looking at whether a single slot is occupied, this doesn’t actually affect us.
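
If you want to reproduce the measurement yourself, you can rerun the comparison in fresh interpreters so each run gets its own hash seed. A sketch (assumes Python 3 with hash randomization on by default; subprocess is not part of the original answer):

import subprocess, sys

code = ('x = tuple(set([1, "a", "b", "c", "z", "f"]));'
        'y = tuple(set(["a", "b", "c", "z", "f", 1]));'
        'print(x == y)')
trials = 200
hits = sum(subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True).stdout.strip() == "True"
           for _ in range(trials))
print(hits / trials)   # typically lands near 0.85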


Python是否会优化掉仅用作返回值的变量?

问题:Python是否会优化掉仅用作返回值的变量?

以下两个代码段之间是否有最终区别?首先为函数中的变量分配一个值,然后返回该变量。第二个函数只是直接返回值。

Python是否将它们转换为等效的字节码?是其中之一吗?

情况1

def func():
    a = 42
    return a

情况2

def func():
    return 42

Is there any ultimate difference between the following two code snippets? The first assigns a value to a variable in a function and then returns that variable. The second function just returns the value directly.

Does Python turn them into equivalent bytecode? Is one of them faster?

Case 1:

def func():
    a = 42
    return a

Case 2:

def func():
    return 42

回答 0

不,不是

CPython 字节码的编译只经过一个小型的窥孔优化器(peephole optimizer),它被设计为只做基本的优化(有关这些优化的更多信息,请参见测试套件中的 test_peepholer.py)。

要查看实际发生的情况,请使用dis*查看生成的指令。对于第一个函数,包含分配:

from dis import dis
dis(func)
  2           0 LOAD_CONST               1 (42)
              2 STORE_FAST               0 (a)

  3           4 LOAD_FAST                0 (a)
              6 RETURN_VALUE

而对于第二个功能:

dis(func2)
  2           0 LOAD_CONST               1 (42)
              2 RETURN_VALUE

第一个函数多用了两条(快速)指令:STORE_FAST 和 LOAD_FAST。它们在当前执行帧的 fastlocals 数组中快速地存入并取出值。然后,两种情况下都会执行 RETURN_VALUE。因此,由于需要执行的指令更少,第二个函数会稍微快一点点。

通常,请注意CPython编译器在执行的优化中是保守的。它并没有像其他编译器一样聪明(通常也有更多的信息可以使用)。除了显然是正确的以外,主要设计目标是:a)保持简单,并且b)尽可能快地进行编译,因此您甚至不会注意到存在编译阶段。

最后,您不应该为这样的小问题而烦恼。速度上的收益微小而恒定,并且与 Python 是解释执行这一事实带来的开销相比,可谓相形见绌。

* dis是一个小的Python模块,可反汇编您的代码,您可以使用它查看VM将执行的Python字节码。

注意:正如@Jorn Vernee的评论中所述,这特定于Python的CPython实现。如果其他实现愿意的话,其他实现可能会进行更积极的优化,而CPython则不需要。

No, it doesn’t.

The compilation to CPython byte code is only passed through a small peephole optimizer that is designed to do only basic optimizations (See test_peepholer.py in the test suite for more on these optimizations).

To take a look at what’s actually going to happen, use dis* to see the instructions generated. For the first function, containing the assignment:

from dis import dis
dis(func)
  2           0 LOAD_CONST               1 (42)
              2 STORE_FAST               0 (a)

  3           4 LOAD_FAST                0 (a)
              6 RETURN_VALUE

While, for the second function:

dis(func2)
  2           0 LOAD_CONST               1 (42)
              2 RETURN_VALUE

Two more (fast) instructions are used in the first: STORE_FAST and LOAD_FAST. These make a quick store and grab of the value in the fastlocals array of the current execution frame. Then, in both cases, a RETURN_VALUE is performed. So, the second is ever so slightly faster due to less commands needed to execute.

In general, be aware that the CPython compiler is conservative in the optimizations it performs. It isn’t and doesn’t try to be as smart as other compilers (which, in general, also have much more information to work with). The main design goal, apart from obviously being correct, is to a) keep it simple and b) be as swift as possible in compiling these so you don’t even notice that a compilation phase exists.

In the end, you shouldn’t trouble yourself with small issues like this one. The benefit in speed is tiny, constant and, dwarfed by the overhead introduced by the fact that Python is interpreted.

*dis is a little Python module that dis-assembles your code, you can use it to see the Python bytecode that the VM will execute.

Note: As also stated in a comment by @Jorn Vernee, this is specific to the CPython implementation of Python. Other implementations might do more aggressive optimizations if they so desire, CPython doesn’t.
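
If you want to see how small the difference is in practice, timeit makes the comparison easy (a sketch; absolute numbers depend on your machine and Python version):

import timeit

setup = """
def func_assign():
    a = 42
    return a

def func_direct():
    return 42
"""
print(timeit.timeit("func_assign()", setup=setup))
print(timeit.timeit("func_direct()", setup=setup))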


回答 1

两者基本上是相同的,只是在第一种情况下,对象 42 被简单地赋给名为 a 的变量;换句话说,名称(即 a)指向值(即 42)。从技术上讲,它并没有做任何(复制数据意义上的)赋值,因为它从不复制任何数据。

在返回时,第一种情况返回的是这个命名绑定 a,而第二种情况返回的是对象 42。

有关更多阅读,请参考Ned Batchelder的精彩文章

Both are basically the same except that in the first case the object 42 is simply assigned to a variable named a or, in other words, names (i.e. a) refer to values (i.e. 42). It doesn’t do any assignment technically, in the sense that it never copies any data.

While returning, this named binding a is returned in the first case while the object 42 is returned in the second case.

For more reading, refer this great article by Ned Batchelder
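
A small illustration of that binding behaviour (a sketch; big and func are made-up names):

big = [0] * 10**6

def func():
    a = big       # binding the name copies nothing
    return a      # returning hands back the very same object

assert func() is big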


为什么元组在内存中的空间比列表少?

问题:为什么元组在内存中的空间比列表少?

tuple 在 Python 中占用的内存空间更少:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

lists占用更多的内存空间:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

Python内存管理内部会发生什么?

A tuple takes less memory space in Python:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

whereas lists takes more memory space:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

What happens internally on the Python memory management?


回答 0

我假设您正在使用CPython并使用64位(在CPython 2.7 64位上得到的结果相同)。其他Python实现可能会有所不同,或者您拥有32位Python。

无论采用哪种实现方式,lists都是可变大小的,而tuples是固定大小的。

因此 tuple 可以将元素直接存储在结构体内部;而列表则需要一层间接寻址(它存储的是指向元素的指针)。这层间接寻址就是一个指针,在64位系统上是64位,即8字节。

但是 list 还会做另一件事:过度分配。否则 list.append 将始终是 O(n) 操作;为了让它摊销为 O(1)(快得多!!!),它会过度分配。但这样它就必须同时记录已分配的大小和已填充的大小(tuple 只需要存储一个大小,因为已分配和已填充的大小总是相同)。这意味着每个列表还必须额外存储一个“大小”,在64位系统上这是一个64位整数,又占8个字节。

因此 list 至少比 tuple 多需要16个字节的内存。为什么我说“至少”?因为过度分配。过度分配意味着它分配了比当前所需更多的空间。而过度分配的数量取决于列表是“如何”创建的,以及追加/删除的历史:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

图片

我决定创建一些图像以伴随以上说明。也许这些有帮助

在示例中,这是(示意性地)将其存储在内存中的方式。我强调了红色(徒手)循环的区别:

这实际上只是一个近似值,因为int对象也是Python对象,并且CPython甚至重用了小整数,因此内存中对象的一种可能更准确的表示形式(尽管不那么可读)将是:

有用的链接:

请注意,__sizeof__ 并不会真正返回“正确”的大小!它只返回存储值的大小。但是,使用 sys.getsizeof 时结果不同:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

有24个“额外”字节。它们是真实存在的,这是 __sizeof__ 方法中没有计入的垃圾收集器开销。这是因为通常您不应该直接使用魔术方法,而应使用知道如何处理它们的函数,在这里就是 sys.getsizeof(它实际上会把GC开销加到 __sizeof__ 的返回值上)。

I assume you’re using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that’s 64bit, hence 8bytes.

But there’s another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always – to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another “size” which on 64bit systems is a 64bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say “at least”? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on “how” you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:

That’s actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

Useful links:

Note that __sizeof__ doesn’t really return the “correct” size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 “extra” bytes. These are real, that’s the garbage collector overhead that isn’t accounted for in the __sizeof__ method. That’s because you’re generally not supposed to use magic methods directly – use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).
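
To watch that over-allocation happen, you can track sys.getsizeof while a list grows (a sketch; the exact jump points are CPython-version-specific):

import sys

lst = []
last = sys.getsizeof(lst)
print("len=0", last)
for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:                       # a jump means a re-allocation happened
        print("len=%d grew to %d bytes" % (len(lst), size))
        last = size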


回答 1

我将更深入地研究CPython代码库,以便我们可以看到大小的实际计算方式。在您的特定示例中没有执行过度分配,因此我不会赘述

我将在这里使用64位值。


lists 的大小由以下函数计算得出list_sizeof

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Py_TYPE(self) 是一个宏,用于获取 self 的 ob_type(返回 PyList_Type);而 _PyObject_SIZE 是另一个宏,从该类型中获取 tp_basicsize。tp_basicsize 被计算为 sizeof(PyListObject),其中 PyListObject 是实例的结构体。

PyListObject结构包含三个字段:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

这些字段带有注释(我把注释删掉了)来解释它们的含义,可以点击上面的链接阅读。PyObject_VAR_HEAD 展开为3个8字节的字段(ob_refcount、ob_type 和 ob_size),因此贡献24个字节。

所以现在res是:

sizeof(PyListObject) + self->allocated * sizeof(void*)

要么:

40 + self->allocated * sizeof(void*)

如果列表实例有已分配的元素,第二部分就计算它们的贡献。self->allocated 顾名思义,保存着已分配元素的数量。

没有任何元素,列表的大小计算为:

>>> [].__sizeof__()
40

即实例结构的大小。


tuple对象没有定义tuple_sizeof函数。而是使用它们object_sizeof来计算大小:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

lists一样,它获取tp_basicsize和,如果对象具有非零值tp_itemsize(意味着它具有可变长度的实例),它将乘以元组中的项数(通过Py_SIZEtp_itemsize

tp_basicsize 再次使用 sizeof(PyTupleObject),其中 PyTupleObject 结构体包含:

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

因此,在没有任何元素时(即 Py_SIZE 返回 0),空元组的大小等于 sizeof(PyTupleObject):

>>> ().__sizeof__()
24

嗯?这里有一个我还没有找到解释的怪事:tuple 的 tp_basicsize 实际上是按如下方式计算的:

sizeof(PyTupleObject) - sizeof(PyObject *)

至于为什么要从 tp_basicsize 中再减去8个字节,我一直没能查明。(有关可能的解释,请参阅 MSeifert 的评论)


但是,这基本上是您特定示例中的区别list还会保留许多已分配的元素,这有助于确定何时再次过度分配。

现在,当添加更多元素时,列表确实会执行这种过度分配,以实现 O(1) 的追加。正如 MSeifert 在他的答案中很好地解释的那样,这会带来更大的尺寸。

I’ll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won’t touch on that.

I’m going to use 64-bit values here, as you are.


The size for lists is calculated from the following function, list_sizeof:

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.

The PyListObject structure has three fields:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

these have comments (which I trimmed) explaining what they are, follow the link above to read them. PyObject_VAR_HEAD expands into three 8 byte fields (ob_refcount, ob_type and ob_size) so a 24 byte contribution.

So for now res is:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

If the list instance has elements that are allocated, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.

Without any elements, the size of lists is calculated to be:

>>> [].__sizeof__()
40

i.e the size of the instance struct.


tuple objects don’t define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):

>>> ().__sizeof__()
24

huh? Well, here’s an oddity which I haven’t found an explanation for, the tp_basicsize of tuples is actually calculated as follows:

sizeof(PyTupleObject) - sizeof(PyObject *)

why an additional 8 bytes is removed from tp_basicsize is something I haven’t been able to find out. (See MSeifert’s comment for a possible explanation)


But, this is basically the difference in your specific example. lists also keep around a number of allocated elements which helps determine when to over-allocate again.

Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes as MSeifert’s covers nicely in his answer.
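
You can confirm those struct-derived numbers from Python itself (a sketch; the exact values assume a 64-bit CPython of the era discussed in the answer, and newer versions may differ):

print([].__sizeof__())         # 40: sizeof(PyListObject), no allocated elements
print(().__sizeof__())         # 24: sizeof(PyTupleObject) minus one PyObject*
print((1, 2, 3).__sizeof__())  # 24 + 3 * 8 = 48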


回答 2

MSeifert的答案涵盖了广泛的范围;为简单起见,您可以想到:

tuple是一成不变的。一旦设置,您将无法更改。因此,您预先知道需要为该对象分配多少内存。

list 是可变的。您可以向其中添加或删除项目。它必须知道自己的大小(用于内部实现),并且会根据需要调整大小。

天下没有免费的午餐,这些能力是有代价的。因此列表有额外的内存开销。

MSeifert answer covers it broadly; to keep it simple you can think of:

tuple is immutable. Once it set, you can’t change it. So you know in advance how much memory you need to allocate for that object.

list is mutable. You can add or remove items to or from it. It has to know the size of it (for internal impl.). It resizes as needed.

There are no free meals – these capabilities comes with a cost. Hence the overhead in memory for lists.


回答 3

元组的大小是预先确定的,这意味着在元组初始化时,解释器会为所包含的数据分配刚好足够的空间,仅此而已,这也使它成为不可变的(无法修改);而列表是可变对象,因此意味着内存的动态分配。为了避免每次追加或修改列表时都要重新分配空间(分配足以容纳改动后数据的空间并把数据复制过去),它会为将来的追加、修改等预先分配额外的空间……这基本上就是全部了。

The size of the tuple is fixed up front, meaning at tuple initialization the interpreter allocates just enough space for the contained data, and that’s the end of it, given it’s immutable (can’t be modified). A list, on the other hand, is a mutable object, which implies dynamic allocation of memory; so, to avoid allocating space each time you append to or modify the list (allocating enough space to contain the changed data and copying the data into it), it allocates additional space for future appends, modifications, … that pretty much sums it up.


关于Python的内置sort()方法

问题:关于Python的内置sort()方法

Python 的内置 sort() 方法使用什么算法?可以看一下该方法的代码吗?

What algorithm is the built in sort() method in Python using? Is it possible to have a look at the code for that method?


回答 0

当然!代码在这里,从函数 islt 开始,并持续相当长的一段时间 ;-)。正如克里斯(Chris)的评论所暗示的那样,它是C代码。您还需要阅读这个文本文件,以获得文字说明、结果等等。

如果您更喜欢阅读Java代码而不是C代码,可以看看 Joshua Bloch 用Java并为Java实现的 timsort(Joshua 也是在1997年实现了至今仍在Java中使用的改良归并排序的人,人们希望Java最终会改用他最近移植的 timsort)。

关于 timsort 的Java移植版的一些说明在这里,diff 在这里(带有指向所有所需文件的指针),关键文件在这里。FWIW,虽然我写C比写Java更在行,但在这个例子里,我发现 Joshua 的Java代码总体上比 Tim 的C代码更易读 ;-)。

Sure! The code’s here, starting with function islt and proceeding for QUITE a while;-). As Chris’s comment suggests, it’s C code. You’ll also want to read this text file for a textual explanation, results, etc etc.

If you prefer reading Java code than C code, you could look at Joshua Bloch’s implementation of timsort in and for Java (Joshua’s also the guy who implemented, in 1997, the modified mergesort that’s still used in Java, and one can hope that Java will eventually switch to his recent port of timsort).

Some explanation of the Java port of timsort is here, the diff is here (with pointers to all needed files), the key file is here — FWIW, while I’m a better C programmer than Java programmer, in this case I find Joshua’s Java code more readable overall than Tim’s C code;-).


回答 1

我只是想提供一个非常有用的链接,而我在Alex的其他全面答案中却没有找到它:对Python timsort的高级解释(带有图形可视化!)。

(是的,该算法现在基本上称为Timsort

I just wanted to supply a very helpful link that I missed in Alex’s otherwise comprehensive answer: A high-level explanation of Python’s timsort (with graph visualizations!).

(Yes, the algorithm is basically known as Timsort now)


回答 2

在早期的python版本中,sort函数实现了quicksort的修改版本。但是,它被认为是不稳定的,从2.3版本开始,他们改用自适应合并排序算法。

In early python-versions, the sort function implemented a modified version of quicksort. However, it was deemed unstable and as of 2.3 they switched to using an adaptive mergesort algorithm.
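
One reason stability mattered enough to switch algorithms is that it lets you chain sorts by several keys; a small sketch of the guarantee:

pairs = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]
# sorted() is documented as stable: equal keys keep their original relative order
print(sorted(pairs, key=lambda p: p[0]))
# [('a', 1), ('a', 2), ('b', 2), ('b', 1)]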


什么时候在Python中hash(n)== n?

问题:什么时候在Python中hash(n)== n?

我一直在摆弄 Python 的 hash 函数。对于小整数,似乎总是 hash(n) == n。但这并不能推广到大数:

>>> hash(2**100) == 2**100
False

我并不感到惊讶,我知道哈希值取值范围有限。这个范围是多少?

我尝试使用二进制搜索来找到最小的数字hash(n) != n

>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:

binary_search(f, t)
    Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.

>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0

2305843009213693951有什么特别之处?我注意到它小于sys.maxsize == 9223372036854775807

编辑:我正在使用Python3。我在Python 2上运行了相同的二进制搜索,得到了不同的结果2147483648,我注意到这是 sys.maxint+1

我也玩过[hash(random.random()) for i in range(10**6)]以估计哈希函数的范围。最大值始终低于上面的n。比较最小值,似乎Python 3的哈希值始终为正值,而Python 2的哈希值可以为负值。

I’ve been playing with Python’s hash function. For small integers, it appears hash(n) == n always. However this does not extend to large numbers:

>>> hash(2**100) == 2**100
False

I’m not surprised, I understand hash takes a finite range of values. What is that range?

I tried using binary search to find the smallest number hash(n) != n

>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:

binary_search(f, t)
    Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.

>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0

What’s special about 2305843009213693951? I note it’s less than sys.maxsize == 9223372036854775807

Edit: I’m using Python 3. I ran the same binary search on Python 2 and got a different result 2147483648, which I note is sys.maxint+1

I also played with [hash(random.random()) for i in range(10**6)] to estimate the range of hash function. The max is consistently below n above. Comparing the min, it seems Python 3’s hash is always positively valued, whereas Python 2’s hash can take negative values.


回答 0

基于文件中的python文档pyhash.c

对于数字类型,数字 x 的哈希值基于将 x 对质数 P = 2**_PyHASH_BITS - 1 取模进行约简。它的设计使得只要 x 和 y 在数值上相等,就有 hash(x) == hash(y),即使 x 和 y 的类型不同。

因此,对于64/32位计算机,这个约简的模数是 2**_PyHASH_BITS - 1,但 _PyHASH_BITS 是什么呢?

您可以在pyhash.h头文件中找到该文件,对于64位计算机,该头文件已定义为61(您可以在pyconfig.h文件中阅读更多说明)。

#if SIZEOF_VOID_P >= 8
#  define _PyHASH_BITS 61
#else
#  define _PyHASH_BITS 31
#endif

因此这首先取决于您的平台,例如在我的64位Linux平台上,这个模数是 2**61 - 1,即 2305843009213693951:

>>> 2**61 - 1
2305843009213693951

也可以使用 math.frexp 来获取 sys.maxint 的尾数和指数,对于64位机器,它表明 max int 是 2**63:

>>> import math
>>> math.frexp(sys.maxint)
(0.5, 64)

您可以通过一个简单的测试来查看差异:

>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False

阅读有关python哈希算法的完整文档https://github.com/python/cpython/blob/master/Python/pyhash.c#L34

如注释中所述,您可以使用sys.hash_info(在python 3.X中),这将为您提供用于计算哈希的参数的结构序列。

>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> 

除了我在前面几行描述的模数(modulus)之外,您还可以像下面这样获得 inf 的值:

>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159

Based on python documentation in pyhash.c file:

For numeric types, the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It’s designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types.

So for a 64/32 bit machine, the reduction would be 2**_PyHASH_BITS - 1, but what is _PyHASH_BITS?

You can find it in pyhash.h header file which for a 64 bit machine has been defined as 61 (you can read more explanation in pyconfig.h file).

#if SIZEOF_VOID_P >= 8
#  define _PyHASH_BITS 61
#else
#  define _PyHASH_BITS 31
#endif

So first of all it’s based on your platform; for example, on my 64-bit Linux platform the reduction is 2**61 - 1, which is 2305843009213693951:

>>> 2**61 - 1
2305843009213693951

Also you can use math.frexp in order to get the mantissa and exponent of sys.maxint, which for a 64-bit machine shows that max int is 2**63:

>>> import math
>>> math.frexp(sys.maxint)
(0.5, 64)

And you can see the difference by a simple test:

>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False

Read the complete documentation about python hashing algorithm https://github.com/python/cpython/blob/master/Python/pyhash.c#L34

As mentioned in comment you can use sys.hash_info (in python 3.X) which will give you a struct sequence of parameters used for computing hashes.

>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> 

Alongside the modulus that I’ve described in preceding lines, you can also get the inf value as following:

>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159

回答 1

23058430092136939512^61 - 1。它是最大的Mersenne素数,适合64位。

如果您只需要将值mod取一个数字就可以进行哈希处理,那么大的Mersenne素数是一个不错的选择-它易于计算并且可以确保可能性的均匀分布。(尽管我个人永远不会这样散列)

对浮点数计算这个模数特别方便。浮点数有一个指数部分,会把整个数乘以 2^x。既然 2^61 = 1 mod 2^61-1,您只需要考虑 (exponent) mod 61。

请参阅:https//en.wikipedia.org/wiki/Mersenne_prime

2305843009213693951 is 2^61 - 1. It’s the largest Mersenne prime that fits into 64 bits.

If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice — it’s easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way)

It’s especially convenient to compute the modulus for floating point numbers. They have an exponential component that multiplies the whole number by 2^x. Since 2^61 = 1 mod 2^61-1, you only need to consider the (exponent) mod 61.

See: https://en.wikipedia.org/wiki/Mersenne_prime
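
You can check the modulus and the wrap-around at the Python level (a sketch; assumes a 64-bit CPython 3 build where sys.hash_info.modulus is 2**61 - 1):

import sys

M = sys.hash_info.modulus              # 2305843009213693951 on 64-bit builds
print(M == 2**61 - 1)                  # True
print(hash(2**100) == (2**100) % M)    # True: large ints are reduced modulo M
print(hash(2**61))                     # 1: 2**61 wraps around to 1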


回答 2

哈希函数返回的是纯整数int,这意味着返回的值大于-sys.maxint和小于sys.maxint,这意味着如果传递sys.maxint + x给它,结果将为-sys.maxint + (x - 2)

hash(sys.maxint + 1) == sys.maxint + 1 # False
hash(sys.maxint + 1) == - sys.maxint -1 # True
hash(sys.maxint + sys.maxint) == -sys.maxint + sys.maxint - 2 # True

同时,2**200 比 sys.maxint 大 n 倍。我的猜测是,哈希会在 -sys.maxint..+sys.maxint 这个范围内绕 n 圈,直到落在该范围内的一个普通整数上,就像上面的代码片段那样。

因此,通常,对于任何n <= sys.maxint

hash(sys.maxint*n) == -sys.maxint*(n%2) +  2*(n%2)*sys.maxint - n/2 - (n + 1)%2 ## True

注意:这适用于python 2。

Hash function returns plain int that means that returned value is greater than -sys.maxint and lower than sys.maxint, which means if you pass sys.maxint + x to it result would be -sys.maxint + (x - 2).

hash(sys.maxint + 1) == sys.maxint + 1 # False
hash(sys.maxint + 1) == - sys.maxint -1 # True
hash(sys.maxint + sys.maxint) == -sys.maxint + sys.maxint - 2 # True

Meanwhile 2**200 is a n times greater than sys.maxint – my guess is that hash would go over range -sys.maxint..+sys.maxint n times until it stops on plain integer in that range, like in code snippets above..

So generally, for any n <= sys.maxint:

hash(sys.maxint*n) == -sys.maxint*(n%2) +  2*(n%2)*sys.maxint - n/2 - (n + 1)%2 ## True

Note: this is true for python 2.


回答 3

可以在这里找到cpython中int类型实现。

它只是返回该值;只有 -1 例外,这时会返回 -2:

static long
int_hash(PyIntObject *v)
{
    /* XXX If this is changed, you also need to change the way
       Python's long, float and complex types are hashed. */
    long x = v -> ob_ival;
    if (x == -1)
        x = -2;
    return x;
}

The implementation for the int type in cpython can be found here.

It just returns the value, except for -1, for which it returns -2:

static long
int_hash(PyIntObject *v)
{
    /* XXX If this is changed, you also need to change the way
       Python's long, float and complex types are hashed. */
    long x = v -> ob_ival;
    if (x == -1)
        x = -2;
    return x;
}
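
That -1 special case is visible from Python as well (a quick check, not part of the original answer):

print(hash(-1))   # -2: CPython reserves -1 as an error marker in the C hash API
print(hash(-2))   # -2 as well, so -1 and -2 intentionally share a hash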

如何覆盖Python对象的复制/深层复制操作?

问题:如何覆盖Python对象的复制/深层复制操作?

我了解 copy 模块中 copy 与 deepcopy 之间的区别。我以前成功地使用过 copy.copy 和 copy.deepcopy,但这是我第一次真正去重载 __copy__ 和 __deepcopy__ 方法。我已经在谷歌上搜索过,并翻阅了内置的 Python 模块来寻找 __copy__ 和 __deepcopy__ 函数的实例(例如 sets.py、decimal.py 和 fractions.py),但我仍然不能100%确定我理解对了。

这是我的情况:

我有一个配置对象。最初,我将使用一组默认值实例化一个配置对象。此配置将移交给其他多个对象(以确保所有对象都以相同的配置开始)。但是,一旦开始用户交互,每个对象都需要独立地调整其配置,而又不影响彼此的配置(对我来说,我需要对初始配置进行深入复制才能进行处理)。

这是一个示例对象:

class ChartConfig(object):

    def __init__(self):

        #Drawing properties (Booleans/strings)
        self.antialiased = None
        self.plot_style = None
        self.plot_title = None
        self.autoscale = None

        #X axis properties (strings/ints)
        self.xaxis_title = None
        self.xaxis_tick_rotation = None
        self.xaxis_tick_align = None

        #Y axis properties (strings/ints)
        self.yaxis_title = None
        self.yaxis_tick_rotation = None
        self.yaxis_tick_align = None

        #A list of non-primitive objects
        self.trace_configs = []

    def __copy__(self):
        pass

    def __deepcopy__(self, memo):
        pass 

在此对象上实现copydeepcopy方法以确保copy.copycopy.deepcopy提供适当行为的正确方法是什么?

I understand the difference between copy vs. deepcopy in the copy module. I’ve used copy.copy and copy.deepcopy before successfully, but this is the first time I’ve actually gone about overloading the __copy__ and __deepcopy__ methods. I’ve already Googled around and looked through the built-in Python modules to look for instances of the __copy__ and __deepcopy__ functions (e.g. sets.py, decimal.py, and fractions.py), but I’m still not 100% sure I’ve got it right.

Here’s my scenario:

I have a configuration object. Initially, I’m going to instantiate one configuration object with a default set of values. This configuration will be handed off to multiple other objects (to ensure all objects start with the same configuration). However, once user interaction starts, each object needs to tweak its configurations independently without affecting each other’s configurations (which says to me I’ll need to make deepcopys of my initial configuration to hand around).

Here’s a sample object:

class ChartConfig(object):

    def __init__(self):

        #Drawing properties (Booleans/strings)
        self.antialiased = None
        self.plot_style = None
        self.plot_title = None
        self.autoscale = None

        #X axis properties (strings/ints)
        self.xaxis_title = None
        self.xaxis_tick_rotation = None
        self.xaxis_tick_align = None

        #Y axis properties (strings/ints)
        self.yaxis_title = None
        self.yaxis_tick_rotation = None
        self.yaxis_tick_align = None

        #A list of non-primitive objects
        self.trace_configs = []

    def __copy__(self):
        pass

    def __deepcopy__(self, memo):
        pass 

What is the right way to implement the copy and deepcopy methods on this object to ensure copy.copy and copy.deepcopy give me the proper behavior?


回答 0

有关自定义的建议位于文档页面的最后

类可以使用与控制 pickling(序列化)相同的接口来控制复制。有关这些方法的信息,请参见 pickle 模块的描述。copy 模块并不使用 copy_reg 注册模块。

为了让一个类定义自己的复制实现,它可以定义特殊方法 __copy__() 和 __deepcopy__()。前者被调用来实现浅拷贝操作,不会传入额外的参数;后者被调用来实现深拷贝操作,会传入一个参数,即 memo 字典。如果 __deepcopy__() 的实现需要对某个组件做深拷贝,它应该调用 deepcopy() 函数,把该组件作为第一个参数、memo 字典作为第二个参数。

由于您似乎并不关心自定义 pickling,因此定义 __copy__ 和 __deepcopy__ 无疑是适合您的正确做法。

具体来说,__copy__(浅表副本)在您的情况下非常容易…:

def __copy__(self):
  newone = type(self)()
  newone.__dict__.update(self.__dict__)
  return newone

__deepcopy__ 会与之类似(也接受一个 memo 参数),但在返回之前,它必须对任何需要深拷贝的属性 self.foo 调用 self.foo = deepcopy(self.foo, memo)(本质上是那些容器属性,例如列表、字典,以及通过自己的 __dict__ 持有其他内容的非原始对象)。

The recommendations for customizing are at the very end of the docs page:

Classes can use the same interfaces to control copying that they use to control pickling. See the description of module pickle for information on these methods. The copy module does not use the copy_reg registration module.

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Since you appear not to care about pickling customization, defining __copy__ and __deepcopy__ definitely seems like the right way to go for you.

Specifically, __copy__ (the shallow copy) is pretty easy in your case…:

def __copy__(self):
  newone = type(self)()
  newone.__dict__.update(self.__dict__)
  return newone

__deepcopy__ would be similar (accepting a memo arg too) but before the return it would have to call self.foo = deepcopy(self.foo, memo) for any attribute self.foo that needs deep copying (essentially attributes that are containers — lists, dicts, non-primitive objects which hold other stuff through their __dict__s).
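
A minimal sketch of how those two stubs from the question could be filled in along these lines (hypothetical: it trims the attribute list and assumes trace_configs is the only attribute that actually needs deep copying):

from copy import deepcopy

class ChartConfig(object):
    def __init__(self):
        self.plot_title = None
        self.trace_configs = []                       # the container attribute

    def __copy__(self):
        newone = type(self)()
        newone.__dict__.update(self.__dict__)         # shallow: shares trace_configs
        return newone

    def __deepcopy__(self, memo):
        newone = type(self)()
        newone.__dict__.update(self.__dict__)
        newone.trace_configs = deepcopy(self.trace_configs, memo)  # deep-copy the container
        return newone

The next answer refines this by also registering the new object in the memo dict first, which matters when the object can be reached from its own attributes.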


回答 1

将Alex Martelli的答案和Rob Young的评论放在一起,您将获得以下代码:

from copy import copy, deepcopy

class A(object):
    def __init__(self):
        print('init')
        self.v = 10
        self.z = [2,3,4]

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

a = A()
a.v = 11
b1, b2 = copy(a), deepcopy(a)
a.v = 12
a.z.append(5)
print(b1.v, b1.z)
print(b2.v, b2.z)

输出:

init
11 [2, 3, 4, 5]
11 [2, 3, 4]

这里的__deepcopy__会填写memo字典,以避免在对象本身被其成员引用时进行多余的复制。

Putting together Alex Martelli’s answer and Rob Young’s comment you get the following code:

from copy import copy, deepcopy

class A(object):
    def __init__(self):
        print('init')
        self.v = 10
        self.z = [2,3,4]

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

a = A()
a.v = 11
b1, b2 = copy(a), deepcopy(a)
a.v = 12
a.z.append(5)
print(b1.v, b1.z)
print(b2.v, b2.z)

prints

init
11 [2, 3, 4, 5]
11 [2, 3, 4]

here __deepcopy__ fills in the memo dict to avoid excess copying in case the object itself is referenced from its member.
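
To see why the memo[id(self)] = result line matters, give the object a (hypothetical) reference back to itself; continuing from the snippet above, the copy then points at itself rather than at the original, and the recursion terminates:

a = A()
a.parent = a                       # hypothetical self-reference
b = deepcopy(a)
assert b.parent is b               # resolved through the memo entry
assert b.parent is not a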


回答 2

在Peter出色答案的基础上,下面展示如何实现自定义的deepcopy,同时对默认实现做最少的改动(例如,像我需要的那样只修改某个字段):

from copy import deepcopy

class Foo(object):
    def __deepcopy__(self, memo):
        # Temporarily shadow the hook with an instance attribute so that
        # deepcopy() falls back to its default machinery for the actual copy
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        # Drop the shadowing attributes again; both objects now resolve
        # __deepcopy__ to the class-level method (re-attaching the old bound
        # method to cp would leave it bound to the original object)
        del self.__deepcopy__
        del cp.__deepcopy__

        # custom treatments
        # for instance: cp.id = None

        return cp

Following Peter’s excellent answer, to implement a custom deepcopy, with minimal alteration to the default implementation (e.g. just modifying a field like I needed):

from copy import deepcopy

class Foo(object):
    def __deepcopy__(self, memo):
        # Temporarily shadow the hook with an instance attribute so that
        # deepcopy() falls back to its default machinery for the actual copy
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        # Drop the shadowing attributes again; both objects now resolve
        # __deepcopy__ to the class-level method (re-attaching the old bound
        # method to cp would leave it bound to the original object)
        del self.__deepcopy__
        del cp.__deepcopy__

        # custom treatments
        # for instance: cp.id = None

        return cp
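
A hypothetical end-to-end illustration of the same trick, with the custom treatment filled in (Record and its attributes are invented for this example):

from copy import deepcopy

class Record(object):
    def __init__(self):
        self.id = 42
        self.payload = [1, 2, 3]

    def __deepcopy__(self, memo):
        self.__deepcopy__ = None          # hide the hook; use the default deepcopy logic
        cp = deepcopy(self, memo)
        del self.__deepcopy__
        del cp.__deepcopy__
        cp.id = None                      # the custom treatment: copies start without an id
        return cp

r = Record()
c = deepcopy(r)
assert c.id is None
assert c.payload == r.payload and c.payload is not r.payload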

回答 3

从您的问题来看,并不清楚您为什么需要覆盖这些方法,因为您似乎并不想对复制行为做任何自定义。

无论如何,如果您确实想自定义深层副本(例如,通过共享某些属性并复制其他属性),则可以采用以下解决方案:

from copy import deepcopy


def deepcopy_with_sharing(obj, shared_attribute_names, memo=None):
    '''
    Deepcopy an object, except for a given list of attributes, which should
    be shared between the original object and its copy.

    obj is some object
    shared_attribute_names: A list of strings identifying the attributes that
        should be shared between the original and its copy.
    memo is the dictionary passed into __deepcopy__.  Ignore this argument if
        not calling from within __deepcopy__.
    '''
    assert isinstance(shared_attribute_names, (list, tuple))
    shared_attributes = {k: getattr(obj, k) for k in shared_attribute_names}

    if hasattr(obj, '__deepcopy__'):
        # Do hack to prevent infinite recursion in call to deepcopy
        deepcopy_method = obj.__deepcopy__
        obj.__deepcopy__ = None

    for attr in shared_attribute_names:
        del obj.__dict__[attr]

    clone = deepcopy(obj, memo)  # pass the memo through so nested copies stay consistent

    for attr, val in shared_attributes.items():
        setattr(obj, attr, val)
        setattr(clone, attr, val)

    if hasattr(obj, '__deepcopy__'):
        # Undo hack
        obj.__deepcopy__ = deepcopy_method
        del clone.__deepcopy__

    return clone



class A(object):

    def __init__(self):
        self.copy_me = []
        self.share_me = []

    def __deepcopy__(self, memo):
        return deepcopy_with_sharing(self, shared_attribute_names = ['share_me'], memo=memo)

a = A()
b = deepcopy(a)
assert a.copy_me is not b.copy_me
assert a.share_me is b.share_me

c = deepcopy(b)
assert c.copy_me is not b.copy_me
assert c.share_me is b.share_me

It’s not clear from your problem why you need to override these methods, since you don’t want to do any customization to the copying methods.

Anyhow, if you do want to customize the deep copy (e.g. by sharing some attributes and copying others), here is a solution:

from copy import deepcopy


def deepcopy_with_sharing(obj, shared_attribute_names, memo=None):
    '''
    Deepcopy an object, except for a given list of attributes, which should
    be shared between the original object and its copy.

    obj is some object
    shared_attribute_names: A list of strings identifying the attributes that
        should be shared between the original and its copy.
    memo is the dictionary passed into __deepcopy__.  Ignore this argument if
        not calling from within __deepcopy__.
    '''
    assert isinstance(shared_attribute_names, (list, tuple))
    shared_attributes = {k: getattr(obj, k) for k in shared_attribute_names}

    if hasattr(obj, '__deepcopy__'):
        # Do hack to prevent infinite recursion in call to deepcopy
        deepcopy_method = obj.__deepcopy__
        obj.__deepcopy__ = None

    for attr in shared_attribute_names:
        del obj.__dict__[attr]

    clone = deepcopy(obj, memo)  # pass the memo through so nested copies stay consistent

    for attr, val in shared_attributes.items():
        setattr(obj, attr, val)
        setattr(clone, attr, val)

    if hasattr(obj, '__deepcopy__'):
        # Undo hack
        obj.__deepcopy__ = deepcopy_method
        del clone.__deepcopy__

    return clone



class A(object):

    def __init__(self):
        self.copy_me = []
        self.share_me = []

    def __deepcopy__(self, memo):
        return deepcopy_with_sharing(self, shared_attribute_names = ['share_me'], memo=memo)

a = A()
b = deepcopy(a)
assert a.copy_me is not b.copy_me
assert a.share_me is b.share_me

c = deepcopy(b)
assert c.copy_me is not b.copy_me
assert c.share_me is b.share_me

回答 4

我在细节上可能有些出入,但大致如下:

来自copy文档:

  • 浅表副本将构造一个新的复合对象,然后(在可能的范围内)将对原始对象中找到的对象的引用插入其中。
  • 深层副本将构造一个新的复合对象,然后递归地将原始对象中发现的对象的副本插入其中。

换句话说:copy()只会复制最顶层的对象,其余部分仍然作为引用指向原始结构;deepcopy()则会递归地复制所有内容。

也就是说,deepcopy()正是您所需要的。

如果您需要做一些非常特殊的事情,可以按照手册中的说明覆盖__copy__()和__deepcopy__()。就我个人而言,我可能会实现一个普通函数(例如config.copy_config()之类),以明确表明这不是Python的标准行为。

I might be a bit off on the specifics, but here goes;

From the copy docs;

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In other words: copy() will copy only the top element and leave the rest as pointers into the original structure. deepcopy() will recursively copy over everything.

That is, deepcopy() is what you need.

If you need to do something really specific, you can override __copy__() or __deepcopy__(), as described in the manual. Personally, I’d probably implement a plain function (e.g. config.copy_config() or such) to make it plain that it isn’t Python standard behaviour.
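
A hypothetical sketch of that plain-function idea (copy_config is an invented name, not a standard API); it simply leans on copy.deepcopy internally:

import copy

class ChartConfig(object):
    def __init__(self):
        self.plot_title = None
        self.trace_configs = []

    def copy_config(self):
        """Return a fully independent copy of this configuration."""
        return copy.deepcopy(self)

default = ChartConfig()
per_chart = default.copy_config()          # tweak freely without touching the default
per_chart.plot_title = 'Sensor A'
assert default.plot_title is None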


回答 5

copy模块最终会使用__getstate__()/__setstate__()这套pickle协议,因此它们也是可以覆盖的有效目标。

默认实现只是返回并设置实例的__dict__,因此您不必调用super(),也不必担心上面Eino Gourdin那个巧妙的技巧。

The copy module eventually uses the __getstate__()/__setstate__() pickling protocol, so these are also valid targets to override.

The default implementation just returns and sets the __dict__ of the class, so you don’t have to call super() and worry about Eino Gourdin’s clever trick, above.
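
A minimal sketch of that pickling-protocol route, assuming a hypothetical _render_cache attribute that should not travel with copies or pickles:

import copy

class ChartConfig(object):
    def __init__(self):
        self.plot_title = None
        self.trace_configs = []
        self._render_cache = {}            # hypothetical derived data, cheap to rebuild

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_render_cache'] = {}        # copies (and pickles) start with an empty cache
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

cfg = ChartConfig()
cfg._render_cache['layout'] = 'expensive result'
clone = copy.deepcopy(cfg)
assert clone._render_cache == {}
assert clone.trace_configs is not cfg.trace_configs   # containers are still deep-copied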


回答 6

在安东尼·哈奇金斯(Antony Hatchkins)简洁答案的基础上,下面是我的版本,其中相关的类派生自另一个自定义类(因此我们需要调用super):

import copy

class Foo(FooBase):
    def __init__(self, param1, param2):
        self._base_params = [param1, param2]
        super(Foo, self).__init__(*self._base_params)

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        super(Foo, result).__init__(*self._base_params)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, copy.deepcopy(v, memo))
        # re-initialise the base class with the already-deep-copied parameters
        super(Foo, result).__init__(*result._base_params)
        return result

Building on Antony Hatchkins’ clean answer, here’s my version where the class in question derives from another custom class (s.t. we need to call super):

import copy

class Foo(FooBase):
    def __init__(self, param1, param2):
        self._base_params = [param1, param2]
        super(Foo, self).__init__(*self._base_params)

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        super(Foo, result).__init__(*self._base_params)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, copy.deepcopy(v, memo))
        # re-initialise the base class with the already-deep-copied parameters
        super(Foo, result).__init__(*result._base_params)
        return result
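
A hypothetical usage sketch; FooBase below is a stand-in invented for illustration, with Foo defined as in the answer above:

import copy

class FooBase(object):                      # invented base class for this example
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

# ... define Foo from the answer above after this base class ...

f = Foo(1, [2, 3])
g = copy.deepcopy(f)
assert g._base_params == f._base_params
assert g._base_params is not f._base_params   # deep copy duplicates the list
h = copy.copy(f)
assert h._base_params is f._base_params       # shallow copy shares it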