为什么提早归还比其他要慢?

问题:为什么提早归还比其他要慢?

这是我几天前给出的答案的后续问题。编辑:该问题的OP似乎已经使用了我发布给他的代码来问同样的问题,但是我没有意识到。道歉。提供的答案是不同的!

我基本上观察到:

>>> def without_else(param=False):
...     if param:
...         return 1
...     return 0
>>> def with_else(param=False):
...     if param:
...         return 1
...     else:
...         return 0
>>> from timeit import Timer as T
>>> T(lambda : without_else()).repeat()
[0.3011460304260254, 0.2866089344024658, 0.2871549129486084]
>>> T(lambda : with_else()).repeat()
[0.27536892890930176, 0.2693932056427002, 0.27011704444885254]
>>> T(lambda : without_else(True)).repeat()
[0.3383951187133789, 0.32756996154785156, 0.3279120922088623]
>>> T(lambda : with_else(True)).repeat()
[0.3305950164794922, 0.32186388969421387, 0.3209099769592285]

…或者换句话说:else无论if是否触发条件,使子句更快。

我认为这与两者生成的不同字节码有关,但是有人能详细确认/解释吗?

编辑:似乎不是每个人都可以重现我的时间安排,所以我认为在我的系统上提供一些信息可能有用。我正在运行安装了默认python的Ubuntu 11.10 64位。python生成以下版本信息:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2

以下是Python 2.7中反汇编的结果:

>>> dis.dis(without_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  4     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
>>> dis.dis(with_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  5     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE        

This is a follow-up question to an answer I gave a few days back. Edit: it seems that the OP of that question already used the code I posted to him to ask the same question, but I was unaware of it. Apologies. The answers provided are different though!

Substantially I observed that:

>>> def without_else(param=False):
...     if param:
...         return 1
...     return 0
>>> def with_else(param=False):
...     if param:
...         return 1
...     else:
...         return 0
>>> from timeit import Timer as T
>>> T(lambda : without_else()).repeat()
[0.3011460304260254, 0.2866089344024658, 0.2871549129486084]
>>> T(lambda : with_else()).repeat()
[0.27536892890930176, 0.2693932056427002, 0.27011704444885254]
>>> T(lambda : without_else(True)).repeat()
[0.3383951187133789, 0.32756996154785156, 0.3279120922088623]
>>> T(lambda : with_else(True)).repeat()
[0.3305950164794922, 0.32186388969421387, 0.3209099769592285]

…or in other words: having the else clause is faster regardless of the if condition being triggered or not.

I assume it has to do with different bytecode generated by the two, but is anybody able to confirm/explain in detail?

EDIT: Seems not everybody is able to reproduce my timings, so I thought it might be useful to give some info on my system. I’m running Ubuntu 11.10 64 bit with the default python installed. python generates the following version information:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2

Here are the results of the disassembly in Python 2.7:

>>> dis.dis(without_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  4     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
>>> dis.dis(with_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  5     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE        

回答 0

这纯粹是猜测,我还没有找到一种简单的方法来检查它是否正确,但是我有一个理论适合您。

我尝试了您的代码并获得了相同的结果,但without_else()反复地比with_else()

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

考虑到字节码是相同的,唯一的区别是函数的名称。特别是时序测试会在全局名称上进行查找。尝试重命名without_else(),区别消失:

>>> def no_else(param=False):
    if param:
        return 1
    return 0

>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

我的猜测是,这without_else与其他内容发生了哈希冲突,globals()因此全局名称查找稍微慢一些。

编辑:具有7个或8个键的字典可能具有32个插槽,因此在此基础上,without_else哈希与__builtins__

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

要阐明哈希如何工作:

__builtins__ 散列为-1196389688,这减小了表大小(32)的模数,这意味着它存储在表的#8插槽中。

without_else哈希到505688136,将模32的值降低为8,因此发生冲突。要解决此问题,Python计算:

从…开始:

j = hash % 32
perturb = hash

重复此过程,直到找到可用的插槽:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

这使它可以用作下一个索引17。幸运的是,它是免费的,因此循环仅重复一次。哈希表的大小是2的幂,哈希表的大小也是2的幂2**ii是哈希值使用的位数j

对该表的每个探测都可以找到以下之一:

  • 插槽为空,在这种情况下,探测停止,并且我们知道该值不在表中。

  • 该广告位尚未使用,但已在过去使用过,在这种情况下,我们尝试使用如上计算的下一个值。

  • 插槽已满,但是存储在表中的完整哈希值与我们要查找的键的哈希值不同(在__builtins__vs 的情况下就是这样without_else)。

  • 插槽已满,并且恰好具有我们想要的哈希值,然后Python检查以查看键和我们正在查找的对象是否是同一对象(在这种情况下,这是因为可能会插入标识符的短字符串被插入了,所以相同的标识符使用完全相同的字符串)。

  • 最终,当插槽已满时,哈希值完全匹配,但是键不是同一对象,然后Python才会尝试比较它们是否相等。这相对较慢,但实际上不应该进行名称查找。

This is a pure guess, and I haven’t figured out an easy way to check whether it is right, but I have a theory for you.

I tried your code and get the same of results, without_else() is repeatedly slightly slower than with_else():

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

Considering that the bytecode is identical, the only difference is the name of the function. In particular the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:

>>> def no_else(param=False):
    if param:
        return 1
    return 0

>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.

Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

To clarify how the hashing works:

__builtins__ hashes to -1196389688 which reduced modulo the table size (32) means it is stored in the #8 slot of the table.

without_else hashes to 505688136 which reduced modulo 32 is 8 so there’s a collision. To resolve this Python calculates:

Starting with:

j = hash % 32
perturb = hash

Repeat this until we find a free slot:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

which gives it 17 to use as the next index. Fortunately that’s free so the loop only repeats once. The hash table size is a power of 2, so 2**i is the size of the hash table, i is the number of bits used from the hash value j.

Each probe into the table can find one of these:

  • The slot is empty, in that case the probing stops and we know the value is not in the table.

  • The slot is unused but was used in the past in which case we go try the next value calculated as above.

  • The slot is full but the full hash value stored in the table isn’t the same as the hash of the key we are looking for (that’s what happens in the case of __builtins__ vs without_else).

  • The slot is full and has exactly the hash value we want, then Python checks to see if the key and the object we are looking up are the same object (which in this case they will be because short strings that could be identifiers are interned so identical identifiers use the exact same string).

  • Finally when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups shouldn’t actually happen.