
Question: Is "x < y < z" faster than "x < y and y < z"?

From this page, we know that:

Chained comparisons are faster than using the and operator. Write x < y < z instead of x < y and y < z.

However, I got a different result testing the following code snippets:

$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y < z"
1000000 loops, best of 3: 0.322 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y and y < z"
1000000 loops, best of 3: 0.22 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y < z"
1000000 loops, best of 3: 0.279 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y and y < z"
1000000 loops, best of 3: 0.215 usec per loop

It seems that x < y and y < z is faster than x < y < z. Why?

After searching some posts on this site (like this one), I know that “evaluated only once” is the key for x < y < z; however, I’m still confused. To investigate further, I disassembled these two functions using dis.dis:

import dis

def chained_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y < z

def and_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y and y < z

dis.dis(chained_compare)
dis.dis(and_compare)

And the output is:

## chained_compare ##

  4           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

  5           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

  6          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

  7          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 DUP_TOP
             25 ROT_THREE
             26 COMPARE_OP               0 (<)
             29 JUMP_IF_FALSE_OR_POP    41
             32 LOAD_FAST                2 (z)
             35 COMPARE_OP               0 (<)
             38 JUMP_FORWARD             2 (to 43)
        >>   41 ROT_TWO
             42 POP_TOP
        >>   43 POP_TOP
             44 LOAD_CONST               0 (None)
             47 RETURN_VALUE

## and_compare ##

 10           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

 11           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

 12          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

 13          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 COMPARE_OP               0 (<)
             27 JUMP_IF_FALSE_OR_POP    39
             30 LOAD_FAST                1 (y)
             33 LOAD_FAST                2 (z)
             36 COMPARE_OP               0 (<)
        >>   39 POP_TOP
             40 LOAD_CONST               0 (None)

It seems that x < y and y < z has fewer disassembled instructions than x < y < z. Should I consider x < y and y < z faster than x < y < z?

Tested with Python 2.7.6 on an Intel(R) Xeon(R) CPU E5640 @ 2.67GHz.


Answer 0

The difference is that in x < y < z, y is only evaluated once. This does not make a large difference if y is a variable, but it does when y is a function call that takes some time to compute.

from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop
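
The same effect can be observed without IPython or sleep by counting calls instead of timing them. The following is only a minimal sketch: the CallCounter helper is a hypothetical name introduced for illustration and is not part of the original answer.

```python
class CallCounter:
    """Callable that returns a fixed value and records how often it was called."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.value


y = CallCounter(1.3)
chained_result = 1.2 < y() < 1.8        # y() is evaluated once
chained_calls = y.calls

y = CallCounter(1.3)
and_result = 1.2 < y() and y() < 1.8    # y() is evaluated again when 1.2 < y() is true
and_calls = y.calls

print(chained_result, chained_calls)    # True 1
print(and_result, and_calls)            # True 2
```

Both expressions produce the same result here; only the number of evaluations of y() differs, which is what the 203 ms versus 405 ms timings reflect.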

Answer 1

Optimal bytecode for both of the functions you defined would be

          0 LOAD_CONST               0 (None)
          3 RETURN_VALUE

because the result of the comparison is not used. Let’s make the situation more interesting by returning the result of the comparison. Let’s also have the result not be knowable at compile time.

def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z

Again, the two versions of the comparison are semantically identical, so the optimal bytecode is the same for both constructs. As best I can work it out, it would look like this. I’ve annotated each line with the stack contents before and after each opcode, in Forth notation (top of stack at right, -- divides before and after, trailing ? indicates something that might or might not be there). Note that RETURN_VALUE discards everything that happens to be left on the stack underneath the value returned.

          0 LOAD_FAST                0 (y)    ;          -- y
          3 DUP_TOP                           ; y        -- y y
          4 LOAD_CONST               0 (1.1)  ; y y      -- y y 1.1
          7 COMPARE_OP               4 (>)    ; y y 1.1  -- y pred
         10 JUMP_IF_FALSE_OR_POP     19       ; y pred   -- y
         13 LOAD_CONST               1 (1.3)  ; y        -- y 1.3
         16 COMPARE_OP               0 (<)    ; y 1.3    -- pred
     >>  19 RETURN_VALUE                      ; y? pred  --
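
The stack annotations in the listing above can be checked with a toy stack machine. This is a sketch under stated assumptions, not how CPython executes bytecode: the PROGRAM list and the run function are hypothetical, the opcode names are simply copied from the listing, and the jump target is an index into the list rather than a byte offset.

```python
import operator

# Hypothetical encoding of the listing: (opcode name, argument) pairs.
PROGRAM = [
    ("LOAD_FAST", "y"),
    ("DUP_TOP", None),
    ("LOAD_CONST", 1.1),
    ("COMPARE_OP", operator.gt),
    ("JUMP_IF_FALSE_OR_POP", 7),    # 7 is the list index of RETURN_VALUE
    ("LOAD_CONST", 1.3),
    ("COMPARE_OP", operator.lt),
    ("RETURN_VALUE", None),
]


def run(program, local_vars):
    """Execute the toy program and return the value on top of the stack."""
    stack = []
    pc = 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "DUP_TOP":
            stack.append(stack[-1])
        elif op == "COMPARE_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(arg(left, right))
        elif op == "JUMP_IF_FALSE_OR_POP":
            if not stack[-1]:
                pc = arg            # keep the false predicate and jump
            else:
                stack.pop()         # drop the true predicate and fall through
        elif op == "RETURN_VALUE":
            return stack.pop()      # anything left underneath is discarded
        else:
            raise ValueError("unknown opcode: " + op)


# The toy program agrees with both spellings of the comparison.
for y in (1.0, 1.1, 1.2, 1.3, 1.5):
    assert run(PROGRAM, {"y": y}) == (1.1 < y < 1.3) == (1.1 < y and y < 1.3)
```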

If an implementation of the language, CPython, PyPy, whatever, does not generate this bytecode (or its own equivalent sequence of operations) for both variations, that demonstrates the poor quality of that bytecode compiler. Getting from the bytecode sequences you posted to the above is a solved problem (I think all you need for this case is constant folding, dead code elimination, and better modeling of the contents of the stack; common subexpression elimination would also be cheap and valuable), and there’s really no excuse for not doing it in a modern language implementation.

Now, it happens that all current implementations of the language have poor-quality bytecode compilers. But you should ignore that while coding! Pretend the bytecode compiler is good, and write the most readable code. It will probably be plenty fast enough anyway. If it isn’t, look for algorithmic improvements first, and give Cython a try second — that will provide far more improvement for the same effort than any expression-level tweaks you might apply.


Answer 2

Since the difference in the output seems to be due to a lack of optimization, I think you should ignore it in most cases; it could be that the difference will go away. The difference arises because y should only be evaluated once, which is solved by duplicating it on the stack, and that requires an extra POP_TOP. A solution that uses LOAD_FAST might be possible, though.

The important difference, though, is that in x<y and y<z, y is evaluated twice if x<y evaluates to true. This has implications if the evaluation of y takes considerable time or has side effects.

In most scenarios you should use x<y<z, despite the fact that it’s somewhat slower.


Answer 3

First of all, your comparison is pretty much meaningless because the two different constructs were not introduced to provide a performance improvement, so you shouldn’t decide whether to use one in place of the other based on that.

The x < y < z construct:

  1. Is clearer and more direct in its meaning.
  2. Its semantics is what you’d expect from the “mathematical meaning” of the comparison: evaluate x, y and z once and check whether the whole condition holds. Using and changes the semantics by evaluating y multiple times, which can change the result.

So choose one in place of the other depending on the semantics you want and, if they are equivalent, whether one is more readable than the other.
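
To illustrate how evaluating y more than once can change the result, here is a small sketch in which the middle operand has a side effect. The make_reader helper is a hypothetical name used only for this example.

```python
def make_reader(values):
    """Return a zero-argument function that yields the next value on each call."""
    it = iter(values)
    return lambda: next(it)


read = make_reader([5, 50])
chained = 1 < read() < 10               # reads 5 once: 1 < 5 < 10

read = make_reader([5, 50])
with_and = 1 < read() and read() < 10   # reads 5, then 50: 1 < 5 and 50 < 10

print(chained, with_and)                # True False
```

When the operand is a plain local variable, as in the question, the two forms cannot diverge this way.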

That said, more disassembled code does not imply slower code. However, executing more bytecode operations means that each operation is simpler, and yet each one requires an iteration of the interpreter’s main loop. This means that if the operations you are performing are extremely fast (e.g. local variable lookup, as you are doing there), then the overhead of executing more bytecode operations can matter.
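
As a rough way to see how many bytecode operations each form compiles to on whatever interpreter is at hand, the instructions can be listed with dis.get_instructions. This is a sketch with assumptions: it needs Python 3.4 or later (the question used Python 2.7.6, which predates that function), the two functions are variants of the question's that take arguments and return the result, and the opcode names and counts differ between CPython versions, so no particular output is claimed.

```python
import dis


def chained_compare(x, y, z):
    return x < y < z


def and_compare(x, y, z):
    return x < y and y < z


def opnames(func):
    """List the opcode names the running interpreter compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


chained_ops = opnames(chained_compare)
and_ops = opnames(and_compare)

# Counts and names depend on the interpreter version.
print(len(chained_ops), chained_ops)
print(len(and_ops), and_ops)
```

A larger instruction count only hints at more interpreter-loop iterations; it does not by itself say which form is faster for a given operand.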

But note that this result does not hold in the more general situation, only in the “worst case” that you happened to profile. As others have noted, if you change y to something that takes even a bit more time to evaluate, you’ll see the results change, because the chained notation evaluates it only once.

Summarizing:

  • Consider semantics before performance.
  • Take into account readability.
  • Don’t trust micro benchmarks. Always profile with different kinds of parameters to see how the timing of a function/expression behaves in relation to those parameters, and consider how you plan to use it.
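
One possible way to profile with different kinds of operands is to run both spellings through the timeit module, once with a cheap middle operand and once with a more expensive one. This is a sketch, not a rigorous benchmark: cheap, costly and bench are hypothetical names, the iteration counts are arbitrary, and the timings depend entirely on the machine and interpreter.

```python
import timeit


def cheap():
    return 1.3


def costly():
    # Stand-in for an operand that is expensive to evaluate; still returns 1.3.
    return sum(range(2000)) and 1.3


def bench(stmt, y, number=2000, repeat=3):
    """Best-of-`repeat` time in seconds for `number` runs of stmt."""
    timer = timeit.Timer(stmt, globals={"y": y})
    return min(timer.repeat(repeat=repeat, number=number))


results = {}
for name, y in (("cheap", cheap), ("costly", costly)):
    results[name, "chained"] = bench("1.2 < y() < 1.8", y)
    results[name, "and"] = bench("1.2 < y() and y() < 1.8", y)

for key, seconds in results.items():
    print(key, "{:.6f} s".format(seconds))
```

Comparing the cheap and costly rows shows how the relative cost of the two forms shifts with the operand, which is the point of varying the parameters.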
