为什么Python 3允许“ 00”作为0的文字,却不允许“ 01”作为1的文字?

问题:为什么Python 3允许“ 00”作为0的文字,却不允许“ 01”作为1的文字?

为什么Python 3允许“ 00”作为原义的0,却不允许“ 01”作为原义的1?有充分的理由吗?这种矛盾使我感到困惑。(我们正在谈论的是Python 3,它故意打破了向后兼容性以实现诸如一致性之类的目标。)

例如:

>>> from datetime import time
>>> time(16, 00)
datetime.time(16, 0)
>>> time(16, 01)
  File "<stdin>", line 1
    time(16, 01)
              ^
SyntaxError: invalid token
>>>

Why does Python 3 allow “00” as a literal for 0 but not allow “01” as a literal for 1? Is there a good reason? This inconsistency baffles me. (And we’re talking about Python 3, which purposely broke backward compatibility in order to achieve goals like consistency.)

For example:

>>> from datetime import time
>>> time(16, 00)
datetime.time(16, 0)
>>> time(16, 01)
  File "<stdin>", line 1
    time(16, 01)
              ^
SyntaxError: invalid token
>>>

回答 0

根据https://docs.python.org/3/reference/lexical_analysis.html#integer-literals

整数文字由以下词汇定义描述:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

除了可以存储在可用内存中的整数之外,整数文字的长度没有限制。

请注意,不允许使用非零十进制数字开头的零。这是为了消除C样式八进制文字的歧义,Python在3.0版之前使用了这些样式。

如此处所述,不允许使用非零十进制数字开头的零"0"+作为一个非常特殊的情况是合法的,这在Python 2中是不存在的

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+

SVN commit r55866在令牌生成器中实现了PEP 3127,它禁止使用旧0<octal>数字。但是,奇怪的是,它也添加了以下注释:

/* in any case, allow '0' as a literal */

带有nonzeroSyntaxError在以下数字序列包含非零数字时抛出的特殊标志。

这很奇怪,因为PEP 3127不允许这种情况:

该PEP建议,将使用Python 3.0(和2.6的Python 3.0预览模式)从语言中删除使用前导零指定八进制数的功能,并且每当前导“ 0”为紧跟着另一个数字

(强调我的)

因此,允许多个零的事实在技术上违反了PEP,并且基本上由Georg Brandl实施为特殊情况。他进行了相应的文档更改,以注意这"0"+是的有效案例decimalinteger(以前已在中进行了介绍octinteger)。

我们可能永远不会确切知道为什么Georg选择使之"0"+有效-在Python中它可能永远是一个奇怪的情况。


更新 [2015年7月28日]:这个问题引发了关于python-ideas 的热烈讨论Georg在其中进行了讨论

史蒂文·达普拉诺(Steven D’Aprano)写道:

为什么这样定义?[…]为什么我们写0000以得到零?

我可以告诉你,但后来我不得不杀了你。

格奥尔格

后来,该线程生成了此错误报告,旨在摆脱这种特殊情况。乔治在这里

我不记得有意进行更改的原因(从文档更改中可以看出)。

我现在无法提出更改的充分理由[…]

因此,我们有了它:这种不一致背后的确切原因已不复存在。

最后,请注意,该错误报告已被拒绝:对于Python 3.x的其余部分,前导零将仅在零整数上继续被接受。

Per https://docs.python.org/3/reference/lexical_analysis.html#integer-literals:

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

As noted here, leading zeros in a non-zero decimal number are not allowed. "0"+ is legal as a very special case, which wasn’t present in Python 2:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+

SVN commit r55866 implemented PEP 3127 in the tokenizer, which forbids the old 0<octal> numbers. However, curiously, it also adds this note:

/* in any case, allow '0' as a literal */

with a special nonzero flag that only throws a SyntaxError if the following sequence of digits contains a nonzero digit.

This is odd because PEP 3127 does not allow this case:

This PEP proposes that the ability to specify an octal number by using a leading zero will be removed from the language in Python 3.0 (and the Python 3.0 preview mode of 2.6), and that a SyntaxError will be raised whenever a leading “0” is immediately followed by another digit.

(emphasis mine)

So, the fact that multiple zeros are allowed is technically violating the PEP, and was basically implemented as a special case by Georg Brandl. He made the corresponding documentation change to note that "0"+ was a valid case for decimalinteger (previously that had been covered under octinteger).

We’ll probably never know exactly why Georg chose to make "0"+ valid – it may forever remain an odd corner case in Python.


UPDATE [28 Jul 2015]: This question led to a lively discussion thread on python-ideas in which Georg chimed in:

Steven D’Aprano wrote:

Why was it defined that way? […] Why would we write 0000 to get zero?

I could tell you, but then I’d have to kill you.

Georg

Later on, the thread spawned this bug report aiming to get rid of this special case. Here, Georg says:

I don’t recall the reason for this deliberate change (as seen from the docs change).

I’m unable to come up with a good reason for this change now […]

and thus we have it: the precise reason behind this inconsistency is lost to time.

Finally, note that the bug report was rejected: leading zeros will continue to be accepted only on zero integers for the rest of Python 3.x.


回答 1

这是特例("0"+

2.4.4。整数文字

整数文字由以下词汇定义描述:

整数:: =十进制整数| 八进制| hexinteger | 二进制整数
十进制整数:: =非零数字* “ 0” +
非零数字:: =“ 1” ...“ 9”
数字:: =“ 0” ...“ 9”
八位整数:: =“ 0”(“ o” |“ O”)八位数字+
hexinteger :: =“ 0”(“ x” |“ X”)十六进制+
bininteger :: =“ 0”(“ b” |“ B”)bindigit +
八位数字:: =“ 0” ...“ 7”
十六进制::: digit | “ a” ...“ f” | “ A” ...“ F”
bindigit :: =“ 0” | “ 1”

如果您查看语法,则很容易看到0需要特殊情况。我不确定为什么在+那里需要’ ‘。是时候浏览一下开发邮件列表了…


有趣的是,在Python2中,有多个0解析为octinteger(最终结果仍然0是)

十进制整数:: =非零数字* “ 0”
八位整数:: =“ 0”(“ o” |“ O”)八位数字+ | “ 0”八位数字+

It’s a special case ("0"+)

2.4.4. Integer literals

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

If you look at the grammar, it’s easy to see that 0 need a special case. I’m not sure why the ‘+‘ is considered necessary there though. Time to dig through the dev mailing list…


Interesting to note that in Python2, more than one 0 was parsed as an octinteger (the end result is still 0 though)

decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+

回答 2

Python2使用前导零指定八进制数:

>>> 010
8

为了避免这种情况(?误导性)行为,Python3需要明确的前缀0b0o0x

>>> 0o10
8

Python2 used the leading zero to specify octal numbers:

>>> 010
8

To avoid this (misleading?) behaviour, Python3 requires explicit prefixes 0b, 0o, 0x:

>>> 0o10
8