Tag archive: literals

Why does Python 3 allow "00" as a literal for 0 but not allow "01" as a literal for 1?

Question: Why does Python 3 allow "00" as a literal for 0 but not allow "01" as a literal for 1?

Why does Python 3 allow “00” as a literal for 0 but not allow “01” as a literal for 1? Is there a good reason? This inconsistency baffles me. (And we’re talking about Python 3, which purposely broke backward compatibility in order to achieve goals like consistency.)

For example:

>>> from datetime import time
>>> time(16, 00)
datetime.time(16, 0)
>>> time(16, 01)
  File "<stdin>", line 1
    time(16, 01)
              ^
SyntaxError: invalid token
>>>

Answer 0

Per https://docs.python.org/3/reference/lexical_analysis.html#integer-literals:

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

As noted here, leading zeros in a non-zero decimal number are not allowed. "0"+ is legal as a very special case, which wasn’t present in Python 2:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+

SVN commit r55866 implemented PEP 3127 in the tokenizer, which forbids the old 0<octal> numbers. However, curiously, it also adds this note:

/* in any case, allow '0' as a literal */

with a special nonzero flag that only throws a SyntaxError if the following sequence of digits contains a nonzero digit.
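
The effect of that flag can be observed from Python itself. The following is a minimal sketch, assuming a Python 3 interpreter; is_valid_literal is a hypothetical helper (not part of the standard library) that simply asks the built-in compile whether a source string is accepted as an expression:

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python 3 expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Hypothetical sample inputs: runs of zeros pass, a leading zero on a
# non-zero decimal number does not, and the explicit 0o prefix is fine.
for text in ["0", "00", "0000", "01", "0009", "0o1"]:
    print(text, is_valid_literal(text))
```

Only the strings made entirely of zeros and the one with the 0o prefix should be reported as valid.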

This is odd because PEP 3127 does not allow this case:

This PEP proposes that the ability to specify an octal number by using a leading zero will be removed from the language in Python 3.0 (and the Python 3.0 preview mode of 2.6), and that a SyntaxError will be raised whenever a leading “0” is immediately followed by another digit.

(emphasis mine)

So, the fact that multiple zeros are allowed is technically violating the PEP, and was basically implemented as a special case by Georg Brandl. He made the corresponding documentation change to note that "0"+ was a valid case for decimalinteger (previously that had been covered under octinteger).

We’ll probably never know exactly why Georg chose to make "0"+ valid – it may forever remain an odd corner case in Python.


UPDATE [28 Jul 2015]: This question led to a lively discussion thread on python-ideas in which Georg chimed in:

Steven D’Aprano wrote:

Why was it defined that way? […] Why would we write 0000 to get zero?

I could tell you, but then I’d have to kill you.

Georg

Later on, the thread spawned this bug report aiming to get rid of this special case. Here, Georg says:

I don’t recall the reason for this deliberate change (as seen from the docs change).

I’m unable to come up with a good reason for this change now […]

and thus we have it: the precise reason behind this inconsistency is lost to time.

Finally, note that the bug report was rejected: leading zeros will continue to be accepted only on zero integers for the rest of Python 3.x.


Answer 1

It’s a special case ("0"+)

2.4.4. Integer literals

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

If you look at the grammar, it's easy to see that 0 needs a special case. I'm not sure why the '+' is considered necessary there, though. Time to dig through the dev mailing list…


Interesting to note that in Python 2, more than one 0 was parsed as an octinteger (the end result is still 0, though):

decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+

Answer 2

Python2 used the leading zero to specify octal numbers:

>>> 010
8

To avoid this (misleading?) behaviour, Python3 requires explicit prefixes 0b, 0o, 0x:

>>> 0o10
8
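
As a small illustration (a sketch, assuming Python 3), the same value can be written with each explicit prefix, and an old-style octal string can still be converted by passing the base to int explicitly:

```python
# The same integer written in binary, octal and hexadecimal notation.
assert 0b1000 == 0o10 == 0x8 == 8

# A Python 2 style octal string such as "010" is no longer a valid
# literal, but int() accepts it when the base is given explicitly.
legacy_octal = int("010", 8)
print(legacy_octal)
```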

Why can't Python's raw string literals end with a single backslash?

Question: Why can't Python's raw string literals end with a single backslash?

Technically, any odd number of backslashes, as described in the documentation.

>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal

It seems like the parser could just treat backslashes in raw strings as regular characters (isn’t that what raw strings are all about?), but I’m probably missing something obvious.


Answer 0

The reason is explained in the part of that section which I highlighted in bold:

String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

So raw strings are not 100% raw; there is still some rudimentary backslash processing.
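
A short check of the documented behaviour, as a sketch assuming Python 3; compiles is a hypothetical helper introduced only for this illustration:

```python
def compiles(source):
    """Return True if `source` is accepted as a Python expression."""
    try:
        compile(source, "<string literal>", "eval")
    except SyntaxError:
        return False
    return True


# r"\"" keeps both the backslash and the quote it protects.
two_chars = r"\""
print(len(two_chars), two_chars == '\\"')

# The source text r"\" is an unterminated literal, while r"\\" is fine.
print(compiles('r"\\"'))
print(compiles('r"\\\\"'))
```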


Answer 1

The common misconception about Python's raw strings is that a backslash inside a raw string is just a regular character like any other. It is NOT. The key to understanding this is the following passage from Python's tutorial:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string

So any character following a backslash is part of the raw string. Once the parser enters a raw string (a non-Unicode one) and encounters a backslash, it knows there are 2 characters (a backslash and the character following it).

This way:

r'abc\d' comprises a, b, c, \, d

r'abc\'d' comprises a, b, c, \, ', d

r'abc\'' comprises a, b, c, \, '

and:

r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.

The last case shows that, according to the documentation, the parser cannot find a closing quote, because the last quote you see above is part of the string; that is, a backslash cannot be last here, as it would 'devour' the string's closing quote.


Answer 2

That's the way it is! I see it as one of those small defects in Python!

I don't think there's a good reason for it, but it's definitely not a parsing limitation; it's really easy to parse raw strings with \ as the last character.

The catch is, if you allow \ to be the last character in a raw string, then you won't be able to put " inside a raw string. It seems Python went with allowing " instead of allowing \ as the last character.

However, this shouldn't cause any trouble.

If you're worried about not being able to easily write Windows folder paths such as c:\mypath\, then worry not: you can represent them as r"C:\mypath", and if you need to append a subdirectory name, don't do it with string concatenation, because that's not the right way to do it anyway! Use os.path.join:

>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
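
The output above is what os.path.join returns on Windows; on other platforms os.path uses a different separator. As a hedged sketch, the standard-library ntpath module (the Windows flavour of os.path) gives the Windows result regardless of the platform the code runs on:

```python
import ntpath

# ntpath always applies Windows path rules, whatever the host OS is.
joined = ntpath.join(r"C:\mypath", "subfolder")
print(joined)

# Joining with an empty last component appends the separator, so a
# trailing backslash never has to be typed inside a raw string.
with_trailing_sep = ntpath.join(r"C:\mypath", "")
print(with_trailing_sep)
```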

Answer 3

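To end a raw string with a backslash, I suggest this trick: put an ordinary string literal containing the escaped backslash right after the raw one, and the two adjacent literals are concatenated:

>>> print(r"c:\test" '\\')
c:\test\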

Answer 4

Another trick is to use chr(92), as it evaluates to "\".

I recently had to clean a string of backslashes and the following did the trick:

CleanString = DirtyString.replace(chr(92),'')

I realize that this does not take care of the “why” but the thread attracts many people looking for a solution to an immediate problem.


Answer 5

Because \" is allowed inside a raw string, a quote preceded by a backslash can't be used to identify the end of the string literal.

Why not stop parsing the string literal when you encounter the first "?

If that were the case, then \" wouldn't be allowed inside the string literal. But it is.


Answer 6

The reason why r'\' is syntactically incorrect is that, although the string expression is raw, the quotes used (single or double) always have to be escaped, since they would otherwise mark the end of the string. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes.

But you could use:

'\\'

Answer 7

Another user who has since deleted their answer (not sure if they’d like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).

I thought it was an interesting idea and am including it as community wiki for posterity.


Answer 8

Despite its role, even a raw string cannot end in a single backslash, because the backslash escapes the following quote character; you still must escape the surrounding quote character to embed it in the string. That is, r"…\" is not a valid string literal: a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use two and slice off the second.
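
The slicing trick can be sketched as follows; the path used here is a made-up example:

```python
# Two trailing backslashes are legal in a raw string; drop the last one.
path = r"C:\some\dir\\"[:-1]
print(path)
```

The result ends with exactly one backslash, which is the string the raw literal could not express directly.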


Answer 9

Coming from C, it is pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.

That does indeed disallow \ as the last character, since it will escape the " and make the parser choke. But as pointed out earlier \ is legal.


Answer 10

Some tips:

1) If you need to manipulate backslashes in a path, the standard Python module os.path is your friend. For example:

os.path.normpath('c:/folder1/')

2) If you want to build strings with backslashes in them BUT without a backslash at the END of the string, then a raw string is your friend (use the 'r' prefix before your string literal). For example:

r'\one \two \three'

3) If you need to prefix a string in a variable X with a backslash, then you can do this:

X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X  # X2 now contains \dummy

4) If you need to create a string with a backslash at the end, then combine tips 2 and 3:

voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name

Now lilypond_statement contains "\DisplayLilyMusic \upper"

Long live Python! 🙂

n3on


Answer 11

I encountered this problem and found a partial solution which is good for some cases. Although Python will not let you write a string literal that ends in a single unescaped backslash, the string can be serialized and saved in a text file with a single backslash at the end. Therefore, if what you need is to save text ending with a single backslash on your computer, it is possible:

x = 'a string\\' 
x
'a string\\' 

# Now save it in a text file and it will appear with a single backslash:

with open("my_file.txt", 'w') as h:
    h.write(x)
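
Note that the doubled backslash in the echo above is only how repr displays the string; the string object itself already ends with one backslash. A quick check, as a sketch:

```python
x = 'a string\\'

print(repr(x))  # the escaped form the interactive prompt echoes
print(x)        # the actual characters
print(len(x))   # 8 characters of text plus 1 backslash
```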

BTW, this does not work with JSON if you dump the string using Python's json library.

Finally, I work with Spyder, and I noticed that if I open the variable in Spyder's text editor by double-clicking its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (not very helpful for most needs, but maybe for some).


How do you express binary literals in Python?

Question: How do you express binary literals in Python?

How do you express an integer as a binary number with Python literals?

I was easily able to find the answer for hex:

>>> 0x12AF
4783
>>> 0x100
256

and octal:

>>> 01267
695
>>> 0100
64

How do you use literals to express binary in Python?


Summary of Answers

  • Python 2.5 and earlier: can express binary using int('01010101111',2) but not with a literal.
  • Python 2.5 and earlier: there is no way to express binary literals.
  • Python 2.6 beta: You can do like so: 0b1100111 or 0B1100111.
  • Python 2.6 beta: will also allow 0o27 or 0O27 (second character is the letter O) to represent an octal.
  • Python 3.0 beta: Same as 2.6, but will no longer allow the older 027 syntax for octals.

Answer 0

For reference—future Python possibilities:
Starting with Python 2.6 you can express binary literals using the prefix 0b or 0B:

>>> 0b101111
47

You can also use the new bin function to get the binary representation of a number:

>>> bin(173)
'0b10101101'

Development version of the documentation: What’s New in Python 2.6


Answer 1

>>> print(int('01010101111', 2))
687
>>> print(int('11111111', 2))
255

Another way.


Answer 2

How do you express binary literals in Python?

They’re not “binary” literals, but rather, “integer literals”. You can express integer literals with a binary format with a 0 followed by a B or b followed by a series of zeros and ones, for example:

>>> 0b0010101010
170
>>> 0B010101
21

From the Python 3 docs, these are the ways of providing integer literals in Python:

Integer literals are described by the following lexical definitions:

integer      ::=  decinteger | bininteger | octinteger | hexinteger
decinteger   ::=  nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::=  "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::=  "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::=  "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::=  "1"..."9"
digit        ::=  "0"..."9"
bindigit     ::=  "0" | "1"
octdigit     ::=  "0"..."7"
hexdigit     ::=  digit | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

Some examples of integer literals:

7     2147483647                        0o177    0b100110111
3     79228162514264337593543950336     0o377    0xdeadbeef
      100_000_000_000                   0b_1110_0101

Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.

Other ways of expressing binary:

You can have the zeros and ones in a string object which can be manipulated (although you should probably just do bitwise operations on the integer in most cases) – just pass int the string of zeros and ones and the base you are converting from (2):

>>> int('010101', 2)
21

You can optionally have the 0b or 0B prefix:

>>> int('0b0010101010', 2)
170

If you pass it 0 as the base, it will assume base 10 if the string doesn’t specify with a prefix:

>>> int('10101', 0)
10101
>>> int('0b10101', 0)
21

Converting from int back to human readable binary:

You can pass an integer to bin to see the string representation of a binary literal:

>>> bin(21)
'0b10101'

And you can combine bin and int to go back and forth:

>>> bin(int('010101', 2))
'0b10101'

You can use a format specification as well, if you want to have minimum width with preceding zeros:

>>> format(int('010101', 2), '{fill}{width}b'.format(width=10, fill=0))
'0000010101'
>>> format(int('010101', 2), '010b')
'0000010101'
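
A few more format specifications, as a sketch assuming Python 3.6 or later (needed for f-strings and the underscore grouping option); the variable names are only illustrative:

```python
value = 0b11100101  # 229

plain = format(value, "b")      # digits only
prefixed = format(value, "#b")  # with the 0b prefix
padded = f"{21:010b}"           # zero-padded to 10 digits
grouped = format(value, "_b")   # underscore between groups of four digits

print(plain, prefixed, padded, grouped)
```

The grouped form can be fed back to int with base 2, since int also accepts underscores between digits.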

Answer 3

A leading 0 here specifies that the base is 8 (not 10), which is pretty easy to see (this is Python 2 behaviour; Python 3 rejects the bare leading zero and expects the 0o prefix):

>>> int('010101', 0)
4161

If you don't start with a 0, then Python assumes the number is base 10.

>>> int('10101', 0)
10101
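
The transcript above reflects Python 2. As a sketch of the Python 3 behaviour (assuming the code runs on Python 3): with base 0 the string must follow the integer-literal rules, so the bare leading zero is rejected and 0o is needed for octal. parse_base0 is a hypothetical helper used only for this illustration:

```python
def parse_base0(text):
    """Parse `text` with base 0, returning None if it is rejected."""
    try:
        return int(text, 0)
    except ValueError:
        return None


print(parse_base0("0o10101"))  # octal, via the explicit prefix
print(parse_base0("10101"))    # no prefix, so base 10
print(parse_base0("010101"))   # rejected in Python 3
```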

Answer 4

As far as I can tell Python, up through 2.5, only supports hexadecimal & octal literals. I did find some discussions about adding binary to future versions but nothing definite.


Answer 5

I am pretty sure this is one of the things due to change in Python 3.0 with perhaps bin() to go with hex() and oct().

EDIT: lbrandy’s answer is correct in all cases.


Empty set literal?

Question: Empty set literal?

[] = empty list

() = empty tuple

{} = empty dict

Is there a similar notation for an empty set? Or do I have to write set()?


Answer 0

No, there’s no literal syntax for the empty set. You have to write set().


Answer 1

By all means, please use set() to create an empty set.

But, if you want to impress people, tell them that you can create an empty set using literals and * with Python >= 3.5 (see PEP 448) by doing:

>>> s = {*()}  # or {*{}} or {*[]}
>>> print(s)
set()

This is basically a more condensed way of doing {_ for _ in ()}, but don't do this.


Answer 2

Just to extend the accepted answer:

Since version 2.7 (and in Python 3), Python has a set literal that uses braces, as in {1,2,3}, but {} by itself still denotes an empty dict.

Python 2.7 (first line is invalid in Python <2.7)

>>> {1,2,3}.__class__
<type 'set'>
>>> {}.__class__
<type 'dict'>

Python 3.x

>>> {1,4,5}.__class__
<class 'set'>
>>> {}.__class__
<class 'dict'>

More here: https://docs.python.org/3/whatsnew/2.7.html#other-language-changes
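
The same distinction as a short script, a sketch assuming Python 3 (the variable names are illustrative):

```python
empty_braces = {}      # empty dict, not an empty set
non_empty = {1, 2, 3}  # set literal
empty_set = set()      # the way to get an empty set

for value in (empty_braces, non_empty, empty_set):
    print(type(value).__name__, value)
```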


Answer 3

It depends on whether you want the literal for a comparison or for an assignment.

If you want to make an existing set empty, you can use the .clear() method, especially if you want to avoid creating a new object. If you want to do a comparison, use set() or check whether the length is 0.

Example:

# create a new set
a = set([1, 2, 3, 'foo', 'bar'])
# or, using a literal:
a = {1, 2, 3, 'foo', 'bar'}

# create an empty set
a = set()
# or, use the clear method
a.clear()

# comparison to a new blank set
if a == set():
    pass  # do something

# length-checking comparison
if len(a) == 0:
    pass  # do something

Answer 4

Adding to the crazy ideas: with Python 3 accepting unicode identifiers, you could declare a variable ϕ = frozenset() (ϕ is U+03D5) and use it instead.


Answer 5

Yes. The same notation that works for non-empty dict/set works for empty ones.

Notice the difference between non-empty dict and set literals:

{1: 'a', 2: 'b', 3: 'c'} — a number of key-value pairs inside makes a dict
{'aaa', 'bbb', 'ccc'} — a tuple of values inside makes a set

So:

{} == zero number of key-value pairs == empty dict
{*()} == empty tuple of values == empty set

However, the fact that you can do it doesn't mean you should. Unless you have a strong reason, it's better to construct an empty set explicitly, like:

a = set()

NB: As ctrueden noticed in comments, {()} is not an empty set. It’s a set with 1 element: empty tuple.
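
The difference between the two spellings can be checked directly; this is a sketch assuming Python 3.5 or later (required for the unpacking syntax):

```python
one_element = {()}  # a set containing one item: the empty tuple
unpacked = {*()}    # unpacking nothing leaves an empty set
explicit = set()    # the recommended spelling

print(len(one_element), len(unpacked), unpacked == explicit)
```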


Answer 6

There are a few ways to create an empty set in Python:

  1. Using set()
    This is the built-in constructor; called with no arguments, it creates an empty set.
  2. Using the clear() method (creative engineer technique, LOL)
    See this example:

    sets = {"Hi", "How", "are", "You", "All"}
    type(sets)   # <class 'set'>
    sets.clear()
    print(sets)  # prints set(), not {}
    type(sets)   # still <class 'set'>

So, these are 2 ways to create an empty set.


Why is [] faster than list()?

Question: Why is [] faster than list()?

I recently compared the processing speeds of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and the results were practically identical: [] and {} both took around 0.128sec / million cycles, while list() and dict() took roughly 0.428sec / million cycles each.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copy of some empty stock literal while their explicitly-named counterparts (list(), dict(), tuple(), str()) fully go about creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I’d love to find out. I couldn’t find an answer in the docs or on SO, and searching for empty brackets turned out to be more problematic than I’d expected.

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries, respectively. I’m running Python 2.7.9.

I recently discovered “Why is if True slower than if 1?” that compares the performance of if True to if 1 and seems to touch on a similar literal-versus-global scenario; perhaps it’s worth considering as well.


Answer 0

Because [] and {} are literal syntax. Python can emit bytecode that directly creates the list or dictionary object:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list and dict, by contrast, are separate objects. Their names need to be resolved, the arguments have to be pushed onto the stack, the current frame has to be stored so it can be retrieved later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
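The exact opcodes differ between CPython releases (newer versions use CALL rather than CALL_FUNCTION, for instance), so here is a minimal sketch for checking the same thing on whatever interpreter you have. The helper name opnames is made up for this illustration; dis.get_instructions is the standard-library call doing the work.

```python
import dis

def opnames(source):
    # Hypothetical helper: compile an expression and list its opcode names
    code = compile(source, "<sketch>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

literal_ops = opnames("[]")
call_ops = opnames("list()")

print(literal_ops)  # contains BUILD_LIST, no name lookup and no call
print(call_ops)     # contains LOAD_NAME plus some CALL* instruction
```

Whatever the version, the literal should show a BUILD_LIST and no call instruction, while list() should show a name load followed by a call.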

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably due to a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.
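If you want to repeat that subtraction on your own machine, something along these lines should do. It is only a sketch: the variable names are mine, the iteration count is lowered to 10**6 to keep the run short, and the numbers will differ from the Python 2.7 figures above.

```python
import timeit

N = 10**6  # assumption: fewer iterations than the 10**7 used above

lookup = timeit.timeit("list", number=N)    # name lookup only
literal = timeit.timeit("[]", number=N)     # build an empty list directly
call = timeit.timeit("list()", number=N)    # lookup + call + build

# Same arithmetic as above: what is left after removing lookup and construction
call_overhead = call - lookup - literal
print("approximate call overhead: %.3f s per %d calls" % (call_overhead, N))
```

Timings this small are noisy, so treat the result as a rough indication rather than a measurement.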

You can avoid the global lookup cost by aliasing the global names as locals (using a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you can never overcome that CALL_FUNCTION cost.
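Outside of timeit, the usual way to get the same local alias is a default argument. The sketch below (function names are hypothetical) only inspects the bytecode: the aliased version loads a fast local instead of doing a global lookup, but both versions still contain a call instruction.

```python
import dis

def make_with_global():
    # `list` is looked up in globals, then builtins, on every call
    return list()

def make_with_local(_list=list):
    # `_list` is a local, bound once when the function is defined
    return _list()

def opnames(func):
    return [ins.opname for ins in dis.get_instructions(func)]

global_ops = opnames(make_with_global)
local_ops = opnames(make_with_local)
print(global_ops)
print(local_ops)
```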


Answer 1

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

Answer 2

Because list is a function that converts something else, say a string, into a list object, while [] creates a list right off the bat. Try this (it might make more sense to you):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

While

y = ["wham bam"]
>>> y
["wham bam"]

gives you an actual list containing whatever you put in it.
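As a script you can run directly, the two cases look like this (the variable names chars and wrapped are mine):

```python
x = "wham bam"

chars = list(x)   # list() iterates over the string, one element per character
wrapped = [x]     # the display puts the string itself into a one-element list

print(chars)
print(wrapped)
```

The first list has eight one-character strings (the space included); the second has a single element, the whole string.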


Answer 3

The answers here are great, to the point, and fully cover this question. I’ll go one step further down from bytecode for those interested. I’m using the most recent repo of CPython; older versions behave similarly in this regard, though slight changes might be in place.

Here’s a breakdown of the execution for each of these: BUILD_LIST for [] and CALL_FUNCTION for list().


The BUILD_LIST instruction:

You should just view the horror:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

Terribly convoluted, I know. This is how simple it is:

  • Create a new list with PyList_New (this mainly allocates the memory for a new list object), oparg signalling the number of arguments on the stack. Straight to the point.
  • Check that nothing went wrong with if (list==NULL).
  • Add any arguments (in our case this isn’t executed) located on the stack with PyList_SET_ITEM (a macro).

No wonder it is fast! It’s custom-made for creating new lists, nothing else 🙂
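For readers who do not read C, here is a rough Python model of that loop. It is not CPython code: build_list and the plain Python list standing in for the value stack are assumptions made for illustration.

```python
def build_list(stack, oparg):
    # Model of PyList_New(oparg): a list with oparg empty slots
    new_list = [None] * oparg
    # Model of the while loop: pop items and fill the slots from the back
    while oparg > 0:
        oparg -= 1
        new_list[oparg] = stack.pop()
    return new_list

print(build_list([], 0))          # the [] case: the loop body never runs
print(build_list([1, 2, 3], 3))   # the [1, 2, 3] case: order is preserved
```

Filling from the back is what keeps the elements in source order, since the last expression evaluated sits on top of the stack.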

The CALL_FUNCTION instruction:

Here’s the first thing you see when you peek at the code handling CALL_FUNCTION:

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

Looks pretty harmless, right? Well, no, unfortunately not. call_function is not a straightforward guy that will call the function immediately; it can’t. Instead, it grabs the object from the stack, grabs all the arguments off the stack, and then switches based on the type of the object.

We’re calling the list type, so the argument passed in to call_function is PyList_Type. CPython now has to call a generic function that handles any callable object, named _PyObject_FastCallKeywords. Yay, more function calls.

This function again makes some checks for certain function types (for reasons I don’t understand) and then, after creating a dict for kwargs if required, goes on to call _PyObject_FastCallDict.

_PyObject_FastCallDict finally gets us somewhere! After performing even more checks, it grabs the tp_call slot from the type of the type we’ve passed in; that is, it grabs type.tp_call. It then proceeds to create a tuple out of the arguments passed in with _PyStack_AsTuple and, at last, a call can be made!

tp_call, which matches type.__call__, takes over and finally creates the list object. It calls the list’s __new__, which corresponds to PyType_GenericNew, and allocates memory for it with PyType_GenericAlloc. This is actually the part where it finally catches up with PyList_New. All the previous steps are necessary to handle objects in a generic fashion.

In the end, type_call calls list.__init__ and initializes the list with any available arguments, then we go on returning back the way we came. 🙂
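The last two steps (__new__, then __init__) can be approximated in pure Python. This is a simplified sketch, not the C implementation: type_call_sketch is a made-up name, and the real type_call has special cases that are left out here.

```python
def type_call_sketch(cls, *args, **kwargs):
    # Step 1: allocate the object; for list, __new__ ignores the arguments
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: initialize it, but only if __new__ returned an instance of cls
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj

print(type_call_sketch(list))         # behaves like list()
print(type_call_sketch(list, "abc"))  # behaves like list("abc")
```

Even in this stripped-down form there are two extra method calls before the list is ready, which BUILD_LIST never has to make.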

Finally, remember the LOAD_NAME; that’s another guy that contributes here.


It’s easy to see that, when dealing with our input, Python generally has to jump through hoops in order to actually find the appropriate C function to do the job. It doesn’t have the courtesy of calling it immediately because it’s dynamic: someone might mask list (and boy, do many people do), and another path must be taken.

This is where list() loses much: the exploring Python needs to do to find out what the heck it should do.

Literal syntax, on the other hand, means exactly one thing; it cannot be changed and always behaves in a pre-determined way.

Footnote: All function names are subject to change from one release to the next. The point still stands and most likely will stand in any future version: it’s the dynamic look-up that slows things down.


Answer 4

Why is [] faster than list()?

The biggest reason is that Python treats list() just like a user-defined function, which means you can intercept it by aliasing something else to list and do something different (like use your own subclassed list or perhaps a deque).

With [], it immediately creates a new instance of a builtin list.

My explanation seeks to give you the intuition for this.

Explanation

[] is commonly known as literal syntax.

In the grammar, this is referred to as a “list display”. From the docs:

A list display is a possibly empty series of expressions enclosed in square brackets:

list_display ::=  "[" [starred_list | comprehension] "]"

A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension.

In short, this means that a builtin object of type list is created.

There is no circumventing this, which means Python can do it as quickly as possible.

On the other hand, list() can be intercepted, so that it no longer creates a builtin list with the builtin list constructor.

For example, say we want our lists to be created noisily:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

We could then intercept the name list in the module-level global scope, so that when we create a list, we actually create our subtyped list:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

Similarly we could remove it from the global namespace

del list

and put it in the builtin namespace:

import builtins
builtins.list = List

And now:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

And note that the list display creates a list unconditionally:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

We probably only want to do this temporarily, so let’s undo our changes. First, remove the new List object from the builtins:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

Oh, no, we lost track of the original.

Not to worry, we can still get list – it’s the type of a list literal:

>>> builtins.list = type([])
>>> list()
[]
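If you experiment with this yourself, it is less nerve-racking to keep a reference to the original and restore it in a finally block. A minimal sketch, assuming a throwaway subclass (NoisyList and its counter are hypothetical stand-ins for the List class above):

```python
import builtins

class NoisyList(list):
    created = 0  # counts how many instances went through __init__

    def __init__(self, *args):
        super().__init__(*args)
        NoisyList.created += 1

original_list = builtins.list   # keep a reference before touching anything
builtins.list = NoisyList
try:
    made_by_call = list()       # resolved at run time, so this is a NoisyList
    made_by_display = []        # the display still builds a plain list
finally:
    builtins.list = original_list   # always put the original back

print(type(made_by_call), type(made_by_display))
```

The call picks up the replacement while the display does not, and the builtin is back in place afterwards whether or not anything went wrong in between.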

So…

Why is [] faster than list()?

As we’ve seen, we can overwrite list, but we can’t intercept the creation of the literal type. When we use list, we have to do the lookups to see if anything is there.

Then we have to call whatever callable we have looked up. From the grammar:

A call calls a callable object (e.g., a function) with a possibly empty series of arguments:

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

We can see that it does the same thing for any name, not just list:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
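A quick way to convince yourself that the lookup really happens at run time rather than at compile time: the call to a nonexistent name compiles without complaint and only fails when evaluated. This is a small sketch; the variable names are mine.

```python
# Compiling succeeds: the compiler does not check whether the name exists
code = compile("doesnotexist()", "<sketch>", "eval")

error = None
try:
    eval(code, {})            # the name is only looked up when the code runs
except NameError as exc:
    error = exc               # keep a reference; `exc` is cleared after the block
    print("lookup failed at run time:", exc)
```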

For [] there is no function call at the Python bytecode level:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

It simply goes straight to building the list without any lookups or calls at the bytecode level.

Conclusion

We have demonstrated that list can be intercepted with user code using the scoping rules, and that list() looks for a callable and then calls it.

Whereas [] is a list display, or a literal, and thus avoids the name lookup and function call.