标签归档:syntax

在Python中使用多个参数进行字符串格式化(例如’%s…%s’)

问题:在Python中使用多个参数进行字符串格式化(例如’%s…%s’)

我有一个看起来像的字符串,'%s in %s'并且我想知道如何分隔参数,以便它们是两个不同的%s。我来自Java的想法是这样的:

'%s in %s' % unicode(self.author),  unicode(self.publication)

但这不起作用,因此它在Python中的外观如何?

I have a string that looks like '%s in %s' and I want to know how to seperate the arguments so that they are two different %s. My mind coming from Java came up with this:

'%s in %s' % unicode(self.author),  unicode(self.publication)

But this doesn’t work so how does it look in Python?


回答 0

马克·西达德(Mark Cidade)的答案是正确的-您需要提供一个元组。

但是从Python 2.6起,您可以使用format代替%

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

%不再鼓励使用for格式化字符串。

这种字符串格式设置方法是Python 3.0中的新标准,应优先于新代码中“字符串格式设置操作”中描述的%格式设置。

Mark Cidade’s answer is right – you need to supply a tuple.

However from Python 2.6 onwards you can use format instead of %:

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

Usage of % for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.


回答 1

如果使用多个参数,则必须将其放在一个元组中(请注意额外的括号):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

正如EOL所指出的那样,该unicode()函数通常假定默认为ascii编码,因此,如果您使用非ASCII字符,则显式传递编码会更安全:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication('utf-8')))

从Python 3.0开始,最好改用以下str.format()语法:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

If you’re using more than one argument it has to be in a tuple (note the extra parentheses):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

As EOL points out, the unicode() function usually assumes ascii encoding as a default, so if you have non-ASCII characters, it’s safer to explicitly pass the encoding:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication('utf-8')))

And as of Python 3.0, it’s preferred to use the str.format() syntax instead:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

回答 2

在元组/映射对象上有多个参数 format

以下是文档摘录:

给定的format % values中的%转换规范format将替换为的零个或多个元素values。效果类似于使用sprintf() C语言中的用法。

如果format需要单个参数,则值可以是单个非元组对象。否则,值必须是一个具有由formatstring 指定的项目数的元组或者是一个映射对象(例如,字典)。

参考资料


开启str.format而不是%

%操作员的新替代方法是使用str.format。以下是文档摘录:

str.format(*args, **kwargs)

执行字符串格式化操作。调用此方法的字符串可以包含文字文本或用大括号分隔的替换字段{}。每个替换字段都包含位置参数的数字索引或关键字参数的名称。返回字符串的副本,其中每个替换字段都用相应参数的字符串值替换。

此方法是Python 3.0中的新标准,应优先于%formatting

参考资料


例子

以下是一些用法示例:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond

也可以看看

On a tuple/mapping object for multiple argument format

The following is excerpt from the documentation:

Given format % values, % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to the using sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).

References


On str.format instead of %

A newer alternative to % operator is to use str.format. Here’s an excerpt from the documentation:

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to % formatting.

References


Examples

Here are some usage examples:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond

See also


回答 3

您必须将值放在括号中:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

在这里,第一个%sunicode(self.author)将被放置。第二%sunicode(self.publication)将使用。

注意:你应该有利于string formatting%符号。更多信息在这里

You must just put the values into parentheses:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

Here, for the first %s the unicode(self.author) will be placed. And for the second %s, the unicode(self.publication) will be used.

Note: You should favor string formatting over the % Notation. More info here


回答 4

到目前为止,发布的一些答案存在一个严重的问题:unicode()从默认编码(通常为ASCII)解码;实际上,unicode()试图通过将给定的字节转换为字符来“感知”。因此,以下代码(基本上是前面的答案所建议的)在我的计算机上失败:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

给出:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

失败的原因是author不只包含ASCII字节(即[0; 127]中的值),并且unicode()默认情况下(在许多计算机上)从ASCII解码。

一个可靠的解决方案是显式提供您的字段中使用的编码。以UTF-8为例:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(或不使用initial u,这取决于您要使用Unicode结果还是字节字符串)。

在这一点上,可能要考虑让authorand publication字段为Unicode字符串,而不是在格式化期间对其进行解码。

There is a significant problem with some of the answers posted so far: unicode() decodes from the default encoding, which is often ASCII; in fact, unicode() tries to make “sense” of the bytes it is given by converting them into characters. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

gives:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The failure comes from the fact that author does not contain only ASCII bytes (i.e. with values in [0; 127]), and unicode() decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(or without the initial u, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the author and publication fields be Unicode strings, instead of decoding them during formatting.


回答 5

对于python2,您也可以执行此操作

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

如果您有很多可替代的论点(特别是在进行国际化的情况下),这将很方便

Python2.6及更高版本支持 .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)

For python2 you can also do this

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python2.6 onwards supports .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)

回答 6

您还可以通过以下方式干净,简单地使用它(但是错误!因为您应该format像Mark Byers所说的那样使用):

print 'This is my %s formatted with %d arguments' % ('string', 2)

You could also use it clean and simple (but wrong! because you should use format like Mark Byers said) by doing:

print 'This is my %s formatted with %d arguments' % ('string', 2)

回答 7

为了完整起见,在PEP-498中引入了Python 3.6 f-string 。这些字符串可以

使用最小语法将表达式嵌入字符串文字中。

这意味着对于您的示例,您还可以使用:

f'{self.author} in {self.publication}'

For completeness, in python 3.6 f-string are introduced in PEP-498. These strings make it possible to

embed expressions inside string literals, using a minimal syntax.

That would mean that for your example you could also use:

f'{self.author} in {self.publication}'

为什么列表中允许尾随逗号?

问题:为什么列表中允许尾随逗号?

我很好奇为什么在Python中列表中的尾部逗号是有效的语法,并且似乎Python只是忽略了它:

>>> ['a','b',]
['a', 'b']

自从('a')('a',)是两个元组时,它是一个有意义的元组,但是在列表中呢?

I am curious why in Python a trailing comma in a list is valid syntax, and it seems that Python simply ignores it:

>>> ['a','b',]
['a', 'b']

It makes sense when its a tuple since ('a') and ('a',) are two different things, but in lists?


回答 0

主要优点是,它使多行列表更易于编辑,并减少了差异。

变更:

s = ['manny',
     'mo',
     'jack',
]

至:

s = ['manny',
     'mo',
     'jack',
     'roger',
]

仅涉及差异的单行更改:

  s = ['manny',
       'mo',
       'jack',
+      'roger',
  ]

当省略尾部逗号时,这击败了更令人困惑的多行差异:

  s = ['manny',
       'mo',
-      'jack'
+      'jack',
+      'roger'
  ]

后者的差异使得很难看到仅添加了一行,而另一行没有更改内容。

它还降低了这样做的风险:

s = ['manny',
     'mo',
     'jack'
     'roger'  # Added this line, but forgot to add a comma on the previous line
]

并触发隐式字符串文字串联,产生s = ['manny', 'mo', 'jackroger']而不是预期的结果。

The main advantages are that it makes multi-line lists easier to edit and that it reduces clutter in diffs.

Changing:

s = ['manny',
     'mo',
     'jack',
]

to:

s = ['manny',
     'mo',
     'jack',
     'roger',
]

involves only a one-line change in the diff:

  s = ['manny',
       'mo',
       'jack',
+      'roger',
  ]

This beats the more confusing multi-line diff when the trailing comma was omitted:

  s = ['manny',
       'mo',
-      'jack'
+      'jack',
+      'roger'
  ]

The latter diff makes it harder to see that only one line was added and that the other line didn’t change content.

It also reduces the risk of doing this:

s = ['manny',
     'mo',
     'jack'
     'roger'  # Added this line, but forgot to add a comma on the previous line
]

and triggering implicit string literal concatenation, producing s = ['manny', 'mo', 'jackroger'] instead of the intended result.


回答 1

这是一种常见的语法约定,允许在数组中尾随逗号,C和Java之类的语言都允许,并且Python似乎已对其列表数据结构采用了该约定。在生成用于填充列表的代码时,它特别有用:只需生成一系列元素和逗号,而无需将最后一个元素和逗号视为特殊情况,并且不应该在末尾加逗号。

It’s a common syntactical convention to allow trailing commas in an array, languages like C and Java allow it, and Python seems to have adopted this convention for its list data structure. It’s particularly useful when generating code for populating a list: just generate a sequence of elements and commas, no need to consider the last one as a special case that shouldn’t have a comma at the end.


回答 2

它有助于消除某种错误。有时在多行上写列表会更清晰。但是在以后的维护中,您可能需要重新排列项目。

l1 = [
        1,
        2,
        3,
        4,
        5
]

# Now you want to rearrange

l1 = [
        1,
        2,
        3,
        5
        4,
]

# Now you have an error

但是,如果允许使用尾随逗号,则可以轻松地重新排列行而不会引起错误。

It helps to eliminate a certain kind of bug. It’s sometimes clearer to write lists on multiple lines. But in, later maintenace you may want to rearrange the items.

l1 = [
        1,
        2,
        3,
        4,
        5
]

# Now you want to rearrange

l1 = [
        1,
        2,
        3,
        5
        4,
]

# Now you have an error

But if you allow trailing commas, and use them, you can easily rearrange the lines without introducing an error.


回答 3

元组的不同之处在于,('a')它使用隐式连续和()s作为优先运算符进行扩展,而('a',)引用长度为1的元组。

你原来的例子是 tuple('a')

A tuple is different because ('a') is expanded using implicit continuation and ()s as a precendence operator, whereas ('a',) refers to a length 1 tuple.

Your original example would have been tuple('a')


回答 4

主要原因是使diff变得不那么复杂。例如,您有一个列表:

list = [
    'a',
    'b',
    'c'
]

并且您想要向其中添加另一个元素。然后,您将最终执行此操作:

list = [
    'a',
    'b',
    'c',
    'd'
]

因此,diff将显示出两行已更改,首先在’c’处添加’,’,在最后一行添加’d’。

因此,python允许在列表的最后一个元素中尾部加上’,’,以防止可能引起混淆的额外差异。

The main reason is to make diff less complicated. For example you have a list :

list = [
    'a',
    'b',
    'c'
]

and you want to add another element to it. Then you will be end up doing this:

list = [
    'a',
    'b',
    'c',
    'd'
]

thus, diff will show that two lines have been changed, first adding ‘,’ in line with ‘c’ and adding ‘d’ at last line.

So, python allows trailing ‘,’ in last element of list, to prevent extra diff which can cause confusion.


字符串格式化命名参数?

问题:字符串格式化命名参数?

我知道这是一个非常简单的问题,但我不知道该如何使用Google。

我能怎么做

print '<a href="%s">%s</a>' % (my_url)

所以要my_url使用两次?我假设我必须“命名” the %s,然后在参数中使用字典,但是我不确定正确的语法吗?


仅供参考,我知道我可以my_url在参数中使用两次,但这不是重点:)

I know it’s a really simple question, but I have no idea how to google it.

how can I do

print '<a href="%s">%s</a>' % (my_url)

So that my_url is used twice? I assume I have to “name” the %s and then use a dict in the params, but I’m not sure of the proper syntax?


just FYI, I’m aware I can just use my_url twice in the params, but that’s not the point :)


回答 0

在Python 2.6+和Python 3中,您可能选择使用更新的字符串格式化方法。

print('<a href="{0}">{0}</a>'.format(my_url))

这样可以避免重复输入参数,或者

print('<a href="{url}">{url}</a>'.format(url=my_url))

如果要命名参数。

print('<a href="{}">{}</a>'.format(my_url, my_url))

这是严格的位置,只有警告:format()参数遵循Python规则,其中必须首先使用未命名的args,然后是命名参数,然后是* args(类似于list或tuple的序列),然后是* kwargs(一种dict如果您知道什么对您有好处,请使用字符串进行键控)。首先通过将命名值替换为它们的标签来确定插值点,然后从剩下的位置进行定位。因此,您也可以这样做…

print('<a href="{not_my_url}">{}</a>'.format(my_url, my_url, not_my_url=her_url))

但这不是…

print('<a href="{not_my_url}">{}</a>'.format(my_url, not_my_url=her_url, my_url))

In Python 2.6+ and Python 3, you might choose to use the newer string formatting method.

print('<a href="{0}">{0}</a>'.format(my_url))

which saves you from repeating the argument, or

print('<a href="{url}">{url}</a>'.format(url=my_url))

if you want named parameters.

print('<a href="{}">{}</a>'.format(my_url, my_url))

which is strictly positional, and only comes with the caveat that format() arguments follow Python rules where unnamed args must come first, followed by named arguments, followed by *args (a sequence like list or tuple) and then *kwargs (a dict keyed with strings if you know what’s good for you). The interpolation points are determined first by substituting the named values at their labels, and then positional from what’s left. So, you can also do this…

print('<a href="{not_my_url}">{}</a>'.format(my_url, my_url, not_my_url=her_url))

But not this…

print('<a href="{not_my_url}">{}</a>'.format(my_url, not_my_url=her_url, my_url))

回答 1

print '<a href="%(url)s">%(url)s</a>' % {'url': my_url}
print '<a href="%(url)s">%(url)s</a>' % {'url': my_url}

回答 2

Python 3.6+中的解决方案

Python 3.6引入了文字字符串格式化,因此您可以格式化命名参数,而无需在字符串外重复任何命名参数:

print(f'<a href="{my_url:s}">{my_url:s}</a>')

这将进行评估my_url,因此,如果未定义,您将获得NameError。实际上,my_url您可以编写一个任意的Python表达式(而不是),只要它的计算结果为字符串(由于:s格式代码)即可。如果要为表达式的结果表示字符串表示形式(可能不是字符串),请替换:s!s,就像一般,预文字字符串格式化。

有关文字字符串格式的详细信息,请参阅PEP 498(首次引入该格式)。

Solution in Python 3.6+

Python 3.6 introduces literal string formatting, so that you can format the named parameters without any repeating any of your named parameters outside the string:

print(f'<a href="{my_url:s}">{my_url:s}</a>')

This will evaluate my_url, so if it’s not defined you will get a NameError. In fact, instead of my_url, you can write an arbitrary Python expression, as long as it evaluates to a string (because of the :s formatting code). If you want a string representation for the result of an expression that might not be a string, replace :s by !s, just like with regular, pre-literal string formatting.

For details on literal string formatting, see PEP 498, where it was first introduced.


回答 3

您将沉迷于语法。

同样是C#6.0,EcmaScript开发人员也熟悉此语法。

In [1]: print '{firstname} {lastname}'.format(firstname='Mehmet', lastname='Ağa')
Mehmet Ağa

In [2]: print '{firstname} {lastname}'.format(**dict(firstname='Mehmet', lastname='Ağa'))
Mehmet Ağa

You will be addicted to syntax.

Also C# 6.0, EcmaScript developers has also familier this syntax.

In [1]: print '{firstname} {lastname}'.format(firstname='Mehmet', lastname='Ağa')
Mehmet Ağa

In [2]: print '{firstname} {lastname}'.format(**dict(firstname='Mehmet', lastname='Ağa'))
Mehmet Ağa

回答 4

对于构建HTML页面,您要使用模板引擎,而不是简单的字符串插值。

For building HTML pages, you want to use a templating engine, not simple string interpolation.


回答 5

与字典方式一样,了解以下格式可能会很有用:

print '<a href="%s">%s</a>' % (my_url, my_url)

这是一点点的冗余,并且在修改代码时,字典方式当然不易出错,但是仍然可以使用元组进行多次插入。第%s一个元素替换了元组中的第一个元素,第二%s个元素替换了元组中的第二个元素,以此类推。

As well as the dictionary way, it may be useful to know the following format:

print '<a href="%s">%s</a>' % (my_url, my_url)

Here it’s a tad redundant, and the dictionary way is certainly less error prone when modifying the code, but it’s still possible to use tuples for multiple insertions. The first %s is substituted for the first element in the tuple, the second %s is substituted for the second element in the tuple, and so on for each element in the tuple.


您可以在Python的语法中添加新的语句吗?

问题:您可以在Python的语法中添加新的语句吗?

你可以添加新的语句(例如printraisewith)Python的语法?

说,允许

mystatement "Something"

要么,

new_if True:
    print "example"

如果您应该的话,不要太多,但如果可能的话,就可以了(只需修改python解释器代码即可)

Can you add new statements (like print, raise, with) to Python’s syntax?

Say, to allow..

mystatement "Something"

Or,

new_if True:
    print "example"

Not so much if you should, but rather if it’s possible (short of modifying the python interpreters code)


回答 0

您可能会发现这很有用-Python内部:在Python上添加新语句,引用如下:


本文旨在更好地了解Python前端的工作方式。仅阅读文档和源代码可能会有些无聊,因此我在这里采用动手实践的方法:我将向untilPython 添加一条语句。

本文的所有编码都是针对Python Mercurial存储库镜像中最前沿的Py3k分支完成的。

until声明

有些语言,如红宝石,有一个until说法,这是补充whileuntil num == 0相当于while num != 0)。在Ruby中,我可以这样写:

num = 3
until num == 0 do
  puts num
  num -= 1
end

它将打印:

3
2
1

因此,我想为Python添加类似的功能。也就是说,能够写:

num = 3
until num == 0:
  print(num)
  num -= 1

语言倡导题外话

本文并不试图建议在untilPython中添加一条语句。尽管我认为这样的声明可以使一些代码更清晰,并且本文显示了添加的难易程度,但我完全尊重Python的极简主义哲学。实际上,我在这里要做的只是深入了解Python的内部工作原理。

修改语法

Python使用名为的自定义解析器生成器pgen。这是一个LL(1)解析器,它将Python源代码转换为解析树。解析器生成器的输入是文件Grammar/Grammar[1]。这是一个简单的文本文件,指定Python的语法。

[1]:从这里开始,相对于源代码树的根目录,对Python源文件中的文件进行引用,该目录是您运行configure和make生成Python的目录。

必须对语法文件进行两次修改。首先是为until语句添加定义。我找到了该while语句的定义位置(while_stmt),并添加until_stmt到了[2]下面:

compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite

[2]:这演示了在修改我不熟悉的源代码时使用的一种通用技术:按相似性工作。这个原则并不能解决您的所有问题,但绝对可以简化流程。由于必须完成的所有工作while都必须完成until,因此它可以作为很好的指导。

请注意,我已经决定else从该子句的定义中排除该子句until,只是为了使它有所不同(并且因为坦率地说,我不喜欢else循环的子句,并且认为它与Python的Zen不太匹配)。

第二个更改是将规则修改为compound_stmtinclude until_stmt,如您在上面的代码段中所见。紧接着while_stmt又是。

当你运行make修改后Grammar/Grammar,通知该pgen程序运行重新生成Include/graminit.hPython/graminit.c,然后几个文件得到重新编译。

修改AST生成代码

在Python解析器创建了一个解析树之后,该树将转换为AST,因为在编译过程的后续阶段中,使用 AST 更简单

因此,我们将要访问Parser/Python.asdl,它定义了Python AST的结构,并为我们的新until语句添加了一个AST节点,再次位于以下位置while

| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)

如果您现在运行make,请注意,在编译一堆文件之前,请先Parser/asdl_c.py运行以从AST定义文件生成C代码。这(如Grammar/Grammar)是Python源代码的另一个示例,它使用迷你语言(即DSL)简化了编程。还要注意,由于Parser/asdl_c.py是Python脚本,所以这是一种引导程序 -要从头开始构建Python,Python必须已经可用。

Parser/asdl_c.py生成用于管理新定义的AST节点的代码(到文件Include/Python-ast.h和中Python/Python-ast.c)时,我们仍然必须编写代码来手动将相关的解析树节点转换为它。这是在文件中完成的Python/ast.c。在那里,一个名为的函数ast_for_stmt将语句的解析树节点转换为AST节点。同样,在我们的老朋友的指导下while,我们跳入switch了处理复合语句的大幕,并为until_stmt以下项添加了一个子句:

case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);

现在我们应该执行ast_for_until_stmt。这里是:

static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}

同样,在仔细查看等效项的同时对它进行了编码ast_for_while_stmt,所不同的是,until我决定不支持该else子句。如预期的那样,使用其他AST创建函数(如ast_for_expr条件表达式和语句ast_for_suite主体)以递归方式创建AST until。最后,Until返回一个名为的新节点。

请注意,我们n使用诸如NCH和的一些宏来访问解析树节点CHILD。这些值得理解-它们的代码在Include/node.h

题外话:AST组成

我选择为该until语句创建一种新型的AST ,但实际上这不是必需的。我可以使用现有AST节点的组成来节省一些工作并实现新功能,因为:

until condition:
   # do stuff

在功能上等同于:

while not condition:
  # do stuff

与其在中创建Until节点ast_for_until_stmt,不如创建一个节点作为子Not节点的While节点。由于AST编译器已经知道如何处理这些节点,因此可以跳过该过程的后续步骤。

将AST编译成字节码

下一步是将AST编译为Python字节码。编译产生的中间结果是CFG(控制流图),但是由于使用相同的代码进行处理,因此我暂时将忽略此细节,并留给另一篇文章。

我们接下来要看的代码是Python/compile.c。按照的开头while,我们找到函数compiler_visit_stmt,该函数负责将语句编译为字节码。我们为添加一个子句Until

case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);

如果您想知道Until_kind是什么,那么它是一个_stmt_kind从AST定义文件自动生成为的常数(实际上是枚举的值)Include/Python-ast.h。无论如何,我们称compiler_until它当然仍然不存在。我待会儿。

如果您像我一样好奇,您会发现这compiler_visit_stmt很奇怪。grep对源代码树进行-ping操作并没有揭示调用它的位置。在这种情况下,仅保留一个选项-C macro-fu。确实,经过简短的调查,我们找到了以下VISIT宏中定义的宏Python/compile.c

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \

它用来调用compiler_visit_stmtcompiler_body。回到我们的业务,但是…

按照承诺,这里是compiler_until

static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);

    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}

我有一个表白:这段代码并不是基于对Python字节码的深刻理解而编写的。像本文的其余部分一样,它是模仿亲属compiler_while功能来完成的。但是,通过仔细阅读它,牢记Python VM是基于堆栈的,并浏览该dis模块的文档(该模块的文档提供了带有说明的Python字节码列表),可以了解正在发生的事情。

就是这样,我们完成了……不是吗?

进行所有更改并运行之后make,我们可以运行新编译的Python并尝试新的until语句:

>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1

瞧,行得通!我们来看一下使用dis模块为新语句创建的字节码,如下所示:

import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)

结果如下:

4           0 SETUP_LOOP              36 (to 39)
      >>    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      >>   38 POP_BLOCK
      >>   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE

最有趣的操作是数字12:如果条件为true,则在循环之后跳转到。这是的正确语义until。如果未执行该跳转,则循环主体将继续运行,直到其跳回到操作35的状态为止。

感觉很不错,然后尝试运行该函数(执行myfoo(3)),而不显示其字节码。结果令人鼓舞:

Traceback (most recent call last):
  File "zy.py", line 9, in
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'

哇…这不好。那么出了什么问题?

缺少符号表的情况

Python编译器在编译AST时执行的步骤之一是为其编译的代码创建符号表。对PySymtable_Buildin 的调用将PyAST_Compile调用符号表模块(Python/symtable.c),该模块以类似于代码生成功能的方式遍历AST。每个作用域都有一个符号表,有助于编译器找出一些关键信息,例如哪些变量是全局变量,哪些是局部变量。

为了解决这个问题,我们必须修改的symtable_visit_stmt函数,在类似语句[3]的代码之后Python/symtable.c添加用于处理until语句的代码:while

case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;

[3]:顺便说一下,如果没有此代码,则会有的编译器警告Python/symtable.c。编译器注意到,Until_kind枚举值未在和的switch语句中处理symtable_visit_stmt。检查编译器警告始终很重要!

现在我们真的完成了。进行此更改后编译源代码可以myfoo(3)按预期执行工作。

结论

在本文中,我演示了如何向Python添加新语句。尽管需要对Python编译器的代码进行大量修改,但更改并不难实现,因为我使用了类似的现有语句作为准则。

Python编译器是复杂的软件,我并不声称自己是该领域的专家。但是,我对Python的内部结构特别是前端非常感兴趣。因此,我发现此练习对于编译器原理和源代码的理论研究非常有用。它将作为以后将深入编译器的文章的基础。

参考资料

我使用了一些出色的参考来构建本文。在这里,它们没有特定的顺序:

  • PEP 339:CPython编译器的设计 -可能是Python编译器最重要,最全面的官方文档。太短了,它痛苦地显示出缺乏有关Python内部结构的好的文档。
  • “ Python编译器内部知识”-Thomas Lee的文章
  • “ Python:设计与实现”-Guido van Rossum的演讲
  • Python(2.5)虚拟机,导览-PeterTröger的演示

原始资料

You may find this useful – Python internals: adding a new statement to Python, quoted here:


This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.

All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.

The until statement

Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:

num = 3
until num == 0 do
  puts num
  num -= 1
end

And it will print:

3
2
1

So, I want to add a similar capability to Python. That is, being able to write:

num = 3
until num == 0:
  print(num)
  num -= 1

A language-advocacy digression

This article doesn’t attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python’s philosophy of minimalism. All I’m trying to do here, really, is gain some insight into the inner workings of Python.

Modifying the grammar

Python uses a custom parser generator named pgen. This is a LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar[1]. This is a simple text file that specifies the grammar of Python.

[1]: From here on, references to files in the Python source are given relatively to the root of the source tree, which is the directory where you run configure and make to build Python.

Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:

compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite

[2]: This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.

Note that I’ve decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don’t think it fits well with the Zen of Python).

The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It’s right after while_stmt, again.

When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.

Modifying the AST generation code

After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.

So, we’re going to visit Parser/Python.asdl which defines the structure of Python’s ASTs and add an AST node for our new until statement, again right below the while:

| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)

If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping – to build Python from scratch, Python already has to be available.

While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:

case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);

Now we should implement ast_for_until_stmt. Here it is:

static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}

Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I’ve decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.

Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding – their code is in Include/node.h.

Digression: AST composition

I chose to create a new type of AST for the until statement, but actually this isn’t necessary. I could’ve saved some work and implemented the new functionality using composition of existing AST nodes, since:

until condition:
   # do stuff

Is functionally equivalent to:

while not condition:
  # do stuff

Instead of creating the Until node in ast_for_until_stmt, I could have created a Not node with an While node as a child. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.

Compiling ASTs into bytecode

The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.

The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:

case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);

If you wonder what Until_kind is, it’s a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn’t exist. I’ll get to it an a moment.

If you’re curious like me, you’ll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains – C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \

It’s used to invoke compiler_visit_stmt in compiler_body. Back to our business, however…

As promised, here’s compiler_until:

static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);

    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}

I have a confession to make: this code wasn’t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it’s possible to understand what’s going on.

That’s it, we’re done… Aren’t we?

After making all the changes and running make, we can run the newly compiled Python and try our new until statement:

>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1

Voila, it works! Let’s see the bytecode created for the new statement by using the dis module as follows:

import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)

Here’s the result:

4           0 SETUP_LOOP              36 (to 39)
      >>    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      >>   38 POP_BLOCK
      >>   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE

The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn’t executed, the loop body keeps running until it jumps back to the condition at operation 35.

Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:

Traceback (most recent call last):
  File "zy.py", line 9, in
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'

Whoa… this can’t be good. So what went wrong?

The case of the missing symbol table

One of the steps the Python compiler performs when compiling the AST is create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.

To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:

case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;

[3]: By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings!

And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.

Conclusion

In this article I’ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn’t difficult to implement, because I used a similar and existing statement as a guideline.

The Python compiler is a sophisticated chunk of software, and I don’t claim being an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler’s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.

References

I used a few excellent references for the construction of this article. Here they are, in no particular order:

  • PEP 339: Design of the CPython compiler – probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
  • “Python Compiler Internals” – an article by Thomas Lee
  • “Python: Design and Implementation” – a presentation by Guido van Rossum
  • Python (2.5) Virtual Machine, A guided tour – a presentation by Peter Tröger

original source


回答 1

做这种事情的一种方法是预处理并修改源代码,将添加的语句翻译成python。这种方法会带来各种问题,我不建议将其用于一般用途,但是对于尝试语言或特定用途的元编程,它有时会很有用。

例如,假设我们要引入“ myprint”语句,该语句不是打印到屏幕而是登录到特定文件。即:

myprint "This gets logged to file"

相当于

print >>open('/tmp/logfile.txt','a'), "This gets logged to file"

从正则表达式替换到生成AST,以及根据自己的语法与现有python匹配的紧密程度来编写自己的解析器,有多种选择方法。一个好的中间方法是使用标记器模块。这应该允许您在解释源代码时(类似于python解释器)添加新的关键字,控制结构等,从而避免原始正则表达式解决方案造成损坏。对于上面的“ myprint”,您可以编写以下转换代码:

import tokenize

LOGFILE = '/tmp/log.txt'
def translate(readline):
    for type, name,_,_,_ in tokenize.generate_tokens(readline):
        if type ==tokenize.NAME and name =='myprint':
            yield tokenize.NAME, 'print'
            yield tokenize.OP, '>>'
            yield tokenize.NAME, "open"
            yield tokenize.OP, "("
            yield tokenize.STRING, repr(LOGFILE)
            yield tokenize.OP, ","
            yield tokenize.STRING, "'a'"
            yield tokenize.OP, ")"
            yield tokenize.OP, ","
        else:
            yield type,name

(这确实使myprint有效地成为关键字,因此在其他地方用作变量可能会引起问题)

然后的问题是如何使用它,以便您的代码可从python使用。一种方法就是编写自己的导入函数,并使用它来加载以自定义语言编写的代码。即:

import new
def myimport(filename):
    mod = new.module(filename)
    f=open(filename)
    data = tokenize.untokenize(translate(f.readline))
    exec data in mod.__dict__
    return mod

这就要求您处理自定义代码的方法不同于普通的python模块。即“ some_mod = myimport("some_mod.py")”而非“ import some_mod

另一个相当整洁(尽管很hacky)的解决方案是创建自定义编码(请参阅PEP 263),如食谱所示。您可以将其实现为:

import codecs, cStringIO, encodings
from encodings import utf_8

class StreamReader(utf_8.StreamReader):
    def __init__(self, *args, **kwargs):
        codecs.StreamReader.__init__(self, *args, **kwargs)
        data = tokenize.untokenize(translate(self.stream.readline))
        self.stream = cStringIO.StringIO(data)

def search_function(s):
    if s!='mylang': return None
    utf8=encodings.search_function('utf8') # Assume utf8 encoding
    return codecs.CodecInfo(
        name='mylang',
        encode = utf8.encode,
        decode = utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=StreamReader,
        streamwriter=utf8.streamwriter)

codecs.register(search_function)

现在,在运行此代码之后(例如,您可以将其放置在.pythonrc或site.py中),以注释“ #coding:mylang”开头的任何代码都将自动通过上述预处理步骤进行翻译。例如。

# coding: mylang
myprint "this gets logged to file"
for i in range(10):
    myprint "so does this : ", i, "times"
myprint ("works fine" "with arbitrary" + " syntax" 
  "and line continuations")

注意事项:

预处理器方法存在一些问题,如果您使用过C预处理器,您可能会很熟悉。主要的是调试。python看到的只是经过预处理的文件,这意味着打印在堆栈跟踪等中的文本将引用该文件。如果您执行了重要的翻译,这可能与源文本有很大不同。上面的示例不会更改行号等,因此不会有太大的不同,但是更改的次数越多,越难弄清。

One way to do things like this is to preprocess the source and modify it, translating your added statement to python. There are various problems this approach will bring, and I wouldn’t recommend it for general usage, but for experimentation with language, or specific-purpose metaprogramming, it can occassionally be useful.

For instance, lets say we want to introduce a “myprint” statement, that instead of printing to the screen instead logs to a specific file. ie:

myprint "This gets logged to file"

would be equivalent to

print >>open('/tmp/logfile.txt','a'), "This gets logged to file"

There are various options as to how to do the replacing, from regex substitution to generating an AST, to writing your own parser depending on how close your syntax matches existing python. A good intermediate approach is to use the tokenizer module. This should allow you to add new keywords, control structures etc while interpreting the source similarly to the python interpreter, thus avoiding the breakage crude regex solutions would cause. For the above “myprint”, you could write the following transformation code:

import tokenize

LOGFILE = '/tmp/log.txt'
def translate(readline):
    for type, name,_,_,_ in tokenize.generate_tokens(readline):
        if type ==tokenize.NAME and name =='myprint':
            yield tokenize.NAME, 'print'
            yield tokenize.OP, '>>'
            yield tokenize.NAME, "open"
            yield tokenize.OP, "("
            yield tokenize.STRING, repr(LOGFILE)
            yield tokenize.OP, ","
            yield tokenize.STRING, "'a'"
            yield tokenize.OP, ")"
            yield tokenize.OP, ","
        else:
            yield type,name

(This does make myprint effectively a keyword, so use as a variable elsewhere will likely cause problems)

The problem then is how to use it so that your code is usable from python. One way would just be to write your own import function, and use it to load code written in your custom language. ie:

import new
def myimport(filename):
    mod = new.module(filename)
    f=open(filename)
    data = tokenize.untokenize(translate(f.readline))
    exec data in mod.__dict__
    return mod

This requires you handle your customised code differently from normal python modules however. ie “some_mod = myimport("some_mod.py")” rather than “import some_mod

Another fairly neat (albeit hacky) solution is to create a custom encoding (See PEP 263) as this recipe demonstrates. You could implement this as:

import codecs, cStringIO, encodings
from encodings import utf_8

class StreamReader(utf_8.StreamReader):
    def __init__(self, *args, **kwargs):
        codecs.StreamReader.__init__(self, *args, **kwargs)
        data = tokenize.untokenize(translate(self.stream.readline))
        self.stream = cStringIO.StringIO(data)

def search_function(s):
    if s!='mylang': return None
    utf8=encodings.search_function('utf8') # Assume utf8 encoding
    return codecs.CodecInfo(
        name='mylang',
        encode = utf8.encode,
        decode = utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=StreamReader,
        streamwriter=utf8.streamwriter)

codecs.register(search_function)

Now after this code gets run (eg. you could place it in your .pythonrc or site.py) any code starting with the comment “# coding: mylang” will automatically be translated through the above preprocessing step. eg.

# coding: mylang
myprint "this gets logged to file"
for i in range(10):
    myprint "so does this : ", i, "times"
myprint ("works fine" "with arbitrary" + " syntax" 
  "and line continuations")

Caveats:

There are problems to the preprocessor approach, as you’ll probably be familiar with if you’ve worked with the C preprocessor. The main one is debugging. All python sees is the preprocessed file which means that text printed in the stack trace etc will refer to that. If you’ve performed significant translation, this may be very different from your source text. The example above doesn’t change line numbers etc, so won’t be too different, but the more you change it, the harder it will be to figure out.


回答 2

是的,在某种程度上是可能的。有一个模块可以sys.settrace()用来实现gotocomefrom“关键字”:

from goto import goto, label
for i in range(1, 10):
  for j in range(1, 20):
    print i, j
    if j == 3:
      goto .end # breaking out from nested loop
label .end
print "Finished"

Yes, to some extent it is possible. There is a module out there that uses sys.settrace() to implement goto and comefrom “keywords”:

from goto import goto, label
for i in range(1, 10):
  for j in range(1, 20):
    print i, j
    if j == 3:
      goto .end # breaking out from nested loop
label .end
print "Finished"

回答 3

缺少更改和重新编译源代码(在开放源代码中可能的)的情况下,更改基本语言实际上是不可能的。

即使您确实重新编译了源代码,也不会是python,只是经过修改的修改过的版本,您需要非常小心,不要引入错误。

但是,我不确定您为什么要这么做。Python的面向对象功能使使用这种语言实现类似的结果非常简单。

Short of changing and recompiling the source code (which is possible with open source), changing the base language is not really possible.

Even if you do recompile the source, it wouldn’t be python, just your hacked-up changed version which you need to be very careful not to introduce bugs into.

However, I’m not sure why you’d want to. Python’s object-oriented features makes it quite simple to achieve similar results with the language as it stands.


回答 4

通用答案:您需要预处理源文件。

更具体的答案:安装EasyExtend,然后执行以下步骤

i)创建一个新的langlet(扩展语言)

import EasyExtend
EasyExtend.new_langlet("mystmts", prompt = "my> ", source_ext = "mypy")

如果没有其他规范,则应在EasyExtend / langlets / mystmts /下创建一堆文件。

ii)打开mystmts / parsedef / Grammar.ext并添加以下行

small_stmt: (expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | exec_stmt | assert_stmt | my_stmt )

my_stmt: 'mystatement' expr

这足以定义新语句的语法。small_stmt非终结符是Python语法的一部分,是连接新语句的地方。解析器现在将识别新语句,即将解析包含该新语句的源文件。尽管编译器将拒绝它,因为它仍然必须转换为有效的Python。

iii)现在必须添加语句的语义。为此,必须编辑msytmts / langlet.py并添加my_stmt节点访问者。

 def call_my_stmt(expression):
     "defines behaviour for my_stmt"
     print "my stmt called with", expression

 class LangletTransformer(Transformer):
       @transform
       def my_stmt(self, node):
           _expr = find_node(node, symbol.expr)
           return any_stmt(CST_CallFunc("call_my_stmt", [_expr]))

 __publish__ = ["call_my_stmt"]

iv)cd到langlets / mystmts并输入

python run_mystmts.py

现在将开始一个会话,可以使用新定义的语句:

__________________________________________________________________________________

 mystmts

 On Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
 __________________________________________________________________________________

 my> mystatement 40+2
 my stmt called with 42

做出一些琐碎的声明需要几个步骤,对吗?目前还没有一种API可以让人们定义简单的东西而不必关心语法。但是EE对一些错误进行模化是非常可靠的。因此,出现一个API只是时间问题,它使程序员可以使用便捷的OO编程定义便捷的内容,例如中缀运算符或小语句。对于更复杂的事情,例如通过构建langlet在Python中嵌入整个语言,没有办法解决完整的语法方法。

General answer: you need to preprocess your source files.

More specific answer: install EasyExtend, and go through following steps

i) Create a new langlet ( extension language )

import EasyExtend
EasyExtend.new_langlet("mystmts", prompt = "my> ", source_ext = "mypy")

Without additional specification a bunch of files shall be created under EasyExtend/langlets/mystmts/ .

ii) Open mystmts/parsedef/Grammar.ext and add following lines

small_stmt: (expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | exec_stmt | assert_stmt | my_stmt )

my_stmt: 'mystatement' expr

This is sufficient to define the syntax of your new statement. The small_stmt non-terminal is part of the Python grammar and it’s the place where the new statement is hooked in. The parser will now recognize the new statement i.e. a source file containing it will be parsed. The compiler will reject it though because it still has to be transformed into valid Python.

iii) Now one has to add semantics of the statement. For this one has to edit msytmts/langlet.py and add a my_stmt node visitor.

 def call_my_stmt(expression):
     "defines behaviour for my_stmt"
     print "my stmt called with", expression

 class LangletTransformer(Transformer):
       @transform
       def my_stmt(self, node):
           _expr = find_node(node, symbol.expr)
           return any_stmt(CST_CallFunc("call_my_stmt", [_expr]))

 __publish__ = ["call_my_stmt"]

iv) cd to langlets/mystmts and type

python run_mystmts.py

Now a session shall be started and the newly defined statement can be used:

__________________________________________________________________________________

 mystmts

 On Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
 __________________________________________________________________________________

 my> mystatement 40+2
 my stmt called with 42

Quite a few steps to come to a trivial statement, right? There isn’t an API yet that lets one define simple things without having to care about grammars. But EE is very reliable modulo some bugs. So it’s just a matter of time that an API emerges that lets programmers define convenient stuff like infix operators or small statements using just convenient OO programming. For more complex things like embedding whole languages in Python by means of building a langlet there is no way of going around a full grammar approach.


回答 5

这是一种仅在解释模式下添加新语句的非常简单但糟糕的方法。我只使用sys.displayhook将它用于少量的1个字母的命令来编辑基因注释,但是为了回答这个问题,我还为语法错误添加了sys.excepthook。后者确实很丑陋,从readline缓冲区中获取原始代码。好处是,以这种方式添加新语句非常容易。


jcomeau@intrepid:~/$ cat demo.py; ./demo.py
#!/usr/bin/python -i
'load everything needed under "package", such as package.common.normalize()'
import os, sys, readline, traceback
if __name__ == '__main__':
    class t:
        @staticmethod
        def localfunction(*args):
            print 'this is a test'
            if args:
                print 'ignoring %s' % repr(args)

    def displayhook(whatever):
        if hasattr(whatever, 'localfunction'):
            return whatever.localfunction()
        else:
            print whatever

    def excepthook(exctype, value, tb):
        if exctype is SyntaxError:
            index = readline.get_current_history_length()
            item = readline.get_history_item(index)
            command = item.split()
            print 'command:', command
            if len(command[0]) == 1:
                try:
                    eval(command[0]).localfunction(*command[1:])
                except:
                    traceback.print_exception(exctype, value, tb)
        else:
            traceback.print_exception(exctype, value, tb)

    sys.displayhook = displayhook
    sys.excepthook = excepthook
>>> t
this is a test
>>> t t
command: ['t', 't']
this is a test
ignoring ('t',)
>>> ^D

Here’s a very simple but crappy way to add new statements, in interpretive mode only. I’m using it for little 1-letter commands for editing gene annotations using only sys.displayhook, but just so I could answer this question I added sys.excepthook for the syntax errors as well. The latter is really ugly, fetching the raw code from the readline buffer. The benefit is, it’s trivially easy to add new statements this way.


jcomeau@intrepid:~/$ cat demo.py; ./demo.py
#!/usr/bin/python -i
'load everything needed under "package", such as package.common.normalize()'
import os, sys, readline, traceback
if __name__ == '__main__':
    class t:
        @staticmethod
        def localfunction(*args):
            print 'this is a test'
            if args:
                print 'ignoring %s' % repr(args)

    def displayhook(whatever):
        if hasattr(whatever, 'localfunction'):
            return whatever.localfunction()
        else:
            print whatever

    def excepthook(exctype, value, tb):
        if exctype is SyntaxError:
            index = readline.get_current_history_length()
            item = readline.get_history_item(index)
            command = item.split()
            print 'command:', command
            if len(command[0]) == 1:
                try:
                    eval(command[0]).localfunction(*command[1:])
                except:
                    traceback.print_exception(exctype, value, tb)
        else:
            traceback.print_exception(exctype, value, tb)

    sys.displayhook = displayhook
    sys.excepthook = excepthook
>>> t
this is a test
>>> t t
command: ['t', 't']
this is a test
ignoring ('t',)
>>> ^D


回答 6

我找到了有关添加新语句的指南:

https://troeger.eu/files/teaching/pythonvm08lab.pdf

基本上,要添加新语句,您必须Python/ast.c(除其他外)进行编辑并重新编译python二进制文件。

尽管有可能,但不要这样做。您几乎可以通过函数和类来实现所有目的(这不需要人们重新编译python才能运行您的脚本。)

I’ve found a guide on adding new statements:

https://troeger.eu/files/teaching/pythonvm08lab.pdf

Basically, to add new statements, you must edit Python/ast.c (among other things) and recompile the python binary.

While it’s possible, don’t. You can achieve almost everything via functions and classes (which wont require people to recompile python just to run your script..)


回答 7

使用EasyExtend可以做到这一点

EasyExtend(EE)是一个用纯Python编写并与CPython集成的预处理器生成器和元编程框架。EasyExtend的主要目的是创建扩展语言,即向Python添加自定义语法和语义。

It’s possible to do this using EasyExtend:

EasyExtend (EE) is a preprocessor generator and metaprogramming framework written in pure Python and integrated with CPython. The main purpose of EasyExtend is the creation of extension languages i.e. adding custom syntax and semantics to Python.


回答 8

它并没有在语言语法中添加新的语句,但是宏是一个强大的工具:https : //github.com/lihaoyi/macropy

It’s not exactly adding new statements to the language syntax, but macros are a powerful tool: https://github.com/lihaoyi/macropy


回答 9

并非没有修改解释器。我知道过去几年中许多语言都被描述为“可扩展”,但并不是您所描述的那样。您可以通过添加函数和类来扩展Python。

Not without modifying the interpreter. I know a lot of languages in the past several years have been described as “extensible”, but not in the way you’re describing. You extend Python by adding functions and classes.


回答 10

有一种基于Python的语言称为Logix,您可以使用它执行此操作。它不是一直在开发了一段时间,但功能,你要求做的工作与最新版本。

There is a language based on python called Logix with which you CAN do such things. It hasn’t been under development for a while, but the features that you asked for do work with the latest version.


回答 11

装饰器可以完成某些操作。例如,假设Python没有with语句。然后,我们可以实现类似的行为,如下所示:

# ====== Implementation of "mywith" decorator ======

def mywith(stream):
    def decorator(function):
        try: function(stream)
        finally: stream.close()
    return decorator

# ====== Using the decorator ======

@mywith(open("test.py","r"))
def _(infile):
    for l in infile.readlines():
        print(">>", l.rstrip())

但是,这是一个非常不干净的解决方案。特别是装饰器调用函数并设置_为的行为None是意外的。为了澄清起见:此装饰器等效于编写

def _(infile): ...
_ = mywith(open(...))(_) # mywith returns None.

通常,装饰器和装饰器将修改而不执行功能。

我以前在脚本中使用过这种方法,在该脚本中,我不得不临时设置几个功能的工作目录。

Some things can be done with decorators. Let’s e.g. assume, Python had no with statement. We could then implement a similar behaviour like this:

# ====== Implementation of "mywith" decorator ======

def mywith(stream):
    def decorator(function):
        try: function(stream)
        finally: stream.close()
    return decorator

# ====== Using the decorator ======

@mywith(open("test.py","r"))
def _(infile):
    for l in infile.readlines():
        print(">>", l.rstrip())

It is a pretty unclean solution however as done here. Especially the behaviour where the decorator calls the function and sets _ to None is unexpected. For clarification: This decorator is equivalent to writing

def _(infile): ...
_ = mywith(open(...))(_) # mywith returns None.

and decorators are normally expected to modify, not to execute, functions.

I used such a method before in a script where I had to temporarily set the working directory for several functions.


回答 12

十年前,您做不到,我怀疑情况已经改变。但是,如果您准备重新编译python,那么修改语法并不难,我也怀疑是否已更改。

Ten years ago you couldn’t, and I doubt that’s changed. However, it wasn’t that hard to modify the syntax back then if you were prepared to recompile python, and I doubt that’s changed, either.


将if-elif-else语句放在一行上吗?

问题:将if-elif-else语句放在一行上吗?

我已阅读以下链接,但未解决我的问题。
Python是否具有三元条件运算符?(问题是将if-else语句压缩到一行)

有没有更简单的方式编写if-elif-else语句,使其适合一行?
例如,

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

或一个真实的例子:

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

我只是觉得,如果上面的示例可以用以下方式编写,则看起来会更加简洁。

x=2 if i>100 elif i<100 1 else 0 [WRONG]

I have read the links below, but it doesn’t address my question.
Does Python have a ternary conditional operator? (the question is about condensing if-else statement to one line)

Is there an easier way of writing an if-elif-else statement so it fits on one line?
For example,

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

Or a real-world example:

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

I just feel if the example above could be written the following way, it could look like more concise.

x=2 if i>100 elif i<100 1 else 0 [WRONG]

回答 0

不,这是不可能的(至少不能使用任意语句),也不是可取的。将所有内容都放在一行中很可能会违反PEP-8,在这种情况下,行的长度不得超过80个字符。

这也与Python的Zen背道而驰:“可读性很重要”。(import this在Python提示符下键入以读取整个内容)。

可以在Python中使用三元表达式,但只能用于表达式,不能用于语句:

>>> a = "Hello" if foo() else "Goodbye"

编辑:

现在,您修改后的问题表明,除了要分配的值之外,这三个语句是相同的。在这种情况下,链式三元运算符确实可以工作,但是我仍然认为它的可读性较差:

>>> i=100
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
0
>>> i=101
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
2
>>> i=99
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
1

No, it’s not possible (at least not with arbitrary statements), nor is it desirable. Fitting everything on one line would most likely violate PEP-8 where it is mandated that lines should not exceed 80 characters in length.

It’s also against the Zen of Python: “Readability counts”. (Type import this at the Python prompt to read the whole thing).

You can use a ternary expression in Python, but only for expressions, not for statements:

>>> a = "Hello" if foo() else "Goodbye"

Edit:

Your revised question now shows that the three statements are identical except for the value being assigned. In that case, a chained ternary operator does work, but I still think that it’s less readable:

>>> i=100
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
0
>>> i=101
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
2
>>> i=99
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
1

回答 1

如果您仅在不同情况下需要不同的表达式,那么这可能对您有用:

expr1 if condition1 else expr2 if condition2 else expr

例如:

a = "neg" if b<0 else "pos" if b>0 else "zero"

If you only need different expressions for different cases then this may work for you:

expr1 if condition1 else expr2 if condition2 else expr

For example:

a = "neg" if b<0 else "pos" if b>0 else "zero"

回答 2

只需在else语句中嵌套另一个if子句。但这并没有使它看起来更漂亮。

>>> x=5
>>> x if x>0 else ("zero" if x==0 else "invalid value")
5
>>> x = 0
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'zero'
>>> x = -1
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'invalid value'

Just nest another if clause in the else statement. But that doesn’t make it look any prettier.

>>> x=5
>>> x if x>0 else ("zero" if x==0 else "invalid value")
5
>>> x = 0
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'zero'
>>> x = -1
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'invalid value'

回答 3

尽管有其他一些答案:是的,这是可能的

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

转换为以下一种衬纸:

statement1 if expression1 else (statement2 if expression2 else statement3)

实际上,您可以将它们嵌套到无限远。请享用 ;)

Despite some other answers: YES it IS possible:

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

translates to the following one liner:

statement1 if expression1 else (statement2 if expression2 else statement3)

in fact you can nest those till infinity. Enjoy ;)


回答 4

您可以选择实际使用a的get方法dict

x = {i<100: -1, -10<=i<=10: 0, i>100: 1}.get(True, 2)

get如果其中一个键可以保证计算为,则不需要该方法True

x = {i<0: -1, i==0: 0, i>0: 1}[True]

理想情况下,最多不应将其中一个键评估为True。如果一个以上的键计算为True,则结果似乎不可预测。

You can optionally actually use the get method of a dict:

x = {i<100: -1, -10<=i<=10: 0, i>100: 1}.get(True, 2)

You don’t need the get method if one of the keys is guaranteed to evaluate to True:

x = {i<0: -1, i==0: 0, i>0: 1}[True]

At most one of the keys should ideally evaluate to True. If more than one key evaluates to True, the results could seem unpredictable.


回答 5

在我看来,还有一种方法是很难理解的,但无论如何我还是会出于好奇而分享:

x = (i>100 and 2) or (i<100 and 1) or 0

此处提供更多信息:https : //docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not

There’s an alternative that’s quite unreadable in my opinion but I’ll share anyway just as a curiosity:

x = (i>100 and 2) or (i<100 and 1) or 0

More info here: https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not


回答 6

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

如果要在一行中使用上述代码,则可以使用以下代码:

x = 2 if i > 100 else 1 if i < 100 else 0

这样做时,如果i> 100,x将被分配为2;如果i <100,则x将被分配;如果i = 100,则x将被分配为0。

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

If you want to use the above-mentioned code in one line, you can use the following:

x = 2 if i > 100 else 1 if i < 100 else 0

On doing so, x will be assigned 2 if i > 100, 1 if i < 100 and 0 if i = 100


回答 7

这也取决于您的表情的性质。关于“不做”的其他答案的一般建议对于通用语句和通用表达式非常有效。

但是,如果您只需要一个“ dispatch”表(例如,根据给定选项的值调用一个不同的函数),则可以将这些函数放在字典中进行调用。

就像是:

def save(): 
   ...
def edit():
   ...
options = {"save": save, "edit": edit, "remove": lambda : "Not Implemented"}

option = get_input()
result = options[option]()

代替if-else:

if option=="save":
    save()
...

It also depends on the nature of your expressions. The general advice on the other answers of “not doing it” is quite valid for generic statements and generic expressions.

But if all you need is a “dispatch” table, like, calling a different function depending on the value of a given option, you can put the functions to call inside a dictionary.

Something like:

def save(): 
   ...
def edit():
   ...
options = {"save": save, "edit": edit, "remove": lambda : "Not Implemented"}

option = get_input()
result = options[option]()

Instead of an if-else:

if option=="save":
    save()
...

回答 8

人们已经提到了三元表达式。有时,以简单的条件分配为例,可以使用数学表达式执行条件分配。这可能不会使您的代码具有很高的可读性,但是确实可以将它放在很短的一行上。您的示例可以这样写:

x = 2*(i>100) | 1*(i<100)

比较将为True或False,然后与数字相乘将为1或0。可以使用+而不是| 在中间。

People have already mentioned ternary expressions. Sometimes with a simple conditional assignment as your example, it is possible to use a mathematical expression to perform the conditional assignment. This may not make your code very readable, but it does get it on one fairly short line. Your example could be written like this:

x = 2*(i>100) | 1*(i<100)

The comparisons would be True or False, and when multiplying with numbers would then be either 1 or 0. One could use a + instead of an | in the middle.


回答 9

三元运算符是一个简洁的表达的最好方式。语法为variable = value_1 if condition else value_2。因此,对于您的示例,您必须两次应用三元运算符:

i = 23 # set any value for i
x = 2 if i > 100 else 1 if i < 100 else 0

The ternary operator is the best way to a concise expression. The syntax is variable = value_1 if condition else value_2. So, for your example, you must apply the ternary operator twice:

i = 23 # set any value for i
x = 2 if i > 100 else 1 if i < 100 else 0

回答 10

您可以使用嵌套三元if语句。

# if-else ternary construct
country_code = 'USA'
is_USA = True if country_code == 'USA' else False
print('is_USA:', is_USA)

# if-elif-else ternary construct
# Create function to avoid repeating code.
def get_age_category_name(age):
    age_category_name = 'Young' if age <= 40 else ('Middle Aged' if age > 40 and age <= 65 else 'Senior')
    return age_category_name

print(get_age_category_name(25))
print(get_age_category_name(50))
print(get_age_category_name(75))

You can use nested ternary if statements.

# if-else ternary construct
country_code = 'USA'
is_USA = True if country_code == 'USA' else False
print('is_USA:', is_USA)

# if-elif-else ternary construct
# Create function to avoid repeating code.
def get_age_category_name(age):
    age_category_name = 'Young' if age <= 40 else ('Middle Aged' if age > 40 and age <= 65 else 'Senior')
    return age_category_name

print(get_age_category_name(25))
print(get_age_category_name(50))
print(get_age_category_name(75))

在元组定义中使用逗号结尾的语法规则是什么?

问题:在元组定义中使用逗号结尾的语法规则是什么?

在单个元素元组的情况下,需要尾随逗号。

a = ('foo',)

那么具有多个元素的元组呢?似乎尾随逗号是否存在,它们都有效。这样对吗?我认为使用逗号结尾更易于编辑。那是不好的编码风格吗?

a = ('foo1', 'foo2')
b = ('foo1', 'foo2',)

In the case of a single element tuple, the trailing comma is required.

a = ('foo',)

What about a tuple with multiple elements? It seems that whether the trailing comma exists or not, they are both valid. Is this correct? Having a trailing comma is easier for editing in my opinion. Is that a bad coding style?

a = ('foo1', 'foo2')
b = ('foo1', 'foo2',)

回答 0

在所有情况下,除了空的元组,逗号都是重要的事情。仅在出于其他语法原因而需要时才需要括号:将元组与一组函数参数,运算符优先级区分开或允许换行。

元组,列表或函数参数的尾部逗号是一种不错的样式,尤其是当您有一个较长的初始化并分为多行时。如果始终包含尾随逗号,则不会在末尾添加任何行,而期望添加另一个元素,而只是创建一个有效的表达式:

a = [
   "a",
   "b"
   "c"
]

假设它最初是一个2元素列表,后来又进行了扩展,那么它可能以一种可能并不立即显而易见的方式出错了。始终包含结尾逗号,这样可以避免出现陷阱。

In all cases except the empty tuple the comma is the important thing. Parentheses are only required when required for other syntactic reasons: to distinguish a tuple from a set of function arguments, operator precedence, or to allow line breaks.

The trailing comma for tuples, lists, or function arguments is good style especially when you have a long initialisation that is split over multiple lines. If you always include a trailing comma then you won’t add another line to the end expecting to add another element and instead just creating a valid expression:

a = [
   "a",
   "b"
   "c"
]

Assuming that started as a 2 element list that was later extended it has gone wrong in a perhaps not immediately obvious way. Always include the trailing comma and you avoid that trap.


回答 1

只有单项元组才需要消除定义元组或由括号包围的表达式的歧义。

(1)  # the number 1 (the parentheses are wrapping the expression `1`)
(1,) # a 1-tuple holding a number 1

对于一个以上的项目,不再需要了,因为显然它是一个元组。但是,允许使用逗号逗号来简化使用多行定义它们的过程。您可以在不破坏语法的情况下添加到末尾或重新排列项目,因为您偶然遗漏了逗号。

例如,

someBigTuple = (
                   0,
                   1,
                   2,
                   3,
                   4,
                   5,
                   6,
                   7,
                   8,
                   9,
                   10,
                   #...
                   10000000000,
               )

请注意,这也适用于其他集合(例如列表和字典),而不仅仅是元组。

It is only required for single-item tuples to disambiguate defining a tuple or an expression surrounded by parentheses.

(1)  # the number 1 (the parentheses are wrapping the expression `1`)
(1,) # a 1-tuple holding a number 1

For more than one item, it is no longer necessary since it is perfectly clear it is a tuple. However, the trailing comma is allowed to make defining them using multiple lines easier. You could add to the end or rearrange items without breaking the syntax because you left out a comma on accident.

e.g.,

someBigTuple = (
                   0,
                   1,
                   2,
                   3,
                   4,
                   5,
                   6,
                   7,
                   8,
                   9,
                   10,
                   #...
                   10000000000,
               )

Note that this applies to other collections (e.g., lists and dictionaries) too and not just tuples.


回答 2

尾随逗号的另一个优点是,它使差异看起来更好。如果您从

a = [
    1,
    2,
    3
]

并将其更改为

a = [
    1,
    2,
    3,
    4
]

差异看起来像

 a = [
     1,
     2,
-    3
+    3,
+    4
 ]

而如果您以逗号开头,例如

a = [
    1,
    2,
    3,
]

然后差异将是

 a = [
     1,
     2,
     3,
+    4,
 ]

Another advantage of trailing commas is that it makes diffs look nicer. If you started with

a = [
    1,
    2,
    3
]

and changed it to

a = [
    1,
    2,
    3,
    4
]

The diff would look like

 a = [
     1,
     2,
-    3
+    3,
+    4
 ]

Whereas if you had started with a trailing comma, like

a = [
    1,
    2,
    3,
]

Then the diff would just be

 a = [
     1,
     2,
     3,
+    4,
 ]

回答 3

它是可选的:请参阅Python Wiki

简介:单元素元组需要结尾逗号,但是对于多元素元组是可选的。

It’s optional: see the Python wiki.

Summary: single-element tuples need a trailing comma, but it’s optional for multiple-element tuples.


回答 4

另外,请考虑所需的情况:

>>> (('x','y'))*4                         # same as ('x','y')*4
('x', 'y', 'x', 'y', 'x', 'y', 'x', 'y')
#Expected = (('x', 'y'), ('x', 'y'), ('x', 'y'), ('x', 'y'))

因此,在这种情况下,外部括号仅是对括号进行分组。要使它们成为元组,您需要添加尾随逗号。即

>>> (('x','y'),)*4 
(('x', 'y'), ('x', 'y'), ('x', 'y'), ('x', 'y'))

Also, consider the situation where you want:

>>> (('x','y'))*4                         # same as ('x','y')*4
('x', 'y', 'x', 'y', 'x', 'y', 'x', 'y')
#Expected = (('x', 'y'), ('x', 'y'), ('x', 'y'), ('x', 'y'))

So in this case the outer parentheses are nothing more than grouping parentheses. To make them tuple you need to add a trailing comma. i.e.

>>> (('x','y'),)*4 
(('x', 'y'), ('x', 'y'), ('x', 'y'), ('x', 'y'))

回答 5

仅尾元素元组才需要尾随逗号。对于较大的元组,用逗号结尾是一个样式问题,并非必需。它的最大优点是对带有经常修改的多行大元组(例如配置元组)的文件进行完全比较。

Trailing comma is required for one-element tuples only. Having a trailing comma for larger tuples is a matter of style and is not required. Its greatest advantage is clean diff on files with multi-line large tuples that are often modified (e.g. configuration tuples).


回答 6

这是一个简单的答案。

a =(“ s”)是一个字符串

a =(“ s”,)是具有一个元素的元组。

对于一个元素元组,Python需要附加逗号,以区分字符串和元组。

例如,在python控制台上尝试:

a =(“ s”)

a = a +(1,2,3)

追溯(最近一次通话):

文件stdin,第1行,在模块中

TypeError:无法连接“ str”和“ tuple”对象

That’s a simple answer.

a = (“s”) is a string

and

a = (“s”,) is a tuple with one element.

Python needs an additional comma in case of one element tuple to, differentiate between string and tuple.

For example try this on python console:

a = (“s”)

a = a + (1,2,3)

Traceback (most recent call last):

File stdin, line 1, in module

TypeError: cannot concatenate ‘str’ and ‘tuple’ objects


回答 7

存在的另一个原因是它使代码生成和__repr__功能更易于编写。例如,如果您有一些构建为的对象obj(arg1, arg2, ..., argn),则可以这样写obj.__repr__

def __repr__(self):
    l = ['obj(']
    for arg in obj.args: # Suppose obj.args == (arg1, arg2, ..., argn)
        l.append(repr(arg))
        l.append(', ')
    l.append(')')
    return ''.join(l)

如果不允许使用逗号结尾,则必须对最后一个参数进行特殊处理。实际上,您可以使用列表理解将上述内容写在一行中(我将其写得更长一些,以便于阅读)。如果上学期不得不特殊情况,要做到这一点并非易事。

Another reason that this exists is that it makes code generation and __repr__ functions easier to write. For example, if you have some object that is built like obj(arg1, arg2, ..., argn), then you can just write obj.__repr__ as

def __repr__(self):
    l = ['obj(']
    for arg in obj.args: # Suppose obj.args == (arg1, arg2, ..., argn)
        l.append(repr(arg))
        l.append(', ')
    l.append(')')
    return ''.join(l)

If a trailing comma wasn’t allowed, you would have to special case the last argument. In fact, you could write the above in one line using a list comprehension (I’ve written it out longer to make it easier to read). It wouldn’t be so easy to do that if you had to special case the last term.


回答 8

PEP 8-Python代码样式指南-何时使用尾随逗号

尾部的逗号通常是可选的,但当组成一个元素的元组时它们是必需的(并且在Python 2中,它们具有print语句的语义)。为了清楚起见,建议将后者用(技术上多余的)括号括起来。

是:

FILES = ('setup.cfg',)

好的,但是令人困惑:

FILES = 'setup.cfg',

如果结尾的逗号多余,则在使用版本控制系统时,当值,参数或导入项的列表预计会随着时间扩展时,它们通常会很有用。模式是将每个值(等)单独放在一行上,始终添加尾随逗号,并在下一行上添加右括号/括号/括号。但是,在与结束定界符相同的行上使用尾随逗号是没有意义的(在上述单例元组的情况下除外)。

是:

FILES = [
    'setup.cfg',
    'tox.ini',
    ]
initialize(FILES,
           error=True,
           )

没有:

FILES = ['setup.cfg', 'tox.ini',]
initialize(FILES, error=True,)

PEP 8 — Style Guide for Python Code – When to Use Trailing Commas

Trailing commas are usually optional, except they are mandatory when making a tuple of one element (and in Python 2 they have semantics for the print statement). For clarity, it is recommended to surround the latter in (technically redundant) parentheses.

Yes:

FILES = ('setup.cfg',)

OK, but confusing:

FILES = 'setup.cfg',

When trailing commas are redundant, they are often helpful when a version control system is used, when a list of values, arguments or imported items is expected to be extended over time. The pattern is to put each value (etc.) on a line by itself, always adding a trailing comma, and add the close parenthesis/bracket/brace on the next line. However it does not make sense to have a trailing comma on the same line as the closing delimiter (except in the above case of singleton tuples).

Yes:

FILES = [
    'setup.cfg',
    'tox.ini',
    ]
initialize(FILES,
           error=True,
           )

No:

FILES = ['setup.cfg', 'tox.ini',]
initialize(FILES, error=True,)

回答 9

编码风格是您的喜好,如果您认为编码标准很重要,那么可以使用PEP-8指导您。

您如何看待以下表达的结果?

x = (3)
x = (3+2)
x = 2*(3+2)

是的,x只是一个数字。

Coding style is your taste, If you think coding standard matters there is a PEP-8 That can guide you.

What do you think of the result of following expression?

x = (3)
x = (3+2)
x = 2*(3+2)

Yep, x is just an number.


“ x不在y”或“ x不在y”

问题:“ x不在y”或“ x不在y”

在测试成员资格时,我们可以使用:

x not in y

或者:

not x in y

取决于x和,此表达式可能有许多可能的上下文y。例如,它可能用于子字符串检查,列表成员资格,字典键存在。

  • 两种形式总是相等的吗?
  • 有首选的语法吗?

When testing for membership, we can use:

x not in y

Or alternatively:

not x in y

There can be many possible contexts for this expression depending on x and y. It could be for a substring check, list membership, dict key existence, for example.

  • Are the two forms always equivalent?
  • Is there a preferred syntax?

回答 0

他们总是给出相同的结果。

实际上,not 'ham' in 'spam and eggs'在特殊情况下,执行单个“非输入”操作而不是“输入”操作,然后取反:

>>> import dis

>>> def notin():
    'ham' not in 'spam and eggs'
>>> dis.dis(notin)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE    

>>> def not_in():
    not 'ham' in 'spam and eggs'
>>> dis.dis(not_in)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE    

>>> def not__in():
    not ('ham' in 'spam and eggs')
>>> dis.dis(not__in)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

>>> def noteq():
    not 'ham' == 'spam and eggs'
>>> dis.dis(noteq)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               2 (==)
              9 UNARY_NOT           
             10 POP_TOP             
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE      

起初我以为它们总是给出相同的结果,但是not它本身只是一个低优先级的逻辑否定运算符,可以a in b像其他布尔表达式一样容易地应用,而not in为了方便和清楚起见,它是一个单独的运算符。

上面的拆卸很明显!not显然,虽然显然是逻辑取反运算符,但该窗体not a in b是特殊情况,因此实际上并未使用通用运算符。这not a in b实际上使表达式与相同a not in b,而不仅仅是产生相同值的表达式。

They always give the same result.

In fact, not 'ham' in 'spam and eggs' appears to be special cased to perform a single “not in” operation, rather than an “in” operation and then negating the result:

>>> import dis

>>> def notin():
    'ham' not in 'spam and eggs'
>>> dis.dis(notin)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE    

>>> def not_in():
    not 'ham' in 'spam and eggs'
>>> dis.dis(not_in)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE    

>>> def not__in():
    not ('ham' in 'spam and eggs')
>>> dis.dis(not__in)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               7 (not in)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

>>> def noteq():
    not 'ham' == 'spam and eggs'
>>> dis.dis(noteq)
  2           0 LOAD_CONST               1 ('ham')
              3 LOAD_CONST               2 ('spam and eggs')
              6 COMPARE_OP               2 (==)
              9 UNARY_NOT           
             10 POP_TOP             
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE      

I had thought at first that they always gave the same result, but that not on its own was simply a low precedence logical negation operator, which could be applied to a in b just as easily as any other boolean expression, whereas not in was a separate operator for convenience and clarity.

The disassembly above was revealing! It seems that while not obviously is a logical negation operator, the form not a in b is special cased so that it’s not actually using the general operator. This makes not a in b literally the same expression as a not in b, rather than merely an expression that results in the same value.


回答 1

  1. 不,没有区别。

    运算符not in被定义为具有的反真值in

    Python文档

  2. 我认为not in它是首选的,因为它更明显,他们为此添加了特殊情况。

  1. No, there is no difference.

    The operator not in is defined to have the inverse true value of in.

    Python documentation

  2. I would assume not in is preferred because it is more obvious and they added a special case for it.

回答 2

它们的含义相同,但是pycodestyle Python样式指南检查器(以前称为pep8)更喜欢规则E713中not in运算符:

E713:会员资格测试应为 not in

另请参见“ Python if x is not Noneif not x is None?” 选择非常相似的样式

They are identical in meaning, but the pycodestyle Python style guide checker (formerly called pep8) prefers the not in operator in rule E713:

E713: test for membership should be not in

See also “Python if x is not None or if not x is None?” for a very similar choice of style.


回答 3

其他人已经非常清楚地表明,这两个语句在相当低的水平上是等效的。

但是,我认为尚未有人对此施加过足够的压力,因为这让您自己选择,您应该

选择使代码尽可能可读的形式。

并且不一定对任何人都可读,即使这当然是一个不错的目标。不,请确保该代码对您来说尽可能易读,因为您是最有可能在稍后返回此代码并尝试阅读该代码的人。

Others have already made it very clear that the two statements are, down to a quite low level, equivalent.

However, I don’t think that anyone yet has stressed enough that since this leaves the choice up to you, you should

choose the form that makes your code as readable as possible.

And not necessarily as readable as possible to anyone, even if that’s of course a nice thing to aim for. No, make sure the code is as readable as possible to you, since you are the one who is the most likely to come back to this code later and try to read it.


回答 4

在Python中,没有区别。而且没有偏好。

In Python, there is no difference. And there is no preference.


回答 5

从句法上讲,它们是相同的陈述。我会很快指出,它'ham' not in 'spam and eggs'传达了更清晰的意图,但是我已经看到not 'ham' in 'spam and eggs'了比其他传达了更清晰含义的代码和方案。

Syntactically they’re the same statement. I would be quick to state that 'ham' not in 'spam and eggs' conveys clearer intent, but I’ve seen code and scenarios in which not 'ham' in 'spam and eggs' conveys a clearer meaning than the other.


这是什么意思?

问题:这是什么意思?

我正在分析一些Python代码,但我不知道

pop = population[:]

手段。是Java中的数组列表还是二维数组?

I’m analyzing some Python code and I don’t know what

pop = population[:]

means. Is it something like array lists in Java or like a bi-dimensional array?


回答 0

它是切片符号的示例,其作用取决于的类型population。如果population是列表,则此行将创建列表的浅表副本。对于类型为tuple或的对象str,它将不执行任何操作(如果不使用,该行将执行相同操作[:]),对于(例如)NumPy数组,它将为相同的数据创建一个新视图。

It is an example of slice notation, and what it does depends on the type of population. If population is a list, this line will create a shallow copy of the list. For an object of type tuple or a str, it will do nothing (the line will do the same without [:]), and for a (say) NumPy array, it will create a new view to the same data.


回答 1

了解列表切片通常会复制一部分列表,这也可能会有所帮助。例如,population[2:4]将返回一个包含人口[2]和人口[3]的列表(切片是右排它的)。保留左和右索引,因为population[:]它们分别默认为0和length(population),从而选择了整个列表。因此,这是制作列表副本的常见习语。

It might also help to know that a list slice in general makes a copy of part of the list. E.g. population[2:4] will return a list containing population[2] and population[3] (slicing is right-exclusive). Leaving away the left and right index, as in population[:] they default to 0 and length(population) respectively, thereby selecting the entire list. Hence this is a common idiom to make a copy of a list.


回答 2

好吧…这确实取决于上下文。最终,它通过一个slice对象(slice(None,None,None))为以下方法之一: __getitem____setitem____delitem__。(实际上,如果对象具有__getslice__,将代替__getitem__,但现在已弃用,不应使用)。

对象可以对切片执行其所需的操作。

在以下情况下:

x = obj[:]

这将obj.__getitem__使用传入的slice对象进行调用。实际上,这完全等效于:

x = obj[slice(None,None,None)]

(尽管前者可能更有效,因为它不必查找 slice构造函数-全部用字节码完成)。

对于大多数对象,这是一种创建序列的一部分的浅表副本的方法。

下一个:

x[:] = obj

是一种设置项目的方法(它调用 __setitem__基于)的方法obj

而且,我想您可能会猜出什么:

del x[:]

调用;-)。

您还可以传递不同的切片:

x[1:4]

结构体 slice(1,4,None)

x[::-1]

构造slice(None,None,-1)等等。进一步阅读: 解释Python的切片符号

well… this really depends on the context. Ultimately, it passes a slice object (slice(None,None,None)) to one of the following methods: __getitem__, __setitem__ or __delitem__. (Actually, if the object has a __getslice__, that will be used instead of __getitem__, but that is now deprecated and shouldn’t be used).

Objects can do what they want with the slice.

In the context of:

x = obj[:]

This will call obj.__getitem__ with the slice object passed in. In fact, this is completely equivalent to:

x = obj[slice(None,None,None)]

(although the former is probably more efficient because it doesn’t have to look up the slice constructor — It’s all done in bytecode).

For most objects, this is a way to create a shallow copy of a portion of the sequence.

Next:

x[:] = obj

Is a way to set the items (it calls __setitem__) based on obj.

and, I think you can probably guess what:

del x[:]

calls ;-).

You can also pass different slices:

x[1:4]

constructs slice(1,4,None)

x[::-1]

constructs slice(None,None,-1) and so forth. Further reading: Explain Python’s slice notation


回答 3

这是一片从序列开始到结束的,通常会产生浅拷贝。

(嗯,不仅限于此,但您无需关心。)

It is a slice from the beginning of the sequence to the end, usually producing a shallow copy.

(Well, it’s more than that, but you don’t need to care yet.)


回答 4

它创建列表的副本,而不仅仅是为现有列表分配新名称。

It creates a copy of the list, versus just assigning a new name for the already existing list.


回答 5

[:]
用于限制器或数组中的切片,散列,
例如:
[1:5]用于显示1到5之间的值,即1-4
[start:end]

主要用于数组中的切片,理解方括号接受表示的变量要显示的值或键,“:”用于将整个数组限制或分割为数据包。

[:]
used for limiter or slicing in array , hash
eg:
[1:5] for displaying values between 1 inclusive and 5 exclusive i.e 1-4
[start:end]

basically used in array for slicing , understand bracket accept variable that mean value or key to display, and ” : ” is used to limit or slice the entire array into packets .


为什么不分配给空列表(例如[] =“”)错误?

问题:为什么不分配给空列表(例如[] =“”)错误?

在python 3.4中,我输入

[] = "" 

并且工作正常,不会引发异常。虽然当然[]不等于""事后。

[] = ()

也可以。

"" = []

引发异常,

() = ""

引发异常,但是如预期的那样。发生什么了?

In python 3.4, I am typing

[] = "" 

and it works fine, no Exception is raised. Though of course [] is not equal to "" afterwards.

[] = ()

also works fine.

"" = []

raises an exception as expected though,

() = ""

raises an exception as expected though. So, what’s going on?


回答 0

您不是为了平等而比较。您正在分配

Python允许您分配给多个目标:

foo, bar = 1, 2

将两个值分别分配给foobar。您所需要的只是右侧的序列可迭代,左侧的是名称列表或元组。

当您这样做时:

[] = ""

您分配了一个名称列表序列(空字符串仍然是序列)。

本质上与执行此操作相同:

[foo, bar, baz] = "abc"

在哪里结束foo = "a"bar = "b"以及baz = "c",但字符数减少了。

但是,您无法分配给字符串,因此 ""在的左侧永远不会起作用,并且始终是语法错误。

请参阅赋值语句文档

赋值语句评估表达式列表(请记住,它可以是单个表达式或逗号分隔的列表,后者产生一个元组),并将单个结果对象从左到右分配给每个目标列表。

将对象分配给目标列表(可选地用括号或方括号括起来)的方式如下所述。

强调我的

Python不会为空列表引发语法错误实际上是一个错误!正式记录的语法不允许有空的目标列表,并且对于空的目标,()确实会出错。见bug 23275 ; 它被认为是无害的错误:

起点是认识到这种情况已经存在很长时间并且是无害的。

另请参见为什么分配给空列表而不分配给空元组有效吗?

You are not comparing for equality. You are assigning.

Python allows you to assign to multiple targets:

foo, bar = 1, 2

assigns the two values to foo and bar, respectively. All you need is a sequence or iterable on the right-hand side, and a list or tuple of names on the left.

When you do:

[] = ""

you assigned an empty sequence (empty strings are sequences still) to an empty list of names.

It is essentially the same thing as doing:

[foo, bar, baz] = "abc"

where you end up with foo = "a", bar = "b" and baz = "c", but with fewer characters.

You cannot, however, assign to a string, so "" on the left-hand side of an assignment never works and is always a syntax error.

See the Assignment statements documentation:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

and

Assignment of an object to a target list, optionally enclosed in parentheses or square brackets, is recursively defined as follows.

Emphasis mine.

That Python doesn’t throw a syntax error for the empty list is actually a bit of a bug! The officially documented grammar doesn’t allow for an empty target list, and for the empty () you do get an error. See bug 23275; it is considered a harmless bug:

The starting point is recognizing that this has been around for very long time and is harmless.

Also see Why is it valid to assign to an empty list but not to an empty tuple?


回答 1

它遵循文档中“ 分配声明”部分的规则,

assignment_stmt ::=  (target_list "=")+ (expression_list | yield_expression)

如果target list是用逗号分隔的目标列表:该对象必须是可迭代的,并且具有与目标列表中的目标相同数量的项,并且这些项从左到右分配给相应的目标。

该对象必须是与目标列表中的目标具有相同数量项的序列,并且这些项从左到右分配给相应的目标。

所以,当你说

[] = ""

"" 是可迭代的(任何有效的python字符串都是可迭代的),并且正在列表的元素上解压缩。

例如,

>>> [a, b, c] = "123"
>>> a, b, c
('1', '2', '3')

由于您有一个空字符串和一个空列表,因此无需解压缩。因此,没有错误。

但是,尝试一下

>>> [] = "1"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: too many values to unpack (expected 0)
>>> [a] = ""
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: need more than 0 values to unpack

在这种[] = "1"情况下,您尝试将字符串解压缩到"1"一个空变量列表中。因此,它抱怨“解压缩的值太多(预期为0)”。

同样,[a] = ""以防万一,您有一个空字符串,因此实际上没有要解压缩的内容,但是您正在通过一个变量解压缩它,这也是不可能的。这就是为什么它抱怨“需要超过0个值才能解包”。

除此之外,您已经注意到,

>>> [] = ()

也不会引发错误,因为()是一个空的元组。

>>> ()
()
>>> type(())
<class 'tuple'>

当将其解压缩到一个空列表中时,没有任何要解压缩的内容。所以没有错误。


但是,当你这样做

>>> "" = []
  File "<input>", line 1
SyntaxError: can't assign to literal
>>> "" = ()
  File "<input>", line 1
SyntaxError: can't assign to literal

如错误消息所述,您正在尝试分配给字符串文字。这是不可能的。这就是为什么您会得到错误。就像在说

>>> 1 = "one"
  File "<input>", line 1
SyntaxError: can't assign to literal

内部构造

在内部,此分配操作将转换为UNPACK_SEQUENCE操作码,

>>> dis(compile('[] = ""', "string", "exec"))
  1           0 LOAD_CONST               0 ('')
              3 UNPACK_SEQUENCE          0
              6 LOAD_CONST               1 (None)

在这里,由于字符串为空,因此将UNPACK_SEQUENCE解压缩0时间。但是当你有这样的事情

>>> dis(compile('[a, b, c] = "123"', "string", "exec"))
  1           0 LOAD_CONST               0 ('123')
              3 UNPACK_SEQUENCE          3
              6 STORE_NAME               0 (a)
              9 STORE_NAME               1 (b)
             12 STORE_NAME               2 (c)
             15 LOAD_CONST               1 (None)
             18 RETURN_VALUE

该序列123从右到左解压到堆栈中。因此,堆栈的顶部为,1下一个为2,最后一个为3。然后,它从堆栈的顶部开始逐一分配左侧表达式中的变量。


顺便说一句,在Python中,这就是您可以在同一表达式中进行多个分配的方式。例如,

a, b, c, d, e, f = u, v, w, x, y, z

之所以可行,是因为使用右手边的值构造一个元组,然后将其拆开到左手边的值上。

>>> dis(compile('a, b, c, d, e, f = u, v, w, x, y, z', "string", "exec"))
  1           0 LOAD_NAME                0 (u)
              3 LOAD_NAME                1 (v)
              6 LOAD_NAME                2 (w)
              9 LOAD_NAME                3 (x)
             12 LOAD_NAME                4 (y)
             15 LOAD_NAME                5 (z)
             18 BUILD_TUPLE              6
             21 UNPACK_SEQUENCE          6
             24 STORE_NAME               6 (a)
             27 STORE_NAME               7 (b)
             30 STORE_NAME               8 (c)
             33 STORE_NAME               9 (d)
             36 STORE_NAME              10 (e)
             39 STORE_NAME              11 (f)
             42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

但是经典的交换技术a, b = b, a使用堆栈顶部的元素旋转。如果只有两个或三个元素,则将使用特殊的ROT_TWOROT_THREE说明来对待它们,而不是构造元组和拆包。

>>> dis(compile('a, b = b, a', "string", "exec"))
  1           0 LOAD_NAME                0 (b)
              3 LOAD_NAME                1 (a)
              6 ROT_TWO
              7 STORE_NAME               1 (a)
             10 STORE_NAME               0 (b)
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

It follows the Assignment statements section rules from the documentation,

assignment_stmt ::=  (target_list "=")+ (expression_list | yield_expression)

If the target list is a comma-separated list of targets: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.

The object must be a sequence with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.

So, when you say

[] = ""

"" is an iterable (any valid python string is an iterable) and it is being unpacked over the elements of the list.

For example,

>>> [a, b, c] = "123"
>>> a, b, c
('1', '2', '3')

Since you have an empty string, and an empty list, there is nothing to unpack. So, no error.

But, try this

>>> [] = "1"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: too many values to unpack (expected 0)
>>> [a] = ""
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: need more than 0 values to unpack

In the [] = "1" case, you are trying to unpack the string "1" over an empty list of variables. So it complains with “too many values to unpack (expected 0)”.

Same way, in [a] = "" case, you have an empty string, so nothing to unpack really, but you are unpacking it over one variable, which is, again, not possible. That is why it complains “need more than 0 values to unpack”.

Apart from that, as you noticed,

>>> [] = ()

also throws no error, because () is an empty tuple.

>>> ()
()
>>> type(())
<class 'tuple'>

and when it is unpacked over an empty list, there is nothing to unpack. So no error.


But, when you do

>>> "" = []
  File "<input>", line 1
SyntaxError: can't assign to literal
>>> "" = ()
  File "<input>", line 1
SyntaxError: can't assign to literal

as the error message says, you are trying to assign to a string literal. Which is not possible. That is why you are getting the errors. It is like saying

>>> 1 = "one"
  File "<input>", line 1
SyntaxError: can't assign to literal

Internals

Internally, this assignment operation will be translated to UNPACK_SEQUENCE op code,

>>> dis(compile('[] = ""', "string", "exec"))
  1           0 LOAD_CONST               0 ('')
              3 UNPACK_SEQUENCE          0
              6 LOAD_CONST               1 (None)

Here, since the string is empty, UNPACK_SEQUENCE unpacks 0 times. But when you have something like this

>>> dis(compile('[a, b, c] = "123"', "string", "exec"))
  1           0 LOAD_CONST               0 ('123')
              3 UNPACK_SEQUENCE          3
              6 STORE_NAME               0 (a)
              9 STORE_NAME               1 (b)
             12 STORE_NAME               2 (c)
             15 LOAD_CONST               1 (None)
             18 RETURN_VALUE

the sequence 123 is unpacked in to the stack, from right to left. So, the top of the stack would be 1 and the next would be 2 and the last would be 3. Then it assigns from the top of the stack to the variables from the left hand side expression one by one.


BTW, in Python, this is how you can do multiple assignments in the same expression. For example,

a, b, c, d, e, f = u, v, w, x, y, z

this works because, the right hand values are used to construct a tuple and then it will be unpacked over the left hand side values.

>>> dis(compile('a, b, c, d, e, f = u, v, w, x, y, z', "string", "exec"))
  1           0 LOAD_NAME                0 (u)
              3 LOAD_NAME                1 (v)
              6 LOAD_NAME                2 (w)
              9 LOAD_NAME                3 (x)
             12 LOAD_NAME                4 (y)
             15 LOAD_NAME                5 (z)
             18 BUILD_TUPLE              6
             21 UNPACK_SEQUENCE          6
             24 STORE_NAME               6 (a)
             27 STORE_NAME               7 (b)
             30 STORE_NAME               8 (c)
             33 STORE_NAME               9 (d)
             36 STORE_NAME              10 (e)
             39 STORE_NAME              11 (f)
             42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

but the classic swapping technique a, b = b, a uses rotation of elements in the top of the stack. If you have only two or three elements then they are treated with special ROT_TWO and ROT_THREE instructions instead of constructing the tuple and unpacking.

>>> dis(compile('a, b = b, a', "string", "exec"))
  1           0 LOAD_NAME                0 (b)
              3 LOAD_NAME                1 (a)
              6 ROT_TWO
              7 STORE_NAME               1 (a)
             10 STORE_NAME               0 (b)
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

Python中“ in”的关联性?

问题:Python中“ in”的关联性?

我正在制作一个Python解析器,这确实让我感到困惑:

>>>  1 in  []  in 'a'
False

>>> (1 in  []) in 'a'
TypeError: 'in <string>' requires string as left operand, not bool

>>>  1 in ([] in 'a')
TypeError: 'in <string>' requires string as left operand, not list

关于关联性等方面,“ in”在Python中到底如何工作?

为什么这些表达式中没有两个表现相同?

I’m making a Python parser, and this is really confusing me:

>>>  1 in  []  in 'a'
False

>>> (1 in  []) in 'a'
TypeError: 'in <string>' requires string as left operand, not bool

>>>  1 in ([] in 'a')
TypeError: 'in <string>' requires string as left operand, not list

How exactly does “in” work in Python, with regards to associativity, etc.?

Why do no two of these expressions behave the same way?


回答 0

1 in [] in 'a'被评估为(1 in []) and ([] in 'a')

由于第一个条件(1 in [])是False,整个条件的评估为False; ([] in 'a')永远不会实际评估,因此不会引发任何错误。

这是语句定义:

In [121]: def func():
   .....:     return 1 in [] in 'a'
   .....: 

In [122]: dis.dis(func)
  2           0 LOAD_CONST               1 (1)
              3 BUILD_LIST               0
              6 DUP_TOP             
              7 ROT_THREE           
              8 COMPARE_OP               6 (in)
             11 JUMP_IF_FALSE            8 (to 22)  #if first comparison is wrong 
                                                    #then jump to 22, 
             14 POP_TOP             
             15 LOAD_CONST               2 ('a')
             18 COMPARE_OP               6 (in)     #this is never executed, so no Error
             21 RETURN_VALUE         
        >>   22 ROT_TWO             
             23 POP_TOP             
             24 RETURN_VALUE        

In [150]: def func1():
   .....:     return (1 in  []) in 'a'
   .....: 

In [151]: dis.dis(func1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               3 (())
              6 COMPARE_OP               6 (in)   # perform 1 in []
              9 LOAD_CONST               2 ('a')  # now load 'a'
             12 COMPARE_OP               6 (in)   # compare result of (1 in []) with 'a'
                                                  # throws Error coz (False in 'a') is
                                                  # TypeError
             15 RETURN_VALUE   



In [153]: def func2():
   .....:     return 1 in ([] in 'a')
   .....: 

In [154]: dis.dis(func2)
  2           0 LOAD_CONST               1 (1)
              3 BUILD_LIST               0
              6 LOAD_CONST               2 ('a') 
              9 COMPARE_OP               6 (in)  # perform ([] in 'a'), which is 
                                                 # Incorrect, so it throws TypeError
             12 COMPARE_OP               6 (in)  # if no Error then 
                                                 # compare 1 with the result of ([] in 'a')
             15 RETURN_VALUE        

1 in [] in 'a' is evaluated as (1 in []) and ([] in 'a').

Since the first condition (1 in []) is False, the whole condition is evaluated as False; ([] in 'a') is never actually evaluated, so no error is raised.

Here are the statement definitions:

In [121]: def func():
   .....:     return 1 in [] in 'a'
   .....: 

In [122]: dis.dis(func)
  2           0 LOAD_CONST               1 (1)
              3 BUILD_LIST               0
              6 DUP_TOP             
              7 ROT_THREE           
              8 COMPARE_OP               6 (in)
             11 JUMP_IF_FALSE            8 (to 22)  #if first comparison is wrong 
                                                    #then jump to 22, 
             14 POP_TOP             
             15 LOAD_CONST               2 ('a')
             18 COMPARE_OP               6 (in)     #this is never executed, so no Error
             21 RETURN_VALUE         
        >>   22 ROT_TWO             
             23 POP_TOP             
             24 RETURN_VALUE        

In [150]: def func1():
   .....:     return (1 in  []) in 'a'
   .....: 

In [151]: dis.dis(func1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               3 (())
              6 COMPARE_OP               6 (in)   # perform 1 in []
              9 LOAD_CONST               2 ('a')  # now load 'a'
             12 COMPARE_OP               6 (in)   # compare result of (1 in []) with 'a'
                                                  # throws Error coz (False in 'a') is
                                                  # TypeError
             15 RETURN_VALUE   



In [153]: def func2():
   .....:     return 1 in ([] in 'a')
   .....: 

In [154]: dis.dis(func2)
  2           0 LOAD_CONST               1 (1)
              3 BUILD_LIST               0
              6 LOAD_CONST               2 ('a') 
              9 COMPARE_OP               6 (in)  # perform ([] in 'a'), which is 
                                                 # Incorrect, so it throws TypeError
             12 COMPARE_OP               6 (in)  # if no Error then 
                                                 # compare 1 with the result of ([] in 'a')
             15 RETURN_VALUE        

回答 1

Python通过链式比较来做特殊的事情。

对以下内容的评估不同:

x > y > z   # in this case, if x > y evaluates to true, then
            # the value of y is being used to compare, again,
            # to z

(x > y) > z # the parenth form, on the other hand, will first
            # evaluate x > y. And, compare the evaluated result
            # with z, which can be "True > z" or "False > z"

但是,在这两种情况下,如果第一个比较是False,则不会查看语句的其余部分。

对于您的特殊情况

1 in [] in 'a'   # this is false because 1 is not in []

(1 in []) in a   # this gives an error because we are
                 # essentially doing this: False in 'a'

1 in ([] in 'a') # this fails because you cannot do
                 # [] in 'a'

同样为了说明上面的第一条规则,这些是评估为True的语句。

1 in [1,2] in [4,[1,2]] # But "1 in [4,[1,2]]" is False

2 < 4 > 1               # and note "2 < 1" is also not true

python运算符的优先级:http : //docs.python.org/reference/expressions.html#summary

Python does special things with chained comparisons.

The following are evaluated differently:

x > y > z   # in this case, if x > y evaluates to true, then
            # the value of y is being used to compare, again,
            # to z

(x > y) > z # the parenth form, on the other hand, will first
            # evaluate x > y. And, compare the evaluated result
            # with z, which can be "True > z" or "False > z"

In both cases though, if the first comparison is False, the rest of the statement won’t be looked at.

For your particular case,

1 in [] in 'a'   # this is false because 1 is not in []

(1 in []) in a   # this gives an error because we are
                 # essentially doing this: False in 'a'

1 in ([] in 'a') # this fails because you cannot do
                 # [] in 'a'

Also to demonstrate the first rule above, these are statements that evaluate to True.

1 in [1,2] in [4,[1,2]] # But "1 in [4,[1,2]]" is False

2 < 4 > 1               # and note "2 < 1" is also not true

Precedence of python operators: http://docs.python.org/reference/expressions.html#summary


回答 2

从文档中:

比较可以任意链接,例如,x <y <= z等于x <y和y <= z,除了y仅被评估一次(但在两种情况下,当x <y被发现时,z都不被评估。是假的)。

这意味着,x in y in z!中没有关联性!

以下是等效的:

1 in  []  in 'a'
# <=>
middle = []
#            False          not evaluated
result = (1 in middle) and (middle in 'a')


(1 in  []) in 'a'
# <=>
lhs = (1 in []) # False
result = lhs in 'a' # False in 'a' - TypeError


1 in  ([] in 'a')
# <=>
rhs = ([] in 'a') # TypeError
result = 1 in rhs

From the documentation:

Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

What this means is, that there no associativity in x in y in z!

The following are equivalent:

1 in  []  in 'a'
# <=>
middle = []
#            False          not evaluated
result = (1 in middle) and (middle in 'a')


(1 in  []) in 'a'
# <=>
lhs = (1 in []) # False
result = lhs in 'a' # False in 'a' - TypeError


1 in  ([] in 'a')
# <=>
rhs = ([] in 'a') # TypeError
result = 1 in rhs

回答 3

简短的回答是,由于这里已经以良好的方式多次给出了一个较长的答案,那就是布尔表达式被短路了,如果进一步的评估不能使true变为false ,则布尔表达式 已停止评估。

(请参阅http://en.wikipedia.org/wiki/Short-circuit_evaluation

答案可能有点短(没有双关语),但是如上所述,所有其他解释在这里都已经做得很好,但是我认为该词值得一提。

The short answer, since the long one is already given several times here and in excellent ways, is that the boolean expression is short-circuited, this is has stopped evaluation when a change of true in false or vice versa cannot happen by further evaluation.

(see http://en.wikipedia.org/wiki/Short-circuit_evaluation)

It might be a little short (no pun intended) as an answer, but as mentioned, all other explanation is allready done quite well here, but I thought the term deserved to be mentioned.