Category archive: Q&A

django MultiValueDictKeyError error, how do I deal with it?

Question: django MultiValueDictKeyError error, how do I deal with it?

I’m trying to save a object to my database, but it’s throwing a MultiValueDictKeyError error.

The problems lies within the form, the is_private is represented by a checkbox. If the check box is NOT selected, obviously nothing is passed. This is where the error gets chucked.

How do I properly deal with this exception, and catch it?

The line is

is_private = request.POST['is_private']

Answer 0

Use the MultiValueDict’s get method. This is also present on standard dicts and is a way to fetch a value while providing a default if it does not exist.

is_private = request.POST.get('is_private', False)

Generally,

my_var = dict.get(<key>, <default>)
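
For context, a minimal view sketch applying this to the checkbox case (the view name and redirect target are made up for the example):

from django.shortcuts import redirect

def save_entry(request):
    # An unchecked checkbox is simply absent from request.POST,
    # so fall back to False instead of indexing the key directly.
    is_private = request.POST.get('is_private', False)
    # ... create and save the object using is_private ...
    return redirect('/')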

Answer 1

Choose what is best for you:

1

is_private = request.POST.get('is_private', False)

If the is_private key is present in request.POST, the is_private variable will be set to its value; if not, it will be set to False.

2

if 'is_private' in request.POST:
    is_private = request.POST['is_private']
else:
    is_private = False

3

from django.utils.datastructures import MultiValueDictKeyError
try:
    is_private = request.POST['is_private']
except MultiValueDictKeyError:
    is_private = False

Answer 2

You get that because you’re trying to get a key from a dictionary when it’s not there. You need to test if it is in there first.

Try:

is_private = 'is_private' in request.POST

or

is_private = 'is_private' in request.POST and request.POST['is_private']

depending on the values you’re using.
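
Note that a checked checkbox with no value attribute is submitted as the string 'on', so if you want an actual boolean, an explicit comparison works (a small sketch, not from the original answer):

# 'on' is the browser default for a checked box without a value attribute
is_private = request.POST.get('is_private') == 'on'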


Answer 3

Why didn’t you try to define is_private in your models as default=False?

class Foo(models.Model):
    is_private = models.BooleanField(default=False)

Answer 4

Another thing to remember is that request.POST['keyword'] refers to the form element identified by the HTML name attribute, here keyword.

So, if your form is:

<form action="/login/" method="POST">
  <input type="text" name="keyword" placeholder="Search query">
  <input type="number" name="results" placeholder="Number of results">
</form>

then, request.POST['keyword'] and request.POST['results'] will contain the value of the input elements keyword and results, respectively.
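
A matching view could then read both fields, using .get() to guard against missing keys (a sketch; the view name and defaults are hypothetical):

from django.http import HttpResponse

def search(request):
    # .get() avoids a MultiValueDictKeyError when a field is missing
    keyword = request.POST.get('keyword', '')
    results = int(request.POST.get('results', '10'))
    return HttpResponse('searching %r, showing %d results' % (keyword, results))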


Answer 5

First check whether the request object has an ‘is_private’ key. In most cases this MultiValueDictKeyError occurs because the key is missing from the dictionary-like request object. (A dictionary is an “associative array” of key/value pairs.)

In other words: request.GET and request.POST are dictionary-like objects containing all the request parameters. This is specific to Django.

The get() method returns the value for the given key if the key is in the dictionary. If the key is not present, it returns the default value, which is None unless you pass one.

You can handle this error like this:

is_private = request.POST.get('is_private', False)

Answer 6

For me, this error occurred in my django project because of the following:

  1. I inserted a new hyperlink in my home.html present in templates folder of my project as below:

    <input type="button" value="About" onclick="location.href='{% url 'about' %}'">
  2. In views.py, I had the following definitions of count and about:

   def count(request):
       fulltext = request.GET['fulltext']
       wordlist = fulltext.split()
       worddict = {}
       for word in wordlist:
           if word in worddict:
               worddict[word] += 1
           else:
               worddict[word] = 1
       # sort by count, descending, once the tally is complete
       worddict = sorted(worddict.items(), key=operator.itemgetter(1), reverse=True)
       return render(request, 'count.html', {'fulltext': fulltext, 'count': len(wordlist), 'worddict': worddict})

   def about(request): 
       return render(request,"about.html")
  3. In urls.py, I had the following url patterns:
    urlpatterns = [
        path('admin/', admin.site.urls),
        path('',views.homepage,name="home"),
        path('eggs',views.eggs),
        path('count/',views.count,name="count"),
        path('about/',views.count,name="about"),
    ]

As can be seen in no. 3 above, in the last url pattern I was incorrectly calling views.count whereas I needed to call views.about. The line fulltext = request.GET['fulltext'] in the count function of views.py (which was mistakenly called because of the wrong entry in urlpatterns) threw the MultiValueDictKeyError exception.

Then I changed the last url pattern in urls.py to the correct one, i.e. path('about/',views.about,name="about"), and everything worked fine.

In general, a programmer new to Django can make the mistake I made: wiring a URL to the wrong view function, one that expects a different set of parameters, or that passes a different set of objects in its render call, than the intended view.

Hope this helps some programmers who are new to Django.


SQLAlchemy: unique across multiple columns

Question: SQLAlchemy: unique across multiple columns

Let’s say that I have a class that represents locations. Locations “belong” to customers. Locations are identified by a unicode 10 character code. The “location code” should be unique among the locations for a specific customer.

The two below fields in combination should be unique:
customer_id = Column(Integer, ForeignKey('customers.customer_id'))
location_code = Column(Unicode(10))

So if i have two customers, customer “123” and customer “456”. They both can have a location called “main” but neither could have two locations called main.

I can handle this in the business logic but I want to make sure there is no way to easily add the requirement in sqlalchemy. The unique=True option seems to only work when applied to a specific field and it would cause the entire table to only have a unique code for all locations.


Answer 0

Extract from the documentation of the Column:

unique – When True, indicates that this column contains a unique constraint, or if index is True as well, indicates that the Index should be created with the unique flag. To specify multiple columns in the constraint/index or to specify an explicit name, use the UniqueConstraint or Index constructs explicitly.

As these belong to a Table and not to a mapped class, you declare them in the table definition or, if using declarative, via __table_args__:

# version1: table definition
mytable = Table('mytable', meta,
    # ...
    Column('customer_id', Integer, ForeignKey('customers.customer_id')),
    Column('location_code', Unicode(10)),

    UniqueConstraint('customer_id', 'location_code', name='uix_1')
    )
# or the index, which will ensure uniqueness as well
Index('myindex', mytable.c.customer_id, mytable.c.location_code, unique=True)


# version2: declarative
class Location(Base):
    __tablename__ = 'locations'
    id = Column(Integer, primary_key = True)
    customer_id = Column(Integer, ForeignKey('customers.customer_id'), nullable=False)
    location_code = Column(Unicode(10), nullable=False)
    __table_args__ = (UniqueConstraint('customer_id', 'location_code', name='_customer_location_uc'),
                     )

Answer 1

from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()

class Location(Base):
    __table_args__ = (
        # this can be db.PrimaryKeyConstraint if you want it to be a primary key
        db.UniqueConstraint('customer_id', 'location_code'),
    )
    customer_id = Column(Integer, ForeignKey('customers.customer_id'))
    location_code = Column(Unicode(10))

Python argparse: default value or specified value

Question: Python argparse: default value or specified value

I would like to have a optional argument that will default to a value if only the flag is present with no value specified, but store a user-specified value instead of the default if the user specifies a value. Is there already an action available for this?

An example:

python script.py --example
# args.example would equal a default value of 1
python script.py --example 2
# args.example would equal a default value of 2

I can create an action, but wanted to see if there was an existing way to do this.


Answer 0

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--example', nargs='?', const=1, type=int)
args = parser.parse_args()
print(args)

% test.py 
Namespace(example=None)
% test.py --example
Namespace(example=1)
% test.py --example 2
Namespace(example=2)

  • nargs='?' means 0-or-1 arguments
  • const=1 sets the default when there are 0 arguments
  • type=int converts the argument to int

If you want test.py to set example to 1 even if no --example is specified, then include default=1. That is, with

parser.add_argument('--example', nargs='?', const=1, type=int, default=1)

then

% test.py 
Namespace(example=1)

Answer 1

Actually, you only need to use the default argument to add_argument as in this test.py script:

import argparse

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--example', default=1)
    args = parser.parse_args()
    print(args.example)

test.py --example
% 1
test.py --example 2
% 2

Details are here.


Answer 2

The difference between:

parser.add_argument("--debug", help="Debug", nargs='?', type=int, const=1, default=7)

and

parser.add_argument("--debug", help="Debug", nargs='?', type=int, const=1)

is thus:

myscript.py => debug is 7 (from default) in the first case and “None” in the second

myscript.py --debug => debug is 1 in each case

myscript.py --debug 2 => debug is 2 in each case
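
A quick, self-contained way to verify this (a sketch, not from the original answer) is to feed argv lists to parse_args directly:

import argparse

with_default = argparse.ArgumentParser()
with_default.add_argument("--debug", nargs='?', type=int, const=1, default=7)

without_default = argparse.ArgumentParser()
without_default.add_argument("--debug", nargs='?', type=int, const=1)

print(with_default.parse_args([]))                # Namespace(debug=7)
print(without_default.parse_args([]))             # Namespace(debug=None)
print(with_default.parse_args(["--debug"]))       # Namespace(debug=1)
print(with_default.parse_args(["--debug", "2"]))  # Namespace(debug=2)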


What is the difference between lists enclosed by square brackets and parentheses in Python?

Question: What is the difference between lists enclosed by square brackets and parentheses in Python?

>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2

Are they both valid? Is one preferred for some reason?


Answer 0

Square brackets are lists while parentheses are tuples.

A list is mutable, meaning you can change its contents:

>>> x = [1,2]
>>> x.append(3)
>>> x
[1, 2, 3]

while tuples are not:

>>> x = (1,2)
>>> x
(1, 2)
>>> x.append(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

The other main difference is that a tuple is hashable, meaning that you can use it as a key to a dictionary, among other things. For example:

>>> x = (1,2)
>>> y = [1,2]
>>> z = {}
>>> z[x] = 3
>>> z
{(1, 2): 3}
>>> z[y] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Note that, as many people have pointed out, you can add tuples together. For example:

>>> x = (1,2)
>>> x += (3,)
>>> x
(1, 2, 3)

However, this does not mean tuples are mutable. In the example above, a new tuple is constructed by adding together the two tuples as arguments. The original tuple is not modified. To demonstrate this, consider the following:

>>> x = (1,2)
>>> y = x
>>> x += (3,)
>>> x
(1, 2, 3)
>>> y
(1, 2)

Whereas, if you were to construct this same example with a list, y would also be updated:

>>> x = [1, 2]
>>> y = x
>>> x += [3]
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]
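
One way to make this concrete (a small check, not from the original answer) is to compare object identities before and after +=:

x = (1, 2)
before = id(x)
x += (3,)
print(id(x) == before)  # False: += built a brand-new tuple

y = [1, 2]
before = id(y)
y += [3]
print(id(y) == before)  # True: the list was extended in place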

Answer 1

One interesting difference :

lst = [1]
print lst              # prints [1]
print type(lst)        # prints <type 'list'>

notATuple = (1)
print notATuple        # prints 1
print type(notATuple)  # prints <type 'int'>
                       # ^^ instead of tuple (expected)

A comma must be included in a tuple even if it contains only a single value. e.g. (1,) instead of (1).


Answer 2

They are not lists, they are a list and a tuple. You can read about tuples in the Python tutorial. While you can mutate lists, this is not possible with tuples.

In [1]: x = (1, 2)

In [2]: x[0] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/user/<ipython console> in <module>()

TypeError: 'tuple' object does not support item assignment

Answer 3

Another way brackets and parentheses differ is that square brackets can describe a list comprehension, e.g. [x for x in y]

Whereas the corresponding parenthetic syntax specifies a tuple generator: (x for x in y)

You can get a tuple comprehension using: tuple(x for x in y)

See: Why is there no tuple comprehension in Python?
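
For instance, a trivial sketch:

y = [1, 2, 3]
squares = tuple(x * x for x in y)  # a generator expression materialized as a tuple
print(squares)                     # (1, 4, 9)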


Answer 4

The first is a list, the second is a tuple. Lists are mutable, tuples are not.

Take a look at the Data Structures section of the tutorial, and the Sequence Types section of the documentation.


Answer 5

Comma-separated items enclosed by ( and ) are tuples, those enclosed by [ and ] are lists.


Accessing class variables from a list comprehension within the class definition

Question: Accessing class variables from a list comprehension within the class definition

How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:

class Foo:
    x = 5
    y = [x for i in range(1)]

Python 3.2 gives the error:

NameError: global name 'x' is not defined

Trying Foo.x doesn’t work either. Any ideas on how to do this in Python 3?

A slightly more complicated motivating example:

from collections import namedtuple
class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for args in [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ]]

In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.


Answer 0

Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.

The why; or, the official word on this

In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see Python list comprehension rebind names even after scope of comprehension. Is this right?). That’s great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.

This is documented in pep 227:

Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions.

and in the class compound statement documentation:

The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.

Emphasis mine; the execution frame is the temporary scope.

Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.

Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
     a = 42
     b = list(a + i for i in range(10))

So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.

# Same error, in Python 2 or 3
y = {x: x for i in range(1)}

The (small) exception; or, why one part may still work

There’s one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it’s the range(1):

y = [x for i in range(1)]
#               ^^^^^^^^

Thus, using x in that expression would not throw an error:

# Runs fine
y = [i for i in range(x)]

This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension’s scope:

# NameError
y = [i for i in range(1) for j in range(x)]

This design decision was made in order to throw an error at genexp creation time instead of iteration time when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable. Comprehensions share this behavior for consistency.

Looking under the hood; or, way more detail than you ever wanted

You can see this all in action using the dis module. I’m using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.

To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:

>>> import dis
>>> def foo():
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo)
  2           0 LOAD_BUILD_CLASS     
              1 LOAD_CONST               1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>) 
              4 LOAD_CONST               2 ('Foo') 
              7 MAKE_FUNCTION            0 
             10 LOAD_CONST               2 ('Foo') 
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair) 
             16 STORE_FAST               0 (Foo) 

  5          19 LOAD_FAST                0 (Foo) 
             22 RETURN_VALUE         

The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.

The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.

The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.

Let’s inspect that code object that creates the class body itself; code objects have a co_consts structure:

>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (__locals__) 
              3 STORE_LOCALS         
              4 LOAD_NAME                0 (__name__) 
              7 STORE_NAME               1 (__module__) 
             10 LOAD_CONST               0 ('foo.<locals>.Foo') 
             13 STORE_NAME               2 (__qualname__) 

  3          16 LOAD_CONST               1 (5) 
             19 STORE_NAME               3 (x) 

  4          22 LOAD_CONST               2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>) 
             25 LOAD_CONST               3 ('foo.<locals>.Foo.<listcomp>') 
             28 MAKE_FUNCTION            0 
             31 LOAD_NAME                4 (range) 
             34 LOAD_CONST               4 (1) 
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             40 GET_ITER             
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             44 STORE_NAME               5 (y) 
             47 LOAD_CONST               5 (None) 
             50 RETURN_VALUE         

The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn’t work because x isn’t defined as a global). Note that after storing 5 in x, it loads another code object; that’s the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.

From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.

Python 2.x uses inline bytecode there instead, here is output from Python 2.7:

  2           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  3           6 LOAD_CONST               0 (5)
              9 STORE_NAME               2 (x)

  4          12 BUILD_LIST               0
             15 LOAD_NAME                3 (range)
             18 LOAD_CONST               1 (1)
             21 CALL_FUNCTION            1
             24 GET_ITER            
        >>   25 FOR_ITER                12 (to 40)
             28 STORE_NAME               4 (i)
             31 LOAD_NAME                2 (x)
             34 LIST_APPEND              2
             37 JUMP_ABSOLUTE           25
        >>   40 STORE_NAME               5 (y)
             43 LOAD_LOCALS         
             44 RETURN_VALUE        

No code object is loaded, instead a FOR_ITER loop is run inline. So in Python 3.x, the list generator was given a proper code object of its own, which means it has its own scope.

However, the comprehension was compiled together with the rest of the python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any referenced variables in a list comprehension must look in the scope surrounding the class definition, recursively. If the variable wasn’t found by the compiler, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:

>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
  4           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_GLOBAL              0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.

Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):

>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
  5           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_DEREF               0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

The LOAD_DEREF will indirectly load x from the code object cell objects:

>>> foo.__code__.co_cellvars               # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars  # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars  # Refers to `x` in foo
('x',)
>>> foo().y
[2]

The actual referencing looks the value up from the current frame data structures, which were initialized from a function object’s .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function’s closure. To see a closure in action, we’d have to inspect a nested function instead:

>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
... 
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5

So, to summarize:

  • List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
  • Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
  • When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn’t include the class body as a scope, the temporary function namespace is not considered.

A workaround; or, what to do about it

If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:

>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
... 
>>> Foo.y
[5]

The ‘temporary’ y function can be called directly; we then replace it with its return value. Its scope is considered when resolving x:

>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)

Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.

The best work-around is to just use __init__ to create an instance variable instead:

def __init__(self):
    self.y = [self.x for i in range(1)]

and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don’t store the generated class at all), or use a global:

from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
    db = [State(*args) for args in [
       ('Alabama', 'Montgomery'),
       ('Alaska', 'Juneau'),
       # ...
    ]]

Answer 1

In my opinion it is a flaw in Python 3. I hope they change it.

Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):

class A:
    x = 4
    y = [x+i for i in range(1)]

NOTE: simply scoping it with A.x would not solve it

New Way (works in 3+):

class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()

Because the syntax is so ugly, I typically just initialize all my class variables in the constructor.


Answer 2

The accepted answer provides excellent information, but there appear to be a few other wrinkles here — differences between list comprehension and generator expressions. A demo that I played around with:

class Foo:

    # A class-level variable.
    X = 10

    # I can use that variable to define another class-level variable.
    Y = sum((X, X))

    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None

    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None

    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))

    # Then use that function.
    Z3 = compute_z3(X)

    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None

print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)

Answer 3

This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.

To illustrate why this is a bug, let’s return to the original example. This fails:

class Foo:
    x = 5
    y = [x for i in range(1)]

But this works:

def Foo():
    x = 5
    y = [x for i in range(1)]

The limitation is stated at the end of this section in the reference guide.


Answer 4

Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension’s scope:

import itertools as it

class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]

One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:

class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]

For the specific example of the OP:

from collections import namedtuple
import itertools as it

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]

String formatting in Python with multiple arguments (e.g. '%s … %s')

Question: String formatting in Python with multiple arguments (e.g. '%s … %s')

I have a string that looks like '%s in %s' and I want to know how to separate the arguments so that they are two different %s. My mind coming from Java came up with this:

'%s in %s' % unicode(self.author),  unicode(self.publication)

But this doesn’t work so how does it look in Python?


Answer 0

Mark Cidade’s answer is right – you need to supply a tuple.

However from Python 2.6 onwards you can use format instead of %:

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

Usage of % for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.


Answer 1

If you’re using more than one argument it has to be in a tuple (note the extra parentheses):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

As EOL points out, the unicode() function usually assumes ascii encoding as a default, so if you have non-ASCII characters, it’s safer to explicitly pass the encoding:

'%s in %s' % (unicode(self.author, 'utf-8'),  unicode(self.publication, 'utf-8'))

And as of Python 3.0, it’s preferred to use the str.format() syntax instead:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

Answer 2

On a tuple/mapping object for multiple argument format

The following is excerpt from the documentation:

Given format % values, % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to using sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).



On str.format instead of %

A newer alternative to % operator is to use str.format. Here’s an excerpt from the documentation:

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to % formatting.



Examples

Here are some usage examples:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond



Answer 3

You must just put the values into parentheses:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

Here, unicode(self.author) will be placed into the first %s, and unicode(self.publication) into the second.

Note: You should favor str.format() over the % notation. More info here.


Answer 4

There is a significant problem with some of the answers posted so far: unicode() decodes from the default encoding, which is often ASCII; in fact, unicode() tries to make “sense” of the bytes it is given by converting them into characters. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

gives:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The failure comes from the fact that author does not contain only ASCII bytes (i.e. with values in [0; 127]), and unicode() decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(or without the initial u, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the author and publication fields be Unicode strings, instead of decoding them during formatting.


Answer 5

For Python 2 you can also do this:

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python2.6 onwards supports .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)

回答 6

您还可以像下面这样干净、简单地使用它(但这是错误的!因为正如 Mark Byers 所说,您应该使用 format):

print 'This is my %s formatted with %d arguments' % ('string', 2)

You could also use it in a clean and simple way (but wrong! because you should use format, like Mark Byers said) by doing:

print 'This is my %s formatted with %d arguments' % ('string', 2)

回答 7

为了完整起见:Python 3.6 在 PEP-498 中引入了 f-string。这种字符串可以

“使用最小的语法将表达式嵌入字符串字面量中”。

这意味着对于您的示例,您还可以使用:

f'{self.author} in {self.publication}'

For completeness, in python 3.6 f-string are introduced in PEP-498. These strings make it possible to

embed expressions inside string literals, using a minimal syntax.

That would mean that for your example you could also use:

f'{self.author} in {self.publication}'
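A runnable illustration of that (my addition, with standalone variables standing in for self):

author, publication = 'James', 'The Times'
print(f'{author} in {publication}')            # James in The Times
print(f'{author.upper()} in {publication!r}')  # arbitrary expressions and conversions work too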

在组对象上应用vs变换

问题:在组对象上应用vs变换

考虑以下数据帧:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922

以下命令起作用:

> df.groupby('A').apply(lambda x: (x['C'] - x['D']))
> df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

但以下任何一项均无效:

> df.groupby('A').transform(lambda x: (x['C'] - x['D']))
ValueError: could not broadcast input array from shape (5) into shape (5,3)

> df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())
 TypeError: cannot concatenate a non-NDFrame object

为什么?文档中的示例似乎表明,在分组上调用 transform 可以进行按行的操作处理:

# Note that the following suggests row-wise operation (x.mean is the column mean)
zscore = lambda x: (x - x.mean()) / x.std()
transformed = ts.groupby(key).transform(zscore)

换句话说,我以为 transform 本质上是 apply 的一种特定形式(即不做聚合的那种)。我哪里错了?

供参考,以下是上面原始数据帧的构造:

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C' : randn(8), 'D' : randn(8)})

Consider the following dataframe:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922

The following commands work:

> df.groupby('A').apply(lambda x: (x['C'] - x['D']))
> df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

but none of the following work:

> df.groupby('A').transform(lambda x: (x['C'] - x['D']))
ValueError: could not broadcast input array from shape (5) into shape (5,3)

> df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())
 TypeError: cannot concatenate a non-NDFrame object

Why? The example on the documentation seems to suggest that calling transform on a group allows one to do row-wise operation processing:

# Note that the following suggests row-wise operation (x.mean is the column mean)
zscore = lambda x: (x - x.mean()) / x.std()
transformed = ts.groupby(key).transform(zscore)

In other words, I thought that transform is essentially a specific type of apply (the one that does not aggregate). Where am I wrong?

For reference, below is the construction of the original dataframe above:

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C' : randn(8), 'D' : randn(8)})

回答 0

apply 和 transform 之间的两个主要区别

groupby 的 transform 与 apply 方法之间有两个主要区别。

  • 输入:
    • apply 将每个组的所有列作为一个 DataFrame 隐式传递给自定义函数。
    • 而 transform 则把每个组的每一列分别作为 Series 传递给自定义函数。
  • 输出:
    • 传递给 apply 的自定义函数可以返回标量,也可以返回 Series 或 DataFrame(或 numpy 数组,甚至 list)。
    • 传递给 transform 的自定义函数必须返回与组长度相同的序列(一维 Series、数组或列表)。

因此,transform 一次只处理一个 Series,而 apply 一次处理整个 DataFrame。

检查自定义函数

检查传给 apply 或 transform 的自定义函数所接收的输入,会很有帮助。

例子

让我们创建一些示例数据并检查组,以便您可以了解我在说什么:

import pandas as pd
import numpy as np
df = pd.DataFrame({'State':['Texas', 'Texas', 'Florida', 'Florida'], 
                   'a':[4,5,1,3], 'b':[6,10,3,11]})

     State  a   b
0    Texas  4   6
1    Texas  5  10
2  Florida  1   3
3  Florida  3  11

让我们创建一个简单的自定义函数,该函数打印出隐式传递的对象的类型,然后引发一个错误,以便可以停止执行。

def inspect(x):
    print(type(x))
    raise

现在,让我们把这个函数分别传给 groupby 的 apply 和 transform 方法,看看传给它的是什么对象:

df.groupby('State').apply(inspect)

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
RuntimeError

如您所见,传入 inspect 函数的是一个 DataFrame。您可能想知道为什么 DataFrame 这个类型被打印了两次:pandas 会把第一个分组运行两次,以此判断是否存在更快的计算路径。这是一个无需担心的次要细节。

现在,让我们对 transform 做同样的事情:

df.groupby('State').transform(inspect)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
RuntimeError

这次传入的是一个 Series,一个完全不同的 pandas 对象。

因此,transform 一次只能处理一个 Series,不可能同时作用于两列。所以,如果我们尝试在自定义函数中用 a 列减去 b 列,用 transform 就会报错。见下文:

def subtract_two(x):
    return x['a'] - x['b']

df.groupby('State').transform(subtract_two)
KeyError: ('a', 'occurred at index a')

当 pandas 试图查找不存在的 Series 索引 a 时,我们得到一个 KeyError。由于 apply 拿到的是完整的 DataFrame,用它就可以完成这个操作:

df.groupby('State').apply(subtract_two)

State     
Florida  2   -2
         3   -8
Texas    0   -2
         1   -5
dtype: int64

输出是一个Series,并且保留了原始索引,因此有些混乱,但是我们可以访问所有列。


显示传入的 pandas 对象

在自定义函数中显示整个 pandas 对象会更有帮助,这样您可以确切看到正在操作的是什么。您可以用 print 语句,但我喜欢用 IPython.display 模块中的 display 函数,这样 DataFrame 能在 Jupyter notebook 中以 HTML 形式漂亮地输出:

from IPython.display import display
def subtract_two(x):
    display(x)
    return x['a'] - x['b']

屏幕截图:(原图已丢失;截图展示的是 display(x) 渲染出的每个分组的 DataFrame)


变换必须返回与组大小相同的一维序列

另一个区别是transform必须返回与该组相同大小的一维序列。在这种特定情况下,每个组都有两行,因此transform必须返回两行的序列。如果没有,则会引发错误:

def return_three(x):
    return np.array([1, 2, 3])

df.groupby('State').transform(return_three)
ValueError: transform must return a scalar value for each group

该错误消息并不能真正说明问题。您必须返回与组长度相同的序列。因此,像这样的函数就可以工作:

def rand_group_len(x):
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)

          a         b
0  0.962070  0.151440
1  0.440956  0.782176
2  0.642218  0.483257
3  0.056047  0.238208

返回单个标量对象也适用于 transform

如果自定义函数只返回一个标量,transform 就会把它用于组中的每一行:

def group_sum(x):
    return x.sum()

df.groupby('State').transform(group_sum)

   a   b
0  9  16
1  9  16
2  4  14
3  4  14

Two major differences between apply and transform

There are two major differences between the transform and apply groupby methods.

  • Input:
  • apply implicitly passes all the columns for each group as a DataFrame to the custom function.
  • while transform passes each column for each group individually as a Series to the custom function.
  • Output:
  • The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list).
  • The custom function passed to transform must return a sequence (a one dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.

Inspecting the custom function

It can help quite a bit to inspect the input to your custom function passed to apply or transform.

Examples

Let’s create some sample data and inspect the groups so that you can see what I am talking about:

import pandas as pd
import numpy as np
df = pd.DataFrame({'State':['Texas', 'Texas', 'Florida', 'Florida'], 
                   'a':[4,5,1,3], 'b':[6,10,3,11]})

     State  a   b
0    Texas  4   6
1    Texas  5  10
2  Florida  1   3
3  Florida  3  11

Let’s create a simple custom function that prints out the type of the implicitly passed object and then raises an error so that execution stops.

def inspect(x):
    print(type(x))
    raise

Now let’s pass this function to both the groupby apply and transform methods to see what object is passed to it:

df.groupby('State').apply(inspect)

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
RuntimeError

As you can see, a DataFrame is passed into the inspect function. You might be wondering why the type, DataFrame, got printed out twice. Pandas runs the first group twice. It does this to determine if there is a fast way to complete the computation or not. This is a minor detail that you shouldn’t worry about.

Now, let’s do the same thing with transform

df.groupby('State').transform(inspect)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
RuntimeError

It is passed a Series – a totally different Pandas object.

So, transform is only allowed to work with a single Series at a time. It is impossible for it to act on two columns at the same time. So, if we try to subtract column b from column a inside of our custom function, we get an error with transform. See below:

def subtract_two(x):
    return x['a'] - x['b']

df.groupby('State').transform(subtract_two)
KeyError: ('a', 'occurred at index a')

We get a KeyError as pandas is attempting to find the Series index a which does not exist. You can complete this operation with apply as it has the entire DataFrame:

df.groupby('State').apply(subtract_two)

State     
Florida  2   -2
         3   -8
Texas    0   -2
         1   -5
dtype: int64

The output is a Series and a little confusing as the original index is kept, but we have access to all columns.


Displaying the passed pandas object

It can help even more to display the entire pandas object within the custom function, so you can see exactly what you are operating on. You can use print statements, but I like to use the display function from the IPython.display module so that DataFrames get nicely rendered as HTML in a Jupyter notebook:

from IPython.display import display
def subtract_two(x):
    display(x)
    return x['a'] - x['b']

Screenshot: (image not preserved; it showed each group's DataFrame as rendered by display(x))


Transform must return a single dimensional sequence the same size as the group

The other difference is that transform must return a single dimensional sequence the same size as the group. In this particular instance, each group has two rows, so transform must return a sequence of two rows. If it does not then an error is raised:

def return_three(x):
    return np.array([1, 2, 3])

df.groupby('State').transform(return_three)
ValueError: transform must return a scalar value for each group

The error message is not really descriptive of the problem. You must return a sequence the same length as the group. So, a function like this would work:

def rand_group_len(x):
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)

          a         b
0  0.962070  0.151440
1  0.440956  0.782176
2  0.642218  0.483257
3  0.056047  0.238208

Returning a single scalar object also works for transform

If you return just a single scalar from your custom function, then transform will use it for each of the rows in the group:

def group_sum(x):
    return x.sum()

df.groupby('State').transform(group_sum)

   a   b
0  9  16
1  9  16
2  4  14
3  4  14
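As a side note (my addition, not from the original answer): transform also accepts the name of a reduction as a string, which gives the same values as group_sum above and is usually faster because pandas can use its optimized implementation:

import pandas as pd
df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
print(df.groupby('State').transform('sum'))  # same result as group_sum above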

回答 1

由于我同样对 .transform 与 .apply 的区别感到困惑,我找到了一些能澄清这个问题的回答。例如,此答案就非常有帮助。

到目前为止我的结论是:.transform 会彼此隔离地处理各个 Series(列)。这意味着在你最后的两个调用中:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())

你要求 .transform 从两列中取值,但它实际上并不会同时“看到”这两列(可以这么说)。transform 会逐一查看 DataFrame 的各列,然后返回由标量“组成”的 Series(或一组 Series),这些标量被重复 len(input_column) 次。

因此,.transform 用来构造结果 Series 的这个标量,是对输入 Series 应用某种归约函数得到的(并且一次只作用于一个 Series/列)。

考虑以下示例(在您的数据框上):

zscore = lambda x: (x - x.mean()) / x.std() # Note that it does not reference anything outside of 'x' and for transform 'x' is one column.
df.groupby('A').transform(zscore)

将生成:

       C      D
0  0.989  0.128
1 -0.478  0.489
2  0.889 -0.589
3 -0.671 -1.150
4  0.034 -0.285
5  1.149  0.662
6 -1.404 -0.907
7 -0.509  1.653

这与您一次只在一列上使用它完全相同:

df.groupby('A')['C'].transform(zscore)

生成:

0    0.989
1   -0.478
2    0.889
3   -0.671
4    0.034
5    1.149
6   -1.404
7   -0.509

请注意,在最后这个示例中换用 .apply(即 df.groupby('A')['C'].apply(zscore))效果完全相同,但如果你试图在整个 DataFrame 上使用它,就会失败:

df.groupby('A').apply(zscore)

给出错误:

ValueError: operands could not be broadcast together with shapes (6,) (2,)

那么 .transform 还有什么用处呢?最简单的情形是把归约函数的结果赋值回原始 DataFrame:

df['sum_C'] = df.groupby('A')['C'].transform(sum)
df.sort_values('A') # to clearly see the scalar ('sum') applies to the whole column of the group

生成:

     A      B      C      D  sum_C
1  bar    one  1.998  0.593  3.973
3  bar  three  1.287 -0.639  3.973
5  bar    two  0.687 -1.027  3.973
4  foo    two  0.205  1.274  4.373
2  foo    two  0.128  0.924  4.373
6  foo    one  2.113 -0.516  4.373
7  foo  three  0.657 -1.179  4.373
0  foo    one  1.270  0.201  4.373

用 .apply 做同样的事会让 sum_C 里都是 NaN,因为 .apply 返回的是一个归约后的 Series,pandas 不知道如何把它广播回去:

df.groupby('A')['C'].apply(sum)

给予:

A
bar    3.973
foo    4.373

还有一些情况会用 .transform 来过滤数据:

df[df.groupby(['B'])['D'].transform(sum) < -1]

     A      B      C      D
3  bar  three  1.287 -0.639
7  foo  three  0.657 -1.179

我希望这可以增加一些清晰度。

As I felt similarly confused with .transform operation vs. .apply I found a few answers shedding some light on the issue. This answer for example was very helpful.

My takeout so far is that .transform will work (or deal) with Series (columns) in isolation from each other. What this means is that in your last two calls:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())

You asked .transform to take values from two columns, but ‘it’ actually does not ‘see’ both of them at the same time (so to speak). transform will look at the dataframe columns one by one and return a Series (or group of Series) ‘made’ of scalars which are repeated len(input_column) times.

So this scalar, which .transform uses to build the Series, is the result of some reduction function applied to an input Series (and only to ONE series/column at a time).

Consider this example (on your dataframe):

zscore = lambda x: (x - x.mean()) / x.std() # Note that it does not reference anything outside of 'x' and for transform 'x' is one column.
df.groupby('A').transform(zscore)

will yield:

       C      D
0  0.989  0.128
1 -0.478  0.489
2  0.889 -0.589
3 -0.671 -1.150
4  0.034 -0.285
5  1.149  0.662
6 -1.404 -0.907
7 -0.509  1.653

Which is exactly the same as if you used it on only one column at a time:

df.groupby('A')['C'].transform(zscore)

yielding:

0    0.989
1   -0.478
2    0.889
3   -0.671
4    0.034
5    1.149
6   -1.404
7   -0.509

Note that .apply in the last example (df.groupby('A')['C'].apply(zscore)) would work in exactly the same way, but it would fail if you tried using it on a dataframe:

df.groupby('A').apply(zscore)

gives error:

ValueError: operands could not be broadcast together with shapes (6,) (2,)

So where else is .transform useful? The simplest case is trying to assign the results of a reduction function back to the original dataframe.

df['sum_C'] = df.groupby('A')['C'].transform(sum)
df.sort_values('A') # to clearly see the scalar ('sum') applies to the whole column of the group

yielding:

     A      B      C      D  sum_C
1  bar    one  1.998  0.593  3.973
3  bar  three  1.287 -0.639  3.973
5  bar    two  0.687 -1.027  3.973
4  foo    two  0.205  1.274  4.373
2  foo    two  0.128  0.924  4.373
6  foo    one  2.113 -0.516  4.373
7  foo  three  0.657 -1.179  4.373
0  foo    one  1.270  0.201  4.373

Trying the same with .apply would give NaNs in sum_C, because .apply returns a reduced Series, which pandas does not know how to broadcast back:

df.groupby('A')['C'].apply(sum)

giving:

A
bar    3.973
foo    4.373

There are also cases when .transform is used to filter the data:

df[df.groupby(['B'])['D'].transform(sum) < -1]

     A      B      C      D
3  bar  three  1.287 -0.639
7  foo  three  0.657 -1.179

I hope this adds a bit more clarity.
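One more aside (my addition, not from the original answer): the same group-based filtering can also be written with the groupby filter method, which drops whole groups based on a predicate:

df[df.groupby(['B'])['D'].transform(sum) < -1]        # transform-based boolean mask
df.groupby('B').filter(lambda g: g['D'].sum() < -1)   # equivalent, using filter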


回答 2

我将使用一个非常简单的代码片段来说明不同之处:

test = pd.DataFrame({'id':[1,2,3,1,2,3,1,2,3], 'price':[1,2,3,2,3,1,3,1,2]})
grouping = test.groupby('id')['price']

DataFrame看起来像这样:

    id  price   
0   1   1   
1   2   2   
2   3   3   
3   1   2   
4   2   3   
5   3   1   
6   1   3   
7   2   1   
8   3   2   

该表中有 3 个客户 ID,每个客户各有三笔交易,分别支付了 1、2、3 美元。

现在,我想找到每个客户的最低付款额。有两种方法:

  1. 使用apply

    grouping.min()

返回结果看起来像这样:

id
1    1
2    1
3    1
Name: price, dtype: int64

pandas.core.series.Series # return type
Int64Index([1, 2, 3], dtype='int64', name='id') #The returned Series' index
# length is 3

  2. 使用 transform:

    grouping.transform(min)

返回结果看起来像这样:

0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
Name: price, dtype: int64

pandas.core.series.Series # return type
RangeIndex(start=0, stop=9, step=1) # The returned Series' index
# length is 9    

这两个方法都返回一个 Series 对象,但第一个的长度为 3,第二个的长度为 9。

如果要回答 What is the minimum price paid by each customer,那么 apply 方法更合适。

如果要回答What is the difference between the amount paid for each transaction vs the minimum payment,则要使用transform,因为:

test['minimum'] = grouping.transform(min) # creates an extra column filled with minimum payment
test.price - test.minimum # returns the difference for each row

Apply 不能简单地在这里工作,因为它返回的是大小为3的Series,但是原始df的长度为9。您无法轻松地将其集成回原始df。

I am going to use a very simple snippet to illustrate the difference:

test = pd.DataFrame({'id':[1,2,3,1,2,3,1,2,3], 'price':[1,2,3,2,3,1,3,1,2]})
grouping = test.groupby('id')['price']

The DataFrame looks like this:

    id  price   
0   1   1   
1   2   2   
2   3   3   
3   1   2   
4   2   3   
5   3   1   
6   1   3   
7   2   1   
8   3   2   

There are 3 customer IDs in this table; each customer made three transactions and paid 1, 2, and 3 dollars.

Now, I want to find the minimum payment made by each customer. There are two ways of doing it:

  1. Using apply:

    grouping.min()

The return looks like this:

id
1    1
2    1
3    1
Name: price, dtype: int64

pandas.core.series.Series # return type
Int64Index([1, 2, 3], dtype='int64', name='id') #The returned Series' index
# length is 3
  2. Using transform:

    grouping.transform(min)

The return looks like this:

0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
Name: price, dtype: int64

pandas.core.series.Series # return type
RangeIndex(start=0, stop=9, step=1) # The returned Series' index
# length is 9    

Both methods return a Series object, but the length of the first one is 3 and the length of the second one is 9.

If you want to answer What is the minimum price paid by each customer, then the apply method is the more suitable one to choose.

If you want to answer What is the difference between the amount paid for each transaction vs the minimum payment, then you want to use transform, because:

test['minimum'] = grouping.transform(min) # creates an extra column filled with minimum payment
test.price - test.minimum # returns the difference for each row

Apply does not work here simply because it returns a Series of size 3, but the original df’s length is 9. You cannot integrate it back to the original df easily.
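To make the transform route concrete, here is a self-contained sketch (my addition) that computes each payment's difference from that customer's minimum in one step:

import pandas as pd
test = pd.DataFrame({'id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'price': [1, 2, 3, 2, 3, 1, 3, 1, 2]})
test['diff'] = test['price'] - test.groupby('id')['price'].transform(min)
print(test)  # 'diff' is 0 for the minimum payment, positive otherwise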


回答 3

tmp = df.groupby(['A'])['c'].transform('mean')

就好像

tmp1 = df.groupby(['A']).agg({'c':'mean'})
tmp = df['A'].map(tmp1['c'])

或者

tmp1 = df.groupby(['A'])['c'].mean()
tmp = df['A'].map(tmp1)
tmp = df.groupby(['A'])['c'].transform('mean')

is like

tmp1 = df.groupby(['A']).agg({'c':'mean'})
tmp = df['A'].map(tmp1['c'])

or

tmp1 = df.groupby(['A'])['c'].mean()
tmp = df['A'].map(tmp1)
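A quick check of this equivalence (my addition, on a made-up frame using the answer's column names):

import pandas as pd
df = pd.DataFrame({'A': ['x', 'x', 'y'], 'c': [1.0, 3.0, 5.0]})
tmp = df.groupby(['A'])['c'].transform('mean')
tmp2 = df['A'].map(df.groupby(['A'])['c'].mean())
assert (tmp == tmp2).all()  # both align the group mean to every row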

Python datetime-在使用strptime获取日,月,年之后设置固定的小时和分钟

问题:Python datetime-在使用strptime获取日,月,年之后设置固定的小时和分钟

我已经成功地使用以下代码将 26 Sep 2012 这样的格式转换为 26-09-2012:

datetime.strptime(request.POST['sample_date'],'%d %b %Y')

但是,我不知道如何将上述时间设置为11:59。有谁知道如何做到这一点?

请注意,这可以是将来的日期,也可以是任何随机的日期,而不仅仅是当前日期。

I’ve successfully converted something of 26 Sep 2012 format to 26-09-2012 using:

datetime.strptime(request.POST['sample_date'],'%d %b %Y')

However, I don’t know how to set the hour and minute of something like the above to 11:59. Does anyone know how to do this?

Note, this can be a future date or any random one, not just the current date.


回答 0

使用 datetime.replace:

from datetime import datetime
date = datetime.strptime('26 Sep 2012', '%d %b %Y')
newdate = date.replace(hour=11, minute=59)

Use datetime.replace:

from datetime import datetime
date = datetime.strptime('26 Sep 2012', '%d %b %Y')
newdate = date.replace(hour=11, minute=59)
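An alternative with the same effect (my addition, not part of the original answer) is datetime.combine, which builds the result from a date and a time object:

from datetime import datetime, time
date = datetime.strptime('26 Sep 2012', '%d %b %Y')
newdate = datetime.combine(date.date(), time(11, 59))  # 2012-09-26 11:59:00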

回答 1

datetime.replace() 是最佳选择。此外,它还提供了替换日、年和月的功能。

假设我们有一个datetime对象,日期表示为: "2017-05-04"

>>> from datetime import datetime
>>> date = datetime.strptime('2017-05-04',"%Y-%m-%d")
>>> print(date)
2017-05-04 00:00:00
>>> date = date.replace(minute=59, hour=23, second=59, year=2018, month=6, day=1)
>>> print(date)
2018-06-01 23:59:59

datetime.replace() will provide the best options. Also, it provides facility for replacing day, year, and month.

Suppose we have a datetime object and date is represented as: "2017-05-04"

>>> from datetime import datetime
>>> date = datetime.strptime('2017-05-04',"%Y-%m-%d")
>>> print(date)
2017-05-04 00:00:00
>>> date = date.replace(minute=59, hour=23, second=59, year=2018, month=6, day=1)
>>> print(date)
2018-06-01 23:59:59

Matplotlib(pyplot)savefig输出空白图像

问题:Matplotlib(pyplot)savefig输出空白图像

我正在尝试保存使用matplotlib创建的图;但是,图像保存为空白。

这是我的代码:

plt.subplot(121)
plt.imshow(dataStack, cmap=mpl.cm.bone)

plt.subplot(122)
y = copy.deepcopy(tumorStack)
y = np.ma.masked_where(y == 0, y)

plt.imshow(dataStack, cmap=mpl.cm.bone)
plt.imshow(y, cmap=mpl.cm.jet_r, interpolation='nearest')

if T0 is not None:
    plt.subplot(123)
    plt.imshow(T0, cmap=mpl.cm.bone)

    #plt.subplot(124)
    #Autozoom

#else:
    #plt.subplot(124)
    #Autozoom

plt.show()
plt.draw()
plt.savefig('tessstttyyy.png', dpi=100)

tessstttyyy.png为空白(也尝试使用.jpg)

I am trying to save plots I make using matplotlib; however, the images are saving blank.

Here is my code:

plt.subplot(121)
plt.imshow(dataStack, cmap=mpl.cm.bone)

plt.subplot(122)
y = copy.deepcopy(tumorStack)
y = np.ma.masked_where(y == 0, y)

plt.imshow(dataStack, cmap=mpl.cm.bone)
plt.imshow(y, cmap=mpl.cm.jet_r, interpolation='nearest')

if T0 is not None:
    plt.subplot(123)
    plt.imshow(T0, cmap=mpl.cm.bone)

    #plt.subplot(124)
    #Autozoom

#else:
    #plt.subplot(124)
    #Autozoom

plt.show()
plt.draw()
plt.savefig('tessstttyyy.png', dpi=100)

And tessstttyyy.png is blank (also tried with .jpg)


回答 0

首先,T0 is not None 什么时候成立?我会先测试这一点,然后再调整传给 plt.subplot() 的值;可以尝试 131、132 和 133 这些值,或者根据 T0 是否存在来选择。

其次,plt.show() 被调用之后会创建一个新图形。要解决这个问题,您可以:

  1. 在调用 plt.show() 之前先调用 plt.savefig('tessstttyyy.png', dpi=100)

  2. 在 show() 之前通过 plt.gcf()(“获取当前图形”)先拿到图形对象,之后随时可以在这个 Figure 对象上调用 savefig()

例如:

fig1 = plt.gcf()
plt.show()
plt.draw()
fig1.savefig('tessstttyyy.png', dpi=100)

在您的代码中,“tessstttyyy.png”为空白,因为它保存的是一个新图形,而该图形上什么都没画。

First, what happens when T0 is not None? I would test that, then I would adjust the values I pass to plt.subplot(); maybe try values 131, 132, and 133, or values that depend on whether or not T0 exists.

Second, after plt.show() is called, a new figure is created. To deal with this, you can

  1. Call plt.savefig('tessstttyyy.png', dpi=100) before you call plt.show()

  2. Save the figure before you show() by calling plt.gcf() for “get current figure”, then you can call savefig() on this Figure object at any time.

For example:

fig1 = plt.gcf()
plt.show()
plt.draw()
fig1.savefig('tessstttyyy.png', dpi=100)

In your code, ‘tessstttyyy.png’ is blank because it is saving the new figure, on which nothing has been plotted.
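A minimal, self-contained sketch of the plt.gcf() approach (my addition; the data is made up):

import matplotlib.pyplot as plt

plt.plot([1, 2, 3])
fig = plt.gcf()                      # grab the figure before show() replaces it
plt.show()
fig.savefig('example.png', dpi=100)  # still saves the plotted figure, not a blank one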


回答 1

plt.show() 应该在 plt.savefig() 之后调用

说明:plt.show() 会清空整个图形,因此之后的任何绘制都会发生在一张新的空白图形上

plt.show() should come after plt.savefig()

Explanation: plt.show() clears the whole thing, so anything afterwards will happen on a new empty figure


回答 2

调整函数调用的顺序解决了我的问题:

  • 首先保存图形
  • 然后显示图形

如下:

plt.savefig('heatmap.png')

plt.show()

Changing the order of the functions fixed the problem for me:

  • first Save the plot
  • then Show the plot

as following:

plt.savefig('heatmap.png')

plt.show()

回答 3

在 show() 之前调用 savefig,这对我有用。

fig ,ax = plt.subplots(figsize = (4,4))
sns.barplot(x='sex', y='tip', color='g', ax=ax,data=tips)
sns.barplot(x='sex', y='tip', color='b', ax=ax,data=tips)
ax.legend(['Male','Female'], facecolor='w')

plt.savefig('figure.png')
plt.show()

Calling savefig before show() worked for me.

fig ,ax = plt.subplots(figsize = (4,4))
sns.barplot(x='sex', y='tip', color='g', ax=ax,data=tips)
sns.barplot(x='sex', y='tip', color='b', ax=ax,data=tips)
ax.legend(['Male','Female'], facecolor='w')

plt.savefig('figure.png')
plt.show()

回答 4

让我给出一个更详细的例子:

import numpy as np
import matplotlib.pyplot as plt


def draw_result(lst_iter, lst_loss, lst_acc, title):
    plt.plot(lst_iter, lst_loss, '-b', label='loss')
    plt.plot(lst_iter, lst_acc, '-r', label='accuracy')

    plt.xlabel("n iteration")
    plt.legend(loc='upper left')
    plt.title(title)
    plt.savefig(title+".png")  # should be called before plt.show

    plt.show()


def test_draw():
    lst_iter = range(100)
    lst_loss = [0.01 * i + 0.01 * i ** 2 for i in range(100)]
    # lst_loss = np.random.randn(1, 100).reshape((100, ))
    lst_acc = [0.01 * i - 0.01 * i ** 2 for i in range(100)]
    # lst_acc = np.random.randn(1, 100).reshape((100, ))
    draw_result(lst_iter, lst_loss, lst_acc, "sgd_method")


if __name__ == '__main__':
    test_draw()

Let me give a more detailed example:

import numpy as np
import matplotlib.pyplot as plt


def draw_result(lst_iter, lst_loss, lst_acc, title):
    plt.plot(lst_iter, lst_loss, '-b', label='loss')
    plt.plot(lst_iter, lst_acc, '-r', label='accuracy')

    plt.xlabel("n iteration")
    plt.legend(loc='upper left')
    plt.title(title)
    plt.savefig(title+".png")  # should be called before plt.show

    plt.show()


def test_draw():
    lst_iter = range(100)
    lst_loss = [0.01 * i + 0.01 * i ** 2 for i in range(100)]
    # lst_loss = np.random.randn(1, 100).reshape((100, ))
    lst_acc = [0.01 * i - 0.01 * i ** 2 for i in range(100)]
    # lst_acc = np.random.randn(1, 100).reshape((100, ))
    draw_result(lst_iter, lst_loss, lst_acc, "sgd_method")


if __name__ == '__main__':
    test_draw()


用多重继承调用父类__init__,正确的方法是什么?

问题:用多重继承调用父类__init__,正确的方法是什么?

假设我有多个继承方案:

class A(object):
    # code for A here

class B(object):
    # code for B here

class C(A, B):
    def __init__(self):
        # What's the right code to write here to ensure 
        # A.__init__ and B.__init__ get called?

编写 C 的 __init__ 有两种典型方法:

  1. (老式) ParentClass.__init__(self)
  2. (较新的样式) super(DerivedClass, self).__init__()

但是,无论哪种情况,如果父类(A 和 B)没有遵循相同的约定,代码都无法正常工作(某些 __init__ 可能被漏掉,或被多次调用)。

那么,到底怎样才是正确的方法呢?说“保持一致,二选一并坚持下去”很容易,但如果 A 或 B 来自第三方库呢?有没有一种方法能确保所有父类构造函数都被调用(以正确的顺序,而且只调用一次)?

编辑:为了说明我的意思,如果我这样写:

class A(object):
    def __init__(self):
        print("Entering A")
        super(A, self).__init__()
        print("Leaving A")

class B(object):
    def __init__(self):
        print("Entering B")
        super(B, self).__init__()
        print("Leaving B")

class C(A, B):
    def __init__(self):
        print("Entering C")
        A.__init__(self)
        B.__init__(self)
        print("Leaving C")

然后我得到:

Entering C
Entering A
Entering B
Leaving B
Leaving A
Entering B
Leaving B
Leaving C

请注意,Binit会被调用两次。如果我做:

class A(object):
    def __init__(self):
        print("Entering A")
        print("Leaving A")

class B(object):
    def __init__(self):
        print("Entering B")
        super(B, self).__init__()
        print("Leaving B")

class C(A, B):
    def __init__(self):
        print("Entering C")
        super(C, self).__init__()
        print("Leaving C")

然后我得到:

Entering C
Entering A
Leaving A
Leaving C

请注意,B 的 init 永远不会被调用。因此,除非我了解/控制我所继承的类(A 和 B)的 init,否则我似乎无法为我正在编写的类(C)做出安全的选择。

Say I have a multiple inheritance scenario:

class A(object):
    # code for A here

class B(object):
    # code for B here

class C(A, B):
    def __init__(self):
        # What's the right code to write here to ensure 
        # A.__init__ and B.__init__ get called?

There’s two typical approaches to writing C‘s __init__:

  1. (old-style) ParentClass.__init__(self)
  2. (newer-style) super(DerivedClass, self).__init__()

However, in either case, if the parent classes (A and B) don’t follow the same convention, then the code will not work correctly (some may be missed, or get called multiple times).

So what’s the correct way again? It’s easy to say “just be consistent, follow one or the other”, but if A or B are from a 3rd party library, what then? Is there an approach that can ensure that all parent class constructors get called (and in the correct order, and only once)?

Edit: to see what I mean, if I do:

class A(object):
    def __init__(self):
        print("Entering A")
        super(A, self).__init__()
        print("Leaving A")

class B(object):
    def __init__(self):
        print("Entering B")
        super(B, self).__init__()
        print("Leaving B")

class C(A, B):
    def __init__(self):
        print("Entering C")
        A.__init__(self)
        B.__init__(self)
        print("Leaving C")

Then I get:

Entering C
Entering A
Entering B
Leaving B
Leaving A
Entering B
Leaving B
Leaving C

Note that B‘s init gets called twice. If I do:

class A(object):
    def __init__(self):
        print("Entering A")
        print("Leaving A")

class B(object):
    def __init__(self):
        print("Entering B")
        super(B, self).__init__()
        print("Leaving B")

class C(A, B):
    def __init__(self):
        print("Entering C")
        super(C, self).__init__()
        print("Leaving C")

Then I get:

Entering C
Entering A
Leaving A
Leaving C

Note that B‘s init never gets called. So it seems that unless I know/control the init’s of the classes I inherit from (A and B) I cannot make a safe choice for the class I’m writing (C).


回答 0

两种方式都可以正常工作。使用 super() 的方式可为子类带来更大的灵活性。

在直接调用的方式中,C.__init__ 可以同时调用 A.__init__ 和 B.__init__。

使用 super() 时,需要按协作式多重继承来设计这些类:C 调用 super,这会执行 A 的代码;A 的代码再调用 super,从而执行 B 的代码。请参阅 http://rhettinger.wordpress.com/2011/05/26/super-considered-super,以详细了解 super 能做什么。

[针对后来编辑补充的问题的回应]

因此,似乎除非我知道/控制我从(A和B)继承的类的初始化,否则我无法对我正在编写的类(C)做出安全的选择。

上面引用的文章展示了如何通过在 A 和 B 外面添加包装器类来处理这种情况。标题为“How to Incorporate a Non-cooperative Class”的部分提供了一个完整的可运行示例。

人们可能希望多重继承更简单,让你轻松地把 Car 和 Airplane 类组合成 FlyingCar,但现实是,单独设计的组件通常需要适配器或包装器,才能像我们希望的那样无缝地组装在一起 :-)

另一个想法:如果您不喜欢用多重继承来组合功能,可以改用组合(composition),从而完全控制在什么场合调用哪些方法。

Both ways work fine. The approach using super() leads to greater flexibility for subclasses.

In the direct call approach, C.__init__ can call both A.__init__ and B.__init__.

When using super(), the classes need to be designed for cooperative multiple inheritance where C calls super, which invokes A‘s code which will also call super which invokes B‘s code. See http://rhettinger.wordpress.com/2011/05/26/super-considered-super for more detail on what can be done with super.

[Response question as later edited]

So it seems that unless I know/control the init’s of the classes I inherit from (A and B) I cannot make a safe choice for the class I’m writing (C).

The referenced article shows how to handle this situation by adding a wrapper class around A and B. There is a worked-out example in the section titled “How to Incorporate a Non-cooperative Class”.

One might wish that multiple inheritance were easier, letting you effortlessly compose Car and Airplane classes to get a FlyingCar, but the reality is that separately designed components often need adapters or wrappers before fitting together as seamlessly as we would like :-)

One other thought: if you’re unhappy with composing functionality using multiple inheritance, you can use composition for complete control over which methods get called on which occasions.


回答 1

您问题的答案取决于一个非常重要的方面:您的基类是否设计用于多重继承?

有3种不同的方案:

  1. 基类是不相关的独立类。

    如果您的基类是能够独立运行的独立实体,并且彼此之间不认识,则它们不是为多重继承设计的。例:

    class Foo:
        def __init__(self):
            self.foo = 'foo'
    
    class Bar:
        def __init__(self, bar):
            self.bar = bar
    

    重要:请注意,Foo 和 Bar 都没有调用 super().__init__()!这正是您的代码无法正常工作的原因。由于菱形继承在 Python 中的工作方式,基类为 object 的类不应调用 super().__init__()。正如您所注意到的,那样做会破坏多重继承,因为您最终调用的是另一个类的 __init__,而不是 object.__init__()。(免责声明:在 object 的直接子类中避免 super().__init__() 是我个人的建议,绝不是 Python 社区达成的共识。有些人更喜欢在每个类中都使用 super,理由是如果某个类的行为不符合预期,您总可以再写一个适配器。)

    这也意味着您永远不应编写直接继承自 object 却没有 __init__ 方法的类。完全不定义 __init__ 方法与调用 super().__init__() 的效果相同。如果您的类直接继承自 object,请确保像下面这样添加一个空的构造函数:

    class Base(object):
        def __init__(self):
            pass
    

    无论如何,在这种情况下,您将必须手动调用每个父构造函数。有两种方法可以做到这一点:

    • 不带 super

      class FooBar(Foo, Bar):
          def __init__(self, bar='bar'):
              Foo.__init__(self)  # explicit calls without super
              Bar.__init__(self, bar)
      
    • super

      class FooBar(Foo, Bar):
          def __init__(self, bar='bar'):
              super().__init__()  # this calls all constructors up to Foo
              super(Foo, self).__init__(bar)  # this calls all constructors after Foo up
                                              # to Bar
      

    这两种方法各有优缺点。如果使用 super,您的类将支持依赖注入;另一方面,它也更容易出错。例如,如果您更改了 Foo 和 Bar 的顺序(像 class FooBar(Bar, Foo) 那样),就必须更新 super 调用来与之匹配。不用 super 则不必担心这一点,而且代码可读性更高。

  2. 其中一个类是 mixin。

    mixin 是专门为与多重继承一起使用而设计的类。这意味着我们不必手动调用两个父构造函数,因为 mixin 会自动为我们调用第二个构造函数。由于这次只需调用一个构造函数,我们可以用 super 来调用,从而避免硬编码父类的名称。

    例:

    class FooMixin:
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)  # forwards all unused arguments
            self.foo = 'foo'
    
    class Bar:
        def __init__(self, bar):
            self.bar = bar
    
    class FooBar(FooMixin, Bar):
        def __init__(self, bar='bar'):
            super().__init__(bar)  # a single call is enough to invoke
                                   # all parent constructors
    
            # NOTE: `FooMixin.__init__(self, bar)` would also work, but isn't
            # recommended because we don't want to hard-code the parent class.
    

    这里的重要细节是:

    • mixin 调用 super().__init__(),并把它接收到的所有参数传递下去。
    • 子类首先从 mixin 继承:class FooBar(FooMixin, Bar)。如果基类的顺序写错,mixin 的构造函数将永远不会被调用。
  3. 所有基类均设计用于协作继承。

    专为协作继承而设计的类非常类似于 mixin:它们把所有未使用的参数传递给下一个类。和之前一样,我们只需调用 super().__init__(),所有父构造函数就会被链式调用。

    例:

    class CoopFoo:
        def __init__(self, **kwargs):
            super().__init__(**kwargs)  # forwards all unused arguments
            self.foo = 'foo'
    
    class CoopBar:
        def __init__(self, bar, **kwargs):
            super().__init__(**kwargs)  # forwards all unused arguments
            self.bar = bar
    
    class CoopFooBar(CoopFoo, CoopBar):
        def __init__(self, bar='bar'):
            super().__init__(bar=bar)  # pass all arguments on as keyword
                                       # arguments to avoid problems with
                                       # positional arguments and the order
                                       # of the parent classes
    

    在这种情况下,父类的顺序无关紧要。即使先从 CoopBar 继承,代码也照样工作。但这只是因为所有参数都作为关键字参数传递;使用位置参数很容易弄错参数顺序,因此协作式类习惯上只接受关键字参数。

    这也是我前面提到的规则的一个例外:CoopFoo 和 CoopBar 都继承自 object,但它们仍然调用 super().__init__()。如果不这样做,就不存在协作继承了。

底线:正确的实现取决于您从其继承的类。

构造函数是类的公共接口的一部分。如果一个类被设计成 mixin 或用于协作继承,就必须在文档中写明。如果文档中只字未提,则可以安全地假定该类不是为协作式多重继承设计的。

The answer to your question depends on one very important aspect: Are your base classes designed for multiple inheritance?

There are 3 different scenarios:

  1. The base classes are unrelated, standalone classes.

    If your base classes are separate entities that are capable of functioning independently and they don’t know each other, they’re not designed for multiple inheritance. Example:

    class Foo:
        def __init__(self):
            self.foo = 'foo'
    
    class Bar:
        def __init__(self, bar):
            self.bar = bar
    

    Important: Notice that neither Foo nor Bar calls super().__init__()! This is why your code didn’t work correctly. Because of the way diamond inheritance works in python, classes whose base class is object should not call super().__init__(). As you’ve noticed, doing so would break multiple inheritance because you end up calling another class’s __init__ rather than object.__init__(). (Disclaimer: Avoiding super().__init__() in object-subclasses is my personal recommendation and by no means an agreed-upon consensus in the python community. Some people prefer to use super in every class, arguing that you can always write an adapter if the class doesn’t behave as you expect.)

    This also means that you should never write a class that inherits from object and doesn’t have an __init__ method. Not defining a __init__ method at all has the same effect as calling super().__init__(). If your class inherits directly from object, make sure to add an empty constructor like so:

    class Base(object):
        def __init__(self):
            pass
    

    Anyway, in this situation, you will have to call each parent constructor manually. There are two ways to do this:

    • Without super

      class FooBar(Foo, Bar):
          def __init__(self, bar='bar'):
              Foo.__init__(self)  # explicit calls without super
              Bar.__init__(self, bar)
      
    • With super

      class FooBar(Foo, Bar):
          def __init__(self, bar='bar'):
              super().__init__()  # this calls all constructors up to Foo
              super(Foo, self).__init__(bar)  # this calls all constructors after Foo up
                                              # to Bar
      

    Each of these two methods has its own advantages and disadvantages. If you use super, your class will support dependency injection. On the other hand, it’s easier to make mistakes. For example if you change the order of Foo and Bar (like class FooBar(Bar, Foo)), you’d have to update the super calls to match. Without super you don’t have to worry about this, and the code is much more readable.

  2. One of the classes is a mixin.

    A mixin is a class that’s designed to be used with multiple inheritance. This means we don’t have to call both parent constructors manually, because the mixin will automatically call the 2nd constructor for us. Since we only have to call a single constructor this time, we can do so with super to avoid having to hard-code the parent class’s name.

    Example:

    class FooMixin:
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)  # forwards all unused arguments
            self.foo = 'foo'
    
    class Bar:
        def __init__(self, bar):
            self.bar = bar
    
    class FooBar(FooMixin, Bar):
        def __init__(self, bar='bar'):
            super().__init__(bar)  # a single call is enough to invoke
                                   # all parent constructors
    
            # NOTE: `FooMixin.__init__(self, bar)` would also work, but isn't
            # recommended because we don't want to hard-code the parent class.
    

    The important details here are:

    • The mixin calls super().__init__() and passes through any arguments it receives.
    • The subclass inherits from the mixin first: class FooBar(FooMixin, Bar). If the order of the base classes is wrong, the mixin’s constructor will never be called.
  3. All base classes are designed for cooperative inheritance.

    Classes designed for cooperative inheritance are a lot like mixins: They pass through all unused arguments to the next class. Like before, we just have to call super().__init__() and all parent constructors will be chain-called.

    Example:

    class CoopFoo:
        def __init__(self, **kwargs):
            super().__init__(**kwargs)  # forwards all unused arguments
            self.foo = 'foo'
    
    class CoopBar:
        def __init__(self, bar, **kwargs):
            super().__init__(**kwargs)  # forwards all unused arguments
            self.bar = bar
    
    class CoopFooBar(CoopFoo, CoopBar):
        def __init__(self, bar='bar'):
            super().__init__(bar=bar)  # pass all arguments on as keyword
                                       # arguments to avoid problems with
                                       # positional arguments and the order
                                       # of the parent classes
    

    In this case, the order of the parent classes doesn’t matter. We might as well inherit from CoopBar first, and the code would still work the same. But that’s only true because all arguments are passed as keyword arguments. Using positional arguments would make it easy to get the order of the arguments wrong, so it’s customary for cooperative classes to accept only keyword arguments.

    This is also an exception to the rule I mentioned earlier: Both CoopFoo and CoopBar inherit from object, but they still call super().__init__(). If they didn’t, there would be no cooperative inheritance.

Bottom line: The correct implementation depends on the classes you’re inheriting from.

The constructor is part of a class’s public interface. If the class is designed as a mixin or for cooperative inheritance, that must be documented. If the docs don’t mention anything of the sort, it’s safe to assume that the class isn’t designed for cooperative multiple inheritance.


回答 2

如果你能控制 A 和 B 的源代码,那么两种方式(“新式”或“旧式”)都行得通。否则,可能需要使用适配器类。

可访问的源代码:正确使用“新样式”

class A(object):
    def __init__(self):
        print("-> A")
        super(A, self).__init__()
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        super(B, self).__init__()
        print("<- B")

class C(A, B):
    def __init__(self):
        print("-> C")
        # Use super here, instead of explicit calls to __init__
        super(C, self).__init__()
        print("<- C")
>>> C()
-> C
-> A
-> B
<- B
<- A
<- C

在此,方法解析顺序(MRO)规定如下:

  • C(A, B) 规定先 A 后 B。MRO 是 C -> A -> B -> object。
  • super(A, self).__init__() 沿着始于 C.__init__ 的 MRO 链继续,执行 B.__init__。
  • super(B, self).__init__() 沿着始于 C.__init__ 的 MRO 链继续,执行 object.__init__。

可以说,这种情况就是为多重继承而设计的。

可访问的源代码:正确使用“旧样式”

class A(object):
    def __init__(self):
        print("-> A")
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        # Don't use super here.
        print("<- B")

class C(A, B):
    def __init__(self):
        print("-> C")
        A.__init__(self)
        B.__init__(self)
        print("<- C")
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

在此,MRO无关紧要,因为A.__init__B.__init__被显式调用。class C(B, A):也会一样工作。

尽管这种情况不像前一种那样按“新式”写法为多重继承而设计,多重继承仍然是可行的。


现在,如果AB是从第三方库-即你有过的源代码没有控制AB?简短的答案:您必须设计一个实现必要super调用的适配器类,然后使用一个空类来定义MRO(请参阅Raymond Hettinger上的文章super -尤其是“如何合并非合作类”一节)。

第三方父类:A 没有实现 super;B 实现了

class A(object):
    def __init__(self):
        print("-> A")
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        super(B, self).__init__()
        print("<- B")

class Adapter(object):
    def __init__(self):
        print("-> C")
        A.__init__(self)
        super(Adapter, self).__init__()
        print("<- C")

class C(Adapter, B):
    pass
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

Adapter实现super是为了C定义MRO,该MRO在super(Adapter, self).__init__()执行时起作用。

如果反过来呢?

第三方父类:A 实现了 super;B 没有

class A(object):
    def __init__(self):
        print("-> A")
        super(A, self).__init__()
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        print("<- B")

class Adapter(object):
    def __init__(self):
        print("-> C")
        super(Adapter, self).__init__()
        B.__init__(self)
        print("<- C")

class C(Adapter, A):
    pass
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

此处的模式相同,只是 Adapter.__init__ 中的执行顺序对调了:先调用 super,再进行显式调用。请注意,涉及第三方父类的每种情况都需要各自独立的适配器类。

因此,除非我了解/控制我所继承的类(A 和 B)的 init,否则我似乎无法为我正在编写的类(C)做出安全的选择。

虽然你可以用适配器类来处理无法控制 A 和 B 源代码的情况,但确实:要做到这一点,你必须知道父类的 init 是如何实现 super 的(如果实现了的话)。

Either approach (“new style” or “old style”) will work if you have control over the source code for A and B. Otherwise, use of an adapter class might be necessary.

Source code accessible: Correct use of “new style”

class A(object):
    def __init__(self):
        print("-> A")
        super(A, self).__init__()
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        super(B, self).__init__()
        print("<- B")

class C(A, B):
    def __init__(self):
        print("-> C")
        # Use super here, instead of explicit calls to __init__
        super(C, self).__init__()
        print("<- C")
>>> C()
-> C
-> A
-> B
<- B
<- A
<- C

Here, method resolution order (MRO) dictates the following:

  • C(A, B) dictates A first, then B. MRO is C -> A -> B -> object.
  • super(A, self).__init__() continues along the MRO chain initiated in C.__init__ to B.__init__.
  • super(B, self).__init__() continues along the MRO chain initiated in C.__init__ to object.__init__.

You could say that this case is designed for multiple inheritance.

Source code accessible: Correct use of “old style”

class A(object):
    def __init__(self):
        print("-> A")
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        # Don't use super here.
        print("<- B")

class C(A, B):
    def __init__(self):
        print("-> C")
        A.__init__(self)
        B.__init__(self)
        print("<- C")
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

Here, MRO does not matter, since A.__init__ and B.__init__ are called explicitly. class C(B, A): would work just as well.

Although this case is not “designed” for multiple inheritance in the new style as the previous one was, multiple inheritance is still possible.


Now, what if A and B are from a third party library – i.e., you have no control over the source code for A and B? The short answer: You must design an adapter class that implements the necessary super calls, then use an empty class to define the MRO (see Raymond Hettinger’s article on super – especially the section, “How to Incorporate a Non-cooperative Class”).

Third-party parents: A does not implement super; B does

class A(object):
    def __init__(self):
        print("-> A")
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        super(B, self).__init__()
        print("<- B")

class Adapter(object):
    def __init__(self):
        print("-> C")
        A.__init__(self)
        super(Adapter, self).__init__()
        print("<- C")

class C(Adapter, B):
    pass
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

Class Adapter implements super so that C can define the MRO, which comes into play when super(Adapter, self).__init__() is executed.

And what if it’s the other way around?

Third-party parents: A implements super; B does not

class A(object):
    def __init__(self):
        print("-> A")
        super(A, self).__init__()
        print("<- A")

class B(object):
    def __init__(self):
        print("-> B")
        print("<- B")

class Adapter(object):
    def __init__(self):
        print("-> C")
        super(Adapter, self).__init__()
        B.__init__(self)
        print("<- C")

class C(Adapter, A):
    pass
>>> C()
-> C
-> A
<- A
-> B
<- B
<- C

Same pattern here, except the order of execution is switched in Adapter.__init__; super call first, then explicit call. Notice that each case with third-party parents requires a unique adapter class.

So it seems that unless I know/control the init’s of the classes I inherit from (A and B) I cannot make a safe choice for the class I’m writing (C).

Although you can handle the cases where you don’t control the source code of A and B by using an adapter class, it is true that you must know how the init’s of the parent classes implement super (if at all) in order to do so.


回答 3

正如雷蒙德(Raymond)在回答中所说的那样,直接调用A.__init__B.__init__可以正常工作,并且您的代码易于阅读。

但是,这样做没有利用 C 与这些类之间的继承关联。利用这种关联可以带来更好的一致性,并使日后的重构更容易、更不易出错。示例如下:

class C(A, B):
    def __init__(self):
        print("entering c")
        for base_class in C.__bases__:  # (A, B)
             base_class.__init__(self)
        print("leaving c")

As Raymond said in his answer, a direct call to A.__init__ and B.__init__ works fine, and your code would be readable.

However, it does not use the inheritance link between C and those classes. Exploiting that link gives you more consistency and makes eventual refactorings easier and less error-prone. An example of how to do that:

class C(A, B):
    def __init__(self):
        print("entering c")
        for base_class in C.__bases__:  # (A, B)
             base_class.__init__(self)
        print("leaving c")

回答 4

本文有助于解释协作式多重继承:

http://www.artima.com/weblogs/viewpost.jsp?thread=281127

它提到了有用的 mro() 方法,可以显示方法解析顺序。在你的第二个例子中,当你在 A 中调用 super 时,super 调用会沿着 MRO 继续;顺序中的下一个类是 B,这就是 B 的 init 第一次被调用的原因。

这是来自 Python 官方站点的一篇更偏技术性的文章:

http://www.python.org/download/releases/2.3/mro/

This article helps to explain cooperative multiple inheritance:

http://www.artima.com/weblogs/viewpost.jsp?thread=281127

It mentions the useful method mro() that shows you the method resolution order. In your 2nd example, where you call super in A, the super call continues on in MRO. The next class in the order is B, this is why B‘s init is called the first time.

Here’s a more technical article from the official python site:

http://www.python.org/download/releases/2.3/mro/
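To see the MRO for the question's hierarchy (my addition), mro() can be called on the class itself:

class A(object): pass
class B(object): pass
class C(A, B): pass

print(C.mro())  # [C, A, B, object] (module prefixes omitted)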


回答 5

如果你要同时继承来自第三方库的多个类,那么答案是否定的:不存在一种无需了解基类如何编写、就能正确调用基类 __init__ 方法(或任何其他方法)的万能做法。

super 使我们可以编写旨在协作实现方法的类,让它们成为复杂多重继承树的一部分,而类的作者无需预先了解这棵树。但无法用它从“可能用、也可能不用 super”的任意类中正确继承。

本质上,一个类应该用 super 还是用直接调用基类的方式来被子类化,是该类“公共接口”的一部分,应当照此写入文档。如果你按照库作者预期的方式使用第三方库,并且库的文档合理,文档通常会告诉你子类化特定类时需要做什么。如果没有,你就得查看所要子类化的类的源代码,看它们的基类调用约定是什么。如果你以库作者没有预料到的方式组合一个或多个第三方库中的多个类,那么可能根本无法始终如一地调用超类方法;如果类 A 属于一个使用 super 的层次结构,而类 B 属于一个不使用 super 的层次结构,那么两种选择都无法保证奏效。你必须针对每个具体情况找出恰好行得通的策略。

If you are multiply sub-classing classes from third party libraries, then no, there is no blind approach to calling the base class __init__ methods (or any other methods) that actually works regardless of how the base classes are programmed.

super makes it possible to write classes designed to cooperatively implement methods as part of complex multiple inheritance trees which need not be known to the class author. But there’s no way to use it to correctly inherit from arbitrary classes that may or may not use super.

Essentially, whether a class is designed to be sub-classed using super or with direct calls to the base class is a property which is part of the class’ “public interface”, and it should be documented as such. If you’re using third-party libraries in the way that the library author expected and the library has reasonable documentation, it would normally tell you what you are required to do to subclass particular things. If not, then you’ll have to look at the source code for the classes you’re sub-classing and see what their base-class-invocation convention is. If you’re combining multiple classes from one or more third-party libraries in a way that the library authors didn’t expect, then it may not be possible to consistently invoke super-class methods at all; if class A is part of a hierarchy using super and class B is part of a hierarchy that doesn’t use super, then neither option is guaranteed to work. You’ll have to figure out a strategy that happens to work for each particular case.