Tag Archives: Python

What is the -m switch for?

Question: What is the -m switch for?

Could you explain to me what the difference is between calling

python -m mymod1 mymod2.py args

and

python mymod1.py mymod2.py args

It seems in both cases mymod1.py is called and sys.argv is

['mymod1.py', 'mymod2.py', 'args']

So what is the -m switch for?


Answer 0

The first line of the Rationale section of PEP 338 says:

Python 2.4 adds the command line switch -m to allow modules to be located using the Python module namespace for execution as scripts. The motivating examples were standard library modules such as pdb and profile, and the Python 2.4 implementation is fine for this limited purpose.

So you can specify any module in Python’s search path this way, not just files in the current directory. You’re correct that python mymod1.py mymod2.py args has exactly the same effect. The first line of the Scope of this proposal section states:

In Python 2.4, a module located using -m is executed just as if its filename had been provided on the command line.

With -m, more is possible, such as working with modules that are part of a package. That is what the rest of PEP 338 is about; read it for more info.
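You can check this equivalence with a sketch like the following. It assumes a throwaway mymod1.py (hypothetical contents) written into a temporary directory, runs it once by filename and once with -m using the current interpreter, and prints what the module sees. Under -m, sys.argv[0] holds the full path to the module file, so only its basename is printed.

```python
import os
import subprocess
import sys
import tempfile


def run(args, cwd):
    """Run the current interpreter with args inside cwd and return its stdout."""
    result = subprocess.run([sys.executable, *args], cwd=cwd,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical top-level module that reports how it was started.
    with open(os.path.join(tmp, "mymod1.py"), "w") as f:
        f.write("import os, sys\n"
                "print(__name__, os.path.basename(sys.argv[0]), sys.argv[1:])\n")

    by_filename = run(["mymod1.py", "mymod2.py", "args"], cwd=tmp)
    # -m locates mymod1 via sys.path, which starts with the current directory here.
    by_modulename = run(["-m", "mymod1", "mymod2.py", "args"], cwd=tmp)

print(by_filename)
print(by_modulename)
```

Both lines should read __main__ mymod1.py ['mymod2.py', 'args'].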


Answer 1

It's worth mentioning that this only works if the package has a __main__.py file. Otherwise, the package cannot be executed directly.

python -m some_package some_arguments

The Python interpreter will look for a __main__.py file in the package directory and execute it. It's roughly equivalent to:

python path_to_package/__main__.py some_arguments

Because __main__.py runs with __name__ set to "__main__", the code under this guard is executed as well:

if __name__ == "__main__":
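Here is a minimal sketch of the difference. It assumes two hypothetical packages built in a temporary directory: some_package has a __main__.py, and bare_package does not. Running each with -m shows that only the first can be executed. The second fails with an error saying the package cannot be directly executed.

```python
import os
import subprocess
import sys
import tempfile


def run_m(name, cwd):
    """Run `python -m <name> some_arguments` from cwd and return the result."""
    return subprocess.run([sys.executable, "-m", name, "some_arguments"],
                          cwd=cwd, capture_output=True, text=True)


def make_package(root, name, main_source=None):
    """Create a hypothetical package, optionally with a __main__.py."""
    pkg = os.path.join(root, name)
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    if main_source is not None:
        with open(os.path.join(pkg, "__main__.py"), "w") as f:
            f.write(main_source)


with tempfile.TemporaryDirectory() as tmp:
    make_package(tmp, "some_package",
                 'import sys\n'
                 'if __name__ == "__main__":\n'
                 '    print("main", sys.argv[1:])\n')
    make_package(tmp, "bare_package")  # no __main__.py

    ok = run_m("some_package", tmp)
    bad = run_m("bare_package", tmp)

print(ok.stdout.strip())
print(bad.returncode, bad.stderr.strip())
```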

Answer 2

Although this question has been asked and answered several times, in my opinion no existing answer fully or concisely captures all the implications of the -m flag. The following therefore attempts to improve on what has come before.

Introduction (TLDR)

The -m flag does a lot of things, not all of which will be needed all the time. In short, it can be used to: (1) execute Python code from the command line via modulename rather than filename, (2) add a directory to sys.path for use in import resolution, and (3) execute Python code that contains relative imports from the command line.

Preliminaries

To explain the -m flag we first need to explain a little terminology.

Python's primary organizational unit is known as a module. Modules come in one of two flavors: code modules and package modules. A code module is any file that contains Python executable code. A package module is a directory that contains other modules (either code modules or package modules). The most common type of code module is a *.py file, while the most common type of package module is a directory containing an __init__.py file.

Python allows modules to be uniquely identified in two distinct ways: modulename and filename. In general, modules are identified by modulename in Python code (e.g., import <modulename>) and by filename on the command line (e.g., python <filename>). All python interpreters are able to convert modulenames to filenames by following the same few, well-defined rules. These rules hinge on the sys.path variable. By altering this variable one can change how Python resolves modulenames into filenames (for more on how this is done see PEP 302).
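You can observe the modulename-to-filename mapping directly with importlib.util.find_spec, which performs the same sys.path-based lookup the import system uses. This is only a sketch. json.decoder is just a convenient stdlib example, and my_hypothetical_mod is a throwaway module created to show that changing sys.path changes what a modulename resolves to.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# A stdlib modulename resolves to a concrete file on disk.
decoder_spec = importlib.util.find_spec("json.decoder")
print(decoder_spec.origin)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module that lives outside every sys.path entry.
    with open(os.path.join(tmp, "my_hypothetical_mod.py"), "w") as f:
        f.write("VALUE = 1\n")

    before = importlib.util.find_spec("my_hypothetical_mod")  # not findable yet
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    after = importlib.util.find_spec("my_hypothetical_mod")   # now resolves into tmp
    sys.path.remove(tmp)

print(before)
print(after.origin)
```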

All modules (both code and package) can be executed (i.e., code associated with the module will be evaluated by the Python interpreter). Depending on the execution method (and module type), what code gets evaluated, and when, can change quite a bit. For example, if one executes a package module via python <filename>, then only <filename>/__main__.py will be evaluated; <filename>/__init__.py is not. On the other hand, if one executes that same package module via import <modulename>, then only the package's __init__.py will be executed.
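To see which files run under each execution method, here is a minimal sketch. It uses a hypothetical package demo_pkg whose __init__.py and __main__.py each print their own name. From a temporary directory, it runs the package as a filename, imports it, and runs it with -m.

```python
import os
import subprocess
import sys
import tempfile


def stdout_of(args, cwd):
    """Run the interpreter with args in cwd and return its stdout as a list of words."""
    proc = subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True, check=True)
    return proc.stdout.split()


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "demo_pkg")  # hypothetical package module
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("print('__init__')\n")
    with open(os.path.join(pkg, "__main__.py"), "w") as f:
        f.write("print('__main__')\n")

    via_filename = stdout_of(["demo_pkg"], tmp)              # python <filename>
    via_import = stdout_of(["-c", "import demo_pkg"], tmp)   # import <modulename>
    via_m = stdout_of(["-m", "demo_pkg"], tmp)               # python -m <modulename>

print(via_filename, via_import, via_m)
```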

Historical Development of -m

The -m flag was first introduced in Python 2.4. Initially its only purpose was to provide an alternative means of identifying the Python module to execute from the command line. That is, if we knew both the <filename> and <modulename> for a module, then the following two commands were equivalent: python <filename> <args> and python -m <modulename> <args>. One constraint with this iteration, according to PEP 338, was that -m only worked with top-level modulenames (i.e., modules that could be found directly on sys.path without any intervening package modules).

With the completion of PEP 338 (in Python 2.5), the -m feature was extended to support <modulename> representations beyond the top level. This meant names such as http.server were now fully supported. This extension also meant that each parent package in modulename was now evaluated (i.e., all parent package __init__.py files were evaluated) in addition to the module referenced by the modulename itself.

The final major feature enhancement for -m came with PEP 366 (in Python 2.6). With this upgrade, -m gained the ability to support not only absolute imports but also explicit relative imports when executing modules. This was achieved by changing -m so that it sets the __package__ variable to the parent package of the given modulename (in addition to everything else it already did).
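Here is a minimal sketch of what PEP 366 enables. It assumes a hypothetical package mypkg whose app.py uses an explicit relative import. When run with -m, __package__ is 'mypkg' and the import resolves. When run by filename, __package__ is None, and Python raises an ImportError about the relative import.

```python
import os
import subprocess
import sys
import tempfile


def run(args, cwd):
    """Run the current interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: mypkg/__init__.py, mypkg/helper.py, mypkg/app.py
    pkg = os.path.join(tmp, "mypkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("GREETING = 'hello'\n")
    with open(os.path.join(pkg, "app.py"), "w") as f:
        # Explicit relative import: resolvable only when __package__ is 'mypkg'.
        f.write("from .helper import GREETING\nprint(__package__, GREETING)\n")

    as_module = run(["-m", "mypkg.app"], cwd=tmp)
    as_script = run([os.path.join("mypkg", "app.py")], cwd=tmp)

print(as_module)
print(as_script[0], as_script[2].splitlines()[-1])
```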

Use Cases

There are two notable use cases for the -m flag:

  1. To execute modules from the command line for which one may not know their filename. This use case takes advantage of the fact that the Python interpreter knows how to convert modulenames to filenames. This is particularly advantageous when one wants to run stdlib modules or 3rd-party modules from the command line. For example, very few people know the filename for the http.server module, but most people do know its modulename, so we can execute it from the command line using python -m http.server.

  2. To execute a local package containing absolute or relative imports without needing to install it. This use case is detailed in PEP 338 and leverages the fact that the current working directory is added to sys.path rather than the module’s directory. This use case is very similar to using pip install -e . to install a package in develop/edit mode.
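The sketch below puts use case 1 into practice. It runs the stdlib json.tool module by modulename without knowing where its file lives, and uses importlib.util.find_spec only to show which file the interpreter resolved. json.tool is used here because, like http.server, it is a stdlib module meant to be run with -m, but unlike a server it exits on its own.

```python
import importlib.util
import json
import subprocess
import sys

# Where the interpreter finds the module; we never need this path to run it.
print(importlib.util.find_spec("json.tool").origin)

# Equivalent to: echo '{"b": 1, "a": 2}' | python -m json.tool
result = subprocess.run([sys.executable, "-m", "json.tool"],
                        input='{"b": 1, "a": 2}', capture_output=True,
                        text=True, check=True)
print(result.stdout)

parsed = json.loads(result.stdout)  # the pretty-printed output is still valid JSON
```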

Shortcomings

Despite all the enhancements made to -m over the years, it still has one major shortcoming: it can only execute modules written in Python (i.e., *.py). For example, if -m is used to execute a C-compiled code module, it fails with the error No code object available for <modulename>.
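This shortcoming is easy to reproduce. The sketch assumes that math is implemented in C in your CPython build, where it is typically either built in or a compiled extension module. Its loader therefore has no Python code object for -m to run.

```python
import importlib.util
import subprocess
import sys

# For a C module, origin is 'built-in' or the path of a compiled extension file.
print(importlib.util.find_spec("math").origin)

result = subprocess.run([sys.executable, "-m", "math"],
                        capture_output=True, text=True)
print(result.returncode)
print(result.stderr.strip())
```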

Detailed Comparisons

Effects of module execution via import statement (i.e., import <modulename>):

  • sys.path is not modified in any way
  • __name__ is set to the absolute form of <modulename>
  • __package__ is set to the immediate parent package in <modulename> (or to <modulename> itself for package modules)
  • __init__.py is evaluated for all packages (including its own for package modules)
  • __main__.py is not evaluated for package modules; the code is evaluated for code modules

Effects of module execution via command line (i.e., python <filename>):

  • sys.path is modified to include the directory containing <filename> (or <filename> itself when it is a package directory)
  • __name__ is set to '__main__'
  • __package__ is set to None
  • __init__.py is not evaluated for any package (including its own for package modules)
  • __main__.py is evaluated for package modules; the code is evaluated for code modules.

Effects of module execution via command line with the -m flag (i.e., python -m <modulename>):

  • sys.path is modified to include the current directory
  • __name__ is set to '__main__'
  • __package__ is set to the immediate parent package in <modulename> (or to <modulename> itself when a package's __main__.py is run)
  • __init__.py is evaluated for all packages (including its own for package modules)
  • __main__.py is evaluated for package modules; the code is evaluated for code modules
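You can compare the three lists side by side with a hypothetical code module mypkg/probe.py that prints __name__, __package__, and the last path component of sys.path[0]. This sketch runs from a temporary directory. Note that the import case goes through python -c, which itself puts the current directory at the front of sys.path. Only the filename and -m results therefore reflect the sys.path changes described above.

```python
import os
import subprocess
import sys
import tempfile

# Contents of the hypothetical probe module.
PROBE_SOURCE = (
    "import os, sys\n"
    "print(__name__, __package__, os.path.basename(os.path.abspath(sys.path[0])))\n"
)


def probe(args, cwd):
    """Run the interpreter with args in cwd; return the probe's output fields."""
    proc = subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True, check=True)
    return proc.stdout.split()


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "mypkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "probe.py"), "w") as f:
        f.write(PROBE_SOURCE)
    tmp_name = os.path.basename(tmp)

    via_import = probe(["-c", "import mypkg.probe"], tmp)           # import <modulename>
    via_filename = probe([os.path.join("mypkg", "probe.py")], tmp)  # python <filename>
    via_m = probe(["-m", "mypkg.probe"], tmp)                        # python -m <modulename>

for label, fields in [("import", via_import), ("filename", via_filename), ("-m", via_m)]:
    print(label, fields)
```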

Conclusion

The -m flag is, at its simplest, a means to execute python scripts from the command line by using modulenames rather than filenames. The real power of -m, however, is in its ability to combine the power of import statements (e.g., support for explicit relative imports and automatic package __init__ evaluation) with the convenience of the command line.


SQLAlchemy default DateTime

Question: SQLAlchemy default DateTime

This is my declarative model:

import datetime
from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    created_date = DateTime(default=datetime.datetime.utcnow)

However, when I try to import this module, I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "orm/models2.py", line 37, in <module>
    class Test(Base):
  File "orm/models2.py", line 41, in Test
    created_date = sqlalchemy.DateTime(default=datetime.datetime.utcnow)
TypeError: __init__() got an unexpected keyword argument 'default'

If I use an Integer type, I can set a default value. What’s going on?


Answer 0

DateTime doesn't accept a default keyword argument; default is a parameter of Column instead. Try this:

import datetime
from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
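To confirm that default now belongs on Column, here is a minimal sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer, which provides sqlalchemy.orm.declarative_base and lets Session be used as a context manager. It also swaps datetime.datetime.utcnow, which is deprecated as of Python 3.12, for a hypothetical utc_now helper that returns an aware UTC datetime.

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


def utc_now():
    """Timezone-aware stand-in for datetime.datetime.utcnow (hypothetical helper)."""
    return datetime.datetime.now(datetime.timezone.utc)


class Test(Base):
    __tablename__ = "test"

    id = Column(Integer, primary_key=True)
    # default is a Column argument; SQLAlchemy calls utc_now() for each INSERT
    created_date = Column(DateTime, default=utc_now)


engine = create_engine("sqlite://")  # in-memory database, just for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    row = Test()                 # created_date deliberately left unset
    session.add(row)
    session.commit()
    created = row.created_date   # reloaded from the database after commit

print(created)
```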

Answer 1

Calculate timestamps within your DB, not your client

For sanity, you probably want to have all datetimes calculated by your DB server, rather than the application server. Calculating the timestamp in the application can lead to problems because network latency is variable, clients experience slightly different clock drift, and different programming languages occasionally calculate time slightly differently.

SQLAlchemy allows you to do this by passing func.now() or func.current_timestamp() (they are aliases of each other) which tells the DB to calculate the timestamp itself.

Use SQLAlchemy's server_default

Additionally, for a default where you’re already telling the DB to calculate the value, it’s generally better to use server_default instead of default. This tells SQLAlchemy to pass the default value as part of the CREATE TABLE statement.

For example, if you write an ad hoc script against this table, using server_default means you won't need to worry about manually adding a timestamp call to your script: the database will set it automatically.

Understanding SQLAlchemy’s onupdate/server_onupdate

SQLAlchemy also supports onupdate so that anytime the row is updated it inserts a new timestamp. Again, best to tell the DB to calculate the timestamp itself:

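from sqlalchemy import Column, DateTime
from sqlalchemy.sql import func

# Rendered as a DEFAULT clause in CREATE TABLE; the database fills it on INSERT
time_created = Column(DateTime(timezone=True), server_default=func.now())
# SQLAlchemy adds now() to the UPDATE statements it emits for the row
time_updated = Column(DateTime(timezone=True), onupdate=func.now())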
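Here is a runnable sketch of both parameters. It assumes SQLAlchemy 1.4+ and an in-memory SQLite database; the Item model and its name column are hypothetical. The database fills in time_created on INSERT, while time_updated stays empty until the first UPDATE, when SQLAlchemy includes now() in the SET clause.

```python
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.sql import func

Base = declarative_base()


class Item(Base):  # hypothetical model for the demo
    __tablename__ = "item"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    time_created = Column(DateTime(timezone=True), server_default=func.now())
    time_updated = Column(DateTime(timezone=True), onupdate=func.now())


engine = create_engine("sqlite://")  # in-memory SQLite
Base.metadata.create_all(engine)     # time_created gets a DEFAULT clause in the DDL

with Session(engine) as session:
    item = Item(name="first")
    session.add(item)
    session.commit()
    created = item.time_created         # set by the database on INSERT
    updated_before = item.time_updated  # still NULL: no UPDATE has happened yet

    item.name = "second"                # changing any column triggers an UPDATE
    session.commit()
    updated_after = item.time_updated   # now() was included in the UPDATE

print(created, updated_before, updated_after)
```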

There is a server_onupdate parameter, but unlike server_default, it doesn't actually set anything server-side. It just tells SQLAlchemy that your database will change the column when an update happens (perhaps you created a trigger on the column), so SQLAlchemy will ask for the return value so it can update the corresponding object.

One other potential gotcha:

You might be surprised to notice that if you make a bunch of changes within a single transaction, they all have the same timestamp. That’s because the SQL standard specifies that CURRENT_TIMESTAMP returns values based on the start of the transaction.

PostgreSQL provides the non-SQL-standard statement_timestamp() and clock_timestamp() which do change within a transaction. Docs here: https://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

UTC timestamp

If you want to use UTC timestamps, the SQLAlchemy documentation provides a stub implementation of func.utcnow(). You need to provide the appropriate driver-specific functions yourself, though.


Answer 2

You can also use SQLAlchemy's built-in func.now() as the default for a DateTime column:

from sqlalchemy import Column, DateTime
from sqlalchemy.sql import func

DT = Column(DateTime(timezone=True), default=func.now())

Answer 3

You likely want to use onupdate=datetime.now so that UPDATEs also change the last_updated field.

SQLAlchemy has two defaults for Python-executed functions:

  • default sets the value on INSERT, only once
  • onupdate sets the value to the callable result on UPDATE as well.
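Here is a minimal sketch of the two hooks, assuming SQLAlchemy 1.4+ and in-memory SQLite. To make the effect visible without sleeping, a hypothetical fake_now clock stands in for datetime.now and advances one second per call. The Doc model is hypothetical as well.

```python
import datetime
import itertools

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

_ticks = itertools.count()


def fake_now():
    """Hypothetical deterministic clock: one second later on every call."""
    return datetime.datetime(2024, 1, 1) + datetime.timedelta(seconds=next(_ticks))


class Doc(Base):  # hypothetical model
    __tablename__ = "doc"

    id = Column(Integer, primary_key=True)
    title = Column(String(50))
    created = Column(DateTime, default=fake_now)                          # INSERT only
    last_updated = Column(DateTime, default=fake_now, onupdate=fake_now)  # INSERT and every UPDATE


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    doc = Doc(title="draft")
    session.add(doc)
    session.commit()
    first = (doc.created, doc.last_updated)

    doc.title = "final"  # the UPDATE calls fake_now() again for last_updated
    session.commit()
    second = (doc.created, doc.last_updated)

print(first)
print(second)
```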

Answer 4

The default keyword parameter should be given to the Column object.

Example:

Column(u'timestamp', TIMESTAMP(timezone=True), primary_key=False, nullable=False, default=time_now),

The default value can be a callable, which here I defined like the following.

from pytz import timezone
from datetime import datetime

UTC = timezone('UTC')

def time_now():
    return datetime.now(UTC)
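If you would rather not depend on pytz, the standard library's datetime.timezone.utc gives an equivalent aware UTC timestamp. The sketch below assumes SQLAlchemy 1.4+, in-memory SQLite, and a hypothetical events table. It uses this stdlib variant of time_now as the column default and inserts a row without supplying a timestamp.

```python
from datetime import datetime, timezone

from sqlalchemy import TIMESTAMP, Column, Integer, MetaData, Table, create_engine


def time_now():
    """Aware UTC 'now' using only the standard library (instead of pytz)."""
    return datetime.now(timezone.utc)


metadata = MetaData()
events = Table(  # hypothetical table for the demo
    "events",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("timestamp", TIMESTAMP(timezone=True), nullable=False, default=time_now),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(events.insert())  # no timestamp supplied, so time_now() fills it in
    stored = conn.execute(events.select()).fetchone()

print(stored)
```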

Answer 5

As per PostgreSQL documentation, https://www.postgresql.org/docs/9.6/static/functions-datetime.html

now(), CURRENT_TIMESTAMP, and LOCALTIMESTAMP return the start time of the current transaction.

This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the “current” time, so that multiple modifications within the same transaction bear the same time stamp.

You might want to use statement_timestamp or clock_timestamp if you don’t want transaction timestamp.

statement_timestamp()

returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client).

clock_timestamp()

returns the actual current time, and therefore its value changes even within a single SQL command.
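In SQLAlchemy, these PostgreSQL-specific functions are available through the generic func namespace, which renders any attribute it does not know as a SQL function call of that name. The sketch below only compiles the CREATE TABLE statement for the PostgreSQL dialect, so it needs no database server or driver. The audit_log table and its columns are hypothetical.

```python
from sqlalchemy import Column, DateTime, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable
from sqlalchemy.sql import func

metadata = MetaData()
audit_log = Table(  # hypothetical table
    "audit_log",
    metadata,
    Column("id", Integer, primary_key=True),
    # transaction start time: identical for every row written in one transaction
    Column("txn_time", DateTime(timezone=True), server_default=func.now()),
    # start time of the current statement
    Column("stmt_time", DateTime(timezone=True), server_default=func.statement_timestamp()),
    # actual wall-clock time, which changes even within a single statement
    Column("wall_time", DateTime(timezone=True), server_default=func.clock_timestamp()),
)

# Compile only; no PostgreSQL server or driver is needed for this.
ddl = str(CreateTable(audit_log).compile(dialect=postgresql.dialect()))
print(ddl)
```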


Setting markers for individual points on a line in Matplotlib

Question: Setting markers for individual points on a line in Matplotlib

I have used Matplotlib to plot lines on a figure. Now I would like to set the style, specifically the marker, for individual points on the line. How do I do this?

To clarify my question, I want to be able to set the style for individual markers on a line, not every marker on said line.


Answer 0

Specify the keyword args linestyle and/or marker in your call to plot.

For example, using a dashed line and blue circle markers:

plt.plot(range(10), linestyle='--', marker='o', color='b')

A shortcut call for the same thing:

plt.plot(range(10), '--bo')
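You can check that the keyword form and the format-string shortcut produce the same line style, marker, and color. The sketch below assumes the non-interactive Agg backend so it runs without a display. It draws both lines and compares the resulting Line2D properties.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig, ax = plt.subplots()
(explicit,) = ax.plot(range(10), linestyle="--", marker="o", color="b")
(shortcut,) = ax.plot(range(10), "--bo")

for line in (explicit, shortcut):
    print(line.get_linestyle(), line.get_marker(), to_rgba(line.get_color()))
```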

Here is a list of the possible line and marker styles:

================    ===============================
character           description
================    ===============================
   -                solid line style
   --               dashed line style
   -.               dash-dot line style
   :                dotted line style
   .                point marker
   ,                pixel marker
   o                circle marker
   v                triangle_down marker
   ^                triangle_up marker
   <                triangle_left marker
   >                triangle_right marker
   1                tri_down marker
   2                tri_up marker
   3                tri_left marker
   4                tri_right marker
   s                square marker
   p                pentagon marker
   *                star marker
   h                hexagon1 marker
   H                hexagon2 marker
   +                plus marker
   x                x marker
   D                diamond marker
   d                thin_diamond marker
   |                vline marker
   _                hline marker
================    ===============================

edit: with an example of marking an arbitrary subset of points, as requested in the comments:

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-np.pi, np.pi, 30)
ys = np.sin(xs)
markers_on = [12, 17, 18, 19]
plt.plot(xs, ys, '-gD', markevery=markers_on)
plt.show()

This last example, using the markevery kwarg, is possible since matplotlib 1.4, due to the merge of this feature branch. If you are stuck on an older version of matplotlib, you can still achieve the result by overlaying a scatterplot on the line plot. See the edit history for more details.


Answer 1

The code below draws a chart showing the symbol and description of every marker; I hope it helps.

import matplotlib.pyplot as plt
markers=['.',',','o','v','^','<','>','1','2','3','4','8','s','p','P','*','h','H','+','x','X','D','d','|','_']
descriptions=['point', 'pixel', 'circle', 'triangle_down', 'triangle_up','triangle_left', 'triangle_right', 'tri_down', 'tri_up', 'tri_left','tri_right', 'octagon', 'square', 'pentagon', 'plus (filled)','star', 'hexagon1', 'hexagon2', 'plus', 'x', 'x (filled)','diamond', 'thin_diamond', 'vline', 'hline']
x=[]
y=[]
for i in range(5):
    for j in range(5):
        x.append(i)
        y.append(j)
plt.figure()
for i,j,m,l in zip(x,y,markers,descriptions):
    plt.scatter(i,j,marker=m)
    plt.text(i-0.15,j+0.15,s=m+' : '+l)
plt.axis([-0.1,4.8,-0.1,4.5])
plt.tight_layout()
plt.axis('off')
plt.show()  


Answer 2

For future reference – the Line2D artist returned by plot() also has a set_markevery() method which allows you to only set markers on certain points – see https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D.set_markevery
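Here is a minimal sketch of set_markevery on an existing line, assuming the Agg backend so it runs headless. The line starts with a marker on every point, and the call then restricts the markers to the listed indices.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0, 2 * np.pi, 20)

fig, ax = plt.subplots()
(line,) = ax.plot(xs, np.cos(xs), "-o")  # markers on every point at first
line.set_markevery([0, 5, 10])           # afterwards, keep markers only at these indices
fig.canvas.draw()

print(line.get_markevery())
```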


Answer 3

A simple trick to change the marker shape, size, etc. of a particular point is to first plot it along with all the other data, then make one more plot containing only that point (or set of points, if you want to change the style of multiple points). Suppose we want to change the marker shape of the second point:

import matplotlib.pyplot as plt

x = [1,2,3,4,5]
y = [2,1,3,6,7]

plt.plot(x, y, "-o")
x0 = [2]
y0 = [1]
plt.plot(x0, y0, "s")

plt.show()

Result: [figure: plot with multiple markers]
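The same overlay trick extends to giving several points their own styles. In this sketch, the special mapping from point index to (marker, color, size) is hypothetical, and the Agg backend is assumed so the code runs headless. Each overlay is a marker-only, single-point plot drawn on top of the base line.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 6, 7]
# Hypothetical per-point styles: index -> (marker, color, size)
special = {1: ("s", "red", 12), 3: ("*", "green", 16)}

fig, ax = plt.subplots()
ax.plot(x, y, "-o", color="tab:blue")  # base line with the default markers
for i, (marker, color, size) in special.items():
    # Overlay a marker-only plot of just this point; later artists draw on top.
    ax.plot(x[i], y[i], linestyle="none", marker=marker, color=color, markersize=size)
fig.canvas.draw()

print(len(ax.lines))
```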


django MultiValueDictKeyError error, how do I deal with it

Question: django MultiValueDictKeyError error, how do I deal with it

I'm trying to save an object to my database, but it's throwing a MultiValueDictKeyError error.

The problem lies within the form: is_private is represented by a checkbox. If the checkbox is NOT selected, obviously nothing is passed. This is where the error gets thrown.

How do I properly deal with this exception, and catch it?

The line is

is_private = request.POST['is_private']

Answer 0

Use the MultiValueDict’s get method. This is also present on standard dicts and is a way to fetch a value while providing a default if it does not exist.

is_private = request.POST.get('is_private', False)

Generally,

my_var = dict.get(<key>, <default>)

Answer 1

Choose what is best for you:

1

is_private = request.POST.get('is_private', False)

If the is_private key is present in request.POST, the is_private variable will be equal to its value; if not, it will be equal to False.

2

if 'is_private' in request.POST:
    is_private = request.POST['is_private']
else:
    is_private = False

3

from django.utils.datastructures import MultiValueDictKeyError
try:
    is_private = request.POST['is_private']
except MultiValueDictKeyError:
    is_private = False

Answer 2

You get that because you’re trying to get a key from a dictionary when it’s not there. You need to test if it is in there first.

Try:

is_private = 'is_private' in request.POST

or

is_private = 'is_private' in request.POST and request.POST['is_private']

depending on the values you’re using.


Answer 3

Why didn’t you try to define is_private in your models as default=False?

from django.db import models

class Foo(models.Model):
    is_private = models.BooleanField(default=False)

Answer 4

Another thing to remember is that request.POST['keyword'] refers to the element identified by the specified html name attribute keyword.

So, if your form is:

<form action="/login/" method="POST">
  <input type="text" name="keyword" placeholder="Search query">
  <input type="number" name="results" placeholder="Number of results">
</form>

then, request.POST['keyword'] and request.POST['results'] will contain the value of the input elements keyword and results, respectively.


Answer 5

First check whether the request object has the 'is_private' key. In most cases, MultiValueDictKeyError occurs because a key is missing from the dictionary-like request object. (A dictionary is a collection of key/value pairs, also known as an "associative memory" or "associative array".)

In other words, request.GET and request.POST are dictionary-like objects containing all request parameters. This is specific to Django.

The get() method returns the value for the given key if the key is in the dictionary. If the key is not available, it returns the default value, None.

You can handle this error with:

is_private = request.POST.get('is_private', False)

回答 6

For me, this error occurred in my Django project because of the following:

  1. I inserted a new hyperlink in the home.html file in my project's templates folder, as below:

    <input type="button" value="About" onclick="location.href='{% url 'about' %}'">
  2. In views.py, I had the following definitions of count and about:

   import operator

   from django.shortcuts import render

   def count(request):
       fulltext = request.GET['fulltext']
       wordlist = fulltext.split()
       worddict = {}
       for word in wordlist:
           if word in worddict:
               worddict[word] += 1
           else:
               worddict[word] = 1
       # sort (word, count) pairs by count, highest first
       worddict = sorted(worddict.items(), key=operator.itemgetter(1), reverse=True)
       return render(request, 'count.html', {'fulltext': fulltext, 'count': len(wordlist), 'worddict': worddict})

   def about(request): 
       return render(request,"about.html")
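  3. In urls.py, I had the following URL patterns:
    from django.contrib import admin
    from django.urls import path

    from . import views

    urlpatterns = [
        path('admin/', admin.site.urls),
        path('', views.homepage, name="home"),
        path('eggs', views.eggs),
        path('count/', views.count, name="count"),
        path('about/', views.count, name="about"),  # the mistake: this should be views.about
    ]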

As can be seen in no. 3 above, in the last URL pattern I was incorrectly calling views.count when I needed to call views.about. The line fulltext = request.GET['fulltext'] in the count function of views.py (which was mistakenly called because of the wrong entry in urlpatterns) threw the MultiValueDictKeyError exception.

Then I changed the last URL pattern in urls.py to the correct one, i.e. path('about/', views.about, name="about"), and everything worked fine.

In general, a newcomer to Django can easily make the same mistake I made: wiring a URL to the wrong view function. That function may expect a different set of parameters, or pass a different set of objects to its render call, than the intended view.

Hope this helps other newcomers to Django.
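The counting logic itself can be pulled out of the view and tested without Django. The sketch below uses a hypothetical helper, count_words, which takes any dict-like object (a plain dict stands in for request.GET). It replaces the hand-written loop plus sorted() with collections.Counter.most_common(). Because it uses get(), a request that arrives without a fulltext parameter yields an empty result instead of raising:

```python
from collections import Counter


def count_words(params):
    """Hypothetical helper: the word-count logic of the count view, minus Django.

    `params` is any dict-like object (e.g. a plain dict standing in for
    request.GET). Using get() means a request routed here without a
    'fulltext' parameter yields an empty result instead of raising.
    """
    fulltext = params.get('fulltext', '')
    wordlist = fulltext.split()
    # most_common() sorts by count, highest first, like
    # sorted(..., key=operator.itemgetter(1), reverse=True) in the view.
    return Counter(wordlist).most_common()


print(count_words({'fulltext': 'spam eggs spam'}))  # [('spam', 2), ('eggs', 1)]
print(count_words({}))                              # []
```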


sqlalchemy unique across multiple columns

Question: sqlalchemy unique across multiple columns

Let’s say that I have a class that represents locations. Locations “belong” to customers. Locations are identified by a 10-character Unicode code. The “location code” should be unique among the locations for a specific customer.

The two fields below, in combination, should be unique:
customer_id = Column(Integer, ForeignKey('customers.customer_id'))
location_code = Column(Unicode(10))

So if I have two customers, customer “123” and customer “456”, they can both have a location called “main”, but neither could have two locations called main.

I can handle this in the business logic, but first I want to make sure there is no easy way to add the requirement in sqlalchemy. The unique=True option seems to work only when applied to a single field, and it would force every location in the entire table to have a unique code.


Answer 0

Extract from the documentation of the Column:

unique – When True, indicates that this column contains a unique constraint, or if index is True as well, indicates that the Index should be created with the unique flag. To specify multiple columns in the constraint/index or to specify an explicit name, use the UniqueConstraint or Index constructs explicitly.

As these belong to a Table and not to a mapped class, you declare them in the table definition or, if using declarative, in __table_args__:

from sqlalchemy import (Column, ForeignKey, Index, Integer, MetaData, Table,
                        Unicode, UniqueConstraint)
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; older: sqlalchemy.ext.declarative

meta = MetaData()
Base = declarative_base()

# version1: table definition
mytable = Table('mytable', meta,
    # ...
    Column('customer_id', Integer, ForeignKey('customers.customer_id')),
    Column('location_code', Unicode(10)),

    UniqueConstraint('customer_id', 'location_code', name='uix_1')
    )
# or the index, which will ensure uniqueness as well
Index('myindex', mytable.c.customer_id, mytable.c.location_code, unique=True)


# version2: declarative
class Location(Base):
    __tablename__ = 'locations'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.customer_id'), nullable=False)
    location_code = Column(Unicode(10), nullable=False)
    __table_args__ = (UniqueConstraint('customer_id', 'location_code', name='_customer_location_uc'),
                     )
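You can see what the composite constraint enforces with a sketch that uses only the standard-library sqlite3 module. The CREATE TABLE below is hand-written and is an assumption, not the exact DDL SQLAlchemy emits. The foreign key to customers is left out to keep it self-contained, and the UNIQUE (customer_id, location_code) clause plays the role of the UniqueConstraint:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    """
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        location_code VARCHAR(10) NOT NULL,
        CONSTRAINT _customer_location_uc UNIQUE (customer_id, location_code)
    )
    """
)


def add_location(customer_id, code):
    """Insert a row; return True on success, False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'INSERT INTO locations (customer_id, location_code) VALUES (?, ?)',
                (customer_id, code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_location(123, 'main'))  # True
print(add_location(456, 'main'))  # True: same code, different customer
print(add_location(123, 'main'))  # False: duplicate (customer_id, location_code)
```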

Answer 1

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Location(db.Model):
    __tablename__ = 'locations'
    __table_args__ = (
        # this can be db.PrimaryKeyConstraint if you want it to be a primary key
        db.UniqueConstraint('customer_id', 'location_code'),
    )
    id = db.Column(db.Integer, primary_key=True)  # surrogate key; drop it if you use db.PrimaryKeyConstraint above
    customer_id = db.Column(db.Integer, db.ForeignKey('customers.customer_id'))
    location_code = db.Column(db.Unicode(10))

Python argparse: default value or specified value

Question: Python argparse: default value or specified value

I would like to have an optional argument that defaults to a value if only the flag is present with no value specified, but stores a user-specified value instead of the default if the user specifies one. Is there already an action available for this?

An example:

python script.py --example
# args.example would equal a default value of 1
python script.py --example 2
# args.example would equal the specified value, 2

I can create an action, but wanted to see if there was an existing way to do this.


Answer 0

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--example', nargs='?', const=1, type=int)
args = parser.parse_args()
print(args)

% test.py 
Namespace(example=None)
% test.py --example
Namespace(example=1)
% test.py --example 2
Namespace(example=2)

  • nargs='?' means 0-or-1 arguments
  • const=1 sets the value stored when --example is given with 0 arguments
  • type=int converts the argument to int

If you want test.py to set example to 1 even if no --example is specified, then include default=1. That is, with

parser.add_argument('--example', nargs='?', const=1, type=int, default=1)

then

% test.py 
Namespace(example=1)
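You can check the same three cases without a shell by passing argument lists straight to parse_args(). In this sketch, const and default get different values (const=1, default=0), chosen only so the cases can be told apart:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='?': the flag takes zero or one value.
# const=1:   value stored when the flag appears without a value.
# default=0: value stored when the flag is absent altogether.
parser.add_argument('--example', nargs='?', const=1, default=0, type=int)

print(parser.parse_args([]).example)                   # 0 (default)
print(parser.parse_args(['--example']).example)        # 1 (const)
print(parser.parse_args(['--example', '2']).example)   # 2 (converted by type=int)
```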

Answer 1

Actually, you only need to use the default argument to add_argument as in this test.py script:

import argparse

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--example', default=1)
    args = parser.parse_args()
    print(args.example)

test.py --example
% 1
test.py --example 2
% 2

Details are here.
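For comparison, here is a quick check of the default-only version, again passing argument lists to parse_args() directly. Without nargs='?', argparse still expects a value after --example. Without type=int, the value is stored as a string. default applies only when the flag is absent:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--example', default=1)

print(parser.parse_args([]).example)                         # 1 (the default)
print(repr(parser.parse_args(['--example', '2']).example))   # '2', a str

try:
    parser.parse_args(['--example'])  # no value after the flag
except SystemExit:
    # argparse prints a usage error to stderr and exits
    print('argparse rejected --example without a value')
```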


Answer 2

The difference between:

parser.add_argument("--debug", help="Debug", nargs='?', type=int, const=1, default=7)

and

parser.add_argument("--debug", help="Debug", nargs='?', type=int, const=1)

is thus:

myscript.py => debug is 7 (from default) in the first case and None in the second

myscript.py --debug => debug is 1 in each case

myscript.py --debug 2 => debug is 2 in each case


What's the difference between lists enclosed by square brackets and parentheses in Python?

Question: What's the difference between lists enclosed by square brackets and parentheses in Python?

>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2

Are they both valid? Is one preferred for some reason?


Answer 0

Square brackets are lists while parentheses are tuples.

A list is mutable, meaning you can change its contents:

>>> x = [1,2]
>>> x.append(3)
>>> x
[1, 2, 3]

while tuples are not:

>>> x = (1,2)
>>> x
(1, 2)
>>> x.append(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

The other main difference is that a tuple is hashable (provided all of its items are hashable), meaning that you can use it as a key to a dictionary, among other things. For example:

>>> x = (1,2)
>>> y = [1,2]
>>> z = {}
>>> z[x] = 3
>>> z
{(1, 2): 3}
>>> z[y] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Note that, as many people have pointed out, you can add tuples together. For example:

>>> x = (1,2)
>>> x += (3,)
>>> x
(1, 2, 3)

However, this does not mean tuples are mutable. In the example above, a new tuple is constructed by adding together the two tuples as arguments. The original tuple is not modified. To demonstrate this, consider the following:

>>> x = (1,2)
>>> y = x
>>> x += (3,)
>>> x
(1, 2, 3)
>>> y
(1, 2)

Whereas, if you were to construct this same example with a list, y would also be updated:

>>> x = [1, 2]
>>> y = x
>>> x += [3]
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]
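Hashability has one caveat: a tuple is hashable only if all of its items are. A short sketch (the names are illustrative):

```python
# Tuples of hashable items can be dict keys or set members.
coords = {(0, 0): 'origin', (1, 2): 'point'}
print(coords[(1, 2)])  # point

# A tuple is only hashable if everything inside it is hashable:
# a tuple holding a list cannot be used as a key.
try:
    hash((1, [2, 3]))
except TypeError as exc:
    print(exc)  # the message names the unhashable list

# The tuple itself is immutable, but a mutable item inside it is not.
t = (1, [2, 3])
t[1].append(4)
print(t)  # (1, [2, 3, 4])
```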

Answer 1

One interesting difference:

lst = [1]
print(lst)              # prints [1]
print(type(lst))        # prints <class 'list'>

notATuple = (1)
print(notATuple)        # prints 1
print(type(notATuple))  # prints <class 'int'>, instead of tuple (expected)

A comma must be included in a tuple even if it contains only a single value. e.g. (1,) instead of (1).
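Here are a few more spellings as a quick sketch. It is the comma, not the parentheses, that makes a tuple. The empty tuple () is the one exception:

```python
not_a_tuple = (1)     # just the int 1 in parentheses
one_tuple = (1,)      # the trailing comma makes it a tuple
also_a_tuple = 1,     # parentheses are optional here
empty = ()            # the empty tuple is the one case that needs no comma

print(type(not_a_tuple).__name__)   # int
print(type(one_tuple).__name__)     # tuple
print(also_a_tuple == one_tuple)    # True
print(len(empty))                   # 0
```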


Answer 2

They are not lists, they are a list and a tuple. You can read about tuples in the Python tutorial. While you can mutate lists, this is not possible with tuples.

In [1]: x = (1, 2)

In [2]: x[0] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/user/<ipython console> in <module>()

TypeError: 'tuple' object does not support item assignment

Answer 3

Another way brackets and parentheses differ is that square brackets can describe a list comprehension, e.g. [x for x in y]

Whereas the corresponding parenthetic syntax specifies a generator expression, not a tuple: (x for x in y)

You can get the equivalent of a tuple comprehension using: tuple(x for x in y)

See: Why is there no tuple comprehension in Python?
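Here is a short sketch contrasting the three forms (the values are arbitrary). Note that a generator expression is lazy and can be consumed only once:

```python
y = [1, 2, 3]

as_list = [x * 2 for x in y]        # list comprehension
as_gen = (x * 2 for x in y)         # generator expression, evaluated lazily
as_tuple = tuple(x * 2 for x in y)  # the closest thing to a "tuple comprehension"

print(type(as_list).__name__, type(as_gen).__name__, type(as_tuple).__name__)
# list generator tuple

# A generator can only be consumed once.
print(list(as_gen))  # [2, 4, 6]
print(list(as_gen))  # []
```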


Answer 4

The first is a list, the second is a tuple. Lists are mutable, tuples are not.

Take a look at the Data Structures section of the tutorial, and the Sequence Types section of the documentation.


Answer 5

Comma-separated items enclosed by ( and ) are tuples, those enclosed by [ and ] are lists.


Accessing class variables from a list comprehension in the class definition

Question: Accessing class variables from a list comprehension in the class definition

How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:

class Foo:
    x = 5
    y = [x for i in range(1)]

Python 3.2 gives the error:

NameError: global name 'x' is not defined

Trying Foo.x doesn’t work either. Any ideas on how to do this in Python 3?

A slightly more complicated motivating example:

from collections import namedtuple
class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for args in [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ]]

In this example, apply() would have been a decent workaround, but it has sadly been removed from Python 3.


Answer 0

Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.

The why; or, the official word on this

In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see Python list comprehension rebind names even after scope of comprehension. Is this right?). That’s great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.

This is documented in PEP 227:

Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions.

and in the class compound statement documentation:

The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.

Emphasis mine; the execution frame is the temporary scope.

Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well would lead to undefined behaviour. For example, what would happen if a class method referred to x as a nested scope variable, then manipulated Foo.x as well? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.

Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
     a = 42
     b = list(a + i for i in range(10))

So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.

# Same error, in Python 2 or 3
y = {x: x for i in range(1)}
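Here is a small runnable check of the summary above, assuming Python 3 and no global named x. The try/except blocks inside the class body only keep the script running so the NameError messages can be inspected. make_list is a hypothetical function showing that a function's local variables, unlike class variables, are visible to a nested comprehension:

```python
class Foo:
    x = 5
    try:
        y = [x for i in range(1)]  # x is not visible inside the comprehension
    except NameError as exc:
        list_error = str(exc)
    try:
        z = {x: x for i in range(1)}  # same for dict (and set) comprehensions
    except NameError as exc:
        dict_error = str(exc)


def make_list():
    x = 5
    # A function scope, unlike a class scope, is visible to the comprehension.
    return [x for i in range(1)]


print(Foo.list_error)  # the NameError message mentions 'x'
print(make_list())     # [5]
```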

The (small) exception; or, why one part may still work

There’s one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it’s the range(1):

y = [x for i in range(1)]
#               ^^^^^^^^

Thus, using x in that expression would not throw an error:

# Runs fine
y = [i for i in range(x)]

This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension’s scope:

# NameError
y = [i for i in range(1) for j in range(x)]
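Both cases can be checked in a real class body. This sketch assumes Python 3 and no global named x:

```python
class Foo:
    x = 5
    # The outermost iterable, range(x), is evaluated in the class scope: fine.
    ok = [i for i in range(x)]
    try:
        # The inner iterable range(x) is evaluated inside the comprehension's
        # own scope, where the class variable x is not visible.
        bad = [i for i in range(1) for j in range(x)]
    except NameError:
        bad = None


print(Foo.ok)   # [0, 1, 2, 3, 4]
print(Foo.bad)  # None
```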

This design decision was made so that, if creating the outermost iterable of a generator expression throws an error, or if the outermost iterable turns out not to be iterable, the error is thrown when the genexp is created rather than when it is first iterated. Comprehensions share this behavior for consistency.
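Here is a sketch of that timing difference using a generator expression. A non-iterable outermost iterable fails at creation. An error in the element expression only surfaces once the generator is advanced:

```python
failed_on_iteration = False

# The outermost iterable is evaluated (and iter() is called on it) right away,
# so a non-iterable is reported when the generator expression is created.
try:
    gen = (i for i in 42)  # 42 is not iterable
except TypeError:
    created_with_error = True
else:
    created_with_error = False

# The element expression only runs on iteration.
lazy = (1 / i for i in [0])  # created without complaint
try:
    next(lazy)
except ZeroDivisionError:
    failed_on_iteration = True

print(created_with_error, failed_on_iteration)  # True True
```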

Looking under the hood; or, way more detail than you ever wanted

You can see this all in action using the dis module. I’m using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.

To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:

>>> import dis
>>> def foo():
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo)
  2           0 LOAD_BUILD_CLASS     
              1 LOAD_CONST               1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>) 
              4 LOAD_CONST               2 ('Foo') 
              7 MAKE_FUNCTION            0 
             10 LOAD_CONST               2 ('Foo') 
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair) 
             16 STORE_FAST               0 (Foo) 

  5          19 LOAD_FAST                0 (Foo) 
             22 RETURN_VALUE         

The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.

The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the Python interpreter knows how to access those given the bytecodes presented.

The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.
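You can also inspect the nested code objects directly. This is a CPython-specific sketch: co_consts, co_name, co_names and co_freevars are implementation details. It uses a generator expression rather than a list comprehension because, since CPython 3.12 (PEP 709), list comprehensions may be inlined instead of getting a code object of their own. The function foo is compiled but never called:

```python
import types


def foo():
    class Foo:
        x = 5
        # A generator expression keeps its own code object in every CPython 3.x.
        y = list(x for i in range(1))
    return Foo  # foo() is never called here; we only inspect compiled code


def nested_code(code):
    """Return the code objects stored among a code object's constants."""
    return [c for c in code.co_consts if isinstance(c, types.CodeType)]


class_body = next(c for c in nested_code(foo.__code__) if c.co_name == 'Foo')
genexp = next(c for c in nested_code(class_body) if c.co_name == '<genexpr>')

print(class_body.co_name)         # Foo
print('x' in genexp.co_names)     # True: x is looked up as a global
print('x' in genexp.co_freevars)  # False: the class scope is not a closure
```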

Let’s inspect that code object that creates the class body itself; code objects have a co_consts structure:

>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (__locals__) 
              3 STORE_LOCALS         
              4 LOAD_NAME                0 (__name__) 
              7 STORE_NAME               1 (__module__) 
             10 LOAD_CONST               0 ('foo.<locals>.Foo') 
             13 STORE_NAME               2 (__qualname__) 

  3          16 LOAD_CONST               1 (5) 
             19 STORE_NAME               3 (x) 

  4          22 LOAD_CONST               2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>) 
             25 LOAD_CONST               3 ('foo.<locals>.Foo.<listcomp>') 
             28 MAKE_FUNCTION            0 
             31 LOAD_NAME                4 (range) 
             34 LOAD_CONST               4 (1) 
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             40 GET_ITER             
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             44 STORE_NAME               5 (y) 
             47 LOAD_CONST               5 (None) 
             50 RETURN_VALUE         

The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn’t work because x isn’t defined as a global). Note that after storing 5 in x, it loads another code object; that’s the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.

From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.

Python 2.x uses inline bytecode there instead; here is the output from Python 2.7:

  2           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  3           6 LOAD_CONST               0 (5)
              9 STORE_NAME               2 (x)

  4          12 BUILD_LIST               0
             15 LOAD_NAME                3 (range)
             18 LOAD_CONST               1 (1)
             21 CALL_FUNCTION            1
             24 GET_ITER            
        >>   25 FOR_ITER                12 (to 40)
             28 STORE_NAME               4 (i)
             31 LOAD_NAME                2 (x)
             34 LIST_APPEND              2
             37 JUMP_ABSOLUTE           25
        >>   40 STORE_NAME               5 (y)
             43 LOAD_LOCALS         
             44 RETURN_VALUE        

No code object is loaded; instead, a FOR_ITER loop is run inline. So in Python 3.x, the list comprehension was given a proper code object of its own, which means it has its own scope.

However, the comprehension was compiled together with the rest of the Python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any variables referenced in a list comprehension must be looked up in the scopes surrounding the class definition, recursively. If the compiler does not find the variable there, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:

>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
  4           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_GLOBAL              0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.

Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):

>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
... 
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
  5           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                12 (to 21) 
              9 STORE_FAST               1 (i) 
             12 LOAD_DEREF               0 (x) 
             15 LIST_APPEND              2 
             18 JUMP_ABSOLUTE            6 
        >>   21 RETURN_VALUE         

The LOAD_DEREF will indirectly load x from the code object cell objects:

>>> foo.__code__.co_cellvars               # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars  # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars  # Refers to `x` in foo
('x',)
>>> foo().y
[2]

The actual referencing looks the value up from the current frame data structures, which were initialized from a function object’s .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function’s closure. To see a closure in action, we’d have to inspect a nested function instead:

>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
... 
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5

So, to summarize:

  • List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
  • Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
  • When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn’t include the class body as a scope, the temporary function namespace is not considered.
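The bytecode above differs between Python versions, so here is a minimal runtime check of the same rules instead, assuming Python 3 (make_class, class_level, from_iterable and from_body are hypothetical names). The outermost iterable is evaluated in the class body, while the comprehension body skips the class namespace:

```python
def make_class():
    """Build a class whose body uses a class-level name in two ways."""

    class Foo:
        class_level = 3

        # The outermost iterable, range(class_level), is evaluated in the
        # class body itself, so the class-level name is visible here.
        from_iterable = [i for i in range(class_level)]

        try:
            # Inside the comprehension body the class namespace is skipped:
            # class_level is looked up as a global and is not found.
            from_body = [class_level for _ in range(1)]
        except NameError:
            from_body = None

    return Foo


Foo = make_class()
print(Foo.from_iterable)  # [0, 1, 2]
print(Foo.from_body)      # None
```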

A workaround; or, what to do about it

If you create an explicit scope for the x variable, as a function does, you can use class-scope variables in a list comprehension:

>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
... 
>>> Foo.y
[5]

The ‘temporary’ y function can be called directly; when we do, we replace it with its return value. Its scope is considered when resolving x:

>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)

Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.

The best work-around is to just use __init__ to create an instance variable instead:

class Foo:
    x = 5
    def __init__(self):
        self.y = [self.x for i in range(1)]

and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don’t store the generated class at all), or use a global:

from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
    db = [State(*args) for args in [
       ('Alabama', 'Montgomery'),
       ('Alaska', 'Juneau'),
       # ...
    ]]

Answer 1

In my opinion it is a flaw in Python 3. I hope they change it.

Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):

class A:
    x = 4
    y = [x+i for i in range(1)]

NOTE: simply scoping it with A.x would not solve it

New Way (works in 3+):

class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()

Because the syntax is so ugly, I typically just initialize all my class variables in the constructor.
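The NOTE about A.x can be checked with a small sketch (assuming Python 3; y and z are illustrative names). The name A is only bound after the whole class body has run, so a comprehension that refers to A.x fails as well, while the default-argument lambda works:

```python
class A:
    x = 4

    try:
        # The name A is only bound once the whole class body has run, so
        # this global lookup of A fails while the body is still executing.
        y = [A.x + i for i in range(1)]
    except NameError:
        y = None

    # Default-argument trick from above: x is evaluated in the class body
    # and passed into the lambda's own scope as a parameter.
    z = (lambda x=x: [x + i for i in range(1)])()


print(A.y)  # None
print(A.z)  # [4]
```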


Answer 2

The accepted answer provides excellent information, but there appear to be a few other wrinkles here — differences between list comprehension and generator expressions. A demo that I played around with:

class Foo:

    # A class-level variable.
    X = 10

    # I can use that variable to define another class-level variable.
    Y = sum((X, X))

    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None

    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None

    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))

    # Then use that function.
    Z3 = compute_z3(X)

    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None

print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)
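One way to make the Z5 case work is a sketch that relies on the same rule that lets Z4 succeed: route XS through the outermost iterable. This assumes Python 3, and the loop name xs is an illustrative choice:

```python
class Foo:
    XS = [15, 15, 15, 15]

    # The outermost iterable (XS,) is evaluated in the class scope, exactly
    # like XS in Z4; the generator then only uses its own local name xs.
    Z5 = sum(xs[i] for xs in (XS,) for i in range(len(xs)))


print(Foo.Z5)  # 60
```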

Answer 3

This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.

To illustrate why this is a bug, let’s return to the original example. This fails:

class Foo:
    x = 5
    y = [x for i in range(1)]

But this works:

def Foo():
    x = 5
    y = [x for i in range(1)]

The limitation is stated at the end of this section in the reference guide.
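To make the "equivalent to for loops" comparison concrete, here is a sketch assuming Python 3 (LoopFoo, CompFoo and comprehension_foo are hypothetical names). It shows that the explicit loop in a class body sees x, while the comprehension does not:

```python
class LoopFoo:
    x = 5
    y = []
    # A plain for loop runs directly in the class namespace, so x is visible.
    for i in range(1):
        y.append(x)
    del i  # keep the loop variable from lingering as a class attribute


def comprehension_foo():
    """Return CompFoo.y, or None if the class body raises NameError."""
    try:
        class CompFoo:
            x = 5
            y = [x for i in range(1)]  # x is looked up as a global here
    except NameError:
        return None
    return CompFoo.y


print(LoopFoo.y)            # [5]
print(comprehension_foo())  # None
```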


Answer 4

Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension’s scope:

import itertools as it

class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]

One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:

class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]

For the specific example of the OP:

from collections import namedtuple
import itertools as it

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]

String formatting in Python with multiple arguments (e.g. '%s … %s')

Question: String formatting in Python with multiple arguments (e.g. '%s … %s')

I have a string that looks like '%s in %s' and I want to know how to separate the arguments so that they fill two different %s placeholders. Coming from Java, I came up with this:

'%s in %s' % unicode(self.author),  unicode(self.publication)

But this doesn’t work so how does it look in Python?


Answer 0

Mark Cidade’s answer is right – you need to supply a tuple.

However from Python 2.6 onwards you can use format instead of %:

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

Usage of % for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.


Answer 1

If you’re using more than one argument it has to be in a tuple (note the extra parentheses):

'%s in %s' % (unicode(self.author),  unicode(self.publication))
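Why the parentheses matter can be seen in a small sketch. It is written for Python 3, so plain strings stand in for the unicode(...) calls, and author and publication are hypothetical values. Without parentheses, % binds more tightly than the comma:

```python
author = "Eric"              # hypothetical stand-in for unicode(self.author)
publication = "The Journal"  # hypothetical stand-in for unicode(self.publication)

try:
    # % binds more tightly than the comma, so this line is parsed as
    # ('%s in %s' % author), publication: two %s but only one value.
    broken = '%s in %s' % author, publication
except TypeError as exc:
    error = str(exc)

# With parentheses, the right operand of % is a single tuple of two values.
fixed = '%s in %s' % (author, publication)

print(error)  # not enough arguments for format string
print(fixed)  # Eric in The Journal
```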

As EOL points out, the unicode() function usually assumes ASCII encoding as a default, so if you have non-ASCII characters, it’s safer to explicitly pass the encoding:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

And as of Python 3.0, the str.format() syntax is preferred instead (it is available from Python 2.6 onwards; note that unicode() itself exists only in Python 2):

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

Answer 2

On a tuple/mapping object for multiple argument format

The following is excerpt from the documentation:

Given format % values, % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to using sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).
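The "single non-tuple object" rule has a practical consequence when the one value you want to print is itself a tuple. This sketch assumes Python 3, and point is a hypothetical value:

```python
point = (1, 2)  # hypothetical value that happens to be a tuple

try:
    # A bare tuple is taken as the whole argument list: two values, one %s.
    '%s' % point
except TypeError as exc:
    error = str(exc)

wrapped = '%s' % (point,)        # one-element tuple: the tuple is a single value
via_format = '{}'.format(point)  # str.format has no such special case

print(error)       # not all arguments converted during string formatting
print(wrapped)     # (1, 2)
print(via_format)  # (1, 2)
```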

On str.format instead of %

A newer alternative to the % operator is str.format. Here’s an excerpt from the documentation:

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to % formatting.

Examples

Here are some usage examples:

>>> '%s for %s' % ("tit", "tat")
'tit for tat'

>>> '{} and {}'.format("chicken", "waffles")
'chicken and waffles'

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
'Bond, James Bond'

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
'Bond, James Bond'

Answer 3

You just need to put the values in parentheses, so that they form a tuple:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

Here, the first %s is replaced by unicode(self.author), and the second %s by unicode(self.publication).

Note: you should favor str.format() string formatting over the % notation.


Answer 4

There is a significant problem with some of the answers posted so far: unicode() decodes from the default encoding, which is often ASCII; in fact, unicode() tries to make “sense” of the bytes it is given by converting them into characters. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

gives:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The failure comes from the fact that author does not contain only ASCII bytes (i.e. with values in [0; 127]), and unicode() decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(or without the initial u, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the author and publication fields be Unicode strings, instead of decoding them during formatting.
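In Python 3, unicode() no longer exists and decoding is done on bytes objects, but the same rule applies: name the encoding explicitly. A minimal sketch of the Python 3 analogue, where raw and the publication name 'Le Monde' are hypothetical values:

```python
raw = 'éric'.encode('utf-8')  # hypothetical field value as UTF-8 bytes

try:
    # Decoding with the ASCII codec fails on the first non-ASCII byte (0xc3).
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    failed_at = exc.start

text = raw.decode('utf-8')  # explicit encoding, as recommended above
same = str(raw, 'utf-8')    # equivalent spelling with the str constructor

print(failed_at)                              # 0
print('{0} in {1}'.format(text, 'Le Monde'))  # éric in Le Monde
```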


Answer 5

In Python 2 you can also do this:

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python 2.6 onwards supports .format():

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)
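Named placeholders help with internationalisation because each translated template can put the fields in its own order, while the code that fills it stays the same. A sketch with a hypothetical message catalogue (TEMPLATES, describe, and the 'reordered' entry are illustrative, not a real translation setup):

```python
# Hypothetical message catalogue: the second entry stands in for a translation
# whose word order differs from English.
TEMPLATES = {
    'en': '%(author)s in %(publication)s',
    'reordered': 'in %(publication)s, by %(author)s',
}


def describe(lang, author, publication):
    """Fill a template by name, so each translation can order fields freely."""
    return TEMPLATES[lang] % {'author': author, 'publication': publication}


print(describe('en', 'Ann', 'Nature'))         # Ann in Nature
print(describe('reordered', 'Ann', 'Nature'))  # in Nature, by Ann
```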

Answer 6

You could also use it in a clean and simple way (but wrong! because you should use format, as Mark Byers said):

print 'This is my %s formatted with %d arguments' % ('string', 2)

Answer 7

For completeness, Python 3.6 introduced f-strings in PEP 498. These strings make it possible to

embed expressions inside string literals, using a minimal syntax.

That would mean that for your example you could also use:

f'{self.author} in {self.publication}'
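In context, the f-string has to sit somewhere that self is defined, such as a method. A sketch with a hypothetical Article class standing in for the question’s object (requires Python 3.6+):

```python
class Article:
    """Hypothetical stand-in for the object with author/publication fields."""

    def __init__(self, author, publication):
        self.author = author
        self.publication = publication

    def describe(self):
        # The expressions inside the braces are evaluated where the f-string is.
        return f'{self.author} in {self.publication}'


print(Article('Ann', 'Nature').describe())  # Ann in Nature
```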

Apply vs transform on a group object

Question: Apply vs transform on a group object

Consider the following dataframe:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922

The following commands work:

> df.groupby('A').apply(lambda x: (x['C'] - x['D']))
> df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

but none of the following work:

> df.groupby('A').transform(lambda x: (x['C'] - x['D']))
ValueError: could not broadcast input array from shape (5) into shape (5,3)

> df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())
 TypeError: cannot concatenate a non-NDFrame object

Why? The example on the documentation seems to suggest that calling transform on a group allows one to do row-wise operation processing:

# Note that the following suggests row-wise operation (x.mean is the column mean)
zscore = lambda x: (x - x.mean()) / x.std()
transformed = ts.groupby(key).transform(zscore)

In other words, I thought that transform is essentially a specific type of apply (the one that does not aggregate). Where am I wrong?

For reference, below is the construction of the original dataframe above:

import pandas as pd
from numpy.random import randn

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C' : randn(8), 'D' : randn(8)})

Answer 0

Two major differences between apply and transform

There are two major differences between the transform and apply groupby methods.

  • Input:
    • apply implicitly passes all the columns for each group as a DataFrame to the custom function.
    • while transform passes each column for each group individually as a Series to the custom function.
  • Output:
    • The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list).
    • The custom function passed to transform must return a sequence (a one dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.
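Applied to the question’s computation, one approach is a hedged sketch with small made-up numbers instead of randn (the column name diff_mean is hypothetical). Compute the row-wise difference first, then let transform broadcast each group’s mean. Use apply when the function genuinely needs several columns of a group at once:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'C': [1.0, 2.0, 3.0, 4.0],
                   'D': [0.5, 0.5, 1.0, 1.0]})

# The row-wise difference needs both columns, so compute it outside transform.
diff = df['C'] - df['D']

# transform works on that single Series per group and broadcasts the group
# mean back to every row of the group (same length as df).
df['diff_mean'] = diff.groupby(df['A']).transform('mean')

# apply receives each group (restricted here to C and D) as a DataFrame,
# so it can combine columns; it returns one value per group.
per_group = df.groupby('A')[['C', 'D']].apply(lambda g: (g['C'] - g['D']).mean())

print(df['diff_mean'].tolist())  # [1.25, 2.25, 1.25, 2.25]
print(per_group.to_dict())       # {'bar': 2.25, 'foo': 1.25}
```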

Inspecting the custom function

It can help quite a bit to inspect the input to your custom function passed to apply or transform.

Examples

Let’s create some sample data and inspect the groups so that you can see what I am talking about:

import pandas as pd
import numpy as np
df = pd.DataFrame({'State':['Texas', 'Texas', 'Florida', 'Florida'], 
                   'a':[4,5,1,3], 'b':[6,10,3,11]})

     State  a   b
0    Texas  4   6
1    Texas  5  10
2  Florida  1   3
3  Florida  3  11

Let’s create a simple custom function that prints out the type of the implicitly passed object and then raises an error so that execution stops.

def inspect(x):
    print(type(x))
    raise

Now let’s pass this function to both the groupby apply and transform methods to see what object is passed to it:

df.groupby('State').apply(inspect)

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
RuntimeError

As you can see, a DataFrame is passed into the inspect function. You might be wondering why the type, DataFrame, got printed out twice. In the pandas version used here, pandas runs the first group twice to determine whether there is a fast way to complete the computation. This is a minor detail that you shouldn’t worry about.

Now, let’s do the same thing with transform

df.groupby('State').transform(inspect)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
RuntimeError

It is passed a Series – a totally different Pandas object.

So, transform is only allowed to work with a single Series at a time; it cannot act on two columns at the same time. So, if we try to subtract column b from column a inside our custom function, we get an error with transform. See below:

def subtract_two(x):
    return x['a'] - x['b']

df.groupby('State').transform(subtract_two)
KeyError: ('a', 'occurred at index a')

We get a KeyError as pandas is attempting to find the Series index a which does not exist. You can complete this operation with apply as it has the entire DataFrame:

df.groupby('State').apply(subtract_two)

State     
Florida  2   -2
         3   -8
Texas    0   -2
         1   -5
dtype: int64

The output is a Series and a little confusing as the original index is kept, but we have access to all columns.


Displaying the passed pandas object

It can help even more to display the entire pandas object within the custom function, so you can see exactly what you are operating on. You can use print statements, but I like to use the display function from the IPython.display module so that the DataFrames get nicely rendered as HTML in a Jupyter notebook:

from IPython.display import display
def subtract_two(x):
    display(x)
    return x['a'] - x['b']

Transform must return a single dimensional sequence the same size as the group

The other difference is that transform must return a single dimensional sequence the same size as the group. In this particular instance, each group has two rows, so transform must return a sequence of two rows. If it does not then an error is raised:

def return_three(x):
    return np.array([1, 2, 3])

df.groupby('State').transform(return_three)
ValueError: transform must return a scalar value for each group

The error message is not really descriptive of the problem. You must return a sequence the same length as the group. So, a function like this would work:

def rand_group_len(x):
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)

          a         b
0  0.962070  0.151440
1  0.440956  0.782176
2  0.642218  0.483257
3  0.056047  0.238208

Returning a single scalar object also works for transform

If you return just a single scalar from your custom function, then transform will use it for each of the rows in the group:

def group_sum(x):
    return x.sum()

df.groupby('State').transform(group_sum)

   a   b
0  9  16
1  9  16
2  4  14
3  4  14

Answer 1

Since I was similarly confused by .transform vs. .apply, I went looking and found a few answers that shed some light on the issue. This answer, for example, was very helpful.

My takeaway so far is that .transform works on (or deals with) each Series (column) in isolation from the others. This means that in your last two calls:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())

You asked .transform to take values from two columns, and it does not actually 'see' both of them at the same time (so to speak). transform looks at the DataFrame columns one by one and returns a Series (or a group of Series) 'made' of scalars that are repeated len(input_column) times.

So the scalar that .transform uses to build the Series is the result of some reduction function applied to an input Series (and only to ONE Series/column at a time).
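If the per-group value really does depend on two columns, one workaround that fits this one-column-at-a-time view is to build the combined Series first and then transform that. Below is a minimal sketch. The column names A, C and D follow the question, but the numbers are made up:

```python
import pandas as pd

# Hypothetical data; only the column names follow the question
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [0.5, 0.5, 1.0, 1.0],
})

# Compute C - D row by row first, so transform only ever sees ONE Series
diff = df["C"] - df["D"]

# Group that Series by column A and broadcast each group's mean back to its rows
df["mean_diff"] = diff.groupby(df["A"]).transform("mean")
print(df)
```
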

Consider this example (on your dataframe):

zscore = lambda x: (x - x.mean()) / x.std() # Note that it does not reference anything outside of 'x' and for transform 'x' is one column.
df.groupby('A').transform(zscore)

will yield:

       C      D
0  0.989  0.128
1 -0.478  0.489
2  0.889 -0.589
3 -0.671 -1.150
4  0.034 -0.285
5  1.149  0.662
6 -1.404 -0.907
7 -0.509  1.653

This is exactly the same as using it on only one column at a time:

df.groupby('A')['C'].transform(zscore)

yielding:

0    0.989
1   -0.478
2    0.889
3   -0.671
4    0.034
5    1.149
6   -1.404
7   -0.509
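The column-by-column equivalence can be checked directly. This sketch uses hypothetical numbers, so the z-scores will not match the tables above. It only shows that column C of the DataFrame result equals the single-column result:

```python
import pandas as pd

# Hypothetical data with two numeric columns
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 4.0, 3.0, 7.0, 5.0],
    "D": [0.0, 1.0, 3.0, 2.0, 9.0, 4.0],
})

def zscore(x):
    # Uses only x itself, so it works whether x is one column or many
    return (x - x.mean()) / x.std()

whole = df.groupby("A").transform(zscore)        # C and D together
only_c = df.groupby("A")["C"].transform(zscore)  # C on its own

# Largest absolute difference between the two results for column C
print((whole["C"] - only_c).abs().max())
```
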

Note that .apply in the last example (df.groupby('A')['C'].apply(zscore)) would work in exactly the same way, but it would fail if you tried using it on a dataframe:

df.groupby('A').apply(zscore)

gives error:

ValueError: operands could not be broadcast together with shapes (6,) (2,)
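By contrast, .apply hands the function the whole sub-DataFrame of each group. A function that reads two columns therefore works there, as long as you want one result per group rather than one per row. Here is a small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data; only the column names follow the question
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [0.5, 0.5, 1.0, 1.0],
})

# g is the full sub-DataFrame of one group, so g["C"] and g["D"] both exist.
# Selecting [["C", "D"]] keeps the grouping column out of g.
per_group = df.groupby("A")[["C", "D"]].apply(lambda g: (g["C"] - g["D"]).mean())
print(per_group)  # one value per group, indexed by A
```
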

So where else is .transform useful? The simplest case is assigning the results of a reduction function back to the original DataFrame.

df['sum_C'] = df.groupby('A')['C'].transform('sum')
df.sort_values('A')  # to clearly see that the scalar (the sum) applies to the whole group

yielding:

     A      B      C      D  sum_C
1  bar    one  1.998  0.593  3.973
3  bar  three  1.287 -0.639  3.973
5  bar    two  0.687 -1.027  3.973
4  foo    two  0.205  1.274  4.373
2  foo    two  0.128  0.924  4.373
6  foo    one  2.113 -0.516  4.373
7  foo  three  0.657 -1.179  4.373
0  foo    one  1.270  0.201  4.373
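The following sketch rebuilds a frame from the rounded values printed in the table above. This is a hypothetical reconstruction, so sums can differ from the printed ones in the last digit. It checks that every row of a group gets the same `sum_C`. Note that `sort_values` replaces the `DataFrame.sort` method, which was removed in pandas 0.20:

```python
import pandas as pd

# Rounded values copied from the table above (hypothetical reconstruction)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [1.270, 1.998, 0.128, 1.287, 0.205, 0.687, 2.113, 0.657],
    "D": [0.201, 0.593, 0.924, -0.639, 1.274, -1.027, -0.516, -1.179],
})

# transform("sum") returns one value per ROW, aligned with df.index
df["sum_C"] = df.groupby("A")["C"].transform("sum")

# A stable sort keeps the original row order within each group
print(df.sort_values("A", kind="stable"))
```
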

Trying the same with .apply would give NaNs in sum_C. This is because .apply returns a reduced Series indexed by the group keys (bar, foo). That index does not align with the original row index, so the values cannot be broadcast back:

df.groupby('A')['C'].apply(sum)

giving:

A
bar    3.973
foo    4.373
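A small sketch of that alignment problem, on hypothetical data. Assigning the per-group result directly produces NaN. Mapping each row's key onto it, or using transform, gives per-row values:

```python
import pandas as pd

# Hypothetical small frame; only the column names follow the answer
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1.0, 2.0, 3.0, 4.0]})

per_group = df.groupby("A")["C"].apply(lambda s: s.sum())  # index: 'bar', 'foo'

df["via_apply"] = per_group              # aligns on row labels 0..3, so all NaN
df["via_map"] = df["A"].map(per_group)   # look up each row's group key explicitly
df["via_transform"] = df.groupby("A")["C"].transform("sum")
print(df)
```
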

There are also cases when .transform is used to filter the data:

df[df.groupby(['B'])['D'].transform('sum') < -1]

     A      B      C      D
3  bar  three  1.287 -0.639
7  foo  three  0.657 -1.179
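For comparison, pandas also has `GroupBy.filter`, which keeps or drops whole groups in one call. The sketch below reuses the rounded values from the earlier table (a hypothetical reconstruction). It checks that both approaches keep the same rows:

```python
import pandas as pd

# Rounded values copied from the earlier table (hypothetical reconstruction)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [1.270, 1.998, 0.128, 1.287, 0.205, 0.687, 2.113, 0.657],
    "D": [0.201, 0.593, 0.924, -0.639, 1.274, -1.027, -0.516, -1.179],
})

# Boolean mask: each row gets its group's total of D, then is compared
by_mask = df[df.groupby("B")["D"].transform("sum") < -1]

# filter: the function sees each whole group and returns True/False for it
by_filter = df.groupby("B").filter(lambda g: g["D"].sum() < -1)

print(by_mask)
print(by_filter)
```
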

I hope this adds a bit more clarity.


Answer 2

I am going to use a very simple snippet to illustrate the difference:

import pandas as pd

test = pd.DataFrame({'id':[1,2,3,1,2,3,1,2,3], 'price':[1,2,3,2,3,1,3,1,2]})
grouping = test.groupby('id')['price']

The DataFrame looks like this:

    id  price   
0   1   1   
1   2   2   
2   3   3   
3   1   2   
4   2   3   
5   3   1   
6   1   3   
7   2   1   
8   3   2   

There are 3 customer IDs in this table. Each customer made three transactions, paying 1, 2 and 3 dollars in some order.

Now, I want to find the minimum payment made by each customer. There are two ways of doing it:

  1. Using apply (here written as the aggregation grouping.min(); grouping.apply(min) returns the same result):

    grouping.min()

The result looks like this:

id
1    1
2    1
3    1
Name: price, dtype: int64

pandas.core.series.Series # return type
Int64Index([1, 2, 3], dtype='int64', name='id') # The returned Series' index
# length is 3
  2. Using transform:

    grouping.transform('min')

The result looks like this:

0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
Name: price, dtype: int64

pandas.core.series.Series # return type
RangeIndex(start=0, stop=9, step=1) # The returned Series' index
# length is 9    

Both methods return a Series object, but the length of the first one is 3 and the length of the second one is 9.
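The difference in length and index can be checked directly on the same `test` data:

```python
import pandas as pd

test = pd.DataFrame({'id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'price': [1, 2, 3, 2, 3, 1, 3, 1, 2]})
grouping = test.groupby('id')['price']

per_customer = grouping.min()          # one row per customer id
per_row = grouping.transform('min')    # one row per original transaction

print(len(per_customer), per_customer.index.tolist())  # 3 [1, 2, 3]
print(len(per_row), per_row.index.equals(test.index))  # 9 True
```
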

If you want to answer "What is the minimum price paid by each customer?", then the apply method is the more suitable choice.

If you want to answer "What is the difference between the amount paid in each transaction and the minimum payment?", then you want to use transform, because:

test['minimum'] = grouping.transform('min')  # creates an extra column filled with the minimum payment
test.price - test.minimum # returns the difference for each row

apply does not work here simply because it returns a Series of length 3, while the original df has length 9, so you cannot easily integrate it back into the original df.
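Here is a runnable version of the per-row comparison. The sketch also shows that the length-3 result can be spread back onto the rows by looking up each row's `id` with `map`. That takes an extra lookup step, which transform does for you:

```python
import pandas as pd

test = pd.DataFrame({'id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'price': [1, 2, 3, 2, 3, 1, 3, 1, 2]})
grouping = test.groupby('id')['price']

# transform: the per-customer minimum repeated on every transaction row
test['minimum'] = grouping.transform('min')
test['diff_from_min'] = test['price'] - test['minimum']

# The length-3 aggregation needs an explicit lookup by id to line up with the rows
test['minimum_via_map'] = test['id'].map(grouping.min())
print(test)
```
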


Answer 3

tmp = df.groupby(['A'])['c'].transform('mean')

is like

tmp1 = df.groupby(['A']).agg({'c':'mean'})
tmp = df['A'].map(tmp1['c'])

or

tmp1 = df.groupby(['A'])['c'].mean()
tmp = df['A'].map(tmp1)
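A quick check of that equivalence on hypothetical data. Only the column names `A` and `c` follow the answer:

```python
import pandas as pd

# Hypothetical data; column names follow the answer
df = pd.DataFrame({'A': ['x', 'y', 'x', 'y', 'x'],
                   'c': [1.0, 2.0, 3.0, 4.0, 8.0]})

tmp = df.groupby(['A'])['c'].transform('mean')

# agg() gives one row per group; map() spreads it back onto the rows
tmp1 = df.groupby(['A']).agg({'c': 'mean'})
tmp_agg = df['A'].map(tmp1['c'])

# Same idea with a plain per-group Series
tmp2 = df.groupby(['A'])['c'].mean()
tmp_mean = df['A'].map(tmp2)

print(tmp.tolist(), tmp_agg.tolist(), tmp_mean.tolist())
```
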