“Large data” workflows using pandas

Question: “Large data” workflows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons.

One day I hope to replace my use of SAS with python and pandas, but I currently lack an out-of-core workflow for large datasets. I’m not talking about “big data” that requires a distributed network, but rather files too large to fit in memory but small enough to fit on a hard-drive.

My first thought is to use HDFStore to hold large datasets on disk and pull only the pieces I need into dataframes for analysis. Others have mentioned MongoDB as an easier to use alternative. My question is this:

What are some best-practice workflows for accomplishing the following:

  1. Loading flat files into a permanent, on-disk database structure
  2. Querying that database to retrieve data to feed into a pandas data structure
  3. Updating the database after manipulating pieces in pandas

Real-world examples would be much appreciated, especially from anyone who uses pandas on “large data”.

Edit — an example of how I would like this to work:

  1. Iteratively import a large flat-file and store it in a permanent, on-disk database structure. These files are typically too large to fit in memory.
  2. In order to use Pandas, I would like to read subsets of this data (usually just a few columns at a time) that can fit in memory.
  3. I would create new columns by performing various operations on the selected columns.
  4. I would then have to append these new columns into the database structure.

I am trying to find a best-practice way of performing these steps. From reading links about pandas and pytables, it seems that appending a new column could be a problem.

Edit — Responding to Jeff’s questions specifically:

  1. I am building consumer credit risk models. The kinds of data include phone, SSN and address characteristics; property values; derogatory information like criminal records, bankruptcies, etc… The datasets I use every day have nearly 1,000 to 2,000 fields on average of mixed data types: continuous, nominal and ordinal variables of both numeric and character data. I rarely append rows, but I do perform many operations that create new columns.
  2. Typical operations involve combining several columns using conditional logic into a new, compound column. For example, if var1 > 2 then newvar = 'A' elif var2 = 4 then newvar = 'B'. The result of these operations is a new column for every record in my dataset.
  3. Finally, I would like to append these new columns into the on-disk data structure. I would repeat step 2, exploring the data with crosstabs and descriptive statistics trying to find interesting, intuitive relationships to model.
  4. A typical project file is usually about 1GB. Files are organized such that each row is one record of consumer data, and every row has the same number of columns. This will always be the case.
  5. It’s pretty rare that I would subset by rows when creating a new column. However, it’s pretty common for me to subset on rows when creating reports or generating descriptive statistics. For example, I might want to create a simple frequency for a specific line of business, say Retail credit cards. To do this, I would select only those records where the line of business = retail in addition to whichever columns I want to report on. When creating new columns, however, I would pull all rows of data and only the columns I need for the operations.
  6. The modeling process requires that I analyze every column, look for interesting relationships with some outcome variable, and create new compound columns that describe those relationships. The columns that I explore are usually done in small sets. For example, I will focus on a set of say 20 columns just dealing with property values and observe how they relate to defaulting on a loan. Once those are explored and new columns are created, I then move on to another group of columns, say college education, and repeat the process. What I’m doing is creating candidate variables that explain the relationship between my data and some outcome. At the very end of this process, I apply some learning techniques that create an equation out of those compound columns.

It is rare that I would ever add rows to the dataset. I will nearly always be creating new columns (variables or features in statistics/machine learning parlance).
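
The conditional logic described in point 2 above (if var1 > 2 then 'A', elif var2 = 4 then 'B') can be written without a row-by-row loop. The following is only a minimal sketch on toy data: the frame contents and the fallback label 'other' are assumptions made for illustration, not part of the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data; var1 and var2 stand in for real fields.
df = pd.DataFrame({"var1": [3, 1, 0, 5], "var2": [4, 4, 7, 4]})

# Conditions are checked in order, like if / elif; the first match wins.
conditions = [df["var1"] > 2, df["var2"] == 4]
choices = ["A", "B"]

# 'other' is an assumed label for rows that match no condition.
df["newvar"] = np.select(conditions, choices, default="other")

print(df)
```

Because the conditions are evaluated on whole columns, only the columns involved need to be in memory, which matches the "few columns at a time" pattern described above.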


Answer 0

I routinely use tens of gigabytes of data in just this fashion, e.g. I have tables on disk that I read via queries, create data from, and append back to.

It’s worth reading the docs, and the later part of this thread, for several suggestions on how to store your data.

Here are some details that will affect how you store your data. Give as much detail as you can, and I can help you develop a structure.

  1. Size of data, # of rows, columns, types of columns; are you appending rows, or just columns?
  2. What will typical operations look like? E.g. do a query on columns to select a bunch of rows and specific columns, then do an operation (in-memory), create new columns, save these.
    (Giving a toy example could enable us to offer more specific recommendations.)
  3. After that processing, then what do you do? Is step 2 ad hoc, or repeatable?
  4. Input flat files: how many, rough total size in GB. How are these organized, e.g. by records? Does each one contain different fields, or do they have some records per file with all of the fields in each file?
  5. Do you ever select subsets of rows (records) based on criteria (e.g. select the rows with field A > 5) and then do something, or do you just select fields A, B, C with all of the records (and then do something)?
  6. Do you ‘work on’ all of your columns (in groups), or are there a good proportion that you may only use for reports (e.g. you want to keep the data around, but don’t need to pull in that column explicitly until final results time)?

Solution

Ensure you have pandas at least 0.10.1 installed.

Read the docs on iterating over files chunk-by-chunk and on multiple table queries.

Since pytables is optimized to operate row-wise (which is what you query on), we will create a table for each group of fields. This way it’s easy to select a small group of fields (this would also work with one big table, but it’s more efficient to do it this way; I think I may be able to fix this limitation in the future, and this is more intuitive anyhow):
(The following is pseudocode.)

import numpy as np
import pandas as pd

# create a store
store = pd.HDFStore('mystore.h5')

# this is the key to your storage:
#    this maps your fields to a specific group, and defines 
#    what you want to have as data_columns.
#    you might want to create a nice class wrapping this
#    (as you will want to have this map and its inversion)  
group_map = dict(
    A = dict(fields = ['field_1','field_2',.....], dc = ['field_1',....,'field_5']),
    B = dict(fields = ['field_10',......        ], dc = ['field_10']),
    .....
    REPORTING_ONLY = dict(fields = ['field_1000','field_1001',...], dc = []),

)

group_map_inverted = dict()
for g, v in group_map.items():
    group_map_inverted.update(dict([ (f,g) for f in v['fields'] ]))

Reading in the files and creating the storage (essentially doing what append_to_multiple does):

for f in files:
   # read in the file, additional options may be necessary here
   # the chunksize is not strictly necessary, you may be able to slurp each 
   # file into memory in which case just eliminate this part of the loop 
   # (you can also change chunksize if necessary)
   for chunk in pd.read_table(f, chunksize=50000):
       # we are going to append to each table by group
       # we are not going to create indexes at this time
       # but we *ARE* going to create (some) data_columns

       # figure out the field groupings
       for g, v in group_map.items():
             # create the frame for this group
             frame = chunk.reindex(columns = v['fields'], copy = False)    

             # append it
             store.append(g, frame, index=False, data_columns = v['dc'])

Now you have all of the tables in the file (actually you could store them in separate files if you wish; you would probably have to add the filename to the group_map, but this probably isn’t necessary).
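
To make the chunked loading loop above concrete, here is a small runnable sketch of the same splitting logic. It is not the real thing: a plain dict of frames stands in for the HDFStore so that it runs without PyTables, and the field names and the tab-separated sample text are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical field groups; a real project would have many more fields.
group_map = {
    "A": {"fields": ["field_1", "field_2"]},
    "B": {"fields": ["field_10"]},
}

# Made-up tab-separated input standing in for a large flat file.
raw = io.StringIO(
    "field_1\tfield_2\tfield_10\n"
    "1\t2.5\tx\n"
    "2\t3.5\ty\n"
    "3\t4.5\tz\n"
)

# Stand-in for the HDFStore: one list of chunk frames per group.
tables = {g: [] for g in group_map}

for chunk in pd.read_csv(raw, sep="\t", chunksize=2):
    for g, v in group_map.items():
        # With a real store this step would be store.append(g, frame, ...).
        tables[g].append(chunk[v["fields"]])

# Stitch the chunks of each group back together.
tables = {g: pd.concat(frames) for g, frames in tables.items()}
print(tables["A"])
print(tables["B"])
```

Only one chunk is held in memory at a time while reading; in the real workflow the per-group frames would be written to disk immediately rather than collected in lists.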

This is how you get columns and create new ones:

frame = store.select(group_that_I_want)
# you can optionally specify:
# columns = a list of the columns IN THAT GROUP (if you wanted to
#     select only say 3 out of the 20 columns in this sub-table)
# and a where clause if you want a subset of the rows

# do calculations on this frame
new_frame = cool_function_on_frame(frame)

# to 'add columns', create a new group (you probably want to
# limit the columns in this new_group to be only NEW ones
# (e.g. so you don't overlap from the other tables)
# add this info to the group_map
store.append(new_group, new_frame.reindex(columns = new_columns_created, copy = False), data_columns = new_columns_created)

When you are ready for post_processing:

# This may be a bit tricky; and depends what you are actually doing.
# I may need to modify this function to be a bit more general:
report_data = store.select_as_multiple([groups_1,groups_2,.....], where =['field_1>0', 'field_1000=foo'], selector = group_1)

About data_columns, you don’t actually need to define ANY data_columns; they allow you to sub-select rows based on the column. E.g. something like:

store.select(group, where = ['field_1000=foo', 'field_1001>0'])

They may be most interesting to you in the final report generation stage (essentially a data column is segregated from other columns, which might impact efficiency somewhat if you define a lot).

You also might want to:

  • Create a function which takes a list of fields, looks up the groups in the group_map, then selects these and concatenates the results so you get the resulting frame (this is essentially what select_as_multiple does). This way the structure would be pretty transparent to you.
  • Create indexes on certain data columns (makes row-subsetting much faster).
  • Enable compression.
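
The first suggestion in the list above (a function that maps a list of fields back to their groups) might look roughly like the sketch below. The helper name select_fields, the field names, and the dict of frames standing in for the store are all hypothetical.

```python
import pandas as pd

# Hypothetical layout: two sub-tables sharing the same row index.
group_map = {
    "A": {"fields": ["field_1", "field_2"]},
    "B": {"fields": ["field_10", "field_11"]},
}
tables = {
    "A": pd.DataFrame({"field_1": [1, 2], "field_2": [3, 4]}),
    "B": pd.DataFrame({"field_10": ["x", "y"], "field_11": [0.5, 1.5]}),
}

# Invert the map: field name -> group name.
field_to_group = {f: g for g, v in group_map.items() for f in v["fields"]}


def select_fields(fields):
    """Return one frame with the requested fields, whatever group they live in."""
    by_group = {}
    for f in fields:
        by_group.setdefault(field_to_group[f], []).append(f)
    # With an HDFStore, tables[g][cols] would be store.select(g, columns=cols).
    parts = [tables[g][cols] for g, cols in by_group.items()]
    # Concatenate side by side, then restore the caller's column order.
    return pd.concat(parts, axis=1)[fields]


result = select_fields(["field_11", "field_1"])
print(result)
```

This relies on every sub-table having the same rows in the same order, which holds when they were all appended from the same chunks.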

Let me know when you have questions!


Answer 1

I think the answers above are missing a simple approach that I’ve found very useful.

When I have a file that is too large to load in memory, I break up the file into multiple smaller files (either by rows or by columns).

Example: in the case of 30 days’ worth of trading data of ~30GB size, I break it into one file per day of ~1GB size. I subsequently process each file separately and aggregate the results at the end.

One of the biggest advantages is that it allows parallel processing of the files (with either multiple threads or multiple processes).

The other advantage is that file manipulation (like adding/removing dates in the example) can be accomplished with regular shell commands, which is not possible with more advanced/complicated file formats.

This approach doesn’t cover all scenarios, but it is very useful in a lot of them.
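
As a rough illustration of the split-then-aggregate idea, the sketch below writes one CSV per day and then computes an overall mean from per-file partial sums. The column names (date, price) and the tiny data set are assumptions; real files would be far larger and could be processed in parallel.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy trading data covering three days.
df = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-03"],
    "price": [10.0, 12.0, 11.0, 15.0],
})

out_dir = Path(tempfile.mkdtemp())

# Split: one CSV per day.
for day, part in df.groupby("date"):
    part.to_csv(out_dir / f"{day}.csv", index=False)

# Process each file on its own, keeping only small partial results.
total, count = 0.0, 0
for path in sorted(out_dir.glob("*.csv")):
    part = pd.read_csv(path)
    total += part["price"].sum()
    count += len(part)

# Combine the partial results at the end.
mean_price = total / count
print(mean_price)
```

Note that a mean has to be rebuilt from sums and counts; averaging the per-file means would weight every day equally regardless of how many rows it has.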


Answer 2

There is now, two years after the question, an ‘out-of-core’ pandas equivalent: dask. It is excellent! Though it does not support all of pandas functionality, you can get really far with it.


Answer 3

If your datasets are between 1 and 20GB, you should get a workstation with 48GB of RAM. Then Pandas can hold the entire dataset in RAM. I know it’s not the answer you’re looking for here, but doing scientific computing on a notebook with 4GB of RAM isn’t reasonable.


Answer 4

I know this is an old thread but I think the Blaze library is worth checking out. It’s built for these types of situations.

From the docs:

Blaze extends the usability of NumPy and Pandas to distributed and out-of-core computing. Blaze provides an interface similar to that of the NumPy ND-Array or Pandas DataFrame but maps these familiar interfaces onto a variety of other computational engines like Postgres or Spark.

Edit: By the way, it’s supported by ContinuumIO and Travis Oliphant, author of NumPy.


Answer 5

Here is the case for pymongo. I have also prototyped using SQL Server, SQLite, HDF, and an ORM (SQLAlchemy) in Python. First and foremost, MongoDB (accessed through pymongo) is a document-based DB, so each person would be a document (a dict of attributes). Many people form a collection, and you can have many collections (people, stock market, income).

pd.DataFrame -> pymongo. Note: I use the chunksize option of read_csv to keep each batch to 5 to 10k records (pymongo drops the socket if a batch is larger).

# insert_many replaces Collection.insert, which was removed in PyMongo 4
aCollection.insert_many(df.to_dict('records'))

querying: gt = greater than…

pd.DataFrame(list(mongoCollection.find({'anAttribute':{'$gt':2887000, '$lt':2889000}})))

.find() returns an iterator so I commonly use ichunked to chop into smaller iterators.
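
For readers unfamiliar with chunking an iterator, here is a minimal sketch of the idea using only the standard library. The chunked helper is hypothetical (it is not the ichunked function mentioned above), and a plain generator of dicts stands in for a real cursor so that the example runs without a database.

```python
from itertools import islice

import pandas as pd


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterator."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Stand-in for a cursor returned by find(): a generator of dict documents.
cursor = ({"anAttribute": i, "value": i * 2} for i in range(5))

# Each batch becomes its own small DataFrame.
frames = [pd.DataFrame(batch) for batch in chunked(cursor, 2)]
print([len(f) for f in frames])
```

Each batch can be processed and discarded before the next one is pulled, so the full result set never has to sit in memory at once.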

How about a join since I normally get 10 data sources to paste together:

aJoinDF = pandas.DataFrame(list(mongoCollection.find({'anAttribute':{'$in':Att_Keys}})))

Then (in my case I sometimes have to aggregate aJoinDF first, before it’s “mergeable”):

df = pandas.merge(df, aJoinDF, on=aKey, how='left')

And you can then write the new info to your main collection via the update method below. (logical collection vs physical datasources).

# update_one with $set replaces Collection.update, which was removed in PyMongo 4
collection.update_one({primarykey: foo}, {'$set': {key: change}})

On smaller lookups, just denormalize. For example, if a document holds a code, just add the code’s descriptive text as another field, using a dict lookup as you create the documents.

Now that you have a nice dataset based around a person, you can unleash your logic on each case and create more attributes. Finally, you can read just your key indicators into pandas (as many as fit in memory; at most three in my case) and do pivots/agg/data exploration. This works for me for 3 million records with numbers/big text/categories/codes/floats/…

You can also use the two methods built into MongoDB (MapReduce and aggregate framework). See here for more info about the aggregate framework, as it seems to be easier than MapReduce and looks handy for quick aggregate work. Notice I didn’t need to define my fields or relations, and I can add items to a document. At the current state of the rapidly changing numpy, pandas, python toolset, MongoDB helps me just get to work 🙂


Answer 6

I spotted this a little late, but I work with a similar problem (mortgage prepayment models). My solution has been to skip the pandas HDFStore layer and use straight pytables. I save each column as an individual HDF5 array in my final file.

My basic workflow is to first get a CSV file from the database. I gzip it, so it’s not as huge. Then I convert that to a row-oriented HDF5 file, by iterating over it in python, converting each row to a real data type, and writing it to a HDF5 file. That takes some tens of minutes, but it doesn’t use any memory, since it’s only operating row-by-row. Then I “transpose” the row-oriented HDF5 file into a column-oriented HDF5 file.

The table transpose looks like:

import logging

import tables

logger = logging.getLogger(__name__)

# PyTables 3.x names (get_node, create_group, create_carray) are used here;
# the older camelCase names (getNode, createGroup, createCArray) were removed.
def transpose_table(h_in, table_path, h_out, group_name="data", group_path="/"):
    # Get a reference to the input data.
    tb = h_in.get_node(table_path)
    # Create the output group to hold the columns.
    grp = h_out.create_group(group_path, group_name, filters=tables.Filters(complevel=1))
    for col_name in tb.colnames:
        logger.debug("Processing %s", col_name)
        # Get the data.
        col_data = tb.col(col_name)
        # Create the output array.
        arr = h_out.create_carray(grp,
                                  col_name,
                                  tables.Atom.from_dtype(col_data.dtype),
                                  col_data.shape)
        # Store the data.
        arr[:] = col_data
    h_out.flush()

Reading it back in then looks like:

import numpy as np
import pandas as pd
import tables

def read_hdf5(hdf5_path, group_path="/data", columns=None):
    """Read a transposed data set from a HDF5 file."""
    # Accept either an already-open PyTables file or a path to open.
    if isinstance(hdf5_path, tables.File):
        hf = hdf5_path
    else:
        hf = tables.open_file(hdf5_path)

    grp = hf.get_node(group_path)
    if columns is None:
        data = [(child.name, child[:]) for child in grp]
    else:
        data = [(child.name, child[:]) for child in grp if child.name in columns]

    # Convert any float32 columns to float64 for processing.
    for i in range(len(data)):
        name, vec = data[i]
        if vec.dtype == np.float32:
            data[i] = (name, vec.astype(np.float64))

    # Only close the file if this function opened it.
    if not isinstance(hdf5_path, tables.File):
        hf.close()
    # DataFrame.from_items was removed in pandas 1.0; a dict keeps insertion order.
    return pd.DataFrame(dict(data))

Now, I generally run this on a machine with a ton of memory, so I may not be careful enough with my memory usage. For example, by default the load operation reads the whole data set.

This generally works for me, but it’s a bit clunky, and I can’t use the fancy pytables magic.

Edit: The real advantage of this approach, over the array-of-records pytables default, is that I can then load the data into R using h5r, which can’t handle tables. Or, at least, I’ve been unable to get it to load heterogeneous tables.
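
The column-per-array layout is easy to try out without HDF5. The sketch below swaps in one .npy file per column in place of HDF5 arrays, purely to show why this layout makes reading a subset of columns cheap; the column names and the read_columns helper are made up for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame({"rate": [3.5, 4.0, 4.25], "balance": [100, 200, 300]})
col_dir = Path(tempfile.mkdtemp())

# Write: one array file per column.
for name in df.columns:
    np.save(col_dir / f"{name}.npy", df[name].to_numpy())


def read_columns(columns):
    """Load only the requested columns from disk."""
    return pd.DataFrame({c: np.load(col_dir / f"{c}.npy") for c in columns})


subset = read_columns(["rate"])
print(subset)
```

Adding a new column is just writing one more file, which is the operation the original question was most worried about.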


Answer 7

One trick I found helpful for large data use cases is to reduce the volume of the data by reducing float precision to 32-bit. It’s not applicable in all cases, but in many applications 64-bit precision is overkill and the 2x memory savings are worth it. To make an obvious point even more obvious:

>>> df = pd.DataFrame(np.random.randn(int(1e8), 5))
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000000 entries, 0 to 99999999
Data columns (total 5 columns):
...
dtypes: float64(5)
memory usage: 3.7 GB

>>> df.astype(np.float32).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000000 entries, 0 to 99999999
Data columns (total 5 columns):
...
dtypes: float32(5)
memory usage: 1.9 GB
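
In a real frame with mixed types you would usually downcast only the float columns. A small sketch of that, assuming a frame that is small enough to build in memory for the demonstration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 5))

# Downcast only the float64 columns; other dtypes are left untouched.
float_cols = df.select_dtypes(include="float64").columns
small = df.astype({col: np.float32 for col in float_cols})

before = df.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()
print(before, after)

# The price: float32 keeps roughly 7 significant decimal digits.
print(np.abs(df - small).to_numpy().max())
```

The last line prints the largest rounding error introduced by the conversion, which is worth checking against the precision your analysis actually needs.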

Answer 8

As noted by others, after some years an ‘out-of-core’ pandas equivalent has emerged: dask. Though dask is not a drop-in replacement for pandas and all of its functionality, it stands out for several reasons:

Dask is a flexible parallel computing library for analytic computing that is optimized for dynamic task scheduling for interactive computational workloads of “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments and scales from laptops to clusters.

Dask emphasizes the following virtues:

  • Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
  • Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
  • Native: Enables distributed computing in Pure Python with access to the PyData stack.
  • Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
  • Scales up: Runs resiliently on clusters with 1000s of cores
  • Scales down: Trivial to set up and run on a laptop in a single process
  • Responsive: Designed with interactive computing in mind it provides rapid feedback and diagnostics to aid humans

and to add a simple code sample:

import dask.dataframe as dd
df = dd.read_csv('2015-*-*.csv')
df.groupby(df.user_id).value.mean().compute()

replaces some pandas code like this:

import pandas as pd
df = pd.read_csv('2015-01-01.csv')
df.groupby(df.user_id).value.mean()

and, especially noteworthy, provides through the concurrent.futures interface a general infrastructure for the submission of custom tasks:

from dask.distributed import Client
client = Client('scheduler:port')

futures = []
for fn in filenames:
    future = client.submit(load, fn)
    futures.append(future)

summary = client.submit(summarize, futures)
summary.result()

Answer 9

It is worth mentioning Ray here as well.
It’s a distributed computation framework that has its own distributed implementation of pandas.

Just replace the pandas import, and the code should work as is:

# import pandas as pd
import ray.dataframe as pd
# (the Pandas on Ray project was later renamed Modin: import modin.pandas as pd)

# use pd as usual

You can read more details here:

https://rise.cs.berkeley.edu/blog/pandas-on-ray/


Answer 10

One more variation.

Many of the operations done in pandas can also be done as a DB query (SQL, mongo).

Using an RDBMS or mongodb allows you to perform some of the aggregations in the DB query (which is optimized for large data, and uses cache and indexes efficiently).

Later, you can perform post-processing using pandas.

The advantage of this method is that you gain the DB optimizations for working with large data while still defining the logic in a high-level declarative syntax, without having to deal with the details of deciding what to do in memory and what to do out of core.

And although the query language and pandas are different, it’s usually not complicated to translate part of the logic from one to the other.
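
A small sketch of this division of labour, using the sqlite3 module from the standard library as a stand-in for whichever database you use. The table name, column names, and rows are invented for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (line_of_business TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("retail", 100.0), ("retail", 300.0), ("auto", 50.0)],
)

# The database does the heavy aggregation over all rows...
query = """
    SELECT line_of_business, COUNT(*) AS n, AVG(balance) AS avg_balance
    FROM loans
    GROUP BY line_of_business
    ORDER BY line_of_business
"""
summary = pd.read_sql_query(query, conn)
conn.close()

# ...and pandas handles post-processing of the small result.
summary["share"] = summary["n"] / summary["n"].sum()
print(summary)
```

Only the aggregated rows cross into Python, so the size of the underlying table matters much less than the size of the result.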


Answer 11

Consider Ruffus if you go the simple path of creating a data pipeline which is broken down into multiple smaller files.


Answer 12

I recently came across a similar issue. I found that simply reading the data in chunks and appending each chunk to the same CSV as I write it works well. My problem was adding a date column based on information in another table, using the values of certain columns, as follows. This may help those who are confused by dask and HDF5 but, like me, are more familiar with pandas.

import pandas as pd

# pathlist (directory paths) and newyears (dates indexed by raster ID)
# are assumed to be defined elsewhere in the script.

def addDateColumn():
    """Adds time to the daily rainfall data. Reads the csv as chunks of 100k
    rows at a time and outputs them, appending as needed, to a single csv.
    Uses the column of the raster names to get the date.
    """
    df = pd.read_csv(pathlist[1] + "CHIRPS_tanz.csv", iterator=True,
                     chunksize=100000)  # read csv file as 100k chunks

    '''Do some stuff'''

    count = 1  # for indexing item in time list
    for chunknum, chunk in enumerate(df, start=1):  # for each 100k rows
        newtime = []  # empty list to append repeating times for different rows
        toiterate = chunk[chunk.columns[2]]  # ID of raster nums to base time
        while count <= toiterate.max():
            for i in toiterate:
                if i == count:
                    newtime.append(newyears[count])
            count += 1
        print("Finished", chunknum, "chunks")
        chunk["time"] = newtime  # create new column in dataframe based on time
        outname = "CHIRPS_tanz_time2.csv"
        # append each output to same csv, using no header
        chunk.to_csv(pathlist[2] + outname, mode='a', header=False, index=False)

回答 13

我想指出一下Vaex软件包。

Vaex是用于惰性核心数据框架(类似于Pandas)的python库,用于可视化和探索大型表格数据集。它可以在高达每秒十亿(10 9)个对象/行的N维网格上计算统计信息,例如平均值,总和,计数,标准差等。可视化使用直方图,密度图和3d体积渲染完成,从而可以交互式探索大数据。Vaex使用内存映射,零内存复制策略和惰性计算来获得最佳性能(不浪费内存)。

看一下文档:https : //vaex.readthedocs.io/en/latest/ 该API非常接近于熊猫API。

I’d like to point out the Vaex package.

Vaex is a Python library for lazy out-of-core DataFrames (similar to pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc., on an N-dimensional grid at up to a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).

Have a look at the documentation: https://vaex.readthedocs.io/en/latest/ The API is very close to the API of pandas.


回答 14

为什么选择熊猫?您是否尝试过标准Python

使用标准库python。即使最近发布了稳定版,Pandas也会经常更新。

使用标准的python库,您的代码将始终运行。

一种实现方法是对要存储数据的方式有所了解,并对数据要解决哪些问题。然后绘制一个模式,说明如何组织数据(思考表),这将有助于您查询数据,而不必进行规范化。

您可以充分利用:

  • 字典列表,用于将数据存储在内存中,一个字典为一行,
  • 生成器逐行处理数据,以免RAM溢出,
  • 列出理解以查询您的数据,
  • 利用Counter,DefaultDict,…
  • 使用您选择的任何存储解决方案将数据存储在硬盘上,json可能是其中之一。

随着时间的推移,Ram和HDD越来越便宜,并且标准python 3广泛可用且稳定。

Why pandas? Have you tried standard Python?

Consider using the Python standard library. Pandas is subject to frequent updates, even with the recent release of the stable version.

Code that uses only the standard library is far less likely to break between releases.

One way of doing it is to have an idea of how you want your data to be stored and which questions you want to answer with it. Then draw a schema of how you can organise your data (think tables) that will help you query the data, not necessarily a normalised one.

You can make good use of:

  • lists of dictionaries to store the data in memory, one dict being one row,
  • generators to process the data row by row so that you do not exhaust your RAM,
  • list comprehensions to query your data,
  • Counter, defaultdict, …
  • whatever storage solution you choose to keep your data on your hard drive; JSON could be one of them.
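The building blocks listed above could be combined roughly as follows. This is only a sketch: the sample data and the column names `business` and `balance` are made up for illustration, and a real file would be opened with `open(...)` instead of `io.StringIO`.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample data; in practice this would be an open file handle.
raw = io.StringIO(
    "business,balance\n"
    "retail,100\n"
    "auto,50\n"
    "retail,300\n"
)

def read_rows(handle):
    """Yield one dict per CSV row, so only one row is in memory at a time."""
    for row in csv.DictReader(handle):
        row["balance"] = float(row["balance"])
        yield row

counts = Counter()            # rows per business line
totals = defaultdict(float)   # summed balance per business line
for row in read_rows(raw):
    counts[row["business"]] += 1
    totals[row["business"]] += row["balance"]

# A list comprehension works as a query once a subset fits in memory.
raw.seek(0)
retail = [r for r in read_rows(raw) if r["business"] == "retail"]

print(counts["retail"], totals["retail"], len(retail))  # 2 400.0 2
```

The generator keeps memory use flat for the streaming pass, while the list comprehension is only appropriate for subsets that are known to be small.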

RAM and HDDs are becoming cheaper and cheaper over time, and standard Python 3 is widely available and stable.


回答 15

目前,我正在“喜欢”您,只是规模较小,这就是为什么我没有PoC来建议的原因。

但是,我似乎在使用pickle作为缓存系统并将各种功能的执行外包到文件中找到了成功-从我的commando / main文件中执行这些文件。例如,我使用prepare_use.py转换对象类型,将数据集拆分为测试,验证和预测数据集。

用咸菜进行缓存如何工作?我使用字符串来访问动态创建的pickle文件,具体取决于传递了哪些参数和数据集(为此,我尝试捕获并确定程序是否已在运行,使用.shape表示数据集,使用dict表示通过参数)。尊重这些措施,我得到一个String来尝试查找和读取.pickle文件,并且如果找到了该字符串,则可以跳过处理时间以跳转到我现在正在处理的执行。

使用数据库时,我遇到了类似的问题,这就是为什么我在使用此解决方案时感到高兴的原因,但是-有很多限制-例如由于冗余而存储大量的泡菜集。可以使用正确的索引从转换前到更新表进行更新-验证信息可以打开另一本完整的书(我尝试合并爬网的租金数据,基本上在2小时后停止使用数据库-因为我想在之后跳回每个转换过程)

我希望我的2美分能以某种方式对您有所帮助。

问候。

At the moment I am working “like” you, just on a smaller scale, which is why I don't have a PoC for my suggestion.

However, I seem to find success in using pickle as a caching system and outsourcing the execution of various functions into files, executing these files from my command/main file. For example, I use a prepare_use.py to convert object types and split a data set into test, validation and prediction data sets.

How does my caching with pickle work? I use strings to access pickle files that are created dynamically, depending on which parameters and data sets were passed (with that I try to capture and determine whether the program was already run, using .shape for the data set and a dict for the passed parameters). From these measures I build a string with which I try to find and read a .pickle file and, if it is found, I can skip the processing time and jump to the execution I am working on right now.
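Since there is no PoC above, here is one way such a cache might look. It is a sketch under assumptions, not the setup described: the helper names `cache_path` and `cached`, the temporary cache directory, and the use of nested lists in place of a DataFrame are all hypothetical.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())  # hypothetical location, fresh on every run

def cache_path(step, shape, params):
    """Build a file name from the step name, data shape and parameters."""
    key = json.dumps([step, list(shape), params], sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return CACHE_DIR / f"{step}_{digest}.pickle"

def cached(step, data, params, compute):
    """Return the pickled result if this step already ran, else compute and store it."""
    shape = (len(data), len(data[0]) if data else 0)  # stand-in for DataFrame.shape
    path = cache_path(step, shape, params)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute(data, **params)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def scale(rows, factor):
    calls.append(factor)  # record every real computation
    return [[value * factor for value in row] for row in rows]

data = [[1, 2], [3, 4]]
first = cached("scale", data, {"factor": 10}, scale)
second = cached("scale", data, {"factor": 10}, scale)  # served from the pickle file
print(first == second, calls)  # True [10]
```

A key built only from the shape and the parameters cannot tell apart two data sets of the same shape with different values, so this only suits workflows where the input is fixed. Pickle files should also only be loaded from sources you trust.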

Using databases I encountered similar problems, which is why I found joy in using this solution. However, there are certainly many constraints, for example storing huge pickle sets due to redundancy. Updating a table from before to after a transformation can be done with proper indexing; validating information opens up a whole other book (I tried consolidating crawled rent data and basically stopped using a database after 2 hours, as I would have liked to jump back after every transformation process).

I hope my 2 cents help you in some way.

Greetings.


@property装饰器如何工作?

问题:@property装饰器如何工作?

我想了解内置函数的property工作原理。令我感到困惑的是,property它还可以用作装饰器,但是仅当用作内置函数时才接受参数,而不能用作装饰器。

这个例子来自文档

class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x
    def setx(self, value):
        self._x = value
    def delx(self):
        del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

property的论点是getxsetxdelx和文档字符串。

在下面的代码中property用作装饰器。它的对象是x函数,但是在上面的代码中,参数中没有对象函数的位置。

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

并且,x.setterx.deleter装饰器是如何创建的?我很困惑。

I would like to understand how the built-in function property works. What confuses me is that property can also be used as a decorator, but it only takes arguments when used as a built-in function and not when used as a decorator.

This example is from the documentation:

class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x
    def setx(self, value):
        self._x = value
    def delx(self):
        del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

property's arguments are getx, setx, delx and a doc string.

In the code below property is used as a decorator. The object it decorates is the x function, but in the code above there is no place for such a function in the arguments.

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

And, how are the x.setter and x.deleter decorators created? I am confused.


回答 0

property()函数返回一个特殊的描述符对象

>>> property()
<property object at 0x10ff07940>

正是这种对象有额外的方法:

>>> property().getter
<built-in method getter of property object at 0x10ff07998>
>>> property().setter
<built-in method setter of property object at 0x10ff07940>
>>> property().deleter
<built-in method deleter of property object at 0x10ff07998>

这些充当装饰。他们返回一个新的属性对象:

>>> property().getter(None)
<property object at 0x10ff079f0>

那是旧对象的副本,但是替换了其中一个功能。

请记住,@decorator语法只是语法糖。语法:

@property
def foo(self): return self._foo

确实与

def foo(self): return self._foo
foo = property(foo)

因此foo该函数被替换property(foo),我们在上面看到的是一个特殊的对象。然后,当您使用时@foo.setter(),您正在做的就是调用property().setter上面显示的方法,该方法将返回该属性的新副本,但是这次将setter函数替换为装饰方法。

下面的序列还通过使用那些装饰器方法创建了一个全开属性。

首先,我们property仅使用getter 创建一些函数和一个对象:

>>> def getter(self): print('Get!')
... 
>>> def setter(self, value): print('Set to {!r}!'.format(value))
... 
>>> def deleter(self): print('Delete!')
... 
>>> prop = property(getter)
>>> prop.fget is getter
True
>>> prop.fset is None
True
>>> prop.fdel is None
True

接下来,我们使用该.setter()方法添加setter:

>>> prop = prop.setter(setter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is None
True

最后,我们使用以下.deleter()方法添加删除器:

>>> prop = prop.deleter(deleter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is deleter
True

最后但并非最不重要的一点是,该property对象充当描述符对象,因此它具有和.__get__(),可以.__set__().__delete__()实例属性的获取,设置和删除方法挂钩:

>>> class Foo: pass
... 
>>> prop.__get__(Foo(), Foo)
Get!
>>> prop.__set__(Foo(), 'bar')
Set to 'bar'!
>>> prop.__delete__(Foo())
Delete!

Descriptor Howto包括以下类型的纯Python示例实现property()

class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

The property() function returns a special descriptor object:

>>> property()
<property object at 0x10ff07940>

It is this object that has extra methods:

>>> property().getter
<built-in method getter of property object at 0x10ff07998>
>>> property().setter
<built-in method setter of property object at 0x10ff07940>
>>> property().deleter
<built-in method deleter of property object at 0x10ff07998>

These act as decorators too. They return a new property object:

>>> property().getter(None)
<property object at 0x10ff079f0>

that is a copy of the old object, but with one of the functions replaced.

Remember, that the @decorator syntax is just syntactic sugar; the syntax:

@property
def foo(self): return self._foo

really means the same thing as

def foo(self): return self._foo
foo = property(foo)

so foo the function is replaced by property(foo), which we saw above is a special object. Then when you use @foo.setter, what you are doing is calling that property().setter method I showed you above, which returns a new copy of the property, but this time with the setter function replaced with the decorated method.
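To see the "new copy" behaviour directly, a small check like the following may help. The extra class attribute `getter_only` is a hypothetical name introduced here only to keep a reference to the first property object.

```python
class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        return self._x

    getter_only = x            # keep a reference to the first property object

    @x.setter
    def x(self, value):
        self._x = value


print(C.getter_only is C.x)            # False: .setter returned a new object
print(C.getter_only.fset is None)      # True: the original is untouched
print(C.x.fget is C.getter_only.fget)  # True: the getter was carried over
```

This is also why the setter method must reuse the name x: the decorated result is bound to the method's name, so a different name would leave the class with a getter-only x and a second, differently named property.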

The following sequence also creates a full-on property, by using those decorator methods.

First we create some functions and a property object with just a getter:

>>> def getter(self): print('Get!')
... 
>>> def setter(self, value): print('Set to {!r}!'.format(value))
... 
>>> def deleter(self): print('Delete!')
... 
>>> prop = property(getter)
>>> prop.fget is getter
True
>>> prop.fset is None
True
>>> prop.fdel is None
True

Next we use the .setter() method to add a setter:

>>> prop = prop.setter(setter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is None
True

Last we add a deleter with the .deleter() method:

>>> prop = prop.deleter(deleter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is deleter
True

Last but not least, the property object acts as a descriptor object, so it has .__get__(), .__set__() and .__delete__() methods to hook into instance attribute getting, setting and deleting:

>>> class Foo: pass
... 
>>> prop.__get__(Foo(), Foo)
Get!
>>> prop.__set__(Foo(), 'bar')
Set to 'bar'!
>>> prop.__delete__(Foo())
Delete!
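The same hooks are available to any class, not just property. As a minimal sketch of the descriptor protocol, assuming a made-up `Positive` descriptor and `Order` class:

```python
class Positive:
    """A hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name      # per-instance attribute that holds the value

    def __get__(self, obj, objtype=None):
        if obj is None:                # accessed on the class, not an instance
            return self
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(obj, self.storage, value)


class Order:
    quantity = Positive()

    def __init__(self, quantity):
        self.quantity = quantity       # goes through Positive.__set__


order = Order(3)
print(order.quantity)                  # 3, read through Positive.__get__
```

A property is essentially this pattern with the getter, setter and deleter supplied as plain functions instead of being written into the descriptor class.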

The Descriptor Howto includes a pure Python sample implementation of the property() type:

class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

回答 1

文档说这只是创建只读属性的捷径。所以

@property
def x(self):
    return self._x

相当于

def getx(self):
    return self._x
x = property(getx)

The documentation says it's just a shortcut for creating read-only properties. So

@property
def x(self):
    return self._x

is equivalent to

def getx(self):
    return self._x
x = property(getx)
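"Read-only" here means that assignment fails because no setter was supplied. A short sketch, using a hypothetical `Point` class:

```python
class Point:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x


p = Point(1)
try:
    p.x = 2                    # no setter was defined, so this is rejected
except AttributeError as exc:
    error = exc

print(p.x)                     # 1
```

The underlying `_x` attribute can still be changed directly, so the protection is a convention rather than an enforcement mechanism.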

回答 2

这是如何@property实现的最小示例:

class Thing:
    def __init__(self, my_word):
        self._word = my_word 
    @property
    def word(self):
        return self._word

>>> print( Thing('ok').word )
ok

否则,将word保留方法而不是属性。

class Thing:
    def __init__(self, my_word):
        self._word = my_word
    def word(self):
        return self._word

>>> print( Thing('ok').word() )
ok

Here is a minimal example of how @property can be used:

class Thing:
    def __init__(self, my_word):
        self._word = my_word 
    @property
    def word(self):
        return self._word

>>> print( Thing('ok').word )
ok

Otherwise word remains a method instead of a property.

class Thing:
    def __init__(self, my_word):
        self._word = my_word
    def word(self):
        return self._word

>>> print( Thing('ok').word() )
ok

回答 3

第一部分很简单:

@property
def x(self): ...

是相同的

def x(self): ...
x = property(x)

反过来,这是仅使用getter创建property的简化语法。

下一步将使用设置器和删除器扩展此属性。并通过适当的方法来实现:

@x.setter
def x(self, value): ...

返回一个新属性,该属性继承了旧属性x以及给定的setter的所有内容。

x.deleter 以相同的方式工作。

The first part is simple:

@property
def x(self): ...

is the same as

def x(self): ...
x = property(x)

which, in turn, is the simplified syntax for creating a property with just a getter.

The next step would be to extend this property with a setter and a deleter. And this happens with the appropriate methods:

@x.setter
def x(self, value): ...

returns a new property which inherits everything from the old x plus the given setter.

x.deleter works the same way.


回答 4

以下内容:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, _x_set, _x_del, 
                    "I'm the 'x' property.")

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, doc="I'm the 'x' property.")
    x = x.setter(_x_set)
    x = x.deleter(_x_del)

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x
    x = property(_x_get, doc="I'm the 'x' property.")

    def _x_set(self, value):
        self._x = value
    x = x.setter(_x_set)

    def _x_del(self):
        del self._x
    x = x.deleter(_x_del)

等同于:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

The following:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, _x_set, _x_del, 
                    "I'm the 'x' property.")

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, doc="I'm the 'x' property.")
    x = x.setter(_x_set)
    x = x.deleter(_x_del)

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x
    x = property(_x_get, doc="I'm the 'x' property.")

    def _x_set(self, value):
        self._x = value
    x = x.setter(_x_set)

    def _x_del(self):
        del self._x
    x = x.deleter(_x_del)

Which is the same as:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

回答 5

下面是另一个示例,该示例在@property需要重构代码的情况下如何提供帮助(从此处进行总结):

假设您创建了一个Money这样的类:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

并且用户根据他/她使用的此类创建一个库

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

现在,让我们假设您决定更改您的Money类并摆脱dollarscents属性,而是决定仅跟踪总分:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

如果上述用户现在尝试像以前一样运行他/她的库

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))

这会导致错误

AttributeError:“ Money”对象没有属性“ dollars”

也就是说,现在大家谁依赖于原始的手段Money类将不得不改变所有代码行,其中dollarscents使用可以是非常痛苦……那么,怎么会这样避免?通过使用@property

就是那样:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    # Getter and setter for dollars...
    @property
    def dollars(self):
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    # And the getter and setter for cents.
    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

现在我们从图书馆打电话时

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

它会按预期工作,我们不必在库中更改任何代码!实际上,我们甚至不必知道我们依赖的库已更改。

setter可以正常工作:

money.dollars += 2
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 12 cents.

money.cents += 10
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 22 cents.

您也@property可以在抽象类中使用。我在这里举一个最小的例子。

Below is another example of how @property can help when one has to refactor code. It is taken from here (I only summarize it below):

Imagine you created a class Money like this:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

and a user creates a library that depends on this class, where he/she uses, e.g.,

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

Now let’s suppose you decide to change your Money class and get rid of the dollars and cents attributes but instead decide to only track the total amount of cents:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

If the above mentioned user now tries to run his/her library as before

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))

it will result in an error

AttributeError: 'Money' object has no attribute 'dollars'

That means that now everyone who relies on your original Money class would have to change all lines of code where dollars and cents are used, which can be very painful… So, how could this be avoided? By using @property!

Here is how:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    # Getter and setter for dollars...
    @property
    def dollars(self):
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    # And the getter and setter for cents.
    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

When we now call from our library

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

it will work as expected and we did not have to change a single line of code in our library! In fact, we would not even have to know that the library we depend on changed.

The setter also works fine:

money.dollars += 2
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 12 cents.

money.cents += 10
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 22 cents.

You can also use @property in abstract classes; I give a minimal example here.
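The linked example is not reproduced in this post, so the following is an independent sketch of an abstract property using the standard abc module. The `Shape` and `Square` classes are hypothetical.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @property
    @abstractmethod
    def area(self):
        """Subclasses must provide this as a property."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    @property
    def area(self):
        return self.side ** 2


print(Square(3).area)          # 9

try:
    Shape()                    # abstract property not overridden
except TypeError as exc:
    error = exc
```

Stacking @property on top of @abstractmethod, in that order, is what marks the property as abstract, so the base class cannot be instantiated until a subclass overrides it.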


回答 6

我在这里阅读了所有文章,并意识到我们可能需要一个真实的例子。为什么实际上我们有@property?因此,考虑使用身份验证系统的Flask应用。您在中声明模型用户models.py

class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    password_hash = db.Column(db.String(128))

    ...

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)

在这段代码中,我们password使用了“隐藏”属性,当您尝试直接访问它时@property,该属性会触发AttributeError断言,而我们使用@ property.setter来设置实际的实例变量password_hash

现在,auth/views.py我们可以实例化一个用户:

...
@auth.route('/register', methods=['GET', 'POST'])
def register():
    form = RegisterForm()
    if form.validate_on_submit():
        user = User(email=form.email.data,
                    username=form.username.data,
                    password=form.password.data)
        db.session.add(user)
        db.session.commit()
...

password用户填写表单时,该属性来自注册表单。密码确认发生在前端EqualTo('password', message='Passwords must match')(如果您想知道,但这是与Flask表单相关的其他主题)。

我希望这个例子会有用

I read all the posts here and realized that we may need a real-life example. Why do we actually have @property? Consider a Flask app that uses an authentication system. You declare a model User in models.py:

class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    password_hash = db.Column(db.String(128))

    ...

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)

In this code we've "hidden" the password attribute by using @property, which raises an AttributeError when you try to read it directly, while we use @password.setter to set the actual instance variable password_hash.
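The same write-only pattern can be tried without Flask or a database. The sketch below is an assumption-laden stand-in: the `Account` class is hypothetical, and hashlib's PBKDF2 replaces the `generate_password_hash` and `check_password_hash` helpers used in the model above.

```python
import hashlib
import hmac
import os


class Account:
    """Hypothetical stand-in for the User model, without Flask or a database."""

    def __init__(self, password):
        self.password = password          # routed through the setter below

    @property
    def password(self):
        raise AttributeError("password is not a readable attribute")

    @password.setter
    def password(self, plaintext):
        self._salt = os.urandom(16)
        self._hash = hashlib.pbkdf2_hmac(
            "sha256", plaintext.encode("utf-8"), self._salt, 100_000
        )

    def verify_password(self, plaintext):
        candidate = hashlib.pbkdf2_hmac(
            "sha256", plaintext.encode("utf-8"), self._salt, 100_000
        )
        return hmac.compare_digest(candidate, self._hash)


account = Account("s3cret")
print(account.verify_password("s3cret"))  # True
print(account.verify_password("wrong"))   # False
```

Because the plaintext only ever passes through the setter, it is never stored on the instance, which is the point of making the attribute unreadable.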

Now in auth/views.py we can instantiate a User with:

...
@auth.route('/register', methods=['GET', 'POST'])
def register():
    form = RegisterForm()
    if form.validate_on_submit():
        user = User(email=form.email.data,
                    username=form.username.data,
                    password=form.password.data)
        db.session.add(user)
        db.session.commit()
...

Notice the password attribute, which comes from the registration form when a user fills it in. Password confirmation happens on the front end with EqualTo('password', message='Passwords must match') (in case you are wondering, but that is a different topic related to Flask forms).

I hope this example will be useful.


回答 7

那里的很多人都清楚了这一点,但这是我一直在寻找的直接点。我觉得从@property装饰器开始很重要。例如:-

class UtilityMixin():
    @property
    def get_config(self):
        return "This is property"

函数“ get_config()”的调用将像这样工作。

util = UtilityMixin()
print(util.get_config)

如果您注意到我没有使用“()”括号来调用该函数。这是我在搜索@property装饰器的基本内容。这样您就可以像使用变量一样使用函数。

This point has been made clear by many people above, but here is the direct point I was searching for, and the one I feel is important to start with for the @property decorator. E.g.:

class UtilityMixin():
    @property
    def get_config(self):
        return "This is property"

Calling the function get_config will work like this:

util = UtilityMixin()
print(util.get_config)

Notice that I have not used "()" brackets to call the function. This is the basic thing I was looking for in the @property decorator: you can use your function just like a variable.


回答 8

让我们从Python装饰器开始。

Python装饰器是一个函数,可以帮助向已经定义的函数添加一些其他功能。

在Python中,一切都是对象。Python中的函数是一流的对象,这意味着它们可以被变量引用,添加到列表中,作为参数传递给另一个函数等。

考虑以下代码片段。

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

def say_bye():
    print("bye!!")

say_bye = decorator_func(say_bye)
say_bye()

# Output:
#  Wrapper function started
#  bye!!
#  Given function decorated

在这里,我们可以说装饰器函数修改了我们的say_bye函数,并在其中添加了一些额外的代码行。

装饰器的Python语法

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

@decorator_func
def say_bye():
    print("bye!!")

say_bye()

最后,让我们结束一个案例案例,但在此之前,让我们先讨论一些糟糕的原则。

在许多面向对象的编程语言中都使用getter和setter来确保数据封装的原理(被视为将数据与对这些数据进行操作的方法捆绑在一起)。

这些方法当然是用于获取数据的吸气剂和用于更改数据的设置器。

根据此原理,将一个类的属性设为私有,以隐藏它们并保护它们免受其他代码的侵害。

是的,@ property基本上是使用getter和setterpythonic方法。

Python有一个伟大的概念,称为属性,它使面向对象的程序员的生活变得更加简单。

让我们假设您决定创建一个可以存储摄氏温度的类。

class Celsius:
    def __init__(self, temperature=0):
        self.set_temperature(temperature)

    def to_fahrenheit(self):
        return (self.get_temperature() * 1.8) + 32

    def get_temperature(self):
        return self._temperature

    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        self._temperature = value

重构代码,这是我们可以通过属性实现的方法。

在Python中,property()是一个内置函数,可创建并返回属性对象。

属性对象具有三种方法,getter(),setter()和deleter()。

class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    def get_temperature(self):
        print("Getting value")
        return self._temperature  # private name avoids recursing into the property

    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

    temperature = property(get_temperature, set_temperature)

这里,

temperature = property(get_temperature,set_temperature)

本可以分解为

# make empty property
temperature = property()
# assign fget
temperature = temperature.getter(get_temperature)
# assign fset
temperature = temperature.setter(set_temperature)

注意事项:

  • temperature现在是一个属性对象;读取它时会自动调用get_temperature。

现在,您可以通过写入来获取温度值。

C = Celsius()
C.temperature
# instead of writing C.get_temperature()

我们可以进一步继续,不要定义名称get_temperatureset_temperature,因为它们是不必要的,并污染类命名空间。

解决上述问题的pythonic方法是使用@property

class Celsius:
    def __init__(self, temperature = 0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    @property
    def temperature(self):
        print("Getting value")
        return self._temperature  # private name avoids recursing into the property

    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

注意事项-

  1. 用于获取值的方法以“ @property”修饰。
  2. 用作设置器的方法用“ @ temperature.setter”修饰,如果该函数被称为“ x”,则必须用“ @ x.setter”修饰。
  3. 我们用相同的名称和不同数量的参数“ def temperature(self)”和“ def temperature(self,x)”编写了“两个”方法。

如您所见,该代码绝对不太优雅。

现在,让我们谈谈一个现实的实用场景。

假设您设计的类如下:

class OurClass:

    def __init__(self, a):
        self.x = a


y = OurClass(10)
print(y.x)

现在,让我们进一步假设我们的类在客户中很受欢迎,并且他们开始在程序中使用它。他们对对象进行了各种分配。

有朝一日,一个值得信赖的客户来找我们,建议“ x”的值必须在0到1000之间,这确实是一个可怕的情况!

由于属性,这很容易:我们创建属性版本“ x”。

class OurClass:

    def __init__(self,x):
        self.x = x

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, x):
        if x < 0:
            self.__x = 0
        elif x > 1000:
            self.__x = 1000
        else:
            self.__x = x

很好,不是吗:您可以从可以想象到的最简单的实现开始,并且以后可以随意迁移到属性版本,而不必更改接口!因此,属性不仅仅是吸气剂和塞特剂的替代品!


Let’s start with Python decorators.

A Python decorator is a function that adds functionality to an already defined function.

In Python, everything is an object. Functions are first-class objects, which means they can be referenced by a variable, stored in lists, passed as arguments to other functions, and so on.

Consider the following code snippet.

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

def say_bye():
    print("bye!!")

say_bye = decorator_func(say_bye)
say_bye()

# Output:
#  Wrapper function started
#  bye!!
#  Given function decorated

Here, we can say that the decorator function modified our say_bye function and added some extra lines of code to it.

Python provides the @ syntax as a shorthand for this reassignment:

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

@decorator_func
def say_bye():
    print("bye!!")

say_bye()
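
The wrapper above only works for functions that take no arguments and return nothing. A minimal sketch of a more general version, assuming we want to forward arguments and keep the original function's name; log_calls and add are hypothetical names used only for illustration:

```python
import functools

def log_calls(fun):
    """Decorator that reports when the wrapped function starts and finishes."""
    @functools.wraps(fun)  # keep the original __name__ and docstring
    def wrapper_func(*args, **kwargs):
        print("Wrapper function started")
        result = fun(*args, **kwargs)  # forward any arguments unchanged
        print("Given function decorated")
        return result  # hand the return value back to the caller
    return wrapper_func

@log_calls
def add(a, b):
    """Return the sum of a and b."""
    return a + b

total = add(2, 3)
print(total)
print(add.__name__)
```

Without functools.wraps, add.__name__ would report wrapper_func, which can be confusing when debugging.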

Let’s conclude with a case scenario, but before that, let’s talk about some OOP principles.

Getters and setters are used in many object-oriented programming languages to ensure the principle of data encapsulation (the bundling of data with the methods that operate on that data).

These methods are, of course, the getter for retrieving the data and the setter for changing the data.

According to this principle, the attributes of a class are made private to hide and protect them from other code.

So @property is basically a Pythonic way to use getters and setters.

Python has a great concept called property, which makes the life of an object-oriented programmer much simpler.

Let us assume that you decide to make a class that could store the temperature in degree Celsius.

class Celsius:
    def __init__(self, temperature = 0):
        self.set_temperature(temperature)

    def to_fahrenheit(self):
        return (self.get_temperature() * 1.8) + 32

    def get_temperature(self):
        return self._temperature

    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        self._temperature = value

Here is how we could have achieved the same thing with property.

In Python, property() is a built-in function that creates and returns a property object.

A property object has three methods: getter(), setter(), and deleter().

class Celsius:
    def __init__(self, temperature = 0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    def get_temperature(self):
        print("Getting value")
        # read the backing attribute; self.temperature here would recurse forever
        return self._temperature

    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

    temperature = property(get_temperature,set_temperature)

Here,

temperature = property(get_temperature,set_temperature)

could have been broken down as,

# make empty property
temperature = property()
# assign fget
temperature = temperature.getter(get_temperature)
# assign fset
temperature = temperature.setter(set_temperature)

Point to note:

  • temperature is now a property object rather than a plain attribute; get_temperature and set_temperature remain ordinary methods that the property calls.
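
A small sketch of the step-by-step construction shown above, assuming the value is kept in a separate backing attribute named _temperature (an assumption made here so that the getter does not call itself):

```python
class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature  # goes through set_temperature

    def get_temperature(self):
        return self._temperature

    def set_temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        self._temperature = value

    # Build the property step by step instead of property(fget, fset).
    temperature = property()
    temperature = temperature.getter(get_temperature)
    temperature = temperature.setter(set_temperature)

c = Celsius(25)
kind = type(Celsius.temperature)         # the class attribute is a property object
is_method = callable(c.get_temperature)  # the getter is still usable as a method
print(kind, is_method, c.temperature)
```

Each of getter() and setter() returns a new property object, which is why the name temperature is rebound on every line.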

Now you can access the value of temperature by writing:

C = Celsius()
C.temperature
# instead of writing C.get_temperature()

We can go further and not define the names get_temperature and set_temperature at all, as they are unnecessary and pollute the class namespace.

The pythonic way to deal with the above problem is to use @property.

class Celsius:
    def __init__(self, temperature = 0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    @property
    def temperature(self):
        print("Getting value")
        return self._temperature

    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

Points to note:

  1. The method used for getting the value is decorated with “@property”.
  2. The method that functions as the setter is decorated with “@temperature.setter”. If the property had been called “x”, we would have to decorate it with “@x.setter”.
  3. We wrote “two” methods with the same name and a different number of parameters: “def temperature(self)” and “def temperature(self, value)”.

As you can see, the code is definitely more elegant.
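
To see the decorated version at work, here is a short usage sketch. It assumes the value is stored in a backing attribute named _temperature and leaves out the print calls for brevity:

```python
class Celsius:
    def __init__(self, temperature=0):
        self.temperature = temperature  # runs the setter, so validation applies here too

    @property
    def temperature(self):
        return self._temperature

    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        self._temperature = value

c = Celsius(37)
c.temperature = 40       # plain assignment syntax calls the setter
reading = c.temperature  # plain attribute access calls the getter

try:
    c.temperature = -300  # rejected by the setter
    rejected = False
except ValueError:
    rejected = True

print(reading, rejected)
```

The failed assignment leaves the previous value in place, because the setter raises before it stores anything.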

Now, let’s talk about one real-life practical scenario.

Let’s say you have designed a class as follows:

class OurClass:

    def __init__(self, a):
        self.x = a


y = OurClass(10)
print(y.x)

Now, let’s further assume that our class became popular among clients and they started using it in their programs. They did all kinds of assignments to the object.

Then one fateful day, a trusted client came to us and suggested that “x” has to be a value between 0 and 1000. This is really a horrible scenario!

With properties it’s easy: we create a property version of “x”.

class OurClass:

    def __init__(self,x):
        self.x = x

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, x):
        if x < 0:
            self.__x = 0
        elif x > 1000:
            self.__x = 1000
        else:
            self.__x = x

This is great, isn’t it? You can start with the simplest implementation imaginable, and you are free to migrate to a property version later without having to change the interface! So properties are not just a replacement for getters and setters!
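
A quick check of the migration, as a sketch: client code keeps writing y.x = ..., yet values are now clamped. The max/min expression is just a compact stand-in for the if/elif/else chain above:

```python
class OurClass:
    def __init__(self, x):
        self.x = x  # routed through the setter below

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, x):
        # Clamp the value into the range 0..1000.
        self.__x = max(0, min(1000, x))

y = OurClass(10)
y.x = 5000  # existing client code keeps using plain assignment
high = y.x
y.x = -7
low = y.x
stored_names = sorted(vars(y))  # name mangling stores the value as _OurClass__x
print(high, low, stored_names)
```

Note that the double underscore triggers name mangling, so the value actually lives in an instance attribute called _OurClass__x.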



Answer 9

property@property装饰器背后的一类。

您可以随时检查以下内容:

print(property) #<class 'property'>

我改写了示例,help(property)以显示@property语法

class C:
    def __init__(self):
        self._x=None

    @property 
    def x(self):
        return self._x

    @x.setter 
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

c = C()
c.x="a"
print(c.x)

在功能上与property()语法相同:

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, v):
        self._x = v

    def d(self):
        del self._x

    prop = property(g,s,d)

c = C()
c.prop="a"
print(c.prop)

如您所见,我们使用该属性的方式没有什么不同。

为了回答这个问题,@property装饰器是通过property类实现的。


因此,问题是要对该property类进行一些解释。这行:

prop = property(g,s,d)

是初始化。我们可以这样重写它:

prop = property(fget=g,fset=s,fdel=d)

的含义fgetfsetfdel

 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring

下图显示了我们从类中获得的三胞胎property

__get____set____delete__那里被覆盖。这是Python中描述符模式的实现。

通常,描述符是具有“绑定行为”的对象属性,其属性访问已被描述符协议中的方法所覆盖。

我们还可以使用属性settergetterdeleter方法的功能绑定属性。检查下一个示例。s2该类的方法C会将属性设置为double

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, x):
        self._x = x

    def d(self):
        del self._x

    def s2(self,x):
        self._x=x+x


    x=property(g)
    x=x.setter(s)
    x=x.deleter(d)      


c = C()
c.x="a"
print(c.x) # outputs "a"

C.x=property(C.g, C.s2)
C.x=C.x.deleter(C.d)
c2 = C()
c2.x="a"
print(c2.x) # outputs "aa"

property is the class behind the @property decorator.

You can always check this:

print(property) #<class 'property'>

I rewrote the example from help(property) to show that the @property syntax

class C:
    def __init__(self):
        self._x=None

    @property 
    def x(self):
        return self._x

    @x.setter 
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

c = C()
c.x="a"
print(c.x)

is functionally identical to property() syntax:

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, v):
        self._x = v

    def d(self):
        del self._x

    prop = property(g,s,d)

c = C()
c.prop="a"
print(c.prop)

As you can see, there is no difference in how we use the property.

To answer the question: the @property decorator is implemented via the property class.


So the task is to explain the property class a bit. This line:

prop = property(g,s,d)

was the initialization. We can rewrite it like this:

prop = property(fget=g,fset=s,fdel=d)

The meaning of fget, fset and fdel:

 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring

The next image shows the triplets we have, from the class property:

__get__, __set__, and __delete__ are there to be overridden. This is the implementation of the descriptor pattern in Python.

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol.
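
To make the descriptor protocol concrete, here is a minimal hand-written data descriptor. It is only a sketch of the same mechanism that property relies on, not how property itself is implemented; Positive and Order are hypothetical names:

```python
class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name  # per-instance attribute that holds the value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(obj, self.storage, value)

    def __delete__(self, obj):
        delattr(obj, self.storage)


class Order:
    quantity = Positive()

    def __init__(self, quantity):
        self.quantity = quantity  # calls Positive.__set__


order = Order(3)
order.quantity = 5
current = order.quantity
try:
    order.quantity = 0
    rejected = False
except ValueError:
    rejected = True
del order.quantity  # calls Positive.__delete__
print(current, rejected, hasattr(order, "_quantity"))
```

Reading, assigning and deleting order.quantity are routed to __get__, __set__ and __delete__ respectively, which is exactly what property does with fget, fset and fdel.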

We can also use the property’s setter, getter and deleter methods to bind functions to the property. Check the next example: the method s2 of the class C sets the property to the doubled value.

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, x):
        self._x = x

    def d(self):
        del self._x

    def s2(self,x):
        self._x=x+x


    x=property(g)
    x=x.setter(s)
    x=x.deleter(d)      


c = C()
c.x="a"
print(c.x) # outputs "a"

C.x=property(C.g, C.s2)
C.x=C.x.deleter(C.d)
c2 = C()
c2.x="a"
print(c2.x) # outputs "aa"

Answer 10

可以通过两种方式声明属性。

  • 为属性创建getter,setter方法,然后将它们作为参数传递给属性函数
  • 使用@property装饰器。

您可以看一下我编写的有关python属性的一些示例。

A property can be declared in two ways.

  • Creating the getter and setter methods for an attribute and then passing them as arguments to the property function
  • Using the @property decorator.

You can have a look at a few examples I have written about properties in Python.


Answer 11

最好的解释可以在这里找到:Python @Property Explained –如何使用和何时使用?(完整示例)Selva Prabhakaran | 发表于十一月5,2018

它帮助我理解了为什么不仅如此。

https://www.machinelearningplus.com/python/python-property/

The best explanation can be found here: Python @Property Explained – How to Use and When? (Full Examples) by Selva Prabhakaran | Posted on November 5, 2018

It helped me understand WHY not only HOW.

https://www.machinelearningplus.com/python/python-property/


Answer 12

这是另一个示例:

##
## Python Properties Example
##
class GetterSetterExample( object ):
    ## Set the default value for x ( we reference it using self.x, set a value using self.x = value )
    __x = None


##
## On Class Initialization - do something... if we want..
##
def __init__( self ):
    ## Set a value to __x through the getter / setter... Since __x is defined above, this doesn't need to be set...
    self.x = 1234

    return None


##
## Define x as a property, ie a getter - All getters should have a default value arg, so I added it - it will not be passed in when setting a value, so you need to set the default here so it will be used..
##
@property
def x( self, _default = None ):
    ## I added an optional default value argument as all getters should have this - set it to the default value you want to return...
    _value = ( self.__x, _default )[ self.__x == None ]

    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Get x = ' + str( _value ) )

    ## Return the value - we are a getter afterall...
    return _value


##
## Define the setter function for x...
##
@x.setter
def x( self, _value = None ):
    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Set x = ' + str( _value ) )

    ## This is to show the setter function works.... If the value is above 0, set it to a negative value... otherwise keep it as is ( 0 is the only non-negative number, it can't be negative or positive anyway )
    if ( _value > 0 ):
        self.__x = -_value
    else:
        self.__x = _value


##
## Define the deleter function for x...
##
@x.deleter
def x( self ):
    ## Unload the assignment / data for x
    if ( self.__x != None ):
        del self.__x


##
## To String / Output Function for the class - this will show the property value for each property we add...
##
def __str__( self ):
    ## Output the x property data...
    print( '[ x ] ' + str( self.x ) )


    ## Return a new line - technically we should return a string so it can be printed where we want it, instead of printed early if _data = str( C( ) ) is used....
    return '\n'

##
##
##
_test = GetterSetterExample( )
print( _test )

## For some reason the deleter isn't being called...
del _test.x

基本上,与C(object)示例相同,只是我改用x …我也不在__init中初始化 -…很好..我可以,但是可以删除它,因为__x被定义为一部分班上的…

输出为:

[ Test Class ] Set x = 1234
[ Test Class ] Get x = -1234
[ x ] -1234

如果我将init的self.x = 1234注释掉,则输出为:

[ Test Class ] Get x = None
[ x ] None

并且如果我在getter函数中将_default = None设置为_default = 0(因为所有的getter都应具有默认值,但不会被我所看到的属性值传递,因此您可以在此处定义它,以及它实际上还不错,因为您可以定义一次默认值并在所有地方使用它),即:def x(self,_default = 0):

[ Test Class ] Get x = 0
[ x ] 0

注意:getter逻辑只是为了让它操纵值以确保它被操纵-与print语句相同…

注意:我习惯了Lua,并且在调用单个函数时能够动态创建10个以上的助手,并且我在不使用属性的情况下为Python做了类似的事情,并且在一定程度上可以正常工作,但是,即使这些函数是在之前创建的被使用时,在创建它们之前有时仍会调用它们,这很奇怪,因为它不是以这种方式编码的。。。我更喜欢Lua元表的灵活性,而且我可以使用实际的setter / getters。而不是本质上直接访问变量…我确实喜欢用Python可以快速构建某些东西-例如gui程序。尽管没有大量其他库,虽然我正在设计的库可能无法实现-如果我在AutoHotkey中对其进行编码,则可以直接访问所需的dll调用,并且可以在Java,C#,C ++,

注意:此论坛中的代码输出已损坏-我必须在代码的第一部分中添加空格才能使其正常工作-复制/粘贴时,请确保将所有空格都转换为制表符…。我在Python中使用制表符,因为在10,000行的文件大小可以为512KB至1MB(带空格)和100至200KB(带制表符),这在文件大小和减少处理时间方面存在巨大差异。

还可以按用户调整选项卡-因此,如果您希望使用2个空格宽度,4个,8个空格或您可以做的任何事情,这意味着它对于有视力缺陷的开发人员来说是体贴的。

注意:由于论坛软件中的错误,该类中定义的所有功能均未正确缩进-如果复制/粘贴,请确保将其缩进

Here is another example:

##
## Python Properties Example
##
class GetterSetterExample( object ):
    ## Set the default value for x ( we reference it using self.x, set a value using self.x = value )
    __x = None


##
## On Class Initialization - do something... if we want..
##
def __init__( self ):
    ## Set a value to __x through the getter / setter... Since __x is defined above, this doesn't need to be set...
    self.x = 1234

    return None


##
## Define x as a property, ie a getter - All getters should have a default value arg, so I added it - it will not be passed in when setting a value, so you need to set the default here so it will be used..
##
@property
def x( self, _default = None ):
    ## I added an optional default value argument as all getters should have this - set it to the default value you want to return...
    _value = ( self.__x, _default )[ self.__x == None ]

    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Get x = ' + str( _value ) )

    ## Return the value - we are a getter afterall...
    return _value


##
## Define the setter function for x...
##
@x.setter
def x( self, _value = None ):
    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Set x = ' + str( _value ) )

    ## This is to show the setter function works.... If the value is above 0, set it to a negative value... otherwise keep it as is ( 0 is the only non-negative number, it can't be negative or positive anyway )
    if ( _value > 0 ):
        self.__x = -_value
    else:
        self.__x = _value


##
## Define the deleter function for x...
##
@x.deleter
def x( self ):
    ## Unload the assignment / data for x
    if ( self.__x != None ):
        del self.__x


##
## To String / Output Function for the class - this will show the property value for each property we add...
##
def __str__( self ):
    ## Output the x property data...
    print( '[ x ] ' + str( self.x ) )


    ## Return a new line - technically we should return a string so it can be printed where we want it, instead of printed early if _data = str( C( ) ) is used....
    return '\n'

##
##
##
_test = GetterSetterExample( )
print( _test )

## For some reason the deleter isn't being called...
del _test.x

Basically, the same as the C( object ) example except I’m using x instead… I also don’t initialize in __init – … well.. I do, but it can be removed because __x is defined as part of the class….

The output is:

[ Test Class ] Set x = 1234
[ Test Class ] Get x = -1234
[ x ] -1234

and if I comment out the self.x = 1234 in init then the output is:

[ Test Class ] Get x = None
[ x ] None

and if I set the _default = None to _default = 0 in the getter function ( as all getters should have a default value but it isn’t passed in by the property values from what I’ve seen so you can define it here, and it actually isn’t bad because you can define the default once and use it everywhere ) ie: def x( self, _default = 0 ):

[ Test Class ] Get x = 0
[ x ] 0

Note: The getter logic is there just to have the value be manipulated by it to ensure it is manipulated by it – the same for the print statements…

Note: I’m used to Lua and being able to dynamically create 10+ helpers when I call a single function and I made something similar for Python without using properties and it works to a degree, but, even though the functions are being created before being used, there are still issues at times with them being called prior to being created which is strange as it isn’t coded that way… I prefer the flexibility of Lua meta-tables and the fact I can use actual setters / getters instead of essentially directly accessing a variable… I do like how quickly some things can be built with Python though – for instance gui programs. although one I am designing may not be possible without a lot of additional libraries – if I code it in AutoHotkey I can directly access the dll calls I need, and the same can be done in Java, C#, C++, and more – maybe I haven’t found the right thing yet but for that project I may switch from Python..

Note: The code output in this forum is broken – I had to add spaces to the first part of the code for it to work – when copy / pasting ensure you convert all spaces to tabs…. I use tabs for Python because in a file which is 10,000 lines the filesize can be 512KB to 1MB with spaces and 100 to 200KB with tabs which equates to a massive difference for file size, and reduction in processing time…

Tabs can also be adjusted per user – so if you prefer 2 spaces width, 4, 8 or whatever you can do it meaning it is thoughtful for developers with eye-sight deficits.

Note: All of the functions defined in the class aren’t indented properly because of a bug in the forum software – ensure you indent it if you copy / paste
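
For reference, here is a trimmed-down and correctly indented sketch of the same idea. It is not the author's exact code: it assumes a single-underscore backing attribute _x, drops the _default argument (a property getter is only ever called with self, so a default parameter can never be supplied by the caller), and adds a print to the deleter so the call is visible:

```python
class GetterSetterExample:
    def __init__(self):
        self._x = None  # single underscore: no name mangling, always defined
        self.x = 1234   # goes through the setter

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        # Same demonstration logic as above: positive values are stored negated.
        self._x = -value if value > 0 else value

    @x.deleter
    def x(self):
        print("Deleting x")  # makes the deleter call visible
        self._x = None

    def __str__(self):
        return "[ x ] " + str(self.x)


test = GetterSetterExample()
text = str(test)
del test.x
after_delete = test.x
print(text, after_delete)
```

With the methods indented inside the class body, del test.x does reach the deleter.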


Answer 13

一句话:对我来说,对于Python 2.x,@property当我不继承自object

class A():
    pass

但在以下情况下有效:

class A(object):
    pass

对于Python 3,始终有效。

One remark: for me, for Python 2.x, @property didn’t work as advertised when I didn’t inherit from object:

class A():
    pass

but worked when:

class A(object):
    pass

In Python 3, it always worked.


Adding a new column to an existing DataFrame in Python pandas

Question: Adding a new column to an existing DataFrame in Python pandas

我有以下索引的DataFrame,其中的命名列和行不是连续数字:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

我想'e'在现有数据框架中添加一个新列,并且不想更改数据框架中的任何内容(即,新列始终与DataFrame具有相同的长度)。

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

如何e在上述示例中添加列?

I have the following indexed DataFrame with named columns and row labels that are not consecutive numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame without changing anything else in it (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?


Answer 0

使用原始的df1索引创建系列:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)

编辑2015年
有人报告SettingWithCopyWarning使用此代码。
但是,该代码仍可以在当前的熊猫0.10.1版本中完美运行。

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> p.version.short_version
'0.16.1'

SettingWithCopyWarning目标对数据帧的副本通知可能无效转让的。它不一定表示您做错了(它可能会触发误报),但从0.13.0起,它会让您知道有更多适合同一目的的方法。然后,如果收到警告,请遵循其建议:尝试使用.loc [row_index,col_indexer] = value代替

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>> 

实际上,这是目前熊猫文档中描述的更有效的方法


编辑2017

如评论和@Alexander所示,当前最好将Series的值添加为DataFrame的新列的最佳方法是使用assign

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)

Use the original df1 indexes to create the series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)

Edit 2015
Some reported getting the SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version 0.16.1.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.__version__
'0.16.1'

The SettingWithCopyWarning aims to inform you of a possibly invalid assignment on a copy of the DataFrame. It doesn’t necessarily say you did it wrong (it can trigger false positives), but from 0.13.0 on it lets you know there are more adequate methods for the same purpose. If you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>> 

In fact, this is currently the more efficient method as described in pandas docs


Edit 2017

As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame could be using assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
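
The reason for passing index=df1.index (or .values) can be shown with a small sketch. The numbers below are made up for illustration and are not the data from the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [0.67, 0.45, 0.61]}, index=[2, 3, 5])
values = np.array([-0.34, -1.17, -0.39])

# A Series with the default 0..n-1 index is aligned on labels: only label 2 matches.
misaligned = df1.assign(e=pd.Series(values))

# Reusing df1's index (or passing the raw array) assigns by position instead.
aligned = df1.assign(e=pd.Series(values, index=df1.index))
positional = df1.assign(e=values)

print(misaligned)
print(aligned)
```

In the misaligned frame, rows 3 and 5 get NaN because the new Series has no such labels, while the other two variants fill every row in order.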

Answer 1

这是添加新列的简单方法: df['e'] = e

This is the simple way of adding a new column: df['e'] = e


Answer 2

我想在现有数据框中添加新列’e’,并且不更改数据框中的任何内容。(该系列的长度总是与数据帧相同。)

我假设中的索引值e与中的索引值匹配df1

初始化名为的新列e并为其分配系列中的值的最简单方法e

df['e'] = e.values

分配(熊猫0.16.0+)

从Pandas 0.16.0开始,您还可以使用assign,它为DataFrame分配新列,并返回一个新对象(副本),该对象除包含新列外还包含所有原始列。

df1 = df1.assign(e=e.values)

按照此示例(还包括assign函数的源代码),您还可以包括多个列:

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.assign(mean_a=df.a.mean(), mean_b=df.b.mean())
   a  b  mean_a  mean_b
0  1  3     1.5     3.5
1  2  4     1.5     3.5

在您的示例中:

np.random.seed(0)
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7
df1 = df1[~mask.any(axis=1)]
sLength = len(df1['a'])
e = pd.Series(np.random.randn(sLength))

>>> df1
          a         b         c         d
0  1.764052  0.400157  0.978738  2.240893
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
7  1.532779  1.469359  0.154947  0.378163
9  1.230291  1.202380 -0.387327 -0.302303

>>> e
0   -1.048553
1   -1.420018
2   -1.706270
3    1.950775
4   -0.509652
dtype: float64

df1 = df1.assign(e=e.values)

>>> df1
          a         b         c         d         e
0  1.764052  0.400157  0.978738  2.240893 -1.048553
2 -0.103219  0.410599  0.144044  1.454274 -1.420018
3  0.761038  0.121675  0.443863  0.333674 -1.706270
7  1.532779  1.469359  0.154947  0.378163  1.950775
9  1.230291  1.202380 -0.387327 -0.302303 -0.509652

首次引入此新功能时,可以在此处找到说明。

I would like to add a new column, ‘e’, to the existing data frame and do not change anything in the data frame. (The series always got the same length as a dataframe.)

I assume that the index values in e match those in df1.

The easiest way to initiate a new column named e, and assign it the values from your series e:

df['e'] = e.values

assign (Pandas 0.16.0+)

As of Pandas 0.16.0, you can also use assign, which assigns new columns to a DataFrame and returns a new object (a copy) with all the original columns in addition to the new ones.

df1 = df1.assign(e=e.values)

As per this example (which also includes the source code of the assign function), you can also include more than one column:

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.assign(mean_a=df.a.mean(), mean_b=df.b.mean())
   a  b  mean_a  mean_b
0  1  3     1.5     3.5
1  2  4     1.5     3.5

In context with your example:

np.random.seed(0)
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7
df1 = df1[~mask.any(axis=1)]
sLength = len(df1['a'])
e = pd.Series(np.random.randn(sLength))

>>> df1
          a         b         c         d
0  1.764052  0.400157  0.978738  2.240893
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
7  1.532779  1.469359  0.154947  0.378163
9  1.230291  1.202380 -0.387327 -0.302303

>>> e
0   -1.048553
1   -1.420018
2   -1.706270
3    1.950775
4   -0.509652
dtype: float64

df1 = df1.assign(e=e.values)

>>> df1
          a         b         c         d         e
0  1.764052  0.400157  0.978738  2.240893 -1.048553
2 -0.103219  0.410599  0.144044  1.454274 -1.420018
3  0.761038  0.121675  0.443863  0.333674 -1.706270
7  1.532779  1.469359  0.154947  0.378163  1.950775
9  1.230291  1.202380 -0.387327 -0.302303 -0.509652

The description of this new feature when it was first introduced can be found here.


Answer 3

似乎在最新的Pandas版本中,可行的方法是使用df.assign

df1 = df1.assign(e=np.random.randn(sLength))

它不会产生SettingWithCopyWarning

It seems that in recent Pandas versions the way to go is to use df.assign:

df1 = df1.assign(e=np.random.randn(sLength))

It doesn’t produce SettingWithCopyWarning.


Answer 4

通过NumPy直接执行此操作将是最有效的:

df1['e'] = np.random.randn(sLength)

请注意,我最初的建议(很旧)是使用map(慢得多):

df1['e'] = df1['a'].map(lambda x: np.random.random())

Doing this directly via NumPy will be the most efficient:

df1['e'] = np.random.randn(sLength)

Note my original (very old) suggestion was to use map (which is much slower):

df1['e'] = df1['a'].map(lambda x: np.random.random())

Answer 5

超简单的列分配

将熊猫数据框实现为列的有序字典。

这意味着__getitem__ []不仅可以用于获取特定列,__setitem__ [] =还可以用于分配新列。

例如,只需使用[]访问器,就可以向该数据框添加一列

    size      name color
0    big      rose   red
1  small    violet  blue
2  small     tulip   red
3  small  harebell  blue

df['protected'] = ['no', 'no', 'no', 'yes']

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

请注意,即使数据框的索引已关闭,此操作也有效。

df.index = [3,2,1,0]
df['protected'] = ['no', 'no', 'no', 'yes']
    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes

[] =是要走的路,但要当心!

但是,如果您有一个pd.Series并尝试将其分配给索引关闭的数据帧,则会遇到麻烦。参见示例:

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'])
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

这是因为pd.Series默认情况下,a的索引从0枚举到n。而熊猫[] =方法试图 变得“聪明”

实际发生了什么。

使用[] =方法时,pandas使用左手数据帧的索引和右手序列的索引安静地执行外部联接或外部合并。df['column'] = series

边注

这很快就会引起认知失调,因为该[]=方法试图根据输入来做很多不同的事情,除非您只知道熊猫如何工作的,否则无法预测结果。因此,我建议不要使用[]=in代码库,但是在笔记本中浏览数据时可以使用。

解决问题

如果您有一个pd.Series并且希望从上到下分配它,或者您正在编码生产性代码并且不确定索引顺序,那么为此类问题提供保护是值得的。

您可以将转换pd.Series为a np.ndarray或a list,这可以解决问题。

df['protected'] = pd.Series(['no', 'no', 'no', 'yes']).values

要么

df['protected'] = list(pd.Series(['no', 'no', 'no', 'yes']))

但这不是很明确。

某些编码器可能会说:“嘿,这看起来很多余,我将对其进行优化”。

显式方式

设置的索引pd.Series是的索引df是明确的。

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'], index=df.index)

或更现实的说,您可能pd.Series已经有空了。

protected_series = pd.Series(['no', 'no', 'no', 'yes'])
protected_series.index = df.index

3     no
2     no
1     no
0    yes

现在可以分配

df['protected'] = protected_series

    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes

另一种方式 df.reset_index()

由于索引不一致是问题所在,因此,如果您认为数据框的索引不应该指示事物,则可以简单地删除索引,这应该更快,但是它不是很干净,因为您的函数现在可能做两件事。

df = df.reset_index(drop=True)
protected_series = protected_series.reset_index(drop=True)
df['protected'] = protected_series

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

注意 df.assign

尽管df.assign让您更清楚地知道自己在做什么,但实际上却存在与上述相同的所有问题[]=

df.assign(protected=pd.Series(['no', 'no', 'no', 'yes']))
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

请注意df.assign,您的专栏没有被调用self。会导致错误。这很df.assign ,因为函数中存在这些伪像。

df.assign(self=pd.Series(['no', 'no', 'no', 'yes']))
TypeError: assign() got multiple values for keyword argument 'self'

您可能会说,“好吧,那我就不使用了self”。但是谁知道这个函数将来会如何变化以支持新的论点。也许您的列名将成为熊猫新更新中的一个参数,从而导致升级问题。

Super simple column assignment

A pandas dataframe is implemented as an ordered dict of columns.

This means that the __getitem__ [] can not only be used to get a certain column, but __setitem__ [] = can be used to assign a new column.

For example, this dataframe can have a column added to it by simply using the [] accessor

    size      name color
0    big      rose   red
1  small    violet  blue
2  small     tulip   red
3  small  harebell  blue

df['protected'] = ['no', 'no', 'no', 'yes']

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

Note that this works even if the index of the dataframe is off.

df.index = [3,2,1,0]
df['protected'] = ['no', 'no', 'no', 'yes']
    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes

[]= is the way to go, but watch out!

However, if you have a pd.Series and try to assign it to a dataframe where the indexes are off, you will run into trouble. See example:

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'])
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

This is because a pd.Series by default has an index enumerated from 0 to n-1, and the pandas []= method tries to be “smart”.

What is actually going on

When you use the []= method, pandas quietly aligns the right-hand series to the index of the left-hand dataframe, much like a left join on the index: df['column'] = series
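
A small sketch of that alignment step, with made-up labels: values are matched by index label, DataFrame rows without a matching label get NaN, and series labels that do not exist in the DataFrame are dropped, so the DataFrame's rows are always kept as they are:

```python
import pandas as pd

df = pd.DataFrame({"name": ["rose", "violet", "tulip", "harebell"]},
                  index=[3, 2, 1, 0])

# Labels 0..2 exist in df; label 99 does not, and label 3 has no value.
s = pd.Series(["a", "b", "c", "z"], index=[0, 1, 2, 99])
df["tag"] = s

print(df)
```

The row order of df is untouched; only the values are looked up by label.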

Side note

This quickly causes cognitive dissonance, since the []= method tries to do a lot of different things depending on the input, and the outcome cannot be predicted unless you already know how pandas works. I would therefore advise against using []= in code bases, but when exploring data in a notebook, it is fine.

Going around the problem

If you have a pd.Series and want it assigned from top to bottom, or if you are writing production code and are not sure of the index order, it is worth safeguarding against this kind of issue.

You could convert the pd.Series to a np.ndarray or a list, which discards the index and does the trick.

df['protected'] = pd.Series(['no', 'no', 'no', 'yes']).values

or

df['protected'] = list(pd.Series(['no', 'no', 'no', 'yes']))

But this is not very explicit.

Some coder may come along and say “Hey, this looks redundant, I’ll just optimize this away”.

Explicit way

Setting the index of the pd.Series to be the index of the df is explicit.

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'], index=df.index)

Or more realistically, you probably have a pd.Series already available.

protected_series = pd.Series(['no', 'no', 'no', 'yes'])
protected_series.index = df.index

3     no
2     no
1     no
0    yes

Can now be assigned

df['protected'] = protected_series

    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes

Alternative way with df.reset_index()

Since the index mismatch is the problem, if you feel that the index of the dataframe should not dictate things, you can simply drop the index. This should be faster, but it is not very clean, since your function now probably does two things. Note that reset_index returns a new object rather than modifying in place, so the result has to be assigned back.

df = df.reset_index(drop=True)
protected_series = protected_series.reset_index(drop=True)
df['protected'] = protected_series

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

Note on df.assign

While df.assign makes it more explicit what you are doing, it actually has all the same problems as the []= above:

df.assign(protected=pd.Series(['no', 'no', 'no', 'yes']))
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

Just watch out with df.assign that your column is not called self. It will cause errors. This makes df.assign smelly, since the function has this kind of artifact.

df.assign(self=pd.Series(['no', 'no', 'no', 'yes']))
TypeError: assign() got multiple values for keyword argument 'self'

You may say, “Well, I’ll just not use self then”. But who knows how this function changes in the future to support new arguments. Maybe your column name will be an argument in a new update of pandas, causing problems with upgrading.


Answer 6

Easiest ways:

data['new_col'] = list_of_values

data.loc[ : , 'new_col'] = list_of_values

This way you avoid what is called chained indexing when setting new values in a pandas object.


Answer 7

If you want to set the whole new column to an initial base value (e.g. None), you can do this: df1['e'] = None

This actually assigns the “object” dtype to the column, so later you’re free to put complex data types, like lists, into individual cells.
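
A small sketch of what that looks like in practice, assuming a made-up three-row frame. Using .at to write a single cell is an illustrative choice here, not something the answer prescribes.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3]})

# Assigning None creates a new column filled with None in every row.
df1["e"] = None
print(df1["e"].dtype)

# .at addresses exactly one cell, so the whole list is stored as one value
# instead of being spread across rows.
df1.at[0, "e"] = [10, 20, 30]
print(df1)
```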


Answer 8

I got the dreaded SettingWithCopyWarning, and it wasn’t fixed by using the iloc syntax. My DataFrame was created by read_sql from an ODBC source. Using a suggestion by lowtech above, the following worked for me:

df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength),  index=df.index))

This worked fine to insert the column at the end. I don’t know if it is the most efficient, but I don’t like warning messages. I think there is a better solution, but I can’t find it, and I think it depends on some aspect of the index.
Note: this only works once and will raise an error if you try to overwrite an existing column.
Note: as mentioned above, from pandas 0.16.0 onward assign is the best solution. See the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign It works well for data-flow style code where you don’t overwrite your intermediate values.
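
The two notes can be checked with a short sketch. The frame, its index and the values below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
values = pd.Series(np.arange(3.0), index=df.index)

# insert() modifies df in place and appends 'e' as the last column.
df.insert(len(df.columns), "e", values)

# A second insert() with the same column name raises ValueError by default.
try:
    df.insert(len(df.columns), "e", values)
    second_insert_failed = False
except ValueError:
    second_insert_failed = True

# assign() returns a new DataFrame and overwrites an existing column
# without complaint; the original df is left untouched.
df2 = df.assign(e=values * 2)

print(second_insert_failed)
print(df2)
```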


Answer 9

  1. First create a Python list, list_of_e, that holds the relevant data.
  2. Use this: df['e'] = list_of_e

Answer 10

If the column you are trying to add is a Series variable, then just:

df["new_columns_name"] = series_variable_name  # this will do it for you

This works well even if you are replacing an existing column. Just use the name of the column you want to replace as new_columns_name, and the existing column data will be overwritten with the new series data.


Answer 11

If the data frame and Series object have the same index, pandas.concat also works here:

import pandas as pd
df
#          a            b           c           d
#0  0.671399     0.101208   -0.181532    0.241273
#1  0.446172    -0.243316    0.051767    1.577318
#2  0.614758     0.075793   -0.451460   -0.012493

e = pd.Series([-0.335485, -1.166658, -0.385571])    
e
#0   -0.335485
#1   -1.166658
#2   -0.385571
#dtype: float64

# here we need to give the series object a name which converts to the new  column name 
# in the result
df = pd.concat([df, e.rename("e")], axis=1)
df

#          a            b           c           d           e
#0  0.671399     0.101208   -0.181532    0.241273   -0.335485
#1  0.446172    -0.243316    0.051767    1.577318   -1.166658
#2  0.614758     0.075793   -0.451460   -0.012493   -0.385571

In case they don’t have the same index:

e.index = df.index
df = pd.concat([df, e.rename("e")], axis=1)
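
To see why the index has to match, here is a minimal sketch with a made-up frame whose index is 10, 20, 30. Copying the index over is only correct if the Series is already in the same row order as the frame, which is assumed here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
e = pd.Series([-0.3, -1.2, -0.4], name="e")  # default index 0, 1, 2

# With different indexes, concat(axis=1) takes the union of the labels
# (an outer join), so the result has six rows padded with NaN.
misaligned = pd.concat([df, e], axis=1)

# Copying the index first makes the rows line up by position.
e_aligned = e.copy()
e_aligned.index = df.index
aligned = pd.concat([df, e_aligned], axis=1)

print(misaligned.shape)
print(aligned)
```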

Answer 12

Foolproof:

df.loc[:, 'NewCol'] = 'New_Val'

Example:

df = pd.DataFrame(data=np.random.randn(20, 4), columns=['A', 'B', 'C', 'D'])

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
3  -0.147354  0.778707  0.479145  2.284143
4  -0.529529  0.000571  0.913779  1.395894
5   2.592400  0.637253  1.441096 -0.631468
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
8   0.606985 -2.232903 -1.358107 -2.855494
9  -0.692013  0.671866  1.179466 -1.180351
10 -1.093707 -0.530600  0.182926 -1.296494
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
18  0.693458  0.144327  0.329500 -0.655045
19  0.104425  0.037412  0.450598 -0.923387


df.drop([3, 5, 8, 10, 18], inplace=True)

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
4  -0.529529  0.000571  0.913779  1.395894
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
9  -0.692013  0.671866  1.179466 -1.180351
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
19  0.104425  0.037412  0.450598 -0.923387

df.loc[:, 'NewCol'] = 0

df
           A         B         C         D  NewCol
0  -0.761269  0.477348  1.170614  0.752714       0
1   1.217250 -0.930860 -0.769324 -0.408642       0
2  -0.619679 -1.227659 -0.259135  1.700294       0
4  -0.529529  0.000571  0.913779  1.395894       0
6   0.757178  0.240012 -0.553820  1.177202       0
7  -0.986128 -1.313843  0.788589 -0.707836       0
9  -0.692013  0.671866  1.179466 -1.180351       0
11 -0.143273 -0.503199 -1.328728  0.610552       0
12 -0.923110 -1.365890 -1.366202 -1.185999       0
13 -2.026832  0.273593 -0.440426 -0.627423       0
14 -0.054503 -0.788866 -0.228088 -0.404783       0
15  0.955298 -1.430019  1.434071 -0.088215       0
16 -0.227946  0.047462  0.373573 -0.111675       0
17  1.627912  0.043611  1.743403 -0.012714       0
19  0.104425  0.037412  0.450598 -0.923387       0

Answer 13

Let me just add that, just like for hum3, .loc didn’t solve the SettingWithCopyWarning and I had to resort to df.insert(). In my case a false positive was generated by “fake” chained indexing, dict['a']['e'], where 'e' is the new column and dict['a'] is a DataFrame coming from a dictionary.

Also note that if you know what you are doing, you can switch off the warning using pd.options.mode.chained_assignment = None and then use one of the other solutions given here.


Answer 14

To insert a new column at a given location (0 <= loc <= number of columns) in a data frame, just use DataFrame.insert:

DataFrame.insert(loc, column, value)

Therefore, if you want to add the column e at the end of a data frame called df, you can use:

e = [-0.335485, -1.166658, -0.385571]    
df.insert(loc=len(df.columns), column='e', value=e)

value can be a Series, an integer (in which case all cells get filled with this one value), or an array-like structure

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html
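
A short sketch of both ends of the loc range, using the values of e from this answer. The two-column frame and the column name first are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
e = [-0.335485, -1.166658, -0.385571]

# loc=0 puts the new column first; loc=len(df.columns) puts it last.
# insert() works in place and returns None.
df.insert(loc=0, column="first", value=0)            # a scalar fills every row
df.insert(loc=len(df.columns), column="e", value=e)  # a list is set by position

print(list(df.columns))  # ['first', 'a', 'b', 'e']
```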


Answer 15

Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to:

data.set_index(['index_column'], inplace=True)
# if the index is unsorted, assignment of a new column will fail
data.sort_index(inplace = True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])

Answer 16

One thing to note, though, is that if you do

df1['e'] = Series(np.random.randn(sLength), index=df1.index)

this will effectively be a left join on the df1.index. So if you want to have an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then use the code above. For example,

data = pd.DataFrame(index=all_possible_values)
df1['e'] = Series(np.random.randn(sLength), index=df1.index)
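
The difference between the left-join behaviour and an outer-join effect can be sketched as follows. The labels x, y, z are made up, and reindexing to the union of both indexes is one possible way to get the "universe" of index values, not necessarily what the answer's author did.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
e = pd.Series([0.5, 0.7], index=["y", "z"])

# Plain assignment keeps df1's index: 'z' is dropped and 'x' gets NaN.
left = df1.copy()
left["e"] = e

# Reindexing df1 to the union of both indexes first gives an
# outer-join-like result that also keeps 'z'.
outer = df1.reindex(df1.index.union(e.index))
outer["e"] = e

print(left)
print(outer)
```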

Answer 17

I was looking for a general way of adding a column of numpy.nans to a dataframe without getting the dumb SettingWithCopyWarning.

From the following:

  • the answers here
  • this question about passing a variable as a keyword argument
  • this method for generating a numpy array of NaNs in-line

I came up with this:

col = 'column_name'
df = df.assign(**{col:numpy.full(len(df), numpy.nan)})
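
A self-contained variant of the same idea, assuming a made-up one-column frame. It passes the scalar np.nan instead of a full array, relying on assign broadcasting a scalar to every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# The column name lives in a variable, so it cannot be written as a
# literal keyword argument; unpacking a dict works around that.
col = "column_name"
result = df.assign(**{col: np.nan})

print(result)
```

Because assign returns a new frame, df itself is left without the new column unless the result is assigned back.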

Answer 18

To add a new column, ‘e’, to the existing data frame:

 df1.loc[:,'e'] = Series(np.random.randn(sLength))

Answer 19

For the sake of completeness – yet another solution using DataFrame.eval() method:

Data:

In [44]: e
Out[44]:
0    1.225506
1   -1.033944
2   -0.498953
3   -0.373332
4    0.615030
5   -0.622436
dtype: float64

In [45]: df1
Out[45]:
          a         b         c         d
0 -0.634222 -0.103264  0.745069  0.801288
4  0.782387 -0.090279  0.757662 -0.602408
5 -0.117456  2.124496  1.057301  0.765466
7  0.767532  0.104304 -0.586850  1.051297
8 -0.103272  0.958334  1.163092  1.182315
9 -0.616254  0.296678 -0.112027  0.679112

Solution:

In [46]: df1.eval("e = @e.values", inplace=True)

In [47]: df1
Out[47]:
          a         b         c         d         e
0 -0.634222 -0.103264  0.745069  0.801288  1.225506
4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
5 -0.117456  2.124496  1.057301  0.765466 -0.498953
7  0.767532  0.104304 -0.586850  1.051297 -0.373332
8 -0.103272  0.958334  1.163092  1.182315  0.615030
9 -0.616254  0.296678 -0.112027  0.679112 -0.622436

Answer 20

To create an empty column

df['i'] = None

Answer 21

The following is what I did… But I’m pretty new to pandas and really Python in general, so no promises.

df = pd.DataFrame([[1, 2], [3, 4], [5,6]], columns=list('AB'))

newCol = [3,5,7]
newName = 'C'

values = np.insert(df.values,df.shape[1],newCol,axis=1)
header = df.columns.values.tolist()
header.append(newName)

df = pd.DataFrame(values,columns=header)

Answer 22

If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.

df = df.copy()
df['col_name'] = values
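
As a sketch of where this typically helps: in pandas versions that emit the warning, it usually appears when a column is added to a frame that was itself obtained by filtering another frame. The frame and column names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# Filtering yields a frame that pandas may treat as derived from df.
# Calling .copy() makes it an independent object before adding a column.
subset = df[df["a"] > 2].copy()
subset["c"] = subset["a"] + subset["b"]

print(subset)
```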

How do I get the execution time of a Python program?

Question: How do I get the execution time of a Python program?

I have a command line program in Python that takes a while to finish. I want to know the exact time it takes to finish running.

I’ve looked at the timeit module, but it seems it’s only for small snippets of code. I want to time the whole program.


Answer 0

The simplest way in Python:

import time
start_time = time.time()
main()
print("--- %s seconds ---" % (time.time() - start_time))

This assumes that your program takes at least a tenth of a second to run.

Prints:

--- 0.764891862869 seconds ---
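
A variant of the same pattern, sketched with time.perf_counter() instead of time.time(). This is a swapped-in clock rather than what the answer uses: perf_counter is a high-resolution clock that is not affected by system clock adjustments. The main() function here is a stand-in for the real program.

```python
import time


def main():
    # Stand-in for the real program; sums a range to use a little time.
    return sum(range(100_000))


start = time.perf_counter()
main()
elapsed = time.perf_counter() - start
print(f"--- {elapsed:.6f} seconds ---")
```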

Answer 1

I put this timing.py module into my own site-packages directory, and just insert import timing at the top of my module:

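# Python 2 code: it relies on print statements, the built-in reduce, and time.clock (removed in Python 3.8)
import atexit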
from time import clock

def secondsToStr(t):
    return "%d:%02d:%02d.%03d" % \
        reduce(lambda ll,b : divmod(ll[0],b) + ll[1:],
            [(t*1000,),1000,60,60])

line = "="*40
def log(s, elapsed=None):
    print line
    print secondsToStr(clock()), '-', s
    if elapsed:
        print "Elapsed time:", elapsed
    print line
    print

def endlog():
    end = clock()
    elapsed = end-start
    log("End Program", secondsToStr(elapsed))

def now():
    return secondsToStr(clock())

start = clock()
atexit.register(endlog)
log("Start Program")

I can also call timing.log from within my program if there are significant stages within the program I want to show. But just including import timing will print the start and end times, and overall elapsed time. (Forgive my obscure secondsToStr function, it just formats a floating point number of seconds to hh:mm:ss.sss form.)

Note: A Python 3 version of the above code can be found here or here.


Answer 2

In Linux or Unix:

$ time python yourprogram.py

In Windows, see this StackOverflow question: How do I measure execution time of a command on the Windows command line?

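For more verbose output, use GNU time. The -v flag belongs to the standalone binary (typically installed as /usr/bin/time), not to the shell’s built-in time keyword:

$ /usr/bin/time -v python3 yourprogram.py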
    Command being timed: "python3 yourprogram.py"
    User time (seconds): 0.08
    System time (seconds): 0.02
    Percent of CPU this job got: 98%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.10
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 9480
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1114
    Voluntary context switches: 0
    Involuntary context switches: 22
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Answer 3

I really like Paul McGuire’s answer, but I use Python 3. So for those who are interested: here’s a modification of his answer that works with Python 3 on *nix (I imagine, under Windows, that clock() should be used instead of time()):

#python3
import atexit
from time import time, strftime, localtime
from datetime import timedelta

def secondsToStr(elapsed=None):
    if elapsed is None:
        return strftime("%Y-%m-%d %H:%M:%S", localtime())
    else:
        return str(timedelta(seconds=elapsed))

def log(s, elapsed=None):
    line = "="*40
    print(line)
    print(secondsToStr(), '-', s)
    if elapsed:
        print("Elapsed time:", elapsed)
    print(line)
    print()

def endlog():
    end = time()
    elapsed = end-start
    log("End Program", secondsToStr(elapsed))

start = time()
atexit.register(endlog)
log("Start Program")

If you find this useful, you should still up-vote his answer instead of this one, as he did most of the work ;).


Answer 4

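import time

start_time = time.process_time()
main()
print(time.process_time() - start_time, "seconds")

time.process_time() returns the processor time, which allows us to calculate only the CPU time used by this process, excluding time spent sleeping or waiting. The original version of this answer used time.clock(), which the Python 2 documentation described as “the function to use for benchmarking Python or timing algorithms”; time.clock() was deprecated in Python 3.3 and removed in Python 3.8.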


Answer 5

I like the output the datetime module provides, where time delta objects show days, hours, minutes, etc. as necessary in a human-readable way.

For example:

from datetime import datetime
start_time = datetime.now()
# do your work here
end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time))

Sample output e.g.

Duration: 0:00:08.309267

or

Duration: 1 day, 1:51:24.269711

As J.F. Sebastian mentioned, this approach might encounter some tricky cases with local time, so it’s safer to use:

import time
from datetime import timedelta
start_time = time.monotonic()
# do your work here
end_time = time.monotonic()
print(timedelta(seconds=end_time - start_time))
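
The monotonic-clock version can be wrapped in a small helper. The function name timed is hypothetical, and the two fixed durations at the end simply reproduce the formatting of the sample outputs shown in this answer.

```python
import time
from datetime import timedelta


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed), with elapsed as a timedelta."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    return result, timedelta(seconds=time.monotonic() - start)


result, elapsed = timed(sum, range(1000))
print(f"Duration: {elapsed}")

# timedelta formats fixed durations the same way as the samples above.
print(timedelta(seconds=8.309267))                         # 0:00:08.309267
print(timedelta(days=1, hours=1, minutes=51, seconds=24))  # 1 day, 1:51:24
```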

Answer 6

You can use the Python profiler cProfile to measure CPU time and additionally how much time is spent inside each function and how many times each function is called. This is very useful if you want to improve the performance of your script without knowing where to start. An answer to another Stack Overflow question covers this well, and it’s always worth having a look at the documentation too.

Here’s an example of how to profile a script using cProfile from the command line:

$ python -m cProfile euler048.py

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}
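
cProfile can also be driven from inside a program rather than from the command line. The sketch below assumes a hypothetical work() function standing in for the real script and collects the report into a string with pstats.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical workload standing in for the real script.
    return sum(map(lambda n: n * n, range(1000)))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

Profiling only the section between enable() and disable() keeps start-up and import costs out of the report, which can make it easier to read than a whole-script profile.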

Answer 7

Even better for Linux: GNU time (typically /usr/bin/time, since the shell’s built-in time does not accept -v)

$ /usr/bin/time -v python rhtest2.py

    Command being timed: "python rhtest2.py"
    User time (seconds): 4.13
    System time (seconds): 0.07
    Percent of CPU this job got: 91%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.58
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 15
    Minor (reclaiming a frame) page faults: 5095
    Voluntary context switches: 27
    Involuntary context switches: 279
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Answer 8

time.clock()

Deprecated since version 3.3: The behavior of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well-defined behavior.

time.perf_counter()

Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.

time.process_time()

Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep.

import time

start = time.process_time()
# ... do something
elapsed = time.process_time() - start
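
The difference between the two clocks is easiest to see when a program both waits and computes. In this sketch the 0.2 second sleep and the size of the loop are arbitrary choices for illustration.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)              # waiting: wall-clock time passes, the CPU is idle
total = sum(range(200_000))  # computing: both clocks advance

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall-clock figure includes the sleep, while the CPU figure reflects only the work done by the process.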

Answer 9

Just use the timeit module. It works with both Python 2 and Python 3.

import timeit

start = timeit.default_timer()

# All the program statements
stop = timeit.default_timer()
execution_time = stop - start

print("Program Executed in "+str(execution_time)) # It returns time in seconds

It returns the time in seconds, which gives you your execution time. It is simple, but you should put these statements in the main function that starts the program. If you want the execution time even when an error occurs, pass the start value into the function as a parameter and compute the elapsed time there, like this:

def sample_function(start, **kwargs):
    try:
        pass  # Your statements
    except Exception:
        # These statements run when your statements raise an exception
        stop = timeit.default_timer()
        execution_time = stop - start
        print("Program executed in " + str(execution_time))
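
One possible variation on the idea above is to use try/finally, so that the elapsed time is printed whether or not an exception is raised. This is a minimal sketch; run_timed is a hypothetical helper name, not something provided by timeit.

```python
import timeit


def run_timed(func, *args, **kwargs):
    """Call func and print the elapsed time, even if func raises."""
    start = timeit.default_timer()
    try:
        return func(*args, **kwargs)
    finally:
        # The finally clause runs on both normal return and exception
        elapsed = timeit.default_timer() - start
        print(f"Executed in {elapsed:.6f} seconds")


total = run_timed(sum, [1, 2, 3])
```

Any exception still propagates to the caller; only the timing report is guaranteed.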

Answer 10

The following snippet prints the elapsed time in a human-readable H:MM:SS format.

import time
from datetime import timedelta

start_time = time.time()

#
# Perform lots of computations.
#

elapsed_time_secs = time.time() - start_time

# timedelta already renders as H:MM:SS, so no "secs" suffix is needed
msg = "Execution took: %s (wall clock time)" % timedelta(seconds=round(elapsed_time_secs))

print(msg)

Answer 11

from time import time

start_time = time()
...  # the code to time goes here
end_time = time()
time_taken = end_time - start_time  # time_taken is in seconds
hours, rest = divmod(time_taken, 3600)
minutes, seconds = divmod(rest, 60)

Answer 12

In IPython, use %timeit on any script:

def foo():
    %run bar.py

%timeit foo()

Answer 13

I’ve looked at the timeit module, but it seems it’s only for small snippets of code. I want to time the whole program.

$ python -mtimeit -n1 -r1 -t -s "from your_module import main" "main()"

It runs the your_module.main() function once and prints the elapsed time, using the time.time() function as the timer.

To emulate /usr/bin/time in Python see Python subprocess with /usr/bin/time: how to capture timing info but ignore all other output?.

To measure CPU time (e.g., don’t include time during time.sleep()) for each function, you could use profile module (cProfile on Python 2):

$ python3 -mprofile your_module.py

You could pass -p to timeit command above if you want to use the same timer as profile module uses.

See How can you profile a Python script?


Answer 14

I liked Paul McGuire’s answer too and came up with a context manager form which suited my needs more.

import datetime as dt
import timeit

class TimingManager(object):
    """Context Manager used with the statement 'with' to time some execution.

    Example:

    with TimingManager() as t:
       # Code to time
    """

    clock = timeit.default_timer

    def __enter__(self):
        """
        """
        self.start = self.clock()
        self.log('\n=> Start Timing: {}')

        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        """
        self.endlog()

        return False

    def log(self, s, elapsed=None):
        """Log current time and elapsed time if present.
        :param s: Text to display, use '{}' to format the text with
            the current time.
        :param elapsed: Elapsed time to display. Default: None, no display.
        """
        print(s.format(self._secondsToStr(self.clock())))

        if elapsed is not None:
            print('Elapsed time: {}\n'.format(elapsed))

    def endlog(self):
        """Log time for the end of execution with elapsed time.
        """
        self.log('=> End Timing: {}', self.now())

    def now(self):
        """Return current elapsed time as hh:mm:ss string.
        :return: String.
        """
        return str(dt.timedelta(seconds = self.clock() - self.start))

    def _secondsToStr(self, sec):
        """Convert timestamp to h:mm:ss string.
        :param sec: Timestamp.
        """
        return str(dt.datetime.fromtimestamp(sec))
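
For comparison, the same idea can be sketched more compactly with contextlib.contextmanager from the standard library. The name timed and its label parameter are assumptions made for this example and are not part of the answer above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="block"):
    """Print how long the body of the with statement took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the body raises, so the timing is always reported
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.6f} s")


with timed("sum"):
    total = sum(range(1000))
```

The class-based form is easier to extend with extra logging; the generator form is shorter when only the elapsed time is needed.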

Answer 15

For the data folks using Jupyter Notebook

In a cell, you can use Jupyter’s %%time magic command to measure the execution time:

%%time
[ x**2 for x in range(10000)]

Output

CPU times: user 4.54 ms, sys: 0 ns, total: 4.54 ms
Wall time: 4.12 ms

This will only capture the execution time of a particular cell. If you’d like to capture the execution time of the whole notebook (i.e. program), you can create a new notebook in the same directory and in the new notebook execute all cells:

Suppose the notebook above is called example_notebook.ipynb. In a new notebook within the same directory:

# Convert your notebook to a .py script:
!jupyter nbconvert --to script example_notebook.ipynb

# Run the example_notebook with -t flag for time
%run -t example_notebook

Output

IPython CPU timings (estimated):
  User   :       0.00 s.
  System :       0.00 s.
Wall time:       0.00 s.

Answer 16

There is a timeit module which can be used to time the execution times of Python code.

It has detailed documentation and examples in Python documentation, 26.6. timeit — Measure execution time of small code snippets.
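
As a small illustration of the module, the sketch below times a function with timeit.repeat(). The function build_squares and the chosen repeat counts are assumptions made for this example.

```python
import timeit


def build_squares():
    return [x * x for x in range(1000)]


# repeat() returns one total time per run; each run calls the function `number` times
runs = timeit.repeat(build_squares, number=1000, repeat=5)

# The minimum is usually the least noisy figure
best_per_call = min(runs) / 1000
print(f"best of 5: {best_per_call * 1e6:.2f} microseconds per call")
```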


Answer 17

Use line_profiler.

line_profiler will profile the time individual lines of code take to execute. The profiler is implemented in C via Cython in order to reduce the overhead of profiling.

from line_profiler import LineProfiler
import random

def do_stuff(numbers):
    s = sum(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

The results will be:

Timer unit: 1e-06 s

Total time: 0.000649 s
File: <ipython-input-2-2e060b054fea>
Function: do_stuff at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def do_stuff(numbers):
     5         1           10     10.0      1.5      s = sum(numbers)
     6         1          186    186.0     28.7      l = [numbers[i]/43 for i in range(len(numbers))]
     7         1          453    453.0     69.8      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

Answer 18

I used a very simple function to time a part of code execution:

import time
def timing():
    start_time = time.time()
    return lambda x: print("[{:.2f}s] {}".format(time.time() - start_time, x))

To use it, call timing() just before the code you want to measure; it returns a function. Call that function with a message after the code, and the elapsed time is printed in front of the message. For example:

import numpy as np
import pandas as pd

t = timing()
train = pd.read_csv('train.csv',
                        dtype={
                            'id': str,
                            'vendor_id': str,
                            'pickup_datetime': str,
                            'dropoff_datetime': str,
                            'passenger_count': int,
                            'pickup_longitude': np.float64,
                            'pickup_latitude': np.float64,
                            'dropoff_longitude': np.float64,
                            'dropoff_latitude': np.float64,
                            'store_and_fwd_flag': str,
                            'trip_duration': int,
                        },
                        parse_dates = ['pickup_datetime', 'dropoff_datetime'],
                   )
t("Loaded {} rows data from 'train'".format(len(train)))

Then the output will look like this:

[9.35s] Loaded 1458644 rows data from 'train'

Answer 19

I was having the same problem in many places, so I created a convenience package horology. You can install it with pip install horology and then do it in the elegant way:

from horology import Timing

with Timing(name='Important calculations: '):
    prepare()
    do_your_stuff()
    finish_sth()

will output:

Important calculations: 12.43 ms

Or even simpler (if you have one function):

from horology import timed

@timed
def main():
    ...

will output:

main: 7.12 h

It takes care of units and rounding. It works with python 3.6 or newer.


Answer 20

This is Paul McGuire’s answer that works for me. Just in case someone was having trouble running that one.

import atexit
from time import perf_counter as clock  # time.clock() was removed in Python 3.8

def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        value = next(it)
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value

def secondsToStr(t):
    return "%d:%02d:%02d.%03d" % \
        reduce(lambda ll,b : divmod(ll[0],b) + ll[1:],
            [(t*1000,),1000,60,60])

line = "="*40
def log(s, elapsed=None):
    print (line)
    print (secondsToStr(clock()), '-', s)
    if elapsed:
        print ("Elapsed time:", elapsed)
    print (line)

def endlog():
    end = clock()
    elapsed = end-start
    log("End Program", secondsToStr(elapsed))

def now():
    return secondsToStr(clock())

def main():
    global start  # endlog() reads this module-level name at exit
    start = clock()
    atexit.register(endlog)
    log("Start Program")

Save this as timing.py, import it, and call timing.main() from your program.


Answer 21

timeit is a module in Python's standard library used to measure the execution time of small blocks of code.

default_timer is a function in this module that measures wall-clock time, not CPU execution time, so other processes running on the machine can affect the result. It is therefore best suited to small blocks of code.

A sample of the code is as follows:

from timeit import default_timer as timer

start= timer()

# Some logic

end = timer()

print("Time taken:", end-start)

Answer 22

Later answer, but I use timeit:

import timeit
code_to_test = """
a = range(100000)
b = []
for i in a:
    b.append(i*2)
"""
elapsed_time = timeit.timeit(code_to_test, number=500)
print(elapsed_time)
# 10.159821493085474

  • Wrap all your code, including any imports you may have, inside code_to_test.
  • The number argument specifies how many times the code is run; the returned value is the total time for all runs, in seconds.
  • Demo

Answer 23

Measurements of a Python program's execution time can be inconsistent, because:

  • Same program can be evaluated using different algorithms
  • Running time varies between algorithms
  • Running time varies between implementations
  • Running time varies between computers
  • Running time is not predictable based on small inputs

This is why the most reliable way to compare programs is to analyze their order of growth, using Big O notation.

Anyway, you can roughly evaluate how fast a specific machine runs Python by counting steps per second with this simple loop; adapt it to the program you want to evaluate:

import time

now = time.time()
future = now + 10
step = 4 # Why 4 steps? Because until here already four operations executed
while time.time() < future:
    step += 3 # Why 3 again? Because a while loop executes one comparison and one plus equal statement
step += 4 # Why 3 more? Because one comparison starting while when time is over plus the final assignment of step + 1 and print statement
print(str(int(step / 10)) + " steps per second")

Answer 24

You can do this simply in Python. There is no need to make it complicated.

import time

start = time.time()
# ... the code to time goes here
end = time.time()
# Total execution time in seconds. Subtract full timestamps rather than
# localtime().tm_sec fields, which wrap around every minute.
print(end - start)

Answer 25

Similar to the response from @rogeriopvl I added a slight modification to convert to hour minute seconds using the same library for long running jobs.

import time
start_time = time.time()
main()
seconds = time.time() - start_time
print('Time Taken:', time.strftime("%H:%M:%S",time.gmtime(seconds)))

Sample Output

Time Taken: 00:00:08
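
One caveat with the approach above: because time.gmtime() produces a calendar time, the %H field wraps around after 24 hours. For very long jobs, a divmod-based formatter avoids that. The following is a minimal sketch; format_hms is a hypothetical helper name.

```python
def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS without wrapping at 24 hours."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_hms(8))      # 00:00:08
print(format_hms(90061))  # 25:01:01
```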

Answer 26

First, install the humanfriendly package: open Command Prompt (CMD) as administrator and run pip install humanfriendly

Code:

from humanfriendly import format_timespan
import time
begin_time = time.time()
# Put your code here
elapsed = time.time() - begin_time
print("Total execution time:", format_timespan(elapsed))


Answer 27

To use metakermit’s updated answer for Python 2.7, you will require the monotonic package.

The code would then be as follows:

from datetime import timedelta
from monotonic import monotonic

start_time = monotonic()
end_time = monotonic()
print(timedelta(seconds=end_time - start_time))

Answer 28

I measured the elapsed time using the following script.

import time

start_time = time.perf_counter()
# main code here
print(time.perf_counter() - start_time, "seconds")

Answer 29

If you want to measure time in microseconds, then you can use the following version, based completely on the answers of Paul McGuire and Nicojo – it’s Python 3 code. I’ve also added some colour to it:

import atexit
from time import time
from datetime import timedelta, datetime


def seconds_to_str(elapsed=None):
    if elapsed is None:
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")
    else:
        return str(timedelta(seconds=elapsed))


def log(txt, elapsed=None):
    colour_cyan = '\033[36m'
    colour_reset = '\033[0;0;39m'
    colour_red = '\033[31m'
    print('\n ' + colour_cyan + '  [TIMING]> [' + seconds_to_str() + '] ----> ' + txt + '\n' + colour_reset)
    if elapsed:
        print("\n " + colour_red + " [TIMING]> Elapsed time ==> " + elapsed + "\n" + colour_reset)


def end_log():
    end = time()
    elapsed = end-start
    log("End Program", seconds_to_str(elapsed))


start = time()
atexit.register(end_log)
log("Start Program")

log() ==> function that prints out the timing information.

txt ==> first argument to log(): the string that labels the timing entry.

atexit ==> Python module for registering functions to be called when the program exits.


How to read a text file into a string variable and strip newlines?

Question: How to read a text file into a string variable and strip newlines?

I use the following code segment to read a file in python:

with open ("data.txt", "r") as myfile:
    data=myfile.readlines()

Input file is:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN
GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

and when I print data I get

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN\n', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

As I see it, data is in list form. How do I make it a string? And how do I remove the "\n", "[", and "]" characters from it?


Answer 0

You could use:

with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
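
To see the effect on the sample input from the question, the sketch below writes those two lines to a temporary file and reads them back. The temporary file and the variable names are assumptions made so that the example is self-contained.

```python
import os
import tempfile

sample = "LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN\nGGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE\n"

# Write the sample to a temporary file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path, "r") as file:
        data = file.read().replace("\n", "")
finally:
    os.remove(path)  # clean up the temporary file

print(data)
```

The result is a single str with both lines joined and no newline characters, which is what the question asks for.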

Answer 1

Use read(), not readlines():

with open('data.txt', 'r') as myfile:
  data = myfile.read()

Answer 2

You can read from a file in one line:

text = open('very_Important.txt', 'r').read()

Please note that this does not close the file explicitly.

CPython will close the file when the file object is garbage collected.

But other Python implementations may not do so promptly. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951


Answer 3

To join all lines into one string and remove the newlines, I normally use:

with open('t.txt') as f:
  s = " ".join([x.strip() for x in f])

Answer 4

In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:

from pathlib import Path
txt = Path('data.txt').read_text()

and then you can use str.replace to remove the newlines:

txt = txt.replace('\n', '')

Answer 5

with open("data.txt") as myfile:
    data="".join(line.rstrip() for line in myfile)

join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.


Answer 6

This can be done using the read() method :

text_as_string = open('Your_Text_File.txt', 'r').read()

Or, since the default mode is 'r' (read), simply use:

text_as_string = open('Your_Text_File.txt').read()

Answer 7

I have fiddled around with this for a while and prefer to use read in combination with rstrip. Without rstrip("\n"), the string keeps the newline that ends the file, which in most cases is not very useful.

with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)

Answer 8

It’s hard to tell exactly what you’re after, but something like this should get you started:

with open ("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])

Answer 9

I’m surprised nobody mentioned splitlines() yet.

with open ("data.txt", "r") as myfile:
    data = myfile.read().splitlines()

Variable data is now a list that looks like this when printed:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

Note there are no newlines (\n).

At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:

for line in data:
    print(line)
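
If a single string is wanted rather than a list, the result of splitlines() can be joined back together. The sketch below works on an in-memory string so it runs without a file; the sample text and variable names are assumptions made for this example.

```python
text = "first line\nsecond line\n"

# splitlines() drops the line terminators and returns a list of lines
lines = text.splitlines()

# join() glues the list back into one string, here with no separator
as_string = "".join(lines)

print(lines)
print(as_string)
```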

Answer 10

You can also strip each line and concatenate into a final string.

# Using with ensures the file is closed after reading
with open("data.txt", "r") as myfile:
    lines = myfile.readlines()
data = ""
for line in lines:
    data = data + line.strip()

This would also work out just fine.


Answer 11

You can compress this into two lines of code!

content = open('filepath','r').read().replace('\n',' ')
print(content)

If your file reads:

hello how are you?
who are you?
blank blank

Python outputs:

hello how are you? who are you? blank blank

Answer 12

This is a one line, copy-pasteable solution that also closes the file object:

_ = open('data.txt', 'r'); data = _.read(); _.close()

Answer 13

f = open('data.txt', 'r')
string = ""
while True:
    line = f.readline()  # readline() returns '' at end of file
    if not line:
        break
    string += line

f.close()

print(string)

Answer 14

Python 3: Google "list comprehension" if the square bracket syntax is new to you.

with open('data.txt') as f:
    lines = [line.strip() for line in f]

Answer 15

Have you tried this?

x = "yourfilename.txt"
y = open(x, 'r').read()

print(y)

Answer 16

I don't feel that anyone addressed the [ ] part of your question. readlines() returns a list with one string per line, so when the file has multiple lines you end up with a list rather than a single string. If you have such a variable x and display it with

x

or print(x)

or str(x)

you will see the entire list, brackets included. If you index a single element of the list,

x[0]

then the brackets are omitted. If you pass that element to print(), you will see just the data, without the quotes either: print(x[0])


Answer 17

Maybe you could try this? I use this in my programs.

# Using with ensures the file is closed after reading
with open('data.txt', 'r') as f:
    data = f.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()

Answer 18

Regular expression works too:

import re
with open("depression.txt") as f:
     l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]

print (l)

['I', 'feel', 'empty', 'and', 'dead', 'inside']


回答 19

要使用Python删除换行符,您可以使用replace字符串函数。

本示例删除所有3种换行符:

my_string = open('lala.json').read()
print(my_string)

my_string = my_string.replace("\r","").replace("\n","")
print(my_string)

示例文件为:

{
  "lala": "lulu",
  "foo": "bar"
}

您可以使用以下重播方案进行尝试:

https://repl.it/repls/AnnualJointHardware

To remove line breaks using Python you can use replace function of a string.

This example removes all 3 types of line breaks:

my_string = open('lala.json').read()
print(my_string)

my_string = my_string.replace("\r","").replace("\n","")
print(my_string)

Example file is:

{
  "lala": "lulu",
  "foo": "bar"
}

You can try it in this online REPL:

https://repl.it/repls/AnnualJointHardware
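
An alternative sketch, not part of the answer above: str.splitlines() recognises \r\n, \r and \n (plus a few rarer line boundaries such as form feed), so joining its result also removes the line breaks. The function name remove_line_breaks and the sample text are illustrative only.

```python
def remove_line_breaks(text):
    """Drop every line break by splitting on line boundaries and re-joining."""
    return "".join(text.splitlines())


# The sample mixes all three line-break styles on purpose.
sample = '{\r\n  "lala": "lulu",\n  "foo": "bar"\r}'
cleaned = remove_line_breaks(sample)
print(cleaned)
```

For input that only contains \r and \n line breaks, this gives the same result as the chained replace calls.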


回答 20

这有效:将文件更改为:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

然后:

file = open("file.txt")
line = file.read()
words = line.split()

这将创建一个列表words,该列表等于:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

那摆脱了“ \ n”。要回答括号中的问题,只需执行以下操作:

for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a separate line

要么:

print(words[0] + ",", words[1])  # "+" concatenates without adding a space
# The comma between the two arguments makes print insert a space

返回:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

This works: Change your file to:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

Then:

file = open("file.txt")
line = file.read()
words = line.split()

This creates a list named words that equals:

['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']

That got rid of the “\n”. To answer the part about the brackets getting in your way, just do this:

for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a separate line

Or:

print(words[0] + ",", words[1])  # "+" concatenates without adding a space
# The comma between the two arguments makes print insert a space

This returns:

LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

回答 21

with open(player_name, 'r') as myfile:
    data = myfile.readline()   # read only the first line
    words = data.split(" ")    # split it on spaces
    word = words[0]            # first word of the line

此代码将帮助您阅读第一行,然后使用list and split选项可以转换以空格分隔的第一行单词以存储在列表中。

比起您可以轻松访问任何单词,甚至将其存储在字符串中而言。

您也可以使用for循环执行相同的操作。

with open(player_name, 'r') as myfile:
    data = myfile.readline()   # read only the first line
    words = data.split(" ")    # split it on spaces
    word = words[0]            # first word of the line

This code reads the first line of the file, then uses split to turn the space-separated words of that line into a list.

Then you can easily access any word, or even store it in a string.

You can also do the same thing using a for loop.


回答 22

with open("myfile.txt", "r") as file:
    lines = file.readlines()

text = ''  # accumulator for the joined lines

for line in lines:
    text += line.rstrip('\n') + ' '

print(text)

回答 23

尝试以下方法:

with open('data.txt', 'r') as myfile:
    data = myfile.read()

    sentences = data.split('\\n')
    for sentence in sentences:
        print(sentence)

注意:它不会删除\n。仅用于查看文本,好像没有\n

Try the following:

with open('data.txt', 'r') as myfile:
    data = myfile.read()

    sentences = data.split('\\n')
    for sentence in sentences:
        print(sentence)

Caution: It does not remove the \n. It is just for viewing the text as if there were no \n


Python中“ assert”的用法是什么?

问题:Python中“ assert”的用法是什么?

我一直在阅读一些源代码,并且在几个地方看到了的用法assert

到底是什么意思?它的用途是什么?

I have been reading some source code and in several places I have seen the usage of assert.

What does it mean exactly? What is its usage?


回答 0

assert语句几乎存在于每种编程语言中。它有助于在程序中尽早发现问题,找出原因,而不是在其他操作后再发现问题。

当你做…

assert condition

…您要告诉程序测试该条件,如果条件为假,则立即触发错误。

在Python中,它大致等于:

if not condition:
    raise AssertionError()

在Python Shell中尝试:

>>> assert True # nothing happens
>>> assert False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

断言可以包括可选消息,您可以在运行解释器时将其禁用。

要在断言失败时打印消息:

assert False, "Oh no! This assertion failed!"

千万不能使用括号调用assert的功能等。这是一个声明。如果你这样做assert(condition, message)你会运行assert一个(condition, message)元组的第一个参数。

至于禁用它们,运行时,python在优化模式,其中__debug__False,断言语句将被忽略。只要通过-O标志:

python -O script.py

有关相关文档,请参见此处

The assert statement exists in almost every programming language. It helps detect problems early in your program, where the cause is clear, rather than later as a side-effect of some other operation.

When you do…

assert condition

… you’re telling the program to test that condition, and immediately trigger an error if the condition is false.

In Python, it’s roughly equivalent to this:

if not condition:
    raise AssertionError()

Try it in the Python shell:

>>> assert True # nothing happens
>>> assert False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Assertions can include an optional message, and you can disable them when running the interpreter.

To print a message if the assertion fails:

assert False, "Oh no! This assertion failed!"

Do not use parentheses to call assert like a function. It is a statement. If you write assert(condition, message), you’ll be running the assert with a (condition, message) tuple as its only operand, and a non-empty tuple is always true.

As for disabling them, when running python in optimized mode, where __debug__ is False, assert statements will be ignored. Just pass the -O flag:

python -O script.py

See here for the relevant documentation.


回答 1

注意括号。正如上面指出的那样,在Python 3中,assert它仍然是一条语句,因此与类似print(..),可以将其外推到assert(..)raise(..)但不应该外推。

这很重要,因为:

assert(2 + 2 == 5, "Houston we've got a problem")

不起作用,不像

assert 2 + 2 == 5, "Houston we've got a problem"

第一个不起作用的原因是bool( (False, "Houston we've got a problem") )评估为True

在语句中assert(False),这些只是多余的括号False,对它们的内容进行求值。但是assert(False,)现在带括号的是一个元组,非空元组的计算结果为True布尔值。

Watch out for the parentheses. As has been pointed out above, in Python 3, assert is still a statement, so by analogy with print(..), one may extrapolate the same to assert(..) or raise(..) but you shouldn’t.

This is important because:

assert(2 + 2 == 5, "Houston we've got a problem")

won’t work, unlike

assert 2 + 2 == 5, "Houston we've got a problem"

The reason the first one will not work is that bool( (False, "Houston we've got a problem") ) evaluates to True.

In the statement assert(False), these are just redundant parentheses around False, which evaluate to their contents. But with assert(False,) the parentheses are now a tuple, and a non-empty tuple evaluates to True in a boolean context.
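
A short sketch of why the parenthesised form never fails. The tuple is built in a separate variable here, and the helper name check is made up for illustration:

```python
condition = (2 + 2 == 5)
message = "Houston we've got a problem"

# What assert(condition, message) actually tests: a two-element tuple.
wrapped = (condition, message)
print(bool(wrapped))    # non-empty tuple, so it is truthy
print(bool(condition))  # the real condition is falsy


def check(cond, msg):
    """Run the correctly written assert and report what happened."""
    try:
        assert cond, msg
    except AssertionError as exc:
        return "failed: %s" % exc
    return "passed"


result = check(condition, message)
print(result)
```

This assumes the interpreter is not running with -O, in which case the assert inside check would be skipped.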


回答 2

正如其他答案所指出的,assert类似于在给定条件不成立时引发异常。一个重要的区别是,如果使用优化选项编译代码,则assert语句将被忽略-O。该文档说,assert expression可以更好地描述为等同于

if __debug__:
   if not expression: raise AssertionError

如果您要彻底测试代码,然后在满意所有断言都不失败的情况下发布优化版本,这将非常有用-当优化打开时,__debug__变量变为False且条件将不再被求值。如果您依靠断言并且没有意识到它们已经消失,那么此功能还可以吸引您。

As other answers have noted, assert is similar to throwing an exception if a given condition isn’t true. An important difference is that assert statements get ignored if you compile your code with the optimization option -O. The documentation says that assert expression can better be described as being equivalent to

if __debug__:
   if not expression: raise AssertionError

This can be useful if you want to thoroughly test your code, then release an optimized version when you’re happy that none of your assertion cases fail – when optimization is on, the __debug__ variable becomes False and the conditions will stop getting evaluated. This feature can also catch you out if you’re relying on the asserts and don’t realize they’ve disappeared.
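
One way to see this for yourself is to run the same failing assertion in two fresh interpreters, with and without -O. This is only a sketch: the helper name exit_code and the snippet are invented for the demonstration, and it assumes sys.executable points at a working Python.

```python
import os
import subprocess
import sys

SNIPPET = "assert False, 'this assertion fails'"


def exit_code(*flags):
    """Run SNIPPET in a fresh interpreter with the given flags and return its exit code."""
    # Drop PYTHONOPTIMIZE so only the explicit flags decide the optimization level.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    completed = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


normal = exit_code()         # the assertion runs and fails
optimized = exit_code("-O")  # the assertion is compiled away

print(normal, optimized)
```

A non-zero exit code in the first run and zero in the second shows that the assertion simply disappears under -O.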


回答 3

Python中断言的目的是通知开发人员程序中不可恢复的错误。

断言并不旨在表示预期的错误情况,例如“找不到文件”,用户可以在其中采取纠正措施(或只是再试一次)。

另一种看待它的方式是说断言是代码中的内部自检。它们通过在代码中声明某些条件是不可能的来工作的。如果不满足这些条件,则意味着程序中存在错误。

如果您的程序没有错误,则这些情况将永远不会发生。但是,如果确实发生了其中一种情况,则程序将因声明错误而崩溃,并确切地告诉您触发了哪个“不可能”条件。这使查找和修复程序中的错误变得更加容易。

这是我写的有关Python断言的教程的摘要:

Python的assert语句是一种调试辅助工具,而不是用于处理运行时错误的机制。使用断言的目的是让开发人员更快地找到错误的可能根本原因。除非程序中存在错误,否则永远不会引发断言错误。

The goal of an assertion in Python is to inform developers about unrecoverable errors in a program.

Assertions are not intended to signal expected error conditions, like “file not found”, where a user can take corrective action (or just try again).

Another way to look at it is to say that assertions are internal self-checks in your code. They work by declaring some conditions as impossible in your code. If these conditions don’t hold that means there’s a bug in the program.

If your program is bug-free, these conditions will never occur. But if one of them does occur the program will crash with an assertion error telling you exactly which “impossible” condition was triggered. This makes it much easier to track down and fix bugs in your programs.

Here’s a summary from a tutorial on Python’s assertions I wrote:

Python’s assert statement is a debugging aid, not a mechanism for handling run-time errors. The goal of using assertions is to let developers find the likely root cause of a bug more quickly. An assertion error should never be raised unless there’s a bug in your program.
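
To make the distinction concrete, here is a minimal sketch with a made-up function, split_bill. Bad input from the caller is an expected error and gets a regular exception; the assert only guards against a bug in the function itself.

```python
def split_bill(total_cents, people):
    """Split an amount in cents into nearly equal integer shares."""
    # Expected error: bad input from the caller, so raise a normal exception.
    if people <= 0:
        raise ValueError("people must be a positive integer")

    share, remainder = divmod(total_cents, people)
    shares = [share + 1] * remainder + [share] * (people - remainder)

    # Internal self-check: can only fail if the arithmetic above has a bug.
    assert sum(shares) == total_cents, "shares do not add up to the total"
    return shares


print(split_bill(100, 3))
```

If the assertion ever fires, the fix belongs in split_bill, not in the code that called it.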


回答 4

其他人已经为您提供了指向文档的链接。

您可以在交互式外壳中尝试以下操作:

>>> assert 5 > 2
>>> assert 2 > 5
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
builtins.AssertionError:

第一条语句什么也不做,而第二条语句引发异常。这是第一个提示:断言对于检查在代码的给定位置应为真的条件(通常是函数的开始(前提)和结束(条件))很有用。

断言实际上与合同编程高度相关,这是非常有用的工程实践:

http://en.wikipedia.org/wiki/Design_by_contract

Others have already given you links to documentation.

You can try the following in an interactive shell:

>>> assert 5 > 2
>>> assert 2 > 5
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
builtins.AssertionError:

The first statement does nothing, while the second raises an exception. This is the first hint: asserts are useful to check conditions that should be true in a given position of your code (usually, the beginning (preconditions) and the end of a function (postconditions)).

Asserts are actually highly tied to programming by contract, which is a very useful engineering practice:

http://en.wikipedia.org/wiki/Design_by_contract.


回答 5

从文档:

Assert statements are a convenient way to insert debugging assertions into a program

在这里您可以阅读更多信息:http : //docs.python.org/release/2.5.2/ref/assert.html

From docs:

Assert statements are a convenient way to insert debugging assertions into a program

Here you can read more: http://docs.python.org/release/2.5.2/ref/assert.html


回答 6

assert语句有两种形式。

简单形式assert <expression>相当于

if __debug__:
    if not <expression>: raise AssertionError

扩展形式assert <expression1>, <expression2>相当于

if __debug__:
    if not <expression1>: raise AssertionError(<expression2>)

The assert statement has two forms.

The simple form, assert <expression>, is equivalent to

if __debug__:
    if not <expression>: raise AssertionError

The extended form, assert <expression1>, <expression2>, is equivalent to

if __debug__:
    if not <expression1>: raise AssertionError(<expression2>)

回答 7

断言是检查程序内部状态是否符合程序员预期的一种系统方法,目的是捕获错误。请参见下面的示例。

>>> number = int(input('Enter a positive number:'))
Enter a positive number:-1
>>> assert (number > 0), 'Only positive numbers are allowed!'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Only positive numbers are allowed!
>>> 

Assertions are a systematic way to check that the internal state of a program is as the programmer expected, with the goal of catching bugs. See the example below.

>>> number = int(input('Enter a positive number:'))
Enter a positive number:-1
>>> assert (number > 0), 'Only positive numbers are allowed!'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Only positive numbers are allowed!
>>> 

回答 8

这是一个简单的示例,将其保存在文件中(假设为b.py)

def chkassert(num):
    assert type(num) == int


chkassert('a')

结果是什么时候 $python b.py

Traceback (most recent call last):
  File "b.py", line 5, in <module>
    chkassert('a')
  File "b.py", line 2, in chkassert
    assert type(num) == int
AssertionError

Here is a simple example, save this in file (let’s say b.py)

def chkassert(num):
    assert type(num) == int


chkassert('a')

and this is the result of running $ python b.py:

Traceback (most recent call last):
  File "b.py", line 5, in <module>
    chkassert('a')
  File "b.py", line 2, in chkassert
    assert type(num) == int
AssertionError

回答 9

如果assert后的语句为true,则程序继续;但是,如果assert后的语句为false,则程序给出错误。就那么简单。

例如:

assert 1>0   #normal execution
assert 0>1   #Traceback (most recent call last):
             #File "<pyshell#11>", line 1, in <module>
             #assert 0>1
             #AssertionError

If the expression after assert is true, the program continues; if it is false, the program raises an error. Simple as that.

e.g.:

assert 1>0   #normal execution
assert 0>1   #Traceback (most recent call last):
             #File "<pyshell#11>", line 1, in <module>
             #assert 0>1
             #AssertionError

回答 10

assert语句几乎存在于每种编程语言中。它有助于在程序中尽早发现问题,找出原因,而不是在其他操作后再发现问题。他们总是期待一个True条件。

当您执行以下操作时:

assert condition

您要告诉程序测试该条件并在错误的情况下立即触发错误。

在Python中,assertexpression等效于:

if __debug__:
    if not <expression>: raise AssertionError

您可以使用扩展表达式来传递可选消息

if __debug__:
    if not (expression_1): raise AssertionError(expression_2)

在Python解释器中尝试一下:

>>> assert True # Nothing happens because the condition returns a True value.
>>> assert False # A traceback is triggered because this evaluation did not yield an expected value.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

在主要针对那些认为在assertif语句之间切换的人使用它们之前,有一些注意事项。使用的目的assert是在程序验证条件并返回应立即停止程序的值的情况下,而不是采取某些替代方法来绕过错误:

1.括号

您可能已经注意到,该assert语句使用两个条件。因此,千万不能使用括号englobe他们作为一个显而易见的建议。如果您这样做:

assert (condition, message)

例:

>>> assert (1==2, 1==1)
<stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?

您将以代表元组的第一个参数运行assert带有的a (condition, message),这是因为Python中的非空元组始终为True。但是,您可以单独进行而不会出现问题:

assert (condition), "message"

例:

>>> assert (1==2), ("This condition returns a %s value.") % "False"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: This condition returns a False value.

2.调试目的

如果您想知道何时使用assert语句。举一个在现实生活中使用的例子:

*当您的程序倾向于控制用户输入的每个参数或其他任何参数时:

def loremipsum(**kwargs):
    bar = kwargs.pop('bar', 0)     # returns 0 if "bar" was not passed
    foo = kwargs.pop('foo', None)  # returns None if "foo" was not passed
    assert (len(kwargs) == 0), "unrecognized parameter passed in %s" % ', '.join(kwargs.keys())

*数学上的另一种情况是某个方程式的系数或常数为0或非正数:

def discount(item, percent):
    price = int(item['price'] * (1.0 - percent))
    print(price)
    assert (0 <= price <= item['price']),\
            "Discounted prices cannot be lower than 0 "\
            "and they cannot be higher than the original price."

    return price

*甚至是布尔实现的简单示例:

def true(a, b):
    assert (a == b), "False"
    return 1

def false(a, b):
    assert (a != b), "True"
    return 0

3.数据处理或数据验证

最重要的是不要依赖该assert语句执行数据处理或数据验证,因为可以在Python初始化时使用-O-OO标志(分别表示值1、2和0(默认值)或PYTHONOPTIMIZE环境变量)关闭此语句。。

值1:

*断言被禁用;

*使用.pyo扩展名而不是.pyc; 生成字节码文件;

* sys.flags.optimize设置为1(True);

*和,__debug__设置为False;

值2:再禁用一件事

*文档字符串被禁用;

因此,使用该assert语句来验证某种预期数据非常危险,这甚至暗示了某些安全问题。然后,如果您需要验证某些权限,我建议您raise AuthError代替。作为先决条件,assert程序员通常在没有用户直接交互的库或模块上使用an 。

The assert statement exists in almost every programming language. It helps detect problems early in your program, where the cause is clear, rather than later as a side-effect of some other operation. They always expect a True condition.

When you do something like:

assert condition

You’re telling the program to test that condition and immediately trigger an error if it is false.

In Python, assert expression, is equivalent to:

if __debug__:
    if not <expression>: raise AssertionError

You can use the extended expression to pass an optional message:

if __debug__:
    if not (expression_1): raise AssertionError(expression_2)

Try it in the Python interpreter:

>>> assert True # Nothing happens because the condition returns a True value.
>>> assert False # A traceback is triggered because this evaluation did not yield an expected value.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

There are some caveats to keep in mind before using assertions, mainly for those tempted to use assert and if statements interchangeably. Use assert when the program verifies a condition whose failure should stop the program immediately, rather than take some alternative path to work around the error:

1. Parentheses

As you may have noticed, the assert statement takes two expressions: a condition and an optional message. Do not use parentheses to wrap them together as one. If you do, such as:

assert (condition, message)

Example:

>>> assert (1==2, 1==1)
<stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?

You will be running the assert with a (condition, message) tuple as its only operand, and the assertion always passes because a non-empty tuple in Python is always true. However, you can parenthesize each part separately without any problem:

assert (condition), "message"

Example:

>>> assert (1==2), ("This condition returns a %s value.") % "False"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: This condition returns a False value.

2. Debug purpose

If you are wondering when to use the assert statement, here are some examples from real life:

* When your program needs to check every parameter entered by the user or passed in from anywhere else:

def loremipsum(**kwargs):
    bar = kwargs.pop('bar', 0)     # returns 0 if "bar" was not passed
    foo = kwargs.pop('foo', None)  # returns None if "foo" was not passed
    assert (len(kwargs) == 0), "unrecognized parameter passed in %s" % ', '.join(kwargs.keys())

* Another case is in math, when a coefficient or constant in some equation must not be zero or negative:

def discount(item, percent):
    price = int(item['price'] * (1.0 - percent))
    print(price)
    assert (0 <= price <= item['price']),\
            "Discounted prices cannot be lower than 0 "\
            "and they cannot be higher than the original price."

    return price

* or even a simple example of a boolean implementation:

def true(a, b):
    assert (a == b), "False"
    return 1

def false(a, b):
    assert (a != b), "True"
    return 0

3. Data processing or data validation

Most importantly, do not rely on the assert statement for data processing or data validation, because it can be turned off at Python startup with the -O or -OO flag (optimization levels 1 and 2, respectively; the default is 0) or with the PYTHONOPTIMIZE environment variable.

Value 1:

* asserts are disabled;

* bytecode files are generated using the .pyo extension instead of .pyc (before Python 3.5; newer versions write .opt-1.pyc files instead);

* sys.flags.optimize is set to 1 (True);

* and, __debug__ is set to False;

Value 2: disables one more thing

* docstrings are disabled;

Therefore, using the assert statement to validate expected data is extremely dangerous and can even lead to security issues. If you need to validate some permission, I recommend raising an exception such as AuthError instead. As a precondition check, assert is commonly used by programmers in libraries or modules that users do not interact with directly.
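
A minimal sketch of that recommendation. AuthError is not a built-in exception, so it is defined here as a hypothetical class, and delete_account is an invented function:

```python
class AuthError(Exception):
    """Hypothetical exception for failed permission checks."""


def delete_account(is_admin, account_id):
    # An explicit check still runs under -O, unlike an assert.
    if not is_admin:
        raise AuthError("admin rights are required")
    return "account %d deleted" % account_id


print(delete_account(True, 42))

try:
    delete_account(False, 42)
except AuthError as exc:
    outcome = str(exc)
print(outcome)
```

Because the check is an ordinary if statement, it cannot be switched off from the command line.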


回答 11

正如在C2 Wiki上简要概述的那样:

断言是程序中特定点的布尔表达式,除非程序中存在错误,否则该表达式为真

您可以使用一条assert语句来记录您在特定程序点上对代码的理解。例如,您可以记录关于输入(前提条件),程序状态(不变式)或输出(后置条件)的假设或保证。

如果您的断言失败了,这将向您(或您的后继者)发出警报,提醒您在编写程序时对程序的理解是错误的,并且可能包含错误。

有关更多信息,John Regehr在“ Assertions”中有一篇精彩的博客文章,该文章也适用于Python assert语句。

As summarized concisely on the C2 Wiki:

An assertion is a boolean expression at a specific point in a program which will be true unless there is a bug in the program.

You can use an assert statement to document your understanding of the code at a particular program point. For example, you can document assumptions or guarantees about inputs (preconditions), program state (invariants), or outputs (postconditions).

Should your assertion ever fail, this is an alert for you (or your successor) that your understanding of the program was wrong when you wrote it, and that it likely contains a bug.

For more information, John Regehr has a wonderful blog post on the Use of Assertions, which applies to the Python assert statement as well.


回答 12

如果您想确切知道保留函数在python中的作用,请输入 help(enter_keyword)

确保您输入的保留关键字是否作为字符串输入。

If you ever want to know exactly what a reserved keyword does in Python, type help(enter_keyword).

Make sure you pass the reserved keyword as a string, for example help('assert').
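
As a small sketch, the standard keyword module can confirm that a name is reserved before you look it up. The help call is left as a comment because it prints a long page of documentation:

```python
import keyword

name = "assert"
is_reserved = keyword.iskeyword(name)
print(is_reserved)

# In an interactive session, the quoted form shows the documentation:
# help("assert")
# Without the quotes, help(assert) is a SyntaxError, because assert is a
# statement keyword and not an object that can be passed around.
```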


回答 13

Python 断言基本上是一种调试辅助工具,用于测试代码内部自检的条件。当代码陷入不可能的情况时,Assert使调试变得非常容易。断言检查那些不可能的情况。

假设有一个函数可以计算折扣后的商品价格:

def calculate_discount(price, discount):
    discounted_price = price - (discount * price)
    assert 0 <= discounted_price <= price
    return discounted_price

在这里,Discounted_price永远不能小于0且大于实际价格。因此,如果违反了上述条件,则assert会引发Assertion Error,这将有助于开发人员识别出某些不可能的事情发生了。

希望能帮助到你 :)

Python’s assert is basically a debugging aid that tests conditions as an internal self-check of your code. Assert makes debugging really easy when your code gets into impossible edge cases. Assert checks for those impossible cases.

Let’s say there is a function to calculate the price of an item after a discount:

def calculate_discount(price, discount):
    discounted_price = price - (discount * price)
    assert 0 <= discounted_price <= price
    return discounted_price

Here, discounted_price can never be less than 0 or greater than the actual price. So, if the above condition is violated, assert raises an AssertionError, which helps the developer identify that something impossible has happened.
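
A quick usage sketch of such a function, assuming discount is a fraction such as 0.25 and that the arithmetic is grouped with parentheses. The assertion message is added for illustration:

```python
def calculate_discount(price, discount):
    # Parentheses group the arithmetic; discount is a fraction such as 0.25.
    discounted_price = price - (discount * price)
    assert 0 <= discounted_price <= price, "discounted price out of range"
    return discounted_price


print(calculate_discount(100, 0.25))

try:
    calculate_discount(100, 1.5)  # a 150% discount is an "impossible" case
    caught = False
except AssertionError:
    caught = True
print(caught)
```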

Hope it helps 🙂


回答 14

我的简短解释是:

  • assertAssertionError如果expression为false,则引发,否则继续执行代码,如果有逗号,则为AssertionError: whatever after comma,并且代码如下:raise AssertionError(whatever after comma)

有关此的相关教程:

https://www.tutorialspoint.com/python/assertions_in_python.htm

My short explanation is:

  • assert raises AssertionError if the expression is false; otherwise execution simply continues. If there is a comma, whatever comes after it becomes the error message (AssertionError: whatever after comma), as if the code were: raise AssertionError(whatever after comma)

A related tutorial about this:

https://www.tutorialspoint.com/python/assertions_in_python.htm


回答 15

在Pycharm中,如果assert与一起使用isinstance来声明对象的类型,它将使您在编码时可以访问父对象的方法和属性,它将自动自动完成。

例如,假设self.object1.object2是一个MyClass对象。

from my_module import MyClass  # hypothetical module that defines MyClass

def code_it(self):
    testObject = self.object1.object2  # at this point, PyCharm doesn't know that testObject is a MyClass object yet
    assert isinstance(testObject, MyClass)  # now PyCharm knows testObject is a MyClass object
    testObject.do_it()  # from this point on, PyCharm will be able to auto-complete when you are working on testObject

In PyCharm, if you use assert together with isinstance to declare an object’s type, the IDE can auto-complete that object’s methods and attributes while you are coding.

For example, let’s say self.object1.object2 is a MyClass object.

from my_module import MyClass  # hypothetical module that defines MyClass

def code_it(self):
    testObject = self.object1.object2  # at this point, PyCharm doesn't know that testObject is a MyClass object yet
    assert isinstance(testObject, MyClass)  # now PyCharm knows testObject is a MyClass object
    testObject.do_it()  # from this point on, PyCharm will be able to auto-complete when you are working on testObject

回答 16

如在其他答案中所写,assert语句用于检查给定点的程序状态。

我不会重复有关关联消息,括号或-O选项和__debug__常量的内容。另请查阅文档以获取第一手信息。我将重点关注您的问题:的用途是assert什么?更准确地说,何时(何时不该使用)assert

assert语句对于调试程序很有用,但不鼓励检查用户输入。我使用以下经验法则:保留断言以检测这种不应该发生的情况。用户输入可能不正确,例如密码太短,但这不是不应该发生的情况。如果圆的直径不是其半径的两倍,则在这种情况不应该发生

最有趣的,在我脑海里,使用的assert是由灵感 合同编程为[面向对象的软件建设]由B.迈耶描述( https://www.eiffel.org/doc/eiffel/Object-Oriented_Software_Construction% 2C_2nd_Edition )并以[Eiffel编程语言](https://en.wikipedia.org/wiki/Eiffel_ (programming_language )实施。您不能使用该assert语句通过合同完全模拟编程,但是保持意图很有趣。

这是一个例子。想象一下,您必须编写一个head函数(例如headHaskell中的[ 函数]( http://www.zvon.org/other/haskell/Outputprelude/head_f.html))。给出的规范是:“如果列表不为空,则返回列表的第一项”。查看以下实现:

>>> def head1(xs): return xs[0]

>>> def head2(xs):
...     if len(xs) > 0:
...         return xs[0]
...     else:
...         return None

(是的,可以写成return xs[0] if xs else None,但这不是重点)

如果列表不为空,则两个函数的结果相同,并且此结果正确:

>>> head1([1, 2, 3]) == head2([1, 2, 3]) == 1
True

因此,这两种实现都是(我希望)正确的。当您尝试采用空列表的标题时,它们会有所不同:

>>> head1([])
Traceback (most recent call last):
...
IndexError: list index out of range

但:

>>> head2([]) is None
True

同样,这两种实现都是正确的,因为没有人应该将空列表传递给这些函数(我们超出了规范)。那是一个不正确的电话,但是如果您进行这样的电话,任何事情都会发生。一个函数引发异常,另一个函数返回一个特殊值。最重要的是:我们不能依靠这种行为。如果xs为空,则可以使用:

print(head2(xs))

但这将使程序崩溃:

print(head1(xs))

为避免意外,我想知道何时将一些意外的参数传递给函数。换句话说:我想知道何时可观察的行为不可靠,因为它取决于实现而不是规范。当然,我可以阅读规范,但是程序员并不总是仔细阅读文档。

想象一下,如果我有一种方法可以将规范插入代码中以达到以下效果:当我违反规范时,例如,通过向传递一个空列表head,我会得到警告。这将对编写正确的(即符合规范的)程序有很大的帮助。这就是assert 进入现场的地方:

>>> def head1(xs):
...     assert len(xs) > 0, "The list must not be empty"
...     return xs[0]

>>> def head2(xs):
...     assert len(xs) > 0, "The list must not be empty"
...     if len(xs) > 0:
...         return xs[0]
...     else:
...         return None

现在,我们有:

>>> head1([])
Traceback (most recent call last):
...
AssertionError: The list must not be empty

和:

>>> head2([])
Traceback (most recent call last):
...
AssertionError: The list must not be empty

请注意,它head1抛出一个AssertionError,而不是IndexError。这很重要,因为an AssertionError并不是任何运行时错误:它表示违反规范。我想要警告,但出现错误。幸运的是,我可以禁用该检查(使用该-O选项),但后果自负。我会做到的,崩溃真的很昂贵,并且希望最好。想象一下,我的程序嵌入在穿过黑洞的宇宙飞船中。我将禁用断言,并希望该程序足够健壮,以免崩溃的时间尽可能长。

此示例仅与前提条件有关,因为您可以使用它assert来检查后置条件(返回值和/或状态)和不变式(类的状态)。请注意,检查后置条件和不变量with assert可能很麻烦:

  • 对于后置条件,需要将返回值分配给变量,并且如果要处理方法,则可能需要存储对象的初始状态;
  • 对于不变式,您必须在方法调用之前和之后检查状态。

您不会拥有像Eiffel那样复杂的功能,但是可以提高程序的整体质量。


总而言之,该assert语句是检测这种不应该发生的情况的便捷方法。违反规范(例如,向传递一个空列表head)是头等舱,这种情况不应该发生。因此,尽管该assert语句可用于检测任何意外情况,但这是确保满足规范的一种特权方式。一旦将assert语句插入代码中以表示规范,我们就可以希望您提高了程序的质量,因为将报告错误的参数,错误的返回值,错误的类状态…。

As written in other answers, assert statements are used to check the state of the program at a given point.

I won’t repeat what was said about associated message, parentheses, or -O option and __debug__ constant. Check also the doc for first hand information. I will focus on your question: what is the use of assert? More precisely, when (and when not) should one use assert?

The assert statements are useful to debug a program, but discouraged to check user input. I use the following rule of thumb: keep assertions to detect a this should not happen situation. A user input may be incorrect, e.g. a password too short, but this is not a this should not happen case. If the diameter of a circle is not twice as large as its radius, you are in a this should not happen case.

The most interesting, in my mind, use of assert is inspired by the programming by contract as described by B. Meyer in [Object-Oriented Software Construction]( https://www.eiffel.org/doc/eiffel/Object-Oriented_Software_Construction%2C_2nd_Edition ) and implemented in the [Eiffel programming language]( https://en.wikipedia.org/wiki/Eiffel_(programming_language)). You can’t fully emulate programming by contract using the assert statement, but it’s interesting to keep the intent.

Here’s an example. Imagine you have to write a head function (like the [head function in Haskell]( http://www.zvon.org/other/haskell/Outputprelude/head_f.html)). The specification you are given is: “if the list is not empty, return the first item of a list”. Look at the following implementations:

>>> def head1(xs): return xs[0]

And

>>> def head2(xs):
...     if len(xs) > 0:
...         return xs[0]
...     else:
...         return None

(Yes, this can be written as return xs[0] if xs else None, but that’s not the point).

If the list is not empty, both functions have the same result and this result is correct:

>>> head1([1, 2, 3]) == head2([1, 2, 3]) == 1
True

Hence, both implementations are (I hope) correct. They differ when you try to take the head item of an empty list:

>>> head1([])
Traceback (most recent call last):
...
IndexError: list index out of range

But:

>>> head2([]) is None
True

Again, both implementations are correct, because no one should pass an empty list to these functions (we are out of the specification). That’s an incorrect call, but if you do such a call, anything can happen. One function raises an exception, the other returns a special value. The most important is: we can’t rely on this behavior. If xs is empty, this will work:

print(head2(xs))

But this will crash the program:

print(head1(xs))

To avoid some surprises, I would like to know when I’m passing some unexpected argument to a function. In other words: I would like to know when the observable behavior is not reliable, because it depends on the implementation, not on the specification. Of course, I can read the specification, but programmers do not always read the docs carefully.

Imagine if I had a way to insert the specification into the code to get the following effect: when I violate the specification, e.g. by passing an empty list to head, I get a warning. That would be a great help in writing a correct (i.e. compliant with the specification) program. And that’s where assert enters the scene:

>>> def head1(xs):
...     assert len(xs) > 0, "The list must not be empty"
...     return xs[0]

And

>>> def head2(xs):
...     assert len(xs) > 0, "The list must not be empty"
...     if len(xs) > 0:
...         return xs[0]
...     else:
...         return None

Now, we have:

>>> head1([])
Traceback (most recent call last):
...
AssertionError: The list must not be empty

And:

>>> head2([])
Traceback (most recent call last):
...
AssertionError: The list must not be empty

Note that head1 throws an AssertionError, not an IndexError. That’s important because an AssertionError is not just any runtime error: it signals a violation of the specification. I wanted a warning, but I get an error. Fortunately, I can disable the check (using the -O option), but at my own risk. I will do it if a crash is really expensive, and hope for the best. Imagine my program is embedded in a spaceship that travels through a black hole. I will disable assertions and hope the program is robust enough to avoid crashing for as long as possible.

This example was only about preconditions, but you can use assert to check postconditions (the return value and/or the state) and invariants (state of a class). Note that checking postconditions and invariants with assert can be cumbersome:

  • for postconditions, you need to assign the return value to a variable, and maybe to store the initial state of the object if you are dealing with a method;
  • for invariants, you have to check the state before and after a method call.

You won’t have something as sophisticated as Eiffel, but you can however improve the overall quality of a program.


To summarize, the assert statement is a convenient way to detect a this should not happen situation. Violations of the specification (e.g. passing an empty list to head) are first-class this should not happen situations. Hence, while the assert statement may be used to detect any unexpected situation, it is a privileged way to ensure that the specification is fulfilled. Once you have inserted assert statements into the code to represent the specification, you can hope to have improved the quality of the program, because incorrect arguments, incorrect return values, and incorrect states of a class will be reported.
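
The head example only shows a precondition. As a rough sketch of the other two kinds of checks, here is an invented BoundedStack class (not from any library) that uses assert for a precondition, a postcondition, and a class invariant:

```python
class BoundedStack:
    """Hypothetical stack with a fixed capacity, checked with assertions."""

    def __init__(self, capacity):
        assert capacity > 0, "capacity must be positive"  # precondition
        self._capacity = capacity
        self._items = []
        self._check_invariant()

    def _check_invariant(self):
        # Invariant: the size always stays within [0, capacity].
        assert 0 <= len(self._items) <= self._capacity, "invariant violated"

    def push(self, item):
        self._check_invariant()
        assert len(self._items) < self._capacity, "stack is full"  # precondition
        old_size = len(self._items)  # initial state, needed for the postcondition
        self._items.append(item)
        assert len(self._items) == old_size + 1, "push must add one item"  # postcondition
        self._check_invariant()

    def size(self):
        return len(self._items)


stack = BoundedStack(1)
stack.push("a")
try:
    stack.push("b")  # violates the precondition of push
    error = None
except AssertionError as exc:
    error = str(exc)
print(stack.size(), error)
```

Storing old_size before the change is the bookkeeping the first bullet above describes, and calling _check_invariant at the start and end of push is the second.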


回答 17

格式:assert Expression [,arguments]当assert遇到一条语句时,Python计算该表达式。如果该语句不为true,则会引发异常(assertionError)。如果断言失败,Python将ArgumentExpression用作AssertionError的参数。可以使用try-except语句像其他任何异常一样捕获和处理AssertionError异常,但是如果不处理,它们将终止程序并产生回溯。例:

def KelvinToFahrenheit(Temperature):
    assert (Temperature >= 0),"Colder than absolute zero!"
    return ((Temperature-273)*1.8)+32
print(KelvinToFahrenheit(273))
print(int(KelvinToFahrenheit(505.78)))
print(KelvinToFahrenheit(-5))

执行以上代码后,将产生以下结果:

32.0
451
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print(KelvinToFahrenheit(-5))
  File "test.py", line 4, in KelvinToFahrenheit
    assert (Temperature >= 0),"Colder than absolute zero!"
AssertionError: Colder than absolute zero!

Format: assert Expression[, Arguments]. When Python encounters an assert statement, it evaluates the expression. If the expression is not true, an AssertionError is raised, and Python uses the arguments expression as the argument for the AssertionError. AssertionError exceptions can be caught and handled like any other exception using the try-except statement, but if not handled, they will terminate the program and produce a traceback. Example:

def KelvinToFahrenheit(Temperature):    
    assert (Temperature >= 0),"Colder than absolute zero!"    
    return ((Temperature-273)*1.8)+32    
print KelvinToFahrenheit(273)    
print int(KelvinToFahrenheit(505.78))    
print KelvinToFahrenheit(-5)    

When the above code is executed, it produces the following result:

32.0
451
Traceback (most recent call last):    
  File "test.py", line 9, in <module>    
    print KelvinToFahrenheit(-5)    
  File "test.py", line 4, in KelvinToFahrenheit    
    assert (Temperature >= 0),"Colder than absolute zero!"    
AssertionError: Colder than absolute zero!    
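
The answer mentions that an AssertionError can be caught like any other exception. A minimal sketch of that, assuming a Python 3 port of the function above (the name kelvin_to_fahrenheit is my own):

```python
def kelvin_to_fahrenheit(temperature):
    # Same check as above, written for Python 3.
    assert temperature >= 0, "Colder than absolute zero!"
    return (temperature - 273) * 1.8 + 32


for value in (273, -5):
    try:
        print(kelvin_to_fahrenheit(value))
    except AssertionError as exc:
        # str(exc) is the message given after the comma in the assert.
        print("assertion failed:", exc)
```

With assertions enabled, the second call is reported instead of ending the program with a traceback.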

Answer 18

def getUser(self, id, Email):

    # Use id when it is truthy, otherwise fall back to Email.
    user_key = id or Email

    # Fail fast if neither argument was supplied.
    assert user_key

Can be used to ensure parameters are passed in the function call.


Answer 19

>>> this_is_very_complex_function_result = 9
>>> c = this_is_very_complex_function_result
>>> test_us = (c < 4)

>>> # first we try without assert
>>> if test_us:
    print("YES! I am right!")
else:
    print("I am Wrong, but the program still RUNS!")

I am Wrong, but the program still RUNS!


>>> # now we try with assert
>>> assert test_us
Traceback (most recent call last):
  File "<pyshell#52>", line 1, in <module>
    assert test_us
AssertionError
>>> 

Answer 20

Basically, the assert keyword means that if the condition is not true, an AssertionError is raised; otherwise execution simply continues. For example, in Python:

code-1

a=5
b=6
assert a==b

OUTPUT:

assert a==b
AssertionError

code-2

a=5
b=5
assert a==b

OUTPUT:

Process finished with exit code 0

Is there a built-in function to print all the current properties and values of an object?

Question: Is there a built-in function to print all the current properties and values of an object?

So what I’m looking for here is something like PHP’s print_r function.

This is so I can debug my scripts by seeing the state of the object in question.


Answer 0

You are really mixing together two different things.

Use dir(), vars() or the inspect module to get what you are interested in (I use __builtins__ as an example; you can use any object instead).

>>> l = dir(__builtins__)
>>> d = __builtins__.__dict__

Print that dictionary however fancy you like:

>>> print l
['ArithmeticError', 'AssertionError', 'AttributeError',...

or

>>> from pprint import pprint
>>> pprint(l)
['ArithmeticError',
 'AssertionError',
 'AttributeError',
 'BaseException',
 'DeprecationWarning',
...

>>> pprint(d, indent=2)
{ 'ArithmeticError': <type 'exceptions.ArithmeticError'>,
  'AssertionError': <type 'exceptions.AssertionError'>,
  'AttributeError': <type 'exceptions.AttributeError'>,
...
  '_': [ 'ArithmeticError',
         'AssertionError',
         'AttributeError',
         'BaseException',
         'DeprecationWarning',
...

Pretty printing is also available in the interactive debugger as a command:

(Pdb) pp vars()
{'__builtins__': {'ArithmeticError': <type 'exceptions.ArithmeticError'>,
                  'AssertionError': <type 'exceptions.AssertionError'>,
                  'AttributeError': <type 'exceptions.AttributeError'>,
                  'BaseException': <type 'exceptions.BaseException'>,
                  'BufferError': <type 'exceptions.BufferError'>,
                  ...
                  'zip': <built-in function zip>},
 '__file__': 'pass.py',
 '__name__': '__main__'}
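
The session above is from Python 2. As a hedged Python 3 sketch of the same idea on an ordinary instance (the Point class is a made-up example, not part of the answer):

```python
from pprint import pprint


class Point:
    """Hypothetical example class."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# vars() returns the instance __dict__: attribute names mapped to values.
pprint(vars(p))

# dir() returns names only, including inherited and special ones,
# so filter out the underscore-prefixed names to see the public ones.
public_names = [name for name in dir(p) if not name.startswith("_")]
pprint(public_names)
```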

Answer 1

You want vars() mixed with pprint():

from pprint import pprint
pprint(vars(your_object))

Answer 2

def dump(obj):
  for attr in dir(obj):
    print("obj.%s = %r" % (attr, getattr(obj, attr)))

There are many 3rd-party functions out there that add things like exception handling, national/special character printing, recursing into nested objects etc. according to their authors’ preferences. But they all basically boil down to this.


Answer 3

dir has been mentioned, but that’ll only give you the attributes’ names. If you want their values as well try __dict__.

class O:
   def __init__ (self):
      self.value = 3

o = O()

Here is the output:

>>> o.__dict__

{'value': 3}

Answer 4

You can use the “dir()” function to do this.

>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__', '__stdin__', '__stdo
t__', '_current_frames', '_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder
, 'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle', 'exc_clear', 'exc_info'
 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'getcheckinterval', 'getdefault
ncoding', 'getfilesystemencoding', 'getrecursionlimit', 'getrefcount', 'getwindowsversion', 'he
version', 'maxint', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_
ache', 'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval', 'setprofile', 'setrecursionlimit
, 'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info', 'warnoption
', 'winver']
>>>

Another useful feature is help.

>>> help(sys)
Help on built-in module sys:

NAME
    sys

FILE
    (built-in)

MODULE DOCS
    http://www.python.org/doc/current/lib/module-sys.html

DESCRIPTION
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known

Answer 5

To print the current state of the object you might:

>>> obj # in an interpreter

or

print repr(obj) # in a script

or

print obj

For your classes define __str__ or __repr__ methods. From the Python documentation:

__repr__(self): Called by the repr() built-in function and by string conversions (reverse quotes) to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form “<…some useful description…>” should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required. This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

__str__(self) Called by the str() built-in function and by the print statement to compute the “informal” string representation of an object. This differs from __repr__() in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead. The return value must be a string object.
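
To make the difference between the two methods concrete, here is a small sketch in Python 3 syntax. The Account class and its fields are invented for illustration.

```python
class Account:
    """Hypothetical class used only for illustration."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __repr__(self):
        # Unambiguous; ideally a valid expression that recreates the object.
        return "Account({!r}, {!r})".format(self.owner, self.balance)

    def __str__(self):
        # Shorter, reader-friendly form used by print() and str().
        return "{}: {}".format(self.owner, self.balance)


acct = Account("alice", 10)
print(repr(acct))
print(acct)
```

If only __repr__ were defined, print(acct) would fall back to it, which is why defining __repr__ first is usually the better investment for debugging.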


Answer 6

Might be worth checking out —

Is there a Python equivalent to Perl’s Data::Dumper?

My recommendation is this —

https://gist.github.com/1071857

Note that perl has a module called Data::Dumper which translates object data back to perl source code (NB: it does NOT translate code back to source, and you almost never want the object method functions in the output). This can be used for persistence, but the common purpose is for debugging.

There are a number of things standard python pprint fails to achieve. In particular, it just stops descending when it sees an instance of an object and gives you the internal hex pointer of the object (errr, that pointer is not a whole lot of use, by the way). So in a nutshell, python is all about this great object oriented paradigm, but the tools you get out of the box are designed for working with something other than objects.

The perl Data::Dumper allows you to control how deep you want to go, and also detects circular linked structures (that’s really important). This process is fundamentally easier to achieve in perl because objects have no particular magic beyond their blessing (a universally well defined process).


Answer 7

I recommend using help(your_object).

help(dir)

 If called without an argument, return the names in the current scope.
 Else, return an alphabetized list of names comprising (some of) the attributes
 of the given object, and of attributes reachable from it.
 If the object supplies a method named __dir__, it will be used; otherwise
 the default dir() logic is used and returns:
 for a module object: the module's attributes.
 for a class object:  its attributes, and recursively the attributes
 of its bases.
 for any other object: its attributes, its class's attributes, and
 recursively the attributes of its class's base classes.

help(vars)

Without arguments, equivalent to locals().
With an argument, equivalent to object.__dict__.

Answer 8

In most cases, using __dict__ or dir() will get you the info you want. If you happen to need more details, the standard library includes the inspect module, which lets you get an impressive amount of detail. Some of the real nuggets of info include:

  • names of function and method parameters
  • class hierarchies
  • source code of the implementation of function/class objects
  • local variables out of a frame object

If you’re just looking for “what attribute values does my object have?”, then dir() and __dict__ are probably sufficient. If you’re really looking to dig into the current state of arbitrary objects (keeping in mind that in python almost everything is an object), then inspect is worthy of consideration.
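
A few of the inspect features listed above can be sketched as follows. The classes Base and Child are made up for this example, and only a small part of the module is shown.

```python
import inspect


class Base:
    pass


class Child(Base):
    def greet(self, name, punctuation="!"):
        return "hello " + name + punctuation


# Names of function/method parameters.
params = list(inspect.signature(Child.greet).parameters)

# Class hierarchy in method resolution order.
mro_names = [cls.__name__ for cls in inspect.getmro(Child)]

# (name, value) pairs for the bound methods of an instance.
members = dict(inspect.getmembers(Child(), predicate=inspect.ismethod))

print(params)
print(mro_names)
print(sorted(members))
```

inspect.getsource() can also return the source code of a function or class, provided the source file is available to the interpreter.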


Answer 9

Is there a built-in function to print all the current properties and values of an object?

No. The most upvoted answer excludes some kinds of attributes, and the accepted answer shows how to get all attributes, including methods and parts of the non-public api. But there is no good complete builtin function for this.

So the short corollary is that you can write your own, but it will calculate properties and other calculated data-descriptors that are part of the public API, and you might not want that:

from pprint import pprint
from inspect import getmembers
from types import FunctionType

def attributes(obj):
    disallowed_names = {
      name for name, value in getmembers(type(obj)) 
        if isinstance(value, FunctionType)}
    return {
      name: getattr(obj, name) for name in dir(obj) 
        if name[0] != '_' and name not in disallowed_names and hasattr(obj, name)}

def print_attributes(obj):
    pprint(attributes(obj))

Problems with other answers

Observe the application of the currently top voted answer on a class with a lot of different kinds of data members:

from pprint import pprint

class Obj:
    __slots__ = 'foo', 'bar', '__dict__'
    def __init__(self, baz):
        self.foo = ''
        self.bar = 0
        self.baz = baz
    @property
    def quux(self):
        return self.foo * self.bar

obj = Obj('baz')
pprint(vars(obj))

only prints:

{'baz': 'baz'}

Because vars only returns the __dict__ of an object, and it’s not a copy, so if you modify the dict returned by vars, you’re also modifying the __dict__ of the object itself.

vars(obj)['quux'] = 'WHAT?!'
vars(obj)

returns:

{'baz': 'baz', 'quux': 'WHAT?!'}

— which is bad because quux is a property that we shouldn’t be setting and shouldn’t be in the namespace…

Applying the advice in the currently accepted answer (and others) is not much better:

>>> dir(obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'bar', 'baz', 'foo', 'quux']

As we can see, dir only returns all (actually just most) of the names associated with an object.

inspect.getmembers, mentioned in the comments, is similarly flawed – it returns all names and values.

From class

When teaching I have my students create a function that provides the semantically public API of an object:

def api(obj):
    return [name for name in dir(obj) if name[0] != '_']

We can extend this to provide a copy of the semantic namespace of an object, but we need to exclude __slots__ that aren’t assigned, and if we’re taking the request for “current properties” seriously, we need to exclude calculated properties (as they could become expensive, and could be interpreted as not “current”):

from types import FunctionType
from inspect import getmembers

def attrs(obj):
     disallowed_properties = {
       name for name, value in getmembers(type(obj)) 
         if isinstance(value, (property, FunctionType))}
     return {
       name: getattr(obj, name) for name in api(obj) 
         if name not in disallowed_properties and hasattr(obj, name)}

And now we do not calculate or show the property, quux:

>>> attrs(obj)
{'bar': 0, 'baz': 'baz', 'foo': ''}

Caveats

But perhaps we do know our properties aren’t expensive. We may want to alter the logic to include them as well. And perhaps we want to exclude other custom data descriptors instead.

Then we need to further customize this function. And so it makes sense that we cannot have a built-in function that magically knows exactly what we want and provides it. This is functionality we need to create ourselves.

Conclusion

There is no built-in function that does this, and you should do what is most semantically appropriate for your situation.


Answer 10

A metaprogramming example: dumping an object with magic:

$ cat dump.py
#!/usr/bin/python
import sys
if len(sys.argv) > 2:
    module, metaklass  = sys.argv[1:3]
    m = __import__(module, globals(), locals(), [metaklass])
    __metaclass__ = getattr(m, metaklass)

class Data:
    def __init__(self):
        self.num = 38
        self.lst = ['a','b','c']
        self.str = 'spam'
    dumps   = lambda self: repr(self)
    __str__ = lambda self: self.dumps()

data = Data()
print data

Without arguments:

$ python dump.py
<__main__.Data instance at 0x00A052D8>

With Gnosis Utils:

$ python dump.py gnosis.magic MetaXMLPickler
<?xml version="1.0"?>
<!DOCTYPE PyObject SYSTEM "PyObjects.dtd">
<PyObject module="__main__" class="Data" id="11038416">
<attr name="lst" type="list" id="11196136" >
  <item type="string" value="a" />
  <item type="string" value="b" />
  <item type="string" value="c" />
</attr>
<attr name="num" type="numeric" value="38" />
<attr name="str" type="string" value="spam" />
</PyObject>

It is a bit outdated but still working.


Answer 11

If you’re using this for debugging, and you just want a recursive dump of everything, the accepted answer is unsatisfying because it requires that your classes have good __str__ implementations already. If that’s not the case, this works much better:

import json
print(json.dumps(YOUR_OBJECT, 
                 default=lambda obj: vars(obj),
                 indent=1))
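
To show the recursion at work, here is a self-contained variant of the snippet above. The Engine and Car classes are hypothetical. Note the assumptions: every nested object must have a __dict__, and the object graph must not contain cycles, otherwise json.dumps raises an error.

```python
import json


class Engine:
    def __init__(self):
        self.cylinders = 4


class Car:
    def __init__(self):
        self.name = "demo"
        self.engine = Engine()  # nested object, reached through default=


# json calls default= for each value it cannot serialize natively,
# so vars() is applied to Car first and then to the nested Engine.
dumped = json.dumps(Car(), default=vars, indent=1)
print(dumped)
```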

Answer 12

Try ppretty

from ppretty import ppretty


class A(object):
    s = 5

    def __init__(self):
        self._p = 8

    @property
    def foo(self):
        return range(10)


print ppretty(A(), show_protected=True, show_static=True, show_properties=True)

Output:

__main__.A(_p = 8, foo = [0, 1, ..., 8, 9], s = 5)

Answer 13

from pprint import pprint

def print_r(the_object):
    print ("CLASS: ", the_object.__class__.__name__, " (BASE CLASS: ", the_object.__class__.__bases__,")")
    pprint(vars(the_object))

Answer 14

This prints out all the object contents recursively in json or yaml indented format:

import jsonpickle # pip install jsonpickle
import json
import yaml # pip install pyyaml

serialized = jsonpickle.encode(obj, max_depth=2) # max_depth is optional
print(json.dumps(json.loads(serialized), indent=4))
# yaml.load() without a Loader argument is rejected by recent PyYAML; safe_load avoids that
print(yaml.dump(yaml.safe_load(serialized), indent=4))

Answer 15

I’ve upvoted the answer that mentions only pprint. To be clear, if you want to see all the values in a complex data structure, then do something like:

from pprint import pprint
pprint(my_var)

Where my_var is your variable of interest. When I used pprint(vars(my_var)) I got nothing, and other answers here didn’t help or the method looked unnecessarily long. By the way, in my particular case, the code I was inspecting had a dictionary of dictionaries.

Worth pointing out that with some custom classes you may just end up with an unhelpful <someobject.ExampleClass object at 0x7f739267f400> kind of output. In that case, you might have to implement a __str__ method, or try some of the other solutions. I’d still like to find something simple that works in all scenarios, without third party libraries.


Answer 16

I needed to print DEBUG info in some logs and was unable to use pprint because it would break the logs. Instead I did this and got virtually the same thing.

DO = DemoObject()

itemDir = DO.__dict__

for i in itemDir:
    print('{0}  :  {1}'.format(i, itemDir[i]))

Answer 17

To dump “myObject”:

from bson import json_util
import json

print(json.dumps(myObject, default=json_util.default, sort_keys=True, indent=4, separators=(',', ': ')))

I tried vars() and dir(); both failed for what I was looking for. vars() didn’t work because the object didn’t have __dict__ (exceptions.TypeError: vars() argument must have __dict__ attribute). dir() wasn’t what I was looking for: it’s just a listing of field names, doesn’t give the values or the object structure.

I think json.dumps() would work for most objects without the default=json_util.default, but I had a datetime field in the object so the standard json serializer failed. See How to overcome “datetime.datetime not JSON serializable” in python?
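
If installing bson is not an option, a standard-library-only alternative (a different technique from the json_util.default hook used above) is to pass default=str, which writes any unsupported value as its str() form. The record dictionary below is invented sample data.

```python
import json
from datetime import datetime

record = {"name": "demo", "created": datetime(2020, 1, 2, 3, 4, 5)}

# default=str is called for any value json cannot serialize natively,
# so the datetime is written as its str() form.
dumped = json.dumps(record, default=str, sort_keys=True, indent=4)
print(dumped)
```

The trade-off is that the output no longer records which strings were originally datetimes, so this suits debugging dumps better than round-trip serialization.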


Answer 18

Why not something simple:

for key, value in obj.__dict__.items():  # use iteritems() on Python 2
    print(key, value)

Answer 19

pprint contains a “pretty printer” for producing aesthetically pleasing representations of your data structures. The formatter produces representations of data structures that can be parsed correctly by the interpreter, and are also easy for a human to read. The output is kept on a single line, if possible, and indented when split across multiple lines.


Answer 20

Just try beeprint.

It will help you not only with printing object variables, but beautiful output as well, like this:

class(NormalClassNewStyle):
  dicts: {
  },
  lists: [],
  static_props: 1,
  tupl: (1, 2)

Answer 21

For everybody struggling with

  • vars() not returning all attributes.
  • dir() not returning the attributes’ values.

The following code prints all attributes of obj with their values:

for attr in dir(obj):
    try:
        print("obj.{} = {}".format(attr, getattr(obj, attr)))
    except AttributeError:
        print("obj.{} = ?".format(attr))

Answer 22

You can try the Flask Debug Toolbar.
https://pypi.python.org/pypi/Flask-DebugToolbar

from flask import Flask
from flask_debugtoolbar import DebugToolbarExtension

app = Flask(__name__)

# the toolbar is only enabled in debug mode:
app.debug = True

# set a 'SECRET_KEY' to enable the Flask session cookies
app.config['SECRET_KEY'] = '<replace with a secret key>'

toolbar = DebugToolbarExtension(app)

Answer 23

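I like working with the keys and values methods of Python’s built-in mapping types (this assumes o is a dict or dict-like; for an ordinary instance, call them on vars(o) instead).

For attributes, regardless of whether they are methods or variables: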

o.keys()

For values of those attributes:

o.values()

Answer 24

This works no matter how your variables are defined within a class, inside __init__ or outside.

your_obj = YourObj()
attrs_with_value = {attr: getattr(your_obj, attr) for attr in dir(your_obj)}

计算字符串中字符的出现次数

问题:计算字符串中字符的出现次数

计算字符串中字符出现次数的最简单方法是什么?

例如计算'a'出现在其中的次数'Mary had a little lamb'

What’s the simplest way to count the number of occurrences of a character in a string?

e.g. count the number of times 'a' appears in 'Mary had a little lamb'


回答 0

str.count(sub [,start [,end]])

返回sub范围中的子字符串不重叠的次数[start, end]。可选参数startend并按片表示法解释。

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4
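
To make the two details in that description concrete, here is a small sketch (the variable names are only illustrative): matches are counted without overlap, and start/end behave like slice bounds.

```python
sentence = 'Mary had a little lamb'

# Matches are non-overlapping: 'aaaa' contains 'aa' twice, not three times.
pairs = 'aaaa'.count('aa')

# start and end work like slice bounds, so these two calls agree.
in_range = sentence.count('a', 5, 12)
in_slice = sentence[5:12].count('a')

print(pairs, in_range, in_slice)
```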

回答 1

您可以使用count()

>>> 'Mary had a little lamb'.count('a')
4

You can use count() :

>>> 'Mary had a little lamb'.count('a')
4

回答 2

正如其他答案所说,使用字符串方法count()可能是最简单的方法,但是如果您经常这样做,请查看collections.Counter

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print(counter['a'])

As other answers said, using the string method count() is probably the simplest, but if you’re doing this frequently, check out collections.Counter:

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print(counter['a'])

回答 3

正则表达式可能吗?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

Regular expressions maybe?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

回答 4

myString.count('a');

更多信息在这里

myString.count('a');

more info here


回答 5

Python-3.x:

"aabc".count("a")

str.count(sub [,start [,end]])

返回子字符串sub在[start,end]范围内不重叠的次数。可选参数start和end解释为切片表示法。

Python-3.x:

"aabc".count("a")

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.


回答 6

str.count(a)是计算字符串中单个字符的最佳解决方案。但是,如果您需要计算更多的字符,则必须读取整个字符串与要计算的字符一样多的次数。

这项工作的更好方法是:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

因此,您将拥有一个dict,它返回字符串中每个字母(0如果不存在)的出现次数。

>>>chars['a']
4
>>>chars['x']
0

对于不区分大小写的计数器,您可以通过子类化来覆盖mutator和accessor方法defaultdict(基类的方法是只读的):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0

str.count(a) is the best solution to count a single character in a string. But if you need to count more characters you would have to read the whole string as many times as characters you want to count.

A better approach for this job would be:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

So you’ll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn’t present.

>>>chars['a']
4
>>>chars['x']
0

For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class’ ones are read-only):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0
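
As a possibly simpler alternative to subclassing (a sketch using a different technique, not part of the answer above), collections.Counter already returns 0 for missing keys, and lower-casing the text once gives case-insensitive counts:

```python
from collections import Counter

text = 'Mary had a little lamb'

# Lower-casing once up front makes every stored key lower-case.
chars = Counter(text.lower())

print(chars['a'])
print(chars['m'])
print(chars['x'])  # missing keys count as zero
```

The trade-off is that lookups must also use lower-case keys, whereas the subclass above normalises the key on every access.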

回答 7

这个简单而直接的功能可能会有所帮助:

def check_freq(x):
    freq = {}
    for c in x:
        # count on the argument itself; str.count(c) would raise a TypeError
        freq[c] = x.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

This easy and straightforward function might help:

def check_freq(x):
    freq = {}
    for c in x:
        # count on the argument itself; str.count(c) would raise a TypeError
        freq[c] = x.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

回答 8

如果要区分大小写(当然还有正则表达式的全部功能),则正则表达式非常有用。

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

请注意,正则表达式版本的运行时间大约是其十倍,这仅在my_string非常长或代码处于深循环内时才可能是一个问题。

Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.
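
If you want to check the timing claim on your own machine, a rough sketch with timeit could look like this. The input size and repeat count are arbitrary assumptions, and the ratio will vary with the Python version and the input.

```python
import re
import timeit

my_string = "Mary had a little lamb" * 1000

def with_count():
    # case-insensitive count via lower-casing first
    return my_string.lower().count("m")

def with_regex():
    # case-insensitive count via the regex flag
    return len(re.findall("m", my_string, re.IGNORECASE))

count_time = timeit.timeit(with_count, number=100)
regex_time = timeit.timeit(with_regex, number=100)
print("str.count: {:.4f}s  regex: {:.4f}s".format(count_time, regex_time))
```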


回答 9

a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print(key, a.count(key))

回答 10

text = "count a character occurrence"

chars = list(text)
print(chars)
unique_chars = set(chars)
print(unique_chars)

# renamed from str so the built-in type is not shadowed
for key in unique_chars:
    print(key, text.count(key))

回答 11

另一种方式来获得所有的字符数不使用Counter()count和正则表达式

sentence = 'Mary had a little lamb'
counts_dict = {}
for c in sentence:
    if c not in counts_dict:
        counts_dict[c] = 0
    counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

An alternative way to get all the character counts without using Counter(), count() or a regex:

sentence = 'Mary had a little lamb'
counts_dict = {}
for c in sentence:
    if c not in counts_dict:
        counts_dict[c] = 0
    counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

回答 12

count绝对是计算字符串中字符出现次数的最简洁,最有效的方法,但是我尝试使用解决方案lambda,例如:

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这将导致:

4

同样,这样做还有一个好处,如果该句子是包含与上述相同字符的子字符串列表,则由于使用,这也会给出正确的结果in。看一看 :

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这也导致:

4

当然,这仅在检查单个字符的出现(例如'a'在这种特殊情况下)时才起作用。

count is definitely the most concise and efficient way of counting the occurrence of a character in a string but I tried to come up with a solution using lambda, something like this :

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This will result in :

4

This approach has one more property: if sentence is a list of sub-strings made up of the same characters, the expression still gives the right result here, because in tests each sub-string for the character. Have a look:

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This also results in :

4

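Of course, this only works when checking for a single character such as 'a'. In the list case it counts the sub-strings that contain the character, so the result matches the number of occurrences only when no sub-string contains it more than once.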
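
A small sketch of that difference, using a made-up list of pieces: the in test counts pieces that contain the character, while summing str.count() counts every occurrence.

```python
pieces = ['ban', 'ana', 'a']

# The `in` test counts pieces that contain 'a' at least once.
pieces_with_a = sum(1 for piece in pieces if 'a' in piece)

# Summing str.count() counts every occurrence instead.
occurrences = sum(piece.count('a') for piece in pieces)

print(pieces_with_a, occurrences)
```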


回答 13

“不使用count来查找想要的字符串中的字符”方法。

import re

def count(s, ch):
    # re.escape makes characters such as '.' or '*' match literally
    return len(re.findall(re.escape(ch), s))

def main():
    s = input("Enter any string, for example 'welcome': ")
    ch = input("Enter the character to count: ")
    print(count(s, ch))

main()

A way to count a character in a string without using count():

import re

def count(s, ch):
    # re.escape makes characters such as '.' or '*' match literally
    return len(re.findall(re.escape(ch), s))

def main():
    s = input("Enter any string, for example 'welcome': ")
    ch = input("Enter the character to count: ")
    print(count(s, ch))

main()

回答 14

我是熊猫图书馆的粉丝,尤其是value_counts()方法。您可以使用它来计算字符串中每个字符的出现:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

回答 15

spam = 'have a nice day'
var = 'd'


def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found

print('count %s is: %s' % (var, count(spam, var)))

回答 16

Python 3

有两种方法可以实现此目的:

1)内置函数count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))

2)不使用功能

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

Python 3

There are two ways to achieve this:

1) With built-in function count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))

2) Without using a function

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

回答 17

仅此恕我直言-您可以添加上限或下限方法

def count_letter_in_str(string,letter):
    return string.count(letter)

No more than this IMHO – you can add the upper or lower methods

def count_letter_in_str(string,letter):
    return string.count(letter)

如何检查变量是否存在?

问题:如何检查变量是否存在?

我想检查一个变量是否存在。现在我正在做这样的事情:

try:
    myVar
except NameError:
    pass  # Do something.

是否有其他方法无一exceptions?

I want to check if a variable exists. Now I’m doing something like this:

try:
    myVar
except NameError:
    pass  # Do something.

Are there other ways without exceptions?


回答 0

要检查是否存在局部变量:

if 'myVar' in locals():
    pass  # myVar exists.

要检查是否存在全局变量:

if 'myVar' in globals():
    pass  # myVar exists.

要检查对象是否具有属性:

if hasattr(obj, 'attr_name'):
    pass  # obj.attr_name exists.

To check the existence of a local variable:

if 'myVar' in locals():
    pass  # myVar exists.

To check the existence of a global variable:

if 'myVar' in globals():
    pass  # myVar exists.

To check if an object has an attribute:

if hasattr(obj, 'attr_name'):
    pass  # obj.attr_name exists.
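
A runnable sketch of the three checks together (the names my_global, my_local, Box and demo are made up for illustration) shows that locals() only sees names bound in the current function:

```python
my_global = 1

class Box(object):
    def __init__(self):
        self.value = 3

def demo():
    my_local = 2
    return {
        'local in locals()': 'my_local' in locals(),
        'global in locals()': 'my_global' in locals(),
        'global in globals()': 'my_global' in globals(),
        'attribute': hasattr(Box(), 'value'),
        'missing attribute': hasattr(Box(), 'other'),
    }

results = demo()
print(results)
```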

回答 1

使用中那些尚未被定义或组(或明或暗地)变量几乎总是一件坏事任何语言,因为这往往预示着该计划的逻辑还没有被恰当地考虑,并有可能的结果行为无法预测。

如果您需要在Python中执行此操作,以下与您的操作类似的技巧将确保变量在使用前具有一定的价值:

try:
    myVar
except NameError:
    myVar = None

# Now you're free to use myVar without Python complaining.

但是,我仍然不认为这是个好主意-在我看来,您应该尝试重构代码,以免发生这种情况。

The use of variables that have yet to be defined or set (implicitly or explicitly) is almost always a bad thing in any language, since it often indicates that the logic of the program hasn’t been thought through properly, and is likely to result in unpredictable behaviour.

If you need to do it in Python, the following trick, which is similar to yours, will ensure that a variable has some value before use:

try:
    myVar
except NameError:
    myVar = None

# Now you're free to use myVar without Python complaining.

However, I’m still not convinced that’s a good idea – in my opinion, you should try to refactor your code so that this situation does not occur.


回答 2

一种简单的方法是一开始就初始化它 myVar = None

然后稍后:

if myVar is not None:
    pass  # Do something

A simple way is to initialize it up front with myVar = None

Then later on:

if myVar is not None:
    pass  # Do something

回答 3

使用try / except是测试变量是否存在的最佳方法。但是几乎可以肯定,有一种比设置/测试全局变量更好的方法。

例如,如果您想在第一次调用某个函数时初始化模块级变量,那么最好使用如下代码:

my_variable = None

def InitMyVariable():
  global my_variable
  if my_variable is None:
    my_variable = ...

Using try/except is the best way to test for a variable’s existence. But there’s almost certainly a better way of doing whatever it is you’re doing than setting/testing global variables.

For example, if you want to initialize a module-level variable the first time you call some function, you’re better off with code something like this:

my_variable = None

def InitMyVariable():
  global my_variable
  if my_variable is None:
    my_variable = ...

回答 4

对于对象/模块,您还可以

'var' in dir(obj)

例如,

>>> class Something(object):
...     pass
...
>>> c = Something()
>>> c.a = 1
>>> 'a' in dir(c)
True
>>> 'b' in dir(c)
False

For objects/modules, you can also use:

'var' in dir(obj)

For example,

>>> class Something(object):
...     pass
...
>>> c = Something()
>>> c.a = 1
>>> 'a' in dir(c)
True
>>> 'b' in dir(c)
False

回答 5

我将假定该测试将在功能中使用,类似于user97370的答案。我不喜欢这个答案,因为它污染了全局命名空间。解决该问题的一种方法是改用类:

class InitMyVariable(object):
  my_variable = None

  # __call__ must be indented so that it belongs to the class
  def __call__(self):
    if self.my_variable is None:
      self.my_variable = ...

我不喜欢这样,因为它使代码复杂化,并提出了一些问题,例如,是否应该确认Singleton编程模式?幸运的是,Python允许函数在一段时间内拥有属性,这为我们提供了一个简单的解决方案:

def InitMyVariable():
  if InitMyVariable.my_variable is None:
    InitMyVariable.my_variable = ...
InitMyVariable.my_variable = None

I will assume that the test is going to be used in a function, similar to user97370’s answer. I don’t like that answer because it pollutes the global namespace. One way to fix it is to use a class instead:

class InitMyVariable(object):
  my_variable = None

  # __call__ must be indented so that it belongs to the class
  def __call__(self):
    if self.my_variable is None:
      self.my_variable = ...

I don’t like this, because it complicates the code and opens up questions such as: should this conform to the Singleton programming pattern? Fortunately, Python has allowed functions to have attributes for a while, which gives us this simple solution:

def InitMyVariable():
  if InitMyVariable.my_variable is None:
    InitMyVariable.my_variable = ...
InitMyVariable.my_variable = None

回答 6

catchexcept在Python中被称为。除此之外,对于这种简单情况也很好。还有的AttributeError,可以用来检查一个对象具有的属性。

What other languages call catch is called except in Python. Other than that, your approach is fine for such simple cases. There is also AttributeError, which can be caught to check whether an object has an attribute.
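
As a minimal sketch of the AttributeError approach (the class Something and the attribute names are made up), next to the getattr() default that achieves the same without a try block:

```python
class Something(object):
    def __init__(self):
        self.a = 1

obj = Something()

try:
    value = obj.b
except AttributeError:
    value = None  # the attribute does not exist

# getattr with a default avoids the try block
fallback = getattr(obj, 'b', 'default')
print(value, fallback)
```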


回答 7

处理这种情况的一种通常有效的方法是不显式检查变量是否存在,而只是继续将可能不存在的变量的首次用法包装在try / except NameError中:

# Search for entry.
for x in y:
  if x == 3:
    found = x

# Work with found entry.
try:
  print('Found: {0}'.format(found))
except NameError:
  print('Not found')
else:
  # Handle rest of Found case here
  ...

A way that often works well for handling this kind of situation is to not explicitly check if the variable exists but just go ahead and wrap the first usage of the possibly non-existing variable in a try/except NameError:

# Search for entry.
for x in y:
  if x == 3:
    found = x

# Work with found entry.
try:
  print('Found: {0}'.format(found))
except NameError:
  print('Not found')
else:
  # Handle rest of Found case here
  ...

回答 8

我创建了一个自定义函数。

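import inspect

def exists(var):
    # locals() here would describe exists() itself, so look at the caller's frame
    caller = inspect.currentframe().f_back
    return var in caller.f_locals or var in caller.f_globals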

然后调用如下函数,将其替换variable_name为要检查的变量:

exists("variable_name")

将返回TrueFalse

I created a custom function.

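import inspect

def exists(var):
    # locals() here would describe exists() itself, so look at the caller's frame
    caller = inspect.currentframe().f_back
    return var in caller.f_locals or var in caller.f_globals

Then call the function as follows, replacing variable_name with the name of the variable you want to check: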

exists("variable_name")

Will return True or False


什么是mixin,为什么它们有用?

问题:什么是mixin,为什么它们有用?

在“ Python编程 ”中,Mark Lutz提到了“ mixins”。我来自C / C ++ / C#背景,以前没有听说过这个词。什么是mixin?

本示例的两行之间进行阅读(我已经链接到它,因为它很长),我认为这是使用多重继承来扩展类而不是“适当的”子类的一种情况。这是正确的吗?

为什么我要这样做而不是将新功能放入子类中?因此,为什么混合/多重继承方法比使用组合更好?

什么将mixin与多重继承分开?这仅仅是语义问题吗?

In “Programming Python“, Mark Lutz mentions “mixins”. I’m from a C/C++/C# background and I have not heard the term before. What is a mixin?

Reading between the lines of this example (which I’ve linked to because it’s quite long), I’m presuming it’s a case of using multiple inheritance to extend a class as opposed to ‘proper’ subclassing. Is this right?

Why would I want to do that rather than put the new functionality into a subclass? For that matter, why would a mixin/multiple inheritance approach be better than using composition?

What separates a mixin from multiple inheritance? Is it just a matter of semantics?


回答 0

mixin是一种特殊的多重继承。使用mixin的主要情况有两种:

  1. 您想为一个类提供很多可选功能。
  2. 您想在许多不同的类中使用一种特定功能。

例如,请考虑werkzeug的请求和响应系统。我可以说一个普通的旧请求对象:

from werkzeug import BaseRequest

class Request(BaseRequest):
    pass

如果我想添加接受标头支持,我会做

from werkzeug import BaseRequest, AcceptMixin

class Request(AcceptMixin, BaseRequest):
    pass

如果我想创建一个支持接受标头,etag,身份验证和用户代理支持的请求对象,则可以执行以下操作:

from werkzeug import BaseRequest, AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin

class Request(AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin, BaseRequest):
    pass

区别是细微的,但是在上面的示例中,mixin类并不是独立存在的。在更传统的多重继承中,AuthenticationMixin(例如)可能更像Authenticator。也就是说,该类可能会设计为独立存在。

A mixin is a special kind of multiple inheritance. There are two main situations where mixins are used:

  1. You want to provide a lot of optional features for a class.
  2. You want to use one particular feature in a lot of different classes.

For an example of number one, consider werkzeug’s request and response system. I can make a plain old request object by saying:

from werkzeug import BaseRequest

class Request(BaseRequest):
    pass

If I want to add accept header support, I would write:

from werkzeug import BaseRequest, AcceptMixin

class Request(AcceptMixin, BaseRequest):
    pass

If I wanted to make a request object that supports accept headers, etags, authentication, and user agent support, I could do this:

from werkzeug import BaseRequest, AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin

class Request(AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin, BaseRequest):
    pass

The difference is subtle, but in the above examples, the mixin classes weren’t made to stand on their own. In more traditional multiple inheritance, the AuthenticationMixin (for example) would probably be something more like Authenticator. That is, the class would probably be designed to stand on its own.
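
For readers without werkzeug at hand, here is a self-contained sketch of the same idea. All class names (JsonMixin, ReprMixin, BaseUser, User) are hypothetical; each mixin adds one optional feature and is not meant to be instantiated on its own.

```python
import json

class JsonMixin(object):
    """Adds to_json() to any class that keeps its data in instance attributes."""
    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)

class ReprMixin(object):
    """Adds a readable __repr__ built from the instance attributes."""
    def __repr__(self):
        fields = ', '.join(
            '{}={!r}'.format(k, v) for k, v in sorted(vars(self).items()))
        return '{}({})'.format(type(self).__name__, fields)

class BaseUser(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Pick the optional features by listing the mixins before the base class.
class User(JsonMixin, ReprMixin, BaseUser):
    pass

user = User('Ann', 30)
print(user.to_json())
print(repr(user))
```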


回答 1

首先,您应该注意,mixin仅存在于多种继承语言中。您不能使用Java或C#进行混合。

基本上,mixin是独立的基本类型,可为子类提供有限的功能和多态共振。如果您正在考虑使用C#,请考虑一下您不必实际实现的接口,因为该接口已经实现了。您只需继承它并从其功能中受益。

Mixins通常范围狭窄,不打算扩展。

[编辑-关于原因:]

既然您问过,我想我应该说一下原因。最大的好处是您不必一遍又一遍地自己做。在C#中,mixin受益最大的地方可能是Disposal模式。每当实现IDisposable时,几乎总是希望遵循相同的模式,但最终会以较小的变化编写和重新编写相同的基本代码。如果有可扩展的Disposal mixin,则可以节省很多额外的键入操作。

[编辑2-回答您的其他问题]

什么将mixin与多重继承分开?这仅仅是语义问题吗?

是。mixin和标准多重继承之间的区别只是语义问题。具有多重继承的类可能会使用混合作为多重继承的一部分。

mixin的目的是创建一个可以通过继承“混合”到任何其他类型的类型,而不会影响继承类型,同时仍然为该类型提供一些有益的功能。

再次考虑一下已经实现的接口。

我个人不使用mixins,因为我主要使用不支持它们的语言进行开发,因此我很难拿出一个像样的示例来提供“啊!”的好例子。你的时刻。但我会再试一次。我将使用一个人为设计的示例-大多数语言已经以某种方式提供了该功能-希望这将解释应该如何创建和使用mixin。开始:

假设您具有一个可以与XML进行序列化的类型。您希望该类型提供“ ToXML”方法,该方法返回包含具有该类型的数据值的XML片段的字符串,以及“ FromXML”,其允许该类型从字符串中的XML片段重建其数据值。同样,这是一个人为的示例,因此也许您使用文件流或语言运行时库中的XML Writer类…等等。关键是您想将对象序列化为XML并从XML取回新对象。

此示例中的另一个重要点是您希望以通用方式执行此操作。您不需要为要序列化的每种类型实现“ ToXML”和“ FromXML”方法,而是需要一些通用的方法来确保您的类型可以做到这一点并且可以正常工作。您想要代码重用。

如果您的语言支持,则可以创建XmlSerializable mixin为您完成工作。此类型将实现ToXML和FromXML方法。它将使用对示例不重要的某种机制,能够从与之混合的任何类型中收集所有必要的数据,以构建ToXML返回的XML片段,并且当FromXML为叫。

和..就是这样。要使用它,您需要将任何类型的类型都需要序列化为XML,才能从XmlSerializable继承。每当需要序列化或反序列化该类型时,只需调用ToXML或FromXML。实际上,由于XmlSerializable是完全成熟的类型并且是多态的,因此可以想象到,您可以构建一个对原始类型一无所知的文档序列化器,只接受一个XmlSerializable类型的数组。

现在想象一下将此场景用于其他用途,例如创建一个确保每个混合了它的类的mixin记录每个方法调用,或者一个为混合它的类型提供事务性的mixin。列表可以继续。

如果您只是将mixin视为旨在为类型添加少量功能而又不影响该类型的小型基本类型,那么您就是无所不能。

希望。:)

First, you should note that mixins only exist in multiple-inheritance languages. You can’t do a mixin in Java or C#.

Basically, a mixin is a stand-alone base type that provides limited functionality and polymorphic resonance for a child class. If you’re thinking in C#, think of an interface that you don’t have to actually implement because it’s already implemented; you just inherit from it and benefit from its functionality.

Mixins are typically narrow in scope and not meant to be extended.

[edit — as to why:]

I suppose I should address why, since you asked. The big benefit is that you don’t have to do it yourself over and over again. In C#, the biggest place where a mixin could benefit might be from the Disposal pattern. Whenever you implement IDisposable, you almost always want to follow the same pattern, but you end up writing and re-writing the same basic code with minor variations. If there were an extendable Disposal mixin, you could save yourself a lot of extra typing.

[edit 2 — to answer your other questions]

What separates a mixin from multiple inheritance? Is it just a matter of semantics?

Yes. The difference between a mixin and standard multiple inheritance is just a matter of semantics; a class that has multiple inheritance might utilize a mixin as part of that multiple inheritance.

The point of a mixin is to create a type that can be “mixed in” to any other type via inheritance without affecting the inheriting type while still offering some beneficial functionality for that type.

Again, think of an interface that is already implemented.

I personally don’t use mixins since I develop primarily in a language that doesn’t support them, so I’m having a really difficult time coming up with a decent example that will just supply that “ahah!” moment for you. But I’ll try again. I’m going to use an example that’s contrived — most languages already provide the feature in some way or another — but that will, hopefully, explain how mixins are supposed to be created and used. Here goes:

Suppose you have a type that you want to be able to serialize to and from XML. You want the type to provide a “ToXML” method that returns a string containing an XML fragment with the data values of the type, and a “FromXML” that allows the type to reconstruct its data values from an XML fragment in a string. Again, this is a contrived example, so perhaps you use a file stream, or an XML Writer class from your language’s runtime library… whatever. The point is that you want to serialize your object to XML and get a new object back from XML.

The other important point in this example is that you want to do this in a generic way. You don’t want to have to implement a “ToXML” and “FromXML” method for every type that you want to serialize, you want some generic means of ensuring that your type will do this and it just works. You want code reuse.

If your language supported it, you could create the XmlSerializable mixin to do your work for you. This type would implement the ToXML and the FromXML methods. It would, using some mechanism that’s not important to the example, be capable of gathering all the necessary data from any type that it’s mixed in with to build the XML fragment returned by ToXML and it would be equally capable of restoring that data when FromXML is called.

And.. that’s it. To use it, you would have any type that needs to be serialized to XML inherit from XmlSerializable. Whenever you needed to serialize or deserialize that type, you would simply call ToXML or FromXML. In fact, since XmlSerializable is a fully-fledged type and polymorphic, you could conceivably build a document serializer that doesn’t know anything about your original type, accepting only, say, an array of XmlSerializable types.

Now imagine using this scenario for other things, like creating a mixin that ensures that every class that mixes it in logs every method call, or a mixin that provides transactionality to the type that mixes it in. The list can go on and on.

If you just think of a mixin as a small base type designed to add a small amount of functionality to a type without otherwise affecting that type, then you’re golden.

Hopefully. 🙂
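
Since Python does support this, the XmlSerializable idea can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a robust serializer: the names XmlSerializableMixin, to_xml, from_xml and Point are made up, only flat instance attributes are handled, and values come back as strings.

```python
import xml.etree.ElementTree as ET

class XmlSerializableMixin(object):
    def to_xml(self):
        # One child element per instance attribute.
        root = ET.Element(type(self).__name__)
        for name, value in vars(self).items():
            ET.SubElement(root, name).text = str(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, xml_text):
        # Rebuild an instance without calling __init__; values stay strings.
        root = ET.fromstring(xml_text)
        obj = cls.__new__(cls)
        for child in root:
            setattr(obj, child.tag, child.text)
        return obj

class Point(XmlSerializableMixin):
    def __init__(self, x, y):
        self.x = x
        self.y = y

xml_text = Point(1, 2).to_xml()
restored = Point.from_xml(xml_text)
print(xml_text)
print(restored.x, restored.y)
```

Any other class gains both methods just by listing the mixin as a base class, which is the code reuse described above.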


回答 2

该答案旨在通过以下示例解释mixin :

  • 自包含:简短,无需了解任何库即可理解示例。

  • 用Python而不是其他语言。

    可以理解,存在其他语言(例如Ruby)的示例,因为该术语在这些语言中更为常见,但这是Python线程。

它还应考虑有争议的问题:

是否需要多重继承来表征mixin?

定义

我还没有看到来自“权威”来源的引文,清楚地说明了Python中的mixin。

我已经看到了mixin的2种可能定义(如果认为它们与其他类似概念(例如抽象基类)不同),人们并不完全同意哪种正确。

不同语言之间的共识可能会有所不同。

定义1:无多重继承

mixin是一个类,以便该类的某些方法使用该类中未定义的方法。

因此,该类不是要实例化的,而应用作基类。否则,该实例将具有在不引发异常的情况下无法调用的方法。

一些资料来源增加的一个约束是该类可能不包含数据,仅包含方法,但我不明白为什么这样做是必要的。但是实际上,许多有用的mixin没有任何数据,并且没有数据的基类更易于使用。

一个典型的例子是从only <=和实现所有比较运算符==

class ComparableMixin(object):
    """This class has methods which use `<=` and `==`,
    but this class does NOT implement those methods."""
    def __ne__(self, other):
        return not (self == other)
    def __lt__(self, other):
        return self <= other and (self != other)
    def __gt__(self, other):
        return not self <= other
    def __ge__(self, other):
        return self == other or self > other

class Integer(ComparableMixin):
    def __init__(self, i):
        self.i = i
    def __le__(self, other):
        return self.i <= other.i
    def __eq__(self, other):
        return self.i == other.i

assert Integer(0) <  Integer(1)
assert Integer(0) != Integer(1)
assert Integer(1) >  Integer(0)
assert Integer(1) >= Integer(1)

# It is possible to instantiate a mixin:
o = ComparableMixin()
# but its ordering methods are not usable, since nothing implements `<=`:
#o < ComparableMixin()

这个特定的例子可以通过functools.total_ordering()装饰器来实现,但是这里的游戏是重新发明轮子:

import functools

@functools.total_ordering
class Integer(object):
    def __init__(self, i):
        self.i = i
    def __le__(self, other):
        return self.i <= other.i
    def __eq__(self, other):
        return self.i == other.i

assert Integer(0) < Integer(1)
assert Integer(0) != Integer(1)
assert Integer(1) > Integer(0)
assert Integer(1) >= Integer(1)

定义2:多重继承

mixin是一种设计模式,其中基类的某些方法使用其未定义的方法,并且该方法应由另一个基类实现,而不是由定义1中的派生方法实现。

术语“ 混合类”是指打算在该设计模式中使用的基类(使用方法的那些类是TODO,还是实现该方法的那些?

决定给定类是否为混合类并不容易:该方法可以仅在派生类上实现,在这种情况下,我们回到定义1。您必须考虑作者的意图。

这种模式很有趣,因为可以通过选择不同的基类来重组功能:

class HasMethod1(object):
    def method(self):
        return 1

class HasMethod2(object):
    def method(self):
        return 2

class UsesMethod10(object):
    def usesMethod(self):
        return self.method() + 10

class UsesMethod20(object):
    def usesMethod(self):
        return self.method() + 20

class C1_10(HasMethod1, UsesMethod10): pass
class C1_20(HasMethod1, UsesMethod20): pass
class C2_10(HasMethod2, UsesMethod10): pass
class C2_20(HasMethod2, UsesMethod20): pass

assert C1_10().usesMethod() == 11
assert C1_20().usesMethod() == 21
assert C2_10().usesMethod() == 12
assert C2_20().usesMethod() == 22

# Nothing prevents implementing the method
# on the derived class like in Definition 1:

class C3_10(UsesMethod10):
    def method(self):
        return 3

assert C3_10().usesMethod() == 13

权威的Python事件

collections.abc官方文档中,该文档明确使用术语Mixin Methods

它指出如果一个类:

  • 贯彻 __next__
  • 从单个类继承 Iterator

然后该类将免费获得一个__iter__ mixin方法

因此,至少在文档的这一点上,mixin不需要多重继承,并且与定义1保持一致。

当然,文档在不同点上可能是矛盾的,并且其他重要的Python库可能正在其文档中使用其他定义。

该页面还使用术语Set mixin,它明确表明类似类Set并且Iterator可以称为Mixin类。

用其他语言

  • 红宝石:显然不需要混入多重继承,如主要参考书如提到的编程的Ruby和Ruby编程语言

  • C ++:未实现的方法是纯虚拟方法。

    定义1与抽象类(具有纯虚方法的类)的定义一致。该类无法实例化。

    虚拟继承可以定义2:来自两个派生类的多重继承

This answer aims to explain mixins with examples that are:

  • self-contained: short, with no need to know any libraries to understand the example.

  • in Python, not in other languages.

    It is understandable that there were examples from other languages such as Ruby since the term is much more common in those languages, but this is a Python thread.

It shall also consider the controversial question:

Is multiple inheritance necessary or not to characterize a mixin?

Definitions

I have yet to see a citation from an “authoritative” source clearly saying what is a mixin in Python.

I have seen 2 possible definitions of a mixin (if they are to be considered as different from other similar concepts such as abstract base classes), and people don’t entirely agree on which one is correct.

The consensus may vary between different languages.

Definition 1: no multiple inheritance

A mixin is a class such that some method of the class uses a method which is not defined in the class.

Therefore the class is not meant to be instantiated, but rather serve as a base class. Otherwise the instance would have methods that cannot be called without raising an exception.

A constraint which some sources add is that the class may not contain data, only methods, but I don’t see why this is necessary. In practice however, many useful mixins don’t have any data, and base classes without data are simpler to use.

A classic example is the implementation of all comparison operators from only <= and ==:

class ComparableMixin(object):
    """This class has methods which use `<=` and `==`,
    but this class does NOT implement those methods."""
    def __ne__(self, other):
        return not (self == other)
    def __lt__(self, other):
        return self <= other and (self != other)
    def __gt__(self, other):
        return not self <= other
    def __ge__(self, other):
        return self == other or self > other

class Integer(ComparableMixin):
    def __init__(self, i):
        self.i = i
    def __le__(self, other):
        return self.i <= other.i
    def __eq__(self, other):
        return self.i == other.i

assert Integer(0) <  Integer(1)
assert Integer(0) != Integer(1)
assert Integer(1) >  Integer(0)
assert Integer(1) >= Integer(1)

# It is possible to instantiate a mixin:
o = ComparableMixin()
# but its ordering methods are not usable, since nothing implements `<=`:
#o < ComparableMixin()

This particular example could have been achieved via the functools.total_ordering() decorator, but the game here was to reinvent the wheel:

import functools

@functools.total_ordering
class Integer(object):
    def __init__(self, i):
        self.i = i
    def __le__(self, other):
        return self.i <= other.i
    def __eq__(self, other):
        return self.i == other.i

assert Integer(0) < Integer(1)
assert Integer(0) != Integer(1)
assert Integer(1) > Integer(0)
assert Integer(1) >= Integer(1)

Definition 2: multiple inheritance

A mixin is a design pattern in which some method of a base class uses a method it does not define, and that method is meant to be implemented by another base class, not by the derived class as in Definition 1.

The term mixin class refers to base classes which are intended to be used in that design pattern (TODO those that use the method, or those that implement it?)

It is not easy to decide if a given class is a mixin or not: the method could be just implemented on the derived class, in which case we’re back to Definition 1. You have to consider the author’s intentions.

This pattern is interesting because it is possible to recombine functionalities with different choices of base classes:

class HasMethod1(object):
    def method(self):
        return 1

class HasMethod2(object):
    def method(self):
        return 2

class UsesMethod10(object):
    def usesMethod(self):
        return self.method() + 10

class UsesMethod20(object):
    def usesMethod(self):
        return self.method() + 20

class C1_10(HasMethod1, UsesMethod10): pass
class C1_20(HasMethod1, UsesMethod20): pass
class C2_10(HasMethod2, UsesMethod10): pass
class C2_20(HasMethod2, UsesMethod20): pass

assert C1_10().usesMethod() == 11
assert C1_20().usesMethod() == 21
assert C2_10().usesMethod() == 12
assert C2_20().usesMethod() == 22

# Nothing prevents implementing the method
# on the derived class like in Definition 1:

class C3_10(UsesMethod10):
    def method(self):
        return 3

assert C3_10().usesMethod() == 13

Authoritative Python occurrences

The official documentation for collections.abc explicitly uses the term Mixin Methods.

It states that if a class:

  • implements __next__
  • inherits from a single class Iterator

then the class gets an __iter__ mixin method for free.

Therefore, at least at this point of the documentation, a mixin does not require multiple inheritance, which is consistent with Definition 1.
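A minimal sketch of that case, assuming Python 3 (where collections.abc is available). The Countdown class is a hypothetical example, not taken from the documentation: it defines only __next__ and inherits from Iterator alone, yet it is iterable because Iterator supplies __iter__ as a mixin method.

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    # The only abstract method that Iterator requires.
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# __iter__ is not defined on Countdown itself; it is inherited from Iterator.
assert '__iter__' not in vars(Countdown)
assert list(Countdown(3)) == [3, 2, 1]

countdown = Countdown(2)
assert iter(countdown) is countdown  # the mixin __iter__ returns self
```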

The documentation could of course be contradictory at different points, and other important Python libraries might be using the other definition in their documentation.

This page also uses the term Set mixin, which clearly suggests that classes like Set and Iterator can be called Mixin classes.
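As a sketch of what those Set mixin methods provide, the class below is loosely modeled on the ListBasedSet recipe from the collections.abc documentation (the variable names are illustrative). It defines only __contains__, __iter__ and __len__; operators such as & and <= come from Set. Note that the mixin methods build their results by calling the class with an iterable, so the constructor is assumed to accept one.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Set backed by a list; only the three abstract methods are defined."""

    def __init__(self, iterable):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


left = ListBasedSet('abcdef')
right = ListBasedSet('defghi')

# __and__ is a mixin method inherited from Set.
overlap = left & right
assert isinstance(overlap, ListBasedSet)
assert sorted(overlap) == ['d', 'e', 'f']

# So is __le__ (subset test).
assert ListBasedSet('ab') <= ListBasedSet('abc')
```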

In other languages

  • Ruby: Clearly does not require multiple inheritance for mixins, as mentioned in major reference books such as Programming Ruby and The Ruby Programming Language

  • C++: A method that is not implemented is a pure virtual method.

    Definition 1 coincides with the definition of an abstract class (a class that has a pure virtual method). That class cannot be instantiated.

    Definition 2 is possible with virtual inheritance: Multiple Inheritance from two derived classes


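Answer 3

I see them as a disciplined way of using multiple inheritance, because in the end a mixin is just another Python class that (possibly) follows the conventions for classes called mixins.

My understanding of the conventions governing what you would call a mixin is that a mixin:

  • adds methods but not instance variables (class constants are fine)
  • inherits only from object (in Python)

This limits the potential complexity of multiple inheritance and makes program flow reasonably easy to follow by limiting where you have to look (compared with full multiple inheritance). Mixins are similar to Ruby modules.

If I want to add instance variables (with more flexibility than single inheritance allows), I tend to use composition.

That said, I have seen classes named XYZMixin that do have instance variables.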

I think of them as a disciplined way of using multiple inheritance, because ultimately a mixin is just another Python class that (might) follow the conventions about classes that are called mixins.

My understanding of the conventions that govern something you would call a Mixin is that a Mixin:

  • adds methods but not instance variables (class constants are OK)
  • only inherits from object (in Python)

That way it limits the potential complexity of multiple inheritance, and makes it reasonably easy to track the flow of your program by limiting where you have to look (compared to full multiple inheritance). They are similar to Ruby modules.
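A minimal sketch of a mixin that follows these two conventions. The names JsonMixin and Point are hypothetical, chosen only for illustration: the mixin inherits from object alone and adds a method plus a class constant, while all instance variables live in the concrete class.

```python
import json


class JsonMixin(object):
    """Hypothetical mixin: adds a method, holds no instance state."""

    json_sort_keys = True  # a class constant is fine under this convention

    def to_json(self):
        # Relies on the host class keeping its state in __dict__.
        return json.dumps(vars(self), sort_keys=self.json_sort_keys)


class Point(JsonMixin):
    def __init__(self, x, y):
        # Instance variables are set by the concrete class, not by the mixin.
        self.x = x
        self.y = y


assert Point(1, 2).to_json() == '{"x": 1, "y": 2}'
assert JsonMixin.__bases__ == (object,)
```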

If I want to add instance variables (with more flexibility than allowed for by single inheritance) then I tend to go for composition.
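For comparison, a sketch of the composition alternative. History and Account are hypothetical names: the helper object owns its own instance variables, and the host class holds a reference to it instead of inheriting from it.

```python
class History(object):
    """Hypothetical helper that owns its own instance variables."""

    def __init__(self):
        self.entries = []

    def record(self, entry):
        self.entries.append(entry)


class Account(object):
    def __init__(self, balance=0):
        self.balance = balance
        self.history = History()  # composed, not inherited

    def deposit(self, amount):
        self.balance += amount
        self.history.record(('deposit', amount))


account = Account()
account.deposit(50)
assert account.balance == 50
assert account.history.entries == [('deposit', 50)]
```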

Having said that, I have seen classes called XYZMixin that do have instance variables.


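Answer 4

Mixins are a programming concept in which a class provides functionality but is not meant to be instantiated. The main purpose of mixins is to provide standalone functionality, and ideally a mixin neither inherits from other mixins nor holds state. Languages such as Ruby have direct language support for this, but Python does not. However, you can use multiple inheritance to get the same functionality in Python.

I watched the video at http://www.youtube.com/watch?v=v_uKI2NOLEM to understand the basics of mixins. It is quite useful for beginners who want to understand the basics of mixins, how they work, and the problems you might face when implementing them.

Wikipedia is still the best: http://en.wikipedia.org/wiki/Mixin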

Mixins are a programming concept in which a class provides functionality but is not meant to be instantiated. The main purpose of mixins is to provide standalone functionality, and it is best if a mixin does not inherit from other mixins and avoids state. In languages such as Ruby there is some direct language support, but in Python there isn’t. However, you can use multiple inheritance to get this functionality in Python.

I watched this video http://www.youtube.com/watch?v=v_uKI2NOLEM to understand the basics of mixins. It is quite useful for a beginner to understand the basics of mixins and how they work and the problems you might face in implementing them.

Wikipedia is still the best: http://en.wikipedia.org/wiki/Mixin


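Answer 5

What separates a mixin from multiple inheritance? Is it just a matter of semantics?

A mixin is a limited form of multiple inheritance. In some languages the mechanism for adding a mixin to a class is slightly different (in terms of syntax) from that of inheritance.

In the context of Python especially, a mixin is a parent class that provides functionality to subclasses but is not intended to be instantiated itself.

What might cause you to say "that's just multiple inheritance, not really a mixin" is if the class that might be mistaken for a mixin can actually be instantiated and used, so the difference is indeed semantic, and very real.

Example of Multiple Inheritance

This example, from the documentation, is an OrderedCounter: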

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

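It subclasses both Counter and OrderedDict from the collections module.

Both Counter and OrderedDict are intended to be instantiated and used on their own. However, by subclassing them both, we get a counter that is ordered and reuses the code in each object.

This is a powerful way to reuse code, but it can also be problematic. If it turns out there is a bug in one of the objects, fixing it without care could create a bug in the subclass.

Example of a Mixin

Mixins are usually promoted as a way to get code reuse without the potential coupling issues that cooperative multiple inheritance, like the OrderedCounter, could have. When you use mixins, you use functionality that is not as tightly coupled to the data.

Unlike the example above, a mixin is not intended to be used on its own. It provides new or different functionality.

For example, the standard library has a couple of mixins in the socketserver library.

Forking and threading versions of each type of server can be created using these mix-in classes. For instance, ThreadingUDPServer is created as follows: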

class ThreadingUDPServer(ThreadingMixIn, UDPServer):
    pass

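The mix-in class comes first, since it overrides a method defined in UDPServer. Setting the various attributes also changes the behavior of the underlying server mechanism.

In this case, the mixin methods override the methods in the UDPServer object definition to allow for concurrency.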
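The effect of base-class order can be checked without opening a socket, by looking only at which process_request each class resolves to. This is a sketch assuming Python 3's socketserver module; MixinLast is a hypothetical counter-example class, not part of the standard library.

```python
from socketserver import BaseServer, ThreadingMixIn, UDPServer


class ThreadingUDPServer(ThreadingMixIn, UDPServer):
    pass


class MixinLast(UDPServer, ThreadingMixIn):
    """Hypothetical counter-example: mixin listed after the server class."""


# Mixin first: its process_request is found before the server's version.
assert ThreadingUDPServer.process_request is ThreadingMixIn.process_request

# Mixin last: BaseServer.process_request wins, so requests would be
# handled in the calling thread rather than in a new one.
assert MixinLast.process_request is BaseServer.process_request
```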

The overridden method appears to be process_request, and the mixin also provides another method, process_request_thread. Here it is from the source code:

import threading

class ThreadingMixIn:
    """Mix-in class to handle each request in a new thread."""

    # Decides how threads will act upon termination of the
    # main process
    daemon_threads = False

    # finish_request, handle_error and shutdown_request are not defined
    # here; they come from the server class this is mixed into.
    def process_request_thread(self, request, client_address):
        """Same as in BaseServer but as a thread.
        In addition, exception handling is done here.
        """
        try:
            self.finish_request(request, client_address)
        except Exception:
            self.handle_error(request, client_address)
        finally:
            self.shutdown_request(request)

    def process_request(self, request, client_address):
        """Start a new thread to process the request."""
        t = threading.Thread(target=self.process_request_thread,
                             args=(request, client_address))
        t.daemon = self.daemon_threads
        t.start()

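A Contrived Example

This is a mixin that is mostly for demonstration purposes, since most objects will evolve beyond the usefulness of this repr: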

class SimpleInitReprMixin(object):
    """mixin, don't instantiate - useful for classes instantiable
    by keyword arguments to their __init__ method.
    """
    __slots__ = () # allow subclasses to use __slots__ to prevent __dict__
    def __repr__(self):
        kwarg_strings = []
        d = getattr(self, '__dict__', None)
        if d is not None:
            for k, v in d.items():
                kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
        slots = getattr(self, '__slots__', None)
        if slots is not None:
            for k in slots:
                v = getattr(self, k, None)
                kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
        return '{name}({kwargs})'.format(
          name=type(self).__name__,
          kwargs=', '.join(kwarg_strings)
          )

It would be used like this:

class Foo(SimpleInitReprMixin): # add other mixins and/or extend another class here
    __slots__ = 'foo',
    def __init__(self, foo=None):
        self.foo = foo
        super(Foo, self).__init__()

And in the interpreter:

>>> f1 = Foo('bar')
>>> f2 = Foo()
>>> f1
Foo(foo='bar')
>>> f2
Foo(foo=None)
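The same mixin should also work for a class that does not declare __slots__, because __repr__ falls back to the instance __dict__. A sketch, with the mixin repeated so the block runs on its own; Bar is a hypothetical class, and the attribute order in the output assumes Python 3.7+, where dictionaries keep insertion order.

```python
class SimpleInitReprMixin(object):
    """Same mixin as above, repeated so this block is self-contained."""
    __slots__ = ()

    def __repr__(self):
        kwarg_strings = []
        d = getattr(self, '__dict__', None)
        if d is not None:
            for k, v in d.items():
                kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
        slots = getattr(self, '__slots__', None)
        if slots is not None:
            for k in slots:
                v = getattr(self, k, None)
                kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
        return '{name}({kwargs})'.format(
            name=type(self).__name__,
            kwargs=', '.join(kwarg_strings))


class Bar(SimpleInitReprMixin):  # hypothetical class without __slots__
    def __init__(self, x=None, y=None):
        # Stored in the instance __dict__, which __repr__ reads first.
        self.x = x
        self.y = y


assert repr(Bar(1, 'two')) == "Bar(x=1, y='two')"
```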

What separates a mixin from multiple inheritance? Is it just a matter of semantics?

A mixin is a limited form of multiple inheritance. In some languages the mechanism for adding a mixin to a class is slightly different (in terms of syntax) from that of inheritance.

In the context of Python especially, a mixin is a parent class that provides functionality to subclasses but is not intended to be instantiated itself.

What might cause you to say, “that’s just multiple inheritance, not really a mixin” is if the class that might be confused for a mixin can actually be instantiated and used – so indeed it is a semantic, and very real, difference.

Example of Multiple Inheritance

This example, from the documentation, is an OrderedCounter:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

It subclasses both the Counter and the OrderedDict from the collections module.

Both Counter and OrderedDict are intended to be instantiated and used on their own. However, by subclassing them both, we can have a counter that is ordered and reuses the code in each object.
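A short usage sketch of that point, with the class repeated so the block is self-contained. The sample string is arbitrary. It checks that the subclass counts like a Counter and keeps first-seen order, and that both parents are still usable on their own, which is what distinguishes them from mixins.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)


counts = OrderedCounter('abracadabra')

# Counting behavior comes from Counter ...
assert counts['a'] == 5
# ... and keys are kept in the order they were first seen.
assert list(counts) == ['a', 'b', 'r', 'c', 'd']
assert isinstance(counts, Counter) and isinstance(counts, OrderedDict)

# Both parent classes can be instantiated and used directly.
assert Counter('aab')['a'] == 2
assert list(OrderedDict(x=1)) == ['x']
```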

This is a powerful way to reuse code, but it can also be problematic. If it turns out there’s a bug in one of the objects, fixing it without care could create a bug in th