
Selecting multiple columns in a pandas dataframe

Question: Selecting multiple columns in a pandas dataframe


I have data in different columns but I don’t know how to extract it to save it in another variable.

index  a   b   c
1      2   3   4
2      3   4   5

How do I select 'a', 'b' and save it in to df1?

I tried

df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']

None seem to work.


Answer 0


The column names (which are strings) cannot be sliced in the manner you tried.

Here you have a couple of options. If you know from context which variables you want to slice out, you can just return a view of only those columns by passing a list into the __getitem__ syntax (the []’s).

df1 = df[['a','b']]

Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:

df1 = df.iloc[:,0:2] # Remember that Python does not slice inclusive of the ending index.

Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).

Sometimes, however, there are indexing conventions in Pandas that don’t do this and instead give you a new variable that just refers to the same chunk of memory as the sub-object or slice in the original object. This will happen with the second way of indexing, so you can modify it with the copy() function to get a regular copy. When this happens, changing what you think is the sliced object can sometimes alter the original object. Always good to be on the look out for this.

df1 = df.iloc[:, 0:2].copy() # To avoid the case where changing df1 also changes df

To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices you can use iloc together with the get_loc method of the DataFrame's columns attribute to obtain column indices.

{df.columns.get_loc(c): c for c in df.columns}

Now you can use this dictionary to access columns through names and using iloc.
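As a small sketch of how that could be combined with iloc (the DataFrame below just reuses the question's data; the col_pos name is made up for illustration):

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# Map each column name to its current position (hypothetical helper dict).
col_pos = {c: df.columns.get_loc(c) for c in df.columns}

# Select 'a' and 'b' by position; equivalent to df[['a', 'b']] but robust to column reordering.
df1 = df.iloc[:, [col_pos['a'], col_pos['b']]].copy()
print(df1)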


Answer 1


As of version 0.11.0, columns can be sliced in the manner you tried using the .loc indexer:

df.loc[:, 'C':'E']

is equivalent to

df[['C', 'D', 'E']]  # or df.loc[:, ['C', 'D', 'E']]

and returns columns C through E.


A demo on a randomly generated DataFrame:

import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(100, 6)), 
                  columns=list('ABCDEF'), 
                  index=['R{}'.format(i) for i in range(100)])
df.head()

Out: 
     A   B   C   D   E   F
R0  99  78  61  16  73   8
R1  62  27  30  80   7  76
R2  15  53  80  27  44  77
R3  75  65  47  30  84  86
R4  18   9  41  62   1  82

To get the columns from C to E (note that unlike integer slicing, ‘E’ is included in the columns):

df.loc[:, 'C':'E']

Out: 
      C   D   E
R0   61  16  73
R1   30  80   7
R2   80  27  44
R3   47  30  84
R4   41  62   1
R5    5  58   0
...

Same works for selecting rows based on labels. Get the rows ‘R6’ to ‘R10’ from those columns:

df.loc['R6':'R10', 'C':'E']

Out: 
      C   D   E
R6   51  27  31
R7   83  19  18
R8   11  67  65
R9   78  27  29
R10   7  16  94

.loc also accepts a boolean array so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool) – True if the column name is in the list ['B', 'C', 'D']; False, otherwise.

df.loc[:, df.columns.isin(list('BCD'))]

Out: 
      B   C   D
R0   78  61  16
R1   27  30  80
R2   53  80  27
R3   65  47  30
R4    9  41  62
R5   78   5  58
...

Answer 2


Assuming your column names (df.columns) are ['index','a','b','c'], then the data you want is in the 3rd & 4th columns. If you don’t know their names when your script runs, you can do this

newdf = df[df.columns[2:4]] # Remember, Python is 0-offset! The "3rd" entry is at slot 2.

As EMS points out in his answer, df.ix slices columns a bit more concisely, but the .columns slicing interface might be more natural because it uses the vanilla 1-D python list indexing/slicing syntax.

WARN: 'index' is a bad name for a DataFrame column. That same label is also used for the real df.index attribute, an Index array. So your column is returned by df['index'] and the real DataFrame index is returned by df.index. An Index is a special kind of Series optimized for looking up the values of its elements. For df.index it's for looking up rows by their label. The df.columns attribute is also a pd.Index array, for looking up columns by their labels.
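To make that warning concrete, here is a minimal sketch (the data is made up) showing the difference between a column named 'index' and the real DataFrame index:

import pandas as pd

df = pd.DataFrame({'index': [1, 2], 'a': [2, 3], 'b': [3, 4], 'c': [4, 5]})

print(df['index'])   # the ordinary column that happens to be labelled 'index'
print(df.index)      # the real row index: RangeIndex(start=0, stop=2, step=1)
print(df.columns)    # also a pd.Index: Index(['index', 'a', 'b', 'c'], dtype='object')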


Answer 3

In [39]: df
Out[39]: 
   index  a  b  c
0      1  2  3  4
1      2  3  4  5

In [40]: df1 = df[['b', 'c']]

In [41]: df1
Out[41]: 
   b  c
0  3  4
1  4  5

Answer 4


I realize this question is quite old, but in the latest version of pandas there is an easy way to do exactly this. Column names (which are strings) can be sliced in whatever manner you like.

columns = ['b', 'c']
df1 = pd.DataFrame(df, columns=columns)
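A quick sketch of what that constructor call produces with the question's data (it keeps only the listed columns, in the given order):

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

columns = ['b', 'c']
df1 = pd.DataFrame(df, columns=columns)
print(df1)
#    b  c
# 1  3  4
# 2  4  5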

Answer 5


You could provide a list of columns to be dropped and return back the DataFrame with only the columns needed using the drop() function on a Pandas DataFrame.

Just saying

colsToDrop = ['a']
df.drop(colsToDrop, axis=1)

would return a DataFrame with just the columns b and c.

The drop method is documented here.
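For illustration, a minimal sketch using the question's column names; drop returns a new DataFrame and leaves the original untouched unless inplace=True is passed:

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

colsToDrop = ['a']
df1 = df.drop(colsToDrop, axis=1)  # df itself is unchanged

print(df1)
#    b  c
# 1  3  4
# 2  4  5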


Answer 6


With pandas, you have a few options.

With column names:

dataframe[['column1','column2']]

To select by iloc with specific column index numbers:

dataframe.iloc[:,[1,2]]

With loc, column names can be used like this:

dataframe.loc[:,['column1','column2']]
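A compact sketch (the column1/column2 names are placeholders) showing that the three spellings select the same columns:

import pandas as pd

dataframe = pd.DataFrame({'column0': [0, 1],
                          'column1': [2, 3],
                          'column2': [4, 5]})

by_name = dataframe[['column1', 'column2']]
by_iloc = dataframe.iloc[:, [1, 2]]
by_loc = dataframe.loc[:, ['column1', 'column2']]

assert by_name.equals(by_iloc) and by_name.equals(by_loc)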

Answer 7


I found this method to be very useful:

# iloc[row slicing, column slicing]
surveys_df.iloc[0:3, 1:4]

More details can be found here


Answer 8


Starting with 0.21.0, using .loc or [] with a list with one or more missing labels is deprecated in favor of .reindex. So, the answer to your question is:

df1 = df.reindex(columns=['b','c'])

In prior versions, using .loc[list-of-labels] would work as long as at least 1 of the keys was found (otherwise it would raise a KeyError). This behavior is deprecated and now shows a warning message. The recommended alternative is to use .reindex().

Read more at Indexing and Selecting Data
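A short sketch of the behaviour described above (using the question's small DataFrame): reindex silently fills a missing column label with NaN instead of raising a KeyError:

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

df1 = df.reindex(columns=['b', 'c'])        # keeps the existing columns
df2 = df.reindex(columns=['b', 'missing'])  # 'missing' becomes a column of NaN

print(df1)
print(df2)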


Answer 9


You can use pandas. I create the DataFrame:

    import pandas as pd
    df = pd.DataFrame([[1, 2,5], [5,4, 5], [7,7, 8], [7,6,9]], 
                      index=['Jane', 'Peter','Alex','Ann'],
                      columns=['Test_1', 'Test_2', 'Test_3'])

The DataFrame:

           Test_1  Test_2  Test_3
    Jane        1       2       5
    Peter       5       4       5
    Alex        7       7       8
    Ann         7       6       9

To select 1 or more columns by name:

    df[['Test_1','Test_3']]

           Test_1  Test_3
    Jane        1       5
    Peter       5       5
    Alex        7       8
    Ann         7       9

You can also use:

    df.Test_2

And you get column Test_2:

    Jane     2
    Peter    4
    Alex     7
    Ann      6

You can also select columns and rows using .loc. This is called "slicing". Notice that I take the columns from Test_1 to Test_3:

    df.loc[:,'Test_1':'Test_3']

The “Slice” is:

            Test_1  Test_2  Test_3
     Jane        1       2       5
     Peter       5       4       5
     Alex        7       7       8
     Ann         7       6       9

And if you just want Peter and Ann from columns Test_1 and Test_3:

    df.loc[['Peter', 'Ann'],['Test_1','Test_3']]

You get:

           Test_1  Test_3
    Peter       5       5
    Ann         7       9

Answer 10


If you want to get one element by row index and column name, you can do it just like df['b'][0]. It is as simple as that.

Or you can use df.ix[0,'b'], a mixed usage of index and label.

Note: Since v0.20 ix has been deprecated in favour of loc / iloc.


Answer 11


One different and easy approach: iterating over rows

Using iterrows:

df1 = pd.DataFrame()  # creating an empty dataframe
for index, i in df.iterrows():
    df1.loc[index, 'A'] = df.loc[index, 'A']
    df1.loc[index, 'B'] = df.loc[index, 'B']
df1.head()

Answer 12


The different approaches discussed in the above responses are based on the assumption that either the user knows which column indices to drop or subset on, or the user wishes to subset a dataframe using a range of columns (for instance between 'C' and 'E'). pandas.DataFrame.drop() is certainly an option for subsetting data based on a user-defined list of columns (though you have to be careful to always work on a copy of the dataframe and not set the inplace parameter to True!).

Another option is to use df.columns.difference(), which performs a set difference on the column names and returns an Index containing the desired columns. Here is the solution:

df = pd.DataFrame([[2,3,4],[3,4,5]],columns=['a','b','c'],index=[1,2])
columns_for_differencing = ['a']
df1 = df.copy()[df.columns.difference(columns_for_differencing)]
print(df1)

The output would be:

   b  c
1  3  4
2  4  5


Answer 13


You can also use df.pop():

>>> df = pd.DataFrame([('falcon', 'bird',    389.0),
...                    ('parrot', 'bird',     24.0),
...                    ('lion',   'mammal',   80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal 

>>> df.pop('class')
0      bird
1      bird
2    mammal
3    mammal
Name: class, dtype: object

>>> df
     name  max_speed
0  falcon      389.0
1  parrot       24.0
2    lion       80.5
3  monkey        NaN

Let me know if this helps; in your case, use df.pop(c) with the name of the column c you want to remove.


Answer 14


I've seen several answers on this, but one point remained unclear to me. How would you select those columns of interest? The answer is that if you have them gathered in a list, you can just reference the columns using the list.

Example

print(extracted_features.shape)
print(extracted_features)

(63,)
['f000004' 'f000005' 'f000006' 'f000014' 'f000039' 'f000040' 'f000043'
 'f000047' 'f000048' 'f000049' 'f000050' 'f000051' 'f000052' 'f000053'
 'f000054' 'f000055' 'f000056' 'f000057' 'f000058' 'f000059' 'f000060'
 'f000061' 'f000062' 'f000063' 'f000064' 'f000065' 'f000066' 'f000067'
 'f000068' 'f000069' 'f000070' 'f000071' 'f000072' 'f000073' 'f000074'
 'f000075' 'f000076' 'f000077' 'f000078' 'f000079' 'f000080' 'f000081'
 'f000082' 'f000083' 'f000084' 'f000085' 'f000086' 'f000087' 'f000088'
 'f000089' 'f000090' 'f000091' 'f000092' 'f000093' 'f000094' 'f000095'
 'f000096' 'f000097' 'f000098' 'f000099' 'f000100' 'f000101' 'f000103']

I have the following list/numpy array extracted_features, specifying 63 columns. The original dataset has 103 columns, and I would like to extract exactly those columns, so I would use

dataset[extracted_features]

And you will end up with a DataFrame containing just those columns.

This is something you would use quite often in Machine Learning (more specifically, in feature selection). I would like to discuss other ways too, but I think that has already been covered by other Stack Overflow users. Hope this has been helpful!
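A minimal sketch of the pattern (the dataset and extracted_features below are stand-ins for the real 103-column data):

import pandas as pd

dataset = pd.DataFrame({'f000004': [1, 2], 'f000005': [3, 4], 'f000999': [5, 6]})
extracted_features = ['f000004', 'f000005']

selected = dataset[extracted_features]   # keeps only the listed feature columns
print(selected.columns.tolist())         # ['f000004', 'f000005']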


Answer 15


You can use the pandas.DataFrame.filter method to either filter or reorder columns like this:

df1 = df.filter(['a', 'b'])
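Besides an explicit list, filter also accepts like= and regex= arguments for pattern-based selection; a small sketch with made-up data:

import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]})

print(df.filter(['a', 'b']))      # explicit list of labels
print(df.filter(like='a'))        # labels containing the substring 'a'
print(df.filter(regex='^[ab]$'))  # labels matching a regular expression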

Answer 16

df[['a','b']]            # select all rows of columns 'a' and 'b'
df.loc[0:10, ['a','b']]  # rows 0 to 10, columns 'a' and 'b'
df.loc[0:10, 'a':'b']    # rows 0 to 10, columns 'a' through 'b'
df.iloc[0:10, 3:5]       # rows 0 to 10 and columns 3 to 5 by position
df.iloc[3, 3:5]          # row 3, columns 3 to 5 by position

If Python is interpreted, what are .pyc files?

Question: If Python is interpreted, what are .pyc files?


I’ve been given to understand that Python is an interpreted language…
However, when I look at my Python source code I see .pyc files, which Windows identifies as “Compiled Python Files”.

Where do these come from?


Answer 0


They contain byte code, which is what the Python interpreter compiles the source to. This code is then executed by Python’s virtual machine.

Python’s documentation explains the definition like this:

Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.
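If you want to see what that byte code looks like, the standard-library dis module can disassemble a function; a minimal sketch (the exact opcodes vary between Python versions):

import dis

def add(a, b):
    return a + b

dis.dis(add)
# Prints instructions such as LOAD_FAST a, LOAD_FAST b, an add opcode and RETURN_VALUE.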


Answer 1


I’ve been given to understand that Python is an interpreted language…

This popular meme is incorrect, or, rather, constructed upon a misunderstanding of (natural) language levels: a similar mistake would be to say “the Bible is a hardcover book”. Let me explain that simile…

“The Bible” is “a book” in the sense of being a class of (actual, physical objects identified as) books; the books identified as “copies of the Bible” are supposed to have something fundamental in common (the contents, although even those can be in different languages, with different acceptable translations, levels of footnotes and other annotations) — however, those books are perfectly well allowed to differ in a myriad of aspects that are not considered fundamental — kind of binding, color of binding, font(s) used in the printing, illustrations if any, wide writable margins or not, numbers and kinds of builtin bookmarks, and so on, and so forth.

It’s quite possible that a typical printing of the Bible would indeed be in hardcover binding — after all, it’s a book that’s typically meant to be read over and over, bookmarked at several places, thumbed through looking for given chapter-and-verse pointers, etc, etc, and a good hardcover binding can make a given copy last longer under such use. However, these are mundane (practical) issues that cannot be used to determine whether a given actual book object is a copy of the Bible or not: paperback printings are perfectly possible!

Similarly, Python is “a language” in the sense of defining a class of language implementations which must all be similar in some fundamental respects (syntax, most semantics except those parts of those where they’re explicitly allowed to differ) but are fully allowed to differ in just about every “implementation” detail — including how they deal with the source files they’re given, whether they compile the sources to some lower level forms (and, if so, which form — and whether they save such compiled forms, to disk or elsewhere), how they execute said forms, and so forth.

The classical implementation, CPython, is often called just “Python” for short — but it’s just one of several production-quality implementations, side by side with Microsoft’s IronPython (which compiles to CLR codes, i.e., “.NET”), Jython (which compiles to JVM codes), PyPy (which is written in Python itself and can compile to a huge variety of “back-end” forms including “just-in-time” generated machine language). They’re all Python (==”implementations of the Python language”) just like many superficially different book objects can all be Bibles (==”copies of The Bible”).

If you’re interested in CPython specifically: it compiles the source files into a Python-specific lower-level form (known as “bytecode”), does so automatically when needed (when there is no bytecode file corresponding to a source file, or the bytecode file is older than the source or compiled by a different Python version), usually saves the bytecode files to disk (to avoid recompiling them in the future). OTOH IronPython will typically compile to CLR codes (saving them to disk or not, depending) and Jython to JVM codes (saving them to disk or not — it will use the .class extension if it does save them).

These lower level forms are then executed by appropriate “virtual machines” also known as “interpreters” — the CPython VM, the .Net runtime, the Java VM (aka JVM), as appropriate.

So, in this sense (what do typical implementations do), Python is an “interpreted language” if and only if C# and Java are: all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter.

More likely the focus is on how “heavy”, slow, and high-ceremony the compilation process is. CPython is designed to compile as fast as possible, as lightweight as possible, with as little ceremony as feasible — the compiler does very little error checking and optimization, so it can run fast and in small amounts of memory, which in turns lets it be run automatically and transparently whenever needed, without the user even needing to be aware that there is a compilation going on, most of the time. Java and C# typically accept more work during compilation (and therefore don’t perform automatic compilation) in order to check errors more thoroughly and perform more optimizations. It’s a continuum of gray scales, not a black or white situation, and it would be utterly arbitrary to put a threshold at some given level and say that only above that level you call it “compilation”!-)


Answer 2


There is no such thing as an interpreted language. Whether an interpreter or a compiler is used is purely a trait of the implementation and has absolutely nothing whatsoever to do with the language.

Every language can be implemented by either an interpreter or a compiler. The vast majority of languages have at least one implementation of each type. (For example, there are interpreters for C and C++ and there are compilers for JavaScript, PHP, Perl, Python and Ruby.) Besides, the majority of modern language implementations actually combine both an interpreter and a compiler (or even multiple compilers).

A language is just a set of abstract mathematical rules. An interpreter is one of several concrete implementation strategies for a language. Those two live on completely different abstraction levels. If English were a typed language, the term “interpreted language” would be a type error. The statement “Python is an interpreted language” is not just false (because being false would imply that the statement even makes sense, even if it is wrong), it just plain doesn’t make sense, because a language can never be defined as “interpreted.”

In particular, if you look at the currently existing Python implementations, these are the implementation strategies they are using:

  • IronPython: compiles to DLR trees which the DLR then compiles to CIL bytecode. What happens to the CIL bytecode depends upon which CLI VES you are running on, but Microsoft .NET, GNU Portable.NET and Novell Mono will eventually compile it to native machine code.
  • Jython: interprets Python sourcecode until it identifies the hot code paths, which it then compiles to JVML bytecode. What happens to the JVML bytecode depends upon which JVM you are running on. Maxine will directly compile it to un-optimized native code until it identifies the hot code paths, which it then recompiles to optimized native code. HotSpot will first interpret the JVML bytecode and then eventually compile the hot code paths to optimized machine code.
  • PyPy: compiles to PyPy bytecode, which then gets interpreted by the PyPy VM until it identifies the hot code paths which it then compiles into native code, JVML bytecode or CIL bytecode depending on which platform you are running on.
  • CPython: compiles to CPython bytecode which it then interprets.
  • Stackless Python: compiles to CPython bytecode which it then interprets.
  • Unladen Swallow: compiles to CPython bytecode which it then interprets until it identifies the hot code paths which it then compiles to LLVM IR which the LLVM compiler then compiles to native machine code.
  • Cython: compiles Python code to portable C code, which is then compiled with a standard C compiler
  • Nuitka: compiles Python code to machine-dependent C++ code, which is then compiled with a standard C compiler

You might notice that every single one of the implementations in that list (plus some others I didn’t mention, like tinypy, Shedskin or Psyco) has a compiler. In fact, as far as I know, there is currently no Python implementation which is purely interpreted, there is no such implementation planned and there never has been such an implementation.

Not only does the term “interpreted language” not make sense, even if you interpret it as meaning “language with interpreted implementation”, it is clearly not true. Whoever told you that, obviously doesn’t know what he is talking about.

In particular, the .pyc files you are seeing are cached bytecode files produced by CPython, Stackless Python or Unladen Swallow.


Answer 3


These are created by the Python interpreter when a .py file is imported, and they contain the “compiled bytecode” of the imported module/program, the idea being that the “translation” from source code to bytecode (which only needs to be done once) can be skipped on subsequent imports if the .pyc is newer than the corresponding .py file, thus speeding startup a little. But it’s still interpreted.
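One way to observe this (a sketch; any pure-Python module already on the path will do) is to look at a module's __cached__ attribute after importing it, which points at the cached bytecode file:

import importlib

mod = importlib.import_module('json')  # or any other importable module
print(mod.__cached__)                  # e.g. .../__pycache__/__init__.cpython-311.pyc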


Answer 4


To speed up loading modules, Python caches the compiled content of modules in .pyc.

CPython compiles its source code into "byte code", and for performance reasons it caches this byte code on the file system whenever the source file changes. This makes loading of Python modules much faster because the compilation phase can be bypassed. When your source file is foo.py, CPython caches the byte code in a foo.pyc file right next to the source.

In Python 3, Python's import machinery is extended to write and search for byte code cache files in a single directory inside every Python package directory. This directory is called __pycache__.

Here is a flow chart describing how modules are loaded:

For more information:

ref:PEP3147
ref:“Compiled” Python files


Answer 5


THIS IS FOR BEGINNERS,

Python automatically compiles your script to compiled code, so-called byte code, before running it.

Running a script is not considered an import and no .pyc will be created.

For example, if you have a script file abc.py that imports another module xyz.py, when you run abc.py, xyz.pyc will be created since xyz is imported, but no abc.pyc file will be created since abc.py isn’t being imported.

If you need to create a .pyc file for a module that is not imported, you can use the py_compile and compileall modules.

The py_compile module can manually compile any module. One way is to use the py_compile.compile function in that module interactively:

>>> import py_compile
>>> py_compile.compile('abc.py')

This will write the .pyc to the same location as abc.py (you can override that with the optional parameter cfile).

You can also automatically compile all files in a directory or directories using the compileall module.

python -m compileall

If the directory name (the current directory in this example) is omitted, the module compiles everything found on sys.path
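The same thing can be done from Python itself with compileall.compile_dir; a small sketch (the directory path is illustrative):

import compileall

# Compile every .py file found under the given directory tree.
compileall.compile_dir('path/to/my_project', quiet=1)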


Answer 6


Python (at least the most common implementation of it) follows a pattern of compiling the original source to byte code, then interpreting the byte code on a virtual machine. This means that (again, in the most common implementation) it is neither a pure interpreter nor a pure compiler.

The other side of this is, however, that the compilation process is mostly hidden — the .pyc files are basically treated like a cache; they speed things up, but you normally don’t have to be aware of them at all. It automatically invalidates and re-loads them (re-compiles the source code) when necessary based on file time/date stamps.

About the only time I’ve seen a problem with this was when a compiled bytecode file somehow got a timestamp well into the future, which meant it always looked newer than the source file. Since it looked newer, the source file was never recompiled, so no matter what changes you made, they were ignored…


Answer 7


A Python *.py file is just a text file in which you write some lines of code. When you try to execute this file with, say, "python filename.py", the command invokes the Python Virtual Machine (PVM).

The PVM has two components: a "compiler" and an "interpreter". The interpreter cannot read the text in the *.py file directly, so the text is first converted into byte code that targets the PVM (not the hardware, but the PVM). The PVM then executes this byte code. A *.pyc file is also generated for any file that gets imported, whether it is imported from the shell or from another file.

If this *.pyc file has already been generated, then the next time you run your *.py file the system loads the *.pyc file directly and no compilation is needed (which saves some processor cycles).

Once the *.pyc file is generated, the *.py file is not needed again unless you edit it.


Answer 8


Python code goes through two stages. The first step compiles the code into .pyc files, which are actually bytecode. Then this .pyc file (bytecode) is interpreted by the CPython interpreter. Please refer to this link, where the process of code compilation and execution is explained in simple terms.


In Python, how do I determine if an object is iterable?

Question: In Python, how do I determine if an object is iterable?


Is there a method like isiterable? The only solution I have found so far is to call

hasattr(myObj, '__iter__')

But I am not sure how fool-proof this is.


Answer 0


I’ve been studying this problem quite a bit lately. Based on that my conclusion is that nowadays this is the best approach:

from collections.abc import Iterable   # drop `.abc` with Python 2.7 or lower

def iterable(obj):
    return isinstance(obj, Iterable)

The above has been recommended already earlier, but the general consensus has been that using iter() would be better:

def iterable(obj):
    try:
        iter(obj)
    except Exception:
        return False
    else:
        return True

We’ve used iter() in our code as well for this purpose, but I’ve lately started to get more and more annoyed by objects which only have __getitem__ being considered iterable. There are valid reasons to have __getitem__ in a non-iterable object and with them the above code doesn’t work well. As a real life example we can use Faker. The above code reports it being iterable but actually trying to iterate it causes an AttributeError (tested with Faker 4.0.2):

>>> from faker import Faker
>>> fake = Faker()
>>> iter(fake)    # No exception, must be iterable
<iterator object at 0x7f1c71db58d0>
>>> list(fake)    # Ooops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/.../site-packages/faker/proxy.py", line 59, in __getitem__
    return self._factory_map[locale.replace('-', '_')]
AttributeError: 'int' object has no attribute 'replace'

If we'd use isinstance(), we wouldn't accidentally consider Faker instances (or any other objects having only __getitem__) to be iterable:

>>> from collections.abc import Iterable
>>> from faker import Faker
>>> isinstance(Faker(), Iterable)
False

Earlier answers commented that using iter() is safer as the old way to implement iteration in Python was based on __getitem__ and the isinstance() approach wouldn’t detect that. This may have been true with old Python versions, but based on my pretty exhaustive testing isinstance() works great nowadays. The only case where isinstance() didn’t work but iter() did was with UserDict when using Python 2. If that’s relevant, it’s possible to use isinstance(item, (Iterable, UserDict)) to get that covered.


Answer 1


  1. Checking for __iter__ works on sequence types, but it would fail on e.g. strings in Python 2. I would like to know the right answer too; until then, here is one possibility (which works on strings, too):

    from __future__ import print_function
    
    try:
        some_object_iterator = iter(some_object)
    except TypeError as te:
        print(some_object, 'is not iterable')
    

    The iter built-in checks for the __iter__ method or in the case of strings the __getitem__ method.

  2. Another general pythonic approach is to assume an iterable, then fail gracefully if it does not work on the given object. The Python glossary:

    Pythonic programming style that determines an object’s type by inspection of its method or attribute signature rather than by explicit relationship to some type object (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). Instead, it typically employs the EAFP (Easier to Ask Forgiveness than Permission) style of programming.

    try:
       _ = (e for e in my_object)
    except TypeError:
       print my_object, 'is not iterable'
    
  3. The collections module provides some abstract base classes, which allow you to ask whether classes or instances provide particular functionality, for example:

    from collections.abc import Iterable
    
    if isinstance(e, Iterable):
        # e is iterable
    

    However, this does not check for classes that are iterable through __getitem__.


Answer 2


Duck typing

try:
    iterator = iter(theElement)
except TypeError:
    # not iterable
else:
    # iterable

# for obj in iterator:
#     pass

Type checking

Use the Abstract Base Classes. They need at least Python 2.6 and work only for new-style classes.

from collections.abc import Iterable   # import directly from collections for Python < 3.3

if isinstance(theElement, Iterable):
    # iterable
else:
    # not iterable

However, iter() is a bit more reliable as described by the documentation:

Checking isinstance(obj, Iterable) detects classes that are registered as Iterable or that have an __iter__() method, but it does not detect classes that iterate with the __getitem__() method. The only reliable way to determine whether an object is iterable is to call iter(obj).
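Putting that advice together, a small helper sketch that relies on iter(), as the documentation recommends:

def is_iterable(obj):
    """Return True if obj can actually be iterated over."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(is_iterable([1, 2, 3]))  # True
print(is_iterable('abc'))      # True (strings are iterable)
print(is_iterable(42))         # False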


Answer 3


I’d like to shed a little bit more light on the interplay of iter, __iter__ and __getitem__ and what happens behind the curtains. Armed with that knowledge, you will be able to understand why the best you can do is

try:
    iter(maybe_iterable)
    print('iteration will probably work')
except TypeError:
    print('not iterable')

I will list the facts first and then follow up with a quick reminder of what happens when you employ a for loop in python, followed by a discussion to illustrate the facts.

Facts

  1. You can get an iterator from any object o by calling iter(o) if at least one of the following conditions holds true:

    a) o has an __iter__ method which returns an iterator object. An iterator is any object with an __iter__ and a __next__ (Python 2: next) method.

    b) o has a __getitem__ method.

  2. Checking for an instance of Iterable or Sequence, or checking for the attribute __iter__ is not enough.

  3. If an object o implements only __getitem__, but not __iter__, iter(o) will construct an iterator that tries to fetch items from o by integer index, starting at index 0. The iterator will catch any IndexError (but no other errors) that is raised and then raises StopIteration itself.

  4. In the most general sense, there’s no way to check whether the iterator returned by iter is sane other than to try it out.

  5. If an object o implements __iter__, the iter function will make sure that the object returned by __iter__ is an iterator. There is no sanity check if an object only implements __getitem__.

  6. __iter__ wins. If an object o implements both __iter__ and __getitem__, iter(o) will call __iter__.

  7. If you want to make your own objects iterable, always implement the __iter__ method.

for loops

In order to follow along, you need an understanding of what happens when you employ a for loop in Python. Feel free to skip right to the next section if you already know.

When you use for item in o for some iterable object o, Python calls iter(o) and expects an iterator object as the return value. An iterator is any object which implements a __next__ (or next in Python 2) method and an __iter__ method.

By convention, the __iter__ method of an iterator should return the object itself (i.e. return self). Python then calls next on the iterator until StopIteration is raised. All of this happens implicitly, but the following demonstration makes it visible:

import random

class DemoIterable(object):
    def __iter__(self):
        print('__iter__ called')
        return DemoIterator()

class DemoIterator(object):
    def __iter__(self):
        return self

    def __next__(self):
        print('__next__ called')
        r = random.randint(1, 10)
        if r == 5:
            print('raising StopIteration')
            raise StopIteration
        return r

Iteration over a DemoIterable:

>>> di = DemoIterable()
>>> for x in di:
...     print(x)
...
__iter__ called
__next__ called
9
__next__ called
8
__next__ called
10
__next__ called
3
__next__ called
10
__next__ called
raising StopIteration

Discussion and illustrations

On point 1 and 2: getting an iterator and unreliable checks

Consider the following class:

class BasicIterable(object):
    def __getitem__(self, item):
        if item == 3:
            raise IndexError
        return item

Calling iter with an instance of BasicIterable will return an iterator without any problems because BasicIterable implements __getitem__.

>>> b = BasicIterable()
>>> iter(b)
<iterator object at 0x7f1ab216e320>

However, it is important to note that b does not have the __iter__ attribute and is not considered an instance of Iterable or Sequence:

>>> from collections import Iterable, Sequence
>>> hasattr(b, '__iter__')
False
>>> isinstance(b, Iterable)
False
>>> isinstance(b, Sequence)
False

This is why Fluent Python by Luciano Ramalho recommends calling iter and handling the potential TypeError as the most accurate way to check whether an object is iterable. Quoting directly from the book:

As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle a TypeError exception if it isn’t. This is more accurate than using isinstance(x, abc.Iterable) , because iter(x) also considers the legacy __getitem__ method, while the Iterable ABC does not.

On point 3: Iterating over objects which only provide __getitem__, but not __iter__

Iterating over an instance of BasicIterable works as expected: Python constructs an iterator that tries to fetch items by index, starting at zero, until an IndexError is raised. The demo object’s __getitem__ method simply returns the item which was supplied as the argument to __getitem__(self, item) by the iterator returned by iter.

>>> b = BasicIterable()
>>> it = iter(b)
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Note that the iterator raises StopIteration when it cannot return the next item and that the IndexError which is raised for item == 3 is handled internally. This is why looping over a BasicIterable with a for loop works as expected:

>>> for x in b:
...     print(x)
...
0
1
2

Here’s another example in order to drive home the concept of how the iterator returned by iter tries to access items by index. WrappedDict does not inherit from dict, which means instances won’t have an __iter__ method.

class WrappedDict(object): # note: no inheritance from dict!
    def __init__(self, dic):
        self._dict = dic

    def __getitem__(self, item):
        try:
            return self._dict[item] # delegate to dict.__getitem__
        except KeyError:
            raise IndexError

Note that calls to __getitem__ are delegated to dict.__getitem__ for which the square bracket notation is simply a shorthand.

>>> w = WrappedDict({-1: 'not printed',
...                   0: 'hi', 1: 'StackOverflow', 2: '!',
...                   4: 'not printed', 
...                   'x': 'not printed'})
>>> for x in w:
...     print(x)
... 
hi
StackOverflow
!

On point 4 and 5: iter checks for an iterator when it calls __iter__:

When iter(o) is called for an object o, iter will make sure that the return value of __iter__, if the method is present, is an iterator. This means that the returned object must implement __next__ (or next in Python 2) and __iter__. iter cannot perform any sanity checks for objects which only provide __getitem__, because it has no way to check whether the items of the object are accessible by integer index.

class FailIterIterable(object):
    def __iter__(self):
        return object() # not an iterator

class FailGetitemIterable(object):
    def __getitem__(self, item):
        raise Exception

Note that constructing an iterator from FailIterIterable instances fails immediately, while constructing an iterator from FailGetItemIterable succeeds, but will throw an Exception on the first call to __next__.

>>> fii = FailIterIterable()
>>> iter(fii)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'object'
>>>
>>> fgi = FailGetitemIterable()
>>> it = iter(fgi)
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/iterdemo.py", line 42, in __getitem__
    raise Exception
Exception

On point 6: __iter__ wins

This one is straightforward. If an object implements __iter__ and __getitem__, iter will call __iter__. Consider the following class

class IterWinsDemo(object):
    def __iter__(self):
        return iter(['__iter__', 'wins'])

    def __getitem__(self, item):
        return ['__getitem__', 'wins'][item]

and the output when looping over an instance:

>>> iwd = IterWinsDemo()
>>> for x in iwd:
...     print(x)
...
__iter__
wins

On point 7: your iterable classes should implement __iter__

You might ask yourself why most builtin sequences like list implement an __iter__ method when __getitem__ would be sufficient.

class WrappedList(object): # note: no inheritance from list!
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]

After all, iteration over instances of the class above, which delegates calls to __getitem__ to list.__getitem__ (using the square bracket notation), will work fine:

>>> wl = WrappedList(['A', 'B', 'C'])
>>> for x in wl:
...     print(x)
... 
A
B
C

The reasons your custom iterables should implement __iter__ are as follows:

  1. If you implement __iter__, instances will be considered iterables, and isinstance(o, collections.abc.Iterable) will return True.
  2. If the object returned by __iter__ is not an iterator, iter will fail immediately and raise a TypeError.
  3. The special handling of __getitem__ exists for backwards compatibility reasons. Quoting again from Fluent Python:

That is why any Python sequence is iterable: they all implement __getitem__ . In fact, the standard sequences also implement __iter__, and yours should too, because the special handling of __getitem__ exists for backward compatibility reasons and may be gone in the future (although it is not deprecated as I write this).
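
As a small sketch of that advice (not part of the original answer), the WrappedList example above only needs to grow an __iter__ method that delegates to the wrapped list:

class WrappedList(object):
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]

    def __iter__(self):
        # delegate to the wrapped list so instances are recognized
        # by isinstance(obj, collections.abc.Iterable)
        return iter(self._list)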


回答 4

仅这样还不够:__iter__ 返回的对象必须实现迭代协议(即 next 方法)。请参阅文档中的相关部分。

在Python中,一个好的做法是“尝试一下”而不是“检查”。

This isn’t sufficient: the object returned by __iter__ must implement the iteration protocol (i.e. next method). See the relevant section in the documentation.

In Python, a good practice is to “try and see” instead of “checking”.


回答 5

在 Python <= 2.5 中,你做不到,也不应该这样做——“可迭代”只是一个“非正式”接口。

但是从Python 2.6和3.0开始,您可以利用新的ABC(抽象基类)基础结构以及一些内置的ABC(可在collections模块中使用):

from collections import Iterable

class MyObject(object):
    pass

mo = MyObject()
print isinstance(mo, Iterable)
Iterable.register(MyObject)
print isinstance(mo, Iterable)

print isinstance("abc", Iterable)

现在,这是合乎需要的还是实际可行的,仅是一个约定问题。如您所见,您可以将一个不可迭代的对象注册为Iterable-它将在运行时引发异常。因此,isinstance获得了“新”的含义-它只是检查“声明的”类型兼容性,这是在Python中使用的好方法。

另一方面,如果您的对象不满足您所需的接口,您将要做什么?请看以下示例:

from collections import Iterable
from traceback import print_exc

def check_and_raise(x):
    if not isinstance(x, Iterable):
        raise TypeError, "%s is not iterable" % x
    else:
        for i in x:
            print i

def just_iter(x):
    for i in x:
        print i


class NotIterable(object):
    pass

if __name__ == "__main__":
    try:
        check_and_raise(5)
    except:
        print_exc()
        print

    try:
        just_iter(5)
    except:
        print_exc()
        print

    try:
        Iterable.register(NotIterable)
        ni = NotIterable()
        check_and_raise(ni)
    except:
        print_exc()
        print

如果对象不满足您的期望,则只引发TypeError,但是如果已注册了正确的ABC,则您的检查无用。相反,如果该__iter__方法可用,Python会自动将该类的对象识别为Iterable。

因此,如果您只是期望一个可迭代的对象,请对其进行迭代并忘记它。另一方面,如果您需要根据输入类型执行不同的操作,则可能会发现ABC基础结构非常有用。

In Python <= 2.5, you can’t and shouldn’t – iterable was an “informal” interface.

But since Python 2.6 and 3.0 you can leverage the new ABC (abstract base class) infrastructure along with some builtin ABCs which are available in the collections module:

from collections import Iterable

class MyObject(object):
    pass

mo = MyObject()
print isinstance(mo, Iterable)
Iterable.register(MyObject)
print isinstance(mo, Iterable)

print isinstance("abc", Iterable)

Now, whether this is desirable or actually works, is just a matter of conventions. As you can see, you can register a non-iterable object as Iterable – and it will raise an exception at runtime. Hence, isinstance acquires a “new” meaning – it just checks for “declared” type compatibility, which is a good way to go in Python.

On the other hand, if your object does not satisfy the interface you need, what are you going to do? Take the following example:

from collections import Iterable
from traceback import print_exc

def check_and_raise(x):
    if not isinstance(x, Iterable):
        raise TypeError, "%s is not iterable" % x
    else:
        for i in x:
            print i

def just_iter(x):
    for i in x:
        print i


class NotIterable(object):
    pass

if __name__ == "__main__":
    try:
        check_and_raise(5)
    except:
        print_exc()
        print

    try:
        just_iter(5)
    except:
        print_exc()
        print

    try:
        Iterable.register(NotIterable)
        ni = NotIterable()
        check_and_raise(ni)
    except:
        print_exc()
        print

If the object doesn’t satisfy what you expect, you just throw a TypeError, but if the proper ABC has been registered, your check is unuseful. On the contrary, if the __iter__ method is available Python will automatically recognize object of that class as being Iterable.

So, if you just expect an iterable, iterate over it and forget it. On the other hand, if you need to do different things depending on input type, you might find the ABC infrastructure pretty useful.


回答 6

try:
  #treat object as iterable
except TypeError, e:
  #object is not actually iterable

不要先去检查你的鸭子是否真的是鸭子(即它是否可迭代),而是直接把它当作鸭子来对待;如果它不是,再抱怨(处理异常)。

try:
  #treat object as iterable
except TypeError, e:
  #object is not actually iterable

Don’t run checks to see if your duck really is a duck to see if it is iterable or not, treat it as if it was and complain if it wasn’t.
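
A concrete version of that skeleton might look like this (Python 3 syntax; the function name and messages are illustrative only):

def print_all(obj):
    try:
        iterator = iter(obj)          # treat object as iterable
    except TypeError:
        print('not iterable:', obj)   # object is not actually iterable
        return
    for item in iterator:
        print(item)

print_all([1, 2, 3])   # prints 1, 2 and 3
print_all(42)          # prints: not iterable: 42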


回答 7

从 Python 3.5 开始,您可以使用标准库中的 typing 模块来处理与类型相关的事情:

from typing import Iterable

...

if isinstance(my_item, Iterable):
    print(True)

Since Python 3.5 you can use the typing module from the standard library for type related things:

from typing import Iterable

...

if isinstance(my_item, Iterable):
    print(True)

回答 8

到目前为止,我发现的最佳解决方案是:

hasattr(obj, '__contains__')

基本上检查对象是否实现了in运算符。

优点(其他解决方案都没有这三个优点):

  • 它是一个表达式(可以用在 lambda 中,而 try…except 的写法则不行)
  • 它(应该)由包括字符串在内的所有可迭代对象实现(__iter__ 则不是这样)
  • 适用于任何 Python >= 2.5

笔记:

  • 当列表中同时包含可迭代对象和不可迭代对象,并且需要根据类型对每个元素做不同处理时(把可迭代对象放在 try 中处理、不可迭代对象放在 except 中处理虽然可行,但看起来又丑又有误导性),Python 的“请求宽恕,而不是请求许可”哲学就不太适用了
  • 通过实际迭代对象(例如 [x for x in obj])来检查其是否可迭代的方案,对于大型可迭代对象可能带来明显的性能损失(尤其是当你只需要可迭代对象的前几个元素时),应当避免

The best solution I’ve found so far:

hasattr(obj, '__contains__')

which basically checks if the object implements the in operator.

Advantages (none of the other solutions has all three):

  • it is an expression (works as a lambda, as opposed to the try…except variant)
  • it is (should be) implemented by all iterables, including strings (as opposed to __iter__)
  • works on any Python >= 2.5

Notes:

  • the Python philosophy of “ask for forgiveness, not permission” doesn’t work well when e.g. in a list you have both iterables and non-iterables and you need to treat each element differently according to it’s type (treating iterables on try and non-iterables on except would work, but it would look butt-ugly and misleading)
  • solutions to this problem which attempt to actually iterate over the object (e.g. [x for x in obj]) to check if it’s iterable may induce significant performance penalties for large iterables (especially if you just need the first few elements of the iterable, for example) and should be avoided
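
A quick, hedged demonstration of the check (the name is_container is my own; the generator caveat is an observation, not a claim from the answer above):

is_container = lambda obj: hasattr(obj, '__contains__')

print(is_container('abc'))        # True - str supports the in operator
print(is_container([1, 2, 3]))    # True
print(is_container({'a': 1}))     # True
print(is_container(42))           # False
# one caveat: generator objects are iterable but do not define __contains__,
# so this particular check reports False for them
print(is_container(x for x in range(3)))   # False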

回答 9

您可以尝试以下方法:

def iterable(a):
    try:
        (x for x in a)
        return True
    except TypeError:
        return False

如果我们可以使生成器在其上进行迭代(但不要使用生成器,这样就不会占用空间),那么它是可迭代的。好像是“ duh”之类的东西。为什么首先需要确定变量是否可迭代?

You could try this:

def iterable(a):
    try:
        (x for x in a)
        return True
    except TypeError:
        return False

If we can make a generator that iterates over it (but never use the generator so it doesn’t take up space), it’s iterable. Seems like a “duh” kind of thing. Why do you need to determine if a variable is iterable in the first place?


回答 10

我在这里找到了一个不错的解决方案:

isiterable = lambda obj: isinstance(obj, basestring) \
    or getattr(obj, '__iter__', False)

I found a nice solution here:

isiterable = lambda obj: isinstance(obj, basestring) \
    or getattr(obj, '__iter__', False)

回答 11

根据Python 2词汇表,可迭代项是

所有序列类型(如 list、str 和 tuple)、一些非序列类型(如 dict 和 file),以及你定义的任何带有 __iter__() 或 __getitem__() 方法的类的对象。可迭代对象可用于 for 循环以及许多其他需要序列的地方(zip()、map() 等)。将可迭代对象作为参数传递给内置函数 iter() 时,它会返回该对象的迭代器。

当然,考虑到Python的通用编码风格是基于“请求宽容比允许容易”这一事实,因此,人们普遍期望使用

try:
    for i in object_in_question:
        do_something
except TypeError:
    do_something_for_non_iterable

但是,如果您需要显式检查,可以用 hasattr(object_in_question, "__iter__") or hasattr(object_in_question, "__getitem__") 来测试一个对象是否可迭代。两者都需要检查,因为 str 没有 __iter__ 方法(至少在 Python 2 中没有,Python 3 中有),而 generator 对象没有 __getitem__ 方法。

According to the Python 2 Glossary, iterables are

all sequence types (such as list, str, and tuple) and some non-sequence types like dict and file and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object.

Of course, given the general coding style for Python based on the fact that it’s “Easier to ask for forgiveness than permission.”, the general expectation is to use

try:
    for i in object_in_question:
        do_something
except TypeError:
    do_something_for_non_iterable

But if you need to check it explicitly, you can test for an iterable by hasattr(object_in_question, "__iter__") or hasattr(object_in_question, "__getitem__"). You need to check for both, because strs don’t have an __iter__ method (at least not in Python 2, in Python 3 they do) and because generator objects don’t have a __getitem__ method.
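
A small sketch of that explicit two-attribute test (Python 3 print syntax; the helper name is illustrative):

def looks_iterable(obj):
    # the explicit two-attribute test described above
    return hasattr(obj, '__iter__') or hasattr(obj, '__getitem__')

print(looks_iterable([1, 2, 3]))         # True
print(looks_iterable('abc'))             # True (__getitem__ on Python 2, __iter__ on Python 3)
print(looks_iterable(x for x in 'abc'))  # True (generators provide __iter__)
print(looks_iterable(42))                # False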


回答 12

我经常发现,在脚本内部定义一个 iterable 函数很方便。(现已结合 Alfe 建议的简化):

import collections

def iterable(obj):
    return isinstance(obj, collections.Iterable)

因此,您可以以可读性强的形式测试任何对象是否可迭代

if iterable(obj):
    # act on iterable
else:
    # not iterable

就像使用 callable 函数一样

编辑:如果您已安装numpy,则可以执行以下操作:from numpy import iterable,这就像

def iterable(obj):
    try: iter(obj)
    except: return False
    return True

如果您没有numpy,则可以简单地实现此代码或上面的代码。

I often find convenient, inside my scripts, to define an iterable function. (Now incorporates Alfe’s suggested simplification):

import collections

def iterable(obj):
    return isinstance(obj, collections.Iterable)

so you can test if any object is iterable in the very readable form

if iterable(obj):
    # act on iterable
else:
    # not iterable

as you would do with the callable function

EDIT: if you have numpy installed, you can simply do: from numpy import iterable, which is simply something like

def iterable(obj):
    try: iter(obj)
    except: return False
    return True

If you do not have numpy, you can simply implement this code, or the one above.


回答 13

pandas 具有这样的内置函数:

from pandas.util.testing import isiterable

pandas has a built-in function like that:

from pandas.util.testing import isiterable

回答 14

我一直想不通,为什么 Python 有 callable(obj) -> bool,却没有 iterable(obj) -> bool……
当然,用 hasattr(obj,'__call__') 也很容易做到,即使速度会慢一些。

鉴于几乎所有其他答案都建议使用 try/except TypeError,而在任何语言中用异常来做测试通常都被视为不良实践,下面是我越来越喜欢并经常使用的 iterable(obj) -> bool 实现:

为了照顾 python 2,我这里用 lambda 来获得那一点额外的性能……
(在 python 3 中,用 def 还是 lambda 定义函数都无关紧要,速度大致相同)

iterable = lambda obj: hasattr(obj,'__iter__') or hasattr(obj,'__getitem__')

请注意,对于实现了 __iter__ 的对象,此函数执行得更快,因为它不会再去测试 __getitem__。

大多数可迭代对象应当依赖 __iter__,只有特殊情况的对象才回退到 __getitem__,不过要让对象可迭代,两者必须具备其一。
(由于这是标准行为,它同样适用于 C 对象)

It’s always eluded me as to why python has callable(obj) -> bool but not iterable(obj) -> bool
surely it’s easier to do hasattr(obj,'__call__') even if it is slower.

Since just about every other answer recommends using try/except TypeError, where testing for exceptions is generally considered bad practice among any language, here’s an implementation of iterable(obj) -> bool I’ve grown more fond of and use often:

For python 2’s sake, I’ll use a lambda just for that extra performance boost…
(in python 3 it doesn’t matter what you use for defining the function, def has roughly the same speed as lambda)

iterable = lambda obj: hasattr(obj,'__iter__') or hasattr(obj,'__getitem__')

Note that this function executes faster for objects with __iter__ since it doesn’t test for __getitem__.

Most iterable objects should rely on __iter__ where special-case objects fall back to __getitem__, though either is required for an object to be iterable.
(and since this is standard, it affects C objects as well)


回答 15

def is_iterable(x):
    try:
        0 in x
    except TypeError:
        return False
    else:
        return True

这会对各式各样的可迭代对象说“是”,但对 Python 2 中的字符串说“否”。(例如,当一个递归函数既可能接收字符串、也可能接收字符串容器时,这正是我想要的。在这种情况下,“请求宽恕”可能会让代码变得晦涩难懂,最好还是先“请求许可”。)

import numpy

class Yes:
    def __iter__(self):
        yield 1;
        yield 2;
        yield 3;

class No:
    pass

class Nope:
    def __iter__(self):
        return 'nonsense'

assert is_iterable(Yes())
assert is_iterable(range(3))
assert is_iterable((1,2,3))   # tuple
assert is_iterable([1,2,3])   # list
assert is_iterable({1,2,3})   # set
assert is_iterable({1:'one', 2:'two', 3:'three'})   # dictionary
assert is_iterable(numpy.array([1,2,3]))
assert is_iterable(bytearray("not really a string", 'utf-8'))

assert not is_iterable(No())
assert not is_iterable(Nope())
assert not is_iterable("string")
assert not is_iterable(42)
assert not is_iterable(True)
assert not is_iterable(None)

这里的许多其他策略都会对字符串说“是”。如果您要使用它们,请使用它们。

import collections
import numpy

assert isinstance("string", collections.Iterable)
assert isinstance("string", collections.Sequence)
assert numpy.iterable("string")
assert iter("string")
assert hasattr("string", '__getitem__')

注意:is_iterable() 会对 bytes 和 bytearray 类型的字符串说“是”。

  • Python 3 中的 bytes 对象是可迭代的:True == is_iterable(b"string") == is_iterable("string".encode('utf-8'));Python 2 中没有这种类型。
  • Python 2 和 3 中的 bytearray 对象都是可迭代的:True == is_iterable(bytearray(b"abc"))

原贴作者(O.P.)的 hasattr(x, '__iter__') 做法对字符串在 Python 3 中会说“是”,在 Python 2 中会说“否”(无论是 ''、b'' 还是 u'')。感谢 @LuisMasuelli 指出,当 __iter__ 本身有 bug 时,这种做法同样会让你失望。

def is_iterable(x):
    try:
        0 in x
    except TypeError:
        return False
    else:
        return True

This will say yes to all manner of iterable objects, but it will say no to strings in Python 2. (That’s what I want for example when a recursive function could take a string or a container of strings. In that situation, asking forgiveness may lead to obfuscode, and it’s better to ask permission first.)

import numpy

class Yes:
    def __iter__(self):
        yield 1;
        yield 2;
        yield 3;

class No:
    pass

class Nope:
    def __iter__(self):
        return 'nonsense'

assert is_iterable(Yes())
assert is_iterable(range(3))
assert is_iterable((1,2,3))   # tuple
assert is_iterable([1,2,3])   # list
assert is_iterable({1,2,3})   # set
assert is_iterable({1:'one', 2:'two', 3:'three'})   # dictionary
assert is_iterable(numpy.array([1,2,3]))
assert is_iterable(bytearray("not really a string", 'utf-8'))

assert not is_iterable(No())
assert not is_iterable(Nope())
assert not is_iterable("string")
assert not is_iterable(42)
assert not is_iterable(True)
assert not is_iterable(None)

Many other strategies here will say yes to strings. Use them if that’s what you want.

import collections
import numpy

assert isinstance("string", collections.Iterable)
assert isinstance("string", collections.Sequence)
assert numpy.iterable("string")
assert iter("string")
assert hasattr("string", '__getitem__')

Note: is_iterable() will say yes to strings of type bytes and bytearray.

  • bytes objects in Python 3 are iterable True == is_iterable(b"string") == is_iterable("string".encode('utf-8')) There is no such type in Python 2.
  • bytearray objects in Python 2 and 3 are iterable True == is_iterable(bytearray(b"abc"))

The O.P. hasattr(x, '__iter__') approach will say yes to strings in Python 3 and no in Python 2 (no matter whether '' or b'' or u''). Thanks to @LuisMasuelli for noticing it will also let you down on a buggy __iter__.


回答 16

最简单、也最尊重 Python 鸭子类型的方法是捕获错误(Python 非常清楚它期望一个对象怎样才能变成迭代器):

class A(object):
    def __getitem__(self, item):
        return something

class B(object):
    def __iter__(self):
        # Return a compliant iterator. Just an example
        return iter([])

class C(object):
    def __iter__(self):
        # Return crap
        return 1

class D(object): pass

def iterable(obj):
    try:
        iter(obj)
        return True
    except:
        return False

assert iterable(A())
assert iterable(B())
assert not iterable(C())
assert not iterable(D())

注意事项

  1. 如果异常类型相同,那么区分对象是不可迭代,还是实现了一个有 bug 的 __iter__,其实无关紧要:无论哪种情况,你都无法迭代该对象。
  2. 我想我理解你的顾虑:既然我同样可以依赖鸭子类型,在对象没有定义 __call__ 时让它抛出 AttributeError,为什么 callable 还作为一种检查存在,而可迭代性检查却不是这样?

    我不知道答案,但你可以实现我(和其他用户)给出的函数,或者直接在代码中捕获异常(你在那部分的实现会和我写的函数类似——只要确保把创建迭代器的步骤与其余代码隔离开,这样你就能捕获该异常,并把它与其他 TypeError 区分开)。

The easiest way, respecting the Python’s duck typing, is to catch the error (Python knows perfectly what does it expect from an object to become an iterator):

class A(object):
    def __getitem__(self, item):
        return something

class B(object):
    def __iter__(self):
        # Return a compliant iterator. Just an example
        return iter([])

class C(object):
    def __iter__(self):
        # Return crap
        return 1

class D(object): pass

def iterable(obj):
    try:
        iter(obj)
        return True
    except:
        return False

assert iterable(A())
assert iterable(B())
assert not iterable(C())
assert not iterable(D())

Notes:

  1. It is irrelevant the distinction whether the object is not iterable, or a buggy __iter__ has been implemented, if the exception type is the same: anyway you will not be able to iterate the object.
  2. I think I understand your concern: How does callable exists as a check if I could also rely on duck typing to raise an AttributeError if __call__ is not defined for my object, but that’s not the case for iterable checking?

    I don’t know the answer, but you can either implement the function I (and other users) gave, or just catch the exception in your code (your implementation in that part will be like the function I wrote – just ensure you isolate the iterator creation from the rest of the code so you can capture the exception and distinguish it from another TypeError.


回答 17

如果对象可迭代,下面代码中的 isiterable 函数返回 True;如果不可迭代,则返回 False。

def isiterable(object_):
    return hasattr(type(object_), "__iter__")

fruits = ("apple", "banana", "peach")
isiterable(fruits) # returns True

num = 345
isiterable(num) # returns False

isiterable(str) # returns False because str type is type class and it's not iterable.

hello = "hello dude !"
isiterable(hello) # returns True because as you know string objects are iterable

The isiterable func at the following code returns True if object is iterable. if it’s not iterable returns False

def isiterable(object_):
    return hasattr(type(object_), "__iter__")

example

fruits = ("apple", "banana", "peach")
isiterable(fruits) # returns True

num = 345
isiterable(num) # returns False

isiterable(str) # returns False because str type is type class and it's not iterable.

hello = "hello dude !"
isiterable(hello) # returns True because as you know string objects are iterable

回答 18

除了检查 __iter__ 属性之外,您还可以检查 __len__ 属性,它由包括字符串在内的每一个 Python 内置可迭代对象实现。

>>> hasattr(1, "__len__")
False
>>> hasattr(1.3, "__len__")
False
>>> hasattr("a", "__len__")
True
>>> hasattr([1,2,3], "__len__")
True
>>> hasattr({1,2}, "__len__")
True
>>> hasattr({"a":1}, "__len__")
True
>>> hasattr(("a", 1), "__len__")
True

不可迭代的对象出于显而易见的原因不会实现 __len__。不过,这种检查既无法识别那些没有实现 __len__ 的用户自定义可迭代对象,也无法识别生成器表达式,而这些 iter 都能处理。好在这可以在一行内完成,再加一个简单的 or 表达式来检查生成器即可解决此问题。(请注意,写 type(my_generator_expression) == generator 会抛出 NameError。请改为参考那个答案。)

您可以从以下类型使用GeneratorType:

>>> import types
>>> types.GeneratorType
<class 'generator'>
>>> gen = (i for i in range(10))
>>> isinstance(gen, types.GeneratorType)
True

— utdemir接受的答案

(不过这样一来,它很适合用来检查是否可以对该对象调用 len。)

Instead of checking for the __iter__ attribute, you could check for the __len__ attribute, which is implemented by every python builtin iterable, including strings.

>>> hasattr(1, "__len__")
False
>>> hasattr(1.3, "__len__")
False
>>> hasattr("a", "__len__")
True
>>> hasattr([1,2,3], "__len__")
True
>>> hasattr({1,2}, "__len__")
True
>>> hasattr({"a":1}, "__len__")
True
>>> hasattr(("a", 1), "__len__")
True

Non-iterable objects would not implement this for obvious reasons. However, it does not catch user-defined iterables that do not implement it, nor does it catch generator expressions, which iter can deal with. However, this can be done in a line, and adding a simple or expression checking for generators would fix this problem. (Note that writing type(my_generator_expression) == generator would throw a NameError. Refer to this answer instead.)

You can use GeneratorType from types:

>>> import types
>>> types.GeneratorType
<class 'generator'>
>>> gen = (i for i in range(10))
>>> isinstance(gen, types.GeneratorType)
True

— accepted answer by utdemir

(This makes it useful for checking if you can call len on the object though.)
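
Putting the __len__ check and the generator check together, a hedged one-liner (with an illustrative name) could look like this:

import types

def sized_or_generator(obj):
    # __len__ covers the built-in containers and strings;
    # the GeneratorType check adds generator expressions on top
    return hasattr(obj, '__len__') or isinstance(obj, types.GeneratorType)

print(sized_or_generator('abc'))                # True
print(sized_or_generator([1, 2, 3]))            # True
print(sized_or_generator(i for i in range(3)))  # True
print(sized_or_generator(42))                   # False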


回答 19

并不是真正的“正确”,但可以用作最常见类型的快速检查,例如字符串,元组,浮点数等。

>>> '__iter__' in dir('sds')
True
>>> '__iter__' in dir(56)
False
>>> '__iter__' in dir([5,6,9,8])
True
>>> '__iter__' in dir({'jh':'ff'})
True
>>> '__iter__' in dir({'jh'})
True
>>> '__iter__' in dir(56.9865)
False

Not really “correct” but can serve as quick check of most common types like strings, tuples, floats, etc…

>>> '__iter__' in dir('sds')
True
>>> '__iter__' in dir(56)
False
>>> '__iter__' in dir([5,6,9,8])
True
>>> '__iter__' in dir({'jh':'ff'})
True
>>> '__iter__' in dir({'jh'})
True
>>> '__iter__' in dir(56.9865)
False

回答 20

我加入讨论有点晚了,但我问过自己这个问题,然后看到了这个答案。我不知道是否已经有人发过这个。但本质上,我注意到所有可迭代类型的 dict 中都有 __getitem__()。这就是无需尝试(try)就能检查对象是否可迭代的方法。(双关语)

def is_attr(arg):
    return '__getitem__' in dir(arg)

Kinda late to the party but I asked myself this question and saw this then thought of an answer. I don’t know if someone already posted this. But essentially, I’ve noticed that all iterable types have __getitem__() in their dict. This is how you would check if an object was an iterable without even trying. (Pun intended)

def is_attr(arg):
    return '__getitem__' in dir(arg)

如何根据本地目录中的requirements.txt文件使用pip安装软件包?

问题:如何根据本地目录中的requirements.txt文件使用pip安装软件包?

这是问题所在

我有一个require.txt看起来像:

BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
SQLAlchemy==0.7.1
South==0.7.3
amqplib==0.6.1
anyjson==0.3
...

我有一个本地存档目录,其中包含所有软件包和其他软件包。

我创建了一个新的virtualenv

bin/virtualenv testing

激活它后,我尝试根据本地存档目录中的requirements.txt安装软件包。

source bin/activate
pip install -r /path/to/requirements.txt -f file:///path/to/archive/

我得到一些输出,似乎表明安装正常

Downloading/unpacking Fabric==1.2.0 (from -r ../testing/requirements.txt (line 3))
  Running setup.py egg_info for package Fabric
    warning: no previously-included files matching '*' found under directory 'docs/_build'
    warning: no files found matching 'fabfile.py'
Downloading/unpacking South==0.7.3 (from -r ../testing/requirements.txt (line 8))
  Running setup.py egg_info for package South
....

但后来检查发现该软件包均未正确安装。我无法导入软件包,但在virtualenv的site-packages目录中找不到任何软件包。那么出了什么问题?

Here is the problem

I have a requirements.txt that looks like:

BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
SQLAlchemy==0.7.1
South==0.7.3
amqplib==0.6.1
anyjson==0.3
...

I have a local archive directory containing all the packages + others.

I have created a new virtualenv with

bin/virtualenv testing

upon activating it, I tried to install the packages according to requirements.txt from the local archive directory.

source bin/activate
pip install -r /path/to/requirements.txt -f file:///path/to/archive/

I got some output that seems to indicate that the installation is fine

Downloading/unpacking Fabric==1.2.0 (from -r ../testing/requirements.txt (line 3))
  Running setup.py egg_info for package Fabric
    warning: no previously-included files matching '*' found under directory 'docs/_build'
    warning: no files found matching 'fabfile.py'
Downloading/unpacking South==0.7.3 (from -r ../testing/requirements.txt (line 8))
  Running setup.py egg_info for package South
....

But later check revealed none of the package is installed properly. I cannot import the package, and none is found in the site-packages directory of my virtualenv. So what went wrong?


回答 0

这对我有用:

$ pip install -r requirements.txt --no-index --find-links file:///tmp/packages

--no-index-忽略软件包索引(仅查看--find-linksURL)。

-f, --find-links <URL>-如果是URL或html文件的路径,请解析出指向归档文件的链接。如果是file://目录的本地路径或URL,请在目录列表中查找档案。

This works for me:

$ pip install -r requirements.txt --no-index --find-links file:///tmp/packages

--no-index – Ignore package index (only looking at --find-links URLs instead).

-f, --find-links <URL> – If a URL or path to an html file, then parse for links to archives. If a local path or file:// URL that’s a directory, then look for archives in the directory listing.


回答 1

我已经阅读了上面的内容,意识到这是一个古老的问题,但它仍未完全解决,仍然位于我的Google搜索结果的顶部,因此,这是一个适用于所有人的答案:

pip install -r /path/to/requirements.txt

I’ve read the above, realize this is an old question, but it’s totally unresolved and still at the top of my google search results so here’s an answer that works for everyone:

pip install -r /path/to/requirements.txt

回答 2

为了使virtualenv将所有文件安装在requirements.txt文件中。

  1. cd到requirements.txt所在的目录
  2. 激活您的虚拟环境
  3. 运行: pip install -r requirements.txt 在您的外壳中

For virtualenv to install all files in the requirements.txt file.

  1. cd to the directory where requirements.txt is located
  2. activate your virtualenv
  3. run: pip install -r requirements.txt in your shell

回答 3

我有一个类似的问题。我尝试了这个:

pip install -U -r requirements.txt 

(-U =更新(如果已安装))

但是问题仍然存在。我意识到缺少一些通用的开发库。

sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk

我不知道这是否对您有帮助。

I had a similar problem. I tried this:

pip install -U -r requirements.txt 

(-U = update if it had already installed)

But the problem continued. I realized that some of generic libraries for development were missed.

sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk

I don’t know if this would help you.


回答 4

pip install -r requirements.txt

有关更多详细信息,请检查帮助选项。

pip install --help

我们可以找到选项“ -r”

-r, --requirement 从给定的需求文件安装。此选项可以多次使用。

有关一些常用的pip安装选项的更多信息:(这是pip install命令上的帮助选项)

以上是完整的选项集。请使用 pip install --help 获得完整的选项列表。

pip install -r requirements.txt

For further details please check the help option.

pip install --help

We can find the option ‘-r’

-r, --requirement Install from the given requirements file. This option can be used multiple times.

Further information on some commonly used pip install options: (This is the help option on pip install command)

Also the above is the complete set of options. Please use pip install --help for the complete list of options.


回答 5

简短答案

pip install -r /path/to/requirements.txt

或其他形式:

python -m pip install -r /path/to/requirements.txt

说明

在这里,-r 是 --requirement 的缩写,它要求 pip 从给定的 requirements 文件进行安装。

pip 只有在检查完 requirements 文件中所有列出条目的可用性之后才会开始安装;哪怕只有一个 requirement 不可用,它也不会开始安装。

安装可用软件包的一种解决方法是逐一安装列出的软件包。为此使用以下命令。将显示红色警告,以通知您有关不可用的软件包的信息。

cat requirements.txt | xargs -n 1 pip install

要忽略注释(以 # 开头的行)和空白行,请使用:

cat requirements.txt | cut -f1 -d"#" | sed '/^\s*$/d' | xargs -n 1 pip install

Short answer

pip install -r /path/to/requirements.txt

or in another form:

python -m pip install -r /path/to/requirements.txt

Explanation

Here, -r is short form of --requirement and it asks the pip to install from the given requirements file.

pip will start installation only after checking the availability of all listed items in the requirements file and it won’t start installation even if one requirement is unavailable.

One workaround to install the available packages is installing listed packages one by one. Use the following command for that. A red color warning will be shown to notify you about the unavailable packages.

cat requirements.txt | xargs -n 1 pip install

To ignore comments (lines starting with a #) and blank lines, use:

cat requirements.txt | cut -f1 -d"#" | sed '/^\s*$/d' | xargs -n 1 pip install

回答 6

通常,您将需要从本地档案中快速安装,而无需探究PyPI。

首先,下载符合您要求的档案:

$ pip install --download <DIR> -r requirements.txt

然后,使用 --find-links 和 --no-index 进行安装:

$ pip install --no-index --find-links=[file://]<DIR> -r requirements.txt

Often, you will want a fast install from local archives, without probing PyPI.

First, download the archives that fulfill your requirements:

$ pip install --download <DIR> -r requirements.txt

Then, install using --find-links and --no-index:

$ pip install --no-index --find-links=[file://]<DIR> -r requirements.txt

回答 7

我接手过很多被开发人员“照着他们在互联网上找到的指示”折腾坏的系统。你的 pip 和你的 python 查看的路径/site-packages 不一致的情况非常普遍。因此,当我遇到奇怪的问题时,我首先会这样做:

$ python -c 'import sys; print(sys.path)'
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages']

$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)

那是一个快乐的系统

下面是一个不愉快的系统。(或者至少是一个幸福无知的系统,导致其他人感到不高兴。)

$ pip --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

$ python -c 'import sys; print(sys.path)'
['', '/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python27.zip',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/site-packages']

$ which pip pip2 pip3
/usr/local/bin/pip
/usr/local/bin/pip3

它不高兴,因为 pip(属于 python3.6)在使用 /usr/local/lib/python3.6/site-packages,而 python(属于 python2.7)在使用 /usr/local/lib/python2.7/site-packages

当我要确保将要求安装到正确的 python时,请执行以下操作:

$ which -a python python2 python3
/usr/local/bin/python
/usr/bin/python
/usr/local/bin/python2
/usr/local/bin/python3

$ /usr/bin/python -m pip install -r requirements.txt

您听说过,“如果它没有损坏,请不要尝试对其进行修复。” DevOps的版本是“如果您没有破坏它并且可以解决它,请不要尝试对其进行修复。”

I work with a lot of systems that have been mucked by developers “following directions they found on the internet”. It is extremely common that your pip and your python are not looking at the same paths/site-packages. For this reason, when I encounter oddness I start by doing this:

$ python -c 'import sys; print(sys.path)'
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages']

$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)

That is a happy system.

Below is an unhappy system. (Or at least it’s a blissfully ignorant system that causes others to be unhappy.)

$ pip --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

$ python -c 'import sys; print(sys.path)'
['', '/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python27.zip',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old',
'/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/site-packages']

$ which pip pip2 pip3
/usr/local/bin/pip
/usr/local/bin/pip3

It is unhappy because pip is (python3.6 and) using /usr/local/lib/python3.6/site-packages while python is (python2.7 and) using /usr/local/lib/python2.7/site-packages

When I want to make sure I’m installing requirements to the right python, I do this:

$ which -a python python2 python3
/usr/local/bin/python
/usr/bin/python
/usr/local/bin/python2
/usr/local/bin/python3

$ /usr/bin/python -m pip install -r requirements.txt

You’ve heard, “If it ain’t broke, don’t try to fix it.” The DevOps version of that is, “If you didn’t break it and you can work around it, don’t try to fix it.”
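
In the same spirit, a quick sanity check can also be run from inside the interpreter itself (nothing here is project-specific):

import sys

print(sys.executable)   # the interpreter binary actually running this code
print(sys.prefix)       # the environment (or virtualenv) it belongs to
for path in sys.path:   # where imports - and pip-installed packages - are resolved
    print(path)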


回答 8

首先,创建一个虚拟环境

在python 3.6中

virtualenv --python=/usr/bin/python3.6 <path/to/new/virtualenv/>

在python 2.7中

virtualenv --python=/usr/bin/python2.7 <path/to/new/virtualenv/>

然后激活环境并安装require.txt文件中所有可用的软件包。

source <path/to/new/virtualenv>/bin/activate
pip install -r <path/to/requirement.txt>

first of all, create a virtual environment

in python 3.6

virtualenv --python=/usr/bin/python3.6 <path/to/new/virtualenv/>

in python 2.7

virtualenv --python=/usr/bin/python2.7 <path/to/new/virtualenv/>

then activate the environment and install all the packages available in the requirement.txt file.

source <path/to/new/virtualenv>/bin/activate
pip install -r <path/to/requirement.txt>

回答 9

使用python 3在虚拟环境中安装Requirements.txt文件:

我遇到过同样的问题。我试图在虚拟环境中安装requirements.txt文件。我找到了解决方案。

最初,我以这种方式创建了虚拟环境:

virtualenv -p python3 myenv

使用以下方法激活环境:

source myenv/bin/activate

现在,我使用以下命令安装了requirements.txt:

pip3 install -r requirements.txt

安装成功,我能够导入模块。

Installing requirements.txt file inside virtual env with python 3:

I had the same issue. I was trying to install requirements.txt file inside a virtual environament. I found the solution.

Initially, I created my virtual env in this way:

virtualenv -p python3 myenv

Activate the environment using:

source myenv/bin/activate

Now I installed the requirements.txt using:

pip3 install -r requirements.txt

Installation was successful and I was able to import the modules.


回答 10

尝试这个

python -m pip install -r requirements.txt

try this

python -m pip install -r requirements.txt

回答 11

pip install --user -r requirements.txt 

要么

pip3 install --user -r requirements.txt 
pip install --user -r requirements.txt 

OR

pip3 install --user -r requirements.txt 

如何终止Python脚本

问题:如何终止Python脚本

我知道 PHP 中的 die() 命令可以提前退出脚本。

如何在Python中执行此操作?

I am aware of the die() command in PHP which exits a script early.

How can I do this in Python?


回答 0

import sys
sys.exit()

sys模块文档中的详细信息:

sys.exit([arg])

从Python退出。这是通过引发SystemExit异常来实现的,因此可以执行 try语句的finally子句指定的清除操作,并且有可能在外部级别拦截出口尝试。

可选参数arg可以是给出退出状态的整数(默认为零),也可以是其他类型的对象。如果它是整数,则Shell等将零视为“成功终止”,而将任何非零值视为“异常终止”。大多数系统要求它的范围是0-127,否则会产生不确定的结果。某些系统具有为特定的退出代码分配特定含义的约定,但是这些通常不完善。Unix程序通常将2用于命令行语法错误,将1用于所有其他类型的错误。如果传递了另一种类型的对象,则“无”等效于传递零,并且将任何其他对象输出stderr并导致退出代码为1。特别是, sys.exit("some error message") 是发生错误时退出程序的快速方法。

由于 exit() 最终“仅仅”是引发一个异常,因此只有当它在主线程中被调用、且该异常没有被拦截时,进程才会退出。

请注意,这是退出的“不错”方式。下面的@ glyphtwistedmatrix指出,如果您想要“硬出口”,则可以使用os._exit(*errorcode*),尽管它在某种程度上可能是特定于OS的(例如,在Windows下可能不会显示错误代码),并且它肯定不那么友好,因为它在进程终止之前,不允许解释器进行任何清理。

import sys
sys.exit()

details from the sys module documentation:

sys.exit([arg])

Exit from Python. This is implemented by raising the SystemExit exception, so cleanup actions specified by finally clauses of try statements are honored, and it is possible to intercept the exit attempt at an outer level.

The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object. If it is an integer, zero is considered “successful termination” and any nonzero value is considered “abnormal termination” by shells and the like. Most systems require it to be in the range 0-127, and produce undefined results otherwise. Some systems have a convention for assigning specific meanings to specific exit codes, but these are generally underdeveloped; Unix programs generally use 2 for command line syntax errors and 1 for all other kind of errors. If another type of object is passed, None is equivalent to passing zero, and any other object is printed to stderr and results in an exit code of 1. In particular, sys.exit("some error message") is a quick way to exit a program when an error occurs.

Since exit() ultimately “only” raises an exception, it will only exit the process when called from the main thread, and the exception is not intercepted.

Note that this is the ‘nice’ way to exit. @glyphtwistedmatrix below points out that if you want a ‘hard exit’, you can use os._exit(*errorcode*), though it’s likely os-specific to some extent (it might not take an errorcode under windows, for example), and it definitely is less friendly since it doesn’t let the interpreter do any cleanup before the process dies.
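
To make the "it only raises an exception" point concrete, here is a small sketch (not from the original answer) showing that a finally clause still runs and that SystemExit can be intercepted at an outer level:

import sys

def work():
    try:
        sys.exit(1)                        # raises SystemExit(1)
    finally:
        print('finally clause still runs')

try:
    work()
except SystemExit as exc:
    print('intercepted SystemExit, code =', exc.code)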


回答 1

一种提前终止Python脚本的简单方法是使用内置quit()函数。无需导入任何库,它既高效又简单。

例:

#do stuff
if this == that:
  quit()

A simple way to terminate a Python script early is to use the built-in quit() function. There is no need to import any library, and it is efficient and simple.

Example:

#do stuff
if this == that:
  quit()

回答 2

另一种方法是:

raise SystemExit

Another way is:

raise SystemExit

回答 3

您也可以简单地使用exit()

请记住,sys.exit()、exit()、quit() 和 os._exit(0) 都会终止 Python 解释器。因此,如果它出现在一个由另一个脚本通过 execfile() 调用的脚本中,它会同时停止两个脚本的执行。

请参阅“ 停止执行用execfile调用的脚本 ”来避免这种情况。

You can also use simply exit().

Keep in mind that sys.exit(), exit(), quit(), and os._exit(0) kill the Python interpreter. Therefore, if it appears in a script called from another script by execfile(), it stops execution of both scripts.

See “Stop execution of a script called with execfile” to avoid this.


回答 4

尽管通常你应该优先选择 sys.exit,因为它对其他代码更“友好”,但它实际做的只是引发一个异常。

如果你确定需要立即退出进程,并且可能处于某个会捕获 SystemExit 的异常处理器之中,还有另一个函数 os._exit——它在 C 层面立即终止进程,不执行解释器的任何常规清理;例如,在 atexit 模块中注册的钩子不会被执行。

While you should generally prefer sys.exit because it is more “friendly” to other code, all it actually does is raise an exception.

If you are sure that you need to exit a process immediately, and you might be inside of some exception handler which would catch SystemExit, there is another function – os._exit – which terminates immediately at the C level and does not perform any of the normal tear-down of the interpreter; for example, hooks registered with the “atexit” module are not executed.
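
A minimal sketch of that difference (the messages and the toggle are illustrative): with sys.exit the atexit hook fires, with os._exit it does not:

import atexit
import os
import sys

atexit.register(lambda: print('atexit hook ran'))

USE_HARD_EXIT = False   # flip to True to see the difference

if USE_HARD_EXIT:
    os._exit(1)   # immediate C-level exit: the atexit hook above is skipped
else:
    sys.exit(0)   # raises SystemExit: the hook runs during normal interpreter shutdown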


回答 5

我刚刚发现,在编写多线程应用程序时,raise SystemExit 和 sys.exit() 都只会杀死正在运行的线程,而 os._exit() 会退出整个进程。这个问题在“为什么在 Python 的线程内调用 sys.exit() 不会退出?”中有过讨论。

下面的示例有 2 个线程:肯尼(Kenny)和卡特曼(Cartman)。卡特曼本应永远活着,但肯尼是被递归调用的,应该在 3 秒后死亡。(递归调用不是最好的方式,但我有其他原因)

如果我们希望肯尼死时卡特曼也一起死,肯尼就应该用 os._exit 退出;否则,只有肯尼会死,而卡特曼将永远活下去。

import threading
import time
import sys
import os

def kenny(num=0):
    if num > 3:
        # print("Kenny dies now...")
        # raise SystemExit #Kenny will die, but Cartman will live forever
        # sys.exit(1) #Same as above

        print("Kenny dies and also kills Cartman!")
        os._exit(1)
    while True:
        print("Kenny lives: {0}".format(num))
        time.sleep(1)
        num += 1
        kenny(num)

def cartman():
    i = 0
    while True:
        print("Cartman lives: {0}".format(i))
        i += 1
        time.sleep(1)

if __name__ == '__main__':
    daemon_kenny = threading.Thread(name='kenny', target=kenny)
    daemon_cartman = threading.Thread(name='cartman', target=cartman)
    daemon_kenny.setDaemon(True)
    daemon_cartman.setDaemon(True)

    daemon_kenny.start()
    daemon_cartman.start()
    daemon_kenny.join()
    daemon_cartman.join()

I’ve just found out that when writing a multithreaded app, raise SystemExit and sys.exit() both kill only the running thread. On the other hand, os._exit() exits the whole process. This was discussed in “Why does sys.exit() not exit when called inside a thread in Python?“.

The example below has 2 threads. Kenny and Cartman. Cartman is supposed to live forever, but Kenny is called recursively and should die after 3 seconds. (recursive calling is not the best way, but I had other reasons)

If we also want Cartman to die when Kenny dies, Kenny should go away with os._exit, otherwise, only Kenny will die and Cartman will live forever.

import threading
import time
import sys
import os

def kenny(num=0):
    if num > 3:
        # print("Kenny dies now...")
        # raise SystemExit #Kenny will die, but Cartman will live forever
        # sys.exit(1) #Same as above

        print("Kenny dies and also kills Cartman!")
        os._exit(1)
    while True:
        print("Kenny lives: {0}".format(num))
        time.sleep(1)
        num += 1
        kenny(num)

def cartman():
    i = 0
    while True:
        print("Cartman lives: {0}".format(i))
        i += 1
        time.sleep(1)

if __name__ == '__main__':
    daemon_kenny = threading.Thread(name='kenny', target=kenny)
    daemon_cartman = threading.Thread(name='cartman', target=cartman)
    daemon_kenny.setDaemon(True)
    daemon_cartman.setDaemon(True)

    daemon_kenny.start()
    daemon_cartman.start()
    daemon_kenny.join()
    daemon_cartman.join()

回答 6

from sys import exit
exit()

作为参数,您可以传递退出代码,该退出代码将返回给OS。默认值为0。

from sys import exit
exit()

As a parameter you can pass an exit code, which will be returned to OS. Default is 0.


回答 7

我完全是个新手,但可以肯定,这样写更干净、更可控

def main():
    try:
        Answer = 1/0
        print  Answer
    except:
        print 'Program terminated'
        return
    print 'You wont see this'

if __name__ == '__main__': 
    main()

程序终止

比起:

import sys
def main():
    try:
        Answer = 1/0
        print  Answer
    except:
        print 'Program terminated'
        sys.exit()
    print 'You wont see this'

if __name__ == '__main__': 
    main()

Program terminated
Traceback (most recent call last):
  File "Z:\Directory\testdieprogram.py", line 12, in <module>
    main()
  File "Z:\Directory\testdieprogram.py", line 8, in main
    sys.exit()
SystemExit

编辑

关键在于,程序是平稳、安静地结束,而不是喊着“我已经停止!!!!”

I’m a total novice but surely this is cleaner and more controlled

def main():
    try:
        Answer = 1/0
        print  Answer
    except:
        print 'Program terminated'
        return
    print 'You wont see this'

if __name__ == '__main__': 
    main()

Program terminated

than

import sys
def main():
    try:
        Answer = 1/0
        print  Answer
    except:
        print 'Program terminated'
        sys.exit()
    print 'You wont see this'

if __name__ == '__main__': 
    main()

Program terminated
Traceback (most recent call last):
  File "Z:\Directory\testdieprogram.py", line 12, in <module>
    main()
  File "Z:\Directory\testdieprogram.py", line 8, in main
    sys.exit()
SystemExit

Edit

The point being that the program ends smoothly and peacefully, rather than “I’VE STOPPED !!!!”


回答 8

在 Python 3.5 中,我尝试在只用内置功能、不借助其他模块(例如 sys、Biopy)的情况下,编写类似的代码来停止脚本并向用户打印错误消息。这是我的示例:

## My example:
if "ATG" in my_DNA: 
    ## <Do something & proceed...>
else: 
    print("Start codon is missing! Check your DNA sequence!")
    exit() ## as most folks said above

后来,我发现抛出一个错误更为简洁:

## My example revised:
if "ATG" in my_DNA: 
    ## <Do something & proceed...>
else: 
    raise ValueError("Start codon is missing! Check your DNA sequence!")

In Python 3.5, I tried to incorporate similar code without use of modules (e.g. sys, Biopy) other than what’s built-in to stop the script and print an error message to my users. Here’s my example:

## My example:
if "ATG" in my_DNA: 
    ## <Do something & proceed...>
else: 
    print("Start codon is missing! Check your DNA sequence!")
    exit() ## as most folks said above

Later on, I found it is more succinct to just throw an error:

## My example revised:
if "ATG" in my_DNA: 
    ## <Do something & proceed...>
else: 
    raise ValueError("Start codon is missing! Check your DNA sequence!")

回答 9

我的两分钱。

Python 3.8.1,Windows 10、64位。

sys.exit() 无法直接为我工作。

我有好几层嵌套循环。

首先,我声明一个布尔变量,称为immediateExit

因此,在程序代码的开头,我写了:

immediateExit = False

然后,从最内部的(嵌套的)循环异常开始,我写:

            immediateExit = True
            sys.exit('CSV file corrupted 0.')

然后,我进入外循环的直接延续,在代码执行任何其他操作之前,我写了:

    if immediateExit:
        sys.exit('CSV file corrupted 1.')

根据复杂程度,有时上述语句还需要在 except 部分等处重复。

    if immediateExit:
        sys.exit('CSV file corrupted 1.5.')

自定义消息是为了我个人调试用的,数字也是出于同一目的——看看脚本究竟是从哪里退出的。

'CSV file corrupted 1.5.'

在我的特殊情况下,我正在处理一个CSV文件,如果该软件检测到它已损坏,则我不希望该软件接触它。因此对我来说,在检测到可能的损坏后立即退出整个Python脚本非常重要。

通过在所有循环中逐级地调用 sys.exit,我做到了这一点。

完整代码:(需要进行一些更改,因为它是内部任务的专有代码):

immediateExit = False
start_date = '1994.01.01'
end_date = '1994.01.04'
resumedDate = end_date


end_date_in_working_days = False
while not end_date_in_working_days:
    try:
        end_day_position = working_days.index(end_date)

        end_date_in_working_days = True
    except ValueError: # try statement from end_date in workdays check
        print(current_date_and_time())
        end_date = input('>> {} is not in the list of working days. Change the date (YYYY.MM.DD): '.format(end_date))
        print('New end date: ', end_date, '\n')
        continue


    csv_filename = 'test.csv'
    csv_headers = 'date,rate,brand\n' # not real headers, this is just for example
    try:
        with open(csv_filename, 'r') as file:
            print('***\nOld file {} found. Resuming the file by re-processing the last date lines.\nThey shall be deleted and re-processed.\n***\n'.format(csv_filename))
            last_line = file.readlines()[-1]
            start_date = last_line.split(',')[0] # assigning the start date to be the last like date.
            resumedDate = start_date

            if last_line == csv_headers:
                pass
            elif start_date not in working_days:
                print('***\n\n{} file might be corrupted. Erase or edit the file to continue.\n***'.format(csv_filename))
                immediateExit = True
                sys.exit('CSV file corrupted 0.')
            else:
                start_date = last_line.split(',')[0] # assigning the start date to be the last like date.
                print('\nLast date:', start_date)
                file.seek(0) # setting the cursor at the beginning of the file
                lines = file.readlines() # reading the file contents into a list
                count = 0 # nr. of lines with last date
                for line in lines: #cycling through the lines of the file
                    if line.split(',')[0] == start_date: # cycle for counting the lines with last date in it.
                        count = count + 1
        if immediateExit:
            sys.exit('CSV file corrupted 1.')
        for iter in range(count): # removing the lines with last date
            lines.pop()
        print('\n{} lines removed from date: {} in {} file'.format(count, start_date, csv_filename))



        if immediateExit:
            sys.exit('CSV file corrupted 1.2.')
        with open(csv_filename, 'w') as file:
            print('\nFile', csv_filename, 'open for writing')
            file.writelines(lines)

            print('\nRemoving', count, 'lines from', csv_filename)

        fileExists = True

    except:
        if immediateExit:
            sys.exit('CSV file corrupted 1.5.')
        with open(csv_filename, 'w') as file:
            file.write(csv_headers)
            fileExists = False
    if immediateExit:
        sys.exit('CSV file corrupted 2.')

My two cents.

Python 3.8.1, Windows 10, 64-bit.

sys.exit() does not work directly for me.

I have several nested loops.

First I declare a boolean variable, which I call immediateExit.

So, in the beginning of the program code I write:

immediateExit = False

Then, starting from the most inner (nested) loop exception, I write:

            immediateExit = True
            sys.exit('CSV file corrupted 0.')

Then I go into the immediate continuation of the outer loop, and before anything else being executed by the code, I write:

    if immediateExit:
        sys.exit('CSV file corrupted 1.')

Depending on the complexity, sometimes the above statement needs to be repeated also in except sections, etc.

    if immediateExit:
        sys.exit('CSV file corrupted 1.5.')

The custom message is for my personal debugging, as well, as the numbers are for the same purpose – to see where the script really exits.

'CSV file corrupted 1.5.'

In my particular case I am processing a CSV file, which I do not want the software to touch, if the software detects it is corrupted. Therefore for me it is very important to exit the whole Python script immediately after detecting the possible corruption.

And following the gradual sys.exit-ing from all the loops I manage to do it.

Full code: (some changes were needed because it is proprietory code for internal tasks):

immediateExit = False
start_date = '1994.01.01'
end_date = '1994.01.04'
resumedDate = end_date


end_date_in_working_days = False
while not end_date_in_working_days:
    try:
        end_day_position = working_days.index(end_date)

        end_date_in_working_days = True
    except ValueError: # try statement from end_date in workdays check
        print(current_date_and_time())
        end_date = input('>> {} is not in the list of working days. Change the date (YYYY.MM.DD): '.format(end_date))
        print('New end date: ', end_date, '\n')
        continue


    csv_filename = 'test.csv'
    csv_headers = 'date,rate,brand\n' # not real headers, this is just for example
    try:
        with open(csv_filename, 'r') as file:
            print('***\nOld file {} found. Resuming the file by re-processing the last date lines.\nThey shall be deleted and re-processed.\n***\n'.format(csv_filename))
            last_line = file.readlines()[-1]
            start_date = last_line.split(',')[0] # assigning the start date to be the last like date.
            resumedDate = start_date

            if last_line == csv_headers:
                pass
            elif start_date not in working_days:
                print('***\n\n{} file might be corrupted. Erase or edit the file to continue.\n***'.format(csv_filename))
                immediateExit = True
                sys.exit('CSV file corrupted 0.')
            else:
                start_date = last_line.split(',')[0] # assigning the start date to be the last like date.
                print('\nLast date:', start_date)
                file.seek(0) # setting the cursor at the beginning of the file
                lines = file.readlines() # reading the file contents into a list
                count = 0 # nr. of lines with last date
                for line in lines: #cycling through the lines of the file
                    if line.split(',')[0] == start_date: # cycle for counting the lines with last date in it.
                        count = count + 1
        if immediateExit:
            sys.exit('CSV file corrupted 1.')
        for iter in range(count): # removing the lines with last date
            lines.pop()
        print('\n{} lines removed from date: {} in {} file'.format(count, start_date, csv_filename))



        if immediateExit:
            sys.exit('CSV file corrupted 1.2.')
        with open(csv_filename, 'w') as file:
            print('\nFile', csv_filename, 'open for writing')
            file.writelines(lines)

            print('\nRemoving', count, 'lines from', csv_filename)

        fileExists = True

    except:
        if immediateExit:
            sys.exit('CSV file corrupted 1.5.')
        with open(csv_filename, 'w') as file:
            file.write(csv_headers)
            fileExists = False
    if immediateExit:
        sys.exit('CSV file corrupted 2.')


如何在Python中进行换行(换行)?

问题:如何在Python中进行换行(换行)?

我有一长行代码,我想在多行中分解。我使用什么,语法是什么?

例如,添加一串字符串,

e = 'a' + 'b' + 'c' + 'd'

并分成两行,如下所示:

e = 'a' + 'b' +
    'c' + 'd'

I have a long line of code that I want to break up among multiple lines. What do I use and what is the syntax?

For example, adding a bunch of strings,

e = 'a' + 'b' + 'c' + 'd'

and have it in two lines like this:

e = 'a' + 'b' +
    'c' + 'd'

回答 0

是什么样的一行?你完全可以把参数放到下一行,不会有任何问题:

a = dostuff(blahblah1, blahblah2, blahblah3, blahblah4, blahblah5, 
            blahblah6, blahblah7)

否则,您可以执行以下操作:

if a == True and \
   b == False

查看样式指南以获取更多信息。

从示例行中:

a = '1' + '2' + '3' + \
    '4' + '5'

要么:

a = ('1' + '2' + '3' +
    '4' + '5')

请注意,样式指南指出,最好使用带括号的隐式连续符,但是在这种特殊情况下,仅在表达式周围加上括号可能是错误的方法。

What is the line? You can just have arguments on the next line without any problems:

a = dostuff(blahblah1, blahblah2, blahblah3, blahblah4, blahblah5, 
            blahblah6, blahblah7)

Otherwise you can do something like this:

if a == True and \
   b == False

Check the style guide for more information.

From your example line:

a = '1' + '2' + '3' + \
    '4' + '5'

Or:

a = ('1' + '2' + '3' +
    '4' + '5')

Note that the style guide says that using the implicit continuation with parentheses is preferred, but in this particular case just adding parentheses around your expression is probably the wrong way to go.


回答 1

PEP 8-Python代码样式指南

包装长行的首选方法是在括号,方括号和花括号内使用Python的隐含行连续性。通过将表达式包装在括号中,可以将长行分成多行。应优先使用这些,而不是使用反斜杠进行行连续。

有时反斜杠可能仍然合适。例如,长的多个with语句不能使用隐式连续,因此可以使用反斜杠:

with open('/path/to/some/file/you/want/to/read') as file_1, \
        open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

另一种此类情况是使用assert语句。

确保续行有适当的缩进。在二元运算符附近换行时,首选位置是在运算符之后,而不是之前。一些例子:

class Rectangle(Blob):

    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        if (width == 0 and height == 0 and
                color == 'red' and emphasis == 'strong' or
                highlight > 100):
            raise ValueError("sorry, you lose")
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so -- values are %s, %s" %
                             (width, height))
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)

现在,PEP8 推荐采用数学家及其出版商所用的相反约定(即在二元运算符之前换行),以提高可读性。

唐纳德·克努斯(Donald Knuth)在二元运算符之前换行的风格能让运算符垂直对齐,从而在判断哪些项相加、哪些项相减时减轻眼睛的负担。

PEP8:换行符应该在二进制运算符之前还是之后?

唐纳德·克努斯(Donald Knuth)在他的《计算机和排版》系列中解释了传统规则:“尽管段落中的公式总是在二进制运算和关系之后中断,但显示的公式总是在二进制运算和关系之前中断” [3]。

遵循数学的传统通常会导致代码更具可读性:

# Yes: easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

在Python代码中,只要约定在本地是一致的,就可以在二进制运算符之前或之后中断。对于新代码,建议使用Knuth的样式。

[3]:Donald Knuth的The TeXBook,第195和196页

From PEP 8 — Style Guide for Python Code:

The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:

with open('/path/to/some/file/you/want/to/read') as file_1, \
        open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

Another such case is with assert statements.

Make sure to indent the continued line appropriately. The preferred place to break around a binary operator is after the operator, not before it. Some examples:

class Rectangle(Blob):

    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        if (width == 0 and height == 0 and
                color == 'red' and emphasis == 'strong' or
                highlight > 100):
            raise ValueError("sorry, you lose")
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so -- values are %s, %s" %
                             (width, height))
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)

PEP8 now recommends the opposite convention (for breaking at binary operations) used by mathematicians and their publishers to improve readability.

Donald Knuth’s style of breaking before a binary operator aligns operators vertically, thus reducing the eye’s workload when determining which items are added and subtracted.

From PEP8: Should a line break before or after a binary operator?:

Donald Knuth explains the traditional rule in his Computers and Typesetting series: “Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations”[3].

Following the tradition from mathematics usually results in more readable code:

# Yes: easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth’s style is suggested.

[3]: Donald Knuth’s The TeXBook, pages 195 and 196
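
For contrast, the same sum broken after each operator (the older convention that PEP 8 now discourages for new code) reads like this; it mirrors the "No" example given in PEP 8:

# No: operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)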


回答 2

使用反斜杠结束行的危险在于,如果在反斜杠之后添加空格(当然很难看到),则反斜杠将不再执行您原本的想法。

有关更多信息,请参见Python习语和反习语(对于Python 2Python 3)。

The danger in using a backslash to end a line is that if whitespace is added after the backslash (which, of course, is very hard to see), the backslash is no longer doing what you thought it was.

See Python Idioms and Anti-Idioms (for Python 2 or Python 3) for more.


回答 3

在行末放置一个 \,或将语句括在圆括号 ( .. ) 中。来自 IBM:

b = ((i1 < 20) and
     (i2 < 30) and
     (i3 < 40))

要么

b = (i1 < 20) and \
    (i2 < 30) and \
    (i3 < 40)

Put a \ at the end of your line or enclose the statement in parens ( .. ). From IBM:

b = ((i1 < 20) and
     (i2 < 30) and
     (i3 < 40))

or

b = (i1 < 20) and \
    (i2 < 30) and \
    (i3 < 40)

回答 4

您可以在圆括号和花括号内换行。此外,您也可以在行末附加反斜杠字符 \ 来显式断行:

x = (tuples_first_value,
     second_value)
y = 1 + \
    2

You can break lines in between parenthesises and braces. Additionally, you can append the backslash character \ to a line to explicitly break it:

x = (tuples_first_value,
     second_value)
y = 1 + \
    2

回答 5

出自第一手资料(官方文档):显式行连接

可以使用反斜杠字符(\)将两条或更多条物理行连接为逻辑行,如下所示:当一条物理行以不属于字符串文字或注释的反斜杠结尾时,则将其与以下行合并成一条逻辑行,删除反斜杠和以下换行符。例如:

if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1

以反斜杠结尾的行不能带有注释。反斜杠不能延续注释。除字符串字面量外,反斜杠也不能延续一个标记(token)(即,字符串字面量之外的标记不能用反斜杠跨物理行拆分)。在一行中,字符串字面量之外的其他位置出现反斜杠都是非法的。

From the horse’s mouth: Explicit line joining

Two or more physical lines may be joined into logical lines using backslash characters (\), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following forming a single logical line, deleting the backslash and the following end-of-line character. For example:

if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1

A line ending in a backslash cannot carry a comment. A backslash does not continue a comment. A backslash does not continue a token except for string literals (i.e., tokens other than string literals cannot be split across physical lines using a backslash). A backslash is illegal elsewhere on a line outside a string literal.


回答 6

这可能不是Python的方式,但是我通常使用带有join函数的列表来编写长字符串,例如SQL查询:

query = " ".join([
    'SELECT * FROM "TableName"',
    'WHERE "SomeColumn1"=VALUE',
    'ORDER BY "SomeColumn2"',
    'LIMIT 5;'
])

It may not be the Pythonic way, but I generally use a list with the join function for writing a long string, like SQL queries:

query = " ".join([
    'SELECT * FROM "TableName"',
    'WHERE "SomeColumn1"=VALUE',
    'ORDER BY "SomeColumn2"',
    'LIMIT 5;'
])

回答 7

摘自《 The Hitchhiker’s Guide to Python(Line Continuation)》:

当一个逻辑代码行超过可接受的长度限制时,您需要将其拆分为多条物理行。如果该行的最后一个字符是反斜杠,Python 解释器会把连续的行连接起来。这在某些情况下很有用,但由于其脆弱性,通常应避免使用:在反斜杠之后的行尾添加一个空格就会破坏代码,并可能产生意外结果。

更好的解决方案是在元素周围使用括号。在行尾留下未封闭的括号的情况下,Python解释器将加入下一行,直到括号被封闭为止。花括号和方括号的行为相同。

但是,通常情况下,必须分开一条较长的逻辑线表明您正在尝试同时执行太多操作,这可能会影响可读性。

话虽如此,下面是一个涉及多个导入的示例(超出 PEP-8 定义的行长限制时),同样的做法也普遍适用于字符串:

from app import (
    app, abort, make_response, redirect, render_template, request, session
)

Taken from The Hitchhiker’s Guide to Python (Line Continuation):

When a logical line of code is longer than the accepted limit, you need to split it over multiple physical lines. The Python interpreter will join consecutive lines if the last character of the line is a backslash. This is helpful in some cases, but should usually be avoided because of its fragility: a white space added to the end of the line, after the backslash, will break the code and may have unexpected results.

A better solution is to use parentheses around your elements. Left with an unclosed parenthesis on an end-of-line the Python interpreter will join the next line until the parentheses are closed. The same behaviour holds for curly and square braces.

However, more often than not, having to split a long logical line is a sign that you are trying to do too many things at the same time, which may hinder readability.

Having that said, here’s an example considering multiple imports (when exceeding line limits, defined on PEP-8), also applied to strings in general:

from app import (
    app, abort, make_response, redirect, render_template, request, session
)

回答 8

如果由于长字符串而要中断行,可以将该字符串分成几部分:

long_string = "a very long string"
print("a very long string")

将被替换

long_string = (
  "a "
  "very "
  "long "
  "string"
)
print(
  "a "
  "very "
  "long "
  "string"
)

两个打印语句的输出:

a very long string

注意赋值语句中的括号。

还要注意,把字面量字符串拆成几部分后,可以只在其中一部分上使用字面量前缀:

s = (
  "2+2="
  f"{2+2}"
)

If you want to break your line because of a long literal string, you can break that string into pieces:

long_string = "a very long string"
print("a very long string")

will be replaced by

long_string = (
  "a "
  "very "
  "long "
  "string"
)
print(
  "a "
  "very "
  "long "
  "string"
)

Output for both print statements:

a very long string

Notice the parenthesis in the affectation.

Notice also that breaking literal strings into pieces allows to use the literal prefix only on parts of the string:

s = (
  "2+2="
  f"{2+2}"
)

回答 9

使用行继续运算符,即 "\"

例子:

# Ex.1

x = 1
s =  x + x**2/2 + x**3/3 \
       + x**4/4 + x**5/5 \
       + x**6/6 + x**7/7 \
       + x**8/8
print(s)
# 2.7178571428571425


----------


# Ex.2

text = ('Put several strings within parentheses ' \
        'to have them joined together.')
print(text)


----------


# Ex.3

x = 1
s =  x + x**2/2 \
       + x**3/3 \
       + x**4/4 \
       + x**6/6 \
       + x**8/8
print(s)
# 2.3749999999999996

Use the line continuation operator i.e. “\”

Examples:

# Ex.1

x = 1
s =  x + x**2/2 + x**3/3 \
       + x**4/4 + x**5/5 \
       + x**6/6 + x**7/7 \
       + x**8/8
print(s)
# 2.7178571428571425


----------


# Ex.2

text = ('Put several strings within parentheses ' \
        'to have them joined together.')
print(text)


----------


# Ex.3

x = 1
s =  x + x**2/2 \
       + x**3/3 \
       + x**4/4 \
       + x**6/6 \
       + x**8/8
print(s)
# 2.3749999999999996

如何修剪空白?

问题:如何修剪空白?

是否有Python函数可以从字符串中修剪空格(空格和制表符)?

例如:\t example string\t → example string

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

Example: \t example string\t → example string


回答 0

两侧的空格:

s = "  \t a string example\t  "
s = s.strip()

右侧的空格:

s = s.rstrip()

左侧的空白:

s = s.lstrip()

正如 thedz 指出的,您可以给上述任意一个函数传入参数,以剥离任意指定的字符,如下所示:

s = s.strip(' \t\n\r')

这会从字符串的左侧、右侧或两侧去除任何空格、\t、\n 或 \r 字符。

上面的示例只会删除字符串左右两端的字符。如果还想删除字符串中间的字符,请尝试 re.sub:

import re
print re.sub(r'\s+', '', s)

那应该打印出来:

astringexample

Whitespace on both sides:

s = "  \t a string example\t  "
s = s.strip()

Whitespace on the right side:

s = s.rstrip()

Whitespace on the left side:

s = s.lstrip()

As thedz points out, you can provide an argument to strip arbitrary characters to any of these functions like this:

s = s.strip(' \t\n\r')

This will strip any space, \t, \n, or \r characters from the left-hand side, right-hand side, or both sides of the string.

The examples above only remove strings from the left-hand and right-hand sides of strings. If you want to also remove characters from the middle of a string, try re.sub:

import re
print re.sub(r'\s+', '', s)

That should print out:

astringexample

回答 1

Python trim方法称为strip

str.strip() #trim
str.lstrip() #ltrim
str.rstrip() #rtrim

Python trim method is called strip:

str.strip() #trim
str.lstrip() #ltrim
str.rstrip() #rtrim
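
A small usage sketch of the three methods on a concrete string:

s = "  \t a string example\t  "
print(repr(s.strip()))   # 'a string example'
print(repr(s.lstrip()))  # 'a string example\t  '
print(repr(s.rstrip()))  # '  \t a string example'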

回答 2

对于前导和尾随空格:

s = '   foo    \t   '
print s.strip() # prints "foo"

否则,一个正则表达式将起作用:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print pat.sub('', s) # prints "foobar"

For leading and trailing whitespace:

s = '   foo    \t   '
print s.strip() # prints "foo"

Otherwise, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print pat.sub('', s) # prints "foobar"

回答 3

您还可以使用非常简单且基本的功能:str.replace(),用于空白和制表符:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "        abcde       fgh        ijkl"

>>> print whitespaces.replace(" ", "")
abcdefghijkl
>>> print tabs.replace(" ", "")
abcdefghijkl

简单容易。

You can also use very simple, and basic function: str.replace(), works with the whitespaces and tabs:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "        abcde       fgh        ijkl"

>>> print whitespaces.replace(" ", "")
abcdefghijkl
>>> print tabs.replace(" ", "")
abcdefghijkl

Simple and easy.


回答 4

#how to trim a multi line string or a file

s=""" line one
\tline two\t
line three """

#line1 starts with a space, #2 starts and ends with a tab, #3 ends with a space.

s1=s.splitlines()
print s1
[' line one', '\tline two\t', 'line three ']

print [i.strip() for i in s1]
['line one', 'line two', 'line three']




#more details:

#we could also have used a forloop from the begining:
for line in s.splitlines():
    line=line.strip()
    process(line)

#we could also be reading a file line by line.. e.g. my_file=open(filename), or with open(filename) as myfile:
for line in my_file:
    line=line.strip()
    process(line)

#moot point: note splitlines() removed the newline characters, we can keep them by passing True:
#although split() will then remove them anyway..
s2=s.splitlines(True)
print s2
[' line one\n', '\tline two\t\n', 'line three ']
#how to trim a multi line string or a file

s=""" line one
\tline two\t
line three """

#line1 starts with a space, #2 starts and ends with a tab, #3 ends with a space.

s1=s.splitlines()
print s1
[' line one', '\tline two\t', 'line three ']

print [i.strip() for i in s1]
['line one', 'line two', 'line three']




#more details:

#we could also have used a forloop from the begining:
for line in s.splitlines():
    line=line.strip()
    process(line)

#we could also be reading a file line by line.. e.g. my_file=open(filename), or with open(filename) as myfile:
for line in my_file:
    line=line.strip()
    process(line)

#moot point: note splitlines() removed the newline characters, we can keep them by passing True:
#although split() will then remove them anyway..
s2=s.splitlines(True)
print s2
[' line one\n', '\tline two\t\n', 'line three ']

回答 5

尚无人发布这些正则表达式解决方案。

匹配:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print m.group(1)
None

搜索(对于“只有空格”的输入情况,您必须另行处理):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

如果使用re.sub,则可以删除内部空格,这可能是不希望的。

No one has posted these regex solutions yet.

Matching:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print m.group(1)
None

Searching (you have to handle the “only spaces” input case differently):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove inner whitespace, which could be undesirable.


回答 6

空白字符包括空格、制表符和 CRLF。因此,我们可以使用一个优雅的单行字符串函数:translate。

' hello apple'.translate(None, ' \n\t\r')

或者,如果您想彻底

import string
' hello  apple'.translate(None, string.whitespace)

Whitespace includes space, tabs and CRLF. So an elegant and one-liner string function we can use is translate.

' hello apple'.translate(None, ' \n\t\r')

OR if you want to be thorough

import string
' hello  apple'.translate(None, string.whitespace)
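
Note that the two-argument form of translate shown above only works on Python 2 str objects; on Python 3 it raises a TypeError. In Python 3 you would build a translation table with str.maketrans instead (a small sketch):

import string

# Python 3: map every whitespace character to None, i.e. delete it
table = str.maketrans('', '', string.whitespace)
print(' hello  apple'.translate(table))  # helloapple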

回答 7

(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()

这将删除所有不需要的空格和换行符。希望有帮助

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

这将导致:

'   a     b \n c   ' 将更改为 'a b c'

(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()

This will remove all the unwanted spaces and newline characters. Hope this help

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

This will result :

‘ a      b \n c ‘ will be changed to ‘a b c’


回答 8

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

输出:

please_remove_all_whitespaces


根据 Le Droid 的评论补充:若要以空格分隔:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

输出:

please remove all extra whitespaces

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

output:

please_remove_all_whitespaces


Adding Le Droid’s comment to the answer. To separate with a space:
    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

output:

please remove all extra whitespaces


回答 9

如果使用 Python 3:在 print 语句的末尾加上 sep="",这样打印时各项之间就不会再插入空格。

例:

txt="potatoes"
print("I love ",txt,"",sep="")

这将打印:I love potatoes.

而不是:I love potatoes .

在您的情况下,由于您想去掉的是 \t,因此请使用 sep="\t"。

If using Python 3: In your print statement, finish with sep=””. That will separate out all of the spaces.

EXAMPLE:

txt="potatoes"
print("I love ",txt,"",sep="")

This will print: I love potatoes.

Instead of: I love potatoes .

In your case, since you would be trying to get rid of the \t, do sep="\t"


回答 10

在以不同的理解程度查看了这里的许多解决方案之后,我想知道如果字符串用逗号分隔该怎么办…

问题

在尝试处理联系人信息的csv时,我需要一个解决此问题的方法:修剪多余的空格和一些垃圾,但保留尾随逗号和内部空格。我要处理包含联系人注释的字段,所以我想删除垃圾,留下好东西。删除所有标点符号和谷壳后,我不想失去复合令牌之间的空白,因为我不想以后再构建。

正则表达式和模式: [\s_]+?\W+

该模式用 [\s_]+? 以懒惰方式(尽可能少的字符)匹配出现一次或多次的任意空白字符或下划线('_'),且它们出现在一个或多个非单词字符 \W+(等价于 [^a-zA-Z0-9_])之前。具体来说,这会找到成片的空白:空字符(\0)、制表符(\t)、换行符(\n)、换页符(\f)、回车符(\r)。

我认为这样做有两个好处:

  1. 它不会删除您可能希望保持在一起的完整单词/标记之间的空格;

  2. Python 内置的字符串方法 strip() 不会处理字符串内部,只处理左右两端,其默认参数是空字符(见下面的示例:文本中包含多个换行符,strip() 不会把它们全部删除,而该正则模式可以)。text.strip(' \n\t\r')

这超出了OP的问题,但我认为在很多情况下,像我一样,文本数据中可能会有奇怪的病理性实例(某些转义字符最终出现在某些文本中)。此外,在类似列表的字符串中,除非分隔符将两个空格字符或某些非单词字符分开,例如’-,’或’-、、、’,否则我们不希望删除分隔符。

注意:不是在谈论CSV本身的分隔符。仅在CSV内数据是列表形式的实例,即cs字符串是子字符串。

全面披露:我只处理文本约一个月,而正则表达式仅在最近两周内处理,所以我确定我缺少一些细微差别。就是说,对于较小的字符串集合(我的是在12,000行和40个奇数列的数据帧中),作为除去多余字符的最后一步,此方法效果很好,特别是如果您在其中引入了一些额外的空格想要分隔由非单词字符连接的文本,但又不想在以前没有空格的地方添加空格。

一个例子:

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, formatted as is:\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

输出:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stipping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

因此,strip一次删除一个空格。因此,在OP的情况下,strip()可以。但是如果情况变得更加复杂,则对于更一般的设置,正则表达式和类似的模式可能会有一定价值。

看到它在行动

Having looked at quite a few solutions here with various degrees of understanding, I wondered what to do if the string was comma separated…

the problem

While trying to process a csv of contact information, I needed a solution this problem: trim extraneous whitespace and some junk, but preserve trailing commas, and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage, leaving the good stuff. Trimming out all the punctuation and chaff, I didn’t want to lose the whitespace between compound tokens as I didn’t want to rebuild later.

regex and patterns: [\s_]+?\W+

The pattern looks for single instances of any whitespace character and the underscore (‘_’) from 1 to an unlimited number of times lazily (as few characters as possible) with [\s_]+? that come before non-word characters occurring from 1 to an unlimited amount of time with this: \W+ (is equivalent to [^a-zA-Z0-9_]). Specifically, this finds swaths of whitespace: null characters (\0), tabs (\t), newlines (\n), feed-forward (\f), carriage returns (\r).

I see the advantage to this as two-fold:

  1. that it doesn’t remove whitespace between the complete words/tokens that you might want to keep together;

  2. Python’s built in string method strip()doesn’t deal inside the string, just the left and right ends, and default arg is null characters (see below example: several newlines are in the text, and strip() does not remove them all while the regex pattern does). text.strip(' \n\t\r')

This goes beyond the OPs question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (some how the escape characters ended up in some of the text). Moreover, in list-like strings, we don’t want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like ‘-,’ or ‘-, ,,,’.

NB: Not talking about the delimiter of the CSV itself. Only of instances within the CSV where the data is list-like, ie is a c.s. string of substrings.

Full disclosure: I’ve only been manipulating text for about a month, and regex only the last two weeks, so I’m sure there are some nuances I’m missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don’t want to add whitespace where there was none before.

An example:

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, formatted as is:\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stipping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

So strip removes one whitespace from at a time. So in the OPs case, strip() is fine. but if things get any more complex, regex and a similar pattern may be of some value for more general settings.

see it in action


回答 11

试试 translate:

>>> import string
>>> print '\t\r\n  hello \r\n world \t\r\n'

  hello 
 world  
>>> tr = string.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'

try translate

>>> import string
>>> print '\t\r\n  hello \r\n world \t\r\n'

  hello 
 world  
>>> tr = string.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'

回答 12

如果要仅在字符串的开头和结尾处修剪空格,则可以执行以下操作:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

这与Qt的QString :: trimmed()方法非常相似,因为它删除了前导和尾随空格,而只保留了内部空格。

但是,如果您想要类似 Qt 的 QString::simplified() 方法的效果,即不仅删除开头和结尾的空白,还把所有连续的内部空白“压缩”成一个空格字符,则可以组合使用 .split() 和 " ".join,如下所示:

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

在最后一个示例中,内部空格的每个序列都用一个空格代替,同时仍在字符串的开头和结尾修剪空格。

If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

This works a lot like Qt’s QString::trimmed() method, in that it removes leading and trailing whitespace, while leaving internal whitespace alone.

But if you’d like something like Qt’s QString::simplified() method which not only removes leading and trailing whitespace, but also “squishes” all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join, like this:

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

In this last example, each sequence of internal whitespace replaced with a single space, while still trimming the whitespace off the start and end of the string.


回答 13

通常,我使用以下方法:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn",u"\u005Cr",u"\u005Ct"]
>>> import re
>>> for i in charList:
        myStr = re.sub(i, r"", myStr)

>>> myStr
'Hi Stack Over  flow'

注意:这仅用于删除 "\n"、"\r" 和 "\t",不会删除多余的空格。

Generally, I am using the following method:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn",u"\u005Cr",u"\u005Ct"]
>>> import re
>>> for i in charList:
        myStr = re.sub(i, r"", myStr)

>>> myStr
'Hi Stack Over  flow'

Note: This is only for removing “\n”, “\r” and “\t” only. It does not remove extra spaces.


回答 14

用于从字符串中间删除空格

$p = "ATGCGAC ACGATCGACC";
$p =~ s/\s//g;
print $p;

输出:

ATGCGACACGATCGACC

for removing whitespaces from the middle of the string

$p = "ATGCGAC ACGATCGACC";
$p =~ s/\s//g;
print $p;

output:

ATGCGACACGATCGACC
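
The snippet above is actually Perl; a rough Python equivalent using re.sub would be:

import re

p = "ATGCGAC ACGATCGACC"
p = re.sub(r'\s+', '', p)  # remove all whitespace, including in the middle
print(p)  # ATGCGACACGATCGACC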

回答 15

这将删除字符串开头和结尾的所有空格和换行符:

>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub("^\s+|\s+$", "", s)
>>> "some \n text"

This will remove all whitespace and newlines from both the beginning and end of a string:

>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub("^\s+|\s+$", "", s)
>>> "some \n text"

将行写入文件的正确方法?

问题:将行写入文件的正确方法?

我已经习惯了 print >>f, "hi there"

但是,似乎print >>已经弃用了。推荐使用哪种方法进行上述操作?

更新:关于所有这些使用 "\n" 的答案……它是通用的,还是 Unix 特有的?也就是说,在 Windows 上我应该用 "\r\n" 吗?

I’m used to doing print >>f, "hi there"

However, it seems that print >> is getting deprecated. What is the recommended way to do the line above?

Update: Regarding all those answers with "\n"…is this universal or Unix-specific? IE, should I be doing "\r\n" on Windows?


回答 0

这应该很简单:

with open('somefile.txt', 'a') as the_file:
    the_file.write('Hello\n')

从文档:

写入以文本模式打开的文件时(默认模式),请勿使用 os.linesep 作为行终止符;在所有平台上都应使用单个 '\n'。

一些有用的读物:

This should be as simple as:

with open('somefile.txt', 'a') as the_file:
    the_file.write('Hello\n')

From The Documentation:

Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single ‘\n’ instead, on all platforms.

Some useful reading:


回答 1

您应该使用 Python 2.6+ 提供的 print() 函数

from __future__ import print_function  # Only needed for Python 2
print("hi there", file=f)

对于 Python 3,您不需要这个 import,因为 print() 函数默认可用。

替代方法是使用:

f = open('myfile', 'w')
f.write('hi there\n')  # python will convert \n to os.linesep
f.close()  # you can omit in most cases as the destructor will call it

引用Python文档中有关换行符的内容:

在输出时,如果 newline 为 None,写入的所有 '\n' 字符都会被转换为系统默认的行分隔符 os.linesep。如果 newline 是 '',则不进行转换。如果 newline 是其他任何合法值,写入的所有 '\n' 字符都会被转换为给定的字符串。

You should use the print() function which is available since Python 2.6+

from __future__ import print_function  # Only needed for Python 2
print("hi there", file=f)

For Python 3 you don’t need the import, since the print() function is the default.

The alternative would be to use:

f = open('myfile', 'w')
f.write('hi there\n')  # python will convert \n to os.linesep
f.close()  # you can omit in most cases as the destructor will call it

Quoting from Python documentation regarding newlines:

On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.


回答 2

Python文档建议是这样的:

with open('file_to_write', 'w') as f:
    f.write('file contents\n')

所以这就是我通常的方式:)

来自docs.python.org的声明:

在处理文件对象时,最好使用‘with’关键字。这样做的好处是,即使在执行过程中引发了异常,文件在其套件完成后也将正确关闭。它也比编写等效的try-finally块短得多。

The python docs recommend this way:

with open('file_to_write', 'w') as f:
    f.write('file contents\n')

So this is the way I usually do it :)

Statement from docs.python.org:

It is good practice to use the ‘with’ keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks.


回答 3

关于os.linesep:

这是Windows上未经编辑的Python 2.7.1解释器的确切会话:

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.linesep
'\r\n'
>>> f = open('myfile','w')
>>> f.write('hi there\n')
>>> f.write('hi there' + os.linesep) # same result as previous line ?????????
>>> f.close()
>>> open('myfile', 'rb').read()
'hi there\r\nhi there\r\r\n'
>>>

在Windows上:

正如预期的那样,os.linesep 并不会产生与 '\n' 相同的结果,它也不可能产生相同的结果。'hi there' + os.linesep 等同于 'hi there\r\n',而不等同于 'hi there\n'。

就是这么简单:使用 \n,它会被自动转换为 os.linesep。自从 Python 第一次移植到 Windows 以来,一直就是这么简单。

在非Windows系统上使用os.linesep是没有意义的,并且在Windows上会产生错误的结果。

请勿使用os.linesep!

Regarding os.linesep:

Here is an exact unedited Python 2.7.1 interpreter session on Windows:

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.linesep
'\r\n'
>>> f = open('myfile','w')
>>> f.write('hi there\n')
>>> f.write('hi there' + os.linesep) # same result as previous line ?????????
>>> f.close()
>>> open('myfile', 'rb').read()
'hi there\r\nhi there\r\r\n'
>>>

On Windows:

As expected, os.linesep does NOT produce the same outcome as '\n'. There is no way that it could produce the same outcome. 'hi there' + os.linesep is equivalent to 'hi there\r\n', which is NOT equivalent to 'hi there\n'.

It’s this simple: use \n which will be translated automatically to os.linesep. And it’s been that simple ever since the first port of Python to Windows.

There is no point in using os.linesep on non-Windows systems, and it produces wrong results on Windows.

DO NOT USE os.linesep!


回答 4

我认为没有“正确”的方法。

我会用:

with open ('myfile', 'a') as f: f.write ('hi there\n')

谨以此纪念 Tim Toady(TIMTOWTDI)。

I do not think there is a “correct” way.

I would use:

with open ('myfile', 'a') as f: f.write ('hi there\n')

In memoriam Tim Toady.


回答 5

在Python 3中,它是一个函数,但是在Python 2中,您可以将其添加到源文件的顶部:

from __future__ import print_function

然后这样写:

print("hi there", file=f)

In Python 3 it is a function, but in Python 2 you can add this to the top of the source file:

from __future__ import print_function

Then you do

print("hi there", file=f)

回答 6

如果您要写入大量数据,并且在意速度,那么您可能应该考虑 f.write(...)。我做了一个快速的速度比较,在执行大量写操作时,它比 print(..., file=f) 快得多。

import time

start = time.time()
with open("test.txt", 'w') as f:
    for i in range(10000000):
        # print('This is a speed test', file=f)
        # f.write('This is a speed test\n')
        pass  # 取消注释上面其中一行来测试它
end = time.time()
print(end - start)

在我的机器上,write 平均耗时 2.45 秒完成,而 print 耗时约为其 4 倍(9.76 秒)。话虽这么说,在大多数现实场景中这都不是问题。

如果您选择使用 print(..., file=f),您可能会发现有时需要去掉换行符,或将其替换为其他内容。这可以通过设置可选的 end 参数来完成,例如:

with open("test", 'w') as f:
    print('Foo1,', file=f, end='')
    print('Foo2,', file=f, end='')
    print('Foo3', file=f)

无论选择哪种方式,我都建议使用 with,因为它让代码更易于阅读。

更新:这种性能差异可以这样解释:write 有很强的缓冲,在实际写入磁盘之前就会返回(请参阅此答案),而 print(可能)使用行缓冲。一个简单的验证方法是也测试一下写入长行的性能,这时行缓冲在速度上的劣势就不那么明显了。

start = time.time()
long_line = 'This is a speed test' * 100
with open("test.txt", 'w') as f:
    for i in range(1000000):
        # print(long_line, file=f)
        # f.write(long_line + '\n')
        pass  # 取消注释上面其中一行来测试它
end = time.time()

print(end - start, "s")

现在,性能差异变得不那么明显:write 的平均时间为 2.20 秒,print 为 3.10 秒。如果您需要先拼接一堆字符串才能得到这么长的行,性能反而会受损,所以 print 更高效的使用场景其实比较少见。

If you are writing a lot of data and speed is a concern you should probably go with f.write(...). I did a quick speed comparison and it was considerably faster than print(..., file=f) when performing a large number of writes.

import time

start = time.time()
with open("test.txt", 'w') as f:
    for i in range(10000000):
        # print('This is a speed test', file=f)
        # f.write('This is a speed test\n')
        pass  # uncomment one of the lines above to time it
end = time.time()
print(end - start)

On average write finished in 2.45s on my machine, whereas print took about 4 times as long (9.76s). That being said, in most real-world scenarios this will not be an issue.

If you choose to go with print(..., file=f) you will probably find that you’ll want to suppress the newline from time to time, or replace it with something else. This can be done by setting the optional end parameter, e.g.;

with open("test", 'w') as f:
    print('Foo1,', file=f, end='')
    print('Foo2,', file=f, end='')
    print('Foo3', file=f)

Whichever way you choose I’d suggest using with since it makes the code much easier to read.

Update: This difference in performance is explained by the fact that write is highly buffered and returns before any writes to disk actually take place (see this answer), whereas print (probably) uses line buffering. A simple test for this would be to check performance for long writes as well, where the disadvantages (in terms of speed) for line buffering would be less pronounced.

start = time.time()
long_line = 'This is a speed test' * 100
with open("test.txt", 'w') as f:
    for i in range(1000000):
        # print(long_line, file=f)
        # f.write(long_line + '\n')
        pass  # uncomment one of the lines above to time it
end = time.time()

print(end - start, "s")

The performance difference now becomes much less pronounced, with an average time of 2.20s for write and 3.10s for print. If you need to concatenate a bunch of strings to get this loooong line performance will suffer, so use-cases where print would be more efficient are a bit rare.


回答 7

从3.5开始,您还可以使用pathlib

Path.write_text(data, encoding=None, errors=None)

打开以文本模式指向的文件,向其中写入数据,然后关闭文件:

import pathlib

pathlib.Path('textfile.txt').write_text('content')

Since 3.5 you can also use the pathlib for that purpose:

Path.write_text(data, encoding=None, errors=None)

Open the file pointed to in text mode, write data to it, and close the file:

import pathlib

pathlib.Path('textfile.txt').write_text('content')

回答 8

当您说“行”(Line)时,指的是一串以 '\n' 字符结尾的序列化字符。行总要在某个位置结束,所以我们应该在每行末尾加上 '\n'。这是解决方案:

with open('YOURFILE.txt', 'a') as the_file:
    the_file.write("Hello")

在追加模式下,每次写入后光标会移到新的位置继续追加;如果要使用 w 模式,则应在 write() 函数写入内容的末尾添加 \n 字符:

the_file.write("Hello\n")

When you said Line it means some serialized characters which are ended to ‘\n’ characters. Line should be last at some point so we should consider ‘\n’ at the end of each line. Here is solution:

with open('YOURFILE.txt', 'a') as the_file:
    the_file.write("Hello")

in append mode after each write the cursor move to new line, if you want to use w mode you should add \n characters at the end of the write() function:

the_file.write("Hello\n")

回答 9

也可以按以下方式使用该io模块:

import io
my_string = "hi there"

with io.open("output_file.txt", mode='w', encoding='utf-8') as f:
    f.write(my_string)

One can also use the io module as in:

import io
my_string = "hi there"

with io.open("output_file.txt", mode='w', encoding='utf-8') as f:
    f.write(my_string)

回答 10

在 Flask 中向文件写入文本,可以使用:

filehandle = open("text.txt", "w")
filebuffer = ["hi","welcome","yes yes welcome"]
filehandle.writelines(filebuffer)
filehandle.close()

To write text to a file in Flask, the following can be used:

filehandle = open("text.txt", "w")
filebuffer = ["hi","welcome","yes yes welcome"]
filehandle.writelines(filebuffer)
filehandle.close()

回答 11

您也可以尝试 filewriter

pip install filewriter

from filewriter import Writer

Writer(filename='my_file', ext='txt') << ["row 1 hi there", "row 2"]

写入 my_file.txt

接受可迭代对象或带有__str__支持的对象。

You can also try filewriter

pip install filewriter

from filewriter import Writer

Writer(filename='my_file', ext='txt') << ["row 1 hi there", "row 2"]

Writes into my_file.txt

Takes an iterable or an object with __str__ support.


回答 12

当我需要写很多新行时,我定义一个使用print函数的lambda :

out = open(file_name, 'w')
fwl = lambda *x, **y: print(*x, **y, file=out) # FileWriteLine
fwl('Hi')

这种方法的好处是可以利用 print 函数提供的所有功能。

更新:正如 Georgy 在评论区提到的,可以用 functools.partial 进一步改进这个想法:

from functools import partial
fwl = partial(print, file=out)

恕我直言,这是一种更函数式、也更不晦涩的方法。

When I need to write new lines a lot, I define a lambda that uses a print function:

out = open(file_name, 'w')
fwl = lambda *x, **y: print(*x, **y, file=out) # FileWriteLine
fwl('Hi')

This approach has the benefit that it can utilize all the features that are available with the print function.

Update: As is mentioned by Georgy in the comment section, it is possible to improve this idea further with the partial function:

from functools import partial
fwl = partial(print, file=out)

IMHO, this is a more functional and less cryptic approach.


如何从函数返回多个值?[关闭]

问题:如何从函数返回多个值?[关闭]

在支持多返回值的语言中,规范的做法通常是使用元组(tupling)。

选项:使用元组

考虑下面这个简单的例子:

def f(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return (y0, y1, y2)

但是,随着返回值数量的增加,这很快就会成为问题。如果要返回四个或五个值怎么办?当然,您可以继续把它们打包成元组,但很容易忘记哪个值在哪个位置。在接收它们的地方逐一解包也相当难看。

选项:使用字典

下一步的逻辑步骤似乎是引入某种“记录符号”。在Python中,显而易见的方法是使用dict

考虑以下:

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0': y0, 'y1': y1 ,'y2': y2}

(请注意,y0,y1和y2只是抽象标识符。正如所指出的,实际上,您将使用有意义的标识符。)

现在,我们有了一种机制,可以投影出返回对象的特定成员。例如,

result['y0']

选项:使用类

但是,还有另一种选择。相反,我们可以返回一个特殊的结构。我已经在Python的上下文中对此进行了框架化,但是我确信它也适用于其他语言。确实,如果您使用C语言工作,这很可能是您唯一的选择。开始:

class ReturnValue:
  def __init__(self, y0, y1, y2):
     self.y0 = y0
     self.y1 = y1
     self.y2 = y2

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return ReturnValue(y0, y1, y2)

在 Python 中,前面两种方式在底层实现上可能非常相似,毕竟 { y0, y1, y2 } 最终只是 ReturnValue 内部 __dict__ 中的条目。

不过,针对小对象,Python 还提供了一项附加功能:__slots__ 属性。该类可以写成:

class ReturnValue(object):
  __slots__ = ["y0", "y1", "y2"]
  def __init__(self, y0, y1, y2):
     self.y0 = y0
     self.y1 = y1
     self.y2 = y2

Python参考手册中

__slots__声明采用一系列实例变量,并在每个实例中仅保留足够的空间来容纳每个变量的值。因为__dict__未为每个实例创建空间,所以节省了空间。

选项:使用数据类(Python 3.7+)

使用 Python 3.7 新增的数据类(dataclass),可以返回一个自动添加了特殊方法、类型注解和其他实用工具的类:

from dataclasses import dataclass

@dataclass
class ReturnValue:
    y0: int
    y1: float
    y2: int

def total_cost(x):
    y0 = x + 1
    y1 = x * 3
    y2 = y0 ** y3
    return ReturnValue(y0, y1, y2)

选项:使用列表

另一个我之前忽略的建议来自 Bill the Lizard:

def h(x):
  result = [x + 1]
  result.append(x * 3)
  result.append(y0 ** y3)
  return result

这是我最不喜欢的方法。可能是因为接触过 Haskell 的缘故,混合类型列表的想法一直让我感到不舒服。在这个特定示例中,列表并不是混合类型,但可以想象它有可能是。

据我所知,以这种方式使用的列表实际上对元组没有任何好处。Python中列表和元组之间的唯一真正区别是列表是可变的,而元组则不是。

我个人倾向于继承函数式编程的约定:对任何数量的相同类型的元素使用列表,对固定数量的预定类型的元素使用元组。

在冗长的序言之后,出现了不可避免的问题。(您认为)哪种方法最好?

The canonical way to return multiple values in languages that support it is often tupling.

Option: Using a tuple

Consider this trivial example:

def f(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return (y0, y1, y2)

However, this quickly gets problematic as the number of values returned increases. What if you want to return four or five values? Sure, you could keep tupling them, but it gets easy to forget which value is where. It’s also rather ugly to unpack them wherever you want to receive them.

Option: Using a dictionary

The next logical step seems to be to introduce some sort of ‘record notation’. In Python, the obvious way to do this is by means of a dict.

Consider the following:

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0': y0, 'y1': y1 ,'y2': y2}

(Just to be clear, y0, y1, and y2 are just meant as abstract identifiers. As pointed out, in practice you’d use meaningful identifiers.)

Now, we have a mechanism whereby we can project out a particular member of the returned object. For example,

result['y0']

Option: Using a class

However, there is another option. We could instead return a specialized structure. I’ve framed this in the context of Python, but I’m sure it applies to other languages as well. Indeed, if you were working in C this might very well be your only option. Here goes:

class ReturnValue:
  def __init__(self, y0, y1, y2):
     self.y0 = y0
     self.y1 = y1
     self.y2 = y2

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return ReturnValue(y0, y1, y2)

In Python the previous two are perhaps very similar in terms of plumbing – after all { y0, y1, y2 } just end up being entries in the internal __dict__ of the ReturnValue.

There is one additional feature provided by Python though for tiny objects, the __slots__ attribute. The class could be expressed as:

class ReturnValue(object):
  __slots__ = ["y0", "y1", "y2"]
  def __init__(self, y0, y1, y2):
     self.y0 = y0
     self.y1 = y1
     self.y2 = y2

From the Python Reference Manual:

The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

Option: Using a dataclass (Python 3.7+)

Using Python 3.7’s new dataclasses, return a class with automatically added special methods, typing and other useful tools:

from dataclasses import dataclass

@dataclass
class ReturnValue:
    y0: int
    y1: float
    y2: int

def total_cost(x):
    y0 = x + 1
    y1 = x * 3
    y2 = y0 ** y3
    return ReturnValue(y0, y1, y2)

Option: Using a list

Another suggestion which I’d overlooked comes from Bill the Lizard:

def h(x):
  result = [x + 1]
  result.append(x * 3)
  result.append(y0 ** y3)
  return result

This is my least favorite method though. I suppose I’m tainted by exposure to Haskell, but the idea of mixed-type lists has always felt uncomfortable to me. In this particular example the list is -not- mixed type, but it conceivably could be.

A list used in this way really doesn’t gain anything with respect to the tuple as far as I can tell. The only real difference between lists and tuples in Python is that lists are mutable, whereas tuples are not.

I personally tend to carry over the conventions from functional programming: use lists for any number of elements of the same type, and tuples for a fixed number of elements of predetermined types.

Question

After the lengthy preamble, comes the inevitable question. Which method (do you think) is best?


回答 0

为此,在2.6中添加了命名元组。另请参见os.stat以获取类似的内置示例。

>>> import collections
>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p = Point(1, y=2)
>>> p.x, p.y
1 2
>>> p[0], p[1]
1 2

在较新版本的 Python 3(我认为是 3.6+)中,typing 库新增了 NamedTuple 类,使命名元组更易于创建、功能也更强大。通过继承 typing.NamedTuple,您可以使用文档字符串、默认值和类型注解。

示例(来自文档):

class Employee(NamedTuple):  # inherit from typing.NamedTuple
    name: str
    id: int = 3  # default value

employee = Employee('Guido')
assert employee.id == 3

Named tuples were added in 2.6 for this purpose. Also see os.stat for a similar builtin example.

>>> import collections
>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p = Point(1, y=2)
>>> p.x, p.y
1 2
>>> p[0], p[1]
1 2

In recent versions of Python 3 (3.6+, I think), the new typing library got the NamedTuple class to make named tuples easier to create and more powerful. Inheriting from typing.NamedTuple lets you use docstrings, default values, and type annotations.

Example (From the docs):

class Employee(NamedTuple):  # inherit from typing.NamedTuple
    name: str
    id: int = 3  # default value

employee = Employee('Guido')
assert employee.id == 3

回答 1

对于小型项目,我发现使用元组最简单。当这变得难以管理时(而不是之前),我开始将事物分组为逻辑结构,但是我认为您建议使用字典和ReturnValue对象是错误的(或者过于简单)。

返回以 "y0"、"y1"、"y2" 等为键的字典,相比元组并没有任何优势。返回带有 .y0、.y1、.y2 等属性的 ReturnValue 实例,相比元组同样没有优势。如果想有所进展,就需要开始给事物命名,而这用元组也完全可以做到:

def get_image_data(filename):
    [snip]
    return size, (format, version, compression), (width,height)

size, type, dimensions = get_image_data(x)

恕我直言,除元组之外,唯一好的做法是返回带有适当方法和属性的真实对象,就像您从 re.match() 或 open(file) 得到的那样。

For small projects I find it easiest to work with tuples. When that gets too hard to manage (and not before) I start grouping things into logical structures, however I think your suggested use of dictionaries and ReturnValue objects is wrong (or too simplistic).

Returning a dictionary with keys "y0", "y1", "y2", etc. doesn’t offer any advantage over tuples. Returning a ReturnValue instance with properties .y0, .y1, .y2, etc. doesn’t offer any advantage over tuples either. You need to start naming things if you want to get anywhere, and you can do that using tuples anyway:

def get_image_data(filename):
    [snip]
    return size, (format, version, compression), (width,height)

size, type, dimensions = get_image_data(x)

IMHO, the only good technique beyond tuples is to return real objects with proper methods and properties, like you get from re.match() or open(file).


回答 2

许多答案表明您需要返回某种类型的集合,例如字典或列表。您可以省去多余的语法,而只需写出返回值(以逗号分隔)即可。注意:从技术上讲,这将返回一个元组。

def f():
    return True, False
x, y = f()
print(x)
print(y)

给出:

True
False

A lot of the answers suggest you need to return a collection of some sort, like a dictionary or a list. You could leave off the extra syntax and just write out the return values, comma-separated. Note: this technically returns a tuple.

def f():
    return True, False
x, y = f()
print(x)
print(y)

gives:

True
False

回答 3

我投票给字典。

我发现,如果我创建的函数返回的变量超过2-3个,则将它们折叠成字典。否则,我往往会忘记所返回内容的顺序和内容。

另外,引入“特殊”结构会使您的代码更难以遵循。(其他人将不得不搜索代码以找出它是什么)

如果您担心类型查找,请使用描述性字典键,例如“ x值列表”。

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0':y0, 'y1':y1 ,'y2':y2 }

I vote for the dictionary.

I find that if I make a function that returns anything more than 2-3 variables I’ll fold them up in a dictionary. Otherwise I tend to forget the order and content of what I’m returning.

Also, introducing a ‘special’ structure makes your code more difficult to follow. (Someone else will have to search through the code to find out what it is)

If your concerned about type look up, use descriptive dictionary keys, for example, ‘x-values list’.

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0':y0, 'y1':y1 ,'y2':y2 }

回答 4

另一种选择是使用生成器:

>>> def f(x):
        y0 = x + 1
        yield y0
        yield x * 3
        yield y0 ** 4


>>> a, b, c = f(5)
>>> a
6
>>> b
15
>>> c
1296

尽管IMHO元组通常是最好的,除非返回的值是封装在类中的候选对象。

Another option would be using generators:

>>> def f(x):
        y0 = x + 1
        yield y0
        yield x * 3
        yield y0 ** 4


>>> a, b, c = f(5)
>>> a
6
>>> b
15
>>> c
1296

Although IMHO tuples are usually best, except in cases where the values being returned are candidates for encapsulation in a class.


回答 5

我更喜欢在元组感到“自然”时使用元组。坐标是一个典型示例,其中单独的对象可以独立站立,例如在单轴缩放计算中,顺序很重要。注意:如果我可以对项目进行排序或改组而不会对组的含义造成不利影响,那么我可能不应该使用元组。

仅当分组的对象并不总是相同时,我才使用字典作为返回值。考虑可选的电子邮件标题。

对于其余的情况,如果分组的对象在组内具有固有的含义,或者需要具有自己方法的成熟对象,则使用类。

I prefer to use tuples whenever a tuple feels “natural”; coordinates are a typical example, where the separate objects can stand on their own, e.g. in one-axis only scaling calculations, and the order is important. Note: if I can sort or shuffle the items without an adverse effect to the meaning of the group, then I probably shouldn’t use a tuple.

I use dictionaries as a return value only when the grouped objects aren’t always the same. Think optional email headers.

For the rest of the cases, where the grouped objects have inherent meaning inside the group or a fully-fledged object with its own methods is needed, I use a class.


回答 6

我更喜欢:

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0':y0, 'y1':y1 ,'y2':y2 }

似乎其他所有东西只是做相同事情的额外代码。

I prefer:

def g(x):
  y0 = x + 1
  y1 = x * 3
  y2 = y0 ** y3
  return {'y0':y0, 'y1':y1 ,'y2':y2 }

It seems everything else is just extra code to do the same thing.


回答 7

>>> def func():
...    return [1,2,3]
...
>>> a,b,c = func()
>>> a
1
>>> b
2
>>> c
3
>>> def func():
...    return [1,2,3]
...
>>> a,b,c = func()
>>> a
1
>>> b
2
>>> c
3

回答 8

通常,“专用结构”实际上是具有其自身方法的对象的当前状态。

class Some3SpaceThing(object):
  def __init__(self,x):
    self.g(x)
  def g(self,x):
    self.y0 = x + 1
    self.y1 = x * 3
    self.y2 = y0 ** y3

r = Some3SpaceThing( x )
r.y0
r.y1
r.y2

我希望在可能的地方找到匿名结构的名称。有意义的名称使事情变得更清楚。

Generally, the “specialized structure” actually IS a sensible current state of an object, with its own methods.

class Some3SpaceThing(object):
  def __init__(self,x):
    self.g(x)
  def g(self,x):
    self.y0 = x + 1
    self.y1 = x * 3
    self.y2 = y0 ** y3

r = Some3SpaceThing( x )
r.y0
r.y1
r.y2

I like to find names for anonymous structures where possible. Meaningful names make things more clear.


回答 9

Python 的元组、字典和对象为程序员在小型数据结构(“事物”)的正式程度与便利性之间提供了平滑的权衡。对我而言,如何表示一个事物主要取决于我将如何使用这个结构。在 C++ 中,常见的约定是 struct 只用于纯数据项、class 用于带方法的对象,即使在 struct 上放方法也是合法的;我在 Python 中的习惯与此类似,用 dict 和 tuple 代替 struct。

对于坐标集,我会使用 tuple,而不是一个 Point 类或 dict(并且请注意,tuple 可以用作字典的键,因此 dict 非常适合做稀疏的多维数组)。

如果我要遍历一组事物,我更喜欢在迭代时直接解包元组:

for score,id,name in scoreAllTheThings():
    if score > goodScoreThreshold:
        print "%6.3f #%6d %s"%(score,id,name)

…相比之下,对象版本读起来更杂乱:

for entry in scoreAllTheThings():
    if entry.score > goodScoreThreshold:
        print "%6.3f #%6d %s"%(entry.score,entry.id,entry.name)

…更不用说 dict 版本了:

for entry in scoreAllTheThings():
    if entry['score'] > goodScoreThreshold:
        print "%6.3f #%6d %s"%(entry['score'],entry['id'],entry['name'])

如果该事物被广泛使用,并且您发现自己在代码中的多个位置对它执行了类似的非平凡操作,那么通常值得用适当的方法将其变成一个类对象。

最后,如果我要与非 Python 的系统组件交换数据,我通常会把它们放在 dict 中,因为这最适合 JSON 序列化。

Python’s tuples, dicts, and objects offer the programmer a smooth tradeoff between formality and convenience for small data structures (“things”). For me, the choice of how to represent a thing is dictated mainly by how I’m going to use the structure. In C++, it’s a common convention to use struct for data-only items and class for objects with methods, even though you can legally put methods on a struct; my habit is similar in Python, with dict and tuple in place of struct.

For coordinate sets, I’ll use a tuple rather than a point class or a dict (and note that you can use a tuple as a dictionary key, so dicts make great sparse multidimensional arrays).

If I’m going to be iterating over a list of things, I prefer unpacking tuples on the iteration:

for score,id,name in scoreAllTheThings():
    if score > goodScoreThreshold:
        print "%6.3f #%6d %s"%(score,id,name)

…as the object version is more cluttered to read:

for entry in scoreAllTheThings():
    if entry.score > goodScoreThreshold:
        print "%6.3f #%6d %s"%(entry.score,entry.id,entry.name)

…let alone the dict.

for entry in scoreAllTheThings():
    if entry['score'] > goodScoreThreshold:
        print "%6.3f #%6d %s"%(entry['score'],entry['id'],entry['name'])

If the thing is widely used, and you find yourself doing similar non-trivial operations on it in multiple places in the code, then it’s usually worthwhile to make it a class object with appropriate methods.

Finally, if I’m going to be exchanging data with non-Python system components, I’ll most often keep them in a dict because that’s best suited to JSON serialization.
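
For instance, a plain dict serializes directly, whereas a custom class would need extra handling (a small illustrative sketch):

import json

entry = {'score': 9.5, 'id': 42, 'name': 'example'}
print(json.dumps(entry))  # {"score": 9.5, "id": 42, "name": "example"}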


回答 10

S.Lott关于命名容器类的建议的+1。

对于Python 2.6及更高版本,命名元组提供了一种轻松创建这些容器类的有用方法,其结果是“重量轻,并且不需要比常规元组更多的内存”。

+1 on S.Lott’s suggestion of a named container class.

For Python 2.6 and up, a named tuple provides a useful way of easily creating these container classes, and the results are “lightweight and require no more memory than regular tuples”.


回答 11

在像Python这样的语言中,我通常会使用字典,因为与创建新类相比,它所涉及的开销更少。

但是,如果我发现自己不断返回相同的变量集,则可能涉及一个我要考虑的新类。

In languages like Python, I would usually use a dictionary as it involves less overhead than creating a new class.

However, if I find myself constantly returning the same set of variables, then that probably involves a new class that I’ll factor out.


回答 12

我将使用字典来传递和从函数返回值:

使用form中定义的变量form

form = {
    'level': 0,
    'points': 0,
    'game': {
        'name': ''
    }
}


def test(form):
    form['game']['name'] = 'My game!'
    form['level'] = 2

    return form

>>> print(test(form))
{u'game': {u'name': u'My game!'}, u'points': 0, u'level': 2}

对于我和处理单元而言,这是最有效的方法。

您只需要传递一个指针并返回一个指针即可。

在代码中进行更改时,不必更改函数的参数(成千上万个)。

I would use a dict to pass and return values from a function:

Use variable form as defined in form.

form = {
    'level': 0,
    'points': 0,
    'game': {
        'name': ''
    }
}


def test(form):
    form['game']['name'] = 'My game!'
    form['level'] = 2

    return form

>>> print(test(form))
{u'game': {u'name': u'My game!'}, u'points': 0, u'level': 2}

This is the most efficient way for me and for processing unit.

You have to pass just one pointer in and return just one pointer out.

You do not have to change functions’ (thousands of them) arguments whenever you make a change in your code.


回答 13

“最佳”在一定程度上是主观判断。在可以接受不可变对象的一般情况下,小规模的返回值集合用元组即可。当不需要可变性时,元组总是比列表更可取。

对于更复杂的返回值,或者形式化很有价值(即高价值代码)的情况,命名元组更好。对于最复杂的情况,对象通常是最好的。但真正重要的还是具体场景:如果返回一个对象是合理的,因为那正是函数结束时自然得到的东西(例如工厂模式),那就返回该对象。

正如智者所说:

过早的优化是编程中所有邪恶(或至少是大多数邪恶)的根源。

“Best” is a partially subjective decision. Use tuples for small return sets in the general case where an immutable is acceptable. A tuple is always preferable to a list when mutability is not a requirement.

For more complex return values, or for the case where formality is valuable (i.e. high value code) a named tuple is better. For the most complex case an object is usually best. However, it’s really the situation that matters. If it makes sense to return an object because that is what you naturally have at the end of the function (e.g. Factory pattern) then return the object.

As the wise man said:

Premature optimization is the root of all evil (or at least most of it) in programming.
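
A small illustrative sketch of the "return the object you naturally have" case (the class and function names here are made up):

class Report:
    def __init__(self, title, rows):
        self.title = title
        self.rows = rows

def build_report(raw_rows):
    # The function's whole job is to construct a Report, so returning
    # the object itself is more natural than unpacking it into a tuple.
    return Report(title="Daily summary", rows=[r for r in raw_rows if r])

report = build_report(["a", "", "b"])
print(report.title, len(report.rows))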


如何打印JSON文件?

问题:如何打印JSON文件?

我有一个内容很乱的JSON文件,想对它进行漂亮打印。在python中最简单的做法是什么?我知道PrettyPrint接受一个"对象",我想它可以是一个文件,但我不知道如何把文件传进去,只用文件名是行不通的。

I have a JSON file that is a mess that I want to prettyprint– what’s the easiest way to do this in python? I know PrettyPrint takes an “object”, which I think can be a file, but I don’t know how to pass a file in– just using the filename doesn’t work.


回答 0

json模块已经使用indent参数实现了一些基本的漂亮打印:

>>> import json
>>>
>>> your_json = '["foo", {"bar":["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4, sort_keys=True))
[
    "foo", 
    {
        "bar": [
            "baz", 
            null, 
            1.0, 
            2
        ]
    }
]

要解析文件,请使用json.load()

with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)

The json module already implements some basic pretty printing with the indent parameter:

>>> import json
>>>
>>> your_json = '["foo", {"bar":["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4, sort_keys=True))
[
    "foo", 
    {
        "bar": [
            "baz", 
            null, 
            1.0, 
            2
        ]
    }
]

To parse a file, use json.load():

with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)
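
Putting the two together, reading a messy file and writing a prettified copy back out, could look roughly like this (the filenames are placeholders):

import json

with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)

with open('pretty.json', 'w') as handle:
    json.dump(parsed, handle, indent=4, sort_keys=True)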

回答 1

您可以在命令行上执行此操作:

python3 -m json.tool some.json

(正如问题注释中已经提到的,感谢@Kai Petzke的python3建议)。

实际上,就命令行上的json处理而言,python并不是我最喜欢的工具。简单的漂亮打印还可以,但如果要对json做进一步的操作,就可能变得过于复杂:您很快就得另写一个脚本文件,最终还可能得到键为u"some-key"(python unicode)的字典,这会让选取字段更加困难,也偏离了漂亮打印的初衷。

您也可以使用jq

jq . some.json

并获得颜色作为奖励(并且更容易扩展)。

附录:评论中对"用jq处理大型JSON文件"和"编写非常大的jq程序"这两件事有些混淆。对于漂亮打印由单个大型JSON实体组成的文件,实际的限制在于RAM:漂亮打印一个由真实数据数组构成的2GB文件,所需的"最大驻留集大小"约为5GB(无论使用jq 1.5还是1.6)。另外请注意,在 pip install jq 之后,也可以在python中使用jq。

You can do this on the command line:

python3 -m json.tool some.json

(as already mentioned in the commentaries to the question, thanks to @Kai Petzke for the python3 suggestion).

Actually python is not my favourite tool as far as json processing on the command line is concerned. For simple pretty printing is ok, but if you want to manipulate the json it can become overcomplicated. You’d soon need to write a separate script-file, you could end up with maps whose keys are u”some-key” (python unicode), which makes selecting fields more difficult and doesn’t really go in the direction of pretty-printing.

You can also use jq:

jq . some.json

and you get colors as a bonus (and way easier extendability).

Addendum: There is some confusion in the comments about using jq to process large JSON files on the one hand, and having a very large jq program on the other. For pretty-printing a file consisting of a single large JSON entity, the practical limitation is RAM. For pretty-printing a 2GB file consisting of a single array of real-world data, the “maximum resident set size” required for pretty-printing was 5GB (whether using jq 1.5 or 1.6). Note also that jq can be used from within python after pip install jq.
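
If you would rather stay inside Python but still want jq's formatting, one option is simply to shell out to the jq binary; a sketch assuming jq is installed and on your PATH:

import subprocess

def jq_pretty(json_text):
    # Pipe the raw JSON through `jq .`, which re-emits it pretty-printed
    result = subprocess.run(['jq', '.'], input=json_text,
                            capture_output=True, text=True, check=True)
    return result.stdout

print(jq_pretty('{"foo": "bar", "baz": [1, 2, 3]}'))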


回答 2

您可以使用内置模块pprint(https://docs.python.org/3.6/library/pprint.html)。

下面演示如何读取包含json数据的文件并将其打印出来。

import json
import pprint

json_data = None
with open('filename.txt', 'r') as f:
    data = f.read()
    json_data = json.loads(data)

pprint.pprint(json_data)

You could use the built-in module pprint (https://docs.python.org/3.6/library/pprint.html).

Here is how you can read a file with json data and print it out.

import json
import pprint

json_data = None
with open('filename.txt', 'r') as f:
    data = f.read()
    json_data = json.loads(data)

pprint.pprint(json_data)

回答 3

Pygmentize + Python json.tool = 带有语法高亮的漂亮打印

Pygmentize是一个杀手级工具。请看这个。

我将python的json.tool与pygmentize结合使用:

echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json

有关pygmentize安装说明,请参见上面的链接。

下图是一个演示:

Pygmentize + Python json.tool = Pretty Print with Syntax Highlighting

Pygmentize is a killer tool. See this.

I combine python json.tool with pygmentize

echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json

See the link above for pygmentize installation instruction.

A demo of this is in the image below:
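
If you prefer to do the highlighting from within Python rather than piping through the CLI, a rough sketch using the pygments library (the package that provides pygmentize) could be:

import json
from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter

data = {"foo": "bar", "nested": {"baz": [1, 2, 3]}}
pretty = json.dumps(data, indent=4, sort_keys=True)
# Prints the indented JSON with ANSI colour codes for the terminal
print(highlight(pretty, JsonLexer(), TerminalFormatter()))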


回答 4

使用这个函数,就不必再费心记住你的JSON到底是str还是dict,直接看漂亮打印的结果即可:

import json

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

pp_json(your_json_string_or_dict)

Use this function and don’t sweat having to remember if your JSON is a str or dict again – just look at the pretty print:

import json

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

pp_json(your_json_string_or_dict)

回答 5

我曾经写过一个prettyjson()函数来产生漂亮的输出。您可以从此仓库中获取实现。

这个函数的主要特点是:在不超过指定的maxlinelength之前,尽量把dict和list的元素保持在同一行。这样生成的JSON行数更少,输出更紧凑,也更容易阅读。

您可以产生这种输出,例如:

{
  "grid": {"port": "COM5"},
  "policy": {
    "movingaverage": 5,
    "hysteresis": 5,
    "fan1": {
      "name": "CPU",
      "signal": "cpu",
      "mode": "auto",
      "speed": 100,
      "curve": [[0, 75], [50, 75], [75, 100]]
    }
  }
}

更新(2019年12月):我把代码放进了一个单独的仓库,修正了几个bug,并做了一些其他调整。

I once wrote a prettyjson() function to produce nice-looking output. You can grab the implementation from this repo.

The main feature of this function is it tries to keep dict and list items in one line until a certain maxlinelength is reached. This produces fewer lines of JSON, the output looks more compact and easier to read.

You can produce this kind of output for instance:

{
  "grid": {"port": "COM5"},
  "policy": {
    "movingaverage": 5,
    "hysteresis": 5,
    "fan1": {
      "name": "CPU",
      "signal": "cpu",
      "mode": "auto",
      "speed": 100,
      "curve": [[0, 75], [50, 75], [75, 100]]
    }
  }
}

UPD Dec’19: I placed the code into a separate repo, corrected a few bugs and made a few other tweaks.
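
The repo above has the real implementation; as a very rough sketch of the same idea (not the author's actual code), you can fall back to a compact json.dumps whenever a sub-structure fits within the line-length budget:

import json

def prettyish(obj, maxlinelength=80, indent=2, level=0):
    # Try the compact one-line form first; keep it if it fits the budget
    compact = json.dumps(obj)
    if len(compact) + level * indent <= maxlinelength or not isinstance(obj, (dict, list)):
        return compact
    pad = ' ' * ((level + 1) * indent)
    if isinstance(obj, dict):
        items = [pad + json.dumps(k) + ': ' + prettyish(v, maxlinelength, indent, level + 1)
                 for k, v in obj.items()]
        open_, close = '{', '}'
    else:
        items = [pad + prettyish(v, maxlinelength, indent, level + 1) for v in obj]
        open_, close = '[', ']'
    return open_ + '\n' + ',\n'.join(items) + '\n' + ' ' * (level * indent) + close

print(prettyish({"grid": {"port": "COM5"}, "curve": [[0, 75], [50, 75], [75, 100]]}, maxlinelength=40))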


回答 6

为了能够从命令行进行漂亮的打印并能够控制缩进等,您可以设置类似于以下的别名:

alias jsonpp="python -c 'import sys, json; print json.dumps(json.load(sys.stdin), sort_keys=True, indent=2)'"

然后以下列方式之一使用别名:

cat myfile.json | jsonpp
jsonpp < myfile.json

To be able to pretty print from the command line and be able to have control over the indentation etc. you can set up an alias similar to this:

alias jsonpp="python -c 'import sys, json; print json.dumps(json.load(sys.stdin), sort_keys=True, indent=2)'"

And then use the alias in one of these ways:

cat myfile.json | jsonpp
jsonpp < myfile.json
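
The alias above relies on Python 2's print statement; on a Python 3 only system an equivalent alias would be, for example:

alias jsonpp="python3 -c 'import sys, json; print(json.dumps(json.load(sys.stdin), sort_keys=True, indent=2))'"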

回答 7

使用pprint:https://docs.python.org/3.6/library/pprint.html

import pprint
pprint.pprint(json)

print() 与 pprint.pprint() 的对比:

print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}

pprint.pprint(json)
{'bozo': 0,
 'encoding': 'utf-8',
 'entries': [],
 'feed': {'link': 'https://www.w3schools.com',
          'links': [{'href': 'https://www.w3schools.com',
                     'rel': 'alternate',
                     'type': 'text/html'}],
          'subtitle': 'Free web building tutorials',
          'subtitle_detail': {'base': '',
                              'language': None,
                              'type': 'text/html',
                              'value': 'Free web building tutorials'},
          'title': 'W3Schools Home Page',
          'title_detail': {'base': '',
                           'language': None,
                           'type': 'text/plain',
                           'value': 'W3Schools Home Page'}},
 'namespaces': {},
 'version': 'rss20'}

Use pprint: https://docs.python.org/3.6/library/pprint.html

import pprint
pprint.pprint(json)

print() compared to pprint.pprint()

print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}

pprint.pprint(json)
{'bozo': 0,
 'encoding': 'utf-8',
 'entries': [],
 'feed': {'link': 'https://www.w3schools.com',
          'links': [{'href': 'https://www.w3schools.com',
                     'rel': 'alternate',
                     'type': 'text/html'}],
          'subtitle': 'Free web building tutorials',
          'subtitle_detail': {'base': '',
                              'language': None,
                              'type': 'text/html',
                              'value': 'Free web building tutorials'},
          'title': 'W3Schools Home Page',
          'title_detail': {'base': '',
                           'language': None,
                           'type': 'text/plain',
                           'value': 'W3Schools Home Page'}},
 'namespaces': {},
 'version': 'rss20'}

回答 8

这是一个简单的示例,可以在Python中以一种不错的方式将JSON打印到控制台,而无需将JSON作为本地文件存储在您的计算机上:

import pprint
import json 
from urllib.request import urlopen # (Only used to get this example)

# Getting a JSON example for this example 
r = urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json")
text = r.read() 

# To print it
pprint.pprint(json.loads(text))

Here’s a simple example of pretty printing JSON to the console in a nice way in Python, without requiring the JSON to be on your computer as a local file:

import pprint
import json 
from urllib.request import urlopen # (Only used to get this example)

# Getting a JSON example for this example 
r = urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json")
text = r.read() 

# To print it
pprint.pprint(json.loads(text))

回答 9

import json

def saveJson(data, fileToSave):
    with open(fileToSave, 'w+') as f:
        json.dump(data, f, ensure_ascii=True, indent=4, sort_keys=True)

它可以显示或保存到文件中。

import json

def saveJson(data, fileToSave):
    with open(fileToSave, 'w+') as f:
        json.dump(data, f, ensure_ascii=True, indent=4, sort_keys=True)

It works to display or save it to a file.


回答 10

我认为最好先解析json,以避免出现错误:

import json

def format_response(response):
    try:
        parsed = json.loads(response.text)
    except json.JSONDecodeError:
        return response.text
    return json.dumps(parsed, ensure_ascii=True, indent=4)

I think that’s better to parse the json before, to avoid errors:

import json

def format_response(response):
    try:
        parsed = json.loads(response.text)
    except json.JSONDecodeError:
        return response.text
    return json.dumps(parsed, ensure_ascii=True, indent=4)
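
As a usage sketch (the response object here is faked purely for illustration; in practice it would usually come from an HTTP library such as requests):

class FakeResponse:
    # Stand-in for an HTTP response object exposing a .text attribute
    text = '{"foo": "bar", "baz": [1, 2, 3]}'

print(format_response(FakeResponse()))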

回答 11

您可以尝试pprintjson


安装

$ pip3 install pprintjson

用法

使用pprintjson CLI从文件漂亮地打印JSON。

$ pprintjson "./path/to/file.json"

使用pprintjson CLI从标准输入漂亮地打印JSON。

$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson

使用pprintjson CLI从字符串漂亮地打印JSON。

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'

从字符串漂亮地打印JSON,并使用1个空格的缩进。

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1

从字符串漂亮地打印JSON并将输出保存到文件output.json。

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json

输出

You could try pprintjson.


Installation

$ pip3 install pprintjson

Usage

Pretty print JSON from a file using the pprintjson CLI.

$ pprintjson "./path/to/file.json"

Pretty print JSON from a stdin using the pprintjson CLI.

$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson

Pretty print JSON from a string using the pprintjson CLI.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'

Pretty print JSON from a string with an indent of 1.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1

Pretty print JSON from a string and save output to a file output.json.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json

Output


回答 12

它远非完美,但可以做到。

data = data.replace(',"',',\n"')

您可以对其进行改进,添加缩进等,但是如果您只想能够阅读更简洁的json,则可以采用这种方法。

It’s far from perfect, but it does the job.

data = data.replace(',"',',\n"')

you can improve it, add indenting and so on, but if you just want to be able to read a cleaner json, this is the way to go.
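
For example, one way to "add indenting and so on" while staying with plain string processing is to track the brace/bracket depth; a naive sketch that assumes no braces, brackets or commas occur inside string values:

def rough_pretty(text):
    # Naive pretty printer: indents by brace/bracket depth.
    # Assumes '{', '[', '}', ']' and ',' never appear inside string values.
    out, depth = [], 0
    for ch in text:
        if ch in '}]':
            depth -= 1
            out.append('\n' + '  ' * depth)
        out.append(ch)
        if ch in '{[':
            depth += 1
            out.append('\n' + '  ' * depth)
        elif ch == ',':
            out.append('\n' + '  ' * depth)
    return ''.join(out)

print(rough_pretty('{"a": 1, "b": [2, 3]}'))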