pandas: How do I split the text in a column into multiple rows?

Question: pandas: How do I split the text in a column into multiple rows?

I’m working with a large CSV file, and the next-to-last column has a string of text that I want to split by a specific delimiter. Is there a simple way to do this using pandas or Python?

CustNum  CustomerName     ItemQty  Item   Seatblocks                 ItemExt
32363    McCartney, Paul      3     F04    2:218:10:4,6                   60
31316    Lennon, John        25     F01    1:13:36:1,12 1:13:37:1,13     300

I want to split the Seatblocks column by the space (' ') and then by the colon (':'), but each cell would result in a different number of columns. I have a function to rearrange the columns so the Seatblocks column is at the end of the sheet, but I’m not sure what to do from there. I can do it in Excel with the built-in text-to-columns function and a quick macro, but my dataset has too many records for Excel to handle.

Ultimately, I want to take records such as John Lennon’s and create multiple lines, with the info from each set of seats on a separate line.


Answer 0

This splits Seatblocks on spaces and gives each block its own row (the session assumes from pandas import Series):

In [43]: df
Out[43]: 
   CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

In [44]: s = df['Seatblocks'].str.split(' ').apply(Series).stack()

In [45]: s.index = s.index.droplevel(-1) # to line up with df's index

In [46]: s.name = 'Seatblocks' # needs a name to join

In [47]: s
Out[47]: 
0    2:218:10:4,6
1    1:13:36:1,12
1    1:13:37:1,13
Name: Seatblocks, dtype: object

In [48]: del df['Seatblocks']

In [49]: df.join(s)
Out[49]: 
   CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
1    31316     Lennon, John       25  F01      300  1:13:36:1,12
1    31316     Lennon, John       25  F01      300  1:13:37:1,13

Or, to put each colon-separated string in its own column:

In [50]: df.join(s.apply(lambda x: Series(x.split(':'))))
Out[50]: 
   CustNum     CustomerName  ItemQty Item  ItemExt  0    1   2     3
0    32363  McCartney, Paul        3  F04       60  2  218  10   4,6
1    31316     Lennon, John       25  F01      300  1   13  36  1,12
1    31316     Lennon, John       25  F01      300  1   13  37  1,13

This is a little ugly, but maybe someone will chime in with a prettier solution.
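On newer pandas versions the same reshaping can be done without stack() and join(). A minimal sketch, assuming pandas 1.1 or later (DataFrame.explode appeared in 0.25, its ignore_index parameter in 1.1) and rebuilding the question's sample frame by hand:

```python
import pandas as pd

df = pd.DataFrame({
    'CustNum': [32363, 31316],
    'CustomerName': ['McCartney, Paul', 'Lennon, John'],
    'ItemQty': [3, 25],
    'Item': ['F04', 'F01'],
    'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
    'ItemExt': [60, 300],
})

# Turn each Seatblocks string into a list of blocks, then give every list
# element its own row; the other columns are repeated for each new row.
# ignore_index=True renumbers the rows 0..n-1 so the join below stays one-to-one.
exploded = (df.assign(Seatblocks=df['Seatblocks'].str.split(' '))
              .explode('Seatblocks', ignore_index=True))

# Optionally split each block on ':' into its own columns (labelled 0, 1, 2, 3).
parts = exploded['Seatblocks'].str.split(':', expand=True)
result = exploded.drop(columns='Seatblocks').join(parts)
print(result)
```

Using ignore_index=True matters here: with the original index, both sides of the final join would carry the duplicated label 1, and the join would pair every Lennon row with every Lennon block.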


Answer 1

Unlike Dan, I consider his answer quite elegant… but unfortunately it is also very inefficient. Since the question mentions “a large csv file”, try Dan’s solution in a shell:

time python -c "import pandas as pd;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(df['col'].apply(lambda x: pd.Series(x.split(' '))).head())"

… compared to this alternative:

time python -c "import pandas as pd;
from numpy import concatenate;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(pd.DataFrame(concatenate(df['col'].apply(lambda x: [x.split(' ')]))).head())"

… and this:

time python -c "import pandas as pd;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(pd.DataFrame(dict(zip(range(3), [df['col'].apply(lambda x: x.split(' ')[i]) for i in range(3)]))).head())"

The second simply avoids allocating 100 000 Series, which is enough to make it around 10 times faster. The third solution, somewhat ironically, wastes a lot of calls to str.split() (it is called once per column per row, so three times more often than in the other two solutions), yet it is around 40 times faster than the first, because it even avoids instantiating the 100 000 lists. And yes, it is certainly a little ugly…

EDIT: this answer suggests how to use tolist() and avoid the need for a lambda. The result is something like

time python -c "import pandas as pd;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(pd.DataFrame(df.col.str.split().tolist()).head())"

which is even more efficient than the third solution, and certainly much more elegant.

EDIT: the even simpler

time python -c "import pandas as pd;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(pd.DataFrame(list(df.col.str.split())).head())"

works too, and is almost as efficient.

EDIT: even simpler! And handles NaNs (but less efficient):

time python -c "import pandas as pd;
df = pd.DataFrame(['a b c']*100000, columns=['col']);
print(df.col.str.split(expand=True).head())"
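The shell timings above can also be reproduced from a single script. A hedged sketch using the standard timeit module: it compares three of the approaches, with N reduced to 10 000 rows so it runs quickly, and the numbers it prints will vary with machine and pandas version:

```python
import timeit

import pandas as pd

N = 10_000  # the commands above use 100 000 rows; raise N to match
df = pd.DataFrame(['a b c'] * N, columns=['col'])

# Three of the approaches discussed above; each builds an N x 3 frame of tokens.
candidates = {
    'apply(pd.Series)': lambda: df['col'].apply(lambda x: pd.Series(x.split(' '))),
    'DataFrame(str.split().tolist())': lambda: pd.DataFrame(df['col'].str.split().tolist()),
    'str.split(expand=True)': lambda: df['col'].str.split(expand=True),
}

for name, build in candidates.items():
    # Best of 3 single runs, reported in milliseconds.
    best = min(timeit.repeat(build, number=1, repeat=3))
    print(f'{name:>32}: {best * 1000:8.1f} ms')
```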

Answer 2

import pandas as pd
import numpy as np

df = pd.DataFrame({'ItemQty': {0: 3, 1: 25}, 
                   'Seatblocks': {0: '2:218:10:4,6', 1: '1:13:36:1,12 1:13:37:1,13'}, 
                   'ItemExt': {0: 60, 1: 300}, 
                   'CustomerName': {0: 'McCartney, Paul', 1: 'Lennon, John'}, 
                   'CustNum': {0: 32363, 1: 31316}, 
                   'Item': {0: 'F04', 1: 'F01'}}, 
                    columns=['CustNum','CustomerName','ItemQty','Item','Seatblocks','ItemExt'])

print (df)
   CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

Another similar solution uses method chaining with reset_index and rename:

print(df.drop('Seatblocks', axis=1)
        .join(df.Seatblocks
                .str.split(expand=True)
                .stack()
                .reset_index(drop=True, level=1)
                .rename('Seatblocks')))

   CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
1    31316     Lennon, John       25  F01      300  1:13:36:1,12
1    31316     Lennon, John       25  F01      300  1:13:37:1,13

If the column contains no NaN values, the fastest solution is a list comprehension passed to the DataFrame constructor:

df = pd.DataFrame(['a b c']*100000, columns=['col'])

In [141]: %timeit (pd.DataFrame(dict(zip(range(3), [df['col'].apply(lambda x : x.split(' ')[i]) for i in range(3)]))))
1 loop, best of 3: 211 ms per loop

In [142]: %timeit (pd.DataFrame(df.col.str.split().tolist()))
10 loops, best of 3: 87.8 ms per loop

In [143]: %timeit (pd.DataFrame(list(df.col.str.split())))
10 loops, best of 3: 86.1 ms per loop

In [144]: %timeit (df.col.str.split(expand=True))
10 loops, best of 3: 156 ms per loop

In [145]: %timeit (pd.DataFrame([ x.split() for x in df['col'].tolist()]))
10 loops, best of 3: 54.1 ms per loop

But if the column contains NaN, only str.split with expand=True works; it returns a DataFrame (documentation), which explains why it is slower:

df = pd.DataFrame(['a b c']*10, columns=['col'])
df.loc[0] = np.nan
print (df.head())
     col
0    NaN
1  a b c
2  a b c
3  a b c
4  a b c

print (df.col.str.split(expand=True))
     0     1     2
0  NaN  None  None
1    a     b     c
2    a     b     c
3    a     b     c
4    a     b     c
5    a     b     c
6    a     b     c
7    a     b     c
8    a     b     c
9    a     b     c
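To see why the list comprehension needs NaN-free data, here is a small sketch; the isinstance guard is an illustrative workaround added here, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['a b c'] * 10, columns=['col'])
df.loc[0, 'col'] = np.nan

# str.split(expand=True) tolerates the missing value: row 0 becomes all-missing.
expanded = df['col'].str.split(expand=True)

# A plain [x.split() for x in ...] would raise AttributeError on the float NaN,
# so this sketch emits an empty list for non-strings; the DataFrame constructor
# pads that short row with missing values.
guarded = pd.DataFrame([x.split() if isinstance(x, str) else [] for x in df['col']],
                       index=df.index)
```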

Answer 3

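Another approach is to repeat each row once per seat block and then assign the flattened blocks:

import numpy as np

temp = df['Seatblocks'].str.split(' ')             # list of seat blocks per row
df = df.reindex(df.index.repeat(temp.apply(len)))  # repeat each row once per block
df['new_Seatblocks'] = np.hstack(temp)             # all blocks, flattened in row order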
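A self-contained version of the repeat-then-assign idea, as a sketch on a trimmed copy of the question's data; it uses .loc with the repeated labels instead of reindex and renumbers the result at the end (both are choices made here, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'CustNum': [32363, 31316],
    'CustomerName': ['McCartney, Paul', 'Lennon, John'],
    'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
})

temp = df['Seatblocks'].str.split(' ')
# Selecting repeated labels with .loc duplicates each row once per seat block.
out = df.loc[df.index.repeat(temp.apply(len))].copy()
# The concatenated blocks line up positionally with the repeated rows.
out['Seatblocks'] = np.concatenate(temp.to_numpy())
out = out.reset_index(drop=True)
```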

Answer 4

You can also use groupby(), with no need for join() and stack().

Using the example data above:

import pandas as pd
import numpy as np


df = pd.DataFrame({'ItemQty': {0: 3, 1: 25}, 
                   'Seatblocks': {0: '2:218:10:4,6', 1: '1:13:36:1,12 1:13:37:1,13'}, 
                   'ItemExt': {0: 60, 1: 300}, 
                   'CustomerName': {0: 'McCartney, Paul', 1: 'Lennon, John'}, 
                   'CustNum': {0: 32363, 1: 31316}, 
                   'Item': {0: 'F04', 1: 'F01'}}, 
                    columns=['CustNum','CustomerName','ItemQty','Item','Seatblocks','ItemExt']) 
print(df)

   CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300


# First define a function: given a Series of strings, join them with sep and split the result into a new Series
def split_series(ser, sep):
    return pd.Series(ser.str.cat(sep=sep).split(sep=sep))
# Test the function:
split_series(pd.Series(['a b', 'c']), sep=' ')
0    a
1    b
2    c
dtype: object

df2 = (df.groupby(df.columns.drop('Seatblocks').tolist())  # group by all but one column
         ['Seatblocks']                                     # select the column to be split
         .apply(split_series, sep=' ')                      # split 'Seatblocks' in each group
         .reset_index(drop=True, level=-1)                  # drop the extra inner index level
         .reset_index())                                    # move the group keys back to columns

print(df2)
   CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
0    31316     Lennon, John       25  F01      300  1:13:36:1,12
1    31316     Lennon, John       25  F01      300  1:13:37:1,13
2    32363  McCartney, Paul        3  F04       60  2:218:10:4,6

Answer 5

This seems a far easier method than those suggested elsewhere in this thread.

split rows in pandas dataframe


Iterating over a numpy array

Question: Iterating over a numpy array

Is there a less verbose alternative to this:

for x in range(array.shape[0]):
    for y in range(array.shape[1]):
        do_stuff(x, y)

I came up with this:

import itertools

for x, y in itertools.product(*map(range, array.shape)):
    do_stuff(x, y)

This saves one level of indentation, but is still pretty ugly.

I’m hoping for something that looks like this pseudocode:

for x, y in array.indices:
    do_stuff(x, y)

Does anything like that exist?


Answer 0

I think you’re looking for ndenumerate.

>>> a = numpy.array([[1, 2], [3, 4], [5, 6]])
>>> for (x, y), value in numpy.ndenumerate(a):
...     print(x, y)
... 
0 0
0 1
1 0
1 1
2 0
2 1

Regarding performance: it is a bit slower than a list comprehension.

X = np.zeros((100, 100, 100))

%timeit list([((i,j,k), X[i,j,k]) for i in range(X.shape[0]) for j in range(X.shape[1]) for k in range(X.shape[2])])
1 loop, best of 3: 376 ms per loop

%timeit list(np.ndenumerate(X))
1 loop, best of 3: 570 ms per loop

If you are worried about performance, you can optimise a bit further by looking at the implementation of ndenumerate, which does two things: it converts the input to an array, then loops. If you know you already have an array, you can read the .coords attribute of the flat iterator directly, reading it before advancing the iterator as ndenumerate does:

%timeit a = X.flat; [(a.coords, next(a)) for _ in range(X.size)]
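The .coords approach only works if the coordinates are read before the iterator is advanced, because after next() the attribute already refers to the following element. A sketch of that ordering as a generator (the helper name flat_enumerate is hypothetical):

```python
import numpy as np


def flat_enumerate(arr):
    """Yield (index_tuple, value) pairs for an ndarray, like np.ndenumerate.

    Hypothetical helper; it reuses the flat iterator's .coords attribute.
    """
    it = arr.flat
    for _ in range(arr.size):
        # Read coords first: after next(it), coords already refers to the next element.
        yield it.coords, next(it)


X = np.arange(24).reshape(2, 3, 4)
pairs = list(flat_enumerate(X))
```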

Answer 1

If you only need the indices, you could try numpy.ndindex:

>>> a = numpy.arange(9).reshape(3, 3)
>>> [(x, y) for x, y in numpy.ndindex(a.shape)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
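Since the question's own attempt used itertools.product, here is a quick sketch confirming that np.ndindex yields the same index tuples in row-major order, and using it to drive an in-place update:

```python
import itertools

import numpy as np

a = np.arange(6).reshape(2, 3)

# np.ndindex yields every index tuple of the given shape, last axis fastest.
via_ndindex = list(np.ndindex(a.shape))

# The same tuples built from one range per axis; note the * unpacking.
via_product = list(itertools.product(*map(range, a.shape)))

for x, y in np.ndindex(a.shape):
    a[x, y] *= 10  # in-place update through the index pair
```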

Answer 2

See nditer:

import numpy as np
Y = np.array([3, 4, 5, 6])
with np.nditer(Y, op_flags=['readwrite']) as it:
    for y in it:
        y += 3  # in-place: writes back into Y

(Y == np.array([6, 7, 8, 9])).all()  # True

y = 3 would not work, because it only rebinds the name y; use y[...] = 3 instead (or y *= 0 followed by y += 3).
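A short sketch of the ellipsis assignment: each y yielded by nditer is a 0-d view into Y, so y[...] = writes through the view, whereas a bare y = ... would only rebind the loop variable:

```python
import numpy as np

Y = np.array([3, 4, 5, 6])
with np.nditer(Y, op_flags=['readwrite']) as it:
    for y in it:
        # Assign through the 0-d view; the right-hand side is computed first.
        y[...] = 2 * y
```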


How do I change any data type into a string in Python?

Question: How do I change any data type into a string in Python?

How can I change any data type into a string in Python?


Answer 0

myvariable = 4
mystring = str(myvariable)  # '4'

Alternatively, try repr:

mystring = repr(myvariable) # '4'

This is called “conversion” in Python, and is quite common.
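For an int, str and repr happen to give the same text, so the example above doesn't show how they differ. A quick sketch with values where they do differ:

```python
import datetime

# For strings, repr() adds quotes so the result is unambiguous.
value = 'hello'
readable = str(value)      # 'hello'
unambiguous = repr(value)  # "'hello'"

# For many objects, repr() looks like the expression that recreates them.
d = datetime.date(2018, 1, 11)
date_str = str(d)    # '2018-01-11'
date_repr = repr(d)  # 'datetime.date(2018, 1, 11)'
```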


Answer 1

str is meant to produce a string representation of the object’s data. If you’re writing your own class and you want str to work for you, add:

def __str__(self):
    return "Some descriptive string"

print(str(myObj)) will call myObj.__str__().

repr is a similar function that generally produces a more detailed, unambiguous representation aimed at developers. For objects that don’t define their own __repr__, it produces the class name (and sometimes extra information, such as a memory address) between angle brackets. repr is used, for example, when you just type your object into the interactive interpreter, without using print or anything else.

You can define the behavior of repr for your own objects just like you can define the behavior of str:

def __repr__(self):
    return "Some descriptive string"

Typing >>> myObj in the interactive interpreter, or calling repr(myObj), will result in a call to myObj.__repr__().
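Putting both hooks together, a minimal sketch with a hypothetical Point class (the class name and fields are illustrative, not from the answer):

```python
class Point:
    """Hypothetical example class; the name and fields are illustrative."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Used by str() and print(): a friendly, human-readable form.
        return f"({self.x}, {self.y})"

    def __repr__(self):
        # Used by repr() and the interactive prompt: unambiguous, eval()-able here.
        return f"Point({self.x!r}, {self.y!r})"


p = Point(1, 2)
print(p)        # prints (1, 2)
print(repr(p))  # prints Point(1, 2)
print([p])      # containers use repr() of their items: prints [Point(1, 2)]
```

If a class defines only __repr__, str() falls back to it, because the default object.__str__ calls repr().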


Answer 2

I see all answers recommend using str(object). In Python 2, this can fail if your object contains non-ASCII characters, and you will see an error like ordinal not in range(128). This was the case for me while I was converting a list of strings in a language other than English.

I resolved it by using unicode(object) (Python 2 only; in Python 3, str is already Unicode).


Answer 3

str(object) will do the trick.

If you want to alter the way an object is stringified, define a __str__(self) method for the object’s class. This method has to return a str object (Python 2 also accepts a unicode object).


Answer 4

Use the str built-in:

x = str(something)

Examples:

>>> str(1)
'1'
>>> str(1.0)
'1.0'
>>> str([])
'[]'
>>> str({})
'{}'

...

From the documentation:

Return a string containing a nicely printable representation of an object. For strings, this returns the string itself. The difference with repr(object) is that str(object) does not always attempt to return a string that is acceptable to eval(); its goal is to return a printable string. If no argument is given, returns the empty string, ''.


Answer 5

str(x)。但是,每种数据类型都可以定义自己的字符串转换,因此这可能不是您想要的。

With str(x). However, every data type can define its own string conversion, so this might not be what you want.


Answer 6

You can use %s formatting, like below:

>>> "%s" %([])
'[]'

Answer 7

Just use str – for example:

>>> str([])
'[]'

Answer 8

Use formatting:

"%s" % (x,)

Example:

import time; x = time.ctime(); s = "%s" % (x,); print(s)

Output: Thu Jan 11 20:40:05 2018
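Writing the argument as a one-element tuple, "%s" % (x,), matters when x might itself be a tuple, because % unpacks a tuple right operand into separate arguments. A small sketch of the difference:

```python
import time

x = time.ctime()
s = "%s" % (x,)  # one-element tuple: x is the single argument for %s

t = (1, 2)
as_text = "%s" % (t,)  # the whole tuple is one value: '(1, 2)'

error = None
try:
    "%s" % t  # the tuple is unpacked into two arguments for a single %s
except TypeError as exc:
    error = exc
```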


Answer 9

Be careful if you really want to “change” the data type. As in other cases (e.g. changing the iterator in a for loop), this might bring up unexpected behaviour; for example, len() now counts characters instead of elements:

>>> dct = {1: 3, 2: 1}
>>> len(str(dct))
12
>>> print(str(dct))
{1: 3, 2: 1}
>>> l = ["all", "colours"]
>>> len(str(l))
18

Python unittests in Jenkins?

Question: Python unittests in Jenkins?

How do you get Jenkins to execute Python unittest cases? Is it possible to get JUnit-style XML output from the built-in unittest package?


Answer 0

Sample tests:

tests.py:

# tests.py

import random
try:
    import unittest2 as unittest
except ImportError:
    import unittest

class SimpleTest(unittest.TestCase):
    @unittest.skip("demonstrating skipping")
    def test_skipped(self):
        self.fail("shouldn't happen")

    def test_pass(self):
        self.assertEqual(10, 7 + 3)

    def test_fail(self):
        self.assertEqual(11, 7 + 3)

JUnit with pytest

Run the tests with:

py.test --junitxml results.xml tests.py

results.xml:

<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="1" name="pytest" skips="1" tests="2" time="0.097">
    <testcase classname="tests.SimpleTest" name="test_fail" time="0.000301837921143">
        <failure message="test failure">self = &lt;tests.SimpleTest testMethod=test_fail&gt;

    def test_fail(self):
&gt;       self.assertEqual(11, 7 + 3)
E       AssertionError: 11 != 10

tests.py:16: AssertionError</failure>
    </testcase>
    <testcase classname="tests.SimpleTest" name="test_pass" time="0.000109910964966"/>
    <testcase classname="tests.SimpleTest" name="test_skipped" time="0.000164031982422">
        <skipped message="demonstrating skipping" type="pytest.skip">/home/damien/test-env/lib/python2.6/site-packages/_pytest/unittest.py:119: Skipped: demonstrating skipping</skipped>
    </testcase>
</testsuite>

JUnit with nose

Run the tests with:

nosetests --with-xunit

nosetests.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="nosetests" tests="3" errors="0" failures="1" skip="1">
    <testcase classname="tests.SimpleTest" name="test_fail" time="0.000">
        <failure type="exceptions.AssertionError" message="11 != 10">
            <![CDATA[Traceback (most recent call last):
File "/opt/python-2.6.1/lib/python2.6/site-packages/unittest2-0.5.1-py2.6.egg/unittest2/case.py", line 340, in run
testMethod()
File "/home/damien/tests.py", line 16, in test_fail
self.assertEqual(11, 7 + 3)
File "/opt/python-2.6.1/lib/python2.6/site-packages/unittest2-0.5.1-py2.6.egg/unittest2/case.py", line 521, in assertEqual
assertion_func(first, second, msg=msg)
File "/opt/python-2.6.1/lib/python2.6/site-packages/unittest2-0.5.1-py2.6.egg/unittest2/case.py", line 514, in _baseAssertEqual
raise self.failureException(msg)
AssertionError: 11 != 10
]]>
        </failure>
    </testcase>
    <testcase classname="tests.SimpleTest" name="test_pass" time="0.000"></testcase>
    <testcase classname="tests.SimpleTest" name="test_skipped" time="0.000">
        <skipped type="nose.plugins.skip.SkipTest" message="demonstrating skipping">
            <![CDATA[SkipTest: demonstrating skipping
]]>
        </skipped>
    </testcase>
</testsuite>

JUnit with nose2

You would need to use the nose2.plugins.junitxml plugin. You can configure nose2 with a config file like you would normally do, or with the --plugin command-line option.

Run the tests with:

nose2 --plugin nose2.plugins.junitxml --junit-xml tests

nose2-junit.xml:

<testsuite errors="0" failures="1" name="nose2-junit" skips="1" tests="3" time="0.001">
  <testcase classname="tests.SimpleTest" name="test_fail" time="0.000126">
    <failure message="test failure">Traceback (most recent call last):
  File "/Users/damien/Work/test2/tests.py", line 18, in test_fail
    self.assertEqual(11, 7 + 3)
AssertionError: 11 != 10
</failure>
  </testcase>
  <testcase classname="tests.SimpleTest" name="test_pass" time="0.000095" />
  <testcase classname="tests.SimpleTest" name="test_skipped" time="0.000058">
    <skipped />
  </testcase>
</testsuite>

JUnit with unittest-xml-reporting

Append the following to tests.py:

if __name__ == '__main__':
    import xmlrunner
    unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports'))

Run the tests with:

python tests.py

test-reports/TEST-SimpleTest-20131001140629.xml:

<?xml version="1.0" ?>
<testsuite errors="1" failures="0" name="SimpleTest-20131001140629" tests="3" time="0.000">
    <testcase classname="SimpleTest" name="test_pass" time="0.000"/>
    <testcase classname="SimpleTest" name="test_fail" time="0.000">
        <error message="11 != 10" type="AssertionError">
<![CDATA[Traceback (most recent call last):
  File "tests.py", line 16, in test_fail
    self.assertEqual(11, 7 + 3)
AssertionError: 11 != 10
]]>     </error>
    </testcase>
    <testcase classname="SimpleTest" name="test_skipped" time="0.000">
        <skipped message="demonstrating skipping" type="skip"/>
    </testcase>
    <system-out>
<![CDATA[]]>    </system-out>
    <system-err>
<![CDATA[]]>    </system-err>
</testsuite>

Answer 1

I would second using nose. Basic XML reporting is now built in. Just use the --with-xunit command-line option and it will produce a nosetests.xml file. For example:

nosetests --with-xunit

Then add a “Publish JUnit test result report” post-build action, and fill in the “Test report XMLs” field with nosetests.xml (assuming that you ran nosetests in $WORKSPACE).


Answer 2

You can install the unittest-xml-reporting package to add a test runner that generates XML to the built-in unittest.

We use pytest, which has XML output built in (the --junitxml command-line option).

Either way, executing the unit tests can be done by running a shell command.


Answer 3

I used nosetests. There are add-ons to output the XML for Jenkins.


Answer 4

When using buildout, we use collective.xmltestreport to produce JUnit-style XML output; perhaps its source code or the module itself could be of help.


Answer 5

python -m pytest --junit-xml=pytest_unit.xml source_directory/test/unit || true # tests may fail

Run this as a shell step from Jenkins. You can then collect the report in pytest_unit.xml as an artifact.
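Whichever runner produces the file, it can help to check locally that the report contains what you expect before Jenkins picks it up. The following is a minimal sketch that uses only the standard library's xml.etree.ElementTree. The name summarize_junit is hypothetical. The sketch assumes the element names used in the sample reports above (testcase, failure, error, skipped). It also accepts a testsuites wrapper element, which some runners (recent pytest versions, for example) put around the suite.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Count test outcomes in a JUnit-style XML report (a sketch, not a full parser)."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <testsuite> root or a <testsuites> wrapper around several suites.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    counts = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for case in suite.iter("testcase"):
            counts["tests"] += 1
            # A testcase carries at most one outcome child; no child means it passed.
            if case.find("failure") is not None:
                counts["failures"] += 1
            elif case.find("error") is not None:
                counts["errors"] += 1
            elif case.find("skipped") is not None:
                counts["skipped"] += 1
    return counts
```

For example, `summarize_junit(open('pytest_unit.xml', 'rb').read())` should report the same totals that Jenkins shows for that build.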


How to include package data with setuptools/distribute?

Question: How to include package data with setuptools/distribute?

When using setuptools, I cannot get the installer to pull in any package_data files. Everything I’ve read says that the following is the correct way to do it. Can someone please advise?

setup(
   name='myapp',
   packages=find_packages(),
   package_data={
      'myapp': ['data/*.txt'],
   },
   include_package_data=True,
   zip_safe=False,
   install_requires=['distribute'],
)

where myapp/data/ is the location of the data files.
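Here is how the mapping in this question works. Each key of package_data is a package name, and each pattern is a glob evaluated relative to that package's directory. So 'data/*.txt' under 'myapp' means myapp/data/*.txt. The sketch below only imitates that relative-glob idea with the standard library's glob module. resolve_package_data is a hypothetical helper, not setuptools' own code, and it ignores details such as exclude_package_data.

```python
import glob
import os


def resolve_package_data(package_dir, patterns):
    """Expand package_data-style globs relative to one package directory (sketch)."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):  # directories matched by a pattern are skipped
                # Report paths relative to the package, the way package_data entries are written.
                found.add(os.path.relpath(path, package_dir))
    return sorted(found)
```

Run from the project root, `resolve_package_data('myapp', ['data/*.txt'])` should list the files the pattern covers. An empty list is a quick sign that the pattern or the directory layout is off.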


Answer 0

I realize that this is an old question, but for people finding their way here via Google: package_data is a low-down, dirty lie. It is only used when building binary packages (python setup.py bdist ...) but not when building source packages (python setup.py sdist ...). This is, of course, ridiculous: one would expect that building a source distribution would result in a collection of files that could be sent to someone else to build the binary distribution.

In any case, using MANIFEST.in will work both for binary and for source distributions.


Answer 1

I just had this same issue. The solution was simply to remove include_package_data=True.

After reading here, I realized that include_package_data aims to include files from version control, as opposed to merely “include package data” as the name implies. From the docs:

The data files [of include_package_data] must be under CVS or Subversion control

If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude them from installation), then you can also use the package_data keyword.

Taking that argument out fixed it, which is coincidentally why it also worked when you switched to distutils, since it doesn’t take that argument.


Answer 2

Following @Joe’s recommendation to remove the include_package_data=True line also worked for me.

To elaborate a bit more, I have no MANIFEST.in file. I use Git and not CVS.

The repository has this shape:

/myrepo
    - .git/
    - setup.py
    - myproject
        - __init__.py
        - some_mod
            - __init__.py
            - animals.py
            - rocks.py
        - config
            - __init__.py
            - settings.py
            - other_settings.special
            - cool.huh
            - other_settings.xml
        - words
            - __init__.py
            - word_set.txt

setup.py:

from setuptools import setup, find_packages
import os.path

setup(
    name='myproject',
    version='4.19',
    packages=find_packages(),
    # package_dir={'mypkg': 'src/mypkg'},  # didn't use this.
    package_data={
        # If any package contains *.txt, *.xml, *.special or *.huh files, include them:
        '': ['*.txt', '*.xml', '*.special', '*.huh'],
    },

    # Oddly enough, include_package_data=True prevented package_data from working.
    # include_package_data=True,  # Commented out.
    data_files=[
        # ('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
        ('/opt/local/myproject/etc', ['myproject/config/settings.py', 'myproject/config/other_settings.special']),
        ('/opt/local/myproject/etc', [os.path.join('myproject/config', 'cool.huh')]),
        ('/opt/local/myproject/etc', [os.path.join('myproject/config', 'other_settings.xml')]),
        ('/opt/local/myproject/data', [os.path.join('myproject/words', 'word_set.txt')]),
    ],

    # 'logging' is part of the standard library, so it does not need to be listed here.
    install_requires=['jsonschema'],

    entry_points={
        'console_scripts': [
            # Blah...
        ],
    },
)

I run python setup.py sdist for a source distribution (haven’t tried binary).

Then, inside a brand-new virtual environment, I take the resulting myproject-4.19.tar.gz file and run:

(venv) pip install ~/myproject-4.19.tar.gz
...

Besides everything being installed into my virtual environment’s site-packages, those special data files are installed to /opt/local/myproject/data and /opt/local/myproject/etc.


Answer 3

include_package_data=True worked for me.

If you use Git, remember to include setuptools-git in install_requires. It is far less tedious than maintaining a MANIFEST.in or listing every path in package_data (in my case it’s a Django app with all kinds of static files).

(I pasted the comment I made, since k3-rnc mentioned it’s actually helpful as is.)


Answer 4

Update: This answer is old and the information is no longer valid. All setup.py configs should use import setuptools. I’ve added a more complete answer at https://stackoverflow.com/a/49501350/64313


I solved this by switching to distutils. Looks like distribute is deprecated and/or broken.

from distutils.core import setup

setup(
   name='myapp',
   packages=['myapp'],
   package_data={
      'myapp': ['data/*.txt'],
   },
)

Answer 5

Ancient question, and yet… Python’s package management really leaves a lot to be desired. I had the use case of installing locally with pip into a specified directory, and was surprised that neither the package_data nor the data_files paths worked out. I was not keen on adding yet another file to the repo, so I ended up leveraging data_files and the setup.py option --install-data; something like this:

pip install . --install-option="--install-data=$PWD/package" -t package  

Answer 6

Moving the folder containing the package data into the module folder solved the problem for me.

See this question: MANIFEST.in ignored on “python setup.py install” – no data files installed?


Answer 7

I had the same problem for a couple of days, and even this thread wasn’t able to help me because everything was confusing. So I did my research and found the following solution:

Basically in this case, you should do:

from setuptools import setup

setup(
   name='myapp',
   packages=['myapp'],
   package_dir={'myapp':'myapp'}, # the one line where all the magic happens
   package_data={
      'myapp': ['data/*.txt'],
   },
)

The full explanation is in another Stack Overflow answer.


Answer 8

Just remove the line:

include_package_data=True,

from your setup script, and it will work fine. (Tested just now with latest setuptools.)


Answer 9

Using setup.cfg (setuptools ≥ 30.3.0)

Starting with setuptools 30.3.0 (released 2016-12-08), you can keep your setup.py very small and move the configuration to a setup.cfg file. With this approach, you could put your package data in an [options.package_data] section:

[options.package_data]
* = *.txt, *.rst
hello = *.msg

In this case, your setup.py can be as short as:

from setuptools import setup
setup()

For more information, see configuring setup using setup.cfg files.

There is some talk of deprecating setup.cfg in favour of pyproject.toml as proposed in PEP 518, but this is still provisional as of 2020-02-21.
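To see how the [options.package_data] section above maps onto the package_data dictionary used in setup.py, here is a rough sketch using the standard library's configparser. parse_package_data is a hypothetical helper, not setuptools' own parser. It assumes comma- or newline-separated glob lists. It also maps the * key to '' (all packages), which matches how setuptools interprets that key.

```python
import configparser


def parse_package_data(cfg_text):
    """Convert an [options.package_data] section into a package_data-style dict (sketch)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep package names case-sensitive (configparser lowercases by default)
    parser.read_string(cfg_text)
    package_data = {}
    for package, value in parser.items("options.package_data"):
        # Patterns may be separated by commas or by newlines.
        patterns = [p.strip() for p in value.replace("\n", ",").split(",") if p.strip()]
        # '*' means "every package"; in a setup.py dict the same thing is spelled ''.
        package_data["" if package == "*" else package] = patterns
    return package_data
```

Applied to the three-line section above, this gives the equivalent of `package_data={'': ['*.txt', '*.rst'], 'hello': ['*.msg']}` in setup.py.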


Where does this come from: -*- coding: utf-8 -*-

Question: Where does this come from: -*- coding: utf-8 -*-

Python recognizes the following as an instruction that defines the file’s encoding:

# -*- coding: utf-8 -*-

I have definitely seen this kind of instruction before (-*- var: value -*-). Where does it come from? What is the full specification? For example, can the value include spaces, special symbols, newlines, or even -*- itself?

My program will be writing plain text files and I’d like to include some metadata in them using this format.


Answer 0

This way of specifying the encoding of a Python file comes from PEP 0263 – Defining Python Source Code Encodings.

It is also recognized by GNU Emacs (see Python Language Reference, 2.1.4 Encoding declarations), though I don’t know if it was the first program to use that syntax.
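Python matches the declaration with a regular expression. That is why the Emacs-style -*- decoration is optional and a Vim-style fileencoding=... comment also works. In the small sketch below, CODING_RE copies the pattern given in PEP 263, and cookie_encoding is a hypothetical helper. tokenize.detect_encoding is the standard library function that applies these rules, including the UTF-8 default, to the first two lines of a file.

```python
import io
import re
import tokenize

# Pattern from PEP 263; it must match a comment on line 1 or 2 of the file.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def cookie_encoding(line):
    """Return the encoding named in a coding-cookie line, or None if there is none."""
    match = CODING_RE.match(line)
    return match.group(1).decode("ascii") if match else None


def detected_encoding(source):
    """Ask the tokenizer which encoding Python would use for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding
```

For example, `detected_encoding(b"# -*- coding: latin-1 -*-\n")` returns the normalized name 'iso-8859-1'. Source without a cookie falls back to 'utf-8' on Python 3.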


Answer 1

# -*- coding: utf-8 -*- is a Python 2 thing. In Python 3+, the default encoding of source files is already UTF-8 and that line is useless.

See: Should I use encoding declaration in Python 3?

pyupgrade is a tool you can run on your code to remove those comments and other no-longer-useful leftovers from Python 2, like having all your classes inherit from object.


Answer 2

These are so-called file local variables, which Emacs understands and sets accordingly. See the corresponding section of the Emacs manual. You can define them either on the first line of the file (the header) or in a local variables list near the end of the file (the footer).
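For the asker's goal of writing metadata in this format, here is a deliberately simplified sketch for reading and writing a single -*- ... -*- line. parse_file_locals and format_file_locals are hypothetical helpers. Real Emacs values are Lisp objects (strings can be quoted, for instance). This sketch instead assumes plain-text values that contain neither ';' nor '-*-', and it treats a bare -*- name -*- as the mode.

```python
import re

# Text between the first pair of -*- markers on a line.
_LOCALS_RE = re.compile(r"-\*-(.*?)-\*-")


def parse_file_locals(line):
    """Parse '-*- var: value; var2: value2 -*-' metadata from one line (simplified)."""
    match = _LOCALS_RE.search(line)
    if not match:
        return {}
    body = match.group(1).strip()
    if ":" not in body:
        # A bare '-*- Lisp -*-' only names the major mode.
        return {"mode": body} if body else {}
    result = {}
    for item in body.split(";"):
        if ":" in item:
            key, value = item.split(":", 1)
            result[key.strip()] = value.strip()
    return result


def format_file_locals(variables):
    """Build a metadata line in the same format, e.g. for the top of a text file."""
    return "-*- " + "; ".join(f"{key}: {value}" for key, value in variables.items()) + " -*-"
```

Values may contain inner spaces here, because only the ends are stripped. Newlines and the -*- marker itself cannot be represented in this simplified form.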


Answer 3

In PyCharm, I’d leave it out. It turns off the UTF-8 indicator at the bottom with a warning that the encoding is hard-coded. I don’t think you need the PyCharm comment mentioned above.


How to update Python?

Question: How to update Python?

I have version 2.7 installed from early 2012. I can’t find any consensus on whether I should completely uninstall and wipe this version before putting on the latest version.

“Soft”-removing old versions? Hard-removing/wiping old versions? Installing over top?

I’ve seen somewhere a special install/upgrade process using a “segmenting” method of Python installations, keeping different versions separate and apart, but functional. Not sure if this is the standard, de facto way.

I also wonder if Revo gets too overzealous and may cause issues with wiping out still-needed remnants, like environment/PATH variables.

(Win7 x64, 32-bit Python)


Answer 0

UPDATE: 2018-07-06

This post is now nearly 5 years old! Python-2.7 will stop receiving official updates from python.org in 2020. Also, Python-3.7 has been released. Check out Python-Future on how to make your Python-2 code compatible with Python-3. For updating conda, the documentation now recommends using conda update --all in each of your conda environments to update all packages and the Python executable for that version. Also, since they changed their name to Anaconda, I don’t know if the Windows registry keys are still the same.

UPDATE: 2017-03-24

There have been no updates to Python(x,y) since June of 2015, so I think it’s safe to assume it has been abandoned.

UPDATE: 2016-11-11

As @cxw comments below, these answers are for the same bit-versions, and by bit-version I mean 64-bit vs. 32-bit. For example, these answers would apply to updating from 64-bit Python-2.7.10 to 64-bit Python-2.7.11, i.e. the same bit-version. While it is possible to install two different bit-versions of Python together, it would require some hacking, so I’ll save that exercise for the reader. If you don’t want to hack, I suggest that when switching bit-versions you remove the other bit-version first.
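Because this advice depends on matching bit-versions, it helps to confirm what the installed interpreter actually is before installing over it. Below is a small sketch using only the standard library. interpreter_summary is a hypothetical helper. struct.calcsize("P") is the size of a C pointer in bytes, so multiplying by 8 yields 32 or 64. Run it with the python.exe you intend to replace.

```python
import platform
import struct
import sys


def interpreter_summary():
    """Describe the running interpreter: full version, x.y release, and 32/64-bit build."""
    return {
        "version": platform.python_version(),     # e.g. '2.7.11'
        "release": tuple(sys.version_info[:2]),    # the x.y part, e.g. (2, 7)
        "bits": struct.calcsize("P") * 8,          # pointer size in bits: 32 or 64
        "executable": sys.executable,              # which python.exe is actually running
    }


print(interpreter_summary())
```

If "bits" differs from the installer you downloaded, you are switching bit-versions, which is the case this update warns about.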

UPDATES: 2016-05-16
  • Anaconda and MiniConda can be used with an existing Python installation by disabling the options to alter the Windows PATH and Registry. After extraction, create a symlink to conda in your bin or install conda from PyPI. Then create another symlink called conda-activate to activate in the Anaconda/Miniconda root bin folder. Now Anaconda/Miniconda is just like Ruby RVM. Just use conda-activate root to enable Anaconda/Miniconda.
  • Portable Python is no longer being developed or maintained.

TL;DR

  • if using Anaconda or Miniconda, just execute conda update --all to keep each conda environment updated;
  • for the same minor version of official Python (e.g. 2.7.5), just install over the old one (e.g. 2.7.4);
  • for a different major or minor version of official Python (e.g. 3.3), install side by side with the old one, set paths/associations to point to the dominant one (e.g. 2.7), and create a shortcut to the other (e.g. in BASH $ ln /c/Python33/python.exe python3).

The answer depends on your situation:

  1. If OP has 2.7.x and wants to install newer version of 2.7.x, then

    • if using the MSI installer from the official Python website, just install over the old version. The installer will warn you that it will remove and replace the older version. Looking in “Installed Programs” in the Control Panel before and after confirms that the old version has been replaced by the new one. Newer versions of 2.7.x are backwards compatible, so this is completely safe; therefore, IMHO, multiple versions of 2.7.x should never be necessary.
    • if building from source, then you should probably build in a fresh, clean directory, and then point your path to the new build once it passes all tests and you are confident that it has been built successfully, but you may wish to keep the old build around because building from source may occasionally have issues. See my guide for building Python x64 on Windows 7 with SDK 7.0.
    • if installing from a distribution such as Python(x,y), see their website. Python(x,y) has been abandoned. I believe that updates can be handled from within Python(x,y) with their package manager, but updates are also included on their website. I could not find a specific reference so perhaps someone else can speak to this. Similar to ActiveState and probably Enthought, Python (x,y) clearly states it is incompatible with other installations of Python:

      It is recommended to uninstall any other Python distribution before installing Python(x,y)

    • Enthought Canopy uses an MSI and will install either into Program Files\Enthought or home\AppData\Local\Enthought\Canopy\App for all users or per user respectively. Newer installations are updated by using the built in update tool. See their documentation.
    • ActiveState also uses an MSI so newer installations can be installed on top of older ones. See their installation notes.

      Other Python 2.7 Installations: On Windows, ActivePython 2.7 cannot coexist with other Python 2.7 installations (for example, a Python 2.7 build from python.org). Uninstall any other Python 2.7 installations before installing ActivePython 2.7.

    • Sage recommends that you install it into a virtual machine, and provides an Oracle VirtualBox image file that can be used for this purpose. Upgrades are handled internally by issuing the sage -upgrade command.
    • Anaconda can be updated by using the conda command:

      conda update --all
      

      Anaconda/Miniconda lets users create environments to manage multiple Python versions including Python-2.6, 2.7, 3.3, 3.4 and 3.5. The root Anaconda/Miniconda installations are currently based on either Python-2.7 or Python-3.5.

      Anaconda will likely disrupt any other Python installations. Installation uses MSI installer. [UPDATE: 2016-05-16] Anaconda and Miniconda now use .exe installers and provide options to disable Windows PATH and Registry alterations.

      Therefore Anaconda/Miniconda can be installed without disrupting existing Python installations depending on how it was installed and the options that were selected during installation. If the .exe installer is used and the options to alter Windows PATH and Registry are not disabled, then any previous Python installations will be disabled, but simply uninstalling the Anaconda/Miniconda installation should restore the original Python installation, except maybe the Windows Registry Python\PythonCore keys.

      Anaconda/Miniconda makes the following registry edits regardless of the installation options: HKCU\Software\Python\ContinuumAnalytics\ with the following keys: Help, InstallPath, Modules and PythonPath. Official Python registers these keys too, but under Python\PythonCore. Uninstallation info is also registered for Anaconda/Miniconda. Unless you select the “Register with Windows” option during installation, it doesn’t create PythonCore, so integrations like Python Tools for Visual Studio do not automatically see Anaconda/Miniconda. If the option to register Anaconda/Miniconda is enabled, then I think your existing Python Windows Registry keys will be altered and uninstallation will probably not restore them.

    • WinPython updates, I think, can be handled through the WinPython Control Panel.
    • PortablePython is no longer being developed. It had no update method. Possibly updates could be unzipped into a fresh directory and then App\lib\site-packages and App\Scripts could be copied to the new installation, but if this didn’t work then reinstalling all packages might have been necessary. Use pip list to see what packages were installed and their versions. Some were installed by PortablePython. Use easy_install pip to install pip if it wasn’t installed.
  2. If OP has 2.7.x and wants to install a different version, e.g. <=2.6.x or >=3.x.x, then installing different versions side-by-side is fine. You must choose which version of Python (if any) to associate with *.py files and which one you want on your path, although you should be able to set up shells with different paths if you use BASH. AFAIK 2.7.x is backwards compatible with 2.6.x, so IMHO side-by-side installs are not necessary. However, Python-3.x.x is not backwards compatible, so my recommendation would be to put Python-2.7 on your path and make Python-3 an optional version by creating a shortcut to its executable called python3 (this is a common setup on Linux). The official Python default install path on Windows is

    • C:\Python33 for 3.3.x (latest 2013-07-29)
    • C:\Python32 for 3.2.x
    • &c.
    • C:\Python27 for 2.7.x (latest 2013-07-29)
    • C:\Python26 for 2.6.x
    • &c.
  3. If OP is not updating Python, but merely updating packages, they may wish to look into virtualenv to keep the different versions of packages specific to their development projects separate. Pip is also a great tool to update packages. If packages use binary installers I usually uninstall the old package before installing the new one.

I hope this clears up any confusion.


Answer 1

The best solution is to install the different Python versions in multiple paths.

E.g. C:\Python27 for 2.7 and C:\Python33 for 3.3.

Read this for more info: How to run multiple Python versions on Windows


Answer 2

  • Official Python .msi installers are designed to replace:

    • any previous micro release (in x.y.z, z is “micro”) because they are guaranteed to be backward-compatible and binary-compatible
    • a “snapshot” (built from source) installation with any micro version
  • A snapshot installer is designed to replace any snapshot with a lower micro version.

(See the responsible code for 2.x and for 3.x.)

Any other versions are not necessarily compatible and are thus installed alongside the existing one. If you wish to uninstall the old version, you’ll need to do that manually. And also uninstall any 3rd-party modules you had for it:

  • If you installed any modules from bdist_wininst packages (Windows .exes), uninstall them before uninstalling the version, or the uninstaller might not work correctly if it has custom logic
  • modules installed with setuptools/pip that reside in Lib\site-packages can just be deleted afterwards
  • packages that you installed per-user, if any, reside in %APPDATA%/Python/PythonXY/site-packages and can likewise be deleted
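The replacement rule described in this answer comes down to comparing the x.y part of two version strings. The sketch below models that rule only. It is not the installer's own logic, and it ignores the snapshot case. parse_version, replaces_existing and default_install_dir are hypothetical helpers. The C:\PythonXY folder naming follows the defaults of the older official Windows installers.

```python
def parse_version(text):
    """Split an 'x.y.z' version string into integers, e.g. '2.7.5' -> (2, 7, 5)."""
    return tuple(int(part) for part in text.split("."))


def replaces_existing(installed, candidate):
    """True if candidate is a later micro release of the same x.y feature release."""
    old, new = parse_version(installed), parse_version(candidate)
    return old[:2] == new[:2] and new[2] > old[2]


def default_install_dir(version):
    r"""Default folder used by older official Windows installers, e.g. C:\Python27."""
    major, minor = parse_version(version)[:2]
    return "C:\\Python%d%d" % (major, minor)


for installed, candidate in [("2.7.4", "2.7.5"), ("2.7.5", "3.3.2")]:
    action = "replaces" if replaces_existing(installed, candidate) else "installs alongside"
    print(candidate, action, installed, "in", default_install_dir(candidate))
```

The separate folder per x.y is what lets different feature releases sit side by side without touching each other.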

Answer 3

I have always just installed the new version on top and never had any issues. Do make sure that your path is updated to point to the new version though.


Why does using from __future__ import print_function break Python2-style print? [closed]

Question: Why does using from __future__ import print_function break Python2-style print? [closed]

I am new to programming with Python, and I am trying to print with a separator and an end string, but it keeps giving me a syntax error.

I am using Python 2.7.

Here is my code:

from __future__ import print_function
import sys, os, time

for x in range(0,10):
    print x, sep=' ', end=''
    time.sleep(1)

And here is the error:

$ python2 xy.py
  File "xy.py", line 5
    print x, sep=' ', end=''
          ^
SyntaxError: invalid syntax
$

Answer 0

First of all, from __future__ import print_function needs to be the first line of code in your script (aside from some exceptions mentioned below). Second, as other answers have said, you have to use print as a function now. That’s the whole point of from __future__ import print_function: to bring the print function from Python 3 into Python 2.6+.

from __future__ import print_function

import sys, os, time

for x in range(0,10):
    print(x, sep=' ', end='')  # No need for sep here, but okay :)
    sys.stdout.flush()  # end='' writes no newline, so flush to show each number right away
    time.sleep(1)

__future__ statements need to be near the top of the file because they change fundamental things about the language, and so the compiler needs to know about them from the beginning. From the documentation:

A future statement is recognized and treated specially at compile time: Changes to the semantics of core constructs are often implemented by generating different code. It may even be the case that a new feature introduces new incompatible syntax (such as a new reserved word), in which case the compiler may need to parse the module differently. Such decisions cannot be pushed off until runtime.

The documentation also mentions that the only things that can precede a __future__ statement are the module docstring, comments, blank lines, and other future statements.
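A quick way to check the placement rule is to compile small snippets and see which ones the compiler rejects. This is a sketch that runs on Python 3 (where print is already a function, but the future statement is still accepted); the snippet strings are made up for illustration.

```python
def compiles(source):
    """Return True if `source` compiles, False if the compiler raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# A docstring, comments and blank lines may come before the future statement.
ok_header = '"""Module docstring."""\n# a comment\n\nfrom __future__ import print_function\nimport sys\n'

# Any other statement before it (here an import) is a compile-time error.
bad_header = 'import sys\nfrom __future__ import print_function\n'

# With print_function in effect, the Python 2 statement form no longer parses.
old_print = 'from __future__ import print_function\nx = 1\nprint x,\n'

print(compiles(ok_header))   # True
print(compiles(bad_header))  # False
print(compiles(old_print))   # False
```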


Getting realtime output using subprocess

Question: Getting realtime output using subprocess

I am trying to write a wrapper script for a command line program (svnadmin verify) that will display a nice progress indicator for the operation. This requires me to be able to see each line of output from the wrapped program as soon as it is output.

I figured that I'd just execute the program using subprocess.Popen with stdout=PIPE, then read each line as it came in and act on it accordingly. However, when I ran the following code, the output appeared to be buffered somewhere, causing it to appear in two chunks: lines 1 through 332, then 333 through 439 (the last line of output).

from subprocess import Popen, PIPE, STDOUT

p = Popen('svnadmin verify /var/svn/repos/config', stdout = PIPE, 
        stderr = STDOUT, shell = True)
for line in p.stdout:
    print line.replace('\n', '')

After looking at the documentation on subprocess a little, I discovered the bufsize parameter to Popen, so I tried setting bufsize to 1 (buffer each line) and 0 (no buffer), but neither value seemed to change the way the lines were being delivered.

At this point I was starting to grasp at straws, so I wrote the following output loop:

while True:
    try:
        print p.stdout.next().replace('\n', '')
    except StopIteration:
        break

but got the same result.

Is it possible to get ‘realtime’ program output of a program executed using subprocess? Is there some other option in Python that is forward-compatible (not exec*)?


Answer 0

I tried this, and for some reason while the code

for line in p.stdout:
  ...

buffers aggressively, the variant

while True:
  line = p.stdout.readline()
  if not line: break
  ...

does not. Apparently this is a known bug: http://bugs.python.org/issue3907 (The issue is now “Closed” as of Aug 29, 2018)


Answer 1

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
    print line,
p.stdout.close()
p.wait()
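On Python 3 the same pattern works apart from the print statement: the pipe yields bytes, so the sentinel stays b'' and each line is decoded before printing. A minimal sketch, assuming a hypothetical child that is just another Python interpreter (sys.executable) printing three lines with flush=True so they arrive one at a time. bufsize=1 is left out because line buffering is not supported for binary pipes on Python 3.

```python
import subprocess
import sys

# Hypothetical child process: prints three lines, flushing after each one.
cmd = [sys.executable, "-c",
       "import time\nfor i in range(3):\n    print('line', i, flush=True)\n    time.sleep(0.1)"]

def stream(cmd):
    """Yield each line of the child's stdout as soon as it is written."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # readline() returns b'' only at EOF, which ends the iter() loop.
    for line in iter(p.stdout.readline, b''):
        yield line.decode()
    p.stdout.close()
    p.wait()

collected = []
for line in stream(cmd):
    print(line, end='')  # the line already ends with a newline
    collected.append(line.rstrip())
```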

Answer 2

You can direct the subprocess output to the streams directly. Simplified example:

subprocess.run(['ls'], stderr=sys.stderr, stdout=sys.stdout)

Answer 3

You can try this:

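import subprocess
import sys

# cmd is the command to run, e.g. ['my_command', '-my_arg'] (placeholder)
process = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    universal_newlines=True  # text mode, so read(1) returns str on Python 3 as well
)

while True:
    out = process.stdout.read(1)
    # '' means EOF; stop once the process has also exited
    if out == '' and process.poll() is not None:
        break
    if out != '':
        sys.stdout.write(out)
        sys.stdout.flush()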

If you use readline instead of read, in some cases the input prompt will not be printed, because readline waits for a newline that a prompt usually doesn't have. Try it with a command that requires inline input and see for yourself.
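The point about readline can be seen with a child that prints a prompt without a trailing newline and then waits for input. A sketch under assumptions: the child is a small hypothetical Python script run with sys.executable, and the pipes are opened unbuffered (bufsize=0) so read(1) returns each byte as soon as it is written.

```python
import subprocess
import sys

# Hypothetical child: prints a prompt with no trailing newline, then waits for input.
CHILD = (
    "import sys\n"
    "sys.stdout.write('Name: ')\n"
    "sys.stdout.flush()\n"
    "name = sys.stdin.readline().strip()\n"
    "print('Hello, ' + name)\n"
)

def answer_prompt(prompt, reply):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=0,  # unbuffered binary pipes: read(1) returns as soon as a byte arrives
    )
    seen = b""
    # readline() would block here, because the prompt never ends with a newline.
    while not seen.endswith(prompt):
        ch = proc.stdout.read(1)
        if not ch:  # EOF before the prompt appeared
            break
        seen += ch
    proc.stdin.write(reply + b"\n")
    proc.stdin.close()
    rest = proc.stdout.read()  # everything after the prompt, up to EOF
    proc.stdout.close()
    proc.wait()
    return seen, rest

seen, rest = answer_prompt(b"Name: ", b"Ada")
print(seen, rest)
```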


Answer 4

Kevin McCarthy's blog post "Streaming subprocess stdin and stdout with asyncio in Python" shows how to do it with asyncio:

import asyncio
from asyncio.subprocess import PIPE
from asyncio import create_subprocess_exec


async def _read_stream(stream, callback):
    while True:
        line = await stream.readline()
        if line:
            callback(line)
        else:
            break


async def run(command):
    # command must be a list of arguments: create_subprocess_exec(*command)
    # would split a plain string into single characters.
    process = await create_subprocess_exec(
        *command, stdout=PIPE, stderr=PIPE
    )

    # Read both pipes concurrently. asyncio.wait() no longer accepts bare
    # coroutines (Python 3.11+), so gather() is used instead.
    await asyncio.gather(
        _read_stream(
            process.stdout,
            lambda x: print(
                "STDOUT: {}".format(x.decode("UTF8"))
            ),
        ),
        _read_stream(
            process.stderr,
            lambda x: print(
                "STDERR: {}".format(x.decode("UTF8"))
            ),
        ),
    )

    await process.wait()


async def main():
    await run(["docker", "build", "-t", "my-docker-image:latest", "."])


if __name__ == "__main__":
    asyncio.run(main())  # Python 3.7+

Answer 5

Real-time output issue resolved: I ran into a similar issue in Python while capturing the real-time output of a C program. Adding fflush(stdout); to my C code fixed it for me. Here is the code.

C program:

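#include <stdio.h>
#include <unistd.h>  /* sleep() */

int main(void)
{
    int count = 1;
    while (1)
    {
        printf(" Count  %d\n", count++);
        fflush(stdout);  /* push the line into the pipe now, not when the buffer fills */
        sleep(1);
    }
}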

Python program:

#!/usr/bin/python

import os, sys
import subprocess


procExe = subprocess.Popen(".//count", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

while procExe.poll() is None:
    line = procExe.stdout.readline()
    print("Print:" + line)

Output:

Print: Count  1
Print: Count  2
Print: Count  3

Answer 6

I ran into the same problem a while back. My solution was to stop iterating over the pipe and use the read method instead, which will return immediately even if your subprocess isn't finished executing.
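The answer doesn't include code, so this is only one way to get that behavior, not necessarily the author's: os.read on the pipe's file descriptor blocks until at least one byte is available (or EOF) and then returns whatever is there, up to the requested size, without waiting for a full line. The child below is a hypothetical script that writes a partial line, pauses, and finishes it.

```python
import os
import subprocess
import sys

# Hypothetical child: writes part of a line, pauses, then writes the rest.
cmd = [sys.executable, "-c",
       "import sys, time\n"
       "sys.stdout.write('50%...')\n"
       "sys.stdout.flush()\n"
       "time.sleep(0.2)\n"
       "sys.stdout.write('100%\\n')\n"]

def read_chunks(cmd, size=1024):
    """Yield raw chunks of stdout as soon as they arrive, not line by line."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = p.stdout.fileno()
    while True:
        chunk = os.read(fd, size)  # returns what is available now; b'' means EOF
        if not chunk:
            break
        yield chunk
    p.stdout.close()
    p.wait()

output = b"".join(read_chunks(cmd))
print(output.decode())
```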


Answer 7

Depending on the use case, you might also want to disable the buffering in the subprocess itself.

If the subprocess will be a Python process, you could do this before the call:

os.environ["PYTHONUNBUFFERED"] = "1"

Or alternatively pass this in the env argument to Popen.
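Passing it through env instead of changing os.environ keeps the setting local to the child. A minimal sketch; the child is a hypothetical Python one-liner that reports whether it sees the variable. For a Python child, running the interpreter with -u has the same effect.

```python
import os
import subprocess
import sys

# Copy the current environment and add PYTHONUNBUFFERED only for the child.
child_env = dict(os.environ, PYTHONUNBUFFERED="1")

p = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.environ.get('PYTHONUNBUFFERED'))"],
    stdout=subprocess.PIPE,
    env=child_env,
    universal_newlines=True,
)
out, _ = p.communicate()
print(out.strip())  # 1
```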

Otherwise, if you are on Linux/Unix, you can use the stdbuf tool. For example:

cmd = ["stdbuf", "-oL"] + cmd

See also here about stdbuf or other options.

(See also here for the same answer.)


Answer 8

I used this solution to get realtime output from a subprocess. The loop stops as soon as the process completes, so there is no need for a break statement and no risk of an infinite loop.

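import subprocess
import sys

sub_process = subprocess.Popen(my_command, close_fds=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)

while sub_process.poll() is None:
    out = sub_process.stdout.read(1)
    sys.stdout.write(out)
    sys.stdout.flush()
# The process can exit with output still in the pipe, so write out the rest.
sys.stdout.write(sub_process.stdout.read())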

Answer 9

Found this “plug-and-play” function here. Worked like a charm!

import subprocess
import sys

def myrun(cmd):
    """from http://blog.kagesenshi.org/2008/02/teeing-python-subprocesspopen-output.html
    """
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)  # text mode: readline() returns str
    stdout = []
    while True:
        line = p.stdout.readline()
        stdout.append(line)
        sys.stdout.write(line)  # echo each line as it arrives (Python 2 and 3)
        if line == '' and p.poll() is not None:
            break
    return ''.join(stdout)

Answer 10

You may use an iterator over each byte in the output of the subprocess. This allows inline update (lines ending with ‘\r’ overwrite previous output line) from the subprocess:

import sys
from subprocess import PIPE, STDOUT, Popen

command = ["my_command", "-my_arg"]

# Open pipe to subprocess (stderr merged into stdout so an unread pipe can't block it)
process = Popen(command, stdout=PIPE, stderr=STDOUT)


# read each byte of subprocess output; read(1) returns b'' only at EOF
for c in iter(lambda: process.stdout.read(1), b''):
    sys.stdout.write(c.decode('ascii', errors='replace'))
    sys.stdout.flush()  # flush every byte so '\r' updates show up immediately

process.wait()
if process.returncode != 0:
    raise Exception("The subprocess did not terminate correctly.")

Answer 11

In Python 3.x the loop might never finish, because the output is bytes rather than a string, so a check like realtime_output == '' is never true. Make sure you decode it into a string.

Starting from Python 3.6 you can do this with the encoding parameter of the Popen constructor. A complete example:

import subprocess

process = subprocess.Popen(
    'my_command',
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    shell=True,
    encoding='utf-8',
    errors='replace'
)

while True:
    realtime_output = process.stdout.readline()

    if realtime_output == '' and process.poll() is not None:
        break

    if realtime_output:
        print(realtime_output.strip(), flush=True)

Note that this code redirects stderr to stdout and handles output errors.


Answer 12

Using pexpect (http://www.noah.org/wiki/Pexpect) with non-blocking readlines will resolve this problem. It stems from buffering: when a program writes to a pipe rather than a terminal, its output is usually block-buffered, so you can't get to that output until the buffer fills or the process dies. pexpect runs the program in a pseudo-terminal, so the program typically buffers its output line by line, as it would interactively.


Answer 13

Complete solution:

import contextlib
import subprocess

# Unix, Windows and old Macintosh end-of-line
newlines = ['\n', '\r\n', '\r']
def unbuffered(proc, stream='stdout'):
    stream = getattr(proc, stream)
    with contextlib.closing(stream):
        while True:
            out = []
            last = stream.read(1)
            # Don't loop forever
            if last == '' and proc.poll() is not None:
                break
            while last not in newlines:
                # Don't loop forever
                if last == '' and proc.poll() is not None:
                    break
                out.append(last)
                last = stream.read(1)
            out = ''.join(out)
            yield out

def example():
    cmd = ['ls', '-l', '/']
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        # Make all end-of-lines '\n'
        universal_newlines=True,
    )
    for line in unbuffered(proc):
        print(line)  # works on Python 2 and 3

example()

Answer 14

This is the basic skeleton that I always use for this. It makes it easy to implement timeouts and is able to deal with inevitable hanging processes.

import subprocess
import sys
import threading
try:
    import queue as Queue  # Python 3
except ImportError:
    import Queue  # Python 2

def t_read_stdout(process, queue):
    """Read from stdout"""

    # In text mode readline() returns '' at EOF
    for output in iter(process.stdout.readline, ''):
        queue.put(output)

    return

process = subprocess.Popen(['dir'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,
                           bufsize=1,
                           cwd='C:\\',
                           shell=True,
                           universal_newlines=True)  # text mode: lines are str on Python 3 too

queue = Queue.Queue()
t_stdout = threading.Thread(target=t_read_stdout, args=(process, queue))
t_stdout.daemon = True
t_stdout.start()

# Keep going until the reader has hit EOF and every queued line is printed
while t_stdout.is_alive() or not queue.empty():
    try:
        output = queue.get(timeout=.5)

    except Queue.Empty:
        continue

    if not output:
        continue

    sys.stdout.write(output)

t_stdout.join()
process.wait()
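Since the skeleton mentions timeouts, here is one way that could look on Python 3: the reader thread feeds a queue, and the main loop kills the child once an overall deadline passes. A sketch only; run_with_timeout is a hypothetical helper, and the demo children are Python one-liners started via sys.executable.

```python
import queue
import subprocess
import sys
import threading
import time

def _reader(stream, q):
    """Push each line onto the queue, then None to signal EOF."""
    for line in iter(stream.readline, ''):
        q.put(line)
    q.put(None)

def run_with_timeout(cmd, timeout):
    """Return (lines, timed_out); kill the child if it runs past `timeout` seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    q = queue.Queue()
    t = threading.Thread(target=_reader, args=(proc.stdout, q), daemon=True)
    t.start()
    deadline = time.monotonic() + timeout
    lines = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            proc.kill()  # deadline passed: stop the hanging child
            proc.wait()
            return lines, True
        try:
            line = q.get(timeout=min(remaining, 0.5))
        except queue.Empty:
            continue
        if line is None:  # reader hit EOF
            proc.wait()
            return lines, False
        lines.append(line.rstrip('\n'))

fast = [sys.executable, "-c", "print('a'); print('b')"]
slow = [sys.executable, "-c", "import time; print('start', flush=True); time.sleep(30)"]

fast_result = run_with_timeout(fast, 10)
slow_result = run_with_timeout(slow, 3)
print(fast_result)  # (['a', 'b'], False)
print(slow_result)
```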

Answer 15

(This solution has been tested with Python 2.7.15.)
You just need to call sys.stdout.flush() after each line read/write:

while proc.poll() is None:
    line = proc.stdout.readline()
    sys.stdout.write(line)
    # or print(line.strip()), you still need to force the flush.
    sys.stdout.flush()

Answer 16

A few answers are specific to Python 3.x or Python 2.x; the code below works with both.

import subprocess

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout = []
while True:
    line = p.stdout.readline()
    if not isinstance(line, str):
        line = line.decode('utf-8')  # Python 3 returns bytes
    stdout.append(line)
    print(line)
    if line == '' and p.poll() is not None:
        break

List all base classes in a hierarchy of a given class?

Question: List all base classes in a hierarchy of a given class?

Given a class Foo (whether it is a new-style class or not), how do you generate all the base classes, anywhere in the inheritance hierarchy, that it is a subclass of (i.e. every base for which issubclass(Foo, base) is true)?


Answer 0

inspect.getmro(cls) works for both new and old style classes and returns the same as NewClass.mro(): a list of the class and all its ancestor classes, in the order used for method resolution.

>>> class A(object):
...     pass
...
>>> class B(A):
...     pass
...
>>> import inspect
>>> inspect.getmro(B)
(<class '__main__.B'>, <class '__main__.A'>, <type 'object'>)
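With multiple inheritance, the order getmro returns is the C3 linearization rather than a plain depth-first walk: a shared base appears only once, after all of its subclasses. A small Python 3 sketch with a hypothetical diamond of classes:

```python
import inspect

class Base: pass
class Left(Base): pass
class Right(Base): pass
class Child(Left, Right): pass

# Base appears once, after both Left and Right, followed by object.
names = [cls.__name__ for cls in inspect.getmro(Child)]
print(names)  # ['Child', 'Left', 'Right', 'Base', 'object']
```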

Answer 1

See the __bases__ attribute available on a Python class, which contains a tuple of its base classes:

>>> def classlookup(cls):
...     c = list(cls.__bases__)
...     for base in cls.__bases__:  # not c, which grows inside the loop
...         c.extend(classlookup(base))
...     return c
...
>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(object, B): pass
...
>>> classlookup(C)
[<type 'object'>, <class __main__.B at 0x00AB7300>, <class __main__.A at 0x00A6D630>]
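Because this recursion follows every path, a class reachable through two routes (a diamond) is listed more than once. A sketch of a variant that skips classes it has already seen; all_bases is a hypothetical name, it needs Python 3 (where every class has object as a base), and the result is a depth-first order, not the MRO.

```python
def all_bases(cls):
    """Return every ancestor of cls once, in depth-first order (cls itself excluded)."""
    seen = []

    def visit(c):
        for base in c.__bases__:
            if base not in seen:
                seen.append(base)
                visit(base)

    visit(cls)
    return seen

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

names = [c.__name__ for c in all_bases(D)]
print(names)  # ['B', 'A', 'object', 'C']
```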

Answer 2

inspect.getclasstree() will create a nested list of classes and their bases. Usage:

inspect.getclasstree(inspect.getmro(IOError)) # Insert your Class instead of IOError.

Answer 3

You can use the __bases__ tuple of the class object:

class B(object): pass
class C(object): pass
class A(B, C):
    def __init__(self):
       pass
print(A.__bases__)

The tuple returned by __bases__ contains the class's direct base classes (not their ancestors).

Hope it helps!
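Because __bases__ only holds the direct parents, ancestors further up appear only in __mro__ (or in a recursive walk over __bases__). A short Python 3 illustration with hypothetical classes:

```python
class A: pass
class B(A): pass
class C(B): pass

direct = [c.__name__ for c in C.__bases__]
everything = [c.__name__ for c in C.__mro__]
print(direct)      # ['B']
print(everything)  # ['C', 'B', 'A', 'object']
```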


Answer 4

In Python 3.7 you don't need to import inspect; type.mro will give you the result.

>>> class A:
...   pass
... 
>>> class B(A):
...   pass
... 
>>> type.mro(B)
[<class '__main__.B'>, <class '__main__.A'>, <class 'object'>]
>>>

Note that in Python 3.x every class inherits from the base object class.


Answer 5

According to the Python docs, we can also simply use the class.__mro__ attribute or the class.mro() method:

>>> class A:
...     pass
... 
>>> class B(A):
...     pass
... 
>>> B.__mro__
(<class '__main__.B'>, <class '__main__.A'>, <class 'object'>)
>>> A.__mro__
(<class '__main__.A'>, <class 'object'>)
>>> object.__mro__
(<class 'object'>,)
>>>
>>> B.mro()
[<class '__main__.B'>, <class '__main__.A'>, <class 'object'>]
>>> A.mro()
[<class '__main__.A'>, <class 'object'>]
>>> object.mro()
[<class 'object'>]
>>> A in B.mro()
True


Answer 6

Jochen's answer is very helpful and correct: you can obtain the class hierarchy with the getmro() function of the inspect module. It is also worth highlighting how Python's inheritance hierarchy is described:

For example:

class MyClass(YourClass):

An inheriting class

  • Child class
  • Derived class
  • Subclass

For example:

class YourClass(object):

An inherited class

  • Parent class
  • Base class
  • Superclass

One class can inherit from another. The parent class's attributes, in particular its methods, are inherited, which means that instances of an inheriting (child) class can access attributes of the inherited (parent) class.

instance -> class -> then inherited classes
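That lookup order can be observed directly: an attribute set on the instance is found before the class's, and the class's before its parent's. A small sketch with hypothetical classes:

```python
class Parent:
    greeting = "from Parent"

    def where(self):
        return "method defined on Parent"

class Child(Parent):
    greeting = "from Child"

obj = Child()
print(obj.greeting)  # from Child (found on the class before the parent)
print(obj.where())   # method defined on Parent (inherited)
obj.greeting = "from instance"
print(obj.greeting)  # from instance (the instance's own attribute is found first)
```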

Using

import inspect
inspect.getmro(MyClass)

will show you the hierarchy in Python.