Unicode (UTF-8) reading and writing to files in Python

Question: Unicode (UTF-8) reading and writing to files in Python

I’m having some brain failure in understanding reading and writing text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1n'
ss8 = ss.encode('utf8')
repr(ss), repr(ss8)

("u'Capit\xe1n'", "'Capit\xc3\xa1n'")

print ss, ss8
print >> open('f1','w'), ss8

>>> file('f1').read()
'Capit\xc3\xa1n\n'

So I type Capit\xc3\xa1n into my favorite editor and save it as file f2.

Then:

>>> open('f1').read()
'Capit\xc3\xa1n\n'
>>> open('f2').read()
'Capit\\xc3\\xa1n\n'
>>> open('f1').read().decode('utf8')
u'Capit\xe1n\n'
>>> open('f2').read().decode('utf8')
u'Capit\\xc3\\xa1n\n'

What am I not understanding here? Clearly there is some vital bit of magic (or good sense) that I’m missing. What does one type into text files to get proper conversions?

What I’m truly failing to grok here, is what the point of the UTF-8 representation is, if you can’t actually get Python to recognize it, when it comes from outside. Maybe I should just JSON dump the string, and use that instead, since that has an asciiable representation! More to the point, is there an ASCII representation of this Unicode object that Python will recognize and decode, when coming in from a file? If so, how do I get it?

>>> print simplejson.dumps(ss)
'"Capit\u00e1n"'
>>> print >> file('f3','w'), simplejson.dumps(ss)
>>> simplejson.load(open('f3'))
u'Capit\xe1n'

Answer 0

In the notation

u'Capit\xe1n\n'

the “\xe1” represents just one character. “\x” tells you that “e1” is in hexadecimal. When you write

Capit\xc3\xa1n

into your file, the file contains the literal text “\xc3”. That is four bytes, and your code reads all of them. You can see this when you display them:

>>> open('f2').read()
'Capit\\xc3\\xa1n\n'

You can see that the backslash is escaped by a backslash. So you have four bytes in your string: “\”, “x”, “c” and “3”.

Edit:

As others pointed out in their answers you should just enter the characters in the editor and your editor should then handle the conversion to UTF-8 and save it.

If you actually have a string in this format you can use the string_escape codec to decode it into a normal string:

In [15]: print 'Capit\\xc3\\xa1n\n'.decode('string_escape')
Capitán

The result is a string that is encoded in UTF-8 where the accented character is represented by the two bytes that were written \\xc3\\xa1 in the original string. If you want to have a unicode string you have to decode again with UTF-8.

To your edit: you don’t have UTF-8 in your file. To see what it would actually look like:

s = u'Capit\xe1n\n'
sutf8 = s.encode('UTF-8')
open('utf-8.out', 'w').write(sutf8)

Compare the content of the file utf-8.out to the content of the file you saved with your editor.
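
One way to make that comparison concrete is the following sketch. It is written for Python 3 (the question uses Python 2.4), and the temporary directory and the name typed_path are placeholders chosen for illustration:

```python
import os
import tempfile

text = u'Capit\xe1n\n'

workdir = tempfile.mkdtemp()
utf8_path = os.path.join(workdir, 'utf-8.out')   # real UTF-8 bytes
typed_path = os.path.join(workdir, 'f2')         # what was typed in the editor

# Write the text encoded as UTF-8.
with open(utf8_path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Write the literal characters backslash, x, c, 3, ... as plain ASCII.
with open(typed_path, 'wb') as f:
    f.write(b'Capit\\xc3\\xa1n\n')

with open(utf8_path, 'rb') as f:
    utf8_bytes = f.read()
with open(typed_path, 'rb') as f:
    typed_bytes = f.read()

print(len(utf8_bytes), utf8_bytes)    # the accented letter takes two bytes
print(len(typed_bytes), typed_bytes)  # each typed escape takes four bytes
```

The two files differ in length because the editor stored the escape sequences as ordinary characters instead of the bytes they stand for.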


Answer 1

Rather than mess with the encode and decode methods I find it easier to specify the encoding when opening the file. The io module (added in Python 2.6) provides an io.open function, which has an encoding parameter.

Use the open method from the io module.

>>> import io
>>> f = io.open("test", mode="r", encoding="utf-8")

Then after calling f’s read() function, a decoded Unicode object is returned.

>>> f.read()
u'Capit\xe1l\n\n'

Note that in Python 3, the io.open function is an alias for the built-in open function. The built-in open function only supports the encoding argument in Python 3, not Python 2.

Edit: Previously this answer recommended the codecs module. The codecs module can cause problems when mixing read() and readline(), so this answer now recommends the io module instead.

Use the open method from the codecs module.

>>> import codecs
>>> f = codecs.open("test", "r", "utf-8")

Then after calling f’s read() function, a decoded Unicode object is returned.

>>> f.read()
u'Capit\xe1l\n\n'

If you know the encoding of a file, using the codecs package is going to be much less confusing.

See http://docs.python.org/library/codecs.html#codecs.open


Answer 2

Now all you need in Python3 is open(Filename, 'r', encoding='utf-8')

[Edit on 2016-02-10 for requested clarification]

Python3 added the encoding parameter to its open function. The following information about the open function is gathered from here: https://docs.python.org/3/library/functions.html#open

open(file, mode='r', buffering=-1, 
      encoding=None, errors=None, newline=None, 
      closefd=True, opener=None)

Encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

So by adding encoding='utf-8' as a parameter to the open function, all reading and writing of the file is done as UTF-8 (which is also the default encoding of Python 3 source files, although the default for open() itself remains platform dependent, as quoted above).
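
A minimal Python 3 sketch of this, assuming a throwaway file in a temporary directory (the path is a placeholder, not something from the question):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'captain.txt')  # placeholder path

# Text mode with an explicit encoding: str goes in, str comes out.
with open(path, 'w', encoding='utf-8') as f:
    f.write('Capit\xe1n\n')

with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# Binary mode shows the bytes that were actually stored on disk.
with open(path, 'rb') as f:
    raw = f.read()

print(ascii(text))
print(raw)
```

Passing the encoding explicitly keeps the result independent of whatever locale.getpreferredencoding() returns on the machine running the code.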


Answer 3

So, I’ve found a solution for what I’m looking for, which is:

print open('f2').read().decode('string-escape').decode("utf-8")

There are some unusual codecs that are useful here. This particular reading allows one to take UTF-8 representations from within Python, copy them into an ASCII file, and have them be read in to Unicode. Under the “string-escape” decode, the slashes won’t be doubled.

This allows for the sort of round trip that I was imagining.
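
The string-escape codec exists only in Python 2. A rough Python 3 counterpart of the same round trip, offered as a sketch and not as a drop-in replacement, goes through the unicode_escape and latin-1 codecs. The helper name decode_escaped_utf8 is made up for this example, and it assumes the input is ASCII text containing \xNN escapes:

```python
def decode_escaped_utf8(escaped):
    """Turn ASCII text such as 'Capit\\xc3\\xa1n' into the str it stands for."""
    # 1. Interpret the backslash escapes; each \xNN becomes one code point.
    as_code_points = escaped.encode('latin-1').decode('unicode_escape')
    # 2. Map those code points (all below 256) back to single bytes.
    as_bytes = as_code_points.encode('latin-1')
    # 3. Decode the resulting UTF-8 bytes.
    return as_bytes.decode('utf-8')


result = decode_escaped_utf8('Capit\\xc3\\xa1n\n')
print(ascii(result))
```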


Answer 4

# -*- encoding: utf-8 -*-

# Convert a file of unknown encoding to UTF-8.
# (Python 2 only: the commands module is used to call the Unix `file` tool.)

import codecs
import commands

file_location = "jumper.sub"
file_encoding = commands.getoutput('file -b --mime-encoding %s' % file_location)

file_stream = codecs.open(file_location, 'r', file_encoding)
file_output = codecs.open(file_location+"b", 'w', 'utf-8')

for l in file_stream:
    file_output.write(l)

file_stream.close()
file_output.close()

Answer 5

Actually, this worked for me for reading a file with UTF-8 encoding in Python 3.2:

import codecs
f = codecs.open('file_name.txt', 'r', 'UTF-8')
for line in f:
    print(line)

Answer 6

To read in a Unicode string and then send it to HTML, I did this:

fileline.decode("utf-8").encode('ascii', 'xmlcharrefreplace')

Useful for Python-powered HTTP servers.
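
As a small illustration of what that expression produces, here is a Python 3 sketch in which the variable line stands in for fileline and is assumed to hold UTF-8 bytes read from a file:

```python
line = b'Capit\xc3\xa1n'   # UTF-8 bytes, standing in for fileline

# Decode to text, then re-encode as ASCII; anything outside ASCII
# is replaced by a numeric character reference that HTML understands.
html_safe = line.decode('utf-8').encode('ascii', 'xmlcharrefreplace')
print(html_safe)
```

The accented letter ends up as &#225;, so the output is plain ASCII regardless of the encoding the page is later served with.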


Answer 7

You have stumbled over the general problem with encodings: How can I tell which encoding a file is in?

Answer: You can’t unless the file format provides for this. XML, for example, begins with:

<?xml version="1.0" encoding="utf-8"?>

This header was carefully chosen so that it can be read no matter the encoding. In your case, there is no such hint, hence neither your editor nor Python has any idea what is going on. Therefore, you must use the codecs module and use codecs.open(path,mode,encoding) which provides the missing bit in Python.

As for your editor, you must check if it offers some way to set the encoding of a file.

The point of UTF-8 is to be able to encode 21-bit characters (Unicode) as an 8-bit data stream (because that’s the only thing all computers in the world can handle). But since most OSs predate the Unicode era, they don’t have suitable tools to attach the encoding information to files on the hard disk.

The next issue is the representation in Python. This is explained perfectly in the comment by heikogerlach. You must understand that your console can only display ASCII. In order to display Unicode or anything >= charcode 128, it must use some means of escaping. In your editor, you must not type the escaped display string but what the string means (in this case, you must enter the accented character á itself and save the file).

That said, you can use the Python function eval() to turn an escaped string into a string:

>>> x = eval("'Capit\\xc3\\xa1n\\n'")
>>> x
'Capit\xc3\xa1n\n'
>>> x[5]
'\xc3'
>>> len(x[5])
1

As you can see, the string “\xc3” has been turned into a single character. This is now an 8-bit string, UTF-8 encoded. To get Unicode:

>>> x.decode('utf-8')
u'Capit\xe1n\n'

Gregg Lind asked: I think there are some pieces missing here: the file f2 contains: hex:

0000000: 4361 7069 745c 7863 335c 7861 316e  Capit\xc3\xa1n

codecs.open('f2', 'rb', 'utf-8'), for example, reads them all in as separate chars (as expected). Is there any way to write to a file in ASCII that would work?

Answer: That depends on what you mean. ASCII can’t represent characters > 127. So you need some way to say “the next few characters mean something special” which is what the sequence “\x” does. It says: The next two characters are the code of a single character. “\u” does the same using four characters to encode Unicode up to 0xFFFF (65535).

So you can’t directly write Unicode to ASCII (because ASCII simply doesn’t contain the same characters). You can write it as string escapes (as in f2); in this case, the file can be represented as ASCII. Or you can write it as UTF-8, in which case, you need an 8-bit safe stream.
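
Both ASCII-only options can be sketched in Python 3 as follows. The unicode_escape codec and json.dumps are standard-library features; the variable names are just for illustration:

```python
import json

text = 'Capit\xe1n'

# Python-style escapes: the result is pure ASCII and decodes back losslessly.
escaped = text.encode('unicode_escape')
restored = escaped.decode('unicode_escape')

# JSON-style escapes, as in the question's simplejson example.
as_json = json.dumps(text)
from_json = json.loads(as_json)

print(escaped)   # ASCII-only bytes
print(as_json)   # ASCII-only str
```

Either form survives any ASCII-safe channel, at the cost of needing the matching decode step when the file is read back.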

Your solution using decode('string-escape') does work, but you must be aware how much memory you use: Three times the amount of using codecs.open().

Remember that a file is just a sequence of 8-bit bytes. Neither the bits nor the bytes have a meaning. It’s you who says “65 means ‘A’”. Since \xc3\xa1 should become “á” but the computer has no means to know, you must tell it by specifying the encoding which was used when writing the file.
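
A short Python 3 sketch of that point: the same bytes give different text depending on which encoding the reader is told to use.

```python
data = b'Capit\xc3\xa1n'

as_utf8 = data.decode('utf-8')      # two bytes become one character
as_latin1 = data.decode('latin-1')  # each byte becomes its own character

print(ascii(as_utf8), len(as_utf8))
print(ascii(as_latin1), len(as_latin1))
```

Neither call fails; only the first yields the intended word, which is why the encoding has to be known and not guessed.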


Answer 8

Besides codecs.open(), one can use io.open(), which works in both Python 2 and Python 3, to read and write Unicode files.

Example:

import io

text = u'á'
encoding = 'utf8'

with io.open('data.txt', 'w', encoding=encoding, newline='\n') as fout:
    fout.write(text)

with io.open('data.txt', 'r', encoding=encoding, newline='\n') as fin:
    text2 = fin.read()

assert text == text2

Answer 9

Well, your favorite text editor does not realize that \xc3\xa1 are supposed to be character literals, but it interprets them as text. That’s why you get the double backslashes in the last line — it’s now a real backslash + xc3, etc. in your file.

If you want to read and write encoded files in Python, best use the codecs module.

Pasting text between the terminal and applications is difficult, because you don’t know which program will interpret your text using which encoding. You could try the following:

>>> s = file("f1").read()
>>> print unicode(s, "Latin-1")
Capitán

Then paste this string into your editor and make sure that it stores it using Latin-1. Under the assumption that the clipboard does not garble the string, the round trip should work.


Answer 10

The \x.. sequence is something that’s specific to Python. It’s not a universal byte escape sequence.

How you actually enter UTF-8-encoded non-ASCII characters depends on your OS and/or your editor. Here’s how you do it in Windows. On OS X, to enter an a with an acute accent you can just hit option + E, then A, and almost all text editors on OS X support UTF-8.


Answer 11

You can also improve the original open() function to work with Unicode files by replacing it in place, using the partial function. The beauty of this solution is you don’t need to change any old code. It’s transparent.

import codecs
import functools
open = functools.partial(codecs.open, encoding='utf-8')
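
In Python 3 the built-in open() already accepts an encoding argument, so the same idea can be sketched without codecs. The name open_utf8 and the temporary path are hypothetical, and the wrapper is bound to a new name here instead of replacing open:

```python
import functools
import os
import tempfile

# Hypothetical name; binding it as `open` would shadow the built-in instead.
open_utf8 = functools.partial(open, encoding='utf-8')

path = os.path.join(tempfile.mkdtemp(), 'partial-demo.txt')

with open_utf8(path, 'w') as f:
    f.write('Capit\xe1n')

with open_utf8(path) as f:
    content = f.read()

print(ascii(content))
```

Binary modes such as 'rb' reject an encoding argument, so a wrapper like this only suits code that opens text files.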

Answer 12

I was trying to parse iCal using Python 2.7.9:

from icalendar import Calendar

But I was getting:

 Traceback (most recent call last):
 File "ical.py", line 92, in parse
    print "{}".format(e[attr])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 7: ordinal not in range(128)

and it was fixed with just:

print "{}".format(e[attr].encode("utf-8"))

(Now it can print liké á böss.)
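
The traceback comes from an implicit ASCII encode of a unicode value in Python 2. The sketch below, written for Python 3, makes that step explicit to show both the failure and the fix; the sample string is made up:

```python
value = 'lik\xe9 \xe1 b\xf6ss'

# Encoding with the ASCII codec fails on the first accented character.
try:
    value.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding with UTF-8 succeeds for any text.
encoded = value.encode('utf-8')

print(type(error).__name__, error.start)
print(encoded)
```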


Answer 13

I found the most simple approach by changing the default encoding of the whole script to be ‘UTF-8’:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

Any open, print or other statement will then just use utf8.

Works at least for Python 2.7.9.

Thanks go to https://markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/ (look at the end).


List attributes of an object

Question: List attributes of an object

Is there a way to grab a list of attributes that exist on instances of a class?

class new_class():
    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)

a = new_class(2)
print(', '.join(a.SOMETHING))

The desired result is that “multi, str” will be output. I want this to see the current attributes from various parts of a script.


Answer 0

>>> class new_class():
...   def __init__(self, number):
...     self.multi = int(number) * 2
...     self.str = str(number)
... 
>>> a = new_class(2)
>>> a.__dict__
{'multi': 4, 'str': '2'}
>>> a.__dict__.keys()
dict_keys(['multi', 'str'])

You may also find pprint helpful.
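
For instance, a minimal sketch using the class from the question (pprint and pformat are from the standard library):

```python
from pprint import pformat, pprint


class new_class():
    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)


a = new_class(2)

pprint(vars(a))               # prints the attribute dict, keys sorted
formatted = pformat(vars(a))  # same text as a string, e.g. for logging
```

For objects with many attributes, pprint breaks the dictionary over several lines, which is where it becomes more readable than a plain print.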


Answer 1

dir(instance)
# or (same value)
instance.__dir__()
# or
instance.__dict__

You can then check each attribute’s type with type(), or test whether it is a method with callable().
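
A sketch of how those checks might be combined; the class Sample and its members are invented for the example:

```python
class Sample:
    def __init__(self):
        self.count = 3

    def bump(self):
        self.count += 1


instance = Sample()

# Names that do not start with an underscore.
public = [name for name in dir(instance) if not name.startswith('_')]

# Split them into methods and data attributes.
methods = [name for name in public if callable(getattr(instance, name))]
data = {name: type(getattr(instance, name)).__name__
        for name in public if name not in methods}

print(methods)
print(data)
```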


Answer 2

vars(obj) returns the attributes of an object.
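
A brief sketch of what that means in practice. Point and SlottedPoint are illustrative classes, the second one included to show the case where vars() has nothing to return:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
attrs = vars(p)
same_object = attrs is p.__dict__   # vars() hands back the __dict__ itself

# Objects without a __dict__ make vars() raise TypeError.
try:
    vars(SlottedPoint(1, 2))
    slotted_error = None
except TypeError as exc:
    slotted_error = exc

print(attrs, same_object, type(slotted_error).__name__)
```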


Answer 3

All previous answers are correct. You have three options for what you are asking:

  1. dir()

  2. vars()

  3. __dict__

>>> dir(a)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'multi', 'str']
>>> vars(a)
{'multi': 4, 'str': '2'}
>>> a.__dict__
{'multi': 4, 'str': '2'}

Answer 4

>>> ', '.join(i for i in dir(a) if not i.startswith('__'))
'multi, str'

This will, of course, also list any methods or attributes from the class definition. You can exclude “private” names by changing i.startswith('__') to i.startswith('_').
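
The difference between the two filters can be seen in a small sketch; the Account class and its attributes are made up for illustration:

```python
class Account:
    def __init__(self):
        self.owner = 'ann'
        self._balance = 10      # "private" by convention


acct = Account()

# Dropping only double-underscore names keeps the single-underscore one.
no_dunder = [n for n in dir(acct) if not n.startswith('__')]

# Dropping every name with a leading underscore hides it as well.
no_private = [n for n in dir(acct) if not n.startswith('_')]

print(no_dunder)
print(no_private)
```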


Answer 5

The inspect module provides easy ways to inspect an object:

The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects.


Using getmembers() you can see all attributes of your class, along with their value. To exclude private or protected attributes use .startswith('_'). To exclude methods or functions use inspect.ismethod() or inspect.isfunction().

import inspect


class NewClass(object):
    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)

    def func_1(self):
        pass


a = NewClass(2)

for i in inspect.getmembers(a):
    # Ignores anything starting with underscore 
    # (that is, private and protected attributes)
    if not i[0].startswith('_'):
        # Ignores methods
        if not inspect.ismethod(i[1]):
            print(i)

Note that ismethod() is used on the second element of i since the first is simply a string (its name).
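
getmembers() also accepts a predicate as its second argument, so the method filter can be passed in directly. A sketch of that variant, with a class attribute label added purely for illustration:

```python
import inspect


class NewClass(object):
    label = 'demo'   # class attribute, added for this example

    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)

    def func_1(self):
        pass


a = NewClass(2)

# Keep only members that are not functions, methods or built-in routines.
members = inspect.getmembers(a, lambda member: not inspect.isroutine(member))

# Then drop names starting with an underscore.
public = [(name, value) for name, value in members
          if not name.startswith('_')]

print(public)
```

inspect.isroutine() covers both functions and bound methods, which saves combining ismethod() and isfunction() by hand.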

Offtopic: Use CamelCase for class names.


Answer 6

You can use dir(your_object) to get the attributes and getattr(your_object, your_object_attr) to get their values.

Usage:

for att in dir(your_object):
    print (att, getattr(your_object,att))

This is particularly useful if your object has no __dict__. Otherwise, you can also try vars(your_object).
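
As a sketch of the no-__dict__ case, a built-in complex number works as a test subject, since such objects carry no instance dictionary:

```python
z = complex(3, 4)   # built-in objects like this one have no __dict__

values = {}
for name in dir(z):
    if name.startswith('_'):
        continue                 # skip special and private names
    value = getattr(z, name)
    if not callable(value):      # skip methods such as conjugate()
        values[name] = value

print(hasattr(z, '__dict__'))
print(values)
```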


Answer 7

It’s often mentioned that to list a complete list of attributes you should use dir(). Note however that contrary to popular belief dir() does not bring out all attributes. For example you might notice that __name__ might be missing from a class’s dir() listing even though you can access it from the class itself. From the doc on dir() (Python 2, Python 3):

Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class.

A function like the following tends to be more complete, although there’s no guarantee of completeness since the list returned by dir() can be affected by many factors including implementing the __dir__() method, or customizing __getattr__() or __getattribute__() on the class or one of its parents. See provided links for more details.

def dirmore(instance):
    visible = dir(instance)
    visible += [a for a in set(dir(type)).difference(visible)
                if hasattr(instance, a)]
    return sorted(visible)
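
The __name__ case mentioned above can be checked with a short sketch. dirmore is repeated from this answer so the block runs on its own, and Widget is an illustrative class:

```python
def dirmore(instance):
    # Repeated from the answer so that this block is self-contained.
    visible = dir(instance)
    visible += [a for a in set(dir(type)).difference(visible)
                if hasattr(instance, a)]
    return sorted(visible)


class Widget:
    pass


in_dir = '__name__' in dir(Widget)          # metaclass attribute, not listed
in_dirmore = '__name__' in dirmore(Widget)  # picked up via dir(type)
name = Widget.__name__                      # accessible all the same

print(in_dir, in_dirmore, name)
```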

Answer 8

What do you want this for? It may be hard to get you the best answer without knowing your exact intent.

  • It is almost always better to do this manually if you want to display an instance of your class in a specific way. This will include exactly what you want and not include what you don’t want, and the order will be predictable.

    If you are looking for a way to display the content of a class, manually format the attributes you care about and provide this as the __str__ or __repr__ method for your class.

  • If you want to learn about what methods and such exist for an object to understand how it works, use help. help(a) will show you a formatted output about the object’s class based on its docstrings.

  • dir exists for programmatically getting all the attributes of an object. (Accessing __dict__ does something I would group as the same but that I wouldn’t use myself.) However, this may not include things you want and it may include things you do not want. It is unreliable and people think they want it a lot more often than they do.

  • On a somewhat orthogonal note, there is very little support for Python 3 at the current time. If you are interested in writing real software you are going to want third-party stuff like numpy, lxml, Twisted, PIL, or any number of web frameworks that do not yet support Python 3 and do not have plans to any time too soon. The differences between 2.6 and the 3.x branch are small, but the difference in library support is huge.
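
The first bullet’s advice, formatting the attributes you care about by hand in __repr__, could look like the following sketch for the class in the question (the exact output format is a choice made for this example):

```python
class new_class():
    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)

    def __repr__(self):
        # List exactly the attributes of interest, in a fixed order.
        return 'new_class(multi={!r}, str={!r})'.format(self.multi, self.str)


a = new_class(2)
shown = repr(a)
print(shown)
```

Because the method names each attribute explicitly, the output stays stable even if helper attributes are added to the class later.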


Answer 9

There is more than one way to do it:

#! /usr/bin/env python3
#
# This demonstrates how to pick the attributes of an object

class C(object) :

  def __init__ (self, name="q" ):
    self.q = name
    self.m = "y?"

c = C()

print ( dir(c) )

When run, this code produces:

jeffs@jeff-desktop:~/skyset$ python3 attributes.py 
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__',      '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'm', 'q']

jeffs@jeff-desktop:~/skyset$

Answer 10

The following Python shell session, executed in sequence, yields the attributes of an instance as a comma-separated string.

>>> class new_class():
...     def __init__(self, number):
...         self.multi = int(number)*2
...         self.str = str(number)
... 
>>> a = new_class(4)
>>> ",".join(a.__dict__.keys())
'str,multi'

I am using Python 3.4.


Answer 11

In addition to these answers, I’ll include a function (python 3) for spewing out virtually the entire structure of any value. It uses dir to establish the full list of property names, then uses getattr with each name. It displays the type of every member of the value, and when possible also displays the entire member:

import json

def get_info(obj):

  type_name = type(obj).__name__
  print('Value is of type {}!'.format(type_name))
  prop_names = dir(obj)

  for prop_name in prop_names:
    prop_val = getattr(obj, prop_name)
    prop_val_type_name = type(prop_val).__name__
    print('{} has property "{}" of type "{}"'.format(type_name, prop_name, prop_val_type_name))

    try:
      val_as_str = json.dumps([ prop_val ], indent=2)[1:-1]
      print('  Here\'s the {} value: {}'.format(prop_name, val_as_str))
    except:
      pass

Now any of the following should give insight:

get_info(None)
get_info('hello')

import numpy
get_info(numpy)
# ... etc.

Answer 12

Get attributes of an object

class new_class():
    def __init__(self, number):
        self.multi = int(number) * 2
        self.str = str(number)

new_object = new_class(2)
print(dir(new_object))                   # all attribute names of new_object, including inherited ones
attr_value = new_object.__dict__
print(attr_value)                        # dictionary of instance attributes and their values

for attr in attr_value:                  # instance attribute names only
    print(attr)

Output

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__','__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'multi', 'str']

{'multi': 4, 'str': '2'}

multi
str
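
A minimal Python 3 sketch of the same idea, assuming you only want the data attributes of an instance. `vars(obj)` returns the same mapping as `obj.__dict__`, and filtering `dir(obj)` drops dunder names and methods. The class `Point` and the helper `public_data_attrs` are hypothetical names used only for illustration.

```python
class Point:
    """Hypothetical example class."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


def public_data_attrs(obj):
    """Names from dir(obj) that are neither dunder names nor callables."""
    return [
        name for name in dir(obj)
        if not name.startswith('__') and not callable(getattr(obj, name))
    ]


p = Point(1, -2)
print(vars(p))                 # same mapping as p.__dict__
print(public_data_attrs(p))    # methods such as norm1 are filtered out
```

Because `dir` returns names in sorted order, the filtered list is sorted as well.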

Answer 13

As noted earlier, obj.__dict__ handles the common cases, but some classes have no __dict__ attribute and use __slots__ instead (mostly for memory efficiency).

An example of a more resilient approach:

class A(object):
    __slots__ = ('x', 'y', )
    def __init__(self, x, y):
        self.x = x
        self.y = y


class B(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


def get_object_attrs(obj):
    try:
        return obj.__dict__
    except AttributeError:
        return {attr: getattr(obj, attr) for attr in obj.__slots__}


a = A(1,2)
b = B(1,2)
assert not hasattr(a, '__dict__')

print(get_object_attrs(a))
print(get_object_attrs(b))

This code’s output:

{'x': 1, 'y': 2}
{'x': 1, 'y': 2}

Note 1:
Python is a dynamic language, so it is always better to know the classes whose attributes you are trying to get, since even this code can miss some cases.

Note 2:
This code outputs only instance variables, meaning class variables are not included. For example:

class A(object):
    url = 'http://stackoverflow.com'
    def __init__(self, path):
        self.path = path

print(A('/questions').__dict__)

The code outputs:

{'path': '/questions'}

This code does not print the url class attribute, so it might omit class attributes you wanted.
Sometimes we think an attribute is an instance member when it is actually a class member, and then it won’t be shown by this example.
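
If class variables are wanted as well, one possible approach is to merge the class-level namespaces with the instance attributes. This is a sketch under assumptions, not part of the answer above: `get_all_attrs` and `Page` are hypothetical names, and properties or slot descriptors would appear as descriptor objects rather than values.

```python
def get_all_attrs(obj):
    """Merge non-callable, non-dunder class attributes with instance attributes."""
    attrs = {}
    # Walk the MRO from the most generic class to the most specific,
    # so that subclasses override values inherited from base classes.
    for klass in reversed(type(obj).__mro__):
        for name, value in vars(klass).items():
            if name.startswith('__') or callable(value):
                continue
            attrs[name] = value
    # Instance attributes take precedence over class attributes.
    attrs.update(getattr(obj, '__dict__', {}))
    return attrs


class Page(object):
    url = 'http://stackoverflow.com'

    def __init__(self, path):
        self.path = path


print(get_all_attrs(Page('/questions')))
```

Here both url (from the class) and path (from the instance) end up in the result.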


Answer 14

  • Using __dict__ or vars does not work because it misses out __slots__.
  • Using __dict__ and __slots__ does not work because it misses out __slots__ from base classes.
  • Using dir does not work because it includes class attributes, such as methods or properties, as well as the object attributes.
  • Using vars is equivalent to using __dict__.

This is the best I have:

from typing import Dict

def get_attrs( x : object ) -> Dict[str, object]:
    mro      = type( x ).mro()
    attrs    = { }
    has_dict = False
    sentinel = object()

    for klass in mro:
        for slot in getattr( klass, "__slots__", () ):
            v = getattr( x, slot, sentinel )

            if v is sentinel:
                continue

            if slot == "__dict__":
                assert not has_dict, "Multiple __dicts__?"
                attrs.update( v )
                has_dict = True
            else:
                attrs[slot] = v

    if not has_dict:
        attrs.update( getattr( x, "__dict__", { } ) )

    return attrs
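
A short usage sketch of the function above, showing that slots declared on a base class are picked up. The function body is repeated (without the assertion) only so that the example runs on its own; the classes `Base`, `Child` and `Plain` are hypothetical.

```python
from typing import Dict


def get_attrs(x: object) -> Dict[str, object]:
    # Same logic as the function above, repeated so the example is self-contained.
    attrs = {}
    has_dict = False
    sentinel = object()
    for klass in type(x).mro():
        for slot in getattr(klass, "__slots__", ()):
            v = getattr(x, slot, sentinel)
            if v is sentinel:
                continue
            if slot == "__dict__":
                attrs.update(v)
                has_dict = True
            else:
                attrs[slot] = v
    if not has_dict:
        attrs.update(getattr(x, "__dict__", {}))
    return attrs


class Base:
    __slots__ = ("a",)          # slot defined on the base class


class Child(Base):
    __slots__ = ("b",)          # slot defined on the subclass

    def __init__(self):
        self.a = 1
        self.b = 2


class Plain(Child):             # no __slots__ here, so instances also get a __dict__
    def __init__(self):
        super().__init__()
        self.c = 3


print(get_attrs(Child()))
print(get_attrs(Plain()))
```

For Child, a plain `__dict__` lookup would fail and `Child.__slots__` alone would miss a, which is the case the MRO walk is there to cover.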

Answer 15

attributes_list = [attribute for attribute in dir(obj) if attribute[0].islower()]

Answer 16

The following Python shell session (Python 2 syntax), read in order, goes from creating a class to extracting the field names of its instances.

>>> class Details:
...       def __init__(self,name,age):
...           self.name=name
...           self.age =age
...       def show_details(self):
...           if self.name:
...              print "Name : ",self.name
...           else:
...              print "Name : ","_"
...           if self.age:
...              if self.age>0:
...                 print "Age  : ",self.age
...              else:
...                 print "Age can't be -ve"
...           else:
...              print "Age  : ","_"
... 
>>> my_details = Details("Rishikesh",24)
>>> 
>>> print my_details
<__main__.Details instance at 0x10e2e77e8>
>>> 
>>> print my_details.name
Rishikesh
>>> print my_details.age
24
>>> 
>>> my_details.show_details()
Name :  Rishikesh
Age  :  24
>>> 
>>> person1 = Details("",34)
>>> person1.name
''
>>> person1.age
34
>>> person1.show_details
<bound method Details.show_details of <__main__.Details instance at 0x10e2e7758>>
>>> 
>>> person1.show_details()
Name :  _
Age  :  34
>>>
>>> person2 = Details("Rob Pike",0)
>>> person2.name
'Rob Pike'
>>> 
>>> person2.age
0
>>> 
>>> person2.show_details()
Name :  Rob Pike
Age  :  _
>>> 
>>> person3 = Details("Rob Pike",-45)
>>> 
>>> person3.name
'Rob Pike'
>>> 
>>> person3.age
-45
>>> 
>>> person3.show_details()
Name :  Rob Pike
Age can't be -ve
>>>
>>> person3.__dict__
{'age': -45, 'name': 'Rob Pike'}
>>>
>>> person3.__dict__.keys()
['age', 'name']
>>>
>>> person3.__dict__.values()
[-45, 'Rob Pike']
>>>

Answer 17

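__attrs__ gives the list of attributes of an instance. Note that __attrs__ is defined by the requests Response class used here; it is not something every Python object has.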

>>> import requests
>>> r=requests.get('http://www.google.com')
>>> r.__attrs__
['_content', 'status_code', 'headers', 'url', 'history', 'encoding', 'reason', 'cookies', 'elapsed', 'request']
>>> r.url
'http://www.google.com/'
>>>

What is __main__.py?

Question: What is __main__.py?

What is the __main__.py file for, what sort of code should I put into it, and when should I have one?


Answer 0

Often, a Python program is run by naming a .py file on the command line:

$ python my_program.py

You can also create a directory or zipfile full of code, and include a __main__.py. Then you can simply name the directory or zipfile on the command line, and it executes the __main__.py automatically:

$ python my_program_dir
$ python my_program.zip
# Or, if the program is accessible as a module
$ python -m my_program

You’ll have to decide for yourself whether your application could benefit from being executed like this.


Note that a __main__ module usually doesn’t come from a __main__.py file. It can, but it usually doesn’t. When you run a script like python my_program.py, the script will run as the __main__ module instead of the my_program module. This also happens for modules run as python -m my_module, or in several other ways.

If you saw the name __main__ in an error message, that doesn’t necessarily mean you should be looking for a __main__.py file.
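
To see the directory case in action without creating files by hand, here is a small Python 3.7+ sketch. It assumes a temporary directory is acceptable for the demo; the directory name my_program_dir mirrors the command above, and everything else is illustrative.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    program_dir = os.path.join(tmp, "my_program_dir")
    os.mkdir(program_dir)
    with open(os.path.join(program_dir, "__main__.py"), "w") as f:
        f.write("print('running as', __name__)\n")

    # Equivalent to typing: python my_program_dir
    result = subprocess.run(
        [sys.executable, program_dir],
        capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()

print(output)
```

The __main__.py file found in the directory is executed as the __main__ module, so the reported name is __main__.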


Answer 1

What is the __main__.py file for?

When creating a Python module, it is common to make the module execute some functionality (usually contained in a main function) when run as the entry point of the program. This is typically done with the following common idiom placed at the bottom of most Python files:

if __name__ == '__main__':
    # execute only if run as the entry point into the program
    main()

You can get the same semantics for a Python package with __main__.py. The following uses a Linux shell prompt, $; if you don’t have Bash (or another POSIX shell) on Windows, just create the files demo/__init__.py and demo/__main__.py with the contents shown between the EOF markers:

$ mkdir demo
$ cat > demo/__init__.py << EOF
print('demo/__init__.py executed')
def main():
    print('main executed')
EOF
$ cat > demo/__main__.py << EOF
print('demo/__main__.py executed')
from __init__ import main
main()
EOF

(In a Posix/Bash shell, you can do the above without the << EOFs and ending EOFs by entering Ctrl+D, the end-of-file character, at the end of each cat command)

And now:

$ python demo
demo/__main__.py executed
demo/__init__.py executed
main executed

You can derive this from the documentation, which says:

__main__ — Top-level script environment

'__main__' is the name of the scope in which top-level code executes. A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt.

A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported:

if __name__ == '__main__':
      # execute only if run as a script
      main()

For a package, the same effect can be achieved by including a __main__.py module, the contents of which will be executed when the module is run with -m.

Zipped

You can also package this into a single file and run it from the command line like this – but note that zipped packages can’t execute sub-packages or submodules as the entry point:

$ python -m zipfile -c demo.zip demo/*
$ python demo.zip
demo/__main__.py executed
demo/__init__.py executed
main executed
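
The same kind of archive can also be built from Python with the standard zipfile module. This is a sketch under assumptions: the module name helper and the temporary location are illustrative, and the only requirement taken from the text above is that __main__.py sits at the top level of the archive.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        # __main__.py must sit at the top level of the archive.
        zf.writestr("helper.py", "def main():\n    print('main executed')\n")
        zf.writestr("__main__.py", "from helper import main\nmain()\n")

    # Equivalent to typing: python demo.zip
    zip_output = subprocess.run(
        [sys.executable, archive],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

print(zip_output)
```

The archive itself is placed on the import path, which is why __main__.py can import helper from inside the zip.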

Answer 2

__main__.py is used for Python programs in zip files. The __main__.py file will be executed when the zip file is run. For example, if the zip file looked like this:

test.zip
     __main__.py

and the contents of __main__.py were

import sys
print("hello %s" % sys.argv[1])

then running python test.zip world would print hello world.

So the __main__.py file runs when python is called on a zip file.


Answer 3

You create __main__.py in yourpackage to make it executable as:

$ python -m yourpackage

Answer 4

If your script is a directory or ZIP file rather than a single python file, __main__.py will be executed when the “script” is passed as an argument to the python interpreter.


Does Python support short-circuiting?

Question: Does Python support short-circuiting?

Does Python support short-circuiting in boolean expressions?


Answer 0

Yep, both and and or operators short-circuit — see the docs.


Answer 1

Short-circuiting behavior of the operators and, or:

Let’s first define a useful function to determine if something is executed or not. A simple function that accepts an argument, prints a message and returns the input, unchanged.

>>> def fun(i):
...     print "executed"
...     return i
... 

One can observe Python’s short-circuiting behavior of the and, or operators in the following example:

>>> fun(1)
executed
1
>>> 1 or fun(1)    # due to short-circuiting  "executed" not printed
1
>>> 1 and fun(1)   # fun(1) called and "executed" printed 
executed
1
>>> 0 and fun(1)   # due to short-circuiting  "executed" not printed 
0

Note: The following values are considered by the interpreter to mean false:

        False    None    0    ""    ()    []     {}

Short-circuiting behavior in function: any(), all():

Python’s any() and all() functions also support short-circuiting. As shown in the docs, they evaluate each element of a sequence in order, until finding a result that allows an early exit from the evaluation. Consider the examples below to understand both.

The function any() checks whether any element is true. It stops executing as soon as a true value is encountered and returns True.

>>> any(fun(i) for i in [1, 2, 3, 4])   # bool(1) = True
executed
True
>>> any(fun(i) for i in [0, 2, 3, 4])   
executed                               # bool(0) = False
executed                               # bool(2) = True
True
>>> any(fun(i) for i in [0, 0, 3, 4])
executed
executed
executed
True

The function all() checks that all elements are true and stops executing as soon as a false value is encountered:

>>> all(fun(i) for i in [0, 0, 3, 4])
executed
False
>>> all(fun(i) for i in [1, 0, 3, 4])
executed
executed
False
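
The sessions above use Python 2 print statements. A Python 3 sketch of the same experiment is shown below; instead of printing, the hypothetical helper fun records its arguments in a list named calls, so the early exit can be checked directly. It also shows that passing a list comprehension evaluates every element first, so only a generator expression benefits from the short-circuit.

```python
calls = []


def fun(i):
    """Record each call, then return the argument unchanged."""
    calls.append(i)
    return i


calls.clear()
result_any = any(fun(i) for i in [0, 2, 3, 4])
any_calls = list(calls)      # only 0 and 2 were evaluated

calls.clear()
result_all = all(fun(i) for i in [1, 0, 3, 4])
all_calls = list(calls)      # only 1 and 0 were evaluated

calls.clear()
any([fun(i) for i in [0, 2, 3, 4]])   # the whole list is built before any() runs
list_calls = list(calls)

print(result_any, any_calls)
print(result_all, all_calls)
print(list_calls)
```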

Short-circuiting behavior in Chained Comparison:

Additionally, in Python

Comparisons can be chained arbitrarily; for example, x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

>>> 5 > 6 > fun(3)    # same as:  5 > 6 and 6 > fun(3)
False                 # 5 > 6 is False so fun() not called and "executed" NOT printed
>>> 5 < 6 > fun(3)    # 5 < 6 is True 
executed              # fun(3) called and "executed" printed
True
>>> 4 <= 6 > fun(7)   # 4 <= 6 is True  
executed              # fun(7) called and "executed" printed
False
>>> 5 < fun(6) < 3    # only prints "executed" once
executed
False
>>> 5 < fun(6) and fun(6) < 3 # prints "executed" twice, because the second part executes it again
executed
executed
False

Edit:
One more interesting point to note: the logical and, or operators in Python return an operand’s value instead of a Boolean (True or False). For example:

Operation x and y gives the result: if x is false, then x, else y

This is unlike some other languages; for example, the && and || operators in C return either 0 or 1.

Examples:

>>> 3 and 5    # Second operand evaluated and returned 
5                   
>>> 3  and ()
()
>>> () and 5   # Second operand NOT evaluated as first operand () is  false
()             # so first operand returned 

Similarly, the or operator returns the leftmost value for which bool(value) == True, or else the rightmost false value (according to short-circuiting behavior). Examples:

>>> 2 or 5    # left most operand bool(2) == True
2    
>>> 0 or 5    # bool(0) == False and bool(5) == True
5
>>> 0 or ()
()

So, how is this useful? One example use is given in Practical Python by Magnus Lie Hetland:
Let’s say a user is supposed to enter his or her name, but may opt to enter nothing, in which case you want to use the default value '<unknown>'. You could use an if statement, but you could also state things very succinctly:

In [171]: name = raw_input('Enter Name: ') or '<unknown>'
Enter Name: 

In [172]: name
Out[172]: '<unknown>'

In other words, if the return value from raw_input is true (not an empty string), it is assigned to name (nothing changes); otherwise, the default '<unknown>' is assigned to name.
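
In Python 3, raw_input was renamed to input, but the idiom is unchanged. The sketch below wraps it in a hypothetical helper, display_name, so it can be run without typing anything at a prompt.

```python
def display_name(raw_name, default='<unknown>'):
    """Return raw_name unless it is empty (falsy), in which case return default."""
    return raw_name or default


print(display_name('Alice'))
print(display_name(''))

# Interactive equivalent of the session above:
# name = input('Enter Name: ') or '<unknown>'
```

Keep in mind that or falls back on any falsy value, not just the empty string, so this idiom is unsuitable when 0 or an empty list is a legitimate input.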


Answer 2

Yes. Try the following in your Python interpreter:

and

>>> False and 3/0
False
>>> True and 3/0
ZeroDivisionError: integer division or modulo by zero

or

>>> True or 3/0
True
>>> False or 3/0
ZeroDivisionError: integer division or modulo by zero

How to get POSTed JSON in Flask?

Question: How to get POSTed JSON in Flask?

I’m trying to build a simple API using Flask, in which I now want to read some POSTed JSON. I do the POST with the Postman Chrome extension, and the JSON I POST is simply {"text":"lalala"}. I try to read the JSON using the following method:

@app.route('/api/add_message/<uuid>', methods=['GET', 'POST'])
def add_message(uuid):
    content = request.json
    print content
    return uuid

On the browser it correctly returns the UUID I put in the GET, but on the console, it just prints out None (where I expect it to print out the {"text":"lalala"}). Does anybody know how I can get the posted JSON from within the Flask method?


Answer 0

First of all, the .json attribute is a property that delegates to the request.get_json() method, which documents why you see None here.

You need to set the request content type to application/json for the .json property and the .get_json() method (with no arguments) to work, as either will produce None otherwise. See the Flask Request documentation:

This will contain the parsed JSON data if the mimetype indicates JSON (application/json, see is_json()), otherwise it will be None.

You can tell request.get_json() to skip the content type requirement by passing it the force=True keyword argument.

Note that if an exception is raised at this point (possibly resulting in a 400 Bad Request response), your JSON data is invalid. It is in some way malformed; you may want to check it with a JSON validator.
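
On the client side, the important part is the Content-Type header. The following standard-library sketch builds such a request without sending it, so the header and body can be inspected; the URL is a hypothetical local address matching the route in the question.

```python
import json
import urllib.request

payload = {"text": "lalala"}

# Hypothetical local URL, matching the route in the question.
url = "http://localhost:5000/api/add_message/1234"

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),   # the body must be bytes
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Inspect the request without sending it.
print(req.get_method())
print(req.get_header("Content-type"))   # urllib stores header names capitalized this way
print(req.data)

# To actually send it (requires the Flask app to be running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```

In a tool such as Postman, the equivalent step is choosing a raw JSON body so that the application/json header is sent along with it.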


Answer 1

For reference, here’s complete code for how to send json from a Python client:

import requests
res = requests.post('http://localhost:5000/api/add_message/1234', json={"mytext":"lalala"})
if res.ok:
    print(res.json())

The “json=” input will automatically set the content-type, as discussed here: Post JSON using Python Requests

And the above client will work with this server-side code:

from flask import Flask, request, jsonify
app = Flask(__name__)

@app.route('/api/add_message/<uuid>', methods=['GET', 'POST'])
def add_message(uuid):
    content = request.json
    print(content['mytext'])
    return jsonify({"uuid":uuid})

if __name__ == '__main__':
    app.run(host= '0.0.0.0',debug=True)

Answer 2

This is how I would do it:

@app.route('/api/add_message/<uuid>', methods=['GET', 'POST'])
def add_message(uuid):
    content = request.get_json(silent=True)
    # print(content) # Do your processing
    return uuid

With silent=True set, the get_json function fails silently (returning None) if it cannot retrieve a JSON body. By default this is set to False. If you always expect a JSON body (it is not optional), leave it as silent=False.

Setting force=True skips the request.headers.get('Content-Type') == 'application/json' check that Flask does for you. By default this is also set to False.

See the Flask documentation.

I would strongly recommend leaving force=False and making the client send the Content-Type header, to be more explicit.

Hope this helps!


Answer 3

Assuming you’ve posted valid JSON with the application/json content type, request.json will have the parsed JSON data.

from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/echo', methods=['POST'])
def hello():
   return jsonify(request.json)

Answer 4

For all those whose issue comes from the Ajax call, here is a full example:

Ajax call: the key here is to build an object (a dict) and then pass it through JSON.stringify

    var dict = {username : "username" , password:"password"};

    $.ajax({
        type: "POST", 
        url: "http://127.0.0.1:5000/", //localhost Flask
        data : JSON.stringify(dict),
        contentType: "application/json",
    });

And on the server side:

from flask import Flask
from flask import request
import json

app = Flask(__name__)

@app.route("/",  methods = ['POST'])
def hello():
    print(request.get_json())
    return json.dumps({'success':True}), 200, {'Content-Type':'application/json'}

if __name__ == "__main__":
    app.run()

Answer 5

Another approach, parsing the raw request body yourself:

import json

from flask import Flask, jsonify, request
app = Flask(__name__)

@app.route('/service', methods=['POST'])
def service():
    data = json.loads(request.data)
    text = data.get("text",None)
    if text is None:
        return jsonify({"message":"text not found"})
    else:
        return jsonify(data)

if __name__ == '__main__':
    app.run(host= '0.0.0.0',debug=True)

Answer 6

Assuming that you have posted valid JSON,

@app.route('/api/add_message/<uuid>', methods=['GET', 'POST'])
def add_message(uuid):
    content = request.json
    print content['uuid']
    # Return data as JSON
    return jsonify(content)

Answer 7

Try using the force parameter:

request.get_json(force=True)


“pip install unroll”: “python setup.py egg_info” failed with error code 1

Question: “pip install unroll”: “python setup.py egg_info” failed with error code 1

I’m new to Python and have been trying to install some packages with pip.

But pip install unroll gives me

Command “python setup.py egg_info” failed with error code 1 in C:\Users\MARKAN~1\AppData\Local\Temp\pip-build-wa7uco0k\unroll\

How can I solve this?


Answer 0

About the error code

According to the Python documentation:

This module makes available standard errno system symbols. The value of each symbol is the corresponding integer value. The names and descriptions are borrowed from linux/include/errno.h, which should be pretty all-inclusive.

Error code 1 is defined in errno.h and means Operation not permitted.

About your error

setuptools does not appear to be installed. Just follow the Installation Instructions from the PyPI website.

If it’s already installed, try

pip install --upgrade setuptools

If it’s already up to date, check that the module ez_setup is not missing. If it is, then

pip install ez_setup

Then try again

pip install unroll

If it’s still not working, maybe pip didn’t install/upgrade setuptools properly, so you might want to try

easy_install -U setuptools

And again

pip install unroll
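
Before retrying, it can help to confirm from Python itself whether pip and setuptools are visible to the interpreter you are actually running. This is only a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper name installed_version is made up for illustration and is not part of pip or setuptools:

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("pip", "setuptools"):
    print(name, installed_version(name) or "not installed")
```

If setuptools shows up as not installed here, the upgrade commands above are the place to start.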

回答 1

这是一些指南,解释了我通常如何在Python + Windows上安装新软件包。看来您使用的是Windows路径,因此此答案将遵循特定的SO:

  • 我从不使用系统范围的Python安装。我只使用virtualenvs,通常我会尝试使用最新版本的2.x和3.x。
  • 我的第一次尝试总是pip install package_i_want在某些Visual Studio命令提示符下进行。什么Visual Studio命令提示符?好吧,理想情况下是与用来构建Python的Visual Studio相匹配的Visual Studio。例如,假设您的Python安装提示Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)] on win32。可以在此处找到用于编译Python的Visual Studio版本,因此v1500表示我将使用vs2008 x64命令提示符
  • 如果上一步由于某种原因而失败,我只是尝试使用 easy_install package_i_want
  • 如果上一步由于某种原因失败,我将转到gohlke网站,并检查我的包裹是否在那儿。如果是这样,我很幸运,我将其下载到virtualenv中,然后使用命令提示符转到该位置,然后执行pip install package_i_want.whl
  • 如果上一步没有成功,我将尝试自己制作轮子,一旦生成,我将尝试使用 pip install package_i_want.whl

现在,如果我们专注于您的特定问题,那么您将很难安装展开软件包。似乎最快的安装方式是执行以下操作:

  • git clone https://github.com/Zulko/unroll
  • cd unroll && python setup.py bdist_wheel
  • 将创建的dist文件夹中生成的unroll-0.1.0-py2-none-any.whl文件复制到virtualenv中。
  • pip install unroll-0.1.0-py2-none-any.whl

这样,它将安装没有任何问题。要检查它是否确实有效,只需登录Python安装并尝试import unroll,不要抱怨。

最后一点:这种方法几乎在99%的时间内都有效,有时您会发现一些特定于Unix或Mac OS X的pip程序包,在这种情况下,恐怕最好的方法是Windows版本正在向主要开发人员发布一些问题,或者您可以通过自己移植到Windows来获得一些乐趣(如果不走运,通常需要几个小时):)

Here’s a short guide to how I usually install new packages on Python + Windows. It seems you’re using Windows paths, so this answer will stick to that particular OS:

  • I never use a system-wide Python installation. I only use virtualenvs, and usually I try to have the latest version of 2.x & 3.x.
  • My first attempt is always doing pip install package_i_want in some of my Visual Studio command prompts. What Visual Studio command prompt? Well, ideally the Visual Studio which matches the one which was used to build Python. For instance, let’s say your Python installation says Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)] on win32. The version of Visual Studio used to compile Python can be found here, so v1500 means I’d be using vs2008 x64 command prompt
  • If the previous step failed for some reason I just try using easy_install package_i_want
  • If the previous step failed for some reason I go to gohlke website and I check whether my package is available over there. If it’s so, I’m lucky, I just download it into my virtualenv and then I just go to that location using a command prompt and I do pip install package_i_want.whl
  • If the previous step didn’t succeed I’ll just try to build the wheel myself and once it’s generated I’ll try to install it with pip install package_i_want.whl
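
The compiler check in the second step can be sketched in Python. This is a rough illustration, not an official tool: the function name msc_version is hypothetical, and the lookup table is deliberately partial (only three well-known MSC numbers), so treat anything missing from it as "look it up". In practice you would pass sys.version instead of the hard-coded banner:

```python
import re

# Partial table of MSC version numbers and the matching Visual Studio release.
MSC_TO_VISUAL_STUDIO = {
    1500: "Visual Studio 2008",
    1600: "Visual Studio 2010",
    1900: "Visual Studio 2015",
}


def msc_version(version_banner):
    """Extract the MSC version number from a Python version banner, or None."""
    match = re.search(r"MSC v\.(\d+)", version_banner)
    return int(match.group(1)) if match else None


banner = ("2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) "
          "[MSC v.1500 64 bit (AMD64)]")
version = msc_version(banner)
print(version, MSC_TO_VISUAL_STUDIO.get(version, "unknown"))  # 1500 Visual Studio 2008
```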

Now, let’s focus on your specific problem, where you’re having a hard time installing the unroll package. The fastest way to install it seems to be something like this:

  • git clone https://github.com/Zulko/unroll
  • cd unroll && python setup.py bdist_wheel
  • Copy the generated unroll-0.1.0-py2-none-any.whl file from the created dist folder into your virtualenv.
  • pip install unroll-0.1.0-py2-none-any.whl

That way it will install without any problems. To check that it really works, start the Python interpreter and try import unroll; it shouldn’t complain.

One last note: this method works about 99% of the time. Sometimes you’ll find pip packages that are specific to Unix or Mac OS X. In that case, I’m afraid the best way to get a Windows version is either to file an issue with the main developers or to have some fun porting it to Windows yourself (typically a few hours if you’re not lucky) :)


回答 2

升级点数后已解决:

python -m pip install --upgrade pip
pip install "package-name"

It was resolved after upgrading pip:

python -m pip install --upgrade pip
pip install "package-name"

回答 3

我完全陷入了与相同的错误psycopg2。看来我在安装Python和相关软件包时跳过了几个步骤。

  1. sudo apt-get install python-dev libpq-dev
  2. 转到您的虚拟环境
  3. pip install psycopg2

(在您的情况下,您需要替换psycopg2遇到问题的软件包。)

它无缝地工作。

I got stuck exactly with the same error with psycopg2. It looks like I skipped a few steps while installing Python and related packages.

  1. sudo apt-get install python-dev libpq-dev
  2. Go to your virtual env
  3. pip install psycopg2

(In your case you need to replace psycopg2 with the package you have an issue with.)

It worked seamlessly.


回答 4

在安装我得到同样的错误mitmproxy使用pip3。下面的命令解决了这个问题:

pip3 install --upgrade setuptools

I got this same error while installing mitmproxy using pip3. The below command fixed this:

pip3 install --upgrade setuptools

回答 5

  • Microsoft Visual C++ Compiler for Python 2.7https://www.microsoft.com/zh-cn/download/details.aspx?id=44266下载并安装-此程序包包含为Python 2.7程序包生成二进制文件所需的编译器和系统标头集。
  • 在提升模式下打开命令提示符(以管理员身份运行)
  • 首先要做 pip install ez_setup
  • 然后做pip install unroll(它将开始安装numpy, music21, decorator, imageio, tqdm, moviepy, unroll)#请耐心等待music21安装

使用python 2.7.11 64位

  • Download and install the Microsoft Visual C++ Compiler for Python 2.7 from https://www.microsoft.com/en-in/download/details.aspx?id=44266 – this package contains the compiler and set of system headers necessary for producing binary wheels for Python 2.7 packages.
  • Open a command prompt in elevated mode (run as administrator)
  • Firstly do pip install ez_setup
  • Then do pip install unroll (It will start installing numpy, music21, decorator, imageio, tqdm, moviepy, unroll) # Please be patient for music21 installation

Python 2.7.11 64 bit used


回答 6

另一种方式:

sudo apt-get install python-psycopg2 python-mysqldb

Other way:

sudo apt-get install python-psycopg2 python-mysqldb

回答 7

我有同样的问题。

问题是

pyparsing 2.2已经安装好了,我requirements.txt正在尝试安装pyparsing 2.0.1,抛出此错误

上下文:我使用的是virtualenv,似乎2.2来自我的全局操作系统Python site-packages,但是即使带有--no-site-packages标志(默认为上一个virtualenv中的默认标志),2.2仍然存在。肯定是因为我从他们的网站安装了Python,并将Python库添加到了我的$PATH

也许一个pip install --ignore-installed会工作。

解决方案:因为我需要向前移动,我只是删除了pyparsing==2.0.1从我的requirements.txt

I had the same problem.

The problem was:

pyparsing 2.2 was already installed and my requirements.txt was trying to install pyparsing 2.0.1, which threw this error

Context: I was using virtualenv, and it seems the 2.2 came from my global OS Python site-packages, but even with --no-site-packages flag (now by default in last virtualenv) the 2.2 was still present. Surely because I installed Python from their website and it added Python libraries to my $PATH.

Maybe a pip install --ignore-installed would have worked.

Solution: as I needed to move forwards, I just removed the pyparsing==2.0.1 from my requirements.txt.


回答 8

尝试使用pip安装Python模块时遇到了相同的错误代码。@Hackndo指出文档指出了安全问题。

基于该答案,我的问题通过运行带有sudo前缀的pip install命令得以解决:

sudo pip install python-mpd2

I ran into the same error code when trying to install a Python module with pip. @Hackndo noted that the documentation indicates a security issue.

Based on that answer, my problem was solved by running the pip install command with sudo prefixed:

sudo pip install python-mpd2

回答 9

在安装“ Twisted”库时遇到了相同的问题,并通过在Ubuntu 16.04(Xenial Xerus)上运行以下命令解决了该问题:

sudo apt-get install python-setuptools python-dev build-essential

I had the same issue when installing the “Twisted” library and solved it by running the following command on Ubuntu 16.04 (Xenial Xerus):

sudo apt-get install python-setuptools python-dev build-essential

回答 10

我尝试了以上所有方法,但均未成功。然后,我将Python版本从2.7.10更新到2.7.13,它解决了我遇到的问题。

I tried all of the above with no success. I then updated my Python version from 2.7.10 to 2.7.13, and it resolved the problems that I was experiencing.


回答 11

这意味着pip中的某些软件包较旧或未正确安装。

  1. 尝试检查版本,然后升级pip。如果可行,请使用自动删除。

  2. 如果pip命令始终对任何命令显示错误或冻结,等等。

  3. 最好的解决方案是将其卸载或完全删除。

  4. 安装一个新的点,然后更新和升级您的系统。

  5. 我给出了在此处新鲜安装pip的解决方案-python:无法打开文件get-pip.py错误2]没有此类文件或目录

That means some packages in pip are old or not correctly installed.

  1. Try checking the version and then upgrading pip. Use auto remove if that works.

  2. If the pip command shows an error all the time for any command or it freezes, etc.

  3. The best solution is to uninstall it or remove it completely.

  4. Install a fresh pip and then update and upgrade your system.

  5. I have given a solution to installing pip fresh here – python: can’t open file get-pip.py error 2] no such file or directory


回答 12

这对我来说是更简单的方法:

pip2 install Name

因此,如果您使用的是pip,请尝试使用pip3pip2

它应该解决问题。

This was the easier way for me:

pip2 install Name

So if you were using pip, try pip3 or pip2 instead.

It should solve the problem.


回答 13

pip3 install –upgrade setuptools警告:pip正在由旧的脚本包装程序调用。这将在以后的pip版本中失败。请参阅https://github.com/pypa/pip/issues/5599,以获取有关解决基本问题的建议。

******为了避免此问题,您可以使用-m pip调用Python,而不是直接运行pip。******

使用python3 -m pip“命令”例如:python3 -m pip install –user pyqt5

pip3 install --upgrade setuptools

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip. Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.

To avoid this problem you can invoke Python with ‘-m pip’ instead of running pip directly.

Use python3 -m pip “command”, e.g.: python3 -m pip install --user pyqt5
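
The same idea works from inside a script: build the pip call around sys.executable so it always targets the interpreter that is running. This is a small sketch; the helper name pip_command is made up for illustration, and nothing is installed unless you uncomment the last line:

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


cmd = pip_command("install", "--user", "pyqt5")
print(cmd)
# To actually run it:
# import subprocess; subprocess.check_call(cmd)
```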


回答 14

这为我工作:

sudo xcodebuild -license

This worked for me:

sudo xcodebuild -license

回答 15

将Python升级到版本3解决了我的问题。什么都没做。

Upgrading Python to version 3 fixed my problem. Nothing else did.


回答 16

我从http://www.lfd.uci.edu/~gohlke/pythonlibs/下载了.whl文件,然后执行了以下操作:

pip install scipy-0.19.1-cp27-cp27m-win32.whl

请注意,您需要使用的版本(win32 / win_amd-64)取决于Python的版本,而不取决于Windows的版本。

I downloaded the .whl file from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and then did:

pip install scipy-0.19.1-cp27-cp27m-win32.whl

Note that the version you need to use (win32/win_amd-64) depends on the version of Python and not that of Windows.
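
If you are not sure which build of Python you have, you can ask the interpreter. A quick sketch, assuming CPython (the "cp" prefix below is only meaningful there):

```python
import struct
import sys

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
bits = struct.calcsize("P") * 8
# Tag in the style used in wheel filenames, e.g. "cp27" for CPython 2.7.
python_tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
print(python_tag, "{}-bit".format(bits))
```

A 32-bit Python on 64-bit Windows reports 32 here, which is why the win32 wheel is the right one in that case.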


回答 17

我在新的开发设置上使用virtualenvs(带有pipenv)遇到了这个问题。

我只能通过将psycopg2版本从2.6.2升级到2.7.3来解决它。更多信息在https://github.com/psycopg/psycopg2/issues/594

I had this problem using virtualenvs (with pipenv) on my new development setup.

I could only solve it by upgrading the psycopg2 version from 2.6.2 to 2.7.3. More information is at https://github.com/psycopg/psycopg2/issues/594


回答 18

我在使用相同的错误消息时遇到了同样的问题,但是在Ubuntu 16.04 LTS(Xenial Xerus)上却相反:

命令“ python setup.py egg_info”在/ tmp / pip-install-w71uo1rg / poster /中失败,错误代码为1

我测试了上面提供的所有解决方案,但没有一个对我有用。我阅读了完整的TraceBack,发现我必须使用python版本2.7创建虚拟环境(默认情况下使用Python 3.5):

virtualenv --python=/usr/bin/python2.7 my_venv

激活它后,我将pip install unirest成功运行。

I faced the same problem with the same error message but on Ubuntu 16.04 LTS (Xenial Xerus) instead:

Command “python setup.py egg_info” failed with error code 1 in /tmp/pip-install-w71uo1rg/poster/

I tested all the solutions provided above and none of them worked for me. I read the full TraceBack and found out I had to create the virtual environment with Python version 2.7 instead (the default one uses Python 3.5 instead):

virtualenv --python=/usr/bin/python2.7 my_venv

Once I activated it, I ran pip install unirest successfully.


回答 19

尝试在Linux上:

sudo apt install python-pip python-bluez libbluetooth-dev libboost-python-dev libboost-thread-dev libglib2.0-dev bluez bluez-hcidump

Try on Linux:

sudo apt install python-pip python-bluez libbluetooth-dev libboost-python-dev libboost-thread-dev libglib2.0-dev bluez bluez-hcidump

回答 20

我使用以下方法在Centos 7上解决了该问题:

sudo yum install libcurl-devel

I solved it on Centos 7 by using:

sudo yum install libcurl-devel

回答 21

我的Win10 PC上使用不同的软件包时遇到了相同的问题,并尝试了到目前为止提到的所有内容。

最后通过禁用Comodo Auto-Containment解决了该问题。

由于没有人提及它,我希望它能对某人有所帮助。

Had the same problem on my Win10 PC with different packages and tried everything mentioned so far.

Finally solved it by disabling Comodo Auto-Containment.

Since nobody has mentioned it yet, I hope it helps someone.


回答 22

我遇到了同样的问题,可以通过以下操作解决。

Windows Python需要通过SDK安装的Visual C ++库来构建代码,例如通过setuptools.extension.Extension或numpy.distutils.core.Extension。例如,在Windows中使用Python构建f2py模块需要安装上述Visual C ++ SDK。在Linux和Mac上,C ++库随编译器一起安装。

https://www.scivision.co/python-windows-visual-c++-14-required/

I had the same problem and was able to fix by doing the following.

Windows Python needs Visual C++ libraries installed via the SDK to build code, such as via setuptools.extension.Extension or numpy.distutils.core.Extension. For example, building f2py modules in Windows with Python requires Visual C++ SDK as installed above. On Linux and Mac, the C++ libraries are installed with the compiler.

https://www.scivision.co/python-windows-visual-c++-14-required/


回答 23

以下命令对我有用

[root@sandbox ~]# pip install google-api-python-client==1.6.4

The following command worked for me:

[root@sandbox ~]# pip install google-api-python-client==1.6.4

回答 24

更新setuptools时解决setup.pu egg_info问题的方法或其他方法不起作用。

  1. 如果CONDA可以安装版本的库,请使用conda而不是pip。
  2. 克隆库回购,然后尝试通过pip install -e .或通过进行安装python setup.py install

Ways to solve the setup.py egg_info issue when updating setuptools or the other methods do not work:

  1. If a conda version of the library is available, install it with conda instead of pip.
  2. Clone the library repo and then try installing with pip install -e . or with python setup.py install

enumerate()是什么意思?

问题:enumerate()是什么意思?

for row_number, row in enumerate(cursor):在Python 中做什么?

enumerate在这种情况下是什么意思?

What does for row_number, row in enumerate(cursor): do in Python?

What does enumerate mean in this context?


回答 0

enumerate()函数向可迭代对象添加一个计数器。

因此,对于其中的每个元素cursor,将生成一个元组(counter, element);的for环结合,要row_numberrow分别。

演示:

>>> elements = ('foo', 'bar', 'baz')
>>> for elem in elements:
...     print elem
... 
foo
bar
baz
>>> for count, elem in enumerate(elements):
...     print count, elem
... 
0 foo
1 bar
2 baz

默认情况下,enumerate()从开始计数,0但是如果您给它第二个整数参数,它将从该数字开始:

>>> for count, elem in enumerate(elements, 42):
...     print count, elem
... 
42 foo
43 bar
44 baz

如果要enumerate()在Python中重新实现,可以通过以下两种方法来实现。一个itertools.count()用于进行计数,另一个用于在生成器函数中手动计数:

from itertools import count

def enumerate(it, start=0):
    # return an iterator that adds a counter to each element of it
    return zip(count(start), it)

def enumerate(it, start=0):
    count = start
    for elem in it:
        yield (count, elem)
        count += 1

C实际实现更接近于后者,它的优化方式是在常见的for i, ...拆包情况下重用单个元组对象,并对计数器使用标准的C整数值,直到计数器变得太大而避免使用Python整数对象(无界)。

The enumerate() function adds a counter to an iterable.

So for each element in cursor, a tuple is produced with (counter, element); the for loop binds that to row_number and row, respectively.

Demo:

>>> elements = ('foo', 'bar', 'baz')
>>> for elem in elements:
...     print elem
... 
foo
bar
baz
>>> for count, elem in enumerate(elements):
...     print count, elem
... 
0 foo
1 bar
2 baz

By default, enumerate() starts counting at 0 but if you give it a second integer argument, it’ll start from that number instead:

>>> for count, elem in enumerate(elements, 42):
...     print count, elem
... 
42 foo
43 bar
44 baz

If you were to re-implement enumerate() in Python, here are two ways of achieving that; one using itertools.count() to do the counting, the other manually counting in a generator function:

from itertools import count

def enumerate(it, start=0):
    # return an iterator that adds a counter to each element of it
    return zip(count(start), it)

and

def enumerate(it, start=0):
    count = start
    for elem in it:
        yield (count, elem)
        count += 1

The actual implementation in C is closer to the latter, with optimisations to reuse a single tuple object for the common for i, ... unpacking case and using a standard C integer value for the counter until the counter becomes too large to avoid using a Python integer object (which is unbounded).
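
As a quick sanity check of the generator version, here is a Python 3 sketch that compares it against the builtin on an iterable with no len(). The names my_enumerate and squares are made up so the builtin is not shadowed:

```python
def my_enumerate(iterable, start=0):
    """Generator version of enumerate(), named to avoid shadowing the builtin."""
    count = start
    for elem in iterable:
        yield (count, elem)
        count += 1


def squares():
    # A generator has no len() and cannot be indexed.
    for n in range(3):
        yield n * n


assert list(my_enumerate(squares(), 10)) == list(enumerate(squares(), 10))
print(list(my_enumerate(squares(), 10)))  # [(10, 0), (11, 1), (12, 4)]
```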


回答 1

它是一个内置函数,可返回可以迭代的对象。请参阅文档

简而言之,它遍历组合在一个元组中的可迭代元素(如列表)以及索引号:

for item in enumerate(["a", "b", "c"]):
    print item

版画

(0, 'a')
(1, 'b')
(2, 'c')

如果要遍历一个序列(或其他可迭代的事物),并且还希望有一个可用的索引计数器,这将很有帮助。如果您希望计数器从其他值(通常为1)开始,则可以将其作为第二个参数enumerate

It’s a builtin function that returns an object that can be iterated over. See the documentation.

In short, it loops over the elements of an iterable (like a list), as well as an index number, combined in a tuple:

for item in enumerate(["a", "b", "c"]):
    print item

prints

(0, 'a')
(1, 'b')
(2, 'c')

It’s helpful if you want to loop over a sequence (or other iterable thing), and also want to have an index counter available. If you want the counter to start from some other value (usually 1), you can give that as second argument to enumerate.
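
For example, counting from 1 is handy for human-readable line numbers. A small Python 3 sketch (the list contents are made up):

```python
lines = ["first", "second", "third"]

# start=1 gives human-friendly line numbers instead of zero-based indexes.
numbered = ["{}: {}".format(number, text)
            for number, text in enumerate(lines, start=1)]
print(numbered)  # ['1: first', '2: second', '3: third']
```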


回答 2

我正在阅读Brett Slatkin 的书(《Effective Python》),他展示了另一种遍历列表的方法,并且知道列表中当前项目的索引,但是他建议最好不要使用它,enumerate而应该使用它。我知道您问过枚举是什么意思,但是当我理解以下内容时,我也了解了如何enumerate遍历列表,同时又更容易了解当前项目的索引(并且更具可读性)。

list_of_letters = ['a', 'b', 'c']
for i in range(len(list_of_letters)):
    letter = list_of_letters[i]
    print (i, letter)

输出为:

0 a
1 b
2 c

我也曾经做过一些事情,甚至在阅读有关该enumerate功能之前变得更加愚蠢。

i = 0
for n in list_of_letters:
    print (i, n)
    i += 1

它产生相同的输出。

但是enumerate我只需要写:

list_of_letters = ['a', 'b', 'c']
for i, letter in enumerate(list_of_letters):
    print (i, letter)

I am reading a book (Effective Python) by Brett Slatkin and he shows another way to iterate over a list and also know the index of the current item in the list but he suggests that it is better not to use it and to use enumerate instead. I know you asked what enumerate means, but when I understood the following, I also understood how enumerate makes iterating over a list while knowing the index of the current item easier (and more readable).

list_of_letters = ['a', 'b', 'c']
for i in range(len(list_of_letters)):
    letter = list_of_letters[i]
    print (i, letter)

The output is:

0 a
1 b
2 c

I also used to do something even sillier before I read about the enumerate function.

i = 0
for n in list_of_letters:
    print (i, n)
    i += 1

It produces the same output.

But with enumerate I just have to write:

list_of_letters = ['a', 'b', 'c']
for i, letter in enumerate(list_of_letters):
    print (i, letter)

回答 3

正如其他用户所提到的那样,enumerate是一个生成器,它在可迭代项的每个项旁边添加一个增量索引。

因此,如果您有一个要说的清单l = ["test_1", "test_2", "test_3"]list(enumerate(l))它将为您提供以下信息:[(0, 'test_1'), (1, 'test_2'), (2, 'test_3')]

现在,什么时候有用?一个可能的用例是当您要遍历项目时,并且想要跳过仅知道列表中其索引但不知道其值的特定项目(因为当时尚不知道其值)。

for index, value in enumerate(joint_values):
   if index == 3:
       continue

   # Do something with the other `value`

因此,您的代码读起来更好,因为您也可以执行常规的for循环,range但随后访问需要为其编入索引的项目(即joint_values[i])。

尽管另一个用户提到了enumerateusing 的实现zip,但我认为没有使用一种更纯净(但稍微复杂一点)的方法itertools如下:

def enumerate(l, start=0):
    return zip(range(start, len(l) + start), l)

例:

l = ["test_1", "test_2", "test_3"]
enumerate(l)
enumerate(l, 10)

输出:

[(0, 'test_1'), (1, 'test_2'), (2, 'test_3')]

[(10, 'test_1'), (11, 'test_2'), (12, 'test_3')]

如评论中所提到的,这种具有范围的方法不适用于任意可迭代对象,就像原始enumerate函数一样。

As other users have mentioned, enumerate is a generator that adds an incremental index next to each item of an iterable.

So if you have a list say l = ["test_1", "test_2", "test_3"], the list(enumerate(l)) will give you something like this: [(0, 'test_1'), (1, 'test_2'), (2, 'test_3')].

Now, when is this useful? A possible use case is when you want to iterate over items and skip a specific item whose index in the list you know but whose value you don’t (because its value is not known at the time).

for index, value in enumerate(joint_values):
   if index == 3:
       continue

   # Do something with the other `value`

So your code reads better: you could also do a regular for loop with range, but then you would need to index into the list to access the items (i.e., joint_values[i]).

Although another user mentioned an implementation of enumerate using zip, I think a more pure (but slightly more complex) way without using itertools is the following:

def enumerate(l, start=0):
    return zip(range(start, len(l) + start), l)

Example:

l = ["test_1", "test_2", "test_3"]
enumerate(l)
enumerate(l, 10)

Output:

[(0, 'test_1'), (1, 'test_2'), (2, 'test_3')]

[(10, 'test_1'), (11, 'test_2'), (12, 'test_3')]

As mentioned in the comments, this approach with range will not work with arbitrary iterables as the original enumerate function does.
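
To make that limitation concrete, here is a Python 3 sketch. The function is renamed range_enumerate (a made-up name) so the builtin stays available for comparison:

```python
def range_enumerate(seq, start=0):
    # Same idea as the range-based version, renamed to keep the builtin usable.
    return zip(range(start, len(seq) + start), seq)


print(list(range_enumerate(["a", "b"], 10)))  # [(10, 'a'), (11, 'b')]

letters = (ch for ch in "ab")  # a generator: it has no len()
try:
    range_enumerate(letters)
except TypeError as exc:
    print("range-based version failed:", exc)

# The builtin only needs to iterate, so it handles the generator fine.
print(list(enumerate(letters)))  # [(0, 'a'), (1, 'b')]
```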


回答 4

枚举函数的工作方式如下:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
for i in enumerate(doc1):
     print(i)

输出是

(0, 'I like movie')
(1, " But I don't like the cast")
(2, ' The story is very nice')

The enumerate function works as follows:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
for i in enumerate(doc1):
     print(i)

The output is

(0, 'I like movie')
(1, " But I don't like the cast")
(2, ' The story is very nice')

将模块“子进程”与超时一起使用

问题:将模块“子进程”与超时一起使用

这是运行任意命令以返回其stdout数据或在非零退出代码上引发异常的Python代码:

proc = subprocess.Popen(
    cmd,
    stderr=subprocess.STDOUT,  # Merge stdout and stderr
    stdout=subprocess.PIPE,
    shell=True)

communicate 用于等待进程退出:

stdoutdata, stderrdata = proc.communicate()

subprocess模块不支持超时-可以杀死运行时间超过X秒的进程-因此,communicate可能需要永远运行。

在打算在Windows和Linux上运行的Python程序中实现超时的最简单方法是什么?

Here’s the Python code to run an arbitrary command returning its stdout data, or raise an exception on non-zero exit codes:

proc = subprocess.Popen(
    cmd,
    stderr=subprocess.STDOUT,  # Merge stdout and stderr
    stdout=subprocess.PIPE,
    shell=True)

communicate is used to wait for the process to exit:

stdoutdata, stderrdata = proc.communicate()

The subprocess module does not support timeout–ability to kill a process running for more than X number of seconds–therefore, communicate may take forever to run.

What is the simplest way to implement timeouts in a Python program meant to run on Windows and Linux?


回答 0

在Python 3.3+中:

from subprocess import STDOUT, check_output

output = check_output(cmd, stderr=STDOUT, timeout=seconds)

output 是一个字节字符串,其中包含命令的合并标准输出,标准错误数据。

check_output提出CalledProcessError问题文本中指定的非零退出状态,这与proc.communicate()的方法。

我已删除,shell=True因为它经常不必要地使用。如果cmd确实需要,您可以随时将其添加回去。如果添加,shell=True即子进程是否产生了自己的后代;check_output()可以比超时指示晚得多返回,请参阅子进程超时失败

超时功能可在Python 2.x上通过subprocess323.2+子进程模块的反向端口使用。

In Python 3.3+:

from subprocess import STDOUT, check_output

output = check_output(cmd, stderr=STDOUT, timeout=seconds)

output is a byte string that contains command’s merged stdout, stderr data.

check_output raises CalledProcessError on non-zero exit status, as specified in the question’s text, unlike the proc.communicate() method.

I’ve removed shell=True because it is often used unnecessarily. You can always add it back if cmd indeed requires it. If you add shell=True, i.e., if the child process spawns its own descendants, check_output() can return much later than the timeout indicates; see Subprocess timeout failure.

The timeout feature is available on Python 2.x via the subprocess32 backport of the 3.2+ subprocess module.
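
Here is a slightly fuller sketch of the same call with the timeout handled, assuming Python 3.3+. The helper name run_with_timeout is made up, and the current interpreter stands in for "some slow command" so the example runs on both Windows and Linux:

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Return the merged stdout/stderr bytes, or None if the command timed out."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT,
                                       timeout=seconds)
    except subprocess.TimeoutExpired:
        return None


# Use the current interpreter as a portable stand-in for real commands.
fast = [sys.executable, "-c", "print('done')"]
slow = [sys.executable, "-c", "import time; time.sleep(10)"]

print(run_with_timeout(fast, 30))   # the command's output as bytes
print(run_with_timeout(slow, 0.5))  # None, because the timeout expired
```

Returning None on timeout is just one possible design; re-raising or returning the partial output from the exception may suit your program better.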


回答 1

我对底层细节了解不多;但是,鉴于python 2.6中的API提供了等待线程并终止进程的能力,那么如何在单独的线程中运行进程呢?

import subprocess, threading

class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)

我的计算机中此代码段的输出为:

Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15

从中可以看出,在第一次执行中,进程正确完成了(返回代码0),而在第二次执行中,进程终止了(返回代码-15)。

我没有在Windows中进行测试;但是,除了更新示例命令外,我认为它应该可以工作,因为我在文档中没有发现任何不支持thread.join或process.terminate的内容。

I don’t know much about the low-level details; but, given that in Python 2.6 the API offers the ability to wait for threads and terminate processes, what about running the process in a separate thread?

import subprocess, threading

class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)

The output of this snippet in my machine is:

Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15

where it can be seen that, in the first execution, the process finished correctly (return code 0), while in the second one the process was terminated (return code -15).

I haven’t tested on Windows; but, aside from updating the example command, I think it should work, since I haven’t found anything in the documentation that says thread.join or process.terminate is not supported.


回答 2

可以使用threading.Timer类简化jcollado的答案:

import shlex
from subprocess import Popen, PIPE
from threading import Timer

def run(cmd, timeout_sec):
    proc = Popen(shlex.split(cmd), stdout=PIPE, stderr=PIPE)
    timer = Timer(timeout_sec, proc.kill)
    try:
        timer.start()
        stdout, stderr = proc.communicate()
    finally:
        timer.cancel()

# Examples: both take 1 second
run("sleep 1", 5)  # process ends normally at 1 second
run("sleep 5", 1)  # timeout happens at 1 second

jcollado’s answer can be simplified using the threading.Timer class:

import shlex
from subprocess import Popen, PIPE
from threading import Timer

def run(cmd, timeout_sec):
    proc = Popen(shlex.split(cmd), stdout=PIPE, stderr=PIPE)
    timer = Timer(timeout_sec, proc.kill)
    try:
        timer.start()
        stdout, stderr = proc.communicate()
    finally:
        timer.cancel()

# Examples: both take 1 second
run("sleep 1", 5)  # process ends normally at 1 second
run("sleep 5", 1)  # timeout happens at 1 second

回答 3

如果您使用的是Unix,

import signal
  ...
class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(5*60)  # 5 minutes
try:
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print "Oops, taking too long!"
    # whatever else

If you’re on Unix,

import signal
  ...
class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(5*60)  # 5 minutes
try:
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print "Oops, taking too long!"
    # whatever else

回答 4

这是Alex Martelli作为具有适当过程终止功能的模块的解决方案。其他方法不起作用,因为它们不使用proc.communicate()。因此,如果您有一个产生大量输出的进程,它将填充其输出缓冲区,然后阻塞直到您从中读取内容。

from os import kill
from signal import alarm, signal, SIGALRM, SIGKILL
from subprocess import PIPE, Popen

def run(args, cwd = None, shell = False, kill_tree = True, timeout = -1, env = None):
    '''
    Run a command with a timeout after which it will be forcibly
    killed.
    '''
    class Alarm(Exception):
        pass
    def alarm_handler(signum, frame):
        raise Alarm
    p = Popen(args, shell = shell, cwd = cwd, stdout = PIPE, stderr = PIPE, env = env)
    if timeout != -1:
        signal(SIGALRM, alarm_handler)
        alarm(timeout)
    try:
        stdout, stderr = p.communicate()
        if timeout != -1:
            alarm(0)
    except Alarm:
        pids = [p.pid]
        if kill_tree:
            pids.extend(get_process_children(p.pid))
        for pid in pids:
            # process might have died before getting to this line
            # so wrap to avoid OSError: no such process
            try: 
                kill(pid, SIGKILL)
            except OSError:
                pass
        return -9, '', ''
    return p.returncode, stdout, stderr

def get_process_children(pid):
    p = Popen('ps --no-headers -o pid --ppid %d' % pid, shell = True,
              stdout = PIPE, stderr = PIPE)
    stdout, stderr = p.communicate()
    return [int(p) for p in stdout.split()]

if __name__ == '__main__':
    print run('find /', shell = True, timeout = 3)
    print run('find', shell = True)

Here is Alex Martelli’s solution as a module with proper process killing. The other approaches do not work because they do not use proc.communicate(). So if you have a process that produces lots of output, it will fill its output buffer and then block until you read something from it.

from os import kill
from signal import alarm, signal, SIGALRM, SIGKILL
from subprocess import PIPE, Popen

def run(args, cwd = None, shell = False, kill_tree = True, timeout = -1, env = None):
    '''
    Run a command with a timeout after which it will be forcibly
    killed.
    '''
    class Alarm(Exception):
        pass
    def alarm_handler(signum, frame):
        raise Alarm
    p = Popen(args, shell = shell, cwd = cwd, stdout = PIPE, stderr = PIPE, env = env)
    if timeout != -1:
        signal(SIGALRM, alarm_handler)
        alarm(timeout)
    try:
        stdout, stderr = p.communicate()
        if timeout != -1:
            alarm(0)
    except Alarm:
        pids = [p.pid]
        if kill_tree:
            pids.extend(get_process_children(p.pid))
        for pid in pids:
            # process might have died before getting to this line
            # so wrap to avoid OSError: no such process
            try: 
                kill(pid, SIGKILL)
            except OSError:
                pass
        return -9, '', ''
    return p.returncode, stdout, stderr

def get_process_children(pid):
    p = Popen('ps --no-headers -o pid --ppid %d' % pid, shell = True,
              stdout = PIPE, stderr = PIPE)
    stdout, stderr = p.communicate()
    return [int(p) for p in stdout.split()]

if __name__ == '__main__':
    print run('find /', shell = True, timeout = 3)
    print run('find', shell = True)

回答 5

我修改了sussudio答案。现在函数返回:( ,returncodestdoutstderrtimeoutstdoutstderr被解码为UTF-8字符串

def kill_proc(proc, timeout):
  timeout["value"] = True
  proc.kill()

def run(cmd, timeout_sec):
  proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  timeout = {"value": False}
  timer = Timer(timeout_sec, kill_proc, [proc, timeout])
  timer.start()
  stdout, stderr = proc.communicate()
  timer.cancel()
  return proc.returncode, stdout.decode("utf-8"), stderr.decode("utf-8"), timeout["value"]

I’ve modified sussudio’s answer. Now the function returns (returncode, stdout, stderr, timeout), with stdout and stderr decoded as UTF-8 strings:

def kill_proc(proc, timeout):
  timeout["value"] = True
  proc.kill()

def run(cmd, timeout_sec):
  proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  timeout = {"value": False}
  timer = Timer(timeout_sec, kill_proc, [proc, timeout])
  timer.start()
  stdout, stderr = proc.communicate()
  timer.cancel()
  return proc.returncode, stdout.decode("utf-8"), stderr.decode("utf-8"), timeout["value"]

回答 6

惊讶没人提到使用 timeout

timeout 5 ping -c 3 somehost

显然,这不适用于每个用例,但是如果您处理的是简单脚本,那么这是很难克服的。

homebrew对于Mac用户,也可以通过coreutils中的gtimeout 使用。

Surprised nobody mentioned using timeout:

timeout 5 ping -c 3 somehost

This won’t work for every use case, obviously, but if you’re dealing with a simple script, this is hard to beat.

Also available as gtimeout in coreutils via homebrew for mac users.


Answer 7

timeout is now supported by call() and communicate() in the subprocess module (as of Python 3.3):

import subprocess

subprocess.call("command", timeout=20, shell=True)

This will call the command and raise the exception

subprocess.TimeoutExpired

if the command doesn’t finish after 20 seconds.

You can then handle the exception to continue your code, something like:

try:
    subprocess.call("command", timeout=20, shell=True)
except subprocess.TimeoutExpired:
    pass  # insert code here

Hope this helps.
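
As a concrete sketch of the pattern above (assuming Python 3.3+; the run_with_limit helper and the two child commands are made up for illustration), the snippet below starts a short Python child via sys.executable so that it behaves the same on any platform:

```python
import subprocess
import sys

def run_with_limit(cmd, seconds):
    """Return the exit code, or None if the command timed out."""
    try:
        # call() kills the child and waits for it before re-raising
        # TimeoutExpired, so the child does not linger after a timeout.
        return subprocess.call(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None

# Hypothetical child processes used only for demonstration.
slow = [sys.executable, "-c", "import time; time.sleep(10)"]
fast = [sys.executable, "-c", "pass"]

print(run_with_limit(slow, 1))   # the slow child exceeds its 1 second limit
print(run_with_limit(fast, 10))  # the fast child finishes well within 10 seconds
```

Returning None for a timeout is just one possible convention; re-raising or returning a flag works equally well.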


Answer 8

Another option is to write to a temporary file, which prevents stdout from blocking, instead of needing to poll with communicate(). This worked for me where the other answers did not, for example on Windows.

    # Assumes subprocess, tempfile and time are imported; killProc and
    # ProcessIncompleteError are helpers defined elsewhere (not shown).
    outFile = tempfile.SpooledTemporaryFile()
    errFile = tempfile.SpooledTemporaryFile()
    proc = subprocess.Popen(args, stderr=errFile, stdout=outFile, universal_newlines=False)
    wait_remaining_sec = timeout

    while proc.poll() is None and wait_remaining_sec > 0:
        time.sleep(1)
        wait_remaining_sec -= 1

    # Only treat it as a timeout if the process is still running.
    if proc.poll() is None:
        killProc(proc.pid)
        raise ProcessIncompleteError(proc, timeout)

    # read temp streams from start
    outFile.seek(0)
    errFile.seek(0)
    out = outFile.read()
    err = errFile.read()
    outFile.close()
    errFile.close()
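
Since the snippet above relies on helpers that are not shown, here is a self-contained sketch of the same temporary-file idea. It assumes Python 3; the function name run_to_tempfiles and the choice to signal a timeout by returning None as the return code are made up for illustration, not part of the original answer:

```python
import subprocess
import sys
import tempfile
import time

def run_to_tempfiles(args, timeout, poll_interval=0.1):
    """Return (returncode, out, err); returncode is None on timeout."""
    with tempfile.SpooledTemporaryFile() as out_file, \
         tempfile.SpooledTemporaryFile() as err_file:
        # The child writes straight to the files, so a full pipe can
        # never block it while we are polling.
        proc = subprocess.Popen(args, stdout=out_file, stderr=err_file)
        deadline = time.monotonic() + timeout
        while proc.poll() is None and time.monotonic() < deadline:
            time.sleep(poll_interval)
        timed_out = proc.poll() is None
        if timed_out:
            proc.kill()
            proc.wait()
        # Rewind and read whatever the child managed to write.
        out_file.seek(0)
        err_file.seek(0)
        return (None if timed_out else proc.returncode,
                out_file.read(), err_file.read())

rc, out, err = run_to_tempfiles(
    [sys.executable, "-c", "print('x' * 100000)"], timeout=10)
```

Using a monotonic deadline instead of a countdown keeps the limit accurate even when the poll interval is shorter than one second.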

Answer 9

I don’t know why it hasn’t been mentioned, but since Python 3.5 there is a new general-purpose subprocess.run function (meant to replace check_call, check_output, …), and it has a timeout parameter as well.

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None)

Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

It raises a subprocess.TimeoutExpired exception when the timeout expires.
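
A minimal sketch of how this might be used, assuming Python 3.5+; the wrapper name run_captured and its return convention are invented here for illustration:

```python
import subprocess
import sys

def run_captured(cmd, seconds):
    """Run cmd; return (returncode, stdout_text, timed_out)."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=seconds)
    except subprocess.TimeoutExpired:
        # run() has already killed the child and waited for it here.
        return None, "", True
    return done.returncode, done.stdout.decode("utf-8"), False

# Hypothetical child command used only for demonstration.
result = run_captured([sys.executable, "-c", "print('hi')"], 10)
```

Because run() cleans up the child itself on timeout, the except branch needs no extra kill or wait call.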


Answer 10

Here is my solution, using Thread and Event:

import subprocess
from threading import Thread, Event

def kill_on_timeout(done, timeout, proc):
    if not done.wait(timeout):
        proc.kill()

def exec_command(command, timeout):

    done = Event()
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    watcher = Thread(target=kill_on_timeout, args=(done, timeout, proc))
    watcher.daemon = True
    watcher.start()

    data, stderr = proc.communicate()
    done.set()

    return data, stderr, proc.returncode

In action:

In [2]: exec_command(['sleep', '10'], 5)
Out[2]: ('', '', -9)

In [3]: exec_command(['sleep', '10'], 11)
Out[3]: ('', '', 0)

Answer 11

The solution I use is to prefix the shell command with timelimit. If the command takes too long, timelimit will stop it and Popen will have a returncode set by timelimit. If it is > 128, it means timelimit killed the process.

See also python subprocess with timeout and large output (>64K)


Answer 12

I added the solution with threading from jcollado to my Python module easyprocess.

Install:

pip install easyprocess

Example:

from easyprocess import Proc

# shell is not supported!
stdout=Proc('ping localhost').call(timeout=1.5).stdout
print stdout

Answer 13

If you are using Python 2, give subprocess32 a try:

import subprocess32

try:
    output = subprocess32.check_output(command, shell=True, timeout=3)
except subprocess32.TimeoutExpired as e:
    print e

Answer 14

Prepending the Linux command timeout isn’t a bad workaround and it worked for me.

cmd = "timeout 20 " + cmd
p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(output, err) = p.communicate()
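
One caveat with the snippet above is that cmd.split() breaks arguments that contain quoted spaces. A minimal sketch of the same idea using shlex.split follows. The helper names timeout_argv and run_limited are made up here, and it assumes a GNU coreutils timeout binary on PATH, which exits with status 124 when the limit is hit:

```python
import shlex
import subprocess

def timeout_argv(cmd, seconds, tool="timeout"):
    """Build an argv list that runs cmd under the timeout tool."""
    return [tool, str(seconds)] + shlex.split(cmd)

def run_limited(cmd, seconds):
    """Return (timed_out, stdout, stderr) for a shell-style command string."""
    p = subprocess.Popen(timeout_argv(cmd, seconds),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, err = p.communicate()
    # GNU timeout reports an expired limit with exit status 124.
    return p.returncode == 124, output, err
```

On a Mac with Homebrew coreutils, passing tool="gtimeout" to timeout_argv would be the equivalent.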

Answer 15

I’ve implemented what I could gather from a few of these. This works in Windows, and since this is a community wiki, I figure I would share my code as well:

import subprocess
import threading
import time

class Command(threading.Thread):
    def __init__(self, cmd, outFile, errFile, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.process = None
        self.outFile = outFile
        self.errFile = errFile
        self.timed_out = False
        self.timeout = timeout

    def run(self):
        self.process = subprocess.Popen(self.cmd, stdout = self.outFile, \
            stderr = self.errFile)

        while (self.process.poll() is None and self.timeout > 0):
            time.sleep(1)
            self.timeout -= 1

        # Only terminate if the process is still running when time is up.
        if self.process.poll() is None:
            self.process.terminate()
            self.timed_out = True
        else:
            self.timed_out = False

Then from another class or file:

        outFile =  tempfile.SpooledTemporaryFile()
        errFile =   tempfile.SpooledTemporaryFile()

        executor = command.Command(c, outFile, errFile, timeout)
        executor.daemon = True
        executor.start()

        executor.join()
        if executor.timed_out:
            out = 'timed out'
        else:
            outFile.seek(0)
            errFile.seek(0)
            out = outFile.read()
            err = errFile.read()

        outFile.close()
        errFile.close()

Answer 16

Once you understand the full machinery of running processes on *nix, you will easily find a simpler solution:

Consider this simple example of how to make a communicate() method with a timeout using select.select() (available almost everywhere on *nix nowadays). This could also be written with epoll/poll/kqueue, but the select.select() variant makes a good example. The major limitations of select.select() (speed and the 1024 max fds) are not applicable to your task.

This works under *nix, does not create threads, does not use signals, can be launched from any thread (not only the main one), and is fast enough to read 250 MB/s of data from stdout on my machine (i5 2.3 GHz).

There is a problem with joining stdout/stderr at the end of communicate(): if the program produces huge output, this can lead to high memory usage. But you can call communicate() several times with smaller timeouts.

import errno
import os
import select
import subprocess
import time

class Popen(subprocess.Popen):
    def communicate(self, input=None, timeout=None):
        if timeout is None:
            return subprocess.Popen.communicate(self, input)

        if self.stdin:
            # Flush stdio buffer, this might block if user
            # has been writing to .stdin in an uncontrolled
            # fashion.
            self.stdin.flush()
            if not input:
                self.stdin.close()

        read_set, write_set = [], []
        stdout = stderr = None

        if self.stdin and input:
            write_set.append(self.stdin)
        if self.stdout:
            read_set.append(self.stdout)
            stdout = []
        if self.stderr:
            read_set.append(self.stderr)
            stderr = []

        input_offset = 0
        deadline = time.time() + timeout

        while read_set or write_set:
            try:
                rlist, wlist, xlist = select.select(read_set, write_set, [], max(0, deadline - time.time()))
            except select.error as ex:
                if ex.args[0] == errno.EINTR:
                    continue
                raise

            if not (rlist or wlist):
                # Just break if timeout
                # Since we do not close stdout/stderr/stdin, we can call
                # communicate() several times reading data by smaller pieces.
                break

            if self.stdin in wlist:
                chunk = input[input_offset:input_offset + subprocess._PIPE_BUF]
                try:
                    bytes_written = os.write(self.stdin.fileno(), chunk)
                except OSError as ex:
                    if ex.errno == errno.EPIPE:
                        self.stdin.close()
                        write_set.remove(self.stdin)
                    else:
                        raise
                else:
                    input_offset += bytes_written
                    if input_offset >= len(input):
                        self.stdin.close()
                        write_set.remove(self.stdin)

            # Read stdout / stderr by 1024 bytes
            for fn, tgt in (
                (self.stdout, stdout),
                (self.stderr, stderr),
            ):
                if fn in rlist:
                    data = os.read(fn.fileno(), 1024)
                    if data == '':
                        fn.close()
                        read_set.remove(fn)
                    tgt.append(data)

        if stdout is not None:
            stdout = ''.join(stdout)
        if stderr is not None:
            stderr = ''.join(stderr)

        return (stdout, stderr)

Answer 17

You can do this using select

import os
import subprocess
from datetime import datetime
from select import select

def call_with_timeout(cmd, timeout):
    started = datetime.now()
    sp = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while True:
        p = select([sp.stdout], [], [], timeout)
        if p[0]:
            # Read only what is available; a plain .read() would block
            # until EOF and defeat the timeout.
            os.read(p[0][0].fileno(), 4096)
        ret = sp.poll()
        if ret is not None:
            return ret
        if (datetime.now()-started).total_seconds() > timeout:
            sp.kill()
            return None

Answer 18

I’ve used killableprocess successfully on Windows, Linux and Mac. If you are using Cygwin Python, you’ll need OSAF’s version of killableprocess because otherwise native Windows processes won’t get killed.


Answer 19

Although I haven’t looked at it extensively, this decorator I found at ActiveState seems to be quite useful for this sort of thing. Along with subprocess.Popen(..., close_fds=True), at least I’m ready for shell-scripting in Python.


Answer 20

This solution kills the process tree when shell=True, passes parameters to the process (or not), has a timeout, and gets back the stdout, stderr and process output of the call (it uses psutil for kill_proc_tree). It is based on several solutions posted on SO, including jcollado’s, and is posted in response to comments by Anson and jradice on jcollado’s answer. Tested on Windows Server 2012 and Ubuntu 14.04. Please note that for Ubuntu you need to change the parent.children(…) call to parent.get_children(…).

import os
import subprocess
from threading import Thread

import psutil

def kill_proc_tree(pid, including_parent=True):
  parent = psutil.Process(pid)
  children = parent.children(recursive=True)
  for child in children:
    child.kill()
  psutil.wait_procs(children, timeout=5)
  if including_parent:
    parent.kill()
    parent.wait(5)

def run_with_timeout(cmd, current_dir, cmd_parms, timeout):
  def target():
    process = subprocess.Popen(cmd, cwd=current_dir, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)

    # wait for the process to terminate
    if (cmd_parms == ""):
      out, err = process.communicate()
    else:
      out, err = process.communicate(cmd_parms)
    errcode = process.returncode

  thread = Thread(target=target)
  thread.start()

  thread.join(timeout)
  if thread.is_alive():
    me = os.getpid()
    kill_proc_tree(me, including_parent=False)
    thread.join()

Answer 21

There’s an idea to subclass the Popen class and extend it with some simple method decorators. Let’s call it ExpirablePopen.

from logging import error
from subprocess import Popen
from threading import Event
from threading import Thread


class ExpirablePopen(Popen):

    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop('timeout', 0)
        self.timer = None
        self.done = Event()

        Popen.__init__(self, *args, **kwargs)

    def __tkill(self):
        timeout = self.timeout
        if not self.done.wait(timeout):
            error('Terminating process {} by timeout of {} secs.'.format(self.pid, timeout))
            self.kill()

    def expirable(func):
        def wrapper(self, *args, **kwargs):
            # zero timeout means call of parent method
            if self.timeout == 0:
                return func(self, *args, **kwargs)

            # if timer is None, need to start it
            if self.timer is None:
                self.timer = thr = Thread(target=self.__tkill)
                thr.daemon = True
                thr.start()

            result = func(self, *args, **kwargs)
            self.done.set()

            return result
        return wrapper

    wait = expirable(Popen.wait)
    communicate = expirable(Popen.communicate)


if __name__ == '__main__':
    from subprocess import PIPE

    print ExpirablePopen('ssh -T git@bitbucket.org', stdout=PIPE, timeout=1).communicate()

Answer 22

I had the problem that I wanted to terminate a multithreading subprocess if it took longer than a given timeout length. I wanted to set a timeout in Popen(), but it did not work. Then, I realized that Popen().wait() is equal to call() and so I had the idea to set a timeout within the .wait(timeout=xxx) method, which finally worked. Thus, I solved it this way:

import os
import sys
import signal
import subprocess
from multiprocessing import Pool

cores_for_parallelization = 4
timeout_time = 15  # seconds

def main():
    jobs = [...YOUR_JOB_LIST...]
    with Pool(cores_for_parallelization) as p:
        p.map(run_parallel_jobs, jobs)

def run_parallel_jobs(args):
    # Define the arguments including the paths
    initial_terminal_command = 'C:\\Python34\\python.exe'  # Python executable
    function_to_start = 'C:\\temp\\xyz.py'  # The multithreading script
    final_list = [initial_terminal_command, function_to_start]
    final_list.extend(args)

    # Start the subprocess and determine the process PID
    subp = subprocess.Popen(final_list)  # starts the process
    pid = subp.pid

    # Wait until the return code returns from the function by considering the timeout. 
    # If not, terminate the process.
    try:
        returncode = subp.wait(timeout=timeout_time)  # should be zero if accomplished
    except subprocess.TimeoutExpired:
        # Distinguish between Linux and Windows and terminate the process if 
        # the timeout has been expired
        if sys.platform.startswith('linux'):
            os.kill(pid, signal.SIGTERM)
        elif sys.platform == 'win32':
            subp.terminate()

if __name__ == '__main__':
    main()

Answer 23

Unfortunately, I’m bound by very strict policies on the disclosure of source code by my employer, so I can’t provide actual code. But for my taste the best solution is to create a subclass overriding Popen.wait() to poll instead of wait indefinitely, and Popen.__init__ to accept a timeout parameter. Once you do that, all the other Popen methods (which call wait) will work as expected, including communicate.
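
Since no code accompanies this answer, the following is only a guess at what such a subclass might look like, written for Python 3. The class name PollingPopen and the poll_timeout and poll_interval parameters are hypothetical, and this sketch only demonstrates the overridden wait(); it makes no claim about how communicate() behaves when a child keeps its pipes open:

```python
import subprocess
import sys
import time

class PollingPopen(subprocess.Popen):
    """Popen whose wait() polls against a deadline instead of blocking."""

    def __init__(self, *args, poll_timeout=None, poll_interval=0.05, **kwargs):
        # Set our attributes first so wait() is usable as soon as the
        # base class has started the child.
        self._poll_timeout = poll_timeout
        self._poll_interval = poll_interval
        super().__init__(*args, **kwargs)

    def wait(self, timeout=None):
        # An explicit timeout wins over the one given to __init__.
        limit = timeout if timeout is not None else self._poll_timeout
        if limit is None:
            return super().wait()
        deadline = time.monotonic() + limit
        while self.poll() is None:
            if time.monotonic() >= deadline:
                # Kill the child, reap it, then report the timeout.
                self.kill()
                super().wait()
                raise subprocess.TimeoutExpired(self.args, limit)
            time.sleep(self._poll_interval)
        return self.returncode

# Hypothetical child command used only for demonstration.
proc = PollingPopen([sys.executable, "-c", "pass"], poll_timeout=30)
exit_code = proc.wait()
```

Killing the child on expiry is a design choice of this sketch; the standard Popen.wait(timeout=...) in Python 3 raises TimeoutExpired but leaves the child running.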


Answer 24

https://pypi.python.org/pypi/python-subprocess2 provides extensions to the subprocess module which allow you to wait up to a certain period of time, otherwise terminate.

So, to wait up to 10 seconds for the process to terminate, otherwise kill:

pipe  = subprocess.Popen('...')

timeout =  10

results = pipe.waitOrTerminate(timeout)

This is compatible with both Windows and Unix. “results” is a dictionary containing “returnCode”, which is the return code of the app (or None if it had to be killed), and “actionTaken”, which will be “SUBPROCESS2_PROCESS_COMPLETED” if the process completed normally, or a mask of “SUBPROCESS2_PROCESS_TERMINATED” and “SUBPROCESS2_PROCESS_KILLED” depending on the action taken (see the documentation for full details).


Answer 25

For Python 2.6+, use gevent:

 from gevent.subprocess import Popen, PIPE, STDOUT

 def call_sys(cmd, timeout):
      p = Popen(cmd, shell=True, stdout=PIPE)
      output, _ = p.communicate(timeout=timeout)
      assert p.returncode == 0, p.returncode
      return output

 call_sys('./t.sh', 2)

 # t.sh example
 sleep 5
 echo done
 exit 1

Answer 26

Python 2.7:

import time
import subprocess

def run_command(cmd, timeout=0):
    start_time = time.time()
    df = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while timeout and df.poll() is None:
        if time.time()-start_time >= timeout:
            df.kill()
            return -1, ""
        time.sleep(0.1)  # avoid a busy-wait while polling
    output = '\n'.join(df.communicate()).strip()
    return df.returncode, output

Answer 27

import subprocess, optparse, os, sys, re, datetime, threading, time, glob, shutil, xml.dom.minidom, traceback

class OutputManager:
    def __init__(self, filename, mode, console, logonly):
        self.con = console
        self.logtoconsole = True
        self.logtofile = False

        if filename:
            try:
                self.f = open(filename, mode)
                self.logtofile = True
                if logonly == True:
                    self.logtoconsole = False
            except IOError:
                print (sys.exc_value)
                print ("Switching to console only output...\n")
                self.logtofile = False
                self.logtoconsole = True

    def write(self, data):
        if self.logtoconsole == True:
            self.con.write(data)
        if self.logtofile == True:
            self.f.write(data)
        sys.stdout.flush()

def getTimeString():
        return time.strftime("%Y-%m-%d", time.gmtime())

def runCommand(command):
    '''
    Execute a command in new thread and return the
    stdout and stderr content of it.
    '''
    try:
        Output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
    except Exception as e:
        print ("runCommand failed :%s" % (command))
        print (str(e))
        sys.stdout.flush()
        return None
    return Output

def GetOs():
    Os = ""
    if sys.platform.startswith('win32'):
        Os = "win"
    elif sys.platform.startswith('linux'):
        Os = "linux"
    elif sys.platform.startswith('darwin'):
        Os = "mac"
    return Os


def check_output(*popenargs, **kwargs):
    try:
        if 'stdout' in kwargs: 
            raise ValueError('stdout argument not allowed, it will be overridden.') 

        # Get start time.
        startTime = datetime.datetime.now()
        timeoutValue=3600

        cmd = popenargs[0]

        if sys.platform.startswith('win32'):
            process = subprocess.Popen( cmd, stdout=subprocess.PIPE, shell=True) 
        elif sys.platform.startswith('linux'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 
        elif sys.platform.startswith('darwin'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 

        stdoutdata, stderrdata = process.communicate( timeout = timeoutValue )
        retcode = process.poll()

        ####################################
        # Catch crash error and log it.
        ####################################
        OutputHandle = None
        try:
            if retcode >= 1:
                OutputHandle = OutputManager( 'CrashJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
                print (stdoutdata)
                print (stderrdata)
                sys.stdout.flush()
        except Exception as e:
            print (str(e))

    except subprocess.TimeoutExpired:
            ####################################
            # Catch time out error and log it.
            ####################################
            Os = GetOs()
            if Os == 'win':
                killCmd = "taskkill /FI \"IMAGENAME eq {0}\" /T /F"
            elif Os == 'linux':
                killCmd = "pkill {0}"
            elif Os == 'mac':
                # Linux, Mac OS
                killCmd = "killall -KILL {0}"

            runCommand(killCmd.format("java"))
            runCommand(killCmd.format("YouApp"))

            OutputHandle = None
            try:
                OutputHandle = OutputManager( 'KillJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
            except Exception as e:
                print (str(e))
    except Exception as e:
            for frame in traceback.extract_tb(sys.exc_info()[2]):
                        fname,lineno,fn,text = frame
                        print("Error in %s on line %d" % (fname, lineno))
import subprocess, optparse, os, sys, re, datetime, threading, time, glob, shutil, xml.dom.minidom, traceback

class OutputManager:
    def __init__(self, filename, mode, console, logonly):
        self.con = console
        self.logtoconsole = True
        self.logtofile = False

        if filename:
            try:
                self.f = open(filename, mode)
                self.logtofile = True
                if logonly == True:
                    self.logtoconsole = False
            except IOError as e:
                print (e)
                print ("Switching to console only output...\n")
                self.logtofile = False
                self.logtoconsole = True

    def write(self, data):
        if self.logtoconsole == True:
            self.con.write(data)
        if self.logtofile == True:
            self.f.write(data)
        sys.stdout.flush()

def getTimeString():
        return time.strftime("%Y-%m-%d", time.gmtime())

def runCommand(command):
    '''
    Execute a command through the shell in a child process and
    return its stdout content (stderr is not captured).
    '''
    try:
        Output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
    except Exception as e:
        print ("runCommand failed :%s" % (command))
        print (str(e))
        sys.stdout.flush()
        return None
    return Output

def GetOs():
    Os = ""
    if sys.platform.startswith('win32'):
        Os = "win"
    elif sys.platform.startswith('linux'):
        Os = "linux"
    elif sys.platform.startswith('darwin'):
        Os = "mac"
    return Os


def check_output(*popenargs, **kwargs):
    try:
        if 'stdout' in kwargs: 
            raise ValueError('stdout argument not allowed, it will be overridden.') 

        # Get start time.
        startTime = datetime.datetime.now()
        timeoutValue=3600

        cmd = popenargs[0]

        if sys.platform.startswith('win32'):
            process = subprocess.Popen( cmd, stdout=subprocess.PIPE, shell=True) 
        elif sys.platform.startswith('linux'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 
        elif sys.platform.startswith('darwin'):
            process = subprocess.Popen( cmd , stdout=subprocess.PIPE, shell=True ) 

        stdoutdata, stderrdata = process.communicate( timeout = timeoutValue )
        retcode = process.poll()

        ####################################
        # Catch crash error and log it.
        ####################################
        OutputHandle = None
        try:
            if retcode >= 1:
                OutputHandle = OutputManager( 'CrashJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
                print (stdoutdata)
                print (stderrdata)
                sys.stdout.flush()
        except Exception as e:
            print (str(e))

    except subprocess.TimeoutExpired:
            ####################################
            # Catch time out error and log it.
            ####################################
            Os = GetOs()
            if Os == 'win':
                killCmd = "taskkill /FI \"IMAGENAME eq {0}\" /T /F"
            elif Os == 'linux':
                killCmd = "pkill {0}"
            elif Os == 'mac':
                # Linux, Mac OS
                killCmd = "killall -KILL {0}"

            runCommand(killCmd.format("java"))
            runCommand(killCmd.format("YouApp"))

            OutputHandle = None
            try:
                OutputHandle = OutputManager( 'KillJob_' + getTimeString() + '.txt', 'a+', sys.stdout, False)
                OutputHandle.write( cmd )
            except Exception as e:
                print (str(e))
    except Exception as e:
            for frame in traceback.extract_tb(sys.exc_info()[2]):
                        fname,lineno,fn,text = frame
                        print("Error in %s on line %d" % (fname, lineno))
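
For comparison, on Python 3.5 or later the timeout handling above can be sketched with subprocess.run, which kills the child it started and raises subprocess.TimeoutExpired once the deadline passes. The helper name run_with_timeout and the use of sys.executable as a stand-in command are assumptions made for illustration; they are not part of the answer above.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run args (a list, no shell) and return (returncode, stdout).

    Returns (None, None) if the command exceeds timeout_seconds;
    subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        completed = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, None
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # A stand-in child process that sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, timeout_seconds=1))
```

Passing a list rather than a shell string is a deliberate choice in this sketch: with shell=True the timeout kills the shell, and a process started by that shell may keep running.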

Answer 28

Was just trying to write something simpler.

#!/usr/bin/python

from subprocess import Popen
import time

popen = Popen(["/bin/sleep", "10"])
sttime = time.time()
waittime = 3

print("Start time %s" % sttime)

while True:
    time.sleep(1)
    # poll() returns None while the child is still running
    rcode = popen.poll()
    now = time.time()
    if rcode is not None:
        # the child finished on its own
        break
    if now > (sttime + waittime):
        print("Killing it now")
        popen.kill()
        popen.wait()  # reap the killed child
        break
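
The same idea can be sketched without a polling loop on Python 3.3 or later, where Popen.wait accepts a timeout. The function name kill_after and the sys.executable stand-in for /bin/sleep are assumptions made for illustration.

```python
import subprocess
import sys


def kill_after(args, wait_seconds):
    """Start args, wait up to wait_seconds, and kill the child on timeout.

    Returns True if the child had to be killed, False if it exited by itself.
    """
    child = subprocess.Popen(args)
    try:
        child.wait(timeout=wait_seconds)
        return False
    except subprocess.TimeoutExpired:
        child.kill()
        child.wait()  # reap the killed child so it does not linger as a zombie
        return True


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(10)"]
    print("killed:", kill_after(sleeper, wait_seconds=3))
```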

Why doesn't os.path.join() work in this case?

Question: Why doesn't os.path.join() work in this case?

The code below does not join the paths. When debugged, the result does not contain the whole path, only the last entry.

os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/')

When I test this, the result contains only the /new_sandbox/ part.


Answer 0

The latter strings shouldn’t start with a slash. If they start with a slash, then they’re considered an “absolute path” and everything before them is discarded.

Quoting the Python docs for os.path.join:

If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

Note the behaviour on Windows in relation to drive letters, which seems to have changed compared to earlier Python versions:

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
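
The rule quoted above can be checked directly. This sketch uses posixpath (the POSIX implementation behind os.path on Linux and macOS) so that the results are the same on any platform; the date string is a made-up value.

```python
import posixpath

todaystr = "2024-01-31"  # hypothetical value, for illustration only

# A later component that starts with "/" is absolute, so everything
# before it is discarded.
discarded = posixpath.join("/home/build/test/sandboxes/", todaystr, "/new_sandbox/")

# Without the leading slash the components are joined as expected.
joined = posixpath.join("/home/build/test/sandboxes/", todaystr, "new_sandbox/")

print(discarded)  # /new_sandbox/
print(joined)     # /home/build/test/sandboxes/2024-01-31/new_sandbox/
```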


Answer 1

The idea of os.path.join() is to make your program cross-platform (linux/windows/etc).

Even one slash ruins it.

So it only makes sense when being used with some kind of a reference point like os.environ['HOME'] or os.path.dirname(__file__).


Answer 2

os.path.join() can be used in conjunction with os.path.sep to create an absolute rather than relative path.

os.path.join(os.path.sep, 'home','build','test','sandboxes',todaystr,'new_sandbox')

Answer 3

Do not use forward slashes at the beginning of path components, except when referring to the root directory:

os.path.join('/home/build/test/sandboxes', todaystr, 'new_sandbox')

see also: http://docs.python.org/library/os.path.html#os.path.join


Answer 4

To help understand why this surprising behavior isn’t entirely terrible, consider an application which accepts a config file name as an argument:

config_root = "/etc/myapp.conf/"
file_name = os.path.join(config_root, sys.argv[1])

If the application is executed with:

$ myapp foo.conf

The config file /etc/myapp.conf/foo.conf will be used.

But consider what happens if the application is called with:

$ myapp /some/path/bar.conf

Then myapp should use the config file at /some/path/bar.conf (and not /etc/myapp.conf/some/path/bar.conf or similar).

It may not be great, but I believe this is the motivation for the absolute path behaviour.


Answer 5

It’s because your '/new_sandbox/' begins with a / and thus is assumed to be relative to the root directory. Remove the leading /.


Answer 6

To make your function more portable, use it as such:

os.path.join(os.sep, 'home', 'build', 'test', 'sandboxes', todaystr, 'new_sandbox')

or

os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')

Answer 7

Try a combination of split("/") and * for strings that already contain separators.

import os

home = '/home/build/test/sandboxes/'
todaystr = '042118'
new = '/new_sandbox/'

os.path.join(*home.split("/"), todaystr, *new.split("/"))


How it works…

split("/") turns existing path into list: ['', 'home', 'build', 'test', 'sandboxes', '']

* in front of the list unpacks each item of the list into its own argument
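
One thing to watch with this approach: splitting an absolute path produces a leading empty string, which the join drops, so the result becomes a relative path. The sketch below uses posixpath so the results do not depend on the platform. It shows the effect and one possible workaround; join_parts is a hypothetical helper name, not something from the answer.

```python
import posixpath

home = "/home/build/test/sandboxes/"
todaystr = "042118"
new = "/new_sandbox/"

# The leading "" from home.split("/") is swallowed, so the root slash is lost.
naive = posixpath.join(*home.split("/"), todaystr, *new.split("/"))
print(naive)  # home/build/test/sandboxes/042118/new_sandbox/


def join_parts(first, *rest):
    """Join paths, keeping first as given and treating the rest as relative."""
    return posixpath.join(first, *(part.strip("/") for part in rest))


print(join_parts(home, todaystr, new))  # /home/build/test/sandboxes/042118/new_sandbox
```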


Answer 8

Try with new_sandbox only

os.path.join('/home/build/test/sandboxes/', todaystr, 'new_sandbox')

Answer 9

Do it like this, without the extra slashes:

root="/home"
os.path.join(root,"build","test","sandboxes",todaystr,"new_sandbox")

Answer 10

Note that a similar issue can bite you if you use os.path.join() to include an extension that already includes a dot, which is what happens automatically when you use os.path.splitext(). In this example:

components = os.path.splitext(filename)
prefix = components[0]
extension = components[1]
return os.path.join("avatars", instance.username, prefix, extension)

Even though extension might be .jpg you end up with a folder named “foobar” rather than a file called “foobar.jpg”. To prevent this you need to append the extension separately:

return os.path.join("avatars", instance.username, prefix) + extension
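
A small sketch of the difference, using posixpath so the output is the same on any platform. The file name and user name are made-up values.

```python
import posixpath

filename = "foobar.jpg"  # hypothetical upload name
username = "alice"       # hypothetical user

prefix, extension = posixpath.splitext(filename)  # ("foobar", ".jpg")

# Passing the extension as a separate component makes it a path segment.
wrong = posixpath.join("avatars", username, prefix, extension)

# Appending it after the join keeps it part of the file name.
right = posixpath.join("avatars", username, prefix) + extension

print(wrong)  # avatars/alice/foobar/.jpg
print(right)  # avatars/alice/foobar.jpg
```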

Answer 11

you can strip the '/':

>>> os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/'.strip('/'))
'/home/build/test/sandboxes/04122019/new_sandbox'

Answer 12

I'd recommend stripping os.path.sep from the second and following strings, which prevents them from being interpreted as absolute paths:

first_path_str = '/home/build/test/sandboxes/'
original_other_path_to_append_ls = [todaystr, '/new_sandbox/']
other_path_to_append_ls = [
    i_path.strip(os.path.sep) for i_path in original_other_path_to_append_ls
]
output_path = os.path.join(first_path_str, *other_path_to_append_ls)

Answer 13

os.path.join("a", *"/b".split(os.sep))
'a/b'

a fuller version:

import os

def join (p, f, sep = os.sep):
    f = os.path.normpath(f)
    if p == "":
        return (f);
    else:
        p = os.path.normpath(p)
        return (os.path.join(p, *f.split(os.sep)))

def test (p, f, sep = os.sep):
    print("os.path.join({}, {}) => {}".format(p, f, os.path.join(p, f)))
    print("        join({}, {}) => {}".format(p, f, join(p, f, sep)))

if __name__ == "__main__":
    # /a/b/c for all
    test("\\a\\b", "\\c", "\\") # optionally pass in the sep you are using locally
    test("/a/b", "/c", "/")
    test("/a/b", "c")
    test("/a/b/", "c")
    test("", "/c")
    test("", "c")

Pandas - How to flatten a hierarchical index in columns

Question: Pandas - How to flatten a hierarchical index in columns

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf       
                                     sum   sum   sum    sum   amax   amin
0  702730  26451  1993      1    1     1     0    12     13  30.92  24.98
1  702730  26451  1993      1    2     0     0    13     13  32.00  24.98
2  702730  26451  1993      1    3     1    10     2     13  23.00   6.98
3  702730  26451  1993      1    4     1     0    12     13  10.04   3.92
4  702730  26451  1993      1    5     3     0    10     13  19.94  10.94

I want to flatten it, so that it looks like this (names aren’t critical – I could rename):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf_amax  tempf_amin
0  702730  26451  1993      1    1     1     0    12     13       30.92       24.98
1  702730  26451  1993      1    2     0     0    13     13       32.00       24.98
2  702730  26451  1993      1    3     1    10     2     13       23.00        6.98
3  702730  26451  1993      1    4     1     0    12     13       10.04        3.92
4  702730  26451  1993      1    5     3     0    10     13       19.94       10.94

How do I do this? (I’ve tried a lot, to no avail.)

Per a suggestion, here is the head in dict form

{('USAF', ''): {0: '702730',
  1: '702730',
  2: '702730',
  3: '702730',
  4: '702730'},
 ('WBAN', ''): {0: '26451', 1: '26451', 2: '26451', 3: '26451', 4: '26451'},
 ('day', ''): {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 ('month', ''): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 ('s_CD', 'sum'): {0: 12.0, 1: 13.0, 2: 2.0, 3: 12.0, 4: 10.0},
 ('s_CL', 'sum'): {0: 0.0, 1: 0.0, 2: 10.0, 3: 0.0, 4: 0.0},
 ('s_CNT', 'sum'): {0: 13.0, 1: 13.0, 2: 13.0, 3: 13.0, 4: 13.0},
 ('s_PC', 'sum'): {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 3.0},
 ('tempf', 'amax'): {0: 30.920000000000002,
  1: 32.0,
  2: 23.0,
  3: 10.039999999999999,
  4: 19.939999999999998},
 ('tempf', 'amin'): {0: 24.98,
  1: 24.98,
  2: 6.9799999999999969,
  3: 3.9199999999999982,
  4: 10.940000000000001},
 ('year', ''): {0: 1993, 1: 1993, 2: 1993, 3: 1993, 4: 1993}}

Answer 0

I think the easiest way to do this would be to set the columns to the top level:

df.columns = df.columns.get_level_values(0)

Note: if the top level has a name you can also access it by that name, rather than 0.

If you want to combine/join your MultiIndex into one Index (assuming you have just string entries in your columns) you could:

df.columns = [' '.join(col).strip() for col in df.columns.values]

Note: we must strip the trailing whitespace for columns that have no second-level label.

In [11]: [' '.join(col).strip() for col in df.columns.values]
Out[11]: 
['USAF',
 'WBAN',
 'day',
 'month',
 's_CD sum',
 's_CL sum',
 's_CNT sum',
 's_PC sum',
 'tempf amax',
 'tempf amin',
 'year']
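
The two options above can be compared on a small frame. This is a minimal sketch with made-up data shaped like the question's columns, not the asker's actual frame.

```python
import pandas as pd

# A small frame with two-level columns, similar in shape to the question's data.
df = pd.DataFrame(
    [[702730, 1, 30.92, 24.98], [702730, 0, 32.00, 24.98]],
    columns=pd.MultiIndex.from_tuples(
        [("USAF", ""), ("s_PC", "sum"), ("tempf", "amax"), ("tempf", "amin")]
    ),
)

# Keeping only the top level leaves two columns both called "tempf".
top_only = list(df.columns.get_level_values(0))
print(top_only)  # ['USAF', 's_PC', 'tempf', 'tempf']

# Joining both levels keeps the names distinct; strip() removes the
# trailing space left behind when the second level is empty.
df.columns = [" ".join(col).strip() for col in df.columns.values]
print(list(df.columns))  # ['USAF', 's_PC sum', 'tempf amax', 'tempf amin']
```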

Answer 1

pd.DataFrame(df.to_records()) # multiindex become columns and new index is integers only

Answer 2

All of the current answers on this thread seem a bit dated. As of pandas version 0.24.0, .to_flat_index() does what you need.

From pandas' own documentation:

MultiIndex.to_flat_index()

Convert a MultiIndex to an Index of Tuples containing the level values.

A simple example from its documentation:

import pandas as pd
print(pd.__version__) # to_flat_index() requires 0.24.0 or later
index = pd.MultiIndex.from_product(
        [['foo', 'bar'], ['baz', 'qux']],
        names=['a', 'b'])

print(index)
# MultiIndex(levels=[['bar', 'foo'], ['baz', 'qux']],
#           codes=[[1, 1, 0, 0], [0, 1, 0, 1]],
#           names=['a', 'b'])

Applying to_flat_index():

index.to_flat_index()
# Index([('foo', 'baz'), ('foo', 'qux'), ('bar', 'baz'), ('bar', 'qux')], dtype='object')

Using it to replace existing pandas column

An example of how you’d use it on dat, which is a DataFrame with a MultiIndex column:

dat = df.loc[:,['name','workshop_period','class_size']].groupby(['name','workshop_period']).describe()
print(dat.columns)
# MultiIndex(levels=[['class_size'], ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']],
#            codes=[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7]])

dat.columns = dat.columns.to_flat_index()
print(dat.columns)
# Index([('class_size', 'count'),  ('class_size', 'mean'),
#     ('class_size', 'std'),   ('class_size', 'min'),
#     ('class_size', '25%'),   ('class_size', '50%'),
#     ('class_size', '75%'),   ('class_size', 'max')],
#  dtype='object')
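
Note that to_flat_index() leaves each column label as a tuple. If plain string labels are wanted, as in the question, one possible follow-up step is to join each tuple. The sketch below uses a made-up frame and an underscore separator chosen for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["class_size"], ["count", "mean", "max"]])
dat = pd.DataFrame([[3.0, 20.0, 25.0]], columns=columns)

# to_flat_index() (pandas 0.24.0 or later) yields one tuple per column.
flat = dat.columns.to_flat_index()
print(list(flat))
# [('class_size', 'count'), ('class_size', 'mean'), ('class_size', 'max')]

# Optional second step: turn each tuple into a single string label.
dat.columns = ["_".join(str(level) for level in tup) for tup in flat]
print(list(dat.columns))
# ['class_size_count', 'class_size_mean', 'class_size_max']
```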

Answer 3

Andy Hayden's answer is certainly the easiest way. If you want to avoid duplicate column labels, you need to tweak it a bit:

In [34]: df
Out[34]: 
     USAF   WBAN  day  month  s_CD  s_CL  s_CNT  s_PC  tempf         year
                               sum   sum    sum   sum   amax   amin      
0  702730  26451    1      1    12     0     13     1  30.92  24.98  1993
1  702730  26451    2      1    13     0     13     0  32.00  24.98  1993
2  702730  26451    3      1     2    10     13     1  23.00   6.98  1993
3  702730  26451    4      1    12     0     13     1  10.04   3.92  1993
4  702730  26451    5      1    10     0     13     3  19.94  10.94  1993


In [35]: mi = df.columns

In [36]: mi
Out[36]: 
MultiIndex
[(USAF, ), (WBAN, ), (day, ), (month, ), (s_CD, sum), (s_CL, sum), (s_CNT, sum), (s_PC, sum), (tempf, amax), (tempf, amin), (year, )]


In [37]: mi.tolist()
Out[37]: 
[('USAF', ''),
 ('WBAN', ''),
 ('day', ''),
 ('month', ''),
 ('s_CD', 'sum'),
 ('s_CL', 'sum'),
 ('s_CNT', 'sum'),
 ('s_PC', 'sum'),
 ('tempf', 'amax'),
 ('tempf', 'amin'),
 ('year', '')]

In [38]: ind = pd.Index([e[0] + e[1] for e in mi.tolist()])

In [39]: ind
Out[39]: Index([USAF, WBAN, day, month, s_CDsum, s_CLsum, s_CNTsum, s_PCsum, tempfamax, tempfamin, year], dtype=object)

In [40]: df.columns = ind




In [46]: df
Out[46]: 
     USAF   WBAN  day  month  s_CDsum  s_CLsum  s_CNTsum  s_PCsum  tempfamax  tempfamin  \
0  702730  26451    1      1       12        0        13        1      30.92      24.98   
1  702730  26451    2      1       13        0        13        0      32.00      24.98   
2  702730  26451    3      1        2       10        13        1      23.00       6.98   
3  702730  26451    4      1       12        0        13        1      10.04       3.92   
4  702730  26451    5      1       10        0        13        3      19.94      10.94   




   year  
0  1993  
1  1993  
2  1993  
3  1993  
4  1993

Answer 4

df.columns = ['_'.join(tup).rstrip('_') for tup in df.columns.values]

Answer 5

And if you want to retain any of the aggregation info from the second level of the multiindex you can try this:

In [1]: new_cols = [''.join(t) for t in df.columns]
Out[1]:
['USAF',
 'WBAN',
 'day',
 'month',
 's_CDsum',
 's_CLsum',
 's_CNTsum',
 's_PCsum',
 'tempfamax',
 'tempfamin',
 'year']

In [2]: df.columns = new_cols

Answer 6

The most Pythonic way to do this is to use the map function.

df.columns = df.columns.map(' '.join).str.strip()

Output of print(df.columns):

Index(['USAF', 'WBAN', 'day', 'month', 's_CD sum', 's_CL sum', 's_CNT sum',
       's_PC sum', 'tempf amax', 'tempf amin', 'year'],
      dtype='object')

Update using Python 3.6+ with f string:

df.columns = [f'{f} {s}' if s != '' else f'{f}' 
              for f, s in df.columns]

print(df.columns)

Output:

Index(['USAF', 'WBAN', 'day', 'month', 's_CD sum', 's_CL sum', 's_CNT sum',
       's_PC sum', 'tempf amax', 'tempf amin', 'year'],
      dtype='object')

Answer 7

The easiest and most intuitive solution for me was to combine the column names using get_level_values. This prevents duplicate column names when you do more than one aggregation on the same column:

level_one = df.columns.get_level_values(0).astype(str)
level_two = df.columns.get_level_values(1).astype(str)
df.columns = level_one + level_two

If you want a separator between columns, you can do this. This will return the same thing as Seiji Armstrong’s comment on the accepted answer that only includes underscores for columns with values in both index levels:

level_one = df.columns.get_level_values(0).astype(str)
level_two = df.columns.get_level_values(1).astype(str)
column_separator = ['_' if x != '' else '' for x in level_two]
df.columns = level_one + column_separator + level_two

I know this does the same thing as Andy Hayden’s great answer above, but I think it is a bit more intuitive this way and is easier to remember (so I don’t have to keep referring to this thread), especially for novice pandas users.

This method is also more extensible in the case where you may have 3 column levels.

level_one = df.columns.get_level_values(0).astype(str)
level_two = df.columns.get_level_values(1).astype(str)
level_three = df.columns.get_level_values(2).astype(str)
df.columns = level_one + level_two + level_three

Answer 8

After reading through all the answers, I came up with this:

def __my_flatten_cols(self, how="_".join, reset_index=True):
    how = (lambda iter: list(iter)[-1]) if how == "last" else how
    self.columns = [how(filter(None, map(str, levels))) for levels in self.columns.values] \
                    if isinstance(self.columns, pd.MultiIndex) else self.columns
    return self.reset_index() if reset_index else self
pd.DataFrame.my_flatten_cols = __my_flatten_cols

Usage:

Given a data frame:

df = pd.DataFrame({"grouper": ["x","x","y","y"], "val1": [0,2,4,6], 2: [1,3,5,7]}, columns=["grouper", "val1", 2])

  grouper  val1  2
0       x     0  1
1       x     2  3
2       y     4  5
3       y     6  7
  • Single aggregation method: resulting variables named the same as the source:

    df.groupby(by="grouper").agg("min").my_flatten_cols()
    • Same as df.groupby(by="grouper", as_index=False) or .agg(...).reset_index()
    • ----- before -----
                 val1  2
        grouper         
      
      ------ after -----
        grouper  val1  2
      0       x     0  1
      1       y     4  5
  • Single source variable, multiple aggregations: resulting variables named after the statistics:

    df.groupby(by="grouper").agg({"val1": [min,max]}).my_flatten_cols("last")
    • 与相同a = df.groupby(..).agg(..); a.columns = a.columns.droplevel(0); a.reset_index()
    • ----- before -----
                  val1    
                 min max
        grouper         
      
      ------ after -----
        grouper  min  max
      0       x    0    2
      1       y    4    6
  • 多个变量,多个聚合:名为(varname)_(statname)的结果变量:

    df.groupby(by="grouper").agg({"val1": min, 2:[sum, "size"]}).my_flatten_cols()
    # you can combine the names in other ways too, e.g. use a different delimiter:
    #df.groupby(by="grouper").agg({"val1": min, 2:[sum, "size"]}).my_flatten_cols(" ".join)
    • 在后台运行a.columns = ["_".join(filter(None, map(str, levels))) for levels in a.columns.values](因为这种形式的agg()结果出现MultiIndex在列上)。
    • 如果您没有my_flatten_cols帮助者,则输入@Seigi建议的解决方案可能会更容易:a.columns = ["_".join(t).rstrip("_") for t in a.columns.values]在这种情况下,它的工作原理类似(但如果列上有数字标签,则会失败)
    • 要处理列上的数字标签,可以使用@jxstanford和@Nolan Conawaya.columns = ["_".join(tuple(map(str, t))).rstrip("_") for t in a.columns.values])建议的解决方案,但我不明白为什么tuple()需要调用,并且我相信rstrip()只有在某些列具有类似("colname", "")(如果您reset_index()在尝试修复之前会发生这种情况.columns
    • ----- before -----
                 val1           2     
                 min       sum    size
        grouper              
      
      ------ after -----
        grouper  val1_min  2_sum  2_size
      0       x         0      4       2
      1       y         4     12       2
  • 要手动命名结果变量:(这是因为大熊猫0.20.0弃用没有适当的替代性为0.23

    df.groupby(by="grouper").agg({"val1": {"sum_of_val1": "sum", "count_of_val1": "count"},
                                       2: {"sum_of_2":    "sum", "count_of_2":    "count"}}).my_flatten_cols("last")
    • 其他建议包括:手动设置列:res.columns = ['A_sum', 'B_sum', 'count'].join()输入多个groupby语句。
    • ----- before -----
                         val1                      2         
                count_of_val1 sum_of_val1 count_of_2 sum_of_2
        grouper                                              
      
      ------ after -----
        grouper  count_of_val1  sum_of_val1  count_of_2  sum_of_2
      0       x              2            2           2         4
      1       y              2           10           2        12

助手功能处理的案件

  • 级别名称可以是非字符串,例如,当列名称是整数时按列号使用Index pandas DataFrame,因此我们必须使用map(str, ..)
  • 它们也可以是空的,所以我们必须 filter(None, ..)
  • 对于单级列(即,除MultiIndex之外的任何内容),columns.values返回名称(str,而不是元组)
  • 根据您的使用方式,.agg()您可能需要保留一列的最底端标签或连接多个标签
  • (因为我是熊猫新手?),我希望reset_index()能够以常规方式使用group-by列,因此默认情况下会这样做

After reading through all the answers, I came up with this:

def __my_flatten_cols(self, how="_".join, reset_index=True):
    how = (lambda iter: list(iter)[-1]) if how == "last" else how
    self.columns = [how(filter(None, map(str, levels))) for levels in self.columns.values] \
                    if isinstance(self.columns, pd.MultiIndex) else self.columns
    return self.reset_index() if reset_index else self
pd.DataFrame.my_flatten_cols = __my_flatten_cols

Usage:

Given a data frame:

df = pd.DataFrame({"grouper": ["x","x","y","y"], "val1": [0,2,4,6], 2: [1,3,5,7]}, columns=["grouper", "val1", 2])

  grouper  val1  2
0       x     0  1
1       x     2  3
2       y     4  5
3       y     6  7
  • Single aggregation method: resulting variables named the same as source:

    df.groupby(by="grouper").agg("min").my_flatten_cols()
    
    • Same as df.groupby(by="grouper", as_index=False) or .agg(...).reset_index()
    • ----- before -----
                 val1  2
        grouper         
      
      ------ after -----
        grouper  val1  2
      0       x     0  1
      1       y     4  5
      
  • Single source variable, multiple aggregations: resulting variables named after statistics:

    df.groupby(by="grouper").agg({"val1": [min,max]}).my_flatten_cols("last")
    
    • Same as a = df.groupby(..).agg(..); a.columns = a.columns.droplevel(0); a.reset_index().
    • ----- before -----
                  val1    
                 min max
        grouper         
      
      ------ after -----
        grouper  min  max
      0       x    0    2
      1       y    4    6
      
  • Multiple variables, multiple aggregations: resulting variables named (varname)_(statname):

    df.groupby(by="grouper").agg({"val1": min, 2:[sum, "size"]}).my_flatten_cols()
    # you can combine the names in other ways too, e.g. use a different delimiter:
    #df.groupby(by="grouper").agg({"val1": min, 2:[sum, "size"]}).my_flatten_cols(" ".join)
    
    • Runs a.columns = ["_".join(filter(None, map(str, levels))) for levels in a.columns.values] under the hood (since this form of agg() results in MultiIndex on columns).
    • If you don’t have the my_flatten_cols helper, it might be easier to type in the solution suggested by @Seigi: a.columns = ["_".join(t).rstrip("_") for t in a.columns.values], which works similarly in this case (but fails if you have numeric labels on columns)
    • To handle the numeric labels on columns, you could use the solution suggested by @jxstanford and @Nolan Conaway (a.columns = ["_".join(tuple(map(str, t))).rstrip("_") for t in a.columns.values]), but I don’t understand why the tuple() call is needed, and I believe rstrip() is only required if some columns have a descriptor like ("colname", "") (which can happen if you reset_index() before trying to fix up .columns)
    • ----- before -----
                 val1           2     
                 min       sum    size
        grouper              
      
      ------ after -----
        grouper  val1_min  2_sum  2_size
      0       x         0      4       2
      1       y         4     12       2
      
  • You want to name the resulting variables manually (the nested-dict renaming used here is deprecated since pandas 0.20.0, with no adequate alternative as of 0.23):

    df.groupby(by="grouper").agg({"val1": {"sum_of_val1": "sum", "count_of_val1": "count"},
                                       2: {"sum_of_2":    "sum", "count_of_2":    "count"}}).my_flatten_cols("last")
    
    • Other suggestions include: setting the columns manually: res.columns = ['A_sum', 'B_sum', 'count'] or .join()ing multiple groupby statements.
    • ----- before -----
                         val1                      2         
                count_of_val1 sum_of_val1 count_of_2 sum_of_2
        grouper                                              
      
      ------ after -----
        grouper  count_of_val1  sum_of_val1  count_of_2  sum_of_2
      0       x              2            2           2         4
      1       y              2           10           2        12
      

Cases handled by the helper function

  • level labels can be non-string (see e.g. "Index pandas DataFrame by column numbers, when column names are integers"), so we have to convert them with map(str, ..)
  • they can also be empty, so we have to filter(None, ..)
  • for single-level columns (i.e. anything except MultiIndex), columns.values returns the names (str, not tuples)
  • depending on how you used .agg() you may need to keep the bottom-most label for a column or concatenate multiple labels
  • (since I’m new to pandas?) more often than not, I want reset_index() to be able to work with the group-by columns in the regular way, so it does that by default
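
For the manual-naming case, later pandas releases (0.25 and newer) added "named aggregation", which produces flat, user-chosen column names directly; the nested-dict form was removed in pandas 1.0. The following is only a sketch of that alternative, not part of the helper above. It uses a hypothetical frame in which the integer column 2 is called "val2", so every name is a plain string.

```python
import pandas as pd

# Hypothetical variant of the example frame, with the integer column 2 renamed to "val2".
df = pd.DataFrame(
    {"grouper": ["x", "x", "y", "y"], "val1": [0, 2, 4, 6], "val2": [1, 3, 5, 7]}
)

# Named aggregation: output_name=(source_column, aggregation_function).
res = (
    df.groupby("grouper")
    .agg(
        sum_of_val1=("val1", "sum"),
        count_of_val1=("val1", "count"),
        sum_of_val2=("val2", "sum"),
        count_of_val2=("val2", "count"),
    )
    .reset_index()
)
print(res)
```

The result already has single-level columns, so no flattening step is needed afterwards.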

Answer 9

A general solution that handles multiple levels and mixed types:

df.columns = ['_'.join(tuple(map(str, t))) for t in df.columns.values]
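
One thing to watch with this one-liner: a column whose lower level is empty keeps a trailing separator. The sketch below illustrates this on a small hypothetical frame (the labels are made up for illustration) and shows one possible variant that drops empty levels before joining.

```python
import pandas as pd

# Hypothetical columns: mixed str/int labels, one with an empty second level.
cols = pd.MultiIndex.from_tuples([("grouper", ""), ("val1", "min"), (2, "sum")])
df = pd.DataFrame([["x", 0, 4]], columns=cols)

labels = df.columns.values  # array of tuples, one per column

# The one-liner as given: the empty level leaves a trailing "_".
joined = ['_'.join(tuple(map(str, t))) for t in labels]
print(joined)   # ['grouper_', 'val1_min', '2_sum']

# Variant that skips empty levels before joining.
trimmed = ['_'.join(str(part) for part in t if part != '') for t in labels]
print(trimmed)  # ['grouper', 'val1_min', '2_sum']
```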

Answer 10

A bit late maybe, but if you are not worried about duplicate column names:

df.columns = df.columns.tolist()

Answer 11

In case you want to have a separator in the name between levels, this function works well.

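def flattenHierarchicalCol(col, sep='_'):
    # Single-level labels are not tuples; return them unchanged.
    if not isinstance(col, tuple):
        return col
    else:
        new_col = ''
        for leveli, level in enumerate(col):
            # Skip empty levels so no dangling separator is appended.
            if not level == '':
                if not leveli == 0:
                    new_col += sep
                # str() lets numeric level labels be concatenated too.
                new_col += str(level)
        return new_col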

df.columns = df.columns.map(flattenHierarchicalCol)

Answer 12

Following @jxstanford and @tvt173, I wrote a quick function which should do the trick, regardless of string/int column names:

def flatten_cols(df):
    df.columns = [
        '_'.join(tuple(map(str, t))).rstrip('_') 
        for t in df.columns.values
        ]
    return df

Answer 13

You could also do as below. Consider df to be your dataframe and assume a two-level column index with string labels (as is the case in your example):

df.columns = [df.columns[i][0] + '_' + df.columns[i][1] for i in range(len(df.columns))]

Answer 14

I’ll share a straightforward way that worked for me.

df.columns = [" ".join([str(elem) for elem in tup]) for tup in df.columns.tolist()]
#df = df.reset_index() if needed

Answer 15

To flatten a MultiIndex inside a chain of other DataFrame methods, define a function like this:

def flatten_index(df):
  df_copy = df.copy()
  df_copy.columns = ['_'.join(col).rstrip('_') for col in df_copy.columns.values]
  return df_copy.reset_index()

Then use the pipe method to apply this function in the chain of DataFrame methods, after groupby and agg but before any other methods in the chain:

my_df \
  .groupby('group') \
  .agg({'value': ['count']}) \
  .pipe(flatten_index) \
  .sort_values('value_count')
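
For reference, here is the same chain as a self-contained sketch that can be run as is. The contents of `my_df` are made up for illustration, and a second aggregation (`sum`) is added only to show more than one flattened name.

```python
import pandas as pd

def flatten_index(df):
    # Work on a copy so the frame passed through the chain is not mutated.
    df_copy = df.copy()
    df_copy.columns = ['_'.join(col).rstrip('_') for col in df_copy.columns.values]
    return df_copy.reset_index()

# Hypothetical input data.
my_df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

result = (
    my_df
    .groupby("group")
    .agg({"value": ["count", "sum"]})
    .pipe(flatten_index)          # columns become 'value_count', 'value_sum'
    .sort_values("value_count")   # the flat name is now usable downstream
)
print(result)
```

Because `'_'.join(col)` expects strings, this sketch, like the function above, assumes every level label is a string.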

Answer 16

Another simple routine.

def flatten_columns(df, sep='.'):
    def _remove_empty(column_name):
        return tuple(element for element in column_name if element)
    def _join(column_name):
        return sep.join(column_name)

    new_columns = [_join(_remove_empty(column)) for column in df.columns.values]
    df.columns = new_columns
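
Note that this routine modifies the frame in place and returns None, so it cannot be used directly inside a method chain. The sketch below demonstrates that on a small hypothetical frame and adds one possible copy-returning variant; `flattened` is a name made up for illustration, and it assumes the columns are a MultiIndex.

```python
import pandas as pd

def flatten_columns(df, sep='.'):
    def _remove_empty(column_name):
        return tuple(element for element in column_name if element)
    def _join(column_name):
        return sep.join(column_name)

    new_columns = [_join(_remove_empty(column)) for column in df.columns.values]
    df.columns = new_columns

def flattened(df, sep='.'):
    """Hypothetical variant: return a flattened copy and accept non-string labels."""
    out = df.copy()
    out.columns = [
        sep.join(str(part) for part in column if part != '')
        for column in out.columns.values
    ]
    return out

cols = pd.MultiIndex.from_tuples([("a", "min"), ("b", "")])
df = pd.DataFrame([[1, 2]], columns=cols)

flat_copy = flattened(df)        # df itself is left untouched here
returned = flatten_columns(df)   # mutates df and returns None
print(returned)                  # None
print(list(df.columns))          # ['a.min', 'b']
print(list(flat_copy.columns))   # ['a.min', 'b']
```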