Tag archive: python-3.x

Convert bytes to a string

Question: Convert bytes to a string

I’m using this code to get standard output from an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I’d like to work with the output as a normal Python string. So that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that’s what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert the bytes value back to string? I mean, using the “batteries” instead of doing it manually. And I’d like it to be OK with Python 3.


Answer 0

You need to decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'
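
As a small illustration of the comment above about using the encoding your data is actually in, here is a hedged sketch that decodes the same bytes with a matching and a non-matching codec. The sample value and variable names are made up for this example.

```python
# Hypothetical sample: "café" encoded as UTF-8 (the é becomes two bytes).
raw = "café".encode("utf-8")

# Decoding with the encoding the data is actually in recovers the text.
good = raw.decode("utf-8")

# A strict decode with a codec that cannot represent the bytes raises.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# errors="replace" never raises, but substitutes U+FFFD for each bad byte.
lossy = raw.decode("ascii", errors="replace")
```

Strict decoding is usually the safer default, since the replace handler hides the fact that data was lost.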

Answer 1

You need to decode the byte string to turn it into a character (Unicode) string.

On Python 2

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

On Python 3

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

Answer 2

I think this way is easy:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
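
One caveat that may be worth knowing: joining chr() values treats every byte as its own code point, which gives the same result as decoding with latin-1. A minimal sketch with made-up sample data, showing that this matches UTF-8 only for ASCII bytes:

```python
ascii_bytes = bytes([112, 52, 52])
utf8_bytes = "é".encode("utf-8")           # two bytes: 0xC3 0xA9

def join_chr(data):
    # Same approach as above: one character per byte value.
    return "".join(map(chr, data))

ascii_result = join_chr(ascii_bytes)        # plain ASCII survives unchanged
naive_result = join_chr(utf8_bytes)         # two characters, not "é"
latin1_result = utf8_bytes.decode("latin-1")
```

So this approach is probably fine for pure ASCII data, but for anything else an explicit decode() with the real encoding is the safer route.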

Answer 3

If you don’t know the encoding, then to read binary input into string in Python 3 and Python 2 compatible way, use the ancient MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:  # stream: any iterable of byte strings
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

Because encoding is unknown, expect non-English symbols to translate to characters of cp437 (English characters are not translated, because they match in most single byte encodings and UTF-8).

Decoding arbitrary binary input to UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in Codepage Layout – it is where Python chokes with infamous ordinal not in range.

UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.
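
The [binary] -> [str] -> [binary] round trip mentioned above can be checked in a few lines. This is only a sketch for Python 3, with made-up sample bytes; it says nothing about performance. The surrogateescape handler maps each undecodable byte to a lone surrogate code point and restores it on encoding.

```python
# Arbitrary binary input, including bytes that are invalid in UTF-8.
original = b'\x00\x01\xffsd\x80'

# [binary] -> [str]: invalid bytes become lone surrogates U+DC80..U+DCFF.
text = original.decode('utf-8', 'surrogateescape')

# [str] -> [binary]: the same handler turns the surrogates back into bytes.
restored = text.encode('utf-8', 'surrogateescape')
```

Note that such a string cannot be encoded with the default strict handler, so it is best kept internal to the program.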

UPDATE 20170116: Thanks to comment by Nearoo – there is also a possibility to slash escape all unknown bytes with backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:  # stream: any iterable of byte strings
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See Python’s Unicode Support for details.
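
For reference, a short sketch of what backslashreplace yields when decoding. It assumes Python 3.5 or later, where this handler works for decoding; the sample bytes are the ones used elsewhere in this answer.

```python
line = b'\x80abc'

# The invalid byte 0x80 is kept as a literal backslash escape in the text.
decoded = line.decode('utf-8', 'backslashreplace')

# Valid UTF-8 is decoded normally by the same call.
valid = 'µ'.encode('utf-8').decode('utf-8', 'backslashreplace')
```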

UPDATE 20170119: I decided to implement slash escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

# --- preparation

import codecs

def slashescape(err):
    """Codecs error handler. err is a UnicodeDecodeError instance.
    Return a tuple with a replacement for the undecodable part of the
    input and the position where decoding should continue."""
    thebytes = err.object[err.start:err.end]
    # bytearray yields ints on both Python 2 and 3; escape every bad byte
    repl = u''.join(u'\\x%02x' % b for b in bytearray(thebytes))
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))

Answer 4

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

On the other hand, in Python 2, encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.


Answer 5

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

Aaron’s answer was correct, except that you need to know which encoding to use. And I believe that Windows uses ‘windows-1252’. It will only matter if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.

By the way, the fact that it does matter is the reason that Python moved to using two different types for binary and text data: it can’t convert magically between them, because it doesn’t know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).


Answer 6

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

Answer 7

While @Aaron Maenpaa’s answer just works, a user recently asked:

Is there any more simply way? ‘fhand.read().decode(“ASCII”)’ […] It’s so long!

You can use:

command_stdout.decode()

decode() has a standard argument:

codecs.decode(obj, encoding='utf-8', errors='strict')


Answer 8

To interpret a byte sequence as a text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can’t be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong incompatible encoding:

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted but your program remains unaware that a failure has occurred.
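
To make the silent failure concrete, here is a hedged sketch that produces the mojibake and then undoes it. The repair only works in this case because every byte involved happens to be defined in cp1252 and because we know which wrong codec was applied; in general the damage may not be reversible.

```python
original = '—'                       # U+2014, three bytes in UTF-8

# Wrong codec, no exception: the three bytes turn into three characters.
mojibake = original.encode('utf-8').decode('cp1252')

# Reversing the two steps recovers the text, but only because the wrong
# codec is known and all three bytes are defined in cp1252.
repaired = mojibake.encode('cp1252').decode('utf-8')
```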

In general, the character encoding to use is not embedded in the byte sequence itself; you have to communicate this information out-of-band. Some outcomes are more likely than others, which is why the chardet module exists to guess the character encoding. A single Python script may use multiple character encodings in different places.


The ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (on Unix it uses sys.getfilesystemencoding() and the surrogateescape error handler):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes, you could use os.fsencode().

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode bytes; e.g., it can be cp1252 on Windows.

To decode the byte stream on-the-fly, io.TextIOWrapper() could be used.
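
A minimal sketch of the io.TextIOWrapper() approach, with an in-memory io.BytesIO standing in for a real binary stream such as a subprocess pipe. The UTF-8 encoding and the sample data are assumptions made for this example.

```python
import io

# Stand-in for a binary stream such as a subprocess pipe (assumption:
# the data is UTF-8; a real program must know its actual encoding).
binary_stream = io.BytesIO('µ one\nµ two\n'.encode('utf-8'))

# TextIOWrapper decodes incrementally as lines are read.
text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8')
lines = [line.rstrip('\n') for line in text_stream]
```

The wrapper also takes an errors argument, so a handler such as surrogateescape could be combined with it if the input may contain undecodable bytes.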

Different commands may use different character encodings for their output e.g., dir internal command (cmd) may use cp437. To decode its output, you could pass the encoding explicitly (Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

The filenames may differ from os.listdir() (which uses Windows Unicode API) e.g., '\xb6' can be substituted with '\x14'—Python’s cp437 codec maps b'\x14' to control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string


Answer 9

Since this question is actually asking about subprocess output, you have a more direct approach available since Popen accepts an encoding keyword (in Python 3.6+):

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

The general answer for other users is to decode bytes to text:

>>> b'abcde'.decode()
'abcde'

With no argument, sys.getdefaultencoding() will be used. If your data is not encoded in sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call:

>>> b'caf\xe9'.decode('cp1250')
'café'

Answer 10

If you get the following error when trying decode():

AttributeError: 'str' object has no attribute 'decode'

You can also specify the encoding directly in the str() call:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'
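
One pitfall with this form, shown as a small sketch: if the encoding argument is left out, str() does not decode anything and instead returns the printable representation of the bytes object.

```python
my_byte_str = b'Hello World'

# With an encoding, str() decodes the bytes.
decoded = str(my_byte_str, 'utf-8')

# Without one, str() returns the repr of the bytes object, prefix included.
not_decoded = str(my_byte_str)
```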

Answer 11

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

On Windows, all your line endings will be doubled (to \r\r\n), because writing in text mode translates each \n to \r\n; this leads to extra empty lines. Python’s text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
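
As a possible alternative to the manual replace, the text layer can do the newline normalization itself. The sketch below uses in-memory data instead of Input.txt (an assumption made so that it is self-contained) and compares both approaches.

```python
import io

windows_bytes = b'first\r\nsecond\r\n'

# Manual approach from above: decode, then collapse \r\n to \n.
manual = windows_bytes.decode('utf-8').replace('\r\n', '\n')

# A text layer with newline=None applies universal-newline translation.
wrapped = io.TextIOWrapper(io.BytesIO(windows_bytes),
                           encoding='utf-8', newline=None).read()
```

With newline=None a lone \r is also translated to \n, which the plain replace() call does not do.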


Answer 12

I made a function to clean a list

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

Answer 13

For Python 3, this is a much safer and more Pythonic approach to convert from bytes to a string:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

Answer 14

From sys — System-specific parameters and functions:

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').


Answer 15

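def toString(string):
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # AttributeError: already a str, which has no decode() in Python 3
        # ValueError: covers UnicodeDecodeError for invalid UTF-8
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))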

Answer 16

For your specific case of “run a shell command and get its output as text instead of bytes”, on Python 3.7+, you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):

import subprocess

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True. Note that capture_output was also added in 3.7; on older versions pass stdout=subprocess.PIPE instead.
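
A self-contained sketch of the same call, assuming Python 3.7+. It runs the current interpreter instead of ls so that it behaves the same on any platform; that substitution is made only for this example.

```python
import subprocess
import sys

# Run the current interpreter as a portable stand-in for `ls -l`.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # Python 3.7+
    text=True,            # decode stdout/stderr to str
)
output = result.stdout
```

Because text=True also enables universal newlines, the captured output ends in \n even on Windows.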


Answer 17

If you want to convert arbitrary bytes, not just a string that was encoded to bytes:

import base64
import json

with open("bytesfile", "rb") as infile:
    text = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    text2 = json.dumps(list(infile.read()))

This is not very efficient, however. It will turn a 2 MB picture into 9 MB.
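
If the goal is a string that can be turned back into the original bytes, a round trip is easy to verify. The sketch below swaps in standard base64 rather than the base85 variant used above, purely as an illustration; the payload is made up.

```python
import base64

payload = bytes(range(256))             # every possible byte value

# bytes -> ASCII-only str (b64encode returns bytes, so decode the result)
as_text = base64.b64encode(payload).decode('ascii')

# str -> the original bytes
round_tripped = base64.b64decode(as_text)
```

Base64 output is about a third larger than the input, which is considerably more compact than a JSON list of integers.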


Answer 18

Try this:

bytes.fromhex('c3a9').decode('utf-8') 

Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?

Question: Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1000000000000000 in range(1000000000000001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

If I try to implement my own range function, the result is not so nice!!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?


Martijn Pieters’ answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.


Answer 0

The Python 3 range() object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.
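
For comparison, here is a short sketch of those extra features on the built-in range in Python 3. The sample values are arbitrary.

```python
r = range(0, 10, 2)                 # 0, 2, 4, 6, 8

sliced = r[1:3]                     # slicing returns another range
position = r.index(4)               # index of a value
occurrences = r.count(4)            # 1 if present, else 0

# Equality (and hashing) compare the sequence of values, not the arguments.
same_values = range(0, 3, 2) == range(0, 4, 2)    # both are 0, 2
```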

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.


* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this an O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.


Answer 1

The fundamental misunderstanding here is in thinking that range is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

What range actually is, is a sequence, just like a list. You can even test this:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

This means it has to follow all the rules of being a sequence:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you’ll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn’t use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)


Now, there’s nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact CPython 3.0-3.1 didn’t include it.)


If range actually were a generator, like my_crappy_range, then it wouldn’t make sense to test __contains__ this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
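
The questions in the last paragraph can be tried out directly. In this sketch, count_up is a made-up generator in the spirit of my_crappy_range; it shows that a membership test on a generator consumes values as it searches, while a range can be asked repeatedly.

```python
def count_up(n):
    # Generator in the spirit of my_crappy_range from the question.
    i = 0
    while i < n:
        yield i
        i += 1

gen = count_up(5)
found = 2 in gen             # consumes 0, 1 and 2 while searching
leftover = list(gen)         # only the values after 2 remain

gen2 = count_up(5)
for _ in range(3):           # iterate the first 3 values: 0, 1, 2
    next(gen2)
one_still_in = 1 in gen2     # scans 3 and 4, exhausting the generator

r = range(5)
range_checks = (1 in r, 1 in r)   # a range gives the same answer each time
```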


Answer 2

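Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks whether the value can possibly be in the range. The reason for the speed here is that we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn’t “step over” our number.

For example, 994 is in range(4, 1000, 2) because: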

  1. 4 <= 994 < 1000
  2. (994 - 4) % 2 == 0
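
The two checks can be sketched in Python as follows. This is an illustrative, integer-only helper (in_range is a made-up name), not the actual CPython implementation; it also covers negative steps, where the bounds are compared the other way round.

```python
def in_range(num, start, stop, step):
    """Integer-only membership test mirroring the two checks above."""
    if step > 0:
        in_bounds = start <= num < stop
    else:
        in_bounds = stop < num <= start
    # Second check: num must be a whole number of steps away from start.
    return in_bounds and (num - start) % step == 0
```

For instance, in_range(994, 4, 1000, 2) is true and in_range(995, 4, 1000, 2) is false, in agreement with the built-in range.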

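The full C code is included below. It is a bit more verbose because of memory management and reference counting details, but the basic idea is there: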

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

该行的“实质”在该行中提到:

/* result = ((int(ob) - start) % step) == 0 */ 

最后一点-查看range_contains代码段底部的函数。如果确切的类型检查失败,那么我们将不使用描述的巧妙算法,而是使用_PySequence_IterSearch!退回到该范围的愚蠢迭代搜索。您可以在解释器中检查此行为(我在这里使用v3.5.0):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn’t “step over” our number.

For example, 994 is in range(4, 1000, 2) because:

  1. 4 <= 994 < 1000, and
  2. (994 - 4) % 2 == 0.
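The two checks can be tried out in plain Python. This is only a sketch: in_range is a hypothetical helper name, and the negative-step branch is my reading of the same idea, compared against the built-in `in` for a few small ranges.

```python
def in_range(r, n):
    """Hypothetical helper: membership test for an int n using bounds + step arithmetic."""
    if r.step > 0:
        inside = r.start <= n < r.stop   # positive step: start <= n < stop
    else:
        inside = r.stop < n <= r.start   # negative step: stop < n <= start
    return inside and (n - r.start) % r.step == 0

# Compare against the built-in membership test on small ranges.
for r in (range(4, 1000, 2), range(10, -10, -3), range(0)):
    for n in range(-20, 1010):
        assert in_range(r, n) == (n in r)

print(in_range(range(4, 1000, 2), 994))  # True
print(in_range(range(4, 1000, 2), 995))  # False
```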

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The “meat” of the idea is mentioned in the line:

/* result = ((int(ob) - start) % step) == 0 */ 

As a final note – look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)
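If you would rather not wait for a core dump, a much smaller range shows the same effect. This is a sketch under the assumption that you are on CPython 3.2 or later; the timings are machine-dependent, so only the membership results are printed as facts.

```python
import timeit

class MyInt(int):
    """Same kind of int subclass as in the session above."""

n = 10**6                      # far smaller than the original, so the slow path finishes
r = range(n + 1)
plain, sub, as_float = n, MyInt(n), float(n)

# All three are "in" the range; only the exact int can use the arithmetic shortcut.
results = (plain in r, sub in r, as_float in r)

t_plain = timeit.timeit(lambda: plain in r, number=5)
t_sub = timeit.timeit(lambda: sub in r, number=5)

print(results)  # (True, True, True)
print("int: %.6fs  MyInt: %.6fs" % (t_plain, t_sub))
```

On CPython one would expect the MyInt timing to be far larger than the int timing, since it falls back to the linear search.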

回答 3

为了补充Martijn的答案,这是源代码的相关部分(在C中,因为range对象是用本机代码编写的):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

因此对于PyLong对象(int在Python 3中是),它将使用该range_contains_long函数确定结果。该函数实际上检查是否ob在指定范围内(尽管在C语言中看起来更复杂)。

如果不是int对象,它将退回到迭代,直到找到(或没有)值为止。

整个逻辑可以像这样转换为伪Python:

def range_contains (rangeObj, obj):
    if type(obj) in (int, bool):  # exact types only, like PyLong_CheckExact / PyBool_Check
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains (rangeObj, obj):
    if type(obj) in (int, bool):  # exact types only, like PyLong_CheckExact / PyBool_Check
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

回答 4

如果您想知道为什么将此优化添加到range.__contains__,以及为什么将其添加到xrange.__contains__2.7:

首先,正如Ashwini Chaudhary所发现的, 发行1766304已明确打开以进行优化[x]range.__contains__接受了此修补程序并签入了3.2版本,但没有回迁到2.7版本,因为“ xrange表现得如此之久,以至于我看不到它为什么让我们提交最新的修补程序。” (当时2.7快要淘汰了。)

与此同时:

最初xrange是一个非相当序列的对象。如 3.1文档所说:

范围对象的行为很少:它们仅支持索引,迭代和 len功能。

这不是真的。一个xrange对象实际支持,与索引和自动出现一些其他的东西len *包括__contains__(通过线性搜索)。但是,没有人认为有必要在那时将它们完整地序列化。

然后,作为实现抽象基类 PEP的一部分,重要的是弄清楚应将哪些内置类型标记为实现哪些ABC和xrange/ range声称实现collections.Sequence,即使它仍仅处理相同的“非常少的行为”。在发布9213之前,没有人注意到这个问题。该问题的补丁不仅增加indexcount3.2的range,它也重新工作的优化__contains__(共享相同的数学index,并直接使用count)。** 此更改也适用于3.2,并且没有回移植到2.x,因为“这是一个添加了新方法的错误修正”。(此时,2.7已经超过了rc状态。)

因此,有两次机会可以将此优化回溯到2.7,但都被拒绝了。


*实际上,您甚至可以单独使用索引免费获得迭代,但是在2.3 xrange对象中获得了自定义迭代器。

**第一个版本实际上是重新实现了它,并且弄错了细节-例如,它将给您MyIntSubclass(2) in range(5) == False。但是Daniel Stutzbach的补丁更新版本恢复了以前的大部分代码,包括对通用代码的后备支持,_PySequence_IterSearchrange.__contains__在不应用优化的情况下会缓慢地降低3.2 之前版本的隐式使用。

If you’re wondering why this optimization was added to range.__contains__, and why it wasn’t added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because “xrange has behaved like this for such a long time that I don’t see what it buys us to commit the patch this late.” (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn’t quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same “very little behavior”. Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2’s range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because “it’s a bugfix that adds new methods”. (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.


* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach’s updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn’t apply.


回答 5

其他答案已经很好地说明了这一点,但是我想提供另一个实验来说明范围对象的性质:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

如您所见,范围对象是一个记住其范围的对象,可以多次使用(即使在其上进行迭代),而不仅仅是一次生成器。

The other answers explained it well already, but I’d like to offer another experiment illustrating the nature of range objects:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.


回答 6

这是关于一个偷懒的办法来评估和一些额外的优化range。直到实际使用时才需要计算范围内的值,或者由于额外的优化甚至不需要进一步计算。

顺便说一句,您的整数不是那么大,请考虑 sys.maxsize

sys.maxsize in range(sys.maxsize) 相当快

由于优化-比较给定的整数和范围的最小值和最大值很容易。

但:

Decimal(sys.maxsize) in range(sys.maxsize) 很慢

(在这种情况下, range,因此,如果python收到意外的Decimal,则python将比较所有数字)

您应该了解实现细节,但不应依赖它,因为将来可能会改变。

It’s all about a lazy approach to the evaluation and some extra optimization of range. Values in ranges don’t need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not that big; consider sys.maxsize:

sys.maxsize in range(sys.maxsize) is pretty fast

due to the optimization: it is easy to compare the given integer with just the min and max of the range.

but:

Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.

(In this case there is no optimization in range, so when Python receives an unexpected Decimal, it compares it against every number in the range.)

You should be aware of this implementation detail, but you should not rely on it, because it may change in the future.


回答 7

TL; DR

传回的物件range()实际上是range对象。该对象实现了迭代器接口,因此您可以按顺序迭代其值,就像生成器,列表或元组一样。

但是,它实现了__contains__接口,该接口实际上是当对象出现在in操作员右侧时调用的接口。该__contains__()方法返回a bool左侧项目是否in在对象中。由于range对象知道其边界和步幅,因此在O(1)中非常容易实现。

TL;DR

The object returned by range() is actually a range object. This object implements the iterable interface, so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface which is actually what gets called when an object appears on the right hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand-side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).
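As an illustration of how the in operator dispatches to __contains__, here is a minimal sketch with a made-up class (Multiples is hypothetical, not a standard library type, and it assumes a positive step). Like range, it knows its bound and step, so membership is a constant-time calculation rather than a scan.

```python
class Multiples:
    """Hypothetical container: the non-negative multiples of `step` below `stop`."""

    def __init__(self, step, stop):
        self.step = step            # assumed to be a positive int
        self.stop = stop
        self.contains_calls = 0     # only here to show that `in` calls __contains__

    def __iter__(self):
        return iter(range(0, self.stop, self.step))

    def __contains__(self, item):
        self.contains_calls += 1
        return (isinstance(item, int)
                and 0 <= item < self.stop
                and item % self.step == 0)

m = Multiples(3, 10**15)
print((10**15 - 1) in m)   # True, answered without iterating
print(7 in m)              # False
print(m.contains_calls)    # 2
```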


回答 8

  1. 由于优化,将给定的整数与最小和最大范围进行比较非常容易。
  2. 在Python3 中range()函数之所以如此之快,是因为这里我们对边界使用数学推理,而不是范围对象的直接迭代。
  3. 所以在这里解释逻辑:
    • 检查数字是否在开始和停止之间。
    • 检查步长精度值是否不超过我们的数字。
  4. 例如,997在range(4,1000,3)内是因为:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.

  1. Due to the optimization, it is very easy to compare a given integer with just the min and max of the range.
  2. The reason the range() membership test is so fast in Python 3 is that it uses mathematical reasoning about the bounds, rather than a direct iteration of the range object.
  3. The logic is:
    • Check whether the number is between start and stop.
    • Check that stepping from start does not skip over our number.
  4. For example, 997 is in range(4, 1000, 3) because:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.


回答 9

尝试x-1 in (i for i in range(x))使用较大的x值,该值使用生成器理解来避免调用range.__contains__优化。

Try x-1 in (i for i in range(x)) for large x values. This uses a generator expression, so the range.__contains__ optimisation is not invoked.


什么是与Python 3等价的“ python -m SimpleHTTPServer”

问题:什么是与Python 3等价的“ python -m SimpleHTTPServer”

Python 3等效于python -m SimpleHTTPServer什么?

What is the Python 3 equivalent of python -m SimpleHTTPServer?


回答 0

文档

SimpleHTTPServer模块已合并到http.serverPython 3.0中。将源转换为3.0时,2to3工具将自动适应导入。

因此,您的命令是python -m http.server,或者取决于您的安装,它可以是:

python3 -m http.server

From the docs:

The SimpleHTTPServer module has been merged into http.server in Python 3.0. The 2to3 tool will automatically adapt imports when converting your sources to 3.0.

So, your command is python -m http.server, or depending on your installation, it can be:

python3 -m http.server

回答 1

等效为:

python3 -m http.server

The equivalent is:

python3 -m http.server

回答 2

使用2to3实用程序。

$ cat try.py
import SimpleHTTPServer

$ 2to3 try.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
RefactoringTool: Refactored try.py
--- try.py  (original)
+++ try.py  (refactored)
@@ -1 +1 @@
-import SimpleHTTPServer
+import http.server
RefactoringTool: Files that need to be modified:
RefactoringTool: try.py

Using 2to3 utility.

$ cat try.py
import SimpleHTTPServer

$ 2to3 try.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
RefactoringTool: Refactored try.py
--- try.py  (original)
+++ try.py  (refactored)
@@ -1 +1 @@
-import SimpleHTTPServer
+import http.server
RefactoringTool: Files that need to be modified:
RefactoringTool: try.py

回答 3

除了Petr的答案,如果您想绑定到特定接口而不是所有接口,则可以使用-b--bind标记。

python -m http.server 8000 --bind 127.0.0.1

上面的代码段应该可以解决问题。端口号8000。80用作HTTP通信的标准端口。

In addition to Petr’s answer, if you want to bind to a specific interface instead of all interfaces, you can use the -b or --bind flag.

python -m http.server 8000 --bind 127.0.0.1

The above snippet should do the trick. 8000 is the port number. 80 is used as the standard port for HTTP communications.
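The same thing can be done from Python code rather than the command line. The following is a minimal sketch using the standard http.server module; make_server is a hypothetical helper name, and the host and port defaults simply mirror the command above.

```python
import http.server

def make_server(host="127.0.0.1", port=8000):
    """Build (but do not start) a server that serves the current directory."""
    handler = http.server.SimpleHTTPRequestHandler
    # Binding to 127.0.0.1 limits access to the local machine;
    # an empty host string would listen on all interfaces.
    return http.server.HTTPServer((host, port), handler)
```

Calling make_server().serve_forever() then blocks and serves requests until interrupted. Passing port=0 asks the operating system for any free port, which is handy in tests.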


回答 4

在我的一个项目中,我针对Python 2和3运行测试。为此,我编写了一个小脚本来独立启动本地服务器:

$ python -m $(python -c 'import sys; print("http.server" if sys.version_info[:2] > (2,7) else "SimpleHTTPServer")')
Serving HTTP on 0.0.0.0 port 8000 ...

作为别名:

$ alias serve="python -m $(python -c 'import sys; print("http.server" if sys.version_info[:2] > (2,7) else "SimpleHTTPServer")')"
$ serve
Serving HTTP on 0.0.0.0 port 8000 ...

请注意,我可以通过conda环境控制我的Python版本,因此可以python代替python3使用Python 3。

In one of my projects I run tests against Python 2 and 3. For that I wrote a small script which starts a local server independently:

$ python -m $(python -c 'import sys; print("http.server" if sys.version_info[:2] > (2,7) else "SimpleHTTPServer")')
Serving HTTP on 0.0.0.0 port 8000 ...

As an alias:

$ alias serve="python -m $(python -c 'import sys; print("http.server" if sys.version_info[:2] > (2,7) else "SimpleHTTPServer")')"
$ serve
Serving HTTP on 0.0.0.0 port 8000 ...

Please note that I control my Python version via conda environments, so I can use python instead of python3 to run Python 3.


如何刷新打印功能的输出?

问题:如何刷新打印功能的输出?

如何强制将Python的打印功能输出到屏幕?

这与“ 禁用输出缓冲”不是重复的-链接的问题正在尝试无缓冲输出,尽管这更普遍。对于这个问题,最重要的答案太过强大或牵扯太多(对于这个问题,它们不是很好的答案),这个问题可以由相对新手在Google上找到。

How do I force Python’s print function to output to the screen?

This is not a duplicate of Disable output buffering – the linked question is attempting unbuffered output, while this is more general. The top answers in that question are too powerful or involved for this one (they’re not good answers for this), and this question can be found on Google by a relative newbie.


回答 0

在Python 3上,print可以采用可选flush参数

print("Hello world!", flush=True)

在Python 2上,您必须做

import sys
sys.stdout.flush()

打电话后print。默认情况下,print打印到sys.stdout(有关文件对象的更多信息,请参阅文档)。

On Python 3, print can take an optional flush argument:

print("Hello world!", flush=True)

On Python 2 you’ll have to do

import sys
sys.stdout.flush()

after calling print. By default, print prints to sys.stdout (see the documentation for more about file objects).


回答 1

运行时python -h,我看到一个命令行选项

-u:无缓冲的二进制stdout和stderr; 也PYTHONUNBUFFERED = x有关与’-u’有关的内部缓冲的详细信息,请参见手册页

这是相关的文档

Running python -h, I see a command line option:

-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x see man page for details on internal buffering relating to ‘-u’

Here is the relevant doc.


回答 2

从Python 3.3开始,您可以强制使用普通print()函数进行刷新,而无需使用sys.stdout.flush(); 只需将“ flush”关键字参数设置为true。从文档中

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

将对象打印到流文件中,以sep分隔,然后以end分隔。sep,end和file(如果存在)必须作为关键字参数给出。

所有非关键字参数都将像str()一样转换为字符串,并写入流中,以sep分隔,然后以end分隔。sep和end都必须是字符串;它们也可以是None,这意味着要使用默认值。如果没有给出对象,print()只会写完。

file参数必须是带有write(string)方法的对象;如果不存在或没有,将使用sys.stdout。通常是否由文件决定是否对输出进行缓冲,但是如果flush关键字参数为true,则将强制刷新流。

Since Python 3.3, you can force the normal print() function to flush without the need to use sys.stdout.flush(); just set the “flush” keyword argument to true. From the documentation:

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

Print objects to the stream file, separated by sep and followed by end. sep, end and file, if present, must be given as keyword arguments.

All non-keyword arguments are converted to strings like str() does and written to the stream, separated by sep and followed by end. Both sep and end must be strings; they can also be None, which means to use the default values. If no objects are given, print() will just write end.

The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used. Whether output is buffered is usually determined by file, but if the flush keyword argument is true, the stream is forcibly flushed.
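A small experiment illustrates the last sentence. This is a sketch with a made-up stream class (RecordingStream is hypothetical); it only needs write() for print to accept it, plus flush() so that flush=True has something to call.

```python
class RecordingStream:
    """Hypothetical stream that records what is written and how often it is flushed."""

    def __init__(self):
        self.chunks = []
        self.flush_calls = 0

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        self.flush_calls += 1

stream = RecordingStream()
print("a", file=stream)               # no flush requested
print("b", file=stream, flush=True)   # print calls stream.flush() after writing

print("".join(stream.chunks) == "a\nb\n")  # True
print(stream.flush_calls)                  # 1
```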


回答 3

如何刷新Python打印输出?

我建议这样做的五种方法:

  • 在Python 3中,调用print(..., flush=True)(flush参数在Python 2的print函数中不可用,并且print语句没有类似物)。
  • 调用file.flush()输出文件(我们可以包装python 2的print函数来执行此操作),例如,sys.stdout
  • 将此函数
    print = partial(print, flush=True)应用于局部函数的模块中的每个打印函数调用,并应用于全局模块。
  • -u通过传递给解释器命令的标志()将其应用于进程
  • 将其应用于您环境中的每个python进程PYTHONUNBUFFERED=TRUE(并取消设置变量以撤消此操作)。

Python 3.3以上

使用Python 3.3或更高版本,您可以仅flush=True将关键字参数提供给该print函数:

print('foo', flush=True) 

Python 2(或<3.3)

他们没有将flush参数反向移植到Python 2.7,因此,如果您使用的是Python 2(或低于3.3),并且想要与2和3都兼容的代码,我建议以下兼容代码。(请注意,__future__导入必须位于/非常靠近“ 模块顶部 ”):

from __future__ import print_function
import sys

if sys.version_info[:2] < (3, 3):
    old_print = print
    def print(*args, **kwargs):
        flush = kwargs.pop('flush', False)
        old_print(*args, **kwargs)
        if flush:
            file = kwargs.get('file', sys.stdout)
            # Why might file=None? IDK, but it works for print(i, file=None)
            file.flush() if file is not None else sys.stdout.flush()

上面的兼容性代码将涵盖大多数用途,但要进行更彻底的处理,请参阅six模块

另外,您也可以file.flush()在打印后调用,例如使用Python 2中的print语句:

import sys
print 'delayed output'
sys.stdout.flush()

将一个模块中的默认值更改为 flush=True

您可以通过在模块的全局范围内使用functools.partial来更改打印功能的默认值:

import functools
print = functools.partial(print, flush=True)

如果您看看我们新的部分函数,​​至少在Python 3中:

>>> print = functools.partial(print, flush=True)
>>> print
functools.partial(<built-in function print>, flush=True)

我们可以看到它的工作原理与正常情况一样:

>>> print('foo')
foo

实际上,我们可以覆盖新的默认值:

>>> print('foo', flush=False)
foo

再次注意,这只会更改当前的全局范围,因为当前全局范围内的打印名称将使内置print函数(如果在该当前全局范围中使用Python 2使用,则取消引用兼容性函数)。

如果要在函数内部而不是在模块的全局范围内执行此操作,则应给它取一个不同的名称,例如:

def foo():
    printf = functools.partial(print, flush=True)
    printf('print stuff like this')

如果在函数中将其声明为全局变量,则需要在模块的全局命名空间中对其进行更改,因此,应将其放在全局命名空间中,除非特定行为正是您想要的。

更改流程的默认值

我认为这里最好的选择是使用-u标志来获取无缓冲的输出。

$ python -u script.py

要么

$ python -um package.module

文档

强制stdin,stdout和stderr完全没有缓冲。在重要的系统上,还将stdin,stdout和stderr置于二进制模式。

请注意,file.readlines()和文件对象(sys.stdin中的行)具有内部缓冲,不受该选项的影响。要解决此问题,您将需要在while 1:循环内使用file.readline()。

更改Shell操作环境的默认值

如果将环境变量设置为非空字符串,则可以在环境或从该环境继承的环境中的所有python进程中获得以下行为:

例如,在Linux或OSX中:

$ export PYTHONUNBUFFERED=TRUE

或Windows:

C:\> SET PYTHONUNBUFFERED=TRUE

文档

PYTHONUNBUFFERED

如果将其设置为非空字符串,则等效于指定-u选项。


附录

这是Python 2.7.12中的print函数的帮助-请注意没有 flush参数:

>>> from __future__ import print_function
>>> help(print)
print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file: a file-like object (stream); defaults to the current sys.stdout.
    sep:  string inserted between values, default a space.
    end:  string appended after the last value, default a newline.

How to flush output of Python print?

I suggest five ways of doing this:

  • In Python 3, call print(..., flush=True) (the flush argument is not available in Python 2’s print function, and there is no analogue for the print statement).
  • Call file.flush() on the output file (we can wrap python 2’s print function to do this), for example, sys.stdout
  • apply this to every print function call in the module with a partial function,
    print = partial(print, flush=True) applied to the module global.
  • apply this to the process with a flag (-u) passed to the interpreter command
  • apply this to every python process in your environment with PYTHONUNBUFFERED=TRUE (and unset the variable to undo this).

Python 3.3+

Using Python 3.3 or higher, you can just provide flush=True as a keyword argument to the print function:

print('foo', flush=True) 

Python 2 (or < 3.3)

They did not backport the flush argument to Python 2.7. So if you’re using Python 2 (or anything below 3.3) and want code that’s compatible with both 2 and 3, may I suggest the following compatibility code. (Note that the __future__ import must be at, or very “near the top of your module”):

from __future__ import print_function
import sys

if sys.version_info[:2] < (3, 3):
    old_print = print
    def print(*args, **kwargs):
        flush = kwargs.pop('flush', False)
        old_print(*args, **kwargs)
        if flush:
            file = kwargs.get('file', sys.stdout)
            # Why might file=None? IDK, but it works for print(i, file=None)
            file.flush() if file is not None else sys.stdout.flush()

The above compatibility code will cover most uses, but for a much more thorough treatment, see the six module.

Alternatively, you can just call file.flush() after printing, for example, with the print statement in Python 2:

import sys
print 'delayed output'
sys.stdout.flush()

Changing the default in one module to flush=True

You can change the default for the print function by using functools.partial on the global scope of a module:

import functools
print = functools.partial(print, flush=True)

If we look at our new partial function, at least in Python 3:

>>> print = functools.partial(print, flush=True)
>>> print
functools.partial(<built-in function print>, flush=True)

We can see it works just like normal:

>>> print('foo')
foo

And we can actually override the new default:

>>> print('foo', flush=False)
foo

Note again that this only changes the current global scope, because the print name in the current global scope shadows the builtin print function (or replaces the compatibility function, if you are using one in Python 2, in that global scope).

If you want to do this inside a function instead of on a module’s global scope, you should give it a different name, e.g.:

def foo():
    printf = functools.partial(print, flush=True)
    printf('print stuff like this')

If you declare it a global in a function, you’re changing it on the module’s global namespace, so you should just put it in the global namespace, unless that specific behavior is exactly what you want.

Changing the default for the process

I think the best option here is to use the -u flag to get unbuffered output.

$ python -u script.py

or

$ python -um package.module

From the docs:

Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.

Note that there is internal buffering in file.readlines() and File Objects (for line in sys.stdin) which is not influenced by this option. To work around this, you will want to use file.readline() inside a while 1: loop.

Changing the default for the shell operating environment

You can get this behavior for all Python processes in the environment, or in environments that inherit from it, if you set the PYTHONUNBUFFERED environment variable to a nonempty string:

e.g., in Linux or OSX:

$ export PYTHONUNBUFFERED=TRUE

or Windows:

C:\> SET PYTHONUNBUFFERED=TRUE

from the docs:

PYTHONUNBUFFERED

If this is set to a non-empty string it is equivalent to specifying the -u option.


Addendum

Here’s the help on the print function from Python 2.7.12 – note that there is no flush argument:

>>> from __future__ import print_function
>>> help(print)
print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file: a file-like object (stream); defaults to the current sys.stdout.
    sep:  string inserted between values, default a space.
    end:  string appended after the last value, default a newline.

回答 4

另外,如本博客中所建议,可以sys.stdout在无缓冲模式下重新打开:

sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

每个stdout.writeprint操作后自动刷新。

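Also, as suggested in this blog, one can reopen sys.stdout in unbuffered mode (this works in Python 2; in Python 3, opening a text stream with a buffering value of 0 raises a ValueError):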

sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

Each stdout.write and print operation will be automatically flushed afterwards.


回答 5

使用Python 3.x,该print()功能已扩展:

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

因此,您可以执行以下操作:

print("Visiting toilet", flush=True)

Python文档条目

With Python 3.x the print() function has been extended:

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

So, you can just do:

print("Visiting toilet", flush=True)

Python Docs Entry


回答 6

使用-u命令行开关可以,但是有点笨拙。这意味着如果用户在没有-u选项的情况下调用脚本,程序可能会出现错误的行为。我通常使用custom stdout,例如:

class flushfile:
  def __init__(self, f):
    self.f = f

  def write(self, x):
    self.f.write(x)
    self.f.flush()

import sys
sys.stdout = flushfile(sys.stdout)

…现在,您的所有print呼叫(sys.stdout隐式使用)将被自动flush编辑。

Using the -u command-line switch works, but it is a little bit clumsy. It would mean that the program would potentially behave incorrectly if the user invoked the script without the -u option. I usually use a custom stdout, like this:

class flushfile:
  def __init__(self, f):
    self.f = f

  def write(self, x):
    self.f.write(x)
    self.f.flush()

import sys
sys.stdout = flushfile(sys.stdout)

… Now all your print calls (which use sys.stdout implicitly), will be automatically flushed.
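For Python 3, a variant of the same idea can forward every other attribute to the wrapped stream, so code that expects things like fileno() or encoding keeps working. This is only a sketch; AutoFlush is a hypothetical name and the __getattr__ delegation is my addition, not part of the answer above.

```python
import io

class AutoFlush:
    """Hypothetical wrapper: flush the underlying stream after every write."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        written = self._stream.write(text)
        self._stream.flush()
        return written

    def writelines(self, lines):
        self._stream.writelines(lines)
        self._stream.flush()

    def __getattr__(self, name):
        # Only called for attributes not defined on AutoFlush itself.
        return getattr(self._stream, name)

buffer = io.StringIO()
out = AutoFlush(buffer)
print("hello", file=out)
print(out.getvalue() == "hello\n")  # True: getvalue() is delegated to the StringIO
```

To apply it to all print calls, one could assign sys.stdout = AutoFlush(sys.stdout), as in the answer above.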


回答 7

为什么不尝试使用未缓冲的文件?

f = open('xyz.log', 'a', 0)

要么

sys.stdout = open('out.log', 'a', 0)

Why not try using an unbuffered file? (In Python 3, a buffering argument of 0 is only allowed in binary mode.)

f = open('xyz.log', 'a', 0)

OR

sys.stdout = open('out.log', 'a', 0)

回答 8

丹的想法不太有效:

#!/usr/bin/env python
class flushfile(file):
    def __init__(self, f):
        self.f = f
    def write(self, x):
        self.f.write(x)
        self.f.flush()

import sys
sys.stdout = flushfile(sys.stdout)

print "foo"

结果:

Traceback (most recent call last):
  File "./passpersist.py", line 12, in <module>
    print "foo"
ValueError: I/O operation on closed file

我认为问题在于它是从文件类继承的,实际上是不必要的。根据sys.stdout的文档:

stdout和stderr不必是内置文件对象:任何对象都是可以接受的,只要它具有带有字符串参数的write()方法即可。

所以改变

class flushfile(file):

class flushfile(object):

使它工作正常。

Dan’s idea doesn’t quite work:

#!/usr/bin/env python
class flushfile(file):
    def __init__(self, f):
        self.f = f
    def write(self, x):
        self.f.write(x)
        self.f.flush()

import sys
sys.stdout = flushfile(sys.stdout)

print "foo"

The result:

Traceback (most recent call last):
  File "./passpersist.py", line 12, in <module>
    print "foo"
ValueError: I/O operation on closed file

I believe the problem is that it inherits from the file class, which actually isn’t necessary. According to the docs for sys.stdout:

stdout and stderr needn’t be built-in file objects: any object is acceptable as long as it has a write() method that takes a string argument.

so changing

class flushfile(file):

to

class flushfile(object):

makes it work just fine.


回答 9

这是我的版本,它也提供writelines()和fileno():

class FlushFile(object):
    def __init__(self, fd):
        self.fd = fd

    def write(self, x):
        ret = self.fd.write(x)
        self.fd.flush()
        return ret

    def writelines(self, lines):
        ret = self.fd.writelines(lines)  # delegate to the wrapped file (was recursing into self)
        self.fd.flush()
        return ret

    def flush(self):
        return self.fd.flush()  # actually call flush

    def close(self):
        return self.fd.close()

    def fileno(self):
        return self.fd.fileno()

Here is my version, which provides writelines() and fileno(), too:

class FlushFile(object):
    def __init__(self, fd):
        self.fd = fd

    def write(self, x):
        ret = self.fd.write(x)
        self.fd.flush()
        return ret

    def writelines(self, lines):
        ret = self.fd.writelines(lines)  # delegate to the wrapped file (was recursing into self)
        self.fd.flush()
        return ret

    def flush(self):
        return self.fd.flush()  # actually call flush

    def close(self):
        return self.fd.close()

    def fileno(self):
        return self.fd.fileno()

回答 10

在Python 3中,您可以覆盖打印功能,默认设置为 flush = True

import builtins

def print(*objects, sep=' ', end='\n', file=None, flush=True):  # file=None means sys.stdout
    builtins.print(*objects, sep=sep, end=end, file=file, flush=flush)

In Python 3 you can override the print function so that flush defaults to True:

import builtins, sys

def print(*objects, sep=' ', end='\n', file=sys.stdout, flush=True):
    builtins.print(*objects, sep=sep, end=end, file=file, flush=flush)
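
If shadowing the built-in name feels too invasive, a similar effect can be sketched with functools.partial. The name flush_print is an assumption made for illustration; it is not part of the answer.

```python
import functools
import io

# A print variant whose flush argument defaults to True.
# Callers can still pass flush=False explicitly to override it.
flush_print = functools.partial(print, flush=True)

# Demonstration against an in-memory stream instead of the terminal.
buffer = io.StringIO()
flush_print("hello", "world", sep="-", file=buffer)
```

This leaves the built-in print untouched, so other modules are unaffected.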

Answer 11

I did it like this in Python 3.4:

'''To write to screen in real-time'''
message = lambda x: print(x, flush=True, end="")
message('I am flushing out now...')

Answer 12

I first struggled to understand how the flush option was working. I wanted to do a ‘loading display’ and here is the solution I found:

for i in range(100000):
    print('{:s}\r'.format(''), end='', flush=True)
    print('Loading index: {:d}/100000'.format(i+1), end='')

The first line emits a carriage return and flushes, which moves the cursor back to the start of the line; the second line prints the updated message over the previous one. I don’t know if a one-line syntax exists here.
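
For what it is worth, the two calls can probably be folded into one by putting the carriage return at the start of the message. A minimal sketch, assuming a helper named show_progress (a hypothetical name) and an optional stream parameter added only so the output can be captured:

```python
import sys


def show_progress(total, stream=None):
    """Redraw a single status line `total` times, then end the line."""
    stream = sys.stdout if stream is None else stream
    for i in range(total):
        # '\r' moves the cursor back to column 0 so the new text
        # overwrites the old; flush=True pushes it out immediately.
        print('\rLoading index: {:d}/{:d}'.format(i + 1, total),
              end='', file=stream, flush=True)
    print(file=stream)  # final newline so later output starts cleanly
```

Calling show_progress(100000) on a terminal should show one line that counts upward in place.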


How can I represent an “Enum” in Python?

Question: How can I represent an “Enum” in Python?

I’m mainly a C# developer, but I’m currently working on a project in Python.

How can I represent the equivalent of an Enum in Python?


Answer 0

Enums have been added to Python 3.4 as described in PEP 435. It has also been backported to 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4 on pypi.

For more advanced Enum techniques, try the aenum library (2.7, 3.3+, same author as enum34). Code is not perfectly compatible between py2 and py3; for example, you’ll need __order__ in Python 2.

  • To use enum34, do $ pip install enum34
  • To use aenum, do $ pip install aenum

Installing enum (no numbers) will install a completely different and incompatible version.


from enum import Enum     # for enum34, or the stdlib version
# from aenum import Enum  # for the aenum version
Animal = Enum('Animal', 'ant bee cat dog')

Animal.ant  # returns <Animal.ant: 1>
Animal['ant']  # returns <Animal.ant: 1> (string lookup)
Animal.ant.name  # returns 'ant' (inverse lookup)

or equivalently:

class Animal(Enum):
    ant = 1
    bee = 2
    cat = 3
    dog = 4

In earlier versions, one way of accomplishing enums is:

def enum(**enums):
    return type('Enum', (), enums)

which is used like so:

>>> Numbers = enum(ONE=1, TWO=2, THREE='three')
>>> Numbers.ONE
1
>>> Numbers.TWO
2
>>> Numbers.THREE
'three'

You can also easily support automatic enumeration with something like this:

def enum(*sequential, **named):
    enums = dict(zip(sequential, range(len(sequential))), **named)
    return type('Enum', (), enums)

and used like so:

>>> Numbers = enum('ZERO', 'ONE', 'TWO')
>>> Numbers.ZERO
0
>>> Numbers.ONE
1

Support for converting the values back to names can be added this way:

def enum(*sequential, **named):
    enums = dict(zip(sequential, range(len(sequential))), **named)
    reverse = dict((value, key) for key, value in enums.items())
    enums['reverse_mapping'] = reverse
    return type('Enum', (), enums)

This overwrites anything with that name, but it is useful for rendering your enums in output. It will throw KeyError if the reverse mapping doesn’t exist. With the first example:

>>> Numbers.reverse_mapping['three']
'THREE'
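
With the standard-library Enum (Python 3.4+, or the enum34 backport), the value-to-name lookup is built in, so no reverse_mapping is needed. A small sketch; the helper name name_of is an assumption made for illustration:

```python
from enum import Enum


class Animal(Enum):
    ant = 1
    bee = 2
    cat = 3


def name_of(value):
    """Return the member name for a value, or None if no member has it."""
    try:
        # Calling the enum class with a value looks up the member.
        return Animal(value).name
    except ValueError:
        # Raised when no member has that value.
        return None
```

Here name_of(2) gives 'bee', while an unknown value gives None instead of raising.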

Answer 1

Before PEP 435, Python didn’t have an equivalent but you could implement your own.

Myself, I like keeping it simple (I’ve seen some horribly complex examples on the net), something like this …

class Animal:
    DOG = 1
    CAT = 2

x = Animal.DOG

In Python 3.4 (PEP 435), you can make Enum the base class. This gets you a little bit of extra functionality, described in the PEP. For example, enum members are distinct from integers, and they are composed of a name and a value.

from enum import Enum

class Animal(Enum):
    DOG = 1
    CAT = 2

print(Animal.DOG)
# Animal.DOG

print(repr(Animal.DOG))
# <Animal.DOG: 1>

print(Animal.DOG.value)
# 1

print(Animal.DOG.name)
# DOG

If you don’t want to type the values, use the following shortcut:

class Animal(Enum):
    DOG, CAT = range(2)
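
On Python 3.6 and later, the standard library also provides enum.auto() for the same purpose. A minimal sketch (note that auto() numbering starts at 1 by default, unlike range(2), which starts at 0):

```python
from enum import Enum, auto


class Animal(Enum):
    # auto() assigns the next value for each member, starting at 1.
    DOG = auto()
    CAT = auto()
```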

Enum classes are iterable and can be converted to lists. Members are ordered by declaration, regardless of their values. For example:

class Animal(Enum):
    DOG = 1
    CAT = 2
    COW = 0

list(Animal)
# [<Animal.DOG: 1>, <Animal.CAT: 2>, <Animal.COW: 0>]

[animal.value for animal in Animal]
# [1, 2, 0]

Animal.CAT in Animal
# True

Answer 2

Here is one implementation:

class Enum(set):
    def __getattr__(self, name):
        if name in self:
            return name
        raise AttributeError

Here is its usage:

Animals = Enum(["DOG", "CAT", "HORSE"])

print(Animals.DOG)

Answer 3

If you need the numeric values, here’s the quickest way:

dog, cat, rabbit = range(3)

In Python 3.x you can also add a starred placeholder at the end, which will soak up all the remaining values of the range in case you don’t mind wasting memory and cannot count:

dog, cat, rabbit, horse, *_ = range(100)

Answer 4

The best solution for you would depend on what you require from your fake enum.

Simple enum:

If you need the enum as only a list of names identifying different items, the solution by Mark Harrison (above) is great:

Pen, Pencil, Eraser = range(0, 3)

Using a range also allows you to set any starting value:

Pen, Pencil, Eraser = range(9, 12)

In addition to the above, if you also require that the items belong to a container of some sort, then embed them in a class:

class Stationery:
    Pen, Pencil, Eraser = range(0, 3)

To use the enum item, you would now need to use the container name and the item name:

stype = Stationery.Pen

Complex enum:

For long lists of enum or more complicated uses of enum, these solutions will not suffice. You could look to the recipe by Will Ware for Simulating Enumerations in Python published in the Python Cookbook. An online version of that is available here.

More info:

PEP 354: Enumerations in Python has the interesting details of a proposal for enum in Python and why it was rejected.


Answer 5

The typesafe enum pattern which was used in Java pre-JDK 5 has a number of advantages. Much like in Alexandru’s answer, you create a class and class level fields are the enum values; however, the enum values are instances of the class rather than small integers. This has the advantage that your enum values don’t inadvertently compare equal to small integers, you can control how they’re printed, add arbitrary methods if that’s useful and make assertions using isinstance:

class Animal:
   def __init__(self, name):
       self.name = name

   def __str__(self):
       return self.name

   def __repr__(self):
       return "<Animal: %s>" % self

Animal.DOG = Animal("dog")
Animal.CAT = Animal("cat")

>>> x = Animal.DOG
>>> x
<Animal: dog>
>>> x == 1
False

A recent thread on python-dev pointed out there are a couple of enum libraries in the wild, including:


Answer 6

An Enum class can be a one-liner.

class Enum(tuple): __getattr__ = tuple.index

How to use it (forward and reverse lookup, keys, values, items, etc.)

>>> State = Enum(['Unclaimed', 'Claimed'])
>>> State.Claimed
1
>>> State[1]
'Claimed'
>>> State
('Unclaimed', 'Claimed')
>>> list(range(len(State)))
[0, 1]
>>> [(k, State[k]) for k in range(len(State))]
[(0, 'Unclaimed'), (1, 'Claimed')]
>>> [(k, getattr(State, k)) for k in State]
[('Unclaimed', 0), ('Claimed', 1)]

Answer 7

So, I agree. Let’s not enforce type safety in Python, but I would like to protect myself from silly mistakes. So what do we think about this?

class Animal(object):
    values = ['Horse','Dog','Cat']

    class __metaclass__(type):
        def __getattr__(self, name):
            return self.values.index(name)

It keeps me from value-collision in defining my enums.

>>> Animal.Cat
2

There’s another handy advantage: really fast reverse lookups:

def name_of(self, i):
    return self.values[i]
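
The nested __metaclass__ class above is Python 2 syntax; Python 3 treats it as an ordinary attribute and ignores it. A minimal Python 3 sketch of the same idea follows. The metaclass name IndexedNames is hypothetical, and name_of is placed on the metaclass here so that it can be called on the class itself.

```python
class IndexedNames(type):
    """Hypothetical metaclass: unknown class attributes resolve via `values`."""

    def __getattr__(cls, name):
        # Guard against infinite recursion if a class forgets to define `values`.
        if name == 'values':
            raise AttributeError(name)
        try:
            return cls.values.index(name)
        except ValueError:
            # Report a missing name the way attribute access normally does.
            raise AttributeError(name) from None

    def name_of(cls, i):
        # Reverse lookup: index back to name.
        return cls.values[i]


class Animal(metaclass=IndexedNames):
    values = ['Horse', 'Dog', 'Cat']
```

Raising AttributeError rather than letting ValueError escape keeps hasattr() and getattr() with a default working as usual.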

Answer 8

Python doesn’t have a built-in equivalent to enum, and other answers have ideas for implementing your own (you may also be interested in the over the top version in the Python cookbook).

However, in situations where an enum would be called for in C, I usually end up just using simple strings: because of the way objects/attributes are implemented, (C)Python is optimized to work very fast with short strings anyway, so there wouldn’t really be any performance benefit to using integers. To guard against typos / invalid values you can insert checks in selected places.

ANIMALS = ['cat', 'dog', 'python']

def take_for_a_walk(animal):
    assert animal in ANIMALS
    ...

(One disadvantage compared to using a class is that you lose the benefit of autocomplete)


Answer 9

On 2013-05-10, Guido agreed to accept PEP 435 into the Python 3.4 standard library. This means that Python finally has builtin support for enumerations!

There is a backport available for Python 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4. It’s on PyPI as enum34.

Declaration:

>>> from enum import Enum
>>> class Color(Enum):
...     red = 1
...     green = 2
...     blue = 3

Representation:

>>> print(Color.red)
Color.red
>>> print(repr(Color.red))
<Color.red: 1>

Iteration:

>>> for color in Color:
...   print(color)
...
Color.red
Color.green
Color.blue

Programmatic access:

>>> Color(1)
<Color.red: 1>
>>> Color['blue']
<Color.blue: 3>

For more information, refer to the proposal. Official documentation will probably follow soon.


Answer 10

I prefer to define enums in Python like so:

class Animal:
  class Dog: pass
  class Cat: pass

x = Animal.Dog

It’s more bug-proof than using integers since you don’t have to worry about ensuring that the integers are unique (e.g. if you said Dog = 1 and Cat = 1 you’d be screwed).

It’s more bug-proof than using strings since you don’t have to worry about typos (e.g. x == "catt" fails silently, but x == Animal.Catt raises an AttributeError at runtime).


Answer 11

def M_add_class_attribs(attribs):
    def foo(name, bases, dict_):
        for v, k in attribs:
            dict_[k] = v
        return type(name, bases, dict_)
    return foo

def enum(*names):
    class Foo(object):
        __metaclass__ = M_add_class_attribs(enumerate(names))
        def __setattr__(self, name, value):  # this makes it read-only
            raise NotImplementedError
    return Foo()

Use it like this:

Animal = enum('DOG', 'CAT')
Animal.DOG # returns 0
Animal.CAT # returns 1
Animal.DOG = 2 # raises NotImplementedError

If you just want unique symbols and don’t care about the values, replace this line:

__metaclass__ = M_add_class_attribs(enumerate(names))

with this:

__metaclass__ = M_add_class_attribs((object(), name) for name in names)

Answer 12

Hmmm… I suppose the closest thing to an enum would be a dictionary, defined either like this:

months = {
    'January': 1,
    'February': 2,
    ...
}

or

months = dict(
    January=1,
    February=2,
    ...
)

Then, you can use the symbolic name for the constants like this:

mymonth = months['January']

There are other options, like a list of tuples, or a tuple of tuples, but the dictionary is the only one that provides you with a “symbolic” (constant string) way to access the value.

Edit: I like Alexandru’s answer too!


Answer 13

Another, very simple, implementation of an enum in Python, using namedtuple:

from collections import namedtuple

def enum(*keys):
    return namedtuple('Enum', keys)(*keys)

MyEnum = enum('FOO', 'BAR', 'BAZ')

or, alternatively,

# With sequential number values
def enum(*keys):
    return namedtuple('Enum', keys)(*range(len(keys)))

# From a dict / keyword args
def enum(**kwargs):
    return namedtuple('Enum', kwargs.keys())(*kwargs.values())

Like the method above that subclasses set, this allows:

'FOO' in MyEnum
other = MyEnum.FOO
assert other == MyEnum.FOO

But it has more flexibility, as it can have different keys and values. This allows

MyEnum.FOO < MyEnum.BAR

to act as is expected if you use the version that fills in sequential number values.
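
To make the ordering behaviour concrete, here is a short sketch that uses the sequential-number variant. The reverse-lookup helper name_of is an assumption added for illustration; it relies on the namedtuple _fields attribute and the tuple index method.

```python
from collections import namedtuple


def enum(*keys):
    # Sequential-number variant: FOO=0, BAR=1, BAZ=2, ...
    return namedtuple('Enum', keys)(*range(len(keys)))


MyEnum = enum('FOO', 'BAR', 'BAZ')


def name_of(value):
    """Map a value back to its field name (hypothetical helper)."""
    # _fields lists the names in declaration order; index() finds the
    # position of the value inside the tuple.
    return MyEnum._fields[MyEnum.index(value)]
```

With this variant the membership test works on the numbers (1 in MyEnum), not on the names.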


Answer 14

From Python 3.4 there is official support for enums. You can find documentation and examples on the Python 3.4 documentation page.

Enumerations are created using the class syntax, which makes them easy to read and write. An alternative creation method is described in Functional API. To define an enumeration, subclass Enum as follows:

from enum import Enum
class Color(Enum):
     red = 1
     green = 2
     blue = 3

Answer 15

What I use:

class Enum(object):
    def __init__(self, names, separator=None):
        self.names = names.split(separator)
        for value, name in enumerate(self.names):
            setattr(self, name.upper(), value)
    def tuples(self):
        return tuple(enumerate(self.names))

How to use:

>>> state = Enum('draft published retracted')
>>> state.DRAFT
0
>>> state.RETRACTED
2
>>> state.FOO
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'Enum' object has no attribute 'FOO'
>>> state.tuples()
((0, 'draft'), (1, 'published'), (2, 'retracted'))

So this gives you integer constants like state.PUBLISHED and the two-tuples to use as choices in Django models.


Answer 16

davidg recommends using dicts. I’d go one step further and use sets:

months = set(['January', 'February', ..., 'December'])

Now you can test whether a value matches one of the values in the set like this:

if m in months:

Like dF, though, I usually just use string constants in place of enums.


Answer 17

This is the best one I have seen: “First Class Enums in Python”

http://code.activestate.com/recipes/413486/

It gives you a class, and the class contains all the enums. The enums can be compared to each other, but don’t have any particular value; you can’t use them as an integer value. (I resisted this at first because I am used to C enums, which are integer values. But if you can’t use it as an integer, you can’t use it as an integer by mistake so overall I think it is a win.) Each enum is a unique value. You can print enums, you can iterate over them, you can test that an enum value is “in” the enum. It’s pretty complete and slick.

Edit (cfi): The above link is not Python 3 compatible. Here’s my port of enum.py to Python 3:

def cmp(a,b):
   if a < b: return -1
   if b < a: return 1
   return 0


def Enum(*names):
   ##assert names, "Empty enums are not supported" # <- Don't like empty enums? Uncomment!

   class EnumClass(object):
      __slots__ = names
      def __iter__(self):        return iter(constants)
      def __len__(self):         return len(constants)
      def __getitem__(self, i):  return constants[i]
      def __repr__(self):        return 'Enum' + str(names)
      def __str__(self):         return 'enum ' + str(constants)

   class EnumValue(object):
      __slots__ = ('__value')
      def __init__(self, value): self.__value = value
      Value = property(lambda self: self.__value)
      EnumType = property(lambda self: EnumType)
      def __hash__(self):        return hash(self.__value)
      def __cmp__(self, other):
         # C fans might want to remove the following assertion
         # to make all enums comparable by ordinal value {;))
         assert self.EnumType is other.EnumType, "Only values from the same enum are comparable"
         return cmp(self.__value, other.__value)
      def __lt__(self, other):   return self.__cmp__(other) < 0
      def __eq__(self, other):   return self.__cmp__(other) == 0
      def __invert__(self):      return constants[maximum - self.__value]
      def __bool__(self):        return bool(self.__value)
      def __repr__(self):        return str(names[self.__value])

   maximum = len(names) - 1
   constants = [None] * len(names)
   for i, each in enumerate(names):
      val = EnumValue(i)
      setattr(EnumClass, each, val)
      constants[i] = val
   constants = tuple(constants)
   EnumType = EnumClass()
   return EnumType


if __name__ == '__main__':
   print( '\n*** Enum Demo ***')
   print( '--- Days of week ---')
   Days = Enum('Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su')
   print( Days)
   print( Days.Mo)
   print( Days.Fr)
   print( Days.Mo < Days.Fr)
   print( list(Days))
   for each in Days:
      print( 'Day:', each)
   print( '--- Yes/No ---')
   Confirmation = Enum('No', 'Yes')
   answer = Confirmation.No
   print( 'Your answer is not', ~answer)

Answer 18

Keep it simple:

class Enum(object):
    def __init__(self, tupleList):
        self.tupleList = tupleList

    def __getattr__(self, name):
        return self.tupleList.index(name)

Then:

DIRECTION = Enum(('UP', 'DOWN', 'LEFT', 'RIGHT'))
DIRECTION.DOWN
1

Answer 19

I have had occasion to need an Enum class, for the purpose of decoding a binary file format. The features I happened to want were a concise enum definition, the ability to freely create instances of the enum by either integer value or string, and a useful representation. Here’s what I ended up with:

>>> class Enum(int):
...     def __new__(cls, value):
...         if isinstance(value, str):
...             return getattr(cls, value)
...         elif isinstance(value, int):
...             return cls.__index[value]
...     def __str__(self): return self.__name
...     def __repr__(self): return "%s.%s" % (type(self).__name__, self.__name)
...     class __metaclass__(type):
...         def __new__(mcls, name, bases, attrs):
...             attrs['__slots__'] = ['_Enum__name']
...             cls = type.__new__(mcls, name, bases, attrs)
...             cls._Enum__index = _index = {}
...             for base in reversed(bases):
...                 if hasattr(base, '_Enum__index'):
...                     _index.update(base._Enum__index)
...             # create all of the instances of the new class
...             for attr in attrs.keys():
...                 value = attrs[attr]
...                 if isinstance(value, int):
...                     evalue = int.__new__(cls, value)
...                     evalue._Enum__name = attr
...                     _index[value] = evalue
...                     setattr(cls, attr, evalue)
...             return cls
... 

A whimsical example of using it:

>>> class Citrus(Enum):
...     Lemon = 1
...     Lime = 2
... 
>>> Citrus.Lemon
Citrus.Lemon
>>> 
>>> Citrus(1)
Citrus.Lemon
>>> Citrus(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __new__
KeyError: 5
>>> class Fruit(Citrus):
...     Apple = 3
...     Banana = 4
... 
>>> Fruit.Apple
Fruit.Apple
>>> Fruit.Lemon
Citrus.Lemon
>>> Fruit(1)
Citrus.Lemon
>>> Fruit(3)
Fruit.Apple
>>> "%d %s %r" % ((Fruit.Apple,)*3)
'3 Apple Fruit.Apple'
>>> Fruit(1) is Citrus.Lemon
True

Key features:

  • str(), int() and repr() all produce the most useful output possible: respectively, the name of the enumeration, its integer value, and a Python expression that evaluates back to the enumeration.
  • Enumerated values returned by the constructor are limited strictly to the predefined values, so there are no accidental enum values.
  • Enumerated values are singletons; they can be strictly compared with is.
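
The class above relies on the Python 2 __metaclass__ hook. On Python 3.4+, a roughly similar feature set can be sketched with the standard-library IntEnum. This is a different technique from the answer's: it does not allow extending an enum that already has members, so the Fruit(Citrus) subclass has no direct equivalent. The helper name decode is an assumption made for illustration.

```python
from enum import IntEnum


class Citrus(IntEnum):
    Lemon = 1
    Lime = 2


def decode(raw):
    """Map an integer value or a member name to a Citrus member."""
    if isinstance(raw, str):
        return Citrus[raw]   # lookup by name; KeyError if unknown
    return Citrus(raw)       # lookup by value; ValueError if unknown
```

Members are singletons and behave as integers, so decode(1) is Citrus.Lemon and Citrus.Lime + 1 evaluates to 3.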

Answer 20

The new standard in Python is PEP 435, so an Enum class will be available in future versions of Python:

>>> from enum import Enum

However to begin using it now you can install the original library that motivated the PEP:

$ pip install flufl.enum

Then you can use it as per its online guide:

>>> from flufl.enum import Enum
>>> class Colors(Enum):
...     red = 1
...     green = 2
...     blue = 3
>>> for color in Colors: print color
Colors.red
Colors.green
Colors.blue
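
PEP 435 has since landed in the standard library as the enum module (Python 3.4 and later), so the same example can be written without a third-party package. The following is a minimal sketch, assuming Python 3.4 or newer; the Colors class simply mirrors the example above, and the names list is only there for illustration:

```python
from enum import Enum


class Colors(Enum):
    red = 1
    green = 2
    blue = 3


# Members are iterated in definition order.
names = [color.name for color in Colors]

# Lookup by value and lookup by name both return the same singleton member.
assert Colors(2) is Colors.green
assert Colors["blue"] is Colors.blue

for color in Colors:
    print(color)
```

Note that print is a function here, unlike the Python 2 style print statement used in the flufl.enum session.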

回答 21

def enum(*sequential, **named):
    enums = dict(zip(sequential, [object() for _ in range(len(sequential))]), **named)
    return type('Enum', (), enums)

如果命名,是您的问题,但是如果不创建对象而不是值,则可以执行以下操作:

>>> DOG = enum('BARK', 'WALK', 'SIT')
>>> CAT = enum('MEOW', 'WALK', 'SIT')
>>> DOG.WALK == CAT.WALK
False

使用此处的其他实现时(在我的示例中也使用命名实例时),必须确保不要尝试比较来自不同枚举的对象。因为这可能是一个陷阱:

>>> DOG = enum(BARK=1, WALK=2, SIT=3)
>>> CAT = enum(WALK=1, SIT=2)
>>> pet1_state = DOG.BARK
>>> pet2_state = CAT.WALK
>>> pet1_state == pet2_state
True

Yikes!

def enum(*sequential, **named):
    enums = dict(zip(sequential, [object() for _ in range(len(sequential))]), **named)
    return type('Enum', (), enums)

If you pass named values, what they are is your responsibility; if you don’t, creating objects instead of values allows you to do this:

>>> DOG = enum('BARK', 'WALK', 'SIT')
>>> CAT = enum('MEOW', 'WALK', 'SIT')
>>> DOG.WALK == CAT.WALK
False

When using the other implementations cited here (or named values in my example), you must make sure you never compare objects from different enums, because here is a possible pitfall:

>>> DOG = enum(BARK=1, WALK=2, SIT=3)
>>> CAT = enum(WALK=1, SIT=2)
>>> pet1_state = DOG.BARK
>>> pet2_state = CAT.WALK
>>> pet1_state == pet2_state
True

Yikes!
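
The two cases above can be put side by side in a script. This is only a sketch of the same idea: the enum function is the one from this answer, while DOG_INT, CAT_INT and the two result variables are names made up here for illustration.

```python
def enum(*sequential, **named):
    # Each positional name gets its own unique object() sentinel;
    # keyword arguments keep whatever value the caller supplies.
    enums = dict(zip(sequential, [object() for _ in sequential]), **named)
    return type('Enum', (), enums)


# Sentinel-based members never compare equal across different enums.
DOG = enum('BARK', 'WALK', 'SIT')
CAT = enum('MEOW', 'WALK', 'SIT')
sentinels_equal = DOG.WALK == CAT.WALK

# Integer-valued members do compare equal, which is the pitfall.
DOG_INT = enum(BARK=1, WALK=2, SIT=3)
CAT_INT = enum(WALK=1, SIT=2)
ints_equal = DOG_INT.BARK == CAT_INT.WALK

print(sentinels_equal, ints_equal)
```

The sentinel comparison is false because every object() instance is equal only to itself, whereas the integer comparison reduces to 1 == 1.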


回答 22

我真的很喜欢Alec Thomas的解决方案(http://stackoverflow.com/a/1695250):

def enum(**enums):
    '''simple constant "enums"'''
    return type('Enum', (object,), enums)

它外观优美整洁,但这只是一个使用指定属性创建类的函数。

对该函数进行一些修改,我们可以使其表现出更多的“枚举”:

注意:我通过尝试重现pygtk的新样式“枚举”(例如Gtk.MessageType.WARNING)的行为来创建了以下示例

def enum_base(t, **enums):
    '''enums with a base class'''
    T = type('Enum', (t,), {})
    for key,val in enums.items():
        setattr(T, key, T(val))

    return T

这将基于指定的类型创建一个枚举。除了像以前的函数一样授予属性访问权限外,它的行为还与您期望的Enum类型有关。它还继承了基类。

例如,整数枚举:

>>> Numbers = enum_base(int, ONE=1, TWO=2, THREE=3)
>>> Numbers.ONE
1
>>> x = Numbers.TWO
>>> 10 + x
12
>>> type(Numbers)
<type 'type'>
>>> type(Numbers.ONE)
<class 'Enum'>
>>> isinstance(x, Numbers)
True

使用此方法可以完成的另一件有趣的事情是,通过覆盖内置方法来自定义特定行为:

def enum_repr(t, **enums):
    '''enums with a base class and repr() output'''
    class Enum(t):
        def __repr__(self):
            return '<enum {0} of type Enum({1})>'.format(self._name, t.__name__)

    for key,val in enums.items():
        i = Enum(val)
        i._name = key
        setattr(Enum, key, i)

    return Enum



>>> Numbers = enum_repr(int, ONE=1, TWO=2, THREE=3)
>>> repr(Numbers.ONE)
'<enum ONE of type Enum(int)>'
>>> str(Numbers.ONE)
'1'

I really like Alec Thomas’ solution (http://stackoverflow.com/a/1695250):

def enum(**enums):
    '''simple constant "enums"'''
    return type('Enum', (object,), enums)

It’s elegant and clean looking, but it’s just a function that creates a class with the specified attributes.

With a little modification to the function, we can get it to act a little more ‘enumy’:

NOTE: I created the following examples by trying to reproduce the behavior of pygtk’s new style ‘enums’ (like Gtk.MessageType.WARNING)

def enum_base(t, **enums):
    '''enums with a base class'''
    T = type('Enum', (t,), {})
    for key,val in enums.items():
        setattr(T, key, T(val))

    return T

This creates an enum based off a specified type. In addition to giving attribute access like the previous function, it behaves as you would expect an Enum to with respect to types. It also inherits the base class.

For example, integer enums:

>>> Numbers = enum_base(int, ONE=1, TWO=2, THREE=3)
>>> Numbers.ONE
1
>>> x = Numbers.TWO
>>> 10 + x
12
>>> type(Numbers)
<type 'type'>
>>> type(Numbers.ONE)
<class 'Enum'>
>>> isinstance(x, Numbers)
True

Another interesting thing that can be done with this method is customize specific behavior by overriding built-in methods:

def enum_repr(t, **enums):
    '''enums with a base class and repr() output'''
    class Enum(t):
        def __repr__(self):
            return '<enum {0} of type Enum({1})>'.format(self._name, t.__name__)

    for key,val in enums.items():
        i = Enum(val)
        i._name = key
        setattr(Enum, key, i)

    return Enum



>>> Numbers = enum_repr(int, ONE=1, TWO=2, THREE=3)
>>> repr(Numbers.ONE)
'<enum ONE of type Enum(int)>'
>>> str(Numbers.ONE)
'1'

回答 23

PyPI的enum包提供了enum 的可靠实现。较早的答案提到了PEP 354。这被拒绝,但是该提案已实现 http://pypi.python.org/pypi/enum

使用简单优雅:

>>> from enum import Enum
>>> Colors = Enum('red', 'blue', 'green')
>>> shirt_color = Colors.green
>>> shirt_color = Colors[2]
>>> shirt_color > Colors.red
True
>>> shirt_color.index
2
>>> str(shirt_color)
'green'

The enum package from PyPI provides a robust implementation of enums. An earlier answer mentioned PEP 354; it was rejected, but the proposal was implemented at http://pypi.python.org/pypi/enum.

Usage is easy and elegant:

>>> from enum import Enum
>>> Colors = Enum('red', 'blue', 'green')
>>> shirt_color = Colors.green
>>> shirt_color = Colors[2]
>>> shirt_color > Colors.red
True
>>> shirt_color.index
2
>>> str(shirt_color)
'green'

回答 24

Alexandru关于将类常量用于枚举的建议非常有效。

我还喜欢为每组常量添加一个字典,以查找人类可读的字符串表示形式。

这有两个目的:a)提供一种简单的方法来漂亮地枚举枚举; b)字典在逻辑上将常量分组,以便您可以测试成员资格。

class Animal:    
  TYPE_DOG = 1
  TYPE_CAT = 2

  type2str = {
    TYPE_DOG: "dog",
    TYPE_CAT: "cat"
  }

  def __init__(self, type_):
    assert type_ in self.type2str.keys()
    self._type = type_

  def __repr__(self):
    return "<%s type=%s>" % (
        self.__class__.__name__, self.type2str[self._type].upper())

Alexandru’s suggestion of using class constants for enums works quite well.

I also like to add a dictionary for each set of constants to look up a human-readable string representation.

This serves two purposes: a) it provides a simple way to pretty-print your enum and b) the dictionary logically groups the constants so that you can test for membership.

class Animal:    
  TYPE_DOG = 1
  TYPE_CAT = 2

  type2str = {
    TYPE_DOG: "dog",
    TYPE_CAT: "cat"
  }

  def __init__(self, type_):
    assert type_ in self.type2str.keys()
    self._type = type_

  def __repr__(self):
    return "<%s type=%s>" % (
        self.__class__.__name__, self.type2str[self._type].upper())
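
A short usage sketch of the two purposes mentioned above, pretty-printing and membership testing. The class is repeated so the snippet runs on its own; the variables dog, dog_repr and is_known are illustrative names, not part of the original answer.

```python
class Animal:
    TYPE_DOG = 1
    TYPE_CAT = 2

    # Maps each constant to a human-readable name.
    type2str = {
        TYPE_DOG: "dog",
        TYPE_CAT: "cat",
    }

    def __init__(self, type_):
        # Membership test: only the known constants are accepted.
        assert type_ in self.type2str
        self._type = type_

    def __repr__(self):
        return "<%s type=%s>" % (
            self.__class__.__name__, self.type2str[self._type].upper())


dog = Animal(Animal.TYPE_DOG)
dog_repr = repr(dog)              # pretty-printed through type2str
is_known = 3 in Animal.type2str   # 3 is not one of the constants
print(dog_repr, is_known)
```

Because the check uses assert, it disappears when Python runs with -O; raising ValueError would be the sturdier choice if the validation has to hold in production.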

回答 25

这是一种我认为有价值的具有不同特征的方法:

  • 允许基于枚举而不是词法顺序的>和<比较
  • 可以通过名称,属性或索引来寻址项目:xa,x [‘a’]或x [0]
  • 支持切片操作,例如[:]或[-1]

最重要的是防止不同类型的枚举之间进行比较

紧密基于http://code.activestate.com/recipes/413486-first-class-enums-in-python

这里包括许多文档测试,以说明此方法的不同之处。

def enum(*names):
    """
SYNOPSIS
    Well-behaved enumerated type, easier than creating custom classes

DESCRIPTION
    Create a custom type that implements an enumeration.  Similar in concept
    to a C enum but with some additional capabilities and protections.  See
    http://code.activestate.com/recipes/413486-first-class-enums-in-python/.

PARAMETERS
    names       Ordered list of names.  The order in which names are given
                will be the sort order in the enum type.  Duplicate names
                are not allowed.  Unicode names are mapped to ASCII.

RETURNS
    Object of type enum, with the input names and the enumerated values.

EXAMPLES
    >>> letters = enum('a','e','i','o','u','b','c','y','z')
    >>> letters.a < letters.e
    True

    ## index by property
    >>> letters.a
    a

    ## index by position
    >>> letters[0]
    a

    ## index by name, helpful for bridging string inputs to enum
    >>> letters['a']
    a

    ## sorting by order in the enum() create, not character value
    >>> letters.u < letters.b
    True

    ## normal slicing operations available
    >>> letters[-1]
    z

    ## error since there are not 100 items in enum
    >>> letters[99]
    Traceback (most recent call last):
        ...
    IndexError: tuple index out of range

    ## error since name does not exist in enum
    >>> letters['ggg']
    Traceback (most recent call last):
        ...
    ValueError: tuple.index(x): x not in tuple

    ## enums must be named using valid Python identifiers
    >>> numbers = enum(1,2,3,4)
    Traceback (most recent call last):
        ...
    AssertionError: Enum values must be string or unicode

    >>> a = enum('-a','-b')
    Traceback (most recent call last):
        ...
    TypeError: Error when calling the metaclass bases
        __slots__ must be identifiers

    ## create another enum
    >>> tags = enum('a','b','c')
    >>> tags.a
    a
    >>> letters.a
    a

    ## can't compare values from different enums
    >>> letters.a == tags.a
    Traceback (most recent call last):
        ...
    AssertionError: Only values from the same enum are comparable

    >>> letters.a < tags.a
    Traceback (most recent call last):
        ...
    AssertionError: Only values from the same enum are comparable

    ## can't update enum after create
    >>> letters.a = 'x'
    Traceback (most recent call last):
        ...
    AttributeError: 'EnumClass' object attribute 'a' is read-only

    ## can't update enum after create
    >>> del letters.u
    Traceback (most recent call last):
        ...
    AttributeError: 'EnumClass' object attribute 'u' is read-only

    ## can't have non-unique enum values
    >>> x = enum('a','b','c','a')
    Traceback (most recent call last):
        ...
    AssertionError: Enums must not repeat values

    ## can't have zero enum values
    >>> x = enum()
    Traceback (most recent call last):
        ...
    AssertionError: Empty enums are not supported

    ## can't have enum values that look like special function names
    ## since these could collide and lead to non-obvious errors
    >>> x = enum('a','b','c','__cmp__')
    Traceback (most recent call last):
        ...
    AssertionError: Enum values beginning with __ are not supported

LIMITATIONS
    Enum values of unicode type are not preserved, mapped to ASCII instead.

    """
    ## must have at least one enum value
    assert names, 'Empty enums are not supported'
    ## enum values must be strings
    assert len([i for i in names if not isinstance(i, types.StringTypes) and not \
        isinstance(i, unicode)]) == 0, 'Enum values must be string or unicode'
    ## enum values must not collide with special function names
    assert len([i for i in names if i.startswith("__")]) == 0,\
        'Enum values beginning with __ are not supported'
    ## each enum value must be unique from all others
    assert names == uniquify(names), 'Enums must not repeat values'

    class EnumClass(object):
        """ See parent function for explanation """

        __slots__ = names

        def __iter__(self):
            return iter(constants)

        def __len__(self):
            return len(constants)

        def __getitem__(self, i):
            ## this makes xx['name'] possible
            if isinstance(i, types.StringTypes):
                i = names.index(i)
            ## handles the more normal xx[0]
            return constants[i]

        def __repr__(self):
            return 'enum' + str(names)

        def __str__(self):
            return 'enum ' + str(constants)

        def index(self, i):
            return names.index(i)

    class EnumValue(object):
        """ See parent function for explanation """

        __slots__ = ('__value')

        def __init__(self, value):
            self.__value = value

        value = property(lambda self: self.__value)

        enumtype = property(lambda self: enumtype)

        def __hash__(self):
            return hash(self.__value)

        def __cmp__(self, other):
            assert self.enumtype is other.enumtype, 'Only values from the same enum are comparable'
            return cmp(self.value, other.value)

        def __invert__(self):
            return constants[maximum - self.value]

        def __nonzero__(self):
            ## return bool(self.value)
            ## Original code led to bool(x[0])==False, not correct
            return True

        def __repr__(self):
            return str(names[self.value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    enumtype = EnumClass()
    return enumtype

Here’s an approach with some different characteristics I find valuable:

  • allows > and < comparison based on order in enum, not lexical order
  • can address item by name, property or index: x.a, x[‘a’] or x[0]
  • supports slicing operations like [:] or [-1]

and most importantly prevents comparisons between enums of different types!

Based closely on http://code.activestate.com/recipes/413486-first-class-enums-in-python.

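Many doctests are included here to illustrate what’s different about this approach. Note that the code targets Python 2 (it relies on types.StringTypes, unicode, cmp and __cmp__), and it assumes that the types module has been imported and that a helper named uniquify, which presumably returns the names with duplicates removed and the original order preserved, is defined elsewhere.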

def enum(*names):
    """
SYNOPSIS
    Well-behaved enumerated type, easier than creating custom classes

DESCRIPTION
    Create a custom type that implements an enumeration.  Similar in concept
    to a C enum but with some additional capabilities and protections.  See
    http://code.activestate.com/recipes/413486-first-class-enums-in-python/.

PARAMETERS
    names       Ordered list of names.  The order in which names are given
                will be the sort order in the enum type.  Duplicate names
                are not allowed.  Unicode names are mapped to ASCII.

RETURNS
    Object of type enum, with the input names and the enumerated values.

EXAMPLES
    >>> letters = enum('a','e','i','o','u','b','c','y','z')
    >>> letters.a < letters.e
    True

    ## index by property
    >>> letters.a
    a

    ## index by position
    >>> letters[0]
    a

    ## index by name, helpful for bridging string inputs to enum
    >>> letters['a']
    a

    ## sorting by order in the enum() create, not character value
    >>> letters.u < letters.b
    True

    ## normal slicing operations available
    >>> letters[-1]
    z

    ## error since there are not 100 items in enum
    >>> letters[99]
    Traceback (most recent call last):
        ...
    IndexError: tuple index out of range

    ## error since name does not exist in enum
    >>> letters['ggg']
    Traceback (most recent call last):
        ...
    ValueError: tuple.index(x): x not in tuple

    ## enums must be named using valid Python identifiers
    >>> numbers = enum(1,2,3,4)
    Traceback (most recent call last):
        ...
    AssertionError: Enum values must be string or unicode

    >>> a = enum('-a','-b')
    Traceback (most recent call last):
        ...
    TypeError: Error when calling the metaclass bases
        __slots__ must be identifiers

    ## create another enum
    >>> tags = enum('a','b','c')
    >>> tags.a
    a
    >>> letters.a
    a

    ## can't compare values from different enums
    >>> letters.a == tags.a
    Traceback (most recent call last):
        ...
    AssertionError: Only values from the same enum are comparable

    >>> letters.a < tags.a
    Traceback (most recent call last):
        ...
    AssertionError: Only values from the same enum are comparable

    ## can't update enum after create
    >>> letters.a = 'x'
    Traceback (most recent call last):
        ...
    AttributeError: 'EnumClass' object attribute 'a' is read-only

    ## can't update enum after create
    >>> del letters.u
    Traceback (most recent call last):
        ...
    AttributeError: 'EnumClass' object attribute 'u' is read-only

    ## can't have non-unique enum values
    >>> x = enum('a','b','c','a')
    Traceback (most recent call last):
        ...
    AssertionError: Enums must not repeat values

    ## can't have zero enum values
    >>> x = enum()
    Traceback (most recent call last):
        ...
    AssertionError: Empty enums are not supported

    ## can't have enum values that look like special function names
    ## since these could collide and lead to non-obvious errors
    >>> x = enum('a','b','c','__cmp__')
    Traceback (most recent call last):
        ...
    AssertionError: Enum values beginning with __ are not supported

LIMITATIONS
    Enum values of unicode type are not preserved, mapped to ASCII instead.

    """
    ## must have at least one enum value
    assert names, 'Empty enums are not supported'
    ## enum values must be strings
    assert len([i for i in names if not isinstance(i, types.StringTypes) and not \
        isinstance(i, unicode)]) == 0, 'Enum values must be string or unicode'
    ## enum values must not collide with special function names
    assert len([i for i in names if i.startswith("__")]) == 0,\
        'Enum values beginning with __ are not supported'
    ## each enum value must be unique from all others
    assert names == uniquify(names), 'Enums must not repeat values'

    class EnumClass(object):
        """ See parent function for explanation """

        __slots__ = names

        def __iter__(self):
            return iter(constants)

        def __len__(self):
            return len(constants)

        def __getitem__(self, i):
            ## this makes xx['name'] possible
            if isinstance(i, types.StringTypes):
                i = names.index(i)
            ## handles the more normal xx[0]
            return constants[i]

        def __repr__(self):
            return 'enum' + str(names)

        def __str__(self):
            return 'enum ' + str(constants)

        def index(self, i):
            return names.index(i)

    class EnumValue(object):
        """ See parent function for explanation """

        __slots__ = ('__value')

        def __init__(self, value):
            self.__value = value

        value = property(lambda self: self.__value)

        enumtype = property(lambda self: enumtype)

        def __hash__(self):
            return hash(self.__value)

        def __cmp__(self, other):
            assert self.enumtype is other.enumtype, 'Only values from the same enum are comparable'
            return cmp(self.value, other.value)

        def __invert__(self):
            return constants[maximum - self.value]

        def __nonzero__(self):
            ## return bool(self.value)
            ## Original code led to bool(x[0])==False, not correct
            return True

        def __repr__(self):
            return str(names[self.value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    enumtype = EnumClass()
    return enumtype

回答 26

这是Alec Thomas的解决方案的一个变体:

def enum(*args, **kwargs):
    return type('Enum', (), dict(((y, x) for x, y in enumerate(args)), **kwargs))

x = enum('POOH', 'TIGGER', 'EEYORE', 'ROO', 'PIGLET', 'RABBIT', 'OWL')
assert x.POOH == 0
assert x.TIGGER == 1

Here is a variant on Alec Thomas’s solution:

def enum(*args, **kwargs):
    return type('Enum', (), dict(((y, x) for x, y in enumerate(args)), **kwargs))

x = enum('POOH', 'TIGGER', 'EEYORE', 'ROO', 'PIGLET', 'RABBIT', 'OWL')
assert x.POOH == 0
assert x.TIGGER == 1
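
The keyword arguments are not exercised in the example above. The sketch below shows how positional names and explicit keyword values combine; the generator expression is wrapped in its own parentheses because Python 3.7 and later reject an unparenthesized generator that is followed by other arguments. Bears and HEFFALUMP are hypothetical names chosen only for this illustration.

```python
def enum(*args, **kwargs):
    # Positional names are numbered from 0; keyword arguments add
    # (or override) members with explicit values.
    members = dict(((name, index) for index, name in enumerate(args)), **kwargs)
    return type('Enum', (), members)


Bears = enum('POOH', 'TIGGER', HEFFALUMP=99)
print(Bears.POOH, Bears.TIGGER, Bears.HEFFALUMP)
```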

回答 27

此解决方案是获取枚举类的简单方法,该类定义为列表(不再烦人的整数分配):

枚举.py:

import new

def create(class_name, names):
    return new.classobj(
        class_name, (object,), dict((y, x) for x, y in enumerate(names))
    )

example.py:

import enumeration

Colors = enumeration.create('Colors', (
    'red',
    'orange',
    'yellow',
    'green',
    'blue',
    'violet',
))

This solution is a simple way of getting a class for the enumeration defined as a list (no more annoying integer assignments):

enumeration.py:

import new

def create(class_name, names):
    return new.classobj(
        class_name, (object,), dict((y, x) for x, y in enumerate(names))
    )

example.py:

import enumeration

Colors = enumeration.create('Colors', (
    'red',
    'orange',
    'yellow',
    'green',
    'blue',
    'violet',
))
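
The new module exists only in Python 2; it was removed in Python 3. A sketch of the same idea for Python 3, assuming the built-in type() is an acceptable substitute for new.classobj, could look like this (the shortened color list is just for brevity):

```python
def create(class_name, names):
    # type(name, bases, namespace) builds the class dynamically;
    # each name is mapped to its position in the sequence.
    namespace = {name: index for index, name in enumerate(names)}
    return type(class_name, (object,), namespace)


Colors = create('Colors', ('red', 'orange', 'yellow'))
print(Colors.red, Colors.orange, Colors.yellow)
```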

回答 28

虽然最初的枚举建议PEP 354在几年前被拒绝,但它仍在继续提出。本来打算将某种枚举添加到3.2,但是将其推回到3.3,然后忘记了。现在有一个PEP 435,打算包含在Python 3.4中。PEP 435的参考实现是flufl.enum

截至2013年4月,似乎已经达成了普遍共识,即应该在3.4的标准库中添加一些内容,只要人们可以就该“内容”达成共识。那是困难的部分。请参阅此处此处开始的主题以及2013年前几个月的其他六个主题。

同时,每次出现这种情况时,都会在PyPI,ActiveState等上出现大量新设计和实现,因此,如果您不喜欢FLUFL设计,请尝试进行PyPI搜索

While the original enum proposal, PEP 354, was rejected years ago, it keeps coming back up. Some kind of enum was intended to be added to 3.2, but it got pushed back to 3.3 and then forgotten. And now there’s a PEP 435 intended for inclusion in Python 3.4. The reference implementation of PEP 435 is flufl.enum.

As of April 2013, there seems to be a general consensus that something should be added to the standard library in 3.4—as long as people can agree on what that “something” should be. That’s the hard part. See the threads starting here and here, and a half dozen other threads in the early months of 2013.

Meanwhile, every time this comes up, a slew of new designs and implementations appear on PyPI, ActiveState, etc., so if you don’t like the FLUFL design, try a PyPI search.


回答 29

使用以下内容。

TYPE = {'EAN13':   u'EAN-13',
        'CODE39':  u'Code 39',
        'CODE128': u'Code 128',
        'i25':     u'Interleaved 2 of 5',}

>>> TYPE.items()
[('EAN13', u'EAN-13'), ('i25', u'Interleaved 2 of 5'), ('CODE39', u'Code 39'), ('CODE128', u'Code 128')]
>>> TYPE.keys()
['EAN13', 'i25', 'CODE39', 'CODE128']
>>> TYPE.values()
[u'EAN-13', u'Interleaved 2 of 5', u'Code 39', u'Code 128']

我将其用于Django模型选择,它看起来非常Python。它实际上不是一个枚举,但是可以完成工作。

Use the following.

TYPE = {'EAN13':   u'EAN-13',
        'CODE39':  u'Code 39',
        'CODE128': u'Code 128',
        'i25':     u'Interleaved 2 of 5',}

>>> TYPE.items()
[('EAN13', u'EAN-13'), ('i25', u'Interleaved 2 of 5'), ('CODE39', u'Code 39'), ('CODE128', u'Code 128')]
>>> TYPE.keys()
['EAN13', 'i25', 'CODE39', 'CODE128']
>>> TYPE.values()
[u'EAN-13', u'Interleaved 2 of 5', u'Code 39', u'Code 128']

I used that for Django model choices, and it looks very pythonic. It is not really an Enum, but it does the job.


什么是setup.py?

问题:什么是setup.py?

谁能解释一下setup.py它是什么以及如何进行配置或使用?

Can anyone please explain what setup.py is and how it can be configured or used?


回答 0

setup.py 是一个python文件,通常会告诉您要安装的模块/软件包已与Distutils打包并分发,Distutils是分发Python模块的标准。

这使您可以轻松安装Python软件包。通常写就足够了:

$ pip install . 

pip将使用setup.py安装模块。避免setup.py直接调用。

https://docs.python.org/3/installing/index.html#installing-index

setup.py is a python file, which usually tells you that the module/package you are about to install has been packaged and distributed with Distutils, which is the standard for distributing Python Modules.

This allows you to easily install Python packages. Often it’s enough to write:

$ pip install . 

pip will use setup.py to install your module. Avoid calling setup.py directly.

https://docs.python.org/3/installing/index.html#installing-index


回答 1

它有助于foo在您的计算机上安装python软件包(也可以位于中virtualenv),以便您可以foo从其他项目以及[I] Python提示符中导入该软件包。

它完成pipeasy_install等的类似工作,


使用 setup.py

让我们从一些定义开始:

-包含__init__.py文件的文件夹/目录。
模块 -具有.py扩展名的有效python文件。
分发 -一个软件包与其他软件包模块的关系

假设您要安装名为的软件包foo。那你做

$ git clone https://github.com/user/foo  
$ cd foo
$ python setup.py install

相反,如果您不想实际安装它,但仍然想使用它。然后做,

$ python setup.py develop  

此命令将在站点包内创建到源目录的符号链接,而不是复制内容。因此,它非常快(特别是对于大包装)。


创造 setup.py

如果您有类似的打包树,

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
└── setup.py

然后,在setup.py脚本中执行以下操作,以便可以将其安装在某些计算机上:

from setuptools import setup

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   author='Man Foo',
   author_email='foomail@foo.com',
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
)

相反,如果您的程序包树更复杂,如以下所示:

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
├── scripts
│   ├── cool
│   └── skype
└── setup.py

然后,setup.py在这种情况下,您将像:

from setuptools import setup

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   author='Man Foo',
   author_email='foomail@foo.com',
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
   scripts=[
            'scripts/cool',
            'scripts/skype',
           ]
)

向(setup.py)添加更多内容,并使其得体:

from setuptools import setup

with open("README", 'r') as f:
    long_description = f.read()

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   license="MIT",
   long_description=long_description,
   author='Man Foo',
   author_email='foomail@foo.com',
   url="http://www.foopackage.com/",
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
   scripts=[
            'scripts/cool',
            'scripts/skype',
           ]
)

long_description被使用pypi.org作为你的包的README描述。


最后,您现在可以将软件包上传到PyPi.org,以便其他人可以使用来安装您的软件包pip install yourpackage

第一步是使用以下方法在pypi中声明您的软件包名称和空间:

$ python setup.py register

注册您的包裹名称后,任何人都无法声明或使用它。成功注册后,您必须通过以下方式将软件包上传到云(到云):

$ python setup.py upload

您也可以选择GPG通过以下方式对包裹进行签名:

$ python setup.py --sign upload

奖励setup.py在此处查看来自真实项目的示例:torchvision-setup.py

It helps to install a Python package foo on your machine (possibly inside a virtualenv) so that you can import foo from other projects and also from Python or IPython prompts.

It does a job similar to that of pip, easy_install, etc.


Using setup.py

Let’s start with some definitions:

Package – A folder/directory that contains __init__.py file.
Module – A valid python file with .py extension.
Distribution – How one package relates to other packages and modules.

Let’s say you want to install a package named foo. Then you do,

$ git clone https://github.com/user/foo  
$ cd foo
$ python setup.py install

If instead you don’t want to actually install it but would still like to use it, run:

$ python setup.py develop  

This command will create symlinks to the source directory within site-packages instead of copying things. Because of this, it is quite fast (particularly for large packages).


Creating setup.py

If you have your package tree like,

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
└── setup.py

Then, you do the following in your setup.py script so that it can be installed on some machine:

from setuptools import setup

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   author='Man Foo',
   author_email='foomail@foo.com',
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
)

Instead, if your package tree is more complex like the one below:

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
├── scripts
│   ├── cool
│   └── skype
└── setup.py

Then, your setup.py in this case would be like:

from setuptools import setup

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   author='Man Foo',
   author_email='foomail@foo.com',
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
   scripts=[
            'scripts/cool',
            'scripts/skype',
           ]
)

Add more stuff to (setup.py) & make it decent:

from setuptools import setup

with open("README", 'r') as f:
    long_description = f.read()

setup(
   name='foo',
   version='1.0',
   description='A useful module',
   license="MIT",
   long_description=long_description,
   author='Man Foo',
   author_email='foomail@foo.com',
   url="http://www.foopackage.com/",
   packages=['foo'],  #same as name
   install_requires=['bar', 'greek'], #external packages as dependencies
   scripts=[
            'scripts/cool',
            'scripts/skype',
           ]
)

The long_description is used in pypi.org as the README description of your package.
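
Reading the README with the platform’s default encoding can fail on files that contain non-ASCII characters. One way to guard against that, sketched here with a hypothetical helper named read_long_description (it is not part of setuptools), is to read the file as UTF-8 explicitly:

```python
from pathlib import Path


def read_long_description(path="README"):
    # Read the file as UTF-8 explicitly so the result does not depend
    # on the platform's default encoding.
    return Path(path).read_text(encoding="utf-8")
```

The result would then be passed to setup() as long_description=read_long_description(), assuming the README sits next to setup.py.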


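And finally, you’re now ready to upload your package to PyPI (pypi.org) so that others can install it using pip install yourpackage.

The first step is to claim your package name and space on PyPI using:

$ python setup.py register

Once your package name is registered, nobody else can claim or use it. After successful registration, you have to build a distribution and upload it there (to the cloud) in a single command, because upload only sends files created in the same invocation:

$ python setup.py sdist upload

Optionally, you can also sign your package with GPG; --sign is an option of the upload command, so it goes after it:

$ python setup.py sdist upload --sign

Note that the register and upload commands have since been deprecated; the currently recommended tool for uploading to PyPI is twine.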

Bonus: See a sample setup.py from a real project here: torchvision-setup.py


回答 2

setup.py是Python对多平台安装程序和make文件的解答。

如果您熟悉命令行安装,请make && make install转换为python setup.py build && python setup.py install

一些软件包是纯Python,并且仅按字节编译。其他可能包含本机代码,这将需要本机编译器(如gcccl)和Python接口模块(如swigpyrex)。

setup.py is Python’s answer to a multi-platform installer and make file.

If you’re familiar with command line installations, then make && make install translates to python setup.py build && python setup.py install.

Some packages are pure Python, and are only byte compiled. Others may contain native code, which will require a native compiler (like gcc or cl) and a Python interfacing module (like swig or pyrex).


回答 3

如果您下载的软件包在根文件夹中具有“ setup.py”,则可以通过运行以下命令进行安装

python setup.py install

如果您正在开发项目,并且想知道此文件的用途,请查看有关编写安装脚本的Python文档。

If you downloaded a package that has “setup.py” in its root folder, you can install it by running

python setup.py install

If you are developing a project and are wondering what this file is useful for, check the Python documentation on writing the Setup Script.


回答 4

setup.py是通常用该语言编写的库或程序随附的Python脚本。目的是正确安装软件。

许多软件包将distutils框架与结合使用setup.py

http://docs.python.org/distutils/

setup.py is a Python script that is usually shipped with libraries or programs written in that language. Its purpose is the correct installation of the software.

Many packages use the distutils framework in conjunction with setup.py.

http://docs.python.org/distutils/


回答 5

setup.py可以在两种情况下使用:首先,您要安装Python软件包。其次,您要创建自己的Python包。通常,标准的Python软件包具有几个重要文件,例如setup.py,setup.cfg和Manifest.in。当您创建Python软件包时,这三个文件将确定(egg-info文件夹下PKG-INFO中的内容)名称,版本,描述,其他所需的安装(通常在.txt文件中)以及其他几个参数。创建包时setup.py将读取setup.cfg(可以是tar.gz)。在Manifest.in中,您可以定义应包含在软件包中的内容。无论如何,您都可以使用setup.py做很多事情,例如

python setup.py build
python setup.py install
python setup.py sdist <distname> upload [-r urltorepo]  (to upload package to pypi or local repo)

还有许多其他命令可以与setup.py一起使用。求助

python setup.py --help-commands

setup.py can be used in two scenarios: first, when you want to install a Python package; second, when you want to create your own Python package. A standard Python package usually has a couple of important files such as setup.py, setup.cfg and MANIFEST.in. When you create the package, these three files determine the name, version, description, other required installations (usually listed in a .txt file) and a few other parameters, which end up in PKG-INFO under the egg-info folder. setup.cfg is read by setup.py while the package is created (for example as a tar.gz). MANIFEST.in is where you define what should be included in your package. In any case, you can do a bunch of things with setup.py, such as:

python setup.py build
python setup.py install
python setup.py sdist <distname> upload [-r urltorepo]  (to upload package to pypi or local repo)

There are a bunch of other commands that can be used with setup.py. For help, run:

python setup.py --help-commands

回答 6

当您通过setup.py打开终端(Mac,Linux)或命令提示符(Windows)下载软件包时。使用“ cd Tab”按钮并为您提供帮助,将路径设置为已下载文件的文件夹的正确位置,该文​​件夹位于setup.py

iMac:~ user $ cd path/pakagefolderwithsetupfile/

按Enter键,您应该会看到类似以下内容:

iMac:pakagefolderwithsetupfile user$

然后输入以下内容python setup.py install

iMac:pakagefolderwithsetupfile user$ python setup.py install

enter。做完了!

When you download a package with a setup.py, open your Terminal (Mac, Linux) or Command Prompt (Windows). Using cd, with the Tab key to help complete the path, change to the folder where you downloaded the package and where setup.py is located:

iMac:~ user $ cd path/pakagefolderwithsetupfile/

Press enter, you should see something like this:

iMac:pakagefolderwithsetupfile user$

Then type python setup.py install after the prompt:

iMac:pakagefolderwithsetupfile user$ python setup.py install

Press enter. Done!
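If you would rather script these steps than type them, the sketch below shows one possible approach. The helper name install_command is hypothetical, and the temporary folder merely stands in for the downloaded package folder. It checks that setup.py is present and returns the command to run from inside that folder.

```python
import sys
import tempfile
from pathlib import Path


def install_command(package_dir):
    """Return the install command for package_dir, or raise if setup.py is missing."""
    setup_script = Path(package_dir) / "setup.py"
    if not setup_script.is_file():
        raise FileNotFoundError(f"no setup.py in {package_dir}")
    # Use the interpreter that is running this script rather than whatever "python" is on PATH.
    return [sys.executable, "setup.py", "install"]


# Stand-in for a downloaded package folder containing a placeholder setup.py.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "setup.py").write_text("# placeholder\n")
    command = install_command(tmp)

print(command)
```

The returned list can then be passed to subprocess.run(command, cwd=package_dir, check=True), which has the same effect as changing into the folder and pressing enter.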


回答 7

要安装已下载的Python软件包,请提取档案并在其中运行setup.py脚本:

python setup.py install

对我来说,这一直很奇怪。将包管理器指向下载位置会更自然,例如在Ruby和Nodejs中。gem install rails-4.1.1.gem

包管理器也更舒适,因为它既熟悉又可靠。另一方面,每个setup.py都是新颖的,因为它是特定于包装的。它要求遵守约定“我相信此setup.py会接受与过去使用的命令相同的命令”。这是对精神意志力的遗憾。

我并不是说setup.py工作流的安全性不如包管理器(我知道Pip只是在内部运行setup.py),但是我肯定觉得这很麻烦。将所有命令都发送到同一个程序包管理器应用程序是一种和谐。您甚至可能会喜欢它。

To install a Python package you’ve downloaded, you extract the archive and run the setup.py script inside:

python setup.py install

To me, this has always felt odd. It would be more natural to point a package manager at the download, as one would do in Ruby and Node.js, e.g. gem install rails-4.1.1.gem

A package manager is more comfortable too, because it’s familiar and reliable. On the other hand, each setup.py is novel, because it’s specific to the package. It demands faith in convention “I trust this setup.py takes the same commands as others I have used in the past”. That’s a regrettable tax on mental willpower.

I’m not saying the setup.py workflow is less secure than a package manager (I understand Pip just runs the setup.py inside), but I certainly feel it’s awkward and jarring. There’s a harmony to all commands going to the same package manager application. You might even grow fond of it.


回答 8

setup.py是与其他文件一样的Python文件。它可以采用任何名称,除非按惯例命名,否则setup.py每个脚本都没有不同的过程。

最常setup.py用于安装Python模块,但用于服务器其他目的:

模块:

也许这是setup.py模块中最著名的用法。尽管可以使用来安装它们pip,但pip默认情况下不包括旧的Python版本,因此需要单独安装。

如果您想安装模块但不想安装pip,则唯一的选择是从setup.py文件安装模块。这可以通过完成python setup.py install。这将Python模块安装到根字典(不pipeasy_installECT)。

pip失败时通常使用此方法。例如,如果所需软件包的正确Python版本pip由于可能由于不再维护而无法提供,则下载源并运行python setup.py install将执行相同的操作,除非需要编译的二进制文件(但将忽略编译的二进制文件)。 Python版本-除非返回错误)。

的另一种用法setup.py是从源代码安装软件包。如果模块仍在开发中,则将无法使用wheel文件,并且唯一的安装方法是直接从源代码进行安装。

构建Python扩展:

构建模块后,可以使用distutils安装脚本将其转换为可分发的模块。一旦构建完成,就可以使用上面的命令进行安装。

安装脚本易于构建,一旦文件已正确配置并且可以通过运行进行编译python setup.py build(请参阅所有命令的链接)。

再次setup.py按易用性和惯例命名,但可以使用任何名称。

Cython:

setup.py文件的另一种著名用法包括编译后的扩展名。这些需要具有用户定义值的安装脚本。它们允许快速执行(但一旦编译则依赖平台)。这是文档中的一个简单示例:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name = 'Hello world app',
    ext_modules = cythonize("hello.pyx"),
)

这可以通过编译 python setup.py build

Cx_Freeze:

需要安装脚本的另一个模块是cx_Freeze。这会将Python脚本转换为可执行文件。这允许包括描述,名称,图标,包在内的许多命令包括,排除等,并且一旦运行将产生可分发的应用程序。文档中的示例:

import sys
from cx_Freeze import setup, Executable
build_exe_options = {"packages": ["os"], "excludes": ["tkinter"]} 

base = None
if sys.platform == "win32":
    base = "Win32GUI"

setup(  name = "guifoo",
        version = "0.1",
        description = "My GUI application!",
        options = {"build_exe": build_exe_options},
        executables = [Executable("guifoo.py", base=base)])

可以通过编译python setup.py build

那么什么是setup.py文件?

很简单,它是一个在Python环境中构建或配置某些东西的脚本。

分发时,程序包应仅包含一个安装脚本,但将多个脚本组合成一个安装脚本并不少见。请注意,这经常涉及distutils但并非总是如此(如我在上一个示例中所示)。要记住的事情是以某种方式配置Python包/脚本。

它使用名称,因此在构建或安装时始终可以使用相同的命令。

setup.py is a Python file like any other. It could take any name, but by convention it is named setup.py so that there is not a different procedure for each script.

Most frequently setup.py is used to install a Python module, but it serves other purposes too:

Modules:

Perhaps the most famous use of setup.py is in modules. Although modules can be installed using pip, old Python versions did not include pip by default, so it had to be installed separately.

If you wanted to install a module but did not want to install pip, just about the only alternative was to install the module from its setup.py file. This could be achieved via python setup.py install. This would install the Python module to the root directory (without pip, easy_install, etc.).

This method is often used when pip fails. For example, if the correct Python version of the desired package is not available via pip, perhaps because it is no longer maintained, downloading the source and running python setup.py install would achieve the same thing, except where compiled binaries are required (it will, however, disregard the Python version unless an error is returned).

Another use of setup.py is to install a package from source. If a module is still under development the wheel files will not be available and the only way to install is to install from the source directly.

Building Python extensions:

Once a module has been written, it can be converted into a module ready for distribution using a distutils setup script. Once built, it can be installed using the command above.

A setup script is easy to write, and once the file has been properly configured it can be compiled by running python setup.py build (see link for all commands).

Once again it is named setup.py for ease of use and by convention, but can take any name.

Cython:

Another famous use of setup.py files involves compiled extensions. These require a setup script with user-defined values. They allow fast execution (but once compiled are platform dependent). Here is a simple example from the documentation:

from setuptools import setup  # distutils was removed in Python 3.12; setuptools also provides setup()
from Cython.Build import cythonize

setup(
    name = 'Hello world app',
    ext_modules = cythonize("hello.pyx"),
)

This can be compiled via python setup.py build

Cx_Freeze:

Another module requiring a setup script is cx_Freeze. It converts Python scripts to executables. The setup script accepts many options, such as descriptions, names, icons, packages to include or exclude, etc., and once run it will produce a distributable application. An example from the documentation:

import sys
from cx_Freeze import setup, Executable
build_exe_options = {"packages": ["os"], "excludes": ["tkinter"]} 

base = None
if sys.platform == "win32":
    base = "Win32GUI"

setup(  name = "guifoo",
        version = "0.1",
        description = "My GUI application!",
        options = {"build_exe": build_exe_options},
        executables = [Executable("guifoo.py", base=base)])

This can be compiled via python setup.py build.

So what is a setup.py file?

Quite simply it is a script that builds or configures something in the Python environment.

A package, when distributed, should contain only one setup script, but it is not uncommon to combine several together into a single setup script. Notice that this often involves distutils, but not always (as my last example showed). The thing to remember is that it just configures a Python package/script in some way.

It takes this name so that the same command can always be used when building or installing.


回答 9

为简单起见,setup.py的运行就像"__main__"您在调用安装函数时提到的其他答案一样。在setup.py内部,应该放置安装软件包所需的一切。

常用的setup.py功能

以下两节讨论了许多setup.py模块具有的两件事。

setuptools.setup

此功能允许您指定项目属性,例如项目的名称,版本。…最重要的是,如果其他功能打包正确,此功能将允许您安装其他功能。请参阅此网页以获取setuptools.setup的示例。setuptools.setup的

这些属性允许安装以下类型的软件包:

自定义功能

在理想的世界中,setuptools.setup将为您处理所有事情。不幸的是,情况并非总是如此。有时,您需要做一些特定的事情,例如使用subprocess命令安装依赖项,以使要安装的系统处于正确的软件包状态。尝试避免这种情况,这些功能会造成混乱,并且在OS甚至发行版之间通常会有所不同。

To make it simple, setup.py is run as "__main__" when you call the install functions the other answers mentioned. Inside setup.py, you should put everything needed to install your package.
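To see what "run as __main__" means in practice, here is a small sketch using only the standard library. It writes a stand-in setup.py (a one-line placeholder, not a real setup script) to a temporary folder and executes it the way the interpreter would when given the file on the command line.

```python
import runpy
import tempfile
from pathlib import Path

# A stand-in for setup.py: it only records the module name it is executed under.
SCRIPT = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "setup.py"
    script_path.write_text(SCRIPT)
    # run_name="__main__" mimics running the file directly, e.g. `python setup.py install`.
    namespace = runpy.run_path(str(script_path), run_name="__main__")

print(namespace["seen_name"])  # prints: __main__
```

Because the file is executed as the main script, everything at its top level runs, including the call to the setup function.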

Common setup.py functions

The following two sections discuss two things many setup.py modules have.

setuptools.setup

This function allows you to specify project attributes such as the name of the project, the version, and so on. Most importantly, it allows you to install other packages your project depends on, provided they are packaged properly. See this webpage for an example of setuptools.setup.

These attributes of setuptools.setup enable installing these types of packages:

  • Packages that are imported into your project and listed on PyPI, using setuptools.find_packages:

    packages=find_packages(exclude=["docs","tests", ".gitignore", "README.rst","DESCRIPTION.rst"])

  • Packages that are not on PyPI but can be downloaded from a URL, using dependency_links:

    dependency_links=["http://peak.telecommunity.com/snapshots/",]

Custom functions

In an ideal world, setuptools.setup would handle everything for you. Unfortunately this isn’t always the case. Sometimes you have to do specific things, like installing dependencies with the subprocess command, to get the system you’re installing on into the right state for your package. Try to avoid this: these functions get confusing and often differ between operating systems and even between distributions.
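For illustration only, a custom step of this kind might look like the sketch below. The helper name run_pre_install_step is hypothetical, and the command it runs here is a harmless placeholder (asking the current interpreter for its platform name) rather than a real dependency installation.

```python
import subprocess
import sys


def run_pre_install_step(command):
    """Run one external command; raise CalledProcessError if it exits with an error."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout.strip()


# Placeholder step: a real setup.py might call a system package manager here instead,
# which is exactly the part that tends to differ between operating systems.
platform_name = run_pre_install_step(
    [sys.executable, "-c", "import sys; print(sys.platform)"]
)
print(platform_name)
```

Passing check=True makes a failing step stop the installation early instead of leaving the system in a half-configured state.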