How do I remove leading and trailing whitespace from a string in Python?
For example:
" Hello " --> "Hello"
" Hello" --> "Hello"
"Hello " --> "Hello"
"Bob has a cat" --> "Bob has a cat"
Answer 0
Just one space, or all consecutive spaces? If the second, then strings already have a .strip() method:
>>> ' Hello '.strip()
'Hello'
>>> ' Hello'.strip()
'Hello'
>>> 'Bob has a cat'.strip()
'Bob has a cat'
>>> '   Hello   '.strip()  # ALL consecutive spaces at both ends removed
'Hello'
If you need only to remove one space however, you could do it with:
def strip_one_space(s):
    if s.endswith(" "): s = s[:-1]
    if s.startswith(" "): s = s[1:]
    return s

>>> strip_one_space("  Hello ")
' Hello'
Also, note that str.strip() removes other whitespace characters as well (e.g. tabs and newlines). To remove only spaces, pass the character to remove as an argument to strip.
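For example, a minimal sketch showing the difference between the default and passing `' '` explicitly:

```python
s = '\t Hello \n'

# strip() removes ALL leading/trailing whitespace, including tabs and newlines
print(repr(s.strip()))     # 'Hello'

# strip(' ') removes only spaces from the ends; the tab and newline block it here
print(repr(s.strip(' ')))  # '\t Hello \n'
```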
I wanted to remove the extra spaces in a string (not only at the beginning or end, but also in between). I made this, because I don’t know how to do it otherwise:
string = "Name : David  Account: 1234  Another thing: something  "
ready = False
while ready == False:
    pos = string.find("  ")
    if pos != -1:
        string = string.replace("  ", " ")
    else:
        ready = True
print(string)
This replaces double spaces with a single space until there are no double spaces any more.
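The same collapsing can be sketched more concisely with the re module instead of a loop (collapse_spaces is a hypothetical helper name, not from the original answer):

```python
import re

def collapse_spaces(text):
    # Replace any run of two or more spaces with a single space
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("Name : David  Account: 1234"))  # Name : David Account: 1234
```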
Answer 6
I could not find a solution to what I was looking for so I created some custom functions. You can try them out.
def cleansed(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    # return trimmed(s.replace('"', '').replace("'", ""))
    return trimmed(s)

def trimmed(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    ss = trim_start_and_end(s).replace('  ', ' ')
    while '  ' in ss:
        ss = ss.replace('  ', ' ')
    return ss

def trim_start_and_end(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    return trim_start(trim_end(s))

def trim_start(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    chars = []
    for c in s:
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(chars).lower()

def trim_end(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    chars = []
    for c in reversed(s):
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(reversed(chars)).lower()
s1 = ' b Beer '
s2 = 'Beer b '
s3 = ' Beer b '
s4 = ' bread butter Beer b '
cdd = trim_start(s1)
cddd = trim_end(s2)
clean1 = cleansed(s3)
clean2 = cleansed(s4)
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s1, len(s1), cdd, len(cdd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s2, len(s2), cddd, len(cddd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s3, len(s3), clean1, len(clean1)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s4, len(s4), clean2, len(clean2)))
Answer 7
If you want to trim specified number of spaces from left and right, you could do this:
def remove_outer_spaces(text, num_of_leading, num_of_trailing):
    text = list(text)
    for i in range(num_of_leading):
        if text[i] == " ":
            text[i] = ""
        else:
            break
    for i in range(1, num_of_trailing + 1):
        if text[-i] == " ":
            text[-i] = ""
        else:
            break
    return ''.join(text)

txt1 = "   MY name is    "
print(remove_outer_spaces(txt1, 1, 1))  # result is: "  MY name is   "
print(remove_outer_spaces(txt1, 2, 3))  # result is: " MY name is "
print(remove_outer_spaces(txt1, 6, 8))  # result is: "MY name is"
How do I remove leading and trailing whitespace from a string in Python?
The solution below removes leading and trailing whitespace as well as intermediate whitespace. Use it if you need clean string values without multiple spaces.
>>> str_1 = ' Hello World'
>>> print(' '.join(str_1.split()))
Hello World
>>> str_2 = '   Hello World'
>>> print(' '.join(str_2.split()))
Hello World
>>> str_3 = 'Hello World '
>>> print(' '.join(str_3.split()))
Hello World
>>> str_4 = 'Hello World   '
>>> print(' '.join(str_4.split()))
Hello World
>>> str_5 = ' Hello World '
>>> print(' '.join(str_5.split()))
Hello World
>>> str_6 = '   Hello   World   '
>>> print(' '.join(str_6.split()))
Hello World
>>> str_7 = 'Hello World'
>>> print(' '.join(str_7.split()))
Hello World
As you can see, this removes all the multiple whitespace in the string (the output is Hello World in every case); location doesn’t matter. But if you only need to strip leading and trailing whitespace, then strip() would be fine.
I am writing a quick-and-dirty script to generate plots on the fly. I am using the code below (from Matplotlib documentation) as a starting point:
from pylab import figure, axes, pie, title, show
# Make a square figure and axes
figure(1, figsize=(6, 6))
ax = axes([0.1, 0.1, 0.8, 0.8])
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
fracs = [15, 30, 45, 10]
explode = (0, 0.05, 0, 0)
pie(fracs, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True)
title('Raining Hogs and Dogs', bbox={'facecolor': '0.8', 'pad': 5})
show() # Actually, don't show, just save to foo.png
I don’t want to display the plot on a GUI, instead, I want to save the plot to a file (say foo.png), so that, for example, it can be used in batch scripts. How do I do that?
While the question has been answered, I’d like to add some useful tips when using matplotlib.pyplot.savefig. The file format can be specified by the extension:
from matplotlib import pyplot as plt
plt.savefig('foo.png')
plt.savefig('foo.pdf')
Will give a rasterized or vectorized output respectively, both of which could be useful. In addition, you’ll find that pylab leaves a generous, often undesirable, whitespace around the image. Remove it with:
plt.savefig('foo.png', bbox_inches='tight')
As others have said, plt.savefig() or fig1.savefig() is indeed the way to save an image.
However, I’ve found that in certain cases the figure is always shown (e.g. in Spyder with plt.ion(): interactive mode = On). I work around this by forcing the closing of the figure window in my giant loop with plt.close(figure_object) (see documentation), so I don’t have a million open figures during the loop:
import matplotlib.pyplot as plt
fig, ax = plt.subplots( nrows=1, ncols=1 ) # create figure & 1 axis
ax.plot([0,1,2], [10,20,3])
fig.savefig('path/to/save/image/to.png') # save the figure to file
plt.close(fig) # close the figure window
You should be able to re-open the figure later if needed to with fig.show() (didn’t test myself).
They say that the easiest way to prevent the figure from popping up is to use a non-interactive backend (e.g. Agg), via matplotlib.use(<backend>), e.g.:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.savefig('myfig')
I still personally prefer using plt.close( fig ), since then you have the option to hide certain figures (during a loop), but still display figures for post-loop data processing. It is probably slower than choosing a non-interactive backend though – would be interesting if someone tested that.
UPDATE: for Spyder, you usually can’t set the backend in this way (Because Spyder usually loads matplotlib early, preventing you from using matplotlib.use()).
Instead, use plt.switch_backend('Agg'), or Turn off “enable support” in the Spyder prefs and run the matplotlib.use('Agg') command yourself.
The other answers are correct. However, I sometimes find that I want to open the figure object later. For example, I might want to change the label sizes, add a grid, or do other processing. In a perfect world, I would simply rerun the code generating the plot, and adapt the settings. Alas, the world is not perfect. Therefore, in addition to saving to PDF or PNG, I add:
import pickle

with open('some_file.pkl', "wb") as fp:
    pickle.dump(fig, fp, protocol=4)
Like this, I can later load the figure object and manipulate the settings as I please.
I also write out the stack with the source-code and locals() dictionary for each function/method in the stack, so that I can later tell exactly what generated the figure.
NB: Be careful, as sometimes this method generates huge files.
Answer 6
import datetime
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
# Create the PdfPages object to which we will save the pages:
# The with statement makes sure that the PdfPages object is closed properly at
# the end of the block, even if an Exception occurs.
with PdfPages('multipage_pdf.pdf') as pdf:
    plt.figure(figsize=(3, 3))
    plt.plot(range(7), [3, 1, 4, 1, 5, 9, 2], 'r-o')
    plt.title('Page One')
    pdf.savefig()  # saves the current figure into a pdf page
    plt.close()

    plt.rc('text', usetex=True)
    plt.figure(figsize=(8, 6))
    x = np.arange(0, 5, 0.1)
    plt.plot(x, np.sin(x), 'b-')
    plt.title('Page Two')
    pdf.savefig()
    plt.close()

    plt.rc('text', usetex=False)
    fig = plt.figure(figsize=(4, 5))
    plt.plot(x, x*x, 'ko')
    plt.title('Page Three')
    pdf.savefig(fig)  # or you can pass a Figure object to pdf.savefig
    plt.close()

    # We can also set the file's metadata via the PdfPages object:
    d = pdf.infodict()
    d['Title'] = 'Multipage PDF Example'
    d['Author'] = u'Jouni K. Sepp\xe4nen'
    d['Subject'] = 'How to create a multipage pdf file and set its metadata'
    d['Keywords'] = 'PdfPages multipage keywords author title subject'
    d['CreationDate'] = datetime.datetime(2009, 11, 13)
    d['ModDate'] = datetime.datetime.today()
Answer 7
After using the plot() and other functions to create the content you want, you could use a clause like this to select between plotting to the screen or to file:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4, 5)) # size in inches
# use plot(), etc. to create your plot.
# Pick one of the following lines to uncomment
# save_file = None
# save_file = os.path.join(your_directory, your_file_name)
if save_file:
    plt.savefig(save_file)
    plt.close(fig)
else:
    plt.show()
import matplotlib.pyplot as plt
plt.savefig("image.png")
In Jupyter Notebook you have to remove plt.show() and add plt.savefig(), together with the rest of the plt-code in one cell.
The image will still show up in your notebook.
Given that today (this was not available when this question was asked) lots of people use Jupyter Notebook as a Python console, an extremely easy way to save plots as .png is to call matplotlib‘s pylab class from Jupyter Notebook, plot the figure ‘inline’ in jupyter cells, and then drag that figure/image to a local directory. Don’t forget
%matplotlib inline in the first line!
In addition to the answers above, I added __file__ to the name so the picture and the Python file get the same name. I also added a few arguments to make it look better:
# Saves a PNG file of the current graph to the folder and updates it every time
# (nameOfimage, dpi=(sizeOfimage), Keeps_Labels_From_Disappearing)
plt.savefig(__file__ + ".png", dpi=250, bbox_inches='tight')
# Hard coded name: './test.png'
Answer 16
When using matplotlib.pyplot, you must first save your plot and then close its figure window, using the following two lines:
fig.savefig('plot.png')  # save the plot; put the path you want to save the figure to in quotation marks
plt.close(fig)  # close the figure window
import matplotlib.pyplot as plt
plt.savefig("myfig.png")
For saving whatever IPython image you are displaying. Or, on a different note (looking from a different angle), if you ever get to work with OpenCV, or if you have OpenCV imported, you can go for:
import cv2
cv2.imwrite("myfig.png", image)
But this is just in case you need to work with OpenCV. Otherwise plt.savefig() should be sufficient.
I’ve got a Python program where two variables are set to the value 'public'. In a conditional expression I have the comparison var1 is var2 which fails, but if I change it to var1 == var2 it returns True.
Now if I open my Python interpreter and do the same “is” comparison, it succeeds.
Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.
The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fill up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented as a pointer equality test, normally just a single machine instruction with no memory reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn’t always happen, and the rules for when this happens are quite convoluted, so please don’t rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string — that is, it has a different identity, because it is stored in a different place in memory.
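A quick sketch of this: two equal string literals may share identity, while a string constructed at run time usually does not. The identity results are implementation details, so the sketch only prints them rather than relying on them:

```python
a = 'hello'
b = 'hello'                 # literal with the same value: typically interned
c = ''.join(['hel', 'lo'])  # built at run time: typically a distinct object

print(a == b, a == c)  # True True -- equality always holds
print(a is b)          # often True (interning), but not guaranteed
print(a is c)          # often False, but not guaranteed
```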
The is keyword is a test for object identity while == is a value comparison.
If you use is, the result will be true if and only if the object is the same object. However, == will be true any time the values of the object are the same.
One last thing to note, you may use the sys.intern function to ensure that you’re getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out above, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.
Note that the intern function used to be a builtin on Python 2 but was moved to the sys module in Python 3.
is is identity testing, == is equality testing. What this means is that is is a way to check whether two things are the same object, as opposed to merely equivalent.
Say you’ve got a simple Person object. If it is named ‘Jack’ and is 23 years old, it’s equivalent to another 23-year-old Jack, but it’s not the same person.
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        return self.name == other.name and self.age == other.age

jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)

jack1 == jack2  # True
jack1 is jack2  # False
They’re the same age, but they’re not the same instance of person. A string might be equivalent to another, but it’s not the same object.
From my limited experience with python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.
I think it has to do with the fact that, when the ‘is’ comparison evaluates to false, two distinct objects are being used. When it evaluates to true, the interpreter happened to reuse the same object (for example, through interning or caching) instead of creating a new one.
This is why you should be using the equality operator ==, not is, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, s2 ends up equal to ‘one’, but it is not the same object as s: it was produced at run time by replace(), so the interpreter did not reuse the object it used for the literal ‘one’. Had I initially assigned s2 the literal ‘one’, they might have been the same object.
I believe that this is known as “interned” strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python “is” operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
I am answering the question even though it is old, because no answer above quotes the language reference.
Actually, the is operator checks for identity and the == operator checks for equality.
From Language Reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that strings, which are an immutable type, may fail when checked with “is” and may succeed when checked with “==”.
The same applies to int and tuple, which are also immutable types.
The == operator tests value equivalence. The is operator tests object identity: Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refer to it. The reason is that Python internally caches and reuses some strings as an optimization; there really is just one string ‘banana’ in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a refers to an object and you assign b = a, then both variables refer to the same object:
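For instance, a minimal sketch of aliasing:

```python
a = [1, 2, 3]
b = a            # b is now another name for the same list object
print(b is a)    # True
b.append(4)
print(a)         # [1, 2, 3, 4] -- the change is visible through both names
```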
For more advanced Enum techniques try the aenum library (2.7, 3.3+, same author as enum34. Code is not perfectly compatible between py2 and py3, e.g. you’ll need __order__ in python 2).
To use enum34, do $ pip install enum34
To use aenum, do $ pip install aenum
Installing enum (no numbers) will install a completely different and incompatible version.
from enum import Enum # for enum34, or the stdlib version
# from aenum import Enum # for the aenum version
Animal = Enum('Animal', 'ant bee cat dog')
Animal.ant # returns <Animal.ant: 1>
Animal['ant'] # returns <Animal.ant: 1> (string lookup)
Animal.ant.name # returns 'ant' (inverse lookup)
or equivalently:
class Animal(Enum):
    ant = 1
    bee = 2
    cat = 3
    dog = 4
In earlier versions, one way of accomplishing enums, including support for converting the values back to names, is:
def enum(*sequential, **named):
    enums = dict(zip(sequential, range(len(sequential))), **named)
    reverse = dict((value, key) for key, value in enums.items())  # iteritems() on Python 2
    enums['reverse_mapping'] = reverse
    return type('Enum', (), enums)
This overwrites anything with that name, but it is useful for rendering your enums in output. It will throw KeyError if the reverse mapping doesn’t exist.
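Usage of the reverse mapping might look like this (Numbers is a hypothetical example enum; the sketch repeats the enum function so it is self-contained):

```python
def enum(*sequential, **named):
    # Build a simple enum type with a reverse (value -> name) mapping
    enums = dict(zip(sequential, range(len(sequential))), **named)
    reverse = dict((value, key) for key, value in enums.items())
    enums['reverse_mapping'] = reverse
    return type('Enum', (), enums)

Numbers = enum('ZERO', 'ONE', 'TWO')
print(Numbers.ONE)                 # 1
print(Numbers.reverse_mapping[1])  # ONE
# Numbers.reverse_mapping[42] would raise KeyError
```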
Before PEP 435, Python didn’t have an equivalent but you could implement your own.
Myself, I like keeping it simple (I’ve seen some horribly complex examples on the net), something like this …
class Animal:
    DOG = 1
    CAT = 2

x = Animal.DOG
In Python 3.4 (PEP 435), you can make Enum the base class. This gets you a little bit of extra functionality, described in the PEP. For example, enum members are distinct from integers, and they are composed of a name and a value.
class Animal(Enum):
    DOG = 1
    CAT = 2

print(Animal.DOG)
# <Animal.DOG: 1>
print(Animal.DOG.value)
# 1
print(Animal.DOG.name)
# "DOG"
If you don’t want to type the values, use the following shortcut:
class Animal(Enum):
    DOG, CAT = range(2)
Enum implementations can be converted to lists and are iterable. The order of its members is the declaration order and has nothing to do with their values. For example:
class Animal(Enum):
    DOG = 1
    CAT = 2
    COW = 0
list(Animal)
# [<Animal.DOG: 1>, <Animal.CAT: 2>, <Animal.COW: 0>]
[animal.value for animal in Animal]
# [1, 2, 0]
Animal.CAT in Animal
# True
Answer 2
Here is an implementation:
class Enum(set):
    def __getattr__(self, name):
        if name in self:
            return name
        raise AttributeError
If you need the numeric values, here’s the quickest way:
dog, cat, rabbit = range(3)
In Python 3.x you can also add a starred placeholder at the end, which will soak up all the remaining values of the range in case you don’t mind wasting memory and cannot count:
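A sketch of the starred placeholder (Python 3 only; the names are just illustrative):

```python
# The starred name soaks up all remaining values of the range
dog, cat, rabbit, *other_animals = range(10)
print(dog, cat, rabbit)  # 0 1 2
print(other_animals)     # [3, 4, 5, 6, 7, 8, 9]
```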
The best solution for you would depend on what you require from your fake enum.
Simple enum:
If you need the enum as only a list of names identifying different items, the solution by Mark Harrison (above) is great:
Pen, Pencil, Eraser = range(0, 3)
Using a range also allows you to set any starting value:
Pen, Pencil, Eraser = range(9, 12)
In addition to the above, if you also require that the items belong to a container of some sort, then embed them in a class:
class Stationery:
    Pen, Pencil, Eraser = range(0, 3)
To use the enum item, you would now need to use the container name and the item name:
stype = Stationery.Pen
Complex enum:
For long lists of enum or more complicated uses of enum, these solutions will not suffice. You could look to the recipe by Will Ware for Simulating Enumerations in Python published in the Python Cookbook. An online version of that is available here.
The typesafe enum pattern which was used in Java pre-JDK 5 has a number of advantages. Much like in Alexandru’s answer, you create a class and class-level fields are the enum values; however, the enum values are instances of the class rather than small integers. This has the advantage that your enum values don’t inadvertently compare equal to small integers, you can control how they’re printed, add arbitrary methods if that’s useful and make assertions using isinstance:
class Animal:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return "<Animal: %s>" % self

Animal.DOG = Animal("dog")
Animal.CAT = Animal("cat")
>>> x = Animal.DOG
>>> x
<Animal: dog>
>>> x == 1
False
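The isinstance assertion mentioned above can be sketched like this (repeating a minimal version of the class so the example is self-contained):

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<Animal: %s>" % self.name

Animal.DOG = Animal("dog")

# Enum values are instances of the class, so isinstance checks work:
print(isinstance(Animal.DOG, Animal))  # True
# ...and they never inadvertently compare equal to small integers:
print(Animal.DOG == 1)                 # False
```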
A recent thread on python-dev pointed out there are a couple of enum libraries in the wild, including:
>>> State = Enum(['Unclaimed', 'Claimed'])
>>> State.Claimed
1
>>> State[1]
'Claimed'
>>> State
('Unclaimed', 'Claimed')
>>> range(len(State))
[0, 1]
>>> [(k, State[k]) for k in range(len(State))]
[(0, 'Unclaimed'), (1, 'Claimed')]
>>> [(k, getattr(State, k)) for k in State]
[('Unclaimed', 0), ('Claimed', 1)]
Python doesn’t have a built-in equivalent to enum, and other answers have ideas for implementing your own (you may also be interested in the over the top version in the Python cookbook).
However, in situations where an enum would be called for in C, I usually end up just using simple strings: because of the way objects/attributes are implemented, (C)Python is optimized to work very fast with short strings anyway, so there wouldn’t really be any performance benefit to using integers. To guard against typos / invalid values you can insert checks in selected places.
On 2013-05-10, Guido agreed to accept PEP 435 into the Python 3.4 standard library. This means that Python finally has builtin support for enumerations!
There is a backport available for Python 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4. It’s on Pypi as enum34.
Declaration:
>>> from enum import Enum
>>> class Color(Enum):
... red = 1
... green = 2
... blue = 3
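Member access then works like this (a short sketch of the enum34 / stdlib Enum API):

```python
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

print(Color.red)        # Color.red
print(Color.red.name)   # red
print(Color.red.value)  # 1
print(Color(2))         # Color.green  (lookup by value)
print(Color['blue'])    # Color.blue   (lookup by name)
```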
class Animal:
    class Dog: pass
    class Cat: pass

x = Animal.Dog
It’s more bug-proof than using integers since you don’t have to worry about ensuring that the integers are unique (e.g. if you said Dog = 1 and Cat = 1 you’d be screwed).
It’s more bug-proof than using strings since you don’t have to worry about typos (e.g. x == "catt" fails silently, but x == Animal.Catt is a runtime exception).
Answer 11
def M_add_class_attribs(attribs):
    def foo(name, bases, dict_):
        for v, k in attribs:
            dict_[k] = v
        return type(name, bases, dict_)
    return foo

def enum(*names):
    class Foo(object):
        __metaclass__ = M_add_class_attribs(enumerate(names))
        def __setattr__(self, name, value):  # this makes it read-only
            raise NotImplementedError
    return Foo()
Hmmm… I suppose the closest thing to an enum would be a dictionary, defined either like this:
months = {
    'January': 1,
    'February': 2,
    ...
}
or
months = dict(
    January=1,
    February=2,
    ...
)
Then, you can use the symbolic name for the constants like this:
mymonth = months['January']
There are other options, like a list of tuples, or a tuple of tuples, but the dictionary is the only one that provides you with a “symbolic” (constant string) way to access the value.
Edit: I like Alexandru’s answer too!
Answer 13
Another very simple Python enum implementation, using namedtuple:
from collections import namedtuple

def enum(*keys):
    return namedtuple('Enum', keys)(*keys)

MyEnum = enum('FOO', 'BAR', 'BAZ')
Or, alternatively:
# With sequential number values
def enum(*keys):
    return namedtuple('Enum', keys)(*range(len(keys)))

# From a dict / keyword args
def enum(**kwargs):
    return namedtuple('Enum', kwargs.keys())(*kwargs.values())
Like the set-subclassing approach above, this allows:
'FOO' in MyEnum
other = MyEnum.FOO
assert other == MyEnum.FOO
Enumerations are created using the class syntax, which makes them easy to read and write. An alternative creation method is described in Functional API. To define an enumeration, subclass Enum as follows:
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3
回答 15
我用什么:
class Enum(object):
    def __init__(self, names, separator=None):
        self.names = names.split(separator)
        for value, name in enumerate(self.names):
            setattr(self, name.upper(), value)
    def tuples(self):
        return tuple(enumerate(self.names))
How to use it:
>>> state = Enum('draft published retracted')
>>> state.DRAFT
0
>>> state.RETRACTED
2
>>> state.FOO
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Enum' object has no attribute 'FOO'
>>> state.tuples()
((0, 'draft'), (1, 'published'), (2, 'retracted'))
It gives you a class, and the class contains all the enums. The enums can be compared to each other, but don’t have any particular value; you can’t use them as an integer value. (I resisted this at first because I am used to C enums, which are integer values. But if you can’t use it as an integer, you can’t use it as an integer by mistake so overall I think it is a win.) Each enum is a unique value. You can print enums, you can iterate over them, you can test that an enum value is “in” the enum. It’s pretty complete and slick.
Edit (cfi): The above link is not Python 3 compatible. Here’s my port of enum.py to Python 3:
def cmp(a, b):
    if a < b: return -1
    if b < a: return 1
    return 0

def Enum(*names):
    ##assert names, "Empty enums are not supported" # <- Don't like empty enums? Uncomment!

    class EnumClass(object):
        __slots__ = names
        def __iter__(self): return iter(constants)
        def __len__(self): return len(constants)
        def __getitem__(self, i): return constants[i]
        def __repr__(self): return 'Enum' + str(names)
        def __str__(self): return 'enum ' + str(constants)

    class EnumValue(object):
        __slots__ = ('__value')
        def __init__(self, value): self.__value = value
        Value = property(lambda self: self.__value)
        EnumType = property(lambda self: EnumType)
        def __hash__(self): return hash(self.__value)
        def __cmp__(self, other):
            # C fans might want to remove the following assertion
            # to make all enums comparable by ordinal value {;))
            assert self.EnumType is other.EnumType, "Only values from the same enum are comparable"
            return cmp(self.__value, other.__value)
        def __lt__(self, other): return self.__cmp__(other) < 0
        def __eq__(self, other): return self.__cmp__(other) == 0
        def __invert__(self): return constants[maximum - self.__value]
        def __nonzero__(self): return bool(self.__value)
        def __repr__(self): return str(names[self.__value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    EnumType = EnumClass()
    return EnumType

if __name__ == '__main__':
    print('\n*** Enum Demo ***')
    print('--- Days of week ---')
    Days = Enum('Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su')
    print(Days)
    print(Days.Mo)
    print(Days.Fr)
    print(Days.Mo < Days.Fr)
    print(list(Days))
    for each in Days:
        print('Day:', each)
    print('--- Yes/No ---')
    Confirmation = Enum('No', 'Yes')
    answer = Confirmation.No
    print('Your answer is not', ~answer)
I have had occasion to need an Enum class, for the purpose of decoding a binary file format. The features I happened to want were a concise enum definition, the ability to freely create instances of the enum from either an integer value or a string, and a useful representation. Here’s what I ended up with:
>>> class Enum(int):
...     def __new__(cls, value):
...         if isinstance(value, str):
...             return getattr(cls, value)
...         elif isinstance(value, int):
...             return cls.__index[value]
...     def __str__(self): return self.__name
...     def __repr__(self): return "%s.%s" % (type(self).__name__, self.__name)
...     class __metaclass__(type):
...         def __new__(mcls, name, bases, attrs):
...             attrs['__slots__'] = ['_Enum__name']
...             cls = type.__new__(mcls, name, bases, attrs)
...             cls._Enum__index = _index = {}
...             for base in reversed(bases):
...                 if hasattr(base, '_Enum__index'):
...                     _index.update(base._Enum__index)
...             # create all of the instances of the new class
...             for attr in attrs.keys():
...                 value = attrs[attr]
...                 if isinstance(value, int):
...                     evalue = int.__new__(cls, value)
...                     evalue._Enum__name = attr
...                     _index[value] = evalue
...                     setattr(cls, attr, evalue)
...             return cls
...
A whimsical example of using it:
>>> class Citrus(Enum):
...     Lemon = 1
...     Lime = 2
...
>>> Citrus.Lemon
Citrus.Lemon
>>>
>>> Citrus(1)
Citrus.Lemon
>>> Citrus(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in __new__
KeyError: 5
>>> class Fruit(Citrus):
...     Apple = 3
...     Banana = 4
...
>>> Fruit.Apple
Fruit.Apple
>>> Fruit.Lemon
Citrus.Lemon
>>> Fruit(1)
Citrus.Lemon
>>> Fruit(3)
Fruit.Apple
>>> "%d %s %r" % ((Fruit.Apple,)*3)
'3 Apple Fruit.Apple'
>>> Fruit(1) is Citrus.Lemon
True
Key features:
str(), int() and repr() all produce the most useful output possible: respectively, the name of the enumeration, its integer value, and a Python expression that evaluates back to the enumeration.
Enumerated values returned by the constructor are limited strictly to the predefined values; there are no accidental enum values.
Enumerated values are singletons; they can be strictly compared with is.
>>> from flufl.enum import Enum
>>> class Colors(Enum):
...     red = 1
...     green = 2
...     blue = 3
>>> for color in Colors: print color
Colors.red
Colors.green
Colors.blue
Answer 21
def enum(*sequential, **named):
    enums = dict(zip(sequential, [object() for _ in range(len(sequential))]), **named)
    return type('Enum', (), enums)
When using other implementations cited here (also when using named instances in my example) you must be sure you never try to compare objects from different enums. Here’s a possible pitfall:
>>> Numbers = enum_base(int, ONE=1, TWO=2, THREE=3)
>>> Numbers.ONE
1
>>> x = Numbers.TWO
>>> 10 + x
12
>>> type(Numbers)
<type 'type'>
>>> type(Numbers.ONE)
<class 'Enum'>
>>> isinstance(x, Numbers)
True
It’s elegant and clean looking, but it’s just a function that creates a class with the specified attributes.
With a little modification to the function, we can get it to act a little more ‘enumy’:
NOTE: I created the following examples by trying to reproduce the
behavior of pygtk’s new style ‘enums’ (like Gtk.MessageType.WARNING)
def enum_base(t, **enums):
    '''enums with a base class'''
    T = type('Enum', (t,), {})
    for key, val in enums.items():
        setattr(T, key, T(val))
    return T
This creates an enum based off a specified type. In addition to giving attribute access like the previous function, it behaves as you would expect an Enum to with respect to types. It also inherits the base class.
Another interesting thing that can be done with this method is to customize specific behavior by overriding built-in methods:
def enum_repr(t, **enums):
    '''enums with a base class and repr() output'''
    class Enum(t):
        def __repr__(self):
            return '<enum {0} of type Enum({1})>'.format(self._name, t.__name__)
    for key, val in enums.items():
        i = Enum(val)
        i._name = key
        setattr(Enum, key, i)
    return Enum
>>> Numbers = enum_repr(int, ONE=1, TWO=2, THREE=3)
>>> repr(Numbers.ONE)
'<enum ONE of type Enum(int)>'
>>> str(Numbers.ONE)
'1'
The enum package from PyPI provides a robust implementation of enums. An earlier answer mentioned PEP 354; that proposal was rejected, but a comparable implementation was released as
http://pypi.python.org/pypi/enum.
Alexandru’s suggestion of using class constants for enums works quite well.
I also like to add a dictionary for each set of constants to look up a human-readable string representation.
This serves two purposes: a) it provides a simple way to pretty-print your enum and b) the dictionary logically groups the constants so that you can test for membership.
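A minimal sketch of that pattern (the constant names and labels here are invented for illustration):

```python
class Status:
    # class constants act as the enum values
    DRAFT = 0
    PUBLISHED = 1
    RETRACTED = 2

    # companion dictionary: pretty-printing plus a membership test
    LABELS = {
        DRAFT: 'Draft',
        PUBLISHED: 'Published',
        RETRACTED: 'Retracted',
    }

print(Status.LABELS[Status.PUBLISHED])    # Published
print(Status.RETRACTED in Status.LABELS)  # True
```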
def enum(*names):"""
SYNOPSIS
Well-behaved enumerated type, easier than creating custom classes
DESCRIPTION
Create a custom type that implements an enumeration. Similar in concept
to a C enum but with some additional capabilities and protections. See
http://code.activestate.com/recipes/413486-first-class-enums-in-python/.
PARAMETERS
names Ordered list of names. The order in which names are given
will be the sort order in the enum type. Duplicate names
are not allowed. Unicode names are mapped to ASCII.
RETURNS
Object of type enum, with the input names and the enumerated values.
EXAMPLES
>>> letters = enum('a','e','i','o','u','b','c','y','z')
>>> letters.a < letters.e
True
## index by property
>>> letters.a
a
## index by position
>>> letters[0]
a
## index by name, helpful for bridging string inputs to enum
>>> letters['a']
a
## sorting by order in the enum() create, not character value
>>> letters.u < letters.b
True
## normal slicing operations available
>>> letters[-1]
z
## error since there are not 100 items in enum
>>> letters[99]
Traceback (most recent call last):
...
IndexError: tuple index out of range
## error since name does not exist in enum
>>> letters['ggg']
Traceback (most recent call last):
...
ValueError: tuple.index(x): x not in tuple
## enums must be named using valid Python identifiers
>>> numbers = enum(1,2,3,4)
Traceback (most recent call last):
...
AssertionError: Enum values must be string or unicode
>>> a = enum('-a','-b')
Traceback (most recent call last):
...
TypeError: Error when calling the metaclass bases
__slots__ must be identifiers
## create another enum
>>> tags = enum('a','b','c')
>>> tags.a
a
>>> letters.a
a
## can't compare values from different enums
>>> letters.a == tags.a
Traceback (most recent call last):
...
AssertionError: Only values from the same enum are comparable
>>> letters.a < tags.a
Traceback (most recent call last):
...
AssertionError: Only values from the same enum are comparable
## can't update enum after create
>>> letters.a = 'x'
Traceback (most recent call last):
...
AttributeError: 'EnumClass' object attribute 'a' is read-only
## can't update enum after create
>>> del letters.u
Traceback (most recent call last):
...
AttributeError: 'EnumClass' object attribute 'u' is read-only
## can't have non-unique enum values
>>> x = enum('a','b','c','a')
Traceback (most recent call last):
...
AssertionError: Enums must not repeat values
## can't have zero enum values
>>> x = enum()
Traceback (most recent call last):
...
AssertionError: Empty enums are not supported
## can't have enum values that look like special function names
## since these could collide and lead to non-obvious errors
>>> x = enum('a','b','c','__cmp__')
Traceback (most recent call last):
...
AssertionError: Enum values beginning with __ are not supported
LIMITATIONS
Enum values of unicode type are not preserved, mapped to ASCII instead.
"""## must have at least one enum valueassert names,'Empty enums are not supported'## enum values must be stringsassert len([i for i in names ifnot isinstance(i, types.StringTypes)andnot \
isinstance(i, unicode)])==0,'Enum values must be string or unicode'## enum values must not collide with special function namesassert len([i for i in names if i.startswith("__")])==0,\
'Enum values beginning with __ are not supported'## each enum value must be unique from all othersassert names == uniquify(names),'Enums must not repeat values'classEnumClass(object):""" See parent function for explanation """
__slots__ = names
def __iter__(self):return iter(constants)def __len__(self):return len(constants)def __getitem__(self, i):## this makes xx['name'] possibleif isinstance(i, types.StringTypes):
i = names.index(i)## handles the more normal xx[0]return constants[i]def __repr__(self):return'enum'+ str(names)def __str__(self):return'enum '+ str(constants)def index(self, i):return names.index(i)classEnumValue(object):""" See parent function for explanation """
__slots__ =('__value')def __init__(self, value):
self.__value = value
value = property(lambda self: self.__value)
enumtype = property(lambda self: enumtype)def __hash__(self):return hash(self.__value)def __cmp__(self, other):assert self.enumtype is other.enumtype,'Only values from the same enum are comparable'return cmp(self.value, other.value)def __invert__(self):return constants[maximum - self.value]def __nonzero__(self):## return bool(self.value)## Original code led to bool(x[0])==False, not correctreturnTruedef __repr__(self):return str(names[self.value])
maximum = len(names)-1
constants =[None]* len(names)for i, each in enumerate(names):
val =EnumValue(i)
setattr(EnumClass, each, val)
constants[i]= val
constants = tuple(constants)
enumtype =EnumClass()return enumtype
Many doctests included here to illustrate what’s different about this approach.
## NOTE: Python 2 recipe -- relies on types.StringTypes, unicode and cmp, and
## on a uniquify() helper (order-preserving de-duplication) defined elsewhere.
import types

def enum(*names):
    """
    SYNOPSIS
        Well-behaved enumerated type, easier than creating custom classes
    DESCRIPTION
        Create a custom type that implements an enumeration. Similar in concept
        to a C enum but with some additional capabilities and protections. See
        http://code.activestate.com/recipes/413486-first-class-enums-in-python/.
    PARAMETERS
        names   Ordered list of names. The order in which names are given
                will be the sort order in the enum type. Duplicate names
                are not allowed. Unicode names are mapped to ASCII.
    RETURNS
        Object of type enum, with the input names and the enumerated values.
    EXAMPLES
        >>> letters = enum('a','e','i','o','u','b','c','y','z')
        >>> letters.a < letters.e
        True

        ## index by property
        >>> letters.a
        a

        ## index by position
        >>> letters[0]
        a

        ## index by name, helpful for bridging string inputs to enum
        >>> letters['a']
        a

        ## sorting by order in the enum() create, not character value
        >>> letters.u < letters.b
        True

        ## normal slicing operations available
        >>> letters[-1]
        z

        ## error since there are not 100 items in enum
        >>> letters[99]
        Traceback (most recent call last):
        ...
        IndexError: tuple index out of range

        ## error since name does not exist in enum
        >>> letters['ggg']
        Traceback (most recent call last):
        ...
        ValueError: tuple.index(x): x not in tuple

        ## enums must be named using valid Python identifiers
        >>> numbers = enum(1,2,3,4)
        Traceback (most recent call last):
        ...
        AssertionError: Enum values must be string or unicode

        >>> a = enum('-a','-b')
        Traceback (most recent call last):
        ...
        TypeError: Error when calling the metaclass bases
        __slots__ must be identifiers

        ## create another enum
        >>> tags = enum('a','b','c')
        >>> tags.a
        a
        >>> letters.a
        a

        ## can't compare values from different enums
        >>> letters.a == tags.a
        Traceback (most recent call last):
        ...
        AssertionError: Only values from the same enum are comparable

        >>> letters.a < tags.a
        Traceback (most recent call last):
        ...
        AssertionError: Only values from the same enum are comparable

        ## can't update enum after create
        >>> letters.a = 'x'
        Traceback (most recent call last):
        ...
        AttributeError: 'EnumClass' object attribute 'a' is read-only

        ## can't update enum after create
        >>> del letters.u
        Traceback (most recent call last):
        ...
        AttributeError: 'EnumClass' object attribute 'u' is read-only

        ## can't have non-unique enum values
        >>> x = enum('a','b','c','a')
        Traceback (most recent call last):
        ...
        AssertionError: Enums must not repeat values

        ## can't have zero enum values
        >>> x = enum()
        Traceback (most recent call last):
        ...
        AssertionError: Empty enums are not supported

        ## can't have enum values that look like special function names
        ## since these could collide and lead to non-obvious errors
        >>> x = enum('a','b','c','__cmp__')
        Traceback (most recent call last):
        ...
        AssertionError: Enum values beginning with __ are not supported
    LIMITATIONS
        Enum values of unicode type are not preserved, mapped to ASCII instead.
    """
    ## must have at least one enum value
    assert names, 'Empty enums are not supported'
    ## enum values must be strings
    assert len([i for i in names if not isinstance(i, types.StringTypes) and not
                isinstance(i, unicode)]) == 0, 'Enum values must be string or unicode'
    ## enum values must not collide with special function names
    assert len([i for i in names if i.startswith("__")]) == 0, \
        'Enum values beginning with __ are not supported'
    ## each enum value must be unique from all others
    assert names == uniquify(names), 'Enums must not repeat values'

    class EnumClass(object):
        """ See parent function for explanation """
        __slots__ = names
        def __iter__(self):
            return iter(constants)
        def __len__(self):
            return len(constants)
        def __getitem__(self, i):
            ## this makes xx['name'] possible
            if isinstance(i, types.StringTypes):
                i = names.index(i)
            ## handles the more normal xx[0]
            return constants[i]
        def __repr__(self):
            return 'enum' + str(names)
        def __str__(self):
            return 'enum ' + str(constants)
        def index(self, i):
            return names.index(i)

    class EnumValue(object):
        """ See parent function for explanation """
        __slots__ = ('__value')
        def __init__(self, value):
            self.__value = value
        value = property(lambda self: self.__value)
        enumtype = property(lambda self: enumtype)
        def __hash__(self):
            return hash(self.__value)
        def __cmp__(self, other):
            assert self.enumtype is other.enumtype, 'Only values from the same enum are comparable'
            return cmp(self.value, other.value)
        def __invert__(self):
            return constants[maximum - self.value]
        def __nonzero__(self):
            ## return bool(self.value)
            ## Original code led to bool(x[0])==False, not correct
            return True
        def __repr__(self):
            return str(names[self.value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    enumtype = EnumClass()
    return enumtype
While the original enum proposal, PEP 354, was rejected years ago, it keeps coming back up. Some kind of enum was intended to be added to 3.2, but it got pushed back to 3.3 and then forgotten. And now there’s a PEP 435 intended for inclusion in Python 3.4. The reference implementation of PEP 435 is flufl.enum.
As of April 2013, there seems to be a general consensus that something should be added to the standard library in 3.4—as long as people can agree on what that “something” should be. That’s the hard part. See the threads starting here and here, and a half dozen other threads in the early months of 2013.
Meanwhile, every time this comes up, a slew of new designs and implementations appear on PyPI, ActiveState, etc., so if you don’t like the FLUFL design, try a PyPI search.
Answer 29
Use the following.
TYPE = {
    'EAN13': u'EAN-13',
    'CODE39': u'Code 39',
    'CODE128': u'Code 128',
    'i25': u'Interleaved 2 of 5',
}

>>> TYPE.items()
[('EAN13', u'EAN-13'), ('i25', u'Interleaved 2 of 5'), ('CODE39', u'Code 39'), ('CODE128', u'Code 128')]
>>> TYPE.keys()
['EAN13', 'i25', 'CODE39', 'CODE128']
>>> TYPE.values()
[u'EAN-13', u'Interleaved 2 of 5', u'Code 39', u'Code 128']
The advantage of adding a path to sys.path (over using imp) is that it simplifies things when importing more than one module from a single package. For example:
import sys
# the mock-0.3.1 dir contains testcase.py, testutils.py & mock.py
sys.path.append('/foo/bar/mock-0.3.1')
from testcase import TestCase
from testutils import RunTests
from mock import Mock, sentinel, patch
If your top-level module is not a file but is packaged as a directory with __init__.py, then the accepted solution almost works, but not quite. In Python 3.5+ the following code is needed (note the added line that begins with ‘sys.modules’):
Without this line, when exec_module is executed, it tries to bind relative imports in your top level __init__.py to the top level module name — in this case “mymodule”. But “mymodule” isn’t loaded yet so you’ll get the error “SystemError: Parent module ‘mymodule’ not loaded, cannot perform relative import”. So you need to bind the name before you load it. The reason for this is the fundamental invariant of the relative import system: “The invariant holding is that if you have sys.modules[‘spam’] and sys.modules[‘spam.foo’] (as you would after the above import), the latter must appear as the foo attribute of the former” as discussed here.
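The snippet referred to above is not reproduced here, but based on the description it looks roughly like the following self-contained sketch (the package name mymodule is just the example used in the text, and the temporary directory exists only so the sketch runs; the key addition is the sys.modules assignment before exec_module):

```python
import importlib.util
import os
import sys
import tempfile

# build a throwaway package directory so the sketch is self-contained
pkg_dir = os.path.join(tempfile.mkdtemp(), "mymodule")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

spec = importlib.util.spec_from_file_location(
    "mymodule", os.path.join(pkg_dir, "__init__.py"))
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module   # bind the name BEFORE executing the module
spec.loader.exec_module(module)
print(module.VALUE)   # 42
```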
It sounds like you don’t want to specifically import the configuration file (which has a whole lot of side effects and additional complications involved), you just want to run it, and be able to access the resulting namespace. The standard library provides an API specifically for that in the form of runpy.run_path:
from runpy import run_path
settings = run_path("/path/to/file.py")
That interface is available in Python 2.7 and Python 3.2+
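For example (writing a throwaway settings file so the snippet runs end to end; the variable names are invented):

```python
import os
import tempfile
from runpy import run_path

# create a small settings file on the fly
cfg = os.path.join(tempfile.mkdtemp(), "file.py")
with open(cfg, "w") as f:
    f.write("DEBUG = True\nNAME = 'demo'\n")

settings = run_path(cfg)   # returns the file's namespace as a plain dict
print(settings["DEBUG"], settings["NAME"])   # True demo
```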
You can also do something like this and add the directory that the configuration file is sitting in to the Python load path, and then just do a normal import, assuming you know the name of the file in advance, in this case “config”.
Messy, but it works.
configfile = '~/config.py'
import os
import sys
sys.path.append(os.path.dirname(os.path.expanduser(configfile)))
import config
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader
spec = spec_from_loader("module.name", SourceFileLoader("module.name", "/path/to/file.py"))
mod = module_from_spec(spec)
spec.loader.exec_module(mod)
The advantage of encoding the path in an explicit SourceFileLoader is that the machinery will not try to figure out the type of the file from the extension. This means that you can load something like a .txt file using this method, but you could not do it with spec_from_file_location without specifying the loader because .txt is not in importlib.machinery.SOURCE_SUFFIXES.
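To make that concrete, here is a self-contained sketch that loads Python source from a .txt file (the file is generated on the fly and the module name is arbitrary):

```python
import os
import tempfile
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader

# write Python source into a .txt file: the explicit loader ignores the extension
path = os.path.join(tempfile.mkdtemp(), "code.txt")
with open(path, "w") as f:
    f.write("ANSWER = 42\n")

spec = spec_from_loader("module.name", SourceFileLoader("module.name", path))
mod = module_from_spec(spec)
spec.loader.exec_module(mod)
print(mod.ANSWER)   # 42
```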
Answer 10
Do you mean load or import?
You can manipulate the sys.path list to specify the path to your module, and then import it. For example, given a module located at:
/foo/bar.py
You could do:
import sys
sys.path[0:0] = ['/foo']  # puts the /foo directory at the start of your path
import bar
I believe you can use imp.find_module() and imp.load_module() to load the specified module. You’ll need to split the module name off of the path, i.e. if you wanted to load /home/mypath/mymodule.py you’d need to do:
You can use the pkgutil module (specifically the walk_packages method) to get a list of the packages in the current directory. From there it’s trivial to use the importlib machinery to import the modules you want:
import pkgutil
import importlib
packages = pkgutil.walk_packages(path='.')
for importer, name, is_package in packages:
    mod = importlib.import_module(name)
    # do whatever you want with the module now, it's been imported!
def import_module_from_file(full_path_to_module):"""
Import a module given the full path/filename of the .py file
Python 3.4
"""
module =Nonetry:# Get module name and path from full path
module_dir, module_file = os.path.split(full_path_to_module)
module_name, module_ext = os.path.splitext(module_file)# Get module "spec" from filename
spec = importlib.util.spec_from_file_location(module_name,full_path_to_module)
module = spec.loader.load_module()exceptExceptionas ec:# Simple error printing# Insert "sophisticated" stuff hereprint(ec)finally:return module
This area of Python 3.4 seems to be extremely tortuous to understand! However with a bit of hacking using the code from Chris Calloway as a start I managed to get something working. Here’s the basic function.
import os
import importlib.util

def import_module_from_file(full_path_to_module):
    """
    Import a module given the full path/filename of the .py file
    Python 3.4
    """
    module = None
    try:
        # Get module name and path from full path
        module_dir, module_file = os.path.split(full_path_to_module)
        module_name, module_ext = os.path.splitext(module_file)
        # Get module "spec" from filename
        spec = importlib.util.spec_from_file_location(module_name, full_path_to_module)
        module = spec.loader.load_module()
    except Exception as ec:
        # Simple error printing
        # Insert "sophisticated" stuff here
        print(ec)
    finally:
        return module
This appears to use non-deprecated modules from Python 3.4. I don’t pretend to understand why, but it seems to work from within a program. I found Chris’ solution worked on the command line but not from inside a program.
I’m not saying that it is better, but for the sake of completeness, I wanted to suggest the exec function, available in both Python 2 and 3.
exec allows you to execute arbitrary code in either the global scope, or in an internal scope, provided as a dictionary.
For example, if you have a module stored in "/path/to/module" with the function foo(), you could run it by doing the following:
module = dict()
with open("/path/to/module") as f:
    exec(f.read(), module)
module['foo']()
This makes it a bit more explicit that you’re loading code dynamically, and grants you some additional power, such as the ability to provide custom builtins.
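For instance, a sketch of the custom-builtins idea (the whitelist chosen here is arbitrary):

```python
# expose only a whitelist of builtins to the executed code
module = {"__builtins__": {"len": len, "sorted": sorted}}
exec("n = len(sorted('cab'))", module)
print(module["n"])   # 3
```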
And if having access through attributes, instead of keys is important to you, you can design a custom dict class for the globals, that provides such access, e.g.:
class MyModuleClass(dict):
    def __getattr__(self, name):
        return self.__getitem__(name)
###################
##               ##
## classloader.py ##
##               ##
###################
import sys, types

def _get_mod(modulePath):
    try:
        aMod = sys.modules[modulePath]
        if not isinstance(aMod, types.ModuleType):
            raise KeyError
    except KeyError:
        # The last [''] is very important!
        aMod = __import__(modulePath, globals(), locals(), [''])
        sys.modules[modulePath] = aMod
    return aMod

def _get_func(fullFuncName):
    """Retrieve a function object from a full dotted-package name."""
    # Parse out the path, module, and function
    lastDot = fullFuncName.rfind(u".")
    funcName = fullFuncName[lastDot + 1:]
    modPath = fullFuncName[:lastDot]
    aMod = _get_mod(modPath)
    aFunc = getattr(aMod, funcName)
    # Assert that the function is a *callable* attribute.
    assert callable(aFunc), u"%s is not callable." % fullFuncName
    # Return a reference to the function itself,
    # not the results of the function.
    return aFunc

def _get_class(fullClassName, parentClass=None):
    """Load a module and retrieve a class (NOT an instance).

    If the parentClass is supplied, className must be of parentClass
    or a subclass of parentClass (or None is returned).
    """
    aClass = _get_func(fullClassName)
    # Assert that the class is a subclass of parentClass.
    if parentClass is not None:
        if not issubclass(aClass, parentClass):
            raise TypeError(u"%s is not a subclass of %s" %
                            (fullClassName, parentClass))
    # Return a reference to the class itself, not an instantiated object.
    return aClass

######################
##      Usage       ##
######################

class StorageManager: pass
class StorageManagerMySQL(StorageManager): pass

def storage_object(aFullClassName, allOptions={}):
    aStoreClass = _get_class(aFullClassName, StorageManager)
    return aStoreClass(allOptions)
import os, sys, inspect, copy

SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("test::SOURCE_FILE: ", SOURCE_FILE)

# portable import to the global space
sys.path.append(TACKLELIB_ROOT) # TACKLELIB_ROOT - path to the library directory
import tacklelib as tkl

tkl.tkl_init(tkl)

# cleanup
del tkl # must be used instead of `tkl = None`, otherwise the variable would still persist
sys.path.pop()

tkl_import_module(SOURCE_DIR, 'testlib.py')

print(globals().keys())

testlib.base_test()
testlib.testlib_std1.std1_test()
testlib.testlib_std1.testlib_std2.std2_test()
#testlib.testlib.std3.std3_test() # is not reachable directly ...
getattr(globals()['testlib'], 'testlib.std3').std3_test() # ... but is reachable through `globals` + `getattr`

tkl_import_module(SOURCE_DIR, 'testlib.py', '.')

print(globals().keys())

base_test()
testlib_std1.std1_test()
testlib_std1.testlib_std2.std2_test()
#testlib.std3.std3_test() # is not reachable directly ...
globals()['testlib.std3'].std3_test() # ... but is reachable through `globals` + `getattr`
testlib.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("1 testlib::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/std1', 'testlib.std1.py', 'testlib_std1')

# SOURCE_DIR is restored here
print("2 testlib::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/std3', 'testlib.std3.py')
print("3 testlib::SOURCE_FILE: ", SOURCE_FILE)

def base_test():
    print('base_test')

testlib.std1.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std1::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/../std2', 'testlib.std2.py', 'testlib_std2')

def std1_test():
    print('std1_test')

testlib.std2.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std2::SOURCE_FILE: ", SOURCE_FILE)

def std2_test():
    print('std2_test')

testlib.std3.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std3::SOURCE_FILE: ", SOURCE_FILE)

def std3_test():
    print('std3_test')
Output (3.7.4):
test::SOURCE_FILE:<root>/test01/test.py
import: <root>/test01/testlib.py as testlib -> []
1 testlib::SOURCE_FILE: <root>/test01/testlib.py
import: <root>/test01/std1/testlib.std1.py as testlib_std1 -> ['testlib']
import: <root>/test01/std1/../std2/testlib.std2.py as testlib_std2 -> ['testlib', 'testlib_std1']
testlib.std2::SOURCE_FILE:<root>/test01/std1/../std2/testlib.std2.py
2 testlib::SOURCE_FILE:<root>/test01/testlib.py
import:<root>/test01/std3/testlib.std3.py as testlib.std3 ->['testlib']
testlib.std3::SOURCE_FILE:<root>/test01/std3/testlib.std3.py
3 testlib::SOURCE_FILE:<root>/test01/testlib.py
dict_keys(['__name__','__doc__','__package__','__loader__','__spec__','__annotations__','__builtins__','__file__','__cached__','os','sys','inspect','copy','SOURCE_FILE','SOURCE_DIR','TackleGlobalImportModuleState','tkl_membercopy','tkl_merge_module','tkl_get_parent_imported_module_state','tkl_declare_global','tkl_import_module','TackleSourceModuleState','tkl_source_module','TackleLocalImportModuleState','testlib'])
base_test
std1_test
std2_test
std3_test
import: <root>/test01/testlib.py as . -> []
1 testlib::SOURCE_FILE: <root>/test01/testlib.py
import: <root>/test01/std1/testlib.std1.py as testlib_std1 -> ['testlib']
import: <root>/test01/std1/../std2/testlib.std2.py as testlib_std2 -> ['testlib', 'testlib_std1']
testlib.std2::SOURCE_FILE:<root>/test01/std1/../std2/testlib.std2.py
2 testlib::SOURCE_FILE:<root>/test01/testlib.py
import:<root>/test01/std3/testlib.std3.py as testlib.std3 ->['testlib']
testlib.std3::SOURCE_FILE:<root>/test01/std3/testlib.std3.py
3 testlib::SOURCE_FILE:<root>/test01/testlib.py
dict_keys(['__name__','__doc__','__package__','__loader__','__spec__','__annotations__','__builtins__','__file__','__cached__','os','sys','inspect','copy','SOURCE_FILE','SOURCE_DIR','TackleGlobalImportModuleState','tkl_membercopy','tkl_merge_module','tkl_get_parent_imported_module_state','tkl_declare_global','tkl_import_module','TackleSourceModuleState','tkl_source_module','TackleLocalImportModuleState','testlib','testlib_std1','testlib.std3','base_test'])
base_test
std1_test
std2_test
std3_test
import os, sys, inspect, copy
SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
SOURCE_DIR = os.path.dirname(SOURCE_FILE)
print("test::SOURCE_FILE: ", SOURCE_FILE)
# portable import to the global space
sys.path.append(TACKLELIB_ROOT) # TACKLELIB_ROOT - path to the library directory
import tacklelib as tkl
tkl.tkl_init(tkl)
# cleanup
del tkl # must be used instead of `tkl = None`, otherwise the variable would still persist
sys.path.pop()
tkl_import_module(SOURCE_DIR, 'testlib.py')
print(globals().keys())
testlib.base_test()
testlib.testlib_std1.std1_test()
testlib.testlib_std1.testlib_std2.std2_test()
#testlib.testlib.std3.std3_test() # not reachable directly ...
getattr(globals()['testlib'], 'testlib.std3').std3_test() # ... but reachable through the `globals` + `getattr`
tkl_import_module(SOURCE_DIR, 'testlib.py', '.')
print(globals().keys())
base_test()
testlib_std1.std1_test()
testlib_std1.testlib_std2.std2_test()
#testlib.std3.std3_test() # not reachable directly ...
globals()['testlib.std3'].std3_test() # ... but reachable through the `globals` + `getattr`
Can import a module both as a submodule and can import the content of a module into a parent module (or into globals if there is no parent module).
Can import modules with periods in the file name.
Can import any extension module from any other extension module.
Can use a standalone name for a submodule instead of the file name without extension, which is the default (for example, testlib.std.py as testlib, testlib.blabla.py as testlib_blabla, and so on).
Does not depend on sys.path or any other search-path storage.
Does not require saving/restoring global variables like SOURCE_FILE and SOURCE_DIR between calls to tkl_import_module.
[for 3.4.x and higher] Can mix module namespaces in nested tkl_import_module calls (ex: named->local->named or local->named->local and so on).
[for 3.4.x and higher] Can auto-export global variables/functions/classes from where they are declared to all child modules imported through tkl_import_module (via the tkl_declare_global function).
Cons:
[for 3.3.x and lower] Requires declaring tkl_import_module in every module that calls tkl_import_module (code duplication).
Update 1,2 (for 3.4.x and higher only):
In Python 3.4 and higher you can bypass the requirement to declare tkl_import_module in each module by declaring tkl_import_module in a top-level module; the function will then inject itself into all child modules in a single call (a kind of self-deploying import).
Update 3:
Added the function tkl_source_module as an analog to bash source, with support for an execution guard upon import (implemented through a module merge instead of an import).
Update 4:
Added the function tkl_declare_global to auto-export a module global variable to all child modules where it would otherwise not be visible, because it is not part of a child module.
Update 5:
All functions have been moved into the tacklelib library; see the link above.
There’s a package that’s dedicated to this specifically:
from thesmuggler import smuggle
# À la `import weapons`
weapons = smuggle('weapons.py')
# À la `from contraband import drugs, alcohol`
drugs, alcohol = smuggle('drugs', 'alcohol', source='contraband.py')
# À la `from contraband import drugs as dope, alcohol as booze`
dope, booze = smuggle('drugs', 'alcohol', source='contraband.py')
It’s tested across Python versions (Jython and PyPy too), but it might be overkill depending on the size of your project.
Adding this to the list of answers as I couldn’t find anything that worked. This will allow imports of compiled (pyd) Python modules in 3.4:
import sys
import importlib.machinery
def load_module(name, filename):
    # If the Loader finds the module name in this list it will use
    # module_name.__file__ instead so we need to delete it here
    if name in sys.modules:
        del sys.modules[name]

    loader = importlib.machinery.ExtensionFileLoader(name, filename)
    module = loader.load_module()
    locals()[name] = module
    globals()[name] = module
load_module('something', r'C:\Path\To\something.pyd')
something.do_something()
Answer 25
A very simple method: suppose you want to import a file with the relative path ../../MyLibs/pyfunc.py:

libPath = '../../MyLibs'
import sys
if not libPath in sys.path:
    sys.path.append(libPath)
import pyfunc as pf
A simple solution using importlib instead of the imp package (tested for Python 2.7, although it should work for Python 3 too):
import os
import sys
import importlib
dirname, basename = os.path.split(pyfilepath) # pyfilepath: '/my/path/mymodule.py'
sys.path.append(dirname) # only directories should be added to PYTHONPATH
module_name = os.path.splitext(basename)[0] # '/my/path/mymodule.py' --> 'mymodule'
module = importlib.import_module(module_name) # name space of defined module (otherwise we would literally look for "module_name")
Now you can directly use the namespace of the imported module, like this:
a = module.myvar
b = module.myfunc(a)
The advantage of this solution is that we don’t even need to know the actual name of the module we would like to import, in order to use it in our code. This is useful, e.g. in case the path of the module is a configurable argument.
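To make the approach concrete, here is a self-contained sketch of the same steps; the temp-directory module name and its contents are invented for the demo:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temp directory so the demo is self-contained.
tmpdir = tempfile.mkdtemp()
pyfilepath = os.path.join(tmpdir, 'mymodule.py')
with open(pyfilepath, 'w') as f:
    f.write("myvar = 21\n\ndef myfunc(x):\n    return x * 2\n")

dirname, basename = os.path.split(pyfilepath)
sys.path.append(dirname)  # only directories should be added to sys.path
module = importlib.import_module(os.path.splitext(basename)[0])

print(module.myfunc(module.myvar))  # 42
```

In real code you would of course skip the temp-file setup and point `pyfilepath` at the module you actually want to load.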
This answer is a supplement to Sebastian Rittau’s answer responding to the comment: “but what if you don’t have the module name?” This is a quick and dirty way of getting the likely Python module name given a filename: it just goes up the tree until it finds a directory without an __init__.py file and then turns it back into a module name. For Python 3.4+ (uses pathlib), which makes sense since Py2 people can use “imp” or other ways of doing relative imports:
import pathlib
def likely_python_module(filename):
    '''
    Given a filename or Path, return the "likely" python module name. That is, iterate
    the parent directories until it doesn't contain an __init__.py file.

    :rtype: str
    '''
    p = pathlib.Path(filename).resolve()
    paths = []
    if p.name != '__init__.py':
        paths.append(p.stem)
    while True:
        p = p.parent
        if not p:
            break
        if not p.is_dir():
            break
        inits = [f for f in p.iterdir() if f.name == '__init__.py']
        if not inits:
            break
        paths.append(p.stem)
    return '.'.join(reversed(paths))
There are certainly possibilities for improvement, and the optional __init__.py files might necessitate other changes, but if you have __init__.py in general, this does the trick.
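To see the idea in action, here is a quick self-test against a synthetic package tree built in a temp directory (the `mypkg/sub/mod.py` names are invented for the demo; the function is a condensed copy of the one above):

```python
import os
import pathlib
import tempfile

def likely_python_module(filename):
    # Same logic as the function above, condensed.
    p = pathlib.Path(filename).resolve()
    paths = []
    if p.name != '__init__.py':
        paths.append(p.stem)
    while True:
        p = p.parent
        if not p or not p.is_dir():
            break
        if not any(f.name == '__init__.py' for f in p.iterdir()):
            break
        paths.append(p.stem)
    return '.'.join(reversed(paths))

# Build <tmp>/mypkg/sub/mod.py with __init__.py files along the way.
root = tempfile.mkdtemp()
sub = os.path.join(root, 'mypkg', 'sub')
os.makedirs(sub)
for d in (os.path.join(root, 'mypkg'), sub):
    open(os.path.join(d, '__init__.py'), 'w').close()
open(os.path.join(sub, 'mod.py'), 'w').close()

print(likely_python_module(os.path.join(sub, 'mod.py')))  # mypkg.sub.mod
```

The walk stops at the temp root because it has no __init__.py, so only the package-qualified part of the path survives.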
import imp
import sys
def __import__(name, globals=None, locals=None, fromlist=None):
    # Fast path: see if the module has already been imported.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    # If any of the following calls raises an exception,
    # there's a problem we can't handle -- let the caller handle it.
    fp, pathname, description = imp.find_module(name)
    try:
        return imp.load_module(name, fp, pathname, description)
    finally:
        # Since we may exit via an exception, close fp explicitly.
        if fp:
            fp.close()
So close! os.path.isdir returns True if you pass in the name of a directory that currently exists. If it doesn’t exist or it’s not a directory, then it returns False.
Python 3.4 introduced the pathlib module into the standard library, which provides an object oriented approach to handle filesystem paths. The is_dir() and exists() methods of a Path object can be used to answer the question:
In [1]: from pathlib import Path
In [2]: p = Path('/usr')
In [3]: p.exists()
Out[3]: True
In [4]: p.is_dir()
Out[4]: True
Paths (and strings) can be joined together with the / operator:
In [5]: q = p / 'bin' / 'vim'
In [6]: q
Out[6]: PosixPath('/usr/bin/vim')
In [7]: q.exists()
Out[7]: True
In [8]: q.is_dir()
Out[8]: False
import os
os.path.isdir(dir_in) #True/False: check if this is a directory
os.listdir(dir_in) #gets you a list of all files and directories under dir_in
Note that os.listdir will raise an exception if the input path is invalid.
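A tiny self-contained demonstration of these checks against a freshly created temp directory (the `_missing` path is deliberately nonexistent):

```python
import os
import tempfile

dir_in = tempfile.mkdtemp()                # guaranteed to exist
print(os.path.isdir(dir_in))               # True: exists and is a directory
print(os.listdir(dir_in))                  # []: freshly created, so empty
print(os.path.isdir(dir_in + '_missing'))  # False: no such path
```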
Answer 9
# You can also check it and create the directory if needed
if not os.path.isdir('mydir'):
    os.system('mkdir mydir')
    print('new directory has been created')
It’s similar to the built-in pathlib. The difference is that it treats every path as a string (its Path is a subclass of str), so if some function expects a string, you can easily pass it a Path object without needing to convert it to a string.
For example, this works great with Django and settings.py:
import os
if not os.path.exists(directory):
os.makedirs(directory)
As noted in comments and elsewhere, there’s a race condition – if the directory is created between the os.path.exists and the os.makedirs calls, the os.makedirs will fail with an OSError. Unfortunately, blanket-catching OSError and continuing is not foolproof, as it will ignore a failure to create the directory due to other factors, such as insufficient permissions, full disk, etc.
import os, errno
try:
os.makedirs(directory)
except OSError as e:
if e.errno != errno.EEXIST:
raise
Alternatively, there could be a second os.path.exists, but suppose another created the directory after the first check, then removed it before the second one – we could still be fooled.
Depending on the application, the danger of concurrent operations may be more or less than the danger posed by other factors such as file permissions. The developer would have to know more about the particular application being developed and its expected environment before choosing an implementation.
Modern versions of Python improve this code quite a bit, both by exposing FileExistsError (in 3.3+)…
os.makedirs("path/to/directory", exist_ok=True) # succeeds even if directory exists.
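For completeness, a pathlib-based sketch of the same idempotent creation (the target path here is a throwaway temp location, invented for the demo):

```python
import tempfile
from pathlib import Path

target = Path(tempfile.mkdtemp()) / 'path' / 'to' / 'directory'
target.mkdir(parents=True, exist_ok=True)  # creates intermediate dirs too
target.mkdir(parents=True, exist_ok=True)  # idempotent: no error the second time
print(target.is_dir())  # True
```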
Answer 12

Two things:

Check whether the directory exists.
If not, create the directory (optional).
import os
dirpath = "<dirpath>" # Replace "<dirpath>" with the actual directory path.
if os.path.exists(dirpath):
    print("Directory exists")
else:
    # This is optional, if you want to create the directory when it doesn't exist.
    os.mkdir(dirpath)
    print("Directory created")
I’m building a web application with Django. The reasons I chose Django were:
I wanted to work with free/open-source tools.
I like Python and feel it’s a long-term language, whereas regarding Ruby I wasn’t sure, and PHP seemed like a huge hassle to learn.
I’m building a prototype for an idea and wasn’t thinking too much about the future. Development speed was the main factor, and I already knew Python.
I knew the migration to Google App Engine would be easier should I choose to do so in the future.
I heard Django was “nice”.
Now that I’m getting closer to thinking about publishing my work, I start being concerned about scale. The only information I found about the scaling capabilities of Django is provided by the Django team (I’m not saying anything to disregard them, but this is clearly not objective information…).
My questions:
What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)
Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
“What are the largest sites built on Django today?”
There isn’t any single place that collects information about traffic on Django-built sites, so I’ll have to take a stab at it using data from various locations. First, we have a list of Django sites on the front page of the main Django project page and then a list of Django-built sites at djangosites.org. Going through the lists and picking some that I know have decent traffic we see:
pownce.com (no longer active): alexa rank about 65k.
Mike Malone of Pownce, in his EuroDjangoCon presentation on Scaling Django Web Apps says “hundreds of hits per second”. This is a very good presentation on how to scale Django, and makes some good points including (current) shortcomings in Django scalability.
HP had a site built with Django 1.5: ePrint center. However, as of November 2015 the entire website was migrated and this link is just a redirect. This website was a worldwide service handling subscriptions to Instant Ink and related services HP offered (*).
“Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?”
Yes, see above.
“Could a site like Stack Overflow run on Django?”
My gut feeling is yes but, as others answered and Mike Malone mentions in his presentation, database design is critical. Strong proof might also be found at www.cnprog.com if we can find any reliable traffic stats. Anyway, it’s not just something that will happen by throwing together a bunch of Django models :)
There are, of course, many more sites and bloggers of interest, but I have got to stop somewhere!
We’re doing load testing now. We think we can support 240 concurrent requests (a sustained rate of 120 hits per second 24×7) without any significant degradation in the server performance. That would be 432,000 hits per hour. Response times aren’t small (our transactions are large) but there’s no degradation from our baseline performance as the load increases.
We’re using Apache front-ending Django and MySQL. The OS is Red Hat Enterprise Linux (RHEL). 64-bit. We use mod_wsgi in daemon mode for Django. We’ve done no cache or database optimization other than to accept the defaults.
We’re all in one VM on a 64-bit Dell with (I think) 32Gb RAM.
Since performance is almost the same for 20 or 200 concurrent users, we don’t need to spend huge amounts of time “tweaking”. Instead we simply need to keep our base performance up through ordinary SSL performance improvements, ordinary database design and implementation (indexing, etc.), ordinary firewall performance improvements, etc.
What we do measure is our load test laptops struggling under the insane workload of 15 processes running 16 concurrent threads of requests.
What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)
In the US, it was Mahalo. I’m told they handle roughly 10 million uniques a month. Now, in 2019, Mahalo is powered by Ruby on Rails.
Abroad, the Globo network (a network of news, sports, and entertainment sites in Brazil); Alexa ranks them in the top 100 globally (around 80th currently).
Other notable Django users include PBS, National Geographic, Discovery, NASA (actually a number of different divisions within NASA), and the Library of Congress.
Can Django deal with 100k users daily, each visiting the site for a couple of hours?
Yes — but only if you’ve written your application right, and if you’ve got enough hardware. Django’s not a magic bullet.
Could a site like StackOverflow run on Django?
Yes (but see above).
Technology-wise, easily: see soclone for one attempt. Traffic-wise, Compete pegs Stack Overflow at under 1 million uniques per month. I can name at least a dozen Django sites with more traffic than SO.
Scaling web apps is not about web frameworks or languages; it’s about your architecture.
It’s about how you handle your browser cache, your database cache, how you use non-standard persistence providers (like CouchDB), how well-tuned your database is, and a lot of other stuff…
You should check the DjangoCon 2008 Keynote, delivered by Cal Henderson, titled “Why I hate Django” where he pretty much goes over everything Django is missing that you might want to do in a high traffic website. At the end of the day you have to take this all with an open mind because it is perfectly possible to write Django apps that scale, but I thought it was a good presentation and relevant to your question.
The largest django site I know of is the Washington Post, which would certainly indicate that it can scale well.
Good design decisions probably have a bigger performance impact than anything else. Twitter is often cited as a site which embodies the performance issues with another dynamic interpreted language based web framework, Ruby on Rails – yet Twitter engineers have stated that the framework isn’t as much an issue as some of the database design choices they made early on.
Django works very nicely with memcached and provides some classes for managing the cache, which is where you would resolve the majority of your performance issues. What you deliver on the wire is almost more important than your backend in reality – using a tool like yslow is critical for a high performance web application. You can always throw more hardware at your backend, but you can’t change your users bandwidth.
I was at the EuroDjangoCon conference the other week, and this was the subject of a couple of talks – including from the founders of what was the largest Django-based site, Pownce (slides from one talk here). The main message is that it’s not Django you have to worry about, but things like proper caching, load balancing, database optimisation, etc.
Django actually has hooks for most of those things – caching, in particular, is made very easy.
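As an illustration of how little configuration the cache hooks need, a settings.py fragment along these lines points Django’s cache framework at memcached (the exact backend module name varies by Django version; PyMemcacheCache shown here is the Django 3.2+ name, and the address is illustrative):

```python
# settings.py fragment (illustrative): wire Django's cache framework to memcached.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',  # address of the memcached daemon
    }
}
```

With a backend configured, per-view caching is a one-line decorator (django.views.decorators.cache.cache_page), which is the kind of ease the paragraph above refers to.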
I’m sure you’re looking for a more solid answer, but the most obvious objective validation I can think of is that Google pushes Django for use with its App Engine framework. If anybody knows about and deals with scalability on a regular basis, it’s Google. From what I’ve read, the most limiting factor seems to be the database back-end, which is why Google uses their own…
It’s not uncommon to hear people say “Django doesn’t scale”. Depending on how you look at it, the statement is either completely true or patently false. Django, on its own, doesn’t scale.
The same can be said of Ruby on Rails, Flask, PHP, or any other language used by a database-driven dynamic website.
The good news, however, is that Django interacts beautifully with a suite of caching and
load balancing tools that will allow it to scale to as much traffic as you can throw at it.
Contrary to what you may have read online,
it can do so without replacing core components often labeled as “too slow” such as the database ORM or the template layer.
Disqus serves over 8 billion page views per month. Those are some huge numbers.
These teams have proven Django most certainly does scale.
Our experience here at Lincoln Loop backs it up.
We’ve built big Django sites capable of spending the day on the Reddit homepage without breaking a sweat.
Django’s scaling success stories are almost too numerous to list at this point.
It backs Disqus, Instagram, and Pinterest. Want some more proof? Instagram was able to sustain over 30 million users on Django with only 3 engineers (2 of which had no back-end development experience).
The Washington Post’s website is a hugely popular online news source to accompany their daily paper. Its huge amount of views and traffic can be easily handled by the Django web framework.
Washington Post - 52.2 million unique visitors (March, 2015)
The National Aeronautics and Space Administration’s official website is the place to find news, pictures, and videos about their ongoing space exploration. This Django website can easily handle huge amounts of views and traffic.
2 million visitors monthly
The Guardian is a British news and media website owned by the Guardian Media Group. It contains nearly all of the content of the newspapers The Guardian and The Observer. This huge data is handled by Django.
The Guardian (commenting system) - 41.6 million unique visitors (October, 2014)
We all know YouTube as the place to upload cat videos and fails. As one of the most popular websites in existence, it provides us with endless hours of video entertainment. The Python programming language powers it and the features we love.
DropBox started the online document storing revolution that has become part of daily life. We now store almost everything in the cloud. Dropbox allows us to store, sync, and share almost anything using the power of Python.
Quora is the number one place online to ask a question and receive answers from a community of individuals. On their Python website relevant results are answered, edited, and organized by these community members.
A majority of the code for Bitly URL shortening services and analytics are all built with Python. Their service can handle hundreds of millions of events per day.
Reddit is known as the front page of the internet. It is the place online to find information or entertainment based on thousands of different categories. Posts and links are user generated and are promoted to the top through votes. Many of Reddit’s capabilities rely on Python for their functionality.
Hipmunk is an online consumer travel site that compares the top travel sites to find you the best deals. This Python website’s tools allow you to find the cheapest hotels and flights for your destination.
Yes it can. It could be Django with Python or Ruby on Rails. It will still scale.
There are a few different techniques. First, caching is not scaling. You could have several application servers balanced with nginx as the front end, in addition to hardware balancer(s).
To scale on the database side, you can go pretty far with read slaves in MySQL/PostgreSQL if you go the RDBMS way.
Some good examples of heavy traffic websites in Django could be:
It’s a little old, but someone from the LA Times gave a basic overview of why they went with Django.
The Onion’s AV Club was recently moved from (I think Drupal) to Django.
I imagine a number of these sites probably get well over 100k+ hits per day. Django can certainly do 100k hits/day and more. But YMMV in getting your particular site there, depending on what you’re building.
There are caching options at the Django level (for example caching querysets and views in memcached can work wonders) and beyond (upstream caches like Squid). Database server specifications will also be a factor (and usually the place to splurge), as is how well you’ve tuned it. Don’t assume, for example, that Django’s going to set up indexes properly. Don’t assume that the default PostgreSQL or MySQL configuration is the right one.
Furthermore, you always have the option of having multiple application servers running Django if that is the slow point, with a software or hardware load balancer in front.
Finally, are you serving static content on the same server as Django? Are you using Apache or something like nginx or lighttpd? Can you afford to use a CDN for static content? These are things to think about, but it’s all very speculative. 100k hits/day isn’t the only variable: how much do you want to spend? How much expertise do you have managing all these components? How much time do you have to pull it all together?
I have been using Django for over a year now, and am very impressed with how it manages to combine modularity, scalability and speed of development. Like with any technology, it comes with a learning curve. However, this learning curve is made a lot less steep by the excellent documentation from the Django community. Django has been able to handle everything I have thrown at it really well. It looks like it will be able to scale well into the future.
BidRodeo Penny Auctions is a moderately sized Django powered website. It is a very dynamic website and does handle a good number of page views a day.
Note that if you’re expecting 100K users per day, that are active for hours at a time (meaning max of 20K+ concurrent users), you’re going to need A LOT of servers. SO has ~15,000 registered users, and most of them are probably not active daily. While the bulk of traffic comes from unregistered users, I’m guessing that very few of them stay on the site more than a couple minutes (i.e. they follow google search results then leave).
For that volume, expect at least 30 servers … which is still a rather heavy 1,000 concurrent users per server.
Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
Yes, but use proper architecture, database design, caching, load balancers, and multiple servers or nodes.
Could a site like Stack Overflow run on Django?
Yes; just follow the answer to the second question above.
If you have a site with some static content, then putting a Varnish server in front will dramatically increase your performance. Even a single box can then easily spit out 100 Mbit/s of traffic.
Note that with dynamic content, using something like Varnish becomes a lot more tricky.
My experience with Django is minimal but I do remember in The Django Book they have a chapter where they interview people running some of the larger Django applications. Here is a link. I guess it could provide some insights.
It says curse.com is one of the largest Django applications, with around 60-90 million page views per month.
I develop high-traffic sites using Django for the national broadcaster in Ireland. It works well for us. Developing a high-performance site is about more than just choosing a framework. A framework will only be one part of a system that is as strong as its weakest link. Using the latest framework ‘X’ won’t solve your performance issues if the problem is slow database queries or a badly configured server or network.
Even though there have been a lot of great answers here, I just feel like pointing out that nobody has put emphasis on…
It depends on the application
If your application is light on writes (you read a lot more data from the DB than you write), then scaling Django should be fairly trivial. Heck, it comes with some fairly decent output/view caching straight out of the box. Make use of that, with, say, Redis as a cache provider; put a load balancer in front of it, spin up n instances, and you should be able to deal with a VERY large amount of traffic.
Now, if you have to do thousands of complex writes a second? Different story. Is Django going to be a bad choice? Well, not necessarily; it depends on how you architect your solution, really, and on what your requirements are.
If you want to use open source, there are many options for you. But Python is the best among them, as it has many libraries and a super awesome community.
These are a few reasons which might change your mind:
Python is very good, but it is an interpreted language, which makes it slow. However, many accelerators and caching services exist which partly solve this problem.
If you are thinking about rapid development then Ruby on Rails is best among all. The main motto of this(ROR) framework is to give a comfortable experience to the developers. If you compare Ruby and Python both have nearly the same syntax.
Google App Engine is a very good service, but it will bind you within its scope; you don’t get the chance to experiment with new things. Instead you can use the Digital Ocean cloud, which charges only $5/month for its simplest droplet. Heroku is another free service where you can deploy your product.
Yes! Yes! What you heard is totally correct, but here are some examples of sites using other technologies:
Rails: Github, Twitter(previously), Shopify, Airbnb, Slideshare, Heroku etc.
PHP: Facebook, Wikipedia, Flickr, Yahoo, Tumblr, Mailchimp etc.
The conclusion is that a framework or language won’t do everything for you. A better architecture, design and strategy will give you a scalable website. Instagram is the biggest example; that small team manages such huge amounts of data. Here is one blog about its architecture; you must read it.
I don’t think the issue is really about Django scaling.
I really suggest you look into your architecture; that’s what will help you with your scaling needs. If you get that wrong there is no point in how well Django performs. Performance != scale. You can have a system with amazing performance that does not scale, and vice versa.
Is your application database bound? If it is then your scale issues lay there as well. How are you planning on interacting with the database from Django? What happens when you database cannot process requests as fast as Django accepts them? What happens when your data outgrows one physical machine. You need to account for how you plan on dealing with those circumstances.
Moreover, what happens when your traffic outgrows one app server? How you handle sessions in this case can be tricky; more often than not you would probably require a shared-nothing architecture. Again, that depends on your application.
In short, a language is not what determines scale; a language is responsible for performance (and again, depending on your application, different languages perform differently). It is your design and architecture that make scaling a reality.
I hope it helps, would be glad to help further if you have questions.
Spreading the tasks evenly, in short optimizing each and every aspect including DBs, files, images, CSS etc., and balancing the load with several other resources is necessary once your site/application starts growing, or to make more room for it to grow. Implementation of technologies like CDNs and the cloud is a must for huge sites. Just developing and tweaking an application won’t give you hundred percent satisfaction; other components also play an important role.
What is the purpose of the self word in Python? I understand it refers to the specific object created from that class, but I can’t see why it explicitly needs to be added to every function as a parameter. To illustrate, in Ruby I can do this:
class myClass
  def myFunc(name)
    @name = name
  end
end
Which I understand, quite easily. However in Python I need to include self:
class myClass:
    def myFunc(self, name):
        self.name = name
Can anyone talk me through this? It is not something I’ve come across in my (admittedly limited) experience.
The reason you need to use self. is because Python does not use the @ syntax to refer to instance attributes. Python decided to do methods in a way that makes the instance to which the method belongs be passed automatically, but not received automatically: the first parameter of methods is the instance the method is called on. That makes methods entirely the same as functions, and leaves the actual name to use up to you (although self is the convention, and people will generally frown at you when you use something else.) self is not special to the code, it’s just another object.
Python could have done something else to distinguish normal names from attributes — special syntax like Ruby has, or requiring declarations like C++ and Java do, or perhaps something yet more different — but it didn’t. Python’s all for making things explicit, making it obvious what’s what, and although it doesn’t do it entirely everywhere, it does do it for instance attributes. That’s why assigning to an instance attribute needs to know what instance to assign to, and that’s why it needs self.
Answer 1
Let's look at a simple vector class:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
So the whole structure stays the same. How can we make use of this? If we assume for a moment that we hadn’t written a length method for our Vector class, we could do this:
Vector.length_new = length_global
v = Vector(3, 4)
print(v.length_new()) # 5.0
This works because the first parameter of length_global can be reused as the self parameter in length_new. This would not be possible without an explicit self.
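For context, the length_global function referred to above is not shown in this excerpt; here is a minimal sketch of what it presumably looks like (a plain function computing the Euclidean norm, whose first parameter plays the role of self):

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def length_global(vector):
    # an ordinary function; its first parameter plays the role of self
    return (vector.x ** 2 + vector.y ** 2) ** 0.5

# attach the plain function to the class after the fact
Vector.length_new = length_global

v = Vector(3, 4)
print(v.length_new())  # 5.0
```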
Another way of understanding the need for the explicit self is to see where Python adds some syntactical sugar. When you keep in mind, that basically, a call like
v_instance.length()
is internally transformed to
Vector.length(v_instance)
it is easy to see where the self fits in. You don’t actually write instance methods in Python; what you write is class methods which must take an instance as a first parameter. And therefore, you’ll have to place the instance parameter somewhere explicitly.
When objects are instantiated, the object itself is passed into the self parameter.
Because of this, the object’s data is bound to the object. Notice how ‘self’ effectively stands in for the particular object being operated on: the object is passed into the self parameter so that the object can keep hold of its own data.
Although this may not be wholly accurate, think of the process of instantiating an object like this: when an object is made, it uses the class as a template for its own data and methods. Without passing its own reference into the self parameter, the attributes and methods in the class would remain a general template and would not be referenced to (belong to) the object. By passing the object’s reference into the self parameter, if 100 objects are instantiated from the one class, they can all keep track of their own data and methods.
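The point about many instances each tracking their own data can be checked directly; a small sketch (the Tagged class is a made-up example):

```python
class Tagged:
    def __init__(self, n):
        self.n = n  # stored on the instance via self

# 100 objects from one class, each holding its own value
objs = [Tagged(i) for i in range(100)]
print(objs[0].n, objs[42].n, objs[99].n)  # 0 42 99
```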
Answer 4
I like this example:
class A:
    foo = []

a, b = A(), A()
a.foo.append(5)
b.foo
ans: [5]

class A:
    def __init__(self):
        self.foo = []

a, b = A(), A()
a.foo.append(5)
b.foo
ans: []
Classes are just a way to avoid passing in this “state” thing all the time (and other nice things like initializing, class composition, the rarely-needed metaclasses, and supporting custom methods to override operators).
Now let’s demonstrate the above code using the built-in python class machinery, to show how it’s basically the same thing.
class State(object):
    def __init__(self):
        self.field = 'init'

    def add(self, x):
        self.field += x

    def mult(self, x):
        self.field *= x

s = State()
s.add('added')  # self is implicitly passed in
s.mult(2)       # self is implicitly passed in
print(s.field)
As in Modula-3, there are no shorthands [in Python] for referencing the object’s members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call.
Often, the first argument of a method is called self. This is nothing more than a convention: the name self has absolutely no special meaning to Python. Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.
As well as all the other reasons already stated, it allows for easier access to overridden methods; you can call Class.some_method(inst).
An example of where it’s useful:
class C1(object):
    def __init__(self):
        print "C1 init"

class C2(C1):
    def __init__(self):        # overrides C1.__init__
        print "C2 init"
        C1.__init__(self)      # but we still want C1 to init the class too
Unlike Java or C++, Python is not a language built exclusively for object-oriented programming.
When calling a static method in Python, one simply writes a method with regular arguments inside it.
class Animal():
    def staticMethod():
        print "This is a static method"
However, an object method, which requires you to create an instance (an Animal, in this case), needs the self argument:
class Animal():
    def objectMethod(self):
        print "This is an object method which needs an instance of a class"
self is also used to refer to a variable field within the class:
class Animal():
    # animalName made in constructor
    def __init__(self):
        self.animalName = ""

    def getAnimalName(self):
        return self.animalName
In this case, self is referring to the animalName attribute of the instance. REMEMBER: a variable defined inside a method without self exists only while that method is running. For defining fields (attributes that belong to the whole object), you have to attach them to self inside the class methods, as the constructor does above.
If you don’t understand a single word of what I am saying, then Google “Object Oriented Programming.” Once you understand this, you won’t even need to ask that question :).
It’s there to follow the Python zen “explicit is better than implicit”. It’s indeed a reference to your class object. In Java and PHP, for example, it’s called this.
If user_type_name is a field on your model you access it by self.user_type_name.
First of all, self is a conventional name; you could put anything else (as long as you are consistent) in its stead.
It refers to the object itself, so when you are using it, you are declaring that .name and .age are properties of the Student objects (note, not of the Student class) you are going to create.
class Student:
    # called each time you create a new Student instance
    def __init__(self, name, age):  # special method to initialize
        self.name = name
        self.age = age

    def __str__(self):  # special method called for example when you use print
        return "Student %s is %s years old" % (self.name, self.age)

    def call(self, msg):  # silly example for custom method
        return ("Hey, %s! " + msg) % self.name

# initializing two instances of the student class
bob = Student("Bob", 20)
alice = Student("Alice", 19)

# using them
print bob.name
print bob.age
print alice                     # this one only works if you define the __str__ method
print alice.call("Come here!")  # notice you don't put a value for self

# you can modify attributes, like when alice ages
alice.age = 20
print alice
self is a reference to the object itself; therefore, they are the same.
Python methods are not called in the context of the object itself; self in Python may be used to deal with custom object models, among other things.
The use of the argument conventionally called self isn’t hard to understand; the harder question is why it is necessary, or why it must be mentioned explicitly. That, I suppose, is the bigger question for most users who look up this question, and if it is not, they will certainly have the same question as they move forward learning Python.
The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called.
Another thing I would like to add: leaving out the self argument allows me to declare static methods inside a class, simply by not writing self.
Code examples:
class MyClass():
    def staticMethod():
        print("This is a static method")

    def objectMethod(self):
        print("This is an object method which needs an instance of a class, and that is what self refers to")
PS: This works only in Python 3.x. In previous versions, you have to explicitly add the @staticmethod decorator, otherwise the self argument is obligatory.
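For code that must run on both Python 2 and 3, the decorator form is the safe choice; a small sketch:

```python
class MyClass(object):
    @staticmethod
    def static_method():
        # no self: the method belongs to the class, not to an instance
        return "This is a static method"

    def object_method(self):
        return "This is an object method; self is the instance"

print(MyClass.static_method())    # callable without an instance
print(MyClass().object_method())  # needs an instance to supply self
```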
I’m surprised nobody has brought up Lua. Lua also uses a ‘self’ variable, though it can be left implicit (with the colon syntax) and still be used. C++ does the same with ‘this’. I don’t see any reason to have to declare ‘self’ in each function, but you should still be able to use it just like you can in Lua and C++. For a language that prides itself on being brief, it’s odd that it requires you to declare the self variable.
Answer 15
Look at the following example, which clearly illustrates self:

class Restaurant(object):
    bankrupt = False

    def open_branch(self):
        if not self.bankrupt:
            print("branch opened")

# create instance1
>>> x = Restaurant()
>>> x.bankrupt
False

# create instance2
>>> y = Restaurant()
>>> y.bankrupt = True
>>> y.bankrupt
True
>>> x.bankrupt
False
It is because of the way Python is designed that the alternatives would hardly work. Python is designed to allow methods or functions to be defined in a context where neither an implicit this (a la Java/C++) nor an explicit @ (a la Ruby) would work. Let’s look at an example of the explicit approach with Python conventions:
def fubar(x):
    self.x = x

class C:
    frob = fubar
Now the fubar function wouldn’t work, since it would assume that self is a global variable (and in frob as well). The alternative would be to execute methods with a replaced global scope (where self is the object).
The implicit approach would be
def fubar(x):
    myX = x

class C:
    frob = fubar
This would mean that myX would be interpreted as a local variable in fubar (and in frob as well). The alternative here would be to execute methods with a replaced local scope which is retained between calls, but that would remove the possibility of method-local variables.
However the current situation works out well:
def fubar(self, x):
    self.x = x

class C:
    frob = fubar
Here, when called as a method, frob will receive the object on which it’s called via the self parameter, and fubar can still be called with an object as a parameter and work the same (it is the same as C.frob, I think).
In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called.
self, as a name, is just a convention; call it what you want! But when using it, for example to delete the object, you have to use the same name: __del__(var), where var was used in __init__(var, [...]).
You should take a look at cls too, to have the bigger picture. This post could be helpful.
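To sketch the bigger picture: cls plays the same role for class methods that self plays for instance methods. The Counter class here is a made-up example, not from the linked post:

```python
class Counter:
    created = 0  # class attribute, shared by all instances

    def __init__(self):
        # type(self) resolves to the class, so this bumps the shared count
        type(self).created += 1

    @classmethod
    def how_many(cls):
        # cls is the class object itself, not an instance
        return cls.created

Counter()
Counter()
print(Counter.how_many())  # 2
```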
Answer 18
self acts like the current object’s name, i.e. the instance of the class.
# Self explanation.
class classname(object):
    def __init__(self, name):
        self.name = name
        # Self is acting as a replacement of object name.
        # self.name = object1.name

    def display(self):
        print("Name of the person is :", self.name)
        print("object name:", object1.name)

object1 = classname("Bucky")
object2 = classname("ford")

object1.display()
object2.display()

###### Output
Name of the person is : Bucky
object name: Bucky
Name of the person is : ford
object name: Bucky
If we stuck to functional programming, we would not need self. Once we enter Python OOP, we find self there.
Here is the typical use case: a class C with the method m1.
class C:
    def m1(self, arg):
        print(self, ' inside')
        pass

ci = C()
print(ci, ' outside')
ci.m1(None)
print(hex(id(ci)))  # hex memory address
This program will output:
<__main__.C object at 0x000002B9D79C6CC0> outside
<__main__.C object at 0x000002B9D79C6CC0> inside
0x2b9d79c6cc0
So self holds a reference to the class instance (the same memory address in both printouts). The purpose of self is to hold that reference for instance methods, giving us explicit access to it.
Note there are three different types of class methods: instance methods, class methods (@classmethod) and static methods (@staticmethod).
the special thing about methods is that the instance object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s instance object before the first argument.
and the related snippet that precedes this in the tutorial:
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'
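Running the snippet confirms the equivalence quoted above: calling the method on the instance and calling the function on the class with the instance as first argument give the same result.

```python
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

x = MyClass()
# the two calls below are exactly equivalent
print(x.f())         # hello world
print(MyClass.f(x))  # hello world
```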
Further improve readability by adding flags indent=4, sort_keys=True (as suggested by dinos66) to arguments of dump or dumps. This way you’ll get a nicely indented sorted structure in the json file at the cost of a slightly larger file size.
I would answer with a slight modification to the aforementioned answers: write a prettified JSON file that human eyes can read better. For this, pass sort_keys as True and indent with 4 space characters and you are good to go. Also take care to ensure that ascii codes will not be written in your JSON file:
import json

with open('data.txt', 'w') as outfile:
    json.dump(jsonData, outfile, sort_keys=True, indent=4,
              ensure_ascii=False)
Answer 3
Read and write JSON files with Python 2 + 3; works with unicode:

# -*- coding: utf-8 -*-
import json

# Make it work for Python 2+3 and with Unicode
import io

try:
    to_unicode = unicode
except NameError:
    to_unicode = str

# Define data
data = {'a list': [1, 42, 3.141, 1337, 'help', u'€'],
        'a string': 'bla',
        'another dict': {'foo': 'bar',
                         'key': 'value',
                         'the answer': 42}}

# Write JSON file
with io.open('data.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(',', ': '), ensure_ascii=False)
    outfile.write(to_unicode(str_))

# Read JSON file
with open('data.json') as data_file:
    data_loaded = json.load(data_file)

print(data == data_loaded)
For those of you who are trying to dump greek or other “exotic” languages such as me, but are also having problems (unicode errors) with weird characters such as the peace symbol (\u262E) or others which are often contained in JSON-formatted data such as Twitter’s, the solution could be as follows (sort_keys is obviously optional):
import codecs, json

with codecs.open('data.json', 'w', 'utf8') as f:
    f.write(json.dumps(data, sort_keys=True, ensure_ascii=False))
I don’t have enough reputation to add in comments, so I just write some of my findings of this annoying TypeError here:
Basically, I think it’s a bug in the json.dump() function in Python 2 only: it can’t dump Python (dictionary/list) data containing non-ASCII characters, even if you open the file with the encoding = 'utf-8' parameter (i.e. no matter what you do). But json.dumps() works on both Python 2 and 3.
To illustrate this, following up phihag’s answer: the code in his answer breaks in Python 2 with exception TypeError: must be unicode, not str, if data contains non-ASCII characters. (Python 2.7.6, Debian):
import json

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
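A workaround that behaves the same on Python 2 and 3 is to serialize with json.dumps first and write the resulting text through io.open, which handles the encoding on both versions. This is a sketch reusing the data and filename from the example above:

```python
import io
import json

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

# json.dumps returns text, which io.open can encode on write
with io.open('data.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(json.dumps(data, ensure_ascii=False))
```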
Also, if you need to debug improperly formatted JSON and want a helpful error message, use the simplejson library instead of json (the functions should be the same).
import json

with open('data.txt') as json_file:
    data = json.load(json_file)
    for p in data['people']:
        print('Name: ' + p['name'])
        print('Website: ' + p['website'])
        print('From: ' + p['from'])
        print('')
All previous answers are correct; here is a very simple example:
#! /usr/bin/env python
import json

def write_json():
    # create a dictionary
    student_data = {"students": []}
    # create a list
    data_holder = student_data["students"]
    # just a counter
    counter = 0
    # loop through if you have multiple items..
    while counter < 3:
        data_holder.append({'id': counter})
        data_holder.append({'room': counter})
        counter += 1
    # write the file
    file_path = '/tmp/student_data.json'
    with open(file_path, 'w') as outfile:
        print("writing file to: ", file_path)
        # HERE IS WHERE THE MAGIC HAPPENS
        json.dump(student_data, outfile)
    # no explicit close needed: the with block closes the file
    print("done")

write_json()
The column names (which are strings) cannot be sliced in the manner you tried.
Here you have a couple of options. If you know from context which variables you want to slice out, you can just return a view of only those columns by passing a list into the __getitem__ syntax (the []’s).
df1 = df[['a','b']]
Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:
df1 = df.iloc[:,0:2] # Remember that Python does not slice inclusive of the ending index.
Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).
Sometimes, however, there are indexing conventions in Pandas that don’t do this and instead give you a new variable that just refers to the same chunk of memory as the sub-object or slice in the original object. This can happen with the second way of indexing, so you can use the copy() function to get a regular copy. When this happens, changing what you think is the sliced object can sometimes alter the original object. Always good to be on the lookout for this.
df1 = df.iloc[0,0:2].copy() # To avoid the case where changing df1 also changes df
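A quick way to convince yourself of the copy behavior (a toy frame with made-up values):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# .copy() guarantees an independent object: edits don't touch df
sub = df.iloc[:, 0:2].copy()
sub.loc[0, 'a'] = 99

print(df.loc[0, 'a'])   # 1 -- the original is unchanged
print(sub.loc[0, 'a'])  # 99
```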
To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices you can use iloc along with the get_loc function of the dataframe’s columns attribute to obtain column indices:

{df.columns.get_loc(c): c for c in df.columns}

Now you can use this dictionary to access columns through their names and using iloc.
import pandas as pd
import numpy as np

np.random.seed(5)

df = pd.DataFrame(np.random.randint(100, size=(100, 6)),
                  columns=list('ABCDEF'),
                  index=['R{}'.format(i) for i in range(100)])

df.head()
Out:
     A   B   C   D   E   F
R0  99  78  61  16  73   8
R1  62  27  30  80   7  76
R2  15  53  80  27  44  77
R3  75  65  47  30  84  86
R4  18   9  41  62   1  82
To get the columns from C to E (note that unlike integer slicing, ‘E’ is included in the columns):
df.loc[:, 'C':'E']
Out:
      C   D   E
R0   61  16  73
R1   30  80   7
R2   80  27  44
R3   47  30  84
R4   41  62   1
R5    5  58   0
...
Same works for selecting rows based on labels. Get the rows ‘R6’ to ‘R10’ from those columns:
df.loc['R6':'R10', 'C':'E']
Out:
C D E
R6 51 27 31
R7 83 19 18
R8 11 67 65
R9 78 27 29
R10 7 16 94
.loc also accepts a boolean array so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool) – True if the column name is in the list ['B', 'C', 'D']; False, otherwise.
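A small sketch of the boolean-mask selection described above (toy data, single row of made-up values):

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=list('ABCDEF'))

mask = df.columns.isin(list('BCD'))
print(list(mask))                        # [False, True, True, True, False, False]
print(df.loc[:, mask].columns.tolist())  # ['B', 'C', 'D']
```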
Assuming your column names (df.columns) are ['index', 'a', 'b', 'c'], then the data you want is in the 3rd and 4th columns. If you don’t know their names when your script runs, you can do this:
newdf = df[df.columns[2:4]] # Remember, Python is 0-offset! The "3rd" entry is at slot 2.
As EMS points out in his answer, df.ix slices columns a bit more concisely, but the .columns slicing interface might be more natural because it uses the vanilla 1-D python list indexing/slicing syntax.
WARN: 'index' is a bad name for a DataFrame column. The same label is also used for the real df.index attribute, an Index array. So your column is returned by df['index'] and the real DataFrame index is returned by df.index. An Index is a special kind of Series optimized for lookup of its elements’ values. For df.index it’s for looking up rows by their label. The df.columns attribute is also a pd.Index array, for looking up columns by their labels.
Answer 3
In [39]: df
Out[39]:
   index  a  b  c
0      1  2  3  4
1      2  3  4  5

In [40]: df1 = df[['b', 'c']]

In [41]: df1
Out[41]:
   b  c
0  3  4
1  4  5
I realize this question is quite old, but in the latest version of pandas there is an easy way to do exactly this. Column names (which are strings) can be sliced in whatever manner you like.
You could provide a list of columns to be dropped and return back the DataFrame with only the columns needed using the drop() function on a Pandas DataFrame.
Just saying
colsToDrop = ['a']
df.drop(colsToDrop, axis=1)
would return a DataFrame with just the columns b and c.
Starting with 0.21.0, using .loc or [] with a list with one or more missing labels is deprecated in favor of .reindex. So, the answer to your question is:
df1 = df.reindex(columns=['b','c'])
In prior versions, using .loc[list-of-labels] would work as long as at least 1 of the keys was found (otherwise it would raise a KeyError). This behavior is deprecated and now shows a warning message. The recommended alternative is to use .reindex().
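One behavioral detail worth knowing: .reindex never raises on a missing label; it fills the missing column with NaN instead. A toy example (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

out = df.reindex(columns=['b', 'z'])  # 'z' does not exist in df
print(out.columns.tolist())           # ['b', 'z']
print(out['z'].isna().all())          # True -- missing labels become NaN
```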
df1 = pd.DataFrame()  # creating an empty dataframe
for index, i in df.iterrows():
    df1.loc[index, 'A'] = df.loc[index, 'A']
    df1.loc[index, 'B'] = df.loc[index, 'B']
df1.head()
The different approaches discussed in the above responses are based on the assumption that either the user knows column indices to drop or subset on, or the user wishes to subset a dataframe using a range of columns (for instance ‘C’:‘E’). pandas.DataFrame.drop() is certainly an option to subset data based on a list of columns defined by the user (though you have to be cautious to always use a copy of the dataframe, and the inplace parameter should not be set to True!!).
Another option is df.columns.difference(), which does a set difference on column names and returns an Index containing the desired columns.
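A minimal sketch of that approach (the frame and the list of unwanted columns are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

unwanted = ['a']  # hypothetical list of columns to exclude
kept = df[df.columns.difference(unwanted)]
print(kept.columns.tolist())  # ['b', 'c']
```

Note that Index.difference returns its result sorted, so the surviving columns may come back in alphabetical order rather than their original order.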
I’ve seen several answers on that, but one remained unclear to me: how would you select those columns of interest? The answer is that if you have them gathered in a list, you can just reference the columns using that list.
I have the following list/numpy array extracted_features, specifying 63 columns. The original dataset has 103 columns, and I would like to extract exactly those, so I would use

dataset[extracted_features]

and end up with a dataframe containing only those 63 columns. This is something you would use quite often in machine learning (more specifically, in feature selection). I would like to discuss other ways too, but I think that has already been covered by other stackoverflowers. Hope this has been helpful!
You can use pandas.DataFrame.filter method to either filter or reorder columns like this:
df1 = df.filter(['a', 'b'])
Answer 16
df[['a', 'b']]           # select all rows of columns 'a' and 'b'
df.loc[0:10, ['a', 'b']] # index 0 to 10, columns 'a' and 'b'
df.loc[0:10, 'a':'b']    # index 0 to 10, columns 'a' to 'b'
df.iloc[0:10, 3:5]       # index 0 to 10 and columns 3 to 5
df.iloc[3, 3:5]          # index 3 of columns 3 to 5