How do I remove leading and trailing whitespace from a string in Python?
For example:
" Hello " --> "Hello"
" Hello" --> "Hello"
"Hello " --> "Hello"
"Bob has a cat" --> "Bob has a cat"
Answer 0
Just one space, or all consecutive spaces? If the second, then strings already have a .strip() method:
>>> ' Hello '.strip()
'Hello'
>>> ' Hello'.strip()
'Hello'
>>> 'Bob has a cat'.strip()
'Bob has a cat'
>>> '   Hello   '.strip()  # ALL consecutive spaces at both ends removed
'Hello'
If you need only to remove one space however, you could do it with:
def strip_one_space(s):
    if s.endswith(" "): s = s[:-1]
    if s.startswith(" "): s = s[1:]
    return s

>>> strip_one_space("  Hello ")
' Hello'
Also, note that str.strip() removes other whitespace characters as well (e.g. tabs and newlines). To remove only spaces, pass the character to remove as an argument to strip.
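For example, a minimal sketch showing the difference between the default and passing `' '` explicitly:

```python
s = '\t Hello \n'

# strip() removes ALL leading/trailing whitespace, including tabs and newlines
print(repr(s.strip()))     # 'Hello'

# strip(' ') removes only spaces from the ends; the tab and newline block it here
print(repr(s.strip(' ')))  # '\t Hello \n'
```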
I wanted to remove the extra spaces in a string (not only at the beginning or end, but also in between). I made this, because I don’t know how to do it otherwise:
string = "Name : David  Account: 1234  Another thing: something  "
ready = False
while ready == False:
    pos = string.find("  ")
    if pos != -1:
        string = string.replace("  ", " ")
    else:
        ready = True
print(string)
This replaces double spaces with a single space until there are no double spaces any more.
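The same collapsing can be sketched more concisely with the re module instead of a loop (collapse_spaces is a hypothetical helper name, not from the original answer):

```python
import re

def collapse_spaces(text):
    # Replace any run of two or more spaces with a single space
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("Name : David  Account: 1234"))  # Name : David Account: 1234
```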
Answer 6
I could not find a solution to what I was looking for so I created some custom functions. You can try them out.
def cleansed(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    # return trimmed(s.replace('"', '').replace("'", ""))
    return trimmed(s)

def trimmed(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    ss = trim_start_and_end(s).replace('  ', ' ')
    while '  ' in ss:
        ss = ss.replace('  ', ' ')
    return ss

def trim_start_and_end(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    return trim_start(trim_end(s))

def trim_start(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    chars = []
    for c in s:
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(chars).lower()

def trim_end(s: str):
    """:param s: String to be cleansed"""
    assert s not in (None, "")
    chars = []
    for c in reversed(s):
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(reversed(chars)).lower()
s1 = ' b Beer '
s2 = 'Beer b '
s3 = ' Beer b '
s4 = ' bread butter Beer b '
cdd = trim_start(s1)
cddd = trim_end(s2)
clean1 = cleansed(s3)
clean2 = cleansed(s4)
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s1, len(s1), cdd, len(cdd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s2, len(s2), cddd, len(cddd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s3, len(s3), clean1, len(clean1)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s4, len(s4), clean2, len(clean2)))
Answer 7
If you want to trim specified number of spaces from left and right, you could do this:
def remove_outer_spaces(text, num_of_leading, num_of_trailing):
    text = list(text)
    for i in range(num_of_leading):
        if text[i] == " ":
            text[i] = ""
        else:
            break
    for i in range(1, num_of_trailing + 1):
        if text[-i] == " ":
            text[-i] = ""
        else:
            break
    return ''.join(text)

txt1 = "   MY name is    "
print(remove_outer_spaces(txt1, 1, 1))  # result is: "  MY name is   "
print(remove_outer_spaces(txt1, 2, 3))  # result is: " MY name is "
print(remove_outer_spaces(txt1, 6, 8))  # result is: "MY name is"
How do I remove leading and trailing whitespace from a string in Python?
The solution below removes leading and trailing whitespace as well as intermediate whitespace. Use it if you need clean string values without multiple spaces.
>>> str_1 = ' Hello World'
>>> print(' '.join(str_1.split()))
Hello World
>>> str_2 = '   Hello World'
>>> print(' '.join(str_2.split()))
Hello World
>>> str_3 = 'Hello World '
>>> print(' '.join(str_3.split()))
Hello World
>>> str_4 = 'Hello World   '
>>> print(' '.join(str_4.split()))
Hello World
>>> str_5 = ' Hello World '
>>> print(' '.join(str_5.split()))
Hello World
>>> str_6 = '   Hello   World   '
>>> print(' '.join(str_6.split()))
Hello World
>>> str_7 = 'Hello World'
>>> print(' '.join(str_7.split()))
Hello World
As you can see, this removes all the multiple whitespace in the string (the output is Hello World in every case); location doesn’t matter. But if you only need to strip leading and trailing whitespace, then strip() would be fine.
I am writing a quick-and-dirty script to generate plots on the fly. I am using the code below (from Matplotlib documentation) as a starting point:
from pylab import figure, axes, pie, title, show
# Make a square figure and axes
figure(1, figsize=(6, 6))
ax = axes([0.1, 0.1, 0.8, 0.8])
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
fracs = [15, 30, 45, 10]
explode = (0, 0.05, 0, 0)
pie(fracs, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True)
title('Raining Hogs and Dogs', bbox={'facecolor': '0.8', 'pad': 5})
show() # Actually, don't show, just save to foo.png
I don’t want to display the plot on a GUI, instead, I want to save the plot to a file (say foo.png), so that, for example, it can be used in batch scripts. How do I do that?
While the question has been answered, I’d like to add some useful tips when using matplotlib.pyplot.savefig. The file format can be specified by the extension:
from matplotlib import pyplot as plt
plt.savefig('foo.png')
plt.savefig('foo.pdf')
Will give a rasterized or vectorized output respectively, both of which could be useful. In addition, you’ll find that pylab leaves a generous, often undesirable, whitespace around the image. Remove it with:
plt.savefig('foo.png', bbox_inches='tight')
As others have said, plt.savefig() or fig1.savefig() is indeed the way to save an image.
However, I’ve found that in certain cases the figure is always shown (e.g. in Spyder with plt.ion(): interactive mode = On). I work around this by forcing the closing of the figure window in my giant loop with plt.close(figure_object) (see documentation), so I don’t have a million open figures during the loop:
import matplotlib.pyplot as plt
fig, ax = plt.subplots( nrows=1, ncols=1 ) # create figure & 1 axis
ax.plot([0,1,2], [10,20,3])
fig.savefig('path/to/save/image/to.png') # save the figure to file
plt.close(fig) # close the figure window
You should be able to re-open the figure later if needed to with fig.show() (didn’t test myself).
They say that the easiest way to prevent the figure from popping up is to use a non-interactive backend (e.g. Agg), via matplotlib.use(<backend>), e.g.:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.savefig('myfig')
I still personally prefer using plt.close( fig ), since then you have the option to hide certain figures (during a loop), but still display figures for post-loop data processing. It is probably slower than choosing a non-interactive backend though – would be interesting if someone tested that.
UPDATE: for Spyder, you usually can’t set the backend in this way (Because Spyder usually loads matplotlib early, preventing you from using matplotlib.use()).
Instead, use plt.switch_backend('Agg'), or Turn off “enable support” in the Spyder prefs and run the matplotlib.use('Agg') command yourself.
The other answers are correct. However, I sometimes find that I want to open the figure object later. For example, I might want to change the label sizes, add a grid, or do other processing. In a perfect world, I would simply rerun the code generating the plot, and adapt the settings. Alas, the world is not perfect. Therefore, in addition to saving to PDF or PNG, I add:
import pickle

with open('some_file.pkl', "wb") as fp:
    pickle.dump(fig, fp, protocol=4)
Like this, I can later load the figure object and manipulate the settings as I please.
I also write out the stack with the source-code and locals() dictionary for each function/method in the stack, so that I can later tell exactly what generated the figure.
NB: Be careful, as sometimes this method generates huge files.
Answer 6
import datetime
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
# Create the PdfPages object to which we will save the pages:
# The with statement makes sure that the PdfPages object is closed properly at
# the end of the block, even if an Exception occurs.
with PdfPages('multipage_pdf.pdf') as pdf:
    plt.figure(figsize=(3, 3))
    plt.plot(range(7), [3, 1, 4, 1, 5, 9, 2], 'r-o')
    plt.title('Page One')
    pdf.savefig()  # saves the current figure into a pdf page
    plt.close()

    plt.rc('text', usetex=True)
    plt.figure(figsize=(8, 6))
    x = np.arange(0, 5, 0.1)
    plt.plot(x, np.sin(x), 'b-')
    plt.title('Page Two')
    pdf.savefig()
    plt.close()

    plt.rc('text', usetex=False)
    fig = plt.figure(figsize=(4, 5))
    plt.plot(x, x*x, 'ko')
    plt.title('Page Three')
    pdf.savefig(fig)  # or you can pass a Figure object to pdf.savefig
    plt.close()

    # We can also set the file's metadata via the PdfPages object:
    d = pdf.infodict()
    d['Title'] = 'Multipage PDF Example'
    d['Author'] = u'Jouni K. Sepp\xe4nen'
    d['Subject'] = 'How to create a multipage pdf file and set its metadata'
    d['Keywords'] = 'PdfPages multipage keywords author title subject'
    d['CreationDate'] = datetime.datetime(2009, 11, 13)
    d['ModDate'] = datetime.datetime.today()
Answer 7
After using the plot() and other functions to create the content you want, you could use a clause like this to select between plotting to the screen or to file:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4, 5)) # size in inches
# use plot(), etc. to create your plot.
# Pick one of the following lines to uncomment
# save_file = None
# save_file = os.path.join(your_directory, your_file_name)
if save_file:
    plt.savefig(save_file)
    plt.close(fig)
else:
    plt.show()
import matplotlib.pyplot as plt
plt.savefig("image.png")
In Jupyter Notebook you have to remove plt.show() and add plt.savefig(), together with the rest of the plt-code in one cell.
The image will still show up in your notebook.
Given that today (this was not available when this question was asked) lots of people use Jupyter Notebook as a Python console, an extremely easy way to save plots as .png is to call matplotlib‘s pylab class from Jupyter Notebook, plot the figure ‘inline’ in jupyter cells, and then drag that figure/image to a local directory. Don’t forget
%matplotlib inline in the first line!
In addition to the answers above, I added __file__ to the name so the picture and the Python file get the same name. I also added a few arguments to make it look better:
# Saves a PNG file of the current graph to the folder and updates it every time
# (nameOfimage, dpi=(sizeOfimage), Keeps_Labels_From_Disappearing)
plt.savefig(__file__ + ".png", dpi=250, bbox_inches='tight')
# Hard coded name: './test.png'
Answer 16
When using matplotlib.pyplot, you must first save your plot and then close its figure window, using the following two lines:
fig.savefig('plot.png')  # save the plot; put the path you want to save the figure to in quotation marks
plt.close(fig)  # close the figure window
import matplotlib.pyplot as plt
plt.savefig("myfig.png")
For saving whatever IPython image you are displaying. Or, on a different note (looking from a different angle), if you ever get to work with OpenCV, or if you have OpenCV imported, you can go for:
import cv2
cv2.imwrite("myfig.png", image)
But this is just in case you need to work with OpenCV. Otherwise plt.savefig() should be sufficient.
I’ve got a Python program where two variables are set to the value 'public'. In a conditional expression I have the comparison var1 is var2 which fails, but if I change it to var1 == var2 it returns True.
Now if I open my Python interpreter and do the same “is” comparison, it succeeds.
Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.
The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fill up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented as a pointer equality test, normally just a single machine instruction with no memory reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn’t always happen, and the rules for when this happens are quite convoluted, so please don’t rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string — that is, it has a different identity, because it is stored in a different place in memory.
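A quick sketch of this: two equal string literals may share identity, while a string constructed at run time usually does not. The identity results are implementation details, so the sketch only prints them rather than relying on them:

```python
a = 'hello'
b = 'hello'                 # literal with the same value: typically interned
c = ''.join(['hel', 'lo'])  # built at run time: typically a distinct object

print(a == b, a == c)  # True True -- equality always holds
print(a is b)          # often True (interning), but not guaranteed
print(a is c)          # often False, but not guaranteed
```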
The is keyword is a test for object identity while == is a value comparison.
If you use is, the result will be true if and only if the object is the same object. However, == will be true any time the values of the object are the same.
One last thing to note, you may use the sys.intern function to ensure that you’re getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out above, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.
Note that the intern function used to be a builtin on Python 2 but was moved to the sys module in Python 3.
is is identity testing, == is equality testing. What this means is that is is a way to check whether two things are the same object, as opposed to merely equivalent.
Say you’ve got a simple Person object. If it is named ‘Jack’ and is 23 years old, it’s equivalent to another 23-year-old Jack, but it’s not the same person.
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        return self.name == other.name and self.age == other.age

jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)

jack1 == jack2  # True
jack1 is jack2  # False
They’re the same age, but they’re not the same instance of person. A string might be equivalent to another, but it’s not the same object.
From my limited experience with python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.
I think it has to do with the fact that, when the ‘is’ comparison evaluates to false, two distinct objects are being used. When it evaluates to true, the interpreter happened to reuse the same object (for example, through interning or caching) instead of creating a new one.
This is why you should be using the equality operator ==, not is, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, s2 ends up equal to ‘one’, but it is not the same object as s: it was produced at run time by replace(), so the interpreter did not reuse the object it used for the literal ‘one’. Had I initially assigned s2 the literal ‘one’, they might have been the same object.
I believe that this is known as “interned” strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python “is” operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
I am answering the question even though it is old, because no answer above quotes the language reference.
Actually, the is operator checks for identity and the == operator checks for equality.
From Language Reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that strings, which are an immutable type, may fail when checked with “is” and may succeed when checked with “==”.
The same applies to int and tuple, which are also immutable types.
The == operator tests value equivalence. The is operator tests object identity: Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refer to it. The reason is that Python internally caches and reuses some strings as an optimization; there really is just one string ‘banana’ in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a refers to an object and you assign b = a, then both variables refer to the same object:
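For instance, a minimal sketch of aliasing:

```python
a = [1, 2, 3]
b = a            # b is now another name for the same list object
print(b is a)    # True
b.append(4)
print(a)         # [1, 2, 3, 4] -- the change is visible through both names
```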
For more advanced Enum techniques try the aenum library (2.7, 3.3+, same author as enum34. Code is not perfectly compatible between py2 and py3, e.g. you’ll need __order__ in python 2).
To use enum34, do $ pip install enum34
To use aenum, do $ pip install aenum
Installing enum (no numbers) will install a completely different and incompatible version.
from enum import Enum # for enum34, or the stdlib version
# from aenum import Enum # for the aenum version
Animal = Enum('Animal', 'ant bee cat dog')
Animal.ant # returns <Animal.ant: 1>
Animal['ant'] # returns <Animal.ant: 1> (string lookup)
Animal.ant.name # returns 'ant' (inverse lookup)
or equivalently:
class Animal(Enum):
    ant = 1
    bee = 2
    cat = 3
    dog = 4
In earlier versions, one way of accomplishing enums, including support for converting the values back to names, is:
def enum(*sequential, **named):
    enums = dict(zip(sequential, range(len(sequential))), **named)
    reverse = dict((value, key) for key, value in enums.items())  # iteritems() on Python 2
    enums['reverse_mapping'] = reverse
    return type('Enum', (), enums)
This overwrites anything with that name, but it is useful for rendering your enums in output. It will throw KeyError if the reverse mapping doesn’t exist.
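Usage of the reverse mapping might look like this (Numbers is a hypothetical example enum; the sketch repeats the enum function so it is self-contained):

```python
def enum(*sequential, **named):
    # Build a simple enum type with a reverse (value -> name) mapping
    enums = dict(zip(sequential, range(len(sequential))), **named)
    reverse = dict((value, key) for key, value in enums.items())
    enums['reverse_mapping'] = reverse
    return type('Enum', (), enums)

Numbers = enum('ZERO', 'ONE', 'TWO')
print(Numbers.ONE)                 # 1
print(Numbers.reverse_mapping[1])  # ONE
# Numbers.reverse_mapping[42] would raise KeyError
```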
Before PEP 435, Python didn’t have an equivalent but you could implement your own.
Myself, I like keeping it simple (I’ve seen some horribly complex examples on the net), something like this …
class Animal:
    DOG = 1
    CAT = 2

x = Animal.DOG
In Python 3.4 (PEP 435), you can make Enum the base class. This gets you a little bit of extra functionality, described in the PEP. For example, enum members are distinct from integers, and they are composed of a name and a value.
class Animal(Enum):
    DOG = 1
    CAT = 2

print(Animal.DOG)
# <Animal.DOG: 1>
print(Animal.DOG.value)
# 1
print(Animal.DOG.name)
# "DOG"
If you don’t want to type the values, use the following shortcut:
class Animal(Enum):
    DOG, CAT = range(2)
Enum implementations can be converted to lists and are iterable. The order of its members is the declaration order and has nothing to do with their values. For example:
class Animal(Enum):
    DOG = 1
    CAT = 2
    COW = 0
list(Animal)
# [<Animal.DOG: 1>, <Animal.CAT: 2>, <Animal.COW: 0>]
[animal.value for animal in Animal]
# [1, 2, 0]
Animal.CAT in Animal
# True
Answer 2
Here is an implementation:
class Enum(set):
    def __getattr__(self, name):
        if name in self:
            return name
        raise AttributeError
If you need the numeric values, here’s the quickest way:
dog, cat, rabbit = range(3)
In Python 3.x you can also add a starred placeholder at the end, which will soak up all the remaining values of the range in case you don’t mind wasting memory and cannot count:
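A sketch of the starred placeholder (Python 3 only; the names are just illustrative):

```python
# The starred name soaks up all remaining values of the range
dog, cat, rabbit, *other_animals = range(10)
print(dog, cat, rabbit)  # 0 1 2
print(other_animals)     # [3, 4, 5, 6, 7, 8, 9]
```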
The best solution for you would depend on what you require from your fake enum.
Simple enum:
If you need the enum as only a list of names identifying different items, the solution by Mark Harrison (above) is great:
Pen, Pencil, Eraser = range(0, 3)
Using a range also allows you to set any starting value:
Pen, Pencil, Eraser = range(9, 12)
In addition to the above, if you also require that the items belong to a container of some sort, then embed them in a class:
class Stationery:
    Pen, Pencil, Eraser = range(0, 3)
To use the enum item, you would now need to use the container name and the item name:
stype = Stationery.Pen
Complex enum:
For long lists of enum or more complicated uses of enum, these solutions will not suffice. You could look to the recipe by Will Ware for Simulating Enumerations in Python published in the Python Cookbook. An online version of that is available here.
The typesafe enum pattern which was used in Java pre-JDK 5 has a number of advantages. Much like in Alexandru’s answer, you create a class and class-level fields are the enum values; however, the enum values are instances of the class rather than small integers. This has the advantage that your enum values don’t inadvertently compare equal to small integers, you can control how they’re printed, add arbitrary methods if that’s useful and make assertions using isinstance:
class Animal:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return "<Animal: %s>" % self

Animal.DOG = Animal("dog")
Animal.CAT = Animal("cat")
>>> x = Animal.DOG
>>> x
<Animal: dog>
>>> x == 1
False
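The isinstance assertion mentioned above can be sketched like this (repeating a minimal version of the class so the example is self-contained):

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<Animal: %s>" % self.name

Animal.DOG = Animal("dog")

# Enum values are instances of the class, so isinstance checks work:
print(isinstance(Animal.DOG, Animal))  # True
# ...and they never inadvertently compare equal to small integers:
print(Animal.DOG == 1)                 # False
```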
A recent thread on python-dev pointed out there are a couple of enum libraries in the wild, including:
>>> State = Enum(['Unclaimed', 'Claimed'])
>>> State.Claimed
1
>>> State[1]
'Claimed'
>>> State
('Unclaimed', 'Claimed')
>>> range(len(State))
[0, 1]
>>> [(k, State[k]) for k in range(len(State))]
[(0, 'Unclaimed'), (1, 'Claimed')]
>>> [(k, getattr(State, k)) for k in State]
[('Unclaimed', 0), ('Claimed', 1)]
Python doesn’t have a built-in equivalent to enum, and other answers have ideas for implementing your own (you may also be interested in the over the top version in the Python cookbook).
However, in situations where an enum would be called for in C, I usually end up just using simple strings: because of the way objects/attributes are implemented, (C)Python is optimized to work very fast with short strings anyway, so there wouldn’t really be any performance benefit to using integers. To guard against typos / invalid values you can insert checks in selected places.
On 2013-05-10, Guido agreed to accept PEP 435 into the Python 3.4 standard library. This means that Python finally has builtin support for enumerations!
There is a backport available for Python 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4. It’s on Pypi as enum34.
Declaration:
>>> from enum import Enum
>>> class Color(Enum):
... red = 1
... green = 2
... blue = 3
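Member access then works like this (a short sketch of the enum34 / stdlib Enum API):

```python
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

print(Color.red)        # Color.red
print(Color.red.name)   # red
print(Color.red.value)  # 1
print(Color(2))         # Color.green  (lookup by value)
print(Color['blue'])    # Color.blue   (lookup by name)
```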
class Animal:
    class Dog: pass
    class Cat: pass

x = Animal.Dog
It’s more bug-proof than using integers since you don’t have to worry about ensuring that the integers are unique (e.g. if you said Dog = 1 and Cat = 1 you’d be screwed).
It’s more bug-proof than using strings since you don’t have to worry about typos (e.g. x == "catt" fails silently, but x == Animal.Catt is a runtime exception).
Answer 11
def M_add_class_attribs(attribs):
    def foo(name, bases, dict_):
        for v, k in attribs:
            dict_[k] = v
        return type(name, bases, dict_)
    return foo

def enum(*names):
    class Foo(object):
        __metaclass__ = M_add_class_attribs(enumerate(names))
        def __setattr__(self, name, value):  # this makes it read-only
            raise NotImplementedError
    return Foo()
Hmmm… I suppose the closest thing to an enum would be a dictionary, defined either like this:
months = {
    'January': 1,
    'February': 2,
    ...
}
or
months = dict(
    January=1,
    February=2,
    ...
)
Then, you can use the symbolic name for the constants like this:
mymonth = months['January']
There are other options, like a list of tuples, or a tuple of tuples, but the dictionary is the only one that provides you with a “symbolic” (constant string) way to access the value.
Edit: I like Alexandru’s answer too!
Answer 13
Another very simple Python enum implementation, using namedtuple:
from collections import namedtuple

def enum(*keys):
    return namedtuple('Enum', keys)(*keys)

MyEnum = enum('FOO', 'BAR', 'BAZ')
Or, alternatively:
# With sequential number values
def enum(*keys):
    return namedtuple('Enum', keys)(*range(len(keys)))

# From a dict / keyword args
def enum(**kwargs):
    return namedtuple('Enum', kwargs.keys())(*kwargs.values())
Like the set-subclassing approach above, this allows:
'FOO' in MyEnum
other = MyEnum.FOO
assert other == MyEnum.FOO
Enumerations are created using the class syntax, which makes them easy to read and write. An alternative creation method is described in Functional API. To define an enumeration, subclass Enum as follows:
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3
回答 15
我用什么:
class Enum(object):
    def __init__(self, names, separator=None):
        self.names = names.split(separator)
        for value, name in enumerate(self.names):
            setattr(self, name.upper(), value)
    def tuples(self):
        return tuple(enumerate(self.names))
How to use it:
>>> state = Enum('draft published retracted')
>>> state.DRAFT
0
>>> state.RETRACTED
2
>>> state.FOO
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Enum' object has no attribute 'FOO'
>>> state.tuples()
((0, 'draft'), (1, 'published'), (2, 'retracted'))
It gives you a class, and the class contains all the enums. The enums can be compared to each other, but don’t have any particular value; you can’t use them as an integer value. (I resisted this at first because I am used to C enums, which are integer values. But if you can’t use it as an integer, you can’t use it as an integer by mistake so overall I think it is a win.) Each enum is a unique value. You can print enums, you can iterate over them, you can test that an enum value is “in” the enum. It’s pretty complete and slick.
Edit (cfi): The above link is not Python 3 compatible. Here’s my port of enum.py to Python 3:
def cmp(a, b):
    if a < b: return -1
    if b < a: return 1
    return 0

def Enum(*names):
    ##assert names, "Empty enums are not supported" # <- Don't like empty enums? Uncomment!

    class EnumClass(object):
        __slots__ = names
        def __iter__(self): return iter(constants)
        def __len__(self): return len(constants)
        def __getitem__(self, i): return constants[i]
        def __repr__(self): return 'Enum' + str(names)
        def __str__(self): return 'enum ' + str(constants)

    class EnumValue(object):
        __slots__ = ('__value')
        def __init__(self, value): self.__value = value
        Value = property(lambda self: self.__value)
        EnumType = property(lambda self: EnumType)
        def __hash__(self): return hash(self.__value)
        def __cmp__(self, other):
            # C fans might want to remove the following assertion
            # to make all enums comparable by ordinal value {;))
            assert self.EnumType is other.EnumType, "Only values from the same enum are comparable"
            return cmp(self.__value, other.__value)
        def __lt__(self, other): return self.__cmp__(other) < 0
        def __eq__(self, other): return self.__cmp__(other) == 0
        def __invert__(self): return constants[maximum - self.__value]
        def __nonzero__(self): return bool(self.__value)
        def __repr__(self): return str(names[self.__value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    EnumType = EnumClass()
    return EnumType

if __name__ == '__main__':
    print('\n*** Enum Demo ***')
    print('--- Days of week ---')
    Days = Enum('Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su')
    print(Days)
    print(Days.Mo)
    print(Days.Fr)
    print(Days.Mo < Days.Fr)
    print(list(Days))
    for each in Days:
        print('Day:', each)
    print('--- Yes/No ---')
    Confirmation = Enum('No', 'Yes')
    answer = Confirmation.No
    print('Your answer is not', ~answer)
I have had occasion to need an Enum class, for the purpose of decoding a binary file format. The features I happened to want were a concise enum definition, the ability to freely create instances of the enum from either an integer value or a string, and a useful representation. Here’s what I ended up with:
>>> class Enum(int):
...     def __new__(cls, value):
...         if isinstance(value, str):
...             return getattr(cls, value)
...         elif isinstance(value, int):
...             return cls.__index[value]
...     def __str__(self): return self.__name
...     def __repr__(self): return "%s.%s" % (type(self).__name__, self.__name)
...     class __metaclass__(type):
...         def __new__(mcls, name, bases, attrs):
...             attrs['__slots__'] = ['_Enum__name']
...             cls = type.__new__(mcls, name, bases, attrs)
...             cls._Enum__index = _index = {}
...             for base in reversed(bases):
...                 if hasattr(base, '_Enum__index'):
...                     _index.update(base._Enum__index)
...             # create all of the instances of the new class
...             for attr in attrs.keys():
...                 value = attrs[attr]
...                 if isinstance(value, int):
...                     evalue = int.__new__(cls, value)
...                     evalue._Enum__name = attr
...                     _index[value] = evalue
...                     setattr(cls, attr, evalue)
...             return cls
...
A whimsical example of using it:
>>> class Citrus(Enum):
...     Lemon = 1
...     Lime = 2
...
>>> Citrus.Lemon
Citrus.Lemon
>>>
>>> Citrus(1)
Citrus.Lemon
>>> Citrus(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in __new__
KeyError: 5
>>> class Fruit(Citrus):
...     Apple = 3
...     Banana = 4
...
>>> Fruit.Apple
Fruit.Apple
>>> Fruit.Lemon
Citrus.Lemon
>>> Fruit(1)
Citrus.Lemon
>>> Fruit(3)
Fruit.Apple
>>> "%d %s %r" % ((Fruit.Apple,)*3)
'3 Apple Fruit.Apple'
>>> Fruit(1) is Citrus.Lemon
True
Key features:
str(), int() and repr() all produce the most useful output possible: respectively, the name of the enumeration, its integer value, and a Python expression that evaluates back to the enumeration.
Enumerated values returned by the constructor are limited strictly to the predefined values; there are no accidental enum values.
Enumerated values are singletons; they can be strictly compared with is.
>>> from flufl.enum import Enum
>>> class Colors(Enum):
...     red = 1
...     green = 2
...     blue = 3
>>> for color in Colors: print color
Colors.red
Colors.green
Colors.blue
Answer 21
def enum(*sequential, **named):
    enums = dict(zip(sequential, [object() for _ in range(len(sequential))]), **named)
    return type('Enum', (), enums)
When using other implementations cited here (also when using named instances in my example) you must be sure you never try to compare objects from different enums. Here’s a possible pitfall:
>>> Numbers = enum_base(int, ONE=1, TWO=2, THREE=3)
>>> Numbers.ONE
1
>>> x = Numbers.TWO
>>> 10 + x
12
>>> type(Numbers)
<type 'type'>
>>> type(Numbers.ONE)
<class 'Enum'>
>>> isinstance(x, Numbers)
True
It’s elegant and clean looking, but it’s just a function that creates a class with the specified attributes.
With a little modification to the function, we can get it to act a little more ‘enumy’:
NOTE: I created the following examples by trying to reproduce the
behavior of pygtk’s new style ‘enums’ (like Gtk.MessageType.WARNING)
def enum_base(t, **enums):
    '''enums with a base class'''
    T = type('Enum', (t,), {})
    for key, val in enums.items():
        setattr(T, key, T(val))
    return T
This creates an enum based off a specified type. In addition to giving attribute access like the previous function, it behaves as you would expect an Enum to with respect to types. It also inherits the base class.
Another interesting thing that can be done with this method is to customize specific behavior by overriding built-in methods:
def enum_repr(t, **enums):
    '''enums with a base class and repr() output'''
    class Enum(t):
        def __repr__(self):
            return '<enum {0} of type Enum({1})>'.format(self._name, t.__name__)
    for key, val in enums.items():
        i = Enum(val)
        i._name = key
        setattr(Enum, key, i)
    return Enum
>>> Numbers = enum_repr(int, ONE=1, TWO=2, THREE=3)
>>> repr(Numbers.ONE)
'<enum ONE of type Enum(int)>'
>>> str(Numbers.ONE)
'1'
The enum package from PyPI provides a robust implementation of enums. An earlier answer mentioned PEP 354; that proposal was rejected, but a comparable implementation was released as
http://pypi.python.org/pypi/enum.
Alexandru’s suggestion of using class constants for enums works quite well.
I also like to add a dictionary for each set of constants to look up a human-readable string representation.
This serves two purposes: a) it provides a simple way to pretty-print your enum and b) the dictionary logically groups the constants so that you can test for membership.
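A minimal sketch of that pattern (the constant names and labels here are invented for illustration):

```python
class Status:
    # class constants act as the enum values
    DRAFT = 0
    PUBLISHED = 1
    RETRACTED = 2

    # companion dictionary: pretty-printing plus a membership test
    LABELS = {
        DRAFT: 'Draft',
        PUBLISHED: 'Published',
        RETRACTED: 'Retracted',
    }

print(Status.LABELS[Status.PUBLISHED])    # Published
print(Status.RETRACTED in Status.LABELS)  # True
```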
def enum(*names):"""
SYNOPSIS
Well-behaved enumerated type, easier than creating custom classes
DESCRIPTION
Create a custom type that implements an enumeration. Similar in concept
to a C enum but with some additional capabilities and protections. See
http://code.activestate.com/recipes/413486-first-class-enums-in-python/.
PARAMETERS
names Ordered list of names. The order in which names are given
will be the sort order in the enum type. Duplicate names
are not allowed. Unicode names are mapped to ASCII.
RETURNS
Object of type enum, with the input names and the enumerated values.
EXAMPLES
>>> letters = enum('a','e','i','o','u','b','c','y','z')
>>> letters.a < letters.e
True
## index by property
>>> letters.a
a
## index by position
>>> letters[0]
a
## index by name, helpful for bridging string inputs to enum
>>> letters['a']
a
## sorting by order in the enum() create, not character value
>>> letters.u < letters.b
True
## normal slicing operations available
>>> letters[-1]
z
## error since there are not 100 items in enum
>>> letters[99]
Traceback (most recent call last):
...
IndexError: tuple index out of range
## error since name does not exist in enum
>>> letters['ggg']
Traceback (most recent call last):
...
ValueError: tuple.index(x): x not in tuple
## enums must be named using valid Python identifiers
>>> numbers = enum(1,2,3,4)
Traceback (most recent call last):
...
AssertionError: Enum values must be string or unicode
>>> a = enum('-a','-b')
Traceback (most recent call last):
...
TypeError: Error when calling the metaclass bases
__slots__ must be identifiers
## create another enum
>>> tags = enum('a','b','c')
>>> tags.a
a
>>> letters.a
a
## can't compare values from different enums
>>> letters.a == tags.a
Traceback (most recent call last):
...
AssertionError: Only values from the same enum are comparable
>>> letters.a < tags.a
Traceback (most recent call last):
...
AssertionError: Only values from the same enum are comparable
## can't update enum after create
>>> letters.a = 'x'
Traceback (most recent call last):
...
AttributeError: 'EnumClass' object attribute 'a' is read-only
## can't update enum after create
>>> del letters.u
Traceback (most recent call last):
...
AttributeError: 'EnumClass' object attribute 'u' is read-only
## can't have non-unique enum values
>>> x = enum('a','b','c','a')
Traceback (most recent call last):
...
AssertionError: Enums must not repeat values
## can't have zero enum values
>>> x = enum()
Traceback (most recent call last):
...
AssertionError: Empty enums are not supported
## can't have enum values that look like special function names
## since these could collide and lead to non-obvious errors
>>> x = enum('a','b','c','__cmp__')
Traceback (most recent call last):
...
AssertionError: Enum values beginning with __ are not supported
LIMITATIONS
Enum values of unicode type are not preserved, mapped to ASCII instead.
"""## must have at least one enum valueassert names,'Empty enums are not supported'## enum values must be stringsassert len([i for i in names ifnot isinstance(i, types.StringTypes)andnot \
isinstance(i, unicode)])==0,'Enum values must be string or unicode'## enum values must not collide with special function namesassert len([i for i in names if i.startswith("__")])==0,\
'Enum values beginning with __ are not supported'## each enum value must be unique from all othersassert names == uniquify(names),'Enums must not repeat values'classEnumClass(object):""" See parent function for explanation """
__slots__ = names
def __iter__(self):return iter(constants)def __len__(self):return len(constants)def __getitem__(self, i):## this makes xx['name'] possibleif isinstance(i, types.StringTypes):
i = names.index(i)## handles the more normal xx[0]return constants[i]def __repr__(self):return'enum'+ str(names)def __str__(self):return'enum '+ str(constants)def index(self, i):return names.index(i)classEnumValue(object):""" See parent function for explanation """
__slots__ =('__value')def __init__(self, value):
self.__value = value
value = property(lambda self: self.__value)
enumtype = property(lambda self: enumtype)def __hash__(self):return hash(self.__value)def __cmp__(self, other):assert self.enumtype is other.enumtype,'Only values from the same enum are comparable'return cmp(self.value, other.value)def __invert__(self):return constants[maximum - self.value]def __nonzero__(self):## return bool(self.value)## Original code led to bool(x[0])==False, not correctreturnTruedef __repr__(self):return str(names[self.value])
maximum = len(names)-1
constants =[None]* len(names)for i, each in enumerate(names):
val =EnumValue(i)
setattr(EnumClass, each, val)
constants[i]= val
constants = tuple(constants)
enumtype =EnumClass()return enumtype
Many doctests included here to illustrate what’s different about this approach.
## NOTE: Python 2 recipe -- relies on types.StringTypes, unicode and cmp, and
## on a uniquify() helper (order-preserving de-duplication) defined elsewhere.
import types

def enum(*names):
    """
    SYNOPSIS
        Well-behaved enumerated type, easier than creating custom classes
    DESCRIPTION
        Create a custom type that implements an enumeration. Similar in concept
        to a C enum but with some additional capabilities and protections. See
        http://code.activestate.com/recipes/413486-first-class-enums-in-python/.
    PARAMETERS
        names   Ordered list of names. The order in which names are given
                will be the sort order in the enum type. Duplicate names
                are not allowed. Unicode names are mapped to ASCII.
    RETURNS
        Object of type enum, with the input names and the enumerated values.
    EXAMPLES
        >>> letters = enum('a','e','i','o','u','b','c','y','z')
        >>> letters.a < letters.e
        True

        ## index by property
        >>> letters.a
        a

        ## index by position
        >>> letters[0]
        a

        ## index by name, helpful for bridging string inputs to enum
        >>> letters['a']
        a

        ## sorting by order in the enum() create, not character value
        >>> letters.u < letters.b
        True

        ## normal slicing operations available
        >>> letters[-1]
        z

        ## error since there are not 100 items in enum
        >>> letters[99]
        Traceback (most recent call last):
        ...
        IndexError: tuple index out of range

        ## error since name does not exist in enum
        >>> letters['ggg']
        Traceback (most recent call last):
        ...
        ValueError: tuple.index(x): x not in tuple

        ## enums must be named using valid Python identifiers
        >>> numbers = enum(1,2,3,4)
        Traceback (most recent call last):
        ...
        AssertionError: Enum values must be string or unicode

        >>> a = enum('-a','-b')
        Traceback (most recent call last):
        ...
        TypeError: Error when calling the metaclass bases
        __slots__ must be identifiers

        ## create another enum
        >>> tags = enum('a','b','c')
        >>> tags.a
        a
        >>> letters.a
        a

        ## can't compare values from different enums
        >>> letters.a == tags.a
        Traceback (most recent call last):
        ...
        AssertionError: Only values from the same enum are comparable

        >>> letters.a < tags.a
        Traceback (most recent call last):
        ...
        AssertionError: Only values from the same enum are comparable

        ## can't update enum after create
        >>> letters.a = 'x'
        Traceback (most recent call last):
        ...
        AttributeError: 'EnumClass' object attribute 'a' is read-only

        ## can't update enum after create
        >>> del letters.u
        Traceback (most recent call last):
        ...
        AttributeError: 'EnumClass' object attribute 'u' is read-only

        ## can't have non-unique enum values
        >>> x = enum('a','b','c','a')
        Traceback (most recent call last):
        ...
        AssertionError: Enums must not repeat values

        ## can't have zero enum values
        >>> x = enum()
        Traceback (most recent call last):
        ...
        AssertionError: Empty enums are not supported

        ## can't have enum values that look like special function names
        ## since these could collide and lead to non-obvious errors
        >>> x = enum('a','b','c','__cmp__')
        Traceback (most recent call last):
        ...
        AssertionError: Enum values beginning with __ are not supported
    LIMITATIONS
        Enum values of unicode type are not preserved, mapped to ASCII instead.
    """
    ## must have at least one enum value
    assert names, 'Empty enums are not supported'
    ## enum values must be strings
    assert len([i for i in names if not isinstance(i, types.StringTypes) and not
                isinstance(i, unicode)]) == 0, 'Enum values must be string or unicode'
    ## enum values must not collide with special function names
    assert len([i for i in names if i.startswith("__")]) == 0, \
        'Enum values beginning with __ are not supported'
    ## each enum value must be unique from all others
    assert names == uniquify(names), 'Enums must not repeat values'

    class EnumClass(object):
        """ See parent function for explanation """
        __slots__ = names
        def __iter__(self):
            return iter(constants)
        def __len__(self):
            return len(constants)
        def __getitem__(self, i):
            ## this makes xx['name'] possible
            if isinstance(i, types.StringTypes):
                i = names.index(i)
            ## handles the more normal xx[0]
            return constants[i]
        def __repr__(self):
            return 'enum' + str(names)
        def __str__(self):
            return 'enum ' + str(constants)
        def index(self, i):
            return names.index(i)

    class EnumValue(object):
        """ See parent function for explanation """
        __slots__ = ('__value')
        def __init__(self, value):
            self.__value = value
        value = property(lambda self: self.__value)
        enumtype = property(lambda self: enumtype)
        def __hash__(self):
            return hash(self.__value)
        def __cmp__(self, other):
            assert self.enumtype is other.enumtype, 'Only values from the same enum are comparable'
            return cmp(self.value, other.value)
        def __invert__(self):
            return constants[maximum - self.value]
        def __nonzero__(self):
            ## return bool(self.value)
            ## Original code led to bool(x[0])==False, not correct
            return True
        def __repr__(self):
            return str(names[self.value])

    maximum = len(names) - 1
    constants = [None] * len(names)
    for i, each in enumerate(names):
        val = EnumValue(i)
        setattr(EnumClass, each, val)
        constants[i] = val
    constants = tuple(constants)
    enumtype = EnumClass()
    return enumtype
While the original enum proposal, PEP 354, was rejected years ago, it keeps coming back up. Some kind of enum was intended to be added to 3.2, but it got pushed back to 3.3 and then forgotten. And now there’s a PEP 435 intended for inclusion in Python 3.4. The reference implementation of PEP 435 is flufl.enum.
As of April 2013, there seems to be a general consensus that something should be added to the standard library in 3.4—as long as people can agree on what that “something” should be. That’s the hard part. See the threads starting here and here, and a half dozen other threads in the early months of 2013.
Meanwhile, every time this comes up, a slew of new designs and implementations appear on PyPI, ActiveState, etc., so if you don’t like the FLUFL design, try a PyPI search.
Answer 29
Use the following.
TYPE = {
    'EAN13': u'EAN-13',
    'CODE39': u'Code 39',
    'CODE128': u'Code 128',
    'i25': u'Interleaved 2 of 5',
}

>>> TYPE.items()
[('EAN13', u'EAN-13'), ('i25', u'Interleaved 2 of 5'), ('CODE39', u'Code 39'), ('CODE128', u'Code 128')]
>>> TYPE.keys()
['EAN13', 'i25', 'CODE39', 'CODE128']
>>> TYPE.values()
[u'EAN-13', u'Interleaved 2 of 5', u'Code 39', u'Code 128']
The advantage of adding a path to sys.path (over using imp) is that it simplifies things when importing more than one module from a single package. For example:
import sys
# the mock-0.3.1 dir contains testcase.py, testutils.py & mock.py
sys.path.append('/foo/bar/mock-0.3.1')
from testcase import TestCase
from testutils import RunTests
from mock import Mock, sentinel, patch
If your top-level module is not a file but is packaged as a directory with __init__.py, then the accepted solution almost works, but not quite. In Python 3.5+ the following code is needed (note the added line that begins with ‘sys.modules’):
Without this line, when exec_module is executed, it tries to bind relative imports in your top level __init__.py to the top level module name — in this case “mymodule”. But “mymodule” isn’t loaded yet so you’ll get the error “SystemError: Parent module ‘mymodule’ not loaded, cannot perform relative import”. So you need to bind the name before you load it. The reason for this is the fundamental invariant of the relative import system: “The invariant holding is that if you have sys.modules[‘spam’] and sys.modules[‘spam.foo’] (as you would after the above import), the latter must appear as the foo attribute of the former” as discussed here.
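The snippet referred to above is not reproduced here, but based on the description it looks roughly like the following self-contained sketch (the package name mymodule is just the example used in the text, and the temporary directory exists only so the sketch runs; the key addition is the sys.modules assignment before exec_module):

```python
import importlib.util
import os
import sys
import tempfile

# build a throwaway package directory so the sketch is self-contained
pkg_dir = os.path.join(tempfile.mkdtemp(), "mymodule")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

spec = importlib.util.spec_from_file_location(
    "mymodule", os.path.join(pkg_dir, "__init__.py"))
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module   # bind the name BEFORE executing the module
spec.loader.exec_module(module)
print(module.VALUE)   # 42
```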
It sounds like you don’t want to specifically import the configuration file (which has a whole lot of side effects and additional complications involved), you just want to run it, and be able to access the resulting namespace. The standard library provides an API specifically for that in the form of runpy.run_path:
from runpy import run_path
settings = run_path("/path/to/file.py")
That interface is available in Python 2.7 and Python 3.2+
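For example (writing a throwaway settings file so the snippet runs end to end; the variable names are invented):

```python
import os
import tempfile
from runpy import run_path

# create a small settings file on the fly
cfg = os.path.join(tempfile.mkdtemp(), "file.py")
with open(cfg, "w") as f:
    f.write("DEBUG = True\nNAME = 'demo'\n")

settings = run_path(cfg)   # returns the file's namespace as a plain dict
print(settings["DEBUG"], settings["NAME"])   # True demo
```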
You can also do something like this and add the directory that the configuration file is sitting in to the Python load path, and then just do a normal import, assuming you know the name of the file in advance, in this case “config”.
Messy, but it works.
configfile = '~/config.py'
import os
import sys
sys.path.append(os.path.dirname(os.path.expanduser(configfile)))
import config
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader
spec = spec_from_loader("module.name", SourceFileLoader("module.name", "/path/to/file.py"))
mod = module_from_spec(spec)
spec.loader.exec_module(mod)
The advantage of encoding the path in an explicit SourceFileLoader is that the machinery will not try to figure out the type of the file from the extension. This means that you can load something like a .txt file using this method, but you could not do it with spec_from_file_location without specifying the loader because .txt is not in importlib.machinery.SOURCE_SUFFIXES.
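To make that concrete, here is a self-contained sketch that loads Python source from a .txt file (the file is generated on the fly and the module name is arbitrary):

```python
import os
import tempfile
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader

# write Python source into a .txt file: the explicit loader ignores the extension
path = os.path.join(tempfile.mkdtemp(), "code.txt")
with open(path, "w") as f:
    f.write("ANSWER = 42\n")

spec = spec_from_loader("module.name", SourceFileLoader("module.name", path))
mod = module_from_spec(spec)
spec.loader.exec_module(mod)
print(mod.ANSWER)   # 42
```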
Answer 10
Do you mean load or import?
You can manipulate the sys.path list to specify the path to your module, and then import it. For example, given a module located at:
/foo/bar.py
You could do:
import sys
sys.path[0:0] = ['/foo']  # puts the /foo directory at the start of your path
import bar
I believe you can use imp.find_module() and imp.load_module() to load the specified module. You’ll need to split the module name off of the path, i.e. if you wanted to load /home/mypath/mymodule.py you’d need to do:
You can use the pkgutil module (specifically the walk_packages method) to get a list of the packages in the current directory. From there it’s trivial to use the importlib machinery to import the modules you want:
import pkgutil
import importlib
packages = pkgutil.walk_packages(path='.')
for importer, name, is_package in packages:
    mod = importlib.import_module(name)
    # do whatever you want with the module now, it's been imported!
def import_module_from_file(full_path_to_module):"""
Import a module given the full path/filename of the .py file
Python 3.4
"""
module =Nonetry:# Get module name and path from full path
module_dir, module_file = os.path.split(full_path_to_module)
module_name, module_ext = os.path.splitext(module_file)# Get module "spec" from filename
spec = importlib.util.spec_from_file_location(module_name,full_path_to_module)
module = spec.loader.load_module()exceptExceptionas ec:# Simple error printing# Insert "sophisticated" stuff hereprint(ec)finally:return module
This area of Python 3.4 seems to be extremely tortuous to understand! However with a bit of hacking using the code from Chris Calloway as a start I managed to get something working. Here’s the basic function.
import os
import importlib.util

def import_module_from_file(full_path_to_module):
    """
    Import a module given the full path/filename of the .py file
    Python 3.4
    """
    module = None
    try:
        # Get module name and path from full path
        module_dir, module_file = os.path.split(full_path_to_module)
        module_name, module_ext = os.path.splitext(module_file)
        # Get module "spec" from filename
        spec = importlib.util.spec_from_file_location(module_name, full_path_to_module)
        module = spec.loader.load_module()
    except Exception as ec:
        # Simple error printing
        # Insert "sophisticated" stuff here
        print(ec)
    finally:
        return module
This appears to use non-deprecated modules from Python 3.4. I don’t pretend to understand why, but it seems to work from within a program. I found Chris’ solution worked on the command line but not from inside a program.
I’m not saying that it is better, but for the sake of completeness, I wanted to suggest the exec function, available in both Python 2 and 3.
exec allows you to execute arbitrary code in either the global scope, or in an internal scope, provided as a dictionary.
For example, if you have a module stored in "/path/to/module" with the function foo(), you could run it by doing the following:
module = dict()
with open("/path/to/module") as f:
    exec(f.read(), module)
module['foo']()
This makes it a bit more explicit that you’re loading code dynamically, and grants you some additional power, such as the ability to provide custom builtins.
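For instance, a sketch of the custom-builtins idea (the whitelist chosen here is arbitrary):

```python
# expose only a whitelist of builtins to the executed code
module = {"__builtins__": {"len": len, "sorted": sorted}}
exec("n = len(sorted('cab'))", module)
print(module["n"])   # 3
```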
And if having access through attributes, instead of keys is important to you, you can design a custom dict class for the globals, that provides such access, e.g.:
class MyModuleClass(dict):
    def __getattr__(self, name):
        return self.__getitem__(name)
###################
##               ##
## classloader.py ##
##               ##
###################
import sys, types

def _get_mod(modulePath):
    try:
        aMod = sys.modules[modulePath]
        if not isinstance(aMod, types.ModuleType):
            raise KeyError
    except KeyError:
        # The last [''] is very important!
        aMod = __import__(modulePath, globals(), locals(), [''])
        sys.modules[modulePath] = aMod
    return aMod

def _get_func(fullFuncName):
    """Retrieve a function object from a full dotted-package name."""
    # Parse out the path, module, and function
    lastDot = fullFuncName.rfind(u".")
    funcName = fullFuncName[lastDot + 1:]
    modPath = fullFuncName[:lastDot]
    aMod = _get_mod(modPath)
    aFunc = getattr(aMod, funcName)
    # Assert that the function is a *callable* attribute.
    assert callable(aFunc), u"%s is not callable." % fullFuncName
    # Return a reference to the function itself,
    # not the results of the function.
    return aFunc

def _get_class(fullClassName, parentClass=None):
    """Load a module and retrieve a class (NOT an instance).

    If the parentClass is supplied, className must be of parentClass
    or a subclass of parentClass (or None is returned).
    """
    aClass = _get_func(fullClassName)
    # Assert that the class is a subclass of parentClass.
    if parentClass is not None:
        if not issubclass(aClass, parentClass):
            raise TypeError(u"%s is not a subclass of %s" %
                            (fullClassName, parentClass))
    # Return a reference to the class itself, not an instantiated object.
    return aClass

######################
##      Usage       ##
######################

class StorageManager: pass
class StorageManagerMySQL(StorageManager): pass

def storage_object(aFullClassName, allOptions={}):
    aStoreClass = _get_class(aFullClassName, StorageManager)
    return aStoreClass(allOptions)
import os, sys, inspect, copy

SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("test::SOURCE_FILE: ", SOURCE_FILE)

# portable import to the global space
sys.path.append(TACKLELIB_ROOT) # TACKLELIB_ROOT - path to the library directory
import tacklelib as tkl

tkl.tkl_init(tkl)

# cleanup
del tkl # must be used instead of `tkl = None`, otherwise the variable would still persist
sys.path.pop()

tkl_import_module(SOURCE_DIR, 'testlib.py')

print(globals().keys())

testlib.base_test()
testlib.testlib_std1.std1_test()
testlib.testlib_std1.testlib_std2.std2_test()
#testlib.testlib.std3.std3_test() # is not reachable directly ...
getattr(globals()['testlib'], 'testlib.std3').std3_test() # ... but is reachable through `globals` + `getattr`

tkl_import_module(SOURCE_DIR, 'testlib.py', '.')

print(globals().keys())

base_test()
testlib_std1.std1_test()
testlib_std1.testlib_std2.std2_test()
#testlib.std3.std3_test() # is not reachable directly ...
globals()['testlib.std3'].std3_test() # ... but is reachable through `globals` + `getattr`
testlib.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("1 testlib::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/std1', 'testlib.std1.py', 'testlib_std1')

# SOURCE_DIR is restored here
print("2 testlib::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/std3', 'testlib.std3.py')
print("3 testlib::SOURCE_FILE: ", SOURCE_FILE)

def base_test():
    print('base_test')

testlib.std1.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std1::SOURCE_FILE: ", SOURCE_FILE)
tkl_import_module(SOURCE_DIR + '/../std2', 'testlib.std2.py', 'testlib_std2')

def std1_test():
    print('std1_test')

testlib.std2.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std2::SOURCE_FILE: ", SOURCE_FILE)

def std2_test():
    print('std2_test')

testlib.std3.py:

# optional for 3.4.x and higher
#import os, inspect
#
#SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
#SOURCE_DIR = os.path.dirname(SOURCE_FILE)

print("testlib.std3::SOURCE_FILE: ", SOURCE_FILE)

def std3_test():
    print('std3_test')
Output (3.7.4):
test::SOURCE_FILE:<root>/test01/test.py
import: <root>/test01/testlib.py as testlib -> []
1 testlib::SOURCE_FILE: <root>/test01/testlib.py
import: <root>/test01/std1/testlib.std1.py as testlib_std1 -> ['testlib']
import: <root>/test01/std1/../std2/testlib.std2.py as testlib_std2 -> ['testlib', 'testlib_std1']
testlib.std2::SOURCE_FILE:<root>/test01/std1/../std2/testlib.std2.py
2 testlib::SOURCE_FILE:<root>/test01/testlib.py
import:<root>/test01/std3/testlib.std3.py as testlib.std3 ->['testlib']
testlib.std3::SOURCE_FILE:<root>/test01/std3/testlib.std3.py
3 testlib::SOURCE_FILE:<root>/test01/testlib.py
dict_keys(['__name__','__doc__','__package__','__loader__','__spec__','__annotations__','__builtins__','__file__','__cached__','os','sys','inspect','copy','SOURCE_FILE','SOURCE_DIR','TackleGlobalImportModuleState','tkl_membercopy','tkl_merge_module','tkl_get_parent_imported_module_state','tkl_declare_global','tkl_import_module','TackleSourceModuleState','tkl_source_module','TackleLocalImportModuleState','testlib'])
base_test
std1_test
std2_test
std3_test
import: <root>/test01/testlib.py as . -> []
1 testlib::SOURCE_FILE: <root>/test01/testlib.py
import: <root>/test01/std1/testlib.std1.py as testlib_std1 -> ['testlib']
import: <root>/test01/std1/../std2/testlib.std2.py as testlib_std2 -> ['testlib', 'testlib_std1']
testlib.std2::SOURCE_FILE:<root>/test01/std1/../std2/testlib.std2.py
2 testlib::SOURCE_FILE:<root>/test01/testlib.py
import:<root>/test01/std3/testlib.std3.py as testlib.std3 ->['testlib']
testlib.std3::SOURCE_FILE:<root>/test01/std3/testlib.std3.py
3 testlib::SOURCE_FILE:<root>/test01/testlib.py
dict_keys(['__name__','__doc__','__package__','__loader__','__spec__','__annotations__','__builtins__','__file__','__cached__','os','sys','inspect','copy','SOURCE_FILE','SOURCE_DIR','TackleGlobalImportModuleState','tkl_membercopy','tkl_merge_module','tkl_get_parent_imported_module_state','tkl_declare_global','tkl_import_module','TackleSourceModuleState','tkl_source_module','TackleLocalImportModuleState','testlib','testlib_std1','testlib.std3','base_test'])
base_test
std1_test
std2_test
std3_test
import os, sys, inspect, copy
SOURCE_FILE = os.path.abspath(inspect.getsourcefile(lambda:0)).replace('\\','/')
SOURCE_DIR = os.path.dirname(SOURCE_FILE)
print("test::SOURCE_FILE: ", SOURCE_FILE)
# portable import to the global space
sys.path.append(TACKLELIB_ROOT) # TACKLELIB_ROOT - path to the library directory
import tacklelib as tkl
tkl.tkl_init(tkl)
# cleanup
del tkl # must be used instead of `tkl = None`, otherwise the variable would still persist
sys.path.pop()
tkl_import_module(SOURCE_DIR, 'testlib.py')
print(globals().keys())
testlib.base_test()
testlib.testlib_std1.std1_test()
testlib.testlib_std1.testlib_std2.std2_test()
#testlib.testlib.std3.std3_test() # not reachable directly ...
getattr(globals()['testlib'], 'testlib.std3').std3_test() # ... but reachable through the `globals` + `getattr`
tkl_import_module(SOURCE_DIR, 'testlib.py', '.')
print(globals().keys())
base_test()
testlib_std1.std1_test()
testlib_std1.testlib_std2.std2_test()
#testlib.std3.std3_test() # not reachable directly ...
globals()['testlib.std3'].std3_test() # ... but reachable through the `globals` + `getattr`
Can import a module both as a submodule and can import the content of a module into a parent module (or into globals if there is no parent module).
Can import modules with periods in the file name.
Can import any extension module from any other extension module.
Can use a standalone name for a submodule instead of the file name without extension, which is the default (for example, testlib.std.py as testlib, testlib.blabla.py as testlib_blabla, and so on).
Does not depend on sys.path or any other search-path storage.
Does not require saving/restoring global variables like SOURCE_FILE and SOURCE_DIR between calls to tkl_import_module.
[for 3.4.x and higher] Can mix module namespaces in nested tkl_import_module calls (ex: named->local->named or local->named->local and so on).
[for 3.4.x and higher] Can auto-export global variables/functions/classes from where they are declared to all child modules imported through tkl_import_module (via the tkl_declare_global function).
Cons:
[for 3.3.x and lower] Requires declaring tkl_import_module in every module that calls tkl_import_module (code duplication).
Update 1,2 (for 3.4.x and higher only):
In Python 3.4 and higher you can bypass the requirement to declare tkl_import_module in each module by declaring tkl_import_module in a top-level module; the function will then inject itself into all child modules in a single call (a kind of self-deploying import).
Update 3:
Added the function tkl_source_module as an analog to bash source, with support for an execution guard upon import (implemented through a module merge instead of an import).
Update 4:
Added the function tkl_declare_global to auto-export a module global variable to all child modules where it would otherwise not be visible, because it is not part of a child module.
Update 5:
All functions have been moved into the tacklelib library; see the link above.
There’s a package that’s dedicated to this specifically:
from thesmuggler import smuggle
# À la `import weapons`
weapons = smuggle('weapons.py')
# À la `from contraband import drugs, alcohol`
drugs, alcohol = smuggle('drugs', 'alcohol', source='contraband.py')
# À la `from contraband import drugs as dope, alcohol as booze`
dope, booze = smuggle('drugs', 'alcohol', source='contraband.py')
It’s tested across Python versions (Jython and PyPy too), but it might be overkill depending on the size of your project.
Adding this to the list of answers as I couldn’t find anything that worked. This will allow imports of compiled (pyd) Python modules in 3.4:
import sys
import importlib.machinery
def load_module(name, filename):
    # If the Loader finds the module name in this list it will use
    # module_name.__file__ instead so we need to delete it here
    if name in sys.modules:
        del sys.modules[name]

    loader = importlib.machinery.ExtensionFileLoader(name, filename)
    module = loader.load_module()
    locals()[name] = module
    globals()[name] = module
load_module('something', r'C:\Path\To\something.pyd')
something.do_something()
Answer 25
A very simple method: suppose you want to import a file with the relative path ../../MyLibs/pyfunc.py:

libPath = '../../MyLibs'
import sys
if not libPath in sys.path:
    sys.path.append(libPath)
import pyfunc as pf
A simple solution using importlib instead of the imp package (tested for Python 2.7, although it should work for Python 3 too):
import os
import sys
import importlib
dirname, basename = os.path.split(pyfilepath) # pyfilepath: '/my/path/mymodule.py'
sys.path.append(dirname) # only directories should be added to PYTHONPATH
module_name = os.path.splitext(basename)[0] # '/my/path/mymodule.py' --> 'mymodule'
module = importlib.import_module(module_name) # name space of defined module (otherwise we would literally look for "module_name")
Now you can directly use the namespace of the imported module, like this:
a = module.myvar
b = module.myfunc(a)
The advantage of this solution is that we don’t even need to know the actual name of the module we would like to import, in order to use it in our code. This is useful, e.g. in case the path of the module is a configurable argument.
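To make the approach concrete, here is a self-contained sketch of the same steps; the temp-directory module name and its contents are invented for the demo:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temp directory so the demo is self-contained.
tmpdir = tempfile.mkdtemp()
pyfilepath = os.path.join(tmpdir, 'mymodule.py')
with open(pyfilepath, 'w') as f:
    f.write("myvar = 21\n\ndef myfunc(x):\n    return x * 2\n")

dirname, basename = os.path.split(pyfilepath)
sys.path.append(dirname)  # only directories should be added to sys.path
module = importlib.import_module(os.path.splitext(basename)[0])

print(module.myfunc(module.myvar))  # 42
```

In real code you would of course skip the temp-file setup and point `pyfilepath` at the module you actually want to load.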
This answer is a supplement to Sebastian Rittau’s answer responding to the comment: “but what if you don’t have the module name?” This is a quick and dirty way of getting the likely Python module name given a filename: it just goes up the tree until it finds a directory without an __init__.py file and then turns it back into a module name. For Python 3.4+ (uses pathlib), which makes sense since Py2 people can use “imp” or other ways of doing relative imports:
import pathlib
def likely_python_module(filename):
    '''
    Given a filename or Path, return the "likely" python module name. That is, iterate
    the parent directories until it doesn't contain an __init__.py file.

    :rtype: str
    '''
    p = pathlib.Path(filename).resolve()
    paths = []
    if p.name != '__init__.py':
        paths.append(p.stem)
    while True:
        p = p.parent
        if not p:
            break
        if not p.is_dir():
            break
        inits = [f for f in p.iterdir() if f.name == '__init__.py']
        if not inits:
            break
        paths.append(p.stem)
    return '.'.join(reversed(paths))
There are certainly possibilities for improvement, and the optional __init__.py files might necessitate other changes, but if you have __init__.py in general, this does the trick.
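To see the idea in action, here is a quick self-test against a synthetic package tree built in a temp directory (the `mypkg/sub/mod.py` names are invented for the demo; the function is a condensed copy of the one above):

```python
import os
import pathlib
import tempfile

def likely_python_module(filename):
    # Same logic as the function above, condensed.
    p = pathlib.Path(filename).resolve()
    paths = []
    if p.name != '__init__.py':
        paths.append(p.stem)
    while True:
        p = p.parent
        if not p or not p.is_dir():
            break
        if not any(f.name == '__init__.py' for f in p.iterdir()):
            break
        paths.append(p.stem)
    return '.'.join(reversed(paths))

# Build <tmp>/mypkg/sub/mod.py with __init__.py files along the way.
root = tempfile.mkdtemp()
sub = os.path.join(root, 'mypkg', 'sub')
os.makedirs(sub)
for d in (os.path.join(root, 'mypkg'), sub):
    open(os.path.join(d, '__init__.py'), 'w').close()
open(os.path.join(sub, 'mod.py'), 'w').close()

print(likely_python_module(os.path.join(sub, 'mod.py')))  # mypkg.sub.mod
```

The walk stops at the temp root because it has no __init__.py, so only the package-qualified part of the path survives.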
import imp
import sys
def __import__(name, globals=None, locals=None, fromlist=None):
    # Fast path: see if the module has already been imported.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    # If any of the following calls raises an exception,
    # there's a problem we can't handle -- let the caller handle it.
    fp, pathname, description = imp.find_module(name)
    try:
        return imp.load_module(name, fp, pathname, description)
    finally:
        # Since we may exit via an exception, close fp explicitly.
        if fp:
            fp.close()
So close! os.path.isdir returns True if you pass in the name of a directory that currently exists. If it doesn’t exist or it’s not a directory, then it returns False.
Python 3.4 introduced the pathlib module into the standard library, which provides an object oriented approach to handle filesystem paths. The is_dir() and exists() methods of a Path object can be used to answer the question:
In [1]: from pathlib import Path
In [2]: p = Path('/usr')
In [3]: p.exists()
Out[3]: True
In [4]: p.is_dir()
Out[4]: True
Paths (and strings) can be joined together with the / operator:
In [5]: q = p / 'bin' / 'vim'
In [6]: q
Out[6]: PosixPath('/usr/bin/vim')
In [7]: q.exists()
Out[7]: True
In [8]: q.is_dir()
Out[8]: False
import os
os.path.isdir(dir_in) #True/False: check if this is a directory
os.listdir(dir_in) #gets you a list of all files and directories under dir_in
Note that os.listdir will raise an exception if the input path is invalid.
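A tiny self-contained demonstration of these checks against a freshly created temp directory (the `_missing` path is deliberately nonexistent):

```python
import os
import tempfile

dir_in = tempfile.mkdtemp()                # guaranteed to exist
print(os.path.isdir(dir_in))               # True: exists and is a directory
print(os.listdir(dir_in))                  # []: freshly created, so empty
print(os.path.isdir(dir_in + '_missing'))  # False: no such path
```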
Answer 9
# You can also check it and create the directory if needed
if not os.path.isdir('mydir'):
    os.system('mkdir mydir')
    print('new directory has been created')
It’s similar to the built-in pathlib. The difference is that it treats every path as a string (its Path is a subclass of str), so if some function expects a string, you can easily pass it a Path object without needing to convert it to a string.
For example, this works great with Django and settings.py:
import os
if not os.path.exists(directory):
os.makedirs(directory)
As noted in comments and elsewhere, there’s a race condition – if the directory is created between the os.path.exists and the os.makedirs calls, the os.makedirs will fail with an OSError. Unfortunately, blanket-catching OSError and continuing is not foolproof, as it will ignore a failure to create the directory due to other factors, such as insufficient permissions, full disk, etc.
import os, errno
try:
os.makedirs(directory)
except OSError as e:
if e.errno != errno.EEXIST:
raise
Alternatively, there could be a second os.path.exists, but suppose another created the directory after the first check, then removed it before the second one – we could still be fooled.
Depending on the application, the danger of concurrent operations may be more or less than the danger posed by other factors such as file permissions. The developer would have to know more about the particular application being developed and its expected environment before choosing an implementation.
Modern versions of Python improve this code quite a bit, both by exposing FileExistsError (in 3.3+)…
os.makedirs("path/to/directory", exist_ok=True) # succeeds even if directory exists.
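For completeness, a pathlib-based sketch of the same idempotent creation (the target path here is a throwaway temp location, invented for the demo):

```python
import tempfile
from pathlib import Path

target = Path(tempfile.mkdtemp()) / 'path' / 'to' / 'directory'
target.mkdir(parents=True, exist_ok=True)  # creates intermediate dirs too
target.mkdir(parents=True, exist_ok=True)  # idempotent: no error the second time
print(target.is_dir())  # True
```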
Answer 12

Two things:

Check whether the directory exists.
If not, create the directory (optional).
import os
dirpath = "<dirpath>" # Replace "<dirpath>" with the actual directory path.
if os.path.exists(dirpath):
    print("Directory exists")
else:
    # This is optional, if you want to create the directory when it doesn't exist.
    os.mkdir(dirpath)
    print("Directory created")
I’m building a web application with Django. The reasons I chose Django were:
I wanted to work with free/open-source tools.
I like Python and feel it’s a long-term language, whereas regarding Ruby I wasn’t sure, and PHP seemed like a huge hassle to learn.
I’m building a prototype for an idea and wasn’t thinking too much about the future. Development speed was the main factor, and I already knew Python.
I knew the migration to Google App Engine would be easier should I choose to do so in the future.
I heard Django was “nice”.
Now that I’m getting closer to thinking about publishing my work, I start being concerned about scale. The only information I found about the scaling capabilities of Django is provided by the Django team (I’m not saying anything to disregard them, but this is clearly not objective information…).
My questions:
What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)
Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
“What are the largest sites built on Django today?”
There isn’t any single place that collects information about traffic on Django-built sites, so I’ll have to take a stab at it using data from various locations. First, we have a list of Django sites on the front page of the main Django project page and then a list of Django-built sites at djangosites.org. Going through the lists and picking some that I know have decent traffic we see:
pownce.com (no longer active): alexa rank about 65k.
Mike Malone of Pownce, in his EuroDjangoCon presentation on Scaling Django Web Apps says “hundreds of hits per second”. This is a very good presentation on how to scale Django, and makes some good points including (current) shortcomings in Django scalability.
HP had a site built with Django 1.5: ePrint center. However, as of November 2015 the entire website was migrated and this link is just a redirect. This website was a worldwide service handling subscriptions to Instant Ink and related services HP offered (*).
“Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?”
Yes, see above.
“Could a site like Stack Overflow run on Django?”
My gut feeling is yes but, as others answered and Mike Malone mentions in his presentation, database design is critical. Strong proof might also be found at www.cnprog.com if we can find any reliable traffic stats. Anyway, it’s not just something that will happen by throwing together a bunch of Django models :)
There are, of course, many more sites and bloggers of interest, but I have got to stop somewhere!
We’re doing load testing now. We think we can support 240 concurrent requests (a sustained rate of 120 hits per second 24×7) without any significant degradation in the server performance. That would be 432,000 hits per hour. Response times aren’t small (our transactions are large) but there’s no degradation from our baseline performance as the load increases.
We’re using Apache front-ending Django and MySQL. The OS is Red Hat Enterprise Linux (RHEL). 64-bit. We use mod_wsgi in daemon mode for Django. We’ve done no cache or database optimization other than to accept the defaults.
We’re all in one VM on a 64-bit Dell with (I think) 32Gb RAM.
Since performance is almost the same for 20 or 200 concurrent users, we don’t need to spend huge amounts of time “tweaking”. Instead we simply need to keep our base performance up through ordinary SSL performance improvements, ordinary database design and implementation (indexing, etc.), ordinary firewall performance improvements, etc.
What we do measure is our load test laptops struggling under the insane workload of 15 processes running 16 concurrent threads of requests.
What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)
In the US, it was Mahalo. I’m told they handle roughly 10 million uniques a month. Now, in 2019, Mahalo is powered by Ruby on Rails.
Abroad, the Globo network (a network of news, sports, and entertainment sites in Brazil); Alexa ranks them in the top 100 globally (around 80th currently).
Other notable Django users include PBS, National Geographic, Discovery, NASA (actually a number of different divisions within NASA), and the Library of Congress.
Can Django deal with 100k users daily, each visiting the site for a couple of hours?
Yes — but only if you’ve written your application right, and if you’ve got enough hardware. Django’s not a magic bullet.
Could a site like StackOverflow run on Django?
Yes (but see above).
Technology-wise, easily: see soclone for one attempt. Traffic-wise, Compete pegs Stack Overflow at under 1 million uniques per month. I can name at least a dozen Django sites with more traffic than SO.
Scaling web apps is not about web frameworks or languages; it’s about your architecture.
It’s about how you handle your browser cache, your database cache, how you use non-standard persistence providers (like CouchDB), how well-tuned your database is, and a lot of other stuff…
You should check the DjangoCon 2008 Keynote, delivered by Cal Henderson, titled “Why I hate Django” where he pretty much goes over everything Django is missing that you might want to do in a high traffic website. At the end of the day you have to take this all with an open mind because it is perfectly possible to write Django apps that scale, but I thought it was a good presentation and relevant to your question.
The largest django site I know of is the Washington Post, which would certainly indicate that it can scale well.
Good design decisions probably have a bigger performance impact than anything else. Twitter is often cited as a site which embodies the performance issues with another dynamic interpreted language based web framework, Ruby on Rails – yet Twitter engineers have stated that the framework isn’t as much an issue as some of the database design choices they made early on.
Django works very nicely with memcached and provides some classes for managing the cache, which is where you would resolve the majority of your performance issues. What you deliver on the wire is almost more important than your backend in reality – using a tool like yslow is critical for a high performance web application. You can always throw more hardware at your backend, but you can’t change your users bandwidth.
I was at the EuroDjangoCon conference the other week, and this was the subject of a couple of talks – including from the founders of what was the largest Django-based site, Pownce (slides from one talk here). The main message is that it’s not Django you have to worry about, but things like proper caching, load balancing, database optimisation, etc.
Django actually has hooks for most of those things – caching, in particular, is made very easy.
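As an illustration of how little configuration the cache hooks need, a settings.py fragment along these lines points Django’s cache framework at memcached (the exact backend module name varies by Django version; PyMemcacheCache shown here is the Django 3.2+ name, and the address is illustrative):

```python
# settings.py fragment (illustrative): wire Django's cache framework to memcached.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',  # address of the memcached daemon
    }
}
```

With a backend configured, per-view caching is a one-line decorator (django.views.decorators.cache.cache_page), which is the kind of ease the paragraph above refers to.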
I’m sure you’re looking for a more solid answer, but the most obvious objective validation I can think of is that Google pushes Django for use with its App Engine framework. If anybody knows about and deals with scalability on a regular basis, it’s Google. From what I’ve read, the most limiting factor seems to be the database back-end, which is why Google uses their own…
It’s not uncommon to hear people say “Django doesn’t scale”. Depending on how you look at it, the statement is either completely true or patently false. Django, on its own, doesn’t scale.
The same can be said of Ruby on Rails, Flask, PHP, or any other language used by a database-driven dynamic website.
The good news, however, is that Django interacts beautifully with a suite of caching and
load balancing tools that will allow it to scale to as much traffic as you can throw at it.
Contrary to what you may have read online,
it can do so without replacing core components often labeled as “too slow” such as the database ORM or the template layer.
Disqus serves over 8 billion page views per month. Those are some huge numbers.
These teams have proven Django most certainly does scale.
Our experience here at Lincoln Loop backs it up.
We’ve built big Django sites capable of spending the day on the Reddit homepage without breaking a sweat.
Django’s scaling success stories are almost too numerous to list at this point.
It backs Disqus, Instagram, and Pinterest. Want some more proof? Instagram was able to sustain over 30 million users on Django with only 3 engineers (2 of which had no back-end development experience).
The Washington Post’s website is a hugely popular online news source to accompany their daily paper. Its huge amount of views and traffic can be easily handled by the Django web framework.
Washington Post - 52.2 million unique visitors (March, 2015)
The National Aeronautics and Space Administration’s official website is the place to find news, pictures, and videos about their ongoing space exploration. This Django website can easily handle huge amounts of views and traffic.
2 million visitors monthly
The Guardian is a British news and media website owned by the Guardian Media Group. It contains nearly all of the content of the newspapers The Guardian and The Observer. This huge data is handled by Django.
The Guardian (commenting system) - 41.6 million unique visitors (October, 2014)
We all know YouTube as the place to upload cat videos and fails. As one of the most popular websites in existence, it provides us with endless hours of video entertainment. The Python programming language powers it and the features we love.
DropBox started the online document storing revolution that has become part of daily life. We now store almost everything in the cloud. Dropbox allows us to store, sync, and share almost anything using the power of Python.
Quora is the number one place online to ask a question and receive answers from a community of individuals. On their Python website relevant results are answered, edited, and organized by these community members.
A majority of the code for Bitly URL shortening services and analytics are all built with Python. Their service can handle hundreds of millions of events per day.
Reddit is known as the front page of the internet. It is the place online to find information or entertainment based on thousands of different categories. Posts and links are user generated and are promoted to the top through votes. Many of Reddit’s capabilities rely on Python for their functionality.
Hipmunk is an online consumer travel site that compares the top travel sites to find you the best deals. This Python website’s tools allow you to find the cheapest hotels and flights for your destination.
Yes it can. It could be Django with Python or Ruby on Rails. It will still scale.
There are a few different techniques. First, caching is not scaling. You could have several application servers balanced with nginx as the front end, in addition to hardware balancer(s).
To scale on the database side, you can go pretty far with read slaves in MySQL/PostgreSQL if you go the RDBMS way.
Some good examples of heavy traffic websites in Django could be:
It’s a little old, but someone from the LA Times gave a basic overview of why they went with Django.
The Onion’s AV Club was recently moved from (I think Drupal) to Django.
I imagine a number of these sites probably get well over 100k+ hits per day. Django can certainly do 100k hits/day and more. But YMMV in getting your particular site there, depending on what you’re building.
There are caching options at the Django level (for example caching querysets and views in memcached can work wonders) and beyond (upstream caches like Squid). Database server specifications will also be a factor (and usually the place to splurge), as is how well you’ve tuned it. Don’t assume, for example, that Django’s going to set up indexes properly. Don’t assume that the default PostgreSQL or MySQL configuration is the right one.
Furthermore, you always have the option of having multiple application servers running Django if that is the slow point, with a software or hardware load balancer in front.
Finally, are you serving static content on the same server as Django? Are you using Apache or something like nginx or lighttpd? Can you afford to use a CDN for static content? These are things to think about, but it’s all very speculative. 100k hits/day isn’t the only variable: how much do you want to spend? How much expertise do you have managing all these components? How much time do you have to pull it all together?
I have been using Django for over a year now, and am very impressed with how it manages to combine modularity, scalability and speed of development. Like with any technology, it comes with a learning curve. However, this learning curve is made a lot less steep by the excellent documentation from the Django community. Django has been able to handle everything I have thrown at it really well. It looks like it will be able to scale well into the future.
BidRodeo Penny Auctions is a moderately sized Django powered website. It is a very dynamic website and does handle a good number of page views a day.
Note that if you’re expecting 100K users per day, that are active for hours at a time (meaning max of 20K+ concurrent users), you’re going to need A LOT of servers. SO has ~15,000 registered users, and most of them are probably not active daily. While the bulk of traffic comes from unregistered users, I’m guessing that very few of them stay on the site more than a couple minutes (i.e. they follow google search results then leave).
For that volume, expect at least 30 servers … which is still a rather heavy 1,000 concurrent users per server.
Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
Yes, but use proper architecture, database design, caching, load balancers, and multiple servers or nodes.
Could a site like Stack Overflow run on Django?
Yes; just follow the answer to the second question above.
If you have a site with some static content, then putting a Varnish server in front will dramatically increase your performance. Even a single box can then easily spit out 100 Mbit/s of traffic.
Note that with dynamic content, using something like Varnish becomes a lot more tricky.
My experience with Django is minimal but I do remember in The Django Book they have a chapter where they interview people running some of the larger Django applications. Here is a link. I guess it could provide some insights.
It says curse.com is one of the largest Django applications, with around 60-90 million page views per month.
I develop high-traffic sites using Django for the national broadcaster in Ireland. It works well for us. Developing a high-performance site is about more than just choosing a framework. A framework will only be one part of a system that is as strong as its weakest link. Using the latest framework ‘X’ won’t solve your performance issues if the problem is slow database queries or a badly configured server or network.
Even though there have been a lot of great answers here, I just feel like pointing out that nobody has put emphasis on…
It depends on the application
If your application is light on writes (you read a lot more data from the DB than you write), then scaling Django should be fairly trivial. Heck, it comes with some fairly decent output/view caching straight out of the box. Make use of that, with, say, Redis as a cache provider; put a load balancer in front of it, spin up n instances, and you should be able to deal with a VERY large amount of traffic.
Now, if you have to do thousands of complex writes a second? Different story. Is Django going to be a bad choice? Well, not necessarily; it depends on how you architect your solution, really, and on what your requirements are.
If you want to use open source, there are many options for you. But Python is the best among them, as it has many libraries and a super awesome community.
These are a few reasons which might change your mind:
Python is very good, but it is an interpreted language, which makes it slow. However, many accelerators and caching services exist which partly solve this problem.
If you are thinking about rapid development then Ruby on Rails is best among all. The main motto of this(ROR) framework is to give a comfortable experience to the developers. If you compare Ruby and Python both have nearly the same syntax.
Google App Engine is a very good service, but it will bind you within its scope; you don’t get the chance to experiment with new things. Instead you can use the Digital Ocean cloud, which charges only $5/month for its simplest droplet. Heroku is another free service where you can deploy your product.
Yes! Yes! What you heard is totally correct, but here are some examples of sites using other technologies:
Rails: Github, Twitter(previously), Shopify, Airbnb, Slideshare, Heroku etc.
PHP: Facebook, Wikipedia, Flickr, Yahoo, Tumblr, Mailchimp etc.
The conclusion is that a framework or language won’t do everything for you. A better architecture, design and strategy will give you a scalable website. Instagram is the biggest example; that small team manages such huge amounts of data. Here is one blog about its architecture; you must read it.
I don’t think the issue is really about Django scaling.
I really suggest you look into your architecture; that’s what will help you with your scaling needs. If you get that wrong there is no point in how well Django performs. Performance != scale. You can have a system with amazing performance that does not scale, and vice versa.
Is your application database bound? If it is then your scale issues lay there as well. How are you planning on interacting with the database from Django? What happens when you database cannot process requests as fast as Django accepts them? What happens when your data outgrows one physical machine. You need to account for how you plan on dealing with those circumstances.
Moreover, what happens when your traffic outgrows one app server? How you handle sessions in this case can be tricky; more often than not you would probably require a shared-nothing architecture. Again, that depends on your application.
In short, a language is not what determines scale; a language is responsible for performance (and again, depending on your application, different languages perform differently). It is your design and architecture that make scaling a reality.
I hope it helps, would be glad to help further if you have questions.
Spreading the tasks evenly, in short optimizing each and every aspect including DBs, files, images, CSS etc., and balancing the load with several other resources is necessary once your site/application starts growing, or to make more room for it to grow. Implementation of technologies like CDNs and the cloud is a must for huge sites. Just developing and tweaking an application won’t give you hundred percent satisfaction; other components also play an important role.
What is the purpose of the self word in Python? I understand it refers to the specific object created from that class, but I can’t see why it explicitly needs to be added to every function as a parameter. To illustrate, in Ruby I can do this:
class myClass
  def myFunc(name)
    @name = name
  end
end
Which I understand, quite easily. However in Python I need to include self:
class myClass:
    def myFunc(self, name):
        self.name = name
Can anyone talk me through this? It is not something I’ve come across in my (admittedly limited) experience.
The reason you need to use self. is because Python does not use the @ syntax to refer to instance attributes. Python decided to do methods in a way that makes the instance to which the method belongs be passed automatically, but not received automatically: the first parameter of methods is the instance the method is called on. That makes methods entirely the same as functions, and leaves the actual name to use up to you (although self is the convention, and people will generally frown at you when you use something else.) self is not special to the code, it’s just another object.
Python could have done something else to distinguish normal names from attributes — special syntax like Ruby has, or requiring declarations like C++ and Java do, or perhaps something yet more different — but it didn’t. Python’s all for making things explicit, making it obvious what’s what, and although it doesn’t do it entirely everywhere, it does do it for instance attributes. That’s why assigning to an instance attribute needs to know what instance to assign to, and that’s why it needs self.
Answer 1
Let's look at a simple vector class:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
So the whole structure stays the same. How can we make use of this? If we assume for a moment that we hadn’t written a length method for our Vector class, we could do this:
Vector.length_new = length_global
v = Vector(3, 4)
print(v.length_new()) # 5.0
This works because the first parameter of length_global can be reused as the self parameter in length_new. This would not be possible without an explicit self.
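For context, the length_global function referred to above is not shown in this excerpt; here is a minimal sketch of what it presumably looks like (a plain function computing the Euclidean norm, whose first parameter plays the role of self):

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def length_global(vector):
    # an ordinary function; its first parameter plays the role of self
    return (vector.x ** 2 + vector.y ** 2) ** 0.5

# attach the plain function to the class after the fact
Vector.length_new = length_global

v = Vector(3, 4)
print(v.length_new())  # 5.0
```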
Another way of understanding the need for the explicit self is to see where Python adds some syntactical sugar. When you keep in mind, that basically, a call like
v_instance.length()
is internally transformed to
Vector.length(v_instance)
it is easy to see where the self fits in. You don’t actually write instance methods in Python; what you write is class methods which must take an instance as a first parameter. And therefore, you’ll have to place the instance parameter somewhere explicitly.
When objects are instantiated, the object itself is passed into the self parameter.
Because of this, the object’s data is bound to the object. Notice how ‘self’ effectively stands in for the particular object being operated on: the object is passed into the self parameter so that the object can keep hold of its own data.
Although this may not be wholly accurate, think of the process of instantiating an object like this: when an object is made, it uses the class as a template for its own data and methods. Without passing its own reference into the self parameter, the attributes and methods in the class would remain a general template and would not be referenced to (belong to) the object. By passing the object’s reference into the self parameter, if 100 objects are instantiated from the one class, they can all keep track of their own data and methods.
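The point about many instances each tracking their own data can be checked directly; a small sketch (the Tagged class is a made-up example):

```python
class Tagged:
    def __init__(self, n):
        self.n = n  # stored on the instance via self

# 100 objects from one class, each holding its own value
objs = [Tagged(i) for i in range(100)]
print(objs[0].n, objs[42].n, objs[99].n)  # 0 42 99
```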
Answer 4
I like this example:
class A:
    foo = []

a, b = A(), A()
a.foo.append(5)
b.foo
ans: [5]

class A:
    def __init__(self):
        self.foo = []

a, b = A(), A()
a.foo.append(5)
b.foo
ans: []
Classes are just a way to avoid passing in this “state” thing all the time (and other nice things like initializing, class composition, the rarely-needed metaclasses, and supporting custom methods to override operators).
Now let’s demonstrate the above code using the built-in python class machinery, to show how it’s basically the same thing.
class State(object):
    def __init__(self):
        self.field = 'init'

    def add(self, x):
        self.field += x

    def mult(self, x):
        self.field *= x

s = State()
s.add('added')  # self is implicitly passed in
s.mult(2)       # self is implicitly passed in
print(s.field)
As in Modula-3, there are no shorthands [in Python] for referencing the object’s members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call.
Often, the first argument of a method is called self. This is nothing more than a convention: the name self has absolutely no special meaning to Python. Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.
As well as all the other reasons already stated, it allows for easier access to overridden methods; you can call Class.some_method(inst).
An example of where it’s useful:
class C1(object):
    def __init__(self):
        print "C1 init"

class C2(C1):
    def __init__(self):        # overrides C1.__init__
        print "C2 init"
        C1.__init__(self)      # but we still want C1 to init the class too
Unlike Java or C++, Python is not a language built exclusively for object-oriented programming.
When calling a static method in Python, one simply writes a method with regular arguments inside it.
class Animal():
    def staticMethod():
        print "This is a static method"
However, an object method, which requires you to create an instance (an Animal, in this case), needs the self argument:
class Animal():
    def objectMethod(self):
        print "This is an object method which needs an instance of a class"
self is also used to refer to a variable field within the class:
class Animal():
    # animalName made in constructor
    def __init__(self):
        self.animalName = ""

    def getAnimalName(self):
        return self.animalName
In this case, self is referring to the animalName attribute of the instance. REMEMBER: a variable defined inside a method without self exists only while that method is running. For defining fields (attributes that belong to the whole object), you have to attach them to self inside the class methods, as the constructor does above.
If you don’t understand a single word of what I am saying, then Google “Object Oriented Programming.” Once you understand this, you won’t even need to ask that question :).
It’s there to follow the Python zen “explicit is better than implicit”. It’s indeed a reference to your class object. In Java and PHP, for example, it’s called this.
If user_type_name is a field on your model you access it by self.user_type_name.
First of all, self is a conventional name; you could put anything else (as long as you are consistent) in its stead.
It refers to the object itself, so when you are using it, you are declaring that .name and .age are properties of the Student objects (note, not of the Student class) you are going to create.
class Student:
    # called each time you create a new Student instance
    def __init__(self, name, age):  # special method to initialize
        self.name = name
        self.age = age

    def __str__(self):  # special method called for example when you use print
        return "Student %s is %s years old" % (self.name, self.age)

    def call(self, msg):  # silly example for custom method
        return ("Hey, %s! " + msg) % self.name

# initializing two instances of the student class
bob = Student("Bob", 20)
alice = Student("Alice", 19)

# using them
print bob.name
print bob.age
print alice                     # this one only works if you define the __str__ method
print alice.call("Come here!")  # notice you don't put a value for self

# you can modify attributes, like when alice ages
alice.age = 20
print alice
self is a reference to the object itself; therefore, they are the same.
Python methods are not called in the context of the object itself; self in Python may be used to deal with custom object models, among other things.
The use of the argument conventionally called self isn’t hard to understand; the harder question is why it is necessary, or why it must be mentioned explicitly. That, I suppose, is the bigger question for most users who look up this question, and if it is not, they will certainly have the same question as they move forward learning Python.
The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called.
Another thing I would like to add: leaving out the self argument allows me to declare static methods inside a class, simply by not writing self.
Code examples:
class MyClass():
    def staticMethod():
        print("This is a static method")

    def objectMethod(self):
        print("This is an object method which needs an instance of a class, and that is what self refers to")
PS: This works only in Python 3.x. In previous versions, you have to explicitly add the @staticmethod decorator, otherwise the self argument is obligatory.
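For code that must run on both Python 2 and 3, the decorator form is the safe choice; a small sketch:

```python
class MyClass(object):
    @staticmethod
    def static_method():
        # no self: the method belongs to the class, not to an instance
        return "This is a static method"

    def object_method(self):
        return "This is an object method; self is the instance"

print(MyClass.static_method())    # callable without an instance
print(MyClass().object_method())  # needs an instance to supply self
```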
I’m surprised nobody has brought up Lua. Lua also uses a ‘self’ variable, though it can be left implicit (with the colon syntax) and still be used. C++ does the same with ‘this’. I don’t see any reason to have to declare ‘self’ in each function, but you should still be able to use it just like you can in Lua and C++. For a language that prides itself on being brief, it’s odd that it requires you to declare the self variable.
Answer 15
Look at the following example, which clearly illustrates self:

class Restaurant(object):
    bankrupt = False

    def open_branch(self):
        if not self.bankrupt:
            print("branch opened")

# create instance1
>>> x = Restaurant()
>>> x.bankrupt
False

# create instance2
>>> y = Restaurant()
>>> y.bankrupt = True
>>> y.bankrupt
True
>>> x.bankrupt
False
It is because of the way Python is designed that the alternatives would hardly work. Python is designed to allow methods or functions to be defined in a context where neither an implicit this (a la Java/C++) nor an explicit @ (a la Ruby) would work. Let’s look at an example of the explicit approach with Python conventions:
def fubar(x):
    self.x = x

class C:
    frob = fubar
Now the fubar function wouldn’t work, since it would assume that self is a global variable (and in frob as well). The alternative would be to execute methods with a replaced global scope (where self is the object).
The implicit approach would be
def fubar(x):
    myX = x

class C:
    frob = fubar
This would mean that myX would be interpreted as a local variable in fubar (and in frob as well). The alternative here would be to execute methods with a replaced local scope which is retained between calls, but that would remove the possibility of method-local variables.
However the current situation works out well:
def fubar(self, x):
    self.x = x

class C:
    frob = fubar
Here, when called as a method, frob will receive the object on which it’s called via the self parameter, and fubar can still be called with an object as a parameter and work the same (it is the same as C.frob, I think).
In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called.
self, as a name, is just a convention; call it what you want! But when using it, for example to delete the object, you have to use the same name: __del__(var), where var was used in __init__(var, [...]).
You should take a look at cls too, to have the bigger picture. This post could be helpful.
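To sketch the bigger picture: cls plays the same role for class methods that self plays for instance methods. The Counter class here is a made-up example, not from the linked post:

```python
class Counter:
    created = 0  # class attribute, shared by all instances

    def __init__(self):
        # type(self) resolves to the class, so this bumps the shared count
        type(self).created += 1

    @classmethod
    def how_many(cls):
        # cls is the class object itself, not an instance
        return cls.created

Counter()
Counter()
print(Counter.how_many())  # 2
```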
Answer 18
self acts like the current object’s name, i.e. the instance of the class.
# Self explanation.
class classname(object):
    def __init__(self, name):
        self.name = name
        # Self is acting as a replacement of object name.
        # self.name = object1.name

    def display(self):
        print("Name of the person is :", self.name)
        print("object name:", object1.name)

object1 = classname("Bucky")
object2 = classname("ford")

object1.display()
object2.display()

###### Output
Name of the person is : Bucky
object name: Bucky
Name of the person is : ford
object name: Bucky
If we stuck to functional programming, we would not need self. Once we enter Python OOP, we find self there.
Here is the typical use case: a class C with the method m1.
class C:
    def m1(self, arg):
        print(self, ' inside')
        pass

ci = C()
print(ci, ' outside')
ci.m1(None)
print(hex(id(ci)))  # hex memory address
This program will output:
<__main__.C object at 0x000002B9D79C6CC0> outside
<__main__.C object at 0x000002B9D79C6CC0> inside
0x2b9d79c6cc0
So self holds a reference to the class instance (the same memory address in both printouts). The purpose of self is to hold that reference for instance methods, giving us explicit access to it.
Note there are three different types of class methods: instance methods, class methods (@classmethod) and static methods (@staticmethod).
the special thing about methods is that the instance object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s instance object before the first argument.
and the related snippet that precedes this in the tutorial:
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'
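Running the snippet confirms the equivalence quoted above: calling the method on the instance and calling the function on the class with the instance as first argument give the same result.

```python
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

x = MyClass()
# the two calls below are exactly equivalent
print(x.f())         # hello world
print(MyClass.f(x))  # hello world
```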
Further improve readability by adding flags indent=4, sort_keys=True (as suggested by dinos66) to arguments of dump or dumps. This way you’ll get a nicely indented sorted structure in the json file at the cost of a slightly larger file size.
I would answer with a slight modification to the aforementioned answers: write a prettified JSON file that human eyes can read better. For this, pass sort_keys as True and indent with 4 space characters and you are good to go. Also take care to ensure that ascii codes will not be written in your JSON file:
import json

with open('data.txt', 'w') as outfile:
    json.dump(jsonData, outfile, sort_keys=True, indent=4,
              ensure_ascii=False)
Answer 3
Read and write JSON files with Python 2 + 3; works with unicode:

# -*- coding: utf-8 -*-
import json

# Make it work for Python 2+3 and with Unicode
import io

try:
    to_unicode = unicode
except NameError:
    to_unicode = str

# Define data
data = {'a list': [1, 42, 3.141, 1337, 'help', u'€'],
        'a string': 'bla',
        'another dict': {'foo': 'bar',
                         'key': 'value',
                         'the answer': 42}}

# Write JSON file
with io.open('data.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(',', ': '), ensure_ascii=False)
    outfile.write(to_unicode(str_))

# Read JSON file
with open('data.json') as data_file:
    data_loaded = json.load(data_file)

print(data == data_loaded)
For those of you who are trying to dump greek or other “exotic” languages such as me, but are also having problems (unicode errors) with weird characters such as the peace symbol (\u262E) or others which are often contained in JSON-formatted data such as Twitter’s, the solution could be as follows (sort_keys is obviously optional):
import codecs, json

with codecs.open('data.json', 'w', 'utf8') as f:
    f.write(json.dumps(data, sort_keys=True, ensure_ascii=False))
I don’t have enough reputation to add in comments, so I just write some of my findings of this annoying TypeError here:
Basically, I think it’s a bug in the json.dump() function in Python 2 only: it can’t dump Python (dictionary/list) data containing non-ASCII characters, even if you open the file with the encoding = 'utf-8' parameter (i.e. no matter what you do). But json.dumps() works on both Python 2 and 3.
To illustrate this, following up phihag’s answer: the code in his answer breaks in Python 2 with exception TypeError: must be unicode, not str, if data contains non-ASCII characters. (Python 2.7.6, Debian):
import json

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
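A workaround that behaves the same on Python 2 and 3 is to serialize with json.dumps first and write the resulting text through io.open, which handles the encoding on both versions. This is a sketch reusing the data and filename from the example above:

```python
import io
import json

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

# json.dumps returns text, which io.open can encode on write
with io.open('data.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(json.dumps(data, ensure_ascii=False))
```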
Also, if you need to debug improperly formatted JSON and want a helpful error message, use the simplejson library instead of json (the functions should be the same).
import json

with open('data.txt') as json_file:
    data = json.load(json_file)
    for p in data['people']:
        print('Name: ' + p['name'])
        print('Website: ' + p['website'])
        print('From: ' + p['from'])
        print('')
All previous answers are correct; here is a very simple example:
#! /usr/bin/env python
import json

def write_json():
    # create a dictionary
    student_data = {"students": []}
    # create a list
    data_holder = student_data["students"]
    # just a counter
    counter = 0
    # loop through if you have multiple items..
    while counter < 3:
        data_holder.append({'id': counter})
        data_holder.append({'room': counter})
        counter += 1
    # write the file
    file_path = '/tmp/student_data.json'
    with open(file_path, 'w') as outfile:
        print("writing file to: ", file_path)
        # HERE IS WHERE THE MAGIC HAPPENS
        json.dump(student_data, outfile)
    # no explicit close needed: the with block closes the file
    print("done")

write_json()
The column names (which are strings) cannot be sliced in the manner you tried.
Here you have a couple of options. If you know from context which variables you want to slice out, you can just return a view of only those columns by passing a list into the __getitem__ syntax (the []’s).
df1 = df[['a','b']]
Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:
df1 = df.iloc[:,0:2] # Remember that Python does not slice inclusive of the ending index.
Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).
Sometimes, however, there are indexing conventions in Pandas that don’t do this and instead give you a new variable that just refers to the same chunk of memory as the sub-object or slice in the original object. This can happen with the second way of indexing, so you can use the copy() function to get a regular copy. When this happens, changing what you think is the sliced object can sometimes alter the original object. Always good to be on the lookout for this.
df1 = df.iloc[0,0:2].copy() # To avoid the case where changing df1 also changes df
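A quick way to convince yourself of the copy behavior (a toy frame with made-up values):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# .copy() guarantees an independent object: edits don't touch df
sub = df.iloc[:, 0:2].copy()
sub.loc[0, 'a'] = 99

print(df.loc[0, 'a'])   # 1 -- the original is unchanged
print(sub.loc[0, 'a'])  # 99
```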
To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices you can use iloc along with the get_loc function of the dataframe’s columns attribute to obtain column indices:

{df.columns.get_loc(c): c for c in df.columns}

Now you can use this dictionary to access columns through their names and using iloc.
import pandas as pd
import numpy as np

np.random.seed(5)

df = pd.DataFrame(np.random.randint(100, size=(100, 6)),
                  columns=list('ABCDEF'),
                  index=['R{}'.format(i) for i in range(100)])

df.head()
Out:
     A   B   C   D   E   F
R0  99  78  61  16  73   8
R1  62  27  30  80   7  76
R2  15  53  80  27  44  77
R3  75  65  47  30  84  86
R4  18   9  41  62   1  82
To get the columns from C to E (note that unlike integer slicing, ‘E’ is included in the columns):
df.loc[:, 'C':'E']
Out:
      C   D   E
R0   61  16  73
R1   30  80   7
R2   80  27  44
R3   47  30  84
R4   41  62   1
R5    5  58   0
...
Same works for selecting rows based on labels. Get the rows ‘R6’ to ‘R10’ from those columns:
df.loc['R6':'R10', 'C':'E']
Out:
C D E
R6 51 27 31
R7 83 19 18
R8 11 67 65
R9 78 27 29
R10 7 16 94
.loc also accepts a boolean array so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool) – True if the column name is in the list ['B', 'C', 'D']; False, otherwise.
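A small sketch of the boolean-mask selection described above (toy data, single row of made-up values):

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=list('ABCDEF'))

mask = df.columns.isin(list('BCD'))
print(list(mask))                        # [False, True, True, True, False, False]
print(df.loc[:, mask].columns.tolist())  # ['B', 'C', 'D']
```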
Assuming your column names (df.columns) are ['index', 'a', 'b', 'c'], then the data you want is in the 3rd and 4th columns. If you don’t know their names when your script runs, you can do this:
newdf = df[df.columns[2:4]] # Remember, Python is 0-offset! The "3rd" entry is at slot 2.
As EMS points out in his answer, df.ix slices columns a bit more concisely, but the .columns slicing interface might be more natural because it uses the vanilla 1-D python list indexing/slicing syntax.
WARN: 'index' is a bad name for a DataFrame column. The same label is also used for the real df.index attribute, an Index array. So your column is returned by df['index'] and the real DataFrame index is returned by df.index. An Index is a special kind of Series optimized for lookup of its elements’ values. For df.index it’s for looking up rows by their label. The df.columns attribute is also a pd.Index array, for looking up columns by their labels.
Answer 3
In [39]: df
Out[39]:
   index  a  b  c
0      1  2  3  4
1      2  3  4  5

In [40]: df1 = df[['b', 'c']]

In [41]: df1
Out[41]:
   b  c
0  3  4
1  4  5
I realize this question is quite old, but in the latest version of pandas there is an easy way to do exactly this. Column names (which are strings) can be sliced in whatever manner you like.
You could provide a list of columns to be dropped and return back the DataFrame with only the columns needed using the drop() function on a Pandas DataFrame.
Just saying
colsToDrop = ['a']
df.drop(colsToDrop, axis=1)
would return a DataFrame with just the columns b and c.
Starting with 0.21.0, using .loc or [] with a list with one or more missing labels is deprecated in favor of .reindex. So, the answer to your question is:
df1 = df.reindex(columns=['b','c'])
In prior versions, using .loc[list-of-labels] would work as long as at least 1 of the keys was found (otherwise it would raise a KeyError). This behavior is deprecated and now shows a warning message. The recommended alternative is to use .reindex().
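One behavioral detail worth knowing: .reindex never raises on a missing label; it fills the missing column with NaN instead. A toy example (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

out = df.reindex(columns=['b', 'z'])  # 'z' does not exist in df
print(out.columns.tolist())           # ['b', 'z']
print(out['z'].isna().all())          # True -- missing labels become NaN
```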
df1 = pd.DataFrame()  # creating an empty dataframe
for index, i in df.iterrows():
    df1.loc[index, 'A'] = df.loc[index, 'A']
    df1.loc[index, 'B'] = df.loc[index, 'B']
df1.head()
The different approaches discussed in the above responses are based on the assumption that either the user knows column indices to drop or subset on, or the user wishes to subset a dataframe using a range of columns (for instance ‘C’:‘E’). pandas.DataFrame.drop() is certainly an option to subset data based on a list of columns defined by the user (though you have to be cautious to always use a copy of the dataframe, and the inplace parameter should not be set to True!!).
Another option is df.columns.difference(), which does a set difference on column names and returns an Index containing the desired columns.
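A minimal sketch of that approach (the frame and the list of unwanted columns are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

unwanted = ['a']  # hypothetical list of columns to exclude
kept = df[df.columns.difference(unwanted)]
print(kept.columns.tolist())  # ['b', 'c']
```

Note that Index.difference returns its result sorted, so the surviving columns may come back in alphabetical order rather than their original order.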
I’ve seen several answers on that, but one remained unclear to me: how would you select those columns of interest? The answer is that if you have them gathered in a list, you can just reference the columns using that list.
I have the following list/numpy array extracted_features, specifying 63 columns. The original dataset has 103 columns, and I would like to extract exactly those, so I would use

dataset[extracted_features]

and end up with a dataframe containing only those 63 columns. This is something you would use quite often in machine learning (more specifically, in feature selection). I would like to discuss other ways too, but I think that has already been covered by other stackoverflowers. Hope this has been helpful!
You can use pandas.DataFrame.filter method to either filter or reorder columns like this:
df1 = df.filter(['a', 'b'])
Answer 16
df[['a', 'b']]           # select all rows of columns 'a' and 'b'
df.loc[0:10, ['a', 'b']] # index 0 to 10, columns 'a' and 'b'
df.loc[0:10, 'a':'b']    # index 0 to 10, columns 'a' to 'b'
df.iloc[0:10, 3:5]       # index 0 to 10 and columns 3 to 5
df.iloc[3, 3:5]          # index 3 of columns 3 to 5