In “Programming Python”, Mark Lutz mentions “mixins”. I’m from a C/C++/C# background and I have not heard the term before. What is a mixin?
Reading between the lines of this example (which I’ve linked to because it’s quite long), I’m presuming it’s a case of using multiple inheritance to extend a class as opposed to ‘proper’ subclassing. Is this right?
Why would I want to do that rather than put the new functionality into a subclass? For that matter, why would a mixin/multiple inheritance approach be better than using composition?
What separates a mixin from multiple inheritance? Is it just a matter of semantics?
For example, consider werkzeug’s request and response system. I can make a plain old request object by saying:
from werkzeug import BaseRequest
class Request(BaseRequest):
pass
If I want to add accept header support, I would make that
from werkzeug import BaseRequest, AcceptMixin
class Request(AcceptMixin, BaseRequest):
pass
If I wanted to make a request object that supports accept headers, etags, authentication, and user agent support, I could do this:
from werkzeug import BaseRequest, AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin
class Request(AcceptMixin, ETagRequestMixin, UserAgentMixin, AuthenticationMixin, BaseRequest):
pass
The difference is subtle, but in the above examples, the mixin classes weren’t made to stand on their own. In more traditional multiple inheritance, the AuthenticationMixin (for example) would probably be something more like Authenticator. That is, the class would probably be designed to stand on its own.
First, you should note that mixins only exist in multiple-inheritance languages. You can’t do a mixin in Java or C#.
Basically, a mixin is a stand-alone base type that provides limited functionality and polymorphic resonance for a child class. If you’re thinking in C#, think of an interface that you don’t have to actually implement because it’s already implemented; you just inherit from it and benefit from its functionality.
Mixins are typically narrow in scope and not meant to be extended.
[edit — as to why:]
I suppose I should address why, since you asked. The big benefit is that you don’t have to do it yourself over and over again. In C#, the biggest place where a mixin could benefit might be from the Disposal pattern. Whenever you implement IDisposable, you almost always want to follow the same pattern, but you end up writing and re-writing the same basic code with minor variations. If there were an extendable Disposal mixin, you could save yourself a lot of extra typing.
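Since this is a Python thread, here is a hedged Python-flavoured sketch of that idea: a mixin that implements the disposal bookkeeping once, so subclasses only supply the actual cleanup. All names below are mine, not from any library.
class DisposableMixin:
    """Implements the dispose-once bookkeeping a single time;
    subclasses only provide _dispose()."""
    def dispose(self):
        if not getattr(self, '_disposed', False):
            self._disposed = True
            self._dispose()
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        self.dispose()
        return False  # don't swallow exceptions

class FileWrapper(DisposableMixin):
    def __init__(self, path):
        self.f = open(path)
    def _dispose(self):
        self.f.close()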
[edit 2 — to answer your other questions]
What separates a mixin from multiple inheritance? Is it just a matter of semantics?
Yes. The difference between a mixin and standard multiple inheritance is just a matter of semantics; a class that has multiple inheritance might utilize a mixin as part of that multiple inheritance.
The point of a mixin is to create a type that can be “mixed in” to any other type via inheritance without affecting the inheriting type while still offering some beneficial functionality for that type.
Again, think of an interface that is already implemented.
I personally don’t use mixins since I develop primarily in a language that doesn’t support them, so I’m having a really difficult time coming up with a decent example that will just supply that “aha!” moment for you. But I’ll try again. I’m going to use an example that’s contrived — most languages already provide the feature in some way or another — but that will, hopefully, explain how mixins are supposed to be created and used. Here goes:
Suppose you have a type that you want to be able to serialize to and from XML. You want the type to provide a “ToXML” method that returns a string containing an XML fragment with the data values of the type, and a “FromXML” that allows the type to reconstruct its data values from an XML fragment in a string. Again, this is a contrived example, so perhaps you use a file stream, or an XML Writer class from your language’s runtime library… whatever. The point is that you want to serialize your object to XML and get a new object back from XML.
The other important point in this example is that you want to do this in a generic way. You don’t want to have to implement a “ToXML” and “FromXML” method for every type that you want to serialize, you want some generic means of ensuring that your type will do this and it just works. You want code reuse.
If your language supported it, you could create the XmlSerializable mixin to do your work for you. This type would implement the ToXML and the FromXML methods. It would, using some mechanism that’s not important to the example, be capable of gathering all the necessary data from any type that it’s mixed in with to build the XML fragment returned by ToXML and it would be equally capable of restoring that data when FromXML is called.
And.. that’s it. To use it, you would have any type that needs to be serialized to XML inherit from XmlSerializable. Whenever you needed to serialize or deserialize that type, you would simply call ToXML or FromXML. In fact, since XmlSerializable is a fully-fledged type and polymorphic, you could conceivably build a document serializer that doesn’t know anything about your original type, accepting only, say, an array of XmlSerializable types.
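A minimal Python sketch of that XmlSerializable idea (the vars()-based reflection and all names are my own illustration; real serialization would need type handling, nesting and escaping):
import xml.etree.ElementTree as ET

class XmlSerializableMixin(object):
    """Builds an XML fragment from the instance's attributes."""
    def to_xml(self):
        root = ET.Element(type(self).__name__)
        for name, value in vars(self).items():
            ET.SubElement(root, name).text = str(value)
        return ET.tostring(root, encoding='unicode')
    def from_xml(self, fragment):
        for child in ET.fromstring(fragment):
            setattr(self, child.tag, child.text)
        return self

class Point(XmlSerializableMixin):
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

p = Point().from_xml(Point(3, 4).to_xml())
assert (p.x, p.y) == ('3', '4')  # note: values round-trip as strings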
Now imagine using this scenario for other things, like creating a mixin that ensures that every class that mixes it in logs every method call, or a mixin that provides transactionality to the type that mixes it in. The list can go on and on.
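To make the logging variant concrete, here is one possible sketch (intercepting lookups via __getattribute__ is just one of several ways to do it):
import functools

class LoggingMixin:
    """Wraps every public method lookup so each call gets logged."""
    def __getattribute__(self, name):
        attr = object.__getattribute__(self, name)
        if callable(attr) and not name.startswith('_'):
            @functools.wraps(attr)
            def logged(*args, **kwargs):
                print('calling %s.%s' % (type(self).__name__, name))
                return attr(*args, **kwargs)
            return logged
        return attr

class Service(LoggingMixin):
    def ping(self):
        return 'pong'

assert Service().ping() == 'pong'  # prints "calling Service.ping"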
If you just think of a mixin as a small base type designed to add a small amount of functionality to a type without otherwise affecting that type, then you’re golden.
This answer aims to explain mixins with examples that are:
self-contained: short, with no need to know any libraries to understand the example.
in Python, not in other languages.
It is understandable that there were examples from other languages such as Ruby since the term is much more common in those languages, but this is a Python thread.
It shall also consider the controversial question:
Is multiple inheritance necessary or not to characterize a mixin?
Definitions
I have yet to see a citation from an “authoritative” source clearly saying what a mixin is in Python.
I have seen 2 possible definitions of a mixin (if they are to be considered as different from other similar concepts such as abstract base classes), and people don’t entirely agree on which one is correct.
The consensus may vary between different languages.
Definition 1: no multiple inheritance
A mixin is a class such that some method of the class uses a method which is not defined in the class.
Therefore the class is not meant to be instantiated, but rather serve as a base class. Otherwise the instance would have methods that cannot be called without raising an exception.
A constraint which some sources add is that the class may not contain data, only methods, but I don’t see why this is necessary. In practice however, many useful mixins don’t have any data, and base classes without data are simpler to use.
A classic example is the implementation of all comparison operators from only <= and ==:
class ComparableMixin(object):
"""This class has methods which use `<=` and `==`,
but this class does NOT implement those methods."""
def __ne__(self, other):
return not (self == other)
def __lt__(self, other):
return self <= other and (self != other)
def __gt__(self, other):
return not self <= other
def __ge__(self, other):
return self == other or self > other
class Integer(ComparableMixin):
def __init__(self, i):
self.i = i
def __le__(self, other):
return self.i <= other.i
def __eq__(self, other):
return self.i == other.i
assert Integer(0) < Integer(1)
assert Integer(0) != Integer(1)
assert Integer(1) > Integer(0)
assert Integer(1) >= Integer(1)
# It is possible to instantiate a mixin:
o = ComparableMixin()
# but one of its methods raises an exception:
# o != o
This particular example could have been achieved via the functools.total_ordering() decorator, but the game here was to reinvent the wheel.
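For reference, the standard-library route looks like this; functools.total_ordering fills in the remaining comparison methods given __eq__ plus one ordering method:
import functools

@functools.total_ordering
class Integer(object):
    def __init__(self, i):
        self.i = i
    def __eq__(self, other):
        return self.i == other.i
    def __lt__(self, other):
        return self.i < other.i

assert Integer(0) < Integer(1)
assert Integer(1) >= Integer(1)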
Definition 2: multiple inheritance
A mixin is a design pattern in which some method of a base class uses a method it does not define, and that method is meant to be implemented by another base class, not by the derived class as in Definition 1.
The term mixin class refers to base classes which are intended to be used in that design pattern (TODO those that use the method, or those that implement it?)
It is not easy to decide if a given class is a mixin or not: the method could be just implemented on the derived class, in which case we’re back to Definition 1. You have to consider the author’s intentions.
This pattern is interesting because it is possible to recombine functionalities with different choices of base classes:
class HasMethod1(object):
def method(self):
return 1
class HasMethod2(object):
def method(self):
return 2
class UsesMethod10(object):
def usesMethod(self):
return self.method() + 10
class UsesMethod20(object):
def usesMethod(self):
return self.method() + 20
class C1_10(HasMethod1, UsesMethod10): pass
class C1_20(HasMethod1, UsesMethod20): pass
class C2_10(HasMethod2, UsesMethod10): pass
class C2_20(HasMethod2, UsesMethod20): pass
assert C1_10().usesMethod() == 11
assert C1_20().usesMethod() == 21
assert C2_10().usesMethod() == 12
assert C2_20().usesMethod() == 22
# Nothing prevents implementing the method
# on the base class like in Definition 1:
class C3_10(UsesMethod10):
def method(self):
return 3
assert C3_10().usesMethod() == 13
The official Python documentation uses the term in collections.abc: for example, if a class inherits from Iterator and implements the abstract method __next__, then the class gets an __iter__ mixin method for free.
Therefore, at least at this point of the documentation, a mixin does not require multiple inheritance, and is coherent with Definition 1.
The documentation could of course be contradictory at different points, and other important Python libraries might be using the other definition in their documentation.
This page also uses the term Set mixin, which clearly suggests that classes like Set and Iterator can be called Mixin classes.
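A self-contained illustration of that documented behavior, using the real collections.abc.Iterator:
import collections.abc

class CountTo3(collections.abc.Iterator):
    """Implements only the abstract __next__; the __iter__
    mixin method is inherited from Iterator."""
    def __init__(self):
        self.i = 0
    def __next__(self):
        if self.i >= 3:
            raise StopIteration
        self.i += 1
        return self.i

assert list(CountTo3()) == [1, 2, 3]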
In other languages
Ruby: clearly does not require multiple inheritance for mixins, as mentioned in major reference books such as Programming Ruby and The Ruby Programming Language
C++: A method that is not implemented is a pure virtual method.
Definition 1 coincides with the definition of an abstract class (a class that has a pure virtual method).
That class cannot be instantiated.
I think of them as a disciplined way of using multiple inheritance – because ultimately a mixin is just another Python class that (might) follow the conventions about classes that are called mixins.
My understanding of the conventions that govern something you would call a Mixin are that a Mixin:
adds methods but not instance variables (class constants are OK)
only inherits from object (in Python)
That way it limits the potential complexity of multiple inheritance, and makes it reasonably easy to track the flow of your program by limiting where you have to look (compared to full multiple inheritance). They are similar to Ruby modules.
If I want to add instance variables (with more flexibility than allowed for by single inheritance) then I tend to go for composition.
Having said that, I have seen classes called XYZMixin that do have instance variables.
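For comparison, a minimal sketch of the composition alternative mentioned above (names are mine):
class Engine:
    def start(self):
        return 'vroom'

class Car:
    def __init__(self):
        self.engine = Engine()  # has-a, with its own instance state
    def start(self):
        return self.engine.start()

assert Car().start() == 'vroom'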
A mixin is a concept in programming in which a class provides functionality but is not meant to be instantiated. The main purpose of mixins is to provide standalone functionality; ideally, mixins themselves should not inherit from other mixins, and should avoid state. In languages such as Ruby there is some direct language support, but in Python there isn’t; however, you can use multiple inheritance to execute the functionality provided.
I watched this video http://www.youtube.com/watch?v=v_uKI2NOLEM to understand the basics of mixins. It is quite useful for a beginner to understand how mixins work and the problems you might face in implementing them.
What separates a mixin from multiple inheritance? Is it just a matter of semantics?
A mixin is a limited form of multiple inheritance. In some languages the mechanism for adding a mixin to a class is slightly different (in terms of syntax) from that of inheritance.
In the context of Python especially, a mixin is a parent class that provides functionality to subclasses but is not intended to be instantiated itself.
What might cause you to say, “that’s just multiple inheritance, not really a mixin” is if the class that might be confused for a mixin can actually be instantiated and used – so indeed it is a semantic, and very real, difference.
class OrderedCounter(Counter, OrderedDict):
'Counter that remembers the order elements are first encountered'
def __repr__(self):
return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
def __reduce__(self):
return self.__class__, (OrderedDict(self),)
It subclasses both the Counter and the OrderedDict from the collections module.
Both Counter and OrderedDict are intended to be instantiated and used on their own. However, by subclassing them both, we can have a counter that is ordered and reuses the code in each object.
This is a powerful way to reuse code, but it can also be problematic. If it turns out there’s a bug in one of the objects, fixing it without care could create a bug in the subclass.
Example of a Mixin
Mixins are usually promoted as the way to get code reuse without the potential coupling issues that cooperative multiple inheritance, like the OrderedCounter, can have. When you use mixins, you use functionality that isn’t as tightly coupled to the data.
Unlike the example above, a mixin is not intended to be used on its own. It provides new or different functionality.
Forking and threading versions of each type of server can be created
using these mix-in classes. For instance, ThreadingUDPServer is
created as follows:
class ThreadingUDPServer(ThreadingMixIn, UDPServer):
pass
The mix-in class comes first, since it overrides a method defined in
UDPServer. Setting the various attributes also changes the behavior of
the underlying server mechanism.
In this case, the mixin methods override the methods in the UDPServer object definition to allow for concurrency.
The overridden method appears to be process_request and it also provides another method, process_request_thread. Here it is from the source code:
class ThreadingMixIn:
"""Mix-in class to handle each request in a new thread."""
# Decides how threads will act upon termination of the
# main process
daemon_threads = False
def process_request_thread(self, request, client_address):
"""Same as in BaseServer but as a thread.
In addition, exception handling is done here.
"""
try:
self.finish_request(request, client_address)
except Exception:
self.handle_error(request, client_address)
finally:
self.shutdown_request(request)
def process_request(self, request, client_address):
"""Start a new thread to process the request."""
t = threading.Thread(target = self.process_request_thread,
args = (request, client_address))
t.daemon = self.daemon_threads
t.start()
A Contrived Example
This is a mixin that is mostly for demonstration purposes – most objects will evolve beyond the usefulness of this repr:
class SimpleInitReprMixin(object):
"""mixin, don't instantiate - useful for classes instantiable
by keyword arguments to their __init__ method.
"""
__slots__ = () # allow subclasses to use __slots__ to prevent __dict__
def __repr__(self):
kwarg_strings = []
d = getattr(self, '__dict__', None)
if d is not None:
for k, v in d.items():
kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
slots = getattr(self, '__slots__', None)
if slots is not None:
for k in slots:
v = getattr(self, k, None)
kwarg_strings.append('{k}={v}'.format(k=k, v=repr(v)))
return '{name}({kwargs})'.format(
name=type(self).__name__,
kwargs=', '.join(kwarg_strings)
)
and usage would be:
class Foo(SimpleInitReprMixin): # add other mixins and/or extend another class here
__slots__ = 'foo',
def __init__(self, foo=None):
self.foo = foo
super(Foo, self).__init__()
And usage:
>>> f1 = Foo('bar')
>>> f2 = Foo()
>>> f1
Foo(foo='bar')
>>> f2
Foo(foo=None)
I think there have been some good explanations here but I wanted to provide another perspective.
In Scala, you can do mixins as has been described here but what is very interesting is that the mixins are actually ‘fused’ together to create a new kind of class to inherit from. In essence, you do not inherit from multiple classes/mixins, but rather, generate a new kind of class with all the properties of the mixin to inherit from. This makes sense since Scala is based on the JVM where multiple-inheritance is not currently supported (as of Java 8). This mixin class type, by the way, is a special type called a Trait in Scala.
It’s hinted at in the way a class is defined:
class NewClass extends FirstMixin with SecondMixin with ThirdMixin
…
I’m not sure if the CPython interpreter does the same (mixin class-composition) but I wouldn’t be surprised. Also, coming from a C++ background, I would not call an ABC or ‘interface’ equivalent to a mixin — it’s a similar concept but divergent in use and implementation.
I’d advise against mix-ins in new Python code, if you can find any other way around it (such as composition-instead-of-inheritance, or just monkey-patching methods into your own classes) that isn’t much more effort.
In old-style classes you could use mix-ins as a way of grabbing a few methods from another class. But in the new-style world everything, even the mix-in, inherits from object. That means that any use of multiple inheritance naturally introduces MRO issues.
There are ways to make multiple-inheritance MRO work in Python, most notably the super() function, but it means you have to do your whole class hierarchy using super(), and it’s considerably more difficult to understand the flow of control.
If you’re building a class and you want it to act like a dictionary, you can define all the various double-underscore methods necessary. But that’s a bit of a pain. As an alternative, you can just define a few, and inherit (in addition to any other inheritance) from UserDict.DictMixin (moved to collections.DictMixin in py3k). This will have the effect of automatically defining all the rest of the dictionary api.
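DictMixin is gone from modern Python; today collections.abc.MutableMapping plays the same role. A sketch of the idea:
from collections.abc import MutableMapping

class LowerDict(MutableMapping):
    """Implement the five abstract methods; mixin methods supply
    the rest of the dict API (get, setdefault, update, ...)."""
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __setitem__(self, key, value):
        self._data[key.lower()] = value
    def __delitem__(self, key):
        del self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

d = LowerDict()
d['KEY'] = 1
assert d.get('key') == 1  # .get() comes from the mixin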
A second example: the GUI toolkit wxPython allows you to make list controls with multiple columns (like, say, the file display in Windows Explorer). By default, these lists are fairly basic. You can add additional functionality, such as the ability to sort the list by a particular column by clicking on the column header, by inheriting from ListCtrl and adding appropriate mixins.
It’s not a Python example, but in the D programming language the term mixin is used to refer to a construct used much the same way: adding a pile of stuff to a class.
In D (which by the way doesn’t do MI) this is done by inserting a template (think syntactically aware and safe macros and you will be close) into a scope. This allows for a single line of code in a class, struct, function, module or whatever to expand to any number of declarations.
The OP mentioned never having heard of mixins in C++; perhaps that is because they are called the Curiously Recurring Template Pattern (CRTP) there. Also, @Ciro Santilli mentioned that mixins are implemented via abstract base classes in C++. While an abstract base class can be used to implement a mixin, it is overkill, as the functionality of run-time virtual functions can be achieved using templates at compile time, without the overhead of virtual-table lookup at run time.
EDIT: Added protected constructor in ComparableMixin so that it can only be inherited and not instantiated. Updated the example to show how protected constructor will cause compilation error when an object of ComparableMixin is created.
A mixin gives a way to add functionality to a class, i.e. you can interact with methods defined in a module by including the module inside the desired class. Although Ruby doesn’t support multiple inheritance, it provides mixins as an alternative to achieve that.
Here is an example that explains how multiple-inheritance-like reuse is achieved using mixins.
module A # you create a module
def a1 # lets have a method 'a1' in it
end
def a2 # Another method 'a2'
end
end
module B # let's say we have another module
def b1 # A method 'b1'
end
def b2 #another method b2
end
end
class Sample # we create a class 'Sample'
include A # including module 'A' in the class 'Sample' (mixin)
include B # including module B as well
def s1 # class 'Sample' contains a method 's1'
end
end
samp = Sample.new # creating an instance object 'samp'
# we can access methods from module A and B in our class(power of mixin)
samp.a1 # accessing method 'a1' from module A
samp.a2 # accessing method 'a2' from module A
samp.b1 # accessing method 'b1' from module B
samp.b2 # accessing method 'b2' from module B
samp.s1 # accessing method 's1' inside the class Sample
I just used a Python mixin to implement unit testing for Python milters. Normally, a milter talks to an MTA, making unit testing difficult. The test mixin overrides methods that talk to the MTA, and creates a simulated environment driven by test cases instead.
So, you take an unmodified milter application, like spfmilter, and mix in TestBase, like this:
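The original test code isn’t reproduced here; the following is only a self-contained sketch of the shape of such a test mixin, with all names as hypothetical stand-ins rather than the real milter API:
class SpfMilter(object):
    """Stand-in for the real, unmodified milter application class."""
    def check(self):
        self.setreply('550', '5.7.1', 'rejected')

class TestBase(object):
    """Hypothetical test mixin: overrides the MTA-facing methods
    and records what the milter tried to do instead."""
    def __init__(self):
        self.replies = []
    def setreply(self, rcode, xcode=None, msg=None):
        self.replies.append((rcode, xcode, msg))  # recorded, not sent

class TestMilter(TestBase, SpfMilter):  # mixin first, so its overrides win in the MRO
    pass

m = TestMilter()
m.check()
assert m.replies == [('550', '5.7.1', 'rejected')]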
I think previous responses defined very well what MixIns are. However,
in order to better understand them, it might be useful to compare MixIns with Abstract Classes and Interfaces from the code/implementation perspective:
1. Abstract Class
A class that contains one or more abstract methods
An Abstract Class can contain state (instance variables) and non-abstract methods
2. Interface
Interface contains abstract methods only (no non-abstract methods and no internal state)
3. MixIns
MixIns (like Interfaces) do not contain internal state (instance variables)
MixIns contain one or more non-abstract methods (unlike interfaces, they can contain implemented methods)
In e.g. Python these are just conventions, because all of the above are defined as classes. However, the common feature of Abstract Classes, Interfaces and MixIns is that they should not exist on their own, i.e. should not be instantiated.
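In Python terms, the three conventions above might be sketched like this (abc is real standard-library API; the class contents are purely illustrative):
from abc import ABC, abstractmethod

class AbstractBase(ABC):       # state + abstract + non-abstract methods
    def __init__(self):
        self.calls = 0         # internal state is allowed
    @abstractmethod
    def step(self): ...
    def run(self):
        self.calls += 1
        return self.step()

class StepInterface(ABC):      # abstract methods only, no state
    @abstractmethod
    def step(self): ...

class TwiceMixin:              # non-abstract methods only, no state
    def step_twice(self):
        self.step()
        self.step()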
This question is not for the discussion of whether or not the singleton design pattern is desirable, is an anti-pattern, or for any religious wars, but to discuss how this pattern is best implemented in Python in such a way that is most pythonic. In this instance I define ‘most pythonic’ to mean that it follows the ‘principle of least astonishment’.
I have multiple classes which would become singletons (my use-case is for a logger, but this is not important). I do not wish to clutter several classes with added gumph when I can simply inherit or decorate.
Best methods:
Method 1: A decorator
def singleton(class_):
instances = {}
def getinstance(*args, **kwargs):
if class_ not in instances:
instances[class_] = class_(*args, **kwargs)
return instances[class_]
return getinstance
@singleton
class MyClass(BaseClass):
pass
Pros
Decorators are additive in a way that is often more intuitive than multiple inheritance.
Cons
While objects created using MyClass() would be true singleton objects, MyClass itself is a function, not a class, so you cannot call class methods from it. Also, for m = MyClass(); n = MyClass(); o = type(n)(); then m == n and m != o and n != o.
Method 2: A base class
class Singleton(object):
_instance = None
def __new__(class_, *args, **kwargs):
if not isinstance(class_._instance, class_):
class_._instance = object.__new__(class_, *args, **kwargs)
return class_._instance
class MyClass(Singleton, BaseClass):
pass
Pros
It’s a true class
Cons
Multiple inheritance – eugh! Could __new__ be overridden during inheritance from a second base class? One has to think more than is necessary.
Method 3: A metaclass
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
#Python2
class MyClass(BaseClass):
__metaclass__ = Singleton
#Python3
class MyClass(BaseClass, metaclass=Singleton):
pass
Pros
It’s a true class
Auto-magically covers inheritance
Uses __metaclass__ for its proper purpose (and made me aware of it)
Cons
Are there any?
Method 4: decorator returning a class with the same name
def singleton(class_):
class class_w(class_):
_instance = None
def __new__(class_, *args, **kwargs):
if class_w._instance is None:
class_w._instance = super(class_w,
class_).__new__(class_,
*args,
**kwargs)
class_w._instance._sealed = False
return class_w._instance
def __init__(self, *args, **kwargs):
if self._sealed:
return
super(class_w, self).__init__(*args, **kwargs)
self._sealed = True
class_w.__name__ = class_.__name__
return class_w
@singleton
class MyClass(BaseClass):
pass
Pros
It’s a true class
Auto-magically covers inheritance
Cons
Is there not an overhead for creating each new class? Here we are creating two classes for each class we wish to make a singleton. While this is fine in my case, I worry that this might not scale. Of course there is a matter of debate as to whether it ought to be too easy to scale this pattern…
What is the point of the _sealed attribute?
Can’t call methods of the same name on base classes using super() because they will recurse. This means you can’t customize __new__ and can’t subclass a class that needs you to call up to __init__.
A few words about metaclasses. A metaclass is the class of a class; that is, a class is an instance of its metaclass. You find the metaclass of an object in Python with type(obj). Normal new-style classes are of type type. Logger in the code above will be of type class 'your_module.Singleton', just as the (only) instance of Logger will be of type class 'your_module.Logger'. When you call Logger(), Python first asks the metaclass of Logger, Singleton, what to do, allowing instance creation to be pre-empted. This process is the same as Python asking a class what to do by calling __getattr__ when you reference one of its attributes by doing myclass.attribute.
A metaclass essentially decides what the definition of a class means and how to implement that definition. See for example http://code.activestate.com/recipes/498149/, which essentially recreates C-style structs in Python using metaclasses. The thread What are some (concrete) use-cases for metaclasses? also provides some examples; they generally seem to be related to declarative programming, especially as used in ORMs.
In this situation, if you use your Method #2, and a subclass defines a __new__ method, it will be executed every time you call SubClassOfSingleton() — because it is responsible for calling the method that returns the stored instance. With a metaclass, it will only be called once, when the only instance is created. You want to customize what it means to call the class, which is decided by its type.
In general, it makes sense to use a metaclass to implement a singleton. A singleton is special because it is created only once, and a metaclass is the way you customize the creation of a class. Using a metaclass gives you more control in case you need to customize the singleton class definitions in other ways.
Your singletons won’t need multiple inheritance (because the metaclass is not a base class), but for subclasses of the created class that use multiple inheritance, you need to make sure the singleton class is the first / leftmost one with a metaclass that redefines __call__. This is very unlikely to be an issue. The instance dict is not in the instance’s namespace, so it won’t accidentally overwrite it.
You will also hear that the singleton pattern violates the “Single Responsibility Principle” — each class should do only one thing. That way you don’t have to worry about messing up one thing the code does if you need to change another, because they are separate and encapsulated. The metaclass implementation passes this test. The metaclass is responsible for enforcing the pattern, and the created class and subclasses need not be aware that they are singletons. Method #1 fails this test, as you noted with “MyClass itself is a function, not a class, so you cannot call class methods from it.”
Python 2 and 3 Compatible Version
Writing something that works in both Python 2 and 3 requires using a slightly more complicated scheme. Since metaclasses are usually subclasses of type, it’s possible to use one to dynamically create an intermediary base class at run time with it as its metaclass, and then use that as the base class of the public Singleton base class. It’s harder to explain than to do, as illustrated next:
# works in Python 2 & 3
class _Singleton(type):
""" A metaclass that creates a Singleton base class when called. """
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(_Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
class Singleton(_Singleton('SingletonMeta', (object,), {})): pass
class Logger(Singleton):
pass
An ironic aspect of this approach is that it’s using subclassing to implement a metaclass. One possible advantage is that, unlike with a pure metaclass, isinstance(inst, Singleton) will return True.
Corrections
On another topic, you’ve probably already noticed this, but the base class implementation in your original post is wrong. _instances needs to be referenced on the class, you need to use super() or you’re recursing, and __new__ is actually a static method that you have to pass the class to, not a class method, as the actual class hasn’t been created yet when it is called. All of these things will be true for a metaclass implementation as well.
class Singleton(object):
_instances = {}
def __new__(class_, *args, **kwargs):
if class_ not in class_._instances:
class_._instances[class_] = super(Singleton, class_).__new__(class_, *args, **kwargs)
return class_._instances[class_]
class MyClass(Singleton):
pass
c = MyClass()
Decorator Returning A Class
I originally was writing a comment but it was too long, so I’ll add this here. Method #4 is better than the other decorator version, but it’s more code than needed for a singleton, and it’s not as clear what it does.
The main problems stem from the class being its own base class. First, isn’t it weird to have a class be a subclass of a nearly identical class with the same name that exists only in its __class__ attribute? This also means that you can’t define any methods that call the method of the same name on their base class with super() because they will recurse. This means your class can’t customize __new__, and can’t derive from any classes that need __init__ called on them.
When to use the singleton pattern
Your use case is one of the better examples of wanting to use a singleton. You say in one of the comments “To me logging has always seemed a natural candidate for Singletons.” You’re absolutely right.
When people say singletons are bad, the most common reason is that they are implicit shared state. While global variables and top-level module imports are explicit shared state, other objects that are passed around are generally instantiated. This is a good point, with two exceptions.
The first, and one that gets mentioned in various places, is when the singletons are constant. Use of global constants, especially enums, is widely accepted and considered sane, because no matter what, none of the users can mess them up for any other user. This is equally true for a constant singleton.
The second exception, which gets mentioned less, is the opposite — when the singleton is only a data sink, not a data source (directly or indirectly). This is why loggers feel like a “natural” use for singletons. As the various users are not changing the loggers in ways other users will care about, there is not really shared state. This negates the primary argument against the singleton pattern, and makes them a reasonable choice because of their ease of use for the task.
Now, there is one kind of singleton which is OK: a singleton where all of the reachable objects are immutable. If all objects are immutable, then the singleton has no global state, as everything is constant. But it is so easy to turn this kind of singleton into a mutable one that it is a very slippery slope. Therefore, I am against these singletons too, not because they are bad, but because it is very easy for them to go bad. (As a side note, Java enumerations are just this kind of singleton. As long as you don’t put state into your enumeration you are OK, so please don’t.)
The other kind of singleton, which is semi-acceptable, is one which doesn’t affect the execution of your code: it has no “side effects”. Logging is a perfect example. It is loaded with singletons and global state. It is acceptable (as in it will not hurt you) because your application does not behave any differently whether or not a given logger is enabled. The information here flows one way: from your application into the logger. Even though loggers are global state, since no information flows from loggers into your application, loggers are acceptable. You should still inject your logger if you want your test to assert that something is getting logged, but in general loggers are not harmful despite being full of state.
Use a module. It is imported only once. Define some global variables in it – they will be singleton’s ‘attributes’. Add some functions – the singleton’s ‘methods’.
You probably never need a singleton in Python. Just define all your data and functions in a module and you have a de-facto singleton.
If you really absolutely have to have a singleton class then I’d go with:
class My_Singleton(object):
def foo(self):
pass
my_singleton = My_Singleton()
To use:
from mysingleton import my_singleton
my_singleton.foo()
where mysingleton.py is your filename that My_Singleton is defined in. This works because after the first time a file is imported, Python doesn’t re-execute the code.
I’d strongly recommend to watch Alex Martelli’s talks on design patterns in python: part 1 and part 2. In particular, in part 1 he talks about singletons/shared state objects.
Here’s my own implementation of singletons. All you have to do is decorate the class; to get the singleton, you then have to use the Instance method. Here’s an example:
@Singleton
class Foo:
def __init__(self):
print 'Foo created'
f = Foo() # Error, this isn't how you get the instance of a singleton
f = Foo.Instance() # Good. Being explicit is in line with the Python Zen
g = Foo.Instance() # Returns already created instance
print f is g # True
And here’s the code:
class Singleton:
"""
A non-thread-safe helper class to ease implementing singletons.
This should be used as a decorator -- not a metaclass -- to the
class that should be a singleton.
The decorated class can define one `__init__` function that
takes only the `self` argument. Other than that, there are
no restrictions that apply to the decorated class.
To get the singleton instance, use the `Instance` method. Trying
to use `__call__` will result in a `TypeError` being raised.
Limitations: The decorated class cannot be inherited from.
"""
def __init__(self, decorated):
self._decorated = decorated
def Instance(self):
"""
Returns the singleton instance. Upon its first call, it creates a
new instance of the decorated class and calls its `__init__` method.
On all subsequent calls, the already created instance is returned.
"""
try:
return self._instance
except AttributeError:
self._instance = self._decorated()
return self._instance
def __call__(self):
raise TypeError('Singletons must be accessed through `Instance()`.')
def __instancecheck__(self, inst):
return isinstance(inst, self._decorated)
Method 3 seems to be very neat, but if you want your program to run in both Python 2 and Python 3, it doesn’t work. Even protecting the separate variants with tests for the Python version fails, because the Python 3 version gives a syntax error in Python 2.
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
MC = Singleton('MC', (object,), {})  # note: the bases argument must be a tuple
class MyClass(MC):
pass # Code for the class implementation
I presume that ‘object’ in the assignment needs to be replaced with the ‘BaseClass’, but I haven’t tried that (I have tried code as illustrated).
Use it as a decorator on a class that should be a singleton. Like this:
@singleton
class MySingleton:
#....
This is similar to the singleton = lambda c: c() decorator in another answer. Like the other solution, the only instance has the name of the class (MySingleton). However, with this solution you can still “create” instances (actually, get the only instance) from the class, by doing MySingleton(). It also prevents you from creating additional instances by doing type(MySingleton)() (that also returns the same instance).
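The decorator itself isn’t included in this answer; the following is my own hypothetical sketch that matches the behaviour described:
def singleton(cls):
    """Replace the decorated class with its only instance, while
    keeping both the instance and its class 'constructible'."""
    instance = cls()                                           # the one instance
    cls.__new__ = staticmethod(lambda c, *a, **kw: instance)   # type(MySingleton)() -> same object
    cls.__init__ = lambda self, *a, **kw: None                 # don't re-initialize it
    cls.__call__ = lambda self, *a, **kw: self                 # MySingleton() -> same object
    return instance

@singleton
class MySingleton:
    pass

assert MySingleton() is MySingleton
assert type(MySingleton)() is MySingleton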
I’ll toss mine into the ring. It’s a simple decorator.
from abc import ABC
def singleton(real_cls):
class SingletonFactory(ABC):
instance = None
def __new__(cls, *args, **kwargs):
if not cls.instance:
cls.instance = real_cls(*args, **kwargs)
return cls.instance
SingletonFactory.register(real_cls)
return SingletonFactory
# Usage
@singleton
class YourClass:
... # Your normal implementation, no special requirements.
Benefits I think it has over some of the other solutions:
It’s clear and concise (to my eye ;D).
Its action is completely encapsulated. You don’t need to change a single thing about the implementation of YourClass. This includes not needing to use a metaclass for your class (note that the metaclass above is on the factory, not the “real” class).
It doesn’t rely on monkey-patching anything.
It’s transparent to callers:
Callers still simply import YourClass, it looks like a class (because it is), and they use it normally. No need to adapt callers to a factory function.
What YourClass() instantiates is still a true instance of the YourClass you implemented, not a proxy of any kind, so no chance of side effects resulting from that.
isinstance(instance, YourClass) and similar operations still work as expected (though this bit does require abc so precludes Python <2.6).
One downside does occur to me: classmethods and staticmethods of the real class are not transparently callable via the factory class hiding it. I’ve used this rarely enough that I’ve never happened to run into that need, but it would be easily rectified by using a custom metaclass on the factory that implements __getattr__() to delegate all-ish attribute access to the real class.
A related pattern I’ve actually found more useful (not that I’m saying these kinds of things are required very often at all) is a “Unique” pattern where instantiating the class with the same arguments results in getting back the same instance. I.e. a “singleton per arguments”. The above adapts to this well and becomes even more concise:
def unique(real_cls):
class UniqueFactory(ABC):
@functools.lru_cache(None) # Handy for 3.2+, but use any memoization decorator you like
def __new__(cls, *args, **kwargs):
return real_cls(*args, **kwargs)
UniqueFactory.register(real_cls)
return UniqueFactory
All that said, I do agree with the general advice that if you think you need one of these things, you really should probably stop for a moment and ask yourself if you really do. 99% of the time, YAGNI.
#decorator, modifies new_cls
def _singleton(new_cls):
instance = new_cls() #2
def new(cls):
if isinstance(instance, cls): #4
return instance
else:
raise TypeError("I can only return instance of {}, caller wanted {}".format(new_cls, cls))
new_cls.__new__ = new #3
new_cls.__init__ = lambda self: None #5
return new_cls
#decorator, creates new class
def singleton(cls):
new_cls = type('singleton({})'.format(cls.__name__), (cls,), {} ) #1
return _singleton(new_cls)
#metaclass
def meta_singleton(name, bases, attrs):
new_cls = type(name, bases, attrs) #1
return _singleton(new_cls)
Explanation:
Create new class, inheriting from given cls
(it doesn’t modify cls in case someone wants for example singleton(list))
Create an instance. Before __new__ is overridden, this is easy.
Now, once we have the instance, override __new__ using the method defined a moment ago.
The function returns the instance only when it’s what the caller expects; otherwise it raises TypeError.
The condition is not met when someone attempts to inherit from the decorated class.
If __new__() returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to __new__().
The instance is already initialized, so the function replaces __init__ with a function that does nothing.
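A quick usage check of the behaviour described above:
@singleton
class MyList(list):
    pass

a = MyList()
b = MyList()
assert a is b  # always the same instance

class Sub(MyList):
    pass
# Sub() raises TypeError: the stored instance is a MyList, not a Sub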
It is slightly similar to the answer by fab but not exactly the same.
The singleton contract does not require that we be able to call the constructor multiple times. As a singleton should be created once and once only, shouldn’t it be seen to be created just once? “Spoofing” the constructor arguably impairs legibility.
So my suggestion is just this:
class Elvis():
def __init__(self):
if hasattr(self.__class__, 'instance'):
raise Exception()
self.__class__.instance = self
# initialisation code...
@staticmethod
def the():
if hasattr(Elvis, 'instance'):
return Elvis.instance
return Elvis()
This does not rule out the use of the constructor or the field instance by user code:
if Elvis() is King.instance:
… if you know for sure that Elvis has not yet been created, and that King has.
But it encourages users to use the the method universally:
Elvis.the().leave(Building.the())
To make this complete you could also override __delattr__() to raise an Exception if an attempt is made to delete instance, and override __del__() so that it raises an Exception (unless we know the program is ending…)
Further improvements
My thanks to those who have helped with comments and edits, of which more are welcome. While I use Jython, this should work more generally, and be thread-safe.
try:
# This is jython-specific
from synchronize import make_synchronized
except ImportError:
# This should work across different python implementations
def make_synchronized(func):
import threading
func.__lock__ = threading.Lock()
def synced_func(*args, **kws):
with func.__lock__:
return func(*args, **kws)
return synced_func
class Elvis(object): # NB must be subclass of object to use __new__
instance = None
@classmethod
@make_synchronized
def __new__(cls, *args, **kwargs):
if cls.instance is not None:
raise Exception()
cls.instance = object.__new__(cls, *args, **kwargs)
return cls.instance
def __init__(self):
pass
# initialisation code...
@classmethod
@make_synchronized
def the(cls):
if cls.instance is not None:
return cls.instance
return cls()
Points of note:
If you don’t subclass from object in Python 2.x you will get an old-style class, which does not use __new__
When decorating __new__ you must decorate with @classmethod or __new__ will be an unbound instance method
This could possibly be improved by way of a metaclass, as this would allow you to make the ‘the’ method a class-level property, possibly renaming it to ‘instance’
A one-liner (I’m not proud of it, but it does the job):
class Myclass:
    def __init__(self):
        # do your stuff
        globals()[type(self).__name__] = lambda: self  # singletonify
If one wants multiple instances of the same class, but only when the args or kwargs differ, one can use this approach.
Ex.
If you have a class handling serial communication, and to create an instance you want to send the serial port as an argument, then the traditional approach won’t work.
Using the above-mentioned decorator, one can create multiple instances of the class if the args are different.
For the same args, the decorator will return the same instance that has already been created.
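The decorators module imported below isn’t included in the answer; a minimal sketch of such a per-arguments singleton decorator (my own hypothetical implementation) could be:
def singleton(cls):
    """Cache one instance per distinct (args, kwargs) combination."""
    instances = {}
    def get_instance(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))  # works for hashable arguments
        if key not in instances:
            instances[key] = cls(*args, **kwargs)
        return instances[key]
    return get_instance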
>>> from decorators import singleton
>>>
>>> @singleton
... class A:
... def __init__(self, *args, **kwargs):
... pass
...
>>>
>>> a = A(name='Siddhesh')
>>> b = A(name='Siddhesh', lname='Sathe')
>>> c = A(name='Siddhesh', lname='Sathe')
>>> a is b # has to be different
False
>>> b is c # has to be same
True
>>>
Maybe I misunderstand the singleton pattern, but my solution is simple and pragmatic (pythonic?). This code fulfills two goals:
Make the instance of Foo accessible everywhere (global).
Only one instance of Foo can exist.
This is the code.
#!/usr/bin/env python3
class Foo:
me = None
def __init__(self):
        if Foo.me is not None:
raise Exception('Instance of Foo still exists!')
Foo.me = self
if __name__ == '__main__':
Foo()
Foo()
Output
Traceback (most recent call last):
File "./x.py", line 15, in <module>
Foo()
File "./x.py", line 8, in __init__
raise Exception('Instance of Foo still exists!')
Exception: Instance of Foo still exists!
After struggling with this for some time I eventually came up with the following, so that the config object would only be loaded once, when called up from separate modules. The metaclass allows a global class instance to be stored in the builtins dict, which at present appears to be the neatest way of storing a proper program global.
import builtins
# -----------------------------------------------------------------------------
# So..... you would expect that a class would be "global" in scope, however
# when different modules use this,
# EACH ONE effectively has its own class namespace.
# In order to get around this, we use a metaclass to intercept
# "new" and provide the "truly global metaclass instance" if it already exists
class MetaConfig(type):
def __new__(cls, name, bases, dct):
try:
class_inst = builtins.CONFIG_singleton
except AttributeError:
class_inst = super().__new__(cls, name, bases, dct)
builtins.CONFIG_singleton = class_inst
class_inst.do_load()
return class_inst
# -----------------------------------------------------------------------------
class Config(metaclass=MetaConfig):
config_attr = None
@classmethod
def do_load(cls):
...<load-cfg-from-file>...
I can’t remember where I found this solution, but I find it to be the most ‘elegant’ from my non-Python-expert point of view:
class SomeSingleton(dict):
__instance__ = None
def __new__(cls, *args,**kwargs):
if SomeSingleton.__instance__ is None:
SomeSingleton.__instance__ = dict.__new__(cls)
return SomeSingleton.__instance__
def __init__(self):
pass
def some_func(self,arg):
pass
Why do I like this? No decorators, no meta classes, no multiple inheritance…and if you decide you don’t want it to be a Singleton anymore, just delete the __new__ method. As I am new to Python (and OOP in general) I expect someone will set me straight about why this is a terrible approach?
This is my preferred way of implementing singletons:
class Test(object):
obj = None
def __init__(self):
if Test.obj is not None:
raise Exception('A Test Singleton instance already exists')
# Initialization code here
@classmethod
def get_instance(cls):
if cls.obj is None:
cls.obj = Test()
return cls.obj
@classmethod
def custom_method(cls):
obj = cls.get_instance()
# Custom Code here
This answer is likely not what you’re looking for. I wanted a singleton in the sense that only that object had its identity, for identity comparisons. In my case it was being used as a sentinel value. To which the answer is very simple: create your sentinel with mything = object(), and by Python’s nature, only that thing will have its identity.
MyNone = object() # The singleton
for item in my_list:
if item is MyNone: # An Example identity comparison
raise StopIteration
This solution causes some namespace pollution at the module level (three definitions rather than just one), but I find it easy to follow.
I’d like to be able to write something like this (lazy initialization), but unfortunately classes are not available in the body of their own definitions.
# wouldn't it be nice if we could do this?
class Foo(object):
instance = None
def __new__(cls):
if cls.instance is None:
cls.instance = object()
cls.instance.__class__ = Foo
return cls.instance
Since that isn't possible, we can break out the initialization and the static instance in one of two ways.
Eager Initialization:
import random
class FooMaker(object):
def __init__(self, *args):
self._count = random.random()
self._args = args
class Foo(object):
def __new__(cls):
return foo_instance
foo_instance = FooMaker()
foo_instance.__class__ = Foo
Lazy initialization:
import random
class FooMaker(object):
def __init__(self, *args):
self._count = random.random()
self._args = args
class Foo(object):
    def __new__(cls):
        global foo_instance
        if foo_instance is None:
            foo_instance = FooMaker()
            foo_instance.__class__ = Foo  # as in the eager version above
        return foo_instance
foo_instance = None
Another possibility is to use a set instead of a list, if a set is applicable in your application.
That is, if your data is not ordered and does not have duplicates, then

my_set = set([3, 4, 2])
my_set.discard(1)
is error-free.
Often a list is just a handy container for items that are actually unordered. There are questions asking how to remove all occurrences of an element from a list. If you don't want dupes in the first place, once again a set is handy.
As stated by numerous other answers, list.remove() will work, but it throws a ValueError if the item wasn't in the list. With Python 3.4+, there's an interesting approach to handling this, using the suppress context manager:
from contextlib import suppress
with suppress(ValueError):
a.remove('b')
Answer 8

With the list's remove method you can look for a value and remove it if it is present:

>>> a = [1, 2, 3, 4]
>>> try:
...     a.remove(6)
... except ValueError:
...     pass
...
>>> print a
[1, 2, 3, 4]
>>> try:
...     a.remove(3)
... except ValueError:
...     pass
...
>>> print a
[1, 2, 4]
Maybe your solutions work with ints, but they didn't work for me with dictionaries.

On one hand, remove() has not worked for me, though it may work with basic types. I guess the code below is also the way to remove items from a list of objects.

On the other hand, 'del' has not worked properly either. In my case, using Python 3.6: when I tried to delete an element from a list in a 'for' loop with the 'del' command, Python changed the index in the process and the loop stopped prematurely. It only works if you delete element by element in reversed order; that way you don't change the indices of the pending elements while you are going through the list.

So I used:

c = len(lst) - 1
for element in reversed(lst):
    if condition(element):
        del lst[c]
    c -= 1
print(lst)

where 'lst' is like [{'key1': 'value1'}, {'key2': 'value2'}, {'key3': 'value3'}, ...]
You can also do it more pythonically using enumerate:

for i, element in enumerate(reversed(lst)):
    if condition(element):
        del lst[(i + 1) * -1]
print(lst)
Answer 18
arr = [1, 1, 3, 4, 5, 2, 4, 3]
# to remove the first occurrence of an element, suppose 3 in this example
arr.remove(3)
# to remove all occurrences of an element, again suppose 3,
# use something called a list comprehension
new_arr = [element for element in arr if element != 3]
# if you want to delete a position, use the "pop" function, suppose
# position 4
# the pop function also returns a value
removed_element = arr.pop(4)
# you can also use "del" to delete a position
del arr[4]
Note: with virtualenvs, getsitepackages is not available; sys.path from above will list the virtualenv's site-packages directory correctly, though. In Python 3, you may use the sysconfig module instead:
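A minimal sketch of that approach, using the stdlib sysconfig API:

import sysconfig
print(sysconfig.get_path('purelib'))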
It will point you to /usr/lib/pythonX.X/dist-packages
This folder only contains packages your operating system has automatically installed for programs to run.
On Ubuntu, the site-packages folder that contains packages installed via setuptools/easy_install/pip will be in /usr/local/lib/pythonX.X/dist-packages
The second folder is probably the more useful one if the use case is related to installation or reading source code.
If you do not use Ubuntu, you are probably safe copy-pasting the first code box into the terminal.
Let's say you have installed the package 'django'. Import it and type dir(django); it will show you all the functions and attributes of that module. Type this in the Python interpreter –
…though the default site.py does something a bit more crude, paraphrased below:
import sys, os
print os.sep.join([sys.prefix, 'lib', 'python' + sys.version[:3], 'site-packages'])
(it also adds ${sys.prefix}/lib/site-python and adds both paths for sys.exec_prefix as well, should that constant be different).
That said, what’s the context? You shouldn’t be messing with your site-packages directly; setuptools/distutils will work for installation, and your program may be running in a virtualenv where your pythonpath is completely user-local, so it shouldn’t assume use of the system site-packages directly either.
A modern stdlib way is using sysconfig module, available in version 2.7 and 3.2+.
Note: sysconfig (source) is not to be confused with the distutils.sysconfig submodule (source) mentioned in several other answers here. The latter is an entirely different module and it’s lacking the get_paths function discussed below.
stdlib: directory containing the standard Python library files that are not platform-specific.
platstdlib: directory containing the standard Python library files that are platform-specific.
platlib: directory for site-specific, platform-specific files.
purelib: directory for site-specific, non-platform-specific files.
include: directory for non-platform-specific header files.
platinclude: directory for platform-specific header files.
scripts: directory for script files.
data: directory for data files.
In most cases, users finding this question would be interested in the ‘purelib’ path (in some cases, you might be interested in ‘platlib’ too). Unlike the current accepted answer, this method still works regardless of whether or not you have a virtualenv activated.
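For illustration, a minimal sketch of querying these paths:

import sysconfig

paths = sysconfig.get_paths()
print(paths['purelib'])   # site-packages for pure-Python packages
print(paths['platlib'])   # site-packages for platform-specific packages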
All the answers (or: the same answer repeated over and over) are inadequate. What you want to do is this:
from setuptools.command.easy_install import easy_install
class easy_install_default(easy_install):
""" class easy_install had problems with the fist parameter not being
an instance of Distribution, even though it was. This is due to
some import-related mess.
"""
def __init__(self):
from distutils.dist import Distribution
dist = Distribution()
self.distribution = dist
self.initialize_options()
self._dry_run = None
self.verbose = dist.verbose
self.force = None
self.help = 0
self.finalized = 0
e = easy_install_default()
import distutils.errors
try:
e.finalize_options()
except distutils.errors.DistutilsError:
pass
print e.install_dir
The final line shows you the installation dir. Works on Ubuntu, whereas the above ones don't. Don't ask me about Windows or other distros, but since it's the exact same dir that easy_install uses by default, it's probably correct everywhere easy_install works (so, everywhere, even Macs). Have fun. Note: the original code had many swearwords in it.
A side-note: The proposed solution (distutils.sysconfig.get_python_lib()) does not work when there is more than one site-packages directory (as recommended by this article). It will only return the main site-packages directory.
Alas, I have no better solution either. Python doesn’t seem to keep track of site-packages directories, just the packages within them.
This works for me. It will get you both dist-packages and site-packages folders. If the folder is not on Python's path, it won't be doing you much good anyway.

import sys
print [f for f in sys.path if f.endswith('packages')]
This should work on all distributions, in and out of virtual environments, due to its "low-tech" nature: the os module always resides in the parent directory of 'site-packages'.
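Presumably the intended code was something along these lines (a sketch of the described trick):

import os
print(os.path.join(os.path.dirname(os.__file__), 'site-packages'))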
An additional note to the get_python_lib function mentioned already: on some platforms different directories are used for platform specific modules (eg: modules that require compilation). If you pass plat_specific=True to the function you get the site packages for platform specific packages.
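For example (plat_specific is a documented keyword of this function):

from distutils.sysconfig import get_python_lib
print(get_python_lib(plat_specific=True))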
Answer 14
from distutils.sysconfig import get_python_lib
print get_python_lib()
This will give the following output about the imaplib package –
Type: module
String form: <module 'imaplib' from '/usr/lib/python2.7/imaplib.py'>
File: /usr/lib/python2.7/imaplib.py
Docstring:
IMAP4 client.
Based on RFC 2060.
Public class: IMAP4
Public variable: Debug
Public functions: Internaldate2tuple
Int2AP
ParseFlags
Time2Internaldate
I had to do something slightly different for a project I was working on: find the relative site-packages directory relative to the base install prefix. If the site-packages folder was in /usr/lib/python2.7/site-packages, I wanted the /lib/python2.7/site-packages part. I have, in fact, encountered systems where site-packages was in /usr/lib64, and the accepted answer did NOT work on those systems.
Similar to cheater’s answer, my solution peeks deep into the guts of Distutils, to find the path that actually gets passed around inside setup.py. It was such a pain to figure out that I don’t want anyone to ever have to figure this out again.
import sys
import os
from distutils.command.install import INSTALL_SCHEMES
if os.name == 'nt':
scheme_key = 'nt'
else:
scheme_key = 'unix_prefix'
print(INSTALL_SCHEMES[scheme_key]['purelib'].replace('$py_version_short', (str.split(sys.version))[0][0:3]).replace('$base', ''))
That should print something like /Lib/site-packages or /lib/python3.6/site-packages.
What would be a nice way to go from {2:3, 1:89, 4:5, 3:0} to {1:89, 2:3, 3:0, 4:5}?
I checked some posts but they all use the “sorted” operator that returns tuples.
Standard Python dictionaries are unordered. Even if you sorted the (key,value) pairs, you wouldn’t be able to store them in a dict in a way that would preserve the ordering.
The easiest way is to use OrderedDict, which remembers the order in which the elements have been inserted:
In [1]: import collections
In [2]: d = {2:3, 1:89, 4:5, 3:0}
In [3]: od = collections.OrderedDict(sorted(d.items()))
In [4]: od
Out[4]: OrderedDict([(1, 89), (2, 3), (3, 0), (4, 5)])
Never mind the way od is printed out; it’ll work as expected:
In [11]: od[1]
Out[11]: 89
In [12]: od[3]
Out[12]: 0
In [13]: for k, v in od.iteritems(): print k, v
....:
1 89
2 3
3 0
4 5
Python 3
For Python 3 users, you need to use .items() instead of .iteritems():
In [13]: for k, v in od.items(): print(k, v)
....:
1 89
2 3
3 0
4 5
Answer 1

Dictionaries themselves have no ordered items as such, but if you want to print them (etc.) in some order, here are some examples:

In Python 2.4 and above:

mydict = {'carl': 40, 'alan': 2, 'bob': 1, 'danny': 3}
for key in sorted(mydict):
    print "%s: %s" % (key, mydict[key])

gives:

alan: 2
bob: 1
carl: 40
danny: 3

(Pythons below 2.4:)

keylist = mydict.keys()
keylist.sort()
for key in keylist:
    print "%s: %s" % (key, mydict[key])
There are a number of Python modules that provide dictionary implementations which automatically maintain the keys in sorted order. Consider the sortedcontainers module, which is pure-Python and has fast-as-C implementations. There is also a performance comparison with other popular options benchmarked against one another.
Using an ordered dict is an inadequate solution if you need to constantly add and remove key/value pairs while also iterating.
>>> from sortedcontainers import SortedDict
>>> d = {2:3, 1:89, 4:5, 3:0}
>>> s = SortedDict(d)
>>> s.items()
[(1, 89), (2, 3), (3, 0), (4, 5)]
The SortedDict type also supports indexed location lookups and deletion which isn’t possible with the built-in dict type.
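For example (a sketch; peekitem and index-based popitem are part of the sortedcontainers API):

>>> s.peekitem(0)    # (key, value) pair at the lowest key
(1, 89)
>>> s.popitem(0)     # remove and return the pair at that position
(1, 89)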
As others have mentioned, dictionaries are inherently unordered. However, if the issue is merely displaying dictionaries in an ordered fashion, you can override the __str__ method in a dictionary subclass, and use this dictionary class rather than the builtin dict. Eg.
class SortedDisplayDict(dict):
def __str__(self):
return "{" + ", ".join("%r: %r" % (key, self[key]) for key in sorted(self)) + "}"
>>> d = SortedDisplayDict({2:3, 1:89, 4:5, 3:0})
>>> d
{1: 89, 2: 3, 3: 0, 4: 5}
Note, this changes nothing about how the keys are stored, the order they will come back when you iterate over them etc, just how they’re displayed with print or at the python console.
Update:

1. This also sorts nested objects (thanks @DanielF).
2. Python dictionaries are unordered, therefore this is suitable for print or assignment to str only.
Answer 8

In Python 3:

>>> D1 = {2:3, 1:89, 4:5, 3:0}
>>> for key in sorted(D1):
...     print(key, D1[key])
Python dictionaries were unordered before Python 3.6. In the CPython implementation of Python 3.6, the dictionary keeps insertion order. From Python 3.7, this became a language feature:
The order-preserving aspect of this new implementation is considered
an implementation detail and should not be relied upon (this may
change in the future, but it is desired to have this new dict
implementation in the language for a few releases before changing the
language spec to mandate order-preserving semantics for all current
and future Python implementations; this also helps preserve
backwards-compatibility with older versions of the language where
random iteration order is still in effect, e.g. Python 3.5).
Performing list(d) on a dictionary returns a list of all the keys used
in the dictionary, in insertion order (if you want it sorted, just use
sorted(d) instead).
So unlike previous versions, you can sort a dict after Python 3.6/3.7. If you want to sort a nested dict including the sub-dict inside, you can do:
test_dict = {'a': 1, 'c': 3, 'b': {'b2': 2, 'b1': 1}}
def dict_reorder(item):
    # the recursive call must use the function's own name (dict_reorder)
    return {k: dict_reorder(v) if isinstance(v, dict) else v for k, v in sorted(item.items())}
reordered_dict = dict_reorder(test_dict)
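For the simple, non-nested case, a one-liner suffices (relying on the Python 3.7+ insertion-order guarantee):

d = {2: 3, 1: 89, 4: 5, 3: 0}
print(dict(sorted(d.items())))   # {1: 89, 2: 3, 3: 0, 4: 5}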
D1 = {2:3, 1:89, 4:5, 3:0}
sort_dic = {}
for i in sorted(D1):
sort_dic.update({i:D1[i]})
print sort_dic
{1: 89, 2: 3, 3: 0, 4: 5}
But this is not the correct way to do it, because it can show distinct behavior with different dictionaries, as I learned recently. Hence the perfect way was suggested by Tim in response to my query, which I am sharing here:
from collections import OrderedDict
sorted_dict = OrderedDict(sorted(D1.items(), key=lambda t: t[0]))
Answer 13
I think the easiest thing is to sort the dict by key and save the sorted key:value pair in a new dict.
dict1 = {'renault': 3, 'ford':4, 'volvo': 1, 'toyota': 2}
dict2 = {} # create an empty dict to store the sorted values
for key in sorted(dict1.keys()):
if key not in dict2:  # Depending on the goal, this line may not be necessary
dict2[key] = dict1[key]
To make it clearer:
dict1 = {'renault': 3, 'ford':4, 'volvo': 1, 'toyota': 2}
dict2 = {} # create an empty dict to store the sorted values
for key in sorted(dict1.keys()):
if key not in dict2:  # Depending on the goal, this line may not be necessary
value = dict1[key]
dict2[key] = value
You can create a new dictionary by sorting the current dictionary by key as per your question.
This is your dictionary
d = {2:3, 1:89, 4:5, 3:0}
Create a new dictionary d1 by sorting this d using a lambda function:

d1 = dict(sorted(d.items(), key=lambda x: x[0]))
d1 should be {1: 89, 2: 3, 3: 0, 4: 5}, sorted based on keys in d.
Answer 15
Python dicts are un-ordered. Usually, this is not a problem since the most common use case is to do a lookup.
The simplest way to do what you want would be to create a collections.OrderedDict inserting the elements in sorted order.
ordered_dict = collections.OrderedDict([(k, d[k]) for k in sorted(d.keys())])
If you need to iterate, as others have suggested above, the simplest way is to iterate over the sorted keys. Examples –
Print values sorted by keys:
# create the dict
d = {k1:v1, k2:v2,...}
# iterate by keys in sorted order
for k in sorted(d.keys()):
value = d[k]
# do something with k, value like print
print k, value
Get list of values sorted by keys:
values = [d[k] for k in sorted(d.keys())]
Answer 16

I propose a one-liner for sorting a dictionary:

>>> a = {2:3, 1:89, 4:5, 3:0}
>>> c = {i: a[i] for i in sorted(a.keys())}
>>> print(c)
{1: 89, 2: 3, 3: 0, 4: 5}
This function will sort any dictionary recursively by its key. That is, if any value in the dictionary is also a dictionary, it too will be sorted by its key. If you are running on CPython 3.6 or greater, then a simple change to use a dict rather than an OrderedDict can be made.
from collections import OrderedDict
def sort_dict(d):
items = [[k, v] for k, v in sorted(d.items(), key=lambda x: x[0])]
for item in items:
if isinstance(item[1], dict):
item[1] = sort_dict(item[1])
return OrderedDict(items)
#return dict(items)
Answer 18

Guys, you are making things complicated... this is really simple (pprint sorts dictionary keys when printing):
from pprint import pprint
Dict={'B':1,'A':2,'C':3}
pprint(Dict)
A timing comparison of the two methods in 2.7 shows them to be virtually identical:
>>> setup_string = "a = sorted(dict({2:3, 1:89, 4:5, 3:0}).items())"
>>> timeit.timeit(stmt="[(k, val) for k, val in a]", setup=setup_string, number=10000)
0.003599141953657181
>>> setup_string = "from collections import OrderedDict\n"
>>> setup_string += "a = OrderedDict({1:89, 2:3, 3:0, 4:5})\n"
>>> setup_string += "b = a.items()"
>>> timeit.timeit(stmt="[(k, val) for k, val in b]", setup=setup_string, number=10000)
0.003581275490432745
Answer 22
from operator import itemgetter
# if you would like to play with multiple dictionaries then here you go:
# Three dictionaries that are composed of first name and last name.
user = [
{'fname': 'Mo', 'lname': 'Mahjoub'},
{'fname': 'Abdo', 'lname': 'Al-hebashi'},
{'fname': 'Ali', 'lname': 'Muhammad'}
]
# This loop will sort by the first and the last names.
# notice that in a dictionary order doesn't matter. So it could put the first name first or the last name first.
for k in sorted(user, key=itemgetter('fname', 'lname')):
    print(k)
# This one will sort by the first name only.
for x in sorted(user, key=itemgetter('fname')):
    print(x)
Answer 23

dictionary = {1: [2], 2: [], 5: [4, 5], 4: [5], 3: [1]}
temp = sorted(dictionary)
sorted_dict = dict([(k, dictionary[k]) for i, k in enumerate(temp)])

sorted_dict: {1: [2], 2: [], 3: [1], 4: [5], 5: [4, 5]}
My suggestion is this as it allows you to sort a dict or keep a dict sorted as you are adding items and might need to add items in the future:
Build a dict from scratch as you go along. Have a second data structure, a list, with your list of keys. The bisect package has an insort function which allows inserting into a sorted list, or sort your list after completely populating your dict. Now, when you iterate over your dict, you instead iterate over the list to access each key in an in-order fashion without worrying about the representation of the dict structure (which was not made for sorting).
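A minimal sketch of this idea (the insert helper and the example data are mine):

import bisect

d = {}
sorted_keys = []

def insert(key, value):
    if key not in d:
        bisect.insort(sorted_keys, key)  # keeps the key list sorted as we add
    d[key] = value

for k, v in [(2, 3), (1, 89), (4, 5), (3, 0)]:
    insert(k, v)

for k in sorted_keys:          # iterate in key order, ignoring the dict's own ordering
    print(k, d[k])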
Answer 26
l = list(d.keys())   # d is your dictionary; list() is needed on Python 3
l2 = l               # alias on purpose: appending the sentinel below also grows l
l2.append(0)         # 0 acts as a sentinel (assumes the keys are positive numbers)
l3 = []
for repeater in range(0, len(l)):
    smallnum = float("inf")
    for listitem in l2:
        if listitem < smallnum:
            smallnum = listitem
    l2.remove(smallnum)
    l3.append(smallnum)
l3.remove(0)
l = l3
for listitem in l:
    print(listitem)
Don't use easy_install, unless you like stabbing yourself in the face. Use pip.
Why use pip over easy_install? Doesn’t the fault lie with PyPI and package authors mostly? If an author uploads crap source tarball (eg: missing files, no setup.py) to PyPI, then both pip and easy_install will fail. Other than cosmetic differences, why do Python people (like in the above tweet) seem to strongly favor pip over easy_install?
(Let’s assume that we’re talking about easy_install from the Distribute package, that is maintained by the community)
Binary packages are now distributed as wheels (.whl files)—not just on PyPI, but in third-party repositories like Christoph Gohlke’s Extension Packages for Windows. pip can handle wheels; easy_install cannot.
Virtual environments (which come built-in with 3.4, or can be added to 2.6+/3.1+ with virtualenv) have become a very important and prominent tool (and recommended in the official docs); they include pip out of the box, but don’t even work properly with easy_install.
The distribute package that included easy_install is no longer maintained. Its improvements over setuptools got merged back into setuptools. Trying to install distribute will just install setuptools instead.
easy_install itself is only quasi-maintained.
All of the cases where pip used to be inferior to easy_install—installing from an unpacked source tree, from a DVCS repo, etc.—are long-gone; you can pip install ., pip install git+https://.
pip comes with the official Python 2.7 and 3.4+ packages from python.org, and a pip bootstrap is included by default if you build from source.
The various incomplete bits of documentation on installing, using, and building packages have been replaced by the Python Packaging User Guide. Python’s own documentation on Installing Python Modules now defers to this user guide, and explicitly calls out pip as “the preferred installer program”.
Other new features have been added to pip over the years that will never be in easy_install. For example, pip makes it easy to clone your site-packages by building a requirements file and then installing it with a single command on each side. Or to convert your requirements file to a local repo to use for in-house development. And so on.
The only good reason that I know of to use easy_install in 2015 is the special case of using Apple’s pre-installed Python versions with OS X 10.5-10.8. Since 10.5, Apple has included easy_install, but as of 10.10 they still don’t include pip. With 10.9+, you should still just use get-pip.py, but for 10.5-10.8, this has some problems, so it’s easier to sudo easy_install pip. (In general, easy_install pip is a bad idea; it’s only for OS X 10.5-10.8 that you want to do this.) Also, 10.5-10.8 include readline in a way that easy_install knows how to kludge around but pip doesn’t, so you also want to sudo easy_install readline if you want to upgrade that.
Seriously, I use this in conjunction with virtualenv every day.
QUICK DEPENDENCY MANAGEMENT TUTORIAL, FOLKS
Requirements files allow you to create a snapshot of all packages that have been installed through pip. By encapsulating those packages in a virtual environment, you can have your codebase work off a very specific set of packages and share that codebase with others.
You create a virtual environment, and set your shell to use it. (bash/*nix instructions)
virtualenv env
source env/bin/activate
Now all python scripts run with this shell will use this environment’s packages and configuration. Now you can install a package locally to this environment without needing to install it globally on your machine.
pip install flask
Now you can dump the info about which packages are installed with
pip freeze > requirements.txt
If you checked that file into version control, when someone else gets your code, they can setup their own virtual environment and install all the dependencies with:
pip install -r requirements.txt
Any time you can automate tedium like this is awesome.
UPDATE: setuptools has absorbed distribute as opposed to the other way around, as some thought. setuptools is up-to-date with the latest distutils changes and the wheel format. Hence, easy_install and pip are more or less on equal footing now.
I will test wheel by creating an OS X installer for PySide using wheel instead of eggs. Will get back and report about this.
cheers – Chris
A quick update:
The transition to wheel is almost over. Most packages are supporting wheel.
I promised to build wheels for PySide, and I did that last summer. Works great!
HINT: a few developers have so far failed to support the wheel format, simply because they forgot to replace distutils with setuptools. Often it is easy to convert such packages by replacing this single word in setup.py.
I just met one special case where I had to use easy_install instead of pip, or else pull the source code directly.

For the package GitPython, the version in pip is too old: 0.1.7, while the one from easy_install is the latest, 0.3.2.rc1.

I'm using Python 2.7.8. I'm not sure about the underlying mechanisms of easy_install and pip, but at least the versions of some packages may differ between the two, and sometimes easy_install is the one with the newer version.
You can use a list comprehension to create a new list containing only the elements you don’t want to remove:
somelist = [x for x in somelist if not determine(x)]
Or, by assigning to the slice somelist[:], you can mutate the existing list to contain only the items you want:
somelist[:] = [x for x in somelist if not determine(x)]
This approach could be useful if there are other references to somelist that need to reflect the changes.
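For instance, a small sketch of that aliasing behaviour:

a = [1, 2, 3, 4]
b = a                           # b references the same list object
a[:] = [x for x in a if x % 2]  # mutate the list in place
print(b)                        # [1, 3] -- the other reference sees the change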
Instead of a comprehension, you could also use itertools. In Python 2:
from itertools import ifilterfalse
somelist[:] = ifilterfalse(determine, somelist)
Or in Python 3:
from itertools import filterfalse
somelist[:] = filterfalse(determine, somelist)
For the sake of clarity, and for those who find the use of the [:] notation hackish or fuzzy, here's a more explicit alternative. Theoretically, it should perform about the same with regard to space and time as the one-liners above.
temp = []
while somelist:
x = somelist.pop()
if not determine(x):
temp.append(x)
while temp:
somelist.append(temp.pop())
It also works in other languages that may not have Python lists' replace-items ability, with minimal modifications. For instance, not all languages treat an empty list as falsey the way Python does. You can replace while somelist: with something more explicit, like while len(somelist) > 0:.
The answers suggesting list comprehensions are ALMOST correct, except that they build a completely new list and then give it the same name as the old list; they do NOT modify the old list in place. That's different from what you'd be doing by selective removal, as in @Lennart's suggestion. It's faster, but if your list is accessed via multiple references, the fact that you're just reseating one of the references, and NOT altering the list object itself, can lead to subtle, disastrous bugs.
Fortunately, it’s extremely easy to get both the speed of list comprehensions AND the required semantics of in-place alteration — just code:
somelist[:] = [tup for tup in somelist if determine(tup)]
Note the subtle difference from the other answers: this one is NOT assigning to a bare name; it's assigning to a list slice that just happens to be the entire list, thereby replacing the list contents within the same Python list object, rather than just reseating one reference (from the previous list object to the new list object) like the other answers.
Answer 2

You need to take a copy of the list and iterate over that copy, or the iteration will fail with possibly unexpected results.

For example (depending on what type of list):

for tup in somelist[:]:
    etc....

An example:

>>> somelist = range(10)
>>> for x in somelist:
...     somelist.remove(x)
>>> somelist
[1, 3, 5, 7, 9]
>>> somelist = range(10)
>>> for x in somelist[:]:
...     somelist.remove(x)
>>> somelist
[]
You need to make a copy of the iterated list in order to modify it; one way to do that is with the slice notation [:]:
If you need to modify the sequence you are iterating over while inside the loop (for example to duplicate selected items), it is recommended that you first make a copy. Iterating over a sequence does not implicitly make a copy. The slice notation makes this especially convenient:
>>> words = ['cat', 'window', 'defenestrate']
>>> for w in words[:]: # Loop over a slice copy of the entire list.
... if len(w) > 6:
... words.insert(0, w)
...
>>> words
['defenestrate', 'cat', 'window', 'defenestrate']
This part of the docs says once again that you have to make a copy, and gives an actual removal example:
Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
if x < 0: a.remove(x)
However, I disagree with this implementation, since .remove() has to iterate the entire list to find the value.
This is more space efficient since it dispenses with the array copy, but it is less time efficient, because CPython lists are implemented with dynamic arrays.
This means that item removal requires shifting all following items back by one, which is O(N).
Generally you just want to go for the faster .append() option by default unless memory is a big concern.
Could Python do this better?
It seems like this particular Python API could be improved. Compare it, for instance, with:
Java ListIterator::remove which documents “This call can only be made once per call to next or previous”
C++ std::vector::erase, which returns a valid iterator to the element after the one removed
both of which make it crystal clear that you cannot modify a list being iterated except with the iterator itself, and gives you efficient ways to do so without copying the list.
Perhaps the underlying rationale is that Python lists are assumed to be dynamic array backed, and therefore any type of removal will be time inefficient anyways, while Java has a nicer interface hierarchy with both ArrayList and LinkedList implementations of ListIterator.
There doesn’t seem to be an explicit linked list type in the Python stdlib either: Python Linked List
somelist = [tup for tup in somelist if determine(tup)]
In cases where you’re doing something more complex than calling a determine function, I prefer constructing a new list and simply appending to it as I go. For example
newlist = []
for tup in somelist:
# lots of code here, possibly setting things up for calling determine
if determine(tup):
newlist.append(tup)
somelist = newlist
Copying the list using remove might make your code look a little cleaner, as described in one of the answers below. You should definitely not do this for extremely large lists, since it involves first copying the entire list, and also performing an O(n) remove operation for each element being removed, making this an O(n^2) algorithm.
for tup in somelist[:]:
    # lots of code here, possibly setting things up for calling determine
    if determine(tup):
        somelist.remove(tup)
I needed to do this with a huge list, and duplicating the list seemed expensive, especially since in my case the number of deletions would be few compared to the items that remain. I took this low-level approach.
array = [lots of stuff]
arraySize = len(array)
i = 0
while i < arraySize:
if someTest(array[i]):
del array[i]
arraySize -= 1
else:
i += 1
What I don’t know is how efficient a couple of deletes are compared to copying a large list. Please comment if you have any insight.
Answer 8

It might also be smart to simply create a new list containing the items that meet the desired criteria.

So:

for item in originalList:
    if item != badValue:
        newList.append(item)
1) When using remove(), you attempt to remove integers whereas you need to remove a tuple.
2) The for loop will skip items in your list.
Let’s run through what happens when we execute your code:
>>> L1 = [(1,2), (5,6), (-1,-2), (1,-2)]
>>> for (a,b) in L1:
... if a < 0 or b < 0:
... L1.remove(a,b)
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
TypeError: remove() takes exactly one argument (2 given)
The first problem is that you are passing both ‘a’ and ‘b’ to remove(), but remove() only accepts a single argument. So how can we get remove() to work properly with your list? We need to figure out what each element of your list is. In this case, each one is a tuple. To see this, let’s access one element of the list (indexing starts at 0):
>>> L1[1]
(5, 6)
>>> type(L1[1])
<type 'tuple'>
Aha! Each element of L1 is actually a tuple. So that's what we need to be passing to remove(). Tuples in Python are easy: they're simply made by enclosing values in parentheses. "a, b" in the call above is two separate arguments, but "(a, b)" is a single tuple. So we modify your code and run it again:
# The remove line now includes an extra "()" to make a tuple out of "a,b"
L1.remove((a,b))
This code runs without any error, but let’s look at the list it outputs:
L1 is now: [(1, 2), (5, 6), (1, -2)]
Why is (1,-2) still in your list? It turns out modifying the list while using a loop to iterate over it is a very bad idea without special care. The reason that (1, -2) remains in the list is that the locations of each item within the list changed between iterations of the for loop. Let’s look at what happens if we feed the above code a longer list:
As you can infer from that result, every time that the conditional statement evaluates to true and a list item is removed, the next iteration of the loop will skip evaluation of the next item in the list because its values are now located at different indices.
The most intuitive solution is to copy the list, then iterate over the original list and only modify the copy. You can try doing so like this:
L2 = L1
for (a,b) in L1:
if a < 0 or b < 0 :
L2.remove((a,b))
# Now, remove the original copy of L1 and replace with L2
print L2 is L1
del L1
L1 = L2; del L2
print ("L1 is now: ", L1)
However, the output will be the same as before:

'L1 is now: ', [(1, 2), (5, 6), (1, -2), (3, 4), (5, 7), (2, 1), (5, -1), (0, 6)]

This is because when we created L2, Python did not actually create a new object. Instead, it merely referenced L2 to the same object as L1. We can verify this with 'is', which is different from merely "equals" (==).
>>> L2=L1
>>> L1 is L2
True
We can make a true copy using copy.copy(). Then everything works as expected:
import copy
L1 = [(1,2), (5,6),(-1,-2), (1,-2),(3,4),(5,7),(-4,4),(2,1),(-3,-3),(5,-1),(0,6)]
L2 = copy.copy(L1)
for (a,b) in L1:
if a < 0 or b < 0 :
L2.remove((a,b))
# Now, remove the original copy of L1 and replace with L2
del L1
L1 = L2; del L2
>>> L1 is now: [(1, 2), (5, 6), (3, 4), (5, 7), (2, 1), (0, 6)]
Finally, there is one cleaner solution than having to make an entirely new copy of L1. The reversed() function:
L1 = [(1,2), (5,6),(-1,-2), (1,-2),(3,4),(5,7),(-4,4),(2,1),(-3,-3),(5,-1),(0,6)]
for (a,b) in reversed(L1):
if a < 0 or b < 0 :
L1.remove((a,b))
print ("L1 is now: ", L1)
>>> L1 is now: [(1, 2), (5, 6), (3, 4), (5, 7), (2, 1), (0, 6)]
Unfortunately, I cannot adequately describe how reversed() works. It returns a ‘listreverseiterator’ object when a list is passed to it. For practical purposes, you can think of it as creating a reversed copy of its argument. This is the solution I recommend.
If you want to do anything else during the iteration, it may be nice to get both the index (which guarantees you being able to reference it, for example if you have a list of dicts) and the actual list item contents.
inlist = [{'field1': 10, 'field2': 20}, {'field1': 30, 'field2': 15}]
xlist = []
for idx, i in enumerate(inlist):
    # do some stuff with i['field1']
    if somecondition:
        xlist.append(idx)
for i in reversed(xlist):
    del inlist[i]
enumerate gives you access to the item and the index at once. reversed is so that the indices that you’re going to later delete don’t change on you.
Most of the answers here want you to create a copy of the list. I had a use case where the list was quite long (110K items) and it was smarter to keep reducing the list instead.
First of all, you'll need to replace the foreach loop with a while loop:
i = 0
while i < len(somelist):
if determine(somelist[i]):
del somelist[i]
else:
i += 1
The value of i is not changed in the if block, because you'll want to get the value of the new item FROM THE SAME INDEX once the old item is deleted.
Answer 13
You can try for-looping in reverse so for some_list you’ll do something like:
list_len = len(some_list)
for i in range(list_len):
reverse_i = list_len - 1 - i
cur = some_list[reverse_i]
# some logic with cur element
if some_condition:
some_list.pop(reverse_i)
This way the index stays aligned and doesn't suffer from the list updates (regardless of whether you pop the cur element or not).
One possible solution, useful if you want not only to remove some things, but also to do something with all elements in a single loop:
alist = ['good', 'bad', 'good', 'bad', 'good']
i = 0
for x in alist[:]:
if x == 'bad':
alist.pop(i)
i -= 1
# do something cool with x or just print x
print(x)
i += 1
I needed to do something similar, and in my case the problem was memory: I needed to merge multiple dataset objects within a list into a new object, after doing some stuff with them, and I needed to get rid of each entry I was merging to avoid duplicating all of them and blowing up memory. In my case, having the objects in a dictionary instead of a list worked fine:
k = range(5)
v = ['a','b','c','d','e']
d = {key:val for key,val in zip(k, v)}
print d
for i in range(5):
print d[i]
d.pop(i)
print d
Answer 16

TLDR:

I wrote a library that lets you do this:
from fluidIter import FluidIterable
fSomeList = FluidIterable(someList)
for tup in fSomeList:
if determine(tup):
# remove 'tup' without "breaking" the iteration
fSomeList.remove(tup)
# tup has also been removed from 'someList'
# as well as 'fSomeList'
It’s best to use another method if possible that doesn’t require modifying your iterable while iterating over it, but for some algorithms it might not be that straight forward. And so if you are sure that you really do want the code pattern described in the original question, it is possible.
Should work on all mutable sequences not just lists.
Full answer:
Edit: The last code example in this answer gives a use case for why you might sometimes want to modify a list in place rather than use a list comprehension. The first part of the answers serves as tutorial of how an array can be modified in place.
The solution follows on from this answer (for a related question) from senderle, which explains how the array index is updated while iterating through a list that has been modified. The solution below is designed to correctly track the array index even if the list is modified.

Download fluidIter.py from here: https://github.com/alanbacon/FluidIterator. It is just a single file, so there is no need to install git. There is no installer, so you will need to make sure that the file is in the Python path yourself. The code has been written for Python 3 and is untested on Python 2.
from fluidIter import FluidIterable
l = [0,1,2,3,4,5,6,7,8]
fluidL = FluidIterable(l)
for i in fluidL:
print('initial state of list on this iteration: ' + str(fluidL))
print('current iteration value: ' + str(i))
print('popped value: ' + str(fluidL.pop(2)))
print(' ')
print('Final List Value: ' + str(l))
This will produce the following output:
initial state of list on this iteration: [0, 1, 2, 3, 4, 5, 6, 7, 8]
current iteration value: 0
popped value: 2
initial state of list on this iteration: [0, 1, 3, 4, 5, 6, 7, 8]
current iteration value: 1
popped value: 3
initial state of list on this iteration: [0, 1, 4, 5, 6, 7, 8]
current iteration value: 4
popped value: 4
initial state of list on this iteration: [0, 1, 5, 6, 7, 8]
current iteration value: 5
popped value: 5
initial state of list on this iteration: [0, 1, 6, 7, 8]
current iteration value: 6
popped value: 6
initial state of list on this iteration: [0, 1, 7, 8]
current iteration value: 7
popped value: 7
initial state of list on this iteration: [0, 1, 8]
current iteration value: 8
popped value: 8
Final List Value: [0, 1]
Above we have used the pop method on the fluid list object. Other common iterable methods are also implemented such as del fluidL[i], .remove, .insert, .append, .extend. The list can also be modified using slices (sort and reverse methods are not implemented).
The only condition is that you must only modify the list in place, if at any point fluidL or l were reassigned to a different list object the code would not work. The original fluidL object would still be used by the for loop but would become out of scope for us to modify.
i.e.
fluidL[2] = 'a' # is OK
fluidL = [0, 1, 'a', 3, 4, 5, 6, 7, 8] # is not OK
If we want to access the current index value of the list we cannot use enumerate, as this only counts how many times the for loop has run. Instead we will use the iterator object directly.
fluidArr = FluidIterable([0,1,2,3])
# get iterator first so can query the current index
fluidArrIter = fluidArr.__iter__()
for i, v in enumerate(fluidArrIter):
print('enum: ', i)
print('current val: ', v)
print('current ind: ', fluidArrIter.currentIndex)
print(fluidArr)
fluidArr.insert(0,'a')
print(' ')
print('Final List Value: ' + str(fluidArr))
This will output the following:
enum: 0
current val: 0
current ind: 0
[0, 1, 2, 3]
enum: 1
current val: 1
current ind: 2
['a', 0, 1, 2, 3]
enum: 2
current val: 2
current ind: 4
['a', 'a', 0, 1, 2, 3]
enum: 3
current val: 3
current ind: 6
['a', 'a', 'a', 0, 1, 2, 3]
Final List Value: ['a', 'a', 'a', 'a', 0, 1, 2, 3]
The FluidIterable class just provides a wrapper for the original list object. The original object can be accessed as a property of the fluid object like so:
originalList = fluidArr.fixedIterable
More examples / tests can be found in the if __name__ is "__main__": section at the bottom of fluidIter.py. These are worth looking at because they explain what happens in various situations. Such as: Replacing a large sections of the list using a slice. Or using (and modifying) the same iterable in nested for loops.
As I stated to start with: this is a complicated solution that will hurt the readability of your code and make it more difficult to debug. Therefore other solutions such as the list comprehensions mentioned in David Raznick’s answer should be considered first. That being said, I have found times where this class has been useful to me and has been easier to use than keeping track of the indices of elements that need deleting.
Edit: As mentioned in the comments, this answer does not really present a problem for which this approach provides a solution. I will try to address that here:
List comprehensions provide a way to generate a new list but these approaches tend to look at each element in isolation rather than the current state of the list as a whole.
i.e.
newList = [i for i in oldList if testFunc(i)]
But what if the result of the testFunc depends on the elements that have been added to newList already? Or on the elements still in oldList that might be added next? There might still be a way to use a list comprehension, but it will begin to lose its elegance, and for me it feels easier to modify a list in place.
The code below is one example of an algorithm that suffers from the above problem. The algorithm will reduce a list so that no element is a multiple of any other element.
randInts = [70, 20, 61, 80, 54, 18, 7, 18, 55, 9]
fRandInts = FluidIterable(randInts)
fRandIntsIter = fRandInts.__iter__()
# for each value in the list (outer loop)
# test against every other value in the list (inner loop)
for i in fRandIntsIter:
print(' ')
print('outer val: ', i)
innerIntsIter = fRandInts.__iter__()
for j in innerIntsIter:
innerIndex = innerIntsIter.currentIndex
# skip the element that the outloop is currently on
# because we don't want to test a value against itself
if not innerIndex == fRandIntsIter.currentIndex:
# if the test element, j, is a multiple
# of the reference element, i, then remove 'j'
if j%i == 0:
print('remove val: ', j)
# remove element in place, without breaking the
# iteration of either loop
del fRandInts[innerIndex]
# end if multiple, then remove
# end if not the same value as outer loop
# end inner loop
# end outerloop
print('')
print('final list: ', randInts)
The output and the final reduced list are shown below
The most effective method is a list comprehension; many people have shown their cases for it. Of course, getting an iterator through filter is also a good way.

filter receives a function and a sequence: it applies the passed function to each element in turn, and then decides whether to retain or discard the element depending on whether the function's return value is True or False.
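For example, a sketch in terms of the question's somelist and determine (filter keeps the elements for which the function returns True, so the test is negated here):

somelist = list(filter(lambda tup: not determine(tup), somelist))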
The other answers are correct that it is usually a bad idea to delete from a list that you’re iterating. Reverse iterating avoids the pitfalls, but it is much more difficult to follow code that does that, so usually you’re better off using a list comprehension or filter.
There is, however, one case where it is safe to remove elements from a sequence that you are iterating: if you’re only removing one item while you’re iterating. This can be ensured using a return or a break. For example:
for i, item in enumerate(lst):
if item % 4 == 0:
foo(item)
del lst[i]
break
This is often easier to understand than a list comprehension when you’re doing some operations with side effects on the first item in a list that meets some condition and then removing that item from the list immediately after.
Answer 21
I can think of three approaches to solve your problem. As an example, I will create a random list of tuples somelist = [(1,2,3), (4,5,6), (3,6,6), (7,8,9), (15,0,0), (10,11,12)]. The condition that I choose is sum of elements of a tuple = 15. In the final list we will only have those tuples whose sum is not equal to 15.
What I have chosen is a randomly chosen example. Feel free to change the list of tuples and the condition that I have chosen.
Method 1.> Use the framework that you had suggested (where one fills in a code inside a for loop). I use a small code with del to delete a tuple that meets the said condition. However, this method will miss a tuple (which satisfies the said condition) if two consecutively placed tuples meet the given condition.
for tup in somelist:
if ( sum(tup)==15 ):
del somelist[somelist.index(tup)]
print somelist
>>> [(1, 2, 3), (3, 6, 6), (7, 8, 9), (10, 11, 12)]
Method 2.> Construct a new list which contains elements (tuples) where the given condition is not met (this is the same thing as removing elements of list where the given condition is met). Following is the code for that:
newlist1 = [somelist[tup] for tup in range(len(somelist)) if(sum(somelist[tup])!=15)]
print newlist1
>>>[(1, 2, 3), (7, 8, 9), (10, 11, 12)]
Method 3.> Find the indices where the given condition is met, and then remove the elements (tuples) corresponding to those indices. Following is the code for that.
indices = [i for i in range(len(somelist)) if(sum(somelist[i])==15)]
newlist2 = [tup for j, tup in enumerate(somelist) if j not in indices]
print newlist2
>>>[(1, 2, 3), (7, 8, 9), (10, 11, 12)]
Methods 1 and 2 are faster than method 3. Methods 2 and 3 are more efficient than method 1. I prefer method 2. For the aforementioned example, time(method1) : time(method2) : time(method3) = 1 : 1 : 1.7
That should be significantly faster than anything else.
Answer 23
In some situations, where you’re doing more than simply filtering a list one item at time, you want your iteration to change while iterating.
Here is an example where copying the list beforehand is incorrect, reverse iteration is impossible and a list comprehension is also not an option.
""" Sieve of Eratosthenes """
def generate_primes(n):
""" Generates all primes less than n. """
primes = list(range(2,n))
idx = 0
while idx < len(primes):
p = primes[idx]
for multiple in range(p+p, n, p):
try:
primes.remove(multiple)
except ValueError:
pass #EAFP
idx += 1
yield p
回答 24
If you will use the new list later, you can simply set the element to None and then skip those entries in a later loop, like this (note that assigning to the loop variable alone would not modify the list, so assign by index; should_remove stands in for whatever your removal condition is):
for i, elem in enumerate(li):
    if should_remove(elem):  # hypothetical removal condition
        li[i] = None

for elem in li:
    if elem is None:
        continue
    ...  # process the surviving elements
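For example, dropping the multiples of 4 without reallocating the list (output shown as comments):
li = [1, 4, 5, 8, 10]
for i, elem in enumerate(li):
    if elem % 4 == 0:
        li[i] = None

for elem in li:
    if elem is None:
        continue
    print(elem)
# 1
# 5
# 10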
Getting some sort of modification date in a cross-platform way is easy – just call os.path.getmtime(path) and you’ll get the Unix timestamp of when the file at path was last modified.
Getting file creation dates, on the other hand, is fiddly and platform-dependent, differing even between the three big OSes:
On Mac, as well as some other Unix-based OSes, you can use the .st_birthtime attribute of the result of a call to os.stat().
On Linux, this is currently impossible, at least without writing a C extension for Python. Although some file systems commonly used with Linux do store creation dates (for example, ext4 stores them in st_crtime), the Linux kernel offers no way of accessing them; in particular, the structs it returns from stat() calls in C, as of the latest kernel version, don’t contain any creation date fields. You can also see that the identifier st_crtime doesn’t currently feature anywhere in the Python source. At least if you’re on ext4, the data is attached to the inodes in the file system, but there’s no convenient way of accessing it.
The next-best thing on Linux is to access the file’s mtime, through either os.path.getmtime() or the .st_mtime attribute of an os.stat() result. This will give you the last time the file’s content was modified, which may be adequate for some use cases.
Putting this all together, cross-platform code should look something like this…
import os
import platform
def creation_date(path_to_file):
"""
Try to get the date that a file was created, falling back to when it was
last modified if that isn't possible.
See http://stackoverflow.com/a/39501288/1709587 for explanation.
"""
if platform.system() == 'Windows':
return os.path.getctime(path_to_file)
else:
stat = os.stat(path_to_file)
try:
return stat.st_birthtime
except AttributeError:
# We're probably on Linux. No easy way to get creation dates here,
# so we'll settle for when its content was last modified.
return stat.st_mtime
Note: ctime() does not refer to creation time on *nix systems, but rather the last time the inode data changed. (Thanks to kojiro for making that fact clearer in the comments by providing a link to an interesting blog post.)
edit: In newer code you should probably use os.path.getmtime() (thanks Christian Oudard), but note that it returns a floating point time_t with fractional seconds (if your OS supports it).
getmtime(path)
Return the time of last modification of path. The return value is a number giving the number of seconds since the epoch (see the time module). Raise os.error if the file does not exist or is inaccessible. New in version 1.5.2. Changed in version 2.3: If os.stat_float_times() returns True, the result is a floating point number.
stat(path)
Perform a stat() system call on the given path. The return value is an object whose attributes correspond to the members of the stat structure, namely: st_mode (protection bits), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (user ID of owner), st_gid (group ID of owner), st_size (size of file, in bytes), st_atime (time of most recent access), st_mtime (time of most recent content modification), st_ctime (platform dependent; time of most recent metadata change on Unix, or the time of creation on Windows).
In Python 3.4 and above, you can use the object-oriented pathlib module interface, which includes wrappers for much of the os module. Here is an example of getting the file stats.
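A minimal sketch, with somefile.txt as a placeholder name:
from pathlib import Path

stat = Path('somefile.txt').stat()  # same fields as os.stat()
print(stat.st_mtime)  # last content modification
print(stat.st_ctime)  # metadata change on Unix, creation time on Windows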
os.stat returns a named tuple with st_mtime and st_ctime attributes. The modification time is st_mtime on both platforms; unfortunately, on Windows, ctime means “creation time”, whereas on POSIX it means “change time”. I’m not aware of any way to get the creation time on POSIX platforms.
Sample output for somefile.txt:
Modified: 1429613446 / 1429613446.0 (Tue Apr 21 11:50:46 2015 = 2015-04-21 11:50:46 = 21/04/2015 11:50:46)
Created: 1517491049 / 1517491049.28306 (Thu Feb 1 13:17:29 2018 = 2018-02-01 13:17:29.283060 = 01/02/2018 13:17:29)
Answer 8
>>> import os
>>> os.stat('feedparser.py').st_mtime
1136961142.0
>>> os.stat('feedparser.py').st_ctime
1222664012.233
It may be worth taking a look at the crtime library, which implements cross-platform access to the file creation time.
from crtime import get_crtimes_in_dir
for fname, date in get_crtimes_in_dir(".", raise_on_error=True, as_epoch=False):
print(fname, date)
# file_a.py Mon Mar 18 20:51:18 CET 2019
The following summarises the different situations in which you’d want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s):
Rows of a DataFrame: len(df), df.shape[0], or len(df.index)
Columns of a DataFrame: df.shape[1] or len(df.columns)
Rows of a Series: len(s), s.size, or len(s.index)
Non-null rows: DataFrame.count or Series.count
Group-wise row count: GroupBy.size
Group-wise non-null row count: GroupBy.count
Footnotes
DataFrame.count returns counts for each column as a Series since the non-null count varies by column.
DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row-count.
DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)['x'].count() where “x” is the column to count.
Minimal Code Examples
Below, I show examples of each of the methods described in the summary above. First, the setup –
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('aabbc'), 'B': ['x', 'x', np.nan, 'x', np.nan]})
s = df['B'].copy()
df
A B
0 a x
1 a x
2 b NaN
3 b x
4 c NaN
s
0 x
1 x
2 NaN
3 x
4 NaN
Name: B, dtype: object
Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)
len(df)
# 5
df.shape[0]
# 5
len(df.index)
# 5
It seems silly to compare the performance of constant time operations, especially when the difference is on the level of “seriously, don’t worry about it”. But this seems to be a trend with other answers, so I’m doing the same for completeness.
Of the 3 methods above, len(df.index) (as mentioned in other answers) is the fastest.
Note
All the methods above are constant time operations as they are simple attribute lookups.
df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (5, 2) for the example here.
Column Count of a DataFrame: df.shape[1], len(df.columns)
df.shape[1]
# 2
len(df.columns)
# 2
Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).
Row Count of a Series: len(s), s.size, len(s.index)
len(s)
# 5
s.size
# 5
len(s.index)
# 5
s.size and len(s.index) are about the same in terms of speed. But I recommend len(s).
Note: size is an attribute, and it returns the number of elements (which equals the row count for any Series). DataFrames also define a size attribute, which returns the same result as df.shape[0] * df.shape[1].
Non-Null Row Count: DataFrame.count and Series.count
The methods described here only count non-null values (meaning NaNs are ignored).
Calling DataFrame.count will return non-NaN counts for each column:
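For the df and s defined in the setup above, the outputs work out as follows:
df.count()

A    5
B    3
dtype: int64

Similarly, Series.count returns the number of non-null values:
s.count()
# 3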
Group-wise Row Count: GroupBy.size
To count the rows of each group, use GroupBy.size. For example, grouping s by df.A:
s.groupby(df.A).size()
A
a 2
b 2
c 1
Name: B, dtype: int64
A Series is returned here. The same holds for DataFrames: GroupBy.size also returns a Series there, since all columns in the same group share the same row-count.
Group-wise Non-Null Row Count: GroupBy.count
Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.
The following methods return the same thing:
df.groupby('A')['B'].size()
df.groupby('A').size()
A
a 2
b 2
c 1
Name: B, dtype: int64
Meanwhile, for count, we have
df.groupby('A').count()
B
A
a 2
b 1
c 0
…called on the entire GroupBy object, versus
df.groupby('A')['B'].count()
A
a 2
b 1
c 0
Name: B, dtype: int64
len() is your friend; it can be used for row counts as len(df).
Alternatively, you can access all rows via df.index and all columns via df.columns, and just as you can use len(anyList) to get the length of a list, you can use len(df.index) to get the number of rows and len(df.columns) to get the number of columns.
Or, you can use df.shape, which returns the number of rows and columns together; if you want only the number of rows, use df.shape[0], and for only the number of columns, use df.shape[1].
Here is why len(df) or len(df.index) is faster than df.shape[0]: look at the code. df.shape is a @property that runs a DataFrame method calling len twice.
df.shape??
Type: property
String form: <property object at 0x1127b33c0>
Source:
# df.shape.fget
@property
def shape(self):
"""
Return a tuple representing the dimensionality of the DataFrame.
"""
return len(self.index), len(self.columns)
And beneath the hood of len(df)
df.__len__??
Signature: df.__len__()
Source:
def __len__(self):
"""Returns length of info axis, but here we use the index """
return len(self.index)
File: ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
Type: instancemethod
len(df.index) will be slightly faster than len(df) since it has one less function call, but either is always faster than df.shape[0].
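If you want to verify the difference on your machine, a quick sketch (absolute numbers will vary):
import timeit
import pandas as pd

df = pd.DataFrame({'A': range(1000)})

# time each row-count idiom over many repetitions
print(timeit.timeit(lambda: len(df.index), number=100_000))
print(timeit.timeit(lambda: len(df), number=100_000))
print(timeit.timeit(lambda: df.shape[0], number=100_000))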
I come to pandas from an R background, and I see that pandas is more complicated when it comes to selecting rows or columns.
I had to wrestle with it for a while before I found some ways to deal with it:
getting the number of columns:
len(df.columns)
## Here:
# df is your DataFrame
# df.columns returns an Index containing the column labels of df.
# Then len() gets its length.