Category archive: Q&A

How does Python's super() work with multiple inheritance?

Question: How does Python's super() work with multiple inheritance?


I’m pretty much new to Python object-oriented programming and I have trouble understanding the super() function (new-style classes), especially when it comes to multiple inheritance.

For example if you have something like:

class First(object):
    def __init__(self):
        print "first"

class Second(object):
    def __init__(self):
        print "second"

class Third(First, Second):
    def __init__(self):
        super(Third, self).__init__()
        print "that's it"

What I don’t get is: will the Third() class inherit both constructor methods? If yes, then which one will be run with super() and why?

And what if you want to run the other one? I know it has something to do with Python method resolution order (MRO).


Answer 0


Guido himself covers this in reasonable detail in his blog post Method Resolution Order (including two earlier attempts).

In your example, Third() will call First.__init__. Python looks for each attribute in the class’s parents as they are listed left to right. In this case, we are looking for __init__. So, if you define

class Third(First, Second):
    ...

Python will start by looking at First, and, if First doesn’t have the attribute, then it will look at Second.

This situation becomes more complex when inheritance starts crossing paths (for example if First inherited from Second). Read the link above for more details, but, in a nutshell, Python will try to maintain the order in which each class appears on the inheritance list, starting with the child class itself.

So, for instance, if you had:

class First(object):
    def __init__(self):
        print "first"

class Second(First):
    def __init__(self):
        print "second"

class Third(First):
    def __init__(self):
        print "third"

class Fourth(Second, Third):
    def __init__(self):
        super(Fourth, self).__init__()
        print "that's it"

the MRO would be [Fourth, Second, Third, First].

By the way: if Python cannot find a coherent method resolution order, it’ll raise an exception, instead of falling back to behavior which might surprise the user.

Edited to add an example of an ambiguous MRO:

class First(object):
    def __init__(self):
        print "first"

class Second(First):
    def __init__(self):
        print "second"

class Third(First, Second):
    def __init__(self):
        print "third"

Should Third’s MRO be [First, Second] or [Second, First]? There’s no obvious expectation, and Python will raise an error:

TypeError: Error when calling the metaclass bases
    Cannot create a consistent method resolution order (MRO) for bases Second, First

Edit: I see several people arguing that the examples above lack super() calls, so let me explain: the point of the examples is to show how the MRO is constructed. They are not intended to print “first\nsecond\nthird” or whatever. You can – and should, of course – play around with the example, add super() calls, see what happens, and gain a deeper understanding of Python’s inheritance model. But my goal here is to keep it simple and show how the MRO is built. And it is built as I explained:

>>> Fourth.__mro__
(<class '__main__.Fourth'>,
 <class '__main__.Second'>, <class '__main__.Third'>,
 <class '__main__.First'>,
 <type 'object'>)
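
For instance, here is one minimal variation of the Fourth example with cooperative super() calls added to every class (my own sketch for illustration, not part of the original answer); with the MRO above, the prints unwind in reverse MRO order:

class First(object):
    def __init__(self):
        super(First, self).__init__()
        print "first"

class Second(First):
    def __init__(self):
        super(Second, self).__init__()
        print "second"

class Third(First):
    def __init__(self):
        super(Third, self).__init__()
        print "third"

class Fourth(Second, Third):
    def __init__(self):
        super(Fourth, self).__init__()
        print "that's it"

Fourth()  # prints: first, third, second, that's it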

Answer 1


Your code, and the other answers, are all buggy. They are missing the super() calls in the first two classes that are required for co-operative subclassing to work.

Here is a fixed version of the code:

class First(object):
    def __init__(self):
        super(First, self).__init__()
        print("first")

class Second(object):
    def __init__(self):
        super(Second, self).__init__()
        print("second")

class Third(First, Second):
    def __init__(self):
        super(Third, self).__init__()
        print("third")

The super() call finds the next method in the MRO at each step, which is why First and Second have to have it too; otherwise execution stops at the end of Second.__init__().

This is what I get:

>>> Third()
second
first
third
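
Conversely, if you drop the super() call from First (a quick sketch of my own for illustration), the chain stops there and Second.__init__ is never reached:

class First(object):
    def __init__(self):
        # no super() call here, so the walk along the MRO ends in this class
        print("first")

class Second(object):
    def __init__(self):
        super(Second, self).__init__()
        print("second")

class Third(First, Second):
    def __init__(self):
        super(Third, self).__init__()
        print("third")

Third()  # prints only: first, third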

Answer 2


I wanted to elaborate a bit on the answer by lifeless, because when I started reading about how to use super() in a multiple inheritance hierarchy in Python, I didn’t get it immediately.

What you need to understand is that super(MyClass, self).__init__() provides the next __init__ method according to the Method Resolution Order (MRO) algorithm used, in the context of the complete inheritance hierarchy.

This last part is crucial to understand. Let’s consider the example again:

#!/usr/bin/env python2

class First(object):
  def __init__(self):
    print "First(): entering"
    super(First, self).__init__()
    print "First(): exiting"

class Second(object):
  def __init__(self):
    print "Second(): entering"
    super(Second, self).__init__()
    print "Second(): exiting"

class Third(First, Second):
  def __init__(self):
    print "Third(): entering"
    super(Third, self).__init__()
    print "Third(): exiting"

According to this article about Method Resolution Order by Guido van Rossum, the order in which __init__ is resolved was calculated (before Python 2.3) using a “depth-first left-to-right traversal”:

Third --> First --> object --> Second --> object

After removing all duplicates, except for the last one, we get:

Third --> First --> Second --> object

So, let’s follow what happens when we instantiate an instance of the Third class, e.g. x = Third().

  1. According to the MRO, Third.__init__ executes.
    • prints Third(): entering
    • then super(Third, self).__init__() executes, and the MRO yields First.__init__, which is called.
  2. First.__init__ executes.
    • prints First(): entering
    • then super(First, self).__init__() executes, and the MRO yields Second.__init__, which is called.
  3. Second.__init__ executes.
    • prints Second(): entering
    • then super(Second, self).__init__() executes, and the MRO yields object.__init__, which is called.
  4. object.__init__ executes (there are no print statements in that code).
  5. Execution goes back to Second.__init__, which then prints Second(): exiting.
  6. Execution goes back to First.__init__, which then prints First(): exiting.
  7. Execution goes back to Third.__init__, which then prints Third(): exiting.

This explains in detail why instantiating Third() results in:

Third(): entering
First(): entering
Second(): entering
Second(): exiting
First(): exiting
Third(): exiting

The MRO algorithm has been improved from Python 2.3 onwards to work well in complex cases, but I guess that using the “depth-first left-to-right traversal” + “removing duplicates except for the last” still works in most cases (please comment if this is not the case). Be sure to read the blog post by Guido!


Answer 3


This is known as the Diamond Problem; the Wikipedia page on it has an entry on Python, but, in short, Python will call the superclasses’ methods from left to right.


Answer 4


This is how I solved the issue of having multiple inheritance with different variables for initialization, and of having multiple MixIns with the same function call. I had to explicitly add variables to the passed **kwargs and add a MixIn interface to be an endpoint for the super calls.

Here A is an extendable base class, and B and C are MixIn classes which both provide the function f. A and B both expect the parameter v in their __init__, and C expects w. The function f takes one parameter, y. Q inherits from all three classes. MixInF is the mixin interface for B and C.


class A(object):
    def __init__(self, v, *args, **kwargs):
        print "A:init:v[{0}]".format(v)
        kwargs['v']=v
        super(A, self).__init__(*args, **kwargs)
        self.v = v


class MixInF(object):
    def __init__(self, *args, **kwargs):
        print "IObject:init"
    def f(self, y):
        print "IObject:y[{0}]".format(y)


class B(MixInF):
    def __init__(self, v, *args, **kwargs):
        print "B:init:v[{0}]".format(v)
        kwargs['v']=v
        super(B, self).__init__(*args, **kwargs)
        self.v = v
    def f(self, y):
        print "B:f:v[{0}]:y[{1}]".format(self.v, y)
        super(B, self).f(y)


class C(MixInF):
    def __init__(self, w, *args, **kwargs):
        print "C:init:w[{0}]".format(w)
        kwargs['w']=w
        super(C, self).__init__(*args, **kwargs)
        self.w = w
    def f(self, y):
        print "C:f:w[{0}]:y[{1}]".format(self.w, y)
        super(C, self).f(y)


class Q(C,B,A):
    def __init__(self, v, w):
        super(Q, self).__init__(v=v, w=w)
    def f(self, y):
        print "Q:f:y[{0}]".format(y)
        super(Q, self).f(y)
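
A usage sketch (the values 10, 20 and 5 below are made up for illustration): constructing Q and calling f walks the MRO Q -> C -> B -> MixInF, and MixInF ends the super chain, so as far as I can tell A.__init__ is never reached with this layout:

q = Q(v=10, w=20)
# C:init:w[20]
# B:init:v[10]
# IObject:init

q.f(5)
# Q:f:y[5]
# C:f:w[20]:y[5]
# B:f:v[10]:y[5]
# IObject:y[5]

print Q.__mro__   # roughly: (Q, C, B, MixInF, A, object)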

Answer 5


I understand this doesn’t directly answer the super() question, but I feel it’s relevant enough to share.

There is also a way to directly call each inherited class:


class First(object):
    def __init__(self):
        print '1'

class Second(object):
    def __init__(self):
        print '2'

class Third(First, Second):
    def __init__(self):
        Second.__init__(self)

Just note that if you do it this way, you’ll have to call each one manually, as I’m pretty sure First’s __init__() won’t be called.
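
If you do want both parents initialised in this style, a sketch (my own addition, not from the original answer) would call each one explicitly:

class Third(First, Second):
    def __init__(self):
        First.__init__(self)
        Second.__init__(self)

Third()  # prints '1' and then '2'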


Answer 6


Overall

Assuming everything descends from object (you are on your own if it doesn’t), Python computes a method resolution order (MRO) based on your class inheritance tree. The MRO satisfies 3 properties:

  • Children of a class come before their parents
  • Left parents come before right parents
  • A class only appears once in the MRO

If no such ordering exists, Python raises an error. The inner workings of this are a C3 linearization of the class’s ancestry. Read all about it here: https://www.python.org/download/releases/2.3/mro/

Thus, in both of the examples below, it is:

  1. Child
  2. Left
  3. Right
  4. Parent

When a method is called, the first occurrence of that method in the MRO is the one that is called. Any class that doesn’t implement that method is skipped. Any call to super within that method will call the next occurrence of that method in the MRO. Consequently, it matters both what order you place classes in inheritance, and where you put the calls to super in the methods.

With super first in each method

class Parent(object):
    def __init__(self):
        super(Parent, self).__init__()
        print "parent"

class Left(Parent):
    def __init__(self):
        super(Left, self).__init__()
        print "left"

class Right(Parent):
    def __init__(self):
        super(Right, self).__init__()
        print "right"

class Child(Left, Right):
    def __init__(self):
        super(Child, self).__init__()
        print "child"

Child() Outputs:

parent
right
left
child

With super last in each method

class Parent(object):
    def __init__(self):
        print "parent"
        super(Parent, self).__init__()

class Left(Parent):
    def __init__(self):
        print "left"
        super(Left, self).__init__()

class Right(Parent):
    def __init__(self):
        print "right"
        super(Right, self).__init__()

class Child(Left, Right):
    def __init__(self):
        print "child"
        super(Child, self).__init__()

Child() Outputs:

child
left
right
parent
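
To illustrate the point above that a class which doesn’t implement the method is simply skipped, here is a small sketch of my own (Left defines no __init__ at all):

class Parent(object):
    def __init__(self):
        print "parent"

class Left(Parent):
    pass  # no __init__ of its own, so the lookup skips this class

class Right(Parent):
    def __init__(self):
        print "right"
        super(Right, self).__init__()

class Child(Left, Right):
    def __init__(self):
        print "child"
        super(Child, self).__init__()

Child()  # prints: child, right, parent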

Answer 7


Regarding @calfzhou’s comment, you can, as usual, use **kwargs:

Online running example

class A(object):
  def __init__(self, a, *args, **kwargs):
    print("A", a)

class B(A):
  def __init__(self, b, *args, **kwargs):
    super(B, self).__init__(*args, **kwargs)
    print("B", b)

class A1(A):
  def __init__(self, a1, *args, **kwargs):
    super(A1, self).__init__(*args, **kwargs)
    print("A1", a1)

class B1(A1, B):
  def __init__(self, b1, *args, **kwargs):
    super(B1, self).__init__(*args, **kwargs)
    print("B1", b1)


B1(a1=6, b1=5, b="hello", a=None)

Result:

A None
B hello
A1 6
B1 5

You can also use them positionally:

B1(5, 6, b="hello", a=None)

but you have to keep the MRO in mind, and it’s really confusing.
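
The positional form works because the parameters are consumed in MRO order, which you can always inspect (a quick check, assuming the classes above are defined):

print(B1.__mro__)   # roughly: (B1, A1, B, A, object)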

At the risk of being a little annoying: I’ve noticed that people forget to use *args and **kwargs every time they override a method, even though this is one of the few really useful and sane uses of these “magic variables”.


Answer 8


Another point not yet covered is passing parameters for the initialization of classes. Since where super ends up depends on the subclass, the only good way to pass parameters is to pack them all together. Then be careful not to use the same parameter name with different meanings.

Example:

class A(object):
    def __init__(self, **kwargs):
        print('A.__init__')
        super().__init__()

class B(A):
    def __init__(self, **kwargs):
        print('B.__init__ {}'.format(kwargs['x']))
        super().__init__(**kwargs)


class C(A):
    def __init__(self, **kwargs):
        print('C.__init__ with {}, {}'.format(kwargs['a'], kwargs['b']))
        super().__init__(**kwargs)


class D(B, C): # MRO=D, B, C, A
    def __init__(self):
        print('D.__init__')
        super().__init__(a=1, b=2, x=3)

print(D.mro())
D()

gives:

[<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
D.__init__
B.__init__ 3
C.__init__ with 1, 2
A.__init__

Calling the superclass __init__ directly for more direct assignment of parameters is tempting, but it fails if there is any super call in a superclass and/or the MRO is changed, and class A may end up being called multiple times, depending on the implementation.

To conclude: cooperative inheritance, super, and specific parameters for initialization don’t work together very well.
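
As a minimal sketch of that failure mode (my own example, not from the original answer): with direct calls instead of super(), the shared base runs twice in a diamond:

class A(object):
    def __init__(self):
        print('A.__init__')

class B(A):
    def __init__(self):
        A.__init__(self)   # direct call instead of super()

class C(A):
    def __init__(self):
        A.__init__(self)   # direct call instead of super()

class D(B, C):
    def __init__(self):
        B.__init__(self)   # calling each parent by hand...
        C.__init__(self)   # ...runs A.__init__ twice

D()   # prints 'A.__init__' twice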


Answer 9

class First(object):
  def __init__(self, a):
    print "first", a
    super(First, self).__init__(20)

class Second(object):
  def __init__(self, a):
    print "second", a
    super(Second, self).__init__()

class Third(First, Second):
  def __init__(self):
    super(Third, self).__init__(10)
    print "that's it"

t = Third()

Output is

first 10
second 20
that's it

The call to Third() locates the __init__ defined in Third. The call to super in that routine invokes the __init__ defined in First (the remaining MRO to search is [First, Second]). Now the call to super in the __init__ defined in First continues searching the MRO and finds the __init__ defined in Second, and any further call to super hits the default object __init__. I hope this example clarifies the concept.

If you don’t call super from First, the chain stops and you will get the following output.

first 10
that's it

Answer 10


In Learn Python the Hard Way I learned about something called super(), a built-in function if I’m not mistaken. Calling the super() function can help the inheritance pass through the parent and ‘siblings’, and helps you see things more clearly. I am still a beginner, but I love to share my experience of using super() in Python 2.7.

If you have read through the comments on this page, you will have heard of the Method Resolution Order (MRO), the method being the function you wrote; the MRO uses a depth-first, left-to-right scheme to search and run. You can do more research on that.

By adding the super() function

super(First, self).__init__() #example for class First.

You can connect multiple instances and ‘families’ with super() by adding it in each and every one of them. It will execute the methods, go through them, and make sure you didn’t miss anything! However, adding the call before or after the print does make a difference; you will know this if you have done Learn Python the Hard Way exercise 44. Let the fun begin!!

Taking the example below, you can copy & paste it and try running it:

class First(object):
    def __init__(self):

        print("first")

class Second(First):
    def __init__(self):
        print("second (before)")
        super(Second, self).__init__()
        print("second (after)")

class Third(First):
    def __init__(self):
        print("third (before)")
        super(Third, self).__init__()
        print("third (after)")


class Fourth(First):
    def __init__(self):
        print("fourth (before)")
        super(Fourth, self).__init__()
        print("fourth (after)")


class Fifth(Second, Third, Fourth):
    def __init__(self):
        print("fifth (before)")
        super(Fifth, self).__init__()
        print("fifth (after)")

Fifth()

How does it run? An instance of Fifth() goes like this. Each step goes from class to class where the super function was added.

1.) print("fifth (before)")
2.) super()>[Second, Third, Fourth] (Left to right)
3.) print("second (before)")
4.) super()> First (First is the Parent which inherit from object)

The parent was found, and it then continues on to Third and Fourth!!

5.) print("third (before)")
6.) super()> First (Parent class)
7.) print ("Fourth (before)")
8.) super()> First (Parent class)

Now all the classes with super() have been accessed! The parent class has been found and executed, and now it continues to un-box the functions in the inheritance chain to finish the code.

9.) print("first") (Parent)
10.) print ("Fourth (after)") (Class Fourth un-box)
11.) print("third (after)") (Class Third un-box)
12.) print("second (after)") (Class Second un-box)
13.) print("fifth (after)") (Class Fifth un-box)
14.) Fifth() executed

The outcome of the program above:

fifth (before)
second (before)
third (before)
fourth (before)
first
fourth (after)
third (after)
second (after)
fifth (after)

For me, adding super() allows me to see more clearly how Python executes my code, and to make sure the inheritance can access the method I intended.


Answer 11


I would like to add to what @Visionscaper says at the top:

Third --> First --> object --> Second --> object

In this case the interpreter doesn’t filter out the object class because it’s duplicated; rather, it’s because Second appears in a head position and does not appear in a tail position in a hierarchy subset, while object only appears in tail positions and so is not considered a strong position by the C3 algorithm when determining priority.

The linearisation (MRO) of a class C, L(C), is

  • the class C itself
  • plus the merge of
    • the linearisations of its parents P1, P2, … = L(P1), L(P2), … and
    • the list of its parents P1, P2, …

The linearised merge is done by selecting the common classes that appear as the head of the lists and not in the tail, since order matters (this will become clear below).

The linearisation of Third can be computed as follows:

    L(O)  := [O]  // the linearization(mro) of O(object), because O has no parents

    L(First)  :=  [First] + merge(L(O), [O])
               =  [First] + merge([O], [O])
               =  [First, O]

    // Similarly, 
    L(Second)  := [Second, O]

    L(Third)   := [Third] + merge(L(First), L(Second), [First, Second])
                = [Third] + merge([First, O], [Second, O], [First, Second])
// class First is a good candidate for the first merge step, because it only appears as the head of the first and last lists
// class O is not a good candidate for the next merge step, because it also appears in the tails of list 1 and 2, 
                = [Third, First] + merge([O], [Second, O], [Second])
// class Second is a good candidate for the second merge step, because it appears as the head of the list 2 and 3
                = [Third, First, Second] + merge([O], [O])            
                = [Third, First, Second, O]

Thus for a super() implementation in the following code:

class First(object):
  def __init__(self):
    super(First, self).__init__()
    print "first"

class Second(object):
  def __init__(self):
    super(Second, self).__init__()
    print "second"

class Third(First, Second):
  def __init__(self):
    super(Third, self).__init__()
    print "that's it"

it becomes obvious how this method will be resolved

Third.__init__() ---> First.__init__() ---> Second.__init__() ---> 
Object.__init__() ---> returns ---> Second.__init__() -
prints "second" - returns ---> First.__init__() -
prints "first" - returns ---> Third.__init__() - prints "that's it"

Answer 12


In Python 3.5+, inheritance looks predictable and works very nicely for me. Please look at this code:

class Base(object):
  def foo(self):
    print("    Base(): entering")
    print("    Base(): exiting")


class First(Base):
  def foo(self):
    print("   First(): entering Will call Second now")
    super().foo()
    print("   First(): exiting")


class Second(Base):
  def foo(self):
    print("  Second(): entering")
    super().foo()
    print("  Second(): exiting")


class Third(First, Second):
  def foo(self):
    print(" Third(): entering")
    super().foo()
    print(" Third(): exiting")


class Fourth(Third):
  def foo(self):
    print("Fourth(): entering")
    super().foo()
    print("Fourth(): exiting")

Fourth().foo()
print(Fourth.__mro__)

Outputs:

Fourth(): entering
 Third(): entering
   First(): entering Will call Second now
  Second(): entering
    Base(): entering
    Base(): exiting
  Second(): exiting
   First(): exiting
 Third(): exiting
Fourth(): exiting
(<class '__main__.Fourth'>, <class '__main__.Third'>, <class '__main__.First'>, <class '__main__.Second'>, <class '__main__.Base'>, <class 'object'>)

As you can see, it calls foo exactly ONE time for each class in the inheritance chain, in the same order as the MRO. You can get that order by printing __mro__:

Fourth -> Third -> First -> Second -> Base -> object


Answer 13


Maybe there’s still something that can be added, a small example with Django rest_framework, and decorators. This provides an answer to the implicit question: “why would I want this anyway?”

As said: we’re using Django rest_framework with generic views, and for each type of object in our database we find ourselves with one view class providing GET and POST for lists of objects, and another view class providing GET, PUT, and DELETE for individual objects.

Now we want to decorate the POST, PUT, and DELETE methods with Django’s login_required. Notice how this touches both classes, but not all methods in either class.

A solution could go through multiple inheritance.

from django.utils.decorators import method_decorator
from django.contrib.auth.decorators import login_required

class LoginToPost:
    @method_decorator(login_required)
    def post(self, arg, *args, **kwargs):
        return super().post(arg, *args, **kwargs)

Likewise for the other methods.

In the inheritance list of my concrete classes, I would add my LoginToPost before ListCreateAPIView and LoginToPutOrDelete before RetrieveUpdateDestroyAPIView. My concrete classes’ get would stay undecorated.
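
The LoginToPutOrDelete mixin is not spelled out above, but following the same pattern it might look something like this:

from django.utils.decorators import method_decorator
from django.contrib.auth.decorators import login_required

class LoginToPutOrDelete:
    @method_decorator(login_required)
    def put(self, arg, *args, **kwargs):
        return super().put(arg, *args, **kwargs)

    @method_decorator(login_required)
    def delete(self, arg, *args, **kwargs):
        return super().delete(arg, *args, **kwargs)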


Answer 14


Posting this answer for my own future reference.

Python multiple inheritance should use a diamond model, and the function signature shouldn’t change across the model.

    A
   / \
  B   C
   \ /
    D

A sample code snippet would be:

class A:
    def __init__(self, name=None):
        #  this is the head of the diamond, no need to call super() here
        self.name = name

class B(A):
    def __init__(self, param1='hello', **kwargs):
        super().__init__(**kwargs)
        self.param1 = param1

class C(A):
    def __init__(self, param2='bye', **kwargs):
        super().__init__(**kwargs)
        self.param2 = param2

class D(B, C):
    def __init__(self, works='fine', **kwargs):
        super().__init__(**kwargs)
        print(f"{works=}, {self.param1=}, {self.param2=}, {self.name=}")

d = D(name='Testing')

Here class A plays the role of object, the head of the diamond.
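
Running the snippet prints the following (note that the {works=} f-string form requires Python 3.8 or newer):

works='fine', self.param1='hello', self.param2='bye', self.name='Testing'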


How do I change the order of DataFrame columns?

Question: How do I change the order of DataFrame columns?


I have the following DataFrame (df):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))

I add more column(s) by assignment:

df['mean'] = df.mean(1)

How can I move the column mean to the front, i.e. set it as first column leaving the order of the other columns untouched?


Answer 0


One easy way would be to reassign the dataframe with a list of the columns, rearranged as needed.

This is what you have now:

In [6]: df
Out[6]:
          0         1         2         3         4      mean
0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
3  0.260919  0.783467  0.593433  0.033426  0.512019  0.436653
4  0.131842  0.799367  0.182828  0.683330  0.019485  0.363371
5  0.498784  0.873495  0.383811  0.699289  0.480447  0.587165
6  0.388771  0.395757  0.745237  0.628406  0.784473  0.588529
7  0.147986  0.459451  0.310961  0.706435  0.100914  0.345149
8  0.394947  0.863494  0.585030  0.565944  0.356561  0.553195
9  0.689260  0.865243  0.136481  0.386582  0.730399  0.561593

In [7]: cols = df.columns.tolist()

In [8]: cols
Out[8]: [0L, 1L, 2L, 3L, 4L, 'mean']

Rearrange cols in any way you want. This is how I moved the last element to the first position:

In [12]: cols = cols[-1:] + cols[:-1]

In [13]: cols
Out[13]: ['mean', 0L, 1L, 2L, 3L, 4L]

Then reorder the dataframe like this:

In [16]: df = df[cols]  #    OR    df = df.ix[:, cols]

In [17]: df
Out[17]:
       mean         0         1         2         3         4
0  0.445543  0.445598  0.173835  0.343415  0.682252  0.582616
1  0.670208  0.881592  0.696942  0.702232  0.696724  0.373551
2  0.632596  0.662527  0.955193  0.131016  0.609548  0.804694
3  0.436653  0.260919  0.783467  0.593433  0.033426  0.512019
4  0.363371  0.131842  0.799367  0.182828  0.683330  0.019485
5  0.587165  0.498784  0.873495  0.383811  0.699289  0.480447
6  0.588529  0.388771  0.395757  0.745237  0.628406  0.784473
7  0.345149  0.147986  0.459451  0.310961  0.706435  0.100914
8  0.553195  0.394947  0.863494  0.585030  0.565944  0.356561
9  0.561593  0.689260  0.865243  0.136481  0.386582  0.730399

Answer 1


You could also do something like this:

df = df[['mean', '0', '1', '2', '3']]

You can get the list of columns with:

cols = list(df.columns.values)

The output will produce:

['0', '1', '2', '3', 'mean']

…which is then easy to rearrange manually before dropping it into the first function


Answer 2


Just index the dataframe with the column names in the order you want them:

In [39]: df
Out[39]: 
          0         1         2         3         4  mean
0  0.172742  0.915661  0.043387  0.712833  0.190717     1
1  0.128186  0.424771  0.590779  0.771080  0.617472     1
2  0.125709  0.085894  0.989798  0.829491  0.155563     1
3  0.742578  0.104061  0.299708  0.616751  0.951802     1
4  0.721118  0.528156  0.421360  0.105886  0.322311     1
5  0.900878  0.082047  0.224656  0.195162  0.736652     1
6  0.897832  0.558108  0.318016  0.586563  0.507564     1
7  0.027178  0.375183  0.930248  0.921786  0.337060     1
8  0.763028  0.182905  0.931756  0.110675  0.423398     1
9  0.848996  0.310562  0.140873  0.304561  0.417808     1

In [40]: df = df[['mean', 4,3,2,1]]

Now, the ‘mean’ column comes out in front:

In [41]: df
Out[41]: 
   mean         4         3         2         1
0     1  0.190717  0.712833  0.043387  0.915661
1     1  0.617472  0.771080  0.590779  0.424771
2     1  0.155563  0.829491  0.989798  0.085894
3     1  0.951802  0.616751  0.299708  0.104061
4     1  0.322311  0.105886  0.421360  0.528156
5     1  0.736652  0.195162  0.224656  0.082047
6     1  0.507564  0.586563  0.318016  0.558108
7     1  0.337060  0.921786  0.930248  0.375183
8     1  0.423398  0.110675  0.931756  0.182905
9     1  0.417808  0.304561  0.140873  0.310562

Answer 4


In your case,

df = df.reindex(columns=['mean',0,1,2,3,4])

will do exactly what you want.

In my case (general form):

df = df.reindex(columns=sorted(df.columns))
df = df.reindex(columns=(['opened'] + list([a for a in df.columns if a != 'opened']) ))

Answer 5


You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.

cols = ['mean']  + [col for col in df if col != 'mean']
df = df[cols]

You can also use a more general approach. In this example, the last column (indicated by -1) is inserted as the first column.

cols = [df.columns[-1]] + [col for col in df if col != df.columns[-1]]
df = df[cols]

You can also use this approach for reordering columns in a desired order if they are present in the DataFrame.

inserted_cols = ['a', 'b', 'c']
cols = ([col for col in inserted_cols if col in df] 
        + [col for col in df if col not in inserted_cols])
df = df[cols]

Answer 6


import numpy as np
import pandas as pd
df = pd.DataFrame()
column_names = ['x','y','z','mean']
for col in column_names: 
    df[col] = np.random.randint(0,100, size=10000)

You can try out the following solutions:

Solution 1:

df = df[ ['mean'] + [ col for col in df.columns if col != 'mean' ] ]

Solution 2:


df = df[['mean', 'x', 'y', 'z']]

Solution 3:

col = df.pop("mean")
df.insert(0, col.name, col)

Solution 4:

df.set_index(df.columns[-1], inplace=True)
df.reset_index(inplace=True)

Solution 5:

cols = list(df)
cols = [cols[-1]] + cols[:-1]
df = df[cols]

Solution 6:

order = [1,2,3,0] # setting column's order
df = df[[df.columns[i] for i in order]]

Time Comparison:

Solution 1:

CPU times: user 1.05 ms, sys: 35 µs, total: 1.08 ms Wall time: 995 µs

Solution 2:

CPU times: user 933 µs, sys: 0 ns, total: 933 µs Wall time: 800 µs

Solution 3:

CPU times: user 0 ns, sys: 1.35 ms, total: 1.35 ms Wall time: 1.08 ms

Solution 4:

CPU times: user 1.23 ms, sys: 45 µs, total: 1.27 ms Wall time: 986 µs

Solution 5:

CPU times: user 1.09 ms, sys: 19 µs, total: 1.11 ms Wall time: 949 µs

Solution 6:

CPU times: user 955 µs, sys: 34 µs, total: 989 µs Wall time: 859 µs


Answer 7


From August 2018:

If your column names are too long to type then you could specify the new order through a list of integers with the positions:

Data:

          0         1         2         3         4      mean
0  0.397312  0.361846  0.719802  0.575223  0.449205  0.500678
1  0.287256  0.522337  0.992154  0.584221  0.042739  0.485741
2  0.884812  0.464172  0.149296  0.167698  0.793634  0.491923
3  0.656891  0.500179  0.046006  0.862769  0.651065  0.543382
4  0.673702  0.223489  0.438760  0.468954  0.308509  0.422683
5  0.764020  0.093050  0.100932  0.572475  0.416471  0.389390
6  0.259181  0.248186  0.626101  0.556980  0.559413  0.449972
7  0.400591  0.075461  0.096072  0.308755  0.157078  0.207592
8  0.639745  0.368987  0.340573  0.997547  0.011892  0.471749
9  0.050582  0.714160  0.168839  0.899230  0.359690  0.438500

Generic example:

new_order = [3,2,1,4,5,0]
print(df[df.columns[new_order]])  

          3         2         1         4      mean         0
0  0.575223  0.719802  0.361846  0.449205  0.500678  0.397312
1  0.584221  0.992154  0.522337  0.042739  0.485741  0.287256
2  0.167698  0.149296  0.464172  0.793634  0.491923  0.884812
3  0.862769  0.046006  0.500179  0.651065  0.543382  0.656891
4  0.468954  0.438760  0.223489  0.308509  0.422683  0.673702
5  0.572475  0.100932  0.093050  0.416471  0.389390  0.764020
6  0.556980  0.626101  0.248186  0.559413  0.449972  0.259181
7  0.308755  0.096072  0.075461  0.157078  0.207592  0.400591
8  0.997547  0.340573  0.368987  0.011892  0.471749  0.639745
9  0.899230  0.168839  0.714160  0.359690  0.438500  0.050582

And for the specific case of OP’s question:

new_order = [-1,0,1,2,3,4]
df = df[df.columns[new_order]]
print(df)

       mean         0         1         2         3         4
0  0.500678  0.397312  0.361846  0.719802  0.575223  0.449205
1  0.485741  0.287256  0.522337  0.992154  0.584221  0.042739
2  0.491923  0.884812  0.464172  0.149296  0.167698  0.793634
3  0.543382  0.656891  0.500179  0.046006  0.862769  0.651065
4  0.422683  0.673702  0.223489  0.438760  0.468954  0.308509
5  0.389390  0.764020  0.093050  0.100932  0.572475  0.416471
6  0.449972  0.259181  0.248186  0.626101  0.556980  0.559413
7  0.207592  0.400591  0.075461  0.096072  0.308755  0.157078
8  0.471749  0.639745  0.368987  0.340573  0.997547  0.011892
9  0.438500  0.050582  0.714160  0.168839  0.899230  0.359690

The main problem with this approach is that calling the same code multiple times will create different results each time, so one needs to be careful :)


Answer 8


This function saves you from having to list out every variable in your dataset just to order a few of them.

def order(frame,var):
    if type(var) is str:
        var = [var] #let the command take a string or list
    varlist =[w for w in frame.columns if w not in var]
    frame = frame[var+varlist]
    return frame 

It takes two arguments: the first is the dataset, and the second is the list of columns in the dataset that you want to bring to the front.

So in my case I have a data set called Frame with variables A1, A2, B1, B2, Total and Date. If I want to bring Total to the front then all I have to do is:

frame = order(frame,['Total'])

If I want to bring Total and Date to the front then I do:

frame = order(frame,['Total','Date'])

EDIT:

Another useful way to use this: if you have an unfamiliar table and you are looking for variables with a particular term in their names, like VAR1, VAR2, …, you may execute something like:

frame = order(frame,[v for v in frame.columns if "VAR" in v])

Answer 9


I ran into a similar question myself, and just wanted to add what I settled on. I liked the reindex_axis() method for changing column order. This worked:

df = df.reindex_axis(['mean'] + list(df.columns[:-1]), axis=1)

An alternate method based on the comment from @Jorge:

df = df.reindex(columns=['mean'] + list(df.columns[:-1]))

Although reindex_axis seems to be slightly faster in micro benchmarks than reindex, I think I prefer the latter for its directness.


回答 10

简单地做,

df = df[['mean'] + df.columns[:-1].tolist()]

Simply do,

df = df[['mean'] + df.columns[:-1].tolist()]

回答 11

您可以执行以下操作(从Aman的答案中借用部分内容):

cols = df.columns.tolist()
cols.insert(0, cols.pop(-1))

cols
>>>['mean', 0L, 1L, 2L, 3L, 4L]

df = df[cols]

You could do the following (borrowing parts from Aman’s answer):

cols = df.columns.tolist()
cols.insert(0, cols.pop(-1))

cols
>>>['mean', 0L, 1L, 2L, 3L, 4L]

df = df[cols]

回答 12

只需输入要更改的列名,然后为新位置设置索引即可。

def change_column_order(df, col_name, index):
    cols = df.columns.tolist()
    cols.remove(col_name)
    cols.insert(index, col_name)
    return df[cols]

对于您的情况,这将是:

df = change_column_order(df, 'mean', 0)

Just type the column name you want to change, and set the index for the new location.

def change_column_order(df, col_name, index):
    cols = df.columns.tolist()
    cols.remove(col_name)
    cols.insert(index, col_name)
    return df[cols]

For your case, this would be like:

df = change_column_order(df, 'mean', 0)

回答 13

将任何列移动到任何位置:

import pandas as pd
df = pd.DataFrame({"A": [1,2,3], 
                   "B": [2,4,8], 
                   "C": [5,5,5]})

cols = df.columns.tolist()
column_to_move = "C"
new_position = 1

cols.insert(new_position, cols.pop(cols.index(column_to_move)))
df = df[cols]

Moving any column to any position:

import pandas as pd
df = pd.DataFrame({"A": [1,2,3], 
                   "B": [2,4,8], 
                   "C": [5,5,5]})

cols = df.columns.tolist()
column_to_move = "C"
new_position = 1

cols.insert(new_position, cols.pop(cols.index(column_to_move)))
df = df[cols]

回答 14

我认为这是一个稍微整洁的解决方案:

df.insert(0,'mean', df.pop("mean"))

该解决方案与 @JoeHeffer 的解决方案有些类似,但它只需要一行代码。

在这里,我们把 "mean" 列从数据框中弹出,再以相同的列名把它插入到索引 0 处。

I think this is a slightly neater solution:

df.insert(0,'mean', df.pop("mean"))

This solution is somewhat similar to @JoeHeffer ‘s solution but this is one liner.

Here we remove the column "mean" from the dataframe and attach it to index 0 with the same column name.


回答 15

这是一种移动现有列的方法,它会就地修改现有的数据框。

my_column = df.pop('column name')
df.insert(3, my_column.name, my_column)

Here’s a way to move one existing column that will modify the existing data frame in place.

my_column = df.pop('column name')
df.insert(3, my_column.name, my_column)

回答 16

这个问题之前已经有人回答过,但是 reindex_axis 现在已被弃用,所以我建议使用:

df.reindex(sorted(df.columns), axis=1)

This question has been answered before but reindex_axis is deprecated now so I would suggest to use:

df.reindex(sorted(df.columns), axis=1)

回答 17

使用“T”怎么样?

df.T.reindex(['mean',0,1,2,3,4]).T

How about using “T”?

df.T.reindex(['mean',0,1,2,3,4]).T

回答 18

@clocker:您的解决方案对我非常有帮助,因为我想把两列放到一个数据帧的最前面,而我并不确切知道所有列的名称,因为它们是由之前的透视(pivot)语句生成的。因此,如果您也处在相同的情况——把已知名称的列放到最前面,再让“其余所有列”跟在后面——我想出了如下的通用解决方案:

df = df.reindex_axis(['Col1','Col2'] + list(df.columns.drop(['Col1','Col2'])), axis=1)

@clocker: Your solution was very helpful for me, as I wanted to bring two columns in front from a dataframe where I do not know exactly the names of all columns, because they are generated from a pivot statement before. So, if you are in the same situation: To bring columns in front that you know the name of and then let them follow by “all the other columns”, I came up with the following general solution;

df = df.reindex_axis(['Col1','Col2'] + list(df.columns.drop(['Col1','Col2'])), axis=1)

回答 19

set()

一种简单的方法是使用set(),尤其是当您有很长的列列表并且不想手动处理它们时:

cols = list(set(df.columns.tolist()) - set(['mean']))
cols.insert(0, 'mean')
df = df[cols]

set():

A simple approach is using set(), in particular when you have a long list of columns and do not want to handle them manually:

cols = list(set(df.columns.tolist()) - set(['mean']))
cols.insert(0, 'mean')
df = df[cols]

回答 20

我喜欢Shoresh的回答,即在您不知道位置时使用集合功能删除列,但是,这对我而言不起作用,因为我需要保持原始列顺序(具有任意列标签)。

我通过使用boltons包中的IndexedSet使此工作正常。

我还需要重新添加多个列标签,因此对于更一般的情况,我使用了以下代码:

from boltons.setutils import IndexedSet
cols = list(IndexedSet(df.columns.tolist()) - set(['mean', 'std']))
cols[0:0] =['mean', 'std']
df = df[cols]

希望这对在此线程中寻求一般解决方案的任何人有用。

I liked Shoresh’s answer to use set functionality to remove columns when you don’t know the location, however this didn’t work for my purpose as I need to keep the original column order (which has arbitrary column labels).

I got this to work though by using IndexedSet from the boltons package.

I also needed to re-add multiple column labels, so for a more general case I used the following code:

from boltons.setutils import IndexedSet
cols = list(IndexedSet(df.columns.tolist()) - set(['mean', 'std']))
cols[0:0] =['mean', 'std']
df = df[cols]

Hope this is useful to anyone searching this thread for a general solution.


回答 21

您可以使用 reindex,它对两个轴都适用:

df
#           0         1         2         3         4      mean
# 0  0.943825  0.202490  0.071908  0.452985  0.678397  0.469921
# 1  0.745569  0.103029  0.268984  0.663710  0.037813  0.363821
# 2  0.693016  0.621525  0.031589  0.956703  0.118434  0.484254
# 3  0.284922  0.527293  0.791596  0.243768  0.629102  0.495336
# 4  0.354870  0.113014  0.326395  0.656415  0.172445  0.324628
# 5  0.815584  0.532382  0.195437  0.829670  0.019001  0.478415
# 6  0.944587  0.068690  0.811771  0.006846  0.698785  0.506136
# 7  0.595077  0.437571  0.023520  0.772187  0.862554  0.538182
# 8  0.700771  0.413958  0.097996  0.355228  0.656919  0.444974
# 9  0.263138  0.906283  0.121386  0.624336  0.859904  0.555009

df.reindex(['mean', *range(5)], axis=1)

#        mean         0         1         2         3         4
# 0  0.469921  0.943825  0.202490  0.071908  0.452985  0.678397
# 1  0.363821  0.745569  0.103029  0.268984  0.663710  0.037813
# 2  0.484254  0.693016  0.621525  0.031589  0.956703  0.118434
# 3  0.495336  0.284922  0.527293  0.791596  0.243768  0.629102
# 4  0.324628  0.354870  0.113014  0.326395  0.656415  0.172445
# 5  0.478415  0.815584  0.532382  0.195437  0.829670  0.019001
# 6  0.506136  0.944587  0.068690  0.811771  0.006846  0.698785
# 7  0.538182  0.595077  0.437571  0.023520  0.772187  0.862554
# 8  0.444974  0.700771  0.413958  0.097996  0.355228  0.656919
# 9  0.555009  0.263138  0.906283  0.121386  0.624336  0.859904

You can use reindex which can be used for both axis:

df
#           0         1         2         3         4      mean
# 0  0.943825  0.202490  0.071908  0.452985  0.678397  0.469921
# 1  0.745569  0.103029  0.268984  0.663710  0.037813  0.363821
# 2  0.693016  0.621525  0.031589  0.956703  0.118434  0.484254
# 3  0.284922  0.527293  0.791596  0.243768  0.629102  0.495336
# 4  0.354870  0.113014  0.326395  0.656415  0.172445  0.324628
# 5  0.815584  0.532382  0.195437  0.829670  0.019001  0.478415
# 6  0.944587  0.068690  0.811771  0.006846  0.698785  0.506136
# 7  0.595077  0.437571  0.023520  0.772187  0.862554  0.538182
# 8  0.700771  0.413958  0.097996  0.355228  0.656919  0.444974
# 9  0.263138  0.906283  0.121386  0.624336  0.859904  0.555009

df.reindex(['mean', *range(5)], axis=1)

#        mean         0         1         2         3         4
# 0  0.469921  0.943825  0.202490  0.071908  0.452985  0.678397
# 1  0.363821  0.745569  0.103029  0.268984  0.663710  0.037813
# 2  0.484254  0.693016  0.621525  0.031589  0.956703  0.118434
# 3  0.495336  0.284922  0.527293  0.791596  0.243768  0.629102
# 4  0.324628  0.354870  0.113014  0.326395  0.656415  0.172445
# 5  0.478415  0.815584  0.532382  0.195437  0.829670  0.019001
# 6  0.506136  0.944587  0.068690  0.811771  0.006846  0.698785
# 7  0.538182  0.595077  0.437571  0.023520  0.772187  0.862554
# 8  0.444974  0.700771  0.413958  0.097996  0.355228  0.656919
# 9  0.555009  0.263138  0.906283  0.121386  0.624336  0.859904

回答 22

这是一个用于任意数量列的函数。

def mean_first(df):
    ncols = df.shape[1]        # Get the number of columns
    index = list(range(ncols)) # Create an index to reorder the columns
    index.insert(0,ncols)      # This puts the last column at the front
    return(df.assign(mean=df.mean(1)).iloc[:,index]) # new df with last column (mean) first

Here is a function to do this for any number of columns.

def mean_first(df):
    ncols = df.shape[1]        # Get the number of columns
    index = list(range(ncols)) # Create an index to reorder the columns
    index.insert(0,ncols)      # This puts the last column at the front
    return(df.assign(mean=df.mean(1)).iloc[:,index]) # new df with last column (mean) first
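A possible usage of the helper above, assuming a purely numeric frame that does not yet contain a mean column:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
print(mean_first(df))  # the freshly computed 'mean' shows up as the first column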

回答 23

书中最“取巧”(hacky)的方法

df.insert(0,"test",df["mean"])
df=df.drop(columns=["mean"]).rename(columns={"test":"mean"})

Hackiest method in the book

df.insert(0,"test",df["mean"])
df=df.drop(columns=["mean"]).rename(columns={"test":"mean"})

回答 24

我认为这个函数更直接。您只需指定要放在开头或结尾(或两者)的列的子集即可:

def reorder_df_columns(df, start=None, end=None):
    """
        This function reorders the columns of a DataFrame.
        It takes the columns given in the list `start` and moves them to the left.
        It also takes the columns in `end` and moves them to the right.
    """
    if start is None:
        start = []
    if end is None:
        end = []
    assert isinstance(start, list) and isinstance(end, list)
    cols = list(df.columns)
    # filter with comprehensions instead of removing items while iterating,
    # which could silently skip elements
    start = [c for c in start if c in cols]
    end = [c for c in end if c in cols and c not in start]
    for c in start + end:
        cols.remove(c)
    cols = start + cols + end
    return df[cols]

I think this function is more straightforward. You just need to specify a subset of columns at the start or the end, or both:

def reorder_df_columns(df, start=None, end=None):
    """
        This function reorders the columns of a DataFrame.
        It takes the columns given in the list `start` and moves them to the left.
        It also takes the columns in `end` and moves them to the right.
    """
    if start is None:
        start = []
    if end is None:
        end = []
    assert isinstance(start, list) and isinstance(end, list)
    cols = list(df.columns)
    # filter with comprehensions instead of removing items while iterating,
    # which could silently skip elements
    start = [c for c in start if c in cols]
    end = [c for c in end if c in cols and c not in start]
    for c in start + end:
        cols.remove(c)
    cols = start + cols + end
    return df[cols]
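For instance, with a hypothetical frame whose columns include mean and Date, one might call it like this:

df = reorder_df_columns(df, start=['mean'])                 # move 'mean' to the far left
df = reorder_df_columns(df, end=['Date'])                   # move 'Date' to the far right
df = reorder_df_columns(df, start=['mean'], end=['Date'])   # or both at once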

回答 25

如果您知道另一列的位置,我相信@Aman的答案是最好的。

如果您不知道 mean 的位置,而只知道它的名称,则不能直接使用 cols = cols[-1:] + cols[:-1]。下面是我能想到的次佳方案:

meanDf = pd.DataFrame(df.pop('mean'))
# now df doesn't contain "mean" anymore. Order of join will move it to left or right:
meanDf.join(df) # has mean as first column
df.join(meanDf) # has mean as last column

I believe @Aman’s answer is the best if you know the location of the other column.

If you don’t know the location of mean, but only have its name, you cannot resort directly to cols = cols[-1:] + cols[:-1]. Following is the next-best thing I could come up with:

meanDf = pd.DataFrame(df.pop('mean'))
# now df doesn't contain "mean" anymore. Order of join will move it to left or right:
meanDf.join(df) # has mean as first column
df.join(meanDf) # has mean as last column

回答 26

只是翻转经常会有所帮助。

df[df.columns[::-1]]

或者干脆随机打乱一下看看。

import random
cols = list(df.columns)
random.shuffle(cols)
df[cols]

Just flipping helps often.

df[df.columns[::-1]]

Or just shuffle for a look.

import random
cols = list(df.columns)
random.shuffle(cols)
df[cols]

回答 27

大多数答案都不够通用,而 pandas 的 reindex_axis 方法又有点繁琐,因此我提供了一个简单的函数:用一个字典(键 = 列名,值 = 要移动到的位置)即可把任意数量的列移动到任意位置。如果您的数据框很大,可以给 'big_data' 传入 True,此时该函数只返回排好序的列名列表,您可以用这个列表来切片数据。

def order_column(df, columns, big_data = False):

    """Re-Orders dataFrame column(s)
       Parameters : 
       df      -- dataframe
       columns -- a dictionary:
                  key   = current column position/index or column name
                  value = position to move it to  
       big_data -- boolean 
                  True = returns only the ordered columns as a list
                          the user can then slice the data using this
                          ordered column
                  False = default - return a copy of the dataframe
    """
    ordered_col = df.columns.tolist()

    for key, value in columns.items():

        ordered_col.remove(key)
        ordered_col.insert(value, key)

    if big_data:

        return ordered_col

    return df[ordered_col]

# e.g.
df = pd.DataFrame({'chicken wings': np.random.rand(10, 1).flatten(), 'taco': np.random.rand(10,1).flatten(),
                          'coffee': np.random.rand(10, 1).flatten()})
df['mean'] = df.mean(1)

df = order_column(df, {'mean': 0, 'coffee':1 })

>>>

输出

col = order_column(df, {'mean': 0, 'coffee':1 }, True)

col
>>>
['mean', 'coffee', 'chicken wings', 'taco']

# you could grab it by doing this

df = df[col]

Most of the answers did not generalize enough and pandas reindex_axis method is a little tedious, hence I offer a simple function to move an arbitrary number of columns to any position using a dictionary where key = column name and value = position to move to. If your dataframe is large pass True to ‘big_data’ then the function will return the ordered columns list. And you could use this list to slice your data.

def order_column(df, columns, big_data = False):

    """Re-Orders dataFrame column(s)
       Parameters : 
       df      -- dataframe
       columns -- a dictionary:
                  key   = current column position/index or column name
                  value = position to move it to  
       big_data -- boolean 
                  True = returns only the ordered columns as a list
                          the user can then slice the data using this
                          ordered column
                  False = default - return a copy of the dataframe
    """
    ordered_col = df.columns.tolist()

    for key, value in columns.items():

        ordered_col.remove(key)
        ordered_col.insert(value, key)

    if big_data:

        return ordered_col

    return df[ordered_col]

# e.g.
df = pd.DataFrame({'chicken wings': np.random.rand(10, 1).flatten(), 'taco': np.random.rand(10,1).flatten(),
                          'coffee': np.random.rand(10, 1).flatten()})
df['mean'] = df.mean(1)

df = order_column(df, {'mean': 0, 'coffee':1 })

>>>

output

col = order_column(df, {'mean': 0, 'coffee':1 }, True)

col
>>>
['mean', 'coffee', 'chicken wings', 'taco']

# you could grab it by doing this

df = df[col]


回答 28

我有一个非常特殊的用例,需要对 pandas 中的列重新排序。有时我会基于已有的列在数据框中创建一个新列。默认情况下,pandas 会把新列插到最后,但我希望新列紧挨在它所派生自的现有列旁边。

在此处输入图片说明

def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.

    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before 

    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)

    # index of item to move
    idx_move = output_list.index(input_item_to_move)

    # pop off the item to move
    itm_move = output_list.pop(idx_move)

    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)

    # insert item to move into here
    output_list.insert(idx_insert, itm_move)

    return output_list


import pandas as pd

# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()

# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()

# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)

I have a very specific use case for re-ordering column names in pandas. Sometimes I am creating a new column in a dataframe that is based on an existing column. By default pandas will insert my new column at the end, but I want the new column to be inserted next to the existing column it’s derived from.

enter image description here

def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.

    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before 

    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)

    # index of item to move
    idx_move = output_list.index(input_item_to_move)

    # pop off the item to move
    itm_move = output_list.pop(idx_move)

    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)

    # insert item to move into here
    output_list.insert(idx_insert, itm_move)

    return output_list


import pandas as pd

# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()

# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()

# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)

回答 29

一个对我有用的非常简单的解决方案是在df.columns上使用.reindex:

df=df[df.columns.reindex(['mean',0,1,2,3,4])[0]]

A pretty straightforward solution that worked for me is to use .reindex on df.columns:

df=df[df.columns.reindex(['mean',0,1,2,3,4])[0]]

如何使用Python通过HTTP下载文件?

问题:如何使用Python通过HTTP下载文件?

我有一个小的实用程序,可以用来按计划从网站上下载MP3文件,然后构建/更新已添加到iTunes的播客XML文件。

创建/更新 XML 文件的文本处理是用 Python 编写的。但是,实际下载 MP3 文件时,我是在 Windows 的 .bat 文件中使用 wget 完成的。我更希望整个实用程序都用 Python 编写。

我一直没能找到用 Python 实际下载文件的方法,这就是为什么我转而使用 wget。

那么,如何使用Python下载文件?

I have a small utility that I use to download an MP3 file from a website on a schedule and then builds/updates a podcast XML file which I’ve added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.

I struggled to find a way to actually download the file in Python, thus why I resorted to using wget.

So, how do I download the file using Python?


回答 0

在Python 2中,请使用标准库随附的urllib2。

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

这是使用该库最基本的方式,省去了所有错误处理。您还可以做更复杂的事情,例如修改请求头(headers)。文档可以在这里找到。

In Python 2, use urllib2 which comes with the standard library.

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers. The documentation can be found here.
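Since the question is about saving an MP3 rather than reading HTML, here is a minimal Python 2 sketch that writes the response to disk; the URL and output name are placeholders:

import urllib2

response = urllib2.urlopen('http://www.example.com/songs/mp3.mp3')
with open('mp3.mp3', 'wb') as f:
    f.write(response.read())  # loads the whole file into memory before writing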


回答 1

使用urlretrieve

import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(对于 Python 3+,请使用 import urllib.request 和 urllib.request.urlretrieve)

还有一个,带有“进度条”

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

One more, using urlretrieve:

import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(for Python 3+ use import urllib.request and urllib.request.urlretrieve)

Yet another one, with a “progressbar”

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

回答 2

在 2012 年,请使用 python 的 requests 库

>>> import requests
>>> 
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print len(r.content)
10485760

你可以运行 pip install requests 来安装它。

与其他替代方案相比,requests 有许多优势,因为它的 API 简单得多。如果必须进行身份验证,则尤其如此,在这种情况下 urllib 和 urllib2 非常不直观且令人痛苦。


2015-12-30

人们对进度条表示钦佩。肯定很酷。现在有几种现成的解决方案,包括tqdm

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)

这本质上就是 @kvance 在 30 个月前所描述的实现。

In 2012, use the python requests library

>>> import requests
>>> 
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print len(r.content)
10485760

You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.


2015-12-30

People have expressed admiration for the progress bar. It’s cool, sure. There are several off-the-shelf solutions now, including tqdm:

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)

This is essentially the implementation @kvance described 30 months ago.
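For larger files, iterating byte by byte is slow; a hedged variant below streams in bigger chunks and feeds the Content-Length header to tqdm so the bar reflects actual bytes (the chunk size and output name are arbitrary choices):

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)
total = int(response.headers.get("content-length", 0))  # 0 if the server omits it

with open("10MB.zip", "wb") as handle, tqdm(total=total, unit="B", unit_scale=True) as bar:
    for chunk in response.iter_content(chunk_size=8192):
        handle.write(chunk)
        bar.update(len(chunk))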


回答 3

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
  output.write(mp3file.read())

open('test.mp3','wb') 中的 wb 表示以二进制模式打开文件(并清除已存在的同名文件),这样您就可以用它保存二进制数据,而不仅仅是文本。

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
  output.write(mp3file.read())

The wb in open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.


回答 4

Python 3

  • urllib.request.urlopen

    import urllib.request
    response = urllib.request.urlopen('http://www.example.com/')
    html = response.read()
  • urllib.request.urlretrieve

    import urllib.request
    urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

    注意:根据文档,urllib.request.urlretrieve 是“旧式接口(legacy interface)”,“将来可能会被弃用”(感谢 gerrit)

Python 2

  • urllib2.urlopen(感谢 Corey)

    import urllib2
    response = urllib2.urlopen('http://www.example.com/')
    html = response.read()
  • urllib.urlretrieve(感谢 PabloG)

    import urllib
    urllib.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

Python 3

  • urllib.request.urlopen

    import urllib.request
    response = urllib.request.urlopen('http://www.example.com/')
    html = response.read()
    
  • urllib.request.urlretrieve

    import urllib.request
    urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
    

    Note: According to the documentation, urllib.request.urlretrieve is a “legacy interface” and “might become deprecated in the future” (thanks gerrit)

Python 2

  • urllib2.urlopen (thanks Corey)

    import urllib2
    response = urllib2.urlopen('http://www.example.com/')
    html = response.read()
    
  • urllib.urlretrieve (thanks PabloG)

    import urllib
    urllib.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
    

回答 5

使用wget模块:

import wget
wget.download('url')

use wget module:

import wget
wget.download('url')

回答 6

用于Python 2/3的PabloG代码的改进版本:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import ( division, absolute_import, print_function, unicode_literals )

import sys, os, tempfile, logging

if sys.version_info >= (3,):
    import urllib.request as urllib2
    import urllib.parse as urlparse
else:
    import urllib2
    import urlparse

def download_file(url, dest=None):
    """ 
    Download and save a file specified by url to dest directory,
    """
    u = urllib2.urlopen(url)

    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)

    with open(filename, 'wb') as f:
        meta = u.info()
        meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
        meta_length = meta_func("Content-Length")
        file_size = None
        if meta_length:
            file_size = int(meta_length[0])
        print("Downloading: {0} Bytes: {1}".format(url, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)

            status = "{0:16}".format(file_size_dl)
            if file_size:
                status += "   [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
            status += chr(13)
            print(status, end="")
        print()

    return filename

if __name__ == "__main__":  # Only run if this file is called directly
    print("Testing with 10MB download")
    url = "http://download.thinkbroadband.com/10MB.zip"
    filename = download_file(url)
    print(filename)

An improved version of the PabloG code for Python 2/3:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import ( division, absolute_import, print_function, unicode_literals )

import sys, os, tempfile, logging

if sys.version_info >= (3,):
    import urllib.request as urllib2
    import urllib.parse as urlparse
else:
    import urllib2
    import urlparse

def download_file(url, dest=None):
    """ 
    Download and save a file specified by url to dest directory,
    """
    u = urllib2.urlopen(url)

    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)

    with open(filename, 'wb') as f:
        meta = u.info()
        meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
        meta_length = meta_func("Content-Length")
        file_size = None
        if meta_length:
            file_size = int(meta_length[0])
        print("Downloading: {0} Bytes: {1}".format(url, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)

            status = "{0:16}".format(file_size_dl)
            if file_size:
                status += "   [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
            status += chr(13)
            print(status, end="")
        print()

    return filename

if __name__ == "__main__":  # Only run if this file is called directly
    print("Testing with 10MB download")
    url = "http://download.thinkbroadband.com/10MB.zip"
    filename = download_file(url)
    print(filename)

回答 7

six 库提供了一种简单且同时兼容 Python 2 和 Python 3 的方式:

from six.moves import urllib
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

Simple yet Python 2 & Python 3 compatible way comes with six library:

from six.moves import urllib
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

回答 8

import os,requests
def download(url):
    get_response = requests.get(url,stream=True)
    file_name  = url.split("/")[-1]
    with open(file_name, 'wb') as f:
        for chunk in get_response.iter_content(chunk_size=1024):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)


download("https://example.com/example.jpg")
import os,requests
def download(url):
    get_response = requests.get(url,stream=True)
    file_name  = url.split("/")[-1]
    with open(file_name, 'wb') as f:
        for chunk in get_response.iter_content(chunk_size=1024):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)


download("https://example.com/example.jpg")

回答 9

wget 库正是为此目的而用纯 Python 编写的。从 2.0 版开始,它就是在 urlretrieve 基础上增强了这些功能的版本。

Wrote wget library in pure Python just for this purpose. It is pumped up urlretrieve with these features as of version 2.0.


回答 10

以下是在python中下载文件的最常用调用:

  1. urllib.urlretrieve ('url_to_file', file_name)

  2. urllib2.urlopen('url_to_file')

  3. requests.get(url)

  4. wget.download('url', file_name)

注意:经测试,urlopen 和 urlretrieve 在下载大文件(大小 > 500 MB)时表现相对较差。requests.get 会把文件保存在内存中,直到下载完成。

Following are the most commonly used calls for downloading files in python:

  1. urllib.urlretrieve ('url_to_file', file_name)

  2. urllib2.urlopen('url_to_file')

  3. requests.get(url)

  4. wget.download('url', file_name)

Note: urlopen and urlretrieve are found to perform relatively bad with downloading large files (size > 500 MB). requests.get stores the file in-memory until download is complete.
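If the in-memory behaviour of requests.get is a concern, streaming the response is a possible workaround; the sketch below uses placeholder names for the URL and output file:

import requests

url = "http://example.com/big_file.zip"
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open("big_file.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)  # only one chunk is held in memory at a time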


回答 11

我同意 Corey 的观点:urllib2 比 urllib 更完整,如果您想做更复杂的事情,它很可能是应该使用的模块;但为了让答案更完整,如果只需要基础功能,urllib 是一个更简单的模块:

import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()

将正常工作。或者,如果您不想处理“响应”对象,则可以直接调用read()

import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()

I agree with Corey, urllib2 is more complete than urllib and should likely be the module used if you want to do more complex things, but to make the answers more complete, urllib is a simpler module if you want just the basics:

import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()

Will work fine. Or, if you don’t want to deal with the “response” object you can call read() directly:

import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()

回答 12

在 python3 中,您可以使用 urllib3 和 shutil 库。用 pip 或 pip3 下载它们(取决于 python3 是否为默认版本)

pip3 install urllib3 shutil

然后运行这段代码

import urllib.request
import shutil

url = "http://www.somewebsite.com/something.pdf"
output_file = "save_this_name.pdf"
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

请注意,您下载的是 urllib3,但代码中使用的是 urllib

In python3 you can use urllib3 and shutil libraries. Download them by using pip or pip3 (depending on whether python3 is the default or not)

pip3 install urllib3 shutil

Then run this code

import urllib.request
import shutil

url = "http://www.somewebsite.com/something.pdf"
output_file = "save_this_name.pdf"
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

Note that you download urllib3 but use urllib in code


回答 13

您也可以使用urlretrieve获得进度反馈:

def report(blocknr, blocksize, size):
    current = blocknr*blocksize
    sys.stdout.write("\r{0:.2f}%".format(100.0*current/size))

def downloadFile(url):
    print "\n",url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)

You can get the progress feedback with urlretrieve as well:

def report(blocknr, blocksize, size):
    current = blocknr*blocksize
    sys.stdout.write("\r{0:.2f}%".format(100.0*current/size))

def downloadFile(url):
    print "\n",url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)

回答 14

如果已安装 wget,则可以使用 parallel_sync。

pip install parallel_sync

from parallel_sync import wget
urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)

Doc: https://pythonhosted.org/parallel_sync/pages/examples.html

它非常强大:可以并行下载文件、在失败时重试,甚至可以在远程计算机上下载文件。

If you have wget installed, you can use parallel_sync.

pip install parallel_sync

from parallel_sync import wget
urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)

Doc: https://pythonhosted.org/parallel_sync/pages/examples.html

This is pretty powerful. It can download files in parallel, retry upon failure , and it can even download files on a remote machine.


回答 15

如果速度对您很重要,我对 urllib 和 wget 这两个模块做了一个小型性能测试;对于 wget,我分别测试了带状态栏和不带状态栏两种情况。我用三个不同的 500MB 文件进行测试(使用不同的文件是为了排除底层缓存的影响)。测试在一台装有 python2 的 debian 机器上进行。

首先,这些是结果(在不同的运行中它们是相似的):

$ python wget_test.py 
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============

我执行测试的方式是使用一个“profile”装饰器。这是完整的代码:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)

urllib 似乎是最快的

If speed matters to you, I made a small performance test for the modules urllib and wget, and regarding wget I tried once with status bar and once without. I took three different 500MB files to test with (different files- to eliminate the chance that there is some caching going on under the hood). Tested on debian machine, with python2.

First, these are the results (they are similar in different runs):

$ python wget_test.py 
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============

The way I performed the test is using “profile” decorator. This is the full code:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)

urllib seems to be the fastest


回答 16

仅出于完整性考虑,也可以用 subprocess 包调用任何外部程序来获取文件。专门用于下载文件的程序比 urlretrieve 之类的 Python 函数更强大。例如,wget 可以递归下载目录(-R)、可以处理 FTP、重定向、HTTP 代理、可以避免重新下载已存在的文件(-nc),而 aria2 可以进行多连接下载,从而有可能加快下载速度。

import subprocess
subprocess.check_output(['wget', '-O', 'example_output_file.html', 'https://example.com'])

在 Jupyter Notebook 中,还可以使用 ! 语法直接调用程序:

!wget -O example_output_file.html https://example.com

Just for the sake of completeness, it is also possible to call any program for retrieving files using the subprocess package. Programs dedicated to retrieving files are more powerful than Python functions like urlretrieve. For example, wget can download directories recursively (-R), can deal with FTP, redirects, HTTP proxies, can avoid re-downloading existing files (-nc), and aria2 can do multi-connection downloads which can potentially speed up your downloads.

import subprocess
subprocess.check_output(['wget', '-O', 'example_output_file.html', 'https://example.com'])

In Jupyter Notebook, one can also call programs directly with the ! syntax:

!wget -O example_output_file.html https://example.com
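As a sketch of the multi-connection case mentioned above, aria2 can be driven the same way, assuming the aria2c binary is on PATH (-x sets the number of connections per server, -o the output name; the URL is a placeholder):

import subprocess

subprocess.check_output(
    ['aria2c', '-x', '8', '-o', 'example_output_file.zip', 'https://example.com/file.zip']
)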

回答 17

源代码可以是:

import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()                            
sock.close()                                        
print htmlSource  

Source code can be:

import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()                            
sock.close()                                        
print htmlSource  

回答 18

您可以在Python 2和3上使用PycURL

import pycurl

FILE_DEST = 'pycurl.html'
FILE_SRC = 'http://pycurl.io/'

with open(FILE_DEST, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, FILE_SRC)
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()

You can use PycURL on Python 2 and 3.

import pycurl

FILE_DEST = 'pycurl.html'
FILE_SRC = 'http://pycurl.io/'

with open(FILE_DEST, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, FILE_SRC)
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()

回答 19

我编写了下面这段代码,它在原生(vanilla)的 Python 2 或 Python 3 中都可以运行。


import sys
try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False


def progress_callback_simple(downloaded,total):
    sys.stdout.write(
        "\r" +
        (len(str(total))-len(str(downloaded)))*" " + str(downloaded) + "/%d"%total +
        " [%3.2f%%]"%(100.0*float(downloaded)/float(total))
    )
    sys.stdout.flush()

def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
    def _download_helper(response, out_file, file_size):
        if progress_callback!=None: progress_callback(0,file_size)
        if block_size == None:
            buffer = response.read()
            out_file.write(buffer)

            if progress_callback!=None: progress_callback(file_size,file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer: break

                file_size_dl += len(buffer)
                out_file.write(buffer)

                if progress_callback!=None: progress_callback(file_size_dl,file_size)
    with open(dstfilepath,"wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response,out_file,file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response,out_file,file_size)

import traceback
try:
    download(
        "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()
    input()

笔记:

  • 支持“进度条”回调。
  • 从我的网站下载了一个4 MB的测试.zip文件。

I wrote the following, which works in vanilla Python 2 or Python 3.


import sys
try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False


def progress_callback_simple(downloaded,total):
    sys.stdout.write(
        "\r" +
        (len(str(total))-len(str(downloaded)))*" " + str(downloaded) + "/%d"%total +
        " [%3.2f%%]"%(100.0*float(downloaded)/float(total))
    )
    sys.stdout.flush()

def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
    def _download_helper(response, out_file, file_size):
        if progress_callback!=None: progress_callback(0,file_size)
        if block_size == None:
            buffer = response.read()
            out_file.write(buffer)

            if progress_callback!=None: progress_callback(file_size,file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer: break

                file_size_dl += len(buffer)
                out_file.write(buffer)

                if progress_callback!=None: progress_callback(file_size_dl,file_size)
    with open(dstfilepath,"wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response,out_file,file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response,out_file,file_size)

import traceback
try:
    download(
        "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()
    input()

Notes:

  • Supports a “progress bar” callback.
  • Download is a 4 MB test .zip from my website.

回答 20

这可能有点晚了,但是我看到了 pabloG 的代码,忍不住加了一个 os.system('cls') 让它看起来更棒!来看看吧:

    import urllib2,os

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    os.system('cls')
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,

    f.close()

如果在 Windows 以外的环境中运行,则必须使用 “cls” 以外的命令。在 Mac OS X 和 Linux 中,应该使用 “clear”。

This may be a little late, But I saw pabloG’s code and couldn’t help adding a os.system(‘cls’) to make it look AWESOME! Check it out :

    import urllib2,os

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    os.system('cls')
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,

    f.close()

If running in an environment other than Windows, you will have to use something other then ‘cls’. In MAC OS X and Linux it should be ‘clear’.


回答 21

urlretrieve 和 requests.get 用起来很简单,但实际情况并非总是如此。我抓取过几个站点的数据,包括文本和图像,上面这两个工具大概能解决大多数任务。但如果想要一个更通用的解决方案,我建议使用 urlopen。由于它包含在 Python 3 标准库中,您的代码可以在任何安装了 Python 3 的机器上运行,而无需预先安装第三方包。

import urllib.request
url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)

#remember to open file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer: break

        #an integer value of size of written data
        data_wrote = f.write(buffer)

#you could probably use with-open-as manner
url_connect.close()

这个回答也为使用 Python 通过 http 下载文件时遇到的 HTTP 403 Forbidden 提供了解决思路。我只尝试过 requests 和 urllib 模块,其他模块也许提供了更好的功能,但这是我用来解决大多数问题的方法。

urlretrieve and requests.get are simple, however the reality not. I have fetched data for couple sites, including text and images, the above two probably solve most of the tasks. but for a more universal solution I suggest the use of urlopen. As it is included in Python 3 standard library, your code could run on any machine that run Python 3 without pre-installing site-package

import urllib.request
url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)

#remember to open file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer: break

        #an integer value of size of written data
        data_wrote = f.write(buffer)

#you could probably use with-open-as manner
url_connect.close()

This answer provides a solution to HTTP 403 Forbidden when downloading file over http using Python. I have tried only requests and urllib modules, the other module may provide something better, but this is the one I used to solve most of the problems.
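As a hedged illustration of the headers mentioned above (the usual fix for HTTP 403 is a browser-like User-Agent), the undefined names in the snippet could be filled in like this; all values are placeholders:

import urllib.request

url = "http://www.example.com/file.zip"
filename = "file.zip"
buffer_size = 8192
headers = {"User-Agent": "Mozilla/5.0"}  # many servers reject the default Python user agent

url_request = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(url_request) as url_connect, open(filename, "wb") as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer:
            break
        f.write(buffer)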


回答 22

回答得有点晚,但对于 python>=3.6,您可以使用:

import dload
dload.save(url)

安装 dload 的方式:

pip3 install dload

Late answer, but for python>=3.6 you can use:

import dload
dload.save(url)

Install dload with:

pip3 install dload

回答 23

我想把一个网页上的所有文件都下载下来。我尝试了 wget,但是失败了,于是我决定改用 Python,然后找到了这个帖子。

读完之后,我做了一个小的命令行应用程序 soupget,它扩展了 PabloG 和 Stan 的出色答案,并添加了一些有用的选项。

它使用 BeautifulSoup 收集页面上的所有 URL,然后下载具有所需扩展名的那些。最后,它还可以并行下载多个文件。

这里是:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function, unicode_literals)
import sys, os, argparse
from bs4 import BeautifulSoup

# --- insert Stan's script here ---
# if sys.version_info >= (3,): 
#...
#...
# def download_file(url, dest=None): 
#...
#...

# --- new stuff ---
def collect_all_url(page_url, extensions):
    """
    Recovers all links in page_url checking for all the desired extensions
    """
    conn = urllib2.urlopen(page_url)
    html = conn.read()
    soup = BeautifulSoup(html, 'lxml')
    links = soup.find_all('a')

    results = []    
    for tag in links:
        link = tag.get('href', None)
        if link is not None: 
            for e in extensions:
                if e in link:
                    # Fallback for badly defined links
                    # checks for missing scheme or netloc
                    if bool(urlparse.urlparse(link).scheme) and bool(urlparse.urlparse(link).netloc):
                        results.append(link)
                    else:
                        new_url=urlparse.urljoin(page_url,link)                        
                        results.append(new_url)
    return results

if __name__ == "__main__":  # Only run if this file is called directly
    # Command line arguments
    parser = argparse.ArgumentParser(
        description='Download all files from a webpage.')
    parser.add_argument(
        '-u', '--url', 
        help='Page url to request')
    parser.add_argument(
        '-e', '--ext', 
        nargs='+',
        help='Extension(s) to find')    
    parser.add_argument(
        '-d', '--dest', 
        default=None,
        help='Destination where to save the files')
    parser.add_argument(
        '-p', '--par', 
        action='store_true', default=False, 
        help="Turns on parallel download")
    args = parser.parse_args()

    # Recover files to download
    all_links = collect_all_url(args.url, args.ext)

    # Download
    if not args.par:
        for l in all_links:
            try:
                filename = download_file(l, args.dest)
                print(l)
            except Exception as e:
                print("Error while downloading: {}".format(e))
    else:
        from multiprocessing.pool import ThreadPool
        results = ThreadPool(10).imap_unordered(
            lambda x: download_file(x, args.dest), all_links)
        for p in results:
            print(p)

其用法的一个示例是:

python3 soupget.py -p -e <list of extensions> -d <destination_folder> -u <target_webpage>

还有一个实际示例,如果您想看到它的实际效果:

python3 soupget.py -p -e .xlsx .pdf .csv -u https://healthdata.gov/dataset/chemicals-cosmetics

I wanted do download all the files from a webpage. I tried wget but it was failing so I decided for the Python route and I found this thread.

After reading it, I have made a little command line application, soupget, expanding on the excellent answers of PabloG and Stan and adding some useful options.

It uses BeautifulSoup to collect all the URLs of the page and then download the ones with the desired extension(s). Finally it can download multiple files in parallel.

Here it is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function, unicode_literals)
import sys, os, argparse
from bs4 import BeautifulSoup

# --- insert Stan's script here ---
# if sys.version_info >= (3,): 
#...
#...
# def download_file(url, dest=None): 
#...
#...

# --- new stuff ---
def collect_all_url(page_url, extensions):
    """
    Recovers all links in page_url checking for all the desired extensions
    """
    conn = urllib2.urlopen(page_url)
    html = conn.read()
    soup = BeautifulSoup(html, 'lxml')
    links = soup.find_all('a')

    results = []    
    for tag in links:
        link = tag.get('href', None)
        if link is not None: 
            for e in extensions:
                if e in link:
                    # Fallback for badly defined links
                    # checks for missing scheme or netloc
                    if bool(urlparse.urlparse(link).scheme) and bool(urlparse.urlparse(link).netloc):
                        results.append(link)
                    else:
                        new_url=urlparse.urljoin(page_url,link)                        
                        results.append(new_url)
    return results

if __name__ == "__main__":  # Only run if this file is called directly
    # Command line arguments
    parser = argparse.ArgumentParser(
        description='Download all files from a webpage.')
    parser.add_argument(
        '-u', '--url', 
        help='Page url to request')
    parser.add_argument(
        '-e', '--ext', 
        nargs='+',
        help='Extension(s) to find')    
    parser.add_argument(
        '-d', '--dest', 
        default=None,
        help='Destination where to save the files')
    parser.add_argument(
        '-p', '--par', 
        action='store_true', default=False, 
        help="Turns on parallel download")
    args = parser.parse_args()

    # Recover files to download
    all_links = collect_all_url(args.url, args.ext)

    # Download
    if not args.par:
        for l in all_links:
            try:
                filename = download_file(l, args.dest)
                print(l)
            except Exception as e:
                print("Error while downloading: {}".format(e))
    else:
        from multiprocessing.pool import ThreadPool
        results = ThreadPool(10).imap_unordered(
            lambda x: download_file(x, args.dest), all_links)
        for p in results:
            print(p)

An example of its usage is:

python3 soupget.py -p -e <list of extensions> -d <destination_folder> -u <target_webpage>

And an actual example if you want to see it in action:

python3 soupget.py -p -e .xlsx .pdf .csv -u https://healthdata.gov/dataset/chemicals-cosmetics

二维阵列中的峰检测

问题:二维阵列中的峰检测

我正在帮助一家兽医诊所测量狗爪下的压力。我用 Python 做数据分析,现在卡在了把爪子划分为(解剖学上的)子区域这一步。

我为每个爪子制作了一个 2D 数组,其中包含爪子随时间推移在每个传感器上加载的最大值。这是一个爪子的示例,我用 Excel 画出了我想要“检测”的区域:它们是围绕具有局部最大值的传感器的 2×2 方框,这些方框加在一起的总和最大。

替代文字

因此,我尝试了一些实验,并决定只查找每一列和每一行的最大值(由于爪子的形状而不能朝一个方向看)。这似乎可以很好地“检测”到各个脚趾的位置,但是它也标记了相邻的传感器。

替代文字

那么,告诉Python我想要这些最大值中的哪一个是最好的方法呢?

注意:2×2的正方形不能重叠,因为它们必须是单独的脚趾!

同样我以2×2为方便,欢迎使用任何更高级的解决方案,但我只是人类运动的科学家,所以我既不是真正的程序员也不是数学家,所以请保持“简单”。

这是一个可以用 np.loadtxt 加载的版本。


结果

因此,我尝试了@jextee的解决方案(请参见下面的结果)。如您所见,它在前爪上非常有效,但在后腿上效果较差。

更具体地说,它无法识别出作为第四个脚趾的那个小峰。这显然是因为该循环自上而下朝着最低值查找,而没有考虑峰所在的位置。

谁会知道如何调整@jextee的算法,以便它也能够找到第四个脚趾?

替代文字

由于我尚未处理其他任何试验,因此无法提供其他任何样品。但是我之前提供的数据是每只爪子的平均值。该文件是一个数组,其中最大9爪的数据按它们与板接触的顺序排列。

此图显示了它们如何在空间上分布在板上。

替代文字

更新:

我已经为有兴趣的任何人建立了博客,为SkyDrive设置了所有原始测量值。因此,对于任何需要更多数据的人:给您更多的权力!


新更新:

因此,在获得帮助后,我遇到了有关爪子检测爪子分类的问题,我终于能够检查每个爪子的脚趾检测!事实证明,除了爪子大小像我自己的示例中的爪子一样,它在任何情况下都无法正常工作。事后看来,如此随意地选择2×2是我自己的错。

这是一个出问题的好例子:指甲被识别为脚趾,而“脚跟”是如此之宽,被识别两次!

替代文字

脚掌太大,因此尺寸为2×2,没有重叠,会导致两次检测到一些脚趾。相反,在小型犬中,它通常找不到第5个脚趾,我怀疑这是2×2区域太大造成的。

尝试所有测量的当前解决方案得出了一个惊人的结论:几乎对我所有的小型犬来说,它都找不到第五个脚趾,而在大型犬的50%以上的撞击中,它会发现更多!

所以很明显我需要修改它。我自己的想法是把 neighborhood 的尺寸改成:小型犬用较小的尺寸,大型犬用较大的尺寸。但是 generate_binary_structure 不允许我改变数组的大小。

因此,我希望其他人对脚趾的定位有更好的建议,也许脚趾的面积与爪子的大小成正比?

I’m helping a veterinary clinic measure pressure under a dog’s paw. I use Python for my data analysis and now I’m stuck trying to divide the paws into (anatomical) subregions.

I made a 2D array of each paw, that consists of the maximal values for each sensor that has been loaded by the paw over time. Here’s an example of one paw, where I used Excel to draw the areas I want to ‘detect’. These are 2 by 2 boxes around the sensors with local maxima, that together have the largest sum.

alt text

So I tried some experimenting and decided to simply look for the maximums of each column and row (I can’t look in one direction due to the shape of the paw). This seems to ‘detect’ the location of the separate toes fairly well, but it also marks neighboring sensors.

alt text

So what would be the best way to tell Python which of these maximums are the ones I want?

Note: The 2×2 squares can’t overlap, since they have to be separate toes!

Also I took 2×2 as a convenience, any more advanced solution is welcome, but I’m simply a human movement scientist, so I’m neither a real programmer or a mathematician, so please keep it ‘simple’.

Here’s a version that can be loaded with np.loadtxt


Results

So I tried @jextee’s solution (see the results below). As you can see, it works very well on the front paws, but it works less well for the hind legs.

More specifically, it can’t recognize the small peak that’s the fourth toe. This is obviously inherent to the fact that the loop looks top down towards the lowest value, without taking into account where this is.

Would anyone know how to tweak @jextee’s algorithm, so that it might be able to find the 4th toe too?

alt text

Since I haven’t processed any other trials yet, I can’t supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate.

This image shows how they were spatially spread out over the plate.

alt text

Update:

I have set up a blog for anyone interested and I have setup a SkyDrive with all the raw measurements. So to anyone requesting more data: more power to you!


New update:

So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn’t work so well in anything but paws sized like the one in my own example. Of course, in hindsight, it’s my own fault for choosing the 2×2 so arbitrarily.

Here’s a nice example of where it goes wrong: a nail is being recognized as a toe and the ‘heel’ is so wide, it gets recognized twice!

alt text

The paw is too large, so taking a 2×2 size with no overlap, causes some toes to be detected twice. The other way around, in small dogs it often fails to find a 5th toe, which I suspect is being caused by the 2×2 area being too large.

After trying the current solution on all my measurements I came to the staggering conclusion that for nearly all my small dogs it didn’t find a 5th toe and that in over 50% of the impacts for the large dogs it would find more!

So clearly I need to change it. My own guess was changing the size of the neighborhood to something smaller for small dogs and larger for large dogs. But generate_binary_structure wouldn’t let me change the size of the array.

Therefore, I’m hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?


回答 0

我使用局部最大滤波器检测到峰值。这是第一个4个爪子的数据集的结果: 峰检测结果

我还在9个爪子的第二个数据集上运行了它,效果也很好

这是您的操作方式:

import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp

#for some reason I had to reshape. Numpy ignored the shape header.
paws_data = np.loadtxt("paws.txt").reshape(4,11,14)

#getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data,4)]


def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise)
    """

    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2,2)

    #apply the local maximum filter; all pixel of maximal value 
    #in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood)==image
    #local_max is a mask that contains the peaks we are 
    #looking for, but also the background.
    #In order to isolate the peaks we must remove the background from the mask.

    #we create the mask of the background
    background = (image==0)

    #a little technicality: we must erode the background in order to 
    #successfully subtract it from local_max, otherwise a line will
    #appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)

    #we obtain the final mask, containing only peaks, 
    #by removing the background from the local_max mask (xor operation)
    detected_peaks = local_max ^ eroded_background

    return detected_peaks


#applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4,2,(2*i+1))
    pp.imshow(paw)
    pp.subplot(4,2,(2*i+2) )
    pp.imshow(detected_peaks)

pp.show()

之后,您要做的就是scipy.ndimage.measurements.label在蒙版上使用以标记所有不同的对象。这样您就可以分别与他们一起玩了。

请注意，该方法之所以效果很好，是因为背景没有噪声。如果背景有噪声，您将在背景中检测到许多其他不需要的峰。另一个重要因素是邻域的大小。如果峰的大小发生变化，则需要相应调整邻域大小（两者应保持大致成比例）。

I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws: Peaks detection result

I also ran it on the second dataset of 9 paws and it worked as well.

Here is how you do it:

import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp

#for some reason I had to reshape. Numpy ignored the shape header.
paws_data = np.loadtxt("paws.txt").reshape(4,11,14)

#getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data,4)]


def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise)
    """

    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2,2)

    #apply the local maximum filter; all pixel of maximal value 
    #in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood)==image
    #local_max is a mask that contains the peaks we are 
    #looking for, but also the background.
    #In order to isolate the peaks we must remove the background from the mask.

    #we create the mask of the background
    background = (image==0)

    #a little technicality: we must erode the background in order to 
    #successfully subtract it from local_max, otherwise a line will
    #appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)

    #we obtain the final mask, containing only peaks, 
    #by removing the background from the local_max mask (xor operation)
    detected_peaks = local_max ^ eroded_background

    return detected_peaks


#applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4,2,(2*i+1))
    pp.imshow(paw)
    pp.subplot(4,2,(2*i+2) )
    pp.imshow(detected_peaks)

pp.show()

All you need to do after is use scipy.ndimage.measurements.label on the mask to label all distinct objects. Then you’ll be able to play with them individually.

Note that the method works well because the background is not noisy. If it were, you would detect a bunch of other unwanted peaks in the background. Another important factor is the size of the neighborhood. You will need to adjust it if the peak size changes (the two should remain roughly proportional).
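As a small follow-up sketch (not part of the original answer): assuming detect_peaks and paws from the code above, the labeling step could look like this, with scipy.ndimage.label giving each toe its own id and center_of_mass returning a pressure-weighted centre per toe.

from scipy import ndimage

for paw in paws:
    detected_peaks = detect_peaks(paw)
    # give every connected blob of True pixels its own integer label
    labeled, num_peaks = ndimage.label(detected_peaks)
    # pressure-weighted centroid (row, col) of each labeled blob
    centers = ndimage.center_of_mass(paw, labeled, list(range(1, num_peaks + 1)))
    print(num_peaks, centers)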


回答 1

数据文件:paw.txt。源代码:

from scipy import *
from operator import itemgetter

n = 5  # how many fingers are we looking for

d = loadtxt("paw.txt")
width, height = d.shape

# Create an array where every element is a sum of 2x2 squares.

fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:]

# Find positions of the fingers.

# Pair each sum with its position number (from 0 to width*height-1),

pairs = zip(arange(width*height), fourSums.flatten())

# Sort by descending sum value, filter overlapping squares

def drop_overlapping(pairs):
    no_overlaps = []
    def does_not_overlap(p1, p2):
        i1, i2 = p1[0], p2[0]
        r1, col1 = i1 / (width-1), i1 % (width-1)
        r2, col2 = i2 / (width-1), i2 % (width-1)
        return (max(abs(r1-r2),abs(col1-col2)) >= 2)
    for p in pairs:
        if all(map(lambda prev: does_not_overlap(p,prev), no_overlaps)):
            no_overlaps.append(p)
    return no_overlaps

pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True))

# Take the first n with the highest values

positions = pairs2[:n]

# Print results

print d, "\n"

for i, val in positions:
    row = i / (width-1)
    column = i % (width-1)
    print "sum = %f @ %d,%d (%d)" % (val, row, column, i)
    print d[row:row+2,column:column+2], "\n"

输出没有重叠的正方形。似乎选择了与您的示例相同的区域。

一些评论

棘手的部分是计算所有2×2正方形的和。我假设您需要所有这些正方形，所以它们之间可能有重叠。我使用切片从原始2D数组中去掉首/尾的行和列，然后把这些切片叠在一起并逐元素求和。

为了更好地理解它，请想象一个3×3数组：

>>> a = arange(9).reshape(3,3) ; a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

然后,您可以对其进行切片:

>>> a[:-1,:-1]
array([[0, 1],
       [3, 4]])
>>> a[1:,:-1]
array([[3, 4],
       [6, 7]])
>>> a[:-1,1:]
array([[1, 2],
       [4, 5]])
>>> a[1:,1:]
array([[4, 5],
       [7, 8]])

现在,假设您将它们一个堆叠在另一个之上,然后将元素求和在相同位置。这些总和将与2×2正方形上的总和完全相同,并且左上角在相同位置:

>>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums
array([[ 8, 12],
       [20, 24]])

当您得到了所有2×2正方形的总和后，可以使用max找到最大值，或者用sort或sorted来寻找峰值。

为了记住峰的位置,我将每个值(总和)与其在平坦阵列中的序数位置相结合(请参见 zip)。然后,当我打印结果时,我再次计算行/列位置。

笔记

我允许2×2正方形重叠。编辑后的版本会过滤掉其中的一些,以便结果中仅显示不重叠的正方形。

选择手指(一个想法)

另一个问题是如何从所有峰中选择可能是手指的东西。我有一个想法可能会或可能不会。我现在没有时间实现它,所以只有伪代码。

我注意到,如果前手指几乎保持在一个完美的圆上,则后手指应该在该圆的内部。同样,前指或多或少地等距分布。我们可能会尝试使用这些启发式属性来检测手指。

伪代码:

select the top N finger candidates (not too many, 10 or 12)
consider all possible combinations of 5 out of N (use itertools.combinations)
for each combination of 5 fingers:
    for each finger out of 5:
        fit the best circle to the remaining 4
        => position of the center, radius
        check if the selected finger is inside of the circle
        check if the remaining four are evenly spread
        (for example, consider angles from the center of the circle)
        assign some cost (penalty) to this selection of 4 peaks + a rear finger
        (consider, probably weighted:
             circle fitting error,
             if the rear finger is inside,
             variance in the spreading of the front fingers,
             total intensity of 5 peaks)
choose a combination of 4 peaks + a rear peak with the lowest penalty

这是一种蛮力的方法。如果N相对较小,那么我认为它是可行的。对于N = 12,有C_12 ^ 5 = 792个组合,乘以5种选择后指的方式,因此每只爪子需要评估3960种情况。

Solution

Data file: paw.txt. Source code:

from scipy import *
from operator import itemgetter

n = 5  # how many fingers are we looking for

d = loadtxt("paw.txt")
width, height = d.shape

# Create an array where every element is a sum of 2x2 squares.

fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:]

# Find positions of the fingers.

# Pair each sum with its position number (from 0 to width*height-1),

pairs = zip(arange(width*height), fourSums.flatten())

# Sort by descending sum value, filter overlapping squares

def drop_overlapping(pairs):
    no_overlaps = []
    def does_not_overlap(p1, p2):
        i1, i2 = p1[0], p2[0]
        r1, col1 = i1 / (width-1), i1 % (width-1)
        r2, col2 = i2 / (width-1), i2 % (width-1)
        return (max(abs(r1-r2),abs(col1-col2)) >= 2)
    for p in pairs:
        if all(map(lambda prev: does_not_overlap(p,prev), no_overlaps)):
            no_overlaps.append(p)
    return no_overlaps

pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True))

# Take the first n with the highest values

positions = pairs2[:n]

# Print results

print d, "\n"

for i, val in positions:
    row = i / (width-1)
    column = i % (width-1)
    print "sum = %f @ %d,%d (%d)" % (val, row, column, i)
    print d[row:row+2,column:column+2], "\n"

Output without overlapping squares. It seems that the same areas are selected as in your example.

Some comments

The tricky part is to calculate the sums of all 2×2 squares. I assumed you need all of them, so there might be some overlapping. I used slices to cut the first/last columns and rows from the original 2D array, then overlaid them all and calculated the sums element-wise.

To understand it better, imagine a 3×3 array:

>>> a = arange(9).reshape(3,3) ; a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Then you can take its slices:

>>> a[:-1,:-1]
array([[0, 1],
       [3, 4]])
>>> a[1:,:-1]
array([[3, 4],
       [6, 7]])
>>> a[:-1,1:]
array([[1, 2],
       [4, 5]])
>>> a[1:,1:]
array([[4, 5],
       [7, 8]])

Now imagine you stack them one above the other and sum elements at the same positions. These sums will be exactly the same sums over the 2×2 squares with the top-left corner in the same position:

>>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums
array([[ 8, 12],
       [20, 24]])

When you have the sums over 2×2 squares, you can use max to find the maximum, or sort, or sorted to find the peaks.

To remember positions of the peaks I couple every value (the sum) with its ordinal position in a flattened array (see zip). Then I calculate row/column position again when I print the results.

Notes

I allowed for the 2×2 squares to overlap. Edited version filters out some of them such that only non-overlapping squares appear in the results.

Choosing fingers (an idea)

Another problem is how to choose what is likely to be fingers out of all the peaks. I have an idea which may or may not work. I don’t have time to implement it right now, so just pseudo-code.

I noticed that if the front fingers stay on almost a perfect circle, the rear finger should be inside of that circle. Also, the front fingers are more or less equally spaced. We may try to use these heuristic properties to detect the fingers.

Pseudo code:

select the top N finger candidates (not too many, 10 or 12)
consider all possible combinations of 5 out of N (use itertools.combinations)
for each combination of 5 fingers:
    for each finger out of 5:
        fit the best circle to the remaining 4
        => position of the center, radius
        check if the selected finger is inside of the circle
        check if the remaining four are evenly spread
        (for example, consider angles from the center of the circle)
        assign some cost (penalty) to this selection of 4 peaks + a rear finger
        (consider, probably weighted:
             circle fitting error,
             if the rear finger is inside,
             variance in the spreading of the front fingers,
             total intensity of 5 peaks)
choose a combination of 4 peaks + a rear peak with the lowest penalty

This is a brute-force approach. If N is relatively small, then I think it is doable. For N=12, there are C_12^5 = 792 combinations, times 5 ways to select a rear finger, so 3960 cases to evaluate for every paw.
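A hedged sketch of how that pseudo-code might be fleshed out (not the answer author's code; the 0.01 intensity weight is an arbitrary placeholder and candidates is assumed to be a list of (row, col, intensity) peak candidates):

import itertools
import numpy as np

def fit_circle(points):
    # least-squares (Kasa) circle fit: returns center (cx, cy) and radius r
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([2 * pts[:, 0], 2 * pts[:, 1], np.ones(len(pts))])
    b = (pts ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return (cx, cy), np.sqrt(c + cx ** 2 + cy ** 2)

def pick_five(candidates):
    best, best_cost = None, np.inf
    for combo in itertools.combinations(candidates, 5):
        for rear_idx in range(5):
            front = [p for k, p in enumerate(combo) if k != rear_idx]
            rear = combo[rear_idx]
            (cx, cy), r = fit_circle([(p[0], p[1]) for p in front])
            # mean circle-fitting error of the four front toes
            fit_err = np.mean([abs(np.hypot(p[0] - cx, p[1] - cy) - r) for p in front])
            # penalty if the rear toe lies outside the fitted circle
            outside = max(0.0, np.hypot(rear[0] - cx, rear[1] - cy) - r)
            intensity = sum(p[2] for p in combo)
            cost = fit_err + outside - 0.01 * intensity
            if cost < best_cost:
                best, best_cost = (combo, rear_idx), cost
    return best  # (chosen 5 peaks, index of the rear toe)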


回答 2

这是图像配准问题。总体策略是:

  • 有一个已知的例子,或某种先验的数据。
  • 使数据适合示例,或使示例适合数据。
  • 如果您的数据首先是大致对齐的,则将有帮助。

这是一种粗略而现成的方法,“可能会起作用的最愚蠢的事情”:

  • 从五个脚趾坐标开始,大致在您期望的位置。
  • 与每个人一起,反复攀登到山顶。即给定当前位置,如果其值大于当前像素,则移动到最大相邻像素。当脚趾坐标停止移动时停止。

要解决方向问题,您可以为基本方向(北,东北等)设置8个左右的初始设置。单独运行每个,并丢弃两个或多个脚趾最终位于同一像素的任何结果。我会再考虑一下,但是在图像处理中仍在研究这种东西-没有正确的答案!

稍微复杂一点的想法:(加权)K均值聚类。没那么糟糕。

  • 从五个脚趾坐标开始,但是现在这些是“集群中心”。

然后迭代直到收敛:

  • 将每个像素分配给最近的群集(只需为每个群集列出一个列表)。
  • 计算每个群集的质心。对于每个群集，这是：总和(坐标*强度值)/总和(强度值)
  • 将每个群集移动到新的质心。

这种方法几乎可以肯定会带来更好的结果，而且您会得到每个簇的质量，这可能有助于识别脚趾。

（同样，您需要预先指定簇数。使用聚类时，必须以某种方式指定密度：要么选择簇数（在这种情况下很合适），要么选择一个簇半径，然后看看最终会得到多少个簇。后者的一个例子是均值漂移mean-shift。）

抱歉,缺少实施细节或其他细节。我会对此进行编码,但是我有最后期限。如果下周没有其他工作,请告诉我,我会尝试一下。

This is an image registration problem. The general strategy is:

  • Have a known example, or some kind of prior on the data.
  • Fit your data to the example, or fit the example to your data.
  • It helps if your data is roughly aligned in the first place.

Here’s a rough and ready approach, “the dumbest thing that could possibly work”:

  • Start with five toe coordinates in roughly the place you expect.
  • With each one, iteratively climb to the top of the hill. i.e. given current position, move to maximum neighbouring pixel, if its value is greater than current pixel. Stop when your toe coordinates have stopped moving.

To counteract the orientation problem, you could have 8 or so initial settings for the basic directions (North, North East, etc). Run each one individually and throw away any results where two or more toes end up at the same pixel. I’ll think about this some more, but this kind of thing is still being researched in image processing – there are no right answers!

Slightly more complex idea: (weighted) K-means clustering. It’s not that bad.

  • Start with five toe coordinates, but now these are “cluster centres”.

Then iterate until convergence:

  • Assign each pixel to the closest cluster (just make a list for each cluster).
  • Calculate the center of mass of each cluster. For each cluster, this is: Sum(coordinate * intensity value)/Sum(intensity value)
  • Move each cluster to the new centre of mass.

This method will almost certainly give much better results, and you get the mass of each cluster which may help in identifying the toes.

(Again, you’ve specified the number of clusters up front. With clustering you have to specify the density one way or another: Either choose the number of clusters, appropriate in this case, or choose a cluster radius and see how many you end up with. An example of the latter is mean-shift.)

Sorry about the lack of implementation details or other specifics. I would code this up but I’ve got a deadline. If nothing else has worked by next week let me know and I’ll give it a shot.
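Since the deadline got in the way of an implementation, here is a minimal sketch of the first ("dumbest thing that could possibly work") idea, assuming paw is the 2D pressure array; the seed positions below are purely hypothetical:

def climb(image, start):
    # hill-climb from `start` (row, col) to the nearest local maximum of `image`
    r, c = start
    while True:
        window = [(i, j)
                  for i in range(max(r - 1, 0), min(r + 2, image.shape[0]))
                  for j in range(max(c - 1, 0), min(c + 2, image.shape[1]))]
        best = max(window, key=lambda p: image[p])
        if best == (r, c):   # no better neighbour: local maximum reached
            return r, c
        r, c = best

# usage sketch: start from five rough guesses and climb each one
guesses = [(2, 3), (3, 6), (5, 2), (7, 5), (8, 8)]   # hypothetical seeds
toes = {climb(paw, g) for g in guesses}              # duplicate endpoints collapse in the set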


回答 3

使用持久同源性分析您的数据集,我得到以下结果(点击放大):

结果

这是此SO Answer中描述的峰值检测方法的2D版本。上图仅显示了按持久性排序的0维持久性同源类。

我使用scipy.misc.imresize()将原始数据集按比例放大了2倍。但是,请注意,我确实将这四个爪子视为一个数据集。将其分为四个部分将使问题更容易解决。

方法。这背后的想法很简单：考虑一个把每个像素映射到其灰度级的函数，并观察它的函数图。看起来像这样：

3D功能图

现在考虑高度255处的水位,该水位连续下降到较低的水位。在局部最大的岛上弹出(出生)。在马鞍点,两个岛屿合并。我们认为较低的岛屿将合并到较高的岛屿(死亡)。所谓的持久性图(属于零维度同质性类,我们的岛屿)描述了所有岛屿的死亡人数与出生人数的关系:

持续图

那么，一个岛屿的持久性就是其出生水平与死亡水平之间的差异，也就是图中的点到灰色主对角线的垂直距离。该图按持久性递减的顺序标注这些岛屿。

第一张图片显示了岛屿出生的位置。该方法不仅给出局部最大值，而且通过上述持久性来量化它们的“重要性”。然后，可以把持久性过低的岛屿全部过滤掉。但是，在您的示例中，每个岛屿（即每个局部最大值）都是您要寻找的峰值。

Python代码可以在这里找到。

Using persistent homology to analyze your data set I get the following result (click to enlarge):

Result

This is the 2D-version of the peak detection method described in this SO answer. The above figure simply shows 0-dimensional persistent homology classes sorted by persistence.

I did upscale the original dataset by a factor of 2 using scipy.misc.imresize(). However, note that I did consider the four paws as one dataset; splitting it into four would make the problem easier.

Methodology. The idea behind this is quite simple: Consider the function graph of the function that assigns each pixel its level. It looks like this:

3D function graph

Now consider a water level at height 255 that continuously descends to lower levels. At local maxima islands pop up (birth). At saddle points two islands merge; we consider the lower island to be merged into the higher island (death). The so-called persistence diagram (of the 0-th dimensional homology classes, our islands) depicts death- over birth-values of all islands:

Persistence diagram

The persistence of an island is then the difference between the birth- and death-level; the vertical distance of a dot to the grey main diagonal. The figure labels the islands by decreasing persistence.

The very first picture shows the locations of births of the islands. This method not only gives the local maxima but also quantifies their “significance” by the above mentioned persistence. One would then filter out all islands with a too low persistence. However, in your example every island (i.e., every local maximum) is a peak you look for.

Python code can be found here.


回答 4

物理学家已经对该问题进行了深入研究。ROOT中有一个很好的实现。查看TSpectrum类（对您的情况尤其是TSpectrum2）及其文档。

参考文献:

  1. M.Morhac等人:多维重合伽马射线光谱的背景消除方法。物理研究中的核仪器和方法A 401(1997)113-132。
  2. M.Morhac等:有效的一维和二维金反卷积及其在伽马射线光谱分解中的应用。物理研究中的核仪器和方法A 401(1997)385-408。
  3. M.Morhac等人:多维重合伽马射线光谱中的峰鉴定。《研究物理中的核仪器与方法》,A 443(2000),108-125。

…对于那些无权订阅NIM的用户:

This problem has been studied in some depth by physicists. There is a good implementation in ROOT. Look at the TSpectrum classes (especially TSpectrum2 for your case) and the documentation for them.

References:

  1. M.Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132.
  2. M.Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408.
  3. M.Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Research Physics A 443(2000), 108-125.

…and for those who don’t have access to a subscription to NIM:


回答 5

这是一个想法：计算图像的（离散）拉普拉斯算子。我预计它在最大值处会是很大的（负）值，而且比在原始图像中表现得更明显。因此，最大值可能更容易找到。

这是另一个想法:如果您知道高压点的典型大小,则可以先通过使用相同大小的高斯将其卷积来平滑图像。这可以使您处理更简单的图像。

Here is an idea: you calculate the (discrete) Laplacian of the image. I would expect it to be (negative and) large at maxima, in a way that is more dramatic than in the original images. Thus, maxima could be easier to find.

Here is another idea: if you know the typical size of the high-pressure spots, you can first smooth your image by convoluting it with a Gaussian of the same size. This may give you simpler images to process.
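A quick sketch of both ideas with scipy.ndimage, assuming paw is the 2D pressure array; the sigma and the 0.5 threshold are arbitrary placeholders to be tuned:

from scipy import ndimage

smoothed = ndimage.gaussian_filter(paw.astype(float), sigma=1.0)  # low-pass with a Gaussian of roughly toe size
lap = ndimage.laplace(smoothed)                                   # discrete Laplacian; strongly negative at peaks
candidate_mask = lap < 0.5 * lap.min()                            # keep the most strongly negative responses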


回答 6

我的脑海中只有几个想法:

  • 进行扫描的梯度(导数),看看是否消除了错误的调用
  • 取局部最大值的最大值

您可能还想看看OpenCV,它有一个相当不错的Python API,并且可能有一些有用的功能。

Just a couple of ideas off the top of my head:

  • take the gradient (derivative) of the scan, see if that eliminates the false calls
  • take the maximum of the local maxima

You might also want to take a look at OpenCV, it’s got a fairly decent Python API and might have some functions you’d find useful.


回答 7

我确定您现在有足够的工作要做,但是我不禁建议使用k-means聚类方法。k-means是一种无监督的聚类算法,它将获取您的数据(任意数量的维度-我碰巧是在3D中进行此操作),并将其排列到具有不同边界的k个聚类中。这里很好,因为您确切知道这些犬(应该)有多少个脚趾。

此外,它是在Scipy中实现的,这真的很好(http://docs.scipy.org/doc/scipy/reference/cluster.vq.html)。

这是在空间上解析3D群集的方法的示例: 在此处输入图片说明

您想要做的有点不同(2D并包含压力值),但我仍然认为您可以尝试一下。

I’m sure you have enough to go on by now, but I can’t help but suggest using the k-means clustering method. k-means is an unsupervised clustering algorithm which will take your data (in any number of dimensions – I happen to do this in 3D) and arrange it into k clusters with distinct boundaries. It’s nice here because you know exactly how many toes these canines (should) have.

Additionally, it’s implemented in Scipy which is really nice (http://docs.scipy.org/doc/scipy/reference/cluster.vq.html).

Here’s an example of what it can do to spatially resolve 3D clusters: enter image description here

What you want to do is a bit different (2D and includes pressure values), but I still think you could give it a shot.
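A hedged sketch of that route with scipy.cluster.vq, assuming paw is the 2D pressure array and five toes are expected. scipy's k-means has no sample weights, so as a crude weighting each nonzero pixel is repeated in proportion to its (rounded) pressure value:

import numpy as np
from scipy.cluster.vq import kmeans2

rows, cols = np.nonzero(paw)
weights = np.rint(paw[rows, cols]).astype(int).clip(min=1)
points = np.repeat(np.column_stack([rows, cols]).astype(float), weights, axis=0)
centroids, labels = kmeans2(points, 5, minit='points')
print(centroids)   # one (row, col) centre per presumed toe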


回答 8

感谢您的原始数据。我在火车上,这是我所能到达的最大距离(我的停靠站即将来临)。我用正则表达式对您的txt文件进行了按摩,并将其放入带有一些JavaScript的html页面中以进行可视化。我在这里分享它是因为某些人(例如我自己)可能会发现它比python更容易被黑客攻击。

我认为一个好的方法将是尺度和旋转不变,而我的下一步将是研究高斯的混合。(每个爪垫都是高斯的中心)。

    <html>
<head>
    <script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script> 
    <script type="text/javascript">
    var heatmap = [[[0,0,0,0,0,0,0,4,4,0,0,0,0],
[0,0,0,0,0,7,14,22,18,7,0,0,0],
[0,0,0,0,11,40,65,43,18,7,0,0,0],
[0,0,0,0,14,61,72,32,7,4,11,14,4],
[0,7,14,11,7,22,25,11,4,14,65,72,14],
[4,29,79,54,14,7,4,11,18,29,79,83,18],
[0,18,54,32,18,43,36,29,61,76,25,18,4],
[0,4,7,7,25,90,79,36,79,90,22,0,0],
[0,0,0,0,11,47,40,14,29,36,7,0,0],
[0,0,0,0,4,7,7,4,4,4,0,0,0]
],[
[0,0,0,4,4,0,0,0,0,0,0,0,0],
[0,0,11,18,18,7,0,0,0,0,0,0,0],
[0,4,29,47,29,7,0,4,4,0,0,0,0],
[0,0,11,29,29,7,7,22,25,7,0,0,0],
[0,0,0,4,4,4,14,61,83,22,0,0,0],
[4,7,4,4,4,4,14,32,25,7,0,0,0],
[4,11,7,14,25,25,47,79,32,4,0,0,0],
[0,4,4,22,58,40,29,86,36,4,0,0,0],
[0,0,0,7,18,14,7,18,7,0,0,0,0],
[0,0,0,0,4,4,0,0,0,0,0,0,0],
],[
[0,0,0,4,11,11,7,4,0,0,0,0,0],
[0,0,0,4,22,36,32,22,11,4,0,0,0],
[4,11,7,4,11,29,54,50,22,4,0,0,0],
[11,58,43,11,4,11,25,22,11,11,18,7,0],
[11,50,43,18,11,4,4,7,18,61,86,29,4],
[0,11,18,54,58,25,32,50,32,47,54,14,0],
[0,0,14,72,76,40,86,101,32,11,7,4,0],
[0,0,4,22,22,18,47,65,18,0,0,0,0],
[0,0,0,0,4,4,7,11,4,0,0,0,0],
],[
[0,0,0,0,4,4,4,0,0,0,0,0,0],
[0,0,0,4,14,14,18,7,0,0,0,0,0],
[0,0,0,4,14,40,54,22,4,0,0,0,0],
[0,7,11,4,11,32,36,11,0,0,0,0,0],
[4,29,36,11,4,7,7,4,4,0,0,0,0],
[4,25,32,18,7,4,4,4,14,7,0,0,0],
[0,7,36,58,29,14,22,14,18,11,0,0,0],
[0,11,50,68,32,40,61,18,4,4,0,0,0],
[0,4,11,18,18,43,32,7,0,0,0,0,0],
[0,0,0,0,4,7,4,0,0,0,0,0,0],
],[
[0,0,0,0,0,0,4,7,4,0,0,0,0],
[0,0,0,0,4,18,25,32,25,7,0,0,0],
[0,0,0,4,18,65,68,29,11,0,0,0,0],
[0,4,4,4,18,65,54,18,4,7,14,11,0],
[4,22,36,14,4,14,11,7,7,29,79,47,7],
[7,54,76,36,18,14,11,36,40,32,72,36,4],
[4,11,18,18,61,79,36,54,97,40,14,7,0],
[0,0,0,11,58,101,40,47,108,50,7,0,0],
[0,0,0,4,11,25,7,11,22,11,0,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
],[
[0,0,4,7,4,0,0,0,0,0,0,0,0],
[0,0,11,22,14,4,0,4,0,0,0,0,0],
[0,0,7,18,14,4,4,14,18,4,0,0,0],
[0,4,0,4,4,0,4,32,54,18,0,0,0],
[4,11,7,4,7,7,18,29,22,4,0,0,0],
[7,18,7,22,40,25,50,76,25,4,0,0,0],
[0,4,4,22,61,32,25,54,18,0,0,0,0],
[0,0,0,4,11,7,4,11,4,0,0,0,0],
],[
[0,0,0,0,7,14,11,4,0,0,0,0,0],
[0,0,0,4,18,43,50,32,14,4,0,0,0],
[0,4,11,4,7,29,61,65,43,11,0,0,0],
[4,18,54,25,7,11,32,40,25,7,11,4,0],
[4,36,86,40,11,7,7,7,7,25,58,25,4],
[0,7,18,25,65,40,18,25,22,22,47,18,0],
[0,0,4,32,79,47,43,86,54,11,7,4,0],
[0,0,0,14,32,14,25,61,40,7,0,0,0],
[0,0,0,0,4,4,4,11,7,0,0,0,0],
],[
[0,0,0,0,4,7,11,4,0,0,0,0,0],
[0,4,4,0,4,11,18,11,0,0,0,0,0],
[4,11,11,4,0,4,4,4,0,0,0,0,0],
[4,18,14,7,4,0,0,4,7,7,0,0,0],
[0,7,18,29,14,11,11,7,18,18,4,0,0],
[0,11,43,50,29,43,40,11,4,4,0,0,0],
[0,4,18,25,22,54,40,7,0,0,0,0,0],
[0,0,4,4,4,11,7,0,0,0,0,0,0],
],[
[0,0,0,0,0,7,7,7,7,0,0,0,0],
[0,0,0,0,7,32,32,18,4,0,0,0,0],
[0,0,0,0,11,54,40,14,4,4,22,11,0],
[0,7,14,11,4,14,11,4,4,25,94,50,7],
[4,25,65,43,11,7,4,7,22,25,54,36,7],
[0,7,25,22,29,58,32,25,72,61,14,7,0],
[0,0,4,4,40,115,68,29,83,72,11,0,0],
[0,0,0,0,11,29,18,7,18,14,4,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
]
];
</script>
</head>
<body>
    <script type="text/javascript+protovis">    
    for (var a=0; a < heatmap.length; a++) {
    var w = heatmap[a][0].length,
    h = heatmap[a].length;
var vis = new pv.Panel()
    .width(w * 6)
    .height(h * 6)
    .strokeStyle("#aaa")
    .lineWidth(4)
    .antialias(true);
vis.add(pv.Image)
    .imageWidth(w)
    .imageHeight(h)
    .image(pv.Scale.linear()
        .domain(0, 99, 100)
        .range("#000", "#fff", '#ff0a0a')
        .by(function(i, j) { return heatmap[a][j][i]; }));
vis.render();
}
</script>
  </body>
</html>

替代文字

Thanks for the raw data. I’m on the train and this is as far as I’ve gotten (my stop is coming up). I massaged your txt file with regexps and have plopped it into an HTML page with some JavaScript for visualization. I’m sharing it here because some, like myself, might find it more readily hackable than python.

I think a good approach will be scale and rotation invariant, and my next step will be to investigate mixtures of gaussians. (each paw pad being the center of a gaussian).

    <html>
<head>
    <script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script> 
    <script type="text/javascript">
    var heatmap = [[[0,0,0,0,0,0,0,4,4,0,0,0,0],
[0,0,0,0,0,7,14,22,18,7,0,0,0],
[0,0,0,0,11,40,65,43,18,7,0,0,0],
[0,0,0,0,14,61,72,32,7,4,11,14,4],
[0,7,14,11,7,22,25,11,4,14,65,72,14],
[4,29,79,54,14,7,4,11,18,29,79,83,18],
[0,18,54,32,18,43,36,29,61,76,25,18,4],
[0,4,7,7,25,90,79,36,79,90,22,0,0],
[0,0,0,0,11,47,40,14,29,36,7,0,0],
[0,0,0,0,4,7,7,4,4,4,0,0,0]
],[
[0,0,0,4,4,0,0,0,0,0,0,0,0],
[0,0,11,18,18,7,0,0,0,0,0,0,0],
[0,4,29,47,29,7,0,4,4,0,0,0,0],
[0,0,11,29,29,7,7,22,25,7,0,0,0],
[0,0,0,4,4,4,14,61,83,22,0,0,0],
[4,7,4,4,4,4,14,32,25,7,0,0,0],
[4,11,7,14,25,25,47,79,32,4,0,0,0],
[0,4,4,22,58,40,29,86,36,4,0,0,0],
[0,0,0,7,18,14,7,18,7,0,0,0,0],
[0,0,0,0,4,4,0,0,0,0,0,0,0],
],[
[0,0,0,4,11,11,7,4,0,0,0,0,0],
[0,0,0,4,22,36,32,22,11,4,0,0,0],
[4,11,7,4,11,29,54,50,22,4,0,0,0],
[11,58,43,11,4,11,25,22,11,11,18,7,0],
[11,50,43,18,11,4,4,7,18,61,86,29,4],
[0,11,18,54,58,25,32,50,32,47,54,14,0],
[0,0,14,72,76,40,86,101,32,11,7,4,0],
[0,0,4,22,22,18,47,65,18,0,0,0,0],
[0,0,0,0,4,4,7,11,4,0,0,0,0],
],[
[0,0,0,0,4,4,4,0,0,0,0,0,0],
[0,0,0,4,14,14,18,7,0,0,0,0,0],
[0,0,0,4,14,40,54,22,4,0,0,0,0],
[0,7,11,4,11,32,36,11,0,0,0,0,0],
[4,29,36,11,4,7,7,4,4,0,0,0,0],
[4,25,32,18,7,4,4,4,14,7,0,0,0],
[0,7,36,58,29,14,22,14,18,11,0,0,0],
[0,11,50,68,32,40,61,18,4,4,0,0,0],
[0,4,11,18,18,43,32,7,0,0,0,0,0],
[0,0,0,0,4,7,4,0,0,0,0,0,0],
],[
[0,0,0,0,0,0,4,7,4,0,0,0,0],
[0,0,0,0,4,18,25,32,25,7,0,0,0],
[0,0,0,4,18,65,68,29,11,0,0,0,0],
[0,4,4,4,18,65,54,18,4,7,14,11,0],
[4,22,36,14,4,14,11,7,7,29,79,47,7],
[7,54,76,36,18,14,11,36,40,32,72,36,4],
[4,11,18,18,61,79,36,54,97,40,14,7,0],
[0,0,0,11,58,101,40,47,108,50,7,0,0],
[0,0,0,4,11,25,7,11,22,11,0,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
],[
[0,0,4,7,4,0,0,0,0,0,0,0,0],
[0,0,11,22,14,4,0,4,0,0,0,0,0],
[0,0,7,18,14,4,4,14,18,4,0,0,0],
[0,4,0,4,4,0,4,32,54,18,0,0,0],
[4,11,7,4,7,7,18,29,22,4,0,0,0],
[7,18,7,22,40,25,50,76,25,4,0,0,0],
[0,4,4,22,61,32,25,54,18,0,0,0,0],
[0,0,0,4,11,7,4,11,4,0,0,0,0],
],[
[0,0,0,0,7,14,11,4,0,0,0,0,0],
[0,0,0,4,18,43,50,32,14,4,0,0,0],
[0,4,11,4,7,29,61,65,43,11,0,0,0],
[4,18,54,25,7,11,32,40,25,7,11,4,0],
[4,36,86,40,11,7,7,7,7,25,58,25,4],
[0,7,18,25,65,40,18,25,22,22,47,18,0],
[0,0,4,32,79,47,43,86,54,11,7,4,0],
[0,0,0,14,32,14,25,61,40,7,0,0,0],
[0,0,0,0,4,4,4,11,7,0,0,0,0],
],[
[0,0,0,0,4,7,11,4,0,0,0,0,0],
[0,4,4,0,4,11,18,11,0,0,0,0,0],
[4,11,11,4,0,4,4,4,0,0,0,0,0],
[4,18,14,7,4,0,0,4,7,7,0,0,0],
[0,7,18,29,14,11,11,7,18,18,4,0,0],
[0,11,43,50,29,43,40,11,4,4,0,0,0],
[0,4,18,25,22,54,40,7,0,0,0,0,0],
[0,0,4,4,4,11,7,0,0,0,0,0,0],
],[
[0,0,0,0,0,7,7,7,7,0,0,0,0],
[0,0,0,0,7,32,32,18,4,0,0,0,0],
[0,0,0,0,11,54,40,14,4,4,22,11,0],
[0,7,14,11,4,14,11,4,4,25,94,50,7],
[4,25,65,43,11,7,4,7,22,25,54,36,7],
[0,7,25,22,29,58,32,25,72,61,14,7,0],
[0,0,4,4,40,115,68,29,83,72,11,0,0],
[0,0,0,0,11,29,18,7,18,14,4,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
]
];
</script>
</head>
<body>
    <script type="text/javascript+protovis">    
    for (var a=0; a < heatmap.length; a++) {
    var w = heatmap[a][0].length,
    h = heatmap[a].length;
var vis = new pv.Panel()
    .width(w * 6)
    .height(h * 6)
    .strokeStyle("#aaa")
    .lineWidth(4)
    .antialias(true);
vis.add(pv.Image)
    .imageWidth(w)
    .imageHeight(h)
    .image(pv.Scale.linear()
        .domain(0, 99, 100)
        .range("#000", "#fff", '#ff0a0a')
        .by(function(i, j) { return heatmap[a][j][i]; }));
vis.render();
}
</script>
  </body>
</html>

alt text


回答 9

物理学家的解决方案:
定义5个由其位置X_i标识的爪标记，并以随机位置初始化它们。定义某个能量函数，将对标记位于爪垫位置的奖励与对标记相互重叠的惩罚相结合；比方说：

E(X_i;S)=-Sum_i(S(X_i))+alfa*Sum_ij (|X_i-Xj|<=2*sqrt(2)?1:0)

（S(X_i)是X_i周围2×2方块内的平均压力，alfa是需要通过实验选定的参数）

现在是时候做一些Metropolis-Hastings魔术了:
1.选择随机标记并将其沿随机方向移动一个像素。
2.计算dE,此移动引起的能量差。
3.从0-1获得一个统一的随机数,并将其称为r。
4.如果dE<0或exp(-beta*dE)>r，接受该移动并转到1；如果不是，撤销该移动并转到1。
应该重复这一过程，直到标记收敛到爪垫上为止。Beta控制扫描与优化之间的权衡，因此也应通过实验调整；它也可以随着模拟时间的推移不断增大（模拟退火）。

Physicist’s solution:
Define 5 paw-markers identified by their positions X_i and init them with random positions. Define some energy function combining some award for location of markers in paws’ positions with some punishment for overlap of markers; let’s say:

E(X_i;S)=-Sum_i(S(X_i))+alfa*Sum_ij (|X_i-Xj|<=2*sqrt(2)?1:0)

(S(X_i) is the mean force in the 2×2 square around X_i; alfa is a parameter to be picked experimentally)

Now time to do some Metropolis-Hastings magic:
1. Select random marker and move it by one pixel in random direction.
2. Calculate dE, the difference of energy this move caused.
3. Get an uniform random number from 0-1 and call it r.
4. If dE<0 or exp(-beta*dE)>r, accept the move and go to 1; if not, undo the move and go to 1.
This should be repeated until the markers converge to the paws. Beta controls the scanning-versus-optimizing tradeoff, so it should also be tuned experimentally; it can also be increased steadily as the simulation runs (simulated annealing).
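A loose sketch of this recipe (not the answer author's code): the overlap test uses a Chebyshev distance of 2 as a stand-in for the 2*sqrt(2) criterion, and alfa, beta and the step count are placeholders to be tuned:

import numpy as np

def mean_force(S, pos):
    r, c = pos
    return S[r:r + 2, c:c + 2].mean()   # mean pressure of the 2x2 square at pos

def energy(S, markers, alfa=1.0):
    reward = sum(mean_force(S, m) for m in markers)
    overlap = sum(1 for i, a in enumerate(markers) for b in markers[i + 1:]
                  if max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 2)
    return -reward + alfa * overlap

def anneal(S, n_markers=5, steps=20000, beta=1.0, alfa=1.0):
    rng = np.random.default_rng()
    h, w = S.shape
    markers = [(int(rng.integers(h - 1)), int(rng.integers(w - 1))) for _ in range(n_markers)]
    E = energy(S, markers, alfa)
    for _ in range(steps):
        i = int(rng.integers(n_markers))
        r = min(max(markers[i][0] + int(rng.integers(-1, 2)), 0), h - 2)
        c = min(max(markers[i][1] + int(rng.integers(-1, 2)), 0), w - 2)
        trial = markers[:i] + [(r, c)] + markers[i + 1:]
        dE = energy(S, trial, alfa) - E
        if dE < 0 or np.exp(-beta * dE) > rng.random():   # Metropolis acceptance
            markers, E = trial, E + dE
    return markers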


回答 10

这是我对大型望远镜进行类似操作时使用的另一种方法:

1)搜索最高像素。有了该值后,请在该值附近搜索2×2的最佳拟合(也许使2×2的总和最大化),或者在以最高像素为中心的4×4子区域内进行2d高斯拟合。

然后将您发现的2×2像素在峰中心周围设为零(或3×3)

返回1)并重复直到最高峰降到噪声阈值以下,或者您拥有了所有需要的脚趾

Here’s another approach that I used when doing something similar for a large telescope:

1) Search for the highest pixel. Once you have that, search around that for the best fit for 2×2 (maybe maximizing the 2×2 sum), or do a 2d gaussian fit inside the sub region of say 4×4 centered on the highest pixel.

Then set those 2×2 pixels you have found to zero (or maybe 3×3) around the peak center

go back to 1) and repeat till the highest peak falls below a noise threshold, or you have all the toes you need
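A brief sketch of that loop in Python, assuming paw is the 2D pressure array; the 3×3 blanking patch and the noise threshold are placeholder choices:

import numpy as np

def find_toes(paw, n_toes=5, noise_threshold=1.0):
    work = paw.astype(float).copy()
    toes = []
    while len(toes) < n_toes:
        r, c = np.unravel_index(np.argmax(work), work.shape)
        if work[r, c] < noise_threshold:        # nothing above the noise floor left
            break
        toes.append((int(r), int(c)))
        work[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2] = 0   # blank the peak's neighbourhood
    return toes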


回答 11

如果您能够创建一些训练数据,则可能值得尝试使用神经网络…但是,这需要手工标注许多示例。

It’s probably worth trying neural networks if you are able to create some training data… but this needs many samples annotated by hand.


回答 12

粗略的轮廓…

您可能希望使用连通分量（connected components）算法来隔离每个爪子区域。Wiki在此处对此有一个不错的描述（带有一些代码）：http://en.wikipedia.org/wiki/Connected_Component_Labeling

您必须决定是使用4个连接还是8个连接。就个人而言,对于大多数问题,我更喜欢6连通性。无论如何,一旦将每个“爪印”分离为一个相连的区域,就应该足够容易地遍历该区域并找到最大值。一旦找到最大值,就可以迭代放大区域,直到达到预定阈值,以将其识别为给定的“脚趾”。

这里一个微妙的问题是,一旦您开始使用计算机视觉技术将某事物识别为右/左/前/后爪,并且开始查看各个脚趾,就必须开始考虑旋转,偏斜和平移。这是通过分析所谓的“时刻”来完成的。在视觉应用中需要考虑一些不同的时刻:

  • 中心矩：平移不变
  • 归一化矩：缩放和平移不变
  • Hu矩：平移、缩放和旋转不变

通过在Wiki上搜索“图像时刻”可以找到有关时刻的更多信息。

a rough outline…

you’d probably want to use a connected components algorithm to isolate each paw region. wiki has a decent description of this (with some code) here: http://en.wikipedia.org/wiki/Connected_Component_Labeling

You’ll have to make a decision about whether to use 4 or 8 connectedness. Personally, for most problems I prefer 6-connectedness. Anyway, once you’ve separated out each “paw print” as a connected region, it should be easy enough to iterate through the region and find the maxima. Once you’ve found the maxima, you could iteratively enlarge the region until you reach a predetermined threshold in order to identify it as a given “toe”.

one subtle problem here is that as soon as you start using computer vision techniques to identify something as a right/left/front/rear paw and you start looking at individual toes, you have to start taking rotations, skews, and translations into account. this is accomplished through the analysis of so-called “moments”. there are a few different moments to consider in vision applications:

  • central moments: translation invariant
  • normalized moments: scaling and translation invariant
  • Hu moments: translation, scale, and rotation invariant

more information about moments can be found by searching “image moments” on wiki.


回答 13

也许您可以使用诸如高斯混合模型之类的东西。这是用于执行GMM的Python程序包(就像Google搜索一样) http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/

Perhaps you can use something like Gaussian Mixture Models. Here’s a Python package for doing GMMs (just did a Google search) http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
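The linked package appears to be quite old; as a hedged alternative, scikit-learn's GaussianMixture can be fitted to the pixel coordinates, here crudely weighted by repeating each nonzero pixel according to its rounded pressure (paw is assumed to be the 2D pressure array):

import numpy as np
from sklearn.mixture import GaussianMixture

rows, cols = np.nonzero(paw)
counts = np.rint(paw[rows, cols]).astype(int).clip(min=1)
samples = np.repeat(np.column_stack([rows, cols]).astype(float), counts, axis=0)
gmm = GaussianMixture(n_components=5, covariance_type='full', random_state=0).fit(samples)
print(gmm.means_)   # one (row, col) mean per fitted toe/pad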


回答 14

看来您可以借助jetxee的算法取个巧。它已经能很好地找到前三个脚趾，您应该可以据此猜出第四个脚趾的位置。

It seems you can cheat a bit using jetxee’s algorithm. He is finding the first three toes fine, and you should be able to guess where the fourth is based on that.


回答 15

有趣的问题。我尝试的解决方案如下。

  1. 应用低通滤波器,例如与2D高斯蒙版进行卷积。这将为您提供一堆(可能但不一定是浮点数)值。

  2. 使用每个爪垫(或脚趾)的已知近似半径执行2D非最大抑制。

这应该为您提供最大的职位,而不会使多个候选职位靠在一起。为了明确起见,步骤1中的遮罩半径也应与步骤2中使用的半径相似。该半径可以选择,或者兽医可以事先对其进行明确测量(随年龄/品种/等的不同而不同)。

建议的某些解决方案(均值漂移,神经网络等)可能会在某种程度上起作用,但过于复杂且可能不理想。

Interesting problem. The solution I would try is the following.

  1. Apply a low pass filter, such as convolution with a 2D gaussian mask. This will give you a bunch of (probably, but not necessarily floating point) values.

  2. Perform a 2D non-maximal suppression using the known approximate radius of each paw pad (or toe).

This should give you the maximal positions without having multiple candidates which are close together. Just to clarify, the radius of the mask in step 1 should also be similar to the radius used in step 2. This radius could be selectable, or the vet could explicitly measure it beforehand (it will vary with age/breed/etc).

Some of the solutions suggested (mean shift, neural nets, and so on) probably will work to some degree, but are overly complicated and probably not ideal.
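A compact sketch of these two steps with scipy.ndimage, assuming paw is the 2D pressure array and radius is the approximate pad radius in pixels:

import numpy as np
from scipy import ndimage

def toe_peaks(paw, radius=2, n_toes=5):
    smooth = ndimage.gaussian_filter(paw.astype(float), sigma=radius / 2.0)         # step 1: low-pass
    size = 2 * radius + 1
    is_peak = (ndimage.maximum_filter(smooth, size=size) == smooth) & (paw > 0)     # step 2: non-max suppression
    rows, cols = np.nonzero(is_peak)
    order = np.argsort(smooth[rows, cols])[::-1]                                    # strongest first
    return list(zip(rows[order], cols[order]))[:n_toes]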


回答 16

好吧,这是一些简单但效率不高的代码,但是对于这种大小的数据集来说,这很好。

import numpy as np
grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              [0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0],
              [0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0],
              [0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0],
              [0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0],
              [0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0],
              [0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0],
              [0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0],
              [0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0],
              [0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0],
              [0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]])

arr = []
for i in xrange(grid.shape[0] - 1):
    for j in xrange(grid.shape[1] - 1):
        tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1]
        arr.append([(i,j),tot])

best = []

arr.sort(key = lambda x: x[1])

for i in xrange(5):
    best.append(arr.pop())
    badpos = set([(best[-1][0][0]+x,best[-1][0][1]+y)
                  for x in [-1,0,1] for y in [-1,0,1] if x != 0 or y != 0])
    for j in xrange(len(arr)-1,-1,-1):
        if arr[j][0] in badpos:
            arr.pop(j)


for item in best:
    print grid[item[0][0]:item[0][0]+2,item[0][1]:item[0][1]+2]

我基本上只是用左上角的位置和每个2×2正方形的和组成一个数组,然后按和对它进行排序。然后,我从争用中取出具有最高总和的2×2正方形,将其放入best数组中,并删除所有其他2×2正方形,这些正方形使用了刚刚删除的2×2正方形的任何部分。

除了最后一个爪子（在第一张图片最右端、总和最小的那个爪子）之外，它似乎工作正常。结果发现还有另外两个符合条件的2×2正方形，它们的总和更大（而且两者的总和彼此相等）。其中一个仍然选中了您的2×2正方形中的一个方块，但另一个偏向左侧。幸运的是，我们碰巧选到的更多是您想要的那个，但这可能需要借助其他一些想法才能始终得到您真正想要的结果。

Well, here’s some simple and not terribly efficient code, but for this size of a data set it is fine.

import numpy as np
grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              [0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0],
              [0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0],
              [0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0],
              [0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0],
              [0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0],
              [0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0],
              [0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0],
              [0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0],
              [0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0],
              [0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]])

arr = []
for i in xrange(grid.shape[0] - 1):
    for j in xrange(grid.shape[1] - 1):
        tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1]
        arr.append([(i,j),tot])

best = []

arr.sort(key = lambda x: x[1])

for i in xrange(5):
    best.append(arr.pop())
    badpos = set([(best[-1][0][0]+x,best[-1][0][1]+y)
                  for x in [-1,0,1] for y in [-1,0,1] if x != 0 or y != 0])
    for j in xrange(len(arr)-1,-1,-1):
        if arr[j][0] in badpos:
            arr.pop(j)


for item in best:
    print grid[item[0][0]:item[0][0]+2,item[0][1]:item[0][1]+2]

I basically just make an array with the position of the upper-left and the sum of each 2×2 square and sort it by the sum. I then take the 2×2 square with the highest sum out of contention, put it in the best array, and remove all other 2×2 squares that used any part of this just removed 2×2 square.

It seems to work fine except with the last paw (the one with the smallest sum, on the far right in your first picture): it turns out that there are two other eligible 2×2 squares with a larger sum (and they have an equal sum to each other). One of them still selects one square from your 2×2 square, but the other is off to the left. Fortunately, by luck we seem to be choosing more of the one that you would want, but this may require some other ideas to be used to get what you actually want all of the time.


回答 17

只是想告诉大家，用Python在图像中寻找局部最大值（maxima）有一个不错的选择：

from skimage.feature import peak_local_max

或skimage 0.8.0

from skimage.feature.peak import peak_local_max

http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html

Just wanna tell you guys there is a nice option to find local maxima in images with python:

from skimage.feature import peak_local_max

or for skimage 0.8.0:

from skimage.feature.peak import peak_local_max

http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html
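A small usage sketch with a recent scikit-image, assuming paw is the 2D pressure array; min_distance keeps toes apart and num_peaks caps the count:

from skimage.feature import peak_local_max

coordinates = peak_local_max(paw, min_distance=2, num_peaks=5)
print(coordinates)   # array of (row, col) peak positions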


回答 18

也许在这里一个简单的方法就足够了：建立平面上所有2×2正方形的列表，并按它们的总和降序排序。

首先，把总和最高的正方形选入您的“爪子列表”。然后，迭代地再选出4个与先前找到的任何正方形都不相交的次佳正方形。

Maybe a naive approach is sufficient here: Build a list of all 2×2 squares on your plane, order them by their sum (in descending order).

First, select the highest-valued square into your “paw list”. Then, iteratively pick 4 of the next-best squares that don’t intersect with any of the previously found squares.


回答 19

天文学和宇宙学界提供了许多广泛的软件-这是历史上和当前的重要研究领域。

如果您不是天文学家，请不要惊慌，其中一些工具在该领域之外也很容易使用。例如，您可以使用astropy/photutils：

https://photutils.readthedocs.io/en/stable/detection.html#local-peak-detection

[在这里重复简短的示例代码似乎有点不礼貌。]

下面列出了一个不完整且略带偏向性的清单，包含可能会有用的技术/软件包/链接。请在评论中补充更多内容，必要时我会更新此答案。当然，这里需要在精度与计算资源之间进行权衡。[老实说，例子太多了，无法在这样的单个答案中一一给出代码示例，所以我不确定这个答案行不行。]

源提取器https://www.astromatic.net/software/sextractor

MultiNest https://github.com/farhanferoz/MultiNest [+ pyMultiNest]

ASKAP / EMU寻源挑战:https ://arxiv.org/abs/1509.03931

您也可以搜索Planck和/或WMAP源提取挑战。

There are several extensive pieces of software available from the astronomy and cosmology community – this is a significant area of research both historically and currently.

Do not be alarmed if you are not an astronomer – some are easy to use outside the field. For example, you could use astropy/photutils:

https://photutils.readthedocs.io/en/stable/detection.html#local-peak-detection

[It seems a bit rude to repeat their short sample code here.]

An incomplete and slightly biased list of techniques/packages/links that might be of interest is given below – do add more in the comments and I will update this answer as necessary. Of course there is a trade-off of accuracy vs compute resources. [Honestly, there are too many to give code examples in a single answer such as this so I am not sure whether this answer will fly or not.]

Source Extractor https://www.astromatic.net/software/sextractor

MultiNest https://github.com/farhanferoz/MultiNest [+ pyMultiNest]

ASKAP/EMU source-finding challenge: https://arxiv.org/abs/1509.03931

You could also search for Planck and/or WMAP source-extraction challenges.
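Without repeating the documentation example, a hedged sketch of the photutils route could look like this (API as in recent photutils releases; paw is assumed to be the 2D pressure array and the threshold is an arbitrary placeholder):

import numpy as np
from photutils.detection import find_peaks

threshold = np.median(paw) + 2 * paw.std()     # placeholder noise cut
peaks_table = find_peaks(paw, threshold, box_size=3)
print(peaks_table)   # table with x_peak, y_peak and peak_value columns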


回答 20

如果逐步进行操作怎么办:首先找到全局最大值,然后根据需要处理周围点的值,然后将找到的区域设置为零,然后对下一个重复。

What if you proceed step by step: you first locate the global maximum, process if needed the surrounding points given their value, then set the found region to zero, and repeat for the next one.


回答 21

我不确定这是否能回答问题,但似乎您可以寻找没有邻居的n个最高峰。

这是要点。 请注意,它在Ruby中,但是思路应该很清楚。

require 'pp'

NUM_PEAKS = 5
NEIGHBOR_DISTANCE = 1

data = [[1,2,3,4,5],
        [2,6,4,4,6],
        [3,6,7,4,3],
       ]

def tuples(matrix)
  tuples = []
  matrix.each_with_index { |row, ri|
    row.each_with_index { |value, ci|
      tuples << [value, ri, ci]
    }
  }
  tuples
end

def neighbor?(t1, t2, distance = 1)
  [1,2].each { |axis|
    return false if (t1[axis] - t2[axis]).abs > distance
  }
  true
end

# convert the matrix into a sorted list of tuples (value, row, col), highest peaks first
sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse

# the list of peaks that don't have neighbors
non_neighboring_peaks = []

sorted.each { |candidate|
  # always take the highest peak
  if non_neighboring_peaks.empty?
    non_neighboring_peaks << candidate
    puts "took the first peak: #{candidate}"
  else
    # check that this candidate doesn't have any accepted neighbors
    is_ok = true
    non_neighboring_peaks.each { |accepted|
      if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE)
        is_ok = false
        break
      end
    }
    if is_ok
      non_neighboring_peaks << candidate
      puts "took #{candidate}"
    else
      puts "denied #{candidate}"
    end
  end
}

pp non_neighboring_peaks

I am not sure this answers the question, but it seems like you can just look for the n highest peaks that don’t have neighbors.

Here is the gist. Note that it’s in Ruby, but the idea should be clear.

require 'pp'

NUM_PEAKS = 5
NEIGHBOR_DISTANCE = 1

data = [[1,2,3,4,5],
        [2,6,4,4,6],
        [3,6,7,4,3],
       ]

def tuples(matrix)
  tuples = []
  matrix.each_with_index { |row, ri|
    row.each_with_index { |value, ci|
      tuples << [value, ri, ci]
    }
  }
  tuples
end

def neighbor?(t1, t2, distance = 1)
  [1,2].each { |axis|
    return false if (t1[axis] - t2[axis]).abs > distance
  }
  true
end

# convert the matrix into a sorted list of tuples (value, row, col), highest peaks first
sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse

# the list of peaks that don't have neighbors
non_neighboring_peaks = []

sorted.each { |candidate|
  # always take the highest peak
  if non_neighboring_peaks.empty?
    non_neighboring_peaks << candidate
    puts "took the first peak: #{candidate}"
  else
    # check that this candidate doesn't have any accepted neighbors
    is_ok = true
    non_neighboring_peaks.each { |accepted|
      if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE)
        is_ok = false
        break
      end
    }
    if is_ok
      non_neighboring_peaks << candidate
      puts "took #{candidate}"
    else
      puts "denied #{candidate}"
    end
  end
}

pp non_neighboring_peaks

如何安装带有.whl文件的Python软件包?

问题:如何安装带有.whl文件的Python软件包?

我在Windows机器上安装Python软件包时遇到问题，想用Christoph Gohlke的Windows二进制文件来安装。（根据我的经验，这减轻了许多其他软件包安装的麻烦。）但是，只有.whl文件可用。

http://www.lfd.uci.edu/~gohlke/pythonlibs/#jpype

但是,如何安装.whl文件?

笔记

  • 我已经找到了车轮上的文档,但是它们在解释如何安装.whl文件时似乎并不那么简单。
  • 该问题与该问题重复,但未直接回答。

I’m having trouble installing a Python package on my Windows machine, and would like to install it with Christoph Gohlke’s Windows binaries. (Which, to my experience, alleviated much of the fuss for many other package installations). However, only .whl files are available.

http://www.lfd.uci.edu/~gohlke/pythonlibs/#jpype

But how do I install .whl files?

Notes

  • I’ve found documents on wheel, but they don’t seem so straightforward in explaining how to install .whl files.
  • This question is a duplicate of this question, which wasn’t directly answered.

回答 0

我只是使用了以下非常简单的方法。首先打开一个控制台，然后cd到您下载文件（例如some-package.whl）所在的位置，并使用

pip install some-package.whl

注意:如果无法识别pip.exe,则可以在安装python的“脚本”目录中找到它。如果未安装pip,则此页面可以提供帮助: 如何在Windows上安装pip?

注意:为澄清起见,
如果将*.whl文件复制到本地驱动器（例如C:\some-dir\some-file.whl），请使用以下命令行参数-

pip install C:/some-dir/some-file.whl

I just used the following which was quite simple. First open a console then cd to where you’ve downloaded your file like some-package.whl and use

pip install some-package.whl

Note: if pip.exe is not recognized, you may find it in the “Scripts” directory from where python has been installed. If pip is not installed, this page can help: How do I install pip on Windows?

Note: for clarification
If you copy the *.whl file to your local drive (ex. C:\some-dir\some-file.whl) use the following command line parameters —

pip install C:/some-dir/some-file.whl

回答 1

首先，请确保您已升级pip以启用wheel支持：

pip install --upgrade pip

然后,要从wheel安装,请给它提供下载wheel的目录。例如,安装package_name.whl

pip install --use-wheel --no-index --find-links=/where/its/downloaded package_name

First, make sure you have updated pip to enable wheel support:

pip install --upgrade pip

Then, to install from wheel, give it the directory where the wheel is downloaded. For example, to install package_name.whl:

pip install --use-wheel --no-index --find-links=/where/its/downloaded package_name

回答 2

我和OP在同一条船上。

使用Windows命令提示符,从目录:

C:\Python34\Scripts>
pip install wheel

似乎有效。

将目录更改为whl所在的目录后，它只是告诉我“无法识别pip”。回到上面的C:\Python34\Scripts>，然后用上面的完整命令提供'where/its/downloaded'位置，它又提示 Requirement 'scikit_image-...-win32.whl' looks like a filename, but the filename does not exist。

因此，我在Python34/Scripts中放了一个.whl副本，再次运行了完全相同的命令（--find-links=仍然指向另一个文件夹），这一次它起作用了。

I am in the same boat as the OP.

Using a Windows command prompt, from directory:

C:\Python34\Scripts>
pip install wheel

seemed to work.

Changing directory to where the whl was located, it just tells me ‘pip is not recognized’. Going back to C:\Python34\Scripts>, then using the full command above to provide the ‘where/its/downloaded’ location, it says Requirement 'scikit_image-...-win32.whl' looks like a filename, but the filename does not exist.

So I dropped a copy of the .whl in Python34/Scripts, ran the exact same command over again (with the --find-links= still going to the other folder), and this time it worked.


回答 3

Christoph Gohlke的站点上有多个文件版本。

从该站点安装wheel时，我发现重要的一点是首先在Python控制台中运行以下命令：

import pip
print(pip.pep425tags.get_supported())

这样您就知道应该为计算机安装哪个版本。选择错误的版本可能会导致软件包安装失败(尤其是如果您没有使用正确的CPython标记,例如cp27)。

There are several file versions on the great Christoph Gohlke’s site.

Something I have found important when installing wheels from this site is to first run this from the Python console:

import pip
print(pip.pep425tags.get_supported())

so that you know which version you should install for your computer. Picking the wrong version may cause the package installation to fail (especially if you don’t use the right CPython tag, for example, cp27).
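On recent pip versions the pep425tags module no longer exists; a hedged alternative is the packaging library (a dependency of pip), or running pip debug --verbose on the command line:

from packaging.tags import sys_tags

# print a handful of the tags this interpreter accepts, e.g. cp39-cp39-win_amd64
for tag in list(sys_tags())[:10]:
    print(tag)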


回答 4

我必须从我的计算机上的命令提示符处运行pip.exe。我输入了C:/Python27/Scripts/pip2.exe install numpy

I had to run pip.exe from the command prompt on my computer. I typed C:/Python27/Scripts/pip2.exe install numpy


回答 5

在Windows上，您不能仅仅使用pip install --upgrade pip来升级，因为pip.exe正在被使用，替换它会出错。相反，您应该像这样升级pip：

easy_install --upgrade pip

然后检查pip版本:

pip --version

如果显示的是6.x系列，则说明已支持wheel。

只有这样，您才能像下面这样安装wheel包：

pip install your-package.whl

On Windows you can’t just upgrade using pip install --upgrade pip, because the pip.exe is in use and there would be an error replacing it. Instead, you should upgrade pip like this:

easy_install --upgrade pip

Then check the pip version:

pip --version

If it shows 6.x series, there is wheel support.

Only then, you can install a wheel package like this:

pip install your-package.whl

回答 6

为了能够通过简单的双击安装wheel文件,您可以执行以下操作之一:

1)在命令行下以管理员权限运行两个命令:

assoc .whl=pythonwheel
ftype pythonwheel=cmd /c pip.exe install "%1" ^& pause

2)或者,可以将它们复制到wheel.bat文件中,并通过属性中的“以管理员身份运行”复选框来执行。

PS pip.exe假定位于PATH中。

更新:

(1)可以合并为一行:

assoc .whl=pythonwheel& ftype pythonwheel=cmd /c pip.exe install -U "%1" ^& pause

(2).bat文件的语法略有不同:

assoc .whl=pythonwheel& ftype pythonwheel=cmd /c pip.exe install -U "%%1" ^& pause

还可以使其输出更加详细:

@assoc .whl=pythonwheel|| echo Run me with administrator rights! && pause && exit 1
@ftype pythonwheel=cmd /c pip.exe install -U "%%1" ^& pause || echo Installation error && pause && exit 1
@echo Installation successful & pause

有关详细信息,请参见我的博客文章

To be able to install wheel files with a simple doubleclick on them you can do one the following:

1) Run two commands in command line under administrator privileges:

assoc .whl=pythonwheel
ftype pythonwheel=cmd /c pip.exe install "%1" ^& pause

2) Alternatively, they can be copied into a wheel.bat file and executed with ‘Run as administrator’ checkbox in the properties.

PS pip.exe is assumed to be in the PATH.

Update:

(1) Those can be combined in one line:

assoc .whl=pythonwheel& ftype pythonwheel=cmd /c pip.exe install -U "%1" ^& pause

(2) Syntax for .bat files is slightly different:

assoc .whl=pythonwheel& ftype pythonwheel=cmd /c pip.exe install -U "%%1" ^& pause

Also its output can be made more verbose:

@assoc .whl=pythonwheel|| echo Run me with administrator rights! && pause && exit 1
@ftype pythonwheel=cmd /c pip.exe install -U "%%1" ^& pause || echo Installation error && pause && exit 1
@echo Installation successful & pause

see my blog post for details.


回答 7

编辑：这已经不再是pip的一部分

为避免下载此类文件,您可以尝试:

pip install --use-wheel pillow

有关更多信息,请参见this

EDIT: THIS NO LONGER IS A PART OF PIP

To avoid having to download such files, you can try:

pip install --use-wheel pillow

For more information, see this.


回答 8

如果您无法直接使用pip安装特定的软件包：

您可以从 https://www.lfd.uci.edu/~gohlke/pythonlibs/ 下载特定的.whl（wheel）软件包。

CD（更改目录）到下载的软件包所在位置，然后手动安装：
pip install PACKAGENAME.whl
例如：
pip install ad3‑2.1‑cp27‑cp27m‑win32.whl

In case you are unable to install a specific package directly using pip:

You can download the specific .whl (wheel) package from – https://www.lfd.uci.edu/~gohlke/pythonlibs/

CD (change directory) to that downloaded package and install it manually with –
pip install PACKAGENAME.whl
e.g.:
pip install ad3‑2.1‑cp27‑cp27m‑win32.whl


回答 9

我设法安装NumPy的唯一方法如下:

我从这里下载了NumPy https://pypi.python.org/pypi/numpy

这个模块

https://pypi.python.org/packages/d7/3c/d8b473b517062cc700575889d79e7444c9b54c6072a22189d1831d2fbbce/numpy-1.11.2-cp35-none-win32.whl#md5=e485e06907826af5e1fc88608d0629a2

PowerShell中从Python的安装路径执行命令

PS C:\Program Files (x86)\Python35-32> .\python -m pip install C:/Users/MyUsername/Documents/Programs/Python/numpy-1.11.2-cp35-none-win32.whl
Processing c:\users\MyUsername\documents\programs\numpy-1.11.2-cp35-none-win32.whl
Installing collected packages: numpy
Successfully installed numpy-1.11.2
PS C:\Program Files (x86)\Python35-32>

PS .:我在Windows 10上安装了它。

The only way I managed to install NumPy was as follows:

I downloaded NumPy from here https://pypi.python.org/pypi/numpy

This Module

https://pypi.python.org/packages/d7/3c/d8b473b517062cc700575889d79e7444c9b54c6072a22189d1831d2fbbce/numpy-1.11.2-cp35-none-win32.whl#md5=e485e06907826af5e1fc88608d0629a2

Command execution from Python’s installation path in PowerShell

PS C:\Program Files (x86)\Python35-32> .\python -m pip install C:/Users/MyUsername/Documents/Programs/Python/numpy-1.11.2-cp35-none-win32.whl
Processing c:\users\MyUsername\documents\programs\numpy-1.11.2-cp35-none-win32.whl
Installing collected packages: numpy
Successfully installed numpy-1.11.2
PS C:\Program Files (x86)\Python35-32>

PS.: I installed it on Windows 10.


回答 10

您可以使用 pip install filename 来安装.whl文件。不过，要以这种形式使用，该文件应位于命令行的当前目录中；否则请指定完整的文件名及其路径，例如 pip install C:\Some\PAth\filename。

另外，请确保.whl文件与您使用的平台一致：执行 python -V 找出您正在运行的Python版本以及它是win32还是64位，然后据此安装正确的版本。

You can install the .whl file using pip install filename. Though to use it in this form, it should be in the same directory as your command line; otherwise, specify the complete filename along with its path, like pip install C:\Some\PAth\filename.

Also make sure the .whl file is of the same platform as you are using, do a python -V to find out which version of Python you are running and if it is win32 or 64, install the correct version according to it.


回答 11

我要做的是先使用命令 pip install --upgrade pip 升级pip，然后再使用命令 pip install wheel 安装wheel，之后就完美地工作了。

希望它对您有用。

What I did was first update pip by using the command pip install --upgrade pip, and then I also installed wheel by using the command pip install wheel, and then it worked perfectly fine.

Hope it works for you I guess.


回答 12

Windows上的新Python用户通常会在安装过程中忘记将Python的\Scripts目录添加到PATH变量中。我建议使用Python启动器，并通过-m开关将pip作为脚本执行。然后，您可以为特定的Python版本（如果安装了多个版本）安装wheel，并且Scripts目录不必位于PATH中。因此，打开命令行，（使用cd命令）导航到.whl文件所在的文件夹，然后输入：

py -3.6 -m pip install your_whl_file.whl

3.6您的Python版本替换,或者输入-3所需的Python版本是否首先出现在PATH中。并在活跃的虚拟环境中:py -m pip install your_whl_file.whl

当然,您也可以通过这种方式从PyPI安装软件包,例如

py -3.6 -m pip install pygame

New Python users on Windows often forget to add Python’s \Scripts directory to the PATH variable during the installation. I recommend to use the Python launcher and execute pip as a script with the -m switch. Then you can install the wheels for a specific Python version (if more than one are installed) and the Scripts directory doesn’t have to be in the PATH. So open the command line, navigate (with the cd command) to the folder where the .whl file is located and enter:

py -3.6 -m pip install your_whl_file.whl

Replace 3.6 by your Python version or just enter -3 if the desired Python version appears first in the PATH. And with an active virtual environment: py -m pip install your_whl_file.whl.

Of course you can also install packages from PyPI in this way, e.g.

py -3.6 -m pip install pygame

回答 13

我会建议您如何安装.whl文件的确切方法。最初我遇到很多问题,但是后来我解决了,这是安装.whl文件的技巧。

正确地遵循步骤以获取模块导入

  1. 确保.whl文件保存在python 2.7 / 3.6 / 3.7 / ..文件夹中。最初,当您下载.whl文件时,该文件保留在下载文件夹中,我的建议是更改文件夹。它使安装文件更加容易。
  2. 打开命令提示符,然后输入以下内容,打开保存文件的文件夹:

cd c:\ python 3.7

3. 现在，输入下面的命令

>py -3.7(version name) -m pip install (file name).whl

  4. 按下回车，并确保输入的是您当前使用的版本号以及正确的文件名。

  5. 按下回车后，等待几分钟，文件就会安装完成，您就可以导入该特定模块了。

  6. 为了检查模块是否安装成功，请在IDLE中导入该模块进行检查。

谢谢:)

I would like to suggest the exact way to install a .whl file. Initially I faced many issues, but then I solved them. Here is my trick to install .whl files.

Follow the steps properly in order to get the module imported

  1. Make sure your .whl file is kept in the python 2.7/3.6/3.7/.. folder. Initially when you download the .whl file the file is kept in downloaded folder, my suggestion is to change the folder. It makes it easier to install the file.
  2. Open command prompt and open the folder where you have kept the file by entering

cd c:\python 3.7

3. Now, enter the command written below

>py -3.7(version name) -m pip install (file name).whl

  4. Press Enter, and make sure you enter the version you are currently using with the correct file name.

  5. Once you press Enter, wait a few minutes; the file will be installed and you will be able to import the particular module.

  6. In order to check whether the module was installed successfully, import the module in IDLE and check it.

Thank you:)


回答 14

下载包(.whl)。

将文件放在python目录的Scripts文件夹中

C:\Python36\Scripts

使用命令提示符安装软件包。

C:\Python36\Scripts>pip install package_name.whl

Download the package (.whl).

Put the file inside the Scripts folder of your Python directory

C:\Python36\Scripts

Use the command prompt to install the package.

C:\Python36\Scripts>pip install package_name.whl

回答 15

在MacOS上(pip通过MacPorts安装到MacPorts python2.7中),我不得不使用@Dunes解决方案:

sudo python -m pip install some-package.whl

在我的情况下，其中的python被MacPorts的python取代，对我来说就是python2.7或python3.5。

-m根据联机帮助页选项为“将库模块作为脚本运行”。

（我之前先运行过 sudo port install py27-pip py27-wheel，把 pip 和 wheel 安装到我的 python 2.7 环境中。）

On the MacOS, with pip installed via MacPorts into the MacPorts python2.7, I had to use @Dunes solution:

sudo python -m pip install some-package.whl

Where python was replaced by the MacPorts python in my case, which is python2.7 or python3.5 for me.

The -m option is “Run library module as script” according to the manpage.

(I had previously run sudo port install py27-pip py27-wheel to install pip and wheel into my python 2.7 installation first.)


向pandas DataFrame添加一行

问题:向pandas DataFrame添加一行

我知道pandas旨在加载完全填充好的DataFrame，但是我需要先创建一个空的DataFrame，然后逐行添加行。做这个的最好方式是什么？

我成功创建了一个空的DataFrame:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

然后，我可以添加新行并填充字段：

res = res.set_value(len(res), 'qty1', 10.0)

它有效,但看起来很奇怪:-/(添加字符串值失败)

如何将新行添加到DataFrame(具有不同的列类型)?

I understand that pandas is designed to load fully populated DataFrame but I need to create an empty DataFrame then add rows, one by one. What is the best way to do this ?

I successfully created an empty DataFrame with :

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field with :

res = res.set_value(len(res), 'qty1', 10.0)

It works but seems very odd :-/ (it fails for adding string value)

How can I add a new row to my DataFrame (with different columns type) ?


回答 0

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
>>>     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
>>>     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6

回答 1

如果可以预先获取该数据帧的所有数据,则有一种比附加到数据帧快得多的方法:

  1. 创建一个词典列表,其中每个词典对应于一个输入数据行。
  2. 从此列表创建一个数据框。

我有一个类似的任务,需要花30分钟的时间逐行附加到数据框,然后根据在几秒钟内完成的词典列表创建数据框。

rows_list = []
for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..) 

        rows_list.append(dict1)

df = pd.DataFrame(rows_list)               

In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:

  1. Create a list of dictionaries in which each dictionary corresponds to an input data row.
  2. Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..) 

        rows_list.append(dict1)

df = pd.DataFrame(rows_list)               

回答 2

您可以使用pandas.concat()DataFrame.append()。有关详细信息和示例,请参见合并,联接和连接

You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
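A minimal sketch of the pandas.concat() route mentioned above, assuming the lib/qty1/qty2 columns from the question:

import pandas as pd

res = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])

# wrap the new row in a one-row DataFrame, then concatenate
new_row = pd.DataFrame([{'lib': 'name0', 'qty1': 10.0, 'qty2': 'x'}])
res = pd.concat([res, new_row], ignore_index=True)

print(res)  # the new row is appended with index 0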


回答 3

已经很长时间了,但是我也面临着同样的问题。并在这里找到了很多有趣的答案。所以我很困惑使用什么方法。

在向数据帧添加很多行的情况下,我对速度性能感兴趣。因此,我尝试了4种最流行的方法并检查了它们的速度。

2019年更新：使用了新版本的软件包。另外也根据@FooBar的评论做了更新。

速度表现

  1. 使用.append（NPE的答案）
  2. 使用.loc（fred的答案）
  3. 使用.loc进行预分配（FooBar的答案）
  4. 最后使用dict并创建DataFrame（ShikharDua的答案）

结果(以秒为单位):

|------------|-------------|-------------|-------------|
|  Approach  |  1000 rows  |  5000 rows  | 10 000 rows |
|------------|-------------|-------------|-------------|
| .append    |    0.69     |    3.39     |    6.78     |
|------------|-------------|-------------|-------------|
| .loc w/o   |    0.74     |    3.90     |    8.35     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
| .loc with  |    0.24     |    2.58     |    8.70     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
|  dict      |    0.012    |   0.046     |   0.084     |
|------------|-------------|-------------|-------------|

也感谢@krassowski的有用评论-我更新了代码。

所以我自己采用了通过字典添加的方式。


代码：

import pandas as pd
import numpy as np
import time

del df1, df2, df3, df4
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

P.S. 我相信我的实现并不完美，也许还可以再做一些优化。

It’s been a long time, but I faced the same problem too. And found here a lot of interesting answers. So I was confused what method to use.

In the case of adding a lot of rows to dataframe I interested in speed performance. So I tried 4 most popular methods and checked their speed.

UPDATED IN 2019 using new versions of packages. Also updated after @FooBar comment

SPEED PERFORMANCE

  1. Using .append (NPE’s answer)
  2. Using .loc (fred’s answer)
  3. Using .loc with preallocating (FooBar’s answer)
  4. Using dict and create DataFrame in the end (ShikharDua’s answer)

Results (in secs):

|------------|-------------|-------------|-------------|
|  Approach  |  1000 rows  |  5000 rows  | 10 000 rows |
|------------|-------------|-------------|-------------|
| .append    |    0.69     |    3.39     |    6.78     |
|------------|-------------|-------------|-------------|
| .loc w/o   |    0.74     |    3.90     |    8.35     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
| .loc with  |    0.24     |    2.58     |    8.70     |
| prealloc   |             |             |             |
|------------|-------------|-------------|-------------|
|  dict      |    0.012    |   0.046     |   0.084     |
|------------|-------------|-------------|-------------|

Also thanks to @krassowski for useful comment – I updated the code.

So I use addition through the dictionary for myself.


Code:

import pandas as pd
import numpy as np
import time

del df1, df2, df3, df4
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

P.S. I believe, my realization isn’t perfect, and maybe there is some optimization.


回答 4

如果事先知道条目数,则应该通过提供索引来预分配空间(从另一个答案中获取数据示例):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]
In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

速度比较

In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop

而且-从注释中看-大小为6000,速度差变得更大:

增加数组(12)的大小和行数(500)会使速度差异更加明显:313ms vs 2.29s

If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]
In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop

And – as from the comments – with a size of 6000, the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s


回答 5

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    df.loc[len(df)] = row
mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    df.loc[len(df)] = row

回答 6

为了高效地追加行，请参见“如何向pandas数据框添加额外的行”和“Setting With Enlargement”。

通过loc/ix在不存在的键索引上添加行即可。例如：

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

要么:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....: 

In [2]: dfi
Out[2]: 
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

For efficient appending see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc/ix on non existing key index data. e.g. :

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....: 

In [2]: dfi
Out[2]: 
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

回答 7

您可以使用ignore_index选项，将单行作为字典附加。

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

You can append a single row as a dictionary using the ignore_index option.

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

回答 8

为了符合Python风格，在这里补充我的答案：

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

For the sake of Pythonic way, here add my answer:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

回答 9

您还可以建立列表列表,并将其转换为数据框-

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

结果为：

    i   double  square
0   0   0   0
1   1   2   1
2   2   4   4
3   3   6   9
4   4   8   16
5   5   10  25

You can also build up a list of lists and convert it to a dataframe –

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

giving

    i   double  square
0   0   0   0
1   1   2   1
2   2   4   4
3   3   6   9
4   4   8   16
5   5   10  25

回答 10

这不是对OP问题的答案,而是一个玩具示例,用于说明@ShikharDua的答案,在上面我发现它非常有用。

尽管这个片段微不足道，但在实际数据中我有上千行和许多列，我希望能按不同的列分组，然后对不止一个目标列执行下面的统计。因此，有一种可靠的、一次构建一行数据帧的方法非常方便。谢谢@ShikharDua！

import pandas as pd 

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list=[]
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd={} #initialise an empty dict 
    RecordtoAdd.update({'Customer' : name}) #
    RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})      
    RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})                   

    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

This is not an answer to the OP question but a toy example to illustrate the answer of @ShikharDua above which I found very useful.

While this fragment is trivial, in the actual data I had 1,000s of rows and many columns, and I wished to be able to group by different columns and then perform the stats below for more than one target column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you @ShikharDua!

import pandas as pd 

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list=[]
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd={} #initialise an empty dict 
    RecordtoAdd.update({'Customer' : name}) #
    RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})      
    RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})                   

    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

回答 11

想出了一种简单而又不错的方法:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

回答 12

您可以使用生成器对象来创建DataFrame，与使用列表相比，这样做的内存效率更高。

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

要向现有DataFrame中添加一行，可以使用append方法。

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

You can use generator object to create Dataframe, which will be more memory efficient over the list.

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

To add a row to an existing DataFrame, you can use the append method.

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

回答 13

创建一个新记录（数据框）并添加到old_data_frame中。
传入值列表和相应的列名来创建new_record（data_frame）：

new_record = pd.DataFrame([[0,'abcd',0,1,123]],columns=['a','b','c','d','e'])

old_data_frame = pd.concat([old_data_frame,new_record])

Create a new record(data frame) and add to old_data_frame.
pass list of values and corresponding column names to create a new_record (data_frame)

new_record = pd.DataFrame([[0,'abcd',0,1,123]],columns=['a','b','c','d','e'])

old_data_frame = pd.concat([old_data_frame,new_record])

回答 14

这是在pandas DataFrame中添加/追加一行的方法：

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1  
    return df.sort_index()

add_row(df, [1,2,3]) 

它可以用于在空的或已填充的pandas DataFrame中插入/追加一行。

Here is the way to add/append a row in pandas DataFrame

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1  
    return df.sort_index()

add_row(df, [1,2,3]) 

It can be used to insert/append a row in empty or populated pandas DataFrame


回答 15

除了ShikharDua答案中的字典列表之外，我们还可以把表表示为列表的字典（dictionary of lists），其中每个列表按行顺序存储一列，前提是我们事先知道有哪些列。最后，我们只构造一次DataFrame。

对于c列和n行的数据，这种方法使用1个字典和c个列表，而字典列表方法则使用1个列表和n个字典。字典列表方法让每个字典都存储所有的键，并且需要为每一行创建一个新字典。这里我们只是向列表追加元素，追加是常数时间的操作，理论上非常快。

# current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# at the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black

Instead of a list of dictionaries as in ShikharDua’s answer, we can also represent our table as a dictionary of lists, where each list stores one column in row-order, given we know our columns beforehand. At the end we construct our DataFrame once.

For c columns and n rows, this uses 1 dictionary and c lists, versus 1 list and n dictionaries. The list of dictionaries method has each dictionary storing all keys and requires creating a new dictionary for every row. Here we only append to lists, which is constant time and theoretically very fast.

# current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# at the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black

回答 16

如果要在末尾添加一行，请把它作为列表追加：

valuestoappend = [va1,val2,val3]
res = res.append(pd.Series(valuestoappend,index = ['lib', 'qty1', 'qty2']),ignore_index = True)

if you want to add row at the end append it as a list

valuestoappend = [va1,val2,val3]
res = res.append(pd.Series(valuestoappend,index = ['lib', 'qty1', 'qty2']),ignore_index = True)

回答 17

另一种方法（性能可能不太好）：

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

您还可以像这样增强DataFrame类:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

回答 18

简单点。通过将列表作为输入,将其添加为数据帧中的行:

import pandas as pd  
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))  
for i in range(5):  
    res_list = list(map(int, input().split()))  
    res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)

Make it simple. By taking list as input which will be appended as row in data-frame:-

import pandas as pd  
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))  
for i in range(5):  
    res_list = list(map(int, input().split()))  
    res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)

回答 19

您需要的是loc[df.shape[0]]loc[len(df)]


# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False] 

要么

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False] 

All you need is loc[df.shape[0]] or loc[len(df)]


# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False] 

or

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False] 

回答 20

我们经常看到用df.loc[subscript] = …这种写法给DataFrame的一行赋值。Mikhail_Sam发布了基准测试，其中包含这种写法，以及先使用dict、最后再创建DataFrame的方法。他发现后者是目前最快的。但是，如果把他代码中（使用预分配DataFrame的）df3.loc[i] = …替换为df3.values[i] = …，结果会发生显著变化：该方法的性能与使用dict的方法相近。因此，我们应该更多地考虑使用df.values[subscript] = …。不过请注意，.values采用从零开始的下标，它可能与DataFrame.index不同。

We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and create DataFrame in the end. He found the latter to be the fastest by far. But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly, in that that method performs similar to the one using dict. So we should more often take the use of df.values[subscript] = … into consideration. However note that .values takes a zero-based subscript, which may be different from the DataFrame.index.
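A minimal sketch of this idea, assuming a homogeneous, preallocated frame so that .values exposes the underlying block as a view rather than a copy (verify this on your pandas version):

import numpy as np
import pandas as pd

numOfRows = 1000
# preallocate a single-dtype (float64) DataFrame
df = pd.DataFrame(np.zeros((numOfRows, 5)), columns=['A', 'B', 'C', 'D', 'E'])

vals = df.values  # for a homogeneous frame this is typically a view on the data
for i in range(numOfRows):
    vals[i] = np.random.randint(100, size=5)  # positional, zero-based assignment

print(df.head())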


回答 21

pandas.DataFrame.append

DataFrame.append(自身,其他,ignore_index = False,verify_integrity = False,sort = False)→’DataFrame’

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

将ignore_index设置为True:

df.append(df2, ignore_index=True)

pandas.DataFrame.append

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) → ‘DataFrame’

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

With ignore_index set to True:

df.append(df2, ignore_index=True)

回答 22

在添加行之前，我们必须先把数据帧转换为字典。可以看到，字典的键就是数据帧中的列，每一列的值又存储在一个字典中，而这个子字典的键则是数据帧中的索引号。基于这个想法，我写了下面的代码。

df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"] #this is total row that we are going to add
i=0
for x in df.columns:   #here df.columns gives us the main dictionary key
    df2[x][101]=values[i]   #here the 101 is our index number it is also key of sub dictionary
    i+=1

before going to add a row, we have to convert the dataframe to dictionary there you can see the keys as columns in dataframe and values of the columns are again stored in the dictionary but there key for every column is the index number in dataframe. That idea make me to write the below code.

df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"] #this is total row that we are going to add
i=0
for x in df.columns:   #here df.columns gives us the main dictionary key
    df2[x][101]=values[i]   #here the 101 is our index number it is also key of sub dictionary
    i+=1

回答 23

您可以为此连接（concat）两个DataFrame。我遇到的问题基本上就是要把新行添加到一个使用字符索引（非数字）的现有DataFrame中。因此，我把新行的数据放进一个dict()，把索引放进一个列表。

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])

You can concatenate two DataFrames for this. I basically came across this problem to add a new row to an existing DataFrame with a character index (not numeric). So, I input the data for a new row in a dict() and the index in a list.

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])

回答 24

这可以处理向空DataFrame中添加条目的情况。问题在于，对于第一个索引有df.index.max() == nan：

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])

df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

This will take care of adding an item to an empty DataFrame. The issue is that df.index.max() == nan for the first index:

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])

df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

获取字典中具有最大值的键?

问题:获取字典中具有最大值的键?

我有一个dictionary:键是字符串,值是整数。

例:

stats = {'a':1000, 'b':3000, 'c': 100}

我希望得到'b'作为答案，因为它是值最大的键。

我使用带有反向键值元组的中间列表进行了以下操作:

inverse = [(value, key) for key, value in stats.items()]
print max(inverse)[1]

那是更好(或更优雅)的方法吗?

I have a dictionary: keys are strings, values are integers.

Example:

stats = {'a':1000, 'b':3000, 'c': 100}

I’d like to get 'b' as an answer, since it’s the key with a higher value.

I did the following, using an intermediate list with reversed key-value tuples:

inverse = [(value, key) for key, value in stats.items()]
print max(inverse)[1]

Is that one the better (or even more elegant) approach?


回答 0

您可以使用operator.itemgetter

import operator
stats = {'a':1000, 'b':3000, 'c': 100}
max(stats.iteritems(), key=operator.itemgetter(1))[0]

并且用stats.iteritems()来代替在内存中构建一个新列表。max()函数的key参数是一个函数，它计算出用来决定如何对各项排序的键。

请注意，如果还存在另一个键值对'd': 3000，那么即使两者的值都是最大值，此方法也只会返回其中一个。

>>> import operator
>>> stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}
>>> max(stats.iteritems(), key=operator.itemgetter(1))[0]
'b' 

如果使用Python3:

>>> max(stats.items(), key=operator.itemgetter(1))[0]
'b'

You can use operator.itemgetter for that:

import operator
stats = {'a':1000, 'b':3000, 'c': 100}
max(stats.iteritems(), key=operator.itemgetter(1))[0]

And instead of building a new list in memory use stats.iteritems(). The key parameter to the max() function is a function that computes a key that is used to determine how to rank items.

Please note that if you were to have another key-value pair ‘d’: 3000 that this method will only return one of the two even though they both have the maximum value.

>>> import operator
>>> stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}
>>> max(stats.iteritems(), key=operator.itemgetter(1))[0]
'b' 

If using Python3:

>>> max(stats.items(), key=operator.itemgetter(1))[0]
'b'

回答 1

max(stats, key=stats.get)
max(stats, key=stats.get)
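Applied to the dictionary from the question, a trivial usage sketch:

stats = {'a': 1000, 'b': 3000, 'c': 100}
print(max(stats, key=stats.get))  # -> 'b'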

回答 2

我测试了许多变体，下面是返回具有最大值的字典键的最快方法：

def keywithmaxval(d):
     """ a) create a list of the dict's keys and values; 
         b) return the key with the max value"""  
     v=list(d.values())
     k=list(d.keys())
     return k[v.index(max(v))]

为了给您一个想法,以下是一些候选方法:

def f1():  
     v=list(d1.values())
     k=list(d1.keys())
     return k[v.index(max(v))]

def f2():
    d3={v:k for k,v in d1.items()}
    return d3[max(d3)]

def f3():
    return list(filter(lambda t: t[1]==max(d1.values()), d1.items()))[0][0]    

def f3b():
    # same as f3 but remove the call to max from the lambda
    m=max(d1.values())
    return list(filter(lambda t: t[1]==m, d1.items()))[0][0]        

def f4():
    return [k for k,v in d1.items() if v==max(d1.values())][0]    

def f4b():
    # same as f4 but remove the max from the comprehension
    m=max(d1.values())
    return [k for k,v in d1.items() if v==m][0]        

def f5():
    return max(d1.items(), key=operator.itemgetter(1))[0]    

def f6():
    return max(d1,key=d1.get)     

def f7():
     """ a) create a list of the dict's keys and values; 
         b) return the key with the max value"""    
     v=list(d1.values())
     return list(d1.keys())[v.index(max(v))]    

def f8():
     return max(d1, key=lambda k: d1[k])     

tl=[f1,f2, f3b, f4b, f5, f6, f7, f8, f4,f3]     
cmpthese.cmpthese(tl,c=100) 

测试字典:

d1={1: 1, 2: 2, 3: 8, 4: 3, 5: 6, 6: 9, 7: 17, 8: 4, 9: 20, 10: 7, 11: 15, 
    12: 10, 13: 10, 14: 18, 15: 18, 16: 5, 17: 13, 18: 21, 19: 21, 20: 8, 
    21: 8, 22: 16, 23: 16, 24: 11, 25: 24, 26: 11, 27: 112, 28: 19, 29: 19, 
    30: 19, 3077: 36, 32: 6, 33: 27, 34: 14, 35: 14, 36: 22, 4102: 39, 38: 22, 
    39: 35, 40: 9, 41: 110, 42: 9, 43: 30, 44: 17, 45: 17, 46: 17, 47: 105, 48: 12, 
    49: 25, 50: 25, 51: 25, 52: 12, 53: 12, 54: 113, 1079: 50, 56: 20, 57: 33, 
    58: 20, 59: 33, 60: 20, 61: 20, 62: 108, 63: 108, 64: 7, 65: 28, 66: 28, 67: 28, 
    68: 15, 69: 15, 70: 15, 71: 103, 72: 23, 73: 116, 74: 23, 75: 15, 76: 23, 77: 23, 
    78: 36, 79: 36, 80: 10, 81: 23, 82: 111, 83: 111, 84: 10, 85: 10, 86: 31, 87: 31, 
    88: 18, 89: 31, 90: 18, 91: 93, 92: 18, 93: 18, 94: 106, 95: 106, 96: 13, 9232: 35, 
    98: 26, 99: 26, 100: 26, 101: 26, 103: 88, 104: 13, 106: 13, 107: 101, 1132: 63, 
    2158: 51, 112: 21, 113: 13, 116: 21, 118: 34, 119: 34, 7288: 45, 121: 96, 122: 21, 
    124: 109, 125: 109, 128: 8, 1154: 32, 131: 29, 134: 29, 136: 16, 137: 91, 140: 16, 
    142: 104, 143: 104, 146: 117, 148: 24, 149: 24, 152: 24, 154: 24, 155: 86, 160: 11, 
    161: 99, 1186: 76, 3238: 49, 167: 68, 170: 11, 172: 32, 175: 81, 178: 32, 179: 32, 
    182: 94, 184: 19, 31: 107, 188: 107, 190: 107, 196: 27, 197: 27, 202: 27, 206: 89, 
    208: 14, 214: 102, 215: 102, 220: 115, 37: 22, 224: 22, 226: 14, 232: 22, 233: 84, 
    238: 35, 242: 97, 244: 22, 250: 110, 251: 66, 1276: 58, 256: 9, 2308: 33, 262: 30, 
    263: 79, 268: 30, 269: 30, 274: 92, 1300: 27, 280: 17, 283: 61, 286: 105, 292: 118, 
    296: 25, 298: 25, 304: 25, 310: 87, 1336: 71, 319: 56, 322: 100, 323: 100, 325: 25, 
    55: 113, 334: 69, 340: 12, 1367: 40, 350: 82, 358: 33, 364: 95, 376: 108, 
    377: 64, 2429: 46, 394: 28, 395: 77, 404: 28, 412: 90, 1438: 53, 425: 59, 430: 103, 
    1456: 97, 433: 28, 445: 72, 448: 23, 466: 85, 479: 54, 484: 98, 485: 98, 488: 23, 
    6154: 37, 502: 67, 4616: 34, 526: 80, 538: 31, 566: 62, 3644: 44, 577: 31, 97: 119, 
    592: 26, 593: 75, 1619: 48, 638: 57, 646: 101, 650: 26, 110: 114, 668: 70, 2734: 41, 
    700: 83, 1732: 30, 719: 52, 728: 96, 754: 65, 1780: 74, 4858: 47, 130: 29, 790: 78, 
    1822: 43, 2051: 38, 808: 29, 850: 60, 866: 29, 890: 73, 911: 42, 958: 55, 970: 99, 
    976: 24, 166: 112}

以及在Python 3.2下的测试结果:

    rate/sec       f4      f3    f3b     f8     f5     f2    f4b     f6     f7     f1
f4       454       --   -2.5% -96.9% -97.5% -98.6% -98.6% -98.7% -98.7% -98.9% -99.0%
f3       466     2.6%      -- -96.8% -97.4% -98.6% -98.6% -98.6% -98.7% -98.9% -99.0%
f3b   14,715  3138.9% 3057.4%     -- -18.6% -55.5% -56.0% -56.4% -58.3% -63.8% -68.4%
f8    18,070  3877.3% 3777.3%  22.8%     -- -45.4% -45.9% -46.5% -48.8% -55.5% -61.2%
f5    33,091  7183.7% 7000.5% 124.9%  83.1%     --  -1.0%  -2.0%  -6.3% -18.6% -29.0%
f2    33,423  7256.8% 7071.8% 127.1%  85.0%   1.0%     --  -1.0%  -5.3% -17.7% -28.3%
f4b   33,762  7331.4% 7144.6% 129.4%  86.8%   2.0%   1.0%     --  -4.4% -16.9% -27.5%
f6    35,300  7669.8% 7474.4% 139.9%  95.4%   6.7%   5.6%   4.6%     -- -13.1% -24.2%
f7    40,631  8843.2% 8618.3% 176.1% 124.9%  22.8%  21.6%  20.3%  15.1%     -- -12.8%
f1    46,598 10156.7% 9898.8% 216.7% 157.9%  40.8%  39.4%  38.0%  32.0%  14.7%     --

在Python 2.7下:

    rate/sec       f3       f4     f8    f3b     f6     f5     f2    f4b     f7     f1
f3       384       --    -2.6% -97.1% -97.2% -97.9% -97.9% -98.0% -98.2% -98.5% -99.2%
f4       394     2.6%       -- -97.0% -97.2% -97.8% -97.9% -98.0% -98.1% -98.5% -99.1%
f8    13,079  3303.3%  3216.1%     --  -5.6% -28.6% -29.9% -32.8% -38.3% -49.7% -71.2%
f3b   13,852  3504.5%  3412.1%   5.9%     -- -24.4% -25.8% -28.9% -34.6% -46.7% -69.5%
f6    18,325  4668.4%  4546.2%  40.1%  32.3%     --  -1.8%  -5.9% -13.5% -29.5% -59.6%
f5    18,664  4756.5%  4632.0%  42.7%  34.7%   1.8%     --  -4.1% -11.9% -28.2% -58.8%
f2    19,470  4966.4%  4836.5%  48.9%  40.6%   6.2%   4.3%     --  -8.1% -25.1% -57.1%
f4b   21,187  5413.0%  5271.7%  62.0%  52.9%  15.6%  13.5%   8.8%     -- -18.5% -53.3%
f7    26,002  6665.8%  6492.4%  98.8%  87.7%  41.9%  39.3%  33.5%  22.7%     -- -42.7%
f1    45,354 11701.5% 11399.0% 246.8% 227.4% 147.5% 143.0% 132.9% 114.1%  74.4%     -- 

您可以看到，在Python 3.2和2.7下f1是最快的（更完整的版本即本文顶部的keywithmaxval）。

I have tested MANY variants, and this is the fastest way to return the key of dict with the max value:

def keywithmaxval(d):
     """ a) create a list of the dict's keys and values; 
         b) return the key with the max value"""  
     v=list(d.values())
     k=list(d.keys())
     return k[v.index(max(v))]

To give you an idea, here are some candidate methods:

def f1():  
     v=list(d1.values())
     k=list(d1.keys())
     return k[v.index(max(v))]

def f2():
    d3={v:k for k,v in d1.items()}
    return d3[max(d3)]

def f3():
    return list(filter(lambda t: t[1]==max(d1.values()), d1.items()))[0][0]    

def f3b():
    # same as f3 but remove the call to max from the lambda
    m=max(d1.values())
    return list(filter(lambda t: t[1]==m, d1.items()))[0][0]        

def f4():
    return [k for k,v in d1.items() if v==max(d1.values())][0]    

def f4b():
    # same as f4 but remove the max from the comprehension
    m=max(d1.values())
    return [k for k,v in d1.items() if v==m][0]        

def f5():
    return max(d1.items(), key=operator.itemgetter(1))[0]    

def f6():
    return max(d1,key=d1.get)     

def f7():
     """ a) create a list of the dict's keys and values; 
         b) return the key with the max value"""    
     v=list(d1.values())
     return list(d1.keys())[v.index(max(v))]    

def f8():
     return max(d1, key=lambda k: d1[k])     

tl=[f1,f2, f3b, f4b, f5, f6, f7, f8, f4,f3]     
cmpthese.cmpthese(tl,c=100) 

The test dictionary:

d1={1: 1, 2: 2, 3: 8, 4: 3, 5: 6, 6: 9, 7: 17, 8: 4, 9: 20, 10: 7, 11: 15, 
    12: 10, 13: 10, 14: 18, 15: 18, 16: 5, 17: 13, 18: 21, 19: 21, 20: 8, 
    21: 8, 22: 16, 23: 16, 24: 11, 25: 24, 26: 11, 27: 112, 28: 19, 29: 19, 
    30: 19, 3077: 36, 32: 6, 33: 27, 34: 14, 35: 14, 36: 22, 4102: 39, 38: 22, 
    39: 35, 40: 9, 41: 110, 42: 9, 43: 30, 44: 17, 45: 17, 46: 17, 47: 105, 48: 12, 
    49: 25, 50: 25, 51: 25, 52: 12, 53: 12, 54: 113, 1079: 50, 56: 20, 57: 33, 
    58: 20, 59: 33, 60: 20, 61: 20, 62: 108, 63: 108, 64: 7, 65: 28, 66: 28, 67: 28, 
    68: 15, 69: 15, 70: 15, 71: 103, 72: 23, 73: 116, 74: 23, 75: 15, 76: 23, 77: 23, 
    78: 36, 79: 36, 80: 10, 81: 23, 82: 111, 83: 111, 84: 10, 85: 10, 86: 31, 87: 31, 
    88: 18, 89: 31, 90: 18, 91: 93, 92: 18, 93: 18, 94: 106, 95: 106, 96: 13, 9232: 35, 
    98: 26, 99: 26, 100: 26, 101: 26, 103: 88, 104: 13, 106: 13, 107: 101, 1132: 63, 
    2158: 51, 112: 21, 113: 13, 116: 21, 118: 34, 119: 34, 7288: 45, 121: 96, 122: 21, 
    124: 109, 125: 109, 128: 8, 1154: 32, 131: 29, 134: 29, 136: 16, 137: 91, 140: 16, 
    142: 104, 143: 104, 146: 117, 148: 24, 149: 24, 152: 24, 154: 24, 155: 86, 160: 11, 
    161: 99, 1186: 76, 3238: 49, 167: 68, 170: 11, 172: 32, 175: 81, 178: 32, 179: 32, 
    182: 94, 184: 19, 31: 107, 188: 107, 190: 107, 196: 27, 197: 27, 202: 27, 206: 89, 
    208: 14, 214: 102, 215: 102, 220: 115, 37: 22, 224: 22, 226: 14, 232: 22, 233: 84, 
    238: 35, 242: 97, 244: 22, 250: 110, 251: 66, 1276: 58, 256: 9, 2308: 33, 262: 30, 
    263: 79, 268: 30, 269: 30, 274: 92, 1300: 27, 280: 17, 283: 61, 286: 105, 292: 118, 
    296: 25, 298: 25, 304: 25, 310: 87, 1336: 71, 319: 56, 322: 100, 323: 100, 325: 25, 
    55: 113, 334: 69, 340: 12, 1367: 40, 350: 82, 358: 33, 364: 95, 376: 108, 
    377: 64, 2429: 46, 394: 28, 395: 77, 404: 28, 412: 90, 1438: 53, 425: 59, 430: 103, 
    1456: 97, 433: 28, 445: 72, 448: 23, 466: 85, 479: 54, 484: 98, 485: 98, 488: 23, 
    6154: 37, 502: 67, 4616: 34, 526: 80, 538: 31, 566: 62, 3644: 44, 577: 31, 97: 119, 
    592: 26, 593: 75, 1619: 48, 638: 57, 646: 101, 650: 26, 110: 114, 668: 70, 2734: 41, 
    700: 83, 1732: 30, 719: 52, 728: 96, 754: 65, 1780: 74, 4858: 47, 130: 29, 790: 78, 
    1822: 43, 2051: 38, 808: 29, 850: 60, 866: 29, 890: 73, 911: 42, 958: 55, 970: 99, 
    976: 24, 166: 112}

And the test results under Python 3.2:

    rate/sec       f4      f3    f3b     f8     f5     f2    f4b     f6     f7     f1
f4       454       --   -2.5% -96.9% -97.5% -98.6% -98.6% -98.7% -98.7% -98.9% -99.0%
f3       466     2.6%      -- -96.8% -97.4% -98.6% -98.6% -98.6% -98.7% -98.9% -99.0%
f3b   14,715  3138.9% 3057.4%     -- -18.6% -55.5% -56.0% -56.4% -58.3% -63.8% -68.4%
f8    18,070  3877.3% 3777.3%  22.8%     -- -45.4% -45.9% -46.5% -48.8% -55.5% -61.2%
f5    33,091  7183.7% 7000.5% 124.9%  83.1%     --  -1.0%  -2.0%  -6.3% -18.6% -29.0%
f2    33,423  7256.8% 7071.8% 127.1%  85.0%   1.0%     --  -1.0%  -5.3% -17.7% -28.3%
f4b   33,762  7331.4% 7144.6% 129.4%  86.8%   2.0%   1.0%     --  -4.4% -16.9% -27.5%
f6    35,300  7669.8% 7474.4% 139.9%  95.4%   6.7%   5.6%   4.6%     -- -13.1% -24.2%
f7    40,631  8843.2% 8618.3% 176.1% 124.9%  22.8%  21.6%  20.3%  15.1%     -- -12.8%
f1    46,598 10156.7% 9898.8% 216.7% 157.9%  40.8%  39.4%  38.0%  32.0%  14.7%     --

And under Python 2.7:

    rate/sec       f3       f4     f8    f3b     f6     f5     f2    f4b     f7     f1
f3       384       --    -2.6% -97.1% -97.2% -97.9% -97.9% -98.0% -98.2% -98.5% -99.2%
f4       394     2.6%       -- -97.0% -97.2% -97.8% -97.9% -98.0% -98.1% -98.5% -99.1%
f8    13,079  3303.3%  3216.1%     --  -5.6% -28.6% -29.9% -32.8% -38.3% -49.7% -71.2%
f3b   13,852  3504.5%  3412.1%   5.9%     -- -24.4% -25.8% -28.9% -34.6% -46.7% -69.5%
f6    18,325  4668.4%  4546.2%  40.1%  32.3%     --  -1.8%  -5.9% -13.5% -29.5% -59.6%
f5    18,664  4756.5%  4632.0%  42.7%  34.7%   1.8%     --  -4.1% -11.9% -28.2% -58.8%
f2    19,470  4966.4%  4836.5%  48.9%  40.6%   6.2%   4.3%     --  -8.1% -25.1% -57.1%
f4b   21,187  5413.0%  5271.7%  62.0%  52.9%  15.6%  13.5%   8.8%     -- -18.5% -53.3%
f7    26,002  6665.8%  6492.4%  98.8%  87.7%  41.9%  39.3%  33.5%  22.7%     -- -42.7%
f1    45,354 11701.5% 11399.0% 246.8% 227.4% 147.5% 143.0% 132.9% 114.1%  74.4%     -- 

You can see that f1 is the fastest under Python 3.2 and 2.7 (or, more completely, keywithmaxval at the top of this post)


回答 3

如果你只需要知道具有最大值的键，可以不用iterkeys或iteritems，因为在Python中遍历字典就是遍历它的键。

max_key = max(stats, key=lambda k: stats[k])

编辑:

来自@user1274878的评论：

我是python的新手。您能分步解释您的答案吗?

是的

max

max(iterable[, key])

max(arg1, arg2, *args[, key])

返回可迭代的最大项或两个或多个参数中的最大项。

可选key参数描述如何比较元素以获得最大的元素:

lambda <item>: return <a result of operation with item> 

返回的值将被比较。

辞典

Python的dict是一个哈希表。dict的键是被声明为键的对象的哈希值。出于性能原因，对dict的迭代被实现为对其键的迭代。

因此，我们可以利用这一点，省去获取键列表的操作。

闭包

在另一个函数内部定义的函数称为嵌套函数。嵌套函数可以访问外层作用域的变量。

stats变量可以通过lambda函数的__closure__属性访问，它相当于一个指向父作用域中所定义变量值的指针。

If you need to know only a key with the max value, you can do it without iterkeys or iteritems, because iteration through a dictionary in Python is iteration through its keys.

max_key = max(stats, key=lambda k: stats[k])

EDIT:

From comments, @user1274878 :

I am new to python. Can you please explain your answer in steps?

Yep…

max

max(iterable[, key])

max(arg1, arg2, *args[, key])

Return the largest item in an iterable or the largest of two or more arguments.

The optional key argument describes how to compare elements to get maximum among them:

lambda <item>: return <a result of operation with item> 

Returned values will be compared.

Dict

Python dict is a hash table. A key of the dict is a hash of an object declared as a key. For performance reasons, iteration through a dict is implemented as iteration through its keys.

Therefore we can use it to avoid the operation of obtaining a list of keys.

Closure

A function defined inside another function is called a nested function. Nested functions can access variables of the enclosing scope.

The stats variable is available through the __closure__ attribute of the lambda function, as a pointer to the value of the variable defined in the parent scope.


回答 4

例:

stats = {'a':1000, 'b':3000, 'c': 100}

如果您想找到最大值对应的键，下面的写法可能最简单，不需要任何额外的函数：

max(stats, key=stats.get)

输出是具有最大值的键。

Example:

stats = {'a':1000, 'b':3000, 'c': 100}

If you want to find the max value with its key, maybe the following could be simple, without any additional functions.

max(stats, key=stats.get)

the output is the key which has the max value.


回答 5

这是另一个:

stats = {'a':1000, 'b':3000, 'c': 100}
max(stats.iterkeys(), key=lambda k: stats[k])

key函数只返回用于排序的值，max()就会直接返回所需的元素。

Here is another one:

stats = {'a':1000, 'b':3000, 'c': 100}
max(stats.iterkeys(), key=lambda k: stats[k])

The function key simply returns the value that should be used for ranking and max() returns the demanded element right away.


回答 6

key, value = max(stats.iteritems(), key=lambda x:x[1])

如果您不在乎值（我会很惊讶，不过也有可能），您可以这样做：

key, _ = max(stats.iteritems(), key=lambda x:x[1])

与在表达式末尾加[0]下标相比，我更喜欢元组拆包。我一直不太喜欢lambda表达式的可读性，但在我看来它比operator.itemgetter(1)要好。

key, value = max(stats.iteritems(), key=lambda x:x[1])

If you don’t care about value (I’d be surprised, but) you can do:

key, _ = max(stats.iteritems(), key=lambda x:x[1])

I like the tuple unpacking better than a [0] subscript at the end of the expression. I never like the readability of lambda expressions very much, but find this one better than the operator.itemgetter(1) IMHO.


回答 7

考虑到可能有多个条目具有最大值，我会列出所有以最大值为值的键。

>>> stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}
>>> [key for m in [max(stats.values())] for key,val in stats.iteritems() if val == m]
['b', 'd']

这样也会同时得到'b'以及其他所有具有最大值的键。

注意:对于python 3,请使用stats.items()代替stats.iteritems()

Given that more than one entry may have the max value, I would make a list of the keys that have the max value as their value.

>>> stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}
>>> [key for m in [max(stats.values())] for key,val in stats.iteritems() if val == m]
['b', 'd']

This will give you ‘b’ and any other max key as well.

Note: For python 3 use stats.items() instead of stats.iteritems()


回答 8

您可以使用:

max(d, key = d.get) 
# which is equivalent to 
max(d, key = lambda k : d.get(k))

要返回（键，值）对，请使用：

max(d.items(), key = lambda k : k[1])

You can use:

max(d, key = d.get) 
# which is equivalent to 
max(d, key = lambda k : d.get(k))

To return the key, value pair use:

max(d.items(), key = lambda k : k[1])

回答 9

要获得字典stats的最大键/值：

stats = {'a':1000, 'b':3000, 'c': 100}
  • 基于键

>>> max(stats.items(), key = lambda x: x[0]) ('c', 100)

  • 基于值

>>> max(stats.items(), key = lambda x: x[1]) ('b', 3000)

当然，如果只想从结果中获取键或值，则可以使用元组索引。例如，要获取对应于最大值的键：

>>> max(stats.items(), key = lambda x: x[1])[0] 'b'

说明

Python 3中字典的items()方法返回字典的视图（view）对象。当max函数迭代这个视图对象时，它会把字典项以(key, value)元组的形式逐个产出。

>>> list(stats.items()) [('c', 100), ('b', 3000), ('a', 1000)]

使用lambda表达式lambda x: x[1]时，每次迭代中的x就是其中一个(key, value)元组。因此，通过选择正确的索引，您就可以决定是按键比较还是按值比较。

Python 2

对于Python 2.2+版本，同样的代码也能工作。不过出于性能考虑，最好使用字典的iteritems()方法而不是items()。

笔记

To get the maximum key/value of the dictionary stats:

stats = {'a':1000, 'b':3000, 'c': 100}
  • Based on keys

>>> max(stats.items(), key = lambda x: x[0]) ('c', 100)

  • Based on values

>>> max(stats.items(), key = lambda x: x[1]) ('b', 3000)

Of course, if you want to get only the key or value from the result, you can use tuple indexing. For Example, to get the key corresponding to the maximum value:

>>> max(stats.items(), key = lambda x: x[1])[0] 'b'

Explanation

The dictionary method items() in Python 3 returns a view object of the dictionary. When this view object is iterated over, by the max function, it yields the dictionary items as tuples of the form (key, value).

>>> list(stats.items()) [('c', 100), ('b', 3000), ('a', 1000)]

When you use the lambda expression lambda x: x[1], in each iteration, x is one of these tuples (key, value). So, by choosing the right index, you select whether you want to compare by keys or by values.

Python 2

For Python 2.2+ releases, the same code will work. However, it is better to use iteritems() dictionary method instead of items() for performance.

Notes


回答 10

d = {'A': 4,'B':10}

min_v = min(zip(d.values(), d.keys()))
# min_v is (4,'A')

max_v = max(zip(d.values(), d.keys()))
# max_v is (10,'B')
d = {'A': 4,'B':10}

min_v = min(zip(d.values(), d.keys()))
# min_v is (4,'A')

max_v = max(zip(d.values(), d.keys()))
# max_v is (10,'B')

回答 11

按照所选答案评论中给出的迭代写法…

在Python 3中:

max(stats.keys(), key=(lambda k: stats[k]))

在Python 2中:

max(stats.iterkeys(), key=(lambda k: stats[k]))

Per the iterated solutions via comments in the selected answer…

In Python 3:

max(stats.keys(), key=(lambda k: stats[k]))

In Python 2:

max(stats.iterkeys(), key=(lambda k: stats[k]))

回答 12

我来到这里，是想知道如何根据mydict.values()的值来返回mydict.keys()。我想要的不只是返回一个键，而是返回值排名前x的键。

这个解决方案比使用max()函数更简单，而且您可以轻松更改返回结果的数量：

stats = {'a':1000, 'b':3000, 'c': 100}

x = sorted(stats, key=(lambda key:stats[key]), reverse=True)
['b', 'a', 'c']

如果要使用单个最高排名的键,只需使用索引:

x[0]
['b']

如果要使用排名最高的前两个键,请使用列表切片:

x[:2]
['b', 'a']

I got here looking for how to return mydict.keys() based on the value of mydict.values(). Instead of just the one key returned, I was looking to return the top x number of values.

This solution is simpler than using the max() function and you can easily change the number of values returned:

stats = {'a':1000, 'b':3000, 'c': 100}

x = sorted(stats, key=(lambda key:stats[key]), reverse=True)
['b', 'a', 'c']

If you want the single highest ranking key, just use the index:

x[0]
['b']

If you want the top two highest ranking keys, just use list slicing:

x[:2]
['b', 'a']

回答 13

我对这些答案都不满意。max总是选择具有最大值的第一个键。字典可以有多个具有该值的键。

def keys_with_top_values(my_dict):
    return [key  for (key, value) in my_dict.items() if value == max(my_dict.values())]

发布此答案，希望能帮到别人。请参见下面的SO帖子：

如果是平局,Python会选择哪个最大值?

I was not satisfied with any of these answers. max always picks the first key with the max value. The dictionary could have multiple keys with that value.

def keys_with_top_values(my_dict):
    return [key  for (key, value) in my_dict.items() if value == max(my_dict.values())]

Posting this answer in case it helps someone out. See the below SO post

Which maximum does Python pick in the case of a tie?


回答 14

使用collections.Counter，你可以这样做：

>>> import collections
>>> stats = {'a':1000, 'b':3000, 'c': 100}
>>> stats = collections.Counter(stats)
>>> stats.most_common(1)
[('b', 3000)]

如果合适，您也可以直接从一个空的collections.Counter开始，然后往里添加：

>>> stats = collections.Counter()
>>> stats['a'] += 1
:
etc. 

With collections.Counter you could do

>>> import collections
>>> stats = {'a':1000, 'b':3000, 'c': 100}
>>> stats = collections.Counter(stats)
>>> stats.most_common(1)
[('b', 3000)]

If appropriate, you could simply start with an empty collections.Counter and add to it

>>> stats = collections.Counter()
>>> stats['a'] += 1
:
etc. 

回答 15

堆队列是一种通用解决方案,它允许您提取按值排序的前n个键:

from heapq import nlargest

stats = {'a':1000, 'b':3000, 'c': 100}

res1 = nlargest(1, stats, key=stats.__getitem__)  # ['b']
res2 = nlargest(2, stats, key=stats.__getitem__)  # ['b', 'a']

res1_val = next(iter(res1))                       # 'b'

注意，dict.__getitem__是语法糖dict[]所调用的方法。与dict.get不同，如果找不到键它会抛出KeyError，不过这里不会发生这种情况。

A heap queue is a generalised solution which allows you to extract the top n keys ordered by value:

from heapq import nlargest

stats = {'a':1000, 'b':3000, 'c': 100}

res1 = nlargest(1, stats, key=stats.__getitem__)  # ['b']
res2 = nlargest(2, stats, key=stats.__getitem__)  # ['b', 'a']

res1_val = next(iter(res1))                       # 'b'

Note dict.__getitem__ is the method called by the syntactic sugar dict[]. As opposed to dict.get, it will raise KeyError if a key is not found, which here cannot occur.


回答 16

max((value, key) for key, value in stats.items())[1]

max((value, key) for key, value in stats.items())[1]


回答 17

+1 @Aric Coady的最简单的解决方案。
还有一种在字典中随机选择具有最大值的键之一的方法:

stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}

import random
maxV = max(stats.values())
# Choice is one of the keys with max value
choice = random.choice([key for key, value in stats.items() if value == maxV])

+1 to @Aric Coady‘s simplest solution.
And also one way to random select one of keys with max value in the dictionary:

stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000}

import random
maxV = max(stats.values())
# Choice is one of the keys with max value
choice = random.choice([key for key, value in stats.items() if value == maxV])

回答 18

counter = 0
for word in stats.keys():
    if stats[word] > counter:
        counter = stats[word]
print counter
counter = 0
for word in stats.keys():
    if stats[word] > counter:
        counter = stats[word]
print counter

回答 19

怎么样:

 max(zip(stats.keys(), stats.values()), key=lambda t : t[1])[0]

How about:

 max(zip(stats.keys(), stats.values()), key=lambda t : t[1])[0]

回答 20

我用一个非常基础的循环测试了被采纳的答案和@thewolf最快的方案，结果这个循环比两者都快：

import time
import operator


d = {"a"+str(i): i for i in range(1000000)}

def t1(dct):
    mx = float("-inf")
    key = None
    for k,v in dct.items():
        if v > mx:
            mx = v
            key = k
    return key

def t2(dct):
    v=list(dct.values())
    k=list(dct.keys())
    return k[v.index(max(v))]

def t3(dct):
    return max(dct.items(),key=operator.itemgetter(1))[0]

start = time.time()
for i in range(25):
    m = t1(d)
end = time.time()
print ("Iterating: "+str(end-start))

start = time.time()
for i in range(25):
    m = t2(d)
end = time.time()
print ("List creating: "+str(end-start))

start = time.time()
for i in range(25):
    m = t3(d)
end = time.time()
print ("Accepted answer: "+str(end-start))

结果:

Iterating: 3.8201940059661865
List creating: 6.928712844848633
Accepted answer: 5.464320182800293

I tested the accepted answer AND @thewolf’s fastest solution against a very basic loop and the loop was faster than both:

import time
import operator


d = {"a"+str(i): i for i in range(1000000)}

def t1(dct):
    mx = float("-inf")
    key = None
    for k,v in dct.items():
        if v > mx:
            mx = v
            key = k
    return key

def t2(dct):
    v=list(dct.values())
    k=list(dct.keys())
    return k[v.index(max(v))]

def t3(dct):
    return max(dct.items(),key=operator.itemgetter(1))[0]

start = time.time()
for i in range(25):
    m = t1(d)
end = time.time()
print ("Iterating: "+str(end-start))

start = time.time()
for i in range(25):
    m = t2(d)
end = time.time()
print ("List creating: "+str(end-start))

start = time.time()
for i in range(25):
    m = t3(d)
end = time.time()
print ("Accepted answer: "+str(end-start))

results:

Iterating: 3.8201940059661865
List creating: 6.928712844848633
Accepted answer: 5.464320182800293

回答 21

对于科学的python用户,以下是使用Pandas的简单解决方案:

import pandas as pd
stats = {'a': 1000, 'b': 3000, 'c': 100}
series = pd.Series(stats)
series.idxmax()

>>> b

For scientific python users, here is a simple solution using Pandas:

import pandas as pd
stats = {'a': 1000, 'b': 3000, 'c': 100}
series = pd.Series(stats)
series.idxmax()

>>> b

回答 22

如果您有多个具有相同值的键,例如:

stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000, 'e':3000}

您可以获得具有最大值的所有键的集合,如下所示:

from collections import defaultdict
from collections import OrderedDict

groupedByValue = defaultdict(list)
for key, value in sorted(stats.items()):
    groupedByValue[value].append(key)

# {1000: ['a'], 3000: ['b', 'd', 'e'], 100: ['c']}

groupedByValue[max(groupedByValue)]
# ['b', 'd', 'e']

In the case you have more than one key with the same value, for example:

stats = {'a':1000, 'b':3000, 'c': 100, 'd':3000, 'e':3000}

You could get a collection with all the keys with max value as follow:

from collections import defaultdict
from collections import OrderedDict

groupedByValue = defaultdict(list)
for key, value in sorted(stats.items()):
    groupedByValue[value].append(key)

# {1000: ['a'], 3000: ['b', 'd', 'e'], 100: ['c']}

groupedByValue[max(groupedByValue)]
# ['b', 'd', 'e']

回答 23

更简单易懂的方法:

dict = { 'a':302, 'e':53, 'g':302, 'h':100 }
max_value_keys = [key for key in dict.keys() if dict[key] == max(dict.values())]
print(max_value_keys) # prints a list of keys with max value

输出: [‘a’,’g’]

现在您只能选择一个键:

maximum = dict[max_value_keys[0]]

Much simpler to understand approach:

dict = { 'a':302, 'e':53, 'g':302, 'h':100 }
max_value_keys = [key for key in dict.keys() if dict[key] == max(dict.values())]
print(max_value_keys) # prints a list of keys with max value

Output: [‘a’, ‘g’]

Now you can choose only one key:

maximum = dict[max_value_keys[0]]

如何并行地遍历两个列表?

问题:如何并行地遍历两个列表?

我在Python中有两个可迭代的对象,我想成对地遍历它们:

foo = (1, 2, 3)
bar = (4, 5, 6)

for (f, b) in some_iterator(foo, bar):
    print "f: ", f, "; b: ", b

它应导致:

f: 1; b: 4
f: 2; b: 5
f: 3; b: 6

一种方法是遍历索引:

for i in xrange(len(foo)):
    print "f: ", foo[i], "; b: ", b[i]

但这对我来说似乎不太符合Python风格。有更好的方法吗？

I have two iterables in Python, and I want to go over them in pairs:

foo = (1, 2, 3)
bar = (4, 5, 6)

for (f, b) in some_iterator(foo, bar):
    print "f: ", f, "; b: ", b

It should result in:

f: 1; b: 4
f: 2; b: 5
f: 3; b: 6

One way to do it is to iterate over the indices:

for i in xrange(len(foo)):
    print "f: ", foo[i], "; b: ", b[i]

But that seems somewhat unpythonic to me. Is there a better way to do it?


回答 0

Python 3

for f, b in zip(foo, bar):
    print(f, b)

zipfoo或中的较短者bar停止。

Python 3zip 中,像itertools.izip在Python2中一样,返回元组的迭代器。要获取元组列表,请使用list(zip(foo, bar))。要压缩直到两个迭代器都用尽,可以使用 itertools.zip_longest

Python 2

在Python 2中，zip返回一个元组列表。当foo和bar不是很大时这没有问题。如果两者都非常大，那么构造zip(foo,bar)会产生一个不必要的巨大临时变量，应改用itertools.izip或itertools.izip_longest，它们返回迭代器而不是列表。

import itertools
for f,b in itertools.izip(foo,bar):
    print(f,b)
for f,b in itertools.izip_longest(foo,bar):
    print(f,b)

izipfoobar耗尽时停止。 izip_longest当两个停止foobar耗尽。当较短的迭代器用尽时,将izip_longest生成一个元组,None其位置与该迭代器相对应。您还可以设置不同fillvalue之外None,如果你想。看到这里的完整故事


还要注意，zip及其同类函数可以接受任意数量的可迭代对象作为参数。例如，

for num, cheese, color in zip([1,2,3], ['manchego', 'stilton', 'brie'], 
                              ['red', 'blue', 'green']):
    print('{} {} {}'.format(num, color, cheese))

输出：

1 red manchego
2 blue stilton
3 green brie

Python 3

for f, b in zip(foo, bar):
    print(f, b)

zip stops when the shorter of foo or bar stops.

In Python 3, zip returns an iterator of tuples, like itertools.izip in Python2. To get a list of tuples, use list(zip(foo, bar)). And to zip until both iterators are exhausted, you would use itertools.zip_longest.

Python 2

In Python 2, zip returns a list of tuples. This is fine when foo and bar are not massive. If they are both massive then forming zip(foo,bar) is an unnecessarily massive temporary variable, and should be replaced by itertools.izip or itertools.izip_longest, which returns an iterator instead of a list.

import itertools
for f,b in itertools.izip(foo,bar):
    print(f,b)
for f,b in itertools.izip_longest(foo,bar):
    print(f,b)

izip stops when either foo or bar is exhausted. izip_longest stops when both foo and bar are exhausted. When the shorter iterator(s) are exhausted, izip_longest yields a tuple with None in the position corresponding to that iterator. You can also set a different fillvalue besides None if you wish. See here for the full story.


Note also that zip and its zip-like brethren can accept an arbitrary number of iterables as arguments. For example,

for num, cheese, color in zip([1,2,3], ['manchego', 'stilton', 'brie'], 
                              ['red', 'blue', 'green']):
    print('{} {} {}'.format(num, color, cheese))

prints

1 red manchego
2 blue stilton
3 green brie
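A small Python 3 sketch of the zip_longest/fillvalue behaviour described above:

from itertools import zip_longest

foo = (1, 2, 3)
bar = (4, 5)

for f, b in zip_longest(foo, bar, fillvalue=0):
    print(f, b)
# 1 4
# 2 5
# 3 0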

回答 1

您需要的是zip函数。

for (f,b) in zip(foo, bar):
    print "f: ", f ,"; b: ", b

You want the zip function.

for (f,b) in zip(foo, bar):
    print "f: ", f ,"; b: ", b

回答 2

您应该使用zip函数。下面的示例展示了您自己的zip函数可以是什么样子：

def custom_zip(seq1, seq2):
    it1 = iter(seq1)
    it2 = iter(seq2)
    while True:
        yield next(it1), next(it2)

You should use the zip function. Here is an example of how your own zip function could look:

def custom_zip(seq1, seq2):
    it1 = iter(seq1)
    it2 = iter(seq2)
    while True:
        try:
            yield next(it1), next(it2)
        except StopIteration:
            # Python 3.7+ (PEP 479) requires an explicit return here, otherwise RuntimeError is raised
            return
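
A quick usage check of the sketch above (with the try/except needed for PEP 479 on Python 3.7+):

for f, b in custom_zip((1, 2, 3), (4, 5, 6)):
    print("f:", f, "; b:", b)

# f: 1 ; b: 4
# f: 2 ; b: 5
# f: 3 ; b: 6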

回答 3

您可以使用推导式把每个列表的第 n 个元素打包成一个元组,再通过生成器函数逐个产出。

def iterate_multi(*lists):
    for i in range(min(map(len,lists))):
        yield tuple(l[i] for l in lists)

for l1, l2, l3 in iterate_multi([1,2,3],[4,5,6],[7,8,9]):
    print(str(l1)+","+str(l2)+","+str(l3))

You can bundle the nth elements into a tuple or list using comprehension, then pass them out with a generator function.

def iterate_multi(*lists):
    for i in range(min(map(len,lists))):
        yield tuple(l[i] for l in lists)

for l1, l2, l3 in iterate_multi([1,2,3],[4,5,6],[7,8,9]):
    print(str(l1)+","+str(l2)+","+str(l3))

回答 4

万一有人在寻找这样的东西,我发现它非常简单:

list_1 = ["Hello", "World"]
list_2 = [1, 2, 3]

for a,b in [(list_1, list_2)]:
    for element_a in a:
        ...
    for element_b in b:
        ...

>> Hello
World
1
2
3

这些列表会以其全部内容被依次遍历,而 zip() 只会迭代到最短列表的长度。

In case someone is looking for something like this, I found it very simple and easy:

list_1 = ["Hello", "World"]
list_2 = [1, 2, 3]

for a,b in [(list_1, list_2)]:
    for element_a in a:
        ...
    for element_b in b:
        ...

>> Hello
World
1
2
3

The lists will be iterated with their full content, unlike zip() which only iterates up to the minimum content length.
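
If that sequential traversal (all of one list, then all of the next) is really what you are after, itertools.chain expresses it directly; a minimal sketch, not part of the original answer:

from itertools import chain

list_1 = ["Hello", "World"]
list_2 = [1, 2, 3]

# visits every element of list_1, then every element of list_2
for element in chain(list_1, list_2):
    print(element)

# Hello
# World
# 1
# 2
# 3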


回答 5

以下是使用列表理解的方法:

a = (1, 2, 3)
b = (4, 5, 6)
[print('f:', i, '; b', j) for i, j in zip(a, b)]

输出:

f: 1 ; b 4
f: 2 ; b 5
f: 3 ; b 6

Here’s how to do it with list comprehension:

a = (1, 2, 3)
b = (4, 5, 6)
[print('f:', i, '; b', j) for i, j in zip(a, b)]

prints:

f: 1 ; b 4
f: 2 ; b 5
f: 3 ; b 6

在Python 3中将字符串转换为字节的最佳方法?

问题:在Python 3中将字符串转换为字节的最佳方法?

从 TypeError: 'str' does not support the buffer interface 的答案中可以看出,有两种不同的方法可以将字符串转换为字节:

以下哪种方法更好或更Pythonic?还是仅仅是个人喜好问题?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: ‘str’ does not support the buffer interface

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

回答 0

如果您查看 bytes 的文档,它会把您指向 bytearray:

bytearray([source[, encoding[, errors]]])

返回一个新的字节数组。bytearray 类型是一个可变的整数序列,取值范围为 0 <= x < 256。它具有可变序列的大多数常用方法(见 Mutable Sequence Types),以及 bytes 类型的大多数方法,请参见 Bytes and Byte Array Methods。

可选的source参数可以通过几种不同的方式用于初始化数组:

如果是字符串,则还必须提供编码(以及可选的错误)参数;然后,bytearray()使用str.encode()将字符串转换为字节。

如果它是整数,则数组将具有该大小,并将使用空字节初始化。

如果它是符合缓冲区接口的对象,则该对象的只读缓冲区将用于初始化bytes数组。

如果是可迭代的,则它必须是0 <= x <256范围内的整数的可迭代对象,这些整数用作数组的初始内容。

没有参数,将创建大小为0的数组。

所以 bytes除了编码字符串以外,还可以做更多的事情。这是Pythonic的用法,它允许您使用有意义的任何类型的源参数来调用构造函数。

对于编码字符串,我认为 some_string.encode(encoding) 比使用构造函数更 Pythonic,因为它最能自我说明:“取这个字符串并用这个编码对它进行编码”比 bytes(some_string, encoding) 更清晰,使用构造函数时没有明确表达动作的动词。

编辑:我查看了 Python 源代码。在 CPython 中,如果将 unicode 字符串传给 bytes,它会调用 PyUnicode_AsEncodedString,而这正是 encode 的实现;因此,自己调用 encode 只是少绕了一层间接调用。

另外,请参见 Serdalis 的评论:unicode_string.encode(encoding) 也更 Pythonic,因为它的逆操作是 byte_string.decode(encoding),这种对称性很好。

If you look at the docs for bytes, it points you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It’s Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self documenting — “take this string and encode it with this encoding” is clearer than bytes(some_string, encoding) — there is no explicit verb when you use the constructor.

Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you’re just skipping a level of indirection if you call encode yourself.

Also, see Serdalis’ comment — unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.
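
A minimal sketch of that encode/decode symmetry (not taken from the answer, UTF-8 assumed):

text = "héllo"
data = text.encode("utf-8")   # str -> bytes
back = data.decode("utf-8")   # bytes -> str

print(data)           # b'h\xc3\xa9llo'
print(back == text)   # True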


回答 1

比想像的要容易:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

It’s easier than it is thought:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

回答 2

绝对最好的方法不是这两种中的任何一种,而是第三种:自 Python 3.0 起,encode 的第一个参数默认就是 'utf-8'。因此最好的写法是

b = mystring.encode()

这也会更快,因为使用默认参数时,C 代码中得到的不是字符串 "utf-8",而是 NULL,而检查 NULL 要快得多!

以下是一些计时结果:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

尽管有警告,但重复运行后时间仍然非常稳定,偏差仅约 2%。


不带参数调用 encode() 与 Python 2 不兼容,因为在 Python 2 中默认字符编码是 ASCII:

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The absolutely best way is neither of the 2, but the 3rd. The first parameter to encode defaults to 'utf-8' ever since Python 3.0. Thus the best way is

b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but NULL, which is much faster to check!

Here be some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning the times were very stable after repeated runs – the deviation was just ~2 per cent.


Using encode() without an argument is not Python 2 compatible, as in Python 2 the default character encoding is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
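
If the same code has to run on both Python 2 and Python 3, spelling the encoding out keeps the behaviour identical; a small sketch, not from the answer:

# -*- coding: utf-8 -*-
text = u'äöä'
data = text.encode('utf-8')   # explicit encoding: same bytes on Python 2 and 3
print(repr(data))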

如何复制字典并仅编辑副本

问题:如何复制字典并仅编辑副本

有人可以向我解释一下吗?这对我来说毫无意义。

我将字典复制到另一个字典中,然后编辑第二个字典,并且两者都已更改。为什么会这样呢?

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = dict1
>>> dict2
{'key2': 'value2', 'key1': 'value1'}
>>> dict2["key2"] = "WHY?!"
>>> dict1
{'key2': 'WHY?!', 'key1': 'value1'}

Can someone please explain this to me? This doesn’t make any sense to me.

I copy a dictionary into another and edit the second and both are changed. Why is this happening?

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = dict1
>>> dict2
{'key2': 'value2', 'key1': 'value1'}
>>> dict2["key2"] = "WHY?!"
>>> dict1
{'key2': 'WHY?!', 'key1': 'value1'}

回答 0

Python 绝不会隐式复制对象。当您设置 dict2 = dict1 时,两者引用的是同一个 dict 对象,因此当您修改它时,所有指向它的引用都始终反映该对象的当前状态。

如果要复制这个字典(这种情况其实很少见),则必须显式地使用

dict2 = dict(dict1)

或者

dict2 = dict1.copy()

Python never implicitly copies objects. When you set dict2 = dict1, you are making them refer to the same exact dict object, so when you mutate it, all references to it keep referring to the object in its current state.

If you want to copy the dict (which is rare), you have to do so explicitly with

dict2 = dict(dict1)

or

dict2 = dict1.copy()
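
A quick check (a sketch added here, not part of the answer) that both forms give an independent top-level dict:

dict1 = {"key1": "value1", "key2": "value2"}
dict2 = dict(dict1)           # or dict1.copy()
dict2["key2"] = "WHY?!"

print(dict1)  # {'key1': 'value1', 'key2': 'value2'}  (unchanged)
print(dict2)  # {'key1': 'value1', 'key2': 'WHY?!'}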

回答 1

当您赋值 dict2 = dict1 时,并没有创建 dict1 的副本,这只会让 dict2 成为 dict1 的另一个名字。

要复制字典这类可变类型,请使用 copy 模块的 copy / deepcopy。

import copy

dict2 = copy.deepcopy(dict1)

When you assign dict2 = dict1, you are not making a copy of dict1, it results in dict2 being just another name for dict1.

To copy the mutable types like dictionaries, use copy / deepcopy of the copy module.

import copy

dict2 = copy.deepcopy(dict1)

回答 2

虽然 dict.copy() 和 dict(dict1) 都会生成副本,但它们只是浅拷贝。如果需要深拷贝,就必须使用 copy.deepcopy(dict1)。一个例子:

>>> source = {'a': 1, 'b': {'m': 4, 'n': 5, 'o': 6}, 'c': 3}
>>> copy1 = source.copy()
>>> copy2 = dict(source)
>>> import copy
>>> copy3 = copy.deepcopy(source)
>>> source['a'] = 10  # a change to first-level properties won't affect copies
>>> source
{'a': 10, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy1
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy2
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy3
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> source['b']['m'] = 40  # a change to deep properties WILL affect shallow copies 'b.m' property
>>> source
{'a': 10, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy1
{'a': 1, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy2
{'a': 1, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy3  # Deep copy's 'b.m' property is unaffected
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}

关于浅层副本与深层副本,来自Python copy模块docs

浅表复制和深度复制之间的区别仅与复合对象(包含其他对象的对象,如列表或类实例)有关:

  • 浅表副本构造一个新的复合对象,然后(在可能的范围内)将对原始对象中找到的对象的引用插入其中。
  • 深层副本将构造一个新的复合对象,然后递归地将原始对象中发现的对象的副本插入其中。

While dict.copy() and dict(dict1) generates a copy, they are only shallow copies. If you want a deep copy, copy.deepcopy(dict1) is required. An example:

>>> source = {'a': 1, 'b': {'m': 4, 'n': 5, 'o': 6}, 'c': 3}
>>> copy1 = source.copy()
>>> copy2 = dict(source)
>>> import copy
>>> copy3 = copy.deepcopy(source)
>>> source['a'] = 10  # a change to first-level properties won't affect copies
>>> source
{'a': 10, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy1
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy2
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> copy3
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}
>>> source['b']['m'] = 40  # a change to deep properties WILL affect shallow copies 'b.m' property
>>> source
{'a': 10, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy1
{'a': 1, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy2
{'a': 1, 'c': 3, 'b': {'m': 40, 'o': 6, 'n': 5}}
>>> copy3  # Deep copy's 'b.m' property is unaffected
{'a': 1, 'c': 3, 'b': {'m': 4, 'o': 6, 'n': 5}}

Regarding shallow vs deep copies, from the Python copy module docs:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

回答 3

在python 3.5+上,有一种更简单的方法可以通过使用**解包运算符来实现浅表副本。由Pep 448定义。

>>>dict1 = {"key1": "value1", "key2": "value2"}
>>>dict2 = {**dict1}
>>>print(dict2)
{'key1': 'value1', 'key2': 'value2'}
>>>dict2["key2"] = "WHY?!"
>>>print(dict1)
{'key1': 'value1', 'key2': 'value2'}
>>>print(dict2)
{'key1': 'value1', 'key2': 'WHY?!'}

**将字典解包为新字典,然后将其分配给dict2。

我们还可以确认每个词典都有不同的ID。

>>>id(dict1)
 178192816

>>>id(dict2)
 178192600

如果需要深层副本,那么仍然可以使用copy.deepcopy()

On python 3.5+ there is an easier way to achieve a shallow copy by using the ** unpackaging operator. Defined by Pep 448.

>>>dict1 = {"key1": "value1", "key2": "value2"}
>>>dict2 = {**dict1}
>>>print(dict2)
{'key1': 'value1', 'key2': 'value2'}
>>>dict2["key2"] = "WHY?!"
>>>print(dict1)
{'key1': 'value1', 'key2': 'value2'}
>>>print(dict2)
{'key1': 'value1', 'key2': 'WHY?!'}

** unpackages the dictionary into a new dictionary that is then assigned to dict2.

We can also confirm that each dictionary has a distinct id.

>>>id(dict1)
 178192816

>>>id(dict2)
 178192600

If a deep copy is needed then copy.deepcopy() is still the way to go.
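
Note that {**dict1}, like dict.copy(), is only a shallow copy: nested mutable values are still shared. A small sketch (the nested dict here is made up for illustration):

import copy

dict1 = {"key1": "value1", "nested": {"a": 1}}
dict2 = {**dict1}

dict2["nested"]["a"] = 99     # mutates the inner dict that dict1 also references
print(dict1["nested"])        # {'a': 99}

dict3 = copy.deepcopy(dict1)
dict3["nested"]["a"] = 1      # only dict3's own copy changes
print(dict1["nested"])        # {'a': 99}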


回答 4

在 Python 2.7 和 3 中,创建字典副本的最好、最简单的方法是…

要创建简单(单级)字典的副本:

1.使用dict()方法,而不是生成指向现有dict的引用。

my_dict1 = dict()
my_dict1["message"] = "Hello Python"
print(my_dict1)  # {'message':'Hello Python'}

my_dict2 = dict(my_dict1)
print(my_dict2)  # {'message':'Hello Python'}

# Made changes in my_dict1 
my_dict1["name"] = "Emrit"
print(my_dict1)  # {'message':'Hello Python', 'name' : 'Emrit'}
print(my_dict2)  # {'message':'Hello Python'}

2.使用python字典的内置update()方法。

my_dict2 = dict()
my_dict2.update(my_dict1)
print(my_dict2)  # {'message':'Hello Python'}

# Made changes in my_dict1 
my_dict1["name"] = "Emrit"
print(my_dict1)  # {'message':'Hello Python', 'name' : 'Emrit'}
print(my_dict2)  # {'message':'Hello Python'}

要创建嵌套或复杂字典的副本:

使用内置的 copy 模块,该模块提供通用的浅拷贝和深拷贝操作。Python 2.7 和 3.x 中都提供了此模块。

import copy

my_dict2 = copy.deepcopy(my_dict1)

The best and the easiest ways to create a copy of a dict in both Python 2.7 and 3 are…

To create a copy of simple(single-level) dictionary:

1. Using dict() method, instead of generating a reference that points to the existing dict.

my_dict1 = dict()
my_dict1["message"] = "Hello Python"
print(my_dict1)  # {'message':'Hello Python'}

my_dict2 = dict(my_dict1)
print(my_dict2)  # {'message':'Hello Python'}

# Made changes in my_dict1 
my_dict1["name"] = "Emrit"
print(my_dict1)  # {'message':'Hello Python', 'name' : 'Emrit'}
print(my_dict2)  # {'message':'Hello Python'}

2. Using the built-in update() method of python dictionary.

my_dict2 = dict()
my_dict2.update(my_dict1)
print(my_dict2)  # {'message':'Hello Python'}

# Made changes in my_dict1 
my_dict1["name"] = "Emrit"
print(my_dict1)  # {'message':'Hello Python', 'name' : 'Emrit'}
print(my_dict2)  # {'message':'Hello Python'}

To create a copy of nested or complex dictionary:

Use the built-in copy module, which provides generic shallow and deep copy operations. This module is present in both Python 2.7 and 3.*

import copy

my_dict2 = copy.deepcopy(my_dict1)

回答 5

您也可以直接用字典推导式生成一个新字典,这样可以避免导入 copy 模块。

dout = dict((k,v) for k,v in mydict.items())

当然,在python> = 2.7中,您可以执行以下操作:

dout = {k:v for k,v in mydict.items()}

但为了向后兼容,上面第一种写法更好。

You can also just make a new dictionary with a dictionary comprehension. This avoids importing copy.

dout = dict((k,v) for k,v in mydict.items())

Of course in python >= 2.7 you can do:

dout = {k:v for k,v in mydict.items()}

But for backwards compat., the top method is better.


回答 6

除了已有的其他解决方案之外,您还可以用 ** 把字典展开到一个空的字典字面量中,例如:

shallow_copy_of_other_dict = {**other_dict}

现在,您就拥有了 other_dict 的“浅”副本。

应用于您的示例:

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = {**dict1}
>>> dict2
{'key1': 'value1', 'key2': 'value2'}
>>> dict2["key2"] = "WHY?!"
>>> dict1
{'key1': 'value1', 'key2': 'value2'}
>>>

参考:浅拷贝和深拷贝之间的区别

In addition to the other provided solutions, you can use ** to integrate the dictionary into an empty dictionary, e.g.,

shallow_copy_of_other_dict = {**other_dict}.

Now you will have a “shallow” copy of other_dict.

Applied to your example:

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = {**dict1}
>>> dict2
{'key1': 'value1', 'key2': 'value2'}
>>> dict2["key2"] = "WHY?!"
>>> dict1
{'key1': 'value1', 'key2': 'value2'}
>>>

Pointer: Difference between shallow and deep copys


回答 7

Python中的赋值语句不复制对象,它们在目标和对象之间创建绑定。

因此,dict2 = dict1 会在 dict2 与 dict1 所引用的对象之间建立另一个绑定。

如果要复制字典,可以使用 copy 模块。copy 模块有两个接口:

copy.copy(x)
Return a shallow copy of x.

copy.deepcopy(x)
Return a deep copy of x.

浅表复制和深度复制之间的区别仅与复合对象(包含其他对象的对象,如列表或类实例)有关:

浅拷贝构造一个新的复合对象,然后(在可能的范围内)将对原始对象中所含对象的引用插入其中。

深拷贝构造一个新的复合对象,然后递归地将原始对象中所含对象的副本插入其中。

例如,在python 2.7.9中:

>>> import copy
>>> a = [1,2,3,4,['a', 'b']]
>>> b = a
>>> c = copy.copy(a)
>>> d = copy.deepcopy(a)
>>> a.append(5)
>>> a[4].append('c')

结果是:

>>> a
[1, 2, 3, 4, ['a', 'b', 'c'], 5]
>>> b
[1, 2, 3, 4, ['a', 'b', 'c'], 5]
>>> c
[1, 2, 3, 4, ['a', 'b', 'c']]
>>> d
[1, 2, 3, 4, ['a', 'b']]

Assignment statements in Python do not copy objects, they create bindings between a target and an object.

So dict2 = dict1 results in another binding between dict2 and the object that dict1 refers to.

If you want to copy the dict, you can use the copy module. The copy module has two interfaces:

copy.copy(x)
Return a shallow copy of x.

copy.deepcopy(x)
Return a deep copy of x.

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

For example, in python 2.7.9:

>>> import copy
>>> a = [1,2,3,4,['a', 'b']]
>>> b = a
>>> c = copy.copy(a)
>>> d = copy.deepcopy(a)
>>> a.append(5)
>>> a[4].append('c')

and the result is:

>>> a
[1, 2, 3, 4, ['a', 'b', 'c'], 5]
>>> b
[1, 2, 3, 4, ['a', 'b', 'c'], 5]
>>> c
[1, 2, 3, 4, ['a', 'b', 'c']]
>>> d
[1, 2, 3, 4, ['a', 'b']]

回答 8

您可以通过给 dict 构造函数传入额外的关键字参数,一次性完成复制并修改新构造的副本:

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = dict(dict1, key2="WHY?!")
>>> dict1
{'key2': 'value2', 'key1': 'value1'}
>>> dict2
{'key2': 'WHY?!', 'key1': 'value1'}

You can copy and edit the newly constructed copy in one go by calling the dict constructor with additional keyword arguments:

>>> dict1 = {"key1": "value1", "key2": "value2"}
>>> dict2 = dict(dict1, key2="WHY?!")
>>> dict1
{'key2': 'value2', 'key1': 'value1'}
>>> dict2
{'key2': 'WHY?!', 'key1': 'value1'}
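
The keyword-argument form only works for keys that are strings and valid Python identifiers; for other keys, the Python 3.5+ unpacking form gives the same "copy and override" effect. A sketch with a made-up key, not from the answer:

dict1 = {"key1": "value1", "key 2": "value2"}   # "key 2" cannot be written as a keyword argument

dict2 = {**dict1, "key 2": "WHY?!"}             # copy and override in one expression

print(dict1)  # {'key1': 'value1', 'key 2': 'value2'}
print(dict2)  # {'key1': 'value1', 'key 2': 'WHY?!'}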

回答 9

最初,这也使我感到困惑,因为我来自C语言。

在 C 语言中,变量是内存中具有确定类型的一个位置。给变量赋值会把数据复制到该变量的内存位置。

但是在Python中,变量的作用更像是指向对象的指针。因此,将一个变量分配给另一个变量不会产生副本,只会使该变量名称指向同一对象。

This confused me too, initially, because I was coming from a C background.

In C, a variable is a location in memory with a defined type. Assigning to a variable copies the data into the variable’s memory location.

But in Python, variables act more like pointers to objects. So assigning one variable to another doesn’t make a copy, it just makes that variable name point to the same object.


回答 10

Python 中的每个变量(比如 dict1、str 或 __builtins__ 这样的东西)都是指向机器内部某个隐藏的、柏拉图式“对象”的指针。

如果您设置 dict1 = dict2,只是让 dict1 指向与 dict2 相同的对象(或内存位置,随便您喜欢哪种类比)。此时 dict1 引用的对象与 dict2 引用的对象是同一个。

您可以验证:dict1 is dict2 应该是 True。另外,id(dict1) 应该与 id(dict2) 相同。

您想要dict1 = copy(dict2)dict1 = deepcopy(dict2)

copy 和 deepcopy 之间的区别?deepcopy 会确保 dict2 的元素(您是否把它指向了某个列表?)也被复制。

我并不常用 deepcopy。在我看来,编写需要它的代码通常不是好的做法。

Every variable in python (stuff like dict1 or str or __builtins__) is a pointer to some hidden platonic “object” inside the machine.

If you set dict1 = dict2,you just point dict1 to the same object (or memory location, or whatever analogy you like) as dict2. Now, the object referenced by dict1 is the same object referenced by dict2.

You can check: dict1 is dict2 should be True. Also, id(dict1) should be the same as id(dict2).

You want dict1 = copy(dict2), or dict1 = deepcopy(dict2).

The difference between copy and deepcopy? deepcopy will make sure that the elements of dict2 (did you point it at a list?) are also copies.

I don’t use deepcopy much – it’s usually poor practice to write code that needs it (in my opinion).
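
A short sketch of the checks mentioned above (here copy and deepcopy come from the standard copy module):

import copy

dict2 = {"a": [1, 2]}
dict1 = dict2                      # same object
print(dict1 is dict2)              # True
print(id(dict1) == id(dict2))      # True

dict1 = copy.copy(dict2)           # new outer dict, inner list still shared
print(dict1 is dict2)              # False
print(dict1["a"] is dict2["a"])    # True

dict1 = copy.deepcopy(dict2)       # inner list copied as well
print(dict1["a"] is dict2["a"])    # False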


回答 11

dict1 是一个引用底层字典对象的符号。把 dict1 赋给 dict2 只是赋予了同一个引用。通过 dict2 这个符号修改某个键的值会修改底层对象,这也会影响 dict1。这很容易让人困惑。

关于不可变值的推理要比引用容易得多,因此请尽可能制作副本:

person = {'name': 'Mary', 'age': 25}
one_year_later = {**person, 'age': 26}  # does not mutate person dict

在语法上与以下内容相同:

one_year_later = dict(person, age=26)

dict1 is a symbol that references an underlying dictionary object. Assigning dict1 to dict2 merely assigns the same reference. Changing a key’s value via the dict2 symbol changes the underlying object, which also affects dict1. This is confusing.

It is far easier to reason about immutable values than references, so make copies whenever possible:

person = {'name': 'Mary', 'age': 25}
one_year_later = {**person, 'age': 26}  # does not mutate person dict

This is syntactically the same as:

one_year_later = dict(person, age=26)

回答 12

dict2 = dict1不复制字典。它只是为程序员提供了第二种方法(dict2)来引用同一词典。

dict2 = dict1 does not copy the dictionary. It simply gives you the programmer a second way (dict2) to refer to the same dictionary.


回答 13

>>> dict2 = dict1
# dict2 is bound to the same dict object as dict1, so if you modify dict2, you will modify dict1

复制Dict对象的方法很多,我只是简单地使用

dict_1 = {
           'a':1,
           'b':2
         }
dict_2 = {}
dict_2.update(dict_1)
>>> dict2 = dict1
# dict2 is bound to the same dict object as dict1, so if you modify dict2, you will modify dict1

There are many ways to copy Dict object, I simply use

dict_1 = {
           'a':1,
           'b':2
         }
dict_2 = {}
dict_2.update(dict_1)

回答 14

正如其他人所解释的,内置的 dict 并不能满足您的需求。但是在 Python 2 中(可能 Python 3 也一样),您可以轻松地创建一个随 = 赋值按值复制的 ValueDict 类,从而确保原始字典不会被修改。

class ValueDict(dict):

    def __ilshift__(self, args):
        result = ValueDict(self)
        if isinstance(args, dict):
            dict.update(result, args)
        else:
            dict.__setitem__(result, *args)
        return result # Pythonic LVALUE modification

    def __irshift__(self, args):
        result = ValueDict(self)
        dict.__delitem__(result, args)
        return result # Pythonic LVALUE modification

    def __setitem__(self, k, v):
        raise AttributeError, \
            "Use \"value_dict<<='%s', ...\" instead of \"d[%s] = ...\"" % (k,k)

    def __delitem__(self, k):
        raise AttributeError, \
            "Use \"value_dict>>='%s'\" instead of \"del d[%s]" % (k,k)

    def update(self, d2):
        raise AttributeError, \
            "Use \"value_dict<<=dict2\" instead of \"value_dict.update(dict2)\""


# test
d = ValueDict()

d <<='apples', 5
d <<='pears', 8
print "d =", d

e = d
e <<='bananas', 1
print "e =", e
print "d =", d

d >>='pears'
print "d =", d
d <<={'blueberries': 2, 'watermelons': 315}
print "d =", d
print "e =", e
print "e['bananas'] =", e['bananas']


# result
d = {'apples': 5, 'pears': 8}
e = {'apples': 5, 'pears': 8, 'bananas': 1}
d = {'apples': 5, 'pears': 8}
d = {'apples': 5}
d = {'watermelons': 315, 'blueberries': 2, 'apples': 5}
e = {'apples': 5, 'pears': 8, 'bananas': 1}
e['bananas'] = 1

# e[0]=3
# would give:
# AttributeError: Use "value_dict<<='0', ..." instead of "d[0] = ..."

请参考此处讨论的左值修改模式:Python 2.7 - clean syntax for lvalue modification。关键的观察是,str 和 int 在 Python 中表现得像值(尽管它们在底层实际上是不可变对象)。在注意到这一点的同时,也请注意 str 或 int 并没有什么神奇的特殊之处。dict 也可以用几乎相同的方式使用,我能想到很多 ValueDict 有意义的场景。

As others have explained, the built-in dict does not do what you want. But in Python2 (and probably 3 too) you can easily create a ValueDict class that copies with = so you can be sure that the original will not change.

class ValueDict(dict):

    def __ilshift__(self, args):
        result = ValueDict(self)
        if isinstance(args, dict):
            dict.update(result, args)
        else:
            dict.__setitem__(result, *args)
        return result # Pythonic LVALUE modification

    def __irshift__(self, args):
        result = ValueDict(self)
        dict.__delitem__(result, args)
        return result # Pythonic LVALUE modification

    def __setitem__(self, k, v):
        raise AttributeError, \
            "Use \"value_dict<<='%s', ...\" instead of \"d[%s] = ...\"" % (k,k)

    def __delitem__(self, k):
        raise AttributeError, \
            "Use \"value_dict>>='%s'\" instead of \"del d[%s]" % (k,k)

    def update(self, d2):
        raise AttributeError, \
            "Use \"value_dict<<=dict2\" instead of \"value_dict.update(dict2)\""


# test
d = ValueDict()

d <<='apples', 5
d <<='pears', 8
print "d =", d

e = d
e <<='bananas', 1
print "e =", e
print "d =", d

d >>='pears'
print "d =", d
d <<={'blueberries': 2, 'watermelons': 315}
print "d =", d
print "e =", e
print "e['bananas'] =", e['bananas']


# result
d = {'apples': 5, 'pears': 8}
e = {'apples': 5, 'pears': 8, 'bananas': 1}
d = {'apples': 5, 'pears': 8}
d = {'apples': 5}
d = {'watermelons': 315, 'blueberries': 2, 'apples': 5}
e = {'apples': 5, 'pears': 8, 'bananas': 1}
e['bananas'] = 1

# e[0]=3
# would give:
# AttributeError: Use "value_dict<<='0', ..." instead of "d[0] = ..."

Please refer to the lvalue modification pattern discussed here: Python 2.7 – clean syntax for lvalue modification. The key observation is that str and int behave as values in Python (even though they’re actually immutable objects under the hood). While you’re observing that, please also observe that nothing is magically special about str or int. dict can be used in much the same ways, and I can think of many cases where ValueDict makes sense.


回答 15

对于遵循 JSON 语法的字典,以下代码比 deepcopy 快 3 倍以上:

import json
from copy import deepcopy

def CopyDict(dSrc):
    # Logger:假定为在别处配置好的日志记录器
    try:
        return json.loads(json.dumps(dSrc))
    except Exception as e:
        Logger.warning("Can't copy dict the preferred way:"+str(dSrc))
        return deepcopy(dSrc)

The following code, for dicts that follow JSON syntax, is more than 3 times faster than deepcopy:

import json
from copy import deepcopy

def CopyDict(dSrc):
    # Logger: assumed to be a logging handle configured elsewhere
    try:
        return json.loads(json.dumps(dSrc))
    except Exception as e:
        Logger.warning("Can't copy dict the preferred way:"+str(dSrc))
        return deepcopy(dSrc)
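
One caveat worth noting (not part of the original answer): the JSON round trip only preserves data that JSON can represent, so non-string keys and non-JSON types are silently changed or rejected:

import json

d = {"a": 1, 2: "two"}
copied = json.loads(json.dumps(d))
print(copied)   # {'a': 1, '2': 'two'}  (the int key 2 became the string '2')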

回答 16

在尝试深拷贝一个类的字典属性、而不先把它赋给变量时,我遇到了一种奇怪的行为。

new = copy.deepcopy(my_class.a) 不起作用,即修改 new 也会修改 my_class.a。

但如果先执行 old = my_class.a,再执行 new = copy.deepcopy(old),就一切正常,即修改 new 不会影响 my_class.a。

我不确定为什么会发生这种情况,但希望它可以节省一些时间!:)

I ran into a peculiar behavior when trying to deep copy a dictionary property of a class without assigning it to a variable.

new = copy.deepcopy(my_class.a) doesn’t work i.e. modifying new modifies my_class.a

but if you do old = my_class.a and then new = copy.deepcopy(old) it works perfectly i.e. modifying new does not affect my_class.a

I am not sure why this happens, but hope it helps save some hours! :)


回答 17

因为 dict2 = dict1,dict2 保存的是对 dict1 的引用。dict1 和 dict2 都指向内存中的同一个对象。这是在 Python 中使用可变对象时的正常情况。使用可变对象时必须小心,因为这类问题很难调试,比如下面的例子。

 my_users = {
        'ids':[1,2],
        'blocked_ids':[5,6,7]
 }
 ids = my_users.get('ids')
 ids.extend(my_users.get('blocked_ids')) #all_ids
 print ids#output:[1, 2, 5, 6, 7]
 print my_users #output:{'blocked_ids': [5, 6, 7], 'ids': [1, 2, 5, 6, 7]}

此示例的意图是获取所有用户 ID,包括被屏蔽的 ID。我们从 ids 变量得到了结果,但也无意间更新了 my_users 的值:当你用 blocked_ids 扩展 ids 时,my_users 也被更新了,因为 ids 引用的正是 my_users 里的那个列表。

because, dict2 = dict1, dict2 holds the reference to dict1. Both dict1 and dict2 points to the same location in the memory. This is just a normal case while working with mutable objects in python. When you are working with mutable objects in python you must be careful as it is hard to debug. Such as the following example.

 my_users = {
        'ids':[1,2],
        'blocked_ids':[5,6,7]
 }
 ids = my_users.get('ids')
 ids.extend(my_users.get('blocked_ids')) #all_ids
 print ids#output:[1, 2, 5, 6, 7]
 print my_users #output:{'blocked_ids': [5, 6, 7], 'ids': [1, 2, 5, 6, 7]}

The intention of this example is to get all the user ids, including blocked ids. We got them from the ids variable, but we also unintentionally updated the value of my_users: when you extended ids with blocked_ids, my_users got updated, because ids refers to the list inside my_users.
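
To build the combined list without mutating my_users, copy the inner list first (or concatenate into a new list); a minimal sketch:

my_users = {
    'ids': [1, 2],
    'blocked_ids': [5, 6, 7]
}

all_ids = list(my_users['ids'])           # copy, so my_users stays untouched
all_ids.extend(my_users['blocked_ids'])
# or simply: all_ids = my_users['ids'] + my_users['blocked_ids']

print(all_ids)   # [1, 2, 5, 6, 7]
print(my_users)  # {'ids': [1, 2], 'blocked_ids': [5, 6, 7]}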


回答 18

使用for循环进行复制:

orig = {"X2": 674.5, "X3": 245.0}

copy = {}
for key in orig:
    copy[key] = orig[key]

print(orig) # {'X2': 674.5, 'X3': 245.0}
print(copy) # {'X2': 674.5, 'X3': 245.0}
copy["X2"] = 808
print(orig) # {'X2': 674.5, 'X3': 245.0}
print(copy) # {'X2': 808, 'X3': 245.0}

Copying by using a for loop:

orig = {"X2": 674.5, "X3": 245.0}

copy = {}
for key in orig:
    copy[key] = orig[key]

print(orig) # {'X2': 674.5, 'X3': 245.0}
print(copy) # {'X2': 674.5, 'X3': 245.0}
copy["X2"] = 808
print(orig) # {'X2': 674.5, 'X3': 245.0}
print(copy) # {'X2': 808, 'X3': 245.0}

回答 19

您可以直接使用:

dict2 = eval(repr(dict1))

其中对象dict2是dict1的独立副本,因此您可以修改dict2而不会影响dict1。

这适用于任何类型的对象。

You can use directly:

dict2 = eval(repr(dict1))

where object dict2 is an independent copy of dict1, so you can modify dict2 without affecting dict1.

This works for any kind of object.