标签归档:Python

如何使用python识别文件是普通文件还是目录

问题:如何使用python识别文件是普通文件还是目录

如何使用python检查文件是普通文件还是目录?

How do you check whether a file is a normal file or a directory using python?


回答 0

os.path.isdir()并且os.path.isfile()应该给你你想要的东西。请参阅:http//docs.python.org/library/os.path.html

os.path.isdir() and os.path.isfile() should give you what you want. See: http://docs.python.org/library/os.path.html


回答 1

至于其他的答案说,os.path.isdir()os.path.isfile()你想要的东西。但是,您需要记住,这不是仅有的两种情况。使用os.path.islink()的符号链接的实例。此外,False如果文件不存在,这些都将返回,因此您可能还需要检查一下os.path.exists()

As other answers have said, os.path.isdir() and os.path.isfile() are what you want. However, you need to keep in mind that these are not the only two cases. Use os.path.islink() for symlinks for instance. Furthermore, these all return False if the file does not exist, so you’ll probably want to check with os.path.exists() as well.


回答 2

蟒3.4引入pathlib模块到标准库,它提供了一个面向对象的方法来处理的文件系统的路径。相对的方法是.is_file().is_dir()

In [1]: from pathlib import Path

In [2]: p = Path('/usr')

In [3]: p.is_file()
Out[3]: False

In [4]: p.is_dir()
Out[4]: True

In [5]: q = p / 'bin' / 'vim'

In [6]: q.is_file()
Out[6]: True

In [7]: q.is_dir()
Out[7]: False

也可以通过PyPi上的pathlib2模块在 Python 2.7 上使用Pathlib。

Python 3.4 introduced the pathlib module into the standard library, which provides an object oriented approach to handle filesystem paths. The relavant methods would be .is_file() and .is_dir():

In [1]: from pathlib import Path

In [2]: p = Path('/usr')

In [3]: p.is_file()
Out[3]: False

In [4]: p.is_dir()
Out[4]: True

In [5]: q = p / 'bin' / 'vim'

In [6]: q.is_file()
Out[6]: True

In [7]: q.is_dir()
Out[7]: False

Pathlib is also available on Python 2.7 via the pathlib2 module on PyPi.


回答 3

import os

if os.path.isdir(d):
    print "dir"
else:
    print "file"
import os

if os.path.isdir(d):
    print "dir"
else:
    print "file"

回答 4

os.path.isdir('string')
os.path.isfile('string')

os.path.isdir('string')
os.path.isfile('string')

回答 5

试试这个:

import os.path
if os.path.isdir("path/to/your/file"):
    print "it's a directory"
else:
    print "it's a file"

try this:

import os.path
if os.path.isdir("path/to/your/file"):
    print "it's a directory"
else:
    print "it's a file"

回答 6

如果您只是浏览一组目录,则最好尝试尝试os.chdir给出错误/警告(如果失败):

import os,sys
for DirName in sys.argv[1:]:
    SaveDir = os.getcwd()
    try:
        os.chdir(DirName)
        print "Changed to "+DirName
        # Do some stuff here in the directory
        os.chdir(SaveDir)
    except:
        sys.stderr.write("%s: WARNING: Cannot change to %s\n" % (sys.argv[0],DirName))

If you’re just stepping through a set of directories you might be better just to try os.chdir and give an error/warning if it fails:

import os,sys
for DirName in sys.argv[1:]:
    SaveDir = os.getcwd()
    try:
        os.chdir(DirName)
        print "Changed to "+DirName
        # Do some stuff here in the directory
        os.chdir(SaveDir)
    except:
        sys.stderr.write("%s: WARNING: Cannot change to %s\n" % (sys.argv[0],DirName))

“ x

问题:“ x

从此页面,我们知道:

链式比较比使用and运算符要快。写x < y < z而不是x < y and y < z

但是,测试以下代码片段时,我得到了不同的结果:

$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y < z"
1000000 loops, best of 3: 0.322 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y and y < z"
1000000 loops, best of 3: 0.22 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y < z"
1000000 loops, best of 3: 0.279 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y and y < z"
1000000 loops, best of 3: 0.215 usec per loop

看来x < y and y < z比快x < y < z为什么?

在搜索了该站点的一些帖子(如本篇文章)之后,我知道“仅评估一次”是的关键x < y < z,但是我仍然感到困惑。为了进一步研究,我使用dis.dis以下命令分解了这两个函数:

import dis

def chained_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y < z

def and_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y and y < z

dis.dis(chained_compare)
dis.dis(and_compare)

输出为:

## chained_compare ##

  4           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

  5           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

  6          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

  7          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 DUP_TOP
             25 ROT_THREE
             26 COMPARE_OP               0 (<)
             29 JUMP_IF_FALSE_OR_POP    41
             32 LOAD_FAST                2 (z)
             35 COMPARE_OP               0 (<)
             38 JUMP_FORWARD             2 (to 43)
        >>   41 ROT_TWO
             42 POP_TOP
        >>   43 POP_TOP
             44 LOAD_CONST               0 (None)
             47 RETURN_VALUE

## and_compare ##

 10           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

 11           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

 12          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

 13          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 COMPARE_OP               0 (<)
             27 JUMP_IF_FALSE_OR_POP    39
             30 LOAD_FAST                1 (y)
             33 LOAD_FAST                2 (z)
             36 COMPARE_OP               0 (<)
        >>   39 POP_TOP
             40 LOAD_CONST               0 (None)

看来,的x < y and y < z分解命令比少x < y < z。我应该考虑x < y and y < zx < y < z吗?

在Intel®Xeon®CPU E5640 @ 2.67GHz上使用Python 2.7.6进行了测试。

From this page, we know that:

Chained comparisons are faster than using the and operator. Write x < y < z instead of x < y and y < z.

However, I got a different result testing the following code snippets:

$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y < z"
1000000 loops, best of 3: 0.322 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y and y < z"
1000000 loops, best of 3: 0.22 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y < z"
1000000 loops, best of 3: 0.279 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y and y < z"
1000000 loops, best of 3: 0.215 usec per loop

It seems that x < y and y < z is faster than x < y < z. Why?

After searching some posts in this site (like this one) I know that “evaluated only once” is the key for x < y < z, however I’m still confused. To do further study, I disassembled these two functions using dis.dis:

import dis

def chained_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y < z

def and_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y and y < z

dis.dis(chained_compare)
dis.dis(and_compare)

And the output is:

## chained_compare ##

  4           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

  5           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

  6          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

  7          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 DUP_TOP
             25 ROT_THREE
             26 COMPARE_OP               0 (<)
             29 JUMP_IF_FALSE_OR_POP    41
             32 LOAD_FAST                2 (z)
             35 COMPARE_OP               0 (<)
             38 JUMP_FORWARD             2 (to 43)
        >>   41 ROT_TWO
             42 POP_TOP
        >>   43 POP_TOP
             44 LOAD_CONST               0 (None)
             47 RETURN_VALUE

## and_compare ##

 10           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

 11           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

 12          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

 13          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 COMPARE_OP               0 (<)
             27 JUMP_IF_FALSE_OR_POP    39
             30 LOAD_FAST                1 (y)
             33 LOAD_FAST                2 (z)
             36 COMPARE_OP               0 (<)
        >>   39 POP_TOP
             40 LOAD_CONST               0 (None)

It seems that the x < y and y < z has less dissembled commands than x < y < z. Should I consider x < y and y < z faster than x < y < z?

Tested with Python 2.7.6 on an Intel(R) Xeon(R) CPU E5640 @ 2.67GHz.


回答 0

区别在于in x < y < z y仅被评估一次。如果y是一个变量,这并没有太大的区别,但是当它是一个函数调用时,它却会产生很大的差异,这需要花费一些时间来计算。

from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop

The difference is that in x < y < z y is only evaluated once. This does not make a large difference if y is a variable, but it does when it is a function call, which takes some time to compute.

from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop

回答 1

您定义的两个函数的最佳字节码将是

          0 LOAD_CONST               0 (None)
          3 RETURN_VALUE

因为不使用比较结果。通过返回比较结果,使情况变得更加有趣。让我们在编译时也无法得知结果。

def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z

同样,比较的两个版本在语义上是相同的,因此两个结构的最佳字节码相同。尽我所能,它看起来像这样。我已经在每个操作码之前和之后用Forth注释(在右边的栈顶,在--前后划分,尾部?表示可能存在或不存在的东西)的每一行都用了堆栈内容。请注意,RETURN_VALUE将丢弃所有遗留在返回值下面的堆栈中的所有内容。

          0 LOAD_FAST                0 (y)    ;          -- y
          3 DUP_TOP                           ; y        -- y y
          4 LOAD_CONST               0 (1.1)  ; y y      -- y y 1.1
          7 COMPARE_OP               4 (>)    ; y y 1.1  -- y pred
         10 JUMP_IF_FALSE_OR_POP     19       ; y pred   -- y
         13 LOAD_CONST               1 (1.3)  ; y        -- y 1.3
         16 COMPARE_OP               0 (<)    ; y 1.3    -- pred
     >>  19 RETURN_VALUE                      ; y? pred  --

如果CPython,PyPy等语言的实现未针对两种变体生成此字节码(或其等效的操作序列),则说明该字节码编译器的质量较差。从上面发布的字节码序列中获取是一个已解决的问题(我想在这种情况下,您需要做的就是不断折叠消除无效代码以及对堆栈内容进行更好的建模;常见的子表达式消除也将是廉价且有价值的),而没有在现代语言实现中没有这样做的借口。

现在,碰巧该语言的所有当前实现都具有劣质的字节码编译器。但是您在编码时应该忽略这一点!假装字节码编译器很好,并编写最易读的代码。无论如何它可能足够快。如果不是这样,请首先寻找算法上的改进,然后尝试Cython-与您可能应用的任何表达式级调整相比,在相同的工作量下将提供更多的改进。

Optimal bytecode for both of the functions you defined would be

          0 LOAD_CONST               0 (None)
          3 RETURN_VALUE

because the result of the comparison is not used. Let’s make the situation more interesting by returning the result of the comparison. Let’s also have the result not be knowable at compile time.

def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z

Again, the two versions of the comparison are semantically identical, so the optimal bytecode is the same for both constructs. As best I can work it out, it would look like this. I’ve annotated each line with the stack contents before and after each opcode, in Forth notation (top of stack at right, -- divides before and after, trailing ? indicates something that might or might not be there). Note that RETURN_VALUE discards everything that happens to be left on the stack underneath the value returned.

          0 LOAD_FAST                0 (y)    ;          -- y
          3 DUP_TOP                           ; y        -- y y
          4 LOAD_CONST               0 (1.1)  ; y y      -- y y 1.1
          7 COMPARE_OP               4 (>)    ; y y 1.1  -- y pred
         10 JUMP_IF_FALSE_OR_POP     19       ; y pred   -- y
         13 LOAD_CONST               1 (1.3)  ; y        -- y 1.3
         16 COMPARE_OP               0 (<)    ; y 1.3    -- pred
     >>  19 RETURN_VALUE                      ; y? pred  --

If an implementation of the language, CPython, PyPy, whatever, does not generate this bytecode (or its own equivalent sequence of operations) for both variations, that demonstrates the poor quality of that bytecode compiler. Getting from the bytecode sequences you posted to the above is a solved problem (I think all you need for this case is constant folding, dead code elimination, and better modeling of the contents of the stack; common subexpression elimination would also be cheap and valuable), and there’s really no excuse for not doing it in a modern language implementation.

Now, it happens that all current implementations of the language have poor-quality bytecode compilers. But you should ignore that while coding! Pretend the bytecode compiler is good, and write the most readable code. It will probably be plenty fast enough anyway. If it isn’t, look for algorithmic improvements first, and give Cython a try second — that will provide far more improvement for the same effort than any expression-level tweaks you might apply.


回答 2

由于输出的差异似乎是由于缺乏优化所致,所以我认为在大多数情况下您应该忽略该差异-可能差异会消失。区别在于,y只应评估一次,然后通过将其复制到堆栈上来解决该问题,这需要额外的费用POP_TOPLOAD_FAST尽管有可能使用解决方案。

但是,重要的区别在于,如果对x<y and y<z第二个y进行评估,则如果应x<y为true,则应评估两次,如果对的评估y花费大量时间或具有副作用,则可能会产生影响。

在大多数情况下,x<y<z尽管速度稍慢,但仍应使用。

Since the difference in the output seem to be due to lack of optimization I think you should ignore that difference for most cases – it could be that the difference will go away. The difference is because y only should be evaluated once and that is solved by duplicating it on the stack which requires an extra POP_TOP – the solution to use LOAD_FAST might be possible though.

The important difference though is that in x<y and y<z the second y should be evaluated twice if x<y evaluates to true, this has implications if the evaluation of y takes considerable time or have side effects.

In most scenarios you should use x<y<z despite the fact it’s somewhat slower.


回答 3

首先,您的比较几乎没有意义,因为没有引入两种不同的构造来提高性能,因此您不应基于此决定是否使用一个构造来代替另一个构造。

x < y < z结构:

  1. 其含义更清晰,更直接。
  2. 它的语义是您从比较的“数学意义”中所期望的:evalute xyz 一次并检查是否整个条件成立。使用可以and通过y多次评估来更改语义,这可以更改结果

因此,请根据您想要的语义以及是否相等来选择一个,以代替另一个。

这就是说:更多的反汇编代码确实 并不意味着慢的代码。但是,执行更多的字节码操作意味着每个操作都比较简单,但是需要主循环的迭代。这意味着,如果您正在执行的操作非常快(例如,您在那里执行的本地变量查找),那么执行更多字节码操作的开销可能会很重要。

但要注意,这个结果并没有在更一般的情况下举行,仅在“最坏情况”那你碰巧轮廓。正如其他人指出的那样,如果更改y为花费更多时间的内容,您将看到结果更改,因为链接表示法仅对它进行一次评估。

总结:

  • 性能之前要考虑语义。
  • 考虑到可读性。
  • 不要相信微型基准。始终使用不同种类的参数进行分析,以了解功能/表达式时序相对于所述参数的行为,并考虑您打算如何使用它。

First of all, your comparison is pretty much meaningless because the two different constructs were not introduced to provide a performance improvement, so you shouldn’t decide whether to use one in place of the other based on that.

The x < y < z construct:

  1. Is clearer and more direct in its meaning.
  2. Its semantics is what you’d expect from the “mathematical meaning” of the comparison: evalute x, y and z once and check if the whole condition holds. Using and changes the semantics by evaluating y multiple times, which can change the result.

So choose one in place of the other depending on the semantics you want and, if they are equivalent, whether one is more readable than the other.

This said: more disassembled code does does not imply slower code. However executing more bytecode operations means that each operation is simpler and yet it requires an iteration of the main loop. This means that if the operations you are performing are extremely fast (e.g. local variable lookup as you are doing there), then the overhead of executing more bytecode operations can matter.

But note that this result does not hold in the more generic situation, only to the “worst case” that you happen to profile. As others have noted, if you change y to something that takes even a bit more time you’ll see that the results change, because the chained notation evaluates it only once.

Summarizing:

  • Consider semantics before performance.
  • Take into account readability.
  • Don’t trust micro benchmarks. Always profile with different kind of parameters to see how a function/expression timing behave in relation to said parameters and consider how you plan to use it.

类方法生成“ TypeError:…为关键字参数获得了多个值……”

问题:类方法生成“ TypeError:…为关键字参数获得了多个值……”

如果我用关键字参数定义一个类方法,则:

class foo(object):
  def foodo(thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

调用该方法将生成TypeError

myfoo = foo()
myfoo.foodo(thing="something")

...
TypeError: foodo() got multiple values for keyword argument 'thing'

这是怎么回事?

If I define a class method with a keyword argument thus:

class foo(object):
  def foodo(thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

calling the method generates a TypeError:

myfoo = foo()
myfoo.foodo(thing="something")

...
TypeError: foodo() got multiple values for keyword argument 'thing'

What’s going on?


回答 0

问题在于,传递给python中类方法的第一个参数始终是在其上调用该方法的类实例的副本,通常标记为self。如果这样声明了该类:

class foo(object):
  def foodo(self, thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

它的行为符合预期。

说明:

如果没有self作为第一个参数,则在myfoo.foodo(thing="something")执行时,将foodo使用arguments调用该方法(myfoo, thing="something")myfoo然后将该实例分配给thing(因为thing是第一个声明的参数),但是python也会尝试分配"something"thing,因此是Exception。

为了演示,请尝试使用原始代码运行它:

myfoo.foodo("something")
print
print myfoo

您将输出如下:

<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>

您可以看到“事物”已被分配对类“ foo”的实例“ myfoo”的引用。文档的此部分说明了函数参数的工作原理。

The problem is that the first argument passed to class methods in python is always a copy of the class instance on which the method is called, typically labelled self. If the class is declared thus:

class foo(object):
  def foodo(self, thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

it behaves as expected.

Explanation:

Without self as the first parameter, when myfoo.foodo(thing="something") is executed, the foodo method is called with arguments (myfoo, thing="something"). The instance myfoo is then assigned to thing (since thing is the first declared parameter), but python also attempts to assign "something" to thing, hence the Exception.

To demonstrate, try running this with the original code:

myfoo.foodo("something")
print
print myfoo

You’ll output like:

<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>

You can see that ‘thing’ has been assigned a reference to the instance ‘myfoo’ of the class ‘foo’. This section of the docs explains how function arguments work a bit more.


回答 1

感谢您的指导性帖子。我只想说明一下,如果您收到“ TypeError:foodo()为关键字参数’thing’获得多个值”,则可能是您错误地将“ self”作为参数传递的调用该函数(可能是因为您从类声明中复制了该行-急时这是一个常见错误)。

Thanks for the instructive posts. I’d just like to keep a note that if you’re getting “TypeError: foodo() got multiple values for keyword argument ‘thing'”, it may also be that you’re mistakenly passing the ‘self’ as a parameter when calling the function (probably because you copied the line from the class declaration – it’s a common error when one’s in a hurry).


回答 2

这可能很明显,但可能会对从未见过的人有所帮助。如果您错误地通过位置和名称显式地分配了参数,则对于常规函数也会发生这种情况。

>>> def foodo(thing=None, thong='not underwear'):
...     print thing if thing else "nothing"
...     print 'a thong is',thong
...
>>> foodo('something', thing='everything')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foodo() got multiple values for keyword argument 'thing'

This might be obvious, but it might help someone who has never seen it before. This also happens for regular functions if you mistakenly assign a parameter by position and explicitly by name.

>>> def foodo(thing=None, thong='not underwear'):
...     print thing if thing else "nothing"
...     print 'a thong is',thong
...
>>> foodo('something', thing='everything')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foodo() got multiple values for keyword argument 'thing'

回答 3

只需向功能添加“ staticmethod”装饰器即可解决问题

class foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        print thing if thing else "nothing" 
        print 'a thong is',thong

just add ‘staticmethod’ decorator to function and problem is fixed

class foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        print thing if thing else "nothing" 
        print 'a thong is',thong

回答 4

我想再添加一个答案:

当您尝试在调用函数中尝试传递位置顺序错误的位置参数以及关键字参数时,就会发生这种情况。

there is difference between parameter and argument您可以在此处详细了解python中的参数和参数

def hello(a,b=1, *args):
   print(a, b, *args)


hello(1, 2, 3, 4,a=12)

因为我们有三个参数:

a是位置参数

b = 1是关键字和默认参数

* args是可变长度参数

因此我们首先将a作为位置参数赋值,这意味着我们必须按位置顺序向位置参数提供值,这里顺序很重要。但是我们将参数1传递给in调用函数中的位置,然后还将值提供给a,将其视为关键字参数。现在一个有两个值:

一个是位置值:a = 1

第二个是关键字值,a = 12

我们必须更改hello(1, 2, 3, 4,a=12)为,hello(1, 2, 3, 4,12) 所以现在a将仅获得一个位置值,即1,b将获得值2,其余值将获得* args(可变长度参数)

附加信息

如果我们希望* args应该得到2,3,4而a应该得到1和b应该得到12

那么我们可以这样做
def hello(a,*args,b=1): pass hello(1, 2, 3, 4,b=12)

还有更多:

def hello(a,*c,b=1,**kwargs):
    print(b)
    print(c)
    print(a)
    print(kwargs)

hello(1,2,1,2,8,9,c=12)

输出:

1

(2, 1, 2, 8, 9)

1

{'c': 12}

I want to add one more answer :

It happens when you try to pass positional parameter with wrong position order along with keyword argument in calling function.

there is difference between parameter and argument you can read in detail about here Arguments and Parameter in python

def hello(a,b=1, *args):
   print(a, b, *args)


hello(1, 2, 3, 4,a=12)

since we have three parameters :

a is positional parameter

b=1 is keyword and default parameter

*args is variable length parameter

so we first assign a as positional parameter , means we have to provide value to positional argument in its position order, here order matter. but we are passing argument 1 at the place of a in calling function and then we are also providing value to a , treating as keyword argument. now a have two values :

one is positional value: a=1

second is keyworded value which is a=12

Solution

We have to change hello(1, 2, 3, 4,a=12) to hello(1, 2, 3, 4,12) so now a will get only one positional value which is 1 and b will get value 2 and rest of values will get *args (variable length parameter)

additional information

if we want that *args should get 2,3,4 and a should get 1 and b should get 12

then we can do like this
def hello(a,*args,b=1): pass hello(1, 2, 3, 4,b=12)

Something more :

def hello(a,*c,b=1,**kwargs):
    print(b)
    print(c)
    print(a)
    print(kwargs)

hello(1,2,1,2,8,9,c=12)

output :

1

(2, 1, 2, 8, 9)

1

{'c': 12}

回答 5

如果您传递的关键字自变量的键之一与位置自变量相似(具有相同的字符串名称),则也会发生此错误。

>>> class Foo():
...     def bar(self, bar, **kwargs):
...             print(bar)
... 
>>> kwgs = {"bar":"Barred", "jokes":"Another key word argument"}
>>> myfoo = Foo()
>>> myfoo.bar("fire", **kwgs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'bar'
>>> 

“开火”已被纳入“酒吧”论点。但是在kwargs中还存在另一个“禁止”论点。

您必须先将关键字参数从kwargs中删除,然后再将其传递给方法。

This error can also happen if you pass a key word argument for which one of the keys is similar (has same string name) to a positional argument.

>>> class Foo():
...     def bar(self, bar, **kwargs):
...             print(bar)
... 
>>> kwgs = {"bar":"Barred", "jokes":"Another key word argument"}
>>> myfoo = Foo()
>>> myfoo.bar("fire", **kwgs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'bar'
>>> 

“fire” has been accepted into the ‘bar’ argument. And yet there is another ‘bar’ argument present in kwargs.

You would have to remove the keyword argument from the kwargs before passing it to the method.


回答 6

如果您使用jquery ajax的URL反向到不包含’request’参数的函数,则这也可能在Django中发生

$.ajax({
  url: '{{ url_to_myfunc }}',
});


def myfunc(foo, bar):
    ...

Also this can happen in Django if you are using jquery ajax to url that reverses to a function that doesn’t contain ‘request’ parameter

$.ajax({
  url: '{{ url_to_myfunc }}',
});


def myfunc(foo, bar):
    ...

使用argparse获取选定的子命令

问题:使用argparse获取选定的子命令

当我将子命令与python argparse一起使用时,可以获取所选的参数。

parser = argparse.ArgumentParser()
parser.add_argument('-g', '--global')
subparsers = parser.add_subparsers()   
foo_parser = subparsers.add_parser('foo')
foo_parser.add_argument('-c', '--count')
bar_parser = subparsers.add_parser('bar')
args = parser.parse_args(['-g, 'xyz', 'foo', '--count', '42'])
# args => Namespace(global='xyz', count='42')

因此args不包含'foo'sys.argv[1]由于可能存在全局arg,因此简单地编写不起作用。如何获得子命令本身?

When I use subcommands with python argparse, I can get the selected arguments.

parser = argparse.ArgumentParser()
parser.add_argument('-g', '--global')
subparsers = parser.add_subparsers()   
foo_parser = subparsers.add_parser('foo')
foo_parser.add_argument('-c', '--count')
bar_parser = subparsers.add_parser('bar')
args = parser.parse_args(['-g, 'xyz', 'foo', '--count', '42'])
# args => Namespace(global='xyz', count='42')

So args doesn’t contain 'foo'. Simply writing sys.argv[1] doesn’t work because of the possible global args. How can I get the subcommand itself?


回答 0

关于argparse子命令Python文档的最底层介绍了如何执行此操作:

>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('-g', '--global')
>>> subparsers = parser.add_subparsers(dest="subparser_name") # this line changed
>>> foo_parser = subparsers.add_parser('foo')
>>> foo_parser.add_argument('-c', '--count')
>>> bar_parser = subparsers.add_parser('bar')
>>> args = parser.parse_args(['-g', 'xyz', 'foo', '--count', '42'])
>>> args
Namespace(count='42', global='xyz', subparser_name='foo')

您也可以使用set_defaults()我发现的示例上方引用的方法。

The very bottom of the Python docs on argparse sub-commands explains how to do this:

>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('-g', '--global')
>>> subparsers = parser.add_subparsers(dest="subparser_name") # this line changed
>>> foo_parser = subparsers.add_parser('foo')
>>> foo_parser.add_argument('-c', '--count')
>>> bar_parser = subparsers.add_parser('bar')
>>> args = parser.parse_args(['-g', 'xyz', 'foo', '--count', '42'])
>>> args
Namespace(count='42', global='xyz', subparser_name='foo')

You can also use the set_defaults() method referenced just above the example I found.


回答 1

ArgumentParser.add_subparsersdest正式的说法描述为:

dest-将在其下存储子命令名称的属性的名称;默认情况下None,不存储任何值

在以下使用子解析器的简单任务功能布局的示例中,所选子解析器位于中parser.parse_args().subparser

import argparse


def task_a(alpha):
    print('task a', alpha)


def task_b(beta, gamma):
    print('task b', beta, gamma)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest='subparser')

    parser_a = subparsers.add_parser('task_a')
    parser_a.add_argument(
        '-a', '--alpha', dest='alpha', help='Alpha description')

    parser_b = subparsers.add_parser('task_b')
    parser_b.add_argument(
        '-b', '--beta', dest='beta', help='Beta description')
    parser_b.add_argument(
        '-g', '--gamma', dest='gamma', default=42, help='Gamma description')

    kwargs = vars(parser.parse_args())
    globals()[kwargs.pop('subparser')](**kwargs)

ArgumentParser.add_subparsers has dest formal argument described as:

dest – name of the attribute under which sub-command name will be stored; by default None and no value is stored

In the example below of a simple task function layout using subparsers, the selected subparser is in parser.parse_args().subparser.

import argparse


def task_a(alpha):
    print('task a', alpha)


def task_b(beta, gamma):
    print('task b', beta, gamma)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest='subparser')

    parser_a = subparsers.add_parser('task_a')
    parser_a.add_argument(
        '-a', '--alpha', dest='alpha', help='Alpha description')

    parser_b = subparsers.add_parser('task_b')
    parser_b.add_argument(
        '-b', '--beta', dest='beta', help='Beta description')
    parser_b.add_argument(
        '-g', '--gamma', dest='gamma', default=42, help='Gamma description')

    kwargs = vars(parser.parse_args())
    globals()[kwargs.pop('subparser')](**kwargs)

在python中继承和覆盖__init__

问题:在python中继承和覆盖__init__

我正在阅读“深入Python”,并在有关类的章节中给出了以下示例:

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

然后,作者说,如果要覆盖该__init__方法,则必须__init__使用正确的参数显式调用父方法。

  1. 如果该FileInfo班有一个以上的祖先班怎么办?
    • 我是否必须显式调用所有祖先类的__init__方法?
  2. 另外,我是否必须对要覆盖的其他任何方法执行此操作?

I was reading ‘Dive Into Python’ and in the chapter on classes it gives this example:

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

The author then says that if you want to override the __init__ method, you must explicitly call the parent __init__ with the correct parameters.

  1. What if that FileInfo class had more than one ancestor class?
    • Do I have to explicitly call all of the ancestor classes’ __init__ methods?
  2. Also, do I have to do this to any other method I want to override?

回答 0

关于子类-超类调用,这本书有些过时了。在子类化内置类方面也有些过时。

如今看起来像这样:

class FileInfo(dict):
    """store file metadata"""
    def __init__(self, filename=None):
        super(FileInfo, self).__init__()
        self["name"] = filename

请注意以下几点:

  1. 我们可以直接继承内建类,如dictlisttuple,等。

  2. super函数负责跟踪此类的超类并在其中适当地调用函数。

The book is a bit dated with respect to subclass-superclass calling. It’s also a little dated with respect to subclassing built-in classes.

It looks like this nowadays:

class FileInfo(dict):
    """store file metadata"""
    def __init__(self, filename=None):
        super(FileInfo, self).__init__()
        self["name"] = filename

Note the following:

  1. We can directly subclass built-in classes, like dict, list, tuple, etc.

  2. The super function handles tracking down this class’s superclasses and calling functions in them appropriately.


回答 1

在您需要继承的每个类中,您可以运行每个需要在子类启动时进行初始化的类的循环…可以更好地理解可以复制的示例…

class Female_Grandparent:
    def __init__(self):
        self.grandma_name = 'Grandma'

class Male_Grandparent:
    def __init__(self):
        self.grandpa_name = 'Grandpa'

class Parent(Female_Grandparent, Male_Grandparent):
    def __init__(self):
        Female_Grandparent.__init__(self)
        Male_Grandparent.__init__(self)

        self.parent_name = 'Parent Class'

class Child(Parent):
    def __init__(self):
        Parent.__init__(self)
#---------------------------------------------------------------------------------------#
        for cls in Parent.__bases__: # This block grabs the classes of the child
             cls.__init__(self)      # class (which is named 'Parent' in this case), 
                                     # and iterates through them, initiating each one.
                                     # The result is that each parent, of each child,
                                     # is automatically handled upon initiation of the 
                                     # dependent class. WOOT WOOT! :D
#---------------------------------------------------------------------------------------#



g = Female_Grandparent()
print g.grandma_name

p = Parent()
print p.grandma_name

child = Child()

print child.grandma_name

In each class that you need to inherit from, you can run a loop of each class that needs init’d upon initiation of the child class…an example that can copied might be better understood…

class Female_Grandparent:
    def __init__(self):
        self.grandma_name = 'Grandma'

class Male_Grandparent:
    def __init__(self):
        self.grandpa_name = 'Grandpa'

class Parent(Female_Grandparent, Male_Grandparent):
    def __init__(self):
        Female_Grandparent.__init__(self)
        Male_Grandparent.__init__(self)

        self.parent_name = 'Parent Class'

class Child(Parent):
    def __init__(self):
        Parent.__init__(self)
#---------------------------------------------------------------------------------------#
        for cls in Parent.__bases__: # This block grabs the classes of the child
             cls.__init__(self)      # class (which is named 'Parent' in this case), 
                                     # and iterates through them, initiating each one.
                                     # The result is that each parent, of each child,
                                     # is automatically handled upon initiation of the 
                                     # dependent class. WOOT WOOT! :D
#---------------------------------------------------------------------------------------#



g = Female_Grandparent()
print g.grandma_name

p = Parent()
print p.grandma_name

child = Child()

print child.grandma_name

回答 2

您实际上不必调用__init__基类的方法,但是通常您希望这样做,因为基类将在其中进行一些重要的初始化,而其余的类方法都需要进行初始化。

对于其他方法,这取决于您的意图。如果只想向基类行为添加某些内容,则需要在自己的代码中另外调用基类方法。如果要从根本上改变行为,则可以不调用基类的方法,而直接在派生类中实现所有功能。

You don’t really have to call the __init__ methods of the base class(es), but you usually want to do it because the base classes will do some important initializations there that are needed for rest of the classes methods to work.

For other methods it depends on your intentions. If you just want to add something to the base classes behavior you will want to call the base classes method additionally to your own code. If you want to fundamentally change the behavior, you might not call the base class’ method and implement all the functionality directly in the derived class.


回答 3

如果FileInfo类具有多个祖先类,则您绝对应该调用其所有__init __()函数。您还应该对__del __()函数(该函数是析构函数)执行相同的操作。

If the FileInfo class has more than one ancestor class then you should definitely call all of their __init__() functions. You should also do the same for the __del__() function, which is a destructor.


回答 4

是的,您必须调用__init__每个家长班。如果要覆盖两个父级中都存在的功能,则功能也是如此。

Yes, you must call __init__ for each parent class. The same goes for functions, if you are overriding a function that exists in both parents.


您如何在python中检查字符串是否仅包含数字?

问题:您如何在python中检查字符串是否仅包含数字?

如何检查字符串是否仅包含数字?

我已经去了这里。我想看看实现此目的的最简单方法。

import string

def main():
    isbn = input("Enter your 10 digit ISBN number: ")
    if len(isbn) == 10 and string.digits == True:
        print ("Works")
    else:
        print("Error, 10 digit number was not inputted and/or letters were inputted.")
        main()

if __name__ == "__main__":
    main()
    input("Press enter to exit: ")

How do you check whether a string contains only numbers?

I’ve given it a go here. I’d like to see the simplest way to accomplish this.

import string

def main():
    isbn = input("Enter your 10 digit ISBN number: ")
    if len(isbn) == 10 and string.digits == True:
        print ("Works")
    else:
        print("Error, 10 digit number was not inputted and/or letters were inputted.")
        main()

if __name__ == "__main__":
    main()
    input("Press enter to exit: ")

回答 0

您需要isdigitstr对象上使用方法:

if len(isbn) == 10 and isbn.isdigit():

isdigit文档中

str.isdigit()

如果字符串中的所有字符都是数字并且至少有一个字符,则返回True,否则返回False。数字包括需要特殊处理的十进制字符和数字,例如兼容性上标数字。它涵盖了不能用于​​以10为底的数字的数字,例如Kharosthi数字。形式上,数字是具有属性值Numeric_Type =数字或Numeric_Type =十进制的字符。

You’ll want to use the isdigit method on your str object:

if len(isbn) == 10 and isbn.isdigit():

From the isdigit documentation:

str.isdigit()

Return True if all characters in the string are digits and there is at least one character, False otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.


回答 1

用途str.isdigit

>>> "12345".isdigit()
True
>>> "12345a".isdigit()
False
>>>

Use str.isdigit:

>>> "12345".isdigit()
True
>>> "12345a".isdigit()
False
>>>

回答 2

使用字符串isdigit函数:

>>> s = '12345'
>>> s.isdigit()
True
>>> s = '1abc'
>>> s.isdigit()
False

Use string isdigit function:

>>> s = '12345'
>>> s.isdigit()
True
>>> s = '1abc'
>>> s.isdigit()
False

回答 3

您还可以使用正则表达式,

import re

例如:-1)word =“ 3487954”

re.match('^[0-9]*$',word)

例如:-2)word =“ 3487.954”

re.match('^[0-9\.]*$',word)

例如:-3)word =“ 3487.954 328”

re.match('^[0-9\.\ ]*$',word)

如您所见,所有3个eg表示您的字符串中没有任何内容。因此,您可以按照其提供的相应解决方案进行操作。

You can also use the regex,

import re

eg:-1) word = “3487954”

re.match('^[0-9]*$',word)

eg:-2) word = “3487.954”

re.match('^[0-9\.]*$',word)

eg:-3) word = “3487.954 328”

re.match('^[0-9\.\ ]*$',word)

As you can see all 3 eg means that there is only no in your string. So you can follow the respective solutions given with them.


回答 4

关于什么浮点数底片号码等。所有的例子之前,将是错误的。

到现在为止,我得到了类似的东西,但我认为它可能会好得多:

'95.95'.replace('.','',1).isdigit()

仅当存在一个或没有“。”时返回true。在数字字符串中。

'9.5.9.5'.replace('.','',1).isdigit()

将返回假

What about of float numbers, negatives numbers, etc.. All the examples before will be wrong.

Until now I got something like this, but I think it could be a lot better:

'95.95'.replace('.','',1).isdigit()

will return true only if there is one or no ‘.’ in the string of digits.

'9.5.9.5'.replace('.','',1).isdigit()

will return false


回答 5

正如该评论所指出的,如何检查python字符串是否仅包含数字?isdigit()方法在此用例中并不完全准确,因为对于某些类似数字的字符它返回True:

>>> "\u2070".isdigit() # unicode escaped 'superscript zero' 
True

如果需要避免这种情况,则以下简单函数检查字符串中的所有字符是否为“ 0”和“ 9”之间的数字:

import string

def contains_only_digits(s):
    # True for "", "0", "123"
    # False for "1.2", "1,2", "-1", "a", "a1"
    for ch in s:
        if not ch in string.digits:
            return False
    return True

在问题示例中使用:

if len(isbn) == 10 and contains_only_digits(isbn):
    print ("Works")

As pointed out in this comment How do you check in python whether a string contains only numbers? the isdigit() method is not totally accurate for this use case, because it returns True for some digit-like characters:

>>> "\u2070".isdigit() # unicode escaped 'superscript zero' 
True

If this needs to be avoided, the following simple function checks, if all characters in a string are a digit between “0” and “9”:

import string

def contains_only_digits(s):
    # True for "", "0", "123"
    # False for "1.2", "1,2", "-1", "a", "a1"
    for ch in s:
        if not ch in string.digits:
            return False
    return True

Used in the example from the question:

if len(isbn) == 10 and contains_only_digits(isbn):
    print ("Works")

回答 6

您可以在此处使用try catch块:

s="1234"
try:
    num=int(s)
    print "S contains only digits"
except:
    print "S doesn't contain digits ONLY"

You can use try catch block here:

s="1234"
try:
    num=int(s)
    print "S contains only digits"
except:
    print "S doesn't contain digits ONLY"

回答 7

因为每次我遇到检查问题都是因为str有时可以为None,并且如果str可以为None,则仅使用str.isdigit()是不够的,因为您会得到一个错误

AttributeError:’NoneType’对象没有属性’isdigit’

然后您需要首先验证str是否为None。为了避免多if分支,一种清晰的方法是:

if str and str.isdigit():

希望这对像我这样的人有帮助。

As every time I encounter an issue with the check is because the str can be None sometimes, and if the str can be None, only use str.isdigit() is not enough as you will get an error

AttributeError: ‘NoneType’ object has no attribute ‘isdigit’

and then you need to first validate the str is None or not. To avoid a multi-if branch, a clear way to do this is:

if str and str.isdigit():

Hope this helps for people have the same issue like me.


回答 8

我可以想到2种方法来检查字符串是否具有全位数

方法1(在python中使用内置的isdigit()函数):-

>>>st = '12345'
>>>st.isdigit()
True
>>>st = '1abcd'
>>>st.isdigit()
False

方法2(在字符串顶部执行异常处理):-

st="1abcd"
try:
    number=int(st)
    print("String has all digits in it")
except:
    print("String does not have all digits in it")

上面代码的输出将是:

String does not have all digits in it

There are 2 methods that I can think of to check whether a string has all digits of not

Method 1(Using the built-in isdigit() function in python):-

>>>st = '12345'
>>>st.isdigit()
True
>>>st = '1abcd'
>>>st.isdigit()
False

Method 2(Performing Exception Handling on top of the string):-

st="1abcd"
try:
    number=int(st)
    print("String has all digits in it")
except:
    print("String does not have all digits in it")

The output of the above code will be:

String does not have all digits in it

回答 9

您可以使用str.isdigit()方法或str.isnumeric()方法

you can use str.isdigit() method or str.isnumeric() method


如何合并字典的字典?

问题:如何合并字典的字典?

我需要合并多个词典,例如:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

随着A B CD作为树的叶子像{"info1":"value", "info2":"value2"}

词典的级别(深度)未知,可能是 {2:{"c":{"z":{"y":{C}}}}}

在我的情况下,它表示目录/文件结构,其中节点为docs,而节点为文件。

我想将它们合并以获得:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

我不确定如何使用Python轻松做到这一点。

I need to merge multiple dictionaries, here’s what I have for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

With A B C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

There is an unknown level(depth) of dictionaries, it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I’m not sure how I could do that easily with Python.


回答 0

这实际上是非常棘手的-特别是如果您希望在事物不一致时收到有用的错误消息,同时正确地接受重复但一致的条目(这里没有其他答案了……)。

假设您没有大量条目,那么递归函数最简单:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

请注意,这会发生变化a-将的内容b添加到a(也会返回)。如果您想保留a,可以这样称呼merge(dict(a), b)

agf指出(如下),您可能有两个以上的命令,在这种情况下,您可以使用:

reduce(merge, [dict1, dict2, dict3...])

将所有内容添加到dict1中。

[注意-我编辑了最初的答案以使第一个参数发生变化;使“减少”更易于解释]

python 3中的ps,您还需要 from functools import reduce

this is actually quite tricky – particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does….)

assuming you don’t have huge numbers of entries a recursive function is easiest:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

note that this mutates a – the contents of b are added to a (which is also returned). if you want to keep a you could call it like merge(dict(a), b).

agf pointed out (below) that you may have more than two dicts, in which case you can use:

reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

[note – i edited my initial answer to mutate the first argument; that makes the “reduce” easier to explain]

ps in python 3, you will also need from functools import reduce


回答 1

这是使用生成器实现的一种简单方法:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

打印:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Here’s an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 2

这个问题的一个问题是,字典的值可以是任意复杂的数据。基于这些和其他答案,我想到了以下代码:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

我的用例是合并YAML文件,我只需要处理可能的数据类型的子集。因此,我可以忽略元组和其他对象。对我而言,明智的合并逻辑意味着

  • 更换标量
  • 追加清单
  • 通过添加缺少的键并更新现有键来合并字典

其他所有和不可预见的结果都会导致错误。

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me a sensible merge logic means

  • replace scalars
  • append lists
  • merge dicts by adding missing keys and updating existing keys

Everything else and the unforeseens results in an error.


回答 3

字典词典合并

由于这是规范的问题(尽管有某些非一般性的规定),所以我提供了规范的Python方法来解决此问题。

最简单的情况:“叶是嵌套的字典,以空字典结尾”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

这是最简单的递归情况,我建议两种朴素的方法:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

我相信我更喜欢第二个,但要记住,第一个的原始状态必须从其原始位置重建。这是用法:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

复杂的情况:“叶子是任何其他类型:”

因此,如果它们以字典结尾,这是合并末端空字典的简单情况。如果没有,那不是那么简单。如果是字符串,如何合并它们?可以类似地更新集合,因此我们可以进行这种处理,但是会丢失它们合并的顺序。那么顺序重要吗?

因此,代替更多信息,最简单的方法是在两个值都不都是dict的情况下为他们提供标准的更新处理:即第二个dict的值将覆盖第一个dict的值,即使第二个dict的值为None且第一个的值为a有很多信息的字典。

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

现在

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

退货

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

适用于原始问题:

我必须删除字母周围的花括号并将其放在单引号中,以使其成为合法的Python(否则,它们将在Python 2.7+中设置为原义)并附加缺少的花括号:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

rec_merge(dict1, dict2)现在返回:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

匹配原始问题的期望结果(例如,更改{A}为后)'A'

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I’m providing the canonical Pythonic approach to solving this issue.

Simplest Case: “leaves are nested dicts that end in empty dicts”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here’s the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: “leaves are of any other type:”

So if they end in dicts, it’s a simple case of merging the end empty dicts. If not, it’s not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment if both values are not dicts: i.e. the second dict’s value will overwrite the first, even if the second dict’s value is None and the first’s value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

Application to the original question:

I’ve had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)


回答 4

基于@andrew cooke。此版本处理字典的嵌套列表,还允许选择更新值

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Based on @andrew cooke. This version handles nested lists of dicts and also allows the option to update the values

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

回答 5

这个简单的递归过程将一个字典合并到另一个字典中,同时覆盖冲突的键:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

回答 6

基于@andrew cooke的答案。它以更好的方式处理嵌套列表。

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

回答 7

如果您的词典级别未知,那么我建议使用递归函数:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

If you have an unknown level of dictionaries, then I would suggest a recursive function:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

回答 8

总览

以下方法将dict的深度合并问题细分为:

  1. 参数化的浅表合并函数merge(f)(a,b),该函数使用一个函数f合并两个字典ab

  2. 递归合并函数f将与merge


实作

可以通过多种方式来编写用于合并两个(非嵌套)字典的函数。我个人喜欢

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

定义适当的递归合并函数的一种好方法f是使用多调度,它允许定义根据参数类型沿不同路径求值的函数。

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

要合并两个嵌套的字典,只需使用merge(f)例如:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

笔记:

这种方法的优点是:

  • 该函数是由较小的函数构建的,每个较小的函数都做一件事情,这使代码更易于推理和测试。

  • 该行为不是硬编码的,但是可以根据需要进行更改和扩展,从而提高了代码重用性(请参见下面的示例)。


客制化

一些答案还考虑了包含例如其他(可能嵌套的)字典列表的字典。在这种情况下,可能需要映射列表并根据位置合并它们。这可以通过向合并功能添加另一个定义来完成f

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

A nice way of defining an appropriate recursive merger function f is using multipledispatch which allows to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

Notes:

The advantages of this approach are:

  • The function is build from smaller functions that each do a single thing which makes the code simpler to reason about and test

  • The behaviour is not hard-coded but can be changed and extended as needed which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists e.g. of other (potentially nested) dicts. In this case one might want map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

回答 9

如果有人想要解决这个问题,这是我的解决方案。

美德:简短,说明性且具有风格上的功能(递归,无突变)。

潜在缺点:这可能不是您要查找的合并。有关语义,请查阅文档字符串。

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

In case someone wants yet another approach to this problem, here’s my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential Drawback: This might not be the merge you’re looking for. Consult the docstring for semantics.

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

回答 10

您可以尝试mergedeep


安装

$ pip3 install mergedeep

用法

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

有关选项的完整列表,请查看文档

You could try mergedeep.


Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!


回答 11

安德鲁·库克斯答案有一个小问题:在某些情况下,b当您修改返回的字典时,它会修改第二个参数。特别是因为这一行:

if key in a:
    ...
else:
    a[key] = b[key]

如果b[key]为a dict,则将其简单地分配给a,这意味着对它的任何后续修改dict都会影响ab

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

为了解决这个问题,该行必须替换为:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

在哪里clone_dict

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

仍然。这显然不占listset和其他的东西,但我希望尝试合并时,它说明了陷阱dicts

为了完整起见,这是我的版本,您可以在其中多次传递它dicts

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

There’s a slight problem with andrew cookes answer: In some cases it modifies the second argument b when you modify the returned dict. Specifically it’s because of this line:

if key in a:
    ...
else:
    a[key] = b[key]

If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

To fix this, the line would have to be substituted with this:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

Where clone_dict is:

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

Still. This obviously doesn’t account for list, set and other stuff, but I hope it illustrates the pitfalls when trying to merge dicts.

And for completeness sake, here is my version, where you can pass it multiple dicts:

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

回答 12

此版本的函数将占N个字典,仅占字典-无法传递不正确的参数,否则将引发TypeError。合并本身解决了键冲突,而不是覆盖合并链下游的词典中的数据,它创建了一组值并将其追加到该值之后;没有数据丢失。

它在页面上可能不是最有效的,但是却是最彻底的,并且在合并2到N个字典时,您不会丢失任何信息。

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

输出:{1:[1,2],2:{1:2,3:3:1},4:4}

This version of the function will account for N number of dictionaries, and only dictionaries — no improper parameters can be passed, or it will raise a TypeError. The merge itself accounts for key conflicts, and instead of overwriting data from a dictionary further down the merge chain, it creates a set of values and appends to that; no data is lost.

It might not be the most effecient on the page, but it’s the most thorough and you’re not going to lose any information when you merge your 2 to N dicts.

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}


回答 13

由于dictviews支持集合操作,因此我能够大大简化jterrace的答案。

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

任何尝试将dict与非dict组合在一起的尝试(从技术上讲,是指具有“ keys”方法的对象,而没有“ keys”方法的对象)将引发AttributeError。这包括对函数的初始调用和递归调用。这正是我想要的,所以我离开了。您可以轻松地捕获递归调用引发的AttributeErrors,然后产生您想要的任何值。

Since dictviews support set operations, I was able to greatly simplify jterrace’s answer.

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

Any attempt to combine a dict with a non dict (technically, an object with a ‘keys’ method and an object without a ‘keys’ method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted so I left it. You could easily catch an AttributeErrors thrown by the recursive call and then yield any value you please.


回答 14

短n甜:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

它的工作方式类似于Python的dict.update方法(并建立在该方法的基础上)。它会返回Nonereturn d如果愿意,可以随时添加),因为它会d原位更新dict 。中的键v将覆盖其中的任何现有键d(它不会尝试解释字典的内容)。

它也适用于其他(“类似字典”的)映射。

Short-n-sweet:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

This works like (and is build on) Python’s dict.update method. It returns None (you can always add return d if you prefer) as it updates dict d in-place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict’s contents).

It will also work for other (“dict-like”) mappings.


回答 15

当然,代码将取决于您解决合并冲突的规则。这是一个可以使用任意数量的参数并将其递归合并到任意深度的版本,而无需使用任何对象突变。它使用以下规则解决合并冲突:

  • 字典优先于非字典值({"foo": {...}}优先于{"foo": "bar"}
  • 随后的参数优先于前面的参数(如果合并{"a": 1}{"a", 2}以及{"a": 3}为了,其结果将是{"a": 3}
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

The code will depend on your rules for resolving merge conflicts, of course. Here’s a version which can take an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:

  • dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
  • later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a", 2}, and {"a": 3} in order, the result will be {"a": 3})
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

回答 16

我有两个字典(ab),每个字典可以包含任意数量的嵌套字典。我想以b优先于递归合并它们a

将嵌套字典视为树,我想要的是:

  • 更新a以使到每个叶子的每条路径都b表示为a
  • 要覆盖的子树a,如果叶在相应路径中找到b
    • 保持所有b叶节点仍为​​叶的不变性。

对于我的口味而言,现有的答案有些复杂,并且在架子上留下了一些细节。我一起整理了以下内容,这些内容通过了我的数据集的单元测试。

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

示例(为清晰起见而格式化):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

b需要维护的路径是:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'

a 具有以下独特且无冲突的路径:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

因此它们仍显示在合并地图中。

I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

  • To update a so that every path to every leaf in b would be represented in a
  • To overwrite subtrees of a if a leaf is found in the corresponding path in b
    • Maintain the invariant that all b leaf nodes remain leafs.

The existing answers were a little complicated for my taste and left some details on the shelf. I hacked together the following, which passes unit tests for my data set.

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

Example (formatted for clarity):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

The paths in b that needed to be maintained were:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'.

a had the unique and non-conflicting paths of:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.


回答 17

我有一个迭代的解决方案-大字典和很多字典(例如json等)的效果要好得多:

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

请注意,如果它们不是两个字典,则将使用d2中的值覆盖d1。(与python相同dict.update()

一些测试:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

我已经测试了约1200格-该方法花费了0.4秒,而递归解决方案花费了约2.5秒。

I have an iterative solution – works much much better with big dicts & a lot of them (for example jsons etc):

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

note that this will use the value in d2 to override d1, in case they are not both dicts. (same as python’s dict.update())

some tests:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

I’ve tested with around ~1200 dicts – this method took 0.4 seconds, while the recursive solution took ~2.5 seconds.


回答 18

这应该有助于将所有项目合并dict2dict1

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

请测试一下,然后告诉我们这是否是您想要的。

编辑:

上述解决方案仅合并了一个级别,但正确地解决了OP给出的示例。要合并多个级别,应使用递归。

This should help in merging all items from dict2 into dict1:

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

Please test it and tell us whether this is what you wanted.

EDIT:

The above mentioned solution merges only one level, but correctly solves the example given by OP. To merge multiple levels, the recursion should be used.


回答 19

我一直在测试您的解决方案,并决定在我的项目中使用此解决方案:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

将函数作为参数传递是扩展jterrace解决方案以使其与所有其他递归解决方案一样起作用的关键。

I’ve been testing your solutions and decided to use this one in my project:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

Passing functions as parameteres is key to extend jterrace solution to behave as all the other recursive solutions.


回答 20

我能想到的最简单的方法是:

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Easiest way i can think of is :

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 21

我在这里还有另一个略有不同的解决方案:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

默认情况下,它可以解决冲突,而有利于第二个dict的值,但是您可以轻松地覆盖它,通过做些麻烦,您甚至可以抛出异常。:)。

I have another slightly different solution here:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

By default it resolves conflicts in favor of values from the second dict, but you can easily override this, with some witchery you may be able to even throw exceptions out of it. :).


回答 22

class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()
class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()

回答 23

嘿,我也有同样的问题,但是我有一个解决方案,我会把它发布在这里,以防它对其他人也有用,基本上是合并嵌套的字典并添加值,对我来说,我需要计算一些概率,因此一个很好的工作:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

通过使用以上方法,我们可以合并:

target = {‘6,6’:{‘6,63’:1},’63,4’:{‘4,4’:1},’4,4’:{‘4,3’:1} ,’6,63’:{’63,4’:1}}

src = {‘5,4’:{‘4,4’:1},’5,5’:{‘5,4’:1},’4,4’:{‘4,3’:1} }

它将变成:{‘5,5’:{‘5,4’:1},’5,4’:{‘4,4’:1},’6,6’:{‘6,63’ :1},’63,4’:{‘4,4’:1},’4,4’:{‘4,3’:2},’6,63’:{’63,4’:1 }}

还请注意此处的更改:

target = {‘6,6’:{‘6,63’:1},’6,63’:{’63,4’:1},‘4,4’:{‘4,3’:1},’63,4’:{‘4,4’:1}}

src = {‘5,4’:{‘4,4’:1},’4,3’:{‘3,4’:1},‘4,4’:{‘4,9’:1},’3,4’:{‘4,4’:1},’5,5’:{‘5,4’:1}}

合并= {‘5,4’:{‘4,4’:1},’4,3’:{‘3,4’:1},’6,63’:{’63,4’:1} ,’5,5’:{‘5,4’:1},’6,6’:{‘6,63’:1},’3,4’:{‘4,4’:1},’ 63,4’:{‘4,4’:1},‘4,4’:{‘4,3’:1,’4,9’:1} }}

别忘了还要添加导入副本:

import copy

hey there I also had the same problem but I though of a solution and I will post it here, in case it is also useful for others, basically merging nested dictionaries and also adding the values, for me I needed to calculate some probabilities so this one worked great:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

by using the above method we can merge:

target = {‘6,6’: {‘6,63’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 1}, ‘6,63’: {‘63,4’: 1}}

src = {‘5,4’: {‘4,4’: 1}, ‘5,5’: {‘5,4’: 1}, ‘4,4’: {‘4,3’: 1}}

and this will become: {‘5,5’: {‘5,4’: 1}, ‘5,4’: {‘4,4’: 1}, ‘6,6’: {‘6,63’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 2}, ‘6,63’: {‘63,4’: 1}}

also notice the changes here:

target = {‘6,6’: {‘6,63’: 1}, ‘6,63’: {‘63,4’: 1}, ‘4,4’: {‘4,3’: 1}, ‘63,4’: {‘4,4’: 1}}

src = {‘5,4’: {‘4,4’: 1}, ‘4,3’: {‘3,4’: 1}, ‘4,4’: {‘4,9’: 1}, ‘3,4’: {‘4,4’: 1}, ‘5,5’: {‘5,4’: 1}}

merge = {‘5,4’: {‘4,4’: 1}, ‘4,3’: {‘3,4’: 1}, ‘6,63’: {‘63,4’: 1}, ‘5,5’: {‘5,4’: 1}, ‘6,6’: {‘6,63’: 1}, ‘3,4’: {‘4,4’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 1, ‘4,9’: 1}}

dont forget to also add the import for copy:

import copy

回答 24

from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

输出:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}
from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

Output:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}

回答 25

看一看toolz

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

take a look at toolz package

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

gives

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

回答 26

以下函数将b合并为a。

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

The following function merges b into a.

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

回答 27

还有另一个细微变化:

这是基于纯python3集的深度更新功能。它通过一次遍历一个级别来更新嵌套字典,并调用自身来更新每个下一级别的字典值:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

一个简单的例子:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

And just another slight variation:

Here is a pure python3 set based deep update function. It updates nested dictionaries by looping through one level at a time and calls itself to update each next level of dictionary values:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

A simple example:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

回答 28

另一个答案怎么样?!这也避免了突变/副作用:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

How about another answer?!? This one also avoids mutation/side effects:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

用python 3解开python 2对象

问题:用python 3解开python 2对象

我想知道是否有一种方法可以加载在Python 2.4和Python 3.4中腌制的对象。

我一直在大量公司遗留代码上运行2to3,以使其保持最新状态。

完成此操作后,在运行文件时出现以下错误:

  File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py"
, line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal
not in range(128)

在争用中查看腌制的对象,它dict在中dict,包含键和type值str

所以我的问题是:有没有办法用python 3.4加载最初在python 2.4中腌制的对象?

I’m wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I’ve been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py"
, line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal
not in range(128)

Looking at the pickled object in contention, it’s a dict in a dict, containing keys and values of type str.

So my question is: Is there a way to load an object, originally pickled in python 2.4, with python 3.4?


回答 0

您必须告诉pickle.load()如何将Python字节串数据转换为Python 3字符串,或者可以告诉pickle将它们保留为字节。

默认设置是尝试将所有字符串数据解码为ASCII,并且解码失败。请参阅pickle.load()文档

可选的关键字参数是fix_importsencodingerrors,用于控制对Python 2生成的pickle流的兼容性支持。如果fix_imports为true,pickle将尝试将旧的Python 2名称映射到Python 3中使用的新名称。编码错误告诉pickle如何解码Python 2腌制的8位字符串实例;它们分别默认为“ ASCII”和“ strict”。该编码可以是“字节”来读取这些8位串实例作为字节对象。

将编码设置为latin1可以直接导入数据:

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1') 

但是您需要确认没有使用错误的编解码器对所有字符串进行解码;Latin-1适用于任何输入,因为它将字节值0-255直接映射到前256个Unicode代码点。

另一种选择是使用加载数据encoding='bytes',然后解码所有bytes键和值。

请注意,直到使用3.6.8、3.7.2和3.8.0之前的Python版本,除非使用,否则对Python 2 datetime对象数据的解泄漏都是无效的encoding='bytes'

You’ll have to tell pickle.load() how to convert Python bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to import the data directly:

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1') 

but you’ll need to verify that none of your strings are decoded using the wrong codec; Latin-1 works for any input as it maps the byte values 0-255 to the first 256 Unicode codepoints directly.

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.

Note that up to Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.


回答 1

encoding='latin1'当对象中包含numpy数组时,使用会引起一些问题。

使用encoding='bytes'会更好。

请参阅此答案以获取有关使用的完整说明encoding='bytes'

Using encoding='latin1' causes some issues when your object contains numpy arrays in it.

Using encoding='bytes' will be better.

Please see this answer for complete explanation of using encoding='bytes'


如何装饰一堂课?

问题:如何装饰一堂课?

在Python 2.5中,有没有办法创建装饰类的装饰器?具体来说,我想使用装饰器将成员添加到类中,并更改构造函数以获取该成员的值。

寻找类似以下的内容(在“ Foo类:”上存在语法错误:

def getId(self): return self.__id

class addID(original_class):
    def __init__(self, id, *args, **kws):
        self.__id = id
        self.getId = getId
        original_class.__init__(self, *args, **kws)

@addID
class Foo:
    def __init__(self, value1):
        self.value1 = value1

if __name__ == '__main__':
    foo1 = Foo(5,1)
    print foo1.value1, foo1.getId()
    foo2 = Foo(15,2)
    print foo2.value1, foo2.getId()

我想我真正想要的是在Python中执行类似C#接口的方法。我想我应该改变我的范式。

In Python 2.5, is there a way to create a decorator that decorates a class? Specifically, I want to use a decorator to add a member to a class and change the constructor to take a value for that member.

Looking for something like the following (which has a syntax error on ‘class Foo:’:

def getId(self): return self.__id

class addID(original_class):
    def __init__(self, id, *args, **kws):
        self.__id = id
        self.getId = getId
        original_class.__init__(self, *args, **kws)

@addID
class Foo:
    def __init__(self, value1):
        self.value1 = value1

if __name__ == '__main__':
    foo1 = Foo(5,1)
    print foo1.value1, foo1.getId()
    foo2 = Foo(15,2)
    print foo2.value1, foo2.getId()

I guess what I’m really after is a way to do something like a C# interface in Python. I need to switch my paradigm I suppose.


回答 0

我的观点是您可能希望考虑一个子类,而不是您概述的方法。但是,不知道您的特定情况,YMMV :-)

您正在考虑的是元类。__new__元类中的函数将传递该类的完整建议定义,然后可以在创建类之前将其重写。那时,您可以将构造函数细分为一个新的。

例:

def substitute_init(self, id, *args, **kwargs):
    pass

class FooMeta(type):

    def __new__(cls, name, bases, attrs):
        attrs['__init__'] = substitute_init
        return super(FooMeta, cls).__new__(cls, name, bases, attrs)

class Foo(object):

    __metaclass__ = FooMeta

    def __init__(self, value1):
        pass

替换构造函数可能有点麻烦,但是语言确实为这种深入的内省和动态修改提供了支持。

I would second the notion that you may wish to consider a subclass instead of the approach you’ve outlined. However, not knowing your specific scenario, YMMV :-)

What you’re thinking of is a metaclass. The __new__ function in a metaclass is passed the full proposed definition of the class, which it can then rewrite before the class is created. You can, at that time, sub out the constructor for a new one.

Example:

def substitute_init(self, id, *args, **kwargs):
    pass

class FooMeta(type):

    def __new__(cls, name, bases, attrs):
        attrs['__init__'] = substitute_init
        return super(FooMeta, cls).__new__(cls, name, bases, attrs)

class Foo(object):

    __metaclass__ = FooMeta

    def __init__(self, value1):
        pass

Replacing the constructor is perhaps a bit dramatic, but the language does provide support for this kind of deep introspection and dynamic modification.


回答 1

除了类装饰器是否是您的问题的正确解决方案的问题之外:

在Python 2.6和更高版本中,有带有@语法的类装饰器,因此您可以编写:

@addID
class Foo:
    pass

在旧版本中,您可以使用另一种方法:

class Foo:
    pass

Foo = addID(Foo)

但是请注意,这与函数装饰器的工作原理相同,并且装饰器应返回新(或修改后的原始)类,这不是您在示例中所做的。addID装饰器如下所示:

def addID(original_class):
    orig_init = original_class.__init__
    # Make copy of original __init__, so we can call it without recursion

    def __init__(self, id, *args, **kws):
        self.__id = id
        self.getId = getId
        orig_init(self, *args, **kws) # Call the original __init__

    original_class.__init__ = __init__ # Set the class' __init__ to the new one
    return original_class

然后,您可以按照上述方式为Python版本使用适当的语法。

但是我同意其他人的观点,如果你想重写继承,继承更适合__init__

Apart from the question whether class decorators are the right solution to your problem:

In Python 2.6 and higher, there are class decorators with the @-syntax, so you can write:

@addID
class Foo:
    pass

In older versions, you can do it another way:

class Foo:
    pass

Foo = addID(Foo)

Note however that this works the same as for function decorators, and that the decorator should return the new (or modified original) class, which is not what you’re doing in the example. The addID decorator would look like this:

def addID(original_class):
    orig_init = original_class.__init__
    # Make copy of original __init__, so we can call it without recursion

    def __init__(self, id, *args, **kws):
        self.__id = id
        self.getId = getId
        orig_init(self, *args, **kws) # Call the original __init__

    original_class.__init__ = __init__ # Set the class' __init__ to the new one
    return original_class

You could then use the appropriate syntax for your Python version as described above.

But I agree with others that inheritance is better suited if you want to override __init__.


回答 2

没有人解释过您可以动态定义类。因此,您可以使用一个装饰器来定义(并返回)一个子类:

def addId(cls):

    class AddId(cls):

        def __init__(self, id, *args, **kargs):
            super(AddId, self).__init__(*args, **kargs)
            self.__id = id

        def getId(self):
            return self.__id

    return AddId

可以在Python 2中使用它(来自Blckknght的评论,它解释了为什么您应该在2.6+中继续这样做),如下所示:

class Foo:
    pass

FooId = addId(Foo)

并且在Python 3中是这样的(但要小心使用 super()在类中):

@addId
class Foo:
    pass

所以,你可以有你的蛋糕吃它-继承装饰!

No one has explained that you can dynamically define classes. So you can have a decorator that defines (and returns) a subclass:

def addId(cls):

    class AddId(cls):

        def __init__(self, id, *args, **kargs):
            super(AddId, self).__init__(*args, **kargs)
            self.__id = id

        def getId(self):
            return self.__id

    return AddId

Which can be used in Python 2 (the comment from Blckknght which explains why you should continue to do this in 2.6+) like this:

class Foo:
    pass

FooId = addId(Foo)

And in Python 3 like this (but be careful to use super() in your classes):

@addId
class Foo:
    pass

So you can have your cake and eat it – inheritance and decorators!


回答 3

这不是一个好习惯,因此,没有机制可以做到这一点。完成所需内容的正确方法是继承。

查看类文档

一个小例子:

class Employee(object):

    def __init__(self, age, sex, siblings=0):
        self.age = age
        self.sex = sex    
        self.siblings = siblings

    def born_on(self):    
        today = datetime.date.today()

        return today - datetime.timedelta(days=self.age*365)


class Boss(Employee):    
    def __init__(self, age, sex, siblings=0, bonus=0):
        self.bonus = bonus
        Employee.__init__(self, age, sex, siblings)

这样老板就拥有了一切Employee,还有他自己的__init__方法和自己的成员。

That’s not a good practice and there is no mechanism to do that because of that. The right way to accomplish what you want is inheritance.

Take a look into the class documentation.

A little example:

class Employee(object):

    def __init__(self, age, sex, siblings=0):
        self.age = age
        self.sex = sex    
        self.siblings = siblings

    def born_on(self):    
        today = datetime.date.today()

        return today - datetime.timedelta(days=self.age*365)


class Boss(Employee):    
    def __init__(self, age, sex, siblings=0, bonus=0):
        self.bonus = bonus
        Employee.__init__(self, age, sex, siblings)

This way Boss has everything Employee has, with also his own __init__ method and own members.


回答 4

我同意继承更适合提出的问题。

我发现这个问题在装饰类上确实很方便,谢谢大家。

这是另外两个基于其他答案的示例,包括继承如何影响Python 2.7中的内容(以及@wraps,它维护原始函数的文档字符串等):

def dec(klass):
    old_foo = klass.foo
    @wraps(klass.foo)
    def decorated_foo(self, *args ,**kwargs):
        print('@decorator pre %s' % msg)
        old_foo(self, *args, **kwargs)
        print('@decorator post %s' % msg)
    klass.foo = decorated_foo
    return klass

@dec  # No parentheses
class Foo...

通常,您想向装饰器添加参数:

from functools import wraps

def dec(msg='default'):
    def decorator(klass):
        old_foo = klass.foo
        @wraps(klass.foo)
        def decorated_foo(self, *args ,**kwargs):
            print('@decorator pre %s' % msg)
            old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)
        klass.foo = decorated_foo
        return klass
    return decorator

@dec('foo decorator')  # You must add parentheses now, even if they're empty
class Foo(object):
    def foo(self, *args, **kwargs):
        print('foo.foo()')

@dec('subfoo decorator')
class SubFoo(Foo):
    def foo(self, *args, **kwargs):
        print('subfoo.foo() pre')
        super(SubFoo, self).foo(*args, **kwargs)
        print('subfoo.foo() post')

@dec('subsubfoo decorator')
class SubSubFoo(SubFoo):
    def foo(self, *args, **kwargs):
        print('subsubfoo.foo() pre')
        super(SubSubFoo, self).foo(*args, **kwargs)
        print('subsubfoo.foo() post')

SubSubFoo().foo()

输出:

@decorator pre subsubfoo decorator
subsubfoo.foo() pre
@decorator pre subfoo decorator
subfoo.foo() pre
@decorator pre foo decorator
foo.foo()
@decorator post foo decorator
subfoo.foo() post
@decorator post subfoo decorator
subsubfoo.foo() post
@decorator post subsubfoo decorator

我使用了一个函数装饰器,因为我发现它们更加简洁。这是一个装饰类的类:

class Dec(object):

    def __init__(self, msg):
        self.msg = msg

    def __call__(self, klass):
        old_foo = klass.foo
        msg = self.msg
        def decorated_foo(self, *args, **kwargs):
            print('@decorator pre %s' % msg)
            old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)
        klass.foo = decorated_foo
        return klass

一个更强大的版本,用于检查这些括号,并在装饰的类上不存在该方法时起作用:

from inspect import isclass

def decorate_if(condition, decorator):
    return decorator if condition else lambda x: x

def dec(msg):
    # Only use if your decorator's first parameter is never a class
    assert not isclass(msg)

    def decorator(klass):
        old_foo = getattr(klass, 'foo', None)

        @decorate_if(old_foo, wraps(klass.foo))
        def decorated_foo(self, *args ,**kwargs):
            print('@decorator pre %s' % msg)
            if callable(old_foo):
                old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)

        klass.foo = decorated_foo
        return klass

    return decorator

assert该装饰尚未使用支票没有括号。如果包含,则将要装饰的类传递给msg装饰器的参数,该参数将引发AssertionError

@decorate_if仅将decoratorif condition求值应用于True

使用getattrcallable测试和@decorate_if可以使装饰器foo()在被装饰的类上不存在该方法时也不会中断。

I’d agree inheritance is a better fit for the problem posed.

I found this question really handy though on decorating classes, thanks all.

Here’s another couple of examples, based on other answers, including how inheritance affects things in Python 2.7, (and @wraps, which maintains the original function’s docstring, etc.):

def dec(klass):
    old_foo = klass.foo
    @wraps(klass.foo)
    def decorated_foo(self, *args ,**kwargs):
        print('@decorator pre %s' % msg)
        old_foo(self, *args, **kwargs)
        print('@decorator post %s' % msg)
    klass.foo = decorated_foo
    return klass

@dec  # No parentheses
class Foo...

Often you want to add parameters to your decorator:

from functools import wraps

def dec(msg='default'):
    def decorator(klass):
        old_foo = klass.foo
        @wraps(klass.foo)
        def decorated_foo(self, *args ,**kwargs):
            print('@decorator pre %s' % msg)
            old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)
        klass.foo = decorated_foo
        return klass
    return decorator

@dec('foo decorator')  # You must add parentheses now, even if they're empty
class Foo(object):
    def foo(self, *args, **kwargs):
        print('foo.foo()')

@dec('subfoo decorator')
class SubFoo(Foo):
    def foo(self, *args, **kwargs):
        print('subfoo.foo() pre')
        super(SubFoo, self).foo(*args, **kwargs)
        print('subfoo.foo() post')

@dec('subsubfoo decorator')
class SubSubFoo(SubFoo):
    def foo(self, *args, **kwargs):
        print('subsubfoo.foo() pre')
        super(SubSubFoo, self).foo(*args, **kwargs)
        print('subsubfoo.foo() post')

SubSubFoo().foo()

Outputs:

@decorator pre subsubfoo decorator
subsubfoo.foo() pre
@decorator pre subfoo decorator
subfoo.foo() pre
@decorator pre foo decorator
foo.foo()
@decorator post foo decorator
subfoo.foo() post
@decorator post subfoo decorator
subsubfoo.foo() post
@decorator post subsubfoo decorator

I’ve used a function decorator, as I find them more concise. Here’s a class to decorate a class:

class Dec(object):

    def __init__(self, msg):
        self.msg = msg

    def __call__(self, klass):
        old_foo = klass.foo
        msg = self.msg
        def decorated_foo(self, *args, **kwargs):
            print('@decorator pre %s' % msg)
            old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)
        klass.foo = decorated_foo
        return klass

A more robust version that checks for those parentheses, and works if the methods don’t exist on the decorated class:

from inspect import isclass

def decorate_if(condition, decorator):
    return decorator if condition else lambda x: x

def dec(msg):
    # Only use if your decorator's first parameter is never a class
    assert not isclass(msg)

    def decorator(klass):
        old_foo = getattr(klass, 'foo', None)

        @decorate_if(old_foo, wraps(klass.foo))
        def decorated_foo(self, *args ,**kwargs):
            print('@decorator pre %s' % msg)
            if callable(old_foo):
                old_foo(self, *args, **kwargs)
            print('@decorator post %s' % msg)

        klass.foo = decorated_foo
        return klass

    return decorator

The assert checks that the decorator has not been used without parentheses. If it has, then the class being decorated is passed to the msg parameter of the decorator, which raises an AssertionError.

@decorate_if only applies the decorator if condition evaluates to True.

The getattr, callable test, and @decorate_if are used so that the decorator doesn’t break if the foo() method doesn’t exist on the class being decorated.


回答 5

实际上,这里有一个很好的类装饰器实现:

https://github.com/agiliq/Django-parsley/blob/master/parsley/decorators.py

我实际上认为这是一个非常有趣的实现。因为它继承了它装饰的类的子类,所以在诸如isinstance检查。

它还有另外一个好处:它并不罕见__init__在自定义声明Django表单进行修改或补充,self.fields因此,最好的改变self.fields发生后,所有的__init__已经跑了有问题的类。

非常聪明。

但是,在您的类中,您实际上希望装饰更改构造函数,但我认为这不是类装饰器的好用例。

There’s actually a pretty good implementation of a class decorator here:

https://github.com/agiliq/Django-parsley/blob/master/parsley/decorators.py

I actually think this is a pretty interesting implementation. Because it subclasses the class it decorates, it will behave exactly like this class in things like isinstance checks.

It has an added benefit: it’s not uncommon for the __init__ statement in a custom django Form to make modifications or additions to self.fields so it’s better for changes to self.fields to happen after all of __init__ has run for the class in question.

Very clever.

However, in your class you actually want the decoration to alter the constructor, which I don’t think is a good use case for a class decorator.


回答 6

这是一个示例,它回答了返回类参数的问题。而且,它仍然尊重继承链,即仅返回类本身的参数。get_params作为一个简单的示例,添加了该功能,但是借助inspect模块,可以添加其他功能。

import inspect 

class Parent:
    @classmethod
    def get_params(my_class):
        return list(inspect.signature(my_class).parameters.keys())

class OtherParent:
    def __init__(self, a, b, c):
        pass

class Child(Parent, OtherParent):
    def __init__(self, x, y, z):
        pass

print(Child.get_params())
>>['x', 'y', 'z']

Here is an example which answers the question of returning the parameters of a class. Moreover, it still respects the chain of inheritance, i.e. only the parameters of the class itself are returned. The function get_params is added as a simple example, but other functionalities can be added thanks to the inspect module.

import inspect 

class Parent:
    @classmethod
    def get_params(my_class):
        return list(inspect.signature(my_class).parameters.keys())

class OtherParent:
    def __init__(self, a, b, c):
        pass

class Child(Parent, OtherParent):
    def __init__(self, x, y, z):
        pass

print(Child.get_params())
>>['x', 'y', 'z']


回答 7

Django具有method_decorator一个装饰器,可以将任何装饰器转换为方法装饰器,您可以看到它是如何实现的django.utils.decorators

https://github.com/django/django/blob/50cf183d219face91822c75fa0a15fe2fe3cb32d/django/utils/decorators.py#L53

https://docs.djangoproject.com/zh-CN/3.0/topics/class-based-views/intro/#decorating-the-class

Django has method_decorator which is a decorator that turns any decorator into a method decorator, you can see how it’s implemented in django.utils.decorators:

https://github.com/django/django/blob/50cf183d219face91822c75fa0a15fe2fe3cb32d/django/utils/decorators.py#L53

https://docs.djangoproject.com/en/3.0/topics/class-based-views/intro/#decorating-the-class


Python中的多元线性回归

问题:Python中的多元线性回归

我似乎找不到任何进行多元回归的python库。我发现的唯一的东西只是做简单的回归。我需要针对几个自变量(x1,x2,x3等)对我的因变量(y)进行回归。

例如,使用以下数据:

print 'y        x1      x2       x3       x4      x5     x6       x7'
for t in texts:
    print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" /
   .format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)

(以上输出:)

      y        x1       x2       x3        x4     x5     x6       x7
   -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
   -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
  -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
   -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
   -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
   -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
   -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
   -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
   -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

我将如何在python中进行回归,以获得线性回归公式:

Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + + a7x7 + c

I can’t seem to find any python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).

For example, with this data:

print 'y        x1      x2       x3       x4      x5     x6       x7'
for t in texts:
    print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" /
   .format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)

(output for above:)

      y        x1       x2       x3        x4     x5     x6       x7
   -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
   -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
  -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
   -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
   -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
   -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
   -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
   -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
   -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

How would I regress these in python, to get the linear regression formula:

Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + +a7x7 + c


回答 0

sklearn.linear_model.LinearRegression 会做的:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
        [t.y for t in texts])

然后clf.coef_将具有回归系数。

sklearn.linear_model 也具有类似的接口,可以对回归进行各种正则化。

sklearn.linear_model.LinearRegression will do it:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
        [t.y for t in texts])

Then clf.coef_ will have the regression coefficients.

sklearn.linear_model also has similar interfaces to do various kinds of regularizations on the regression.


回答 1

这是我创建的一些解决方法。我用R检查了它,它可以正常工作。

import numpy as np
import statsmodels.api as sm

y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]

x = [
     [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
     [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
     [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
     ]

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

结果:

print reg_m(y, x).summary()

输出:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895

==============================================================================
Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6

pandas 提供了运行此答案中给出的OLS的便捷方法:

使用Pandas Data Frame运行OLS回归

Here is a little work around that I created. I checked it with R and it works correct.

import numpy as np
import statsmodels.api as sm

y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]

x = [
     [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
     [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
     [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
     ]

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

Result:

print reg_m(y, x).summary()

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895

==============================================================================
Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6

pandas provides a convenient way to run OLS as given in this answer:

Run an OLS regression with Pandas Data Frame


回答 2

为了澄清起见,您给出的示例是多元线性回归,而不是多元线性回归。区别

单个标量预测变量x和单个标量响应变量y的最简单情况就是简单线性回归。对多个和/或向量值的预测变量(用大写的X表示)的扩展被称为多元线性回归,也称为多元线性回归。几乎所有现实世界中的回归模型都涉及多个预测变量,而线性回归的基本描述通常用多元回归模型来表述。但是请注意,在这些情况下,响应变量y仍然是标量。另一个变量多元线性回归是指y是向量的情况,即与一般线性回归相同。

简而言之:

  • 多元线性回归:响应y是一个标量。
  • 多元线性回归:响应y是向量。

(另一个来源。)

Just to clarify, the example you gave is multiple linear regression, not multivariate linear regression refer. Difference:

The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term multivariate linear regression refers to cases where y is a vector, i.e., the same as general linear regression. The difference between multivariate linear regression and multivariable linear regression should be emphasized as it causes much confusion and misunderstanding in the literature.

In short:

  • multiple linear regression: the response y is a scalar.
  • multivariate linear regression: the response y is a vector.

(Another source.)


回答 3

您可以使用numpy.linalg.lstsq

import numpy as np
y = np.array([-6,-5,-10,-5,-8,-3,-6,-8,-8])
X = np.array([[-4.95,-4.55,-10.96,-1.08,-6.52,-0.81,-7.01,-4.46,-11.54],[-5.87,-4.52,-11.64,-3.36,-7.45,-2.36,-7.33,-7.65,-10.03],[-0.76,-0.71,-0.98,0.75,-0.86,-0.50,-0.33,-0.94,-1.03],[14.73,13.74,15.49,24.72,16.59,22.44,13.93,11.40,18.18],[4.02,4.47,4.18,4.96,4.29,4.81,4.32,4.43,4.28],[0.20,0.16,0.19,0.16,0.10,0.15,0.21,0.16,0.21],[0.45,0.50,0.53,0.60,0.48,0.53,0.50,0.49,0.55]])
X = X.T # transpose so input vectors are along the rows
X = np.c_[X, np.ones(X.shape[0])] # add bias term
beta_hat = np.linalg.lstsq(X,y)[0]
print beta_hat

结果:

[ -0.49104607   0.83271938   0.0860167    0.1326091    6.85681762  22.98163883 -41.08437805 -19.08085066]

您可以通过以下方式查看估计的输出:

print np.dot(X,beta_hat)

结果:

[ -5.97751163,  -5.06465759, -10.16873217,  -4.96959788,  -7.96356915,  -3.06176313,  -6.01818435,  -7.90878145,  -7.86720264]

You can use numpy.linalg.lstsq:

import numpy as np

y = np.array([-6, -5, -10, -5, -8, -3, -6, -8, -8])
X = np.array(
    [
        [-4.95, -4.55, -10.96, -1.08, -6.52, -0.81, -7.01, -4.46, -11.54],
        [-5.87, -4.52, -11.64, -3.36, -7.45, -2.36, -7.33, -7.65, -10.03],
        [-0.76, -0.71, -0.98, 0.75, -0.86, -0.50, -0.33, -0.94, -1.03],
        [14.73, 13.74, 15.49, 24.72, 16.59, 22.44, 13.93, 11.40, 18.18],
        [4.02, 4.47, 4.18, 4.96, 4.29, 4.81, 4.32, 4.43, 4.28],
        [0.20, 0.16, 0.19, 0.16, 0.10, 0.15, 0.21, 0.16, 0.21],
        [0.45, 0.50, 0.53, 0.60, 0.48, 0.53, 0.50, 0.49, 0.55],
    ]
)
X = X.T  # transpose so input vectors are along the rows
X = np.c_[X, np.ones(X.shape[0])]  # add bias term
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)

Result:

[ -0.49104607   0.83271938   0.0860167    0.1326091    6.85681762  22.98163883 -41.08437805 -19.08085066]

You can see the estimated output with:

print(np.dot(X,beta_hat))

Result:

[ -5.97751163,  -5.06465759, -10.16873217,  -4.96959788,  -7.96356915,  -3.06176313,  -6.01818435,  -7.90878145,  -7.86720264]

回答 4

使用scipy.optimize.curve_fit。而且不仅适用于线性拟合。

from scipy.optimize import curve_fit
import scipy

def fn(x, a, b, c):
    return a + b*x[0] + c*x[1]

# y(x0,x1) data:
#    x0=0 1 2
# ___________
# x1=0 |0 1 2
# x1=1 |1 2 3
# x1=2 |2 3 4

x = scipy.array([[0,1,2,0,1,2,0,1,2,],[0,0,0,1,1,1,2,2,2]])
y = scipy.array([0,1,2,1,2,3,2,3,4])
popt, pcov = curve_fit(fn, x, y)
print popt

Use scipy.optimize.curve_fit. And not only for linear fit.

from scipy.optimize import curve_fit
import scipy

def fn(x, a, b, c):
    return a + b*x[0] + c*x[1]

# y(x0,x1) data:
#    x0=0 1 2
# ___________
# x1=0 |0 1 2
# x1=1 |1 2 3
# x1=2 |2 3 4

x = scipy.array([[0,1,2,0,1,2,0,1,2,],[0,0,0,1,1,1,2,2,2]])
y = scipy.array([0,1,2,1,2,3,2,3,4])
popt, pcov = curve_fit(fn, x, y)
print popt

回答 5

将数据转换为熊猫数据框(df)后,

import statsmodels.formula.api as smf
lm = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7', data=df).fit()
print(lm.params)

默认情况下包括拦截项。

有关更多示例,请参见此笔记本

Once you convert your data to a pandas dataframe (df),

import statsmodels.formula.api as smf
lm = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7', data=df).fit()
print(lm.params)

The intercept term is included by default.

See this notebook for more examples.


回答 6

我认为这可能是完成这项工作的最简单方法:

from random import random
from pandas import DataFrame
from statsmodels.api import OLS
lr = lambda : [random() for i in range(100)]
x = DataFrame({'x1': lr(), 'x2':lr(), 'x3':lr()})
x['b'] = 1
y = x.x1 + x.x2 * 2 + x.x3 * 3 + 4

print x.head()

         x1        x2        x3  b
0  0.433681  0.946723  0.103422  1
1  0.400423  0.527179  0.131674  1
2  0.992441  0.900678  0.360140  1
3  0.413757  0.099319  0.825181  1
4  0.796491  0.862593  0.193554  1

print y.head()

0    6.637392
1    5.849802
2    7.874218
3    7.087938
4    7.102337
dtype: float64

model = OLS(y, x)
result = model.fit()
print result.summary()

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 5.859e+30
Date:                Wed, 09 Dec 2015   Prob (F-statistic):               0.00
Time:                        15:17:32   Log-Likelihood:                 3224.9
No. Observations:                 100   AIC:                            -6442.
Df Residuals:                      96   BIC:                            -6431.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             1.0000   8.98e-16   1.11e+15      0.000         1.000     1.000
x2             2.0000   8.28e-16   2.41e+15      0.000         2.000     2.000
x3             3.0000   8.34e-16    3.6e+15      0.000         3.000     3.000
b              4.0000   8.51e-16    4.7e+15      0.000         4.000     4.000
==============================================================================
Omnibus:                        7.675   Durbin-Watson:                   1.614
Prob(Omnibus):                  0.022   Jarque-Bera (JB):                3.118
Skew:                           0.045   Prob(JB):                        0.210
Kurtosis:                       2.140   Cond. No.                         6.89
==============================================================================

I think this may the most easy way to finish this work:

from random import random
from pandas import DataFrame
from statsmodels.api import OLS
lr = lambda : [random() for i in range(100)]
x = DataFrame({'x1': lr(), 'x2':lr(), 'x3':lr()})
x['b'] = 1
y = x.x1 + x.x2 * 2 + x.x3 * 3 + 4

print x.head()

         x1        x2        x3  b
0  0.433681  0.946723  0.103422  1
1  0.400423  0.527179  0.131674  1
2  0.992441  0.900678  0.360140  1
3  0.413757  0.099319  0.825181  1
4  0.796491  0.862593  0.193554  1

print y.head()

0    6.637392
1    5.849802
2    7.874218
3    7.087938
4    7.102337
dtype: float64

model = OLS(y, x)
result = model.fit()
print result.summary()

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 5.859e+30
Date:                Wed, 09 Dec 2015   Prob (F-statistic):               0.00
Time:                        15:17:32   Log-Likelihood:                 3224.9
No. Observations:                 100   AIC:                            -6442.
Df Residuals:                      96   BIC:                            -6431.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             1.0000   8.98e-16   1.11e+15      0.000         1.000     1.000
x2             2.0000   8.28e-16   2.41e+15      0.000         2.000     2.000
x3             3.0000   8.34e-16    3.6e+15      0.000         3.000     3.000
b              4.0000   8.51e-16    4.7e+15      0.000         4.000     4.000
==============================================================================
Omnibus:                        7.675   Durbin-Watson:                   1.614
Prob(Omnibus):                  0.022   Jarque-Bera (JB):                3.118
Skew:                           0.045   Prob(JB):                        0.210
Kurtosis:                       2.140   Cond. No.                         6.89
==============================================================================

回答 7

可以使用上面提到的sklearn库处理多个线性回归。我正在使用Python 3.6的Anaconda安装。

如下创建模型:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X, y)

# display coefficients
print(regressor.coef_)

Multiple Linear Regression can be handled using the sklearn library as referenced above. I’m using the Anaconda install of Python 3.6.

Create your model as follows:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X, y)

# display coefficients
print(regressor.coef_)

回答 8

您可以使用numpy.linalg.lstsq


回答 9

您可以使用下面的函数并将其传递给DataFrame:

def linear(x, y=None, show=True):
    """
    @param x: pd.DataFrame
    @param y: pd.DataFrame or pd.Series or None
              if None, then use last column of x as y
    @param show: if show regression summary
    """
    import statsmodels.api as sm

    xy = sm.add_constant(x if y is None else pd.concat([x, y], axis=1))
    res = sm.OLS(xy.ix[:, -1], xy.ix[:, :-1], missing='drop').fit()

    if show: print res.summary()
    return res

You can use the function below and pass it a DataFrame:

def linear(x, y=None, show=True):
    """
    @param x: pd.DataFrame
    @param y: pd.DataFrame or pd.Series or None
              if None, then use last column of x as y
    @param show: if show regression summary
    """
    import statsmodels.api as sm

    xy = sm.add_constant(x if y is None else pd.concat([x, y], axis=1))
    res = sm.OLS(xy.ix[:, -1], xy.ix[:, :-1], missing='drop').fit()

    if show: print res.summary()
    return res

回答 10

Scikit-learn是一个适用于Python的机器学习库,可以为您完成这项工作。只需将sklearn.linear_model模块导入脚本即可。

在python中使用sklearn查找多重线性回归的代码模板:

import numpy as np
import matplotlib.pyplot as plt #to plot visualizations
import pandas as pd

# Importing the dataset
df = pd.read_csv(<Your-dataset-path>)
# Assigning feature and target variables
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

# Use label encoders, if you have any categorical variable
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X['<column-name>'] = labelencoder.fit_transform(X['<column-name>'])

from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features = ['<index-value>'])
X = onehotencoder.fit_transform(X).toarray()

# Avoiding the dummy variable trap
X = X[:,1:] # Usually done by the algorithm itself

#Spliting the data into test and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state = 0, test_size = 0.2)

# Fitting the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the test set results
y_pred = regressor.predict(X_test)

而已。您可以将此代码用作在任何数据集中实现多元线性回归的模板。为了更好地理解示例,请访问:带有示例的线性回归

Scikit-learn is a machine learning library for Python which can do this job for you. Just import sklearn.linear_model module into your script.

Find the code template for Multiple Linear Regression using sklearn in Python:

import numpy as np
import matplotlib.pyplot as plt #to plot visualizations
import pandas as pd

# Importing the dataset
df = pd.read_csv(<Your-dataset-path>)
# Assigning feature and target variables
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

# Use label encoders, if you have any categorical variable
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X['<column-name>'] = labelencoder.fit_transform(X['<column-name>'])

from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features = ['<index-value>'])
X = onehotencoder.fit_transform(X).toarray()

# Avoiding the dummy variable trap
X = X[:,1:] # Usually done by the algorithm itself

#Spliting the data into test and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state = 0, test_size = 0.2)

# Fitting the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the test set results
y_pred = regressor.predict(X_test)

That’s it. You can use this code as a template for implementing Multiple Linear Regression in any dataset. For a better understanding with an example, Visit: Linear Regression with an example


回答 11

这是另一种基本方法:

from patsy import dmatrices
import statsmodels.api as sm

y,x = dmatrices("y_data ~ x_1 + x_2 ", data = my_data)
### y_data is the name of the dependent variable in your data ### 
model_fit = sm.OLS(y,x)
results = model_fit.fit()
print(results.summary())

代替sm.OLS您也可以使用sm.Logitor sm.Probit和等。

Here is an alternative and basic method:

from patsy import dmatrices
import statsmodels.api as sm

y,x = dmatrices("y_data ~ x_1 + x_2 ", data = my_data)
### y_data is the name of the dependent variable in your data ### 
model_fit = sm.OLS(y,x)
results = model_fit.fit()
print(results.summary())

Instead of sm.OLS you can also use sm.Logit or sm.Probit and etc.