Question: How to merge dictionaries of dictionaries?

I need to merge multiple dictionaries. For instance, here is what I have:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}
dict2 = {2:{"c":{C}}, 3:{"d":{D}}

With A, B, C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

The level (depth) of the dictionaries is unknown; it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I’m not sure how I could do that easily with Python.


Answer 0

This is actually quite tricky, particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does).

Assuming you don't have huge numbers of entries, a recursive function is easiest:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

Note that this mutates a: the contents of b are added to a (which is also returned). If you want to keep a, you could call it like merge(dict(a), b). Be aware that dict(a) is only a shallow copy, so nested dicts inside a are still shared with the copy and can be modified; copy.deepcopy(a) avoids that.

agf pointed out (below) that you may have more than two dicts, in which case you can use:

reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

[Note: I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain.]

P.S. In Python 3, you will also need from functools import reduce
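As a rough illustration of the mutation point above, here is a minimal sketch of a non-mutating wrapper. The name merged_copy is hypothetical (it is not part of the answer), and merge is restated with the same logic so the snippet runs on its own. It assumes deep-copying the inputs is affordable.

```python
import copy
from functools import reduce

def merge(a, b, path=None):
    """Merge b into a in place (same logic as the answer above)."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

def merged_copy(*dicts):
    """Hypothetical helper: merge any number of dicts into a fresh result."""
    # Deep-copying every input keeps the result independent of all arguments.
    return reduce(merge, (copy.deepcopy(d) for d in dicts), {})

dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}

# A shallow copy protects only the top level: dict1[2] is shared with the copy.
shallow = merge(dict(dict1), dict2)
leaked = "c" in dict1[2]

dict1 = {1: {"a": "A"}, 2: {"b": "B"}}  # reset after the leak
result = merged_copy(dict1, dict2)
```

With the deep copies, dict1 and dict2 keep their original contents, at the cost of copying every input once.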


Answer 1

Here’s an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print(dict(mergedicts(dict1, dict2)))  # print() form runs on Python 2 and 3

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
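The comment in the code mentions raising an exception instead of overriding. A minimal sketch of that variant might look like the following. The name mergedicts_strict, the path argument and the choice of ValueError are assumptions made for illustration.

```python
def mergedicts_strict(dict1, dict2, path=()):
    """Hypothetical strict variant: yield merged items, raise on conflicting leaves."""
    for k in set(dict1) | set(dict2):
        if k in dict1 and k in dict2:
            v1, v2 = dict1[k], dict2[k]
            if isinstance(v1, dict) and isinstance(v2, dict):
                yield (k, dict(mergedicts_strict(v1, v2, path + (k,))))
            elif v1 == v2:
                yield (k, v1)  # duplicate but consistent leaf
            else:
                where = ".".join(str(p) for p in path + (k,))
                raise ValueError("Conflict at %s" % where)
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

merged = dict(mergedicts_strict({1: {"a": "A"}, 2: {"b": "B"}},
                                {2: {"c": "C"}, 3: {"d": "D"}}))

try:
    dict(mergedicts_strict({2: {"b": "B"}}, {2: {"b": "C"}}))
    conflict = None
except ValueError as exc:
    conflict = str(exc)  # the message names the path of the clashing leaf
```

Because the function is a generator, the error only surfaces once the result is consumed, for example by dict(...).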

Answer 2

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        # Python 3 form; the Python 2 original also tested for unicode and long
        if a is None or isinstance(a, (str, int, float)):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError as e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me, a sensible merge logic means:

  • replace scalars
  • append lists
  • merge dicts by adding missing keys and updating existing keys

Everything else, and anything unforeseen, results in an error.
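To make the three rules concrete, here is a compact sketch with a small config-style example. The function name merge_values and the sample keys are hypothetical. It is a simplified restatement for illustration, not the answer's exact code.

```python
def merge_values(a, b):
    """Hypothetical compact restatement of the three rules above."""
    if isinstance(a, dict) and isinstance(b, dict):
        # merge dicts: add missing keys, recurse into existing ones
        for key, value in b.items():
            a[key] = merge_values(a[key], value) if key in a else value
        return a
    if isinstance(a, list):
        # append lists: extend with another list, append anything else
        if isinstance(b, list):
            a.extend(b)
        else:
            a.append(b)
        return a
    if a is None or isinstance(a, (str, int, float)):
        # replace scalars
        return b
    # everything else (e.g. a non-dict merged into a dict) is an error
    raise TypeError("cannot merge %r into %r" % (b, a))

base = {"name": "app", "ports": [80], "db": {"host": "localhost"}}
override = {"name": "app-prod", "ports": [443], "db": {"port": 5432}}
merged = merge_values(base, override)

try:
    merge_values({"db": {}}, {"db": 1})
    error = None
except TypeError as exc:
    error = exc
```

Like the answer's function, this sketch modifies its first argument in place.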


Answer 3

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I’m providing the canonical Pythonic approach to solving this issue.

Simplest Case: “leaves are nested dicts that end in empty dicts”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that it updates the first dict in place, so the original state of the first would have to be rebuilt from its origin. Here's the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: “leaves are of any other type:”

So if they end in dicts, it’s a simple case of merging the end empty dicts. If not, it’s not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment unless both values are dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively,
    if either mapping has leaves that are non-dicts,
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}
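The remark above that sets "can be updated similarly" could be sketched like this. The name rec_merge_sets is hypothetical, and unioning set leaves is only one possible policy, chosen here for illustration. Unlike rec_merge, this sketch builds new dicts and leaves both inputs untouched.

```python
from collections.abc import MutableMapping

def rec_merge_sets(d1, d2):
    """Hypothetical variant: nested mappings recurse, set leaves are unioned,
    any other leaf from d2 overwrites the one in d1."""
    merged = dict(d1)  # shallow copy of this level; deeper levels are rebuilt below
    for k, v2 in d2.items():
        v1 = merged.get(k)
        if isinstance(v1, MutableMapping) and isinstance(v2, MutableMapping):
            merged[k] = rec_merge_sets(v1, v2)
        elif isinstance(v1, (set, frozenset)) and isinstance(v2, (set, frozenset)):
            merged[k] = v1 | v2  # union; the order of merging is not preserved
        else:
            merged[k] = v2
    return merged

x = {'a': {1: {'foo'}, 2: None}}
y = {'a': {1: {'quux'}, 2: 'bar'}}
result = rec_merge_sets(x, y)
```

Whether a union is the right treatment depends on what the leaves mean, which is the question the paragraph above raises.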

Application to the original question:

I’ve had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)


Answer 4

Based on @andrew cooke's answer. This version handles nested lists of dicts and also allows the option to update the values:

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                # pass update down so nested conflicts follow the same policy
                merge(a[key], b[key], path + [str(key)], update=update)
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                # assumes list items are dicts and a[key] is at least as long as b[key]
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Answer 5

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

Answer 6

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list.
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more than that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])
        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])
        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.
    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])
            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])
            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Answer 7

If you have an unknown level of dictionaries, then I would suggest a recursive function:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.items():  # .iteritems() on Python 2
        if item in dictionary2:  # has_key() no longer exists on Python 3
            # recurse only when both sides are dicts; note the pop() modifies dictionary2
            if isinstance(value, dict) and isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.items():
         output[item] = value
    return output

Answer 8

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b):
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

A nice way of defining an appropriate recursive merger function f is using multipledispatch, which allows you to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

Notes:

The advantages of this approach are:

  • The function is built from smaller functions that each do a single thing, which makes the code simpler to reason about and test.

  • The behaviour is not hard-coded but can be changed and extended as needed, which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists, e.g. of other (potentially nested) dicts. In this case one might want to map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools

@dispatch(list, list)
def f(a,b):
    # dispatch on each pair via f, so unequal lengths (None fill values) and non-dict items also work
    return [f(*arg) for arg in itertools.zip_longest(a, b)]
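If installing multipledispatch is not an option, the same split into a shallow merge(f) and a recursive f can be sketched with plain isinstance checks. This is an assumption-laden stand-in, not the answer's method: f below is a single ordinary function, and the inner name merge_with_f is made up for readability.

```python
def merge(f):
    """Shallow merge of two dicts, combining the values of each key with f."""
    def merge_with_f(a, b):
        keys = a.keys() | b.keys()
        return {key: f(a.get(key), b.get(key)) for key in keys}
    return merge_with_f

def f(a, b):
    """Hypothetical isinstance-based stand-in for the dispatched f."""
    if isinstance(a, dict) and isinstance(b, dict):
        return merge(f)(a, b)         # both dicts: recurse
    return b if b is not None else a  # otherwise prefer b unless it is missing

dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
result = merge(f)(dict1, dict2)
```

The trade-off is that new cases (lists, sets) have to be added inside f itself rather than registered as separate definitions.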

Answer 9

In case someone wants yet another approach to this problem, here’s my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential Drawback: This might not be the merge you’re looking for. Consult the docstring for semantics.

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }
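One consequence of the docstring's semantics that is easy to miss: because None stands for "absent", an explicit None in b cannot clear a value in a. A small sketch follows, with deep_merge restated compactly so the snippet runs on its own. The defaults and overrides data are made up for illustration.

```python
def deep_merge(a, b):
    """Compact restatement of the answer's function (same semantics)."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    return {key: deep_merge(a.get(key), b.get(key))
            for key in set(a) | set(b)}

# Hypothetical sample data
defaults = {"db": {"host": "localhost", "port": 5432}, "debug": False}
overrides = {"db": {"port": None, "user": "admin"}, "debug": True}

merged = deep_merge(defaults, overrides)
# "port" keeps 5432: the None in overrides is treated as "no value given"
```

If None needs to be a real value in your data, a dedicated sentinel object for "missing" would be one way to adapt this.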

Answer 10

You could try mergedeep.


Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c)

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!


Answer 11

There's a slight problem with andrew cooke's answer: in some cases it modifies the second argument b when you modify the returned dict. Specifically, it's because of this line:

if key in a:
    ...
else:
    a[key] = b[key]

If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

To fix this, the line would have to be substituted with this:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

Where clone_dict is:

def clone_dict(obj):
    clone = {}
    for key, value in obj.items():  # .iteritems() on Python 2
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return clone  # the original had a bare return here, which yields None

Still, this obviously doesn't account for list, set and other types, but I hope it illustrates the pitfalls when trying to merge dicts.

And for completeness' sake, here is my version, where you can pass it multiple dicts:

from functools import reduce  # needed on Python 3

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.items():  # .iteritems() on Python 2
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return clone  # the original had a bare return here, which yields None

    def merge(a, b, path=None):
        if path is None:  # avoid a mutable default argument
            path = []
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})
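As a possible alternative to a hand-written clone_dict, the standard library's copy.deepcopy also copies lists, sets and other nested containers. A minimal sketch follows. The merge below is a restatement with that one change, and the 'tags' key is invented to show a list value.

```python
import copy

def merge(a, b, path=None):
    """Merge b into a; values taken from b are deep-copied so b stays untouched."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            # copy.deepcopy handles dicts, lists, sets and nested combinations
            a[key] = copy.deepcopy(b[key])
    return a

a = {}
b = {'1': {'2': 'b', 'tags': ['x']}}
c = {'1': {'3': 'c'}}
merge(merge(a, b), c)
a['1']['tags'].append('y')  # later edits to the result do not reach b
```

Deep copies cost time and memory on large structures, so whether this is preferable to selective cloning depends on the data.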

Answer 12

This version of the function will account for N number of dictionaries, and only dictionaries — no improper parameters can be passed, or it will raise a TypeError. The merge itself accounts for key conflicts, and instead of overwriting data from a dictionary further down the merge chain, it creates a set of values and appends to that; no data is lost.

It might not be the most efficient on the page, but it's the most thorough and you're not going to lose any information when you merge your 2 to N dicts.

from functools import reduce  # needed on Python 3

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError("Object in *dicts not of type dict")
    if len(dicts) < 2:
        raise ValueError("Requires 2 or more dict objects")

    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        # equal values collapse to the single value;
                        # differing values become a sorted list
                        yield (d, ret[0] if len(ret) == 1 else sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError("Conflicting key:value type assignment")
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print(merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4}))

output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}