如何合并字典的字典?-Python 实用宝典

如何合并字典的字典?

我需要合并多个词典,例如: dict1 = {1:{"a":{A}}, 2:{"b":{B}}} dict2 = {2:{"c":{C}}, 3:{"d":{D}} 随着A B C和D作为树的叶子像{"info1":"value", "info2":"value2"} 词典的级别(深度)未知,可能是 {2:{"c":{"z":{"y":{C}}}}} 在我的情况下,它表示目录/文件结构,其中节点为docs,而节点为文件。 我想将它们合并以获得: dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}} 我不确定如何使用Python轻松做到这一点。

问题:如何合并字典的字典?

我需要合并多个词典,例如:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

随着A B CD作为树的叶子像{"info1":"value", "info2":"value2"}

词典的级别(深度)未知,可能是 {2:{"c":{"z":{"y":{C}}}}}

在我的情况下,它表示目录/文件结构,其中节点为docs,而节点为文件。

我想将它们合并以获得:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

我不确定如何使用Python轻松做到这一点。

I need to merge multiple dictionaries, here's what I have for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

With A B C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

There is an unknown level(depth) of dictionaries, it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I'm not sure how I could do that easily with Python.


回答 0

这实际上是非常棘手的-特别是如果您希望在事物不一致时收到有用的错误消息,同时正确地接受重复但一致的条目(这里没有其他答案了……)。

假设您没有大量条目,那么递归函数最简单:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

请注意,这会发生变化a-将的内容b添加到a(也会返回)。如果您想保留a,可以这样称呼merge(dict(a), b)

agf指出(如下),您可能有两个以上的命令,在这种情况下,您可以使用:

reduce(merge, [dict1, dict2, dict3...])

将所有内容添加到dict1中。

[注意-我编辑了最初的答案以使第一个参数发生变化;使“减少”更易于解释]

python 3中的ps,您还需要 from functools import reduce

this is actually quite tricky - particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does....)

assuming you don't have huge numbers of entries a recursive function is easiest:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

note that this mutates a - the contents of b are added to a (which is also returned). if you want to keep a you could call it like merge(dict(a), b).

agf pointed out (below) that you may have more than two dicts, in which case you can use:

reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

[note - i edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain]

ps in python 3, you will also need from functools import reduce


回答 1

这是使用生成器实现的一种简单方法:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

打印:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Here's an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 2

这个问题的一个问题是,字典的值可以是任意复杂的数据。基于这些和其他答案,我想到了以下代码:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

我的用例是合并YAML文件,我只需要处理可能的数据类型的子集。因此,我可以忽略元组和其他对象。对我而言,明智的合并逻辑意味着

  • 更换标量
  • 追加清单
  • 通过添加缺少的键并更新现有键来合并字典

其他所有和不可预见的结果都会导致错误。

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me a sensible merge logic means

  • replace scalars
  • append lists
  • merge dicts by adding missing keys and updating existing keys

Everything else and the unforeseens results in an error.


回答 3

字典词典合并

由于这是规范的问题(尽管有某些非一般性的规定),所以我提供了规范的Python方法来解决此问题。

最简单的情况:“叶是嵌套的字典,以空字典结尾”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

这是最简单的递归情况,我建议两种朴素的方法:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

我相信我更喜欢第二个,但要记住,第一个的原始状态必须从其原始位置重建。这是用法:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

复杂的情况:“叶子是任何其他类型:”

因此,如果它们以字典结尾,这是合并末端空字典的简单情况。如果没有,那不是那么简单。如果是字符串,如何合并它们?可以类似地更新集合,因此我们可以进行这种处理,但是会丢失它们合并的顺序。那么顺序重要吗?

因此,代替更多信息,最简单的方法是在两个值都不都是dict的情况下为他们提供标准的更新处理:即第二个dict的值将覆盖第一个dict的值,即使第二个dict的值为None且第一个的值为a有很多信息的字典。

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

现在

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

退货

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

适用于原始问题:

我必须删除字母周围的花括号并将其放在单引号中,以使其成为合法的Python(否则,它们将在Python 2.7+中设置为原义)并附加缺少的花括号:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

rec_merge(dict1, dict2)现在返回:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

匹配原始问题的期望结果(例如,更改{A}为后)'A'

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I'm providing the canonical Pythonic approach to solving this issue.

Simplest Case: "leaves are nested dicts that end in empty dicts":

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here's the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: "leaves are of any other type:"

So if they end in dicts, it's a simple case of merging the end empty dicts. If not, it's not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment if both values are not dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

Application to the original question:

I've had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)


回答 4

基于@andrew cooke。此版本处理字典的嵌套列表,还允许选择更新值

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Based on @andrew cooke. This version handles nested lists of dicts and also allows the option to update the values

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

回答 5

这个简单的递归过程将一个字典合并到另一个字典中,同时覆盖冲突的键:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

回答 6

基于@andrew cooke的答案。它以更好的方式处理嵌套列表。

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

回答 7

如果您的词典级别未知,那么我建议使用递归函数:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

If you have an unknown level of dictionaries, then I would suggest a recursive function:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

回答 8

总览

以下方法将dict的深度合并问题细分为:

  1. 参数化的浅表合并函数merge(f)(a,b),该函数使用一个函数f合并两个字典ab

  2. 递归合并函数f将与merge


实作

可以通过多种方式来编写用于合并两个(非嵌套)字典的函数。我个人喜欢

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

定义适当的递归合并函数的一种好方法f是使用多调度,它允许定义根据参数类型沿不同路径求值的函数。

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

要合并两个嵌套的字典,只需使用merge(f)例如:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

笔记:

这种方法的优点是:

  • 该函数是由较小的函数构建的,每个较小的函数都做一件事情,这使代码更易于推理和测试。

  • 该行为不是硬编码的,但是可以根据需要进行更改和扩展,从而提高了代码重用性(请参见下面的示例)。


客制化

一些答案还考虑了包含例如其他(可能嵌套的)字典列表的字典。在这种情况下,可能需要映射列表并根据位置合并它们。这可以通过向合并功能添加另一个定义来完成f

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

A nice way of defining an appropriate recursive merger function f is using multipledispatch which allows to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

Notes:

The advantages of this approach are:

  • The function is build from smaller functions that each do a single thing which makes the code simpler to reason about and test

  • The behaviour is not hard-coded but can be changed and extended as needed which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists e.g. of other (potentially nested) dicts. In this case one might want map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

回答 9

如果有人想要解决这个问题,这是我的解决方案。

美德:简短,说明性且具有风格上的功能(递归,无突变)。

潜在缺点:这可能不是您要查找的合并。有关语义,请查阅文档字符串。

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

In case someone wants yet another approach to this problem, here's my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential Drawback: This might not be the merge you're looking for. Consult the docstring for semantics.

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

回答 10

您可以尝试mergedeep


安装

$ pip3 install mergedeep

用法

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

有关选项的完整列表,请查看文档

You could try mergedeep.


Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!


回答 11

安德鲁·库克斯答案有一个小问题:在某些情况下,b当您修改返回的字典时,它会修改第二个参数。特别是因为这一行:

if key in a:
    ...
else:
    a[key] = b[key]

如果b[key]为a dict,则将其简单地分配给a,这意味着对它的任何后续修改dict都会影响ab

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

为了解决这个问题,该行必须替换为:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

在哪里clone_dict

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

仍然。这显然不占listset和其他的东西,但我希望尝试合并时,它说明了陷阱dicts

为了完整起见,这是我的版本,您可以在其中多次传递它dicts

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

There's a slight problem with andrew cookes answer: In some cases it modifies the second argument b when you modify the returned dict. Specifically it's because of this line:

if key in a:
    ...
else:
    a[key] = b[key]

If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

To fix this, the line would have to be substituted with this:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

Where clone_dict is:

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

Still. This obviously doesn't account for list, set and other stuff, but I hope it illustrates the pitfalls when trying to merge dicts.

And for completeness sake, here is my version, where you can pass it multiple dicts:

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

回答 12

此版本的函数将占N个字典,仅占字典-无法传递不正确的参数,否则将引发TypeError。合并本身解决了键冲突,而不是覆盖合并链下游的词典中的数据,它创建了一组值并将其追加到该值之后;没有数据丢失。

它在页面上可能不是最有效的,但是却是最彻底的,并且在合并2到N个字典时,您不会丢失任何信息。

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

输出:{1:[1,2],2:{1:2,3:3:1},4:4}

This version of the function will account for N number of dictionaries, and only dictionaries -- no improper parameters can be passed, or it will raise a TypeError. The merge itself accounts for key conflicts, and instead of overwriting data from a dictionary further down the merge chain, it creates a set of values and appends to that; no data is lost.

It might not be the most effecient on the page, but it's the most thorough and you're not going to lose any information when you merge your 2 to N dicts.

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}


回答 13

由于dictviews支持集合操作,因此我能够大大简化jterrace的答案。

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

任何尝试将dict与非dict组合在一起的尝试(从技术上讲,是指具有“ keys”方法的对象,而没有“ keys”方法的对象)将引发AttributeError。这包括对函数的初始调用和递归调用。这正是我想要的,所以我离开了。您可以轻松地捕获递归调用引发的AttributeErrors,然后产生您想要的任何值。

Since dictviews support set operations, I was able to greatly simplify jterrace's answer.

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

Any attempt to combine a dict with a non dict (technically, an object with a 'keys' method and an object without a 'keys' method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted so I left it. You could easily catch an AttributeErrors thrown by the recursive call and then yield any value you please.


回答 14

短n甜:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

它的工作方式类似于Python的dict.update方法(并建立在该方法的基础上)。它会返回Nonereturn d如果愿意,可以随时添加),因为它会d原位更新dict 。中的键v将覆盖其中的任何现有键d(它不会尝试解释字典的内容)。

它也适用于其他(“类似字典”的)映射。

Short-n-sweet:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

This works like (and is build on) Python's dict.update method. It returns None (you can always add return d if you prefer) as it updates dict d in-place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict's contents).

It will also work for other ("dict-like") mappings.


回答 15

当然,代码将取决于您解决合并冲突的规则。这是一个可以使用任意数量的参数并将其递归合并到任意深度的版本,而无需使用任何对象突变。它使用以下规则解决合并冲突:

  • 字典优先于非字典值({"foo": {...}}优先于{"foo": "bar"}
  • 随后的参数优先于前面的参数(如果合并{"a": 1}{"a", 2}以及{"a": 3}为了,其结果将是{"a": 3}
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

The code will depend on your rules for resolving merge conflicts, of course. Here's a version which can take an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:

  • dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
  • later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a", 2}, and {"a": 3} in order, the result will be {"a": 3})
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

回答 16

我有两个字典(ab),每个字典可以包含任意数量的嵌套字典。我想以b优先于递归合并它们a

将嵌套字典视为树,我想要的是:

  • 更新a以使到每个叶子的每条路径都b表示为a
  • 要覆盖的子树a,如果叶在相应路径中找到b
    • 保持所有b叶节点仍为​​叶的不变性。

对于我的口味而言,现有的答案有些复杂,并且在架子上留下了一些细节。我一起整理了以下内容,这些内容通过了我的数据集的单元测试。

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

示例(为清晰起见而格式化):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

b需要维护的路径是:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'

a 具有以下独特且无冲突的路径:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

因此它们仍显示在合并地图中。

I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

  • To update a so that every path to every leaf in b would be represented in a
  • To overwrite subtrees of a if a leaf is found in the corresponding path in b
    • Maintain the invariant that all b leaf nodes remain leafs.

The existing answers were a little complicated for my taste and left some details on the shelf. I hacked together the following, which passes unit tests for my data set.

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

Example (formatted for clarity):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

The paths in b that needed to be maintained were:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'.

a had the unique and non-conflicting paths of:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.


回答 17

我有一个迭代的解决方案-大字典和很多字典(例如json等)的效果要好得多:

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

请注意,如果它们不是两个字典,则将使用d2中的值覆盖d1。(与python相同dict.update()

一些测试:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

我已经测试了约1200格-该方法花费了0.4秒,而递归解决方案花费了约2.5秒。

I have an iterative solution - works much much better with big dicts & a lot of them (for example jsons etc):

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

note that this will use the value in d2 to override d1, in case they are not both dicts. (same as python's dict.update())

some tests:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

I've tested with around ~1200 dicts - this method took 0.4 seconds, while the recursive solution took ~2.5 seconds.


回答 18

这应该有助于将所有项目合并dict2dict1

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

请测试一下,然后告诉我们这是否是您想要的。

编辑:

上述解决方案仅合并了一个级别,但正确地解决了OP给出的示例。要合并多个级别,应使用递归。

This should help in merging all items from dict2 into dict1:

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

Please test it and tell us whether this is what you wanted.

EDIT:

The above mentioned solution merges only one level, but correctly solves the example given by OP. To merge multiple levels, the recursion should be used.


回答 19

我一直在测试您的解决方案,并决定在我的项目中使用此解决方案:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

将函数作为参数传递是扩展jterrace解决方案以使其与所有其他递归解决方案一样起作用的关键。

I've been testing your solutions and decided to use this one in my project:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

Passing functions as parameteres is key to extend jterrace solution to behave as all the other recursive solutions.


回答 20

我能想到的最简单的方法是:

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Easiest way i can think of is :

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 21

我在这里还有另一个略有不同的解决方案:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

默认情况下,它可以解决冲突,而有利于第二个dict的值,但是您可以轻松地覆盖它,通过做些麻烦,您甚至可以抛出异常。:)。

I have another slightly different solution here:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

By default it resolves conflicts in favor of values from the second dict, but you can easily override this, with some witchery you may be able to even throw exceptions out of it. :).


回答 22

class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()
class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()

回答 23

嘿,我也有同样的问题,但是我有一个解决方案,我会把它发布在这里,以防它对其他人也有用,基本上是合并嵌套的字典并添加值,对我来说,我需要计算一些概率,因此一个很好的工作:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

通过使用以上方法,我们可以合并:

target = {'6,6':{'6,63':1},'63,4':{'4,4':1},'4,4':{'4,3':1} ,'6,63':{'63,4':1}}

src = {'5,4':{'4,4':1},'5,5':{'5,4':1},'4,4':{'4,3':1} }

它将变成:{'5,5':{'5,4':1},'5,4':{'4,4':1},'6,6':{'6,63' :1},'63,4':{'4,4':1},'4,4':{'4,3':2},'6,63':{'63,4':1 }}

还请注意此处的更改:

target = {'6,6':{'6,63':1},'6,63':{'63,4':1},'4,4':{'4,3':1},'63,4':{'4,4':1}}

src = {'5,4':{'4,4':1},'4,3':{'3,4':1},'4,4':{'4,9':1},'3,4':{'4,4':1},'5,5':{'5,4':1}}

合并= {'5,4':{'4,4':1},'4,3':{'3,4':1},'6,63':{'63,4':1} ,'5,5':{'5,4':1},'6,6':{'6,63':1},'3,4':{'4,4':1},' 63,4':{'4,4':1},'4,4':{'4,3':1,'4,9':1} }}

别忘了还要添加导入副本:

import copy

hey there I also had the same problem but I though of a solution and I will post it here, in case it is also useful for others, basically merging nested dictionaries and also adding the values, for me I needed to calculate some probabilities so this one worked great:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

by using the above method we can merge:

target = {'6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1}, '6,63': {'63,4': 1}}

src = {'5,4': {'4,4': 1}, '5,5': {'5,4': 1}, '4,4': {'4,3': 1}}

and this will become: {'5,5': {'5,4': 1}, '5,4': {'4,4': 1}, '6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 2}, '6,63': {'63,4': 1}}

also notice the changes here:

target = {'6,6': {'6,63': 1}, '6,63': {'63,4': 1}, '4,4': {'4,3': 1}, '63,4': {'4,4': 1}}

src = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '4,4': {'4,9': 1}, '3,4': {'4,4': 1}, '5,5': {'5,4': 1}}

merge = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '6,63': {'63,4': 1}, '5,5': {'5,4': 1}, '6,6': {'6,63': 1}, '3,4': {'4,4': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1, '4,9': 1}}

dont forget to also add the import for copy:

import copy

回答 24

from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

输出:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}
from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

Output:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}

回答 25

看一看toolz

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

take a look at toolz package

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

gives

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

回答 26

以下函数将b合并为a。

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

The following function merges b into a.

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

回答 27

还有另一个细微变化:

这是基于纯python3集的深度更新功能。它通过一次遍历一个级别来更新嵌套字典,并调用自身来更新每个下一级别的字典值:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

一个简单的例子:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

And just another slight variation:

Here is a pure python3 set based deep update function. It updates nested dictionaries by looping through one level at a time and calls itself to update each next level of dictionary values:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

A simple example:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

回答 28

另一个答案怎么样?!这也避免了突变/副作用:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

How about another answer?!? This one also avoids mutation/side effects:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

本文由 Python 实用宝典 作者:Python实用宝典 发表,其版权均为 Python 实用宝典 所有,文章内容系作者个人观点,不代表 Python 实用宝典 对观点赞同或支持。如需转载,请注明文章来源。
14

抱歉,评论已关闭!