标签归档:merge

如何合并字典的字典?

问题:如何合并字典的字典?

我需要合并多个词典,例如:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

随着A B CD作为树的叶子像{"info1":"value", "info2":"value2"}

词典的级别(深度)未知,可能是 {2:{"c":{"z":{"y":{C}}}}}

在我的情况下,它表示目录/文件结构,其中节点为docs,而节点为文件。

我想将它们合并以获得:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

我不确定如何使用Python轻松做到这一点。

I need to merge multiple dictionaries, here’s what I have for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

With A B C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

There is an unknown level(depth) of dictionaries, it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I’m not sure how I could do that easily with Python.


回答 0

这实际上是非常棘手的-特别是如果您希望在事物不一致时收到有用的错误消息,同时正确地接受重复但一致的条目(这里没有其他答案了……)。

假设您没有大量条目,那么递归函数最简单:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

请注意,这会发生变化a-将的内容b添加到a(也会返回)。如果您想保留a,可以这样称呼merge(dict(a), b)

agf指出(如下),您可能有两个以上的命令,在这种情况下,您可以使用:

reduce(merge, [dict1, dict2, dict3...])

将所有内容添加到dict1中。

[注意-我编辑了最初的答案以使第一个参数发生变化;使“减少”更易于解释]

python 3中的ps,您还需要 from functools import reduce

this is actually quite tricky – particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does….)

assuming you don’t have huge numbers of entries a recursive function is easiest:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

note that this mutates a – the contents of b are added to a (which is also returned). if you want to keep a you could call it like merge(dict(a), b).

agf pointed out (below) that you may have more than two dicts, in which case you can use:

reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

[note – i edited my initial answer to mutate the first argument; that makes the “reduce” easier to explain]

ps in python 3, you will also need from functools import reduce


回答 1

这是使用生成器实现的一种简单方法:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

打印:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Here’s an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 2

这个问题的一个问题是,字典的值可以是任意复杂的数据。基于这些和其他答案,我想到了以下代码:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

我的用例是合并YAML文件,我只需要处理可能的数据类型的子集。因此,我可以忽略元组和其他对象。对我而言,明智的合并逻辑意味着

  • 更换标量
  • 追加清单
  • 通过添加缺少的键并更新现有键来合并字典

其他所有和不可预见的结果都会导致错误。

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me a sensible merge logic means

  • replace scalars
  • append lists
  • merge dicts by adding missing keys and updating existing keys

Everything else and the unforeseens results in an error.


回答 3

字典词典合并

由于这是规范的问题(尽管有某些非一般性的规定),所以我提供了规范的Python方法来解决此问题。

最简单的情况:“叶是嵌套的字典,以空字典结尾”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

这是最简单的递归情况,我建议两种朴素的方法:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

我相信我更喜欢第二个,但要记住,第一个的原始状态必须从其原始位置重建。这是用法:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

复杂的情况:“叶子是任何其他类型:”

因此,如果它们以字典结尾,这是合并末端空字典的简单情况。如果没有,那不是那么简单。如果是字符串,如何合并它们?可以类似地更新集合,因此我们可以进行这种处理,但是会丢失它们合并的顺序。那么顺序重要吗?

因此,代替更多信息,最简单的方法是在两个值都不都是dict的情况下为他们提供标准的更新处理:即第二个dict的值将覆盖第一个dict的值,即使第二个dict的值为None且第一个的值为a有很多信息的字典。

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

现在

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

退货

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

适用于原始问题:

我必须删除字母周围的花括号并将其放在单引号中,以使其成为合法的Python(否则,它们将在Python 2.7+中设置为原义)并附加缺少的花括号:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

rec_merge(dict1, dict2)现在返回:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

匹配原始问题的期望结果(例如,更改{A}为后)'A'

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I’m providing the canonical Pythonic approach to solving this issue.

Simplest Case: “leaves are nested dicts that end in empty dicts”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here’s the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: “leaves are of any other type:”

So if they end in dicts, it’s a simple case of merging the end empty dicts. If not, it’s not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment if both values are not dicts: i.e. the second dict’s value will overwrite the first, even if the second dict’s value is None and the first’s value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

Application to the original question:

I’ve had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)


回答 4

基于@andrew cooke。此版本处理字典的嵌套列表,还允许选择更新值

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Based on @andrew cooke. This version handles nested lists of dicts and also allows the option to update the values

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

回答 5

这个简单的递归过程将一个字典合并到另一个字典中,同时覆盖冲突的键:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

回答 6

基于@andrew cooke的答案。它以更好的方式处理嵌套列表。

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

回答 7

如果您的词典级别未知,那么我建议使用递归函数:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

If you have an unknown level of dictionaries, then I would suggest a recursive function:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

回答 8

总览

以下方法将dict的深度合并问题细分为:

  1. 参数化的浅表合并函数merge(f)(a,b),该函数使用一个函数f合并两个字典ab

  2. 递归合并函数f将与merge


实作

可以通过多种方式来编写用于合并两个(非嵌套)字典的函数。我个人喜欢

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

定义适当的递归合并函数的一种好方法f是使用多调度,它允许定义根据参数类型沿不同路径求值的函数。

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

要合并两个嵌套的字典,只需使用merge(f)例如:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

笔记:

这种方法的优点是:

  • 该函数是由较小的函数构建的,每个较小的函数都做一件事情,这使代码更易于推理和测试。

  • 该行为不是硬编码的,但是可以根据需要进行更改和扩展,从而提高了代码重用性(请参见下面的示例)。


客制化

一些答案还考虑了包含例如其他(可能嵌套的)字典列表的字典。在这种情况下,可能需要映射列表并根据位置合并它们。这可以通过向合并功能添加另一个定义来完成f

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

A nice way of defining an appropriate recursive merger function f is using multipledispatch which allows to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

Notes:

The advantages of this approach are:

  • The function is build from smaller functions that each do a single thing which makes the code simpler to reason about and test

  • The behaviour is not hard-coded but can be changed and extended as needed which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists e.g. of other (potentially nested) dicts. In this case one might want map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

回答 9

如果有人想要解决这个问题,这是我的解决方案。

美德:简短,说明性且具有风格上的功能(递归,无突变)。

潜在缺点:这可能不是您要查找的合并。有关语义,请查阅文档字符串。

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

In case someone wants yet another approach to this problem, here’s my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential Drawback: This might not be the merge you’re looking for. Consult the docstring for semantics.

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

回答 10

您可以尝试mergedeep


安装

$ pip3 install mergedeep

用法

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

有关选项的完整列表,请查看文档

You could try mergedeep.


Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!


回答 11

安德鲁·库克斯答案有一个小问题:在某些情况下,b当您修改返回的字典时,它会修改第二个参数。特别是因为这一行:

if key in a:
    ...
else:
    a[key] = b[key]

如果b[key]为a dict,则将其简单地分配给a,这意味着对它的任何后续修改dict都会影响ab

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

为了解决这个问题,该行必须替换为:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

在哪里clone_dict

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

仍然。这显然不占listset和其他的东西,但我希望尝试合并时,它说明了陷阱dicts

为了完整起见,这是我的版本,您可以在其中多次传递它dicts

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

There’s a slight problem with andrew cookes answer: In some cases it modifies the second argument b when you modify the returned dict. Specifically it’s because of this line:

if key in a:
    ...
else:
    a[key] = b[key]

If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

To fix this, the line would have to be substituted with this:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

Where clone_dict is:

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

Still. This obviously doesn’t account for list, set and other stuff, but I hope it illustrates the pitfalls when trying to merge dicts.

And for completeness sake, here is my version, where you can pass it multiple dicts:

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

回答 12

此版本的函数将占N个字典,仅占字典-无法传递不正确的参数,否则将引发TypeError。合并本身解决了键冲突,而不是覆盖合并链下游的词典中的数据,它创建了一组值并将其追加到该值之后;没有数据丢失。

它在页面上可能不是最有效的,但是却是最彻底的,并且在合并2到N个字典时,您不会丢失任何信息。

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

输出:{1:[1,2],2:{1:2,3:3:1},4:4}

This version of the function will account for N number of dictionaries, and only dictionaries — no improper parameters can be passed, or it will raise a TypeError. The merge itself accounts for key conflicts, and instead of overwriting data from a dictionary further down the merge chain, it creates a set of values and appends to that; no data is lost.

It might not be the most effecient on the page, but it’s the most thorough and you’re not going to lose any information when you merge your 2 to N dicts.

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}


回答 13

由于dictviews支持集合操作,因此我能够大大简化jterrace的答案。

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

任何尝试将dict与非dict组合在一起的尝试(从技术上讲,是指具有“ keys”方法的对象,而没有“ keys”方法的对象)将引发AttributeError。这包括对函数的初始调用和递归调用。这正是我想要的,所以我离开了。您可以轻松地捕获递归调用引发的AttributeErrors,然后产生您想要的任何值。

Since dictviews support set operations, I was able to greatly simplify jterrace’s answer.

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

Any attempt to combine a dict with a non dict (technically, an object with a ‘keys’ method and an object without a ‘keys’ method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted so I left it. You could easily catch an AttributeErrors thrown by the recursive call and then yield any value you please.


回答 14

短n甜:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

它的工作方式类似于Python的dict.update方法(并建立在该方法的基础上)。它会返回Nonereturn d如果愿意,可以随时添加),因为它会d原位更新dict 。中的键v将覆盖其中的任何现有键d(它不会尝试解释字典的内容)。

它也适用于其他(“类似字典”的)映射。

Short-n-sweet:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
"""
Nested update of dict-like 'd' with dict-like 'v'.
"""

for key in v:
    if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
        nested_update(d[key], v[key])
    else:
        d[key] = v[key]

This works like (and is build on) Python’s dict.update method. It returns None (you can always add return d if you prefer) as it updates dict d in-place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict’s contents).

It will also work for other (“dict-like”) mappings.


回答 15

当然,代码将取决于您解决合并冲突的规则。这是一个可以使用任意数量的参数并将其递归合并到任意深度的版本,而无需使用任何对象突变。它使用以下规则解决合并冲突:

  • 字典优先于非字典值({"foo": {...}}优先于{"foo": "bar"}
  • 随后的参数优先于前面的参数(如果合并{"a": 1}{"a", 2}以及{"a": 3}为了,其结果将是{"a": 3}
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

The code will depend on your rules for resolving merge conflicts, of course. Here’s a version which can take an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:

  • dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
  • later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a", 2}, and {"a": 3} in order, the result will be {"a": 3})
try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

回答 16

我有两个字典(ab),每个字典可以包含任意数量的嵌套字典。我想以b优先于递归合并它们a

将嵌套字典视为树,我想要的是:

  • 更新a以使到每个叶子的每条路径都b表示为a
  • 要覆盖的子树a,如果叶在相应路径中找到b
    • 保持所有b叶节点仍为​​叶的不变性。

对于我的口味而言,现有的答案有些复杂,并且在架子上留下了一些细节。我一起整理了以下内容,这些内容通过了我的数据集的单元测试。

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

示例(为清晰起见而格式化):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

b需要维护的路径是:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'

a 具有以下独特且无冲突的路径:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

因此它们仍显示在合并地图中。

I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

  • To update a so that every path to every leaf in b would be represented in a
  • To overwrite subtrees of a if a leaf is found in the corresponding path in b
    • Maintain the invariant that all b leaf nodes remain leafs.

The existing answers were a little complicated for my taste and left some details on the shelf. I hacked together the following, which passes unit tests for my data set.

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

Example (formatted for clarity):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

The paths in b that needed to be maintained were:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'.

a had the unique and non-conflicting paths of:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.


回答 17

我有一个迭代的解决方案-大字典和很多字典(例如json等)的效果要好得多:

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

请注意,如果它们不是两个字典,则将使用d2中的值覆盖d1。(与python相同dict.update()

一些测试:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

我已经测试了约1200格-该方法花费了0.4秒,而递归解决方案花费了约2.5秒。

I have an iterative solution – works much much better with big dicts & a lot of them (for example jsons etc):

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

note that this will use the value in d2 to override d1, in case they are not both dicts. (same as python’s dict.update())

some tests:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

I’ve tested with around ~1200 dicts – this method took 0.4 seconds, while the recursive solution took ~2.5 seconds.


回答 18

这应该有助于将所有项目合并dict2dict1

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

请测试一下,然后告诉我们这是否是您想要的。

编辑:

上述解决方案仅合并了一个级别,但正确地解决了OP给出的示例。要合并多个级别,应使用递归。

This should help in merging all items from dict2 into dict1:

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

Please test it and tell us whether this is what you wanted.

EDIT:

The above mentioned solution merges only one level, but correctly solves the example given by OP. To merge multiple levels, the recursion should be used.


回答 19

我一直在测试您的解决方案,并决定在我的项目中使用此解决方案:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

将函数作为参数传递是扩展jterrace解决方案以使其与所有其他递归解决方案一样起作用的关键。

I’ve been testing your solutions and decided to use this one in my project:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

Passing functions as parameteres is key to extend jterrace solution to behave as all the other recursive solutions.


回答 20

我能想到的最简单的方法是:

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Easiest way i can think of is :

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

回答 21

我在这里还有另一个略有不同的解决方案:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

默认情况下,它可以解决冲突,而有利于第二个dict的值,但是您可以轻松地覆盖它,通过做些麻烦,您甚至可以抛出异常。:)。

I have another slightly different solution here:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

By default it resolves conflicts in favor of values from the second dict, but you can easily override this, with some witchery you may be able to even throw exceptions out of it. :).


回答 22

class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()
class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()

回答 23

嘿,我也有同样的问题,但是我有一个解决方案,我会把它发布在这里,以防它对其他人也有用,基本上是合并嵌套的字典并添加值,对我来说,我需要计算一些概率,因此一个很好的工作:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

通过使用以上方法,我们可以合并:

target = {‘6,6’:{‘6,63’:1},’63,4’:{‘4,4’:1},’4,4’:{‘4,3’:1} ,’6,63’:{’63,4’:1}}

src = {‘5,4’:{‘4,4’:1},’5,5’:{‘5,4’:1},’4,4’:{‘4,3’:1} }

它将变成:{‘5,5’:{‘5,4’:1},’5,4’:{‘4,4’:1},’6,6’:{‘6,63’ :1},’63,4’:{‘4,4’:1},’4,4’:{‘4,3’:2},’6,63’:{’63,4’:1 }}

还请注意此处的更改:

target = {‘6,6’:{‘6,63’:1},’6,63’:{’63,4’:1},‘4,4’:{‘4,3’:1},’63,4’:{‘4,4’:1}}

src = {‘5,4’:{‘4,4’:1},’4,3’:{‘3,4’:1},‘4,4’:{‘4,9’:1},’3,4’:{‘4,4’:1},’5,5’:{‘5,4’:1}}

合并= {‘5,4’:{‘4,4’:1},’4,3’:{‘3,4’:1},’6,63’:{’63,4’:1} ,’5,5’:{‘5,4’:1},’6,6’:{‘6,63’:1},’3,4’:{‘4,4’:1},’ 63,4’:{‘4,4’:1},‘4,4’:{‘4,3’:1,’4,9’:1} }}

别忘了还要添加导入副本:

import copy

hey there I also had the same problem but I though of a solution and I will post it here, in case it is also useful for others, basically merging nested dictionaries and also adding the values, for me I needed to calculate some probabilities so this one worked great:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

by using the above method we can merge:

target = {‘6,6’: {‘6,63’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 1}, ‘6,63’: {‘63,4’: 1}}

src = {‘5,4’: {‘4,4’: 1}, ‘5,5’: {‘5,4’: 1}, ‘4,4’: {‘4,3’: 1}}

and this will become: {‘5,5’: {‘5,4’: 1}, ‘5,4’: {‘4,4’: 1}, ‘6,6’: {‘6,63’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 2}, ‘6,63’: {‘63,4’: 1}}

also notice the changes here:

target = {‘6,6’: {‘6,63’: 1}, ‘6,63’: {‘63,4’: 1}, ‘4,4’: {‘4,3’: 1}, ‘63,4’: {‘4,4’: 1}}

src = {‘5,4’: {‘4,4’: 1}, ‘4,3’: {‘3,4’: 1}, ‘4,4’: {‘4,9’: 1}, ‘3,4’: {‘4,4’: 1}, ‘5,5’: {‘5,4’: 1}}

merge = {‘5,4’: {‘4,4’: 1}, ‘4,3’: {‘3,4’: 1}, ‘6,63’: {‘63,4’: 1}, ‘5,5’: {‘5,4’: 1}, ‘6,6’: {‘6,63’: 1}, ‘3,4’: {‘4,4’: 1}, ‘63,4’: {‘4,4’: 1}, ‘4,4’: {‘4,3’: 1, ‘4,9’: 1}}

dont forget to also add the import for copy:

import copy

回答 24

from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

输出:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}
from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

Output:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}

回答 25

看一看toolz

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

take a look at toolz package

import toolz
dict1={1:{"a":"A"},2:{"b":"B"}}
dict2={2:{"c":"C"},3:{"d":"D"}}
toolz.merge_with(toolz.merge,dict1,dict2)

gives

{1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

回答 26

以下函数将b合并为a。

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

The following function merges b into a.

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

回答 27

还有另一个细微变化:

这是基于纯python3集的深度更新功能。它通过一次遍历一个级别来更新嵌套字典,并调用自身来更新每个下一级别的字典值:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

一个简单的例子:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

And just another slight variation:

Here is a pure python3 set based deep update function. It updates nested dictionaries by looping through one level at a time and calls itself to update each next level of dictionary values:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

A simple example:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

回答 28

另一个答案怎么样?!这也避免了突变/副作用:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

How about another answer?!? This one also avoids mutation/side effects:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

Python Pandas仅合并某些列

问题:Python Pandas仅合并某些列

是否可以仅合并一些列?我有一个带有x,y,z和df2列的DataFrame df1,其中x,a,b,c,d,e,f等列。

我想在x上合并两个DataFrame,但是我只想合并df2.a,df2.b列-而不是整个DataFrame。

结果将是具有x,y,z,a,b的DataFrame。

我可以合并然后删除不需要的列,但是似乎有更好的方法。

Is it possible to only merge some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a ,b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b – not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge then delete the unwanted columns, but it seems like there is a better method.


回答 0

您可以合并sub-DataFrame(仅包含那些列):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])

回答 1

您想使用两个括号,因此,如果要执行VLOOKUP动作,请执行以下操作:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

这将为您提供原始df中的所有内容,并在df2中添加您想要加入的相应列。

You want to use TWO brackets, so if you are doing a VLOOKUP sort of action:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

This will give you everything in the original df + add that one corresponding column in df2 that you want to join.


回答 2

如果要从目标数据框中删除列,但联接需要该列,则可以执行以下操作:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop('key1')

.drop('key1')部分将防止“ key1”保留在结果数据帧中,尽管它首先需要加入。

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop('key1')

The .drop('key1') part will prevent ‘key1’ from being kept in the resulting data frame, despite it being required to join in the first place.


回答 3

您可以使用.loc选择所有行的特定列,然后将其拉出。下面是一个示例:

pandas.merge(dataframe1, dataframe2.iloc[:, [0:5]], how='left', on='key')

在此示例中,您要合并dataframe1和dataframe2。您已选择对“键”进行外部左连接。但是,对于dataframe2,您指定.iloc了允许您以数字格式指定想要的行和列的方法。使用:,选择所有行,但[0:5]选择前5列。您可以使用.loc按名称指定,但是如果您使用长列名称,则.iloc可能会更好。

You can use .loc to select the specific columns with all rows and then pull that. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, [0:5]], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2. You have chosen to do an outer left join on ‘key’. However, for dataframe2 you have specified .iloc which allows you to specific the rows and columns you want in a numerical format. Using :, your selecting all rows, but [0:5] selects the first 5 columns. You could use .loc to specify by name, but if your dealing with long column names, then .iloc may be better.


回答 4

这是为了合并两个表中的选定列。

如果table_1包含t1_a,t1_b,t1_c..,id,..t1_ztable_2包含t2_a, t2_b, t2_c..., id,..t2_z列,并且最终表中仅需要t1_a,id和t2_a,则

mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file    
mergedCSV.to_csv('output.csv',index = False)

This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file    
mergedCSV.to_csv('output.csv',index = False)

按索引合并两个数据框

问题:按索引合并两个数据框

嗨,我有以下数据框:

> df1
  id begin conditional confidence discoveryTechnique  
0 278    56       false        0.0                  1   
1 421    18       false        0.0                  1 

> df2
   concept 
0  A  
1  B

如何合并索引以获取:

  id begin conditional confidence discoveryTechnique   concept 
0 278    56       false        0.0                  1  A 
1 421    18       false        0.0                  1  B

我问,因为据我了解,merge()df1.merge(df2)使用列进行匹配。实际上,这样做我得到:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4618, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 491, in __init__
    self._validate_specification()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 812, in _validate_specification
    raise MergeError('No common columns to perform merge on')
pandas.tools.merge.MergeError: No common columns to perform merge on

在索引上合并是不好的做法吗?不可能吗 如果是这样,如何将索引移到称为“索引”的新列中?

谢谢

I have the following dataframes:

> df1
  id begin conditional confidence discoveryTechnique  
0 278    56       false        0.0                  1   
1 421    18       false        0.0                  1 

> df2
   concept 
0  A  
1  B
   

How do I merge on the indices to get:

  id begin conditional confidence discoveryTechnique   concept 
0 278    56       false        0.0                  1  A 
1 421    18       false        0.0                  1  B

I ask because it is my understanding that merge() i.e. df1.merge(df2) uses columns to do the matching. In fact, doing this I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4618, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 491, in __init__
    self._validate_specification()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 812, in _validate_specification
    raise MergeError('No common columns to perform merge on')
pandas.tools.merge.MergeError: No common columns to perform merge on

Is it bad practice to merge on index? Is it impossible? If so, how can I shift the index into a new column called “index”?


回答 0

使用merge,默认情况下是内部联接:

pd.merge(df1, df2, left_index=True, right_index=True)

join,默认情况下为左连接:

df1.join(df2)

concat,默认情况下为外部联接:

pd.concat([df1, df2], axis=1)

样品

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

#default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

#default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

#default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0

Use merge, which is inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which is left join by default:

df1.join(df2)

Or concat, which is outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

#default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

#default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

#default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0

回答 1

您可以使用concat([df1,df2,…],axis = 1)来连接两个或更多个按索引对齐的DF:

pd.concat([df1, df2, df3, ...], axis=1)

合并以通过自定义字段/索引进行串联:

# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1','col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1' right_index=True)

参加由指数加盟:

 df1.join(df2)

you can use concat([df1, df2, …], axis=1) in order to concatenate two or more DFs aligned by indexes:

pd.concat([df1, df2, df3, ...], axis=1)

or merge for concatenating by custom fields / indexes:

# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1','col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1' right_index=True)

or join for joining by index:

 df1.join(df2)

回答 2

默认情况下:
join是按列的左联接
pd.merge是按列的内部联接
pd.concat是按行的外部联接

pd.concat
采用Iterable参数。因此,它不能直接使用DataFrames(使用[df,df2]
DataFrame的尺寸应沿轴匹配

Joinpd.merge
可以接受DataFrame参数

By default:
join is a column-wise left join
pd.merge is a column-wise inner join
pd.concat is a row-wise outer join

pd.concat:
takes Iterable arguments. Thus, it cannot take DataFrames directly (use [df,df2])
Dimensions of DataFrame should match along axis

Join and pd.merge:
can take DataFrame arguments


回答 3

一个愚蠢的错误吸引了我:由于索引dtypes不同,联接失败。这不是很明显,因为两个表都是同一原始表的数据透视表。之后reset_index,索引在Jupyter中看起来相同。保存到Excel时才发现…

固定于: df1[['key']] = df1[['key']].apply(pd.to_numeric)

希望这可以节省一个小时!

A silly bug that got me: the joins failed because index dtypes differed. This was not obvious as both tables were pivot tables of the same original table. After reset_index, the indices looked identical in Jupyter. It only came to light when saving to Excel…

Fixed with: df1[['key']] = df1[['key']].apply(pd.to_numeric)

Hopefully this saves somebody an hour!


回答 4

如果要在熊猫中加入两个数据框,则可以简单地使用可用的属性,例如mergeconcatenate。例如,如果我有两个数据框df1df2可以通过以下方式将它们加入:

newdataframe=merge(df1,df2,left_index=True,right_index=True)

If u want to join two dataframes in pandas you can simply use available attributes like merge or concatenate. For example if I have two dataframes df1 and df2 I can join them by:

newdataframe=merge(df1,df2,left_index=True,right_index=True)

熊猫三向联接列上的多个数据框

问题:熊猫三向联接列上的多个数据框

我有3个CSV文件。每个列都有第一列作为人员的(字符串)名称,而每个数据框中的所有其他列都是该人员的属性。

如何将所有三个CSV文档“连接”在一起以创建一个CSV,而每一行都具有该人的字符串名称的每个唯一值的所有属性?

join()pandas中的函数指定我需要一个多索引,但是我对层次化索引方案与基于单个索引进行联接有何关系感到困惑。

I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.

How can I “join” together all three CSV documents to create a single CSV with each row having all the attributes for each unique value of the person’s string name?

The join() function in pandas specifies that I need a multiindex, but I’m confused about what a hierarchical indexing scheme has to do with making a join based on a single index.


回答 0

假设进口:

import pandas as pd

John Galt的答案基本上是一项reduce手术。如果我有几个数据帧,则将它们放在这样的列表中(通过列表推导或循环或其他方式生成):

dfs = [df0, df1, df2, dfN]

假设它们有一些共同的列,例如name您的示例,我将执行以下操作:

df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)

这样,您的代码应该可以与要合并的任意数量的数据框一起使用。

编辑2016年8月1日:对于使用Python 3的用户:reduce已移入functools。因此,要使用此功能,您首先需要导入该模块:

from functools import reduce

Assumed imports:

import pandas as pd

John Galt’s answer is basically a reduce operation. If I have more than a handful of dataframes, I’d put them in a list like this (generated via list comprehensions or loops or whatnot):

dfs = [df0, df1, df2, dfN]

Assuming they have some common column, like name in your example, I’d do the following:

df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)

That way, your code should work with whatever number of dataframes you want to merge.

Edit August 1, 2016: For those using Python 3: reduce has been moved into functools. So to use this function, you’ll first need to import that module:

from functools import reduce

回答 1

如果您有3个数据框,则可以尝试

# Merge multiple dataframes
df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12'])
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22'])
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32'])

pd.merge(pd.merge(df1,df2,on='name'),df3,on='name')

或者,如cwharland所述

df1.merge(df2,on='name').merge(df3,on='name')

You could try this if you have 3 dataframes

# Merge multiple dataframes
df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12'])
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22'])
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32'])

pd.merge(pd.merge(df1,df2,on='name'),df3,on='name')

alternatively, as mentioned by cwharland

df1.merge(df2,on='name').merge(df3,on='name')

回答 2

这是该join方法的理想情况

join方法正是针对这些类型的情况而构建的。您可以将任意数量的DataFrame与其一起加入。调用DataFrame与传递的DataFrames集合的索引连接。要使用多个DataFrame,必须将联接列放在索引中。

代码看起来像这样:

filenames = ['fn1', 'fn2', 'fn3', 'fn4',....]
dfs = [pd.read_csv(filename, index_col=index_col) for filename in filenames)]
dfs[0].join(dfs[1:])

使用@zero的数据,您可以执行以下操作:

df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12'])
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22'])
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32'])

dfs = [df1, df2, df3]
dfs = [df.set_index('name') for df in dfs]
dfs[0].join(dfs[1:])

     attr11 attr12 attr21 attr22 attr31 attr32
name                                          
a         5      9      5     19     15     49
b         4     61     14     16      4     36
c        24      9      4      9     14      9

This is an ideal situation for the join method

The join method is built exactly for these types of situations. You can join any number of DataFrames together with it. The calling DataFrame joins with the index of the collection of passed DataFrames. To work with multiple DataFrames, you must put the joining columns in the index.

The code would look something like this:

filenames = ['fn1', 'fn2', 'fn3', 'fn4',....]
dfs = [pd.read_csv(filename, index_col=index_col) for filename in filenames)]
dfs[0].join(dfs[1:])

With @zero’s data, you could do this:

df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12'])
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22'])
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32'])

dfs = [df1, df2, df3]
dfs = [df.set_index('name') for df in dfs]
dfs[0].join(dfs[1:])

     attr11 attr12 attr21 attr22 attr31 attr32
name                                          
a         5      9      5     19     15     49
b         4     61     14     16      4     36
c        24      9      4      9     14      9

回答 3

对于数据帧列表,也可以按以下步骤进行操作df_list

df = df_list[0]
for df_ in df_list[1:]:
    df = df.merge(df_, on='join_col_name')

或数据帧在生成器对象中(例如,以减少内存消耗):

df = next(df_list)
for df_ in df_list:
    df = df.merge(df_, on='join_col_name')

This can also be done as follows for a list of dataframes df_list:

df = df_list[0]
for df_ in df_list[1:]:
    df = df.merge(df_, on='join_col_name')

or if the dataframes are in a generator object (e.g. to reduce memory consumption):

df = next(df_list)
for df_ in df_list:
    df = df.merge(df_, on='join_col_name')

回答 4

python3.6.3和pandas0.22.0中concat,只要将要用于联接的列设置为索引,也可以使用

pd.concat(
    (iDF.set_index('name') for iDF in [df1, df2, df3]),
    axis=1, join='inner'
).reset_index()

其中df1df2df3定义为John Galt的答案

import pandas as pd
df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12']
)
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22']
)
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32']
)

In python 3.6.3 with pandas 0.22.0 you can also use concat as long as you set as index the columns you want to use for the joining

pd.concat(
    (iDF.set_index('name') for iDF in [df1, df2, df3]),
    axis=1, join='inner'
).reset_index()

where df1, df2, and df3 are defined as in John Galt’s answer

import pandas as pd
df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['name', 'attr11', 'attr12']
)
df2 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr21', 'attr22']
)
df3 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr31', 'attr32']
)

回答 5

一个并不需要一个多指标进行连接操作。只需正确设置要在其上执行联接操作的索引列(df.set_index('Name')例如,该命令)

join默认情况下,该操作是对索引执行的。对于您的情况,只需要指定该Name列对应于您的索引即可。下面是一个例子

教程可能是有用的。

# Simple example where dataframes index are the name on which to perform the join operations
import pandas as pd
import numpy as np
name = ['Sophia' ,'Emma' ,'Isabella' ,'Olivia' ,'Ava' ,'Emily' ,'Abigail' ,'Mia']
df1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=name)
df2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'],         index=name)
df3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'],     index=name)
df = df1.join(df2)
df = df.join(df3)

# If you a 'Name' column that is not the index of your dataframe, one can set this column to be the index
# 1) Create a column 'Name' based on the previous index
df1['Name']=df1.index
# 1) Select the index from column 'Name'
df1=df1.set_index('Name')

# If indexes are different, one may have to play with parameter how
gf1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=range(8))
gf2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'], index=range(2,10))
gf3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'], index=range(4,12))

gf = gf1.join(gf2, how='outer')
gf = gf.join(gf3, how='outer')

One does not need a multiindex to perform join operations. One just need to set correctly the index column on which to perform the join operations (which command df.set_index('Name') for example)

The join operation is by default performed on index. In your case, you just have to specify that the Name column corresponds to your index. Below is an example

A tutorial may be useful.

# Simple example where dataframes index are the name on which to perform
# the join operations
import pandas as pd
import numpy as np
name = ['Sophia' ,'Emma' ,'Isabella' ,'Olivia' ,'Ava' ,'Emily' ,'Abigail' ,'Mia']
df1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=name)
df2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'],         index=name)
df3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'],     index=name)
df = df1.join(df2)
df = df.join(df3)

# If you have a 'Name' column that is not the index of your dataframe,
# one can set this column to be the index
# 1) Create a column 'Name' based on the previous index
df1['Name'] = df1.index
# 1) Select the index from column 'Name'
df1 = df1.set_index('Name')

# If indexes are different, one may have to play with parameter how
gf1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=range(8))
gf2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'], index=range(2,10))
gf3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'], index=range(4,12))

gf = gf1.join(gf2, how='outer')
gf = gf.join(gf3, how='outer')

回答 6

这是一种合并数据帧字典,同时使列名与字典同步的方法。如果需要,它还会填写缺失值:

这是合并数据帧字典的功能

def MergeDfDict(dfDict, onCols, how='outer', naFill=None):
  keys = dfDict.keys()
  for i in range(len(keys)):
    key = keys[i]
    df0 = dfDict[key]
    cols = list(df0.columns)
    valueCols = list(filter(lambda x: x not in (onCols), cols))
    df0 = df0[onCols + valueCols]
    df0.columns = onCols + [(s + '_' + key) for s in valueCols] 

    if (i == 0):
      outDf = df0
    else:
      outDf = pd.merge(outDf, df0, how=how, on=onCols)   

  if (naFill != None):
    outDf = outDf.fillna(naFill)

  return(outDf)

好的,让我们生成数据并进行测试:

def GenDf(size):
  df = pd.DataFrame({'categ1':np.random.choice(a=['a', 'b', 'c', 'd', 'e'], size=size, replace=True),
                      'categ2':np.random.choice(a=['A', 'B'], size=size, replace=True), 
                      'col1':np.random.uniform(low=0.0, high=100.0, size=size), 
                      'col2':np.random.uniform(low=0.0, high=100.0, size=size)
                      })
  df = df.sort_values(['categ2', 'categ1', 'col1', 'col2'])
  return(df)


size = 5
dfDict = {'US':GenDf(size), 'IN':GenDf(size), 'GER':GenDf(size)}   
MergeDfDict(dfDict=dfDict, onCols=['categ1', 'categ2'], how='outer', naFill=0)

Here is a method to merge a dictionary of data frames while keeping the column names in sync with the dictionary. Also it fills in missing values if needed:

This is the function to merge a dict of data frames

def MergeDfDict(dfDict, onCols, how='outer', naFill=None):
  keys = dfDict.keys()
  for i in range(len(keys)):
    key = keys[i]
    df0 = dfDict[key]
    cols = list(df0.columns)
    valueCols = list(filter(lambda x: x not in (onCols), cols))
    df0 = df0[onCols + valueCols]
    df0.columns = onCols + [(s + '_' + key) for s in valueCols] 

    if (i == 0):
      outDf = df0
    else:
      outDf = pd.merge(outDf, df0, how=how, on=onCols)   

  if (naFill != None):
    outDf = outDf.fillna(naFill)

  return(outDf)

OK, lets generates data and test this:

def GenDf(size):
  df = pd.DataFrame({'categ1':np.random.choice(a=['a', 'b', 'c', 'd', 'e'], size=size, replace=True),
                      'categ2':np.random.choice(a=['A', 'B'], size=size, replace=True), 
                      'col1':np.random.uniform(low=0.0, high=100.0, size=size), 
                      'col2':np.random.uniform(low=0.0, high=100.0, size=size)
                      })
  df = df.sort_values(['categ2', 'categ1', 'col1', 'col2'])
  return(df)


size = 5
dfDict = {'US':GenDf(size), 'IN':GenDf(size), 'GER':GenDf(size)}   
MergeDfDict(dfDict=dfDict, onCols=['categ1', 'categ2'], how='outer', naFill=0)

回答 7

简单的解决方案:

如果列名相似:

 df1.merge(df2,on='col_name').merge(df3,on='col_name')

如果列名不同:

df1.merge(df2,left_on='col_name1', right_on='col_name2').merge(df3,left_on='col_name1', right_on='col_name3').drop(columns=['col_name2', 'col_name3']).rename(columns={'col_name1':'col_name'})

Simple Solution:

If the column names are similar:

 df1.merge(df2,on='col_name').merge(df3,on='col_name')

If the column names are different:

df1.merge(df2,left_on='col_name1', right_on='col_name2').merge(df3,left_on='col_name1', right_on='col_name3').drop(columns=['col_name2', 'col_name3']).rename(columns={'col_name1':'col_name'})

回答 8

pandas文档中还有另一种解决方案(我在这里看不到),

使用 .append

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
   A  B
0  5  6
1  7  8
>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

ignore_index=True被用来忽略所附数据帧的索引,在源一个可用下一个索引代替。

如果列名不同,Nan将引入。

There is another solution from the pandas documentation (that I don’t see here),

using the .append

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
   A  B
0  5  6
1  7  8
>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The ignore_index=True is used to ignore the index of the appended dataframe, replacing it with the next index available in the source one.

If there are different column names, Nan will be introduced.


回答 9

这三个数据帧是

在此处输入图片说明

在此处输入图片说明

让我们使用嵌套的pd.merge合并这些框架

在此处输入图片说明

在这里,我们有了合并的数据框。

快乐的分析!!!

The three dataframes are

enter image description here

enter image description here

Let’s merge these frames using nested pd.merge

enter image description here

Here we go, we have our merged dataframe.

Happy Analysis!!!


如何将列表合并为元组列表?

问题:如何将列表合并为元组列表?

实现以下目标的Python方法是什么?

# Original lists:

list_a = [1, 2, 3, 4]
list_b = [5, 6, 7, 8]

# List of tuples from 'list_a' and 'list_b':

list_c = [(1,5), (2,6), (3,7), (4,8)]

的每个成员list_c都是一个元组,其第一个成员是from list_a,第二个成员是from list_b

What is the Pythonic approach to achieve the following?

# Original lists:

list_a = [1, 2, 3, 4]
list_b = [5, 6, 7, 8]

# List of tuples from 'list_a' and 'list_b':

list_c = [(1,5), (2,6), (3,7), (4,8)]

Each member of list_c is a tuple, whose first member is from list_a and the second is from list_b.


回答 0

在Python 2中:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> zip(list_a, list_b)
[(1, 5), (2, 6), (3, 7), (4, 8)]

在Python 3中:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> list(zip(list_a, list_b))
[(1, 5), (2, 6), (3, 7), (4, 8)]

In Python 2:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> zip(list_a, list_b)
[(1, 5), (2, 6), (3, 7), (4, 8)]

In Python 3:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> list(zip(list_a, list_b))
[(1, 5), (2, 6), (3, 7), (4, 8)]

回答 1

在python 3.0中,zip返回一个zip对象。您可以调用以获得清单list(zip(a, b))

In python 3.0 zip returns a zip object. You can get a list out of it by calling list(zip(a, b)).


回答 2

您可以使用地图lambda

a = [2,3,4]
b = [5,6,7]
c = map(lambda x,y:(x,y),a,b)

如果原始列表的长度不匹配,这也将起作用

You can use map lambda

a = [2,3,4]
b = [5,6,7]
c = map(lambda x,y:(x,y),a,b)

This will also work if there lengths of original lists do not match


回答 3

您正在寻找内置功能zip

Youre looking for the builtin function zip.


回答 4

我不确定这是否是pythonic方式,但是如果两个列表具有相同数量的元素,这似乎很简单:

list_a = [1, 2, 3, 4]

list_b = [5, 6, 7, 8]

list_c=[(list_a[i],list_b[i]) for i in range(0,len(list_a))]

I am not sure if this a pythonic way or not but this seems simple if both lists have the same number of elements :

list_a = [1, 2, 3, 4]

list_b = [5, 6, 7, 8]

list_c=[(list_a[i],list_b[i]) for i in range(0,len(list_a))]

回答 5

我知道这是一个古老的问题,已经得到回答,但是由于某些原因,我仍然想发布此替代解决方案。我知道很容易找出哪个内置函数可以完成您所需的“魔术”,但是知道您可以自己完成该操作也不会有什么害处。

>>> list_1 = ['Ace', 'King']
>>> list_2 = ['Spades', 'Clubs', 'Diamonds']
>>> deck = []
>>> for i in range(max((len(list_1),len(list_2)))):
        while True:
            try:
                card = (list_1[i],list_2[i])
            except IndexError:
                if len(list_1)>len(list_2):
                    list_2.append('')
                    card = (list_1[i],list_2[i])
                elif len(list_1)<len(list_2):
                    list_1.append('')
                    card = (list_1[i], list_2[i])
                continue
            deck.append(card)
            break
>>>
>>> #and the result should be:
>>> print deck
>>> [('Ace', 'Spades'), ('King', 'Clubs'), ('', 'Diamonds')]

I know this is an old question and was already answered, but for some reason, I still wanna post this alternative solution. I know it’s easy to just find out which built-in function does the “magic” you need, but it doesn’t hurt to know you can do it by yourself.

>>> list_1 = ['Ace', 'King']
>>> list_2 = ['Spades', 'Clubs', 'Diamonds']
>>> deck = []
>>> for i in range(max((len(list_1),len(list_2)))):
        while True:
            try:
                card = (list_1[i],list_2[i])
            except IndexError:
                if len(list_1)>len(list_2):
                    list_2.append('')
                    card = (list_1[i],list_2[i])
                elif len(list_1)<len(list_2):
                    list_1.append('')
                    card = (list_1[i], list_2[i])
                continue
            deck.append(card)
            break
>>>
>>> #and the result should be:
>>> print deck
>>> [('Ace', 'Spades'), ('King', 'Clubs'), ('', 'Diamonds')]

回答 6

您在问题陈述中显示的输出不是元组而是列表

list_c = [(1,5), (2,6), (3,7), (4,8)]

检查

type(list_c)

考虑到您想要结果作为list_a和list_b中的元组,请执行

tuple(zip(list_a,list_b)) 

The output which you showed in problem statement is not the tuple but list

list_c = [(1,5), (2,6), (3,7), (4,8)]

check for

type(list_c)

considering you want the result as tuple out of list_a and list_b, do

tuple(zip(list_a,list_b)) 

回答 7

一种不使用的替代方法zip

list_c = [(p1, p2) for idx1, p1 in enumerate(list_a) for idx2, p2 in enumerate(list_b) if idx1==idx2]

万一不仅要获取元组1st与1st,2nd与2nd …而且要获取2个列表的所有可能组合,可以使用

list_d = [(p1, p2) for p1 in list_a for p2 in list_b]

One alternative without using zip:

list_c = [(p1, p2) for idx1, p1 in enumerate(list_a) for idx2, p2 in enumerate(list_b) if idx1==idx2]

In case one wants to get not only tuples 1st with 1st, 2nd with 2nd… but all possible combinations of the 2 lists, that would be done with

list_d = [(p1, p2) for p1 in list_a for p2 in list_b]

熊猫合并101

问题:熊猫合并101

  • 如何执行与熊猫(LEFT| RIGHT| FULL)(INNER| OUTER)的联接?
  • 合并后如何为缺失的行添加NaN?
  • 合并后如何去除NaN?
  • 我可以合并索引吗?
  • 与大熊猫交叉交往?
  • 如何合并多个DataFrame?
  • mergejoinconcatupdate?WHO?什么?为什么?!

… 和更多。我已经看到这些重复出现的问题,询问有关熊猫合并功能的各个方面。如今,有关合并及其各种用例的大多数信息都分散在数十个措辞不好,无法搜索的帖子中。这里的目的是整理后代的一些更重要的观点。

本QnA是下一章有关大熊猫习语的有用的用户指南中的下一部分(请参阅有关透视的这篇文章有关串联的这篇文章,我将在以后进行探讨)。

请注意,本文并非旨在代替文档,因此也请阅读!一些示例是从那里获取的。

  • How to perform a (LEFT|RIGHT|FULL) (INNER|OUTER) join with pandas?
  • How do I add NaNs for missing rows after merge?
  • How do I get rid of NaNs after merging?
  • Can I merge on the index?
  • Cross join with pandas?
  • How do I merge multiple DataFrames?
  • merge? join? concat? update? Who? What? Why?!

… and more. I’ve seen these recurring questions asking about various facets of the pandas merge functionality. Most of the information regarding merge and its various use cases today is fragmented across dozens of badly worded, unsearchable posts. The aim here is to collate some of the more important points for posterity.

This QnA is meant to be the next installment in a series of helpful user-guides on common pandas idioms (see this post on pivoting, and this post on concatenation, which I will be touching on, later).

Please note that this post is not meant to be a replacement for the documentation, so please read that as well! Some of the examples are taken from there.


回答 0

这篇文章旨在为读者提供有关SQL风格的与熊猫的合并,使用方法以及何时不使用它的入门。

特别是,这是这篇文章的内容:

  • 基础-联接类型(左,右,外,内)

    • 与不同的列名合并
    • 避免在输出中出现重复的合并键列
  • 在不同条件下与索引合并
    • 有效地使用您的命名索引
    • 合并键作为一个索引,另一个索引
  • 多路合并列和索引(唯一和非唯一)
  • 值得注意的替代品mergejoin

这篇文章不会讲的内容:

  • 与性能相关的讨论和时间安排(目前)。在适当的地方,最引人注目的是提到更好的替代方案。
  • 处理后缀,删除多余的列,重命名输出以及其他特定用例。还有其他(阅读:更好)的帖子可以解决这个问题,所以请弄清楚!

注意
除非另有说明,否则大多数示例在演示各种功能时会默认使用INNER JOIN操作。

此外,可以复制和复制此处的所有DataFrame,以便您可以使用它们。另外,请参阅这篇文章 ,了解如何从剪贴板读取DataFrames。

最后,已使用Google绘图手工绘制了JOIN操作的所有可视表示形式。从这里得到启示。

聊够了,只是告诉我如何使用merge

设定

np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})    
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

left

  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893

right

  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

为了简单起见,键列具有相同的名称(目前)。

一个内连接由下式表示

注意:
此规则以及即将发布的附图均遵循以下约定:

  • 蓝色表示合并结果中存在的行
  • 红色表示从结果中排除(即删除)的行
  • 绿色表示缺少的值将在结果中替换为NaN

要执行INNER JOIN,请调用merge左侧的DataFrame,并指定右侧的DataFrame和联接键(至少)作为参数。

left.merge(right, on='key')
# Or, if you want to be explicit
# left.merge(right, on='key', how='inner')

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278

这仅返回来自leftright共享一个公共密钥的行(在本示例中为“ B”和“ D”)。

LEFT OUTER JOIN,或LEFT JOIN由下式表示

可以通过指定执行how='left'

left.merge(right, on='key', how='left')

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278

请仔细注意NaN的位置。如果指定how='left',则仅left使用from 的键,而rightNaN替换缺少的数据。

同样,对于RIGHT OUTER JOIN或RIGHT JOIN来说…

…指定how='right'

left.merge(right, on='key', how='right')

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
2   E       NaN  0.950088
3   F       NaN -0.151357

在这里,right使用了from 的键,而缺失的数据left被NaN代替。

最后,对于FULL OUTER JOIN,由

指定how='outer'

left.merge(right, on='key', how='outer')

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
4   E       NaN  0.950088
5   F       NaN -0.151357

这将使用两个框架中的关键点,并且会为两个框架中缺少的行插入NaN。

该文档很好地总结了这些各种合并:

在此处输入图片说明

其他联接-左排除,右排除和全排除/ ANTI连接

如果您需要分两个步骤进行LEFT-Exclusive JOINRIGHT-Exclusive JOIN

对于不包括JOIN的LEFT,表示为

首先执行LEFT OUTER JOIN,然后过滤(不包括!)left仅来自于行的行,

(left.merge(right, on='key', how='left', indicator=True)
     .query('_merge == "left_only"')
     .drop('_merge', 1))

  key   value_x  value_y
0   A  1.764052      NaN
2   C  0.978738      NaN

哪里,

left.merge(right, on='key', how='left', indicator=True)

  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

同样,对于除权利加入之外,

(left.merge(right, on='key', how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop('_merge', 1))

  key  value_x   value_y
2   E      NaN  0.950088
3   F      NaN -0.151357

最后,如果您需要执行合并操作,而该合并操作只保留左侧或右侧的键,而不同时保留两者(IOW,执行ANTI-JOIN),

您可以通过类似的方式进行操作-

(left.merge(right, on='key', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop('_merge', 1))

  key   value_x   value_y
0   A  1.764052       NaN
2   C  0.978738       NaN
4   E       NaN  0.950088
5   F       NaN -0.151357

键列的不同名称

如果键列的名称不同(例如,lefthas keyLeftrighthas keyRight代替),key则必须指定left_onright_on作为参数,而不是on

left2 = left.rename({'key':'keyLeft'}, axis=1)
right2 = right.rename({'key':'keyRight'}, axis=1)

left2

  keyLeft     value
0       A  1.764052
1       B  0.400157
2       C  0.978738
3       D  2.240893

right2

  keyRight     value
0        B  1.867558
1        D -0.977278
2        E  0.950088
3        F -0.151357

left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')

  keyLeft   value_x keyRight   value_y
0       B  0.400157        B  1.867558
1       D  2.240893        D -0.977278

避免在输出中重复键列

keyLeftfrom leftkeyRightfrom 上进行合并时right,如果只希望在输出中使用keyLeftkeyRight(但不同时使用)中的任何一个,则可以从将索引设置为初步步骤开始。

left3 = left2.set_index('keyLeft')
left3.merge(right2, left_index=True, right_on='keyRight')

    value_x keyRight   value_y
0  0.400157        B  1.867558
1  2.240893        D -0.977278

将此与命令输出(恰恰是的输出left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner'))进行对比(您会发现keyLeft它丢失了)。您可以根据将哪个帧的索引设置为关键字来找出要保留的列。例如,当执行某些OUTER JOIN操作时,这可能很重要。

仅合并其中一个的单个列 DataFrames

例如,考虑

right3 = right.assign(newcol=np.arange(len(right)))
right3
  key     value  newcol
0   B  1.867558       0
1   D -0.977278       1
2   E  0.950088       2
3   F -0.151357       3

如果只需要合并“ new_val”(不包含任何其他列),通常可以在合并之前仅对列进行子集化:

left.merge(right3[['key', 'newcol']], on='key')

  key     value  newcol
0   B  0.400157       0
1   D  2.240893       1

如果您要进行左外部联接,则性能更高的解决方案将涉及map

# left['newcol'] = left['key'].map(right3.set_index('key')['newcol']))
left.assign(newcol=left['key'].map(right3.set_index('key')['newcol']))

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

如前所述,这类似于但比

left.merge(right3[['key', 'newcol']], on='key', how='left')

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

合并多列

要加入对多列,指定列表on(或left_onright_on,如适用)。

left.merge(right, on=['key1', 'key2'] ...)

或者,如果名称不同,

left.merge(right, left_on=['lkey1', 'lkey2'], right_on=['rkey1', 'rkey2'])

其他有用的merge*操作和功能

本节仅介绍最基本的内容,目的只是为了激发您的胃口。更多的例子和案例,看到的文档mergejoin以及concat还有链接的功能规格。


基于索引的* -JOIN(+ index-column merges)

设定

np.random.seed([3, 14])
left = pd.DataFrame({'value': np.random.randn(4)}, index=['A', 'B', 'C', 'D'])    
right = pd.DataFrame({'value': np.random.randn(4)}, index=['B', 'D', 'E', 'F'])
left.index.name = right.index.name = 'idxkey'

left
           value
idxkey          
A      -0.602923
B      -0.402655
C       0.302329
D      -0.524349

right

           value
idxkey          
B       0.543843
D       0.013135
E      -0.326498
F       1.385076

通常,索引合并看起来像这样:

left.merge(right, left_index=True, right_index=True)


         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

支持索引名称

如果索引被命名,然后v0.23用户还可以指定的级别名称on(或left_onright_on必要的)。

left.merge(right, on='idxkey')

         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

合并一个索引,另一个列

可以(并且非常简单)使用一个索引和另一个列进行合并。例如,

left.merge(right, left_on='key1', right_index=True)

反之亦然(right_on=...left_index=True)。

right2 = right.reset_index().rename({'idxkey' : 'colkey'}, axis=1)
right2

  colkey     value
0      B  0.543843
1      D  0.013135
2      E -0.326498
3      F  1.385076

left.merge(right2, left_index=True, right_on='colkey')

    value_x colkey   value_y
0 -0.402655      B  0.543843
1 -0.524349      D  0.013135

在这种特殊情况下,left为命名了索引,因此您也可以将索引名称与一起使用left_on,如下所示:

left.merge(right2, left_on='idxkey', right_on='colkey')

    value_x colkey   value_y
0 -0.402655      B  0.543843
1 -0.524349      D  0.013135

DataFrame.join
除了这些,还有另一个简洁的选择。您可以使用DataFrame.join默认值来联接索引。DataFrame.join默认情况下不做LEFT OUTER JOIN,所以how='inner'在这里是必要的。

left.join(right, how='inner', lsuffix='_x', rsuffix='_y')

         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

请注意,我需要指定lsuffixrsuffix参数join,否则会出错:

left.join(right)
ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')

由于列名相同。如果它们的名称不同,这将不是问题。

left.rename(columns={'value':'leftvalue'}).join(right, how='inner')

        leftvalue     value
idxkey                     
B       -0.402655  0.543843
D       -0.524349  0.013135

pd.concat
最后,作为基于索引的联接的替代方法,可以使用pd.concat

pd.concat([left, right], axis=1, sort=False, join='inner')

           value     value
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

省略join='inner'是否需要FULL OUTER JOIN(默认):

pd.concat([left, right], axis=1, sort=False)

      value     value
A -0.602923       NaN
B -0.402655  0.543843
C  0.302329       NaN
D -0.524349  0.013135
E       NaN -0.326498
F       NaN  1.385076

有关更多信息,请参见@piRSquared pd.concat的此规范帖子


通用化:merge处理多个DataFrame

通常,将多个DataFrame合并在一起时会出现这种情况。天真的,这可以通过链接merge调用来完成:

df1.merge(df2, ...).merge(df3, ...)

但是,对于许多DataFrame,这很快就变得一发不可收拾。此外,可能有必要归纳为未知数量的DataFrame。

在这里,我将介绍pd.concat针对唯一DataFrame.join的多方联接,以及针对非唯一键的多方联接。首先,设置。

# Setup.
np.random.seed(0)
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'valueA': np.random.randn(4)})    
B = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'valueB': np.random.randn(4)})
C = pd.DataFrame({'key': ['D', 'E', 'J', 'C'], 'valueC': np.ones(4)})
dfs = [A, B, C] 

# Note, the "key" column values are unique, so the index is unique.
A2 = A.set_index('key')
B2 = B.set_index('key')
C2 = C.set_index('key')

dfs2 = [A2, B2, C2]

多路合并唯一键(或索引)

如果您的键(此处的键可以是列或索引)是唯一的,则可以使用pd.concat。请注意,pd.concat将DataFrames连接到索引

# merge on `key` column, you'll need to set the index before concatenating
pd.concat([
    df.set_index('key') for df in dfs], axis=1, join='inner'
).reset_index()

  key    valueA    valueB  valueC
0   D  2.240893 -0.977278     1.0

# merge on `key` index
pd.concat(dfs2, axis=1, sort=False, join='inner')

       valueA    valueB  valueC
key                            
D    2.240893 -0.977278     1.0

省略join='inner'完整的外部联接。请注意,您不能指定LEFT或RIGHT OUTER连接(如果需要这些连接,请使用join,如下所述)。

多路合并重复项

concat速度很快,但也有缺点。它不能处理重复项。

A3 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'D'], 'valueA': np.random.randn(5)})

pd.concat([df.set_index('key') for df in [A3, B, C]], axis=1, join='inner')
ValueError: Shape of passed values is (3, 4), indices imply (3, 2)

在这种情况下,我们可以使用join它,因为它可以处理非唯一键(请注意,join将DataFrames连接到它们的索引上;merge除非另有说明,否则它在后台进行调用并执行LEFT OUTER JOIN)。

# join on `key` column, set as the index first
# For inner join. For left join, omit the "how" argument.
A.set_index('key').join(
    [df.set_index('key') for df in (B, C)], how='inner').reset_index()

  key    valueA    valueB  valueC
0   D  2.240893 -0.977278     1.0

# join on `key` index
A3.set_index('key').join([B2, C2], how='inner')

       valueA    valueB  valueC
key                            
D    1.454274 -0.977278     1.0
D    0.761038 -0.977278     1.0

This post aims to give readers a primer on SQL-flavoured merging with pandas, how to use it, and when not to use it.

In particular, here’s what this post will go through:

  • The basics – types of joins (LEFT, RIGHT, OUTER, INNER)

    • merging with different column names
    • avoiding duplicate merge key column in output
  • Merging with index under different conditions
    • effectively using your named index
    • merge key as the index of one and column of another
  • Multiway merges on columns and indexes (unique and non-unique)
  • Notable alternatives to merge and join

What this post will not go through:

  • Performance-related discussions and timings (for now). Mostly notable mentions of better alternatives, wherever appropriate.
  • Handling suffixes, removing extra columns, renaming outputs, and other specific use cases. There are other (read: better) posts that deal with that, so figure it out!

Note
Most examples default to INNER JOIN operations while demonstrating various features, unless otherwise specified.

Furthermore, all the DataFrames here can be copied and replicated so you can play with them. Also, see this post on how to read DataFrames from your clipboard.

Lastly, all visual representation of JOIN operations have been hand-drawn using Google Drawings. Inspiration from here.

Enough Talk, just show me how to use merge!

Setup

np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})    
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

left

  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893

right

  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

For the sake of simplicity, the key column has the same name (for now).

An INNER JOIN is represented by

Note
This, along with the forthcoming figures all follow this convention:

  • blue indicates rows that are present in the merge result
  • red indicates rows that are excluded from the result (i.e., removed)
  • green indicates missing values that are replaced with NaNs in the result

To perform an INNER JOIN, call merge on the left DataFrame, specifying the right DataFrame and the join key (at the very least) as arguments.

left.merge(right, on='key')
# Or, if you want to be explicit
# left.merge(right, on='key', how='inner')

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278

This returns only rows from left and right which share a common key (in this example, “B” and “D).

A LEFT OUTER JOIN, or LEFT JOIN is represented by

This can be performed by specifying how='left'.

left.merge(right, on='key', how='left')

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278

Carefully note the placement of NaNs here. If you specify how='left', then only keys from left are used, and missing data from right is replaced by NaN.

And similarly, for a RIGHT OUTER JOIN, or RIGHT JOIN which is…

…specify how='right':

left.merge(right, on='key', how='right')

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
2   E       NaN  0.950088
3   F       NaN -0.151357

Here, keys from right are used, and missing data from left is replaced by NaN.

Finally, for the FULL OUTER JOIN, given by

specify how='outer'.

left.merge(right, on='key', how='outer')

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
4   E       NaN  0.950088
5   F       NaN -0.151357

This uses the keys from both frames, and NaNs are inserted for missing rows in both.

The documentation summarises these various merges nicely:

enter image description here

Other JOINs – LEFT-Excluding, RIGHT-Excluding, and FULL-Excluding/ANTI JOINs

If you need LEFT-Excluding JOINs and RIGHT-Excluding JOINs in two steps.

For LEFT-Excluding JOIN, represented as

Start by performing a LEFT OUTER JOIN and then filtering (excluding!) rows coming from left only,

(left.merge(right, on='key', how='left', indicator=True)
     .query('_merge == "left_only"')
     .drop('_merge', 1))

  key   value_x  value_y
0   A  1.764052      NaN
2   C  0.978738      NaN

Where,

left.merge(right, on='key', how='left', indicator=True)

  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

And similarly, for a RIGHT-Excluding JOIN,

(left.merge(right, on='key', how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop('_merge', 1))

  key  value_x   value_y
2   E      NaN  0.950088
3   F      NaN -0.151357

Lastly, if you are required to do a merge that only retains keys from the left or right, but not both (IOW, performing an ANTI-JOIN),

You can do this in similar fashion—

(left.merge(right, on='key', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop('_merge', 1))

  key   value_x   value_y
0   A  1.764052       NaN
2   C  0.978738       NaN
4   E       NaN  0.950088
5   F       NaN -0.151357

Different names for key columns

If the key columns are named differently—for example, left has keyLeft, and right has keyRight instead of key—then you will have to specify left_on and right_on as arguments instead of on:

left2 = left.rename({'key':'keyLeft'}, axis=1)
right2 = right.rename({'key':'keyRight'}, axis=1)

left2

  keyLeft     value
0       A  1.764052
1       B  0.400157
2       C  0.978738
3       D  2.240893

right2

  keyRight     value
0        B  1.867558
1        D -0.977278
2        E  0.950088
3        F -0.151357

left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')

  keyLeft   value_x keyRight   value_y
0       B  0.400157        B  1.867558
1       D  2.240893        D -0.977278

Avoiding duplicate key column in output

When merging on keyLeft from left and keyRight from right, if you only want either of the keyLeft or keyRight (but not both) in the output, you can start by setting the index as a preliminary step.

left3 = left2.set_index('keyLeft')
left3.merge(right2, left_index=True, right_on='keyRight')

    value_x keyRight   value_y
0  0.400157        B  1.867558
1  2.240893        D -0.977278

Contrast this with the output of the command just before (thst is, the output of left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')), you’ll notice keyLeft is missing. You can figure out what column to keep based on which frame’s index is set as the key. This may matter when, say, performing some OUTER JOIN operation.

Merging only a single column from one of the DataFrames

For example, consider

right3 = right.assign(newcol=np.arange(len(right)))
right3
  key     value  newcol
0   B  1.867558       0
1   D -0.977278       1
2   E  0.950088       2
3   F -0.151357       3

If you are required to merge only “new_val” (without any of the other columns), you can usually just subset columns before merging:

left.merge(right3[['key', 'newcol']], on='key')

  key     value  newcol
0   B  0.400157       0
1   D  2.240893       1

If you’re doing a LEFT OUTER JOIN, a more performant solution would involve map:

# left['newcol'] = left['key'].map(right3.set_index('key')['newcol']))
left.assign(newcol=left['key'].map(right3.set_index('key')['newcol']))

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

As mentioned, this is similar to, but faster than

left.merge(right3[['key', 'newcol']], on='key', how='left')

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

Merging on multiple columns

To join on more than one column, specify a list for on (or left_on and right_on, as appropriate).

left.merge(right, on=['key1', 'key2'] ...)

Or, in the event the names are different,

left.merge(right, left_on=['lkey1', 'lkey2'], right_on=['rkey1', 'rkey2'])

Other useful merge* operations and functions

This section only covers the very basics, and is designed to only whet your appetite. For more examples and cases, see the documentation on merge, join, and concat as well as the links to the function specs.


Index-based *-JOIN (+ index-column merges)

Setup

np.random.seed([3, 14])
left = pd.DataFrame({'value': np.random.randn(4)}, index=['A', 'B', 'C', 'D'])    
right = pd.DataFrame({'value': np.random.randn(4)}, index=['B', 'D', 'E', 'F'])
left.index.name = right.index.name = 'idxkey'

left
           value
idxkey          
A      -0.602923
B      -0.402655
C       0.302329
D      -0.524349

right

           value
idxkey          
B       0.543843
D       0.013135
E      -0.326498
F       1.385076

Typically, a merge on index would look like this:

left.merge(right, left_index=True, right_index=True)


         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

Support for index names

If your index is named, then v0.23 users can also specify the level name to on (or left_on and right_on as necessary).

left.merge(right, on='idxkey')

         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

Merging on index of one, column(s) of another

It is possible (and quite simple) to use the index of one, and the column of another, to perform a merge. For example,

left.merge(right, left_on='key1', right_index=True)

Or vice versa (right_on=... and left_index=True).

right2 = right.reset_index().rename({'idxkey' : 'colkey'}, axis=1)
right2

  colkey     value
0      B  0.543843
1      D  0.013135
2      E -0.326498
3      F  1.385076

left.merge(right2, left_index=True, right_on='colkey')

    value_x colkey   value_y
0 -0.402655      B  0.543843
1 -0.524349      D  0.013135

In this special case, the index for left is named, so you can also use the index name with left_on, like this:

left.merge(right2, left_on='idxkey', right_on='colkey')

    value_x colkey   value_y
0 -0.402655      B  0.543843
1 -0.524349      D  0.013135

DataFrame.join
Besides these, there is another succinct option. You can use DataFrame.join which defaults to joins on the index. DataFrame.join does a LEFT OUTER JOIN by default, so how='inner' is necessary here.

left.join(right, how='inner', lsuffix='_x', rsuffix='_y')

         value_x   value_y
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

Note that I needed to specify the lsuffix and rsuffix arguments since join would otherwise error out:

left.join(right)
ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')

Since the column names are the same. This would not be a problem if they were differently named.

left.rename(columns={'value':'leftvalue'}).join(right, how='inner')

        leftvalue     value
idxkey                     
B       -0.402655  0.543843
D       -0.524349  0.013135

pd.concat
Lastly, as an alternative for index-based joins, you can use pd.concat:

pd.concat([left, right], axis=1, sort=False, join='inner')

           value     value
idxkey                    
B      -0.402655  0.543843
D      -0.524349  0.013135

Omit join='inner' if you need a FULL OUTER JOIN (the default):

pd.concat([left, right], axis=1, sort=False)

      value     value
A -0.602923       NaN
B -0.402655  0.543843
C  0.302329       NaN
D -0.524349  0.013135
E       NaN -0.326498
F       NaN  1.385076

For more information, see this canonical post on pd.concat by @piRSquared.


Generalizing: mergeing multiple DataFrames

Oftentimes, the situation arises when multiple DataFrames are to be merged together. Naively, this can be done by chaining merge calls:

df1.merge(df2, ...).merge(df3, ...)

However, this quickly gets out of hand for many DataFrames. Furthermore, it may be necessary to generalise for an unknown number of DataFrames.

Here I introduce pd.concat for multi-way joins on unique keys, and DataFrame.join for multi-way joins on non-unique keys. First, the setup.

# Setup.
np.random.seed(0)
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'valueA': np.random.randn(4)})    
B = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'valueB': np.random.randn(4)})
C = pd.DataFrame({'key': ['D', 'E', 'J', 'C'], 'valueC': np.ones(4)})
dfs = [A, B, C] 

# Note, the "key" column values are unique, so the index is unique.
A2 = A.set_index('key')
B2 = B.set_index('key')
C2 = C.set_index('key')

dfs2 = [A2, B2, C2]

Multiway merge on unique keys (or index)

If your keys (here, the key could either be a column or an index) are unique, then you can use pd.concat. Note that pd.concat joins DataFrames on the index.

# merge on `key` column, you'll need to set the index before concatenating
pd.concat([
    df.set_index('key') for df in dfs], axis=1, join='inner'
).reset_index()

  key    valueA    valueB  valueC
0   D  2.240893 -0.977278     1.0

# merge on `key` index
pd.concat(dfs2, axis=1, sort=False, join='inner')

       valueA    valueB  valueC
key                            
D    2.240893 -0.977278     1.0

Omit join='inner' for a FULL OUTER JOIN. Note that you cannot specify LEFT or RIGHT OUTER joins (if you need these, use join, described below).

Multiway merge on keys with duplicates

concat is fast, but has its shortcomings. It cannot handle duplicates.

A3 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'D'], 'valueA': np.random.randn(5)})

pd.concat([df.set_index('key') for df in [A3, B, C]], axis=1, join='inner')
ValueError: Shape of passed values is (3, 4), indices imply (3, 2)

In this situation, we can use join since it can handle non-unique keys (note that join joins DataFrames on their index; it calls merge under the hood and does a LEFT OUTER JOIN unless otherwise specified).

# join on `key` column, set as the index first
# For inner join. For left join, omit the "how" argument.
A.set_index('key').join(
    [df.set_index('key') for df in (B, C)], how='inner').reset_index()

  key    valueA    valueB  valueC
0   D  2.240893 -0.977278     1.0

# join on `key` index
A3.set_index('key').join([B2, C2], how='inner')

       valueA    valueB  valueC
key                            
D    1.454274 -0.977278     1.0
D    0.761038 -0.977278     1.0

回答 1

一个补充的视觉观pd.concat([df0, df1], kwargs)。请注意,kwarg axis=0axis=1的含义不如df.mean()或直观df.apply(func)


在pd.concat([df0,df1])

A supplemental visual view of pd.concat([df0, df1], kwargs). Notice that, kwarg axis=0 or axis=1 ‘s meaning is not as intuitive as df.mean() or df.apply(func)


on pd.concat([df0, df1])


如何在Python中的单个表达式中合并两个字典?

问题:如何在Python中的单个表达式中合并两个字典?

我有两个Python字典,我想编写一个返回合并的这两个字典的表达式。update()如果返回结果而不是就地修改字典,该方法将是我所需要的。

>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{'a': 1, 'b': 10, 'c': 11}

我怎样才能在最终的合并字典z,不是x

(更清楚地说,dict.update()我正在寻找的最后一个胜出的冲突处理方法是。)

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged. The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{'a': 1, 'b': 10, 'c': 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I’m looking for as well.)


回答 0

如何在一个表达式中合并两个Python字典?

对于字典xyz变成了浅层合并的字典,带有y替换的值x

  • 在Python 3.5或更高版本中:

    z = {**x, **y}
  • 在Python 2(或3.4或更低版本)中,编写一个函数:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with x's keys and values
        z.update(y)    # modifies z with y's keys and values & returns None
        return z

    现在:

    z = merge_two_dicts(x, y)
  • 在Python 3.9.0a4以上(最终发布日期大约2020年10月):PEP-584这里讨论,执行,以进一步简化这一点:

    z = x | y          # NOTE: 3.9+ ONLY

说明

假设您有两个字典,并且想要将它们合并为新字典而不更改原始字典:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

理想的结果是获得一个z合并了值的新字典(),第二个dict的值覆盖第一个字典的值。

>>> z
{'a': 1, 'b': 3, 'c': 4}

PEP 448中提出并从Python 3.5开始可用的新语法是

z = {**x, **y}

它确实是一个表达。

注意,我们也可以使用文字符号合并:

z = {**x, 'foo': 1, 'bar': 2, **y}

现在:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

它现在显示为在3.5发布时间表中实现,PEP 478,并且已进入Python 3.5的新功能文档。

但是,由于许多组织仍在使用Python 2,因此您可能希望以向后兼容的方式进行操作。在Python 2和Python 3.0-3.4中可用的经典Pythonic方法是分两个步骤完成的:

z = x.copy()
z.update(y) # which returns None since it mutates z

在这两种方法中,y将排第二,其值将替换x的值,因此'b'将指向3我们的最终结果。

尚未在Python 3.5上运行,但需要一个表达式

如果您尚未使用Python 3.5,或者需要编写向后兼容的代码,并且希望在单个表达式中使用它,则最有效的方法是将其放入函数中:

def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

然后您有一个表达式:

z = merge_two_dicts(x, y)

您还可以创建一个函数来合并未定义数量的dict,从零到非常大的数量:

def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

此函数将在Python 2和3中适用于所有字典。例如给以下a命令g

z = merge_dicts(a, b, c, d, e, f, g) 

和中的键值对g优先af,以此类推。

其他答案的批判

不要使用以前接受的答案中看到的内容:

z = dict(x.items() + y.items())

在Python 2中,您将在每个内存字典中创建两个列表,在内存中创建第三个列表,其长度等于前两个字典的长度,然后丢弃所有三个列表以创建字典。在Python 3中,这将失败,因为您将两个dict_items对象而不是两个列表加在一起-

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

并且您必须将它们明确创建为列表,例如z = dict(list(x.items()) + list(y.items()))。这浪费了资源和计算能力。

类似地,当值是不可散列的对象(例如列表)时,items()在Python 3(viewitems()在Python 2.7中)进行联合也将失败。即使您的值是可哈希的,由于集合在语义上是无序的,因此关于优先级的行为是不确定的。所以不要这样做:

>>> c = dict(a.items() | b.items())

此示例演示了值不可散列时会发生的情况:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

这是一个示例,其中y应该优先,但是由于集合的任意顺序,保留了x的值:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

您不应该使用的另一种技巧:

z = dict(x, **y)

这使用了dict构造函数,并且非常快速且具有内存效率(甚至比我们的两步过程略高),但是除非您确切地知道这里正在发生什么(也就是说,第二个dict作为关键字参数传递给dict,构造函数),很难阅读,它不是预期的用法,因此不是Pythonic。

这是在django修复的用法示例。

字典旨在获取可散列的键(例如,frozenset或元组),但是当键不是字符串时此方法在Python 3中失败。

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

邮件列表中,该语言的创建者Guido van Rossum写道:

我宣布dict({},** {1:3})非法是可以的,因为毕竟这是对**机制的滥用。

显然dict(x,** y)被“调用x.update(y)并返回x”的“酷砍”。我个人觉得它比酷更卑鄙。

根据我的理解(以及对语言创建者的理解),该命令的预期用途dict(**y)是用于创建可读性强的命令,例如:

dict(a=1, b=10, c=11)

代替

{'a': 1, 'b': 10, 'c': 11}

对评论的回应

尽管Guido说了什么dict(x, **y),但符合dict规范,顺便说一句。它仅适用于Python 2和3。事实上,这仅适用于字符串键,这是关键字参数如何工作的直接结果,而不是字典的简称。在这个地方使用**运算符也不会滥用该机制,实际上**正是为了将dict作为关键字传递而设计的。

同样,当键为非字符串时,它不适用于3。隐式调用协定是命名空间采用普通命令,而用户只能传递字符串形式的关键字参数。所有其他可调用对象都强制执行了它。dict在Python 2中破坏了这种一致性:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

考虑到其他Python实现(Pypy,Jython,IronPython),这种不一致是很糟糕的。因此,它在Python 3中已得到修复,因为这种用法可能是一个重大更改。

我向您指出,故意编写仅在一种语言版本中有效或仅在特定的任意约束下有效的代码是恶意的无能。

更多评论:

dict(x.items() + y.items()) 仍然是Python 2最具可读性的解决方案。可读性至关重要。

我的回答:merge_two_dicts(x, y)如果我们实际上担心可读性,实际上对我来说似乎更加清晰。而且它不向前兼容,因为Python 2越来越不推荐使用。

{**x, **y}似乎不处理嵌套字典。嵌套键的内容只是被覆盖,而不是被合并。我最终被这些没有递归合并的答案所烧死,令我惊讶的是,没有人提到它。在我对“合并”一词的解释中,这些答案描述的是“将一个词典与另一个词典更新”,而不是合并。

是。我必须回头再问这个问题,它要求两个字典进行浅层合并,第一个字典的值由第二个字典覆盖-在一个表达式中。

假设有两个字典,一个字典可能会递归地将它们合并到一个函数中,但是您应注意不要从任何一个源修改字典,避免这种情况的最可靠方法是在分配值时进行复制。由于键必须是可散列的,因此通常是不可变的,因此复制它们毫无意义:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

用法:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}

提出其他值类型的偶发性问题远远超出了此问题的范围,因此,我将为您回答有关“词典合并字典”的规范问题的答案

性能较差但临时性正确

这些方法的性能较差,但是它们将提供正确的行为。他们将少得多比高性能copyupdate或新的拆包,因为他们通过在更高的抽象水平的每个键-值对迭代,但他们做的尊重优先顺序(后者类型的字典具有优先权)

您还可以在dict理解内手动将dict链接:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

或在python 2.6中(也许在引入生成器表达式时早在2.4中):

dict((k, v) for d in dicts for k, v in d.items())

itertools.chain 将以正确的顺序在键值对上链接迭代器:

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))

绩效分析

我将仅对已知行为正确的用法进行性能分析。

import timeit

在Ubuntu 14.04上完成以下操作

在Python 2.7(系统Python)中:

>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.5726828575134277
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.163769006729126
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.iteritems(), y.iteritems()))))
1.1614501476287842
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
2.2345519065856934

在Python 3.5中(死神PPA):

>>> min(timeit.repeat(lambda: {**x, **y}))
0.4094954460160807
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.7881555100320838
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.4525277839857154
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.items(), y.items()))))
2.3143140770262107
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
3.2069112799945287

词典资源

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly merged dictionary with values from y replacing those from x.

  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with x's keys and values
        z.update(y)    # modifies z with y's keys and values & returns None
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    
  • In Python 3.9.0a4 or greater (final release date approx October 2020): PEP-584, discussed here, was implemented to further simplify this:

    z = x | y          # NOTE: 3.9+ ONLY
    

Explanation

Say you have two dicts and you want to merge them into a new dict without altering the original dicts:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dict’s values overwriting those from the first.

>>> z
{'a': 1, 'b': 3, 'c': 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, 'foo': 1, 'bar': 2, **y}

and now:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into What’s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backwards compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x‘s values, thus 'b' will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5, or need to write backward-compatible code, and you want this in a single expression, the most performant while correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an undefined number of dicts, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dicts. e.g. given dicts a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key value pairs in g will take precedence over dicts a to f, and so on.

Critiques of Other Answers

Don’t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you’re adding two dict_items objects together, not two lists –

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don’t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here’s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor, and is very fast and memory efficient (even slightly more-so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it’s difficult to read, it’s not the intended usage, and so it is not Pythonic.

Here’s an example of the usage being remediated in django.

Dicts are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as “cool hack” for “call x.update(y) and return x”. Personally I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dicts for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{'a': 1, 'b': 10, 'c': 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-comming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact ** was designed precisely to pass dicts as keywords.

Again, it doesn’t work for 3 when keys are non-strings. The implicit calling contract is that namespaces take ordinary dicts, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

This inconsistency was bad given other implementations of Python (Pypy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we’re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged […] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word “merging” these answers describe “updating one dict with another”, and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first’s values being overwritten by the second’s – in a single expression.

Assuming two dictionary of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dicts from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a “Dictionaries of dictionaries merge”.

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dicts have precedence)

You can also chain the dicts manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items())

itertools.chain will chain the iterators over the key-value pairs in the correct order:

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))

Performance Analysis

I’m only going to do the performance analysis of the usages known to behave correctly.

import timeit

The following is done on Ubuntu 14.04

In Python 2.7 (system Python):

>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.5726828575134277
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.163769006729126
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.iteritems(), y.iteritems()))))
1.1614501476287842
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
2.2345519065856934

In Python 3.5 (deadsnakes PPA):

>>> min(timeit.repeat(lambda: {**x, **y}))
0.4094954460160807
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.7881555100320838
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.4525277839857154
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.items(), y.items()))))
2.3143140770262107
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
3.2069112799945287

Resources on Dictionaries


回答 1

您的情况是:

z = dict(x.items() + y.items())

可以根据需要将最终的dict放入中z,并使key的值b被第二(y)dict的值正确覆盖:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(x.items() + y.items())
>>> z
{'a': 1, 'c': 11, 'b': 10}

如果您使用Python 3,它只会稍微复杂一点。创建z

>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{'a': 1, 'c': 11, 'b': 10}

如果您使用Python版本3.9.0a4或更高版本,则可以直接使用:

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = x | y
print(z)

Output: {'a': 1, 'c': 11, 'b': 10}

In your case, what you can do is:

z = dict(x.items() + y.items())

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict’s value:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(x.items() + y.items())
>>> z
{'a': 1, 'c': 11, 'b': 10}

If you use Python 3, it is only a little more complicated. To create z:

>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{'a': 1, 'c': 11, 'b': 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = x | y
print(z)

Output: {'a': 1, 'c': 11, 'b': 10}

回答 2

替代:

z = x.copy()
z.update(y)

An alternative:

z = x.copy()
z.update(y)

回答 3

另一个更简洁的选择:

z = dict(x, **y)

注意:这已经成为一个流行的答案,但必须指出的是,如果y有任何非字符串键,那么它实际上是对CPython实现细节的滥用,并且在Python 3中不起作用,或在PyPy,IronPython或Jython中。另外,Guido也不是粉丝。因此,我不建议将此技术用于前向兼容或交叉实现的可移植代码,这实际上意味着应完全避免使用它。

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can’t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.


回答 4

这可能不是一个流行的答案,但是您几乎可以肯定不想这样做。如果要合并的副本,请使用copy(或deepcopy,具体取决于您的需求),然后进行更新。与使用.items()+ .items()进行单行创建相比,两行代码更具可读性-更具Python风格。显式胜于隐式。

此外,当您使用.items()(Python 3.0之前的版本)时,您正在创建一个新列表,其中包含字典中的项目。如果您的词典很大,那将是很大的开销(创建合并字典后将立即丢弃两个大列表)。update()可以更高效地工作,因为它可以逐项执行第二个字典。

时间方面

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()\ntemp.update(y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO出于可读性考虑,前两者之间的微小速度下降是值得的。此外,仅在Python 2.3中添加了用于字典创建的关键字参数,而copy()和update()将在较早的版本中运行。

This probably won’t be a popular answer, but you almost certainly do not want to do this. If you want a copy that’s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable – more Pythonic – than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you’re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()\ntemp.update(y)", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))\ny=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.


回答 5

在后续回答中,您询问了这两种选择的相对性能:

z1 = dict(x.items() + y.items())
z2 = dict(x, **y)

至少在我的机器上(运行Python 2.5.2的相当普通的x86_64),替代z2方法不仅更短,更简单,而且显着更快。您可以使用timeitPython随附的模块自行验证。

示例1:相同的字典将20个连续的整数映射到自身:

% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z1=dict(x.items() + y.items())'
100000 loops, best of 3: 5.67 usec per loop
% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z2=dict(x, **y)' 
100000 loops, best of 3: 1.53 usec per loop

z2胜出率约为3.5。不同的词典似乎会产生完全不同的结果,但z2似乎总是遥遥领先。(如果同一测试的结果不一致,请尝试传递-r大于默认值3的数字。)

示例2:非重叠字典将252个短字符串映射为整数,反之亦然:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z1=dict(x.items() + y.items())'
1000 loops, best of 3: 260 usec per loop
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z2=dict(x, **y)'               
10000 loops, best of 3: 26.9 usec per loop

z2 赢了大约10倍。这是我书中相当大的胜利!

比较z1完这两个项目后,我想知道的不佳表现是否可以归因于构建两个项目列表的开销,这反过来又让我想知道这种变化是否会更好:

from itertools import chain
z3 = dict(chain(x.iteritems(), y.iteritems()))

一些快速测试,例如

% python -m timeit -s 'from itertools import chain; from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z3=dict(chain(x.iteritems(), y.iteritems()))'
10000 loops, best of 3: 66 usec per loop

我得出的结论是,z3它的速度要比速度快z1,但不及速度z2。绝对不值得所有额外的打字。

讨论中仍然缺少一些重要的内容,这是将这些替代方法与合并两个列表的“明显”方法的性能比较:使用该update方法。为了使事情与表达式保持一致,而不会修改x或y,我将制作x的副本,而不是就地对其进行修改,如下所示:

z0 = dict(x)
z0.update(y)

典型结果:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z0=dict(x); z0.update(y)'
10000 loops, best of 3: 26.9 usec per loop

换句话说,z0并且z2似乎具有基本相同的性能。您认为这可能是巧合吗?我不….

实际上,我什至声称纯粹的Python代码不可能做到比这更好。而且,如果您可以在C扩展模块中做得更好,我想Python人士可能会对将您的代码(或方法的变体)并入Python核心感兴趣。Python dict在很多地方都使用过;优化运营非常重要。

您也可以这样写

z0 = x.copy()
z0.update(y)

就像Tony一样,但是(并不奇怪)表示法上的差异对性能没有任何可测量的影响。使用对您而言合适的任何一种。当然,他绝对正确地指出,两语句版本更容易理解。

In a follow-up answer, you asked about the relative performance of these two alternatives:

z1 = dict(x.items() + y.items())
z2 = dict(x, **y)

On my machine, at least (a fairly ordinary x86_64 running Python 2.5.2), alternative z2 is not only shorter and simpler but also significantly faster. You can verify this for yourself using the timeit module that comes with Python.

Example 1: identical dictionaries mapping 20 consecutive integers to themselves:

% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z1=dict(x.items() + y.items())'
100000 loops, best of 3: 5.67 usec per loop
% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z2=dict(x, **y)' 
100000 loops, best of 3: 1.53 usec per loop

z2 wins by a factor of 3.5 or so. Different dictionaries seem to yield quite different results, but z2 always seems to come out ahead. (If you get inconsistent results for the same test, try passing in -r with a number larger than the default 3.)

Example 2: non-overlapping dictionaries mapping 252 short strings to integers and vice versa:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z1=dict(x.items() + y.items())'
1000 loops, best of 3: 260 usec per loop
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z2=dict(x, **y)'               
10000 loops, best of 3: 26.9 usec per loop

z2 wins by about a factor of 10. That’s a pretty big win in my book!

After comparing those two, I wondered if z1‘s poor performance could be attributed to the overhead of constructing the two item lists, which in turn led me to wonder if this variation might work better:

from itertools import chain
z3 = dict(chain(x.iteritems(), y.iteritems()))

A few quick tests, e.g.

% python -m timeit -s 'from itertools import chain; from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z3=dict(chain(x.iteritems(), y.iteritems()))'
10000 loops, best of 3: 66 usec per loop

lead me to conclude that z3 is somewhat faster than z1, but not nearly as fast as z2. Definitely not worth all the extra typing.

This discussion is still missing something important, which is a performance comparison of these alternatives with the “obvious” way of merging two lists: using the update method. To try to keep things on an equal footing with the expressions, none of which modify x or y, I’m going to make a copy of x instead of modifying it in-place, as follows:

z0 = dict(x)
z0.update(y)

A typical result:

% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z0=dict(x); z0.update(y)'
10000 loops, best of 3: 26.9 usec per loop

In other words, z0 and z2 seem to have essentially identical performance. Do you think this might be a coincidence? I don’t….

In fact, I’d go so far as to claim that it’s impossible for pure Python code to do any better than this. And if you can do significantly better in a C extension module, I imagine the Python folks might well be interested in incorporating your code (or a variation on your approach) into the Python core. Python uses dict in lots of places; optimizing its operations is a big deal.

You could also write this as

z0 = x.copy()
z0.update(y)

as Tony does, but (not surprisingly) the difference in notation turns out not to have any measurable effect on performance. Use whichever looks right to you. Of course, he’s absolutely correct to point out that the two-statement version is much easier to understand.


回答 6

在Python 3.0及更高版本中,您可以使用collections.ChainMap将多个字典或其他映射组合在一起的方式来创建一个可更新的视图:

>>> from collections import ChainMap
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(ChainMap({}, y, x))
>>> for k, v in z.items():
        print(k, '-->', v)

a --> 1
b --> 10
c --> 11

适用于Python 3.5和更高版本的更新:可以使用PEP 448扩展词典打包和拆包。快速简便:

>>> x = {'a':1, 'b': 2}
>>> y = y = {'b':10, 'c': 11}
>>> {**x, **y}
{'a': 1, 'b': 10, 'c': 11}

In Python 3.0 and later, you can use collections.ChainMap which groups multiple dicts or other mappings together to create a single, updateable view:

>>> from collections import ChainMap
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = dict(ChainMap({}, y, x))
>>> for k, v in z.items():
        print(k, '-->', v)

a --> 1
b --> 10
c --> 11

Update for Python 3.5 and later: You can use PEP 448 extended dictionary packing and unpacking. This is fast and easy:

>>> x = {'a':1, 'b': 2}
>>> y = y = {'b':10, 'c': 11}
>>> {**x, **y}
{'a': 1, 'b': 10, 'c': 11}

回答 7

我想要类似的东西,但是能够指定如何合并重复键上的值,所以我破解了这个(但并未对其进行大量测试)。显然,这不是单个表达式,而是单个函数调用。

def merge(d1, d2, merge_fn=lambda x,y:y):
    """
    Merges two dictionaries, non-destructively, combining 
    values on duplicate keys as defined by the optional merge
    function.  The default behavior replaces the values in d1
    with corresponding values in d2.  (There is no other generally
    applicable merge strategy, but often you'll have homogeneous 
    types in your dicts, so specifying a merge technique can be 
    valuable.)

    Examples:

    >>> d1
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1)
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1, lambda x,y: x+y)
    {'a': 2, 'c': 6, 'b': 4}

    """
    result = dict(d1)
    for k,v in d2.iteritems():
        if k in result:
            result[k] = merge_fn(result[k], v)
        else:
            result[k] = v
    return result

I wanted something similar, but with the ability to specify how the values on duplicate keys were merged, so I hacked this out (but did not heavily test it). Obviously this is not a single expression, but it is a single function call.

def merge(d1, d2, merge_fn=lambda x,y:y):
    """
    Merges two dictionaries, non-destructively, combining 
    values on duplicate keys as defined by the optional merge
    function.  The default behavior replaces the values in d1
    with corresponding values in d2.  (There is no other generally
    applicable merge strategy, but often you'll have homogeneous 
    types in your dicts, so specifying a merge technique can be 
    valuable.)

    Examples:

    >>> d1
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1)
    {'a': 1, 'c': 3, 'b': 2}
    >>> merge(d1, d1, lambda x,y: x+y)
    {'a': 2, 'c': 6, 'b': 4}

    """
    result = dict(d1)
    for k,v in d2.iteritems():
        if k in result:
            result[k] = merge_fn(result[k], v)
        else:
            result[k] = v
    return result

回答 8

递归/深度更新字典

def deepupdate(original, update):
    """
    Recursively update a dict.
    Subdict's won't be overwritten but also updated.
    """
    for key, value in original.iteritems(): 
        if key not in update:
            update[key] = value
        elif isinstance(value, dict):
            deepupdate(value, update[key]) 
    return update

示范:

pluto_original = {
    'name': 'Pluto',
    'details': {
        'tail': True,
        'color': 'orange'
    }
}

pluto_update = {
    'name': 'Pluutoo',
    'details': {
        'color': 'blue'
    }
}

print deepupdate(pluto_original, pluto_update)

输出:

{
    'name': 'Pluutoo',
    'details': {
        'color': 'blue',
        'tail': True
    }
}

感谢rednaw的编辑。

Recursively/deep update a dict

def deepupdate(original, update):
    """
    Recursively update a dict.
    Subdict's won't be overwritten but also updated.
    """
    for key, value in original.iteritems(): 
        if key not in update:
            update[key] = value
        elif isinstance(value, dict):
            deepupdate(value, update[key]) 
    return update

Demonstration:

pluto_original = {
    'name': 'Pluto',
    'details': {
        'tail': True,
        'color': 'orange'
    }
}

pluto_update = {
    'name': 'Pluutoo',
    'details': {
        'color': 'blue'
    }
}

print deepupdate(pluto_original, pluto_update)

Outputs:

{
    'name': 'Pluutoo',
    'details': {
        'color': 'blue',
        'tail': True
    }
}

Thanks rednaw for edits.


回答 9

我不使用副本时可能想到的最佳版本是:

from itertools import chain
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
dict(chain(x.iteritems(), y.iteritems()))

至少在CPython上,它比快,dict(x.items() + y.items())但没有快n = copy(a); n.update(b)。如果更改iteritems()items(),则此版本在Python 3中也可以使用,这是2to3工具自动完成的。

我个人最喜欢这个版本,因为它用一种功能语法很好地描述了我想要的内容。唯一的小问题是,来自y的值优先于来自x的值并不能完全清楚,但是我不认为很难弄清楚。

The best version I could think while not using copy would be:

from itertools import chain
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
dict(chain(x.iteritems(), y.iteritems()))

It’s faster than dict(x.items() + y.items()) but not as fast as n = copy(a); n.update(b), at least on CPython. This version also works in Python 3 if you change iteritems() to items(), which is automatically done by the 2to3 tool.

Personally I like this version best because it describes fairly good what I want in a single functional syntax. The only minor problem is that it doesn’t make completely obvious that values from y takes precedence over values from x, but I don’t believe it’s difficult to figure that out.


回答 10

Python 3.5(PEP 448)允许使用更好的语法选项:

x = {'a': 1, 'b': 1}
y = {'a': 2, 'c': 2}
final = {**x, **y} 
final
# {'a': 2, 'b': 1, 'c': 2}

甚至

final = {'a': 1, 'b': 1, **x, **y}

在Python 3.9中,您还可以使用| 和| =以及下面来自PEP 584的示例

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
d | e
# {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Python 3.5 (PEP 448) allows a nicer syntax option:

x = {'a': 1, 'b': 1}
y = {'a': 2, 'c': 2}
final = {**x, **y} 
final
# {'a': 2, 'b': 1, 'c': 2}

Or even

final = {'a': 1, 'b': 1, **x, **y}

In Python 3.9 you also use | and |= with the below example from PEP 584

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
d | e
# {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

回答 11

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = dict(x.items() + y.items())
print z

对于两个字典中都有键的项目,您可以通过将最后一个放在输出中来控制哪一个在输出中结束。

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = dict(x.items() + y.items())
print z

For items with keys in both dictionaries (‘b’), you can control which one ends up in the output by putting that one last.


回答 12

虽然已经多次回答了该问题,但尚未列出此问题的简单解决方案。

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z4 = {}
z4.update(x)
z4.update(y)

它与上面提到的z0和邪恶z2一样快,但易于理解和更改。

While the question has already been answered several times, this simple solution to the problem has not been listed yet.

x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z4 = {}
z4.update(x)
z4.update(y)

It is as fast as z0 and the evil z2 mentioned above, but easy to understand and change.


回答 13

def dict_merge(a, b):
  c = a.copy()
  c.update(b)
  return c

new = dict_merge(old, extras)

在这些阴暗而可疑的答案中,这个出色的例子是在Python中合并字典的唯一且唯一的好方法,这是独裁者终身支持的Guido van Rossum本人!有人提出了一半的建议,但没有将其放在函数中。

print dict_merge(
      {'color':'red', 'model':'Mini'},
      {'model':'Ferrari', 'owner':'Carl'})

给出:

{'color': 'red', 'owner': 'Carl', 'model': 'Ferrari'}
def dict_merge(a, b):
  c = a.copy()
  c.update(b)
  return c

new = dict_merge(old, extras)

Among such shady and dubious answers, this shining example is the one and only good way to merge dicts in Python, endorsed by dictator for life Guido van Rossum himself! Someone else suggested half of this, but did not put it in a function.

print dict_merge(
      {'color':'red', 'model':'Mini'},
      {'model':'Ferrari', 'owner':'Carl'})

gives:

{'color': 'red', 'owner': 'Carl', 'model': 'Ferrari'}

回答 14

如果您认为lambda是邪恶的,那么请继续阅读。根据要求,您可以使用一个表达式编写快速而高效的内存解决方案:

x = {'a':1, 'b':2}
y = {'b':10, 'c':11}
z = (lambda a, b: (lambda a_copy: a_copy.update(b) or a_copy)(a.copy()))(x, y)
print z
{'a': 1, 'c': 11, 'b': 10}
print x
{'a': 1, 'b': 2}

如上所述,使用两行或编写函数可能是更好的方法。

If you think lambdas are evil then read no further. As requested, you can write the fast and memory-efficient solution with one expression:

x = {'a':1, 'b':2}
y = {'b':10, 'c':11}
z = (lambda a, b: (lambda a_copy: a_copy.update(b) or a_copy)(a.copy()))(x, y)
print z
{'a': 1, 'c': 11, 'b': 10}
print x
{'a': 1, 'b': 2}

As suggested above, using two lines or writing a function is probably a better way to go.


回答 15

是pythonic。使用理解

z={i:d[i] for d in [x,y] for i in d}

>>> print z
{'a': 1, 'c': 11, 'b': 10}

Be pythonic. Use a comprehension:

z={i:d[i] for d in [x,y] for i in d}

>>> print z
{'a': 1, 'c': 11, 'b': 10}

回答 16

在python3中,该items方法不再返回list,而是返回一个view,其作用类似于set。在这种情况下,您将需要使用set联合,因为与的连接+将不起作用:

dict(x.items() | y.items())

对于2.7版中的类似python3的行为,该viewitems方法应代替items

dict(x.viewitems() | y.viewitems())

无论如何,我还是更喜欢这种表示法,因为将其视为固定的联合运算而不是串联(如标题所示)似乎更为自然。

编辑:

对于python 3还有几点。首先,请注意,dict(x, **y)除非输入的键y是字符串,否则该技巧在python 3中将不起作用。

而且,Raymond Hettinger的Chainmap 答案非常优雅,因为它可以将任意数量的dicts作为参数,但是从文档中看,它似乎依次遍历了每次查找的所有dicts列表:

查找顺序搜索基础映射,直到找到密钥为止。

如果您的应用程序中有很多查找,这可能会减慢您的速度:

In [1]: from collections import ChainMap
In [2]: from string import ascii_uppercase as up, ascii_lowercase as lo; x = dict(zip(lo, up)); y = dict(zip(up, lo))
In [3]: chainmap_dict = ChainMap(y, x)
In [4]: union_dict = dict(x.items() | y.items())
In [5]: timeit for k in union_dict: union_dict[k]
100000 loops, best of 3: 2.15 µs per loop
In [6]: timeit for k in chainmap_dict: chainmap_dict[k]
10000 loops, best of 3: 27.1 µs per loop

因此,查找速度要慢一个数量级。我是Chainmap的粉丝,但在可能有很多查找的地方看起来不太实用。

In python3, the items method no longer returns a list, but rather a view, which acts like a set. In this case you’ll need to take the set union since concatenating with + won’t work:

dict(x.items() | y.items())

For python3-like behavior in version 2.7, the viewitems method should work in place of items:

dict(x.viewitems() | y.viewitems())

I prefer this notation anyways since it seems more natural to think of it as a set union operation rather than concatenation (as the title shows).

Edit:

A couple more points for python 3. First, note that the dict(x, **y) trick won’t work in python 3 unless the keys in y are strings.

Also, Raymond Hettinger’s Chainmap answer is pretty elegant, since it can take an arbitrary number of dicts as arguments, but from the docs it looks like it sequentially looks through a list of all the dicts for each lookup:

Lookups search the underlying mappings successively until a key is found.

This can slow you down if you have a lot of lookups in your application:

In [1]: from collections import ChainMap
In [2]: from string import ascii_uppercase as up, ascii_lowercase as lo; x = dict(zip(lo, up)); y = dict(zip(up, lo))
In [3]: chainmap_dict = ChainMap(y, x)
In [4]: union_dict = dict(x.items() | y.items())
In [5]: timeit for k in union_dict: union_dict[k]
100000 loops, best of 3: 2.15 µs per loop
In [6]: timeit for k in chainmap_dict: chainmap_dict[k]
10000 loops, best of 3: 27.1 µs per loop

So about an order of magnitude slower for lookups. I’m a fan of Chainmap, but looks less practical where there may be many lookups.


回答 17

滥用导致马修回答的单一表达解决方案:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (lambda f=x.copy(): (f.update(y), f)[1])()
>>> z
{'a': 1, 'c': 11, 'b': 10}

您说您想要一个表达式,所以我滥用lambda了绑定一个名称,并使用元组重写了lambda的一个表达式的限制。随时畏缩。

如果您不关心复制它,当然也可以这样做:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (x.update(y), x)[1]
>>> z
{'a': 1, 'b': 10, 'c': 11}

Abuse leading to a one-expression solution for Matthew’s answer:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (lambda f=x.copy(): (f.update(y), f)[1])()
>>> z
{'a': 1, 'c': 11, 'b': 10}

You said you wanted one expression, so I abused lambda to bind a name, and tuples to override lambda’s one-expression limit. Feel free to cringe.

You could also do this of course if you don’t care about copying it:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = (x.update(y), x)[1]
>>> z
{'a': 1, 'b': 10, 'c': 11}

回答 18

使用itertools的简单解决方案,该命令可保留顺序(后格优先)

import itertools as it
merge = lambda *args: dict(it.chain.from_iterable(it.imap(dict.iteritems, args)))

它的用法是:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> merge(x, y)
{'a': 1, 'b': 10, 'c': 11}

>>> z = {'c': 3, 'd': 4}
>>> merge(x, y, z)
{'a': 1, 'b': 10, 'c': 3, 'd': 4}

Simple solution using itertools that preserves order (latter dicts have precedence)

import itertools as it
merge = lambda *args: dict(it.chain.from_iterable(it.imap(dict.iteritems, args)))

And it’s usage:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> merge(x, y)
{'a': 1, 'b': 10, 'c': 11}

>>> z = {'c': 3, 'd': 4}
>>> merge(x, y, z)
{'a': 1, 'b': 10, 'c': 3, 'd': 4}

回答 19

两本字典

def union2(dict1, dict2):
    return dict(list(dict1.items()) + list(dict2.items()))

n词典

def union(*dicts):
    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))

sum性能不佳。参见https://mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python/

Two dictionaries

def union2(dict1, dict2):
    return dict(list(dict1.items()) + list(dict2.items()))

n dictionaries

def union(*dicts):
    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))

sum has bad performance. See https://mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python/


回答 20

即使答案对于此浅表字典而言是好的,但此处定义的方法均未进行深字典合并。

示例如下:

a = { 'one': { 'depth_2': True }, 'two': True }
b = { 'one': { 'extra': False } }
print dict(a.items() + b.items())

人们会期望这样的结果:

{ 'one': { 'extra': False', 'depth_2': True }, 'two': True }

相反,我们得到以下信息:

{'two': True, 'one': {'extra': False}}

如果“ one”条目确实是合并的,则其字典中的项目应具有“ depth_2”和“ extra”作为条目。

也使用链,不起作用:

from itertools import chain
print dict(chain(a.iteritems(), b.iteritems()))

结果是:

{'two': True, 'one': {'extra': False}}

rcwesick进行的深度合并也产生相同的结果。

是的,可以合并示例字典,但是它们都不是合并的通用机制。一旦编写了可以真正合并的方法,我将在以后进行更新。

Even though the answers were good for this shallow dictionary, none of the methods defined here actually do a deep dictionary merge.

Examples follow:

a = { 'one': { 'depth_2': True }, 'two': True }
b = { 'one': { 'extra': False } }
print dict(a.items() + b.items())

One would expect a result of something like this:

{ 'one': { 'extra': False', 'depth_2': True }, 'two': True }

Instead, we get this:

{'two': True, 'one': {'extra': False}}

The ‘one’ entry should have had ‘depth_2’ and ‘extra’ as items inside its dictionary if it truly was a merge.

Using chain also, does not work:

from itertools import chain
print dict(chain(a.iteritems(), b.iteritems()))

Results in:

{'two': True, 'one': {'extra': False}}

The deep merge that rcwesick gave also creates the same result.

Yes, it will work to merge the sample dictionaries, but none of them are a generic mechanism to merge. I’ll update this later once I write a method that does a true merge.


回答 21

(仅适用于Python2.7 *;对于Python3 *有更简单的解决方案。)

如果您不反对导入标准库模块,则可以执行

from functools import reduce

def merge_dicts(*dicts):
    return reduce(lambda a, d: a.update(d) or a, dicts, {})

(由于总是返回成功,所以中的or alambda是必需的。)dict.updateNone

(For Python2.7* only; there are simpler solutions for Python3*.)

If you’re not averse to importing a standard library module, you can do

from functools import reduce

def merge_dicts(*dicts):
    return reduce(lambda a, d: a.update(d) or a, dicts, {})

(The or a bit in the lambda is necessary because dict.update always returns None on success.)


回答 22

如果您不介意变异x

x.update(y) or x

简单,可读,高效。您知道 update()总是会返回None,这是一个错误的值。因此,上述表达式x在更新后将始终等于。

标准库中的变异方法(如.update()None按约定返回,因此该模式也适用于这些方法。如果您使用的方法不遵循此约定,则or可能无法正常工作。但是,您可以使用元组显示和索引来使其成为单个表达式。无论第一个元素的计算结果如何,此方法都有效。

(x.update(y), x)[-1]

如果还没有x变量,则可以使用lambda本地变量而不使用赋值语句。这相当于lambda用作let表达式,这是功能语言中的一种常用技术,但可能不是Python语言。

(lambda x: x.update(y) or x)({'a': 1, 'b': 2})

尽管与下面使用新的walrus运算符(仅适用于Python 3.8+)没有什么不同:

(x := {'a': 1, 'b': 2}).update(y) or x

如果您确实想要复制,则PEP 448样式最简单{**x, **y}。但是,如果您的(旧)Python版本中没有该功能,则let模式也可以在这里使用。

(lambda z: z.update(y) or z)(x.copy())

(当然,这等效于(z := x.copy()).update(y) or z,但是如果您的Python版本足够新,则可以使用PEP 448样式。)

If you don’t mind mutating x,

x.update(y) or x

Simple, readable, performant. You know update() always returns None, which is a false value. So the above expression will always evaluate to x, after updating it.

Mutating methods in the standard library (like .update()) return None by convention, so this pattern will work on those too. If you’re using a method that doesn’t follow this convention, then or may not work. But, you can use a tuple display and index to make it a single expression, instead. This works regardless of what the first element evaluates to.

(x.update(y), x)[-1]

If you don’t have x in a variable yet, you can use lambda to make a local without using an assignment statement. This amounts to using lambda as a let expression, which is a common technique in functional languages, but maybe unpythonic.

(lambda x: x.update(y) or x)({'a': 1, 'b': 2})

Although it’s not that different from the following use of the new walrus operator (Python 3.8+ only):

(x := {'a': 1, 'b': 2}).update(y) or x

If you do want a copy, PEP 448 style is easiest {**x, **y}. But if that’s not available in your (older) Python version, the let pattern works here too.

(lambda z: z.update(y) or z)(x.copy())

(That is, of course, equivalent to (z := x.copy()).update(y) or z, but if your Python version is new enough for that, then the PEP 448 style will be available.)


回答 23

借鉴这里和其他地方的想法,我理解了一个函数:

def merge(*dicts, **kv): 
      return { k:v for d in list(dicts) + [kv] for k,v in d.items() }

用法(在python 3中测试):

assert (merge({1:11,'a':'aaa'},{1:99, 'b':'bbb'},foo='bar')==\
    {1: 99, 'foo': 'bar', 'b': 'bbb', 'a': 'aaa'})

assert (merge(foo='bar')=={'foo': 'bar'})

assert (merge({1:11},{1:99},foo='bar',baz='quux')==\
    {1: 99, 'foo': 'bar', 'baz':'quux'})

assert (merge({1:11},{1:99})=={1: 99})

您可以改用lambda。

Drawing on ideas here and elsewhere I’ve comprehended a function:

def merge(*dicts, **kv): 
      return { k:v for d in list(dicts) + [kv] for k,v in d.items() }

Usage (tested in python 3):

assert (merge({1:11,'a':'aaa'},{1:99, 'b':'bbb'},foo='bar')==\
    {1: 99, 'foo': 'bar', 'b': 'bbb', 'a': 'aaa'})

assert (merge(foo='bar')=={'foo': 'bar'})

assert (merge({1:11},{1:99},foo='bar',baz='quux')==\
    {1: 99, 'foo': 'bar', 'baz':'quux'})

assert (merge({1:11},{1:99})=={1: 99})

You could use a lambda instead.


回答 24

迄今为止,我列出的解决方案存在的问题是,在合并的词典中,键“ b”的值是10,但按照我的想法,应该是12。鉴于此,我提出以下内容:

import timeit

n=100000
su = """
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
"""

def timeMerge(f,su,niter):
    print "{:4f} sec for: {:30s}".format(timeit.Timer(f,setup=su).timeit(n),f)

timeMerge("dict(x, **y)",su,n)
timeMerge("x.update(y)",su,n)
timeMerge("dict(x.items() + y.items())",su,n)
timeMerge("for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k] ",su,n)

#confirm for loop adds b entries together
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]
print "confirm b elements are added:",x

结果:

0.049465 sec for: dict(x, **y)
0.033729 sec for: x.update(y)                   
0.150380 sec for: dict(x.items() + y.items())   
0.083120 sec for: for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]

confirm b elements are added: {'a': 1, 'c': 11, 'b': 12}

The problem I have with solutions listed to date is that, in the merged dictionary, the value for key “b” is 10 but, to my way of thinking, it should be 12. In that light, I present the following:

import timeit

n=100000
su = """
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
"""

def timeMerge(f,su,niter):
    print "{:4f} sec for: {:30s}".format(timeit.Timer(f,setup=su).timeit(n),f)

timeMerge("dict(x, **y)",su,n)
timeMerge("x.update(y)",su,n)
timeMerge("dict(x.items() + y.items())",su,n)
timeMerge("for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k] ",su,n)

#confirm for loop adds b entries together
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]
print "confirm b elements are added:",x

Results:

0.049465 sec for: dict(x, **y)
0.033729 sec for: x.update(y)                   
0.150380 sec for: dict(x.items() + y.items())   
0.083120 sec for: for k in y.keys(): x[k] = k in x and x[k]+y[k] or y[k]

confirm b elements are added: {'a': 1, 'c': 11, 'b': 12}

回答 25

真傻,.update什么也没返回。
我只是使用一个简单的辅助函数来解决问题:

def merge(dict1,*dicts):
    for dict2 in dicts:
        dict1.update(dict2)
    return dict1

例子:

merge(dict1,dict2)
merge(dict1,dict2,dict3)
merge(dict1,dict2,dict3,dict4)
merge({},dict1,dict2)  # this one returns a new copy

It’s so silly that .update returns nothing.
I just use a simple helper function to solve the problem:

def merge(dict1,*dicts):
    for dict2 in dicts:
        dict1.update(dict2)
    return dict1

Examples:

merge(dict1,dict2)
merge(dict1,dict2,dict3)
merge(dict1,dict2,dict3,dict4)
merge({},dict1,dict2)  # this one returns a new copy

回答 26

from collections import Counter
dict1 = {'a':1, 'b': 2}
dict2 = {'b':10, 'c': 11}
result = dict(Counter(dict1) + Counter(dict2))

这应该可以解决您的问题。

from collections import Counter
dict1 = {'a':1, 'b': 2}
dict2 = {'b':10, 'c': 11}
result = dict(Counter(dict1) + Counter(dict2))

This should solve your problem.


回答 27

这可以通过单个dict理解来完成:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> { key: y[key] if key in y else x[key]
      for key in set(x) + set(y)
    }

在我看来,“单个表达式”部分的最佳答案是因为不需要额外的功能,而且它很简短。

This can be done with a single dict comprehension:

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> { key: y[key] if key in y else x[key]
      for key in set(x) + set(y)
    }

In my view the best answer for the ‘single expression’ part as no extra functions are needed, and it is short.


回答 28

由于PEP 572:Assignment Expressions,Python 3.8发行版(计划于2019年10月20日)将提供一个新选项。新的赋值表达式运算符使您可以分配的结果,并仍然使用它来调用,从而使组合的代码成为单个表达式,而不是两个语句,从而进行了更改::=copyupdate

newdict = dict1.copy()
newdict.update(dict2)

至:

(newdict := dict1.copy()).update(dict2)

同时在各个方面都表现相同。如果还必须返回结果dict(您要求返回的表达式dict;上面创建并分配给newdict,但没有返回,因此您不能使用它将参数直接传递给函数la myfunc((newdict := dict1.copy()).update(dict2))) ,然后将其添加or newdict到末尾(因为updatereturns None是虚假的,因此它将求值并newdict作为表达式的结果返回):

(newdict := dict1.copy()).update(dict2) or newdict

重要警告:通常,我不建议采用以下方法:

newdict = {**dict1, **dict2}

拆包方法更清晰(对于一开始就知道要进行广义拆包的人来说,应该这样),根本不需要名称(因此,构造一个立即传递给a的临时文件时,它会更加简洁。函数或包含在list/ tuple文字等中),并且几乎肯定也更快,因为(在CPython上)大致等同于:

newdict = {}
newdict.update(dict1)
newdict.update(dict2)

但使用具体的dictAPI 在C层完成,因此不涉及动态方法查找/绑定或函数调用分派开销(在此(newdict := dict1.copy()).update(dict2)情况下,行为不可避免地与原始的两层相同,在不连续的步骤中执行工作,并进行动态查找/绑定/方法的调用。

它也更可扩展,因为合并三个dicts是显而易见的:

 newdict = {**dict1, **dict2, **dict3}

使用赋值表达式不会像这样缩放的地方;您能得到的最接近的是:

 (newdict := dict1.copy()).update(dict2), newdict.update(dict3)

或没有Nones 的临时元组,但对每个None结果进行真实性测试:

 (newdict := dict1.copy()).update(dict2) or newdict.update(dict3)

其中的任一个是明显更恶心,并且包括进一步的低效(或者是临时浪费tupleNoneS表示逗号分离,或每个的无意义感实性测试updateNone用于返回or分离)。

赋值表达式方法的唯一真正优势在于:

  1. 您有需要同时处理sets和dicts的通用代码(它们都支持copyupdate,因此代码可以按您期望的那样大致工作)
  2. 您期望接收任意类似dict的对象,而不仅仅是dict自身,并且必须保留左侧的类型和语义(而不是以简单的结尾dict)。尽管myspecialdict({**speciala, **specialb})可能会起作用,但它会涉及一个额外的临时操作dict,并且如果myspecialdict具有平原dict无法保留的功能(例如,常规dicts现在基于键的首次出现保留顺序,而基于键的最后出现保留值;您可能想要一个根据最后一个保留订单键的外观,因此更新值也会将其移到末尾),那么语义将是错误的。由于赋值表达式版本使用命名方法(可能会重载以使其正常运行),因此它根本不会创建一个dict(除非dict1已经是一个dict),并保留原始类型(和原始类型的语义),同时避免任何临时性。

There will be a new option when Python 3.8 releases (scheduled for 20 October, 2019), thanks to PEP 572: Assignment Expressions. The new assignment expression operator := allows you to assign the result of the copy and still use it to call update, leaving the combined code a single expression, rather than two statements, changing:

newdict = dict1.copy()
newdict.update(dict2)

to:

(newdict := dict1.copy()).update(dict2)

while behaving identically in every way. If you must also return the resulting dict (you asked for an expression returning the dict; the above creates and assigns to newdict, but doesn’t return it, so you couldn’t use it to pass an argument to a function as is, a la myfunc((newdict := dict1.copy()).update(dict2))), then just add or newdict to the end (since update returns None, which is falsy, it will then evaluate and return newdict as the result of the expression):

(newdict := dict1.copy()).update(dict2) or newdict

Important caveat: In general, I’d discourage this approach in favor of:

newdict = {**dict1, **dict2}

The unpacking approach is clearer (to anyone who knows about generalized unpacking in the first place, which you should), doesn’t require a name for the result at all (so it’s much more concise when constructing a temporary that is immediately passed to a function or included in a list/tuple literal or the like), and is almost certainly faster as well, being (on CPython) roughly equivalent to:

newdict = {}
newdict.update(dict1)
newdict.update(dict2)

but done at the C layer, using the concrete dict API, so no dynamic method lookup/binding or function call dispatch overhead is involved (where (newdict := dict1.copy()).update(dict2) is unavoidably identical to the original two-liner in behavior, performing the work in discrete steps, with dynamic lookup/binding/invocation of methods.

It’s also more extensible, as merging three dicts is obvious:

 newdict = {**dict1, **dict2, **dict3}

where using assignment expressions won’t scale like that; the closest you could get would be:

 (newdict := dict1.copy()).update(dict2), newdict.update(dict3)

or without the temporary tuple of Nones, but with truthiness testing of each None result:

 (newdict := dict1.copy()).update(dict2) or newdict.update(dict3)

either of which is obviously much uglier, and includes further inefficiencies (either a wasted temporary tuple of Nones for comma separation, or pointless truthiness testing of each update‘s None return for or separation).

The only real advantage to the assignment expression approach occurs if:

  1. You have generic code that needs handle both sets and dicts (both of them support copy and update, so the code works roughly as you’d expect it to)
  2. You expect to receive arbitrary dict-like objects, not just dict itself, and must preserve the type and semantics of the left hand side (rather than ending up with a plain dict). While myspecialdict({**speciala, **specialb}) might work, it would involve an extra temporary dict, and if myspecialdict has features plain dict can’t preserve (e.g. regular dicts now preserve order based on the first appearance of a key, and value based on the last appearance of a key; you might want one that preserves order based on the last appearance of a key so updating a value also moves it to the end), then the semantics would be wrong. Since the assignment expression version uses the named methods (which are presumably overloaded to behave appropriately), it never creates a dict at all (unless dict1 was already a dict), preserving the original type (and original type’s semantics), all while avoiding any temporaries.

回答 29

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> x, z = dict(x), x.update(y) or x
>>> x
{'a': 1, 'b': 2}
>>> y
{'c': 11, 'b': 10}
>>> z
{'a': 1, 'c': 11, 'b': 10}
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> x, z = dict(x), x.update(y) or x
>>> x
{'a': 1, 'b': 2}
>>> y
{'c': 11, 'b': 10}
>>> z
{'a': 1, 'c': 11, 'b': 10}