问题:计算两个Python字典中包含的键的差异
假设我有两个Python字典- dictA
和dictB
。我需要找出是否有任何键存在于中,dictB
但没有dictA
。最快的方法是什么?
我应该将字典键转换为集合然后继续吗?
有兴趣了解您的想法…
感谢您的回复。
很抱歉未能正确说明我的问题。我的情况是这样的-我有一个dictA
与可能相同的dictB
密钥,或者可能缺少一些密钥,dictB
否则某些密钥的值可能会有所不同,必须将其设置为dictA
密钥的值。
问题在于字典没有标准,并且可以具有可以作为dict的值。
说
dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2:':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......
因此,必须将“ key2”值重置为新值,并在字典内部添加“ key13”。键值没有固定的格式。它可以是一个简单的值或dict或dict的dict。
回答 0
您可以在按键上使用设置操作:
diff = set(dictb.keys()) - set(dicta.keys())
这是一个查找所有可能性的类:添加了什么,删除了什么,哪些键值对相同以及哪些键值对已更改。
class DictDiffer(object):
"""
Calculate the difference between two dictionaries as:
(1) items added
(2) items removed
(3) keys same in both but changed values
(4) keys same in both and unchanged values
"""
def __init__(self, current_dict, past_dict):
self.current_dict, self.past_dict = current_dict, past_dict
self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
self.intersect = self.set_current.intersection(self.set_past)
def added(self):
return self.set_current - self.intersect
def removed(self):
return self.set_past - self.intersect
def changed(self):
return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
def unchanged(self):
return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])
这是一些示例输出:
>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print "Added:", d.added()
Added: set(['d'])
>>> print "Removed:", d.removed()
Removed: set(['c'])
>>> print "Changed:", d.changed()
Changed: set(['b'])
>>> print "Unchanged:", d.unchanged()
Unchanged: set(['a'])
可以作为github存储库使用:https : //github.com/hughdbrown/dictdiffer
回答 1
如果您需要递归的区别,我已经为python编写了一个软件包:https : //github.com/seperman/deepdiff
安装
从PyPi安装:
pip install deepdiff
用法示例
输入
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2
同一对象返回空
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}
项目类型已更改
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
'newvalue': '2',
'oldtype': <class 'int'>,
'oldvalue': 2}}}
项目的价值已更改
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
添加和/或删除项目
>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
'dic_item_removed': ['root[4]'],
'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
弦差异
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
"root[4]['b']": { 'newvalue': 'world!',
'oldvalue': 'world'}}}
弦差异2
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
'+++ \n'
'@@ -1,5 +1,4 @@\n'
'-world!\n'
'-Goodbye!\n'
'+world\n'
' 1\n'
' 2\n'
' End',
'newvalue': 'world\n1\n2\nEnd',
'oldvalue': 'world!\n'
'Goodbye!\n'
'1\n'
'2\n'
'End'}}}
>>>
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
1
2
End
类型变更
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
'newvalue': 'world\n\n\nEnd',
'oldtype': <class 'list'>,
'oldvalue': [1, 2, 3]}}}
清单差异
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}
清单差异2:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
"root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}
列出差异忽略顺序或重复项:(具有与上述相同的字典)
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}
包含字典的列表:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}
套装:
>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}
命名元组:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}
自定义对象:
>>> class ClassA(object):
... a = 1
... def __init__(self, b):
... self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
添加对象属性:
>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
回答 2
不知道它是否“快速”,但是通常情况下,可以做到这一点
dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
if not key in dictb:
print key
回答 3
就像Alex Martelli所写的那样,如果您只想检查B中的任何键是否不在A中,那any(True for k in dictB if k not in dictA)
将是您的最佳选择。
要查找缺少的密钥:
diff = set(dictB)-set(dictA) #sets
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA =
dict(zip(range(1000),range
(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop
diff = [ k for k in dictB if k not in dictA ] #lc
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA =
dict(zip(range(1000),range
(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if
k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop
因此,这两种解决方案的速度几乎相同。
回答 4
如果您确实要说的是真的(您只需要找出B中而不是A中“有任何键”的情况,那么ONES可能没有),最快的方法应该是:
if any(True for k in dictB if k not in dictA): ...
如果您实际上需要找出哪个键(如果有)在B中而不是在A中,而不仅仅是“ IF”,那么有这样的键,那么现有的答案就很合适了(但是我建议在以后的问题中更精确一些,如果那是确实是您的意思;-)。
回答 5
set(dictA.keys()).intersection(dictB.keys())
回答 6
hughdbrown的最高答案是建议使用集差异,这绝对是最好的方法:
diff = set(dictb.keys()) - set(dicta.keys())
这段代码的问题在于,它仅创建两个列表就创建了两个列表,因此浪费了4N的时间和2N的空间。它也比需要的要复杂一些。
通常,这没什么大不了的,但是如果是这样的话:
diff = dictb.keys() - dicta
- 您不需要将正确的字典转换为集合;设置差异需要任何迭代(而dict是其键的迭代)。
- 您也不需要将左字典转换为集合,因为任何符合条件的东西都
collections.abc.Mapping
具有KeysView
类似的行为Set
。
Python 2
在Python 2中,keys()
返回键列表,而不是KeysView
。因此,您必须viewkeys()
直接提出要求。
diff = dictb.viewkeys() - dicta
对于双版本2.7 / 3.x代码,希望使用six
或类似的代码,因此可以使用six.viewkeys(dictb)
:
diff = six.viewkeys(dictb) - dicta
在2.4-2.6中,没有KeysView
。但是,您可以直接从迭代器中构建左集合,而不是先构建列表,至少可以将成本从4N削减到N:
diff = set(dictb) - dicta
物品
我有一个dictA可以与dictB相同,或者与dictB相比可能缺少一些键,否则某些键的值可能不同
因此,您实际上不需要比较键,而是需要比较项。ItemsView
仅Set
当值是可哈希值(例如字符串)时,an 才是a 。如果是这样,这很容易:
diff = dictb.items() - dicta.items()
递归差异
尽管问题不是直接要求递归差异,但某些示例值是dict,并且看来预期的输出确实递归地对它们进行差异。这里已经有多个答案显示了如何执行此操作。
回答 7
关于此参数,stackoverflow中还有另一个问题,我不得不承认有一个简单的解决方案:python 的datadiff库有助于打印两个字典之间的差异。
回答 8
这是一种可行的方法,允许将键的值计算为False
,并且在可能的情况下仍使用生成器表达式尽早退出。虽然不是特别漂亮。
any(map(lambda x: True, (k for k in b if k not in a)))
编辑:
THC4k发表了对我对另一个答案的评论的回复。这是一种更好,更漂亮的方法来执行上述操作:
any(True for k in b if k not in a)
不知道那怎么没想到…
回答 9
这是一个古老的问题,要求的问题比我需要的要少,因此,此答案实际上比该问题所要求的要多。这个问题的答案帮助我解决了以下问题:
- (要求)记录两个词典之间的差异
- 将#1的差异合并到基础词典中
- (要求)合并两个字典之间的差异(将第2个字典视为差异字典)
- 尝试检测物品的移动和变化
- (要求)递归执行所有这些操作
所有这些与JSON相结合,提供了非常强大的配置存储支持。
解决方案(也在github上):
from collections import OrderedDict
from pprint import pprint
class izipDestinationMatching(object):
__slots__ = ("attr", "value", "index")
def __init__(self, attr, value, index):
self.attr, self.value, self.index = attr, value, index
def __repr__(self):
return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)
def izip_destination(a, b, attrs, addMarker=True):
"""
Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
When addMarker == False (patching), final size will be the longer of a, b
"""
for idx, item in enumerate(b):
try:
attr = next((x for x in attrs if x in item), None) # See if the item has any of the ID attributes
match, matchIdx = next(((orgItm, idx) for idx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
if match and matchIdx != idx and addMarker: item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
except:
match = None
yield (match if match else a[idx] if len(a) > idx else None), item
if not addMarker and len(a) > len(b):
for item in a[len(b) - len(a):]:
yield item, item
def dictdiff(a, b, searchAttrs=[]):
"""
returns a dictionary which represents difference from a to b
the return dict is as short as possible:
equal items are removed
added / changed items are listed
removed items are listed with value=None
Also processes list values where the resulting list size will match that of b.
It can also search said list items (that are dicts) for identity values to detect changed positions.
In case such identity value is found, it is kept so that it can be re-found during the merge phase
@param a: original dict
@param b: new dict
@param searchAttrs: list of strings (keys to search for in sub-dicts)
@return: dict / list / whatever input is
"""
if not (isinstance(a, dict) and isinstance(b, dict)):
if isinstance(a, list) and isinstance(b, list):
return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
return b
res = OrderedDict()
if izipDestinationMatching in b:
keepKey = b[izipDestinationMatching].attr
del b[izipDestinationMatching]
else:
keepKey = izipDestinationMatching
for key in sorted(set(a.keys() + b.keys())):
v1 = a.get(key, None)
v2 = b.get(key, None)
if keepKey == key or v1 != v2: res[key] = dictdiff(v1, v2, searchAttrs)
if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
return res
def dictmerge(a, b, searchAttrs=[]):
"""
Returns a dictionary which merges differences recorded in b to base dictionary a
Also processes list values where the resulting list size will match that of a
It can also search said list items (that are dicts) for identity values to detect changed positions
@param a: original dict
@param b: diff dict to patch into a
@param searchAttrs: list of strings (keys to search for in sub-dicts)
@return: dict / list / whatever input is
"""
if not (isinstance(a, dict) and isinstance(b, dict)):
if isinstance(a, list) and isinstance(b, list):
return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
return b
res = OrderedDict()
for key in sorted(set(a.keys() + b.keys())):
v1 = a.get(key, None)
v2 = b.get(key, None)
#print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
if v2 is not None: res[key] = dictmerge(v1, v2, searchAttrs)
elif key not in b: res[key] = v1
if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
return res
回答 10
怎么样标准(比较完整对象)
PyDev->新的PyDev模块->模块:单元测试
import unittest
class Test(unittest.TestCase):
def testName(self):
obj1 = {1:1, 2:2}
obj2 = {1:1, 2:2}
self.maxDiff = None # sometimes is usefull
self.assertDictEqual(d1, d2)
if __name__ == "__main__":
#import sys;sys.argv = ['', 'Test.testName']
unittest.main()
回答 11
如果在Python≥2.7上:
# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise
for k in dictA.viewkeys() & dictB.viewkeys():
if dictA[k] != dictB[k]:
dictB[k]= dictA[k]
# add missing keys to dictA
dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )
回答 12
这是深度比较两个字典键的解决方案:
def compareDictKeys(dict1, dict2):
if type(dict1) != dict or type(dict2) != dict:
return False
keys1, keys2 = dict1.keys(), dict2.keys()
diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)
if not diff:
for key in keys1:
if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
diff = True
break
return not diff
回答 13
这是一个可以比较两个以上命令的解决方案:
def diff_dict(dicts, default=None):
diff_dict = {}
# add 'list()' around 'd.keys()' for python 3 compatibility
for k in set(sum([d.keys() for d in dicts], [])):
# we can just use "values = [d.get(k, default) ..." below if
# we don't care that d1[k]=default and d2[k]=missing will
# be treated as equal
if any(k not in d for d in dicts):
diff_dict[k] = [d.get(k, default) for d in dicts]
else:
values = [d[k] for d in dicts]
if any(v != values[0] for v in values):
diff_dict[k] = values
return diff_dict
用法示例:
import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])
回答 14
我的两个字典之间的对称差异的配方:
def find_dict_diffs(dict1, dict2):
unequal_keys = []
unequal_keys.extend(set(dict1.keys()).symmetric_difference(set(dict2.keys())))
for k in dict1.keys():
if dict1.get(k, 'N\A') != dict2.get(k, 'N\A'):
unequal_keys.append(k)
if unequal_keys:
print 'param', 'dict1\t', 'dict2'
for k in set(unequal_keys):
print str(k)+'\t'+dict1.get(k, 'N\A')+'\t '+dict2.get(k, 'N\A')
else:
print 'Dicts are equal'
dict1 = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
dict2 = {1:'b', 2:'a', 3:'c', 4:'d', 6:'f'}
find_dict_diffs(dict1, dict2)
结果是:
param dict1 dict2
1 a b
2 b a
5 e N\A
6 N\A f
回答 15
正如其他答案中提到的那样,unittest可以生成一些不错的输出来比较dict,但是在此示例中,我们不需要先构建整个测试。
废弃unittest源代码,看起来您可以通过以下方式获得公平的解决方案:
import difflib
import pprint
def diff_dicts(a, b):
if a == b:
return ''
return '\n'.join(
difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
pprint.pformat(b, width=30).splitlines())
)
所以
dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print diff_dicts(dictA, dictB)
结果是:
{0: 112,
- 1: 121,
- 2: 116,
+ 1: 'spam',
+ 2: [1, 2, 3],
3: 104,
- 4: 111,
? ^
+ 4: 111}
? ^
- 5: 110}
哪里:
- ‘-‘表示第一/第二个字典中的键/值
- “ +”表示第二个而不是第一个字典中的键/值
像在单元测试中一样,唯一的警告是由于尾随逗号/括号,最终映射可以被认为是差异。
回答 16
@Maxx有一个很好的答案,请使用unittest
Python提供的工具:
import unittest
class Test(unittest.TestCase):
def runTest(self):
pass
def testDict(self, d1, d2, maxDiff=None):
self.maxDiff = maxDiff
self.assertDictEqual(d1, d2)
然后,您可以在代码中的任何位置调用:
try:
Test().testDict(dict1, dict2)
except Exception, e:
print e
结果输出看起来像来自的输出diff
,用不同的行+
或在-
每行之前添加漂亮的字典。
回答 17
不知道它是否仍然有用,但是我遇到了这个问题,我的情况是我只需要返回所有嵌套字典等的变化的字典。找不到合适的解决方案,但是我最终写了一个简单的函数要做到这一点。希望这可以帮助,
回答 18
如果您想使用内置解决方案与任意dict结构进行全面比较,@ Maxx的答案就是一个好的开始。
import unittest
test = unittest.TestCase()
test.assertEqual(dictA, dictB)
回答 19
根据ghostdog74的回答,
dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}
for value in dicta.values():
if not value in dictb.values():
print value
将打印不同的dicta值
回答 20
尝试此操作以找到de交集,即两个字典中的键,如果要在第二个字典中找不到键,只需使用not in …
intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())