Python，计算列表差异-Python 实用宝典

# Python，计算列表差异

## 问题：Python，计算列表差异

``````A = [1,2,3,4]
B = [2,5]

A - B = [1,3,4]
B - A = [5]``````

In Python, what is the best way to compute the difference between two lists?

example

``````A = [1,2,3,4]
B = [2,5]

A - B = [1,3,4]
B - A = [5]
``````

## 回答 0

``````>>> def diff(first, second):
second = set(second)
return [item for item in first if item not in second]

>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]
>>> ``````

Use `set` if you don't care about items order or repetition. Use list comprehensions if you do:

``````>>> def diff(first, second):
second = set(second)
return [item for item in first if item not in second]

>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]
>>>
``````

## 回答 1

``````>>> set([1,2,3,4]) - set([2,5])
set([1, 4, 3])
>>> set([2,5]) - set([1,2,3,4])
set([5])``````

If the order does not matter, you can simply calculate the set difference:

``````>>> set([1,2,3,4]) - set([2,5])
set([1, 4, 3])
>>> set([2,5]) - set([1,2,3,4])
set([5])
``````

## 回答 2

``list(set(A)-set(B))``

``list(set(B)-set(A))``

You can do a

``````list(set(A)-set(B))
``````

and

``````list(set(B)-set(A))
``````

## 回答 3

``````diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)``````

``````diff = lambda l1,l2: filter(lambda x: x not in l2, l1)
diff(A,B)
diff(B,A)``````

One liner:

``````diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)
``````

Or:

``````diff = lambda l1,l2: filter(lambda x: x not in l2, l1)
diff(A,B)
diff(B,A)
``````

## 回答 4

``````from difflib import SequenceMatcher

squeeze=SequenceMatcher( None, A, B )

print "A - B = [%s]"%( reduce( lambda p,q: p+q,
map( lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes() ) ) ) )``````

A-B = [[1、3、4]]

The above examples trivialized the problem of calculating differences. Assuming sorting or de-duplication definitely make it easier to compute the difference, but if your comparison cannot afford those assumptions then you'll need a non-trivial implementation of a diff algorithm. See difflib in the python standard library.

``````#! /usr/bin/python2
from difflib import SequenceMatcher

A = [1,2,3,4]
B = [2,5]

squeeze=SequenceMatcher( None, A, B )

print "A - B = [%s]"%( reduce( lambda p,q: p+q,
map( lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes() ) ) ) )
``````

Or Python3...

``````#! /usr/bin/python3
from difflib import SequenceMatcher
from functools import reduce

A = [1,2,3,4]
B = [2,5]

squeeze=SequenceMatcher( None, A, B )

print( "A - B = [%s]"%( reduce( lambda p,q: p+q,
map( lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes() ) ) ) ) )

``````

Output:

``````A - B = [[1, 3, 4]]
``````

## 回答 5

Python 2.7.3（默认，2014年2月27日，19:58:35）-IPython 1.1.0-timeit：（github gist）

``````def diff(a, b):
b = set(b)
return [aa for aa in a if aa not in b]

def set_diff(a, b):
return list(set(a) - set(b))

diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]

diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)

from difflib import SequenceMatcher
def squeezer(a, b):
squeeze = SequenceMatcher(None, a, b)
return reduce(lambda p,q: p+q, map(
lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes())))``````

``````# Small
a = range(10)
b = range(10/2)

timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop

timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop

timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop

timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop

timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop

# Medium
a = range(10**4)
b = range(10**4/2)

timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop

timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop

timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop

timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop

timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop

# Big
a = xrange(10**7)
b = xrange(10**7/2)

timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop

timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop

timeit[diff_lamb_filter(a, b)]
# too long to wait for

timeit[diff_lamb_filter(a, b)]
# too long to wait for

timeit[diff_lamb_filter(a, b)]
# TypeError: sequence index must be integer, not 'slice'``````

@ roman-bodnarchuk列表推导函数def diff（a，b）似乎更快。

Python 2.7.3 (default, Feb 27 2014, 19:58:35) - IPython 1.1.0 - timeit: (github gist)

``````def diff(a, b):
b = set(b)
return [aa for aa in a if aa not in b]

def set_diff(a, b):
return list(set(a) - set(b))

diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]

diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)

from difflib import SequenceMatcher
def squeezer(a, b):
squeeze = SequenceMatcher(None, a, b)
return reduce(lambda p,q: p+q, map(
lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes())))
``````

Results:

``````# Small
a = range(10)
b = range(10/2)

timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop

timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop

timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop

timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop

timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop

# Medium
a = range(10**4)
b = range(10**4/2)

timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop

timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop

timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop

timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop

timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop

# Big
a = xrange(10**7)
b = xrange(10**7/2)

timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop

timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop

timeit[diff_lamb_filter(a, b)]
# too long to wait for

timeit[diff_lamb_filter(a, b)]
# too long to wait for

timeit[diff_lamb_filter(a, b)]
# TypeError: sequence index must be integer, not 'slice'
``````

@roman-bodnarchuk list comprehensions function def diff(a, b) seems to be faster.

## 回答 6

``````A = [1,2,3,4]
B = [2,5]

#A - B
x = list(set(A) - set(B))
#B - A
y = list(set(B) - set(A))

print x
print y ``````
``````A = [1,2,3,4]
B = [2,5]

#A - B
x = list(set(A) - set(B))
#B - A
y = list(set(B) - set(A))

print x
print y
``````

## 回答 7

You would want to use a `set` instead of a `list`.

## 安装

``pip install deepdiff``

``pip install future six``

## 用法示例

``````>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function``````

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{}``````

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{'type_changes': ["root[2]: 2=<type 'int'> vs. 2=<type 'str'>"]}``````

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{'values_changed': ['root[2]: 2 ====>> 4']}``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes)
'dic_item_removed': ['root[4]'],
'values_changed': ['root[2]: 2 ====>> 4']}``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'values_changed': [ 'root[2]: 2 ====>> 4',
"root[4]['b']:\n--- \n+++ \n@@ -1 +1 @@\n-world\n+world!"]}
>>>
>>> print (ddiff.changes['values_changed'][1])
root[4]['b']:
---
+++
@@ -1 +1 @@
-world
+world!``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'values_changed': [ "root[4]['b']:\n--- \n+++ \n@@ -1,5 +1,4 @@\n-world!\n-Goodbye!\n+world\n 1\n 2\n End"]}
>>>
>>> print (ddiff.changes['values_changed'][0])
root[4]['b']:
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
1
2
End``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'type_changes': [ "root[4]['b']: [1, 2, 3]=<type 'list'> vs. world\n\n\nEnd=<type 'str'>"]}``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'list_removed': ["root[4]['b']: [3]"]}``````

``````>>> # Note that it DOES NOT take order into account
... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ }``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
'values_changed': ["root[4]['b'][2][1]: 1 ====>> 3"]}``````

In case you want the difference recursively going deep into items of your list, I have written a package for python: https://github.com/erasmose/deepdiff

## Installation

Install from PyPi:

``````pip install deepdiff
``````

If you are Python3 you need to also install:

``````pip install future six
``````

## Example usage

``````>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function
``````

Same object returns empty

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{}
``````

Type of an item has changed

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{'type_changes': ["root[2]: 2=<type 'int'> vs. 2=<type 'str'>"]}
``````

Value of an item has changed

``````>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{'values_changed': ['root[2]: 2 ====>> 4']}
``````

``````>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes)
'dic_item_removed': ['root[4]'],
'values_changed': ['root[2]: 2 ====>> 4']}
``````

String difference

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'values_changed': [ 'root[2]: 2 ====>> 4',
"root[4]['b']:\n--- \n+++ \n@@ -1 +1 @@\n-world\n+world!"]}
>>>
>>> print (ddiff.changes['values_changed'][1])
root[4]['b']:
---
+++
@@ -1 +1 @@
-world
+world!
``````

String difference 2

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'values_changed': [ "root[4]['b']:\n--- \n+++ \n@@ -1,5 +1,4 @@\n-world!\n-Goodbye!\n+world\n 1\n 2\n End"]}
>>>
>>> print (ddiff.changes['values_changed'][0])
root[4]['b']:
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
1
2
End
``````

Type change

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'type_changes': [ "root[4]['b']: [1, 2, 3]=<type 'list'> vs. world\n\n\nEnd=<type 'str'>"]}
``````

List difference

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'list_removed': ["root[4]['b']: [3]"]}
``````

List difference 2: Note that it DOES NOT take order into account

``````>>> # Note that it DOES NOT take order into account
... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ }
``````

List that contains dictionary:

``````>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
'values_changed': ["root[4]['b'][2][1]: 1 ====>> 3"]}
``````

## 回答 9

``````list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))``````

most simple way,

use set().difference(set())

``````list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))
``````

answer is `set([1])`

## 回答 10

``TypeError: unhashable type: 'dict'``

# 测试用例

``````def diff(a, b):
return [aa for aa in a if aa not in b]

d1 = {"a":1, "b":1}
d2 = {"a":2, "b":2}
d3 = {"a":3, "b":3}

>>> diff([d1, d2, d3], [d2, d3])
[{'a': 1, 'b': 1}]
>>> diff([d1, d2, d3], [d1])
[{'a': 2, 'b': 2}, {'a': 3, 'b': 3}]``````

In case of a list of dictionaries, the full list comprehension solution works while the `set` solution raises

``````TypeError: unhashable type: 'dict'
``````

# Test Case

``````def diff(a, b):
return [aa for aa in a if aa not in b]

d1 = {"a":1, "b":1}
d2 = {"a":2, "b":2}
d3 = {"a":3, "b":3}

>>> diff([d1, d2, d3], [d2, d3])
[{'a': 1, 'b': 1}]
>>> diff([d1, d2, d3], [d1])
[{'a': 2, 'b': 2}, {'a': 3, 'b': 3}]
``````

## 回答 11

``````a=[1,2,3,3,4]
b=[2,4]
tmp = copy.deepcopy(a)
for k in b:
if k in tmp:
tmp.remove(k)
print(tmp)``````

Simple code that gives you the difference with multiple items if you want that:

``````a=[1,2,3,3,4]
b=[2,4]
tmp = copy.deepcopy(a)
for k in b:
if k in tmp:
tmp.remove(k)
print(tmp)
``````

## 回答 12

``````# Compares the difference of list a and b
# uses a callback function to compare items
def diff(a, b, callback):
a_missing_in_b = []
ai = 0
bi = 0

a = sorted(a, callback)
b = sorted(b, callback)

while (ai < len(a)) and (bi < len(b)):

cmp = callback(a[ai], b[bi])
if cmp < 0:
a_missing_in_b.append(a[ai])
ai += 1
elif cmp > 0:
# Item b is missing in a
bi += 1
else:
# a and b intersecting on this item
ai += 1
bi += 1

# if a and b are not of same length, we need to add the remaining items
for ai in xrange(ai, len(a)):
a_missing_in_b.append(a[ai])

return a_missing_in_b``````

``````>>> a=[1,2,3]
>>> b=[2,4,6]
>>> diff(a, b, cmp)
[1, 3]``````

When having a look at TimeComplexity of In-operator, in worst case it works with O(n). Even for Sets.

So when comparing two arrays we'll have a TimeComplexity of O(n) in best case and O(n^2) in worst case.

An alternative (but unfortunately more complex) solution, which works with O(n) in best and worst case is this one:

``````# Compares the difference of list a and b
# uses a callback function to compare items
def diff(a, b, callback):
a_missing_in_b = []
ai = 0
bi = 0

a = sorted(a, callback)
b = sorted(b, callback)

while (ai < len(a)) and (bi < len(b)):

cmp = callback(a[ai], b[bi])
if cmp < 0:
a_missing_in_b.append(a[ai])
ai += 1
elif cmp > 0:
# Item b is missing in a
bi += 1
else:
# a and b intersecting on this item
ai += 1
bi += 1

# if a and b are not of same length, we need to add the remaining items
for ai in xrange(ai, len(a)):
a_missing_in_b.append(a[ai])

return a_missing_in_b
``````

e.g.

``````>>> a=[1,2,3]
>>> b=[2,4,6]
>>> diff(a, b, cmp)
[1, 3]
``````