问题:Python-唯一词典列表
假设我有一个字典列表:
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
并且我需要获取唯一字典列表(删除重复项):
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
谁能以最有效的方式帮助我在Python中实现这一目标?
Let’s say I got a list of dictionaries:
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
and I need to obtain a list of unique dictionaries (removing the duplicates):
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
Can anyone help me with the most efficient way to achieve this in Python?
回答 0
因此,以密钥为临时做出命令id
。这将滤除重复项。的values()
该字典中会列表
在Python2.7中
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
在Python3中
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
在Python2.5 / 2.6中
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
So make a temporary dict with the key being the id
. This filters out the duplicates.
The values()
of the dict will be the list
In Python2.7
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
In Python3
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
In Python2.5/2.6
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
回答 1
在集合中查找常见元素的通常方法是使用Python的set
类。只需将所有元素添加到集合中,然后将集合转换为,然后list
重复就消失了。
当然,问题在于a set()
只能包含可哈希的条目,而a dict
不可哈希。
如果遇到此问题,我的解决方案是将每个dict
字符串转换为表示的字符串dict
,然后将所有字符串添加至,然后将set()
字符串值读出为,list()
然后转换回dict
。
dict
JSON格式很好地表示了字符串形式。而且Python有一个内置的JSON模块(json
当然也称为)。
剩下的问题是,中的元素dict
没有排序,并且当Python将转换dict
为JSON字符串时,您可能会得到两个JSON字符串,它们表示等效字典,但不是相同的字符串。一种简单的解决方案是在调用sort_keys=True
时传递参数json.dumps()
。
编辑:此解决方案是假设给定的dict
任何部分都可以不同。如果我们可以假设dict
具有相同"id"
值的每个对象都将dict
具有相同"id"
值的其他对象匹配,那么这太过分了。@gnibbler的解决方案将更快,更轻松。
编辑:现在,安德烈·利马(AndréLima)有一条评论明确指出,如果ID是重复的,则可以安全地假定整个dict
重复。因此,此答案过于刻薄,我建议使用@gnibbler的答案。
The usual way to find just the common elements in a set is to use Python’s set
class. Just add all the elements to the set, then convert the set to a list
, and bam the duplicates are gone.
The problem, of course, is that a set()
can only contain hashable entries, and a dict
is not hashable.
If I had this problem, my solution would be to convert each dict
into a string that represents the dict
, then add all the strings to a set()
then read out the string values as a list()
and convert back to dict
.
A good representation of a dict
in string form is JSON format. And Python has a built-in module for JSON (called json
of course).
The remaining problem is that the elements in a dict
are not ordered, and when Python converts the dict
to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True
when you call json.dumps()
.
EDIT: This solution was assuming that a given dict
could have any part different. If we can assume that every dict
with the same "id"
value will match every other dict
with the same "id"
value, then this is overkill; @gnibbler’s solution would be faster and easier.
EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it’s safe to assume that the whole dict
is a duplicate. So this answer is overkill and I recommend @gnibbler’s answer.
回答 2
如果字典仅由所有项目唯一标识(ID不可用),则可以使用JSON答案。以下是不使用JSON的替代方法,只要所有字典值都是不可变的,就可以使用
[dict(s) for s in set(frozenset(d.items()) for d in L)]
In case the dictionaries are only uniquely identified by all items (ID is not available) you can use the answer using JSON. The following is an alternative that does not use JSON, and will work as long as all dictionary values are immutable
[dict(s) for s in set(frozenset(d.items()) for d in L)]
回答 3
您可以使用numpy库(仅适用于Python2.x):
import numpy as np
list_of_unique_dicts=list(np.unique(np.array(list_of_dicts)))
要使其与Python 3.x(以及numpy的最新版本)一起使用,您需要将dict数组转换为numpy字符串数组,例如
list_of_unique_dicts=list(np.unique(np.array(list_of_dicts).astype(str)))
You can use numpy library (works for Python2.x only):
import numpy as np
list_of_unique_dicts=list(np.unique(np.array(list_of_dicts)))
To get it worked with Python 3.x (and recent versions of numpy), you need to convert array of dicts to numpy array of strings, e.g.
list_of_unique_dicts=list(np.unique(np.array(list_of_dicts).astype(str)))
回答 4
这是一个相当紧凑的解决方案,尽管我怀疑这不是特别有效(说得有点客气):
>>> ds = [{'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30}
... ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
Here’s a reasonably compact solution, though I suspect not particularly efficient (to put it mildly):
>>> ds = [{'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30}
... ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
回答 5
由于id
足以检测重复项,并且id
可以进行哈希处理:请通过以id
为主键的字典运行’em 。每个键的值是原始字典。
deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()
在Python 3中,values()
不返回列表。您需要将表达式的整个右侧包装在中list()
,并且您可以更经济地将表达式的内容写成dict理解:
deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())
请注意,结果可能不会与原始顺序相同。如果需要,您可以使用Collections.OrderedDict
而不是dict
。
顺便说一句,将数据保留在使用id
as键开头。
Since the id
is sufficient for detecting duplicates, and the id
is hashable: run ’em through a dictionary that has the id
as the key. The value for each key is the original dictionary.
deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()
In Python 3, values()
doesn’t return a list; you’ll need to wrap the whole right-hand-side of that expression in list()
, and you can write the meat of the expression more economically as a dict comprehension:
deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())
Note that the result likely will not be in the same order as the original. If that’s a requirement, you could use a Collections.OrderedDict
instead of a dict
.
As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id
as key to begin with.
回答 6
a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
b = {x['id']:x for x in a}.values()
print(b)
输出:
[{‘age’:34,’id’:1,1,name’:’john’},{‘age’:30,’id’:2,2,’name’:’hanna’}]
a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
b = {x['id']:x for x in a}.values()
print(b)
outputs:
[{‘age’: 34, ‘id’: 1, ‘name’: ‘john’}, {‘age’: 30, ‘id’: 2, ‘name’: ‘hanna’}]
回答 7
扩展John La Rooy(Python-独特词典的列表)的答案,使其更加灵活:
def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
return list({''.join(row[column] for column in columns): row
for row in list_of_dicts}.values())
调用函数:
sorted_list_of_dicts = dedup_dict_list(
unsorted_list_of_dicts, ['id', 'name'])
Expanding on John La Rooy (Python – List of unique dictionaries) answer, making it a bit more flexible:
def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
return list({''.join(row[column] for column in columns): row
for row in list_of_dicts}.values())
Calling Function:
sorted_list_of_dicts = dedup_dict_list(
unsorted_list_of_dicts, ['id', 'name'])
回答 8
我们可以做 pandas
import pandas as pd
yourdict=pd.DataFrame(L).drop_duplicates().to_dict('r')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
注意与接受答案略有不同。
drop_duplicates
将检查熊猫中的所有列,如果全部相同,则将删除该行。
例如 :
如果我们将第二个dict
名字从john更改为peter
L=[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'peter', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('r')
Out[295]:
[{'age': 34, 'id': 1, 'name': 'john'},
{'age': 34, 'id': 1, 'name': 'peter'},# here will still keeping the dict in the out put
{'age': 30, 'id': 2, 'name': 'hanna'}]
We can do with pandas
import pandas as pd
yourdict=pd.DataFrame(L).drop_duplicates().to_dict('r')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
Notice slightly different from the accept answer.
drop_duplicates
will check all column in pandas , if all same then the row will be dropped .
For example :
If we change the 2nd dict
name from john to peter
L=[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'peter', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('r')
Out[295]:
[{'age': 34, 'id': 1, 'name': 'john'},
{'age': 34, 'id': 1, 'name': 'peter'},# here will still keeping the dict in the out put
{'age': 30, 'id': 2, 'name': 'hanna'}]
回答 9
在python 3.6+(我已经测试过)中,只需使用:
import json
#Toy example, but will also work for your case
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
#Start by sorting each dictionary by keys
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]
#Using json methods with set() to get unique dict
myListOfUniqueDicts = list(map(json.loads,set(map(json.dumps, myListOfDictsSorted))))
print(myListOfUniqueDicts)
说明:我们正在映射,json.dumps
以将字典编码为不可变的json对象。set
然后可以用来产生唯一不可变的迭代。最后,我们使用转换回字典表示形式json.loads
。请注意,最初,您必须按键排序才能以唯一的形式排列字典。这对Python 3.6+有效,因为默认情况下字典是有序的。
In python 3.6+ (what I’ve tested), just use:
import json
#Toy example, but will also work for your case
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
#Start by sorting each dictionary by keys
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]
#Using json methods with set() to get unique dict
myListOfUniqueDicts = list(map(json.loads,set(map(json.dumps, myListOfDictsSorted))))
print(myListOfUniqueDicts)
Explanation: we’re mapping the json.dumps
to encode the dictionaries as json objects, which are immutable. set
can then be used to produce an iterable of unique immutables. Finally, we convert back to our dictionary representation using json.loads
. Note that initially, one must sort by keys to arrange the dictionaries in a unique form. This is valid for Python 3.6+ since dictionaries are ordered by default.
回答 10
我总结了我的最爱以尝试:
https://repl.it/@SmaMa/Python-List-of-unique-dictionaries
# ----------------------------------------------
# Setup
# ----------------------------------------------
myList = [
{"id":"1", "lala": "value_1"},
{"id": "2", "lala": "value_2"},
{"id": "2", "lala": "value_2"},
{"id": "3", "lala": "value_3"}
]
print("myList:", myList)
# -----------------------------------------------
# Option 1 if objects has an unique identifier
# -----------------------------------------------
myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)
# -----------------------------------------------
# Option 2 if uniquely identified by whole object
# -----------------------------------------------
myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)
# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------
myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashAbleList:", myHashableObjects)
I have summarized my favorites to try out:
https://repl.it/@SmaMa/Python-List-of-unique-dictionaries
# ----------------------------------------------
# Setup
# ----------------------------------------------
myList = [
{"id":"1", "lala": "value_1"},
{"id": "2", "lala": "value_2"},
{"id": "2", "lala": "value_2"},
{"id": "3", "lala": "value_3"}
]
print("myList:", myList)
# -----------------------------------------------
# Option 1 if objects has an unique identifier
# -----------------------------------------------
myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)
# -----------------------------------------------
# Option 2 if uniquely identified by whole object
# -----------------------------------------------
myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)
# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------
myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashAbleList:", myHashableObjects)
回答 11
快速而又肮脏的解决方案只是生成一个新列表。
sortedlist = []
for item in listwhichneedssorting:
if item not in sortedlist:
sortedlist.append(item)
A quick-and-dirty solution is just by generating a new list.
sortedlist = []
for item in listwhichneedssorting:
if item not in sortedlist:
sortedlist.append(item)
回答 12
我不知道您是否只希望列表中的字典ID是唯一的,但是如果目标是要有一组dict,其中所有键的值都具有唯一性,那么您应该使用元组键在您的理解中:
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... {'id':2,'name':'hanna', 'age':50}
... ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>>L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>>len(L)
3
希望它可以帮助您或其他有问题的人。
I don’t know if you only want the id of your dicts in the list to be unique, but if the goal is to have a set of dict where the unicity is on all keys’ values.. you should use tuples key like this in your comprehension :
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... {'id':2,'name':'hanna', 'age':50}
... ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>>L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>>len(L)
3
Hope it helps you or another person having the concern….
回答 13
这里有很多答案,所以让我添加另一个:
import json
from typing import List
def dedup_dicts(items: List[dict]):
dedupped = [ json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
return dedupped
items = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)
There are a lot of answers here, so let me add another:
import json
from typing import List
def dedup_dicts(items: List[dict]):
dedupped = [ json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
return dedupped
items = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)
回答 14
非常简单的选项:
L = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
D = dict()
for l in L: D[l['id']] = l
output = list(D.values())
print output
Pretty straightforward option:
L = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
D = dict()
for l in L: D[l['id']] = l
output = list(D.values())
print output
回答 15
那么这里提到的所有答案都是好的,但是在某些答案中,如果字典项具有嵌套列表或字典,则可能会遇到错误,因此我提出了一个简单的答案
a = [str(i) for i in a]
a = list(set(a))
a = [eval(i) for i in a]
Well all the answers mentioned here are good, but in some answers one can face error if the dictionary items have nested list or dictionary, so I propose simple answer
a = [str(i) for i in a]
a = list(set(a))
a = [eval(i) for i in a]
回答 16
这是一种内存开销很小的实现,但其代价是没有其余的那么紧凑。
values = [ {'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
if values[index]['id'] in count:
del values[index]
else:
count[values[index]['id']] = 1
index += 1
输出:
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
Heres an implementation with little memory overhead at the cost of not being as compact as the rest.
values = [ {'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
if values[index]['id'] in count:
del values[index]
else:
count[values[index]['id']] = 1
index += 1
output:
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
回答 17
这是我发现的解决方案:
usedID = []
x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
for each in x:
if each['id'] in usedID:
x.remove(each)
else:
usedID.append(each['id'])
print x
基本上,您检查ID是否存在于列表中,如果存在,则删除字典,否则,将ID附加到列表中
This is the solution I found:
usedID = []
x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
for each in x:
if each['id'] in usedID:
x.remove(each)
else:
usedID.append(each['id'])
print x
Basically you check if the ID is present in the list, if it is, delete the dictionary, if not, append the ID to the list