Python - List of unique dictionaries

Question 1

Let’s say I have a list of dictionaries:

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

and I need to obtain a list of unique dictionaries (removing the duplicates):

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

Can anyone help me with the most efficient way to achieve this in Python?

Question 2

Make a temporary dict keyed by the id; this filters out the duplicates. The values() of that dict are then the deduplicated list.

In Python2.7

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python3

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python2.5/2.6

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Question 3

The usual way to keep just the unique elements of a collection is to use Python’s set class. Just add all the elements to the set, then convert the set to a list, and bam, the duplicates are gone.

The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.

If I had this problem, my solution would be to convert each dict into a string that represents the dict, add all the strings to a set(), then read the string values back out as a list() and convert each one back to a dict.

A good representation of a dict in string form is JSON format. And Python has a built-in module for JSON (called json of course).

The remaining problem is that the elements in a dict are not ordered, and when Python converts the dict to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True when you call json.dumps().
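The approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes every value is JSON-serializable, and the function name `unique_by_json` is made up for this example. Instead of reading the strings back out of the set, it uses them only as lookup keys, so the original dicts and their first-seen order are kept.

```python
import json

def unique_by_json(dicts):
    """Return the dicts with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for d in dicts:
        # sort_keys=True gives equal dicts the same string regardless of key order
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

people = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'age': 34, 'name': 'john', 'id': 1},   # same content, different key order
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(unique_by_json(people))
```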

EDIT: This solution was assuming that a given dict could have any part different. If we can assume that every dict with the same "id" value will match every other dict with the same "id" value, then this is overkill; @gnibbler’s solution would be faster and easier.

EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it’s safe to assume that the whole dict is a duplicate. So this answer is overkill and I recommend @gnibbler’s answer.

Question 4

In case the dictionaries are only uniquely identified by all items (ID is not available) you can use the answer using JSON. The following is an alternative that does not use JSON, and will work as long as all dictionary values are hashable:

[dict(s) for s in set(frozenset(d.items()) for d in L)]
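A possible variant of the one-liner above, sketched under the same assumption that all values are hashable. The name `unique_by_items` is hypothetical. It keeps the original dict objects in first-seen order rather than rebuilding them from the set, and the last few lines show what happens when a value is not hashable.

```python
def unique_by_items(dicts):
    """Deduplicate dicts by their full (key, value) content, keeping order."""
    seen = set()
    unique = []
    for d in dicts:
        key = frozenset(d.items())  # raises TypeError if any value is unhashable
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

rows = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(unique_by_items(rows))

# A list value is not hashable, so this input is rejected
try:
    unique_by_items([{'id': 1, 'tags': ['a', 'b']}])
except TypeError as exc:
    print("unhashable value:", exc)
```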

Question 5

You can use the numpy library (works for Python 2.x only):

   import numpy as np 

   list_of_unique_dicts=list(np.unique(np.array(list_of_dicts)))

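To get it working with Python 3.x (and recent versions of numpy), you need to convert the array of dicts to a numpy array of strings; note that the result then contains the string form of each dict rather than the dict itself, e.g.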

list_of_unique_dicts=list(np.unique(np.array(list_of_dicts).astype(str)))

Question 6

Here’s a reasonably compact solution, though I suspect not particularly efficient (to put it mildly):

>>> ds = [{'id':1,'name':'john', 'age':34},
...       {'id':1,'name':'john', 'age':34},
...       {'id':2,'name':'hanna', 'age':30}
...       ]
>>> list(map(dict, set(tuple(sorted(d.items())) for d in ds)))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]

Question 7

Since the id is sufficient for detecting duplicates, and the id is hashable: run ’em through a dictionary that has the id as the key. The value for each key is the original dictionary.

deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()

In Python 3, values() doesn’t return a list; you’ll need to wrap the whole right-hand-side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:

deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())

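Note that the result may not be in the same order as the original (from Python 3.7 on, a plain dict preserves insertion order, so this mainly affects older versions). If order is a requirement there, you could use a collections.OrderedDict instead of a dict.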
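A minimal sketch of the order-preserving variant, assuming every item has an "id" key. The function name `dedupe_by_id` is made up for illustration. Unlike the comprehension, which ends up holding the last dict seen for each id, this version keeps the first one.

```python
from collections import OrderedDict

def dedupe_by_id(dicts):
    """Keep the first dict seen for each id, in original order."""
    by_id = OrderedDict()
    for item in dicts:
        # setdefault stores the item only if this id has not been seen yet
        by_id.setdefault(item["id"], item)
    return list(by_id.values())

list_of_dicts = [
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(dedupe_by_id(list_of_dicts))
```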

As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id as key to begin with.
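As a rough illustration of that aside, the records could be stored by id as they arrive, so duplicates never accumulate. The names `people_by_id` and `add_person` are hypothetical, and the sketch assumes that a later record with the same id should replace the earlier one.

```python
people_by_id = {}

def add_person(person):
    """Store a record under its id; a repeated id overwrites the earlier record."""
    people_by_id[person["id"]] = person

for record in [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]:
    add_person(record)

print(list(people_by_id.values()))  # the unique records
print(people_by_id[2]["name"])      # direct lookup by id
```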

Question 8

a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

b = list({x['id']:x for x in a}.values())

print(b)

outputs:

[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]

Question 9

Expanding on John La Rooy’s answer, making it a bit more flexible:

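def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
    # Use a tuple of the selected values as the key; unlike ''.join, this also
    # works for non-string values such as integer ids
    return list({tuple(row[column] for column in columns): row
                for row in list_of_dicts}.values())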

Calling Function:

sorted_list_of_dicts = dedup_dict_list(
    unsorted_list_of_dicts, ['id', 'name'])
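For a concrete run of this idea, here is a small sketch that builds the key with `operator.itemgetter`. The function name `dedup_by_columns` and the sample rows are made up for illustration, and the selected values are assumed to be hashable.

```python
from operator import itemgetter

def dedup_by_columns(list_of_dicts, columns):
    # itemgetter with several names returns a tuple of the values; with a
    # single name it returns the bare value. Both are usable as dict keys.
    key = itemgetter(*columns)
    return list({key(row): row for row in list_of_dicts}.values())

rows = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
print(dedup_by_columns(rows, ['id', 'name']))  # john, peter and hanna remain
print(dedup_by_columns(rows, ['id']))          # one row per id; the last one seen wins
```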

Question 10

We can do this with pandas:

import pandas as pd
yourdict=pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Note that this behaves slightly differently from the accepted answer.

drop_duplicates checks all columns, and a row is dropped only if every column matches an earlier row.

For example:

If we change the name in the 2nd dict from john to peter:

L=[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[295]: 
[{'age': 34, 'id': 1, 'name': 'john'},
 {'age': 34, 'id': 1, 'name': 'peter'},  # this dict is still kept in the output
 {'age': 30, 'id': 2, 'name': 'hanna'}]

Question 11

In Python 3.6+ (what I’ve tested), just use:

import json

# Toy example, but will also work for your case
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
# Start by sorting each dictionary's items by key
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]

# Use json.dumps with set() to keep the unique entries, then rebuild each dict from its key/value pairs
myListOfUniqueDicts = [dict(pairs) for pairs in map(json.loads, set(map(json.dumps, myListOfDictsSorted)))]

print(myListOfUniqueDicts)

Explanation: we map json.dumps over the sorted item lists to encode each one as a JSON string, which is hashable. set can then be used to keep only the unique strings. Finally, we convert back with json.loads and rebuild each dictionary from its key/value pairs. Note that the items must first be sorted by key so that equal dictionaries produce identical strings: in Python 3.6+ dictionaries keep insertion order, so two equal dictionaries built in a different order would otherwise serialize differently.

Question 12

I have summarized my favorites to try out:

https://repl.it/@SmaMa/Python-List-of-unique-dictionaries

# ----------------------------------------------
# Setup
# ----------------------------------------------

myList = [
  {"id":"1", "lala": "value_1"},
  {"id": "2", "lala": "value_2"}, 
  {"id": "2", "lala": "value_2"}, 
  {"id": "3", "lala": "value_3"}
]
print("myList:", myList)

# -----------------------------------------------
# Option 1 if objects have a unique identifier
# -----------------------------------------------

myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)

# -----------------------------------------------
# Option 2 if uniquely identified by whole object
# -----------------------------------------------

myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)

# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------

myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashableObjects:", myHashableObjects)

Question 13

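A quick-and-dirty solution is to build a new list, appending each item only if it is not already there. This keeps the original order, but every membership test scans the new list, so it gets slow for long lists.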

sortedlist = []

for item in listwhichneedssorting:
    if item not in sortedlist:
        sortedlist.append(item)

Question 14

I don’t know if you only want the id of your dicts in the list to be unique, but if the goal is to have a set of dicts where uniqueness is based on the values of all keys, you should use a tuple key like this in your comprehension:

>>> L=[
...     {'id':1,'name':'john', 'age':34},
...    {'id':1,'name':'john', 'age':34}, 
...    {'id':2,'name':'hanna', 'age':30},
...    {'id':2,'name':'hanna', 'age':50}
...    ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>> L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>> len(L)
3

Hope it helps you or anyone else with the same problem.

Question 15

There are a lot of answers here, so let me add another:

import json
from typing import List

def dedup_dicts(items: List[dict]):
    dedupped = [ json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
    return dedupped

items = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)

Question 16

Pretty straightforward option:

L = [
    {'id':1,'name':'john', 'age':34},
    {'id':1,'name':'john', 'age':34},
    {'id':2,'name':'hanna', 'age':30},
    ]


D = dict()
for l in L: D[l['id']] = l
output = list(D.values())
print(output)

Question 17

All the answers mentioned here are good, but some of them raise an error if the dictionaries contain nested lists or dictionaries, so I propose a simple answer:

import ast

a = [str(i) for i in a]
a = list(set(a))
# ast.literal_eval parses Python literals only, so it is safer than eval
a = [ast.literal_eval(i) for i in a]
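An alternative sketch for nested data that avoids the string round trip: recursively convert each dict into a hashable form and use that as the lookup key. The helper names `freeze` and `unique_nested` are hypothetical. The sketch assumes leaf values are already hashable, and it treats lists and tuples with the same contents as equal.

```python
def freeze(value):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value  # assumed to be hashable already (str, int, None, ...)

def unique_nested(dicts):
    """Deduplicate dicts that may contain nested lists or dicts, keeping order."""
    seen = set()
    unique = []
    for d in dicts:
        key = freeze(d)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

records = [
    {'id': 1, 'tags': ['x', 'y'], 'meta': {'k': 1}},
    {'meta': {'k': 1}, 'tags': ['x', 'y'], 'id': 1},  # same content, different key order
    {'id': 2, 'tags': [], 'meta': {}},
]
print(unique_nested(records))
```

Because the key does not depend on key order, the first two records count as duplicates here, whereas str() would produce two different strings for them.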

Question 18

Here’s an implementation with little memory overhead, at the cost of not being as compact as the rest.

values = [ {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},
           {'id':1,'name':'john', 'age':34},
           {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
    if values[index]['id'] in count:
        del values[index]
    else:
        count[values[index]['id']] = 1
        index += 1

output:

[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
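Each `del values[index]` shifts every later element, so a list with many duplicates can make that loop slow. A possible variation, sketched here with the hypothetical name `dedupe_in_place`, compacts the list in a single pass by copying each kept item forward and truncating the tail once at the end. It assumes every item has the given key.

```python
def dedupe_in_place(values, key="id"):
    """Remove items with a repeated key from the list itself, keeping the first of each."""
    seen = set()
    write = 0  # next position to store a kept item; never ahead of the read position
    for item in values:
        if item[key] not in seen:
            seen.add(item[key])
            values[write] = item
            write += 1
    del values[write:]  # drop the leftover tail in one step

values = [{'id': 2, 'name': 'hanna', 'age': 30},
          {'id': 1, 'name': 'john', 'age': 34},
          {'id': 1, 'name': 'john', 'age': 34},
          {'id': 2, 'name': 'hanna', 'age': 30},
          {'id': 1, 'name': 'john', 'age': 34}]
dedupe_in_place(values)
print(values)
```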

Question 19

This is the solution I found:

usedID = []

x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

# Iterate over a copy, since removing from a list while looping over it skips items
for each in x[:]:
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])

print(x)

Basically, you check whether the ID is already in the list; if it is, delete the dictionary, and if not, append the ID to the list.
