Tag archive: dictionary

What’s a correct and good way to implement __hash__()?

Question: What’s a correct and good way to implement __hash__()?

What’s a correct and good way to implement __hash__()?

I am talking about the function that returns a hashcode that is then used to insert objects into hashtables aka dictionaries.

As __hash__() returns an integer and is used for “binning” objects into hashtables I assume that the values of the returned integer should be uniformly distributed for common data (to minimize collisions). What’s a good practice to get such values? Are collisions a problem? In my case I have a small class which acts as a container class holding some ints, some floats and a string.


Answer 0

An easy, correct way to implement __hash__() is to use a key tuple. It won’t be as fast as a specialized hash, but if you need that then you should probably implement the type in C.

Here’s an example of using a key for hash and equality:

class A:
    def __key(self):
        return (self.attr_a, self.attr_b, self.attr_c)

    def __hash__(self):
        return hash(self.__key())

    def __eq__(self, other):
        if isinstance(other, A):
            return self.__key() == other.__key()
        return NotImplemented

Also, the documentation of __hash__ has more information that may be valuable in some particular circumstances.
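
A short usage sketch of the key-tuple pattern, assuming a hypothetical Point class that is not part of the original answer. It only illustrates that two equal objects end up on the same dict entry:

```python
class Point:
    """Hypothetical example class using the key-tuple pattern."""

    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label

    def _key(self):
        # Every attribute that takes part in equality also takes part in the hash.
        return (self.x, self.y, self.label)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        if isinstance(other, Point):
            return self._key() == other._key()
        return NotImplemented


p1 = Point(1, 2.5, "a")
p2 = Point(1, 2.5, "a")   # equal to p1, but a distinct object
lookup = {p1: "first"}
lookup[p2] = "second"     # replaces the value stored for p1, because p1 == p2
print(len(lookup), lookup[p1])
```

The attributes used in the key should not change while the object is stored in a dict or set, otherwise later lookups can fail to find it.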


Answer 1

John Millikin proposed a solution similar to this:

class A(object):

    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __eq__(self, othr):
        return (isinstance(othr, type(self))
                and (self._a, self._b, self._c) ==
                    (othr._a, othr._b, othr._c))

    def __hash__(self):
        return hash((self._a, self._b, self._c))

The problem with this solution is that hash(A(a, b, c)) == hash((a, b, c)). In other words, the hash collides with that of the tuple of its key members. Maybe this does not matter very often in practice?

Update: the Python docs now recommend using a tuple as in the example above. Note that the documentation states

The only required property is that objects which compare equal have the same hash value

Note that the opposite is not true. Objects which do not compare equal may have the same hash value. Such a hash collision will not cause one object to replace another when used as a dict key or set element as long as the objects do not also compare equal.
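
A small sketch of that last point, reusing the shape of class A from above: the object and the tuple of its members hash to the same value, yet both survive as separate dict keys because they do not compare equal.

```python
class A:
    def __init__(self, a, b, c):
        self._a, self._b, self._c = a, b, c

    def __eq__(self, other):
        return (isinstance(other, type(self))
                and (self._a, self._b, self._c) ==
                    (other._a, other._b, other._c))

    def __hash__(self):
        return hash((self._a, self._b, self._c))


obj = A(1, 2, 3)
tup = (1, 2, 3)
same_hash = hash(obj) == hash(tup)   # the hashes collide...
d = {obj: "object", tup: "tuple"}    # ...but obj != tup, so both keys are kept
print(same_hash, len(d))
```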

Outdated/bad solution

The Python documentation on __hash__ suggested combining the hashes of the sub-components using something like XOR, which gives us this:

class B(object):

    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __eq__(self, othr):
        if isinstance(othr, type(self)):
            return ((self._a, self._b, self._c) ==
                    (othr._a, othr._b, othr._c))
        return NotImplemented

    def __hash__(self):
        return (hash(self._a) ^ hash(self._b) ^ hash(self._c) ^
                hash((self._a, self._b, self._c)))

Update: as Blckknght points out, changing the order of a, b, and c could cause problems. I added an additional ^ hash((self._a, self._b, self._c)) to capture the order of the values being hashed. This final ^ hash(...) can be removed if the values being combined cannot be rearranged (for example, if they have different types and therefore the value of _a will never be assigned to _b or _c, etc.).
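
To make the ordering problem concrete, here is a minimal sketch comparing a pure XOR combination with a tuple hash. The helper names xor_hash and tuple_hash are made up for this illustration:

```python
def xor_hash(*values):
    """Combine hashes with XOR only (insensitive to order)."""
    result = 0
    for v in values:
        result ^= hash(v)
    return result


def tuple_hash(*values):
    """Hash the values as one tuple (sensitive to order)."""
    return hash(values)


# XOR is commutative, so reordering the values never changes the result.
print(xor_hash(1, 2, 3) == xor_hash(3, 2, 1))
# Equal values also cancel each other out under XOR.
print(xor_hash(1, 1, 5) == xor_hash(2, 2, 5))
# The tuple hash takes position into account, so this is normally different.
print(tuple_hash(1, 2, 3) == tuple_hash(3, 2, 1))
```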


Answer 2

Paul Larson of Microsoft Research studied a wide variety of hash functions. He told me that

h = 0
for c in some_string:
    h = 101 * h + ord(c)

worked surprisingly well for a wide variety of strings. I’ve found that similar polynomial techniques work well for computing a hash of disparate subfields.
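
One way the polynomial idea might be applied to several subfields, as a sketch only. The function name poly_hash, the default multiplier of 101, and the 64-bit mask are assumptions chosen for the illustration, not something the answer specifies:

```python
def poly_hash(fields, multiplier=101, bits=64):
    """Combine the hashes of several fields with a polynomial in `multiplier`."""
    mask = (1 << bits) - 1
    h = 0
    for field in fields:
        # Masking keeps the running value bounded and non-negative.
        h = (multiplier * h + hash(field)) & mask
    return h


print(poly_hash((1, 2, 3)))   # 10406
print(poly_hash((3, 2, 1)))   # 30806
```

Unlike a plain XOR, the result depends on the order of the fields.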


Answer 3

I can try to answer the second part of your question.

The collisions will probably result not from the hash code itself, but from mapping the hash code to an index in a collection. So for example your hash function could return random values from 1 to 10000, but if your hash table only has 32 entries you’ll get collisions on insertion.

In addition, I would think that collisions would be resolved by the collection internally, and there are many methods to resolve collisions. The simplest (and worst) is, given an entry to insert at index i, add 1 to i until you find an empty spot and insert there. Retrieval then works the same way. This results in inefficient retrievals for some entries, as you could have an entry that requires traversing the entire collection to find!
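
The "add 1 to i until you find an empty spot" scheme described above is usually called linear probing. A toy sketch follows; the class LinearProbingTable is invented for this illustration and is not how Python’s dict is implemented:

```python
class LinearProbingTable:
    """Toy fixed-size hash table that resolves collisions by linear probing."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity

    def _probe(self, key):
        # Start at the slot derived from the hash, then step forward one at a time.
        i = hash(key) % len(self._slots)
        for _ in range(len(self._slots)):
            yield i
            i = (i + 1) % len(self._slots)

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                self._slots[i] = (key, value)
                return i          # index where the entry ended up
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                raise KeyError(key)
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


table = LinearProbingTable(capacity=8)
print(table.put(1, "a"))   # 1
print(table.put(9, "b"))   # 2: hash 9 also maps to slot 1, so it moves on by one
print(table.get(9))        # b
```

Keys 1 and 9 have different hash codes, but both map to index 1 in a table of 8 slots, which is the kind of collision the previous paragraph describes.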

Other collision resolution methods reduce the retrieval time by moving entries in the hash table when an item is inserted to spread things out. This increases the insertion time but assumes you read more than you insert. There are also methods that try to branch different colliding entries out so that entries do not cluster in one particular spot.

Also, if you need to resize the collection you will need to rehash everything or use a dynamic hashing method.

In short, depending on what you’re using the hash code for you may have to implement your own collision resolution method. If you’re not storing them in a collection, you can probably get away with a hash function that just generates hash codes in a very large range. If so, you can make sure your container is bigger than it needs to be (the bigger the better of course) depending on your memory concerns.

Here are some links if you’re interested in more:

coalesced hashing on Wikipedia

Wikipedia also has a summary of various collision resolution methods.

Also, “File Organization And Processing” by Tharp covers a lot of collision resolution methods extensively. IMO it’s a great reference for hashing algorithms.


Answer 4

A very good explanation of when and how to implement the __hash__ function is on the programiz website:

Screenshot of https://www.programiz.com/python-programming/methods/built-in/hash (retrieved 2019-12-13)

As for a personal implementation of the method, the above-mentioned site provides an example that matches the answer of millerdev.

class Person:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        return self.age == other.age and self.name == other.name

    def __hash__(self):
        print('The hash is:')
        return hash((self.age, self.name))

person = Person(23, 'Adam')
print(hash(person))

Answer 5

Depends on the size of the hash value you return. It’s simple logic that if you need to return a 32-bit int based on the hash of four 32-bit ints, you’re gonna get collisions.

I would favor bit operations. Like, the following C pseudo code:

int a;
int b;
int c;
int d;
int hash = (a & 0xF000F000) | (b & 0x0F000F00) | (c & 0x00F000F0) | (d & 0x000F000F);

Such a system could work for floats too, maybe even better, if you simply took their bit pattern rather than treating them as floating-point values.

For strings, I’ve got little/no idea.
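
The bit-masking idea above, sketched in Python rather than C. The helper names masked_combine and float_bits are made up for the illustration, and reading a float through its single-precision bit pattern is just one possible interpretation of "taking the bit value":

```python
import struct

MASKS = (0xF000F000, 0x0F000F00, 0x00F000F0, 0x000F000F)


def masked_combine(a, b, c, d):
    """Take a different group of bits from each of four 32-bit values."""
    result = 0
    for value, mask in zip((a, b, c, d), MASKS):
        result |= value & mask
    return result


def float_bits(x):
    """Reinterpret a float as the 32-bit pattern of its single-precision form."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


print(hex(masked_combine(0xFFFFFFFF, 0, 0, 0)))   # 0xf000f000
print(hex(float_bits(1.0)))                       # 0x3f800000
print(masked_combine(0x00000001, 0, 0, 0))        # 0: the low bit of `a` is masked out
```

The last line shows the cost of this scheme: inputs that differ only in the masked-out bits always collide.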


Python tuple to dictionary

Question: Python tuple to dictionary

For the tuple t = ((1, 'a'),(2, 'b')), dict(t) returns {1: 'a', 2: 'b'}.

Is there a good way to get {'a': 1, 'b': 2} (keys and vals swapped)?

Ultimately, I want to be able to return 1 given 'a' or 2 given 'b', perhaps converting to a dict is not the best way.


Answer 0

Try:

>>> t = ((1, 'a'),(2, 'b'))
>>> dict((y, x) for x, y in t)
{'a': 1, 'b': 2}

Answer 1

A slightly simpler method:

>>> t = ((1, 'a'),(2, 'b'))
>>> dict(map(reversed, t))
{'a': 1, 'b': 2}

Answer 2

Even more concise if you are on Python 2.7 or later:

>>> t = ((1,'a'),(2,'b'))
>>> {y:x for x,y in t}
{'a': 1, 'b': 2}

Answer 3

>>> dict([('hi','goodbye')])
{'hi': 'goodbye'}

Or:

>>> [ dict([i]) for i in (('CSCO', 21.14), ('CSCO', 21.14), ('CSCO', 21.14), ('CSCO', 21.14)) ]
[{'CSCO': 21.14}, {'CSCO': 21.14}, {'CSCO': 21.14}, {'CSCO': 21.14}]

Answer 4

If there are multiple values for the same key, the following code will append those values to a list corresponding to their key:

d = dict()
for x, y in t:
    if y in d:  # dict.has_key() was removed in Python 3
        d[y].append(x)
    else:
        d[y] = [x]
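
The same grouping can be written without the explicit membership test by using collections.defaultdict. A minimal sketch, with made-up sample data in which 'a' appears twice:

```python
from collections import defaultdict

t = ((1, 'a'), (2, 'b'), (3, 'a'))   # hypothetical data: 'a' appears twice

d = defaultdict(list)
for x, y in t:
    d[y].append(x)       # a missing key starts out as an empty list

print(dict(d))   # {'a': [1, 3], 'b': [2]}
```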

Answer 5

Here are a couple of ways of doing it:

>>> t = ((1, 'a'), (2, 'b'))

>>> # using reversed function
>>> dict(reversed(i) for i in t)
{'a': 1, 'b': 2}

>>> # using slice operator
>>> dict(i[::-1] for i in t)
{'a': 1, 'b': 2}

How to create a nested dictionary in Python?

Question: How to create a nested dictionary in Python?

I have 2 CSV files: ‘Data’ and ‘Mapping’:

  • ‘Mapping’ file has 4 columns: Device_Name, GDN, Device_Type, and Device_OS. All four columns are populated.
  • ‘Data’ file has these same columns, with Device_Name column populated and the other three columns blank.
  • I want my Python code to open both files and for each Device_Name in the Data file, map its GDN, Device_Type, and Device_OS value from the Mapping file.

I know how to use dict when only 2 columns are present (1 is needed to be mapped) but I don’t know how to accomplish this when 3 columns need to be mapped.

Following is the code with which I tried to accomplish the mapping of Device_Type:

x = dict([])
with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
    file_map = csv.reader(in_file1, delimiter=',')
    for row in file_map:
       typemap = [row[0],row[2]]
       x.append(typemap)

with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in csv.reader(in_file2, delimiter=','):
         try:
              row[27] = x[row[11]]
         except KeyError:
              row[27] = ""
         writer.writerow(row)

It raises an AttributeError.

After some researching, I think I need to create a nested dict, but I don’t have any idea how to do this.


Answer 0

A nested dict is a dictionary within a dictionary. A very simple thing.

>>> d = {}
>>> d['dict1'] = {}
>>> d['dict1']['innerkey'] = 'value'
>>> d
{'dict1': {'innerkey': 'value'}}

You can also use a defaultdict from the collections module to facilitate creating nested dictionaries.

>>> import collections
>>> d = collections.defaultdict(dict)
>>> d['dict1']['innerkey'] = 'value'
>>> d  # currently a defaultdict type
defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
>>> dict(d)  # but is exactly like a normal dictionary.
{'dict1': {'innerkey': 'value'}}

You can populate that however you want.

I would recommend in your code something like the following:

d = {}  # can use defaultdict(dict) instead

for row in file_map:
    # derive row key from something 
    # when using defaultdict, we can skip the next step creating a dictionary on row_key
    d[row_key] = {} 
    for idx, col in enumerate(row):
        d[row_key][idx] = col

According to your comment:

may be above code is confusing the question. My problem in nutshell: I have 2 files a.csv b.csv, a.csv has 4 columns i j k l, b.csv also has these columns. i is kind of key columns for these csvs’. j k l column is empty in a.csv but populated in b.csv. I want to map values of j k l columns using ‘i` as key column from b.csv to a.csv file

My suggestion would be something like this (without using defaultdict):

a_file = "path/to/a.csv"
b_file = "path/to/b.csv"

# read from file a.csv
with open(a_file) as f:
    # skip headers
    next(f)
    # get first column as a set of keys (built while the file is still open)
    keys = {line.strip().split(',')[0] for line in f}

# create empty dictionary:
d = {}

# read from file b.csv
with open(b_file) as f:
    # gather headers except first key header
    headers = next(f).strip().split(',')[1:]
    # iterate lines
    for line in f:
        # gather the columns
        cols = line.strip().split(',')
        # check to make sure this key should be mapped.
        if cols[0] not in keys:
            continue
        # add key to dict
        d[cols[0]] = dict(
            # inner keys are the header names, values are columns
            (headers[idx], v) for idx, v in enumerate(cols[1:]))

Please note though, that for parsing csv files there is a csv module.
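
A sketch of the same mapping done with the csv module. The in-memory io.StringIO objects and their sample rows are stand-ins invented for this illustration (real code would pass file objects from open(...)); the column names are the ones from the question:

```python
import csv
import io

# Hypothetical stand-ins for the Mapping and Data files.
mapping_csv = io.StringIO(
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "dev1,g1,phone,android\n"
    "dev2,g2,tablet,ios\n"
)
data_csv = io.StringIO(
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "dev2,,,\n"
    "dev3,,,\n"
)

# Outer key: Device_Name; inner dict: the remaining columns of that row.
mapping = {}
for row in csv.DictReader(mapping_csv):
    name = row.pop("Device_Name")
    mapping[name] = row

filled = []
for row in csv.DictReader(data_csv):
    # Devices that are missing from the mapping keep their blank columns.
    row.update(mapping.get(row["Device_Name"], {}))
    filled.append(row)

print(filled[0]["Device_Type"])         # tablet
print(repr(filled[1]["Device_Type"]))   # ''
```

Here mapping is the nested dict: mapping["dev1"]["Device_OS"] gives "android".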


Answer 1

UPDATE: For an arbitrary length of a nested dictionary, go to this answer.

Use defaultdict from the collections module.

Performance: it avoids an explicit “if key not in dict” check on every insertion, which adds up when the data set is large.

Low maintenance: it makes the code more readable and easier to extend.

from collections import defaultdict

target_dict = defaultdict(dict)
target_dict[key1][key2] = val

Answer 2

For arbitrary levels of nesting:

In [1]: import collections

In [2]: def nested_dict():
   ...:     return collections.defaultdict(nested_dict)
   ...:

In [3]: a = nested_dict()

In [4]: a
Out[4]: defaultdict(<function __main__.nested_dict>, {})

In [5]: a['a']['b']['c'] = 1

In [6]: a
Out[6]:
defaultdict(<function __main__.nested_dict>,
            {'a': defaultdict(<function __main__.nested_dict>,
                         {'b': defaultdict(<function __main__.nested_dict>,
                                      {'c': 1})})})

Answer 3

It is important to remember when using defaultdict and similar nested dict modules such as nested_dict, that looking up a nonexistent key may inadvertently create a new key entry in the dict and cause a lot of havoc.

Here is a Python 3 example with the nested_dict module:

import nested_dict as nd
nest = nd.nested_dict()
nest['outer1']['inner1'] = 'v11'
nest['outer1']['inner2'] = 'v12'
print('original nested dict: \n', nest)
try:
    nest['outer1']['wrong_key1']
except KeyError as e:
    print('exception missing key', e)
print('nested dict after lookup with missing key.  no exception raised:\n', nest)

# Instead, convert back to normal dict...
nest_d = nest.to_dict(nest)
try:
    print('converted to normal dict. Trying to lookup Wrong_key2')
    nest_d['outer1']['wrong_key2']
except KeyError as e:
    print('exception missing key', e)
else:
    print(' no exception raised:\n')

# ...or use dict.keys to check if key in nested dict
print('checking with dict.keys')
print(list(nest['outer1'].keys()))
if 'wrong_key3' in list(nest.keys()):

    print('found wrong_key3')
else:
    print(' did not find wrong_key3')

Output is:

original nested dict:   {"outer1": {"inner2": "v12", "inner1": "v11"}}

nested dict after lookup with missing key.  no exception raised:  
{"outer1": {"wrong_key1": {}, "inner2": "v12", "inner1": "v11"}} 

converted to normal dict. 
Trying to lookup Wrong_key2 

exception missing key 'wrong_key2' 

checking with dict.keys 

['wrong_key1', 'inner2', 'inner1']  
did not find wrong_key3
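
The same pitfall can be reproduced with only the standard library’s defaultdict. A minimal sketch, with made-up key names, that also shows two lookups which do not create entries:

```python
from collections import defaultdict

nest = defaultdict(dict)
nest['outer1']['inner1'] = 'v11'

_ = nest['typo']              # plain lookup of a missing outer key...
print('typo' in nest)         # True: the lookup itself created an empty entry

print(nest.get('other'))      # None: .get() does not trigger the default factory
print('other' in nest)        # False: neither does a membership test
```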

How to get a random value from a dictionary in Python

Question: How to get a random value from a dictionary in Python

How can I get a random pair from a dict? I’m making a game where you need to guess a capital of a country and I need questions to appear randomly.

The dict looks like {'VENEZUELA':'CARACAS'}

How can I do this?


Answer 0

One way would be:

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
random.choice(list(d.values()))

EDIT: The question was changed a couple of years after the original post, and now asks for a pair, rather than a single item. The final line should now be:

country, capital = random.choice(list(d.items()))

Answer 1

I wrote this trying to solve the same problem:

https://github.com/robtandy/randomdict

It has O(1) random access to keys, values, and items.


Answer 2

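Try this:

import random
a = {'VENEZUELA': 'CARACAS', 'CANADA': 'OTTAWA'}  # a is some dictionary
random_key = random.sample(list(a), 1)[0]

On Python 3 the keys have to be copied into a list first, because random.sample() only accepts a sequence.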


Answer 3

If you don’t want to use the random module, you can also try popitem():

>>> d = {'a': 1, 'b': 5, 'c': 7}
>>> d.popitem()
('a', 1)
>>> d
{'c': 7, 'b': 5}
>>> d.popitem()
('c', 7)

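On older Python versions, where dict doesn’t preserve order, popitem gives you items in an arbitrary (but not strictly random) order. Since Python 3.7 it removes items in last-in, first-out order, so the result is predictable rather than random.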

Also keep in mind that popitem removes the key-value pair from dictionary, as stated in the docs.

popitem() is useful to destructively iterate over a dictionary


Answer 4

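>>> import random
>>> d = dict(Venezuela = 1, Spain = 2, USA = 3, Italy = 4)
>>> random.choice(list(d.keys()))
'Venezuela'
>>> random.choice(list(d.keys()))
'USA'

By calling random.choice on the keys of the dictionary (the countries). On Python 3 the keys view has to be converted to a list first, because random.choice needs an indexable sequence.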


Answer 5

This works in Python 2 and Python 3:

A random key:

random.choice(list(d.keys()))

A random value:

random.choice(list(d.values()))

A random key and value:

random.choice(list(d.items()))

Answer 6

If you don’t want to use random.choice() you can try this way:

>>> myDictionary = {'VENEZUELA':'CARACAS', 'IRAN' : 'TEHRAN'}
>>> import random
>>> i = random.randint(0, len(myDictionary) - 1)
>>> myDictionary[list(myDictionary)[i]]
'TEHRAN'
>>> list(myDictionary)[i]
'IRAN'

Answer 7

Since the original post wanted the pair:

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
country, capital = random.choice(list(d.items()))

(python 3 style)


Answer 8

Since this is homework:

Check out random.sample(), which selects random elements from a list and returns them as a list. You can get the dictionary keys with dict.keys() and the dictionary values with dict.values(); on Python 3, wrap them in list() first.


Answer 9

I am assuming that you are making a quiz kind of application. For this kind of application I have written a function which is as follows:

import random

def shuffle(q):
    """
    The input of the function will
    be the dictionary of the questions
    and answers. The output will
    be each question with its answer,
    printed in a random order.
    """
    selected_keys = []
    i = 0
    while i < len(q):
        # list(q) is needed on Python 3, where q.keys() is not indexable
        current_selection = random.choice(list(q))
        if current_selection not in selected_keys:
            selected_keys.append(current_selection)
            i = i+1
            print(current_selection+'? '+str(q[current_selection]))

If I give the input questions = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'} and call shuffle(questions), the output will look like this (the order varies from run to run):

VENEZUELA? CARACAS
CANADA? OTTAWA

You can extend this further by shuffling the options as well.
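
A shorter way to visit every question exactly once in random order is to let random.sample do the selection, which avoids retrying keys that were already picked. This is only a sketch; the function name ask_in_random_order is made up for the illustration:

```python
import random


def ask_in_random_order(q):
    """Return the (question, answer) pairs of q in a random order, each exactly once."""
    items = list(q.items())
    return random.sample(items, len(items))


questions = {'VENEZUELA': 'CARACAS', 'CANADA': 'OTTAWA'}
for country, capital in ask_in_random_order(questions):
    print(country + '? ' + capital)
```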


Answer 10

Try this (using random.choice from items)

import random

a={ "str" : "sda" , "number" : 123, 55 : "num"}
random.choice(list(a.items()))
#  ('str', 'sda')
random.choice(list(a.items()))[1] # getting a value
#  'num'

Answer 11

With modern versions of Python (since 3), the objects returned by the methods dict.keys(), dict.values() and dict.items() are view objects. They can be iterated but not indexed, so passing them directly to random.choice is not possible, as they are no longer lists.

One option is to use list comprehension to do the job with random.choice:

import random

colors = {
    'purple': '#7A4198',
    'turquoise':'#9ACBC9',
    'orange': '#EF5C35',
    'blue': '#19457D',
    'green': '#5AF9B5',
    'red': '#E04160',
    'yellow': '#F9F985'
}

color = random.choice([hex_color for hex_color in colors.values()])

print(f'The new color is: {color}')



Answer 12

>>> import random
>>> b = { 'video':0, 'music':23,"picture":12 }
>>> random.choice(tuple(b.items()))
('music', 23)
>>> random.choice(tuple(b.items()))
('music', 23)
>>> random.choice(tuple(b.items()))
('picture', 12)
>>> random.choice(tuple(b.items()))
('video', 0)

Answer 13

I found this post by looking for a rather comparable solution. For picking multiple elements out of a dict, this can be used:

import numpy as np

idx_picks = np.random.choice(len(d), num_of_picks, replace=False) #(Don't pick the same element twice)
result = dict()
c_keys = list(d.keys()) #not so efficient - unfortunately .keys() returns a non-indexable view, so it is copied into a list
for i in idx_picks:
    result[c_keys[i]] = d[c_keys[i]]
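
If NumPy is not otherwise needed, the standard library can do the same job. A minimal sketch with made-up sample data, using random.sample to pick distinct items:

```python
import random

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}   # hypothetical data
num_of_picks = 2

# random.sample never returns the same element twice.
result = dict(random.sample(list(d.items()), num_of_picks))
print(result)
```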

Convert a namedtuple into a dictionary

Question: Convert a namedtuple into a dictionary

I have a named tuple class in python

class Town(collections.namedtuple('Town', [
    'name', 
    'population',
    'coordinates',
    'capital', 
    'state_bird'])):
    # ...

I’d like to convert Town instances into dictionaries. I don’t want it to be rigidly tied to the names or number of the fields in a Town.

Is there a way to write it such that I could add more fields, or pass an entirely different named tuple in and get a dictionary?

I cannot alter the original class definition as it’s in someone else’s code. So I need to take an instance of a Town and convert it to a dictionary.


回答 0

TL; DR:_asdict为此提供了一种方法。

这是用法的演示:

>>> fields = ['name', 'population', 'coordinates', 'capital', 'state_bird']
>>> Town = collections.namedtuple('Town', fields)
>>> funkytown = Town('funky', 300, 'somewhere', 'lipps', 'chicken')
>>> funkytown._asdict()
OrderedDict([('name', 'funky'),
             ('population', 300),
             ('coordinates', 'somewhere'),
             ('capital', 'lipps'),
             ('state_bird', 'chicken')])

这是一个已记录的namedtuples 方法,即,与python中的常规约定不同,该方法名上的前划线并不妨碍使用。随着加入namedtuples其他方法,_make_replace_source_fields,它有下划线只尝试和防止可能的字段名的冲突。


注意: 对于一些2.7.5 <python版本<3.5.0的代码,您可能会看到以下版本:

>>> vars(funkytown)
OrderedDict([('name', 'funky'),
             ('population', 300),
             ('coordinates', 'somewhere'),
             ('capital', 'lipps'),
             ('state_bird', 'chicken')])

有一段时间,文档提到_asdict过时了(请参阅此处),并建议使用内置方法vars。那个建议现在已经过时了。为了修复与子类相关的错误__dict__此commit再次删除了namedtuples上存在的属性。

TL;DR: there’s a method _asdict provided for this.

Here is a demonstration of the usage:

>>> fields = ['name', 'population', 'coordinates', 'capital', 'state_bird']
>>> Town = collections.namedtuple('Town', fields)
>>> funkytown = Town('funky', 300, 'somewhere', 'lipps', 'chicken')
>>> funkytown._asdict()
OrderedDict([('name', 'funky'),
             ('population', 300),
             ('coordinates', 'somewhere'),
             ('capital', 'lipps'),
             ('state_bird', 'chicken')])

This is a documented method of namedtuples, i.e. unlike the usual convention in python the leading underscore on the method name isn’t there to discourage use. Along with the other methods added to namedtuples, _make, _replace, _source, _fields, it has the underscore only to try and prevent conflicts with possible field names.


Note: For some 2.7.5 < python version < 3.5.0 code out in the wild, you might see this version:

>>> vars(funkytown)
OrderedDict([('name', 'funky'),
             ('population', 300),
             ('coordinates', 'somewhere'),
             ('capital', 'lipps'),
             ('state_bird', 'chicken')])

For a while the documentation had mentioned that _asdict was obsolete (see here), and suggested to use the built-in method vars. That advice is now outdated; in order to fix a bug related to subclassing, the __dict__ property which was present on namedtuples has again been removed by this commit.
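
As a minimal sketch of how this can be kept independent of the field names, the helper below (namedtuple_to_dict is a name invented for this illustration) relies only on _asdict and wraps the result in dict() so the return type is the same whether the interpreter hands back an OrderedDict or a plain dict.

```python
from collections import namedtuple

def namedtuple_to_dict(nt):
    """Convert any namedtuple instance into a plain dict."""
    # _asdict() may return an OrderedDict or a dict depending on the
    # Python version; dict() normalizes the result.
    return dict(nt._asdict())

Town = namedtuple('Town', ['name', 'population', 'coordinates', 'capital', 'state_bird'])
funkytown = Town('funky', 300, 'somewhere', 'lipps', 'chicken')
town_dict = namedtuple_to_dict(funkytown)
print(town_dict['population'])  # 300
```

Because nothing in the helper mentions Town, it should work unchanged if fields are added or a different named tuple is passed in.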


回答 1

namedtuple实例上有一个内置方法_asdict

正如评论中所讨论的,在某些版本上vars()也可以这样做,但是它显然高度依赖于构建细节,而_asdict应该是可靠的。在某些版本_asdict中,已将其标记为已弃用,但注释表明从3.4版开始,情况已不再如此。

There’s a built in method on namedtuple instances for this, _asdict.

As discussed in the comments, on some versions vars() will also do it, but it’s apparently highly dependent on build details, whereas _asdict should be reliable. In some versions _asdict was marked as deprecated, but comments indicate that this is no longer the case as of 3.4.


回答 2

在Ubuntu 14.04 LTS版本的python2.7和python3.4上,该__dict__属性按预期工作。该_asdict 方法也有效,但我倾向于使用标准定义的统一属性api而不是本地化的非统一api。

$ python2.7

# Works on:
# Python 2.7.6 (default, Jun 22 2015, 17:58:13)  [GCC 4.8.2] on linux2
# Python 3.4.3 (default, Oct 14 2015, 20:28:29)  [GCC 4.8.4] on linux

import collections

Color = collections.namedtuple('Color', ['r', 'g', 'b'])
red = Color(r=256, g=0, b=0)

# Access the namedtuple as a dict
print(red.__dict__['r'])  # 256

# Drop the namedtuple only keeping the dict
red = red.__dict__
print(red['r'])  #256

视为字典是获取表示词义的字典的语义方式(至少据我所知)。


汇总主要python版本和平台及其对它们的支持会很高兴__dict__,目前如上所述,我只有一个平台版本和两个python版本。

| Platform                      | PyVer     | __dict__ | _asdict |
| --------------------------    | --------- | -------- | ------- |
| Ubuntu 14.04 LTS              | Python2.7 | yes      | yes     |
| Ubuntu 14.04 LTS              | Python3.4 | yes      | yes     |
| CentOS Linux release 7.4.1708 | Python2.7 | no       | yes     |
| CentOS Linux release 7.4.1708 | Python3.4 | no       | yes     |
| CentOS Linux release 7.4.1708 | Python3.6 | no       | yes     |

On Ubuntu 14.04 LTS versions of python2.7 and python3.4 the __dict__ property worked as expected. The _asdict method also worked, but I’m inclined to use the standards-defined, uniform, property api instead of the localized non-uniform api.

$ python2.7

# Works on:
# Python 2.7.6 (default, Jun 22 2015, 17:58:13)  [GCC 4.8.2] on linux2
# Python 3.4.3 (default, Oct 14 2015, 20:28:29)  [GCC 4.8.4] on linux

import collections

Color = collections.namedtuple('Color', ['r', 'g', 'b'])
red = Color(r=256, g=0, b=0)

# Access the namedtuple as a dict
print(red.__dict__['r'])  # 256

# Drop the namedtuple only keeping the dict
red = red.__dict__
print(red['r'])  #256

I see __dict__ as the semantic way to get a dictionary representing something (at least to the best of my knowledge).


It would be nice to accumulate a table of major python versions and platforms and their support for __dict__, currently I only have one platform version and two python versions as posted above.

| Platform                      | PyVer     | __dict__ | _asdict |
| --------------------------    | --------- | -------- | ------- |
| Ubuntu 14.04 LTS              | Python2.7 | yes      | yes     |
| Ubuntu 14.04 LTS              | Python3.4 | yes      | yes     |
| CentOS Linux release 7.4.1708 | Python2.7 | no       | yes     |
| CentOS Linux release 7.4.1708 | Python3.4 | no       | yes     |
| CentOS Linux release 7.4.1708 | Python3.6 | no       | yes     |

回答 3

案例1:一维元组

TUPLE_ROLES = (
    (912,"Role 21"),
    (913,"Role 22"),
    (925,"Role 23"),
    (918,"Role 24"),
)


TUPLE_ROLES[912]  #==> IndexError, because the index is out of bounds.
TUPLE_ROLES[  2]  #==> will show (925, "Role 23").
DICT1_ROLE = {k:v for k, v in TUPLE_ROLES }
DICT1_ROLE[925] # will display "Role 23" 

情况2:二维元组
示例:DICT_ROLES [961]#将显示“后端编程器”

NAMEDTUPLE_ROLES = (
    ('Company', ( 
            ( 111, 'Owner/CEO/President'), 
            ( 113, 'Manager'),
            ( 115, 'Receptionist'),
            ( 117, 'Marketer'),
            ( 119, 'Sales Person'),
            ( 121, 'Accountant'),
            ( 123, 'Director'),
            ( 125, 'Vice President'),
            ( 127, 'HR Specialist'),
            ( 141, 'System Operator'),
    )),
    ('Restaurant', ( 
            ( 211, 'Chef'), 
            ( 212, 'Waiter/Waitress'), 
    )),
    ('Oil Collector', ( 
            ( 211, 'Truck Driver'), 
            ( 213, 'Tank Installer'), 
            ( 217, 'Welder'),
            ( 218, 'In-house Handler'),
            ( 219, 'Dispatcher'),
    )),
    ('Information Technology', ( 
            ( 912, 'Server Administrator'),
            ( 914, 'Graphic Designer'),
            ( 916, 'Project Manager'),
            ( 918, 'Consultant'),
            ( 921, 'Business Logic Analyzer'),
            ( 923, 'Data Model Designer'),
            ( 951, 'Programmer'),
            ( 953, 'WEB Front-End Programmer'),
            ( 955, 'Android Programmer'),
            ( 957, 'iOS Programmer'),
            ( 961, 'Back-End Programmer'),
            ( 962, 'Fullstack Programmer'),
            ( 971, 'System Architect'),
    )),
)

#Thus, we need dictionary/set

T4 = {}
def main():
    for k, v in NAMEDTUPLE_ROLES:
        for k1, v1 in v:
            T4.update ( {k1:v1}  )
    print (T4[961]) # will display 'Back-End Programmer'
    # print (T4) # will display all list of dictionary

main()

Case #1: one dimension tuple

TUPLE_ROLES = (
    (912,"Role 21"),
    (913,"Role 22"),
    (925,"Role 23"),
    (918,"Role 24"),
)


TUPLE_ROLES[912]  #==> IndexError, because the index is out of bounds.
TUPLE_ROLES[  2]  #==> will show (925, "Role 23").
DICT1_ROLE = {k:v for k, v in TUPLE_ROLES }
DICT1_ROLE[925] # will display "Role 23" 

Case #2: Two dimension tuple
Example: DICT_ROLES[961] # will show ‘Back-End Programmer’

NAMEDTUPLE_ROLES = (
    ('Company', ( 
            ( 111, 'Owner/CEO/President'), 
            ( 113, 'Manager'),
            ( 115, 'Receptionist'),
            ( 117, 'Marketer'),
            ( 119, 'Sales Person'),
            ( 121, 'Accountant'),
            ( 123, 'Director'),
            ( 125, 'Vice President'),
            ( 127, 'HR Specialist'),
            ( 141, 'System Operator'),
    )),
    ('Restaurant', ( 
            ( 211, 'Chef'), 
            ( 212, 'Waiter/Waitress'), 
    )),
    ('Oil Collector', ( 
            ( 211, 'Truck Driver'), 
            ( 213, 'Tank Installer'), 
            ( 217, 'Welder'),
            ( 218, 'In-house Handler'),
            ( 219, 'Dispatcher'),
    )),
    ('Information Technology', ( 
            ( 912, 'Server Administrator'),
            ( 914, 'Graphic Designer'),
            ( 916, 'Project Manager'),
            ( 918, 'Consultant'),
            ( 921, 'Business Logic Analyzer'),
            ( 923, 'Data Model Designer'),
            ( 951, 'Programmer'),
            ( 953, 'WEB Front-End Programmer'),
            ( 955, 'Android Programmer'),
            ( 957, 'iOS Programmer'),
            ( 961, 'Back-End Programmer'),
            ( 962, 'Fullstack Programmer'),
            ( 971, 'System Architect'),
    )),
)

#Thus, we need dictionary/set

T4 = {}
def main():
    for k, v in NAMEDTUPLE_ROLES:
        for k1, v1 in v:
            T4.update ( {k1:v1}  )
    print (T4[961]) # will display 'Back-End Programmer'
    # print (T4) # will display all list of dictionary

main()

回答 4

如果没有_asdict(),则可以使用以下方式:

def to_dict(model):
    new_dict = {}
    keys = model._fields
    index = 0
    for key in keys:
        new_dict[key] = model[index]
        index += 1

    return new_dict

If _asdict() is not available, you can do it this way:

def to_dict(model):
    new_dict = {}
    keys = model._fields
    index = 0
    for key in keys:
        new_dict[key] = model[index]
        index += 1

    return new_dict
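
The same loop can probably be written more compactly with zip, which pairs each field name with the value at the same position. This is a sketch rather than a drop-in replacement; the Point type is a hypothetical example.

```python
from collections import namedtuple

def to_dict(model):
    """Build a dict from a namedtuple using only its _fields attribute."""
    # zip pairs each field name with the value at the same index.
    return dict(zip(model._fields, model))

Point = namedtuple('Point', ['x', 'y'])  # hypothetical example type
point_dict = to_dict(Point(1, 2))
print(point_dict)  # {'x': 1, 'y': 2}
```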

回答 5

Python 3.将任何字段分配给字典作为字典的必需索引,我使用了“名称”。

import collections

Town = collections.namedtuple("Town", "name population coordinates capital state_bird")

town_list = []

town_list.append(Town('Town 1', '10', '10.10', 'Capital 1', 'Turkey'))
town_list.append(Town('Town 2', '11', '11.11', 'Capital 2', 'Duck'))

town_dictionary = {t.name: t for t in town_list}

Python 3: pick any field to serve as the dictionary key; I used ‘name’.

import collections

Town = collections.namedtuple("Town", "name population coordinates capital state_bird")

town_list = []

town_list.append(Town('Town 1', '10', '10.10', 'Capital 1', 'Turkey'))
town_list.append(Town('Town 2', '11', '11.11', 'Capital 2', 'Duck'))

town_dictionary = {t.name: t for t in town_list}

将“熊猫”列中的字典/列表拆分为单独的列

问题:将“熊猫”列中的字典/列表拆分为单独的列

我将数据保存在postgreSQL数据库中。我正在使用Python2.7查询此数据并将其转换为Pandas DataFrame。但是,此数据框的最后一列中包含值的字典(或列表?)。DataFrame看起来像这样:

[1] df
Station ID     Pollutants
8809           {"a": "46", "b": "3", "c": "12"}
8810           {"a": "36", "b": "5", "c": "8"}
8811           {"b": "2", "c": "7"}
8812           {"c": "11"}
8813           {"a": "82", "c": "15"}

我需要将此列拆分为单独的列,以便DataFrame如下所示:

[2] df2
Station ID     a      b       c
8809           46     3       12
8810           36     5       8
8811           NaN    2       7
8812           NaN    NaN     11
8813           82     NaN     15

我遇到的主要问题是列表的长度不同。但是所有列表最多只能包含相同的3个值:a,b和c。而且它们始终以相同的顺序出现(第一,第二,第三)。

以下代码用于工作并返回我想要的内容(df2)。

[3] df 
[4] objs = [df, pandas.DataFrame(df['Pollutant Levels'].tolist()).iloc[:, :3]]
[5] df2 = pandas.concat(objs, axis=1).drop('Pollutant Levels', axis=1)
[6] print(df2)

我上周才运行此代码,并且运行良好。但是现在我的代码坏了,我从第[4]行得到了这个错误:

IndexError: out-of-bounds on slice (end) 

我没有对代码进行任何更改,但是现在出现了错误。我觉得这是由于我的方法不够健壮或不合适。

对于如何将列表的此列拆分为单独的列的任何建议或指导,将不胜感激!

编辑:我认为.tolist()和.apply方法不适用于我的代码,因为它是一个unicode字符串,即:

#My data format 
u{'a': '1', 'b': '2', 'c': '3'}

#and not
{u'a': '1', u'b': '2', u'c': '3'}

数据是从PostgreSQL数据库以这种格式导入的。这个问题有什么帮助或想法吗?有没有办法转换unicode?

I have data saved in a postgreSQL database. I am querying this data using Python2.7 and turning it into a Pandas DataFrame. However, the last column of this dataframe has a dictionary (or list?) of values within it. The DataFrame looks like this:

[1] df
Station ID     Pollutants
8809           {"a": "46", "b": "3", "c": "12"}
8810           {"a": "36", "b": "5", "c": "8"}
8811           {"b": "2", "c": "7"}
8812           {"c": "11"}
8813           {"a": "82", "c": "15"}

I need to split this column into separate columns so that the DataFrame looks like this:

[2] df2
Station ID     a      b       c
8809           46     3       12
8810           36     5       8
8811           NaN    2       7
8812           NaN    NaN     11
8813           82     NaN     15

The major issue I’m having is that the lists are not the same lengths. But all of the lists only contain up to the same 3 values: a, b, and c. And they always appear in the same order (a first, b second, c third).

The following code USED to work and return exactly what I wanted (df2).

[3] df 
[4] objs = [df, pandas.DataFrame(df['Pollutant Levels'].tolist()).iloc[:, :3]]
[5] df2 = pandas.concat(objs, axis=1).drop('Pollutant Levels', axis=1)
[6] print(df2)

I was running this code just last week and it was working fine. But now my code is broken and I get this error from line [4]:

IndexError: out-of-bounds on slice (end) 

I made no changes to the code but am now getting the error. I feel this is due to my method not being robust or proper.

Any suggestions or guidance on how to split this column of lists into separate columns would be super appreciated!

EDIT: I think the .tolist() and .apply methods are not working on my code because it is one Unicode string, i.e.:

#My data format 
u{'a': '1', 'b': '2', 'c': '3'}

#and not
{u'a': '1', u'b': '2', u'c': '3'}

The data is imported from the PostgreSQL database in this format. Any help or ideas with this issue? Is there a way to convert the Unicode?


回答 0

要将字符串转换为实际的dict,可以执行df['Pollutant Levels'].map(eval)。之后,可以使用以下解决方案将dict转换为不同的列。


通过一个小例子,您可以使用.apply(pd.Series)

In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

In [3]: df
Out[3]:
   a                   b
0  1           {u'c': 1}
1  2           {u'd': 3}
2  3  {u'c': 5, u'd': 6}

In [4]: df['b'].apply(pd.Series)
Out[4]:
     c    d
0  1.0  NaN
1  NaN  3.0
2  5.0  6.0

要将其与数据框的其余部分合并,可以concat将其他列与上述结果结合在一起:

In [7]: pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
Out[7]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0

使用我的代码,如果我省略了这一iloc部分,这也可以工作:

In [15]: pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)
Out[15]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0

To convert the string to an actual dict, you can do df['Pollutant Levels'].map(eval). Afterwards, the solution below can be used to convert the dict to different columns.


Using a small example, you can use .apply(pd.Series):

In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

In [3]: df
Out[3]:
   a                   b
0  1           {u'c': 1}
1  2           {u'd': 3}
2  3  {u'c': 5, u'd': 6}

In [4]: df['b'].apply(pd.Series)
Out[4]:
     c    d
0  1.0  NaN
1  NaN  3.0
2  5.0  6.0

To combine it with the rest of the dataframe, you can concat the other columns with the above result:

In [7]: pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
Out[7]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0

Using your code, this also works if I leave out the iloc part:

In [15]: pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)
Out[15]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
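
A note on the eval step: if the strings come from a source that is not fully trusted, ast.literal_eval is a more cautious choice, since it accepts only Python literals. The sketch below uses made-up raw strings and plain lists so it runs without pandas; with a DataFrame the analogous call would presumably be df['Pollutant Levels'].map(ast.literal_eval).

```python
import ast

# Hypothetical raw values, as they might arrive from the database as text.
raw_values = [
    '{"a": "46", "b": "3", "c": "12"}',
    '{"b": "2", "c": "7"}',
    '{"c": "11"}',
]

# literal_eval parses literals only; anything else raises an exception
# instead of being executed.
parsed = [ast.literal_eval(value) for value in raw_values]
print(parsed[1])  # {'b': '2', 'c': '7'}
```

The resulting list of dicts has the shape that the tolist()-based solution above feeds into pd.DataFrame.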

回答 1

我知道这个问题已经很老了,但是我到这里来寻找答案。实际上,现在有一种更好(更快)的方法json_normalize

import pandas as pd

df2 = pd.json_normalize(df['Pollutant Levels'])

这避免了昂贵的应用功能…

I know the question is quite old, but I got here searching for answers. There is actually a better (and faster) way now of doing this using json_normalize:

import pandas as pd

df2 = pd.json_normalize(df['Pollutant Levels'])

This avoids costly apply functions…


回答 2

尝试以下操作: 从SQL返回的数据必须转换为Dict。 还是 "Pollutant Levels" 现在Pollutants'

   StationID                   Pollutants
0       8809  {"a":"46","b":"3","c":"12"}
1       8810   {"a":"36","b":"5","c":"8"}
2       8811            {"b":"2","c":"7"}
3       8812                   {"c":"11"}
4       8813          {"a":"82","c":"15"}


df["Pollutants"] = df["Pollutants"].apply(lambda x: dict(eval(x)))
df3 = df["Pollutants"].apply(pd.Series)

    a    b   c
0   46    3  12
1   36    5   8
2  NaN    2   7
3  NaN  NaN  11
4   82  NaN  15


result = pd.concat([df, df3], axis=1).drop('Pollutants', axis=1)
result

   StationID    a    b   c
0       8809   46    3  12
1       8810   36    5   8
2       8811  NaN    2   7
3       8812  NaN  NaN  11
4       8813   82  NaN  15

Try this. The data returned from SQL has to be converted into a dict. Also, could it be that "Pollutant Levels" is now called "Pollutants"?

   StationID                   Pollutants
0       8809  {"a":"46","b":"3","c":"12"}
1       8810   {"a":"36","b":"5","c":"8"}
2       8811            {"b":"2","c":"7"}
3       8812                   {"c":"11"}
4       8813          {"a":"82","c":"15"}


df["Pollutants"] = df["Pollutants"].apply(lambda x: dict(eval(x)))
df3 = df["Pollutants"].apply(pd.Series)

    a    b   c
0   46    3  12
1   36    5   8
2  NaN    2   7
3  NaN  NaN  11
4   82  NaN  15


result = pd.concat([df, df3], axis=1).drop('Pollutants', axis=1)
result

   StationID    a    b   c
0       8809   46    3  12
1       8810   36    5   8
2       8811  NaN    2   7
3       8812  NaN  NaN  11
4       8813   82  NaN  15

回答 3

Merlin的答案更好,更简单,但是我们不需要lambda函数。可以通过以下两种方式之一安全地忽略对字典的评估:

方法1:两步

# step 1: expand the `Pollutants` column into its own pandas DataFrame
df_pol_ps = df['Pollutants'].apply(pd.Series)

df_pol_ps:
    a   b   c
0   46  3   12
1   36  5   8
2   NaN 2   7
3   NaN NaN 11
4   82  NaN 15

# step 2: concat columns `a, b, c` and drop/remove the `Pollutants` 
df_final = pd.concat([df, df_pol_ps], axis = 1).drop('Pollutants', axis = 1)

df_final:
    StationID   a   b   c
0   8809    46  3   12
1   8810    36  5   8
2   8811    NaN 2   7
3   8812    NaN NaN 11
4   8813    82  NaN 15

方式2:以上两个步骤可以一并组合:

df_final = pd.concat([df, df['Pollutants'].apply(pd.Series)], axis = 1).drop('Pollutants', axis = 1)

df_final:
    StationID   a   b   c
0   8809    46  3   12
1   8810    36  5   8
2   8811    NaN 2   7
3   8812    NaN NaN 11
4   8813    82  NaN 15

Merlin’s answer is better and very simple, but we don’t need a lambda function. The dictionary evaluation step can be skipped in either of the following two ways:

Way 1: Two steps

# step 1: expand the `Pollutants` column into its own pandas DataFrame
df_pol_ps = df['Pollutants'].apply(pd.Series)

df_pol_ps:
    a   b   c
0   46  3   12
1   36  5   8
2   NaN 2   7
3   NaN NaN 11
4   82  NaN 15

# step 2: concat columns `a, b, c` and drop/remove the `Pollutants` 
df_final = pd.concat([df, df_pol_ps], axis = 1).drop('Pollutants', axis = 1)

df_final:
    StationID   a   b   c
0   8809    46  3   12
1   8810    36  5   8
2   8811    NaN 2   7
3   8812    NaN NaN 11
4   8813    82  NaN 15

Way 2: The above two steps can be combined in one go:

df_final = pd.concat([df, df['Pollutants'].apply(pd.Series)], axis = 1).drop('Pollutants', axis = 1)

df_final:
    StationID   a   b   c
0   8809    46  3   12
1   8810    36  5   8
2   8811    NaN 2   7
3   8812    NaN NaN 11
4   8813    82  NaN 15

回答 4

我强烈建议该方法提取“污染物”列:

df_pollutants = pd.DataFrame(df['Pollutants'].values.tolist(), index=df.index)

它比

df_pollutants = df['Pollutants'].apply(pd.Series)

当df的大小很大时。

I strongly recommend this method for extracting the ‘Pollutants’ column:

df_pollutants = pd.DataFrame(df['Pollutants'].values.tolist(), index=df.index)

it’s much faster than

df_pollutants = df['Pollutants'].apply(pd.Series)

when df is very large.


回答 5

你可以用joinpop+ tolist。性能concatdrop+ 相当tolist,但有些人可能会发现此语法更简洁:

res = df.join(pd.DataFrame(df.pop('b').tolist()))

使用其他方法进行基准测试:

df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

def joris1(df):
    return pd.concat([df.drop('b', axis=1), df['b'].apply(pd.Series)], axis=1)

def joris2(df):
    return pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)

def jpp(df):
    return df.join(pd.DataFrame(df.pop('b').tolist()))

df = pd.concat([df]*1000, ignore_index=True)

%timeit joris1(df.copy())  # 1.33 s per loop
%timeit joris2(df.copy())  # 7.42 ms per loop
%timeit jpp(df.copy())     # 7.68 ms per loop

You can use join with pop + tolist. Performance is comparable to concat with drop + tolist, but some may find this syntax cleaner:

res = df.join(pd.DataFrame(df.pop('b').tolist()))

Benchmarking with other methods:

df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

def joris1(df):
    return pd.concat([df.drop('b', axis=1), df['b'].apply(pd.Series)], axis=1)

def joris2(df):
    return pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)

def jpp(df):
    return df.join(pd.DataFrame(df.pop('b').tolist()))

df = pd.concat([df]*1000, ignore_index=True)

%timeit joris1(df.copy())  # 1.33 s per loop
%timeit joris2(df.copy())  # 7.42 ms per loop
%timeit jpp(df.copy())     # 7.68 ms per loop

回答 6

一种解决方案如下:

>>> df = pd.concat([df['Station ID'], df['Pollutants'].apply(pd.Series)], axis=1)
>>> print(df)
   Station ID    a    b   c
0        8809   46    3  12
1        8810   36    5   8
2        8811  NaN    2   7
3        8812  NaN  NaN  11
4        8813   82  NaN  15

A one-line solution follows:

>>> df = pd.concat([df['Station ID'], df['Pollutants'].apply(pd.Series)], axis=1)
>>> print(df)
   Station ID    a    b   c
0        8809   46    3  12
1        8810   36    5   8
2        8811  NaN    2   7
3        8812  NaN  NaN  11
4        8813   82  NaN  15

回答 7

my_df = pd.DataFrame.from_dict(my_dict, orient='index', columns=['my_col'])

..本可以正确解析字典(将每个字典键放入单独的df列中,并将键值放入df行中),因此这些dict首先不会被压入单个列中。

my_df = pd.DataFrame.from_dict(my_dict, orient='index', columns=['my_col'])

.. would have parsed the dict properly (putting each dict key into a separate df column, and key values into df rows), so the dicts would not get squashed into a single column in the first place.


回答 8

我将这些步骤串联在一个方法中,您只需要传递数据框和包含扩展字典的列即可:

import json
from typing import Dict

import pandas as pd


def expand_dataframe(dw: pd.DataFrame, column_to_expand: str) -> pd.DataFrame:
    """
    dw: DataFrame with a column that contains dicts (as strings) to expand
        into columns
    column_to_expand: name of that column in dw
    """

    def convert_to_dict(sequence: str) -> Dict:
        # Swap single quotes for double quotes so json can parse the string.
        json_acceptable_string = sequence.replace("'", "\"")
        return json.loads(json_acceptable_string)

    expanded_dataframe = pd.concat([dw.drop([column_to_expand], axis=1),
                                    dw[column_to_expand]
                                    .apply(convert_to_dict)
                                    .apply(pd.Series)],
                                   axis=1)
    return expanded_dataframe

I’ve combined those steps into one function; you only have to pass the dataframe and the name of the column that contains the dict to expand:

import json
from typing import Dict

import pandas as pd


def expand_dataframe(dw: pd.DataFrame, column_to_expand: str) -> pd.DataFrame:
    """
    dw: DataFrame with a column that contains dicts (as strings) to expand
        into columns
    column_to_expand: name of that column in dw
    """

    def convert_to_dict(sequence: str) -> Dict:
        # Swap single quotes for double quotes so json can parse the string.
        json_acceptable_string = sequence.replace("'", "\"")
        return json.loads(json_acceptable_string)

    expanded_dataframe = pd.concat([dw.drop([column_to_expand], axis=1),
                                    dw[column_to_expand]
                                    .apply(convert_to_dict)
                                    .apply(pd.Series)],
                                   axis=1)
    return expanded_dataframe

回答 9

df = pd.concat([df['a'], df.b.apply(pd.Series)], axis=1)

TypeError:“ dict_keys”对象不支持索引

问题:TypeError:“ dict_keys”对象不支持索引

def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    randbelow = self._randbelow
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i+1) if random is None else int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

当我运行该shuffle函数时,它会引发以下错误,这是为什么呢?

TypeError: 'dict_keys' object does not support indexing

def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    randbelow = self._randbelow
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i+1) if random is None else int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

When I run the shuffle function it raises the following error, why is that?

TypeError: 'dict_keys' object does not support indexing

回答 0

显然,您正在传递d.keys()shuffle函数。可能是用python2.x编写的(d.keys()返回列表时)。使用python3.x,d.keys()返回一个dict_keys对象,其行为更像a而set不是alist。因此,无法对其进行索引。

解决方案是将list(d.keys())(或简单地list(d))传递给shuffle

Clearly you’re passing in d.keys() to your shuffle function. Probably this was written with python2.x (when d.keys() returned a list). With python3.x, d.keys() returns a dict_keys object which behaves a lot more like a set than a list. As such, it can’t be indexed.

The solution is to pass list(d.keys()) (or simply list(d)) to shuffle.
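
A minimal sketch of that fix, using the standard library's random.shuffle and a small made-up dictionary:

```python
import random

d = {'a': 1, 'b': 2, 'c': 3}

# d.keys() is a view and cannot be indexed, so copy the keys into a list first.
keys = list(d)
random.shuffle(keys)  # shuffles the list in place and returns None

print(keys)
```

The dictionary itself is untouched; only the copied list of keys is reordered.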


回答 1

您将把结果传递somedict.keys()给函数。在Python 3中,dict.keys它不返回列表,但是代表字典键视图的(类似于集合)的类似集合的对象不支持索引。

要解决此问题,请使用list(somedict.keys())来收集密钥并进行处理。

You’re passing the result of somedict.keys() to the function. In Python 3, dict.keys doesn’t return a list, but a set-like object that represents a view of the dictionary’s keys and (being set-like) doesn’t support indexing.

To fix the problem, use list(somedict.keys()) to collect the keys, and work with that.


回答 2

将迭代器转换为列表可能会产生成本。相反,要获得第一项,可以使用:

next(iter(keys))

或者,如果要遍历所有项目,则可以使用:

items = iter(keys)
while True:
    try:
        item = next(items)
    except StopIteration:
        break  # finished

Converting an iterable to a list may have a cost. Instead, to get the first item, you can use:

next(iter(keys))

Or, if you want to iterate over all items, you can use:

items = iter(keys)
while True:
    try:
        item = next(items)
    except StopIteration:
        break  # finished
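
For comparison, a plain for loop performs the same iteration, since it calls iter() and next() behind the scenes and stops when StopIteration is raised. The dictionary here is a made-up example.

```python
d = {'a': 1, 'b': 2, 'c': 3}
keys = d.keys()

# The for loop handles iter(), next() and StopIteration internally.
seen = []
for key in keys:
    seen.append(key)

first_key = next(iter(keys))  # first key without building a list
print(first_key, seen)
```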

回答 3

为什么需要在已经存在的情况下实施改组?留在巨人的肩膀上。

import random

d1 = {0:'zero', 1:'one', 2:'two', 3:'three', 4:'four',
     5:'five', 6:'six', 7:'seven', 8:'eight', 9:'nine'}

keys = list(d1)
random.shuffle(keys)

d2 = {}
for key in keys: d2[key] = d1[key]

print(d1)
print(d2)

Why implement shuffle yourself when it already exists? Stand on the shoulders of giants.

import random

d1 = {0:'zero', 1:'one', 2:'two', 3:'three', 4:'four',
     5:'five', 6:'six', 7:'seven', 8:'eight', 9:'nine'}

keys = list(d1)
random.shuffle(keys)

d2 = {}
for key in keys: d2[key] = d1[key]

print(d1)
print(d2)

回答 4

在Python 2中,dict.keys()返回一个列表,而在Python 3中,它返回一个生成器。

您只能遍历其值,否则可能必须将其显式转换为列表,即将其传递给列表函数。

In Python 2, dict.keys() returns a list, whereas in Python 3 it returns a view object (dict_keys), which cannot be indexed.

You can iterate over its values directly; if you need indexing, convert it explicitly to a list by passing it to the list function.


获取嵌套字典值的Python安全方法

问题:获取嵌套字典值的Python安全方法

我有一本嵌套的字典。只有一种方法可以安全地获取价值吗?

try:
    example_dict['key1']['key2']
except KeyError:
    pass

也许python有像get()嵌套字典这样的方法?

I have a nested dictionary. Is there only one way to get values out safely?

try:
    example_dict['key1']['key2']
except KeyError:
    pass

Or maybe python has a method like get() for nested dictionary ?


回答 0

您可以使用get两次:

example_dict.get('key1', {}).get('key2')

None如果存在key1key2不存在,它将返回。

请注意,这仍可能引发AttributeErrorif example_dict['key1']存在但不是dict(或带有get方法的类似dict的对象)。try..except如果发布的代码无法订阅,则会引发一个TypeError代替example_dict['key1']

另一个区别是try...except在第一个丢失的键之后立即发生短路。get呼叫链没有。


如果您希望保留语法,example_dict['key1']['key2']但不希望它引发KeyErrors,则可以使用Hasher配方

class Hasher(dict):
    # https://stackoverflow.com/a/3405143/190597
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

example_dict = Hasher()
print(example_dict['key1'])
# {}
print(example_dict['key1']['key2'])
# {}
print(type(example_dict['key1']['key2']))
# <class '__main__.Hasher'>

请注意,如果缺少密钥,这将返回一个空的哈希器。

因为Hasher是的子类,所以dict您可以像使用一样使用Hasher dict。可以使用所有相同的方法和语法,而Hashers只是以不同方式对待丢失的密钥。

您可以将常规dict转换成Hasher这样:

hasher = Hasher(example_dict)

并轻松将其转换Hasher为常规dict

regular_dict = dict(hasher)

另一种选择是在帮助函数中隐藏丑陋:

def safeget(dct, *keys):
    for key in keys:
        try:
            dct = dct[key]
        except KeyError:
            return None
    return dct

因此,其余代码可以保持相对可读性:

safeget(example_dict, 'key1', 'key2')

You could use get twice:

example_dict.get('key1', {}).get('key2')

This will return None if either key1 or key2 does not exist.

Note that this could still raise an AttributeError if example_dict['key1'] exists but is not a dict (or a dict-like object with a get method). The try..except code you posted would raise a TypeError instead if example_dict['key1'] is unsubscriptable.

Another difference is that the try...except short-circuits immediately after the first missing key. The chain of get calls does not.


If you wish to preserve the syntax, example_dict['key1']['key2'] but do not want it to ever raise KeyErrors, then you could use the Hasher recipe:

class Hasher(dict):
    # https://stackoverflow.com/a/3405143/190597
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

example_dict = Hasher()
print(example_dict['key1'])
# {}
print(example_dict['key1']['key2'])
# {}
print(type(example_dict['key1']['key2']))
# <class '__main__.Hasher'>

Note that this returns an empty Hasher when a key is missing.

Since Hasher is a subclass of dict you can use a Hasher in much the same way you could use a dict. All the same methods and syntax is available, Hashers just treat missing keys differently.

You can convert a regular dict into a Hasher like this:

hasher = Hasher(example_dict)

and convert a Hasher to a regular dict just as easily:

regular_dict = dict(hasher)

Another alternative is to hide the ugliness in a helper function:

def safeget(dct, *keys):
    for key in keys:
        try:
            dct = dct[key]
        except KeyError:
            return None
    return dct

So the rest of your code can stay relatively readable:

safeget(example_dict, 'key1', 'key2')
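
One possible extension of this helper, sketched here under the assumption that a non-dict value in the middle of the path should also count as "missing": catch TypeError as well, and accept a default keyword (an addition for this illustration, not part of the original helper).

```python
def safeget(dct, *keys, default=None):
    """Follow keys through nested dicts; return default if any step fails."""
    for key in keys:
        try:
            dct = dct[key]
        except (KeyError, TypeError):
            # KeyError: the key is missing.
            # TypeError: the current value is not subscriptable (e.g. an int).
            return default
    return dct

example_dict = {'key1': {'key2': 'value'}, 'flat': 5}
found = safeget(example_dict, 'key1', 'key2')
missing = safeget(example_dict, 'key1', 'nope')
not_a_dict = safeget(example_dict, 'flat', 'key2', default='n/a')
print(found, missing, not_a_dict)
```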

回答 1

您还可以使用python reduce

from functools import reduce  # reduce is not a builtin in Python 3

def deep_get(dictionary, *keys):
    return reduce(lambda d, key: d.get(key) if d else None, keys, dictionary)

You could also use python reduce:

from functools import reduce  # reduce is not a builtin in Python 3

def deep_get(dictionary, *keys):
    return reduce(lambda d, key: d.get(key) if d else None, keys, dictionary)

回答 2

通过将此处所有这些答案与我所做的微小更改结合起来,我认为此功能将很有用。其安全,快速,易于维护。

from functools import reduce

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

范例:

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

Combining the answers here with a few small changes of my own, I think this function is useful. It's safe, quick, and easy to maintain.

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

Example:

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

Answer 3

Building on Yoav's answer, an even safer approach:

from functools import reduce

def deep_get(dictionary, *keys):
    return reduce(lambda d, key: d.get(key, None) if isinstance(d, dict) else None, keys, dictionary)

Answer 4

A recursive solution. It’s not the most efficient but I find it a bit more readable than the other examples and it doesn’t rely on functools.

def deep_get(d, keys):
    if not keys or d is None:
        return d
    return deep_get(d.get(keys[0]), keys[1:])

Example

d = {'meta': {'status': 'OK', 'status_code': 200}}
deep_get(d, ['meta', 'status_code'])     # => 200
deep_get(d, ['garbage', 'status_code'])  # => None

A more polished version

def deep_get(d, keys, default=None):
    """
    Example:
        d = {'meta': {'status': 'OK', 'status_code': 200}}
        deep_get(d, ['meta', 'status_code'])          # => 200
        deep_get(d, ['garbage', 'status_code'])       # => None
        deep_get(d, ['meta', 'garbage'], default='-') # => '-'
    """
    assert type(keys) is list
    if d is None:
        return default
    if not keys:
        return d
    return deep_get(d.get(keys[0]), keys[1:], default)

Answer 5

While the reduce approach is neat and short, I think a simple loop is easier to grok. I’ve also included a default parameter.

def deep_get(_dict, keys, default=None):
    for key in keys:
        if isinstance(_dict, dict):
            _dict = _dict.get(key, default)
        else:
            return default
    return _dict

As an exercise to understand how the reduce one-liner worked, I did the following. But ultimately the loop approach seems more intuitive to me.

from functools import reduce

def deep_get(_dict, keys, default=None):

    def _reducer(d, key):
        if isinstance(d, dict):
            return d.get(key, default)
        return default

    return reduce(_reducer, keys, _dict)

Usage

nested = {'a': {'b': {'c': 42}}}

print(deep_get(nested, ['a', 'b']))
print(deep_get(nested, ['a', 'b', 'z', 'z'], default='missing'))

Answer 6

I suggest you try python-benedict.

It is a dict subclass that provides keypath support and much more.

Installation: pip install python-benedict

from benedict import benedict

example_dict = benedict(example_dict, keypath_separator='.')

Now you can access nested values using a keypath:

val = example_dict['key1.key2']

# using 'get' method to avoid a possible KeyError:
val = example_dict.get('key1.key2')

Or access nested values using a list of keys:

val = example_dict['key1', 'key2']

# using get to avoid a possible KeyError:
val = example_dict.get(['key1', 'key2'])

It is well tested and open-source on GitHub:

https://github.com/fabiocaccamo/python-benedict

Note: I am the author of this project


Answer 7

A simple class that can wrap a dict, and retrieve based on a key:

class FindKey(dict):
    def get(self, path, default=None):
        keys = path.split(".")
        val = None

        for key in keys:
            if val:
                if isinstance(val, list):
                    val = [v.get(key, default) if v else None for v in val]
                else:
                    val = val.get(key, default)
            else:
                val = dict.get(self, key, default)

            if not val:
                break

        return val

For example:

person = {'person':{'name':{'first':'John'}}}
FindKey(person).get('person.name.first') # == 'John'

If the key doesn't exist, it returns None by default. You can override that by passing default= to get, for example:

FindKey(person).get('person.name.last', default='') # doesn't exist, so ''

Answer 8

To retrieve a second-level key, you can do this:

key2_value = (example_dict.get('key1') or {}).get('key2')
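The `or {}` matters when 'key1' is present but holds None (or another falsy value), because a default passed to `get` is only used when the key is absent. A small sketch with made-up data:

```python
example_dict = {'key1': None}

# The default {} is ignored here because 'key1' exists, so the first
# get() returns None and the second call fails.
try:
    example_dict.get('key1', {}).get('key2')
except AttributeError as exc:
    print('get with default failed:', exc)

# "or {}" replaces any falsy value (None, {}, 0, '') with an empty dict.
key2_value = (example_dict.get('key1') or {}).get('key2')
print(key2_value)  # None
```

A truthy non-dict value under 'key1', such as a non-empty string, would still raise AttributeError with this pattern.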

Answer 9

After seeing this for deeply getting attributes, I made the following to safely get nested dict values using dot notation. This works for me because my dicts are deserialized MongoDB objects, so I know the key names don’t contain .s. Also, in my context, I can specify a falsy fallback value (None) that I don’t have in my data, so I can avoid the try/except pattern when calling the function.

from functools import reduce # Python 3
def deepgetitem(obj, item, fallback=None):
    """Steps through an item chain to get the ultimate value.

    If ultimate value or path to value does not exist, does not raise
    an exception and instead returns `fallback`.

    >>> d = {'snl_final': {'about': {'_icsd': {'icsd_id': 1}}}}
    >>> deepgetitem(d, 'snl_final.about._icsd.icsd_id')
    1
    >>> deepgetitem(d, 'snl_final.about._sandbox.sbx_id')
    >>>
    """
    def getitem(obj, name):
        try:
            return obj[name]
        except (KeyError, TypeError):
            return fallback
    return reduce(getitem, item.split('.'), obj)

Answer 10

Yet another function for the same thing. It also returns a boolean indicating whether the key was found, and handles some unexpected errors.

'''
json : json to extract value from if exists
path : details.detail.first_name
            empty path represents root

returns a tuple (boolean, object)
        boolean : True if path exists, otherwise False
        object : the object if path exists otherwise None

'''
def get_json_value_at_path(json, path=None, default=None):

    if not bool(path):
        return True, json
    if type(json) is not dict :
        raise ValueError(f'json={json}, path={path} not supported, json must be a dict')
    if type(path) is not str and type(path) is not list:
        raise ValueError(f'path format {path} not supported, path can be a list of strings like [x,y,z] or a string like x.y.z')

    if type(path) is str:
        path = path.strip('.').split('.')
    key = path[0]
    if key in json.keys():
        return get_json_value_at_path(json[key], path[1:], default)
    else:
        return False, default

Example usage:

my_json = {'details' : {'first_name' : 'holla', 'last_name' : 'holla'}}
print(get_json_value_at_path(my_json, 'details.first_name', ''))
print(get_json_value_at_path(my_json, 'details.phone', ''))

(True, 'holla')

(False, '')


Answer 11

You can use pydash:

import pydash as _

_.get(example_dict, 'key1.key2', default='Default')

https://pydash.readthedocs.io/en/latest/api.html


Answer 12

An adaptation of unutbu's answer that I found useful in my own code:

example_dict.setdefault('key1', {}).get('key2')

It creates a dictionary entry for key1 if that key is not already present, so you avoid the KeyError. If you want to end up with a nested dictionary that includes that key pairing anyway, as I did, this seems like the easiest solution.
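A short sketch of that side effect, using invented variable names: the lookup itself inserts 'key1' when it is missing, which is convenient if you plan to fill it in later and surprising if you expected a read-only operation.

```python
empty = {}
value = empty.setdefault('key1', {}).get('key2')
print(value)   # None
print(empty)   # {'key1': {}}  (the lookup inserted 'key1')

existing = {'key1': {'key2': 'found'}}
print(existing.setdefault('key1', {}).get('key2'))  # found
print(existing)  # {'key1': {'key2': 'found'}}  (unchanged)
```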


Answer 13

Since raising a KeyError when one of the keys is missing is a reasonable thing to do, we can skip the check entirely and keep it as simple as this:

def get_dict(d, kl):
  cur = d[kl[0]]
  return get_dict(cur, kl[1:]) if len(kl) > 1 else cur
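The same "let it raise" behaviour can also be written without recursion. The following is an alternative sketch using `functools.reduce` with `operator.getitem`, not the answer's own code. One difference: with an empty key list it returns the dictionary itself, whereas the recursive version raises IndexError.

```python
from functools import reduce
from operator import getitem

def get_dict(d, kl):
    # Apply d[k] for each key in turn; a missing key raises KeyError.
    return reduce(getitem, kl, d)

nested = {'a': {'b': {'c': 42}}}
print(get_dict(nested, ['a', 'b', 'c']))  # 42

try:
    get_dict(nested, ['a', 'x', 'c'])
except KeyError as exc:
    print('missing key:', exc)
```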

Answer 14

A small improvement to the reduce approach so that it also works with lists. It also takes the data path as a dot-separated string instead of an array.

from functools import reduce

def deep_get(dictionary, path):
    keys = path.split('.')
    return reduce(lambda d, key: d[int(key)] if isinstance(d, list) else d.get(key) if d else None, keys, dictionary)
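A quick usage sketch with invented sample data, repeating the function so the snippet is self-contained. Missing dictionary keys come back as None, but a list index that is out of range still raises IndexError.

```python
from functools import reduce

def deep_get(dictionary, path):
    keys = path.split('.')
    return reduce(lambda d, key: d[int(key)] if isinstance(d, list) else d.get(key) if d else None, keys, dictionary)

data = {'users': [{'name': 'Ann'}, {'name': 'Bob'}]}
print(deep_get(data, 'users.1.name'))    # Bob
print(deep_get(data, 'users.0.email'))   # None
print(deep_get(data, 'missing.0.name'))  # None

try:
    deep_get(data, 'users.5.name')
except IndexError as exc:
    print('list index not handled:', exc)
```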

Answer 15

A solution I’ve used that is similar to the double get but with the additional ability to avoid a TypeError using if else logic:

    value = example_dict['key1']['key2'] if example_dict.get('key1') and example_dict['key1'].get('key2') else default_value

However, the more nested the dictionary the more cumbersome this becomes.


Answer 16

For nested dictionary/JSON lookups, you can use dictor:

pip install dictor

Given this dict object:

{
    "characters": {
        "Lonestar": {
            "id": 55923,
            "role": "renegade",
            "items": [
                "space winnebago",
                "leather jacket"
            ]
        },
        "Barfolomew": {
            "id": 55924,
            "role": "mawg",
            "items": [
                "peanut butter jar",
                "waggy tail"
            ]
        },
        "Dark Helmet": {
            "id": 99999,
            "role": "Good is dumb",
            "items": [
                "Shwartz",
                "helmet"
            ]
        },
        "Skroob": {
            "id": 12345,
            "role": "Spaceballs CEO",
            "items": [
                "luggage"
            ]
        }
    }
}

To get Lonestar's items, simply provide a dot-separated path, i.e.

import json
from dictor import dictor

with open('test.json') as data: 
    data = json.load(data)

print(dictor(data, 'characters.Lonestar.items'))

>> ['space winnebago', 'leather jacket']

You can provide a fallback value in case the key isn't in the path.

There are many more options, such as ignoring letter case and using characters other than '.' as the path separator:

https://github.com/perfecto25/dictor


Answer 17

I changed this answer slightly: I added a check for lists indexed by number, so it now works either way, e.g. deep_get(allTemp, [0], {}) or deep_get(getMinimalTemp, [0, minimalTemperatureKey], 26).

from functools import reduce

def deep_get(_dict, keys, default=None):
    def _reducer(d, key):
        if isinstance(d, dict):
            return d.get(key, default)
        if isinstance(d, list):
            # Guard against any out-of-range index, not just an empty list
            return d[key] if -len(d) <= key < len(d) else default
        return default
    return reduce(_reducer, keys, _dict)

Answer 18

There are already lots of good answers but I have come up with a function called get similar to lodash get in JavaScript land that also supports reaching into lists by index:

import re

def get(value, keys, default_value=None):
    '''
    Useful for reaching into nested JSON-like data.
    Inspired by JavaScript lodash get and Clojure get-in etc.
    '''
    if value is None or keys is None:
        return None
    path = keys.split('.') if isinstance(keys, str) else keys
    result = value

    def valid_index(key):
        # Only non-negative decimal strings without leading zeros count as indexes
        return re.match('^([1-9][0-9]*|[0-9])$', key) and int(key) >= 0

    def is_dict_like(v):
        return hasattr(v, '__getitem__') and hasattr(v, '__contains__')

    for key in path:
        if isinstance(result, list) and valid_index(key) and int(key) < len(result):
            result = result[int(key)]
        elif is_dict_like(result) and key in result:
            result = result[key]
        else:
            result = default_value
            break
    return result

def test_get():
  assert get(None, ['foo']) == None
  assert get({'foo': 1}, None) == None
  assert get(None, None) == None
  assert get({'foo': 1}, []) == {'foo': 1}
  assert get({'foo': 1}, ['foo']) == 1
  assert get({'foo': 1}, ['bar']) == None
  assert get({'foo': 1}, ['bar'], 'the default') == 'the default'
  assert get({'foo': {'bar': 'hello'}}, ['foo', 'bar']) == 'hello'
  assert get({'foo': {'bar': 'hello'}}, 'foo.bar') == 'hello'
  assert get({'foo': [{'bar': 'hello'}]}, 'foo.0.bar') == 'hello'
  assert get({'foo': [{'bar': 'hello'}]}, 'foo.1') == None
  assert get({'foo': [{'bar': 'hello'}]}, 'foo.1.bar') == None
  assert get(['foo', 'bar'], '1') == 'bar'
  assert get(['foo', 'bar'], '2') == None

Accessing items in a collections.OrderedDict by index

Question: Accessing items in a collections.OrderedDict by index

Let's say I have the following code:

import collections
d = collections.OrderedDict()
d['foo'] = 'python'
d['bar'] = 'spam'

Is there a way I can access the items in a numbered manner, like:

d(0) #foo's Output
d(1) #bar's Output

Answer 0

If it's an OrderedDict(), you can easily access the elements by index by first getting the (key, value) tuples, as follows:

>>> import collections
>>> d = collections.OrderedDict()
>>> d['foo'] = 'python'
>>> d['bar'] = 'spam'
>>> d.items()
[('foo', 'python'), ('bar', 'spam')]
>>> d.items()[0]
('foo', 'python')
>>> d.items()[1]
('bar', 'spam')

Note for Python 3.X

In Python 3, dict.items returns an iterable dict view object rather than a list. We need to wrap the call in list() to make indexing possible:

>>> items = list(d.items())
>>> items
[('foo', 'python'), ('bar', 'spam')]
>>> items[0]
('foo', 'python')
>>> items[1]
('bar', 'spam')

Answer 1

Do you have to use an OrderedDict or do you specifically want a map-like type that’s ordered in some way with fast positional indexing? If the latter, then consider one of Python’s many sorted dict types (which orders key-value pairs based on key sort order). Some implementations also support fast indexing. For example, the sortedcontainers project has a SortedDict type for just this purpose.

>>> from sortedcontainers import SortedDict
>>> sd = SortedDict()
>>> sd['foo'] = 'python'
>>> sd['bar'] = 'spam'
>>> sd.iloc[0] # Note that 'bar' comes before 'foo' in sort order.
'bar'
>>> # If you want the value, then simply do a key lookup:
>>> sd[sd.iloc[1]]
'python'

Answer 2

Here is a special case if you want the first entry (or close to it) in an OrderedDict, without creating a list. (This has been updated to Python 3):

>>> from collections import OrderedDict
>>> 
>>> d = OrderedDict()
>>> d["foo"] = "one"
>>> d["bar"] = "two"
>>> d["baz"] = "three"
>>> next(iter(d.items()))
('foo', 'one')
>>> next(iter(d.values()))
'one'

(The first time you say “next()”, it really means “first.”)

In my informal test, next(iter(d.items())) with a small OrderedDict is only a tiny bit faster than items()[0]. With an OrderedDict of 10,000 entries, next(iter(d.items())) was about 200 times faster than items()[0].

BUT if you save the items() list once and then use the list a lot, that could be faster. Or if you repeatedly { create an items() iterator and step through it to the position you want }, that could be slower.
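For positions other than the first, stepping an iterator to the wanted position can be sketched with `itertools.islice`. The helper name `nth_item` is made up for this example. It avoids building the full list but still walks n entries, so it remains O(n) per lookup.

```python
from collections import OrderedDict
from itertools import islice

def nth_item(od, n, default=None):
    # islice lazily skips the first n items; next() returns item n,
    # or the default if the dict is too short.
    return next(islice(od.items(), n, None), default)

d = OrderedDict([('foo', 'one'), ('bar', 'two'), ('baz', 'three')])
print(nth_item(d, 0))   # ('foo', 'one')
print(nth_item(d, 2))   # ('baz', 'three')
print(nth_item(d, 5))   # None
```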


Answer 3

It is dramatically more efficient to use IndexedOrderedDict from the indexed package.

Following Niklas’s comment, I have done a benchmark on OrderedDict and IndexedOrderedDict with 1000 entries.

In [1]: from numpy import *
In [2]: from indexed import IndexedOrderedDict
In [3]: id=IndexedOrderedDict(zip(arange(1000),random.random(1000)))
In [4]: timeit id.keys()[56]
1000000 loops, best of 3: 969 ns per loop

In [8]: from collections import OrderedDict
In [9]: od=OrderedDict(zip(arange(1000),random.random(1000)))
In [10]: timeit od.keys()[56]
10000 loops, best of 3: 104 µs per loop

In this particular case, IndexedOrderedDict is about 100 times faster at indexing an element at a specific position.


Answer 4

This community wiki attempts to collect existing answers.

Python 2.7

In Python 2, the keys(), values(), and items() methods of OrderedDict return lists. Using values as an example, the simplest way is

d.values()[0]  # "python"
d.values()[1]  # "spam"

For large collections where you only care about a single index, you can avoid creating the full list using the generator versions, iterkeys, itervalues and iteritems:

import itertools
next(itertools.islice(d.itervalues(), 0, 1))  # "python"
next(itertools.islice(d.itervalues(), 1, 2))  # "spam"

The indexed.py package provides IndexedOrderedDict, which is designed for this use case and will be the fastest option.

from indexed import IndexedOrderedDict
d = IndexedOrderedDict({'foo':'python','bar':'spam'})
d.values()[0]  # "python"
d.values()[1]  # "spam"

Using itervalues can be considerably faster for large dictionaries with random access:

$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 1000;   d = OrderedDict({i:i for i in range(size)})'  'i = randint(0, size-1); d.values()[i:i+1]'
1000 loops, best of 3: 259 usec per loop
$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 10000;  d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i:i+1]'
100 loops, best of 3: 2.3 msec per loop
$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 100000; d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i:i+1]'
10 loops, best of 3: 24.5 msec per loop

$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 1000;   d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
10000 loops, best of 3: 118 usec per loop
$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 10000;  d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
1000 loops, best of 3: 1.26 msec per loop
$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 100000; d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
100 loops, best of 3: 10.9 msec per loop

$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 1000;   d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.19 usec per loop
$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 10000;  d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.24 usec per loop
$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 100000; d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.61 usec per loop

+--------+-----------+----------------+--------------+
|  size  | list (ms) | generator (ms) | indexed (ms) |
+--------+-----------+----------------+--------------+
|   1000 | .259      | .118           | .00219       |
|  10000 | 2.3       | 1.26           | .00224       |
| 100000 | 24.5      | 10.9           | .00261       |
+--------+-----------+----------------+--------------+

Python 3.6

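Python 3 has the same two basic options (list vs generator), but the dict methods return view objects rather than lists by default. Views are iterable but not indexable, so they must either be converted to a list or consumed lazily.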

List method:

list(d.values())[0]  # "python"
list(d.values())[1]  # "spam"

Generator method:

import itertools
next(itertools.islice(d.values(), 0, 1))  # "python"
next(itertools.islice(d.values(), 1, 2))  # "spam"
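
The generator approach can be wrapped in a small helper. This is only a minimal sketch: the name nth_value and its default parameter are made up for illustration and are not part of the standard library.

```python
from collections import OrderedDict
from itertools import islice


def nth_value(d, n, default=None):
    """Return the n-th value of d in iteration order, or default if n is out of range."""
    # islice skips the first n values lazily, so no intermediate list is built.
    return next(islice(d.values(), n, None), default)


d = OrderedDict([('foo', 'python'), ('bar', 'spam')])
print(nth_value(d, 0))  # python
print(nth_value(d, 1))  # spam
print(nth_value(d, 5))  # None
```

The lookup still walks the dictionary from the start, so it remains linear in n; it only avoids allocating the full list.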

In these timings, Python 3 dictionaries are about an order of magnitude faster than Python 2, and the generator approach gives a similar relative speedup.

+--------+-----------+----------------+--------------+
|  size  | list (ms) | generator (ms) | indexed (ms) |
+--------+-----------+----------------+--------------+
|   1000 | .0316     | .0165          | .00262       |
|  10000 | .288      | .166           | .00294       |
| 100000 | 3.53      | 1.48           | .00332       |
+--------+-----------+----------------+--------------+

Answer 5

It’s a new era and with Python 3.6.1 dictionaries now retain their order. These semantics aren’t explicit because that would require BDFL approval. But Raymond Hettinger is the next best thing (and funnier) and he makes a pretty strong case that dictionaries will be ordered for a very long time.

So now it’s easy to create slices of a dictionary:

test_dict = {
    'first':  1,
    'second': 2,
    'third':  3,
    'fourth': 4
}

list(test_dict.items())[:2]  # [('first', 1), ('second', 2)]

Note: Dictionary insertion-order preservation is now official in Python 3.7.
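
For a large dictionary, the same slice can be taken without first copying every item into a list. A minimal sketch using itertools.islice, assuming Python 3.7+ so that insertion order is guaranteed:

```python
from itertools import islice

test_dict = {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}

# Take the first two items lazily and rebuild a small dict from them.
first_two = dict(islice(test_dict.items(), 2))
print(first_two)  # {'first': 1, 'second': 2}

# Items at positions 1 and 2 (zero-based), still in insertion order.
middle = list(islice(test_dict.items(), 1, 3))
print(middle)  # [('second', 2), ('third', 3)]
```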


Answer 6

For an OrderedDict, you can access elements by position by converting items() (the (key, value) pairs) or values() to a list and indexing the result. Python 3 output is shown:

>>> import collections
>>> d = collections.OrderedDict()
>>> d['foo'] = 'python'
>>> d['bar'] = 'spam'
>>> d.items()
odict_items([('foo', 'python'), ('bar', 'spam')])
>>> d.values()
odict_values(['python', 'spam'])
>>> list(d.values())
['python', 'spam']
>>> list(d.values())[0]
'python'

Why doesn't Python's dict.update() return the object?

Question: Why doesn't Python's dict.update() return the object?

I'm trying to do:

award_dict = {
    "url" : "http://facebook.com",
    "imageurl" : "http://farm4.static.flickr.com/3431/3939267074_feb9eb19b1_o.png",
    "count" : 1,
}

def award(name, count, points, desc_string, my_size, parent) :
    if my_size > count :
        a = {
            "name" : name,
            "description" : desc_string % count,
            "points" : points,
            "parent_award" : parent,
        }
        a.update(award_dict)
        return self.add_award(a, siteAlias, alias).award

But it felt really cumbersome in the function, and I would rather have done:

        return self.add_award({
            "name" : name,
            "description" : desc_string % count,
            "points" : points,
            "parent_award" : parent,
        }.update(award_dict), siteAlias, alias).award

Why doesn’t update return the object so you can chain?

jQuery does this to allow chaining. Why isn't it acceptable in Python?


Answer 0

Python’s mostly implementing a pragmatically tinged flavor of command-query separation: mutators return None (with pragmatically induced exceptions such as pop;-) so they can’t possibly be confused with accessors (and in the same vein, assignment is not an expression, the statement-expression separation is there, and so forth).

That doesn’t mean there aren’t a lot of ways to merge things up when you really want, e.g., dict(a, **award_dict) makes a new dict much like the one you appear to wish .update returned — so why not use THAT if you really feel it’s important?

Edit: btw, no need, in your specific case, to create a along the way, either:

dict(name=name, description=desc_string % count, points=points, parent_award=parent,
     **award_dict)

creates a single dict with the same contents as your a.update(award_dict) version, because none of the explicit keywords also appear in award_dict. If a key did appear in both places, this call would raise a TypeError (multiple values for the same keyword argument) instead of letting award_dict override the explicit entry. To have explicit entries win such conflicts, pass award_dict as the sole positional argument, before the keyword ones and without the ** form: dict(award_dict, name=name, ...).
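
The precedence rules of the dict constructor are easy to check with a small experiment. The sketch below uses made-up sample values (the key "count" is deliberately present on both sides) and only illustrates the behaviour of dict(mapping, **kwargs):

```python
a = {"name": "gold", "count": 10}
award_dict = {"url": "http://facebook.com", "count": 1}

# Positional mapping first, keyword entries second: the keyword entries win.
merged = dict(a, **award_dict)
print(merged["count"])  # 1
print(a["count"])       # 10 (a itself is untouched)

# Swapping the roles makes the explicit keyword entries win instead.
explicit_wins = dict(award_dict, name="gold", count=10)
print(explicit_wins["count"])  # 10

# Repeating a key as both an explicit keyword and a ** entry is an error.
conflict_raised = False
try:
    dict(count=10, **award_dict)
except TypeError:
    conflict_raised = True
print(conflict_raised)  # True
```

Both forms build a new dictionary, so neither a nor award_dict is modified.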


Answer 1

Python’s API, by convention, distinguishes between procedures and functions. Functions compute new values out of their parameters (including any target object); procedures modify objects and don’t return anything (i.e. they return None). So procedures have side effects, functions don’t. update is a procedure, hence it doesn’t return a value.

The motivation for doing it that way is that otherwise, you may get undesirable side effects. Consider

bar = foo.reverse()

If reverse (which reverses the list in place) also returned the list, users might think that reverse returns a new list which gets assigned to bar, and never notice that foo was modified as well. Because reverse returns None, they immediately recognize that bar is not the result of the reversal, and will look more closely at what reverse actually does.
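
The same convention shows up across the built-ins: mutating methods return None, while the function-style counterparts return a new object. A short illustration with a sample list:

```python
foo = [3, 1, 2]

# Procedure-style method: mutates foo in place and returns None.
bar = foo.reverse()
print(bar)  # None
print(foo)  # [2, 1, 3]

# Function-style built-ins: return a new object and leave the input alone.
baz = list(reversed(foo))
print(baz)  # [3, 1, 2]
print(foo)  # [2, 1, 3]

ordered = sorted(foo)
print(ordered)  # [1, 2, 3]
print(foo)      # [2, 1, 3]
```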


Answer 2

This is as easy as:

(lambda d: d.update(dict2) or d)(d1)

Answer 3

>>> dict_merge = lambda a,b: a.update(b) or a
>>> dict_merge({'a':1, 'b':3},{'c':5})
{'a': 1, 'c': 5, 'b': 3}

Note that as well as returning the merged dict, it modifies the first parameter in-place. So dict_merge(a,b) will modify a.

Or, of course, you can do it all inline:

>>> (lambda a,b: a.update(b) or a)({'a':1, 'b':3},{'c':5})
{'a': 1, 'c': 5, 'b': 3}
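
If the in-place modification is not wanted, a copy can be made first. A minimal sketch; the name merged_copy is hypothetical and chosen only for this example:

```python
def merged_copy(a, b):
    """Return a new dict with b's entries layered over a's; neither input changes."""
    result = dict(a)   # shallow copy of the first dict
    result.update(b)   # update() returns None, so return the dict explicitly
    return result


first = {'a': 1, 'b': 3}
second = {'c': 5}
combined = merged_copy(first, second)
print(combined == {'a': 1, 'b': 3, 'c': 5})  # True
print(first)  # {'a': 1, 'b': 3}
```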

Answer 4

I don't have enough reputation to comment on the top answer.

@beardc this doesn't seem to be a CPython-only thing. PyPy gives me “TypeError: keywords must be strings”.

The solution with **kwargs only works because the dictionary to be merged only has keys of type string.

i.e.

>>> dict({1:2}, **{3:4})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

vs

>>> dict({1:2}, **{'3':4})
{1: 2, '3': 4}
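
Approaches that do not route the second dict through keyword arguments avoid the restriction. A brief sketch, assuming Python 3.5+ for the unpacking syntax:

```python
int_keys = {1: 2}
more_int_keys = {3: 4}

# Dict-display unpacking does not go through keyword arguments,
# so non-string keys are accepted.
merged = {**int_keys, **more_int_keys}
print(merged)  # {1: 2, 3: 4}

# copy() followed by update() also works, on any Python version.
merged_again = int_keys.copy()
merged_again.update(more_int_keys)
print(merged_again == merged)  # True
```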

Answer 5

It's not that it isn't acceptable, but rather that dicts weren't implemented that way.

If you look at Django's ORM, it makes extensive use of chaining. It's not discouraged; you could even inherit from dict and override only update so that it performs the update and returns self, if you really want it.

class myDict(dict):
    def update(self, *args, **kwargs):
        # Forward keyword arguments too, as dict.update accepts them.
        dict.update(self, *args, **kwargs)
        return self
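
A usage sketch of the subclassing idea follows. The class name ChainableDict and the sample values are made up for illustration; the class is a standalone variant so the example runs on its own.

```python
class ChainableDict(dict):
    """dict whose update() returns self so that calls can be chained."""

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        return self


award = ChainableDict(name="gold").update({"points": 5}).update(count=1)
print(award)                  # {'name': 'gold', 'points': 5, 'count': 1}
print(type(award).__name__)   # ChainableDict
print(type(award.copy()).__name__)  # dict
```

As the last line shows, inherited methods such as copy() still return a plain dict, so the chaining behaviour is lost after copying.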

Answer 6

This is as close to your proposed solution as I could get:

from collections import ChainMap

return self.add_award(ChainMap(award_dict, {
    "name" : name,
    "description" : desc_string % count,
    "points" : points,
    "parent_award" : parent,
}), siteAlias, alias).award
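
A ChainMap does not copy anything: it looks keys up in each mapping in turn, and the first mapping that has the key wins. The sketch below uses made-up sample values to show that precedence, and how to take a plain-dict snapshot if the receiving code needs a real dict:

```python
from collections import ChainMap

award_dict = {"url": "http://facebook.com", "count": 1}
explicit = {"name": "gold", "count": 10}

view = ChainMap(award_dict, explicit)
print(view["count"])  # 1 (award_dict comes first, so it wins)
print(view["name"])   # gold

# A ChainMap is a view: later changes to the underlying dicts show through.
explicit["points"] = 5
print(view["points"])  # 5

# dict(...) takes a snapshot as an ordinary dictionary.
snapshot = dict(view)
print(snapshot["count"])  # 1
```

Putting award_dict first reproduces the precedence of a.update(award_dict), where the entries of award_dict override the explicit ones.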

Answer 7

For those coming late to the party, I have put some timings together (Python 3.7), showing that .update()-based methods look a bit (~5%) faster when the inputs are preserved and noticeably (~30%) faster when just updating in place.

As usual, all the benchmarks should be taken with a grain of salt.

def join2(dict1, dict2, inplace=False):
    result = dict1 if inplace else dict1.copy()
    result.update(dict2)
    return result


def join(*items):
    iter_items = iter(items)
    result = next(iter_items).copy()
    for item in iter_items:
        result.update(item)
    return result


def update_or(dict1, dict2):
    return dict1.update(dict2) or dict1


d1 = {i: str(i) for i in range(1000000)}
d2 = {str(i): i for i in range(1000000)}

%timeit join2(d1, d2)
# 258 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit join(d1, d2)
# 262 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dict(d1, **d2)
# 267 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit {**d1, **d2}
# 267 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Timing the in-place operations is a bit trickier, because every run has to start from a fresh copy of d1. The timings below therefore include an extra copy operation (the first timing is just for reference):

%timeit dd = d1.copy()
# 44.9 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit dd = d1.copy(); join2(dd, d2)
# 296 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dd = d1.copy(); join2(dd, d2, True)
# 234 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dd = d1.copy(); update_or(dd, d2)
# 235 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
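
The %timeit magic is only available in IPython/Jupyter. A rough equivalent with the standard timeit module is sketched below; the dictionaries are smaller than in the runs above so the script finishes quickly, and the absolute numbers will differ from machine to machine.

```python
import timeit

d1 = {i: str(i) for i in range(10000)}
d2 = {str(i): i for i in range(10000)}


def join2(dict1, dict2):
    """Merge into a copy of dict1, leaving both inputs untouched."""
    result = dict1.copy()
    result.update(dict2)
    return result


candidates = {
    "copy + update": lambda: join2(d1, d2),
    "{**d1, **d2}": lambda: {**d1, **d2},
    "dict(d1, **d2)": lambda: dict(d1, **d2),
}

for label, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy figure.
    best = min(timeit.repeat(func, number=20, repeat=3))
    print(f"{label:15s} {best / 20 * 1e3:.3f} ms per call")
```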

Answer 8

import itertools
# Python 2 only: dict.iteritems() does not exist on Python 3 dicts.
dict_merge = lambda *args: dict(itertools.chain(*[d.iteritems() for d in args]))
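
On Python 3 the same idea can be written with items(). This is a sketch of an equivalent, not the original author's code:

```python
from itertools import chain


def dict_merge(*dicts):
    """Merge any number of dicts into a new one; later dicts win on duplicate keys."""
    return dict(chain.from_iterable(d.items() for d in dicts))


print(dict_merge({'a': 1, 'b': 3}, {'c': 5}, {'a': 7}))  # {'a': 7, 'b': 3, 'c': 5}
```

Because the result is built from (key, value) pairs rather than keyword arguments, non-string keys work as well.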

Answer 9

Just been trying this myself in Python 3.4 (so wasn’t able to use the fancy {**dict_1, **dict_2} syntax).

I wanted to be able to have non-string keys in dictionaries as well as provide an arbitrary number of dictionaries.

Also, I wanted to make a new dictionary, so I opted not to return a collections.ChainMap directly (which is kind of the reason I didn't want to use dict.update initially).

Here’s what I ended up writing:

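from collections import ChainMap

def merge_dicts(*dicts):
    # Collect every key that appears in any of the input dicts.
    all_keys  = set(k for d in dicts for k in d.keys())
    # ChainMap returns the first match, so reverse to let later dicts win.
    chain_map = ChainMap(*reversed(dicts))
    return {k: chain_map[k] for k in all_keys}

merge_dicts({'1': 1}, {'2': 2, '3': 3}, {'1': 4, '3': 5})
# {'1': 4, '3': 5, '2': 2}  (key order may vary)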