标签归档:mapping

如何在Python中创建嵌套字典?

问题:如何在Python中创建嵌套字典?

我有2个CSV文件:“数据”和“映射”:

  • ‘映射’文件有4列:Device_NameGDNDevice_Type,和Device_OS。填充所有四个列。
  • “数据”文件具有这些相同的列,其中Device_Name填充了列,而其他三列为空白。
  • 我希望我的Python代码来打开这两个文件并为每个Device_Name数据文件,它的映射GDNDevice_Type以及Device_OS从映射文件中值。

我知道只有2列存在时才需要使用dict(需要映射1列),但是当需要映射3列时我不知道如何实现。

以下是我尝试完成的映射的代码Device_Type

x = dict([])
with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
    file_map = csv.reader(in_file1, delimiter=',')
    for row in file_map:
       typemap = [row[0],row[2]]
       x.append(typemap)

with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in csv.reader(in_file2, delimiter=','):
         try:
              row[27] = x[row[11]]
         except KeyError:
              row[27] = ""
         writer.writerow(row)

它返回Attribute Error

经过研究后,我认为我需要创建一个嵌套的字典,但是我不知道如何执行此操作。

I have 2 CSV files: ‘Data’ and ‘Mapping’:

  • ‘Mapping’ file has 4 columns: Device_Name, GDN, Device_Type, and Device_OS. All four columns are populated.
  • ‘Data’ file has these same columns, with Device_Name column populated and the other three columns blank.
  • I want my Python code to open both files and for each Device_Name in the Data file, map its GDN, Device_Type, and Device_OS value from the Mapping file.

I know how to use dict when only 2 columns are present (1 is needed to be mapped) but I don’t know how to accomplish this when 3 columns need to be mapped.

Following is the code using which I tried to accomplish mapping of Device_Type:

x = dict([])
with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
    file_map = csv.reader(in_file1, delimiter=',')
    for row in file_map:
       typemap = [row[0],row[2]]
       x.append(typemap)

with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in csv.reader(in_file2, delimiter=','):
         try:
              row[27] = x[row[11]]
         except KeyError:
              row[27] = ""
         writer.writerow(row)

It returns Attribute Error.

After some researching, I think I need to create a nested dict, but I don’t have any idea how to do this.


回答 0

嵌套字典是字典中的字典。非常简单的事情。

>>> d = {}
>>> d['dict1'] = {}
>>> d['dict1']['innerkey'] = 'value'
>>> d
{'dict1': {'innerkey': 'value'}}

你也可以使用一个defaultdictcollections包装,以方便创建嵌套的字典。

>>> import collections
>>> d = collections.defaultdict(dict)
>>> d['dict1']['innerkey'] = 'value'
>>> d  # currently a defaultdict type
defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
>>> dict(d)  # but is exactly like a normal dictionary.
{'dict1': {'innerkey': 'value'}}

您可以根据需要填充。

我建议在你的代码的东西下面:

d = {}  # can use defaultdict(dict) instead

for row in file_map:
    # derive row key from something 
    # when using defaultdict, we can skip the next step creating a dictionary on row_key
    d[row_key] = {} 
    for idx, col in enumerate(row):
        d[row_key][idx] = col

根据您的评论

可能上面的代码令人困惑。我的问题简而言之:我有2个文件a.csv b.csv,a.csv有4列ijkl,b.csv也有这些列。我是这些csv的关键列。jkl列在a.csv中为空,但在b.csv中填充。我想使用’i’作为键列将b.csv中的jk l列的值映射到a.csv文件

我的建议是什么这样(不使用defaultdict):

a_file = "path/to/a.csv"
b_file = "path/to/b.csv"

# read from file a.csv
with open(a_file) as f:
    # skip headers
    f.next()
    # get first colum as keys
    keys = (line.split(',')[0] for line in f) 

# create empty dictionary:
d = {}

# read from file b.csv
with open(b_file) as f:
    # gather headers except first key header
    headers = f.next().split(',')[1:]
    # iterate lines
    for line in f:
        # gather the colums
        cols = line.strip().split(',')
        # check to make sure this key should be mapped.
        if cols[0] not in keys:
            continue
        # add key to dict
        d[cols[0]] = dict(
            # inner keys are the header names, values are columns
            (headers[idx], v) for idx, v in enumerate(cols[1:]))

但是请注意,用于解析csv文件的是csv模块

A nested dict is a dictionary within a dictionary. A very simple thing.

>>> d = {}
>>> d['dict1'] = {}
>>> d['dict1']['innerkey'] = 'value'
>>> d
{'dict1': {'innerkey': 'value'}}

You can also use a defaultdict from the collections package to facilitate creating nested dictionaries.

>>> import collections
>>> d = collections.defaultdict(dict)
>>> d['dict1']['innerkey'] = 'value'
>>> d  # currently a defaultdict type
defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
>>> dict(d)  # but is exactly like a normal dictionary.
{'dict1': {'innerkey': 'value'}}

You can populate that however you want.

I would recommend in your code something like the following:

d = {}  # can use defaultdict(dict) instead

for row in file_map:
    # derive row key from something 
    # when using defaultdict, we can skip the next step creating a dictionary on row_key
    d[row_key] = {} 
    for idx, col in enumerate(row):
        d[row_key][idx] = col

According to your comment:

may be above code is confusing the question. My problem in nutshell: I have 2 files a.csv b.csv, a.csv has 4 columns i j k l, b.csv also has these columns. i is kind of key columns for these csvs’. j k l column is empty in a.csv but populated in b.csv. I want to map values of j k l columns using ‘i` as key column from b.csv to a.csv file

My suggestion would be something like this (without using defaultdict):

a_file = "path/to/a.csv"
b_file = "path/to/b.csv"

# read from file a.csv
with open(a_file) as f:
    # skip headers
    f.next()
    # get first colum as keys
    keys = (line.split(',')[0] for line in f) 

# create empty dictionary:
d = {}

# read from file b.csv
with open(b_file) as f:
    # gather headers except first key header
    headers = f.next().split(',')[1:]
    # iterate lines
    for line in f:
        # gather the colums
        cols = line.strip().split(',')
        # check to make sure this key should be mapped.
        if cols[0] not in keys:
            continue
        # add key to dict
        d[cols[0]] = dict(
            # inner keys are the header names, values are columns
            (headers[idx], v) for idx, v in enumerate(cols[1:]))

Please note though, that for parsing csv files there is a csv module.


回答 1

更新:对于嵌套字典的任意长度,请转到此答案

使用集合中的defaultdict函数。

高性能:当数据集很大时,“ if key not in dict”非常昂贵。

维护成本低:使代码更具可读性,并且可以轻松扩展。

from collections import defaultdict

target_dict = defaultdict(dict)
target_dict[key1][key2] = val

UPDATE: For an arbitrary length of a nested dictionary, go to this answer.

Use the defaultdict function from the collections.

High performance: “if key not in dict” is very expensive when the data set is large.

Low maintenance: make the code more readable and can be easily extended.

from collections import defaultdict

target_dict = defaultdict(dict)
target_dict[key1][key2] = val

回答 2

对于任意级别的嵌套:

In [2]: def nested_dict():
   ...:     return collections.defaultdict(nested_dict)
   ...:

In [3]: a = nested_dict()

In [4]: a
Out[4]: defaultdict(<function __main__.nested_dict>, {})

In [5]: a['a']['b']['c'] = 1

In [6]: a
Out[6]:
defaultdict(<function __main__.nested_dict>,
            {'a': defaultdict(<function __main__.nested_dict>,
                         {'b': defaultdict(<function __main__.nested_dict>,
                                      {'c': 1})})})

For arbitrary levels of nestedness:

In [2]: def nested_dict():
   ...:     return collections.defaultdict(nested_dict)
   ...:

In [3]: a = nested_dict()

In [4]: a
Out[4]: defaultdict(<function __main__.nested_dict>, {})

In [5]: a['a']['b']['c'] = 1

In [6]: a
Out[6]:
defaultdict(<function __main__.nested_dict>,
            {'a': defaultdict(<function __main__.nested_dict>,
                         {'b': defaultdict(<function __main__.nested_dict>,
                                      {'c': 1})})})

回答 3

重要的是要记住,在使用defaultdict和类似的嵌套dict模块(如nested_dict)时,查找不存在的键可能会无意间在dict中创建新的键条目,并造成很多破坏。

这是带有nested_dict模块的Python3示例:

import nested_dict as nd
nest = nd.nested_dict()
nest['outer1']['inner1'] = 'v11'
nest['outer1']['inner2'] = 'v12'
print('original nested dict: \n', nest)
try:
    nest['outer1']['wrong_key1']
except KeyError as e:
    print('exception missing key', e)
print('nested dict after lookup with missing key.  no exception raised:\n', nest)

# Instead, convert back to normal dict...
nest_d = nest.to_dict(nest)
try:
    print('converted to normal dict. Trying to lookup Wrong_key2')
    nest_d['outer1']['wrong_key2']
except KeyError as e:
    print('exception missing key', e)
else:
    print(' no exception raised:\n')

# ...or use dict.keys to check if key in nested dict
print('checking with dict.keys')
print(list(nest['outer1'].keys()))
if 'wrong_key3' in list(nest.keys()):

    print('found wrong_key3')
else:
    print(' did not find wrong_key3')

输出为:

original nested dict:   {"outer1": {"inner2": "v12", "inner1": "v11"}}

nested dict after lookup with missing key.  no exception raised:  
{"outer1": {"wrong_key1": {}, "inner2": "v12", "inner1": "v11"}} 

converted to normal dict. 
Trying to lookup Wrong_key2 

exception missing key 'wrong_key2' 

checking with dict.keys 

['wrong_key1', 'inner2', 'inner1']  
did not find wrong_key3

It is important to remember when using defaultdict and similar nested dict modules such as nested_dict, that looking up a nonexistent key may inadvertently create a new key entry in the dict and cause a lot of havoc.

Here is a Python3 example with nested_dict module:

import nested_dict as nd
nest = nd.nested_dict()
nest['outer1']['inner1'] = 'v11'
nest['outer1']['inner2'] = 'v12'
print('original nested dict: \n', nest)
try:
    nest['outer1']['wrong_key1']
except KeyError as e:
    print('exception missing key', e)
print('nested dict after lookup with missing key.  no exception raised:\n', nest)

# Instead, convert back to normal dict...
nest_d = nest.to_dict(nest)
try:
    print('converted to normal dict. Trying to lookup Wrong_key2')
    nest_d['outer1']['wrong_key2']
except KeyError as e:
    print('exception missing key', e)
else:
    print(' no exception raised:\n')

# ...or use dict.keys to check if key in nested dict
print('checking with dict.keys')
print(list(nest['outer1'].keys()))
if 'wrong_key3' in list(nest.keys()):

    print('found wrong_key3')
else:
    print(' did not find wrong_key3')

Output is:

original nested dict:   {"outer1": {"inner2": "v12", "inner1": "v11"}}

nested dict after lookup with missing key.  no exception raised:  
{"outer1": {"wrong_key1": {}, "inner2": "v12", "inner1": "v11"}} 

converted to normal dict. 
Trying to lookup Wrong_key2 

exception missing key 'wrong_key2' 

checking with dict.keys 

['wrong_key1', 'inner2', 'inner1']  
did not find wrong_key3

实现嵌套字典的最佳方法是什么?

问题:实现嵌套字典的最佳方法是什么?

我有一个实质上相当于嵌套字典的数据结构。假设它看起来像这样:

{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

现在,维护和创建它非常痛苦。每当我有一个新的州/县/专业时,我都必须通过讨厌的try / catch块创建较低层的字典。此外,如果要遍历所有值,则必须创建烦人的嵌套迭代器。

我也可以使用元组作为键,例如:

{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

这使得对值的迭代非常简单自然,但是在语法上进行诸如汇总和查看字典子集之类的操作在语法上更加痛苦(例如,如果我只是想逐个查看状态的话)。

基本上,有时我想将嵌套字典视为平面字典,而有时又想将其视为复杂的层次结构。我可以将所有这些都包装在一个类中,但是似乎有人已经做到了。另外,似乎可能有一些非常优雅的语法构造可以做到这一点。

我怎样才能做得更好?

附录:我知道,setdefault()但这实际上并不能使语法简洁。同样,您创建的每个子词典仍然需要setdefault()手动设置。

I have a data structure which essentially amounts to a nested dictionary. Let’s say it looks like this:

{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower layer dictionaries via obnoxious try/catch blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.

I could also use tuples as keys, like such:

{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).

Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.

How could I do this better?

Addendum: I’m aware of setdefault() but it doesn’t really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.


回答 0

在Python中实现嵌套字典的最佳方法是什么?

这是个坏主意,请不要这样做。相反,请使用常规字典并dict.setdefault在适当位置使用apropos,因此,在正常使用情况下缺少键时,您将获得期望的KeyError。如果您坚持要采取这种行为,请按以下步骤射击自己:

__missing__dict子类上实现以设置并返回新实例。

从Python 2.5开始,这种方法就已经可用(并记录在案),并且(对我来说特别有价值)它的打印效果与普通dict一样,而不是自动生成的defaultdict的丑陋打印:

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup

(注意self[key]在作业的左侧,因此此处没有递归。)

并说您有一些数据:

data = {('new jersey', 'mercer county', 'plumbers'): 3,
        ('new jersey', 'mercer county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'salesmen'): 62,
        ('new york', 'queens county', 'plumbers'): 9,
        ('new york', 'queens county', 'salesmen'): 36}

这是我们的用法代码:

vividict = Vividict()
for (state, county, occupation), number in data.items():
    vividict[state][county][occupation] = number

现在:

>>> import pprint
>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

批评

对这种类型的容器的批评是,如果用户拼错了密钥,我们的代码可能会无声地失败:

>>> vividict['new york']['queens counyt']
{}

另外,现在我们的数据中会有一个拼写错误的县:

>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36},
              'queens counyt': {}}}

说明:

我们只是提供了该类的另一个嵌套实例 Vividict每当访问键但丢失键时。(返回值分配很有用,因为它避免了我们额外地在dict上调用getter,不幸的是,我们无法在设置它时返回它。)

请注意,这些与最受支持的答案具有相同的语义,但代码行的一半-nosklo的实现:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

用法示范

下面只是一个示例,说明如何轻松地使用此dict即时创建嵌套的dict结构。这样可以快速创建层次结构树结构,如您所愿。

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

哪个输出:

{'fizz': {'buzz': {}},
 'foo': {'bar': {}, 'baz': {}},
 'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}

正如最后一行所示,它打印精美,便于人工检查。但是,如果要直观地检查数据,则可以实施__missing__将其类的新实例设置为键并将其返回的方法,这是更好的解决方案。

对比其他替代方法:

dict.setdefault

尽管询问者认为这不干净,但我发现它比Vividict我自己更喜欢。

d = {} # or dict()
for (state, county, occupation), number in data.items():
    d.setdefault(state, {}).setdefault(county, {})[occupation] = number

现在:

>>> pprint.pprint(d, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

拼写错误将严重失败,并且不会因错误信息而使我们的数据混乱:

>>> d['new york']['queens counyt']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'queens counyt'

另外,我认为setdefault在循环中使用时效果很好,并且您不知道密钥要获得什么,但是重复使用变得很繁重,而且我认为没有人愿意遵守以下规定:

d = dict()

d.setdefault('foo', {}).setdefault('bar', {})
d.setdefault('foo', {}).setdefault('baz', {})
d.setdefault('fizz', {}).setdefault('buzz', {})
d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})

另一个批评是,无论是否使用setdefault,setdefault都需要一个新实例。但是,Python(或至少CPython)在处理未使用和未引用的新实例方面相当聪明,例如,它重用了内存中的位置:

>>> id({}), id({}), id({})
(523575344, 523575344, 523575344)

自动更新的defaultdict

这是一个简洁的实现,不检查数据的脚本中的用法与实现一样有用__missing__

from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)

但是,如果您需要检查数据,则以相同方式填充数据的自动复现defaultdict的结果如下所示:

>>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint; 
>>> pprint.pprint(d)
defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict 
at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar': 
defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function 
vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>, 
{'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
<function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at 
0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})

此输出非常微不足道,并且结果非常不可读。通常给出的解决方案是将其递归转换回dict以进行手动检查。这个非平凡的解决方案留给读者练习。

性能

最后,让我们看一下性能。我要减去实例化的成本。

>>> import timeit
>>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
0.13612580299377441
>>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
0.2936999797821045
>>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
0.5354437828063965
>>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
2.138362169265747

基于性能,dict.setdefault效果最佳。如果您关心执行速度,我强烈建议将其用于生产代码。

如果您需要将它用于交互式使用(也许是在IPython笔记本中),那么性能并不重要-在这种情况下,我会选择Vividict来确保输出的可读性。与AutoVivification对象(为此目的而使用__getitem__代替__missing__)相比,它要优越得多。

结论

__missing__在子类dict上实现以设置和返回新实例要比替代方法难一些,但具有以下优点:

  • 易于实例化
  • 简单数据填充
  • 轻松查看数据

并且因为它比修改不那么复杂且性能更高__getitem__,所以应该优先于该方法。

但是,它有缺点:

  • 错误的查询将自动失败。
  • 错误的查询将保留在词典中。

因此,我个人更喜欢setdefault其他解决方案,并且在每种情况下都需要这种行为。

What is the best way to implement nested dictionaries in Python?

This is a bad idea, don’t do it. Instead, use a regular dictionary and use dict.setdefault where apropos, so when keys are missing under normal usage you get the expected KeyError. If you insist on getting this behavior, here’s how to shoot yourself in the foot:

Implement __missing__ on a dict subclass to set and return a new instance.

This approach has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup

(Note self[key] is on the left-hand side of assignment, so there’s no recursion here.)

and say you have some data:

data = {('new jersey', 'mercer county', 'plumbers'): 3,
        ('new jersey', 'mercer county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'salesmen'): 62,
        ('new york', 'queens county', 'plumbers'): 9,
        ('new york', 'queens county', 'salesmen'): 36}

Here’s our usage code:

vividict = Vividict()
for (state, county, occupation), number in data.items():
    vividict[state][county][occupation] = number

And now:

>>> import pprint
>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

Criticism

A criticism of this type of container is that if the user misspells a key, our code could fail silently:

>>> vividict['new york']['queens counyt']
{}

And additionally now we’d have a misspelled county in our data:

>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36},
              'queens counyt': {}}}

Explanation:

We’re just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Returning the value assignment is useful because it avoids us additionally calling the getter on the dict, and unfortunately, we can’t return it as it is being set.)

Note, these are the same semantics as the most upvoted answer but in half the lines of code – nosklo’s implementation:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Demonstration of Usage

Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

Which outputs:

{'fizz': {'buzz': {}},
 'foo': {'bar': {}, 'baz': {}},
 'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}

And as the last line shows, it pretty prints beautifully and in order for manual inspection. But if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution.

Other alternatives, for contrast:

dict.setdefault

Although the asker thinks this isn’t clean, I find it preferable to the Vividict myself.

d = {} # or dict()
for (state, county, occupation), number in data.items():
    d.setdefault(state, {}).setdefault(county, {})[occupation] = number

and now:

>>> pprint.pprint(d, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

A misspelling would fail noisily, and not clutter our data with bad information:

>>> d['new york']['queens counyt']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'queens counyt'

Additionally, I think setdefault works great when used in loops and you don’t know what you’re going to get for keys, but repetitive usage becomes quite burdensome, and I don’t think anyone would want to keep up the following:

d = dict()

d.setdefault('foo', {}).setdefault('bar', {})
d.setdefault('foo', {}).setdefault('baz', {})
d.setdefault('fizz', {}).setdefault('buzz', {})
d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})

Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (or at least CPython) is rather smart about handling unused and unreferenced new instances, for example, it reuses the location in memory:

>>> id({}), id({}), id({})
(523575344, 523575344, 523575344)

An auto-vivified defaultdict

This is a neat looking implementation, and usage in a script that you’re not inspecting the data on would be as useful as implementing __missing__:

from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)

But if you need to inspect your data, the results of an auto-vivified defaultdict populated with data in the same way looks like this:

>>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint; 
>>> pprint.pprint(d)
defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict 
at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar': 
defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function 
vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>, 
{'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
<function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at 
0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})

This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.

Performance

Finally, let’s look at performance. I’m subtracting the costs of instantiation.

>>> import timeit
>>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
0.13612580299377441
>>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
0.2936999797821045
>>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
0.5354437828063965
>>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
2.138362169265747

Based on performance, dict.setdefault works the best. I’d highly recommend it for production code, in cases where you care about execution speed.

If you need this for interactive use (in an IPython notebook, perhaps) then performance doesn’t really matter – in which case, I’d go with Vividict for readability of the output. Compared to the AutoVivification object (which uses __getitem__ instead of __missing__, which was made for this purpose) it is far superior.

Conclusion

Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has the benefits of

  • easy instantiation
  • easy data population
  • easy data viewing

and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.

Nevertheless, it has drawbacks:

  • Bad lookups will fail silently.
  • The bad lookup will remain in the dictionary.

Thus I personally prefer setdefault to the other solutions, and have in every situation where I have needed this sort of behavior.


回答 1

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

测试:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

输出:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

Output:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

回答 2

只是因为我还没有看到这么小的一个,这是一个像您想嵌套的字典一样,没有汗水:

# yo dawg, i heard you liked dicts                                                                      
def yodict():
    return defaultdict(yodict)

Just because I haven’t seen one this small, here’s a dict that gets as nested as you like, no sweat:

# yo dawg, i heard you liked dicts                                                                      
def yodict():
    return defaultdict(yodict)

回答 3

您可以创建一个YAML文件并使用PyYaml读取它

步骤1:创建一个YAML文件“ employment.yml”:

new jersey:
  mercer county:
    pumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

步骤2:以Python阅读

import yaml
file_handle = open("employment.yml")
my_shnazzy_dictionary = yaml.safe_load(file_handle)
file_handle.close()

现在my_shnazzy_dictionary拥有您的所有价值观。如果您需要即时执行此操作,则可以将YAML创建为字符串并将其输入yaml.safe_load(...)

You could create a YAML file and read it in using PyYaml.

Step 1: Create a YAML file, “employment.yml”:

new jersey:
  mercer county:
    pumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Step 2: Read it in Python

import yaml
file_handle = open("employment.yml")
my_shnazzy_dictionary = yaml.safe_load(file_handle)
file_handle.close()

and now my_shnazzy_dictionary has all your values. If you needed to do this on the fly, you can create the YAML as a string and feed that into yaml.safe_load(...).


回答 4

由于您具有星形模式设计,因此您可能希望使其结构更像关系表,而不像字典。

import collections

class Jobs( object ):
    def __init__( self, state, county, title, count ):
        self.state= state
        self.count= county
        self.title= title
        self.count= count

facts = [
    Jobs( 'new jersey', 'mercer county', 'plumbers', 3 ),
    ...

def groupBy( facts, name ):
    total= collections.defaultdict( int )
    for f in facts:
        key= getattr( f, name )
        total[key] += f.count

在没有SQL开销的情况下,创建类似数据仓库的设计可以走很长一段路。

Since you have a star-schema design, you might want to structure it more like a relational table and less like a dictionary.

import collections

class Jobs( object ):
    def __init__( self, state, county, title, count ):
        self.state= state
        self.count= county
        self.title= title
        self.count= count

facts = [
    Jobs( 'new jersey', 'mercer county', 'plumbers', 3 ),
    ...

def groupBy( facts, name ):
    total= collections.defaultdict( int )
    for f in facts:
        key= getattr( f, name )
        total[key] += f.count

That kind of thing can go a long way to creating a data warehouse-like design without the SQL overheads.


回答 5

如果嵌套级别的数量很少,那么我可以collections.defaultdict这样做:

from collections import defaultdict

def nested_dict_factory(): 
  return defaultdict(int)
def nested_dict_factory2(): 
  return defaultdict(nested_dict_factory)
db = defaultdict(nested_dict_factory2)

db['new jersey']['mercer county']['plumbers'] = 3
db['new jersey']['mercer county']['programmers'] = 81

使用defaultdict这样避免了大量杂乱的setdefault()get()等等。

If the number of nesting levels is small, I use collections.defaultdict for this:

from collections import defaultdict

def nested_dict_factory(): 
  return defaultdict(int)
def nested_dict_factory2(): 
  return defaultdict(nested_dict_factory)
db = defaultdict(nested_dict_factory2)

db['new jersey']['mercer county']['plumbers'] = 3
db['new jersey']['mercer county']['programmers'] = 81

Using defaultdict like this avoids a lot of messy setdefault(), get(), etc.


回答 6

这是一个返回任意深度的嵌套字典的函数:

from collections import defaultdict
def make_dict():
    return defaultdict(make_dict)

像这样使用它:

d=defaultdict(make_dict)
d["food"]["meat"]="beef"
d["food"]["veggie"]="corn"
d["food"]["sweets"]="ice cream"
d["animal"]["pet"]["dog"]="collie"
d["animal"]["pet"]["cat"]="tabby"
d["animal"]["farm animal"]="chicken"

使用以下内容遍历所有内容:

def iter_all(d,depth=1):
    for k,v in d.iteritems():
        print "-"*depth,k
        if type(v) is defaultdict:
            iter_all(v,depth+1)
        else:
            print "-"*(depth+1),v

iter_all(d)

打印输出:

- food
-- sweets
--- ice cream
-- meat
--- beef
-- veggie
--- corn
- animal
-- pet
--- dog
---- labrador
--- cat
---- tabby
-- farm animal
--- chicken

您可能最终希望做到这一点,以便不能将新项目添加到字典中。将所有这些defaultdicts 递归转换为正常dicts 很容易。

def dictify(d):
    for k,v in d.iteritems():
        if isinstance(v,defaultdict):
            d[k] = dictify(v)
    return dict(d)

This is a function that returns a nested dictionary of arbitrary depth:

from collections import defaultdict
def make_dict():
    return defaultdict(make_dict)

Use it like this:

d=defaultdict(make_dict)
d["food"]["meat"]="beef"
d["food"]["veggie"]="corn"
d["food"]["sweets"]="ice cream"
d["animal"]["pet"]["dog"]="collie"
d["animal"]["pet"]["cat"]="tabby"
d["animal"]["farm animal"]="chicken"

Iterate through everything with something like this:

def iter_all(d,depth=1):
    for k,v in d.iteritems():
        print "-"*depth,k
        if type(v) is defaultdict:
            iter_all(v,depth+1)
        else:
            print "-"*(depth+1),v

iter_all(d)

This prints out:

- food
-- sweets
--- ice cream
-- meat
--- beef
-- veggie
--- corn
- animal
-- pet
--- dog
---- labrador
--- cat
---- tabby
-- farm animal
--- chicken

You might eventually want to make it so that new items can not be added to the dict. It’s easy to recursively convert all these defaultdicts to normal dicts.

def dictify(d):
    for k,v in d.iteritems():
        if isinstance(v,defaultdict):
            d[k] = dictify(v)
    return dict(d)

回答 7

我觉得setdefault很有用;它检查是否存在密钥,如果不存在,则添加它:

d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3

setdefault 总是返回相关密钥,因此您实际上是在更新’d在原地 ”。

关于迭代,我敢肯定,如果Python中尚不存在生成器,那么您可以足够容易地编写生成器:

def iterateStates(d):
    # Let's count up the total number of "plumbers" / "dentists" / etc.
    # across all counties and states
    job_totals = {}

    # I guess this is the annoying nested stuff you were talking about?
    for (state, counties) in d.iteritems():
        for (county, jobs) in counties.iteritems():
            for (job, num) in jobs.iteritems():
                # If job isn't already in job_totals, default it to zero
                job_totals[job] = job_totals.get(job, 0) + num

    # Now return an iterator of (job, number) tuples
    return job_totals.iteritems()

# Display all jobs
for (job, num) in iterateStates(d):
    print "There are %d %s in total" % (job, num)

I find setdefault quite useful; It checks if a key is present and adds it if not:

d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3

setdefault always returns the relevant key, so you are actually updating the values of ‘d‘ in place.

When it comes to iterating, I’m sure you could write a generator easily enough if one doesn’t already exist in Python:

def iterateStates(d):
    # Let's count up the total number of "plumbers" / "dentists" / etc.
    # across all counties and states
    job_totals = {}

    # I guess this is the annoying nested stuff you were talking about?
    for (state, counties) in d.iteritems():
        for (county, jobs) in counties.iteritems():
            for (job, num) in jobs.iteritems():
                # If job isn't already in job_totals, default it to zero
                job_totals[job] = job_totals.get(job, 0) + num

    # Now return an iterator of (job, number) tuples
    return job_totals.iteritems()

# Display all jobs
for (job, num) in iterateStates(d):
    print "There are %d %s in total" % (job, num)

回答 8

正如其他人所建议的,关系数据库对您可能更有用。您可以使用内存中的sqlite3数据库作为数据结构来创建表,然后对其进行查询。

import sqlite3

c = sqlite3.Connection(':memory:')
c.execute('CREATE TABLE jobs (state, county, title, count)')

c.executemany('insert into jobs values (?, ?, ?, ?)', [
    ('New Jersey', 'Mercer County',    'Programmers', 81),
    ('New Jersey', 'Mercer County',    'Plumbers',     3),
    ('New Jersey', 'Middlesex County', 'Programmers', 81),
    ('New Jersey', 'Middlesex County', 'Salesmen',    62),
    ('New York',   'Queens County',    'Salesmen',    36),
    ('New York',   'Queens County',    'Plumbers',     9),
])

# some example queries
print list(c.execute('SELECT * FROM jobs WHERE county = "Queens County"'))
print list(c.execute('SELECT SUM(count) FROM jobs WHERE title = "Programmers"'))

这只是一个简单的例子。您可以为州,县和职称定义单独的表格。

As others have suggested, a relational database could be more useful to you. You can use a in-memory sqlite3 database as a data structure to create tables and then query them.

import sqlite3

c = sqlite3.Connection(':memory:')
c.execute('CREATE TABLE jobs (state, county, title, count)')

c.executemany('insert into jobs values (?, ?, ?, ?)', [
    ('New Jersey', 'Mercer County',    'Programmers', 81),
    ('New Jersey', 'Mercer County',    'Plumbers',     3),
    ('New Jersey', 'Middlesex County', 'Programmers', 81),
    ('New Jersey', 'Middlesex County', 'Salesmen',    62),
    ('New York',   'Queens County',    'Salesmen',    36),
    ('New York',   'Queens County',    'Plumbers',     9),
])

# some example queries
print list(c.execute('SELECT * FROM jobs WHERE county = "Queens County"'))
print list(c.execute('SELECT SUM(count) FROM jobs WHERE title = "Programmers"'))

This is just a simple example. You could define separate tables for states, counties and job titles.


回答 9

collections.defaultdict可以细分为嵌套的字典。然后将任何有用的迭代方法添加到该类。

>>> from collections import defaultdict
>>> class nesteddict(defaultdict):
    def __init__(self):
        defaultdict.__init__(self, nesteddict)
    def walk(self):
        for key, value in self.iteritems():
            if isinstance(value, nesteddict):
                for tup in value.walk():
                    yield (key,) + tup
            else:
                yield key, value


>>> nd = nesteddict()
>>> nd['new jersey']['mercer county']['plumbers'] = 3
>>> nd['new jersey']['mercer county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['salesmen'] = 62
>>> nd['new york']['queens county']['plumbers'] = 9
>>> nd['new york']['queens county']['salesmen'] = 36
>>> for tup in nd.walk():
    print tup


('new jersey', 'mercer county', 'programmers', 81)
('new jersey', 'mercer county', 'plumbers', 3)
('new jersey', 'middlesex county', 'programmers', 81)
('new jersey', 'middlesex county', 'salesmen', 62)
('new york', 'queens county', 'salesmen', 36)
('new york', 'queens county', 'plumbers', 9)

collections.defaultdict can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.

>>> from collections import defaultdict
>>> class nesteddict(defaultdict):
    def __init__(self):
        defaultdict.__init__(self, nesteddict)
    def walk(self):
        for key, value in self.iteritems():
            if isinstance(value, nesteddict):
                for tup in value.walk():
                    yield (key,) + tup
            else:
                yield key, value


>>> nd = nesteddict()
>>> nd['new jersey']['mercer county']['plumbers'] = 3
>>> nd['new jersey']['mercer county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['salesmen'] = 62
>>> nd['new york']['queens county']['plumbers'] = 9
>>> nd['new york']['queens county']['salesmen'] = 36
>>> for tup in nd.walk():
    print tup


('new jersey', 'mercer county', 'programmers', 81)
('new jersey', 'mercer county', 'plumbers', 3)
('new jersey', 'middlesex county', 'programmers', 81)
('new jersey', 'middlesex county', 'salesmen', 62)
('new york', 'queens county', 'salesmen', 36)
('new york', 'queens county', 'plumbers', 9)

回答 10

至于“令人讨厌的try / catch块”:

d = {}
d.setdefault('key',{}).setdefault('inner key',{})['inner inner key'] = 'value'
print d

Yield

{'key': {'inner key': {'inner inner key': 'value'}}}

您可以使用此方法将平面词典格式转换为结构化格式:

fd = {('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

for (k1,k2,k3), v in fd.iteritems():
    d.setdefault(k1, {}).setdefault(k2, {})[k3] = v

As for “obnoxious try/catch blocks”:

d = {}
d.setdefault('key',{}).setdefault('inner key',{})['inner inner key'] = 'value'
print d

yields

{'key': {'inner key': {'inner inner key': 'value'}}}

You can use this to convert from your flat dictionary format to structured format:

fd = {('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

for (k1,k2,k3), v in fd.iteritems():
    d.setdefault(k1, {}).setdefault(k2, {})[k3] = v

回答 11

您可以使用Addict:https//github.com/mewwts/addict

>>> from addict import Dict
>>> my_new_shiny_dict = Dict()
>>> my_new_shiny_dict.a.b.c.d.e = 2
>>> my_new_shiny_dict
{'a': {'b': {'c': {'d': {'e': 2}}}}}

You can use Addict: https://github.com/mewwts/addict

>>> from addict import Dict
>>> my_new_shiny_dict = Dict()
>>> my_new_shiny_dict.a.b.c.d.e = 2
>>> my_new_shiny_dict
{'a': {'b': {'c': {'d': {'e': 2}}}}}

回答 12

defaultdict() 是你的朋友!

对于二维字典,您可以执行以下操作:

d = defaultdict(defaultdict)
d[1][2] = 3

有关更多尺寸,您可以:

d = defaultdict(lambda :defaultdict(defaultdict))
d[1][2][3] = 4

defaultdict() is your friend!

For a two dimensional dictionary you can do:

d = defaultdict(defaultdict)
d[1][2] = 3

For more dimensions you can:

d = defaultdict(lambda :defaultdict(defaultdict))
d[1][2][3] = 4

回答 13

为了方便地迭代嵌套字典,为什么不编写一个简单的生成器呢?

def each_job(my_dict):
    for state, a in my_dict.items():
        for county, b in a.items():
            for job, value in b.items():
                yield {
                    'state'  : state,
                    'county' : county,
                    'job'    : job,
                    'value'  : value
                }

因此,如果您有编译后的嵌套字典,则对其进行迭代就变得很简单:

for r in each_job(my_dict):
    print "There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state'])

显然,您的生成器可以产生任何对您有用的数据格式。

为什么使用try catch块读取树?在尝试检索字典中的键之前,很容易(而且可能更安全)进行查询。使用保护子句的函数可能如下所示:

if not my_dict.has_key('new jersey'):
    return False

nj_dict = my_dict['new jersey']
...

或者,也许有些冗长的方法是使用get方法:

value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)

但是,以更简洁的方式,您可能希望使用collections.defaultdict,它是自python 2.5以来标准库的一部分。

import collections

def state_struct(): return collections.defaultdict(county_struct)
def county_struct(): return collections.defaultdict(job_struct)
def job_struct(): return 0

my_dict = collections.defaultdict(state_struct)

print my_dict['new jersey']['middlesex county']['salesmen']

我在这里对数据结构的含义进行假设,但是应该很容易根据实际需要进行调整。

For easy iterating over your nested dictionary, why not just write a simple generator?

def each_job(my_dict):
    for state, a in my_dict.items():
        for county, b in a.items():
            for job, value in b.items():
                yield {
                    'state'  : state,
                    'county' : county,
                    'job'    : job,
                    'value'  : value
                }

So then, if you have your compilicated nested dictionary, iterating over it becomes simple:

for r in each_job(my_dict):
    print "There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state'])

Obviously your generator can yield whatever format of data is useful to you.

Why are you using try catch blocks to read the tree? It’s easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:

if not my_dict.has_key('new jersey'):
    return False

nj_dict = my_dict['new jersey']
...

Or, a perhaps somewhat verbose method, is to use the get method:

value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)

But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which is part of the standard library since python 2.5.

import collections

def state_struct(): return collections.defaultdict(county_struct)
def county_struct(): return collections.defaultdict(job_struct)
def job_struct(): return 0

my_dict = collections.defaultdict(state_struct)

print my_dict['new jersey']['middlesex county']['salesmen']

I’m making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.


回答 14

我喜欢的一类包装这和实施的想法__getitem__,并__setitem__使得它们实现了一个简单的查询语言:

>>> d['new jersey/mercer county/plumbers'] = 3
>>> d['new jersey/mercer county/programmers'] = 81
>>> d['new jersey/mercer county/programmers']
81
>>> d['new jersey/mercer country']
<view which implicitly adds 'new jersey/mercer county' to queries/mutations>

如果您想花哨的话,还可以执行以下操作:

>>> d['*/*/programmers']
<view which would contain 'programmers' entries>

但大多数情况下,我认为实现这样的事情会很有趣:D

I like the idea of wrapping this in a class and implementing __getitem__ and __setitem__ such that they implemented a simple query language:

>>> d['new jersey/mercer county/plumbers'] = 3
>>> d['new jersey/mercer county/programmers'] = 81
>>> d['new jersey/mercer county/programmers']
81
>>> d['new jersey/mercer country']
<view which implicitly adds 'new jersey/mercer county' to queries/mutations>

If you wanted to get fancy you could also implement something like:

>>> d['*/*/programmers']
<view which would contain 'programmers' entries>

but mostly I think such a thing would be really fun to implement :D


回答 15

除非您的数据集将保持很小,否则您可能要考虑使用关系数据库。它将完全满足您的要求:轻松添加计数,选择​​计数子集,甚至可以按州,县,职业或这些方法的任意组合来汇总计数。

Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, selecting subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.


回答 16

class JobDb(object):
    def __init__(self):
        self.data = []
        self.all = set()
        self.free = []
        self.index1 = {}
        self.index2 = {}
        self.index3 = {}

    def _indices(self,(key1,key2,key3)):
        indices = self.all.copy()
        wild = False
        for index,key in ((self.index1,key1),(self.index2,key2),
                                             (self.index3,key3)):
            if key is not None:
                indices &= index.setdefault(key,set())
            else:
                wild = True
        return indices, wild

    def __getitem__(self,key):
        indices, wild = self._indices(key)
        if wild:
            return dict(self.data[i] for i in indices)
        else:
            values = [self.data[i][-1] for i in indices]
            if values:
                return values[0]

    def __setitem__(self,key,value):
        indices, wild = self._indices(key)
        if indices:
            for i in indices:
                self.data[i] = key,value
        elif wild:
            raise KeyError(k)
        else:
            if self.free:
                index = self.free.pop(0)
                self.data[index] = key,value
            else:
                index = len(self.data)
                self.data.append((key,value))
                self.all.add(index)
            self.index1.setdefault(key[0],set()).add(index)
            self.index2.setdefault(key[1],set()).add(index)
            self.index3.setdefault(key[2],set()).add(index)

    def __delitem__(self,key):
        indices,wild = self._indices(key)
        if not indices:
            raise KeyError
        self.index1[key[0]] -= indices
        self.index2[key[1]] -= indices
        self.index3[key[2]] -= indices
        self.all -= indices
        for i in indices:
            self.data[i] = None
        self.free.extend(indices)

    def __len__(self):
        return len(self.all)

    def __iter__(self):
        for key,value in self.data:
            yield key

例:

>>> db = JobDb()
>>> db['new jersey', 'mercer county', 'plumbers'] = 3
>>> db['new jersey', 'mercer county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'salesmen'] = 62
>>> db['new york', 'queens county', 'plumbers'] = 9
>>> db['new york', 'queens county', 'salesmen'] = 36

>>> db['new york', None, None]
{('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

>>> db[None, None, 'plumbers']
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new york', 'queens county', 'plumbers'): 9}

>>> db['new jersey', 'mercer county', None]
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81}

>>> db['new jersey', 'middlesex county', 'programmers']
81

>>>

编辑:现在使用通配符(None)查询时返回字典,否则返回单个值。

class JobDb(object):
    def __init__(self):
        self.data = []
        self.all = set()
        self.free = []
        self.index1 = {}
        self.index2 = {}
        self.index3 = {}

    def _indices(self,(key1,key2,key3)):
        indices = self.all.copy()
        wild = False
        for index,key in ((self.index1,key1),(self.index2,key2),
                                             (self.index3,key3)):
            if key is not None:
                indices &= index.setdefault(key,set())
            else:
                wild = True
        return indices, wild

    def __getitem__(self,key):
        indices, wild = self._indices(key)
        if wild:
            return dict(self.data[i] for i in indices)
        else:
            values = [self.data[i][-1] for i in indices]
            if values:
                return values[0]

    def __setitem__(self,key,value):
        indices, wild = self._indices(key)
        if indices:
            for i in indices:
                self.data[i] = key,value
        elif wild:
            raise KeyError(k)
        else:
            if self.free:
                index = self.free.pop(0)
                self.data[index] = key,value
            else:
                index = len(self.data)
                self.data.append((key,value))
                self.all.add(index)
            self.index1.setdefault(key[0],set()).add(index)
            self.index2.setdefault(key[1],set()).add(index)
            self.index3.setdefault(key[2],set()).add(index)

    def __delitem__(self,key):
        indices,wild = self._indices(key)
        if not indices:
            raise KeyError
        self.index1[key[0]] -= indices
        self.index2[key[1]] -= indices
        self.index3[key[2]] -= indices
        self.all -= indices
        for i in indices:
            self.data[i] = None
        self.free.extend(indices)

    def __len__(self):
        return len(self.all)

    def __iter__(self):
        for key,value in self.data:
            yield key

Example:

>>> db = JobDb()
>>> db['new jersey', 'mercer county', 'plumbers'] = 3
>>> db['new jersey', 'mercer county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'salesmen'] = 62
>>> db['new york', 'queens county', 'plumbers'] = 9
>>> db['new york', 'queens county', 'salesmen'] = 36

>>> db['new york', None, None]
{('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

>>> db[None, None, 'plumbers']
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new york', 'queens county', 'plumbers'): 9}

>>> db['new jersey', 'mercer county', None]
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81}

>>> db['new jersey', 'middlesex county', 'programmers']
81

>>>

Edit: Now returning dictionaries when querying with wild cards (None), and single values otherwise.


回答 17

我也有类似的事情。我有很多情况下会这样做:

thedict = {}
for item in ('foo', 'bar', 'baz'):
  mydict = thedict.get(item, {})
  mydict = get_value_for(item)
  thedict[item] = mydict

但是要深入很多层次。关键在于“ .get(item,{})”,因为如果还没有字典的话,它将制作另一本字典。同时,我一直在思考如何更好地处理此问题。现在,有很多

value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)

因此,我做了:

def dictgetter(thedict, default, *args):
  totalargs = len(args)
  for i,arg in enumerate(args):
    if i+1 == totalargs:
      thedict = thedict.get(arg, default)
    else:
      thedict = thedict.get(arg, {})
  return thedict

如果执行以下操作,则具有相同的效果:

value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')

更好?我认同。

I have a similar thing going. I have a lot of cases where I do:

thedict = {}
for item in ('foo', 'bar', 'baz'):
  mydict = thedict.get(item, {})
  mydict = get_value_for(item)
  thedict[item] = mydict

But going many levels deep. It’s the “.get(item, {})” that’s the key as it’ll make another dictionary if there isn’t one already. Meanwhile, I’ve been thinking of ways to deal with this better. Right now, there’s a lot of

value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)

So instead, I made:

def dictgetter(thedict, default, *args):
  totalargs = len(args)
  for i,arg in enumerate(args):
    if i+1 == totalargs:
      thedict = thedict.get(arg, default)
    else:
      thedict = thedict.get(arg, {})
  return thedict

Which has the same effect if you do:

value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')

Better? I think so.


回答 18

您可以在lambdas和defaultdict中使用递归,无需定义名称:

a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))

这是一个例子:

>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
        {'new jersey': defaultdict(<function __main__.<lambda>>,
                     {'mercer county': defaultdict(<function __main__.<lambda>>,
                                  {'plumbers': 3, 'programmers': 81}),
                      'middlesex county': defaultdict(<function __main__.<lambda>>,
                                  {'programmers': 81, 'salesmen': 62})})})

You can use recursion in lambdas and defaultdict, no need to define names:

a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))

Here’s an example:

>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
        {'new jersey': defaultdict(<function __main__.<lambda>>,
                     {'mercer county': defaultdict(<function __main__.<lambda>>,
                                  {'plumbers': 3, 'programmers': 81}),
                      'middlesex county': defaultdict(<function __main__.<lambda>>,
                                  {'programmers': 81, 'salesmen': 62})})})

回答 19

我曾经使用此功能。其安全,快速,易于维护。

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

范例:

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

I used to use this function. its safe, quick, easily maintainable.

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

Example :

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

反转/反转字典映射

问题:反转/反转字典映射

给定这样的字典:

my_map = {'a': 1, 'b': 2}

如何将这张地图倒置即可:

inv_map = {1: 'a', 2: 'b'}

Given a dictionary like so:

my_map = {'a': 1, 'b': 2}

How can one invert this map to get:

inv_map = {1: 'a', 2: 'b'}

回答 0

对于Python 2.7.x

inv_map = {v: k for k, v in my_map.iteritems()}

对于Python 3+:

inv_map = {v: k for k, v in my_map.items()}

For Python 2.7.x

inv_map = {v: k for k, v in my_map.iteritems()}

For Python 3+:

inv_map = {v: k for k, v in my_map.items()}

回答 1

假设字典中的值是唯一的:

dict((v, k) for k, v in my_map.iteritems())

Assuming that the values in the dict are unique:

dict((v, k) for k, v in my_map.iteritems())

回答 2

如果中的值my_map不是唯一的:

inv_map = {}
for k, v in my_map.iteritems():
    inv_map[v] = inv_map.get(v, [])
    inv_map[v].append(k)

If the values in my_map aren’t unique:

inv_map = {}
for k, v in my_map.iteritems():
    inv_map[v] = inv_map.get(v, [])
    inv_map[v].append(k)

回答 3

为此,同时保留映射类型(假设它是a dictdict子类):

def inverse_mapping(f):
    return f.__class__(map(reversed, f.items()))

To do this while preserving the type of your mapping (assuming that it is a dict or a dict subclass):

def inverse_mapping(f):
    return f.__class__(map(reversed, f.items()))

回答 4

尝试这个:

inv_map = dict(zip(my_map.values(), my_map.keys()))

(请注意,字典视图上的Python文档明确地保证了这一点,.keys()并且.values()其元素具有相同的顺序,这使得上述方法可以工作。)

或者:

inv_map = dict((my_map[k], k) for k in my_map)

或使用python 3.0的dict理解

inv_map = {my_map[k] : k for k in my_map}

Try this:

inv_map = dict(zip(my_map.values(), my_map.keys()))

(Note that the Python docs on dictionary views explicitly guarantee that .keys() and .values() have their elements in the same order, which allows the approach above to work.)

Alternatively:

inv_map = dict((my_map[k], k) for k in my_map)

or using python 3.0’s dict comprehensions

inv_map = {my_map[k] : k for k in my_map}

回答 5

另一种更实用的方法:

my_map = { 'a': 1, 'b':2 }
dict(map(reversed, my_map.items()))

Another, more functional, way:

my_map = { 'a': 1, 'b':2 }
dict(map(reversed, my_map.items()))

回答 6

这扩展了Robert的答案,适用于字典中的值不是唯一的情况。

class ReversibleDict(dict):

    def reversed(self):
        """
        Return a reversed dict, with common values in the original dict
        grouped into a list in the returned dict.

        Example:
        >>> d = ReversibleDict({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
        >>> d.reversed()
        {1: ['d'], 2: ['c', 'b', 'f'], 3: ['a', 'e']}
        """

        revdict = {}
        for k, v in self.iteritems():
            revdict.setdefault(v, []).append(k)
        return revdict

实现受到限制,因为您不能使用reversed两次并取回原始文件。因此它不是对称的。已通过Python 2.6测试。是一个我用来打印结果字典的用例。

如果您宁愿使用a而set不是a list,并且可能存在对此有意义的无序应用程序,而不是setdefault(v, []).append(k)use setdefault(v, set()).add(k)

This expands upon the answer by Robert, applying to when the values in the dict aren’t unique.

class ReversibleDict(dict):

    def reversed(self):
        """
        Return a reversed dict, with common values in the original dict
        grouped into a list in the returned dict.

        Example:
        >>> d = ReversibleDict({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
        >>> d.reversed()
        {1: ['d'], 2: ['c', 'b', 'f'], 3: ['a', 'e']}
        """

        revdict = {}
        for k, v in self.iteritems():
            revdict.setdefault(v, []).append(k)
        return revdict

The implementation is limited in that you cannot use reversed twice and get the original back. It is not symmetric as such. It is tested with Python 2.6. Here is a use case of how I am using to print the resultant dict.

If you’d rather use a set than a list, and there could exist unordered applications for which this makes sense, instead of setdefault(v, []).append(k), use setdefault(v, set()).add(k).


回答 7

我们还可以使用重复键反转字典defaultdict

from collections import Counter, defaultdict

def invert_dict(d):
    d_inv = defaultdict(list)
    for k, v in d.items():
        d_inv[v].append(k)
    return d_inv

text = 'aaa bbb ccc ddd aaa bbb ccc aaa' 
c = Counter(text.split()) # Counter({'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 1})
dict(invert_dict(c)) # {1: ['ddd'], 2: ['bbb', 'ccc'], 3: ['aaa']}  

这里

与使用的等效技术相比,此技术更简单,更快dict.setdefault()

We can also reverse a dictionary with duplicate keys using defaultdict:

from collections import Counter, defaultdict

def invert_dict(d):
    d_inv = defaultdict(list)
    for k, v in d.items():
        d_inv[v].append(k)
    return d_inv

text = 'aaa bbb ccc ddd aaa bbb ccc aaa' 
c = Counter(text.split()) # Counter({'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 1})
dict(invert_dict(c)) # {1: ['ddd'], 2: ['bbb', 'ccc'], 3: ['aaa']}  

See here:

This technique is simpler and faster than an equivalent technique using dict.setdefault().


回答 8

例如,您有以下字典:

dict = {'a': 'fire', 'b': 'ice', 'c': 'fire', 'd': 'water'}

而且您想以相反的形式获取它:

inverted_dict = {'fire': ['a', 'c'], 'ice': ['b'], 'water': ['d']}

第一个解决方案。要在字典中反转键值对,请使用for-loop方法:

# Use this code to invert dictionaries that have non-unique values

inverted_dict = dict()
for key, value in dict.items():
    inverted_dict.setdefault(value, list()).append(key)

第二解决方案。使用字典理解方法进行反演:

# Use this code to invert dictionaries that have unique values

inverted_dict = {value: key for key, value in dict.items()}

第三解。使用还原反转方法(取决于第二种解决方案):

# Use this code to invert dictionaries that have lists of values

dict = {value: key for key in inverted_dict for value in my_map[key]}

For instance, you have the following dictionary:

dict = {'a': 'fire', 'b': 'ice', 'c': 'fire', 'd': 'water'}

And you wanna get it in such an inverted form:

inverted_dict = {'fire': ['a', 'c'], 'ice': ['b'], 'water': ['d']}

First Solution. For inverting key-value pairs in your dictionary use a for-loop approach:

# Use this code to invert dictionaries that have non-unique values

inverted_dict = dict()
for key, value in dict.items():
    inverted_dict.setdefault(value, list()).append(key)

Second Solution. Use a dictionary comprehension approach for inversion:

# Use this code to invert dictionaries that have unique values

inverted_dict = {value: key for key, value in dict.items()}

Third Solution. Use reverting the inversion approach (relies on second solution):

# Use this code to invert dictionaries that have lists of values

dict = {value: key for key in inverted_dict for value in my_map[key]}

回答 9

列表和字典理解的结合。可以处理重复的钥匙

{v:[i for i in d.keys() if d[i] == v ] for k,v in d.items()}

Combination of list and dictionary comprehension. Can handle duplicate keys

{v:[i for i in d.keys() if d[i] == v ] for k,v in d.items()}

回答 10

如果值不是唯一的,并且您有点硬核:

inv_map = dict(
    (v, [k for (k, xx) in filter(lambda (key, value): value == v, my_map.items())]) 
    for v in set(my_map.values())
)

特别是对于大字典,请注意,此解决方案的效率远不及Python反向/反转映射的答案,因为它会循环items()多次。

If the values aren’t unique, and you’re a little hardcore:

inv_map = dict(
    (v, [k for (k, xx) in filter(lambda (key, value): value == v, my_map.items())]) 
    for v in set(my_map.values())
)

Especially for a large dict, note that this solution is far less efficient than the answer Python reverse / invert a mapping because it loops over items() multiple times.


回答 11

除了上面建议的其他功能之外,如果您喜欢lambdas:

invert = lambda mydict: {v:k for k, v in mydict.items()}

或者,您也可以采用这种方式:

invert = lambda mydict: dict( zip(mydict.values(), mydict.keys()) )

In addition to the other functions suggested above, if you like lambdas:

invert = lambda mydict: {v:k for k, v in mydict.items()}

Or, you could do it this way too:

invert = lambda mydict: dict( zip(mydict.values(), mydict.keys()) )

回答 12

我认为最好的方法是定义一个类。这是“对称字典”的实现:

class SymDict:
    def __init__(self):
        self.aToB = {}
        self.bToA = {}

    def assocAB(self, a, b):
        # Stores and returns a tuple (a,b) of overwritten bindings
        currB = None
        if a in self.aToB: currB = self.bToA[a]
        currA = None
        if b in self.bToA: currA = self.aToB[b]

        self.aToB[a] = b
        self.bToA[b] = a
        return (currA, currB)

    def lookupA(self, a):
        if a in self.aToB:
            return self.aToB[a]
        return None

    def lookupB(self, b):
        if b in self.bToA:
            return self.bToA[b]
        return None

如果需要,删除和迭代方法很容易实现。

与反转整个字典(这似乎是此页面上最流行的解决方案)相比,这种实现方式效率更高。更不用说,您可以根据需要在自己的SymDict中添加或删除值,并且逆字典将始终保持有效-如果仅对整个字典进行一次反向操作,则情况并非如此。

I think the best way to do this is to define a class. Here is an implementation of a “symmetric dictionary”:

class SymDict:
    def __init__(self):
        self.aToB = {}
        self.bToA = {}

    def assocAB(self, a, b):
        # Stores and returns a tuple (a,b) of overwritten bindings
        currB = None
        if a in self.aToB: currB = self.bToA[a]
        currA = None
        if b in self.bToA: currA = self.aToB[b]

        self.aToB[a] = b
        self.bToA[b] = a
        return (currA, currB)

    def lookupA(self, a):
        if a in self.aToB:
            return self.aToB[a]
        return None

    def lookupB(self, b):
        if b in self.bToA:
            return self.bToA[b]
        return None

Deletion and iteration methods are easy enough to implement if they’re needed.

This implementation is way more efficient than inverting an entire dictionary (which seems to be the most popular solution on this page). Not to mention, you can add or remove values from your SymDict as much as you want, and your inverse-dictionary will always stay valid — this isn’t true if you simply reverse the entire dictionary once.


回答 13

这处理非唯一值,并保留了唯一情况的大部分外观。

inv_map = {v:[k for k in my_map if my_map[k] == v] for v in my_map.itervalues()}

对于Python 3.x,请替换itervaluesvalues

This handles non-unique values and retains much of the look of the unique case.

inv_map = {v:[k for k in my_map if my_map[k] == v] for v in my_map.itervalues()}

For Python 3.x, replace itervalues with values.


回答 14

函数对于类型列表的值是对称的;执行reverse_dict(reverse_dict(dictionary))时,元组被覆盖到列表中

def reverse_dict(dictionary):
    reverse_dict = {}
    for key, value in dictionary.iteritems():
        if not isinstance(value, (list, tuple)):
            value = [value]
        for val in value:
            reverse_dict[val] = reverse_dict.get(val, [])
            reverse_dict[val].append(key)
    for key, value in reverse_dict.iteritems():
        if len(value) == 1:
            reverse_dict[key] = value[0]
    return reverse_dict

Function is symmetric for values of type list; Tuples are coverted to lists when performing reverse_dict(reverse_dict(dictionary))

def reverse_dict(dictionary):
    reverse_dict = {}
    for key, value in dictionary.iteritems():
        if not isinstance(value, (list, tuple)):
            value = [value]
        for val in value:
            reverse_dict[val] = reverse_dict.get(val, [])
            reverse_dict[val].append(key)
    for key, value in reverse_dict.iteritems():
        if len(value) == 1:
            reverse_dict[key] = value[0]
    return reverse_dict

回答 15

由于字典在字典中需要一个与值不同的唯一键,因此我们必须将反转的值附加到要包含在新的特定键中的排序列表中。

def r_maping(dictionary):
    List_z=[]
    Map= {}
    for z, x in dictionary.iteritems(): #iterate through the keys and values
        Map.setdefault(x,List_z).append(z) #Setdefault is the same as dict[key]=default."The method returns the key value available in the dictionary and if given key is not available then it will return provided default value. Afterward, we will append into the default list our new values for the specific key.
    return Map

Since dictionaries require one unique key within the dictionary unlike values, we have to append the reversed values into a list of sort to be included within the new specific keys.

def r_maping(dictionary):
    List_z=[]
    Map= {}
    for z, x in dictionary.iteritems(): #iterate through the keys and values
        Map.setdefault(x,List_z).append(z) #Setdefault is the same as dict[key]=default."The method returns the key value available in the dictionary and if given key is not available then it will return provided default value. Afterward, we will append into the default list our new values for the specific key.
    return Map

回答 16

非双射映射的快速功能解决方案(值不是唯一的):

from itertools import imap, groupby

def fst(s):
    return s[0]

def snd(s):
    return s[1]

def inverseDict(d):
    """
    input d: a -> b
    output : b -> set(a)
    """
    return {
        v : set(imap(fst, kv_iter))
        for (v, kv_iter) in groupby(
            sorted(d.iteritems(),
                   key=snd),
            key=snd
        )
    }

从理论上讲,这应该比在命令式解决方案中一个接一个地添加到集合(或追加到列表中)要快。

不幸的是,值必须是可排序的,groupby要求排序。

Fast functional solution for non-bijective maps (values not unique):

from itertools import imap, groupby

def fst(s):
    return s[0]

def snd(s):
    return s[1]

def inverseDict(d):
    """
    input d: a -> b
    output : b -> set(a)
    """
    return {
        v : set(imap(fst, kv_iter))
        for (v, kv_iter) in groupby(
            sorted(d.iteritems(),
                   key=snd),
            key=snd
        )
    }

In theory this should be faster than adding to the set (or appending to the list) one by one like in the imperative solution.

Unfortunately the values have to be sortable, the sorting is required by groupby.


回答 17

尝试使用python 2.7 / 3.x

inv_map={};
for i in my_map:
    inv_map[my_map[i]]=i    
print inv_map

Try this for python 2.7/3.x

inv_map={};
for i in my_map:
    inv_map[my_map[i]]=i    
print inv_map

回答 18

我会在python 2中那样做。

inv_map = {my_map[x] : x for x in my_map}

I would do it that way in python 2.

inv_map = {my_map[x] : x for x in my_map}

回答 19

def invertDictionary(d):
    myDict = {}
  for i in d:
     value = d.get(i)
     myDict.setdefault(value,[]).append(i)   
 return myDict
 print invertDictionary({'a':1, 'b':2, 'c':3 , 'd' : 1})

这将提供以下输出:{1:[‘a’,’d’],2:[‘b’],3:[‘c’]}

def invertDictionary(d):
    myDict = {}
  for i in d:
     value = d.get(i)
     myDict.setdefault(value,[]).append(i)   
 return myDict
 print invertDictionary({'a':1, 'b':2, 'c':3 , 'd' : 1})

This will provide output as : {1: [‘a’, ‘d’], 2: [‘b’], 3: [‘c’]}


回答 20

  def reverse_dictionary(input_dict):
      out = {}
      for v in input_dict.values():  
          for value in v:
              if value not in out:
                  out[value.lower()] = []

      for i in input_dict:
          for j in out:
              if j in map (lambda x : x.lower(),input_dict[i]):
                  out[j].append(i.lower())
                  out[j].sort()
      return out

这段代码是这样的:

r = reverse_dictionary({'Accurate': ['exact', 'precise'], 'exact': ['precise'], 'astute': ['Smart', 'clever'], 'smart': ['clever', 'bright', 'talented']})

print(r)

{'precise': ['accurate', 'exact'], 'clever': ['astute', 'smart'], 'talented': ['smart'], 'bright': ['smart'], 'exact': ['accurate'], 'smart': ['astute']}
  def reverse_dictionary(input_dict):
      out = {}
      for v in input_dict.values():  
          for value in v:
              if value not in out:
                  out[value.lower()] = []

      for i in input_dict:
          for j in out:
              if j in map (lambda x : x.lower(),input_dict[i]):
                  out[j].append(i.lower())
                  out[j].sort()
      return out

this code do like this:

r = reverse_dictionary({'Accurate': ['exact', 'precise'], 'exact': ['precise'], 'astute': ['Smart', 'clever'], 'smart': ['clever', 'bright', 'talented']})

print(r)

{'precise': ['accurate', 'exact'], 'clever': ['astute', 'smart'], 'talented': ['smart'], 'bright': ['smart'], 'exact': ['accurate'], 'smart': ['astute']}

回答 21

没有什么完全不同,只是Cookbook重写了一些食谱。还通过保留setdefault方法进行了优化,而不是每次通过实例进行优化:

def inverse(mapping):
    '''
    A function to inverse mapping, collecting keys with simillar values
    in list. Careful to retain original type and to be fast.
    >> d = dict(a=1, b=2, c=1, d=3, e=2, f=1, g=5, h=2)
    >> inverse(d)
    {1: ['f', 'c', 'a'], 2: ['h', 'b', 'e'], 3: ['d'], 5: ['g']}
    '''
    res = {}
    setdef = res.setdefault
    for key, value in mapping.items():
        setdef(value, []).append(key)
    return res if mapping.__class__==dict else mapping.__class__(res)

设计下CPython的3.X中运行,为2.X替换mapping.items()mapping.iteritems()

在我的机器上,运行速度比这里的其他示例更快

Not something completely different, just a bit rewritten recipe from Cookbook. It’s futhermore optimized by retaining setdefault method, instead of each time getting it through the instance:

def inverse(mapping):
    '''
    A function to inverse mapping, collecting keys with simillar values
    in list. Careful to retain original type and to be fast.
    >> d = dict(a=1, b=2, c=1, d=3, e=2, f=1, g=5, h=2)
    >> inverse(d)
    {1: ['f', 'c', 'a'], 2: ['h', 'b', 'e'], 3: ['d'], 5: ['g']}
    '''
    res = {}
    setdef = res.setdefault
    for key, value in mapping.items():
        setdef(value, []).append(key)
    return res if mapping.__class__==dict else mapping.__class__(res)

Designed to be run under CPython 3.x, for 2.x replace mapping.items() with mapping.iteritems()

On my machine runs a bit faster, than other examples here


回答 22

我在循环“ for”和方法“ .get()”的帮助下编写了此代码,并将字典的名称“ map”更改为“ map1”,因为“ map”是一个函数。

def dict_invert(map1):
    inv_map = {} # new dictionary
    for key in map1.keys():
        inv_map[map1.get(key)] = key
    return inv_map

I wrote this with the help of cycle ‘for’ and method ‘.get()’ and I changed the name ‘map’ of the dictionary to ‘map1’ because ‘map’ is a function.

def dict_invert(map1):
    inv_map = {} # new dictionary
    for key in map1.keys():
        inv_map[map1.get(key)] = key
    return inv_map

回答 23

如果值不是唯一的,并且可能是哈希(一维):

for k, v in myDict.items():
    if len(v) > 1:
        for item in v:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

如果需要更深层次地进行递归,则只需一个维度即可:

def digList(lst):
    temp = []
    for item in lst:
        if type(item) is list:
            temp.append(digList(item))
        else:
            temp.append(item)
    return set(temp)

for k, v in myDict.items():
    if type(v) is list:
        items = digList(v)
        for item in items:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

If values aren’t unique AND may be a hash (one dimension):

for k, v in myDict.items():
    if len(v) > 1:
        for item in v:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

And with a recursion if you need to dig deeper then just one dimension:

def digList(lst):
    temp = []
    for item in lst:
        if type(item) is list:
            temp.append(digList(item))
        else:
            temp.append(item)
    return set(temp)

for k, v in myDict.items():
    if type(v) is list:
        items = digList(v)
        for item in items:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)