Question: What is the best way to implement nested dictionaries?
I have a data structure which essentially amounts to a nested dictionary. Let’s say it looks like this:
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}
Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower layer dictionaries via obnoxious try/catch blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.
I could also use tuples as keys, like such:
{('new jersey', 'mercer county', 'plumbers'): 3,
('new jersey', 'mercer county', 'programmers'): 81,
('new jersey', 'middlesex county', 'programmers'): 81,
('new jersey', 'middlesex county', 'salesmen'): 62,
('new york', 'queens county', 'plumbers'): 9,
('new york', 'queens county', 'salesmen'): 36}
This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).
Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.
How could I do this better?
Addendum: I’m aware of setdefault() but it doesn’t really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.
Answer 0
What is the best way to implement nested dictionaries in Python?
This is a bad idea, don’t do it. Instead, use a regular dictionary and use dict.setdefault where apropos, so when keys are missing under normal usage you get the expected KeyError. If you insist on getting this behavior, here’s how to shoot yourself in the foot:
Implement __missing__ on a dict subclass to set and return a new instance.
This approach has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:
class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup
(Note self[key] is on the left-hand side of assignment, so there’s no recursion here.)
and say you have some data:
data = {('new jersey', 'mercer county', 'plumbers'): 3,
('new jersey', 'mercer county', 'programmers'): 81,
('new jersey', 'middlesex county', 'programmers'): 81,
('new jersey', 'middlesex county', 'salesmen'): 62,
('new york', 'queens county', 'plumbers'): 9,
('new york', 'queens county', 'salesmen'): 36}
Here’s our usage code:
vividict = Vividict()
for (state, county, occupation), number in data.items():
    vividict[state][county][occupation] = number
And now:
>>> import pprint
>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}
Criticism
A criticism of this type of container is that if the user misspells a key, our code could fail silently:
>>> vividict['new york']['queens counyt']
{}
And additionally now we’d have a misspelled county in our data:
>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36},
              'queens counyt': {}}}
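One way to soften this drawback, sketched here on the assumption that reads and writes can be kept apart in your code: dict.get and the in operator never call __missing__, so they can serve lookups that must not create keys. The Vividict class is repeated so the snippet runs on its own; the variable names are only illustrative.

```python
class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

v = Vividict()
v['new york']['queens county']['plumbers'] = 9

# Subscript access on a missing key vivifies it...
v['new york']['queens counyt']
created = 'queens counyt' in v['new york']        # the typo is now stored

# ...whereas .get() and `in` bypass __missing__ and leave the dict alone.
missing = v['new york'].get('kings county')
untouched = 'kings county' not in v['new york']
```

This only helps where you remember to use get or in for reads; plain subscripting still vivifies.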
Explanation:
We’re just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Returning the assigned value is useful because it avoids an additional getter call on the dict; unfortunately, we can’t return it as it is being set.)
Note, these are the same semantics as the most upvoted answer but in half the lines of code – nosklo’s implementation:
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
Demonstration of Usage
Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.
import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()
d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)
Which outputs:
{'fizz': {'buzz': {}},
'foo': {'bar': {}, 'baz': {}},
'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}
As the last line shows, it pretty-prints cleanly and in sorted key order for manual inspection. So if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution.
Other alternatives, for contrast:
dict.setdefault
Although the asker thinks this isn’t clean, I find it preferable to the Vividict myself.
d = {} # or dict()
for (state, county, occupation), number in data.items():
    d.setdefault(state, {}).setdefault(county, {})[occupation] = number
And now:
>>> pprint.pprint(d, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}
A misspelling would fail noisily, and not clutter our data with bad information:
>>> d['new york']['queens counyt']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'queens counyt'
Additionally, I think setdefault works great when used in loops and you don’t know what you’re going to get for keys, but repetitive usage becomes quite burdensome, and I don’t think anyone would want to keep up the following:
d = dict()
d.setdefault('foo', {}).setdefault('bar', {})
d.setdefault('foo', {}).setdefault('baz', {})
d.setdefault('fizz', {}).setdefault('buzz', {})
d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})
Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (or at least CPython) is rather smart about handling unused and unreferenced new instances, for example, it reuses the location in memory:
>>> id({}), id({}), id({})
(523575344, 523575344, 523575344)
An auto-vivified defaultdict
This is a neat-looking implementation, and in a script where you’re not inspecting the data it would be as useful as implementing __missing__:
from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)
But if you need to inspect your data, the result of an auto-vivified defaultdict populated with data in the same way looks like this:
>>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint;
>>> pprint.pprint(d)
defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict
at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar':
defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function
vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>,
{'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
<function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at
0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})
This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.
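For reference, one possible shape of that exercise is sketched below. The helper name to_plain_dict is made up for illustration, and the sketch assumes every nested container is a defaultdict; it builds a new structure rather than modifying the original.

```python
from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)

def to_plain_dict(d):
    """Return a copy of d with every nested defaultdict replaced by a plain dict."""
    if isinstance(d, defaultdict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    return d

d = vivdict()
d['foo']['bar']
d['primary']['secondary']['tertiary']
plain = to_plain_dict(d)
```

Passing plain to pprint.pprint then gives ordinary dict output, and lookups on plain no longer create keys.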
Performance
Finally, let’s look at performance. I’m subtracting the costs of instantiation.
>>> import timeit
>>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
0.13612580299377441
>>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
0.2936999797821045
>>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
0.5354437828063965
>>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
2.138362169265747
Based on performance, dict.setdefault works the best. I’d highly recommend it for production code, in cases where you care about execution speed.
If you need this for interactive use (in an IPython notebook, perhaps) then performance doesn’t really matter, in which case I’d go with Vividict for readability of the output. Compared to the AutoVivification object (which uses __getitem__ instead of __missing__, which was made for this purpose) it is far superior.
Conclusion
Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has the benefits of
- easy instantiation
- easy data population
- easy data viewing
and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.
Nevertheless, it has drawbacks:
- Bad lookups will fail silently.
- The bad lookup will remain in the dictionary.
Thus I personally prefer setdefault to the other solutions, and have in every situation where I have needed this sort of behavior.
Answer 1
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
Testing:
a = AutoVivification()
a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6
print(a)
Output (on Python 3.7+, where dicts keep insertion order):
{1: {2: {3: 4, 'test': 6}, 3: {3: 5}}}
Answer 2
Just because I haven’t seen one this small, here’s a dict that gets as nested as you like, no sweat:
from collections import defaultdict

# yo dawg, i heard you liked dicts
def yodict():
    return defaultdict(yodict)
Answer 3
You could create a YAML file and read it in using PyYaml.
Step 1: Create a YAML file, “employment.yml”:
new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36
Step 2: Read it in Python
import yaml
# the with block closes the file even if parsing fails
with open("employment.yml") as file_handle:
    my_shnazzy_dictionary = yaml.safe_load(file_handle)
and now my_shnazzy_dictionary has all your values. If you needed to do this on the fly, you can create the YAML as a string and feed that into yaml.safe_load(...).
Answer 4
Since you have a star-schema design, you might want to structure it more like a relational table and less like a dictionary.
import collections

class Jobs( object ):
    def __init__( self, state, county, title, count ):
        self.state= state
        self.county= county
        self.title= title
        self.count= count

facts = [
    Jobs( 'new jersey', 'mercer county', 'plumbers', 3 ),
    ...
]

def groupBy( facts, name ):
    # sum the counts of all facts sharing the same value of attribute `name`
    total= collections.defaultdict( int )
    for f in facts:
        key= getattr( f, name )
        total[key] += f.count
    return total
That kind of thing can go a long way to creating a data warehouse-like design without the SQL overheads.
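To make the groupBy idea above concrete, here is a small self-contained sketch using the question's data. The Job namedtuple, its number field, and the group_by name are stand-ins chosen for illustration, not part of the answer's code.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type standing in for the Jobs class above.
Job = namedtuple('Job', 'state county title number')

facts = [
    Job('new jersey', 'mercer county', 'plumbers', 3),
    Job('new jersey', 'mercer county', 'programmers', 81),
    Job('new jersey', 'middlesex county', 'programmers', 81),
    Job('new jersey', 'middlesex county', 'salesmen', 62),
    Job('new york', 'queens county', 'plumbers', 9),
    Job('new york', 'queens county', 'salesmen', 36),
]

def group_by(facts, name):
    """Sum the numbers of all facts that share the same value of field `name`."""
    totals = defaultdict(int)
    for fact in facts:
        totals[getattr(fact, name)] += fact.number
    return dict(totals)

by_state = group_by(facts, 'state')
by_title = group_by(facts, 'title')
```

Because every fact is a flat row, aggregating by state, county, or title is the same call with a different field name.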
Answer 5
If the number of nesting levels is small, I use collections.defaultdict for this:
from collections import defaultdict

def nested_dict_factory():
    return defaultdict(int)

def nested_dict_factory2():
    return defaultdict(nested_dict_factory)

db = defaultdict(nested_dict_factory2)
db['new jersey']['mercer county']['plumbers'] = 3
db['new jersey']['mercer county']['programmers'] = 81
Using defaultdict like this avoids a lot of messy setdefault(), get(), etc.
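If writing one factory function per level gets tedious, the same fixed-depth structure can be built by a single recursive helper. This is only a sketch; nested_defaultdict and its parameters are names invented here.

```python
from collections import defaultdict

def nested_defaultdict(depth, leaf_factory=int):
    """Build a defaultdict nested `depth` levels deep; leaves come from leaf_factory."""
    if depth == 1:
        return defaultdict(leaf_factory)
    return defaultdict(lambda: nested_defaultdict(depth - 1, leaf_factory))

db = nested_defaultdict(3)
db['new jersey']['mercer county']['plumbers'] += 3
db['new jersey']['mercer county']['programmers'] += 81
```

Unlike the unbounded recursive variants, the depth is fixed, so the innermost level yields 0 for a missing profession instead of yet another dictionary.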
Answer 6
This is a function that returns a nested dictionary of arbitrary depth:
from collections import defaultdict

def make_dict():
    return defaultdict(make_dict)
Use it like this:
d=defaultdict(make_dict)
d["food"]["meat"]="beef"
d["food"]["veggie"]="corn"
d["food"]["sweets"]="ice cream"
d["animal"]["pet"]["dog"]="collie"
d["animal"]["pet"]["cat"]="tabby"
d["animal"]["farm animal"]="chicken"
Iterate through everything with something like this:
def iter_all(d,depth=1):
    for k,v in d.items():
        print("-"*depth,k)
        if type(v) is defaultdict:
            iter_all(v,depth+1)
        else:
            print("-"*(depth+1),v)

iter_all(d)
This prints out (on Python 3.7+, in insertion order):
- food
-- meat
--- beef
-- veggie
--- corn
-- sweets
--- ice cream
- animal
-- pet
--- dog
---- collie
--- cat
---- tabby
-- farm animal
--- chicken
You might eventually want to make it so that new items can not be added to the dict. It’s easy to recursively convert all these defaultdicts to normal dicts.
def dictify(d):
    # replaces nested defaultdicts in place, then returns a plain-dict copy of the top level
    for k,v in d.items():
        if isinstance(v,defaultdict):
            d[k] = dictify(v)
    return dict(d)
Answer 7
I find setdefault quite useful; it checks if a key is present and adds it if not:
d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3
setdefault always returns the value stored under the key (the existing one, or the default it has just inserted), so you are actually updating the values of 'd' in place.
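A tiny sketch of that return-value behaviour, with variable names chosen only for illustration:

```python
d = {}

# First call: 'new jersey' is missing, so the new dict is stored and returned.
state = d.setdefault('new jersey', {})
state['mercer county'] = {'plumbers': 3}

# Second call: the key exists, so the default argument is ignored and the
# stored dict comes back; it is the very same object as before.
same_state = d.setdefault('new jersey', {'ignored': True})
```

Since the returned object is the one held inside d, mutating it is what makes the chained setdefault(...)[key] = value idiom work.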
When it comes to iterating, I’m sure you could write a generator easily enough if one doesn’t already exist in Python:
def iterateStates(d):
    # Let's count up the total number of "plumbers" / "dentists" / etc.
    # across all counties and states
    job_totals = {}

    # I guess this is the annoying nested stuff you were talking about?
    for (state, counties) in d.items():
        for (county, jobs) in counties.items():
            for (job, num) in jobs.items():
                # If job isn't already in job_totals, default it to zero
                job_totals[job] = job_totals.get(job, 0) + num

    # Now return an iterator of (job, number) tuples
    return iter(job_totals.items())

# Display all jobs
for (job, num) in iterateStates(d):
    print("There are %d %s in total" % (num, job))
Answer 8
As others have suggested, a relational database could be more useful to you. You can use an in-memory sqlite3 database as a data structure to create tables and then query them.
import sqlite3

c = sqlite3.connect(':memory:')
c.execute('CREATE TABLE jobs (state, county, title, count)')

c.executemany('insert into jobs values (?, ?, ?, ?)', [
    ('New Jersey', 'Mercer County',    'Programmers', 81),
    ('New Jersey', 'Mercer County',    'Plumbers',     3),
    ('New Jersey', 'Middlesex County', 'Programmers', 81),
    ('New Jersey', 'Middlesex County', 'Salesmen',    62),
    ('New York',   'Queens County',    'Salesmen',    36),
    ('New York',   'Queens County',    'Plumbers',     9),
])

# some example queries (SQL string literals take single quotes)
print(list(c.execute("SELECT * FROM jobs WHERE county = 'Queens County'")))
print(list(c.execute("SELECT SUM(count) FROM jobs WHERE title = 'Programmers'")))
This is just a simple example. You could define separate tables for states, counties and job titles.
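The state-by-state aggregation the question asks about maps onto GROUP BY. The following sketch reuses the same single-table layout; the variable names and the choice of queries are mine, not the answer's.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (state, county, title, count)')
conn.executemany('INSERT INTO jobs VALUES (?, ?, ?, ?)', [
    ('New Jersey', 'Mercer County', 'Programmers', 81),
    ('New Jersey', 'Mercer County', 'Plumbers', 3),
    ('New Jersey', 'Middlesex County', 'Programmers', 81),
    ('New Jersey', 'Middlesex County', 'Salesmen', 62),
    ('New York', 'Queens County', 'Salesmen', 36),
    ('New York', 'Queens County', 'Plumbers', 9),
])

# Total head count per state, one row per state.
per_state = conn.execute(
    'SELECT state, SUM(count) FROM jobs GROUP BY state ORDER BY state'
).fetchall()

# Parameterised lookup: the ? placeholder avoids quoting problems.
programmers = conn.execute(
    'SELECT SUM(count) FROM jobs WHERE title = ?', ('Programmers',)
).fetchone()[0]

conn.close()
```

Swapping the GROUP BY column (or grouping by several) gives the other roll-ups without any change to how the data is stored.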
Answer 9
collections.defaultdict can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.
>>> from collections import defaultdict
>>> class nesteddict(defaultdict):
...     def __init__(self):
...         defaultdict.__init__(self, nesteddict)
...     def walk(self):
...         for key, value in self.items():
...             if isinstance(value, nesteddict):
...                 for tup in value.walk():
...                     yield (key,) + tup
...             else:
...                 yield key, value
...
>>> nd = nesteddict()
>>> nd['new jersey']['mercer county']['plumbers'] = 3
>>> nd['new jersey']['mercer county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['salesmen'] = 62
>>> nd['new york']['queens county']['plumbers'] = 9
>>> nd['new york']['queens county']['salesmen'] = 36
>>> for tup in nd.walk():
...     print(tup)
...
('new jersey', 'mercer county', 'plumbers', 3)
('new jersey', 'mercer county', 'programmers', 81)
('new jersey', 'middlesex county', 'programmers', 81)
('new jersey', 'middlesex county', 'salesmen', 62)
('new york', 'queens county', 'plumbers', 9)
('new york', 'queens county', 'salesmen', 36)
Answer 10
As for “obnoxious try/catch blocks”:
d = {}
d.setdefault('key',{}).setdefault('inner key',{})['inner inner key'] = 'value'
print(d)
yields
{'key': {'inner key': {'inner inner key': 'value'}}}
You can use this to convert from your flat dictionary format to structured format:
fd = {('new jersey', 'mercer county', 'plumbers'): 3,
('new jersey', 'mercer county', 'programmers'): 81,
('new jersey', 'middlesex county', 'programmers'): 81,
('new jersey', 'middlesex county', 'salesmen'): 62,
('new york', 'queens county', 'plumbers'): 9,
('new york', 'queens county', 'salesmen'): 36}
for (k1,k2,k3), v in fd.items():
    d.setdefault(k1, {}).setdefault(k2, {})[k3] = v
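Going the other way, from the structured format back to flat tuple keys, can be done with a short recursive generator. This is a sketch under the assumption that every non-leaf value is a dict; the name flatten is illustrative.

```python
def flatten(nested, prefix=()):
    """Yield (key_path, value) pairs for every leaf of a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

nested = {'new jersey': {'mercer county': {'plumbers': 3, 'programmers': 81}},
          'new york': {'queens county': {'salesmen': 36}}}
flat = dict(flatten(nested))
```

Note that empty sub-dictionaries produce no entries, so a round trip through the flat form drops them.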
Answer 11
You can use Addict: https://github.com/mewwts/addict
>>> from addict import Dict
>>> my_new_shiny_dict = Dict()
>>> my_new_shiny_dict.a.b.c.d.e = 2
>>> my_new_shiny_dict
{'a': {'b': {'c': {'d': {'e': 2}}}}}
Answer 12
defaultdict() is your friend!
For a two dimensional dictionary you can do:
from collections import defaultdict

d = defaultdict(defaultdict)
d[1][2] = 3
For more dimensions you can:
d = defaultdict(lambda :defaultdict(defaultdict))
d[1][2][3] = 4
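One detail worth knowing about these constructions: the innermost defaultdict is created with no default factory, so only the outer levels auto-create, and reading a missing key at the last level still raises KeyError. A small sketch (the flag name is mine):

```python
from collections import defaultdict

d = defaultdict(defaultdict)
d[1][2] = 3            # d[1] is created on demand; key 2 is assigned explicitly

try:
    d[1][99]           # the inner defaultdict has default_factory None
    inner_autocreates = True
except KeyError:
    inner_autocreates = False
```

That makes this form behave like a plain dict at the leaf level, which may be exactly what you want for catching typos there.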
Answer 13
For easy iterating over your nested dictionary, why not just write a simple generator?
def each_job(my_dict):
    for state, a in my_dict.items():
        for county, b in a.items():
            for job, value in b.items():
                yield {
                    'state'  : state,
                    'county' : county,
                    'job'    : job,
                    'value'  : value
                }
So then, if you have your complicated nested dictionary, iterating over it becomes simple:
for r in each_job(my_dict):
    print("There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state']))
Obviously your generator can yield whatever format of data is useful to you.
Why are you using try catch blocks to read the tree? It’s easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:
if 'new jersey' not in my_dict:
    return False

nj_dict = my_dict['new jersey']
...
Or, a perhaps somewhat verbose approach is to use the get method:
value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)
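The chained get calls above can be wrapped in a small helper so the key path reads as one call. This is a sketch; deep_get is a hypothetical name, not a standard library function.

```python
def deep_get(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

my_dict = {'new jersey': {'middlesex county': {'salesmen': 62}}}
found = deep_get(my_dict, 'new jersey', 'middlesex county', 'salesmen', default=0)
absent = deep_get(my_dict, 'new york', 'queens county', 'salesmen', default=0)
```

Like the get chain, it never modifies the dictionary, so failed lookups leave no empty entries behind.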
But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which is part of the standard library since python 2.5.
import collections

def state_struct(): return collections.defaultdict(county_struct)
def county_struct(): return collections.defaultdict(job_struct)
def job_struct(): return 0

my_dict = collections.defaultdict(state_struct)

print(my_dict['new jersey']['middlesex county']['salesmen'])
I’m making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.
Answer 14
I like the idea of wrapping this in a class and implementing __getitem__ and __setitem__ such that they implement a simple query language:
>>> d['new jersey/mercer county/plumbers'] = 3
>>> d['new jersey/mercer county/programmers'] = 81
>>> d['new jersey/mercer county/programmers']
81
>>> d['new jersey/mercer county']
<view which implicitly adds 'new jersey/mercer county' to queries/mutations>
If you wanted to get fancy you could also implement something like:
>>> d['*/*/programmers']
<view which would contain 'programmers' entries>
but mostly I think such a thing would be really fun to implement :D
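One possible shape for such a wrapper is sketched below. The class name PathDict, the sep parameter and the find method are all hypothetical, and the sketch simplifies the idea above: a partial path returns the underlying sub-dictionary rather than a view, and wildcard queries go through a separate find method instead of __getitem__.

```python
class PathDict:
    """Nested dict addressed by 'a/b/c' path strings (illustrative sketch)."""

    def __init__(self, sep='/'):
        self.sep = sep
        self.root = {}

    def __setitem__(self, path, value):
        *parents, leaf = path.split(self.sep)
        node = self.root
        for part in parents:
            node = node.setdefault(part, {})   # create missing levels on the way down
        node[leaf] = value

    def __getitem__(self, path):
        node = self.root
        for part in path.split(self.sep):
            node = node[part]                  # raises KeyError if the path is missing
        return node

    def find(self, pattern):
        """Return {full_path: value} for entries matching a pattern with '*' wildcards."""
        parts = pattern.split(self.sep)
        matches = {}

        def walk(node, depth, prefix):
            if depth == len(parts):
                matches[self.sep.join(prefix)] = node
                return
            if not isinstance(node, dict):
                return
            for key, child in node.items():
                if parts[depth] in ('*', key):
                    walk(child, depth + 1, prefix + [key])

        walk(self.root, 0, [])
        return matches


d = PathDict()
d['new jersey/mercer county/plumbers'] = 3
d['new jersey/mercer county/programmers'] = 81
d['new jersey/middlesex county/programmers'] = 81

print(d['new jersey/mercer county/programmers'])   # 81
print(d.find('*/*/programmers'))
```

Keys that themselves contain the separator cannot be addressed this way, which is the usual price of a string-based query syntax.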
Answer 15
Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, select subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.
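For a sense of what that could look like without leaving the standard library, here is a minimal sketch using the sqlite3 module with an in-memory database. The table name jobs and the column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE jobs (
        state     TEXT NOT NULL,
        county    TEXT NOT NULL,
        job       TEXT NOT NULL,
        headcount INTEGER NOT NULL,
        PRIMARY KEY (state, county, job)
    )
""")

rows = [
    ('new jersey', 'mercer county', 'plumbers', 3),
    ('new jersey', 'mercer county', 'programmers', 81),
    ('new jersey', 'middlesex county', 'programmers', 81),
    ('new jersey', 'middlesex county', 'salesmen', 62),
    ('new york', 'queens county', 'plumbers', 9),
    ('new york', 'queens county', 'salesmen', 36),
]
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)

# Aggregate by any level of the hierarchy with GROUP BY.
by_state = dict(conn.execute(
    "SELECT state, SUM(headcount) FROM jobs GROUP BY state"))

# Select a subset with WHERE.
new_york = conn.execute(
    "SELECT county, job, headcount FROM jobs "
    "WHERE state = ? ORDER BY county, job", ('new york',)).fetchall()

print(by_state)   # {'new jersey': 227, 'new york': 45}
print(new_york)
```

The flat rows play the role of the tuple-keyed dictionary from the question, while SQL supplies the grouping and filtering that were awkward there.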
Answer 16
class JobDb(object):
    def __init__(self):
        self.data = []      # slot -> (key, value), or None for a freed slot
        self.all = set()    # slots currently in use
        self.free = []      # freed slots available for reuse
        self.index1 = {}    # state -> set of slots
        self.index2 = {}    # county -> set of slots
        self.index3 = {}    # job -> set of slots
    def _indices(self, key):
        # None in any position of the key acts as a wildcard
        key1, key2, key3 = key
        indices = self.all.copy()
        wild = False
        for index, k in ((self.index1, key1), (self.index2, key2),
                         (self.index3, key3)):
            if k is not None:
                indices &= index.get(k, set())
            else:
                wild = True
        return indices, wild
    def __getitem__(self, key):
        indices, wild = self._indices(key)
        if wild:
            return dict(self.data[i] for i in indices)
        else:
            # an exact key that is not present falls through and returns None
            values = [self.data[i][-1] for i in indices]
            if values:
                return values[0]
    def __setitem__(self, key, value):
        indices, wild = self._indices(key)
        if indices:
            for i in indices:
                # keep the stored key so a wildcard key never overwrites it
                self.data[i] = self.data[i][0], value
        elif wild:
            raise KeyError(key)
        else:
            if self.free:
                index = self.free.pop(0)
                self.data[index] = key, value
            else:
                index = len(self.data)
                self.data.append((key, value))
            self.all.add(index)
            self.index1.setdefault(key[0], set()).add(index)
            self.index2.setdefault(key[1], set()).add(index)
            self.index3.setdefault(key[2], set()).add(index)
    def __delitem__(self, key):
        indices, wild = self._indices(key)
        if not indices:
            raise KeyError(key)
        for i in indices:
            # look up the stored key so wildcard deletes update the right index sets
            k1, k2, k3 = self.data[i][0]
            self.index1[k1].discard(i)
            self.index2[k2].discard(i)
            self.index3[k3].discard(i)
            self.data[i] = None
        self.all -= indices
        self.free.extend(indices)
    def __len__(self):
        return len(self.all)
    def __iter__(self):
        # skip freed slots, which hold None
        for i in self.all:
            yield self.data[i][0]
Example:
>>> db = JobDb()
>>> db['new jersey', 'mercer county', 'plumbers'] = 3
>>> db['new jersey', 'mercer county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'salesmen'] = 62
>>> db['new york', 'queens county', 'plumbers'] = 9
>>> db['new york', 'queens county', 'salesmen'] = 36
>>> db['new york', None, None]
{('new york', 'queens county', 'plumbers'): 9,
('new york', 'queens county', 'salesmen'): 36}
>>> db[None, None, 'plumbers']
{('new jersey', 'mercer county', 'plumbers'): 3,
('new york', 'queens county', 'plumbers'): 9}
>>> db['new jersey', 'mercer county', None]
{('new jersey', 'mercer county', 'plumbers'): 3,
('new jersey', 'mercer county', 'programmers'): 81}
>>> db['new jersey', 'middlesex county', 'programmers']
81
>>>
Edit: Now returning dictionaries when querying with wild cards (None), and single values otherwise.
Answer 17
I have a similar thing going. I have a lot of cases where I do:
thedict = {}
for item in ('foo', 'bar', 'baz'):
    mydict = thedict.get(item, {})
    mydict = get_value_for(item)
    thedict[item] = mydict
But going many levels deep. It’s the “.get(item, {})” that’s the key, as it’ll supply an empty dictionary if there isn’t one already. Meanwhile, I’ve been thinking of ways to deal with this better. Right now, there’s a lot of
value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)
So instead, I made:
def dictgetter(thedict, default, *args):
    totalargs = len(args)
    for i,arg in enumerate(args):
        if i+1 == totalargs:
            thedict = thedict.get(arg, default)
        else:
            thedict = thedict.get(arg, {})
    return thedict
Which has the same effect if you do:
value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')
Better? I think so.
Answer 18
You can use recursion in lambdas and defaultdict, no need to define names:
from collections import defaultdict
a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))
Here’s an example:
>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
{'new jersey': defaultdict(<function __main__.<lambda>>,
{'mercer county': defaultdict(<function __main__.<lambda>>,
{'plumbers': 3, 'programmers': 81}),
'middlesex county': defaultdict(<function __main__.<lambda>>,
{'programmers': 81, 'salesmen': 62})})})
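For comparison, the same self-nesting behaviour can be had with a named function, which some readers may find easier to follow than the self-applying lambdas. This is only a sketch; the names tree and to_dict are not from the answer above, and to_dict is an optional helper for turning the result back into plain dictionaries.

```python
from collections import defaultdict

def tree():
    """A defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)

def to_dict(node):
    """Recursively convert nested defaultdicts back into plain dicts."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

a = tree()
a['new jersey']['mercer county']['plumbers'] = 3
a['new jersey']['middlesex county']['salesmen'] = 62

plain = to_dict(a)
print(plain)
```

Converting to plain dicts before handing the data elsewhere also stops later lookups from silently creating new branches.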
Answer 19
I used to use this function. It’s safe, quick, and easily maintainable.
from functools import reduce
def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
Example:
>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>
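Two limitations of the dotted-string form are worth keeping in mind: keys that contain a "." cannot be addressed, and a default that is itself a dict would be searched by the following steps. The variant below is a sketch that sidesteps both by taking a sequence of keys and using a private sentinel. The names deep_get_keys and _MISSING, and the 'v1.2' sample key, are made up for this illustration.

```python
from functools import reduce

_MISSING = object()   # sentinel that cannot collide with real values

def deep_get_keys(dictionary, keys, default=None):
    """Follow a sequence of keys through nested dicts; return default if any step fails."""
    def step(node, key):
        if isinstance(node, dict):
            return node.get(key, _MISSING)
        return _MISSING   # hit a non-dict (or already missing) before the keys ran out
    result = reduce(step, keys, dictionary)
    return default if result is _MISSING else result

person = {'person': {'name': {'first': 'John'}}, 'v1.2': {'ok': True}}

print(deep_get_keys(person, ['person', 'name', 'first']))                 # John
print(deep_get_keys(person, ['person', 'name', 'last'], 'No lastname'))   # No lastname
print(deep_get_keys(person, ['v1.2', 'ok']))                              # True
```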