Tag archive: dictionary

Using Python map and other functional tools

Question: Using Python map and other functional tools

This is quite n00bish, but I’m trying to learn/understand functional programming in python. The following code:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo, bar):
    print foo, bar

map(maptest, foos, bars)

produces:

1.0 1
2.0 2
3.0 3
4.0 None
5.0 None

Q. Is there a way to use map or any other functional tools in python to produce the following without loops etc.

1.0 [1,2,3]
2.0 [1,2,3]
3.0 [1,2,3]
4.0 [1,2,3]
5.0 [1,2,3]

Just as a side note, how would the implementation change if there is a dependency between foo and bar? e.g.

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3,4,5]

and print:

1.0 [2,3,4,5]
2.0 [1,3,4,5]
3.0 [1,2,4,5]
...

P.S: I know how to do it naively using if, loops and/or generators, but I’d like to learn how to achieve the same using functional tools. Is it just a case of adding an if statement to maptest, or applying another filter/map to bars inside maptest?


Answer 0

The easiest way would be not to pass bars through the different functions, but to access it directly from maptest:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo):
    print foo, bars

map(maptest, foos)

With your original maptest function you could also use a lambda function in map:

map((lambda foo: maptest(foo, bars)), foos)
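The map() calls above rely on Python 2, where map() runs eagerly and returns a list. In Python 3, map() returns a lazy iterator, so a bare map(maptest, foos) used only for its side effects never calls maptest at all. A minimal sketch of that difference, assuming Python 3 and recording the calls in a list (calls is an illustrative name, not from the answer) instead of printing them:

```python
foos = [1.0, 2.0, 3.0, 4.0, 5.0]
bars = [1, 2, 3]
calls = []  # illustrative: collects what maptest would have printed

def maptest(foo):
    # bars is read from the enclosing (global) scope, as in the answer above
    calls.append((foo, bars))

lazy = map(maptest, foos)
calls_before = len(calls)  # nothing has run yet: the map object is lazy

list(lazy)                 # consuming the iterator calls maptest once per foo
calls_after = len(calls)
```

When the goal is purely a side effect such as printing, a plain for loop is arguably clearer than forcing a map object with list().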

Answer 1

Are you familiar with other functional languages? i.e. are you trying to learn how python does functional programming, or are you trying to learn about functional programming and using python as the vehicle?

Also, do you understand list comprehensions?

map(f, sequence)

is directly equivalent (*) to:

[f(x) for x in sequence]

In fact, I think map() was once slated for removal from python 3.0 as being redundant (that didn’t happen).

map(f, sequence1, sequence2)

is mostly equivalent to:

[f(x1, x2) for x1, x2 in zip(sequence1, sequence2)]

(there is a difference in how it handles the case where the sequences are of different length. As you saw, map() fills in None when one of the sequences runs out, whereas zip() stops when the shortest sequence stops)
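In Python 3 both map() and zip() stop at the shortest input, so the None padding described above only applies to Python 2's map(). A sketch of reproducing the old padding in Python 3 with itertools.zip_longest, whose default fill value is None (pair is a hypothetical helper):

```python
from itertools import zip_longest

foos = [1.0, 2.0, 3.0, 4.0, 5.0]
bars = [1, 2, 3]

def pair(x, y):
    return (x, y)

# Python 3: map() with several iterables stops at the shortest one, like zip()
shortest = list(map(pair, foos, bars))

# zip_longest pads the shorter input with None, mimicking Python 2's map()
padded = [pair(x, y) for x, y in zip_longest(foos, bars)]
```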

So, to address your specific question, you’re trying to produce the result:

foos[0], bars
foos[1], bars
foos[2], bars
# etc.

You could do this by writing a function that takes a single argument and prints it, followed by bars:

def maptest(x):
    print x, bars
map(maptest, foos)

Alternatively, you could create a list that looks like this:

[bars, bars, bars, ] # etc.

and use your original maptest:

def maptest(x, y):
    print x, y

One way to do this would be to explicitly build the list beforehand:

barses = [bars] * len(foos)
map(maptest, foos, barses)

Alternatively, you could pull in the itertools module. itertools contains many clever functions that help you do functional-style lazy-evaluation programming in python. In this case, we want itertools.repeat, which will output its argument indefinitely as you iterate over it. This last fact means that if you do:

map(maptest, foos, itertools.repeat(bars))

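you will get endless output in Python 2, since map() keeps going as long as one of the arguments is still producing output. However, itertools.imap is just like map(), but stops as soon as the shortest iterable stops. Note that imap() is lazy: maptest only runs as you iterate over its result (for example with list()).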

itertools.imap(maptest, foos, itertools.repeat(bars))
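Python 3 removed itertools.imap because the built-in map() became lazy and stops at the shortest iterable. A sketch of the repeat() approach under that assumption, with maptest changed to return a tuple rather than print so the result can be inspected:

```python
import itertools

foos = [1.0, 2.0, 3.0, 4.0, 5.0]
bars = [1, 2, 3]

def maptest(x, y):
    return (x, y)

# map() stops when foos runs out, even though repeat(bars) is infinite;
# list() forces the lazy map object to run
pairs = list(map(maptest, foos, itertools.repeat(bars)))
```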

Hope this helps :-)

(*) It’s a little different in Python 3. There, map() returns a lazy iterator (a map object) rather than a list, much like a generator expression.


Answer 2

Here’s the solution you’re looking for:

>>> foos = [1.0, 2.0, 3.0, 4.0, 5.0]
>>> bars = [1, 2, 3]
>>> [(x, bars) for x in foos]
[(1.0, [1, 2, 3]), (2.0, [1, 2, 3]), (3.0, [1, 2, 3]), (4.0, [1, 2, 3]), (5.0, [1, 2, 3])]

I’d recommend using a list comprehension (the [(x, bars) for x in foos] part) over map, as it avoids the overhead of a function call on every iteration (which can be significant). If you’re just going to use it in a for loop, you can use a generator expression instead, which avoids building the whole list in memory first:

>>> y = ((x, bars) for x in foos)
>>> for z in y:
...     print z
...
(1.0, [1, 2, 3])
(2.0, [1, 2, 3])
(3.0, [1, 2, 3])
(4.0, [1, 2, 3])
(5.0, [1, 2, 3])

The difference is that the generator expression is evaluated lazily: each tuple is created only when the loop asks for the next item.

UPDATE In response to this comment:

Of course you know, that you don’t copy bars, all entries are the same bars list. So if you modify any one of them (including original bars), you modify all of them.

I suppose this is a valid point. There are two solutions to this that I can think of. The most efficient is probably something like this:

tbars = tuple(bars)
[(x, tbars) for x in foos]

Since tuples are immutable, this will prevent bars from being modified through the results of this list comprehension (or generator expression, if you go that route). If you really need to modify each and every one of the results, you can do this:

from copy import copy
[(x, copy(bars)) for x in foos]

However, this can be a bit expensive both in terms of memory usage and in speed, so I’d recommend against it unless you really need to add to each one of them.
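A small Python 3 sketch of the aliasing issue raised in the comment and the two remedies above; shared, frozen and copied are illustrative names:

```python
from copy import copy

foos = [1.0, 2.0, 3.0]
bars = [1, 2, 3]

# Every tuple refers to the same list object, so one append shows up everywhere
shared = [(x, bars) for x in foos]
shared[0][1].append(4)  # this also changes bars, shared[1][1] and shared[2][1]

# A tuple snapshot has no append(); entries cannot be modified in place
tbars = tuple(bars)
frozen = [(x, tbars) for x in foos]

# Shallow copies give each entry its own list
copied = [(x, copy(bars)) for x in foos]
copied[0][1].append(5)  # affects only the first entry
```

copy() is shallow, so if bars itself contained mutable objects, those inner objects would still be shared.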


Answer 3

Functional programming is about creating side-effect-free code.

map is a functional list transformation abstraction. You use it to take a sequence of something and turn it into a sequence of something else.

You are trying to use it to iterate and print, which is a side effect. Don’t do that. :)

Here is an example of how you might use map to build the list you want. There are shorter solutions (I’d just use comprehensions), but this will help you understand what map does a bit better:

def my_transform_function(foo):
    return [foo, [1, 2, 3]]

new_list = map(my_transform_function, foos)

Notice at this point, you’ve only done a data manipulation. Now you can print it:

for n, l in new_list:
    print n, l

I’m not sure what you mean by ‘without loops.’ FP isn’t about avoiding loops (you can’t examine every item in a list without visiting each one). It’s about avoiding side effects, and thus writing fewer bugs.
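Applying the same transform-then-print separation to the question's side note (pairing each foo with bars minus one element), here is a Python 3 sketch. It assumes the dependency is positional, i.e. foos[i] is paired with bars without bars[i]; the helper name others is hypothetical:

```python
foos = [1.0, 2.0, 3.0, 4.0, 5.0]
bars = [1, 2, 3, 4, 5]

def others(i):
    # Slicing builds a new list without bars[i]; bars itself is not modified
    return bars[:i] + bars[i + 1:]

# map() over two sequences: the indices and foos advance in lockstep
pairs = list(map(lambda i, foo: (foo, others(i)), range(len(foos)), foos))

# The side effect (printing) is kept separate from the transformation
for foo, rest in pairs:
    print(foo, rest)
```

If the dependency is by value rather than by position, [b for b in bars if b != foo] would be the filter to use instead.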


Answer 4

>>> from itertools import repeat
>>> for foo, bar in zip(foos, repeat(bars)):
...     print foo, bar
... 
1.0 [1, 2, 3]
2.0 [1, 2, 3]
3.0 [1, 2, 3]
4.0 [1, 2, 3]
5.0 [1, 2, 3]

Answer 5

import itertools

foos=[1.0, 2.0, 3.0, 4.0, 5.0]
bars=[1, 2, 3]

print zip(foos, itertools.cycle([bars]))

Answer 6

Here’s an overview of the parameters to the map(function, *sequences) function:

  • function is the function to apply: any callable, passed as the function object itself rather than its name.
  • sequences is any number of sequences, which are usually lists or tuples. map will iterate over them simultaneously and give the current values to function. That’s why the number of sequences should equal the number of parameters to your function.

It sounds like you’re trying to iterate over some of function’s parameters but keep others constant, and unfortunately map doesn’t support that. I found an old proposal to add such a feature to Python, but the map construct is so clean and well-established that I doubt something like that will ever be implemented.

Use a workaround like global variables or list comprehensions, as others have suggested.
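Another standard-library workaround, not mentioned in the answer, is functools.partial, which fixes some arguments ahead of time so that map() only supplies the varying ones. A sketch assuming Python 3, with maptest returning a tuple instead of printing:

```python
from functools import partial

foos = [1.0, 2.0, 3.0, 4.0, 5.0]
bars = [1, 2, 3]

def maptest(foo, bar):
    return (foo, bar)

# partial() freezes bar as a keyword argument; map() then only supplies foo
maptest_with_bars = partial(maptest, bar=bars)
results = list(map(maptest_with_bars, foos))
```

Binding bar by keyword matters here: partial() binds positional arguments from the left, so partial(maptest, bars) would fix foo instead.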


Answer 7

Would this do it?

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest2(bar):
  print bar

def maptest(foo):
  print foo
  map(maptest2, bars)

map(maptest, foos)

Answer 8

How about this:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo, bar):
    print foo, bar

map(maptest, foos, [bars]*len(foos))

Safely remove multiple keys from a dictionary

Question: Safely remove multiple keys from a dictionary

I know how to remove an entry, 'key' from my dictionary d, safely. You do:

if 'key' in d:
    del d['key']

However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.

entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
    if x in d:
        del d[x]

However, I was wondering if there is a smarter way to do this?


Answer 0

Why not like this:

entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}

def entries_to_remove(entries, the_dict):
    for key in entries:
        if key in the_dict:
            del the_dict[key]

A more compact version was provided by mattbornski using dict.pop()


Answer 1

Using dict.pop:

d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
    d.pop(k, None)
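Since the question mentions doing this more than once, the pop() loop can be wrapped in a small reusable helper. discard_keys is a hypothetical name; a sketch:

```python
def discard_keys(d, keys):
    """Remove every key in keys from d in place; missing keys are ignored."""
    for k in keys:
        # The default (None) stops pop() from raising KeyError for absent keys
        d.pop(k, None)

d = {'some': 'data', 'any': 1}
discard_keys(d, ('any', 'iterable'))
```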

Answer 2

Using Dict Comprehensions

final_dict = {key: t[key] for key in t if key not in [key1, key2]}

where key1 and key2 are to be removed.

In the example below, keys “b” and “c” are removed; they are kept in a list named keys.

>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>> 
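Unlike the pop()/del answers, this builds a new dictionary and leaves the original untouched. With many keys to drop, a set makes each membership test O(1) on average instead of scanning a list. A Python 3 sketch (filtered is an illustrative name):

```python
a = {'a': 1, 'c': 3, 'b': 2, 'd': 4}
keys = {"b", "c"}  # a set instead of a list: faster 'not in' checks

# Iterating items() avoids a second lookup a[key] for each kept key
filtered = {k: v for k, v in a.items() if k not in keys}
```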

Answer 3

A solution using the map and filter functions:

python 2

d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)

python 3

d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)

you get:

{'c': 3}

Answer 4

If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:

values_removed = [d.pop(k, None) for k in entities_to_remove]

You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function’s side effect.
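If you also want to know which key each removed value belonged to, and to skip keys that were never present, a dict comprehension can collect key/value pairs instead. A Python 3 sketch; removed is an illustrative name:

```python
d = {'a': 1, 'b': 2, 'c': 3}
entities_to_remove = ('a', 'c', 'z')  # 'z' is not in d

# The 'if k in d' filter runs before pop(), so pop() never sees a missing key
removed = {k: d.pop(k) for k in entities_to_remove if k in d}
```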


Answer 5

Found a solution with pop and map

d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)

The output of this:

{'d': 'valueD'}

I’m answering this question late because I think it may help anyone who searches for the same thing in the future.

Update

The above code raises a KeyError if a key does not exist in the dict. To handle missing keys:

DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']

def remove_key(key):
    try:
        DICTIONARY.pop(key)
    except KeyError:
        pass  # or take any other action for a missing key

list(map(remove_key, keys))
print(DICTIONARY)

output:

{'b': 'valueB', 'd': 'valueD'}

Answer 6

I have no problem with any of the existing answers, but I was surprised to not find this solution:

keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}

for k in keys_to_remove:
    try:
        del my_dict[k]
    except KeyError:
        pass

assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}

Note: I stumbled across this question coming from here. And my answer is related to this answer.
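The same try/except KeyError pattern can be written more compactly with contextlib.suppress (available since Python 3.4). A sketch using the same data:

```python
from contextlib import suppress

keys_to_remove = ['a', 'b', 'c', 'z']  # 'z' is absent on purpose
my_dict = dict(zip("abcdefg", range(7)))

for k in keys_to_remove:
    # suppress(KeyError) ignores the error raised by del for a missing key
    with suppress(KeyError):
        del my_dict[k]
```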


Answer 7

Why not:

entriestoremove = (2,5,1)
for e in entriestoremove:
    if e in d:
        del d[e]

I don’t know what you mean by “smarter way”. Surely there are other ways, maybe with dictionary comprehensions:

entriestoremove = (2,5,1)
newdict = {k: v for k, v in d.items() if k not in entriestoremove}

Answer 8

inline

import functools

#: note: key 'c' is not in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}

entitiesToREmove = ('a', 'b', 'c')

#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)

#: python3

list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))

print(d)
# output: {'d': 'dvalue'}

Answer 9

Some timing tests in CPython 3 show that a simple for loop is the fastest way, and it’s quite readable. Adding in a function doesn’t cause much overhead either:

timeit results (10k iterations):

  • all(x.pop(v) for v in r) # 0.85
  • all(map(x.pop, r)) # 0.60
  • list(map(x.pop, r)) # 0.70
  • all(map(x.__delitem__, r)) # 0.44
  • del_all(x, r) # 0.40
  • <inline for loop>(x, r) # 0.35

def del_all(mapping, to_remove):
    """Remove list of elements from mapping."""
    for key in to_remove:
        del mapping[key]

For small iterations, doing that ‘inline’ was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.
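One caveat when reading these timings: all() stops at the first falsy value. dict.__delitem__ always returns None, so all(map(x.__delitem__, r)) deletes only the first key, and all(x.pop(v) for v in r) stops early whenever a popped value is falsy; those rows may therefore not measure removing every key. A sketch of the pitfall, plus collections.deque(..., maxlen=0), a common idiom for fully consuming an iterator (x and r mirror the names above; the data is made up):

```python
from collections import deque

x = {'a': 1, 'b': 2, 'c': 3}
r = ['a', 'b', 'c']

# __delitem__ returns None, which is falsy, so all() stops after the first call
all(map(x.__delitem__, r))
remaining_after_all = dict(x)    # only 'a' has been removed

x = {'a': 1, 'b': 2, 'c': 3}
# deque(..., maxlen=0) consumes the whole iterator without storing results
deque(map(x.__delitem__, r), maxlen=0)
remaining_after_deque = dict(x)
```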


Answer 10

I think using the fact that the keys can be treated as a set is the nicest way if you’re on python 3:

def remove_keys(d, keys):
    to_remove = set(keys)
    filtered_keys = d.keys() - to_remove
    filtered_values = map(d.get, filtered_keys)
    return dict(zip(filtered_keys, filtered_values))

Example:

>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}

Answer 11

It would be nice to have full support for set methods for dictionaries (and not the unholy mess we’re getting with Python 3.9) so that you could simply “remove” a set of keys. However, as long as that’s not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I’ve created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so about 100,000,000 items in total.

from itertools import product
from time import perf_counter

# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))

print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove columns 1-99 of every row from 50,000 upwards

keys = product(range(50000, 100000), range(1, 100))

# for x,y in keys:
#     del cells[x, y]

for n in map(cells.pop, keys):
    pass

print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")

10 million items or more is not unusual in some settings. Comparing the two methods (the commented-out del loop and map with pop) on my local machine, I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. This pales in comparison to the time required to create the dictionary in the first place (55s), or to including checks within the loop. If such checks are likely to be needed, it’s best to first create a set that is the intersection of the dictionary keys and your filter:

keys = cells.keys() & keys

In summary: del is already heavily optimised, so don’t worry about using it.
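A scaled-down sketch of that intersection step (the grid size and the missing key (9, 9) are made up for illustration). dict.keys() views support set operators such as &, and the result is an ordinary set:

```python
# 5 rows x 3 columns = 15 cells, keyed by (row, column)
cells = {(x, y): x for x in range(1, 6) for y in range(1, 4)}
wanted = [(4, 1), (4, 2), (9, 9)]  # (9, 9) is not in cells

# Intersection keeps only keys that exist, so del cannot raise KeyError
present = cells.keys() & wanted
for key in present:
    del cells[key]  # safe: present is a separate set, not a live view
```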


Answer 12

I’m late to this discussion, but for anyone else: one solution is to create a list of keys, like this:

k = ['a','b','c','d']

Then use pop() in a list comprehension, or a for loop, to iterate over the keys and pop them one at a time:

popped_values = [dictionary.pop(x, 'n/a') for x in k]

The ‘n/a’ is the default value returned when a key does not exist, so pop() does not raise a KeyError.


Dictionary vs Object - which is more efficient and why?

Question: Dictionary vs Object - which is more efficient and why?

What is more efficient in Python in terms of memory usage and CPU consumption – Dictionary or Object?

Background: I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. After the dictionary is ready, accessing it is a blink of an eye.

Example: To check the performance I wrote two simple programs that do the same thing, one using objects and the other using dictionaries:

Object (execution time ~18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

Dictionary (execution time ~12sec):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

Question: Am I doing something wrong, or is a dictionary just faster than an object? If a dictionary does perform better, can somebody explain why?


Answer 0

Have you tried using __slots__?

From the documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

Run benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using dict)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using dict)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.
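Part of the saving comes from what the quoted documentation describes: a __slots__ instance has no per-instance __dict__. A side effect is that attributes not listed in __slots__ cannot be added. A Python 3 sketch (SlotObj is an illustrative name):

```python
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

plain = Obj(1)
slotted = SlotObj(1)

has_dict = hasattr(plain, '__dict__')            # per-instance dict exists
slotted_has_dict = hasattr(slotted, '__dict__')  # fixed slots only, no dict

try:
    slotted.extra = 42  # 'extra' is not declared in __slots__
    added = True
except AttributeError:
    added = False
```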


Answer 1

Attribute access in an object uses dictionary access behind the scenes – so by using attribute access you are adding extra overhead. Plus in the object case, you are incurring additional overhead because of e.g. additional memory allocations and code execution (e.g. of the __init__ method).

In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.
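A quick Python 3 check of that equivalence for an ordinary instance attribute (this assumes no property or other descriptor named attr on the class, which would change the lookup):

```python
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

o = Obj(7)

# The attribute values are stored in the instance dictionary
stored = o.__dict__          # the same object that vars(o) returns
same = o.i == stored['i']
```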


Answer 2

Have you considered using a namedtuple? (link for python 2.4/2.5)

It’s the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

Its only downside compared with dictionaries is that (like tuples) it doesn’t give you the ability to change attributes after creation.
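A Python 3 sketch of that limitation and the usual way around it: the namedtuple _replace() method returns a new instance with some fields changed. A mutable field such as a list can still be modified in place, and _replace() copies references, not the objects they point to:

```python
from collections import namedtuple

Obj = namedtuple('Obj', 'i l')
o = Obj(1, [])

try:
    o.i = 2  # namedtuple fields cannot be reassigned
    reassigned = True
except AttributeError:
    reassigned = False

o2 = o._replace(i=2)  # new instance; o itself is unchanged
o.l.append('x')       # the list is shared, so o2.l sees this too
```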


Answer 3

这是python 3.6.1的@hughdbrown答案的副本,我将计数增加了5倍,并在每次运行结束时添加了一些代码来测试python进程的内存占用量。

在不愿接受投票的人之前,请注意,这种计算对象大小的方法并不准确。

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

这些是我的结果

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

我的结论是:

  1. 插槽具有最佳的内存占用,并且速度合理。
  2. dict是最快的,但使用最多的内存。

Here is a copy of @hughdbrown answer for python 3.6.1, I’ve made the count 5x larger and added some code to test the memory footprint of the python process at the end of each run.

Before the downvoters have at it, Be advised that this method of counting the size of objects is not accurate.

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

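class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    # Note: despite its name, this builds a list of namedtuples; the bodies
    # of the two *_of_nt functions are swapped, so the two namedtuple rows
    # in the results below are most likely swapped as well.
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    # Likewise, this builds a dict of namedtuples despite its name.
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))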

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

And these are my results

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

My conclusion is:

  1. Slots have the best memory footprint and are reasonable on speed.
  2. dicts are the fastest, but use the most memory.
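As the answer itself warns, the RSS of the whole process is only a rough proxy for object size. A minimal sketch of a more direct measurement, using the standard-library `tracemalloc` module to count the bytes Python has allocated while each container is alive, is below. The helper names `allocated_bytes`, `make_l` and `make_dict` and the default `count` are illustrative assumptions, not part of the original script; absolute numbers will differ between Python versions.

```python
import tracemalloc


def make_l(i):
    # Same unique-string payload as makeL() in the script above
    return "This is a sample string %s" % i


class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = make_l(i)


class SlotObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = make_l(i)


def make_dict(i):
    return {'i': i, 'l': make_l(i)}


def allocated_bytes(factory, count=100000):
    """Bytes traced by tracemalloc while a list of `count` items is alive."""
    tracemalloc.start()
    try:
        items = [factory(i) for i in range(count)]  # keep the items alive
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


if __name__ == '__main__':
    for name, factory in [('dict', make_dict), ('obj', Obj), ('slot', SlotObj)]:
        print('%-5s %8.2f MiB' % (name, allocated_bytes(factory) / 1024.0 / 1024.0))
```

Because tracing starts fresh for each factory, earlier runs do not leak into later measurements the way they can with process RSS.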

Answer 4

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

Results:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749
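The script above times one run of each function with `datetime.now()`. A hedged alternative is the standard-library `timeit` module, which repeats a statement many times and lets you keep the best run, reducing noise from other processes. The sketch below is written for Python 3 (unlike the Python 2 script above); the helper `best_time` and the choice to time the creation of a single item are illustrative assumptions.

```python
import timeit


class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


def best_time(stmt, number=100000, repeat=5):
    """Minimum wall time in seconds over `repeat` runs of `number` executions of stmt."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat, globals=globals()))


if __name__ == '__main__':
    for label, stmt in [('dict', "{'i': 1, 'l': []}"),
                        ('obj', 'Obj(1)'),
                        ('slotobj', 'SlotObj(1)')]:
        print('%-8s %.4f s' % (label, best_time(stmt)))
```

Taking the minimum rather than the mean is the convention the `timeit` documentation recommends, since higher values are usually caused by interference rather than by the code being measured.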

Answer 5

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and maintaining the correct abstraction is very often one of the most useful ways to achieve that.

About the reasons an object is slower, I think your measurement is not correct.
You are performing too few assignments inside the for loop, so what you see there is mostly the difference in the time needed to instantiate a dict (a built-in object) and a “custom” object. Although from the language perspective they are the same, they have quite different implementations.
After that, the assignment time should be almost the same for both, since in the end the members are stored inside a dictionary.
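To illustrate the last point: for an ordinary class (one without `__slots__`), the attribute assignments in `__init__` end up in the instance's own `__dict__`, which is a plain dictionary. The class below mirrors the `Obj` class from the benchmarks; this is a small Python 3 sketch, not code from the question.

```python
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


o = Obj(1)
# The instance attributes live in a per-instance dictionary
print(type(o.__dict__).__name__)  # dict
print(o.__dict__)                 # {'i': 1, 'l': []}
```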


Answer 6

There is yet another way to reduce memory usage if the data structure isn’t supposed to contain reference cycles.

Let’s compare two classes:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

and

$ pip install recordclass

>>> import sys
>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
DataItem(name='Mike', age=10, address='Cherry Street 15')
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
64 40

This is possible because structclass-based classes don’t support cyclic garbage collection, which is not needed in such cases.

There is also one advantage over __slots__-based classes: you can add extra attributes:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
>>> print(sys.getsizeof(inst3), 'has dict:', bool(inst3.__dict__))
48 has dict: True
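`structclass` belongs to the third-party recordclass package. The two facts the comparison relies on can be checked with the standard library alone: a `__slots__` instance has no per-instance `__dict__`, and in CPython it is still tracked by the cyclic garbage collector, which is the per-object cost the answer says structclass instances skip. The sketch below uses `gc.is_tracked` and `sys.getsizeof`; the `PlainItem` class is added only for comparison, and exact byte counts vary between Python versions.

```python
import gc
import sys


class DataItem:
    __slots__ = ('name', 'age', 'address')

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address


class PlainItem:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address


slotted = DataItem('Mike', 10, 'Cherry Street 15')
plain = PlainItem('Mike', 10, 'Cherry Street 15')

# A regular instance also carries a separate per-instance __dict__
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_total = sys.getsizeof(slotted)
print('slots:', slotted_total, 'plain + __dict__:', plain_total)

# Both kinds of instance are tracked by CPython's cyclic garbage collector
print('tracked:', gc.is_tracked(slotted), gc.is_tracked(plain))
```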

Answer 7

Here are my test runs of @Jarrod-Chesney’s very nice script. For comparison, I also ran it against Python 2 with “range” replaced by “xrange”.

Out of curiosity, I also added similar tests with OrderedDict (ordict) for comparison.

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

So, on both major versions, the conclusions of @Jarrod-Chesney are still looking good.
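The OrderedDict variants are not shown in this answer. Assuming they follow the same pattern as the functions in @Jarrod-Chesney’s script, they might look roughly like the hypothetical reconstruction below. The function bodies are guesses based only on the row names in the results, the timing decorator is omitted, and `ITER_COUNT` is reduced so the sketch runs quickly.

```python
from collections import OrderedDict

ITER_COUNT = 1000  # the benchmark above used 5 million


def makeL(i):
    return "This is a sample string %s" % i


class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)


class SlotObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = makeL(i)


# Hypothetical reconstructions; names follow the result rows above
def profile_ordict_of_ordict():
    return OrderedDict((i, OrderedDict([('i', i), ('l', makeL(i))]))
                       for i in range(ITER_COUNT))


def profile_list_of_ordict():
    return [OrderedDict([('i', i), ('l', makeL(i))]) for i in range(ITER_COUNT)]


def profile_ordict_of_obj():
    return OrderedDict((i, Obj(i)) for i in range(ITER_COUNT))


def profile_ordict_of_slot():
    return OrderedDict((i, SlotObj(i)) for i in range(ITER_COUNT))
```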


String to dictionary in Python

Question: String to dictionary in Python

So I’ve spent way too much time on this, and it seems to me like it should be a simple fix. I’m trying to use Facebook’s authentication to register users on my site, and I’m trying to do it server side. I’ve gotten to the point where I get my access token, and when I go to:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

I get the information I’m looking for as a string that’s like this:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

It seems like I should just be able to use dict(string) on this but I’m getting this error:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

So I tried using Pickle, but got this error:

KeyError: '{'

I tried using django.serializers to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I’m just being stupid. Thanks for any help!


Answer 0

This data is JSON! You can deserialize it using the built-in json module if you’re on Python 2.6+, otherwise you can use the excellent third-party simplejson module.

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data
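A self-contained Python 3 sketch using a few fields from the response in the question shows what `json.loads` gives back: JSON `true` becomes Python `True`, and escapes such as `\/` and `\u0040` are decoded. It also shows that malformed input raises `json.JSONDecodeError`, which is a subclass of `ValueError` (Python 3.5+).

```python
import json

# A few fields from the Graph API response in the question (raw string keeps the JSON escapes)
raw = r'{"id":"123456789","link":"http:\/\/www.facebook.com\/jdoe","email":"jdoe\u0040gmail.com","timezone":-7,"verified":true}'

data = json.loads(raw)
print(data['email'])     # jdoe@gmail.com
print(data['link'])      # http://www.facebook.com/jdoe
print(data['verified'])  # True
print(data['timezone'])  # -7

try:
    json.loads("{'id': '123'}")  # single-quoted keys are not valid JSON
except json.JSONDecodeError as exc:
    print('not JSON:', exc.msg)
```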

Answer 1

Use ast.literal_eval to evaluate Python literals. However, what you have is JSON (note “true” for example), so use a JSON deserializer.

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}
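To see why `ast.literal_eval` is the wrong tool for this data, here is a short Python 3 sketch: it accepts Python literal syntax (`True`) but rejects the JSON spelling `true`, which `json.loads` handles.

```python
import ast
import json

python_literal = "{'verified': True, 'timezone': -7}"
json_text = '{"verified": true, "timezone": -7}'

print(ast.literal_eval(python_literal))  # {'verified': True, 'timezone': -7}
print(json.loads(json_text))             # {'verified': True, 'timezone': -7}

try:
    ast.literal_eval(json_text)
except ValueError:
    # `true` is parsed as a bare name, which literal_eval refuses
    print('ast.literal_eval cannot read JSON true/false/null')
```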

How to convert a list of key-value tuples into a dictionary?

Question: How to convert a list of key-value tuples into a dictionary?

I have a list that looks like:

[('A', 1), ('B', 2), ('C', 3)]

I want to turn it into a dictionary that looks like:

{'A': 1, 'B': 2, 'C': 3}

What’s the best way to go about this?

EDIT: My list of tuples is actually more like:

[(A, 12937012397), (BERA, 2034927830), (CE, 2349057340)]

Answer 0

This gives me the same error as trying to split the list up and zip it. ValueError: dictionary update sequence element #0 has length 1916; 2 is required

THAT is your actual question.

The answer is that the elements of your list are not what you think they are. If you type myList[0] you will find that the first element of your list is not a two-tuple, e.g. ('A', 1), but rather a 1916-length iterable.

Once you actually have a list in the form you stated in your original question (myList = [('A',1),('B',2),...]), all you need to do is dict(myList).
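A small sketch of that diagnosis: `dict()` needs every element to be an iterable of exactly two items, so checking the length of each element points straight at the bad entries. The malformed list `bad` below is hypothetical, made up to reproduce the same kind of error.

```python
pairs = [('A', 1), ('B', 2), ('C', 3)]
print(dict(pairs))  # {'A': 1, 'B': 2, 'C': 3}

# Hypothetical malformed input: the first element is one long string, not a 2-tuple
bad = ["A 12937012397 BERA 2034927830", ('CE', 2349057340)]

try:
    dict(bad)
except ValueError as exc:
    print(exc)

# Find the elements that are not (key, value) pairs
offenders = [index for index, item in enumerate(bad) if len(item) != 2]
print(offenders)  # [0]
```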


Answer 1

>>> dict([('A', 1), ('B', 2), ('C', 3)])
{'A': 1, 'C': 3, 'B': 2}

Answer 2

Have you tried this?

>>> l=[('A',1), ('B',2), ('C',3)]
>>> d=dict(l)
>>> d
{'A': 1, 'C': 3, 'B': 2}

Answer 3

Here is a way to handle duplicate tuple “keys”:

# An example
l = [('A', 1), ('B', 2), ('C', 3), ('A', 5), ('D', 0), ('D', 9)]

# A solution (the list comprehension is used only for its side effects on d)
d = dict()
[d[t[0]].append(t[1]) if t[0] in d
 else d.update({t[0]: [t[1]]}) for t in l]
print(d)

OUTPUT: {'A': [1, 5], 'B': [2], 'C': [3], 'D': [0, 9]}
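The same grouping can be written with `collections.defaultdict`, which creates the empty list the first time a key is seen. This is a swapped-in standard-library alternative, not this answer's own approach, and it avoids building a throwaway list just for its side effects.

```python
from collections import defaultdict

l = [('A', 1), ('B', 2), ('C', 3), ('A', 5), ('D', 0), ('D', 9)]

grouped = defaultdict(list)
for key, value in l:
    grouped[key].append(value)  # missing keys start out as []

print(dict(grouped))  # {'A': [1, 5], 'B': [2], 'C': [3], 'D': [0, 9]}
```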

Answer 4

Another way, using a dictionary comprehension:

>>> t = [('A', 1), ('B', 2), ('C', 3)]
>>> d = { i:j for i,j in t }
>>> d
{'A': 1, 'B': 2, 'C': 3}

Answer 5

If the tuples have no repeated keys, it’s simple:

tup = [("A",0),("B",3),("C",5)]
dic = dict(tup)
print(dic)

If the tuples have repeated keys:

tup = [("A",0),("B",3),("C",5),("A",9),("B",4)]
dic = {}
for i, j in tup:
    dic.setdefault(i,[]).append(j)
print(dic)

Answer 6

l=[['A', 1], ['B', 2], ['C', 3]]
d={}
for i,j in l:
    d.setdefault(i,j)  # sets d[i] = j only if i is not already a key
print(d)
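One detail worth knowing when keys repeat: `dict()` keeps the last value for a key, while a `setdefault` loop keeps the first. A quick check, using a hypothetical duplicate `['A', 99]` added to the list:

```python
l = [['A', 1], ['B', 2], ['C', 3], ['A', 99]]

last_wins = dict(l)  # later pairs overwrite earlier ones

first_wins = {}
for key, value in l:
    first_wins.setdefault(key, value)  # only sets a key the first time it is seen

print(last_wins)   # {'A': 99, 'B': 2, 'C': 3}
print(first_wins)  # {'A': 1, 'B': 2, 'C': 3}
```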

How to convert an XML string to a dictionary?

Question: How to convert an XML string to a dictionary?

I have a program that reads an xml document from a socket. I have the xml document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django’s simplejson library.

Take as an example:

xml_string = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
dic_xml = convert_to_dic(xml_string)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }


Answer 0

This is a great module that someone created. I’ve used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

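try:
    from xml.etree import cElementTree as ElementTree  # Python 2 and Python < 3.9
except ImportError:
    from xml.etree import ElementTree  # cElementTree was removed in Python 3.9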

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

Answer 1

xmltodict (full disclosure: I wrote it) does exactly that:

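import xmltodict

# The XML declaration must be the very first thing in the document,
# so it starts right after the opening quotes
xmltodict.parse("""<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")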
# {u'person': {u'age': u'20', u'name': u'john'}}

Answer 2

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON “specification”. It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked “specification”) should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)
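Applied to the XML from the question, a copy of `etree_to_dict` gives the nested dict the asker wants, with one caveat: XML carries no types, so `age` comes back as the string `'20'`. Getting the `20` from the question needs an explicit conversion step, shown here as an `int()` call that is an assumption about the data, not part of the answer. The sketch assumes Python 3 and uses `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def etree_to_dict(t):
    # Same function as in the answer above
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k: v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d


xml_string = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
result = etree_to_dict(ET.fromstring(xml_string))
print(result)  # {'person': {'name': 'john', 'age': '20'}}

# XML text is always a string; convert fields that should be numbers yourself
result['person']['age'] = int(result['person']['age'])
print(result)  # {'person': {'name': 'john', 'age': 20}}
```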


Update

If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(etree_to_dict(e)))  # round trip of the example above
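Note that on Python 3 `ET.tostring()` returns `bytes` by default, so the result above prints as `b'...'`; passing `encoding="unicode"` gives a `str` instead. A small illustration with a hand-built element:

```python
import xml.etree.ElementTree as ET

node = ET.Element("e")
node.set("name", "value")
node.text = "text"

as_bytes = ET.tostring(node)                     # default: bytes
as_text = ET.tostring(node, encoding="unicode")  # str instead of bytes
print(as_bytes)  # b'<e name="value">text</e>'
print(as_text)   # <e name="value">text</e>
```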

Answer 3

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old Python versions. It is also rigid, meaning the results have the same shape regardless of whether attributes are present.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}
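To see what “rigid” means in practice: every child tag maps to a list even when it occurs only once, so calling code can always iterate over it. A condensed restatement of `dictify` (same logic, with `dict.setdefault` in place of the explicit membership check) applied to a single-child input:

```python
import xml.etree.ElementTree as ET


def dictify(r, root=True):
    # Condensed version of dictify above; the behaviour is the same.
    if root:
        return {r.tag: dictify(r, False)}
    d = dict(r.attrib)
    if r.text:
        d["_text"] = r.text
    for x in r.findall("./*"):
        d.setdefault(x.tag, []).append(dictify(x, False))
    return d


single = dictify(ET.fromstring("<erik><a>v</a></erik>"))
print(single)  # {'erik': {'a': [{'_text': 'v'}]}}
```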

Answer 4

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters: it describes in detail all of the decisions and issues that arise when converting between XML and Python dictionaries (there are a number of edge cases that most converters don’t handle: attributes, lists, anonymous lists, anonymous dicts, eval, etc.). In general, though, the converters are easy to use. If ‘example.xml’ contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = open('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expectXML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python versions do identical conversions, but the C++ version is about 60x faster.


Answer 5

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = xml_objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//SomeData')  # SomeData is just the tag from the sample XML; returns an lxml.objectify.ObjectifiedElement

See the lxml documentation for details. I hope this helps!
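The same subtree idea also works with the standard library's `xml.etree.ElementTree` if lxml is not available. A rough sketch on the sample document above; it only flattens the direct children of `<SomeData>`, a simplification rather than the recursive conversion:

```python
import xml.etree.ElementTree as ET

xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

root = ET.fromstring(xml_bytes)
subtree = root.find(".//SomeData")  # first <SomeData> anywhere below the root
flat = {child.tag: child.text for child in subtree}
print(flat)  # {'SomeNestedData1': '1234', 'SomeNestedData2': '3455'}
```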


Answer 6

Disclaimer: This modified XML parser was inspired by Adam Clark. The original XML parser works for most simple cases. However, it didn’t work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find any bugs, please let me know; I’ll be glad to fix them.

class XmlDictConfig(dict):
    '''
    Note: the XML needs a single root element; add one if there is none.
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() returns the attributes
                elementattrib = list(element.items())  # list() so append() also works on Python 3
                if element.text:
                    elementattrib.append((element.tag, element.text))  # add tag:text if there is text
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim(self, aDict):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key: aDict[key]})  # it was self.update(aDict)
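For a leaf element that has attributes, the parser above merges the attributes and the text into one dict, storing the text under the element's own tag. A standalone sketch of just that step, on a made-up `<price>` element:

```python
import xml.etree.ElementTree as ET

leaf = ET.fromstring('<price currency="EUR">9.99</price>')
entry = dict(leaf.items())       # attributes as a plain dict
if leaf.text:
    entry[leaf.tag] = leaf.text  # text stored under the element's own tag
print({leaf.tag: entry})  # {'price': {'currency': 'EUR', 'price': '9.99'}}
```

One consequence of this layout: an attribute that happens to have the same name as its element would be overwritten by the text.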

Answer 7

def xml_to_dict(node):
    u'''
    @param node: lxml_node
    @return: dict
    '''
    return {'tag': node.tag,
            'text': node.text,
            'attrib': node.attrib,
            'children': {child.tag: xml_to_dict(child) for child in node}}
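Because `children` is keyed by tag, sibling elements that share a tag overwrite each other. A quick sketch showing this, using the standard library's `xml.etree.ElementTree` instead of lxml (an assumption that is fine here, since the function only needs `.tag`, `.text`, `.attrib` and iteration, which both libraries provide):

```python
import xml.etree.ElementTree as ET


def xml_to_dict(node):
    # Same structure as the function above
    return {'tag': node.tag,
            'text': node.text,
            'attrib': dict(node.attrib),
            'children': {child.tag: xml_to_dict(child) for child in node}}


root = ET.fromstring('<r><a>1</a><a>2</a></r>')
result = xml_to_dict(root)
print(len(result['children']))          # 1: only one 'a' entry survives
print(result['children']['a']['text'])  # 2: the second <a> replaced the first
```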

Answer 8

The easiest-to-use XML parser for Python is ElementTree (since Python 2.5 it is in the standard library as xml.etree.ElementTree). I don’t think there is anything that does exactly what you want out of the box. It would be pretty trivial to write something that does what you want using ElementTree, but why convert to a dictionary at all? Why not just use ElementTree directly?
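As an illustration of working with ElementTree directly instead of converting to a dict, here is a short sketch on a made-up document, using `findtext()`, `findall()` and `get()`:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<config><name>demo</name>"
    "<item id='1'>first</item><item id='2'>second</item></config>"
)
name = doc.findtext("name")                                 # text of the first <name> child
items = {i.get("id"): i.text for i in doc.findall("item")}  # all <item> children
print(name, items)  # demo {'1': 'first', '2': 'second'}
```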


Answer 9

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements with the same tag at a given place in the hierarchy, it just overwrites them.

I added a shim that checks whether the element already exists before calling self.update(). If so, it pops the existing entry and creates a list out of the existing and the new entries. Any subsequent duplicates are appended to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                # the recursive call already adds element's attributes;
                # adding them again via updateShim would turn them into lists
                aDict = XmlDictConfig(element)
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                # element.text is None for empty elements such as <e/>
                self.updateShim({element.tag: (element.text or '').strip()})

    def updateShim(self, aDict):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                # update only this key: self.update(aDict) would also overwrite
                # lists already built for the other keys of aDict
                self.update({key: aDict[key]})
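The shim idea on its own, as a minimal sketch with a hypothetical `add_or_append` helper working on a plain dict (not part of the recipe):

```python
def add_or_append(d, key, value):
    """Hypothetical helper: store value under key; on a duplicate key,
    turn the entry into a list (or extend an existing list)."""
    if key not in d:
        d[key] = value
    elif isinstance(d[key], list):
        d[key].append(value)
    else:
        d[key] = [d[key], value]


d = {}
for tag, text in [("item", "a"), ("item", "b"), ("name", "x"), ("item", "c")]:
    add_or_append(d, tag, text)
print(d)  # {'item': ['a', 'b', 'c'], 'name': 'x'}
```

Unlike the pop-then-update approach in `updateShim`, this keeps the key at its original position in the dict; with pop/update the repeated key moves to the end on Python 3.7+, where dicts preserve insertion order.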

Answer 10

Starting from @K3---rnc’s response (the best one for me), I’ve made a small modification to get an OrderedDict from an XML text (sometimes order matters):

from collections import OrderedDict


def etree_to_ordereddict(t):
    d = OrderedDict()
    d[t.tag] = OrderedDict() if t.attrib else None
    children = list(t)
    if children:
        dd = OrderedDict()
        for dc in map(etree_to_ordereddict, children):
            for k, v in dc.items():
                if k not in dd:
                    dd[k] = list()
                dd[k].append(v)
        d = OrderedDict()
        d[t.tag] = OrderedDict()
        for k, v in dd.items():
            if len(v) == 1:
                d[t.tag][k] = v[0]
            else:
                d[t.tag][k] = v
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

Following @K3---rnc’s example, you can use it like this:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))
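Whether the OrderedDict is still needed depends on what “order matters” means for you: since Python 3.7 a plain dict also keeps insertion order, but only OrderedDict takes order into account when two of them are compared. A short sketch:

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
print(a == b)              # False: OrderedDict equality is order-sensitive
print(dict(a) == dict(b))  # True: plain dicts ignore order when comparing
print(list(dict(b)))       # ['y', 'x']: insertion order is still kept
```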

Hope it helps ;)


Answer 11

Here’s a link to an ActiveState solution – and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError(Exception):  # must derive from Exception to be raised on Python 3
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
    then its children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the node's children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)


def test():
    dic = readConfig("sample.xml")

    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

Answer 12

At one point I had to parse and write XML that consisted only of elements without attributes, so a 1:1 mapping from XML to dict was easily possible. This is what I came up with, in case someone else also doesn’t need attributes:

import xml.etree.ElementTree as ElementTree

try:
    basestring
except NameError:  # Python 3
    basestring = str


def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            # "in" rather than result.get(): an empty first value must not be overwritten
            if element.tag in result:
                if isinstance(result[element.tag], list):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, (int, float)):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    root_tag = next(iter(element))  # the single root key (dict keys are not indexable on Python 3)
    result = ElementTree.Element(root_tag)
    for key, value in element[root_tag].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    # fromstring() already returns the root Element, so no getroot() here
    return xmltodict(ElementTree.fromstring(xmlstring))

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))
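A note on the string helper: `ElementTree.fromstring()` returns the root `Element` itself, while `ElementTree.parse()` returns an `ElementTree` wrapper whose root comes from `getroot()`. An `Element` has no `getroot()` method, so calling it on the result of `fromstring()` raises `AttributeError`. A quick check:

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<root><a>1</a></root>"
elem = ET.fromstring(xml_text)                  # Element: the root element
tree = ET.parse(io.BytesIO(xml_text.encode()))  # ElementTree: the whole document
print(elem.tag, tree.getroot().tag)  # root root
print(hasattr(elem, "getroot"))      # False
```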

Answer 13

@dibrovsd: Your solution will not work if the XML has more than one tag with the same name.

Following your line of thought, I have modified the code a bit and written it for a general node instead of the root:

from collections import defaultdict

def xml2dict(node):
    d = defaultdict(dict)  # each key maps to a dict of text/attrib/children
    for count, i in enumerate(node, 1):
        key = i.tag + "_" + str(count)  # numbered key, so repeated tags don't collide
        d[key]['text'] = i.text  # the element's own text (findtext('.')[0] gave only its first character)
        d[key]['attrib'] = i.attrib  # attrib gives the attribute mapping
        d[key]['children'] = xml2dict(i)  # it gives dict
    return d
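An alternative to one running counter is to number each tag separately, so repeated siblings become `a_1`, `a_2` while a different tag starts again at `_1`. A minimal sketch with a hypothetical helper `numbered_tags` that only computes the keys:

```python
from collections import Counter
import xml.etree.ElementTree as ET


def numbered_tags(node):
    """Hypothetical helper: per-tag numbering of a node's children."""
    seen = Counter()
    keys = []
    for child in node:
        seen[child.tag] += 1
        keys.append("%s_%d" % (child.tag, seen[child.tag]))
    return keys


keys = numbered_tags(ET.fromstring("<r><a/><b/><a/></r>"))
print(keys)  # ['a_1', 'b_1', 'a_2']
```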

Answer 14

I have modified one of the answers to my taste, and to work with multiple values under the same tag. For example, consider the following XML code saved in the file XML.xml:

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

And in Python:

import xml.etree.ElementTree as ET


class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            # an element without text gets ' ' so that .strip() below always works
            child.text = child.text if child.text is not None else ' '
            if len(child) == 0:
                self.update(self._addToDict(key=child.tag, value=child.text.strip(), target=self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, target=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        # one-entry dict for `key`; if `target` already has the key,
        # the old value(s) and the new value are merged into a list
        def __init__(self, key, value, target):
            if key not in target:
                self.update({key: value})
            else:
                identical = target[key] if isinstance(target[key], list) else [target[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

The output is:

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

Answer 15

I have a recursive method to get a dictionary from an lxml element:

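    def recursive_dict(element):
        # Strip any '{namespace}' prefix; [-1] also works for tags without one
        tag = element.tag.split('}')[-1]
        # Iterate the element directly (getchildren() is deprecated);
        # children become (tag, dict) pairs and attributes are merged in as keys
        return (tag,
                dict(map(recursive_dict, element), **element.attrib))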
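To see what this representation keeps and loses, here is a runnable sketch of the same recursion. It swaps in the standard-library xml.etree.ElementTree for lxml (an assumption made only so the example has no third-party dependency), and the XML sample is made up. Note that element text is dropped and that repeated sibling tags overwrite each other, because each child becomes a single dict entry.

```python
import xml.etree.ElementTree as ET


def recursive_dict(element):
    """Return (tag, dict) for an element, recursing into its children."""
    # Drop a '{namespace}' prefix if present; split('}')[-1] is a no-op otherwise.
    tag = element.tag.split('}')[-1]
    # Each child becomes a (tag, dict) pair; attributes are merged in as keys.
    return tag, dict(map(recursive_dict, element), **element.attrib)


source = '<ns:a xmlns:ns="urn:example" id="1"><ns:b k="v">text</ns:b><c/><c x="2"/></ns:a>'
print(recursive_dict(ET.fromstring(source)))
# ('a', {'b': {'k': 'v'}, 'c': {'x': '2'}, 'id': '1'})
```

The first <c/> disappears because the second one reuses the key 'c', and "text" is not stored anywhere; if either matters, the tuple-based approach needs extending.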

Right way to initialize an OrderedDict using its constructor such that it retains the order of initial data?

Question: Right way to initialize an OrderedDict using its constructor such that it retains the order of initial data?

What’s the correct way to initialize an ordered dictionary (OD) so that it retains the order of initial data?

from collections import OrderedDict

# Obviously wrong because regular dict loses order
d = OrderedDict({'b':2, 'a':1}) 

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b',2), ('a', 1)])

# What about using a list comprehension, will 'd' preserve the order of 'l'
l = ['b', 'a', 'c', 'aa']
d = OrderedDict([(i,i) for i in l])

Question:

  • Will an OrderedDict preserve the order of a list of tuples, or tuple of tuples or tuple of lists or list of lists etc. passed at the time of initialization (2nd & 3rd example above)?

  • How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not be always true. What’s a reliable way to use a counterexample to verify whether a data structure preserves order or not, short of trying test vectors repeatedly until one breaks?

P.S. I’ll just leave this here for reference: “The OrderedDict constructor and update() method both accept keyword arguments, but their order is lost because Python’s function call semantics pass-in keyword arguments using a regular unordered dictionary”

P.P.S.: Hopefully, in future, OrderedDict will preserve the order of kwargs also (example 1): http://bugs.python.org/issue16991


Answer 0

The OrderedDict will preserve any order that it has access to. The only way to pass ordered data to it to initialize is to pass a list (or, more generally, an iterable) of key-value pairs, as in your last two examples. As the documentation you linked to says, the OrderedDict does not have access to any order when you pass in keyword arguments or a dict argument, since any order there is removed before the OrderedDict constructor sees it. (This describes the Python versions of the time: keyword-argument order has been preserved since Python 3.6 (PEP 468), and plain dicts are guaranteed to keep insertion order since Python 3.7.)

Note that using a list comprehension in your last example doesn’t change anything. There’s no difference between OrderedDict([(i,i) for i in l]) and OrderedDict([('b', 'b'), ('a', 'a'), ('c', 'c'), ('aa', 'aa')]). The list comprehension is evaluated and creates the list and it is passed in; OrderedDict knows nothing about how it was created.
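Both points can be checked directly. The sketch below builds the OrderedDict once from the comprehension and once from the literal list, then relies on the fact that comparing two OrderedDicts is order-sensitive (unlike comparing plain dicts), so a reordering is detectable:

```python
from collections import OrderedDict

l = ['b', 'a', 'c', 'aa']
from_comprehension = OrderedDict([(i, i) for i in l])
from_literal = OrderedDict([('b', 'b'), ('a', 'a'), ('c', 'c'), ('aa', 'aa')])

# Same pairs in the same order: equal, and iteration follows the input list.
print(from_comprehension == from_literal)   # True
print(list(from_comprehension))             # ['b', 'a', 'c', 'aa']

# Same pairs in a different order: OrderedDict comparison reports "not equal",
# while comparing them as plain dicts ignores the order.
reordered = OrderedDict(sorted(from_literal.items()))
print(reordered == from_literal)              # False
print(dict(reordered) == dict(from_literal))  # True
```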


Answer 1

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b', 2), ('a', 1)])

Yes, that will work. By definition, a list always keeps its elements in the order in which they appear. This goes for list comprehensions too: the generated list is in the same order as the data it was built from (i.e. sourced from a list, it will be deterministic; sourced from a set or dict, not so much).

How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not always be true. That is, what’s a reliable way to use a counterexample to verify whether a data structure preserves order, short of trying test vectors repeatedly until one breaks?

Keep your source list of 2-tuples around for reference, and use it as the test data for your test cases when you write unit tests. Iterate through them and check that the order is maintained.
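A minimal sketch of such a unit test with unittest. To make a lucky match practically impossible, it uses many keys in a shuffled order; the key count (1000) and the seed (42) are arbitrary choices, not anything required.

```python
import random
import unittest
from collections import OrderedDict


class TestOrderedDictInit(unittest.TestCase):
    def test_preserves_order_of_pairs(self):
        # Many keys in a shuffled order: an order-losing container is very
        # unlikely to reproduce this order by accident.
        keys = ["key%d" % n for n in range(1000)]
        random.Random(42).shuffle(keys)  # fixed seed keeps the test reproducible
        self.assertNotEqual(keys, sorted(keys))  # guard against a vacuous test

        source = [(k, i) for i, k in enumerate(keys)]
        od = OrderedDict(source)

        # Iterating the OrderedDict must give back exactly the source order.
        self.assertEqual(list(od.items()), source)


# Run with: python -m unittest <this_file>.py
```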


Python Dictionary to URL Parameters

Question: Python Dictionary to URL Parameters

I am trying to convert a Python dictionary to a string for use as URL parameters. I am sure that there is a better, more Pythonic way of doing this. What is it?

x = ""
for key, val in {'a':'A', 'b':'B'}.items():
    x += "%s=%s&" %(key,val)
x = x[:-1]

Answer 0

Use urllib.urlencode() (Python 2). It takes a dictionary of key-value pairs and converts it into a form suitable for a URL (e.g., key1=val1&key2=val2).

If you are using Python 3, use urllib.parse.urlencode() instead.

If you want to make a URL with repeated params such as p=1&p=2&p=3, you have two options:

>>> import urllib
>>> a = (('p',1),('p',2), ('p', 3))
>>> urllib.urlencode(a)
'p=1&p=2&p=3'

Or, pass a dict with a list value and set doseq=True:

>>> urllib.urlencode({'p': [1, 2, 3]}, doseq=True)
'p=1&p=2&p=3'
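For reference, here are the same calls in Python 3, where urlencode lives in urllib.parse. The sketch also shows that urlencode percent-encodes keys and values (with quote_plus by default, so a space becomes '+'), which the hand-written loop in the question does not do:

```python
from urllib.parse import urlencode

# A plain dict: key=value pairs joined with '&'.
print(urlencode({'a': 'A', 'b': 'B'}))              # a=A&b=B

# Repeated parameters, option 1: a sequence of (key, value) pairs.
print(urlencode([('p', 1), ('p', 2), ('p', 3)]))    # p=1&p=2&p=3

# Repeated parameters, option 2: a list value together with doseq=True.
print(urlencode({'p': [1, 2, 3]}, doseq=True))      # p=1&p=2&p=3

# Special characters are escaped.
print(urlencode({'q': 'a b&c'}))                    # q=a+b%26c
```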

Answer 1

Use the third-party Python URL manipulation library furl:

import furl

f = furl.furl('')
f.args = {'a':'A', 'b':'B'}
print(f.url) # prints ... '?a=A&b=B'

If you want repetitive parameters, you can do the following:

f = furl.furl('')
f.args = [('a', 'A'), ('b', 'B'),('b', 'B2')]
print(f.url) # prints ... '?a=A&b=B&b=B2'

Answer 2

This seems a bit more Pythonic to me, and doesn’t use any other modules:

x = '&'.join(["{}={}".format(k, v) for k, v in {'a':'A', 'b':'B'}.items()])
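One caveat with the plain join: it does no escaping, so a value containing '&', '=' or a space produces a malformed query string. A small comparison, with made-up example values, escaping each part with urllib.parse.quote_plus:

```python
from urllib.parse import quote_plus, urlencode

params = {'q': 'fish & chips', 'page': '2'}

# Plain join: the '&' inside the value splits it into a bogus extra parameter.
naive = '&'.join("{}={}".format(k, v) for k, v in params.items())
print(naive)    # q=fish & chips&page=2

# Escaping each key and value first fixes it...
escaped = '&'.join("{}={}".format(quote_plus(k), quote_plus(v)) for k, v in params.items())
print(escaped)  # q=fish+%26+chips&page=2

# ...and gives the same result as urlencode.
print(escaped == urlencode(params))  # True
```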

Check if value already exists within list of dictionaries?

Question: Check if value already exists within list of dictionaries?

I’ve got a Python list of dictionaries, as follows:

a = [
    {'main_color': 'red', 'second_color':'blue'},
    {'main_color': 'yellow', 'second_color':'green'},
    {'main_color': 'yellow', 'second_color':'blue'},
]

I’d like to check whether a dictionary with a particular key/value already exists in the list, as follows:

// is a dict with 'main_color'='red' in the list already?
// if not: add item

Answer 0

Here’s one way to do it:

if not any(d['main_color'] == 'red' for d in a):
    # does not exist
    pass

The part in parentheses is a generator expression that yields True for each dictionary that has the key-value pair you are looking for, otherwise False; any() returns True as soon as it sees the first True.


If the key could also be missing, the above code can give you a KeyError. You can fix this by using get and providing a default value. If you don’t provide a default value, None is returned.

if not any(d.get('main_color', default_value) == 'red' for d in a):
    # does not exist (default_value stands for whatever a missing key should compare as)
    pass
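Combining the check with the "if not: add item" part of the question, a minimal sketch might look like this. add_if_missing is a hypothetical helper name, and the added item's second_color is a made-up value:

```python
a = [
    {'main_color': 'red', 'second_color': 'blue'},
    {'main_color': 'yellow', 'second_color': 'green'},
    {'main_color': 'yellow', 'second_color': 'blue'},
]


def add_if_missing(dicts, key, value, new_item):
    """Append new_item unless some dict in dicts already has d[key] == value."""
    # .get() returns None for dicts that lack the key instead of raising KeyError.
    if not any(d.get(key) == value for d in dicts):
        dicts.append(new_item)
        return True
    return False


print(add_if_missing(a, 'main_color', 'red', {'main_color': 'red'}))  # False: already present
print(add_if_missing(a, 'main_color', 'green',
                     {'main_color': 'green', 'second_color': 'black'}))  # True
print(len(a))  # 4
```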

Answer 1

Maybe this helps:

a = [{ 'main_color': 'red', 'second_color':'blue'},
     { 'main_color': 'yellow', 'second_color':'green'},
     { 'main_color': 'yellow', 'second_color':'blue'}]

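# Tuple parameters like (key, value) in a signature are Python 2 only (removed by PEP 3113),
# so the pair is unpacked inside the function instead.
def in_dictlist(key_value, my_dictlist):
    key, value = key_value
    for this in my_dictlist:
        if this[key] == value:
            return this
    return {}

print(in_dictlist(('main_color','red'), a))
print(in_dictlist(('main_color','pink'), a))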
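The same lookup can also be written with next() over a generator expression, which stops at the first match and falls back to a default when there is none. In this sketch the key and value are separate parameters (a small change from the answer's signature), and .get() lets dicts without the key be skipped instead of raising KeyError:

```python
a = [{'main_color': 'red', 'second_color': 'blue'},
     {'main_color': 'yellow', 'second_color': 'green'},
     {'main_color': 'yellow', 'second_color': 'blue'}]


def in_dictlist(key, value, my_dictlist):
    # next() returns the first matching dict, or {} if the generator is exhausted.
    return next((d for d in my_dictlist if d.get(key) == value), {})


print(in_dictlist('main_color', 'red', a))   # {'main_color': 'red', 'second_color': 'blue'}
print(in_dictlist('main_color', 'pink', a))  # {}
```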

Answer 2

Perhaps a function along these lines is what you’re after:

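def add_unique_to_dict_list(dict_list, key, value):
    # If any dict already has this key, return its existing value unchanged.
    for d in dict_list:
        if key in d:
            return d[key]

    # Otherwise append a new one-entry dict and return the new value.
    dict_list.append({ key: value })
    return value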

Answer 3

Based on @Mark Byers' great answer, and following @Florent's question, just to show that it also works with two conditions on a list of dicts with more than two keys:

names = []
names.append({'first': 'Nil', 'last': 'Elliot', 'suffix': 'III'})
names.append({'first': 'Max', 'last': 'Sam', 'suffix': 'IX'})
names.append({'first': 'Anthony', 'last': 'Mark', 'suffix': 'IX'})

if not any(d['first'] == 'Anthony' and d['last'] == 'Mark' for d in names):

    print('Not exists!')
else:
    print('Exists!')

Result:

Exists!
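To generalize to any number of conditions, the criteria can be passed as a dict and checked with all() inside any(). matches_any is a hypothetical helper name; the sentinel default is one way (an assumption about the desired semantics) to make sure a missing key never counts as a match, even when a criterion value is None:

```python
names = [
    {'first': 'Nil', 'last': 'Elliot', 'suffix': 'III'},
    {'first': 'Max', 'last': 'Sam', 'suffix': 'IX'},
    {'first': 'Anthony', 'last': 'Mark', 'suffix': 'IX'},
]


def matches_any(dicts, criteria):
    """True if some dict agrees with every key/value pair in criteria."""
    missing = object()  # sentinel: never equal to any real value
    return any(all(d.get(k, missing) == v for k, v in criteria.items()) for d in dicts)


print(matches_any(names, {'first': 'Anthony', 'last': 'Mark'}))  # True
print(matches_any(names, {'first': 'Max', 'suffix': 'III'}))     # False
```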

Loop through all nested dictionary values?

Question: Loop through all nested dictionary values?

for k, v in d.iteritems():
    if type(v) is dict:
        for t, c in v.iteritems():
            print "{0} : {1}".format(t, c)

I’m trying to loop through a dictionary and print out all key value pairs where the value is not a nested dictionary. If the value is a dictionary I want to go into it and print out its key value pairs…etc. Any help?

EDIT

How about this? It still only prints one thing.

def printDict(d):
    for k, v in d.iteritems():
        if type(v) is dict:
            printDict(v)
        else:
            print "{0} : {1}".format(k, v)

Full Test Case

Dictionary:

{u'xml': {u'config': {u'portstatus': {u'status': u'good'}, u'target': u'1'},
      u'port': u'11'}}

Result:

xml : {u'config': {u'portstatus': {u'status': u'good'}, u'target': u'1'}, u'port': u'11'}

Answer 0

As said by Niklas, you need recursion, i.e. you want to define a function to print your dict, and if the value is a dict, you want to call your print function using this new dict.

Something like:

def myprint(d):
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))

Answer 1

There are potential problems if you write your own recursive implementation, or the iterative equivalent with a stack. See this example:

    dic = {}
    dic["key1"] = {}
    dic["key1"]["key1.1"] = "value1"
    dic["key2"]  = {}
    dic["key2"]["key2.1"] = "value2"
    dic["key2"]["key2.2"] = dic["key1"]
    dic["key2"]["key2.3"] = dic

Normally, a nested dictionary is an n-ary tree-like data structure. But nothing in the definition rules out a cross edge or even a back edge (in which case it is no longer a tree). For instance, here key2.2 holds the same dictionary as key1, and key2.3 points to the entire dictionary (a back edge, i.e. a cycle). When there is a back edge (cycle), the stack/recursion never finishes.

                          root<-------back edge
                        /      \           |
                     _key1   __key2__      |
                    /       /   \    \     |
               |->key1.1 key2.1 key2.2 key2.3
               |   /       |      |
               | value1  value2   |
               |                  | 
              cross edge----------|

If you print this dictionary with this implementation from Scharron:

    def myprint(d):
      for k, v in d.items():
        if isinstance(v, dict):
          myprint(v)
        else:
          print "{0} : {1}".format(k, v)

You would see this error:

    RuntimeError: maximum recursion depth exceeded while calling a Python object

The same goes for the implementation from senderle.

Similarly, you get an infinite loop with this implementation from Fred Foo:

    def myprint(d):
        stack = list(d.items())
        while stack:
            k, v = stack.pop()
            if isinstance(v, dict):
                stack.extend(v.items())
            else:
                print("%s: %s" % (k, v))

However, Python's own repr() does detect cycles in a nested dictionary:

    print dic
    {'key2': {'key2.1': 'value2', 'key2.3': {...}, 
       'key2.2': {'key1.1': 'value1'}}, 'key1': {'key1.1': 'value1'}}

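“{...}” marks where repr() detected the cycle (the back edge through key2.3). The cross edge key2.2 is printed in full, because reaching the same dict twice is not a cycle.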

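As requested by Moondra, here is a way to avoid cycles (DFS). It remembers the id() of every dict it has already expanded, so each dict is visited only once:

def myprint(d):
    stack = list(d.items())
    visited = {id(d)}  # ids of dicts already expanded (the root included)
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            if id(v) not in visited:
                visited.add(id(v))
                stack.extend(v.items())
        else:
            # only non-dict values are printed
            print("%s: %s" % (k, v))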
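To check cycle handling against the example dic from this answer, here is a sketch of a traversal written as a generator (iter_leaves is a hypothetical name, not code from any of the answers). It tracks visited dicts by id() rather than by key, since different dicts can share key names, so the back edge (key2.3) and the cross edge (key2.2) are each expanded at most once:

```python
def iter_leaves(d):
    """Yield (key, value) for every non-dict value, expanding each dict only once."""
    stack = list(d.items())
    seen = {id(d)}  # ids of dicts that have already been expanded
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            if id(v) not in seen:
                seen.add(id(v))
                stack.extend(v.items())
        else:
            yield k, v


dic = {}
dic["key1"] = {}
dic["key1"]["key1.1"] = "value1"
dic["key2"] = {}
dic["key2"]["key2.1"] = "value2"
dic["key2"]["key2.2"] = dic["key1"]   # cross edge
dic["key2"]["key2.3"] = dic           # back edge (cycle)

print(sorted(iter_leaves(dic)))  # [('key1.1', 'value1'), ('key2.1', 'value2')]
```

One consequence of this design: a sub-dict shared by several keys (like key1 and key2.2) is reported once, not once per path. If every path should be listed, track only the dicts on the current path instead.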

Answer 2

Since a dict is iterable, you can apply the classic nested container iterable formula to this problem with only a couple of minor changes. Here’s a Python 2 version (see below for 3):

import collections
def nested_dict_iter(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for inner_key, inner_value in nested_dict_iter(value):
                yield inner_key, inner_value
        else:
            yield key, value

Test:

list(nested_dict_iter({'a':{'b':{'c':1, 'd':2}, 
                            'e':{'f':3, 'g':4}}, 
                       'h':{'i':5, 'j':6}}))
# output: [('g', 4), ('f', 3), ('c', 1), ('d', 2), ('i', 5), ('j', 6)]

In Python 2, it might be possible to create a custom Mapping that qualifies as a Mapping but doesn’t contain iteritems, in which case this will fail. The docs don’t indicate that iteritems is required for a Mapping; on the other hand, the source gives Mapping types an iteritems method. So for custom Mappings, inherit from collections.Mapping explicitly just in case.

In Python 3, there are a number of improvements to be made. As of Python 3.3, abstract base classes live in collections.abc. They were also kept in collections for backwards compatibility until Python 3.10 removed those aliases, and in any case it’s nicer having our abstract base classes together in one namespace. So this imports abc from collections. Python 3.3 also adds yield from, which is designed for just these sorts of situations. This is not empty syntactic sugar; it may lead to faster code and more sensible interactions with coroutines.

from collections import abc
def nested_dict_iter(nested):
    for key, value in nested.items():
        if isinstance(value, abc.Mapping):
            yield from nested_dict_iter(value)
        else:
            yield key, value
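A quick check of the Python 3 version. Because it tests for abc.Mapping rather than dict, it also descends into mapping types that are not dicts, such as the read-only types.MappingProxyType. And since dicts keep insertion order as of Python 3.7, the output follows the literal's order rather than the hash order shown in the Python 2 test:

```python
from collections import abc
from types import MappingProxyType


def nested_dict_iter(nested):
    for key, value in nested.items():
        if isinstance(value, abc.Mapping):
            yield from nested_dict_iter(value)  # recurse into any Mapping, not just dict
        else:
            yield key, value


data = {'a': {'b': {'c': 1, 'd': 2},
              'e': {'f': 3, 'g': 4}},
        'h': {'i': 5, 'j': 6}}
print(list(nested_dict_iter(data)))
# [('c', 1), ('d', 2), ('f', 3), ('g', 4), ('i', 5), ('j', 6)]

# A read-only proxy is a Mapping but not a dict, and is still traversed.
print(list(nested_dict_iter({'x': MappingProxyType({'y': 1})})))  # [('y', 1)]
```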

Answer 3

Alternative iterative solution:

def myprint(d):
    stack = list(d.items())  # a real list, so it supports pop() and extend() in Python 3
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            stack.extend(v.items())
        else:
            print("%s: %s" % (k, v))

Answer 4

Here is a slightly different version I wrote that keeps track of the keys along the way:

def print_dict(v, prefix=''):
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            print_dict(v2, p2)
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            print_dict(v2, p2)
    else:
        print('{} = {}'.format(prefix, repr(v)))

On your data (passing 'data' as the prefix), it prints:

data['xml']['config']['portstatus']['status'] = u'good'
data['xml']['config']['target'] = u'1'
data['xml']['port'] = u'11'

It’s also easy to modify it to track the prefix as a tuple of keys rather than a string if you need it that way.
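One way that tuple-of-keys variant could look, written as a generator that yields (path, value) pairs instead of printing; list indices become path entries too. iter_paths is a hypothetical name, and the data uses plain str instead of the question's Python 2 unicode literals:

```python
def iter_paths(v, prefix=()):
    """Yield (path, leaf) pairs, where path is a tuple of dict keys / list indices."""
    if isinstance(v, dict):
        for k, v2 in v.items():
            yield from iter_paths(v2, prefix + (k,))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            yield from iter_paths(v2, prefix + (i,))
    else:
        yield prefix, v


data = {'xml': {'config': {'portstatus': {'status': 'good'}, 'target': '1'},
                'port': '11'}}
for path, value in iter_paths(data):
    print(path, '=', repr(value))
# ('xml', 'config', 'portstatus', 'status') = 'good'
# ('xml', 'config', 'target') = '1'
# ('xml', 'port') = '11'
```

Tuples are hashable, so these paths can also be used directly as keys of a flat dict, e.g. dict(iter_paths(data)).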


Answer 5

Here is a Pythonic way to do it. This function lets you loop through the key-value pairs at all levels, yielding each nested dict itself as well as its contents. It does not build the whole result in memory, but walks through the dict lazily as you loop over it:

def recursive_items(dictionary):
    for key, value in dictionary.items():
        if type(value) is dict:
            yield (key, value)
            yield from recursive_items(value)
        else:
            yield (key, value)

a = {'a': {1: {1: 2, 3: 4}, 2: {5: 6}}}

for key, value in recursive_items(a):
    print(key, value)

Prints

a {1: {1: 2, 3: 4}, 2: {5: 6}}
1 {1: 2, 3: 4}
1 2
3 4
2 {5: 6}
5 6

Answer 6

Iterative solution as an alternative:

def traverse_nested_dict(d):
    # A stack of iterators, one per dict currently being walked
    iters = [iter(d.items())]

    while iters:
        it = iters.pop()
        try:
            k, v = next(it)
        except StopIteration:
            continue  # this dict is exhausted; resume its parent

        iters.append(it)  # put the current iterator back before descending

        if isinstance(v, dict):
            iters.append(iter(v.items()))
        else:
            yield k, v


d = {"a": 1, "b": 2, "c": {"d": 3, "e": {"f": 4}}}
for k, v in traverse_nested_dict(d):
    print(k, v)

Answer 7

An alternative solution, based on Scharron’s, that also works with lists:

def myprint(d):
    # dicts give (key, value) pairs; lists give (index, item) pairs
    my_list = d.items() if isinstance(d, dict) else enumerate(d)

    for k, v in my_list:
        if isinstance(v, (dict, list)):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))

Answer 8

I am using the following code to print all the values of a nested dictionary, taking into account that a value could be a list containing dictionaries. This was useful to me when parsing a JSON file into a dictionary and needing to quickly check whether any of its values are None.

    d = {
            "user": 10,
            "time": "2017-03-15T14:02:49.301000",
            "metadata": [
                {"foo": "bar"},
                "some_string"
            ]
        }


    def print_nested(d):
        if isinstance(d, dict):
            for v in d.values():
                print_nested(v)
        # strings are iterable too, so exclude them to avoid printing character by character
        elif hasattr(d, '__iter__') and not isinstance(d, str):
            for item in d:
                print_nested(item)
        else:
            print(d)

    print_nested(d)

Output:

    10
    2017-03-15T14:02:49.301000
    bar
    some_string
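For the use case mentioned above (checking whether any value is None), it can help to yield the leaf values instead of printing them, so that any() can stop early. A sketch along the same lines; iter_values is a hypothetical name, and only dicts, lists and tuples are descended into, so strings stay whole:

```python
def iter_values(d):
    """Yield every leaf value of nested dicts/lists/tuples."""
    if isinstance(d, dict):
        for v in d.values():
            yield from iter_values(v)
    elif isinstance(d, (list, tuple)):
        for item in d:
            yield from iter_values(item)
    else:
        yield d


data = {
    "user": 10,
    "time": "2017-03-15T14:02:49.301000",
    "metadata": [{"foo": "bar"}, "some_string"],
}
print(list(iter_values(data)))  # [10, '2017-03-15T14:02:49.301000', 'bar', 'some_string']
print(any(v is None for v in iter_values(data)))  # False
```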

Answer 9

Here’s a modified version of Fred Foo’s answer for Python 2. In the original response, only the deepest level of nesting is output. If you output the keys as lists, you can keep the keys for all levels, although to reference them you need to reference a list of lists.

Here’s the function:

import collections

def NestIter(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for inner_key, inner_value in NestIter(value):
                yield [key, inner_key], inner_value
        else:
            yield [key],value

To reference the keys of a three-level dictionary:

for keys, vals in NestIter(mynested):
    print(mynested[keys[0]][keys[1][0]][keys[1][1][0]])

You need to know the number of levels in advance to access the keys this way, and the number of levels must be constant (it may be possible to add a small bit of code to check the number of nesting levels while iterating through the values, but I haven’t looked at this yet).
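If the keys are kept as a flat tuple (a path) rather than nested lists, a value can be looked up at any depth without knowing the number of levels in advance, by folding operator.getitem over the path with functools.reduce. A small sketch; get_by_path is a hypothetical helper:

```python
from functools import reduce
from operator import getitem


def get_by_path(nested, path):
    """Follow a sequence of keys: path (k1, k2, k3) -> nested[k1][k2][k3]."""
    return reduce(getitem, path, nested)


data = {'xml': {'config': {'portstatus': {'status': 'good'}, 'target': '1'},
                'port': '11'}}

# Paths of different lengths work with the same function.
print(get_by_path(data, ('xml', 'config', 'portstatus', 'status')))  # good
print(get_by_path(data, ('xml', 'port')))                            # 11
```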


Answer 10

I find this approach a bit more flexible: you just provide a generator function that emits (key, value) pairs, and it can easily be extended to also iterate over lists.

def traverse(value, key=None):
    if isinstance(value, dict):
        for k, v in value.items():
            yield from traverse(v, k)
    else:
        yield key, value

Then you can write your own myprint function that prints those key-value pairs:

def myprint(d):
    for k, v in traverse(d):
        print(f"{k} : {v}")

A test:

myprint({
    'xml': {
        'config': {
            'portstatus': {
                'status': 'good',
            },
            'target': '1',
        },
        'port': '11',
    },
})

Output:

status : good
target : 1
port : 11

I tested this on Python 3.6.
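As mentioned, the generator extends easily to lists. One possible extension, assuming you want a list item's key to be its index in the list (other choices, such as reusing the parent key, are equally valid):

```python
def traverse(value, key=None):
    if isinstance(value, dict):
        for k, v in value.items():
            yield from traverse(v, k)
    elif isinstance(value, list):
        # Treat each list item like a value whose key is its position.
        for i, v in enumerate(value):
            yield from traverse(v, i)
    else:
        yield key, value


def myprint(d):
    for k, v in traverse(d):
        print(f"{k} : {v}")


myprint({'a': [10, {'b': 2}], 'c': 3})
# 0 : 10
# b : 2
# c : 3
```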


Answer 11

Some of these answers only work for two levels of sub-dictionaries. For deeper nesting, try this:

nested_dict = {'dictA': {'key_1': 'value_1', 'key_1A': 'value_1A','key_1Asub1': {'Asub1': 'Asub1_val', 'sub_subA1': {'sub_subA1_key':'sub_subA1_val'}}},
                'dictB': {'key_2': 'value_2'},
                1: {'key_3': 'value_3', 'key_3A': 'value_3A'}}

def print_dict(dictionary):
    dictionary_array = [dictionary]
    for sub_dictionary in dictionary_array:
        if type(sub_dictionary) is dict:
            for key, value in sub_dictionary.items():
                print("key=", key)
                print("value", value)
                if type(value) is dict:
                    dictionary_array.append(value)



print_dict(nested_dict)
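The loop above works because a Python list iterator also visits items appended during iteration, which turns the list into a breadth-first queue. The same walk can be written more explicitly with collections.deque. This sketch yields (key, value) pairs instead of printing them; iter_bfs and the smaller sample dict are hypothetical:

```python
from collections import deque


def iter_bfs(dictionary):
    """Yield (key, value) for every entry, level by level (breadth-first)."""
    queue = deque([dictionary])
    while queue:
        current = queue.popleft()
        for key, value in current.items():
            yield key, value
            if isinstance(value, dict):
                queue.append(value)  # visit this sub-dictionary after the current level


nested_dict = {'dictA': {'key_1': 'value_1', 'key_1Asub1': {'Asub1': 'Asub1_val'}},
               'dictB': {'key_2': 'value_2'}}
print([key for key, _ in iter_bfs(nested_dict)])
# ['dictA', 'dictB', 'key_1', 'key_1Asub1', 'key_2', 'Asub1']
```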