Python 实用宝典

Question 1

This is quite n00bish, but I’m trying to learn/understand functional programming in python. The following code:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo, bar):
    print foo, bar

map(maptest, foos, bars)

produces:

1.0 1
2.0 2
3.0 3
4.0 None
5.0 None

Q. Is there a way to use map or any other functional tools in python to produce the following without loops etc.

1.0 [1,2,3]
2.0 [1,2,3]
3.0 [1,2,3]
4.0 [1,2,3]
5.0 [1,2,3]

Just as a side note how would the implementation change if there is a dependency between foo and bar. e.g.

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3,4,5]

and print:

1.0 [2,3,4,5]
2.0 [1,3,4,5]
3.0 [1,2,4,5]
...

P.S: I know how to do it naively using if, loops and/or generators, but I’d like to learn how to achieve the same using functional tools. Is it just a case of adding an if statement to maptest or apply another filter map to bars internally within maptest?

Question 2

The easiest way would be not to pass bars through the different functions, but to access it directly from maptest:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo):
    print foo, bars

map(maptest, foos)

With your original maptest function you could also use a lambda function in map:

map((lambda foo: maptest(foo, bars)), foos)

Question 3

Are you familiar with other functional languages? i.e. are you trying to learn how python does functional programming, or are you trying to learn about functional programming and using python as the vehicle?

Also, do you understand list comprehensions?

map(f, sequence)

is directly equivalent (*) to:

[f(x) for x in sequence]

In fact, I think map() was once slated for removal from python 3.0 as being redundant (that didn’t happen).

map(f, sequence1, sequence2)

is mostly equivalent to:

[f(x1, x2) for x1, x2 in zip(sequence1, sequence2)]

(there is a difference in how it handles the case where the sequences are of different length. As you saw, map() fills in None when one of the sequences runs out, whereas zip() stops when the shortest sequence stops)

So, to address your specific question, you’re trying to produce the result:

foos[0], bars
foos[1], bars
foos[2], bars
# etc.

You could do this by writing a function that takes a single argument and prints it, followed by bars:

def maptest(x):
     print x, bars
map(maptest, foos)

Alternatively, you could create a list that looks like this:

[bars, bars, bars, ] # etc.

and use your original maptest:

def maptest(x, y):
    print x, y

One way to do this would be to explicitely build the list beforehand:

barses = [bars] * len(foos)
map(maptest, foos, barses)

Alternatively, you could pull in the itertools module. itertools contains many clever functions that help you do functional-style lazy-evaluation programming in python. In this case, we want itertools.repeat, which will output its argument indefinitely as you iterate over it. This last fact means that if you do:

map(maptest, foos, itertools.repeat(bars))

you will get endless output, since map() keeps going as long as one of the arguments is still producing output. However, itertools.imap is just like map(), but stops as soon as the shortest iterable stops.

itertools.imap(maptest, foos, itertools.repeat(bars))

Hope this helps :-)

(*) It’s a little different in python 3.0. There, map() essentially returns a generator expression.

Question 4

Here’s the solution you’re looking for:

>>> foos = [1.0, 2.0, 3.0, 4.0, 5.0]
>>> bars = [1, 2, 3]
>>> [(x, bars) for x in foos]
[(1.0, [1, 2, 3]), (2.0, [1, 2, 3]), (3.0, [1, 2, 3]), (4.0, [1, 2, 3]), (5.0, [
1, 2, 3])]

I’d recommend using a list comprehension (the [(x, bars) for x in foos] part) over using map as it avoids the overhead of a function call on every iteration (which can be very significant). If you’re just going to use it in a for loop, you’ll get better speeds by using a generator comprehension:

>>> y = ((x, bars) for x in foos)
>>> for z in y:
...     print z
...
(1.0, [1, 2, 3])
(2.0, [1, 2, 3])
(3.0, [1, 2, 3])
(4.0, [1, 2, 3])
(5.0, [1, 2, 3])

The difference is that the generator comprehension is lazily loaded.

UPDATE In response to this comment:

Of course you know, that you don’t copy bars, all entries are the same bars list. So if you modify any one of them (including original bars), you modify all of them.

I suppose this is a valid point. There are two solutions to this that I can think of. The most efficient is probably something like this:

tbars = tuple(bars)
[(x, tbars) for x in foos]

Since tuples are immutable, this will prevent bars from being modified through the results of this list comprehension (or generator comprehension if you go that route). If you really need to modify each and every one of the results, you can do this:

from copy import copy
[(x, copy(bars)) for x in foos]

However, this can be a bit expensive both in terms of memory usage and in speed, so I’d recommend against it unless you really need to add to each one of them.

Question 5

Functional programming is about creating side-effect-free code.

map is a functional list transformation abstraction. You use it to take a sequence of something and turn it into a sequence of something else.

You are trying to use it as an iterator. Don’t do that. :)

Here is an example of how you might use map to build the list you want. There are shorter solutions (I’d just use comprehensions), but this will help you understand what map does a bit better:

def my_transform_function(input):
    return [input, [1, 2, 3]]

new_list = map(my_transform, input_list)

Notice at this point, you’ve only done a data manipulation. Now you can print it:

for n,l in new_list:
    print n, ll

— I’m not sure what you mean by ‘without loops.’ fp isn’t about avoiding loops (you can’t examine every item in a list without visiting each one). It’s about avoiding side-effects, thus writing fewer bugs.

Question 6

>>> from itertools import repeat
>>> for foo, bars in zip(foos, repeat(bars)):
...     print foo, bars
... 
1.0 [1, 2, 3]
2.0 [1, 2, 3]
3.0 [1, 2, 3]
4.0 [1, 2, 3]
5.0 [1, 2, 3]

Question 7

import itertools

foos=[1.0, 2.0, 3.0, 4.0, 5.0]
bars=[1, 2, 3]

print zip(foos, itertools.cycle([bars]))

Question 8

Here’s an overview of the parameters to the map(function, *sequences) function:

function is the name of your function.
sequences is any number of sequences, which are usually lists or tuples. map will iterate over them simultaneously and give the current values to function. That’s why the number of sequences should equal the number of parameters to your function.

It sounds like you’re trying to iterate for some of function‘s parameters but keep others constant, and unfortunately map doesn’t support that. I found an old proposal to add such a feature to Python, but the map construct is so clean and well-established that I doubt something like that will ever be implemented.

Use a workaround like global variables or list comprehensions, as others have suggested.

Question 9

Would this do it?

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest2(bar):
  print bar

def maptest(foo):
  print foo
  map(maptest2, bars)

map(maptest, foos)

Question 10

How about this:

foos = [1.0,2.0,3.0,4.0,5.0]
bars = [1,2,3]

def maptest(foo, bar):
    print foo, bar

map(maptest, foos, [bars]*len(foos))

Question 11

I know how to remove an entry, 'key' from my dictionary d, safely. You do:

if d.has_key('key'):
    del d['key']

However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.

entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
    if x in d:
        del d[x]

However, I was wondering if there is a smarter way to do this?

Question 12

Why not like this:

entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}

def entries_to_remove(entries, the_dict):
    for key in entries:
        if key in the_dict:
            del the_dict[key]

A more compact version was provided by mattbornski using dict.pop()

Question 13

Using dict.pop:

d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
    d.pop(k, None)

Question 14

Using Dict Comprehensions

final_dict = {key: t[key] for key in t if key not in [key1, key2]}

where key1 and key2 are to be removed.

In the example below, keys “b” and “c” are to be removed & it’s kept in a keys list.

>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>>

Question 15

a solution is using map and filter functions

python 2

d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)

python 3

d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)

you get:

{'c': 3}

Question 16

If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:

values_removed = [d.pop(k, None) for k in entities_to_remove]

You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function’s side effect.

Question 17

Found a solution with pop and map

d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)

The output of this:

{'d': 'valueD'}

I have answered this question so late just because I think it will help in the future if anyone searches the same. And this might help.

Update

The above code will throw an error if a key does not exist in the dict.

DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']

def remove_keys(key):
    try:
        DICTIONARY.pop(key, None)
    except:
        pass  # or do any action

list(map(remove_key, keys))
print(DICTIONARY)

output:

DICTIONARY = {'b': 'valueB', 'd': 'valueD'}

Question 18

I have no problem with any of the existing answers, but I was surprised to not find this solution:

keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}

for k in keys_to_remove:
    try:
        del my_dict[k]
    except KeyError:
        pass

assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}

Note: I stumbled across this question coming from here. And my answer is related to this answer.

Question 19

Why not:

entriestoremove = (2,5,1)
for e in entriestoremove:
    if d.has_key(e):
        del d[e]

I don’t know what you mean by “smarter way”. Surely there are other ways, maybe with dictionary comprehensions:

entriestoremove = (2,5,1)
newdict = {x for x in d if x not in entriestoremove}

Question 20

inline

import functools

#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}

entitiesToREmove = ('a', 'b', 'c')

#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)

#: python3

list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))

print(d)
# output: {'d': 'dvalue'}

Question 21

Some timing tests for cpython 3 shows that a simple for loop is the fastest way, and it’s quite readable. Adding in a function doesn’t cause much overhead either:

timeit results (10k iterations):

all(x.pop(v) for v in r) # 0.85
all(map(x.pop, r)) # 0.60
list(map(x.pop, r)) # 0.70
all(map(x.__delitem__, r)) # 0.44
del_all(x, r) # 0.40
<inline for loop>(x, r) # 0.35

def del_all(mapping, to_remove):
      """Remove list of elements from mapping."""
      for key in to_remove:
          del mapping[key]

For small iterations, doing that ‘inline’ was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.

Question 22

I think using the fact that the keys can be treated as a set is the nicest way if you’re on python 3:

def remove_keys(d, keys):
    to_remove = set(keys)
    filtered_keys = d.keys() - to_remove
    filtered_values = map(d.get, filtered_keys)
    return dict(zip(filtered_keys, filtered_values))

Example:

>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}

Question 23

It would be nice to have full support for set methods for dictionaries (and not the unholy mess we’re getting with Python 3.9) so that you could simply “remove” a set of keys. However, as long as that’s not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I’ve created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so 10,000,00 items in total.

from itertools import product
from time import perf_counter

# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))

print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000

keys = product(range(50000, 100000), range(1, 100))

# for x,y in keys:
#     del cells[x, y]

for n in map(cells.pop, keys):
    pass

print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")

10 million items or more is not unusual in some settings. Comparing the two methods on my local machine I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. But this pales in comparison to the time required to create the dictionary in the first place (55s), or including checks within the loop. If this is likely then its best to create a set that is a intersection of the dictionary keys and your filter:

keys = cells.keys() & keys

In summary: del is already heavily optimised, so don’t worry about using it.

Question 24

I’m late to this discussion but for anyone else. A solution may be to create a list of keys as such.

k = ['a','b','c','d']

Then use pop() in a list comprehension, or for loop, to iterate over the keys and pop one at a time as such.

new_dictionary = [dictionary.pop(x, 'n/a') for x in k]

The ‘n/a’ is in case the key does not exist, a default value needs to be returned.

Question 25

What is more efficient in Python in terms of memory usage and CPU consumption – Dictionary or Object?

Background: I have to load huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. After dictionary is ready, accessing it is a blink of an eye.

Example: To check the performance I wrote two simple programs that do the same – one is using objects, other dictionary:

Object (execution time ~18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

Dictionary (execution time ~12sec):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

Question: Am I doing something wrong or dictionary is just faster than object? If indeed dictionary performs better, can somebody explain why?

Question 26

Have you tried using __slots__?

From the documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

Run benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.

Question 27

Attribute access in an object uses dictionary access behind the scenes – so by using attribute access you are adding extra overhead. Plus in the object case, you are incurring additional overhead because of e.g. additional memory allocations and code execution (e.g. of the __init__ method).

In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.

Question 28

Have you considered using a namedtuple? (link for python 2.4/2.5)

It’s the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

It’s only downside compared with dictionaries is that (like tuples) it doesn’t give you the ability to change attributes after creation.

Question 29

Here is a copy of @hughdbrown answer for python 3.6.1, I’ve made the count 5x larger and added some code to test the memory footprint of the python process at the end of each run.

Before the downvoters have at it, Be advised that this method of counting the size of objects is not accurate.

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

And these are my results

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

My conclusion is:

Slots have the best memory footprint and are reasonable on speed.
dicts are the fastest, but use the most memory.

Question 30

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

Results:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749

Question 31

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and the maintaining the correct abstraction is very often one of the most useful way to achieve such result.

About the reasons an object is slower, I think your measurement is not correct.
You are performing too little assignments inside the for loop, and therefore what you see there is the different time necessary to instantiate a dict (intrinsic object) and a “custom” object. Although from the language perspective they are the same, they have quite a different implementation.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.

Question 32

There is yet another way to reduce memory usage if data structure isn’t supposed to contain reference cycles.

Let’s compare two classes:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

and

$ pip install recordclass

>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64 40

It became possible since structclass-based classes doesn’t support cyclic garbage collection, which is not needed in such cases.

There is also one advantage over __slots__-based class: you are able to add extra attributes:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
>>> print(sizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
48 has dict: True

Question 33

Here are my test runs of the very nice script of @Jarrod-Chesney. For comparison, I also run it against python2 with “range” replaced by “xrange”.

By curiosity, I also added similar tests with OrderedDict (ordict) for comparison.

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

So, on both major versions, the conclusions of @Jarrod-Chesney are still looking good.

Question 34

So I’ve spent way to much time on this, and it seems to me like it should be a simple fix. I’m trying to use Facebook’s Authentication to register users on my site, and I’m trying to do it server side. I’ve gotten to the point where I get my access token, and when I go to:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

I get the information I’m looking for as a string that’s like this:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

It seems like I should just be able to use dict(string) on this but I’m getting this error:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

So I tried using Pickle, but got this error:

KeyError: '{'

I tried using django.serializers to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I’m just being stupid. Thanks for any help!

Question 35

This data is JSON! You can deserialize it using the built-in json module if you’re on Python 2.6+, otherwise you can use the excellent third-party simplejson module.

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

Question 36

Use ast.literal_eval to evaluate Python literals. However, what you have is JSON (note “true” for example), so use a JSON deserializer.

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}

Question 37

I have a list that looks like:

[('A', 1), ('B', 2), ('C', 3)]

I want to turn it into a dictionary that looks like:

{'A': 1, 'B': 2, 'C': 3}

What’s the best way to go about this?

EDIT: My list of tuples is actually more like:

[(A, 12937012397), (BERA, 2034927830), (CE, 2349057340)]

Question 38

This gives me the same error as trying to split the list up and zip it. ValueError: dictionary update sequence element #0 has length 1916; 2 is required

THAT is your actual question.

The answer is that the elements of your list are not what you think they are. If you type myList[0] you will find that the first element of your list is not a two-tuple, e.g. ('A', 1), but rather a 1916-length iterable.

Once you actually have a list in the form you stated in your original question (myList = [('A',1),('B',2),...]), all you need to do is dict(myList).

Question 39

>>> dict([('A', 1), ('B', 2), ('C', 3)])
{'A': 1, 'C': 3, 'B': 2}

Question 40

Have you tried this?

>>> l=[('A',1), ('B',2), ('C',3)]
>>> d=dict(l)
>>> d
{'A': 1, 'C': 3, 'B': 2}

Question 41

Here is a way to handle duplicate tuple “keys”:

# An example
l = [('A', 1), ('B', 2), ('C', 3), ('A', 5), ('D', 0), ('D', 9)]

# A solution
d = dict()
[d [t [0]].append(t [1]) if t [0] in list(d.keys()) 
 else d.update({t [0]: [t [1]]}) for t in l]
d

OUTPUT: {'A': [1, 5], 'B': [2], 'C': [3], 'D': [0, 9]}

Question 42

Another way using dictionary comprehensions,

>>> t = [('A', 1), ('B', 2), ('C', 3)]
>>> d = { i:j for i,j in t }
>>> d
{'A': 1, 'B': 2, 'C': 3}

Question 43

If Tuple has no key repetitions, it’s Simple.

tup = [("A",0),("B",3),("C",5)]
dic = dict(tup)
print(dic)

If tuple has key repetitions.

tup = [("A",0),("B",3),("C",5),("A",9),("B",4)]
dic = {}
for i, j in tup:
    dic.setdefault(i,[]).append(j)
print(dic)

Question 44

l=[['A', 1], ['B', 2], ['C', 3]]
d={}
for i,j in l:
d.setdefault(i,j)
print(d)

Question 45

I have a program that reads an xml document from a socket. I have the xml document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django’s simplejson library.

Take as an example:

str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person"
dic_xml = convert_to_dic(str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }

Question 46

This is a great module that someone created. I’ve used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

//Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

Question 47

xmltodict (full disclosure: I wrote it) does exactly that:

xmltodict.parse("""
<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

Question 48

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON “specification”. It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked “specification”) should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)

Update

If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(d))

Question 49

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid – meaning the results are the same regardless of the existence of attributes.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

Question 50

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don’t handle). In general, though, the converters are easy to use. If an ‘example.xml’ contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expect XML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python do indentical conversion, but the C++ is about 60x faster

Question 51

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

See the lxml docs here. I hope this helps!

Question 52

Disclaimer: This modified XML parser was inspired by Adam Clark The original XML parser works for most of simple cases. However, it didn’t work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find some bugs, please let me know. I am glad to fix it.

class XmlDictConfig(dict):  
    '''   
    Note: need to add a root into if no exising    
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specialy for attribtes
                elementattrib= element.items()
                if element.text:           
                    elementattrib.append((element.tag,element.text ))     # add tag:text if there exist
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)

Question 53

def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}

Question 54

The easiest to use XML parser for Python is ElementTree (as of 2.5x and above it is in the standard library xml.etree.ElementTree). I don’t think there is anything that does exactly what you want out of the box. It would be pretty trivial to write something to do what you want using ElementTree, but why convert to a dictionary, and why not just use ElementTree directly.

Question 55

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements that are the same at a given place in the hierarchy it just overrides them.

I added a shim between that looks to see if the element already exists before self.update(). If so, pops the existing entry and creates a lists out of the existing and the new. Any subsequent duplicates are added to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip()})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)

Question 56

From @K3—rnc response (the best for me) I’ve added a small modifications to get an OrderedDict from an XML text (some times order matters):

def etree_to_ordereddict(t):
d = OrderedDict()
d[t.tag] = OrderedDict() if t.attrib else None
children = list(t)
if children:
    dd = OrderedDict()
    for dc in map(etree_to_ordereddict, children):
        for k, v in dc.iteritems():
            if k not in dd:
                dd[k] = list()
            dd[k].append(v)
    d = OrderedDict()
    d[t.tag] = OrderedDict()
    for k, v in dd.iteritems():
        if len(v) == 1:
            d[t.tag][k] = v[0]
        else:
            d[t.tag][k] = v
if t.attrib:
    d[t.tag].update(('@' + k, v) for k, v in t.attrib.iteritems())
if t.text:
    text = t.text.strip()
    if children or t.attrib:
        if text:
            d[t.tag]['#text'] = text
    else:
        d[t.tag] = text
return d

Following @K3—rnc example, you can use it:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

Hope it helps ;)

Question 57

Here’s a link to an ActiveState solution – and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError:
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
    if n.nodeType == n.TEXT_NODE:
        t += n.nodeValue
    else:
        raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "method" set to "true",
    then it's children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the nodes children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {} 
    for n in node.childNodes:
    if n.nodeType != n.ELEMENT_NODE:
        continue
    if n.getAttribute("multiple") == "true":
        # node with multiple children:
        # put them in a list
        l = []
        for c in n.childNodes:
            if c.nodeType != n.ELEMENT_NODE:
            continue
        l.append(nodeToDic(c))
            dic.update({n.nodeName:l})
        continue

    try:
        text = getTextFromNode(n)
    except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName:nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName:text})
    continue
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print dic["Config"]["Name"]
    print
    for item in dic["Config"]["Items"]:
    print "Item's Name:", item["Name"]
    print "Item's Value:", item["Value"]

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

Question 58

At one point I had to parse and write XML that only consisted of elements without attributes so a 1:1 mapping from XML to dict was possible easily. This is what I came up with in case someone else also doesnt need attributes:

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if result.get(element.tag):
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    result = ElementTree.Element(element.keys()[0])
    for key, value in element[element.keys()[0]].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring).getroot())

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

Question 59

@dibrovsd: Solution will not work if the xml have more than one tag with same name

On your line of thought, I have modified the code a bit and written it for general node instead of root:

from collections import defaultdict
def xml2dict(node):
    d, count = defaultdict(list), 1
    for i in node:
        d[i.tag + "_" + str(count)]['text'] = i.findtext('.')[0]
        d[i.tag + "_" + str(count)]['attrib'] = i.attrib # attrib gives the list
        d[i.tag + "_" + str(count)]['children'] = xml2dict(i) # it gives dict
     return d

Question 60

I have modified one of the answers to my taste and to work with multiple values with the same tag for example consider the following xml code saved in XML.xml file

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

and in python

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text != None) else  ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

the output is

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

Question 61

I have a recursive method to get a dictionary from a lxml element

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))

Question 62

What’s the correct way to initialize an ordered dictionary (OD) so that it retains the order of initial data?

from collections import OrderedDict

# Obviously wrong because regular dict loses order
d = OrderedDict({'b':2, 'a':1}) 

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b',2), ('a', 1)])

# What about using a list comprehension, will 'd' preserve the order of 'l'
l = ['b', 'a', 'c', 'aa']
d = OrderedDict([(i,i) for i in l])

Question:

Will an OrderedDict preserve the order of a list of tuples, or tuple of tuples or tuple of lists or list of lists etc. passed at the time of initialization (2nd & 3rd example above)?
How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not be always true. What’s a reliable way to use a counterexample to verify whether a data structure preserves order or not, short of trying test vectors repeatedly until one breaks?

P.S. I’ll just leave this here for reference: “The OrderedDict constructor and update() method both accept keyword arguments, but their order is lost because Python’s function call semantics pass-in keyword arguments using a regular unordered dictionary”

P.P.S : Hopefully, in future, OrderedDict will preserve the order of kwargs also (example 1): http://bugs.python.org/issue16991

Question 63

The OrderedDict will preserve any order that it has access to. The only way to pass ordered data to it to initialize is to pass a list (or, more generally, an iterable) of key-value pairs, as in your last two examples. As the documentation you linked to says, the OrderedDict does not have access to any order when you pass in keyword arguments or a dict argument, since any order there is removed before the OrderedDict constructor sees it.

Note that using a list comprehension in your last example doesn’t change anything. There’s no difference between OrderedDict([(i,i) for i in l]) and OrderedDict([('b', 'b'), ('a', 'a'), ('c', 'c'), ('aa', 'aa')]). The list comprehension is evaluated and creates the list and it is passed in; OrderedDict knows nothing about how it was created.

Question 64

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b', 2), ('a', 1)])

Yes, that will work. By definition, a list is always ordered the way it is represented. This goes for list-comprehension too, the list generated is in the same way the data was provided (i.e. source from a list it will be deterministic, sourced from a set or dict not so much).

How does one go about verifying if OrderedDict actually maintains an order. Since a dict has an unpredictable order, what if my test vectors luckily has the same initial order as the unpredictable order of a dict?. For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is order alphabetically, but that may not be always true. i.e. what’s a reliable way to use a counter example to verify if a data structure preserves order or not short of trying test vectors repeatedly until one breaks.

You keep your source list of 2-tuple around for reference, and use that as your test data for your test cases when you do unit tests. Iterate through them and ensure the order is maintained.

问题：使用python map和其他功能工具

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：安全地从字典中删除多个键

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

问题：字典与对象-哪个更有效，为什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：Python中的字符串到字典

回答 0

回答 1

问题：如何将键值元组列表转换成字典？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：如何将xml字符串转换为字典？

回答 0

回答 1

回答 2

更新资料

Update

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

问题：使用其构造函数初始化OrderedDict的正确方法，使其保留初始数据的顺序？

回答 0

回答 1

问题：Python字典到URL参数

回答 0

回答 1

回答 2

问题：检查值是否已存在于字典列表中？

回答 0

回答 1

回答 2

回答 3

问题：遍历所有嵌套的字典值？

回答 0

回答 1

回答 2

回答 3