Python 实用宝典

Question 1

I have a class which holds a dictionary

class OrderBook:
    orders = {'Restaurant1': None,
              'Restaurant2': None,
              'Restaurant3': None,
              'Restaurant4': None}

    @staticmethod
    def addOrder(restaurant_name, orders):
        OrderBook.orders[restaurant_name] = orders

And I am running 4 threads (one for each restaurant) that call the method OrderBook.addOrder. Here is the function run by each thread:

def addOrders(restaurant_name):

    #creates orders
    ...

    OrderBook.addOrder(restaurant_name, orders)

Is this safe, or do I have to use a lock before calling addOrder?

Question 2

Python’s built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations.

Your code should be safe. Keep in mind: a lock here will add almost no overhead, and will give you peace of mind.

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm has more details.
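If you do want the peace of mind of a lock, a minimal sketch could look like the following. The _lock attribute, the add_order/add_orders names and the placeholder order data are assumptions for illustration; they are not part of the original class.

```python
import threading


class OrderBook:
    # Shared state: one entry per restaurant, guarded by a class-level lock.
    orders = {}
    _lock = threading.Lock()  # hypothetical attribute, not in the original class

    @staticmethod
    def add_order(restaurant_name, orders):
        # Hold the lock only for the duration of the dict update.
        with OrderBook._lock:
            OrderBook.orders[restaurant_name] = orders


def add_orders(restaurant_name):
    # Stand-in for the order-creation logic that the question elides.
    orders = ["%s-order-%d" % (restaurant_name, i) for i in range(3)]
    OrderBook.add_order(restaurant_name, orders)


threads = [
    threading.Thread(target=add_orders, args=("Restaurant%d" % n,))
    for n in range(1, 5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(OrderBook.orders))
```

Because each thread writes a different key, the lock is not strictly needed here; it mainly protects you if the update later grows into a read-modify-write.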

Question 3

Yes, built-in types are inherently thread-safe: http://docs.python.org/glossary.html#term-global-interpreter-lock

This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access.

Question 4

Google’s style guide advises against relying on dict atomicity

Explained in further detail at: Is Python variable assignment atomic?

Do not rely on the atomicity of built-in types.

While Python’s built-in data types such as dictionaries appear to have atomic operations, there are corner cases where they aren’t atomic (e.g. if __hash__ or __eq__ are implemented as Python methods) and their atomicity should not be relied upon. Neither should you rely on atomic variable assignment (since this in turn depends on dictionaries).

Use the Queue module’s Queue data type as the preferred way to communicate data between threads. Otherwise, use the threading module and its locking primitives. Learn about the proper use of condition variables so you can use threading.Condition instead of using lower-level locks.

And I agree with this one: there is already the GIL in CPython, so the performance hit of using a Lock will be negligible. Much more costly will be the hours spent bug hunting in a complex codebase when those CPython implementation details change one day.
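As a rough sketch of the queue-based approach the style guide recommends (in Python 3 the module is named queue): worker threads only put results on a queue, and a single thread builds the dictionary. The function names and the placeholder order data are assumptions made for this example.

```python
import queue
import threading


def produce_orders(restaurant_name, out_queue):
    # Each worker only puts its result on the queue; it never touches
    # the shared dictionary directly.
    orders = ["%s-order" % restaurant_name]  # placeholder order data
    out_queue.put((restaurant_name, orders))


def collect_orders(restaurant_names):
    out_queue = queue.Queue()
    threads = [
        threading.Thread(target=produce_orders, args=(name, out_queue))
        for name in restaurant_names
    ]
    for t in threads:
        t.start()

    order_book = {}
    # Exactly one result per worker, so read that many items.
    for _ in restaurant_names:
        name, orders = out_queue.get()
        order_book[name] = orders

    for t in threads:
        t.join()
    return order_book


book = collect_orders(["Restaurant1", "Restaurant2", "Restaurant3", "Restaurant4"])
print(book)
```

Only the collecting thread ever mutates order_book, so no explicit lock is needed around the dictionary.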

Question 5

Is it possible to create an object from a dictionary in python in such a way that each key is an attribute of that object?

Something like this:

 d = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

 e = Employee(d) 
 print e.name # Oscar 
 print e.age + 10 # 42

I think it would be pretty much the inverse of this question: Python dictionary from an object’s fields

Question 6

Sure, something like this:

class Employee(object):
    def __init__(self, initial_data):
        for key in initial_data:
            setattr(self, key, initial_data[key])

Update

As Brent Nash suggests, you can make this more flexible by allowing keyword arguments as well:

class Employee(object):
    def __init__(self, *initial_data, **kwargs):
        for dictionary in initial_data:
            for key in dictionary:
                setattr(self, key, dictionary[key])
        for key in kwargs:
            setattr(self, key, kwargs[key])

Then you can call it like this:

e = Employee({"name": "abc", "age": 32})

or like this:

e = Employee(name="abc", age=32)

or even like this:

employee_template = {"role": "minion"}
e = Employee(employee_template, name="abc", age=32)

Question 7

Setting attributes in this way is almost certainly not the best way to solve a problem. Either:

You know what all the fields should be ahead of time. In that case, you can set all the attributes explicitly. This would look like

class Employee(object):
    def __init__(self, name, last_name, age):
        self.name = name
        self.last_name = last_name
        self.age = age

d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
e = Employee(**d) 

print e.name # Oscar 
print e.age + 10 # 42

or

You don’t know what all the fields should be ahead of time. In this case, you should store the data as a dict instead of polluting an object’s namespace. Attributes are for static access. This case would look like

class Employee(object):
    def __init__(self, data):
        self.data = data

d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
e = Employee(d) 

print e.data['name'] # Oscar 
print e.data['age'] + 10 # 42

Another solution that is basically equivalent to case 1 is to use a collections.namedtuple. See van’s answer for how to implement that.

Question 8

You can access the attributes of an object with __dict__, and call the update method on it:

>>> class Employee(object):
...     def __init__(self, _dict):
...         self.__dict__.update(_dict)
... 


>>> dict = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

>>> e = Employee(dict)

>>> e.name
'Oscar'

>>> e.age
32

Question 9

Why not just use attribute names as keys to a dictionary?

class StructMyDict(dict):

     def __getattr__(self, name):
         try:
             return self[name]
         except KeyError as e:
             raise AttributeError(e)

     def __setattr__(self, name, value):
         self[name] = value

You can initialize with named arguments, a list of tuples, or a dictionary, or individual attribute assignments, e.g.:

nautical = StructMyDict(left = "Port", right = "Starboard") # named args

nautical2 = StructMyDict({"left":"Port","right":"Starboard"}) # dictionary

nautical3 = StructMyDict([("left","Port"),("right","Starboard")]) # tuples list

nautical4 = StructMyDict()  # fields TBD
nautical4.left = "Port"
nautical4.right = "Starboard"

for x in [nautical, nautical2, nautical3, nautical4]:
    print "%s <--> %s" % (x.left,x.right)

Alternatively, instead of raising the attribute error, you can return None for unknown values. (A trick used in the web2py storage class)
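A minimal sketch of that "return None" variant might look like this. The class name LenientStruct is made up for the example and this is not web2py's actual implementation.

```python
class LenientStruct(dict):
    """dict whose keys are also readable as attributes; unknown names give None."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() are unaffected.
        return self.get(name)

    def __setattr__(self, name, value):
        self[name] = value


nautical = LenientStruct(left="Port")
nautical.right = "Starboard"
print(nautical.left, nautical.right, nautical.up)
```

The trade-off is that a misspelled attribute name silently yields None instead of raising AttributeError.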

Question 10

I think the answers using setattr are the way to go if you really need to support a dict.

But if the Employee object is just a structure that you access with dot syntax (.name) instead of dict syntax (['name']), you can use a namedtuple like this:

from collections import namedtuple

Employee = namedtuple('Employee', 'name age')
e = Employee('noname01', 6)
print e
#>> Employee(name='noname01', age=6)

# create Employee from dictionary
d = {'name': 'noname02', 'age': 7}
e = Employee(**d)
print e
#>> Employee(name='noname02', age=7)
print e._asdict()
#>> {'age': 7, 'name': 'noname02'}

You do have the _asdict() method to access all fields as a dictionary, but you cannot add attributes later; values can only be supplied at construction.
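To illustrate the immutability point, here is a small sketch (Python 3). Assigning an undeclared attribute fails, and _replace() is the usual way to get a modified copy.

```python
from collections import namedtuple

Employee = namedtuple('Employee', 'name age')
e = Employee(**{'name': 'noname02', 'age': 7})

try:
    e.role = 'minion'          # not one of the declared fields
    added = True
except AttributeError:
    added = False

older = e._replace(age=8)      # returns a new tuple; e itself is unchanged
print(added, e, older)
```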

Question 11

say for example

class A():
    def __init__(self):
        self.x=7
        self.y=8
        self.z="name"

if you want to set the attributes at once

d = {'x':100,'y':300,'z':"blah"}
a = A()
a.__dict__.update(d)

Question 12

similar to using a dict, you could just use kwargs like so:

class Person:
   def __init__(self, **kwargs):
       self.properties = kwargs

   def get_property(self, key):
       return self.properties.get(key, None)

def main():
   timmy = Person(color = 'red')
   print(timmy.get_property('color')) #prints 'red'

Question 13

Is there any straightforward way of finding a key by knowing the value within a dictionary?

All I can think of is this:

key = [key for key, value in dict_obj.items() if value == 'value'][0]

Question 14

There is none. Don’t forget that the value may be found on any number of keys, including 0 or more than 1.

Question 15

Your list comprehension goes through all the dict’s items finding all the matches, then just returns the first key. This generator expression will only iterate as far as necessary to return the first matching key:

key = next(key for key, value in dd.items() if value == 'value')

where dd is the dict. This raises StopIteration if no match is found, so you might want to catch that and raise a more appropriate exception like ValueError or KeyError.
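One way to wrap this up, as a sketch: the helper name key_for_value and the choice of ValueError are assumptions, so pick whichever exception suits your code.

```python
def key_for_value(dd, target):
    """Return the first key in dd whose value equals target."""
    try:
        return next(key for key, value in dd.items() if value == target)
    except StopIteration:
        # Translate the generator's exhaustion into a more descriptive error.
        raise ValueError("no key maps to %r" % (target,)) from None


print(key_for_value({'a': 1, 'b': 2}, 2))
```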

Question 16

There are cases where a dictionary is a one:one mapping

Eg,

d = {1: "one", 2: "two" ...}

Your approach is ok if you are only doing a single lookup. However if you need to do more than one lookup it will be more efficient to create an inverse dictionary

ivd = {v: k for k, v in d.items()}

If there is a possibility of multiple keys with the same value, you will need to specify the desired behaviour in this case.

If your Python is 2.6 or older, you can use

ivd = dict((v, k) for k, v in d.items())
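For the case where several keys can share a value, one possible behaviour is to map each value to the list of its keys. This is only a sketch and invert_multi is a made-up name.

```python
from collections import defaultdict


def invert_multi(d):
    """Map each value of d to the list of keys that hold it."""
    inverse = defaultdict(list)
    for key, value in d.items():
        inverse[value].append(key)
    return dict(inverse)


print(invert_multi({'a': 1, 'b': 2, 'c': 1}))
```

With the plain {v: k ...} comprehension, the last key seen for a duplicated value silently wins instead.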

Question 17

This version is 26% shorter than yours but functions identically, even for redundant/ambiguous values (it returns the first match, as yours does). However, it is probably twice as slow as yours, because it creates a list from the dict twice. It relies on Python 2, where keys() and values() return lists.

key = dict_obj.keys()[dict_obj.values().index(value)]

Or if you prefer brevity over readability you can save one more character with

key = list(dict_obj)[dict_obj.values().index(value)]

And if you prefer efficiency, @PaulMcGuire’s approach is better. If there are lots of keys that share the same value, it’s more efficient not to instantiate that list of keys with a list comprehension and instead use a generator:

key = (key for key, value in dict_obj.items() if value == 'value').next()
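The snippets in this answer are written for Python 2. A rough Python 3 equivalent, with a sample dict_obj made up for the example, would be:

```python
dict_obj = {'one': '1', 'two': '2', 'three': '3'}
value = '2'

# Python 3: values() is a view without .index(), so build a list first.
key_by_index = list(dict_obj)[list(dict_obj.values()).index(value)]

# Python 3: generators have no .next() method; use the built-in next().
key_by_next = next(key for key, v in dict_obj.items() if v == value)

print(key_by_index, key_by_next)
```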

Question 18

Since this is still very relevant, is the first Google hit, and I just spent some time figuring this out, I’ll post my solution (which works in Python 3):

testdict = {'one'   : '1',
            'two'   : '2',
            'three' : '3',
            'four'  : '4'
            }

value = '2'

[key for key in testdict.items() if key[1] == value][0][0]

Out[1]: 'two'

It will give you the first key whose value matches.

Question 19

Maybe a dictionary-like class such as the DoubleDict below is what you want? You can use any one of the provided metaclasses in conjunction with DoubleDict, or avoid using a metaclass at all.

import functools
import threading

################################################################################

class _DDChecker(type):

    def __new__(cls, name, bases, classdict):
        for key, value in classdict.items():
            if key not in {'__new__', '__slots__', '_DoubleDict__dict_view'}:
                classdict[key] = cls._wrap(value)
        return super().__new__(cls, name, bases, classdict)

    @staticmethod
    def _wrap(function):
        @functools.wraps(function)
        def check(self, *args, **kwargs):
            value = function(self, *args, **kwargs)
            if self._DoubleDict__forward != \
               dict(map(reversed, self._DoubleDict__reverse.items())):
                raise RuntimeError('Forward & Reverse are not equivalent!')
            return value
        return check

################################################################################

class _DDAtomic(_DDChecker):

    def __new__(cls, name, bases, classdict):
        if not bases:
            classdict['__slots__'] += ('_DDAtomic__mutex',)
            classdict['__new__'] = cls._atomic_new
        return super().__new__(cls, name, bases, classdict)

    @staticmethod
    def _atomic_new(cls, iterable=(), **pairs):
        instance = object.__new__(cls, iterable, **pairs)
        instance.__mutex = threading.RLock()
        instance.clear()
        return instance

    @staticmethod
    def _wrap(function):
        @functools.wraps(function)
        def atomic(self, *args, **kwargs):
            with self.__mutex:
                return function(self, *args, **kwargs)
        return atomic

################################################################################

class _DDAtomicChecker(_DDAtomic):

    @staticmethod
    def _wrap(function):
        return _DDAtomic._wrap(_DDChecker._wrap(function))

################################################################################

class DoubleDict(metaclass=_DDAtomicChecker):

    __slots__ = '__forward', '__reverse'

    def __new__(cls, iterable=(), **pairs):
        instance = super().__new__(cls, iterable, **pairs)
        instance.clear()
        return instance

    def __init__(self, iterable=(), **pairs):
        self.update(iterable, **pairs)

    ########################################################################

    def __repr__(self):
        return repr(self.__forward)

    def __lt__(self, other):
        return self.__forward < other

    def __le__(self, other):
        return self.__forward <= other

    def __eq__(self, other):
        return self.__forward == other

    def __ne__(self, other):
        return self.__forward != other

    def __gt__(self, other):
        return self.__forward > other

    def __ge__(self, other):
        return self.__forward >= other

    def __len__(self):
        return len(self.__forward)

    def __getitem__(self, key):
        if key in self:
            return self.__forward[key]
        return self.__missing_key(key)

    def __setitem__(self, key, value):
        if self.in_values(value):
            del self[self.get_key(value)]
        self.__set_key_value(key, value)
        return value

    def __delitem__(self, key):
        self.pop(key)

    def __iter__(self):
        return iter(self.__forward)

    def __contains__(self, key):
        return key in self.__forward

    ########################################################################

    def clear(self):
        self.__forward = {}
        self.__reverse = {}

    def copy(self):
        return self.__class__(self.items())

    def del_value(self, value):
        self.pop_key(value)

    def get(self, key, default=None):
        return self[key] if key in self else default

    def get_key(self, value):
        if self.in_values(value):
            return self.__reverse[value]
        return self.__missing_value(value)

    def get_key_default(self, value, default=None):
        return self.get_key(value) if self.in_values(value) else default

    def in_values(self, value):
        return value in self.__reverse

    def items(self):
        return self.__dict_view('items', ((key, self[key]) for key in self))

    def iter_values(self):
        return iter(self.__reverse)

    def keys(self):
        return self.__dict_view('keys', self.__forward)

    def pop(self, key, *default):
        if len(default) > 1:
            raise TypeError('too many arguments')
        if key in self:
            value = self[key]
            self.__del_key_value(key, value)
            return value
        if default:
            return default[0]
        raise KeyError(key)

    def pop_key(self, value, *default):
        if len(default) > 1:
            raise TypeError('too many arguments')
        if self.in_values(value):
            key = self.get_key(value)
            self.__del_key_value(key, value)
            return key
        if default:
            return default[0]
        raise KeyError(value)

    def popitem(self):
        try:
            key = next(iter(self))
        except StopIteration:
            raise KeyError('popitem(): dictionary is empty')
        return key, self.pop(key)

    def set_key(self, value, key):
        if key in self:
            self.del_value(self[key])
        self.__set_key_value(key, value)
        return key

    def setdefault(self, key, default=None):
        if key not in self:
            self[key] = default
        return self[key]

    def setdefault_key(self, value, default=None):
        if not self.in_values(value):
            self.set_key(value, default)
        return self.get_key(value)

    def update(self, iterable=(), **pairs):
        for key, value in (((key, iterable[key]) for key in iterable.keys())
                           if hasattr(iterable, 'keys') else iterable):
            self[key] = value
        for key, value in pairs.items():
            self[key] = value

    def values(self):
        return self.__dict_view('values', self.__reverse)

    ########################################################################

    def __missing_key(self, key):
        if hasattr(self.__class__, '__missing__'):
            return self.__missing__(key)
        if not hasattr(self, 'default_factory') \
           or self.default_factory is None:
            raise KeyError(key)
        return self.__setitem__(key, self.default_factory())

    def __missing_value(self, value):
        if hasattr(self.__class__, '__missing_value__'):
            return self.__missing_value__(value)
        if not hasattr(self, 'default_key_factory') \
           or self.default_key_factory is None:
            raise KeyError(value)
        return self.set_key(value, self.default_key_factory())

    def __set_key_value(self, key, value):
        self.__forward[key] = value
        self.__reverse[value] = key

    def __del_key_value(self, key, value):
        del self.__forward[key]
        del self.__reverse[value]

    ########################################################################

    class __dict_view(frozenset):

        __slots__ = '__name'

        def __new__(cls, name, iterable=()):
            instance = super().__new__(cls, iterable)
            instance.__name = name
            return instance

        def __repr__(self):
            return 'dict_{}({})'.format(self.__name, list(self))

Question 20

No, you cannot do this efficiently without looking at all the keys and checking their values, so a single lookup takes O(n) time. If you need to do a lot of such lookups, construct a reversed dictionary once (this can also be done in O(n)) and then search inside that reversed dictionary (each search takes O(1) on average).

Here is an example of how to construct a reversed dictionary (which will be able to do one to many mapping) from a normal dictionary:

h_reversed = {}
for i in h_normal:
    for j in h_normal[i]:
        if j not in h_reversed:
            h_reversed[j] = set([i])
        else:
            h_reversed[j].add(i)

For example if your

h_normal = {
  1: set([3]), 
  2: set([5, 7]), 
  3: set([]), 
  4: set([7]), 
  5: set([1, 4]), 
  6: set([1, 7]), 
  7: set([1]), 
  8: set([2, 5, 6])
}

your h_reversed will be

{
  1: set([5, 6, 7]),
  2: set([8]), 
  3: set([1]), 
  4: set([5]), 
  5: set([8, 2]), 
  6: set([8]), 
  7: set([2, 4, 6])
}

Question 21

There isn’t one as far as I know. One way to do it, however, is to create a dict for normal lookup by key and another dict for reverse lookup by value.

There’s an example of such an implementation here:

http://code.activestate.com/recipes/415903-two-dict-classes-which-can-lookup-keys-by-value-an/

This does mean that looking up the keys for a value can yield multiple results, which can be returned as a simple list.
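A minimal sketch of the two-dict idea follows. TwoWayDict and keys_for are hypothetical names and do not reflect the API of the linked recipe; values are assumed to be hashable.

```python
class TwoWayDict:
    """Hypothetical minimal forward/reverse lookup built on two dicts."""

    def __init__(self):
        self._forward = {}   # key -> value
        self._reverse = {}   # value -> list of keys holding that value

    def __setitem__(self, key, value):
        if key in self._forward:
            self._unlink(key)
        self._forward[key] = value
        self._reverse.setdefault(value, []).append(key)

    def __getitem__(self, key):
        return self._forward[key]

    def __delitem__(self, key):
        self._unlink(key)
        del self._forward[key]

    def _unlink(self, key):
        # Remove key from the reverse index of its current value.
        old_value = self._forward[key]
        keys = self._reverse[old_value]
        keys.remove(key)
        if not keys:
            del self._reverse[old_value]

    def keys_for(self, value):
        # Return a copy so callers cannot mutate the index.
        return list(self._reverse.get(value, []))


d = TwoWayDict()
d['a'] = 1
d['b'] = 1
d['c'] = 2
print(d.keys_for(1))
```

Both lookups are O(1) on average, at the cost of keeping the two dicts in sync on every write.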

Question 22

I know this might be considered ‘wasteful’, but in this scenario I often store the key as an additional column in the value record:

d = {'key1' : ('key1', val, val...), 'key2' : ('key2', val, val...) }

it’s a tradeoff and feels wrong, but it’s simple and works and of course depends on values being tuples rather than simple values.

Question 23

If you have a lot of reverse lookups to do, make a reverse dictionary:

reverse_dictionary = {v:k for k,v in dictionary.items()}

Question 24

Although values in a dictionary can be objects of any kind, they are not hashed or indexed in any way, so finding a key by its value is unnatural for this collection type. Any such query can only be executed in O(n) time. So if this is a frequent task, you should look into indexing the keys as Jon suggested, or maybe even a spatial index (a DB or http://pypi.python.org/pypi/Rtree/ ).

Question 25

I’m using dictionaries as a sort of “database”, so I need to find a key that I can reuse. For my case, if a key’s value is None, then I can take it and reuse it without having to “allocate” another id. Just figured I’d share it.

db = {0:[], 1:[], ..., 5:None, 11:None, 19:[], ...}

keys_to_reallocate = [None]
keys_to_reallocate.extend(i for i in db.iterkeys() if db[i] is None)
free_id = keys_to_reallocate[-1]

I like this one because I don’t have to try and catch any errors such as StopIteration or IndexError. If there’s a key available, then free_id will contain one. If there isn’t, then it will simply be None. Probably not pythonic, but I really didn’t want to use a try here…
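A possible alternative that also avoids try is the two-argument form of next(), which returns a default instead of raising. This sketch uses a small made-up db and Python 3's items(). Note that it picks the first free key rather than the last one.

```python
db = {0: [], 1: [], 5: None, 11: None, 19: []}

# The second argument to next() is returned when the generator is empty.
free_id = next((key for key, value in db.items() if value is None), None)

full_db = {0: [], 1: []}
no_free_id = next((key for key, value in full_db.items() if value is None), None)

print(free_id, no_free_id)
```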

Question 26

Code:

d = {'a': 0, 'b': 1, 'c': 2}
l = d.keys()

print l

This prints ['a', 'c', 'b']. I’m unsure how the keys() method determines the order of the keys within l. However, I’d like to be able to retrieve the keys in the “proper” order, which would of course be the list ['a', 'b', 'c'].

Question 27

You could use OrderedDict (requires Python 2.7 or higher).

Also, note that OrderedDict({'a': 1, 'b':2, 'c':3}) won’t work since the dict you create with {...} has already forgotten the order of the elements. Instead, you want to use OrderedDict([('a', 1), ('b', 2), ('c', 3)]).

As mentioned in the documentation, for versions lower than Python 2.7, you can use this recipe.

Question 28

Python 3.7+

In Python 3.7.0 the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec. Therefore, you can depend on it.

Python 3.6 (CPython)

As of Python 3.6, for the CPython implementation of Python, dictionaries maintain insertion order by default. This is considered an implementation detail though; you should still use collections.OrderedDict if you want insertion ordering that’s guaranteed across other implementations of Python.

Python >=2.7 and <3.6

Use the collections.OrderedDict class when you need a dict that remembers the order of items inserted.
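A short sketch of the practical differences, assuming Python 3.7 or later so that plain dicts keep insertion order:

```python
from collections import OrderedDict

d = {'b': 1, 'a': 2}
plain_order = list(d)            # insertion order, not sorted order

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od.move_to_end('a')              # reordering helper that plain dicts lack
od_order = list(od)

# Plain dicts ignore order when compared; OrderedDicts do not.
same_dicts = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
same_ordered = (OrderedDict([('a', 1), ('b', 2)])
                == OrderedDict([('b', 2), ('a', 1)]))

print(plain_order, od_order, same_dicts, same_ordered)
```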

Question 29

>>> print sorted(d.keys())
['a', 'b', 'c']

Use the sorted function, which sorts the iterable passed in.

The .keys() method returns the keys in an arbitrary order.

Question 30

Just sort the list when you want to use it.

l = sorted(d.keys())

Question 31

From http://docs.python.org/tutorial/datastructures.html:

“The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it).”

Question 32

Although the order should not matter, since a dictionary is a hash map, in practice it depends on the order in which the keys were inserted:

s = 'abbc'
a = 'cbab'

def load_dict(s):
    dict_tmp = {}
    for ch in s:
        if ch in dict_tmp.keys():
            dict_tmp[ch]+=1
        else:
            dict_tmp[ch] = 1
    return dict_tmp

dict_a = load_dict(a)
dict_s = load_dict(s)
print('for string %s, the keys are %s'%(s, dict_s.keys()))
print('for string %s, the keys are %s'%(a, dict_a.keys()))

output:
for string abbc, the keys are dict_keys(['a', 'b', 'c'])
for string cbab, the keys are dict_keys(['c', 'b', 'a'])

Question 33

I’m trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs.

For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing. d1 == d2 results in True. But suppose d2 = the same thing plus a bunch of other things. My method needs to be able to tell if d1 in d2, but Python can’t do that with dictionaries.

Context:

I have a Word class, and each object has properties like word, definition, part_of_speech, and so on. I want to be able to call a filter method on the main list of these words, like Word.objects.filter(word='jump', part_of_speech='verb-intransitive'). I can’t figure out how to manage these keys and values at the same time. But this could have larger functionality outside this context for other people.

Question 34

Convert to item pairs and check for containment.

all(item in superset.items() for item in subset.items())

Optimization is left as an exercise for the reader.
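Applied to the kwargs filter from the question, a sketch could look like this. It assumes the records are plain dicts, and filter_records is a made-up name rather than the asker's actual Word.objects.filter.

```python
def filter_records(records, **criteria):
    """Return the records (dicts) that contain every key/value in criteria."""
    return [
        record for record in records
        if all(key in record and record[key] == value
               for key, value in criteria.items())
    ]


words = [
    {'word': 'jump', 'part_of_speech': 'verb-intransitive'},
    {'word': 'jump', 'part_of_speech': 'noun'},
    {'word': 'run', 'part_of_speech': 'verb-intransitive'},
]
matches = filter_records(words, word='jump', part_of_speech='verb-intransitive')
print(matches)
```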

Question 35

In Python 3, you can use dict.items() to get a set-like view of the dict items. You can then use the <= operator to test if one view is a “subset” of the other:

d1.items() <= d2.items()

In Python 2.7, use the dict.viewitems() to do the same:

d1.viewitems() <= d2.viewitems()

In Python 2.6 and below you will need a different solution, such as using all():

all(key in d2 and d2[key] == d1[key] for key in d1)
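A quick Python 3 illustration of the view comparison, with sample dicts made up for the example:

```python
d1 = {'a': '2', 'b': '3'}
d2 = {'a': '2', 'b': '3', 'c': '4'}
d3 = {'a': '2', 'b': 'different'}

is_sub = d1.items() <= d2.items()     # every (key, value) of d1 is in d2
not_sub = d1.items() <= d3.items()    # 'b' maps to another value in d3
common = d1.items() & d3.items()      # views also support set operations

print(is_sub, not_sub, common)
```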

Question 36

Note for people that need this for unit testing: there’s also an assertDictContainsSubset() method in Python’s TestCase class.

http://docs.python.org/2/library/unittest.html?highlight=assertdictcontainssubset#unittest.TestCase.assertDictContainsSubset

It was, however, deprecated in 3.2 (and later removed in Python 3.12); I’m not sure why, and maybe there is a replacement for it.
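One possible replacement in Python 3, as a sketch rather than an official recommendation, is to compare the items views with assertLessEqual. The test class and dict contents below are made up for the example.

```python
import unittest


class SubsetAssertionExample(unittest.TestCase):

    def test_actual_contains_expected(self):
        expected = {'a': '2', 'b': '3'}
        actual = {'a': '2', 'b': '3', 'c': '4'}
        # Passes when every (key, value) pair of expected is in actual.
        self.assertLessEqual(expected.items(), actual.items())
```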

Question 37

for keys and values check use: set(d1.items()).issubset(set(d2.items()))

if you need to check only keys: set(d1).issubset(set(d2))
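One caveat worth sketching: building a set from items() requires every value to be hashable, so dicts holding lists (or other dicts) raise TypeError. The sample dicts are made up, and the all() fallback is the one from the earlier answer.

```python
d1 = {'tags': ['x', 'y']}
d2 = {'tags': ['x', 'y'], 'id': 7}

try:
    set(d1.items()).issubset(set(d2.items()))
    set_version_ok = True
except TypeError:
    set_version_ok = False   # list values are unhashable

# Fallback that only needs == on the values.
fallback = all(key in d2 and d2[key] == value for key, value in d1.items())

print(set_version_ok, fallback)
```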

Question 38

For completeness, you can also do this:

def is_subdict(small, big):
    return dict(big, **small) == big

However, I make no claims whatsoever concerning speed (or lack thereof) or readability (or lack thereof).
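One thing to watch for: dict(big, **small) passes the keys of small as keyword arguments, so on Python 3 it raises TypeError for non-string keys. A sketch of a variant using dict unpacking (Python 3.5+), which does not have that restriction:

```python
def is_subdict(small, big):
    # {**big, **small} builds a merged copy without using keyword arguments.
    return {**big, **small} == big


try:
    dict({1: 2, 3: 4}, **{1: 2})
    kwargs_form_ok = True
except TypeError:
    kwargs_form_ok = False   # keywords must be strings on Python 3

print(kwargs_form_ok, is_subdict({1: 2}, {1: 2, 3: 4}))
```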

Question 39

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True

context:

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> list(d1.iteritems())
[('a', '2'), ('b', '3')]
>>> [(k,v) for k,v in d1.iteritems()]
[('a', '2'), ('b', '3')]
>>> k,v = ('a','2')
>>> k
'a'
>>> v
'2'
>>> k in d2
True
>>> d2[k]
'2'
>>> k in d2 and d2[k]==v
True
>>> [(k in d2 and d2[k]==v) for k,v in d1.iteritems()]
[True, True]
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems())
<generator object <genexpr> at 0x02A9D2B0>
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems()).next()
True
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True
>>>

Question 40

My function for the same purpose, doing this recursively:

def dictMatch(patn, real):
    """does real dict match pattern?"""
    try:
        for pkey, pvalue in patn.iteritems():
            if type(pvalue) is dict:
                result = dictMatch(pvalue, real[pkey])
                assert result
            else:
                assert real[pkey] == pvalue
                result = True
    except (AssertionError, KeyError):
        result = False
    return result

In your example, dictMatch(d1, d2) should return True even if d2 has other stuff in it, plus it applies also to lower levels:

d1 = {'a':'2', 'b':{3: 'iii'}}
d2 = {'a':'2', 'b':{3: 'iii', 4: 'iv'},'c':'4'}

dictMatch(d1, d2)   # True

Notes: There could be even better solution which avoids the if type(pvalue) is dict clause and applies to even wider range of cases (like lists of hashes etc). Also recursion is not limited here so use at your own risk. ;)

Question 41

Here is a solution that also properly recurses into lists and sets contained within the dictionary. You can also use this for lists containing dicts etc…

def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())

    if isinstance(subset, list) or isinstance(subset, set):
        return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)

    # assume that subset is a plain value if none of the above match
    return subset == superset

Question 42

This seemingly straightforward issue cost me a couple of hours of research to find a 100% reliable solution, so I documented what I’ve found in this answer.

“Pythonic-ally” speaking, small_dict <= big_dict would be the most intuitive way, but too bad that it won’t work. {'a': 1} < {'a': 1, 'b': 2} seemingly works in Python 2, but it is not reliable because the official documentation explicitly calls it out. Search for “Outcomes other than equality are resolved consistently, but are not otherwise defined.” in this section. Not to mention that comparing 2 dicts with < or <= in Python 3 raises a TypeError.
The second most intuitive thing is small.viewitems() <= big.viewitems() for Python 2.7 only, and small.items() <= big.items() for Python 3. But there is one caveat: it is potentially buggy. If your program could be run on Python <=2.6, d1.items() <= d2.items() actually compares 2 lists of tuples in no particular order, so the final result is unreliable and becomes a nasty bug in your program. I am not keen to write yet another implementation for Python <=2.6, but I still don’t feel comfortable that my code comes with a known bug (even if it is on an unsupported platform). So I abandoned this approach.
I settled on @blubberdiblub’s answer (credit goes to him):

def is_subdict(small, big): return dict(big, **small) == big

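It is worth pointing out that this answer relies on the == behavior between dicts, which is clearly defined in the official documentation, and hence should work in every Python version. (On Python 3 the keys of small must be strings, because dict(big, **small) passes them as keyword arguments.) Search for: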
- “Dictionaries compare equal if and only if they have the same (key, value) pairs.” is the last sentence in this page
- “Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and elements enforces reflexivity.” in this page

Question 43

Here’s a general recursive solution for the problem given:

import traceback
import unittest

def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False

        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False

        elif isinstance(value, str):
            if value not in superset[key]:
                return False

        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False

        else:
            if not value == superset[key]:
                return False

    return True


class Foo(unittest.TestCase):

    def setUp(self):
        self.dct = {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
            'f': {
                'a': 'hello world',
                'b': 12345,
                'c': 1.2345,
                'd': [1, 2, 3, 4, 5],
                'e': {1, 2, 3, 4, 5},
                'g': False,
                'h': None
            },
            'g': False,
            'h': None,
            'question': 'mcve',
            'metadata': {}
        }

    def tearDown(self):
        pass

    def check_true(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), True)

    def check_false(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), False)

    def test_simple_cases(self):
        self.check_true(self.dct, {'a': 'hello world'})
        self.check_true(self.dct, {'b': 12345})
        self.check_true(self.dct, {'c': 1.2345})
        self.check_true(self.dct, {'d': [1, 2, 3, 4, 5]})
        self.check_true(self.dct, {'e': {1, 2, 3, 4, 5}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
        }})
        self.check_true(self.dct, {'g': False})
        self.check_true(self.dct, {'h': None})

    def test_tricky_cases(self):
        self.check_true(self.dct, {'a': 'hello'})
        self.check_true(self.dct, {'d': [1, 2, 3]})
        self.check_true(self.dct, {'e': {3, 4}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'h': None
        }})
        self.check_false(
            self.dct, {'question': 'mcve', 'metadata': {'author': 'BPL'}})
        self.check_true(
            self.dct, {'question': 'mcve', 'metadata': {}})
        self.check_false(
            self.dct, {'question1': 'mcve', 'metadata': {}})

if __name__ == "__main__":
    unittest.main()

NOTE: The original code would fail in certain cases; credit for the fix goes to @olivier-melançon.

Question 44

If you don’t mind using pydash, it has an is_match function which does exactly that:

import pydash

a = {1:2, 3:4, 5:{6:7}}
b = {3:4.0, 5:{6:8}}
c = {3:4.0, 5:{6:7}}

pydash.predicates.is_match(a, b) # False
pydash.predicates.is_match(a, c) # True

Question 45

I know this question is old, but here is my solution for checking if one nested dictionary is a part of another nested dictionary. The solution is recursive.

def compare_dicts(a, b):
    for key, value in a.items():
        if key in b:
            if isinstance(value, dict):
                # A nested dict can only match another dict.
                if not isinstance(b[key], dict) or not compare_dicts(value, b[key]):
                    return False
            elif value != b[key]:
                return False
        else:
            return False
    return True

Question 46

This function works for non-hashable values. I also think that it is clear and easy to read.

def isSubDict(subDict,dictionary):
    for key in subDict.keys():
        if (not key in dictionary) or (not subDict[key] == dictionary[key]):
            return False
    return True

In [126]: isSubDict({1:2},{3:4})
Out[126]: False

In [127]: isSubDict({1:2},{1:2,3:4})
Out[127]: True

In [128]: isSubDict({1:{2:3}},{1:{2:3},3:4})
Out[128]: True

In [129]: isSubDict({1:{2:3}},{1:{2:4},3:4})
Out[129]: False

Question 47

A short recursive implementation that works for nested dictionaries:

def compare_dicts(a,b):
    if not a: return True
    if isinstance(a, dict):
        key, val = a.popitem()
        return isinstance(b, dict) and key in b and compare_dicts(val, b.pop(key)) and compare_dicts(a, b)
    return a == b

This will consume the a and b dicts. If anyone knows of a good way to avoid that without resorting to partially iterative solutions as in other answers, please tell me. I would need a way to split a dict into head and tail based on a key.

This code is more useful as a programming exercise, and is probably a lot slower than other solutions here that mix recursion and iteration. @Nutcracker’s solution is pretty good for nested dictionaries.
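One possible way to avoid consuming the inputs is sketched below. It is not the answer's code: the name is_nested_subset is made up for illustration, it uses all() over a generator (so it is arguably still partly iterative), and unlike the version above it does not treat an empty dict as matching a non-dict value.

```python
def is_nested_subset(sub, full):
    """Return True if every key/value in `sub` also appears in `full`.

    Hypothetical helper: recurses into nested dicts and leaves both
    arguments unmodified.
    """
    if isinstance(sub, dict):
        # A dict can only be contained in another dict.
        return isinstance(full, dict) and all(
            key in full and is_nested_subset(value, full[key])
            for key, value in sub.items()
        )
    # Non-dict values are compared directly.
    return sub == full
```

Because nothing is popped, the same dictionaries can be passed in again afterwards.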

Question 48

I’m a bit confused about what can/can’t be used as a key for a python dict.

dicked = {}
dicked[None] = 'foo'     # None ok
dicked[(1,3)] = 'baz'    # tuple ok
import sys
dicked[sys] = 'bar'      # wow, even a module is ok !
dicked[(1,[3])] = 'qux'  # oops, not allowed

So a tuple is an immutable type but if I hide a list inside of it, then it can’t be a key.. couldn’t I just as easily hide a list inside a module?

I had some vague idea that the key has to be “hashable” but I’m just going to admit my own ignorance about the technical details; I don’t know what’s really going on here. What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

Question 49

There’s a good article on the topic in the Python wiki: Why Lists Can’t Be Dictionary Keys. As explained there:

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

It can be done without really breaking any of the requirements, but it leads to unexpected behavior. Lists are generally treated as if their value was derived from their content’s values, for instance when checking (in-)equality. Many would, understandably, expect that you can use any list [1, 2] to get the same key, whereas in fact you’d have to keep around exactly the same list object. Lookup by value breaks as soon as a list used as a key is modified, and lookup by identity requires you to keep around exactly the same list, which isn’t required for any other common list operation (at least none I can think of).

Other objects such as modules and object make a much bigger deal out of their object identity anyway (when was the last time you had two distinct module objects called sys?), and are compared by that anyway. Therefore, it’s less surprising – or even expected – that they, when used as dict keys, compare by identity in that case as well.
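As a small illustration of that difference (Token is a hypothetical class used only for this sketch), an object with the default comparison is found only through the very same object, while a tuple is found through any equal tuple:

```python
class Token:
    """Hypothetical class with the default identity-based __eq__ and __hash__."""


first, second = Token(), Token()
by_identity = {first: "first token"}

first_found = first in by_identity      # same object, so it is found
second_found = second in by_identity    # distinct object, so it is not

by_value = {(1, 2): "pair"}
rebuilt_found = tuple([1, 2]) in by_value   # a freshly built, equal tuple is found
```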

Question 50

Why can’t I use a list as a dict key in python?

>>> d = {repr([1,2,3]): 'value'}
>>> d
{'[1, 2, 3]': 'value'}

(for anybody who stumbles on this question looking for a way around it)

As explained by others here, you indeed cannot. You can, however, use its string representation instead if you really want to use your list.

Question 51

I just found that you can convert the list into a tuple, then use it as a key.

d = {tuple([1,2,3]): 'value'}

Question 52

The issue is that tuples are immutable, and lists are not. Consider the following

d = {}
li = [1,2,3]
d[li] = 5
li.append(4)

What should d[li] return? Is it the same list? How about d[[1,2,3]]? It has the same values, but is a different list?

Ultimately, there is no satisfactory answer. For example, if the only key that works is the original key, then if you have no reference to that key, you can never again access the value. With every other allowed key, you can construct a key without a reference to the original.

If both of my suggestions work, then you have very different keys that return the same value, which is more than a little surprising. If only the original contents work, then your key will quickly go bad, since lists are made to be modified.
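A tuple sidesteps the dilemma because the key is a frozen snapshot of the contents at insertion time. A minimal sketch reusing the names from the example above:

```python
d = {}
li = [1, 2, 3]
d[tuple(li)] = 5          # snapshot of the contents at insertion time
li.append(4)              # the list changes, the stored key does not

value_for_old_contents = d[(1, 2, 3)]    # still reachable by value
new_contents_present = tuple(li) in d    # (1, 2, 3, 4) was never inserted
```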

Question 53

Here’s an answer http://wiki.python.org/moin/DictionaryKeys

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

Looking up different lists with the same contents would produce different results, even though comparing lists with the same contents would indicate them as equivalent.

What about Using a list literal in a dictionary lookup?

Question 54

Your answer can be found here:

Why Lists Can’t Be Dictionary Keys

Newcomers to Python often wonder why, while the language includes both a tuple and a list type, tuples are usable as dictionary keys, while lists are not. This was a deliberate design decision, and can best be explained by first understanding how Python dictionaries work.

Source & more info: http://wiki.python.org/moin/DictionaryKeys

Question 55

Because lists are mutable, dict keys (and set members) need to be hashable, and hashing mutable objects is a bad idea because hash values should be computed on the basis of instance attributes.

In this answer, I will give some concrete examples, hopefully adding value on top of the existing answers. Every insight applies to the elements of the set datastructure as well.

Example 1: hashing a mutable object where the hash value is based on a mutable characteristic of the object.

>>> class stupidlist(list):
...     def __hash__(self):
...         return len(self)
... 
>>> stupid = stupidlist([1, 2, 3])
>>> d = {stupid: 0}
>>> stupid.append(4)
>>> stupid
[1, 2, 3, 4]
>>> d
{[1, 2, 3, 4]: 0}
>>> stupid in d
False
>>> stupid in d.keys()
False
>>> stupid in list(d.keys())
True

After mutating stupid, it cannot be found in the dict any longer because the hash changed. Only a linear scan over the list of the dict’s keys finds stupid.

Example 2: … but why not just a constant hash value?

>>> class stupidlist2(list):
...     def __hash__(self):
...         return id(self)
... 
>>> stupidA = stupidlist2([1, 2, 3])
>>> stupidB = stupidlist2([1, 2, 3])
>>> 
>>> stupidA == stupidB
True
>>> stupidA in {stupidB: 0}
False

That’s not a good idea either, because equal objects should hash identically so that you can find them in a dict or set.

Example 3: … ok, what about constant hashes across all instances?!

>>> class stupidlist3(list):
...     def __hash__(self):
...         return 1
... 
>>> stupidC = stupidlist3([1, 2, 3])
>>> stupidD = stupidlist3([1, 2, 3])
>>> stupidE = stupidlist3([1, 2, 3, 4])
>>> 
>>> stupidC in {stupidD: 0}
True
>>> stupidC in {stupidE: 0}
False
>>> d = {stupidC: 0}
>>> stupidC.append(5)
>>> stupidC in d
True

Things seem to work as expected, but think about what’s happening: when all instances of your class produce the same hash value, you will have a hash collision whenever there are two or more instances as keys in a dict or present in a set.

Finding the right instance with my_dict[key] or key in my_dict (or item in my_set) needs to perform as many equality checks as there are instances of stupidlist3 in the dict’s keys (in the worst case). At this point, the purpose of the dictionary – O(1) lookup – is completely defeated. This is demonstrated in the following timings (done with IPython).

Some Timings for Example 3

>>> lists_list = [[i]  for i in range(1000)]
>>> stupidlists_set = {stupidlist3([i]) for i in range(1000)}
>>> tuples_set = {(i,) for i in range(1000)}
>>> l = [999]
>>> s = stupidlist3([999])
>>> t = (999,)
>>> 
>>> %timeit l in lists_list
25.5 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit s in stupidlists_set
38.5 µs ± 61.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit t in tuples_set
77.6 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

As you can see, the membership test in our stupidlists_set is even slower than a linear scan over the whole lists_list, while you have the expected super fast lookup time (factor 500) in a set without loads of hash collisions.
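The same effect can be seen without a timer by counting equality checks. The sketch below is an illustration rather than part of the measurements above: ConstantHashList is a hypothetical stand-in for stupidlist3 that also counts calls to __eq__, and the helper name is made up. Looking up a key that is not present has to compare against every stored key, since they all share the same hash.

```python
class ConstantHashList(list):
    """Hypothetical stand-in for stupidlist3 that counts equality checks."""

    eq_calls = 0

    def __hash__(self):
        return 1  # every instance collides with every other one

    def __eq__(self, other):
        ConstantHashList.eq_calls += 1
        return list.__eq__(self, other)


def eq_calls_for_missing_key(n):
    """Build a dict with n colliding keys, then look up a key that is absent."""
    table = {ConstantHashList([i]): i for i in range(n)}
    ConstantHashList.eq_calls = 0          # ignore the checks done while inserting
    found = ConstantHashList([-1]) in table
    return found, ConstantHashList.eq_calls


if __name__ == "__main__":
    print(eq_calls_for_missing_key(200))
```

The count grows with the number of keys, which is the linear behavior the timings show.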

TL;DR: you can use tuple(yourlist) as dict keys, because tuples are immutable and hashable.

Question 56

The simple answer to your question is that the class list does not implement the method __hash__, which is required for any object that is to be used as a key in a dictionary. However, the reason why __hash__ is not implemented the same way it is in, say, the tuple class (based on the content of the container) is that a list is mutable, so editing the list would require the hash to be recalculated, which may mean the list is now located in the wrong bucket within the underlying hash table. Note that since you cannot modify a tuple (it is immutable), it doesn’t run into this problem.

As a side note, the actual implementation of the dictobjects lookup is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. If you have that book available to you it might be a worthwhile read, in addition if you’re really, really interested you may like to take a peek at the developer comments on the actual implementation of dictobject here. It goes into great detail as to exactly how it works. There is also a python lecture on the implementation of dictionaries which you may be interested in. They go through the definition of a key and what a hash is in the first few minutes.

Question 57

According to the Python 2.7.2 documentation:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

A tuple is immutable in the sense that you cannot add, remove or replace its elements, but the elements themselves may be mutable. A list’s hash value would have to depend on the hash values of its elements, and so it would change when you change the elements.

Using id’s for list hashes would imply that all lists compare differently, which would be surprising and inconvenient.
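A small sketch of these rules follows. The class names are hypothetical, and the behavior shown is that of Python 3 (the quoted text is from the 2.7 documentation): there, a class that defines __eq__ without also defining __hash__ has its __hash__ set to None and becomes unhashable.

```python
class Plain:
    """No __eq__ or __hash__ defined: hashable, compared by identity."""


class ByValue:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, ByValue) and self.value == other.value


class ByValueHashable(ByValue):
    """Adds a __hash__ consistent with __eq__: equal objects hash equally."""

    def __hash__(self):
        return hash(self.value)


# Two distinct but equal instances now address the same dictionary entry.
lookup = {ByValueHashable(1): "one"}
found = ByValueHashable(1) in lookup
```

This only stays correct as long as value is not changed while the object is used as a key.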

Question 58

A dictionary is a hash map: it stores each key-value pair under the hash of its key.

Something like (pseudocode):

{key : val}  
hash(key) = val

If you are wondering what can be used as a key for your dictionary: anything that is hashable (it can be converted to a hash and holds a static value, i.e. it is immutable, so that the hashed key described above stays valid) is eligible. List or set objects can change on the go, so hash(key) would also need to change to stay in sync with your list or set.

You can try:

hash(<your key here>)

If it works, the object can be used as a key for your dictionary; otherwise, convert it to something hashable.

In short:

Convert that list to tuple(<your list>).
Convert that list to str(<your list>).
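The check and the two conversions can be sketched as follows. The helper name is_hashable is made up for this example. Note that the two conversions produce different keys: the tuple keeps the elements, while str yields the text '[1, 2, 3]'.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper name)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


items = [1, 2, 3]

# Both conversions give usable keys, but they are not interchangeable.
d = {tuple(items): "as tuple", str(items): "as str"}
```

A tuple that contains a list, such as (1, [3]), still fails the check, because hashing a tuple hashes its elements.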

Question 59

Dict keys need to be hashable. Lists are mutable and they do not provide a valid hash method.

Question 60

I have a class that contains only fields and no methods, like this:

import wsgiref.util

class Request(object):

    def __init__(self, environ):
        self.environ = environ
        self.request_method = environ.get('REQUEST_METHOD', None)
        self.url_scheme = environ.get('wsgi.url_scheme', None)
        self.request_uri = wsgiref.util.request_uri(environ)
        self.path = environ.get('PATH_INFO', None)
        # ...

This could easily be translated to a dict. The class is more flexible for future additions and could be fast with __slots__. So would there be a benefit of using a dict instead? Would a dict be faster than a class? And faster than a class with slots?

Question 61

Why would you make this a dictionary? What’s the advantage? What happens if you later want to add some code? Where would your __init__ code go?

Classes are for bundling related data (and usually code).

Dictionaries are for storing key-value relationships, where usually the keys are all of the same type, and all the values are also of one type. Occasionally they can be useful for bundling data when the key/attribute names are not all known up front, but often this is a sign that something’s wrong with your design.

Keep this a class.

Question 62

Use a dictionary unless you need the extra mechanism of a class. You could also use a namedtuple for a hybrid approach:

>>> from collections import namedtuple
>>> Request = namedtuple("Request", "environ request_method url_scheme")
>>> Request
<class '__main__.Request'>
>>> request = Request(environ="foo", request_method="GET", url_scheme="http")
>>> request.environ
'foo'

Performance differences here will be minimal, although I would be surprised if the dictionary wasn’t faster.

Question 63

A class in python is a dict underneath. You do get some overhead with the class behavior, but you won’t be able to notice it without a profiler. In this case, I believe you benefit from the class because:

All your logic lives in a single function
It is easy to update and stays encapsulated
If you change anything later, you can easily keep the interface the same

Question 64

I think that the usage of each one is way too subjective for me to get in on that, so I’ll just stick to numbers.

I compared the time it takes to create and to change a variable in a dict, a new_style class and a new_style class with slots.

Here’s the code I used to test it (it’s a bit messy, but it does the job).

import timeit

class Foo(object):

    def __init__(self):

        self.foo1 = 'test'
        self.foo2 = 'test'
        self.foo3 = 'test'

def create_dict():

    foo_dict = {}
    foo_dict['foo1'] = 'test'
    foo_dict['foo2'] = 'test'
    foo_dict['foo3'] = 'test'

    return foo_dict

class Bar(object):
    __slots__ = ['foo1', 'foo2', 'foo3']

    def __init__(self):

        self.foo1 = 'test'
        self.foo2 = 'test'
        self.foo3 = 'test'

tmit = timeit.timeit

# print() with a single argument runs on both Python 2 and Python 3.
print('Creating...\n')
print('Dict: ' + str(tmit('create_dict()', 'from __main__ import create_dict')))
print('Class: ' + str(tmit('Foo()', 'from __main__ import Foo')))
print('Class with slots: ' + str(tmit('Bar()', 'from __main__ import Bar')))

print('\nChanging a variable...\n')

# Subtract the creation time so only the cost of the assignment remains.
print('Dict: ' + str((tmit('create_dict()[\'foo3\'] = "Changed"', 'from __main__ import create_dict') - tmit('create_dict()', 'from __main__ import create_dict'))))
print('Class: ' + str((tmit('Foo().foo3 = "Changed"', 'from __main__ import Foo') - tmit('Foo()', 'from __main__ import Foo'))))
print('Class with slots: ' + str((tmit('Bar().foo3 = "Changed"', 'from __main__ import Bar') - tmit('Bar()', 'from __main__ import Bar'))))

And here is the output…

Creating…

Dict: 0.817466186345
Class: 1.60829183597
Class_with_slots: 1.28776730003

Changing a variable…

Dict: 0.0735140918748
Class: 0.111714198313
Class_with_slots: 0.10618612142

So, if you’re just storing variables, you need speed, and it won’t require you to do many calculations, I recommend using a dict (you could always just make a function that looks like a method). But if you really need classes, remember: always use __slots__.

Note:

I tested the ‘Class’ with both new_style and old_style classes. It turns out that old_style classes are faster to create but slower to modify (not by much, but significant if you’re creating lots of classes in a tight loop; tip: you’re doing it wrong).

Also the times for creating and changing variables may differ on your computer since mine is old and slow. Make sure you test it yourself to see the ‘real’ results.

Edit:

I later tested the namedtuple: I can’t modify it, but creating the 10000 samples (or something like that) took 1.4 seconds, so the dictionary is indeed the fastest.

If I change the dict function to include the keys and values when I create the dict, and to return the dict directly instead of a variable containing it, I get 0.65 instead of 0.8 seconds.

class Foo(dict):
    pass

With this dict subclass, creating is about as fast as for a class with slots, and changing the variable is the slowest (0.17 seconds), so do not use these classes. Go for a dict (speed) or for the class derived from object (‘syntax candy’).
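To repeat this kind of comparison on a current interpreter, a sketch along the following lines may help. It is not the script used for the numbers above: the class names, the time_creation helper and the iteration count are assumptions made for illustration, and the results will differ by machine and Python version.

```python
import timeit


class WithDict:
    """Ordinary class: attributes live in the instance __dict__."""

    def __init__(self):
        self.foo1 = 'test'
        self.foo2 = 'test'
        self.foo3 = 'test'


class WithSlots:
    """Same fields, but declared in __slots__ (no per-instance __dict__)."""

    __slots__ = ('foo1', 'foo2', 'foo3')

    def __init__(self):
        self.foo1 = 'test'
        self.foo2 = 'test'
        self.foo3 = 'test'


def make_dict():
    return {'foo1': 'test', 'foo2': 'test', 'foo3': 'test'}


def time_creation(number=100_000):
    """Return seconds taken to create each kind of container `number` times."""
    candidates = {
        'dict': make_dict,
        'class': WithDict,
        'class with slots': WithSlots,
    }
    return {
        name: timeit.timeit(factory, number=number)
        for name, factory in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in time_creation().items():
        print(f"{name}: {seconds:.4f} s")
```

Passing the callables directly to timeit.timeit avoids the 'from __main__ import ...' setup strings.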

1. Use `set_index` to set `ID` columns as the dataframe index.

2. Use the `orient=index` parameter to have the index as dictionary keys.