Use cases for the 'setdefault' dict method

Question: Use cases for the 'setdefault' dict method

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict’s setdefault method. This question is for our collective education:

  1. What is setdefault still useful for, today in Python 2.6/2.7?
  2. What popular use cases of setdefault were superseded with collections.defaultdict?

Answer 0

You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.

Probably the most common use case: grouping items (in unsorted data; otherwise use itertools.groupby):

# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append( value )
    else:
        new[key] = [value]


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # key might exist already
    group.append( value )


# even simpler with defaultdict 
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append( value ) # all keys have a default already

Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn’t work in this case, because it only creates keys on explicit access. Suppose you are handling something HTTP-ish with many headers. Some are optional, but you want defaults for them:

headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault( headername, defaultvalue )
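
A minimal, self-contained sketch of the header case. The toy parse_headers function and the OPTIONAL_HEADERS table are hypothetical stand-ins invented for illustration; only dict.setdefault itself comes from the answer.

```python
# Hypothetical defaults for optional headers (names and values are illustrative).
OPTIONAL_HEADERS = [
    ("Content-Type", "text/plain"),
    ("Connection", "close"),
]


def parse_headers(msg):
    """Toy stand-in for a real parser: one 'Name: value' pair per line."""
    headers = {}
    for line in msg.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def with_defaults(msg):
    headers = parse_headers(msg)
    # Fill in only the headers the message did not supply itself.
    for name, default in OPTIONAL_HEADERS:
        headers.setdefault(name, default)
    return headers


headers = with_defaults("Host: example.org\nConnection: keep-alive")
print(headers)
# {'Host': 'example.org', 'Connection': 'keep-alive', 'Content-Type': 'text/plain'}
```

The supplied Connection value survives, while the missing Content-Type is filled in, so later code can index both keys without checking first.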

Answer 1

I commonly use setdefault for keyword argument dicts, such as in this function:

def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)

It’s great for tweaking arguments in wrappers around functions that take keyword arguments.
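
The same idea as a runnable sketch, without the application-specific names. Both connect and connect_with_defaults are hypothetical functions made up for this example.

```python
def connect(host, port=80, timeout=None, retries=0):
    """Hypothetical wrapped function; it just echoes its arguments."""
    return {"host": host, "port": port, "timeout": timeout, "retries": retries}


def connect_with_defaults(host, **kwargs):
    # Supply wrapper-level defaults without clobbering what the caller passed.
    kwargs.setdefault("timeout", 30)
    kwargs.setdefault("retries", 3)
    return connect(host, **kwargs)


print(connect_with_defaults("example.org"))
# {'host': 'example.org', 'port': 80, 'timeout': 30, 'retries': 3}
print(connect_with_defaults("example.org", timeout=5))
# {'host': 'example.org', 'port': 80, 'timeout': 5, 'retries': 3}
```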


Answer 2

defaultdict is great when the default value is static, like a new list, but not so much if it’s dynamic.

For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value, and any factory that returns a constant has the same limitation.

Instead, I used a regular dict:

nextID = intGen()
myDict = {}
for myStr in incoming_strings:  # stand-in for lots of complicated stuff
    # myStr is unpredictable and may already have been seen
    strID = myDict.setdefault(myStr, nextID())

Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.

intGen is a tiny class I built that automatically increments an int and returns its value:

class intGen:
    def __init__(self):
        self.i = 0

    def __call__(self):
        self.i += 1
        return self.i

If someone has a way to do this with defaultdict I’d love to see it.
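
One possible way, offered as a sketch rather than a definitive answer: default_factory is called again for every missing key, so a stateful callable can hand out a fresh value each time. The example uses itertools.count, which is not part of the original answer.

```python
from collections import defaultdict
from itertools import count

# default_factory is called with no arguments each time a missing key is
# looked up, so a stateful callable yields a new value per new key.
ids = defaultdict(count(1).__next__)

for word in ["spam", "eggs", "spam", "ham"]:
    ids[word]  # the lookup alone assigns an ID on first sight

print(dict(ids))  # {'spam': 1, 'eggs': 2, 'ham': 3}
```

Because the factory runs only for missing keys, the IDs come out consecutive. With setdefault(key, nextID()) the counter advances on every call, so the IDs stay unique but leave gaps.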


Answer 3

I use setdefault() when I want a default value in an OrderedDict. There isn’t a standard Python collection that does both, but there are ways to implement such a collection.
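
A minimal sketch of one way to build such a collection, assuming a subclass that relies on the __missing__ hook. DefaultOrderedDict is a hypothetical name, not a standard-library class.

```python
from collections import OrderedDict


class DefaultOrderedDict(OrderedDict):
    """Hypothetical helper: an OrderedDict that fills in missing keys."""

    def __init__(self, default_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ on subclasses when a key is absent.
        value = self[key] = self.default_factory()
        return value


groups = DefaultOrderedDict(list)
groups["b"].append(1)
groups["a"].append(2)
groups["b"].append(3)
print(list(groups.items()))  # [('b', [1, 3]), ('a', [2])]
```

Since Python 3.7, regular dicts (and therefore defaultdict) preserve insertion order, so a helper like this mainly matters when OrderedDict-specific features such as move_to_end are needed.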


Answer 4

As most answers state, setdefault or defaultdict lets you set a default value when a key doesn’t exist. However, I would like to point out a small caveat about setdefault: when the Python interpreter executes setdefault, it always evaluates the second argument, even if the key already exists in the dictionary. For example:

In: d = {1:5, 2:6}

In: d
Out: {1: 5, 2: 6}

In: d.setdefault(2, 0)
Out: 6

In: d.setdefault(2, print('test'))
test
Out: 6

As you can see, print was executed even though 2 already existed in the dictionary. This matters particularly if you plan to use setdefault for an optimization such as memoization. If you pass a recursive function call as the second argument to setdefault, you gain no performance, because Python makes the recursive call every time.

Since memoization was mentioned: if you want to add memoization to a function, a better alternative is the functools.lru_cache decorator, which handles the caching requirements of a recursive function better.
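
A small sketch of the lru_cache approach. The Fibonacci function and the calls counter are illustrative choices, not part of the original answer.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))  # 832040
print(calls)    # 31: each n from 0 to 30 is computed exactly once
```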


Answer 5

As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.

Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.

A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
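
A minimal sketch of such a trie built from nested plain dicts. The END marker and the function names are assumptions made for this example.

```python
END = "<end>"  # hypothetical end-of-word marker; cannot clash with a single character


def insert(root, word):
    node = root
    for ch in word:
        # Create the child only when it is missing, then descend into it.
        node = node.setdefault(ch, {})
    node[END] = True


def contains(root, word):
    node = root
    for ch in word:
        # get() never creates nodes, so queries leave the trie unchanged.
        node = node.get(ch)
        if node is None:
            return False
    return END in node


trie = {}
insert(trie, "car")
insert(trie, "cart")
print(contains(trie, "car"))  # True
print(contains(trie, "ca"))   # False
print(contains(trie, "dog"))  # False
print("d" in trie)            # False: the failed query added nothing
```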


Answer 6

Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven’t come across such a use case.

However, an interesting use case comes up from the standard library (Python 2.6, _threading_local.py):

>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]

I would say that using __dict__.setdefault is a pretty useful case.

Edit: As it happens, this is the only example in the standard library, and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:

Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writable at any time after the object is created. It is also a plain dictionary, not a defaultdict. It would not be sensible for objects in general to have a defaultdict as their __dict__, because that would give every object every legal identifier as an attribute. So I can’t foresee any change to Python objects that would get rid of __dict__.setdefault, short of setdefault being deleted altogether if it were deemed not useful.
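
The same pattern on an ordinary object, as a small sketch. The Config class is hypothetical; vars(obj) returns the object's __dict__.

```python
class Config:
    """Hypothetical plain object whose attributes live in its __dict__."""


cfg = Config()
cfg.number = 42

# Add an attribute only if it is not there yet.
vars(cfg).setdefault("widgets", [])
# An attribute that already exists is left alone.
vars(cfg).setdefault("number", 0)

print(cfg.widgets)  # []
print(cfg.number)   # 42
```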


Answer 7

One drawback of defaultdict compared with dict (dict.setdefault) is that a defaultdict creates a new item EVERY TIME a missing key is looked up with d[key], even in a read-only context such as a == comparison or a print. Also, the defaultdict class is generally far less common than the dict class, and in my experience it is more difficult to serialize.

P.S. IMO, functions and methods that are not meant to mutate an object should not mutate it.
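
A short sketch of the side effect described above. The variable names are illustrative.

```python
from collections import defaultdict

counts = defaultdict(int)

# Membership tests and .get() leave the defaultdict untouched...
print("apple" in counts)       # False
print(counts.get("apple", 0))  # 0
print(len(counts))             # 0

# ...but indexing a missing key inserts it, even inside a comparison.
if counts["apple"] == 0:
    pass
print(len(counts))             # 1
print(dict(counts))            # {'apple': 0}

# A plain dict never grows on a failed lookup.
plain = {}
print(plain.get("apple", 0))   # 0
print(len(plain))              # 0
```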


Answer 8

Here are some examples of setdefault to show its usefulness:

"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)

# To retrieve a list of the values for a key
list_of_values = d[key]

# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)

# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass # d has some values for key

# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)

# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])

Answer 9

I rewrote the accepted answer and simplified it for newcomers.

# Break it down to understand it intuitively.
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = []  # the core of setdefault: same effect as new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # sets new[key] = [] only if key is missing
    group.append(value)



# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # every missing key defaults to an empty list []

Additionally, I categorized the dict methods for reference:

dict_methods_11 = {
    'views': ['keys', 'values', 'items'],
    'add': ['update', 'setdefault'],
    'remove': ['pop', 'popitem', 'clear'],
    'retrieve': ['get'],
    'copy': ['copy', 'fromkeys'],
}

Answer 10

I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:

# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')

Less succinctly, it looks like this:

# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'

It’s worth noting that you can also use the resulting variable:

venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')

But that’s less necessary than it was before defaultdicts existed.


Answer 11

Here is another use case that I don’t think was mentioned above. Sometimes you keep a cache dict of objects keyed by their id, where the primary instance lives in the cache, and you want to add an object to the cache when it is missing.

return self.objects_by_id.setdefault(obj.id, obj)

That’s useful when you always want to keep a single instance per distinct id, no matter how you obtain an obj each time, for example when object attributes get updated in memory and saving to storage is deferred.
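
A self-contained sketch of that cache. The User and Registry classes are hypothetical names chosen for illustration; only the setdefault line mirrors the answer.

```python
class User:
    """Hypothetical object type identified by an id."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


class Registry:
    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # First object seen for an id wins; later duplicates are swapped for it.
        return self.objects_by_id.setdefault(obj.id, obj)


registry = Registry()
first = registry.canonical(User(1, "alice"))
again = registry.canonical(User(1, "alice (reloaded)"))

print(again is first)  # True
print(again.name)      # alice
```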


Answer 12

One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).

For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:

from enum import IntFlag, auto
import threading

class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return hash(self.value)

seen = set()

class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))

threads = []
for i in range(8):
    threads.append(cycle_enum())

for t in threads:
    t.start()

for t in threads:
    t.join()

len(seen)
# 272  (should be 256)

The solution is to use setdefault() as the last step of saving the computed composite member — if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.


Answer 13

[Edit] Very wrong! setdefault would always trigger long_computation, because Python evaluates arguments eagerly.

Expanding on Tuttle’s answer. For me the best use case is a cache mechanism. Instead of:

if x not in memo:
   memo[x]=long_computation(x)
return memo[x]

which consumes 3 lines and 2 or 3 lookups, I would happily write:

return memo.setdefault(x, long_computation(x))
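
A sketch that makes the eager evaluation visible and shows one lazy alternative. The squaring long_computation and the calls counter are stand-ins invented for the demonstration.

```python
calls = 0
memo = {}


def long_computation(x):
    global calls
    calls += 1  # count real invocations
    return x * x


def eager(x):
    # The argument is evaluated before setdefault runs, hit or miss.
    return memo.setdefault(x, long_computation(x))


def lazy(x):
    try:
        return memo[x]  # one lookup on a cache hit
    except KeyError:
        value = memo[x] = long_computation(x)
        return value


eager(3)
eager(3)
print(calls)  # 2: the second call recomputed despite the cache hit

lazy(3)
lazy(3)
print(calls)  # 2: both calls were served from the cache

lazy(4)
lazy(4)
print(calls)  # 3: computed once for the new key
```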

Answer 14

I like the answer given here:

http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html

In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).


Answer 15

A different use case for setdefault() is when you don’t want to overwrite the value of a key that is already set. Plain assignment overwrites, on a defaultdict just as on a regular dict, while setdefault() does not. For nested dictionaries, you more often want to set a default only if the key is not set yet, because you don’t want to replace the existing sub-dictionary. This is when you use setdefault().

Example with defaultdict:

>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})

setdefault doesn’t overwrite:

>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
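
The nested-dictionary case as a small sketch. The config contents are made up for illustration.

```python
config = {"db": {"host": "localhost"}}

# The existing sub-dictionary is kept; only the missing key is added.
config.setdefault("db", {}).setdefault("port", 5432)

# A missing sub-dictionary is created first, then filled.
config.setdefault("cache", {}).setdefault("ttl", 60)

# An already-set value is left alone.
config["db"].setdefault("host", "example.org")

print(config)
# {'db': {'host': 'localhost', 'port': 5432}, 'cache': {'ttl': 60}}
```

Writing config["db"] = {} before filling in the port would have discarded the existing host entry.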