Question: Use cases for the 'setdefault' dict method
The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
- What is setdefault still useful for, today in Python 2.6/2.7?
- What popular use cases of setdefault were superseded with collections.defaultdict?
Answer 0
You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: grouping items (in unsorted data; otherwise use itertools.groupby).
data = [("a", 1), ("b", 2), ("a", 3)]  # hypothetical sample input: (key, value) pairs
# really verbose
new = {}
for key, value in data:
    if key in new:
        new[key].append(value)
    else:
        new[key] = [value]
# easy with setdefault
new = {}
for key, value in data:
    group = new.setdefault(key, [])  # key might exist already
    group.append(value)
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for key, value in data:
    new[key].append(value)  # all keys have a default already
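The three variants above build the same mapping. As a runnable sketch (the sample data is made up for illustration), here they are side by side, together with the itertools.groupby alternative mentioned for sorted data. Note that groupby only merges adjacent equal keys, so unsorted input has to be sorted by key first:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Made-up sample input: (key, value) pairs in no particular order.
data = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]

# setdefault on a plain dict
by_setdefault = {}
for key, value in data:
    by_setdefault.setdefault(key, []).append(value)

# defaultdict: the list factory runs on first access to a key
by_defaultdict = defaultdict(list)
for key, value in data:
    by_defaultdict[key].append(value)

# groupby: sort by key first (sorted() is stable, so value order is kept)
by_groupby = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))
}

print(by_setdefault)  # {'fruit': ['apple', 'pear'], 'veg': ['leek']}
```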
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Say you use something HTTP-ish with many headers; some are optional, but you want defaults for them:
headers = parse_headers(msg)  # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)
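A self-contained sketch of the same idea, with a hand-written dict standing in for the hypothetical parse_headers(msg) result and made-up header names. It also shows why a defaultdict would not help here: a missing key only appears once it is looked up.

```python
from collections import defaultdict

# Stand-in for parse_headers(msg); header names and values are made up.
headers = {"Host": "example.com", "Accept": "text/html"}
optional_headers = [("Accept", "*/*"), ("Connection", "close")]

for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)  # fills only the missing headers

print(headers)  # {'Host': 'example.com', 'Accept': 'text/html', 'Connection': 'close'}

# A defaultdict cannot pre-populate specific keys: nothing exists until it is accessed.
lazy = defaultdict(lambda: "close", {"Host": "example.com"})
print("Connection" in lazy)  # False
```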
Answer 1
I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)
It’s great for tweaking arguments in wrappers around functions that take keyword arguments.
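The same pattern works for any wrapper. In this minimal sketch, the hypothetical sorted_desc wraps the built-in sorted(), changing the default of its reverse keyword while still letting callers override it or pass other keywords through:

```python
def sorted_desc(iterable, **kwargs):
    """Like sorted(), but descending unless the caller passes reverse explicitly."""
    kwargs.setdefault("reverse", True)  # applied only if the caller did not pass reverse
    return sorted(iterable, **kwargs)

print(sorted_desc([3, 1, 2]))                    # [3, 2, 1]
print(sorted_desc([3, 1, 2], reverse=False))     # [1, 2, 3]
print(sorted_desc(["bb", "a", "ccc"], key=len))  # ['ccc', 'bb', 'a']
```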
Answer 2
defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for myStr in strings:  # "strings" is a hypothetical stand-in for lots of complicated stuff
    # stuff that generates unpredictable, possibly already seen strs
    strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I built that automatically increments an int and returns its value:
class intGen:
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i
If someone has a way to do this with defaultdict, I'd love to see it.
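One possible answer, as a sketch: defaultdict accepts any zero-argument callable as its factory, including a stateful one, and it only calls the factory when a key is actually missing. Here itertools.count is swapped in for intGen (passing an intGen() instance itself as the factory behaves the same way). For comparison, the setdefault version evaluates its default on every call, so its IDs are still unique but skip numbers for repeated strings. The word list is made up.

```python
from collections import defaultdict
from itertools import count

words = ["spam", "eggs", "spam", "ham"]  # made-up input with a repeat

# defaultdict: the factory runs only for keys that are missing.
ids = defaultdict(count(1).__next__)
for w in words:
    ids[w]  # looking up a missing key stores the next integer

# setdefault: next(counter) runs on every iteration, even for the second "spam".
counter = count(1)
gappy = {}
for w in words:
    gappy.setdefault(w, next(counter))

print(dict(ids))  # {'spam': 1, 'eggs': 2, 'ham': 3}
print(gappy)      # {'spam': 1, 'eggs': 2, 'ham': 4}
```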
Answer 3
I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
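A minimal sketch of both options: setdefault() on a plain OrderedDict, and a small hypothetical subclass (DefaultListOrderedDict is not a standard class) that adds defaultdict-like behaviour through dict's __missing__ hook. Since Python 3.7 a plain dict also preserves insertion order, so collections.defaultdict alone is often enough there.

```python
from collections import OrderedDict

# Option 1: setdefault() on a regular OrderedDict
od = OrderedDict()
od.setdefault("b", []).append(1)
od.setdefault("a", []).append(2)
od.setdefault("b", []).append(3)

# Option 2: a hypothetical subclass using the __missing__ hook
class DefaultListOrderedDict(OrderedDict):
    """OrderedDict that creates (and stores) an empty list for a missing key."""
    def __missing__(self, key):
        self[key] = value = []
        return value

dod = DefaultListOrderedDict()
dod["b"].append(1)
dod["a"].append(2)
dod["b"].append(3)

print(list(od.items()))   # [('b', [1, 3]), ('a', [2])]
print(list(dod.items()))  # [('b', [1, 3]), ('a', [2])]
```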
Answer 4
As most answers state, setdefault or defaultdict lets you set a default value when a key doesn't exist. However, I would like to point out a small caveat regarding the use cases of setdefault. When the Python interpreter executes setdefault, it always evaluates the second argument to the function, even if the key already exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for an optimization such as memoization. If you pass a recursive function call as the second argument to setdefault, you won't gain any performance from it, because Python will always evaluate the recursive call.
Since memoization was mentioned: if you are considering enhancing a function with memoization, a better alternative is the functools.lru_cache decorator, which handles the caching requirements of a recursive function better.
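To make the cost concrete, here is a sketch that counts how often the function body runs when computing the 20th Fibonacci number, once with a setdefault-based "memo" and once with functools.lru_cache. The call counters are added purely for illustration.

```python
from functools import lru_cache

setdefault_calls = 0
memo = {}

def fib_setdefault(n):
    """Attempted memoization with setdefault: the recursion is still exponential."""
    global setdefault_calls
    setdefault_calls += 1
    if n < 2:
        return n
    # Both recursive calls are evaluated before setdefault looks at memo,
    # so the cached value never saves any work.
    return memo.setdefault(n, fib_setdefault(n - 1) + fib_setdefault(n - 2))

lru_calls = 0

@lru_cache(maxsize=None)
def fib_lru(n):
    """Memoized with functools.lru_cache: the body runs once per distinct n."""
    global lru_calls
    lru_calls += 1
    if n < 2:
        return n
    return fib_lru(n - 1) + fib_lru(n - 2)

print(fib_setdefault(20), setdefault_calls)  # 6765 21891
print(fib_lru(20), lru_calls)                # 6765 21
```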
Answer 5
As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
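A minimal sketch of that idea, using nested plain dicts: inserting uses setdefault to create child nodes on demand, while querying uses get so that a lookup never adds nodes. The "$" end-of-word marker is an arbitrary choice for this example (it assumes words never contain "$").

```python
END = "$"  # arbitrary end-of-word marker; assumes words never contain "$"

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # create the child node only if it is missing
    node[END] = True

def trie_contains(root, word):
    node = root
    for ch in word:
        node = node.get(ch)  # get() never inserts, so queries leave the trie unchanged
        if node is None:
            return False
    return END in node

root = {}
for w in ["tea", "ten"]:
    trie_insert(root, w)

print(trie_contains(root, "tea"), trie_contains(root, "te"), trie_contains(root, "to"))
# True False False
print(sorted(root["t"]))  # ['e'], so the failed query for "to" created nothing
```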
Answer 6
Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up in the standard library (Python 2.6, _threading_local.py):
>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library, and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writable at any time after the object's creation. It is also a plain dictionary, not a defaultdict. It would not be sensible for objects in general to have a defaultdict as their __dict__, because that would make every object have every legal identifier as an attribute. So I can't foresee any change to Python objects that would get rid of __dict__.setdefault, apart from deleting it altogether if it were deemed not useful.
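A runnable Python 3 version of the docstring example, as a sketch: threading.local gives each thread its own __dict__, and __dict__.setdefault adds an attribute for the current thread only if it is not already there. The attribute names come from that example; the worker function is illustrative.

```python
import threading

mydata = threading.local()
mydata.number = 42

widgets = mydata.__dict__.setdefault("widgets", [])  # creates the attribute for this thread
widgets.append("button")
again = mydata.__dict__.setdefault("widgets", [])    # already set: the existing list is returned

seen_in_other_thread = []

def worker():
    # Each thread starts with its own, empty thread-local __dict__.
    seen_in_other_thread.append(hasattr(mydata, "widgets"))

t = threading.Thread(target=worker)
t.start()
t.join()

print(mydata.widgets, again is widgets, seen_in_other_thread)  # ['button'] True [False]
```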
Answer 7
One drawback of defaultdict compared with dict (dict.setdefault) is that a defaultdict object creates a new item EVERY time a nonexistent key is looked up with d[key], even in read-only expressions (e.g. with == or print). Also, the defaultdict class is generally much less common than the dict class, and in my experience it is more difficult to serialize.
P.S. IMO, functions and methods not meant to mutate an object should not mutate an object.
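A short sketch of that side effect: a comparison on d[key] inserts the key into a defaultdict, whereas get() and the in operator are read-only on both a defaultdict and a plain dict.

```python
from collections import defaultdict

dd = defaultdict(list)
is_empty = dd["missing"] == []  # the lookup itself inserts "missing"
print(dict(dd))  # {'missing': []}

# Read-only alternatives that do not insert anything:
print(dd.get("other"), "other" in dd)  # None False
plain = {}
print(plain.get("missing", []) == [], plain)  # True {}
```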
Answer 8
Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass  # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])
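The docstring above mentions leaving empty lists behind. As a minimal sketch, a pair of hypothetical helpers (add_value and remove_value are not standard functions) can keep such a multi-valued dict tidy by deleting a key once its last value is removed:

```python
def add_value(d, key, value):
    """Append value to the list stored under key, creating the list if needed."""
    d.setdefault(key, []).append(value)

def remove_value(d, key, value):
    """Remove one occurrence of value; drop the key when its list becomes empty.

    Raises KeyError if key is absent and ValueError if value is not in its list.
    """
    values = d[key]
    values.remove(value)
    if not values:
        del d[key]

e = {}
add_value(e, "Cars", "Toyota")
add_value(e, "Cars", "Honda")
add_value(e, "Motorcycles", "Yamaha")
remove_value(e, "Motorcycles", "Yamaha")
print(e)  # {'Cars': ['Toyota', 'Honda']}
```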
Answer 9
I rewrote the accepted answer to make it easier for newcomers to follow.
from collections import defaultdict
data = [("a", 1), ("b", 2), ("a", 3)]  # hypothetical sample input: (key, value) pairs
# break it down and understand it intuitively
new = {}
for key, value in data:
    if key not in new:
        new[key] = []  # this is the core of setdefault, equivalent to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)
# easy with setdefault
new = {}
for key, value in data:
    group = new.setdefault(key, [])  # sets new[key] = [] only if key is missing, then returns new[key]
    group.append(value)
# even simpler with defaultdict
new = defaultdict(list)
for key, value in data:
    new[key].append(value)  # all keys have a default value of empty list []
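To make "the core of setdefault" concrete, here is a rough pure-Python model of what dict.setdefault does. It is a sketch for understanding only (in CPython the real method is implemented in C), and the name setdefault_model is made up:

```python
def setdefault_model(d, key, default=None):
    """Rough model of d.setdefault(key, default)."""
    if key not in d:
        d[key] = default  # stored only when the key is missing
    return d[key]         # always returns the value now stored under key

a, b = {"x": 1}, {"x": 1}
print(setdefault_model(a, "x", 99), b.setdefault("x", 99))  # 1 1
print(setdefault_model(a, "y"), b.setdefault("y"))          # None None
print(a == b)  # True
```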
Additionally, I categorized the methods for reference:
dict_methods_11 = {
    'views': ['keys', 'values', 'items'],
    'add': ['update', 'setdefault'],
    'remove': ['pop', 'popitem', 'clear'],
    'retrieve': ['get'],
    'copy': ['copy', 'fromkeys'],
}
Answer 10
I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'
It's worth noting that you can also use the return value:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that’s less necessary than it was before defaultdicts existed.
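A small sketch of how os.environ.setdefault behaves, using made-up variable names: the first call sets the value, a later call leaves it alone and returns the existing value, and, because os.environ only holds strings, a non-str default raises TypeError.

```python
import os

# MY_APP_VENV_DIR and MY_APP_PORT are made-up names used only for this example.
os.environ.pop("MY_APP_VENV_DIR", None)  # start from a known state
os.environ.pop("MY_APP_PORT", None)

first = os.environ.setdefault("MY_APP_VENV_DIR", "/my/default/path")
second = os.environ.setdefault("MY_APP_VENV_DIR", "/some/other/path")  # already set: unchanged
print(first, second)  # /my/default/path /my/default/path

# Values in os.environ must be strings.
raised = False
try:
    os.environ.setdefault("MY_APP_PORT", 8000)
except TypeError:
    raised = True
print(raised)  # True
```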
Answer 11
Another use case that I don't think was mentioned above: sometimes you keep a cache dict of objects by their id, where the primary instance lives in the cache, and you want to add an object to the cache when it is missing.
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id, no matter how you obtain obj each time; for example, when object attributes get updated in memory and saving to storage is deferred.
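A minimal sketch of such an identity map. Record and Repository are hypothetical classes; the only part taken from the answer is the setdefault call keyed on obj.id.

```python
class Record:
    """Hypothetical domain object with an id attribute."""
    def __init__(self, record_id, name):
        self.id = record_id
        self.name = name

class Repository:
    """Hypothetical cache that keeps exactly one instance per id."""
    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # Keep the first instance seen for this id; later copies are discarded.
        return self.objects_by_id.setdefault(obj.id, obj)

repo = Repository()
a = repo.canonical(Record(1, "Alice"))
a.name = "Alice (edited, not yet saved)"  # in-memory change, saving deferred
b = repo.canonical(Record(1, "Alice"))     # same id obtained again, e.g. reloaded from storage
print(b is a, b.name)  # True Alice (edited, not yet saved)
```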
Answer 12
One very important use case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading
class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()
    def __eq__(self, other):
        return self is other
    def __hash__(self):
        return hash(self.value)
seen = set()
class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))
threads = []
for i in range(8):
    threads.append(cycle_enum())
for t in threads:
    t.start()
for t in threads:
    t.join()
len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member: if another one has already been saved, it is used instead of the new one, guaranteeing unique Enum members.
Answer 13
[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.
Expanding on Tuttle's answer: for me, the best use case is a cache mechanism. Instead of:
if x not in memo:
    memo[x] = long_computation(x)
return memo[x]
which consumes 3 lines and 2 or 3 lookups, I would happily write:
return memo.setdefault(x, long_computation(x))
Answer 14
Answer 15
A different use case for setdefault() is when you don't want to overwrite the value of an already set key. Item assignment on a defaultdict overwrites, while setdefault() does not. For nested dictionaries, it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the existing sub-dictionary. This is when you use setdefault().
Example with defaultdict:
>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
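For the nested-dictionary case, a small sketch with made-up settings: chained setdefault calls reuse an existing sub-dictionary and only fill in missing keys, whereas plain assignment such as config["db"] = {"port": 5432} would discard the existing "host" entry.

```python
config = {"db": {"host": "localhost"}}  # made-up nested settings

# The existing sub-dict is reused (not replaced), and only missing leaf keys are filled.
config.setdefault("db", {}).setdefault("port", 5432)
config.setdefault("db", {}).setdefault("host", "example.org")  # host already set: unchanged
config.setdefault("cache", {}).setdefault("ttl", 60)           # whole branch created

print(config)
# {'db': {'host': 'localhost', 'port': 5432}, 'cache': {'ttl': 60}}
```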