Question: Objects of a custom type as dictionary keys
What must I do to use objects of a custom type as keys in a Python dictionary (where I don't want the "object id" to act as the key), e.g.
    class MyThing:
        def __init__(self, name, location, length):
            self.name = name
            self.location = location
            self.length = length
I'd want to use MyThing objects as keys that are considered the same if name and location are the same.
Coming from C#/Java, I'm used to overriding and providing an equals and a hashcode method, and promising not to mutate anything the hashcode depends on.
What must I do in Python to accomplish this? Should I even?
(In a simple case like this one, it might be better to just use a (name, location) tuple as the key, but assume I want the key to be an object.)
Answer 0
You need to add two methods, __hash__ and __eq__:
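    class MyThing:
        def __init__(self, name, location, length):
            self.name = name
            self.location = location
            self.length = length

        def __hash__(self):
            # Hash only the fields that define equality
            return hash((self.name, self.location))

        def __eq__(self, other):
            # Let Python handle comparisons with unrelated types
            if not isinstance(other, MyThing):
                return NotImplemented
            return (self.name, self.location) == (other.name, other.location)

        def __ne__(self, other):
            # Not strictly necessary (Python 3 derives != from __eq__), but in
            # Python 2 it avoids having both x == y and x != y True at the same time
            return not (self == other)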
The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.
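To see the effect, here is a minimal sketch of such a class used as a dictionary key. The `inventory` dict and the sample values are made up for illustration, and `length` is deliberately left out of the comparison, as the question asks.

```python
class MyThing:
    def __init__(self, name, location, length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        if not isinstance(other, MyThing):
            return NotImplemented
        return (self.name, self.location) == (other.name, other.location)


# Hypothetical sample data: both keys share name and location
inventory = {}
inventory[MyThing("rope", "shed", 10)] = "first"
inventory[MyThing("rope", "shed", 25)] = "second"  # overwrites the value

# Any MyThing with the same name and location finds the entry
lookup = inventory[MyThing("rope", "shed", 0)]
stored_key = next(iter(inventory))
```

After this runs, the dict holds a single entry whose value is "second". The stored key is still the first object (length 10), because assigning through an equal key replaces only the value.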
Answer 1
An alternative in Python 2.6 or above is to use collections.namedtuple(), which saves you writing any special methods:
    from collections import namedtuple

    MyThingBase = namedtuple("MyThingBase", ["name", "location"])

    class MyThing(MyThingBase):
        def __new__(cls, name, location, length):
            obj = MyThingBase.__new__(cls, name, location)
            obj.length = length
            return obj

    a = MyThing("a", "here", 10)
    b = MyThing("a", "here", 20)
    c = MyThing("c", "there", 10)

    a == b
    # True
    hash(a) == hash(b)
    # True
    a == c
    # False
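As a rough sketch of how this behaves as a dictionary key (the `sizes` dict and its values are invented for the example): since the class is still a tuple underneath, a plain (name, location) tuple should find the same entry, and the key fields cannot be reassigned after creation.

```python
from collections import namedtuple

MyThingBase = namedtuple("MyThingBase", ["name", "location"])


class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length  # extra attribute, not part of the tuple
        return obj


key = MyThing("a", "here", 10)
sizes = {key: "small"}  # hypothetical sample data

# Same name and location, different length: same key
found = sizes[MyThing("a", "here", 20)]

# A plain tuple with the same items compares and hashes the same way
also_found = sizes[("a", "here")]

# The fields used for hashing are read-only on a namedtuple
try:
    key.name = "z"
    name_is_writable = True
except AttributeError:
    name_is_writable = False
```

This gives you the "don't mutate what the hash depends on" guarantee for free, while `length` stays an ordinary mutable attribute.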
Answer 2
You override __hash__ if you want special hash semantics, and __cmp__ (Python 2 only) or __eq__ in order to make your class usable as a key. Objects that compare equal need to have the same hash value.
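Python 3 enforces part of this rule for you: a class that defines __eq__ without defining __hash__ has its __hash__ set to None, so its instances cannot be used as keys at all. A minimal sketch, assuming Python 3 (`OnlyEq` is a made-up class for illustration):

```python
class OnlyEq:
    """Defines value-based equality but no matching __hash__."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.name == other.name


# Using an instance as a key fails, because __hash__ is now None
try:
    {OnlyEq("a"): 1}
    usable_as_key = True
except TypeError:
    usable_as_key = False
```

So whenever you override __eq__ and still want dictionary keys, define __hash__ over the same fields.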
Python expects __hash__ to return an integer; returning Banana() is not recommended :)
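For the curious, this is roughly what happens if you try it anyway. Both classes are invented for the joke; the only real behavior shown is that hash() rejects a non-integer result.

```python
class Banana:
    pass


class Fruit:
    def __hash__(self):
        # Deliberately wrong: __hash__ must return an int
        return Banana()


try:
    hash(Fruit())
    hash_accepted = True
except TypeError:
    hash_accepted = False
```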
User-defined classes have a default __hash__ that is based on id(self), as you noted.
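A small sketch of that default, identity-based behavior (`Plain` is a hypothetical class with no special methods): two objects with identical contents are still different keys.

```python
class Plain:
    """No __eq__ or __hash__: equality and hashing fall back to identity."""

    def __init__(self, name):
        self.name = name


first = Plain("a")
second = Plain("a")  # same contents, different object

table = {first: "value"}

first_found = first in table    # the very same object is found
second_found = second in table  # an equal-looking copy is not
```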
There are some extra tips in the documentation:
Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).
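The difference described in that passage can be sketched as follows. Both class names are invented, and the example assumes Python 3, where the abstract base class is imported from collections.abc rather than collections.

```python
from collections.abc import Hashable


class FlaggedUnhashable:
    # Explicitly marked as unhashable, as the documentation suggests
    __hash__ = None


class RaisesInHash:
    # Also unusable as a key, but only discovered when hash() is called
    def __hash__(self):
        raise TypeError("unhashable type: 'RaisesInHash'")


flagged_is_hashable = isinstance(FlaggedUnhashable(), Hashable)
raising_is_hashable = isinstance(RaisesInHash(), Hashable)

# Both still fail when actually hashed
failures = 0
for obj in (FlaggedUnhashable(), RaisesInHash()):
    try:
        hash(obj)
    except TypeError:
        failures += 1
```

Here `flagged_is_hashable` comes out False while `raising_is_hashable` comes out True, even though neither object can be hashed, which is why setting __hash__ = None is the cleaner way to opt out.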