What does "hashable" mean in Python?

Question: What does "hashable" mean in Python?

I tried searching the internet but could not find the meaning of hashable.

When people say that objects are hashable, or talk about hashable objects, what does it mean?


Answer 0

From the Python glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().
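The glossary's rules can be sketched for a user-defined class. This is only an illustration: `Point` and `Plain` are hypothetical classes, not something from the glossary. `Point` defines `__eq__()` and `__hash__()` over the same fields so that equal instances hash equally, while `Plain` relies on the default behaviour.

```python
class Point:
    """Hypothetical value class: two points are equal when their coordinates match."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal points hash equally.
        return hash((self.x, self.y))


class Plain:
    """No __eq__ or __hash__ defined: instances are hashable by default."""


p1 = Point(1, 2)
p2 = Point(1, 2)
lookup = {p1: "found"}
value_via_equal_key = lookup[p2]   # p2 is a different object but an equal key

a, b = Plain(), Plain()
plain_set = {a, b}                 # both kept: default instances compare unequal
```

Note that in Python 3 a class that defines `__eq__()` without `__hash__()` gets `__hash__` set to `None`, which makes its instances unhashable; that is why `Point` defines both.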


Answer 1

All the answers here give a good working explanation of hashable objects in Python, but I believe one needs to understand the term hashing first.

Hashing is a concept in computer science that is used to create high-performance, pseudo-random-access data structures in which a large amount of data is to be stored and accessed quickly.

For example, suppose you have 10,000 phone numbers and want to store them in an array (a sequential data structure that stores data in contiguous memory locations and provides random access), but you might not have the required number of contiguous memory locations.

So you can instead use an array of size 100 and use a hash function to map each value to one of its indices. Several values will map to the same index, and those values can be stored together in a linked list. This provides performance similar to an array.

Now, a hash function can be as simple as dividing the number by the size of the array and taking the remainder as the index.
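The scheme described above can be sketched in a few lines. This is a minimal illustration, not how Python's own dict is implemented: the names `bucket_index`, `build_table` and `contains` are made up for the example, the phone numbers are invented sample data, and a Python list stands in for each linked list.

```python
TABLE_SIZE = 100  # number of buckets, as in the example above


def bucket_index(number, size=TABLE_SIZE):
    """Remainder of the division by the table size, used as the bucket index."""
    return number % size


def build_table(numbers, size=TABLE_SIZE):
    """Store each number in the bucket its hash selects (a list plays the linked list)."""
    table = [[] for _ in range(size)]
    for number in numbers:
        table[bucket_index(number, size)].append(number)
    return table


def contains(table, number):
    """Only one short bucket is scanned instead of the whole collection."""
    return number in table[bucket_index(number, len(table))]


phone_numbers = [5550100 + i for i in range(10_000)]  # made-up sample data
table = build_table(phone_numbers)
```

With this data every bucket ends up holding 100 numbers, so a lookup such as `contains(table, 5550123)` inspects 100 entries rather than 10,000.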

For more detail refer to https://en.wikipedia.org/wiki/Hash_function

Here is another good reference: http://interactivepython.org/runestone/static/pythonds/SortSearch/Hashing.html


Answer 2

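Anything that is not mutable (mutable means it can be changed in place) can be hashed, provided everything it contains is hashable too. You can also check whether a class defines a hash function, e.g. by running dir(tuple) and looking for the __hash__ method (for unhashable types such as list the attribute exists but is set to None). Here are some examples: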

#x = hash(set([1,2])) # set is mutable, unhashable
x = hash(frozenset([1,2])) # hashable
#x = hash(([1,2], [2,3])) # tuple of mutable objects, unhashable
x = hash((1,2,3)) # tuple of immutable objects, hashable
#x = hash() # TypeError: hash() takes exactly one argument
#x = hash({1,2}) # set literal, mutable, unhashable
#x = hash([1,2,3]) # list, unhashable even when its items are immutable

List of immutable types:

int, float, decimal.Decimal, complex, bool, str, tuple, range, frozenset, bytes

List of mutable types:

list, dict, set, bytearray, instances of user-defined classes (mutable, but still hashable by default)

Answer 3

As I understand the Python glossary, an instance of a hashable type has a hash value that is calculated from the members or values of the instance and never changes. That makes the instance usable, for example, as a key in a dict, as below:

>>> tuple_a = (1,2,3)
>>> tuple_a.__hash__()
2528502973977326415
>>> tuple_b = (2,3,4)
>>> tuple_b.__hash__()
3789705017596477050
>>> tuple_c = (1,2,3)
>>> tuple_c.__hash__()
2528502973977326415
>>> id(tuple_a) == id(tuple_c)  # tuple_a and tuple_c same object?
False
>>> tuple_a.__hash__() == tuple_c.__hash__()  # tuple_a and tuple_c same hash?
True
>>> dict_a = {}
>>> dict_a[tuple_a] = 'hiahia'
>>> dict_a[tuple_c]
'hiahia'

We can see that the hash values of tuple_a and tuple_c are the same, since they have the same members. When we use tuple_a as a key in dict_a, looking up dict_a[tuple_c] returns the same value: the two tuples compare equal and have equal hashes, so the dict treats them as the same key. For objects that are not hashable, the __hash__ attribute is set to None:

>>> type(dict.__hash__) 
<class 'NoneType'>

The hash is calculated from the contents when __hash__() is called (some types, such as str, cache the result), so it would change if the contents changed. That is why, among the built-in types, only the immutable ones are hashable. Hope this helps.
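A dict does not rely on the hash alone: keys with the same hash are still compared with `==`. The sketch below tries to show this with a hypothetical class `SameHash`, invented for the illustration, whose instances deliberately all return the same hash value.

```python
class SameHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.name == other.name

    def __hash__(self):
        return 42  # deliberately constant: legal, just slow for large dicts


d = {SameHash("a"): 1, SameHash("b"): 2}
number_of_keys = len(d)            # unequal keys stay separate despite equal hashes
value_for_a = d[SameHash("a")]     # an equal key finds the stored entry
numeric_lookup = {1: "one"}[1.0]   # 1 == 1.0 and hash(1) == hash(1.0)
```

Equal hashes only narrow the search; equality decides whether two keys are the same entry.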


Answer 4

Let me give you a working example to understand hashable objects in Python. I am taking two tuples for this example. Each value in a tuple has a hash value that never changes during its lifetime. Based on these hash values, dicts and sets can quickly decide whether two tuples might be equal before comparing them fully. We can get the hash value of a tuple or of a tuple element using hash(); id() returns the object's identity, which is a different thing.
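A minimal sketch of such a two-tuple example follows; the variable names are chosen for the illustration only. The second tuple is built at run time so that it is guaranteed to be a separate object from the first.

```python
first = (1, 2, 3)
second = tuple([1, 2, 3])   # built at run time, so it is a separate object

same_value = first == second
same_hash = hash(first) == hash(second)
same_object = first is second

# hash() of an element stays the same for as long as the element exists.
element_hash_stable = hash(first[0]) == hash(first[0])
```

Here `same_value` and `same_hash` are true while `same_object` is false: the hash follows the value of the tuple, not its identity.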


Answer 5

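In Python it means that the object can be a member of a set or a key of a dict, because it provides a hash value that these containers use to locate it. That hash value is based on the object's value or, for instances of user-defined classes by default, on its identity (id).

For example, in Python 3.3:

Lists are not hashable, but tuples are hashable (as long as their elements are).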


Answer 6

Hashable = capable of being hashed.

Ok, what is hashing? A hashing function is a function which takes an object, say a string such as “Python,” and returns a fixed-size code. For simplicity, assume the return value is an integer.

When I run hash('Python') in Python 3, I get 5952713340227947791 as the result. Different versions of Python are free to change the underlying hash function, and since Python 3.3 string hashes are also randomized per process by default (unless PYTHONHASHSEED is set), so you will likely get a different value. The important thing is that no matter how many times I run hash('Python') within the same Python process, I'll always get the same result.

But hash('Java') returns 1753925553814008565. So if the object I am hashing changes, so does the result. On the other hand, if the object I am hashing does not change, then the result stays the same.

Why does this matter?

Well, Python dictionaries, for example, require the keys to be hashable, and for built-in types that means immutable. That is, keys must be objects whose value does not change. Strings are immutable in Python, as are the other basic types (int, float, bool). Tuples and frozensets are also immutable. Lists, on the other hand, are not immutable (i.e., they are mutable) because you can change them. Similarly, dicts are mutable.

So when we say a built-in object is hashable, we usually mean it is immutable. If I try to pass a mutable built-in type to the hash() function, it will fail:

>>> hash('Python')
1687380313081734297
>>> hash('Java')
1753925553814008565
>>>
>>> hash([1, 2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> hash({1, 2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> hash({1 : 2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>>
>>> hash(frozenset({1, 2}))
-1834016341293975159
>>> hash((1, 2))
3713081631934410656
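Immutability alone is not quite the whole story. The sketch below, which uses a hypothetical helper `is_hashable` written for this illustration, suggests that a tuple is hashable only when everything inside it is hashable as well.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    "tuple of ints": is_hashable((1, 2)),
    "tuple holding a list": is_hashable(([1, 2], 3)),
    "frozenset": is_hashable(frozenset({1, 2})),
    "list": is_hashable([1, 2]),
}
```

The tuple holding a list cannot be reassigned element by element, yet hashing it fails because the hash of a tuple is computed from the hashes of its items.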

Answer 7

In Python, immutable built-in objects (such as an integer, boolean, string, or tuple of such values) are hashable, meaning they have a hash value that does not change during their lifetime. Dictionaries use this hash value to look up keys quickly and sets use it to track unique values; the hash is not guaranteed to be unique, so equality is checked as well.

This is why Python requires us to use hashable, typically immutable, datatypes for the keys in a dictionary.


Answer 8

To create a hash table from scratch, all the slots are initially set to None and filled in as values arrive. Hashable objects are the datatypes that cannot be modified in place (such as strings and tuples), not the modifiable ones (dictionaries, lists, etc.). Sets can also be modified after they are created, so sets are not hashable, whereas frozenset(), the immutable variant of set(), is hashable.