Why does Python's hash of infinity have the digits of π?

Question: Why does Python's hash of infinity have the digits of π?

The hash of infinity in Python has digits matching pi:

>>> import math
>>> inf = float('inf')
>>> hash(inf)
314159
>>> int(math.pi*1e5)
314159

Is that just a coincidence or is it intentional?


Answer 0

_PyHASH_INF is defined as a constant equal to 314159.

I can’t find any discussion about this, or comments giving a reason. I think it was chosen more or less arbitrarily. I imagine that as long as they don’t use the same meaningful value for other hashes, it shouldn’t matter.


Answer 1

Summary: It’s not a coincidence; _PyHASH_INF is hardcoded as 314159 in the default CPython implementation of Python, and was picked as an arbitrary value (obviously from the digits of π) by Tim Peters in 2000.


The value of hash(float('inf')) is one of the system-dependent parameters of the built-in hash function for numeric types, and is also available as sys.hash_info.inf in Python 3:

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> sys.hash_info.inf
314159

(Same results with PyPy too.)
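
As a quick check of the claim above, the following sketch compares the hashes of both infinities against sys.hash_info.inf. It assumes a Python 3 interpreter (in Python 2 the negative case differs), and the Decimal comparison is an extra illustration that is not part of the original answer; the variable names are made up for this example.

```python
import sys
from decimal import Decimal

pos_inf = float('inf')
neg_inf = float('-inf')

# In Python 3 both infinities hash to +/- sys.hash_info.inf.
float_matches = (hash(pos_inf) == sys.hash_info.inf
                 and hash(neg_inf) == -sys.hash_info.inf)

# Numeric types that compare equal must hash equal, so a Decimal
# infinity is expected to share the same hash value.
decimal_matches = hash(Decimal('Infinity')) == hash(pos_inf)

print(float_matches, decimal_matches)
```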


In terms of code, hash is a built-in function. Calling it on a Python float object invokes the function whose pointer is stored in the tp_hash slot of the built-in float type (PyTypeObject PyFloat_Type). That slot points to float_hash, whose body is just return _Py_HashDouble(v->ob_fval), and _Py_HashDouble in turn has

    if (Py_IS_INFINITY(v))
        return v > 0 ? _PyHASH_INF : -_PyHASH_INF;

where _PyHASH_INF is defined as 314159:

#define _PyHASH_INF 314159
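
The infinity branch shown above can be mirrored in Python for illustration. This is only a rough sketch, not CPython's implementation: hash_infinity_branch is a hypothetical name, it reads the constant from sys.hash_info.inf rather than hardcoding it, and it simply defers to the built-in hash for every finite value.

```python
import math
import sys

def hash_infinity_branch(v: float) -> int:
    """Mimic only the infinity branch of _Py_HashDouble (Python 3 behaviour)."""
    if math.isinf(v):
        # Same ternary as the C code: +inf -> _PyHASH_INF, -inf -> -_PyHASH_INF
        return sys.hash_info.inf if v > 0 else -sys.hash_info.inf
    # Everything else is outside this sketch; defer to the real hash.
    return hash(v)

print(hash_infinity_branch(float('inf')) == hash(float('inf')))
print(hash_infinity_branch(float('-inf')) == hash(float('-inf')))
```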

In terms of history, the first mention of 314159 in this context in the Python code (you can find this with git bisect or git log -S 314159 -p) was added by Tim Peters in August 2000, in what is now commit 39dce293 in the cpython git repository.

The commit message says:

Fix for http://sourceforge.net/bugs/?func=detailbug&bug_id=111866&group_id=5470. This was a misleading bug — the true “bug” was that hash(x) gave an error return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to pyport.h. Rearranged code to reduce growing duplication in hashing of float and complex numbers, pushing Trent’s earlier stab at that to a logical conclusion. Fixed exceedingly rare bug where hashing of floats could return -1 even if there wasn’t an error (didn’t waste time trying to construct a test case, it was simply obvious from the code that it could happen). Improved complex hash so that hash(complex(x, y)) doesn’t systematically equal hash(complex(y, x)) anymore.

In particular, in this commit he ripped out the code of static long float_hash(PyFloatObject *v) in Objects/floatobject.c and made it just return _Py_HashDouble(v->ob_fval);, and in the definition of long _Py_HashDouble(double v) in Objects/object.c he added the lines:

        if (Py_IS_INFINITY(intpart))
            /* can't convert to long int -- arbitrary */
            v = v < 0 ? -271828.0 : 314159.0;

So as mentioned, it was an arbitrary choice. Note that 271828 is formed from the first few decimal digits of e.
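
To see where the two constants come from, here is a small sketch that truncates π and e after five decimal places, the same trick the question uses for π. The variable names are chosen for this example only.

```python
import math

# Shift five decimal places left of the point, then truncate.
digits_of_pi = int(math.pi * 1e5)
digits_of_e = int(math.e * 1e5)

print(digits_of_pi)  # 314159
print(digits_of_e)   # 271828
```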

Related later commits:


Answer 2

Indeed,

sys.hash_info.inf

returns 314159. The value is not generated; it is built into the source code. In fact,

hash(float('-inf'))

returns -271828, the leading digits of e with a minus sign, in Python 2 (it is -314159 now).

The fact that the two most famous irrational numbers of all time are used as the hash values makes it very unlikely to be a coincidence.