为什么tuple(set([(1,“ a”,“ b”,“ c”,“ z”,“ f”]))==元组(set([(a,b,c) “ z”,“ f”,1]))85%的时间启用了哈希随机化?

问题:为什么tuple(set([(1,“ a”,“ b”,“ c”,“ z”,“ f”]))==元组(set([(a,b,c) “ z”,“ f”,1]))85%的时间启用了哈希随机化?

鉴于零比雷埃夫斯对另一个问题的回答,我们认为

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

True在启用散列随机化的情况下,大约打印时间的85%。为什么是85%?

Given Zero Piraeus’ answer to another question, we have that

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

Prints True about 85% of the time with hash randomization enabled. Why 85%?


回答 0

我假设这个问题的所有读者都读过:

首先要注意的是,哈希随机化是由解释器启动决定的。

两组字母的哈希值都相同,因此唯一重要的是是否发生冲突(顺序会受到影响)。


通过第二个链接的推论,我们知道这些集合的支持数组从长度8开始:

_ _ _ _ _ _ _ _

在第一种情况下,我们插入1

_ 1 _ _ _ _ _ _

然后插入其余部分:

α 1 ? ? ? ? ? ?

然后将其重新映射为32号:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

在第二种情况下,我们插入其余部分:

? β ? ? ? ? ? ?

然后尝试插入1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

然后它将被修复:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

因此,迭代次数是否不同仅取决于β是否存在。


β的机率是5个字母中的任何一个都将以1模8 哈希以1模32哈希的概率。

由于任何哈希到1模32的东西也哈希到1模8,我们想找到32个插槽的机会,所以五个插槽之一在插槽1中:

5 (number of letters) / 32 (number of slots)

5/32为0.15625,因此在两个固定结构之间有15.625%的订单概率不同


一点也不奇怪,这正是零比雷埃夫斯所测量的。


¹从技术上讲,这并不明显。我们可以假装5个散列中的每一个都是唯一的,因为需要重新哈希处理,但是由于线性探测,实际上更有可能发生“成束的”结构……但是因为我们只查看是否占用了一个插槽,所以实际上不会影响我们。

I’m going to assume any readers of this question to have read both:

The first thing to note is that hash randomization is decided on interpreter start-up.

The hash of each letter will be the same for both sets, so the only thing that can matter is if there is a collision (where order will be affected).


By the deductions of that second link we know the backing array for these sets starts at length 8:

_ _ _ _ _ _ _ _

In the first case, we insert 1:

_ 1 _ _ _ _ _ _

and then insert the rest:

α 1 ? ? ? ? ? ?

Then it is rehashed to size 32:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

In the second case, we insert the rest:

? β ? ? ? ? ? ?

And then try to insert 1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

And then it will be rehashed:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

So whether the iteration orders are different depends solely on whether β exists.


The chance of a β is the chance that any of the 5 letters will hash to 1 modulo 8 and hash to 1 modulo 32.

Since anything that hashes to 1 modulo 32 also hashes to 1 modulo 8, we want to find the chance that of the 32 slots, one of the five is in slot 1:

5 (number of letters) / 32 (number of slots)

5/32 is 0.15625, so there is a 15.625% chance¹ of the orders being different between the two set constructions.


Not very strangely at all, this is exactly what Zero Piraeus measured.


¹Technically even this isn’t obvious. We can pretend every one of the 5 hashes uniquely because of rehashing, but because of linear probing it’s actually more likely for “bunched” structures to occur… but because we’re only looking at whether a single slot is occupied, this doesn’t actually affect us.