## 问题：为什么tuple（set（[（1，“ a”，“ b”，“ c”，“ z”，“ f”]））==元组（set（[（a，b，c） “ z”，“ f”，1]））85％的时间启用了哈希随机化？

``````x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)``````

`True`在启用散列随机化的情况下，大约打印时间的85％。为什么是85％？

Given Zero Piraeus’ answer to another question, we have that

``````x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)
``````

Prints `True` about 85% of the time with hash randomization enabled. Why 85%?

## 回答 0

``_ _ _ _ _ _ _ _``

``_ 1 _ _ _ _ _ _``

``α 1 ? ? ? ? ? ?``

``````    1 can't collide with α as α is an even hash
↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?``````

``? β ? ? ? ? ? ?``

``````    Try to insert 1 here, but will
↓ be rehashed if β exists
? β ? ? ? ? ? ?``````

``````    Try to insert 1 here, but will
be rehashed if β exists and has
↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?``````

β的机率是5个字母中的任何一个都将以1模8 哈希以1模32哈希的概率。

``5 (number of letters) / 32 (number of slots)``

5/32为0.15625，因此在两个固定结构之间有15.625％的订单概率不同

¹从技术上讲，这并不明显。我们可以假装5个散列中的每一个都是唯一的，因为需要重新哈希处理，但是由于线性探测，实际上更有可能发生“成束的”结构……但是因为我们只查看是否占用了一个插槽，所以实际上不会影响我们。

I’m going to assume any readers of this question to have read both:

The first thing to note is that hash randomization is decided on interpreter start-up.

The hash of each letter will be the same for both sets, so the only thing that can matter is if there is a collision (where order will be affected).

By the deductions of that second link we know the backing array for these sets starts at length 8:

``````_ _ _ _ _ _ _ _
``````

In the first case, we insert `1`:

``````_ 1 _ _ _ _ _ _
``````

and then insert the rest:

``````α 1 ? ? ? ? ? ?
``````

Then it is rehashed to size 32:

``````    1 can't collide with α as α is an even hash
↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
``````

In the second case, we insert the rest:

``````? β ? ? ? ? ? ?
``````

And then try to insert 1:

``````    Try to insert 1 here, but will
↓ be rehashed if β exists
? β ? ? ? ? ? ?
``````

And then it will be rehashed:

``````    Try to insert 1 here, but will
be rehashed if β exists and has
↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
``````

So whether the iteration orders are different depends solely on whether β exists.

The chance of a β is the chance that any of the 5 letters will hash to 1 modulo 8 and hash to 1 modulo 32.

Since anything that hashes to 1 modulo 32 also hashes to 1 modulo 8, we want to find the chance that of the 32 slots, one of the five is in slot 1:

``````5 (number of letters) / 32 (number of slots)
``````

5/32 is 0.15625, so there is a 15.625% chance¹ of the orders being different between the two set constructions.

Not very strangely at all, this is exactly what Zero Piraeus measured.

¹Technically even this isn’t obvious. We can pretend every one of the 5 hashes uniquely because of rehashing, but because of linear probing it’s actually more likely for “bunched” structures to occur… but because we’re only looking at whether a single slot is occupied, this doesn’t actually affect us.