What does numpy.random.seed(0) do?

Question: What does numpy.random.seed(0) do?

What does np.random.seed do in the code below, taken from a Scikit-Learn tutorial? I’m not very familiar with NumPy’s random state generator stuff, so I’d really appreciate an explanation in layman’s terms.

np.random.seed(0)
indices = np.random.permutation(len(iris_X))

Answer 0

np.random.seed(0) makes the random numbers predictable

>>> numpy.random.seed(0) ; numpy.random.rand(4)
array([ 0.55,  0.72,  0.6 ,  0.54])
>>> numpy.random.seed(0) ; numpy.random.rand(4)
array([ 0.55,  0.72,  0.6 ,  0.54])

With the seed reset (every time), the same set of numbers will appear every time.

If the random seed is not reset, different numbers appear with every invocation:

>>> numpy.random.rand(4)
array([ 0.42,  0.65,  0.44,  0.89])
>>> numpy.random.rand(4)
array([ 0.96,  0.38,  0.79,  0.53])

A simple (pseudo-)random number generator works by starting with a number (the seed), multiplying it by a large number, adding an offset, then taking that sum modulo some fixed number. The resulting number is then used as the seed to generate the next “random” number. When you set the seed (every time), it does the same thing every time, giving you the same numbers. NumPy’s own generator is more elaborate than this, but the principle that the seed determines the sequence is the same.
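The multiply, add, and take-the-modulo scheme described above can be sketched in a few lines. This is only an illustration of the idea, not what NumPy does internally; the function names `lcg` and `take` and the constants `a`, `c`, and `m` are assumptions chosen for the example.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of pseudo-random integers in [0, m).

    Illustrative constants only; this is not NumPy's algorithm.
    """
    state = seed
    while True:
        # Multiply by a large number, add an offset, take the modulo.
        state = (a * state + c) % m
        # The new state is both the output and the next "seed".
        yield state


def take(gen, n):
    """Collect the first n values from a generator."""
    return [next(gen) for _ in range(n)]


first = take(lcg(0), 3)
again = take(lcg(0), 3)
other = take(lcg(1), 3)

print(first == again)  # same seed, same sequence
print(first == other)  # different seed, different sequence
```

Starting twice from the same seed replays the same numbers, which is the behaviour np.random.seed(0) gives you with NumPy's generator.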

If you want seemingly random numbers, do not set the seed. If you have code that uses random numbers that you want to debug, however, it can be very helpful to set the seed before each run so that the code does the same thing every time you run it.

To get the most random numbers for each run, call numpy.random.seed(). This will cause numpy to set the seed to a random number obtained from /dev/urandom or its Windows analog or, if neither of those is available, it will use the clock.

For more information on using seeds to generate pseudo-random numbers, see Wikipedia.


Answer 1

If you call np.random.seed(a_fixed_number) every time before you call one of NumPy’s other random functions, the result will be the same:

>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.rand(4))
[0.5488135  0.71518937 0.60276338 0.54488318]
>>> np.random.seed(0)
>>> print(np.random.rand(4))
[0.5488135  0.71518937 0.60276338 0.54488318]

However, if you call it just once and then use various random functions, the results will still differ from one call to the next:

>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> print(np.random.permutation(10))
[3 5 1 2 9 8 0 6 7 4]
>>> print(np.random.permutation(10))
[2 3 8 4 5 1 0 6 9 7]
>>> print(np.random.rand(4))
[0.64817187 0.36824154 0.95715516 0.14035078]
>>> print(np.random.rand(4))
[0.87008726 0.47360805 0.80091075 0.52047748]
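One way to read the two transcripts above: the seed fixes the whole sequence of draws, not each individual call. Here is a minimal sketch of that, assuming the global np.random functions used in this answer; the helper name `draw_sequence` is made up for illustration.

```python
import numpy as np


def draw_sequence(seed):
    """Seed the global generator once, then make several different draws."""
    np.random.seed(seed)
    return [
        np.random.permutation(10),
        np.random.permutation(10),
        np.random.rand(4),
    ]


run_a = draw_sequence(0)
run_b = draw_sequence(0)

# Within one run, each call is a separate draw from the stream.
print(run_a[0], run_a[1])

# Across runs, the whole sequence repeats once the seed is set again.
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same)
```

So seeding once at the top of a script is enough to make the entire run repeatable, even though consecutive calls inside that run return different values.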

Answer 2

As noted, numpy.random.seed(0) sets the random seed to 0, so the pseudo-random numbers you get from numpy.random will start from the same point. This can be good for debugging in some cases. However, after some reading, this seems to be the wrong way to go about it if you have threads, because it is not thread-safe.

From differences-between-numpy-random-and-random-random-in-python:

For numpy.random.seed(), the main difficulty is that it is not thread-safe; that is, it’s not safe to use if you have many different threads of execution, because it’s not guaranteed to work if two different threads are executing the function at the same time. If you’re not using threads, and if you can reasonably expect that you won’t need to rewrite your program this way in the future, numpy.random.seed() should be fine for testing purposes. If there’s any reason to suspect that you may need threads in the future, it’s much safer in the long run to do as suggested, and to make a local instance of the numpy.random.RandomState class. As far as I can tell, random.seed() from the standard library is thread-safe (or at least, I haven’t found any evidence to the contrary).

An example of how to go about this:

from numpy.random import RandomState
prng = RandomState()    # no seed: different result on each run
print(prng.permutation(10))
prng = RandomState()
print(prng.permutation(10))
prng = RandomState(42)  # fixed seed: same result every time
print(prng.permutation(10))
prng = RandomState(42)
print(prng.permutation(10))

may give:

[3 0 4 6 8 2 1 9 7 5]

[1 6 9 0 2 7 8 3 5 4]

[8 1 5 0 7 2 9 4 3 6]

[8 1 5 0 7 2 9 4 3 6]

Lastly, note that there might be cases where initializing to 0 (as opposed to a seed whose bits are not all 0) may result in non-uniform distributions for the first few iterations because of the way xor works, but this depends on the algorithm, and is beyond my current worries and the scope of this question.
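To make the "local instance" advice concrete, here is a rough sketch in which every thread builds its own RandomState instead of touching the global seed. The `worker` function, the `seeds` list, and the `results` list are assumptions made up for this example; only RandomState and permutation come from the answer above.

```python
import threading

from numpy.random import RandomState


def worker(seed, results, index):
    # Each thread builds and uses its own generator; nothing is shared.
    prng = RandomState(seed)
    results[index] = prng.permutation(10)


seeds = [42, 42, 7]
results = [None] * len(seeds)
threads = [
    threading.Thread(target=worker, args=(seed, results, i))
    for i, seed in enumerate(seeds)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The two threads seeded with 42 produce the same permutation,
# regardless of the order in which the threads happened to run.
print(list(results[0]) == list(results[1]))
```

Because no thread calls numpy.random.seed(), the result of each worker depends only on the seed it was given. Newer NumPy releases also offer numpy.random.default_rng for creating independent generators, which is not shown here.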


Answer 3

I have used this very often in neural networks. It is well known that when we start training a neural network we randomly initialise the weights. The model is then trained from these weights on a particular dataset. After a number of epochs you get a trained set of weights.

Now suppose you want to train again from scratch, or you want to pass the model to others so they can reproduce your results. The weights will again be initialised to random numbers, which will mostly be different from the earlier ones. The trained weights obtained after the same number of epochs (keeping the same data and other parameters) will differ from the earlier ones. The problem is that your model is no longer reproducible: every time you train it from scratch it gives you a different set of weights. This is because the model is initialised with different random numbers every time.

What if, every time you start training from scratch, the model is initialised to the same set of random initial weights? In that case your model could become reproducible. This is achieved by numpy.random.seed(0). By setting seed() to a particular number, you always get the same set of random numbers.
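A small sketch of the idea, assuming the weights are drawn with NumPy's global generator. The function `init_weights`, the layer sizes, and the 0.01 scale are hypothetical choices for illustration, not taken from any particular framework.

```python
import numpy as np


def init_weights(seed, n_inputs=4, n_hidden=3):
    """Return a randomly initialised weight matrix for one dense layer."""
    np.random.seed(seed)
    # Small Gaussian values, a common choice for initial weights.
    return np.random.randn(n_inputs, n_hidden) * 0.01


w_first = init_weights(0)
w_second = init_weights(0)
w_other = init_weights(1)

print(np.array_equal(w_first, w_second))  # same seed, same starting weights
print(np.array_equal(w_first, w_other))   # different seed, different weights
```

This only covers the randomness that comes from NumPy. If training also draws random numbers from somewhere else, such as a framework's own generator, that source would need its own seed.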


Answer 4

Imagine you are showing someone how to code something with a bunch of “random” numbers. By using numpy seed they can use the same seed number and get the same set of “random” numbers.

So it’s not exactly random because an algorithm spits out the numbers but it looks like a randomly generated bunch.


Answer 5

A random seed specifies the start point when a computer generates a random number sequence.

For example, let’s say you wanted to generate a random number in Excel (Note: Excel sets a limit of 9999 for the seed). If you enter a number into the Random Seed box during the process, you’ll be able to use the same set of random numbers again. If you typed “77” into the box, and typed “77” the next time you run the random number generator, Excel will display that same set of random numbers. If you type “99”, you’ll get an entirely different set of numbers. But if you revert back to a seed of 77, then you’ll get the same set of random numbers you started with.

As a toy illustration of how a generator steps from one number to the next, consider the rule “take a number x, add 900, then subtract 52.” In order for the process to start, you have to specify a starting number, x (the seed). Let’s take the starting number 77:

900 + 77 = 977, then 977 - 52 = 925

Following the same rule, the second “random” number would be:

900 + 925 = 1825, then 1825 - 52 = 1773

This simple example follows an obvious pattern, but the algorithms behind computer random number generation are much more complicated.
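The toy rule above is easy to write down as code. This is just the arithmetic from the example, not a real random number generator; the name `toy_generator` is made up for illustration.

```python
def toy_generator(seed, count):
    """Apply the toy rule x -> 900 + x - 52 repeatedly, starting from seed."""
    values = []
    x = seed
    for _ in range(count):
        x = 900 + x - 52
        values.append(x)
    return values


print(toy_generator(77, 2))  # [925, 1773]
print(toy_generator(77, 2) == toy_generator(77, 2))  # same seed, same numbers
print(toy_generator(99, 2) == toy_generator(77, 2))  # different seed, different numbers
```

Going back to a seed of 77 always replays 925, 1773, and so on, which is exactly the behaviour a fixed seed gives you.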


Answer 6

All the random numbers generated after setting a particular seed value are the same across all platforms/systems, as long as the same generator algorithm is used.


Answer 7

There is a nice explanation in the NumPy docs: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.RandomState.html. It refers to the Mersenne Twister pseudo-random number generator. More details on the algorithm are here: https://en.wikipedia.org/wiki/Mersenne_Twister


Answer 8

numpy.random.seed(0)
numpy.random.randint(10, size=5)

This produces the following output: array([5, 0, 3, 3, 7]). If we run the same code again, we will get the same result.

Now if we change the seed value 0 to 1 or others:

numpy.random.seed(1)
numpy.random.randint(10, size=5)

This produces the following output: array([5, 8, 9, 5, 0]), which is no longer the same as the output above.


Answer 9

All the answers above show how np.random.seed() is used in code. I’ll try my best to explain briefly why it behaves this way. Computers are machines that are designed to follow predefined algorithms. Any output from a computer is the result of an algorithm applied to some input. So when we ask a computer to generate random numbers, they certainly look random, but the computer did not just come up with them randomly!

So when we write np.random.seed(any_number_here), the algorithm will output a particular set of numbers that is determined by the argument any_number_here. It’s almost as if any particular set of random numbers could be obtained by passing the right argument. But finding that argument would require us to know how the algorithm works, which is quite tedious.

So, for example, if I write np.random.seed(10), the particular set of numbers that I obtain will remain the same even if I execute the same line 10 years from now, unless the algorithm changes.