Question: numpy: get a random set of rows from a 2D array

I have a very large 2D array which looks something like this:

a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there an easy way to get a new 2D array with, e.g., 2 random rows from the initial array a (without replacement)?

e.g.

b=
[[a4,  b4,  c4],
 [a99, b99, c99]]

Answer 0

>>> import numpy as np
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case (note that np.random.randint draws with replacement, so the same row can be picked twice):

A[np.random.randint(A.shape[0], size=2), :]
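Because np.random.randint draws each index independently, the picks can coincide. As a rough guide (standard birthday-problem arithmetic, not part of the original answer), the chance that k independent uniform draws from n rows contain at least one repeat is:

```latex
P(\text{repeat}) = 1 - \prod_{i=0}^{k-1} \frac{n - i}{n},
\qquad
k = 2:\; P = 1 - \frac{n(n-1)}{n^{2}} = \frac{1}{n}
```

So with the 10-row example above and size=2, about 1 draw in 10 returns the same row twice.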

For sampling without replacement (numpy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you can set up a small function that ensures the two values are not the same.
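A minimal sketch of such a helper, assuming the goal is k distinct rows. Instead of redrawing until the indices differ, it swaps in np.random.permutation (which, as far as I know, predates np.random.choice): a random ordering of all row indices contains no duplicates, so its first k entries are a sample without replacement. The function name random_rows_no_replacement is hypothetical.

```python
import numpy as np

def random_rows_no_replacement(A, k):
    """Return k distinct random rows of A (hypothetical helper name).

    Uses np.random.permutation instead of a redraw-until-different loop.
    """
    if k > A.shape[0]:
        raise ValueError("k cannot exceed the number of rows")
    # a random ordering of all row indices; the first k are distinct by construction
    idx = np.random.permutation(A.shape[0])[:k]
    return A[idx, :]

A = np.random.randint(5, size=(10, 3))
print(random_rows_no_replacement(A, 2))
```

Shuffling all n indices costs O(n) work per call, which is usually negligible next to the array itself.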


Answer 1

This is an old post, but this is what works best for me:

A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

Change replace=False to replace=True to get the same thing, but sampling with replacement.
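On NumPy 1.17+, the Generator API returned by np.random.default_rng() can also sample whole rows directly: its choice method takes an axis argument, whereas the legacy np.random.choice only accepts a 1-D a. A sketch, assuming a small hypothetical array A in place of real data:

```python
import numpy as np

rng = np.random.default_rng()      # NumPy 1.17+ Generator API
A = np.arange(30).reshape(10, 3)   # hypothetical data: 10 rows, 3 columns
num_rows_2_sample = 4

# sample whole rows directly: axis=0 treats each row as one item
without_repl = rng.choice(A, num_rows_2_sample, replace=False, axis=0)
with_repl = rng.choice(A, num_rows_2_sample, replace=True, axis=0)

# or, as in the answer above, sample row indices and index the array with them
idx = rng.choice(A.shape[0], num_rows_2_sample, replace=False)
by_index = A[idx]
```

The index-based form is handy when you also want to know which rows were picked.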


Answer 2

Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:

import numpy

# generate a random boolean mask with one entry per row of data_arr
# p=[0.75, 0.25]: each entry is False with probability 0.75, True with 0.25
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now data_arr[mask] returns roughly 25% of the rows, randomly sampled. The exact count varies from run to run, because each row is kept independently with probability 0.25.
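A runnable sketch of the mask approach on a hypothetical 10,000-row array. It builds the mask with rng.random(n) < 0.25, which has the same distribution as the numpy.random.choice call above, and then shows one way (an assumption, not part of the original answer) to get an exact count while still keeping the original row order:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the demo is reproducible

# hypothetical data set: 10,000 rows, 3 columns
data_arr = rng.integers(0, 100, size=(10_000, 3))

# keep each row independently with probability 0.25
mask = rng.random(len(data_arr)) < 0.25
down_sampled = data_arr[mask]

# boolean indexing keeps the original row order; the row count is only ~25%
print(down_sampled.shape[0] / data_arr.shape[0])

# if an exact count is needed, sample indices instead and sort them to keep order
n_keep = len(data_arr) // 4
exact = data_arr[np.sort(rng.choice(len(data_arr), n_keep, replace=False))]
print(exact.shape)  # (2500, 3)
```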


Answer 3

This is a similar answer to the one Hezi Rasheff provided, but simplified so newer Python users understand what's going on (I noticed many new data science students fetch random samples in the weirdest ways because they don't know what they are doing in Python).

You can get a number of random indices from your array by using:

indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)

You can then index your numpy array with those indices (integer-array, or "fancy", indexing) to get the samples at those positions:

A[indices]

This will get you the specified number of random samples from your data.
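One practical reason to keep the indices as a separate step: the same index array can be applied to several aligned arrays, so each sampled row stays paired with its label. A sketch, where X and y are hypothetical feature and label arrays with the same row order:

```python
import numpy as np

# hypothetical feature matrix and matching label vector (same row order)
X = np.arange(20).reshape(10, 2)
y = X[:, 0] * 10  # labels derived from X so the pairing is easy to check

amount_of_samples = 4
indices = np.random.choice(X.shape[0], amount_of_samples, replace=False)

# the same index array keeps each sampled row paired with its own label
X_sample = X[indices]
y_sample = y[indices]
```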


Answer 4

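If you just need a random sample of the existing rows, the standard library's random module also works:

import random
# random.sample wants a sequence, so sample row positions and index the NumPy array with them
new_array = old_array[random.sample(range(len(old_array)), x)]

Here x has to be an int giving the number of rows you want to pick at random. Calling random.sample(old_array, x) directly only works if old_array is a plain list (and then it returns a list of rows); a NumPy array is rejected on Python 3 with a TypeError, because random.sample requires a sequence.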
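A self-contained sketch of the standard-library route, assuming a small hypothetical old_array. random.sample only accepts sequences, so it samples row positions from range(...); a second variant samples the rows from a list and rebuilds a 2D array:

```python
import random
import numpy as np

# hypothetical 2D array standing in for old_array
old_array = np.arange(30).reshape(10, 3)
x = 4  # number of rows to pick

# range(len(...)) is a sequence, so sample row positions without replacement
row_idx = random.sample(range(len(old_array)), x)
new_array = old_array[row_idx]            # still a 2D NumPy array, shape (4, 3)

# alternative: sample the rows themselves from a list, then rebuild an array
rows = random.sample(list(old_array), x)  # list of 1D arrays
new_array_2 = np.array(rows)              # shape (4, 3)
```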


Answer 5

I see permutation has been suggested. In fact it can be made into one line:

>>> import numpy as np
>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]
array([[0, 3, 0],
       [3, 1, 2]])
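np.random.permutation(A) returns a shuffled copy of A along axis 0 (each row stays intact), so its first k rows are k distinct random rows. For a very large A, copying every row just to keep two is wasteful; permuting only the row indices gives the same kind of sample. A sketch using the Generator API (NumPy 1.17+), with a seed added only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only for reproducibility
A = rng.integers(0, 5, size=(10, 3))
k = 2

# shuffles a full copy of A along axis 0, then keeps the first k rows
sample_full = rng.permutation(A)[:k]

# permuting only the row indices avoids copying every row of a large A
idx = rng.permutation(A.shape[0])[:k]
sample_idx = A[idx]
```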

Answer 6

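If you want to generate multiple random subsets of rows (for example, if you're doing RANSAC), you can argsort a matrix of random numbers: each row of the argsort is a random permutation of the row indices, so its first pop_in_sample entries form a subset drawn without replacement.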

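import numpy as np

num_pop = 10        # number of rows in the population
num_samples = 2     # number of random subsets to draw
pop_in_sample = 3   # rows per subset
rows_to_sample = np.random.random([num_pop, 5])
# one row of random keys per subset; argsort turns each row into a random permutation
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]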
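The same argsort trick wrapped in a reusable function, as a sketch. The name random_row_subsets and its parameters are hypothetical; it relies on the fact that argsort of i.i.d. uniform keys is a uniformly random permutation (ties have probability essentially zero for floats):

```python
import numpy as np

def random_row_subsets(rows, num_samples, pop_in_sample, rng=None):
    """Draw num_samples subsets of pop_in_sample distinct rows each (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # one row of random keys per subset
    keys = rng.random((num_samples, rows.shape[0]))
    # first pop_in_sample positions of each random permutation
    idx = np.argsort(keys, axis=1)[:, :pop_in_sample]
    # rows[idx] has shape (num_samples, pop_in_sample, n_cols)
    return rows[idx], idx

rows_to_sample = np.random.default_rng(0).random((10, 5))
subsets, idx = random_row_subsets(rows_to_sample, num_samples=2, pop_in_sample=3)
print(subsets.shape)  # (2, 3, 5)
```

Sorting costs O(num_pop log num_pop) per subset, which is fine for moderate populations. On NumPy 1.20+, Generator.permuted(..., axis=1) applied to a tiled index array is another way to get independent per-row permutations.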
