numpy：从2D数组中获取随机的行集

Question 1

I have a very large 2D array which looks something like this:

a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there an easy way to get a new 2D array with, e.g., 2 random rows from the initial array a (without replacement)?

e.g.

b=
[[a4,  b4,  c4],
 [a99, b99, c99]]

Question 2

>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case:

A[np.random.randint(A.shape[0], size=2), :]

For non replacement (numpy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]

I do not believe there is a good way to generate random list without replacement before 1.7. Perhaps you can setup a small definition that ensures the two values are not the same.

Question 3

This is an old post, but this is what works best for me:

A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

change the replace=False to True to get the same thing, but with replacement.

Question 4

Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:

# generate random boolean mask the length of data
# use p 0.75 for False and 0.25 for True
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now you can call data_arr[mask] and return ~25% of the rows, randomly sampled.

Question 5

This is a similar answer to the one Hezi Rasheff provided, but simplified so newer python users understand what’s going on (I noticed many new datascience students fetch random samples in the weirdest ways because they don’t know what they are doing in python).

You can get a number of random indices from your array by using:

indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)

You can then use slicing with your numpy array to get the samples at those indices:

A[indices]

This will get you the specified number of random samples from your data.

Question 6

If you need the same rows but just a random sample then,

import random
new_array = random.sample(old_array,x)

Here x, has to be an ‘int’ defining the number of rows you want to randomly pick.

Question 7

I see permutation has been suggested. In fact it can be made into one line:

>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]

array([[0, 3, 0],
       [3, 1, 2]])

Question 8

If you want to generate multiple random subsets of rows, for example if your doing RANSAC.

num_pop = 10
num_samples = 2
pop_in_sample = 3
rows_to_sample = np.random.random([num_pop, 5])
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]

numpy：从2D数组中获取随机的行集

问题：numpy：从2D数组中获取随机的行集

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

如何在Python中生成键盘事件？

如何将布尔数组转换为int数组

检查是否安装了Python软件包

输出到同一行覆盖先前的输出？

如何从一个简单的字符串构造一个timedelta对象

py2exe-生成单个可执行文件

numpy：从2D数组中获取随机的行集

问题：numpy：从2D数组中获取随机的行集

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

相关文章

排行榜展示

文章展示