Tag archive: shuffle

Better way to shuffle two numpy arrays in unison

August 1, 2021, Python实用宝典

Question: Better way to shuffle two numpy arrays in unison

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond — i.e. shuffle them in unison with respect to their leading indices.

This code works, and illustrates my goals:

import numpy

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

For example:

>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
       [1, 1],
       [3, 3]]), array([2, 1, 3]))

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays — I’d rather shuffle them in-place, since they’ll be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was this:

def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)

This works… but it’s a little scary, as I see little guarantee it’ll continue to work. It doesn’t look like the sort of thing that’s guaranteed to survive across numpy versions, for example.

Answer 0

Your “scary” solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only “random” elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
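A quick way to check this claim is to shuffle two arrays whose rows are known to correspond and confirm that they still line up afterwards. The sketch below assumes the legacy global generator behind numpy.random (the one the question uses); the helper name and the test arrays are made up for illustration.

```python
import numpy as np

def shuffle_in_unison_by_state(a, b):
    """Shuffle a and b in place, replaying the same RNG state for both."""
    assert len(a) == len(b)
    state = np.random.get_state()   # remember where the generator is
    np.random.shuffle(a)
    np.random.set_state(state)      # rewind, so b sees the same random draws
    np.random.shuffle(b)

a = np.arange(10).reshape(5, 2)     # row i is [2*i, 2*i + 1]
b = np.arange(5) * 2                # b[i] equals a[i, 0]
shuffle_in_unison_by_state(a, b)
print(np.array_equal(a[:, 0], b))   # rows still correspond after the shuffle
```

The check relies only on the first column of a matching b, so it would fail if the two shuffles ever used different permutations.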

If you don’t like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let’s assume the arrays a and b look like this:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views simulating the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.

This solution could be adapted to the case that a and b have different dtypes.
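One possible adaptation, sketched here under the assumption that a structured dtype is acceptable, stores each pair of rows as one record so that the two parts can have different element types. The field names "a" and "b" are hypothetical, and the reordering uses a permutation plus assignment, which creates a temporary copy rather than shuffling strictly in place.

```python
import numpy as np

# Hypothetical record layout: one float64 block and one int32 block per row.
record = np.dtype([("a", np.float64, (2, 3)), ("b", np.int32, (2,))])
c = np.zeros(3, dtype=record)

a2 = c["a"]   # float64 view into c, shape (3, 2, 3)
b2 = c["b"]   # int32 view into c, shape (3, 2)
a2[...] = np.arange(18.0).reshape(3, 2, 3)
b2[...] = np.arange(6).reshape(3, 2)

perm = np.random.permutation(len(c))
c[:] = c[perm]   # c[perm] is a temporary copy; assigning it back moves whole records
```

Because a2 and b2 are views of c, they reflect the new order without any further work.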

Answer 1

You can use NumPy’s array indexing:

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]

This will result in creation of separate unison-shuffled arrays.
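The same idea can be written against NumPy's newer Generator interface. This is only a sketch: it assumes a NumPy version that provides numpy.random.default_rng, and the seed parameter is an addition for illustration, not part of the answer above.

```python
import numpy as np

def unison_shuffled_copies(a, b, seed=None):
    """Return shuffled copies of a and b that share one permutation."""
    assert len(a) == len(b)
    rng = np.random.default_rng(seed)   # seed=None draws fresh entropy
    p = rng.permutation(len(a))
    return a[p], b[p]                   # integer-array indexing always copies

a = np.array([[1, 1], [2, 2], [3, 3]])
b = np.array([1, 2, 3])
shuffled_a, shuffled_b = unison_shuffled_copies(a, b, seed=0)
```

The inputs are left untouched, which is convenient but means memory use roughly doubles for large arrays.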

Answer 2

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y, random_state=0)

To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

Answer 3

Very simple solution:

randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]

The two arrays x and y are now both randomly shuffled in the same way.

Answer 4

In 2015, James posted an sklearn solution, which is helpful. But he passed a random_state argument, which is not needed. In the code below, shuffle falls back to NumPy's global random state.

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y)

Answer 5

from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array

# Data is currently unshuffled; we should shuffle 
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]

Answer 6

Shuffle any number of arrays together, in-place, using only NumPy.

import numpy as np


def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)

It can be used like this:

a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])

shuffle_arrays([a, b, c])

A few things to note:

The assert ensures that all input arrays have the same length along their first dimension.
The arrays are shuffled in place along their first dimension; nothing is returned.
The random seed is drawn from the positive int32 range.
If a repeatable shuffle is needed, the seed value can be set.

After the shuffle, the data can be split using np.split or referenced using slices – depending on the application.

Answer 7

You can make an index array like this:

s = np.arange(len(x_data))

Then shuffle it:

np.random.shuffle(s)

Now use s to index your arrays. Indexing with the same shuffled s reorders every array identically.

x_data = x_data[s]
x_label = x_label[s]

Answer 8

One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.

# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
   np.random.seed(seed)
   np.random.shuffle(a)
   np.random.seed(seed)
   np.random.shuffle(b)

That’s it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.

Edit: don’t use np.random.seed(); use np.random.RandomState instead:

def shuffle(a, b, seed):
   rand_state = np.random.RandomState(seed)
   rand_state.shuffle(a)
   rand_state.seed(seed)
   rand_state.shuffle(b)

When calling it just pass in any seed to feed the random state:

a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)

Output:

>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]

Edit: Fixed code to re-seed the random state

Answer 9

There is a well-known function that can handle this:

from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)

Setting test_size to 0 avoids the split and gives you shuffled data. Although the function is usually used to split train and test data, it shuffles them too.
From the documentation:

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner.

Answer 10

Say we have two arrays: a and b.

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])

We can first obtain row indices by permuting the first dimension:

indices = np.random.permutation(a.shape[0])
[1 2 0]

Then use advanced indexing. Here we are using the same indices to shuffle both arrays in unison.

a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]

This is equivalent to

np.take(a, indices, axis=0)
[[4 5 6]
 [7 8 9]
 [1 2 3]]

np.take(b, indices, axis=0)
[[6 6 6]
 [4 2 0]
 [9 1 1]]
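The equivalence can be checked directly. The sketch below fixes indices to the value shown above instead of drawing it at random, and also compares against plain row indexing a[indices], a shorter spelling that is not used in the answer itself.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices = np.array([1, 2, 0])   # fixed here so the result is reproducible

# Row indices broadcast against column indices, as in the answer
via_broadcast = a[indices[:, np.newaxis], np.arange(a.shape[1])]
# The same selection through np.take along axis 0
via_take = np.take(a, indices, axis=0)
# Plain integer-array indexing on the first axis
via_rows = a[indices]

print(np.array_equal(via_broadcast, via_take))
print(np.array_equal(via_take, via_rows))
```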

Answer 11

If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array, and randomly swap it to another position in the array

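for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index + 1)
    # Swap via fancy indexing: it copies the rows first, so the swap is also
    # correct for multi-dimensional arrays, where a[i] is only a view.
    a[[old_index, new_index]] = a[[new_index, old_index]]
    b[[old_index, new_index]] = b[[new_index, old_index]]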

This implements the Knuth-Fisher-Yates shuffle algorithm.

Answer 12

This seems like a very simple solution:

import numpy as np
def shuffle_in_unison(a,b):

    assert len(a)==len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)

    return a[c],b[c]

a =  np.asarray([[1, 1], [2, 2], [3, 3]])
b =  np.asarray([11, 22, 33])

shuffle_in_unison(a,b)
Out[94]: 
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))

Answer 13

With an example, this is what I’m doing:

combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))

shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)

Answer 14

I extended python’s random.shuffle() to take a second arg:

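import random

def shuffle_together(x, y):
    assert len(x) == len(y)

    # range replaces Python 2's xrange.
    # Tuple swaps are fine for lists and 1-D arrays, but not for rows of
    # multi-dimensional NumPy arrays, because x[i] is then a view.
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i+1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]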

That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.

Answer 15

Just use numpy…

First merge the two input arrays (the 1D array holds the labels y, the 2D array holds the data x) and shuffle them with NumPy's shuffle function. Finally, split them and return.

import numpy as np

def shuffle_2d(a, b):
    rows= a.shape[0]
    if b.shape != (rows,1):
        b = b.reshape((rows,1))
    S = np.hstack((b,a))
    np.random.shuffle(S)
    b, a  = S[:,0], S[:,1:]
    return a,b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)

Q&A

Shuffle an array with Python, randomize array item order with Python

July 31, 2021, Python实用宝典

Question: Shuffle an array with Python, randomize array item order with Python

What’s the easiest way to shuffle an array with python?

Answer 0

import random
random.shuffle(array)

Answer 1

import random
random.shuffle(array)

Answer 2

Alternative way to do this using sklearn

from sklearn.utils import shuffle
X=[1,2,3]
y = ['one', 'two', 'three']
X, y = shuffle(X, y, random_state=0)
print(X)
print(y)

Output:

[2, 1, 3]
['two', 'one', 'three']

Advantage: You can shuffle multiple arrays simultaneously without disrupting the mapping between them, and random_state makes the shuffle reproducible.

Answer 3

The other answers are the easiest. However, it’s a bit annoying that the random.shuffle method doesn’t actually return anything; it just shuffles the given list in place. If you want to chain calls, or just be able to declare a shuffled array in one line, you can do:

    import random
    def my_shuffle(array):
        random.shuffle(array)
        return array

Then you can do lines like:

    for suit in my_shuffle(['hearts', 'spades', 'clubs', 'diamonds']):

Answer 4

When dealing with regular Python lists, random.shuffle() will do the job just as the previous answers show.

But when it comes to an ndarray (numpy.array), random.shuffle seems to break the original ndarray. Here is an example:

import random
import numpy as np
import numpy.random

a = np.array([1,2,3,4,5,6])
a.shape = (3,2)
print(a)
random.shuffle(a) # a will definitely be destroyed
print(a)

Just use: np.random.shuffle(a)

Like random.shuffle, np.random.shuffle shuffles the array in-place.

Answer 5

Just in case you want a new array you can use sample:

import random
new_array = random.sample( array, len(array) )

Answer 6

You can sort your array with a random key:

sorted(array, key = lambda x: random.random())

The key is only computed once per item, so comparisons during the sort stay efficient.

However, random.shuffle(array) is the more direct choice: it shuffles in linear time, whereas sorting takes O(n log n).

Answer 7

In addition to the previous replies, I would like to introduce another function.

numpy.random.shuffle and random.shuffle both shuffle in place. However, if you want a shuffled array returned, numpy.random.permutation is the function to use.
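A small sketch of the difference, using a made-up six-element array: permutation hands back a new array and leaves its argument alone, while shuffle reorders its argument and returns None.

```python
import numpy as np

arr = np.arange(6)

shuffled = np.random.permutation(arr)    # new array; arr keeps its order
result = np.random.shuffle(arr.copy())   # in-place variant returns None

print(arr)       # still 0..5 in order
print(result)    # None
```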

Answer 8

I’m not sure how I was using random.shuffle(), but it returned None for me, so I wrote this. It might be helpful to someone:

import random

def shuffle(arr):
    for n in range(len(arr) - 1):
        rnd = random.randint(0, (len(arr) - 1))
        val1 = arr[rnd]
        val2 = arr[rnd - 1]

        arr[rnd - 1] = val1
        arr[rnd] = val2

    return arr

Answer 9

import numpy

# arr = numpy array to shuffle

def shuffle(arr):
    a = numpy.arange(len(arr))
    b = numpy.empty(1)
    for i in range(len(arr)):
        # randint replaces the deprecated random_integers; its upper bound
        # is exclusive, so this still draws from 0..len(a)-1
        sel = numpy.random.randint(0, len(a), size=1)
        b = numpy.append(b, a[sel])
        a = numpy.delete(a, sel)
    b = b[1:].astype(int)
    return arr[b]

Answer 10

Be aware that random.shuffle() should not be used on multi-dimensional arrays as it causes repetitions.

Suppose you want to shuffle an array along its first dimension. We can create the following test example:

import numpy as np
x = np.zeros((10, 2, 3))

for i in range(10):
   x[i, ...] = i*np.ones((2,3))

so that along the first axis, the i-th element corresponds to a 2×3 matrix where all the elements are equal to i.

If we use the correct shuffle function for multi-dimensional arrays, i.e. np.random.shuffle(x), the array will be shuffled along the first axis as desired. However, using random.shuffle(x) will cause repetitions. You can check this by running len(np.unique(x)) after shuffling which gives you 10 (as expected) with np.random.shuffle() but only around 5 when using random.shuffle().
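The repetitions come from how rows of a NumPy array behave in a tuple swap, which is the kind of swap a list-oriented shuffle performs. A minimal sketch with a made-up two-row array:

```python
import numpy as np

x = np.array([[0, 0], [1, 1]])

# x[0] and x[1] are views into x, not copies. The first assignment
# overwrites row 0 before its old contents are written into row 1.
x[0], x[1] = x[1], x[0]
print(x)   # both rows now hold [1, 1]; the row [0, 0] is lost

# Swapping through integer-array indexing copies the rows first.
y = np.array([[0, 0], [1, 1]])
y[[0, 1]] = y[[1, 0]]
print(y)   # rows exchanged: [[1, 1], [0, 0]]
```

Every such swap can overwrite a row with a duplicate of another, which is why the count of unique values drops.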

Q&A

Shuffle DataFrame rows

July 25, 2021, Python实用宝典

Question: Shuffle DataFrame rows

I have the following DataFrame:

    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
...
20     7     8     9     2
21    10    11    12     2
...
45    13    14    15     3
46    16    17    18     3
...

The DataFrame is read from a csv file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc.

I would like to shuffle the order of the DataFrame’s rows, so that all the Types are mixed. A possible result could be:

    Col1  Col2  Col3  Type
0      7     8     9     2
1     13    14    15     3
...
20     1     2     3     1
21    10    11    12     2
...
45     4     5     6     1
46    16    17    18     3
...

How can I achieve this?

Answer 0

The idiomatic way to do this with Pandas is to use the .sample method of your dataframe to sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).

Note: If you wish to shuffle your dataframe in-place and reset the index, you could do e.g.

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
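A minimal sketch of the two steps on a small made-up frame. The column values and the random_state argument are chosen only for illustration; random_state is a documented sample parameter that makes the order repeatable.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 4, 7, 10, 13, 16],
                   "Type": [1, 1, 2, 2, 3, 3]})

# frac=1 samples every row without replacement, i.e. a full shuffle
shuffled = df.sample(frac=1, random_state=0)

# drop=True discards the old labels instead of keeping them as a column
renumbered = shuffled.reset_index(drop=True)

print(list(renumbered.columns))   # ['Col1', 'Type']
print(list(renumbered.index))     # [0, 1, 2, 3, 4, 5]
```

Without drop=True, reset_index would add an extra column named index holding the original row labels.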

Follow-up note: Although it may not look like the above operation is in-place, python/pandas is smart enough not to do another malloc for the shuffled object. That is, even though the reference object has changed (by which I mean id(df_old) is not the same as id(df_new)), the underlying C object is still the same. To show that this is indeed the case, you could run a simple memory profiler:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)

Answer 1

You can simply use sklearn for this

from sklearn.utils import shuffle
df = shuffle(df)

Answer 2

You can shuffle the rows of a dataframe by indexing with a shuffled index. For this, you can, for example, use np.random.permutation (np.random.choice is also a possibility):

In [12]: df = pd.read_csv(StringIO(s), sep=r"\s+")

In [13]: df
Out[13]: 
    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
20     7     8     9     2
21    10    11    12     2
45    13    14    15     3
46    16    17    18     3

In [14]: df.iloc[np.random.permutation(len(df))]
Out[14]: 
    Col1  Col2  Col3  Type
46    16    17    18     3
45    13    14    15     3
20     7     8     9     2
0      1     2     3     1
1      4     5     6     1
21    10    11    12     2

If you want to keep the index numbered consecutively as in your example, you can simply reset the index: df_shuffled.reset_index(drop=True)

Answer 3

TL;DR: np.random.shuffle(ndarray) can do the job.
So, in your case

np.random.shuffle(DataFrame.values)

DataFrame, under the hood, uses NumPy ndarray as data holder. (You can check from DataFrame source code)

So if you use np.random.shuffle(), it shuffles the array along its first axis. But the index of the DataFrame remains unshuffled.

Though, there are some points to consider.

The function returns None. If you want to keep a copy of the original object, you have to make one before you pass it to the function.
sklearn.utils.shuffle(), as user tj89 suggested, accepts random_state along with other options to control the output. You may want that for development purposes.
sklearn.utils.shuffle() is faster, but it WILL SHUFFLE the axis info (index, column) of the DataFrame along with the ndarray it contains.

Benchmark result

between sklearn.utils.shuffle() and np.random.shuffle().

ndarray

nd = sklearn.utils.shuffle(nd)

0.10793248389381915 sec. 8x faster

np.random.shuffle(nd)

0.8897626010002568 sec

DataFrame

df = sklearn.utils.shuffle(df)

0.3183923360193148 sec. 3x faster

np.random.shuffle(df.values)

0.9357550159329548 sec

Conclusion: If it is okay for the axis info (index, column) to be shuffled along with the ndarray, use sklearn.utils.shuffle(). Otherwise, use np.random.shuffle().

Code used:

import timeit
setup = '''
import numpy as np
import pandas as pd
import sklearn.utils
nd = np.random.random((1000, 100))
df = pd.DataFrame(nd)
'''

timeit.timeit('nd = sklearn.utils.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('df = sklearn.utils.shuffle(df)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(df.values)', setup=setup, number=1000)

Answer 4

(I don't have enough reputation to comment on the top post, so I hope someone else can do that for me.) A concern was raised about whether the first method:

df.sample(frac=1)

makes a deep copy or just changes the DataFrame in place. I ran the following code:

print(hex(id(df)))
print(hex(id(df.sample(frac=1))))
print(hex(id(df.sample(frac=1).reset_index(drop=True))))

and my results were:

0x1f8a784d400
0x1f8b9d65e10
0x1f8b9d65b70

This means the method does not return the same object, contrary to what the last comment suggested. So this method does indeed return a shuffled copy rather than modifying the original DataFrame.
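Comparing id() values only shows that a different DataFrame object comes back. As a complementary check, here is a minimal sketch (the example frame and the variable names are made up for illustration) that also looks at whether the original is left untouched and whether the two frames share a data buffer:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; any single-dtype DataFrame would do.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 6.0, 7.0, 8.0]})
before = df.copy()

shuffled = df.sample(frac=1, random_state=0)

# Is the very same object returned?
is_same_object = shuffled is df
# Are the original rows still in their original order?
original_unchanged = df.equals(before)
# Do the two frames share their underlying data buffer?
shares_buffer = np.shares_memory(df.to_numpy(), shuffled.to_numpy())

print(is_same_object, original_unchanged, shares_buffer)
```

If the method really returns an independent shuffled copy, the first and last values should be False and the middle one True.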

Answer 5

It is also useful to know that, if you use this for machine learning and always want to separate the same data, you can use:

df.sample(n=len(df), random_state=42)

This makes sure that your random choice is always reproducible.
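To see the effect of random_state, here is a small sketch on a made-up frame (the column names and values are placeholders): two calls with the same seed should give the same row order, and every original row should still be present.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": list("vwxyz")})

first = df.sample(n=len(df), random_state=42)
second = df.sample(n=len(df), random_state=42)

# Same seed, same row order (index labels included).
same_order = first.equals(second)

# Every original row is still present exactly once.
same_rows = sorted(first.index) == list(df.index)

print(same_order, same_rows)
```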

Answer 6

AFAIK the simplest solution is:

df_shuffled = df.reindex(np.random.permutation(df.index))

Answer 7

Shuffle the pandas DataFrame by taking an array (in this case the index), randomizing its order, and setting that array as the index of the DataFrame. Then sort the DataFrame by its index. The result is your shuffled DataFrame:

import random
import pandas as pd

df = pd.DataFrame({"a":[1,2,3,4],"b":[5,6,7,8]})
index = list(range(df.shape[0]))  # row positions 0..n-1
random.shuffle(index)             # randomize their order in place
df.set_index([index]).sort_index()

Substitute your own DataFrame for mine in the code above.

Answer 8

Here is another way:

df['rnd'] = np.random.rand(len(df))
# sort_values(inplace=True) would return None, so sort without inplace and chain the drop
df = df.sort_values(by='rnd').drop('rnd', axis=1)

Shuffling a list of objects

July 25, 2021 Python实用宝典

Question: Shuffling a list of objects

I have a list of objects and I want to shuffle them. I thought I could use the random.shuffle method, but this seems to fail when the list is of objects. Is there a method for shuffling objects or another way around this?

import random

class A:
    foo = "bar"

a1 = A()
a2 = A()
b = [a1, a2]

print(random.shuffle(b))

This will fail.

Answer 0

random.shuffle should work. Here’s an example, where the objects are lists:

from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)

# print(x)  gives  [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
# of course your results will vary

Note that shuffle works in place, and returns None.
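The same holds for instances of a user-defined class like the one in the question. A minimal sketch (the variable names are only for illustration):

```python
import random

class A:
    foo = "bar"

objects = [A() for _ in range(5)]
ids_before = sorted(id(obj) for obj in objects)

result = random.shuffle(objects)   # reorders the list in place

print(result)                      # None: shuffle has no useful return value
ids_after = sorted(id(obj) for obj in objects)
print(ids_before == ids_after)     # True: same objects, possibly in a new order
```

So printing the return value of random.shuffle shows None even though the list itself was shuffled.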

Answer 1

As you learned, the in-place shuffling was the problem. I run into this frequently too, and often seem to forget how to copy a list as well. Using sample(a, len(a)) is the solution, with len(a) as the sample size. See https://docs.python.org/3.6/library/random.html#random.sample for the Python documentation.

Here's a simple version using random.sample() that returns the shuffled result as a new list.

import random

a = list(range(5))
b = random.sample(a, len(a))
print(a, b, "two lists same:", a == b)
# prints e.g.: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two lists same: False

# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = list(range(555))
b = random.sample(a, len(a))
print("no duplicates:", a == sorted(set(b)))

try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print("Nope!", e)

# prints: no duplicates: True
# prints: Nope! followed by the ValueError message

Answer 2

It took me some time to get that too. But the documentation for shuffle is very clear:

shuffle list x in place; return None.

So you shouldn’t print(random.shuffle(b)). Instead do random.shuffle(b) and then print(b).

Answer 3

#!/usr/bin/python3

import random

s=list(range(5))
random.shuffle(s) # << shuffle before print or assignment
print(s)

# print: [2, 4, 1, 3, 0]

Answer 4

If you happen to be using numpy already (very popular for scientific and financial applications) you can save yourself an import.

import numpy as np    
np.random.shuffle(b)
print(b)

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.shuffle.html

Answer 5

>>> import random
>>> a = ['hi','world','cat','dog']
>>> random.shuffle(a,random.random)
>>> a
['hi', 'cat', 'dog', 'world']

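It works fine for me. Make sure to set the random method. (Note that the optional second argument of random.shuffle was deprecated in Python 3.9 and removed in Python 3.11, so on current versions call random.shuffle(a) with a single argument.)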

Answer 6

If you have multiple lists, you might want to define the permutation (the way you shuffle the list / rearrange the items in the list) first and then apply it to all lists:

import random

perm = list(range(len(list_one)))
random.shuffle(perm)
list_one = [list_one[index] for index in perm]
list_two = [list_two[index] for index in perm]

Numpy / Scipy

If your lists are numpy arrays, it is simpler:

import numpy as np

perm = np.random.permutation(len(list_one))
list_one = list_one[perm]
list_two = list_two[perm]

mpu

I’ve created the small utility package mpu which has the consistent_shuffle function:

import mpu

# Necessary if you want consistent results
import random
random.seed(8)

# Define example lists
list_one = [1,2,3]
list_two = ['a', 'b', 'c']

# Call the function
list_one, list_two = mpu.consistent_shuffle(list_one, list_two)

Note that mpu.consistent_shuffle takes an arbitrary number of arguments. So you can also shuffle three or more lists with it.
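If you would rather not add a dependency, the shared-permutation idea from the top of this answer can be wrapped in a small helper. This is only a sketch: shuffle_together and its seed parameter are names made up here, not part of mpu or the standard library.

```python
import random

def shuffle_together(*lists, seed=None):
    """Return shuffled copies of the lists, all reordered by one shared permutation."""
    if not lists:
        return []
    length = len(lists[0])
    if any(len(lst) != length for lst in lists):
        raise ValueError("all lists must have the same length")
    rng = random.Random(seed)      # private generator; global state is untouched
    perm = list(range(length))
    rng.shuffle(perm)              # one permutation, applied to every list
    return [[lst[i] for i in perm] for lst in lists]

list_one = [1, 2, 3, 4]
list_two = ["a", "b", "c", "d"]
shuffled_one, shuffled_two = shuffle_together(list_one, list_two, seed=8)

# Corresponding elements still line up after shuffling.
pairs = set(zip(shuffled_one, shuffled_two))
print(pairs == {(1, "a"), (2, "b"), (3, "c"), (4, "d")})
```

Unlike random.shuffle, this version leaves the input lists untouched and returns new ones.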

Answer 7

from random import random
my_list = range(10)
shuffled_list = sorted(my_list, key=lambda x: random())

This alternative may be useful for some applications where you want to swap the ordering function.

Answer 8

In some cases when using numpy arrays, using random.shuffle created duplicate data in the array.

An alternative is to use numpy.random.shuffle. If you’re working with numpy already, this is the preferred method over the generic random.shuffle.

numpy.random.shuffle

Example

>>> import numpy as np
>>> import random

Using random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

Using numpy.random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> np.random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])
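A likely explanation for the duplicated rows, assuming random.shuffle swaps elements with tuple assignment as CPython's implementation does: indexing a row of a 2-D array returns a view, not a copy, so the "saved" row changes as soon as the other row is overwritten. The sketch below reproduces that with a single manual swap:

```python
import numpy as np

foo = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Tuple-style swap, the pattern a generic in-place shuffle relies on.
# foo[0] and foo[1] are views, so once row 0 is overwritten the "saved"
# view of the old row 0 already shows the new contents.
foo[0], foo[1] = foo[1], foo[0]
print(foo)   # rows 0 and 1 are now both [4, 5, 6]

bar = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Fancy indexing copies the selected rows first, so this swap is safe.
bar[[0, 1]] = bar[[1, 0]]
print(bar)   # rows 0 and 1 are exchanged
```

numpy.random.shuffle is written with arrays in mind, which is why it does not show this problem.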

Answer 9

For a one-liner, use random.sample(list_to_be_shuffled, length_of_the_list). For example:

import random
random.sample(list(range(10)), 10)

outputs: [2, 9, 7, 8, 3, 0, 4, 1, 6, 5]

Answer 10

print(func(foo)) prints the return value of func when called with foo. shuffle, however, returns None because it modifies the list in place, so print(random.shuffle(b)) prints None rather than the shuffled list. Workaround:

# shuffle the list in place 
random.shuffle(b)

# print it
print(b)

If you’re more into functional programming style you might want to make the following wrapper function:

def myshuffle(ls):
    random.shuffle(ls)
    return ls

Answer 11

One can define a function called shuffled (in the same sense of sort vs sorted)

def shuffled(x):
    import random
    y = x[:]
    random.shuffle(y)
    return y

x = shuffled([1, 2, 3, 4])
print(x)

Answer 12

import random

class a:
    foo = "bar"

a1 = a()
a2 = a()
a3 = a()
a4 = a()
b = [a1,a2,a3,a4]

random.shuffle(b)
print(b)

shuffle works in place, so print the list itself rather than the return value, which is None.

Answer 13

You can go for this:

>>> A = ['r','a','n','d','o','m']
>>> B = [1,2,3,4,5,6]
>>> import random
>>> random.sample(A+B, len(A+B))
[3, 'r', 4, 'n', 6, 5, 'm', 2, 1, 'a', 'o', 'd']

If you want to go back to two lists, you can then split this long list into two.

Answer 14

You could build a function that takes a list as a parameter and returns a shuffled version of the list (note that it shuffles the list in place and returns the same list object):

from random import *

def listshuffler(inputlist):
    for i in range(len(inputlist)):
        swap = randint(0,len(inputlist)-1)
        temp = inputlist[swap]
        inputlist[swap] = inputlist[i]
        inputlist[i] = temp
    return inputlist
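One caveat with the function above: it picks the swap partner from the whole list at every step, and that variant is known not to produce every ordering with equal probability. A common alternative is the Fisher-Yates shuffle, which only picks from the positions not yet fixed. A minimal sketch (fisher_yates_shuffle is a name chosen here for illustration):

```python
import random

def fisher_yates_shuffle(items):
    """Shuffle items in place so that every ordering is equally likely."""
    # Walk from the last position down to 1; position 0 needs no swap.
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)   # inclusive bounds: 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items

deck = list(range(10))
fisher_yates_shuffle(deck)
print(deck)
```

In practice random.shuffle already does this job, so a hand-written version is mainly useful for understanding the algorithm.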

Answer 15

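import numpy as np

def shuffle(x, random=False):
    """To shuffle randomly, set random=True."""
    ma = x
    if random:
        # A random permutation of the positions, so each element appears exactly once.
        # (The original drew positions with np.random.randint, which samples with
        # replacement and can repeat or drop elements.)
        return [ma[i] for i in np.random.permutation(len(ma))]
    # Deterministic rearrangement, kept as written in the original answer;
    # note that it can repeat some elements and omit others.
    shuffled = []
    ave = len(ma) // 3
    for i in range(len(ma)):
        if i < ave:
            shuffled.append(ma[i + ave])
        else:
            shuffled.append(ma[i - ave])
    return shuffled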

Answer 16

You can use either shuffle or sample, both of which come from the random module.

import random
def shuffle(arr1):
    n=len(arr1)
    b=random.sample(arr1,n)  # returns a new shuffled list
    return b

or

import random
def shuffle(arr1):
    random.shuffle(arr1)  # shuffles arr1 in place
    return arr1

Answer 17

Make sure you are not naming your source file random.py, and that there is no file called random.pyc in your working directory. Either could cause your program to try to import your local random.py file instead of Python's random module.
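A quick way to check for this kind of shadowing is to look at where the imported module actually lives. A small sketch (the variable names are just for illustration):

```python
import random

# Path of the module that was actually imported. If this points into your
# project directory instead of the Python installation, a local random.py
# (or a stale random.pyc) is shadowing the standard library module.
module_path = getattr(random, "__file__", None)
print(module_path)

# The standard library module provides shuffle and sample.
has_expected_api = hasattr(random, "shuffle") and hasattr(random, "sample")
print(has_expected_api)
```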

Answer 18

def shuffle(_list):
    if not _list == []:
        import random
        list2 = []
        while _list != []:
            card = random.choice(_list)
            _list.remove(card)
            list2.append(card)
        while list2 != []:
            card1 = list2[0]
            list2.remove(card1)
            _list.append(card1)
        return _list

Answer 19

import random
class a:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1.foo,a2.foo]
random.shuffle(b)

Answer 20

The shuffling process is "with replacement", so the number of occurrences of each item may change! At least when the items in your list are themselves lists.

E.g.,

ml = [[0], [1]] * 10

After,

random.shuffle(ml)

The number of [0] may be 9 or 8, but not exactly 10.
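This claim can be checked directly. For a plain Python list, random.shuffle only reorders the references stored in the list, so one would expect the count to stay at 10; the duplication reported elsewhere in this thread involves NumPy arrays rather than lists. A minimal sketch of the check:

```python
import random

ml = [[0], [1]] * 10           # 10 references to [0] and 10 references to [1]
count_before = ml.count([0])

random.shuffle(ml)             # reorders the references in place
count_after = ml.count([0])

print(count_before, count_after)
```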

Answer 21

Plan: Write out the shuffle without relying on a library to do the heavy lifting. Example: Go through the list from the beginning, starting with element 0; find a new random position for it, say 6, put 0's value in 6 and 6's value in 0. Move on to element 1 and repeat this process, and so on through the rest of the list.

import random

my_list = list(range(10))  # example data; use your own list here

# Walk through the list from the start and swap each element
# with the element at a randomly chosen position.
for i in range(len(my_list)):
    # Choose among the positions from i onward, so that earlier
    # positions stay fixed once they have been handled.
    j = random.randint(i, len(my_list) - 1)
    # Using temp_var as my placeholder so I don't lose values
    temp_var = my_list[i]
    my_list[i] = my_list[j]
    my_list[j] = temp_var

Answer 22

It works fine. Here I try it with functions as the list objects:

    from random import shuffle

    def foo1():
        print("foo1", end=" ")

    def foo2():
        print("foo2", end=" ")

    def foo3():
        print("foo3", end=" ")

    A = [foo1, foo2, foo3]

    for x in A:
        x()

    print()

    shuffle(A)
    for y in A:
        y()

It prints, for example: foo1 foo2 foo3 foo2 foo3 foo1 (the foos in the last row have a random order)
