标签归档:permutation

如何生成列表的所有排列?

问题:如何生成列表的所有排列?

如何在Python中生成列表的所有排列,而与列表中元素的类型无关?

例如:

permutations([])
[]

permutations([1])
[1]

permutations([1, 2])
[1, 2]
[2, 1]

permutations([1, 2, 3])
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

How do you generate all the permutations of a list in Python, independently of the type of elements in that list?

For example:

permutations([])
[]

permutations([1])
[1]

permutations([1, 2])
[1, 2]
[2, 1]

permutations([1, 2, 3])
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

回答 0

从Python 2.6开始(如果您使用的是Python 3),您可以使用标准库工具:itertools.permutations

import itertools
list(itertools.permutations([1, 2, 3]))

如果您出于某种原因使用旧版Python(<2.6),或者只是想知道它的工作原理,那么这是一种不错的方法,摘自 http://code.activestate.com/recipes/252178/

def all_perms(elements):
    if len(elements) <=1:
        yield elements
    else:
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                # nb elements[0:1] works in both string and list contexts
                yield perm[:i] + elements[0:1] + perm[i:]

的文档中列出了几种其他方法itertools.permutations。这是一个:

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = range(n)
    cycles = range(n, n-r, -1)
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

另一个基于itertools.product

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

Starting with Python 2.6 (and if you’re on Python 3) you have a standard-library tool for this: itertools.permutations.

import itertools
list(itertools.permutations([1, 2, 3]))

If you’re using an older Python (<2.6) for some reason or are just curious to know how it works, here’s one nice approach, taken from http://code.activestate.com/recipes/252178/:

def all_perms(elements):
    if len(elements) <=1:
        yield elements
    else:
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                # nb elements[0:1] works in both string and list contexts
                yield perm[:i] + elements[0:1] + perm[i:]

A couple of alternative approaches are listed in the documentation of itertools.permutations. Here’s one:

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = range(n)
    cycles = range(n, n-r, -1)
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

And another, based on itertools.product:

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

回答 1

Python 2.6及更高版本中:

import itertools
itertools.permutations([1,2,3])

(作为生成器返回。用于list(permutations(l))作为列表返回。)

And in Python 2.6 onwards:

import itertools
itertools.permutations([1,2,3])

(returned as a generator. Use list(permutations(l)) to return as a list.)


回答 2

以下代码仅适用于Python 2.6及更高版本

首先,导入itertools

import itertools

排列(顺序很重要):

print list(itertools.permutations([1,2,3,4], 2))
[(1, 2), (1, 3), (1, 4),
(2, 1), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 4),
(4, 1), (4, 2), (4, 3)]

组合(顺序无关紧要):

print list(itertools.combinations('123', 2))
[('1', '2'), ('1', '3'), ('2', '3')]

笛卡尔积(具有多个可迭代项):

print list(itertools.product([1,2,3], [4,5,6]))
[(1, 4), (1, 5), (1, 6),
(2, 4), (2, 5), (2, 6),
(3, 4), (3, 5), (3, 6)]

笛卡尔积(本身具有一个可迭代的):

print list(itertools.product([1,2], repeat=3))
[(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2),
(2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)]

The following code with Python 2.6 and above ONLY

First, import itertools:

import itertools

Permutation (order matters):

print list(itertools.permutations([1,2,3,4], 2))
[(1, 2), (1, 3), (1, 4),
(2, 1), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 4),
(4, 1), (4, 2), (4, 3)]

Combination (order does NOT matter):

print list(itertools.combinations('123', 2))
[('1', '2'), ('1', '3'), ('2', '3')]

Cartesian product (with several iterables):

print list(itertools.product([1,2,3], [4,5,6]))
[(1, 4), (1, 5), (1, 6),
(2, 4), (2, 5), (2, 6),
(3, 4), (3, 5), (3, 6)]

Cartesian product (with one iterable and itself):

print list(itertools.product([1,2], repeat=3))
[(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2),
(2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)]

回答 3

def permutations(head, tail=''):
    if len(head) == 0: print tail
    else:
        for i in range(len(head)):
            permutations(head[0:i] + head[i+1:], tail+head[i])

称为:

permutations('abc')
def permutations(head, tail=''):
    if len(head) == 0: print tail
    else:
        for i in range(len(head)):
            permutations(head[0:i] + head[i+1:], tail+head[i])

called as:

permutations('abc')

回答 4

#!/usr/bin/env python

def perm(a, k=0):
   if k == len(a):
      print a
   else:
      for i in xrange(k, len(a)):
         a[k], a[i] = a[i] ,a[k]
         perm(a, k+1)
         a[k], a[i] = a[i], a[k]

perm([1,2,3])

输出:

[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 2, 1]
[3, 1, 2]

当我交换列表的内容时,需要一个可变的序列类型作为输入。例如,perm(list("ball"))将工作,perm("ball")不会因为您不能更改字符串。

此Python实现受Horowitz,Sahni和Rajasekeran的《计算机算法》一书中介绍的算法的启发

#!/usr/bin/env python

def perm(a, k=0):
   if k == len(a):
      print a
   else:
      for i in xrange(k, len(a)):
         a[k], a[i] = a[i] ,a[k]
         perm(a, k+1)
         a[k], a[i] = a[i], a[k]

perm([1,2,3])

Output:

[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 2, 1]
[3, 1, 2]

As I’m swapping the content of the list it’s required a mutable sequence type as input. E.g. perm(list("ball")) will work and perm("ball") won’t because you can’t change a string.

This Python implementation is inspired by the algorithm presented in the book Computer Algorithms by Horowitz, Sahni and Rajasekeran.


回答 5

此解决方案实现了一个生成器,以避免将所有排列保留在内存中:

def permutations (orig_list):
    if not isinstance(orig_list, list):
        orig_list = list(orig_list)

    yield orig_list

    if len(orig_list) == 1:
        return

    for n in sorted(orig_list):
        new_list = orig_list[:]
        pos = new_list.index(n)
        del(new_list[pos])
        new_list.insert(0, n)
        for resto in permutations(new_list[1:]):
            if new_list[:1] + resto <> orig_list:
                yield new_list[:1] + resto

This solution implements a generator, to avoid holding all the permutations on memory:

def permutations (orig_list):
    if not isinstance(orig_list, list):
        orig_list = list(orig_list)

    yield orig_list

    if len(orig_list) == 1:
        return

    for n in sorted(orig_list):
        new_list = orig_list[:]
        pos = new_list.index(n)
        del(new_list[pos])
        new_list.insert(0, n)
        for resto in permutations(new_list[1:]):
            if new_list[:1] + resto <> orig_list:
                yield new_list[:1] + resto

回答 6

以实用的风格

def addperm(x,l):
    return [ l[0:i] + [x] + l[i:]  for i in range(len(l)+1) ]

def perm(l):
    if len(l) == 0:
        return [[]]
    return [x for y in perm(l[1:]) for x in addperm(l[0],y) ]

print perm([ i for i in range(3)])

结果:

[[0, 1, 2], [1, 0, 2], [1, 2, 0], [0, 2, 1], [2, 0, 1], [2, 1, 0]]

In a functional style

def addperm(x,l):
    return [ l[0:i] + [x] + l[i:]  for i in range(len(l)+1) ]

def perm(l):
    if len(l) == 0:
        return [[]]
    return [x for y in perm(l[1:]) for x in addperm(l[0],y) ]

print perm([ i for i in range(3)])

The result:

[[0, 1, 2], [1, 0, 2], [1, 2, 0], [0, 2, 1], [2, 0, 1], [2, 1, 0]]

回答 7

以下代码是给定列表的就地排列,实现为生成器。由于仅返回对列表的引用,因此不应在生成器外部修改列表。该解决方案是非递归的,因此使用低内存。输入列表中元素的多个副本也可以很好地工作。

def permute_in_place(a):
    a.sort()
    yield list(a)

    if len(a) <= 1:
        return

    first = 0
    last = len(a)
    while 1:
        i = last - 1

        while 1:
            i = i - 1
            if a[i] < a[i+1]:
                j = last - 1
                while not (a[i] < a[j]):
                    j = j - 1
                a[i], a[j] = a[j], a[i] # swap the values
                r = a[i+1:last]
                r.reverse()
                a[i+1:last] = r
                yield list(a)
                break
            if i == first:
                a.reverse()
                return

if __name__ == '__main__':
    for n in range(5):
        for a in permute_in_place(range(1, n+1)):
            print a
        print

    for a in permute_in_place([0, 0, 1, 1, 1]):
        print a
    print

The following code is an in-place permutation of a given list, implemented as a generator. Since it only returns references to the list, the list should not be modified outside the generator. The solution is non-recursive, so uses low memory. Work well also with multiple copies of elements in the input list.

def permute_in_place(a):
    a.sort()
    yield list(a)

    if len(a) <= 1:
        return

    first = 0
    last = len(a)
    while 1:
        i = last - 1

        while 1:
            i = i - 1
            if a[i] < a[i+1]:
                j = last - 1
                while not (a[i] < a[j]):
                    j = j - 1
                a[i], a[j] = a[j], a[i] # swap the values
                r = a[i+1:last]
                r.reverse()
                a[i+1:last] = r
                yield list(a)
                break
            if i == first:
                a.reverse()
                return

if __name__ == '__main__':
    for n in range(5):
        for a in permute_in_place(range(1, n+1)):
            print a
        print

    for a in permute_in_place([0, 0, 1, 1, 1]):
        print a
    print

回答 8

我认为一种很明显的方式可能是:

def permutList(l):
    if not l:
            return [[]]
    res = []
    for e in l:
            temp = l[:]
            temp.remove(e)
            res.extend([[e] + r for r in permutList(temp)])

    return res

A quite obvious way in my opinion might be also:

def permutList(l):
    if not l:
            return [[]]
    res = []
    for e in l:
            temp = l[:]
            temp.remove(e)
            res.extend([[e] + r for r in permutList(temp)])

    return res

回答 9

list2Perm = [1, 2.0, 'three']
listPerm = [[a, b, c]
            for a in list2Perm
            for b in list2Perm
            for c in list2Perm
            if ( a != b and b != c and a != c )
            ]
print listPerm

输出:

[
    [1, 2.0, 'three'], 
    [1, 'three', 2.0], 
    [2.0, 1, 'three'], 
    [2.0, 'three', 1], 
    ['three', 1, 2.0], 
    ['three', 2.0, 1]
]
list2Perm = [1, 2.0, 'three']
listPerm = [[a, b, c]
            for a in list2Perm
            for b in list2Perm
            for c in list2Perm
            if ( a != b and b != c and a != c )
            ]
print listPerm

Output:

[
    [1, 2.0, 'three'], 
    [1, 'three', 2.0], 
    [2.0, 1, 'three'], 
    [2.0, 'three', 1], 
    ['three', 1, 2.0], 
    ['three', 2.0, 1]
]

回答 10

我使用了基于阶乘数系统的算法-对于长度为n的列表,您可以逐项组合每个排列项,并从每个阶段剩下的项中进行选择。第一项有n个选择,第二项有n-1个,最后一项只有n个,因此可以将阶乘数字系统中数字的数字用作索引。这样,数字0到n!-1对应于字典顺序中所有可能的排列。

from math import factorial
def permutations(l):
    permutations=[]
    length=len(l)
    for x in xrange(factorial(length)):
        available=list(l)
        newPermutation=[]
        for radix in xrange(length, 0, -1):
            placeValue=factorial(radix-1)
            index=x/placeValue
            newPermutation.append(available.pop(index))
            x-=index*placeValue
        permutations.append(newPermutation)
    return permutations

permutations(range(3))

输出:

[[0, 1, 2], [0, 2, 1], [1, 0, 2], [1, 2, 0], [2, 0, 1], [2, 1, 0]]

该方法是非递归的,但是在我的计算机上它会稍微慢一些,并且当n!时xrange会引发错误!太大而无法转换为C长整数(对我来说n = 13)。当我需要它时就足够了,但是从长远来看这不是itertools.permutations。

I used an algorithm based on the factorial number system– For a list of length n, you can assemble each permutation item by item, selecting from the items left at each stage. You have n choices for the first item, n-1 for the second, and only one for the last, so you can use the digits of a number in the factorial number system as the indices. This way the numbers 0 through n!-1 correspond to all possible permutations in lexicographic order.

from math import factorial
def permutations(l):
    permutations=[]
    length=len(l)
    for x in xrange(factorial(length)):
        available=list(l)
        newPermutation=[]
        for radix in xrange(length, 0, -1):
            placeValue=factorial(radix-1)
            index=x/placeValue
            newPermutation.append(available.pop(index))
            x-=index*placeValue
        permutations.append(newPermutation)
    return permutations

permutations(range(3))

output:

[[0, 1, 2], [0, 2, 1], [1, 0, 2], [1, 2, 0], [2, 0, 1], [2, 1, 0]]

This method is non-recursive, but it is slightly slower on my computer and xrange raises an error when n! is too large to be converted to a C long integer (n=13 for me). It was enough when I needed it, but it’s no itertools.permutations by a long shot.


回答 11

请注意,此算法具有n factorial时间复杂度,其中n是输入列表的长度

打印运行结果:

global result
result = [] 

def permutation(li):
if li == [] or li == None:
    return

if len(li) == 1:
    result.append(li[0])
    print result
    result.pop()
    return

for i in range(0,len(li)):
    result.append(li[i])
    permutation(li[:i] + li[i+1:])
    result.pop()    

例:

permutation([1,2,3])

输出:

[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

Note that this algorithm has an n factorial time complexity, where n is the length of the input list

Print the results on the run:

global result
result = [] 

def permutation(li):
if li == [] or li == None:
    return

if len(li) == 1:
    result.append(li[0])
    print result
    result.pop()
    return

for i in range(0,len(li)):
    result.append(li[i])
    permutation(li[:i] + li[i+1:])
    result.pop()    

Example:

permutation([1,2,3])

Output:

[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

回答 12

正如tzwenn的回答,确实可以迭代每个排列的第一个元素。但是,以这种方式编写此解决方案效率更高:

def all_perms(elements):
    if len(elements) <= 1:
        yield elements  # Only permutation possible = no permutation
    else:
        # Iteration over the first element in the result permutation:
        for (index, first_elmt) in enumerate(elements):
            other_elmts = elements[:index]+elements[index+1:]
            for permutation in all_perms(other_elmts): 
                yield [first_elmt] + permutation

该解决方案的速度提高了约30%,这显然归功于递归以len(elements) <= 1代替0yield就像Riccardo Reyes的解决方案一样,它使用生成器函数(通过),因此内存效率也更高。

One can indeed iterate over the first element of each permutation, as in tzwenn’s answer. It is however more efficient to write this solution this way:

def all_perms(elements):
    if len(elements) <= 1:
        yield elements  # Only permutation possible = no permutation
    else:
        # Iteration over the first element in the result permutation:
        for (index, first_elmt) in enumerate(elements):
            other_elmts = elements[:index]+elements[index+1:]
            for permutation in all_perms(other_elmts): 
                yield [first_elmt] + permutation

This solution is about 30 % faster, apparently thanks to the recursion ending at len(elements) <= 1 instead of 0. It is also much more memory-efficient, as it uses a generator function (through yield), like in Riccardo Reyes’s solution.


回答 13

这是受Haskell实现的启发,该实现使用列表理解:

def permutation(list):
    if len(list) == 0:
        return [[]]
    else:
        return [[x] + ys for x in list for ys in permutation(delete(list, x))]

def delete(list, item):
    lc = list[:]
    lc.remove(item)
    return lc

This is inspired by the Haskell implementation using list comprehension:

def permutation(list):
    if len(list) == 0:
        return [[]]
    else:
        return [[x] + ys for x in list for ys in permutation(delete(list, x))]

def delete(list, item):
    lc = list[:]
    lc.remove(item)
    return lc

回答 14

常规执行(无收益-将在内存中做所有事情):

def getPermutations(array):
    if len(array) == 1:
        return [array]
    permutations = []
    for i in range(len(array)): 
        # get all perm's of subarray w/o current item
        perms = getPermutations(array[:i] + array[i+1:])  
        for p in perms:
            permutations.append([array[i], *p])
    return permutations

收益实施:

def getPermutations(array):
    if len(array) == 1:
        yield array
    else:
        for i in range(len(array)):
            perms = getPermutations(array[:i] + array[i+1:])
            for p in perms:
                yield [array[i], *p]

基本思想是遍历数组中所有元素的第1个位置,然后在第2个位置中遍历所有其余元素,而第1个位置没有选择元素,依此类推。您可以使用recursion进行操作,其中停止条件是到达由1个元素组成的数组-在这种情况下,您将返回该数组。

Regular implementation (no yield – will do everything in memory):

def getPermutations(array):
    if len(array) == 1:
        return [array]
    permutations = []
    for i in range(len(array)): 
        # get all perm's of subarray w/o current item
        perms = getPermutations(array[:i] + array[i+1:])  
        for p in perms:
            permutations.append([array[i], *p])
    return permutations

Yield implementation:

def getPermutations(array):
    if len(array) == 1:
        yield array
    else:
        for i in range(len(array)):
            perms = getPermutations(array[:i] + array[i+1:])
            for p in perms:
                yield [array[i], *p]

The basic idea is to go over all the elements in the array for the 1st position, and then in 2nd position go over all the rest of the elements without the chosen element for the 1st, etc. You can do this with recursion, where the stop criteria is getting to an array of 1 element – in which case you return that array.


回答 15

为了提高性能,从Knuth(p22)那里得到了一个麻木的解决方案:

from numpy import empty, uint8
from math import factorial

def perms(n):
    f = 1
    p = empty((2*n-1, factorial(n)), uint8)
    for i in range(n):
        p[i, :f] = i
        p[i+1:2*i+1, :f] = p[:i, :f]  # constitution de blocs
        for j in range(i):
            p[:i+1, f*(j+1):f*(j+2)] = p[j+1:j+i+2, :f]  # copie de blocs
        f = f*(i+1)
    return p[:n, :]

复制大块内存可以节省时间-比list(itertools.permutations(range(n))以下方法快20倍:

In [1]: %timeit -n10 list(permutations(range(10)))
10 loops, best of 3: 815 ms per loop

In [2]: %timeit -n100 perms(10) 
100 loops, best of 3: 40 ms per loop

For performance, a numpy solution inspired by Knuth, (p22) :

from numpy import empty, uint8
from math import factorial

def perms(n):
    f = 1
    p = empty((2*n-1, factorial(n)), uint8)
    for i in range(n):
        p[i, :f] = i
        p[i+1:2*i+1, :f] = p[:i, :f]  # constitution de blocs
        for j in range(i):
            p[:i+1, f*(j+1):f*(j+2)] = p[j+1:j+i+2, :f]  # copie de blocs
        f = f*(i+1)
    return p[:n, :]

Copying large blocs of memory saves time – it’s 20x faster than list(itertools.permutations(range(n)) :

In [1]: %timeit -n10 list(permutations(range(10)))
10 loops, best of 3: 815 ms per loop

In [2]: %timeit -n100 perms(10) 
100 loops, best of 3: 40 ms per loop

回答 16

from __future__ import print_function

def perm(n):
    p = []
    for i in range(0,n+1):
        p.append(i)
    while True:
        for i in range(1,n+1):
            print(p[i], end=' ')
        print("")
        i = n - 1
        found = 0
        while (not found and i>0):
            if p[i]<p[i+1]:
                found = 1
            else:
                i = i - 1
        k = n
        while p[i]>p[k]:
            k = k - 1
        aux = p[i]
        p[i] = p[k]
        p[k] = aux
        for j in range(1,(n-i)/2+1):
            aux = p[i+j]
            p[i+j] = p[n-j+1]
            p[n-j+1] = aux
        if not found:
            break

perm(5)
from __future__ import print_function

def perm(n):
    p = []
    for i in range(0,n+1):
        p.append(i)
    while True:
        for i in range(1,n+1):
            print(p[i], end=' ')
        print("")
        i = n - 1
        found = 0
        while (not found and i>0):
            if p[i]<p[i+1]:
                found = 1
            else:
                i = i - 1
        k = n
        while p[i]>p[k]:
            k = k - 1
        aux = p[i]
        p[i] = p[k]
        p[k] = aux
        for j in range(1,(n-i)/2+1):
            aux = p[i+j]
            p[i+j] = p[n-j+1]
            p[n-j+1] = aux
        if not found:
            break

perm(5)

回答 17

这是一种适用于列表的算法,无需创建新的中间列表,类似于https://stackoverflow.com/a/108651/184528上Ber的解决方案。

def permute(xs, low=0):
    if low + 1 >= len(xs):
        yield xs
    else:
        for p in permute(xs, low + 1):
            yield p        
        for i in range(low + 1, len(xs)):        
            xs[low], xs[i] = xs[i], xs[low]
            for p in permute(xs, low + 1):
                yield p        
            xs[low], xs[i] = xs[i], xs[low]

for p in permute([1, 2, 3, 4]):
    print p

您可以在此处亲自尝试该代码:http : //repl.it/J9v

Here is an algorithm that works on a list without creating new intermediate lists similar to Ber’s solution at https://stackoverflow.com/a/108651/184528.

def permute(xs, low=0):
    if low + 1 >= len(xs):
        yield xs
    else:
        for p in permute(xs, low + 1):
            yield p        
        for i in range(low + 1, len(xs)):        
            xs[low], xs[i] = xs[i], xs[low]
            for p in permute(xs, low + 1):
                yield p        
            xs[low], xs[i] = xs[i], xs[low]

for p in permute([1, 2, 3, 4]):
    print p

You can try the code out for yourself here: http://repl.it/J9v


回答 18

递归的美丽:

>>> import copy
>>> def perm(prefix,rest):
...      for e in rest:
...              new_rest=copy.copy(rest)
...              new_prefix=copy.copy(prefix)
...              new_prefix.append(e)
...              new_rest.remove(e)
...              if len(new_rest) == 0:
...                      print new_prefix + new_rest
...                      continue
...              perm(new_prefix,new_rest)
... 
>>> perm([],['a','b','c','d'])
['a', 'b', 'c', 'd']
['a', 'b', 'd', 'c']
['a', 'c', 'b', 'd']
['a', 'c', 'd', 'b']
['a', 'd', 'b', 'c']
['a', 'd', 'c', 'b']
['b', 'a', 'c', 'd']
['b', 'a', 'd', 'c']
['b', 'c', 'a', 'd']
['b', 'c', 'd', 'a']
['b', 'd', 'a', 'c']
['b', 'd', 'c', 'a']
['c', 'a', 'b', 'd']
['c', 'a', 'd', 'b']
['c', 'b', 'a', 'd']
['c', 'b', 'd', 'a']
['c', 'd', 'a', 'b']
['c', 'd', 'b', 'a']
['d', 'a', 'b', 'c']
['d', 'a', 'c', 'b']
['d', 'b', 'a', 'c']
['d', 'b', 'c', 'a']
['d', 'c', 'a', 'b']
['d', 'c', 'b', 'a']

The beauty of recursion:

>>> import copy
>>> def perm(prefix,rest):
...      for e in rest:
...              new_rest=copy.copy(rest)
...              new_prefix=copy.copy(prefix)
...              new_prefix.append(e)
...              new_rest.remove(e)
...              if len(new_rest) == 0:
...                      print new_prefix + new_rest
...                      continue
...              perm(new_prefix,new_rest)
... 
>>> perm([],['a','b','c','d'])
['a', 'b', 'c', 'd']
['a', 'b', 'd', 'c']
['a', 'c', 'b', 'd']
['a', 'c', 'd', 'b']
['a', 'd', 'b', 'c']
['a', 'd', 'c', 'b']
['b', 'a', 'c', 'd']
['b', 'a', 'd', 'c']
['b', 'c', 'a', 'd']
['b', 'c', 'd', 'a']
['b', 'd', 'a', 'c']
['b', 'd', 'c', 'a']
['c', 'a', 'b', 'd']
['c', 'a', 'd', 'b']
['c', 'b', 'a', 'd']
['c', 'b', 'd', 'a']
['c', 'd', 'a', 'b']
['c', 'd', 'b', 'a']
['d', 'a', 'b', 'c']
['d', 'a', 'c', 'b']
['d', 'b', 'a', 'c']
['d', 'b', 'c', 'a']
['d', 'c', 'a', 'b']
['d', 'c', 'b', 'a']

回答 19

此算法是最有效的算法,它避免了在递归调用中进行数组传递和操作,在Python 2、3中有效:

def permute(items):
    length = len(items)
    def inner(ix=[]):
        do_yield = len(ix) == length - 1
        for i in range(0, length):
            if i in ix: #avoid duplicates
                continue
            if do_yield:
                yield tuple([items[y] for y in ix + [i]])
            else:
                for p in inner(ix + [i]):
                    yield p
    return inner()

用法:

for p in permute((1,2,3)):
    print(p)

(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)

This algorithm is the most effective one, it avoids of array passing and manipulation in recursive calls, works in Python 2, 3:

def permute(items):
    length = len(items)
    def inner(ix=[]):
        do_yield = len(ix) == length - 1
        for i in range(0, length):
            if i in ix: #avoid duplicates
                continue
            if do_yield:
                yield tuple([items[y] for y in ix + [i]])
            else:
                for p in inner(ix + [i]):
                    yield p
    return inner()

Usage:

for p in permute((1,2,3)):
    print(p)

(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)

回答 20

def pzip(c, seq):
    result = []
    for item in seq:
        for i in range(len(item)+1):
            result.append(item[i:]+c+item[:i])
    return result


def perm(line):
    seq = [c for c in line]
    if len(seq) <=1 :
        return seq
    else:
        return pzip(seq[0], perm(seq[1:]))
def pzip(c, seq):
    result = []
    for item in seq:
        for i in range(len(item)+1):
            result.append(item[i:]+c+item[:i])
    return result


def perm(line):
    seq = [c for c in line]
    if len(seq) <=1 :
        return seq
    else:
        return pzip(seq[0], perm(seq[1:]))

回答 21

另一种方法(无库)

def permutation(input):
    if len(input) == 1:
        return input if isinstance(input, list) else [input]

    result = []
    for i in range(len(input)):
        first = input[i]
        rest = input[:i] + input[i + 1:]
        rest_permutation = permutation(rest)
        for p in rest_permutation:
            result.append(first + p)
    return result

输入可以是字符串或列表

print(permutation('abcd'))
print(permutation(['a', 'b', 'c', 'd']))

ANOTHER APPROACH (without libs)

def permutation(input):
    if len(input) == 1:
        return input if isinstance(input, list) else [input]

    result = []
    for i in range(len(input)):
        first = input[i]
        rest = input[:i] + input[i + 1:]
        rest_permutation = permutation(rest)
        for p in rest_permutation:
            result.append(first + p)
    return result

Input can be a string or a list

print(permutation('abcd'))
print(permutation(['a', 'b', 'c', 'd']))

回答 22

免责声明:软件包作者提供的无形插件。:)

猪手包是从大多数实现不同,它产生不实际包含的排列,而是描述的排列和各位置之间的映射关系的顺序,使其能够工作,排列非常大名单“,如图所示伪名单在这个演示中,执行了一个非常瞬时的操作并在“包含”字母中所有字母排列的伪列表中进行查找,而没有使用比典型的Web页面更多的内存或处理。

无论如何,要生成排列列表,我们可以执行以下操作。

import trotter

my_permutations = trotter.Permutations(3, [1, 2, 3])

print(my_permutations)

for p in my_permutations:
    print(p)

输出:

包含[1、2、3]的6个3排列的伪列表。
[1,2,3]
[1、3、2]
[3,1,2]
[3,2,1]
[2,3,1]
[2,1,3]

Disclaimer: shapeless plug by package author. :)

The trotter package is different from most implementations in that it generates pseudo lists that don’t actually contain permutations but rather describe mappings between permutations and respective positions in an ordering, making it possible to work with very large ‘lists’ of permutations, as shown in this demo which performs pretty instantaneous operations and look-ups in a pseudo-list ‘containing’ all the permutations of the letters in the alphabet, without using more memory or processing than a typical web page.

In any case, to generate a list of permutations, we can do the following.

import trotter

my_permutations = trotter.Permutations(3, [1, 2, 3])

print(my_permutations)

for p in my_permutations:
    print(p)

Output:

A pseudo-list containing 6 3-permutations of [1, 2, 3].
[1, 2, 3]
[1, 3, 2]
[3, 1, 2]
[3, 2, 1]
[2, 3, 1]
[2, 1, 3]

回答 23

生成所有可能的排列

我正在使用python3.4:

def calcperm(arr, size):
    result = set([()])
    for dummy_idx in range(size):
        temp = set()
        for dummy_lst in result:
            for dummy_outcome in arr:
                if dummy_outcome not in dummy_lst:
                    new_seq = list(dummy_lst)
                    new_seq.append(dummy_outcome)
                    temp.add(tuple(new_seq))
        result = temp
    return result

测试用例:

lst = [1, 2, 3, 4]
#lst = ["yellow", "magenta", "white", "blue"]
seq = 2
final = calcperm(lst, seq)
print(len(final))
print(final)

Generate all possible permutations

I’m using python3.4:

def calcperm(arr, size):
    result = set([()])
    for dummy_idx in range(size):
        temp = set()
        for dummy_lst in result:
            for dummy_outcome in arr:
                if dummy_outcome not in dummy_lst:
                    new_seq = list(dummy_lst)
                    new_seq.append(dummy_outcome)
                    temp.add(tuple(new_seq))
        result = temp
    return result

Test Cases:

lst = [1, 2, 3, 4]
#lst = ["yellow", "magenta", "white", "blue"]
seq = 2
final = calcperm(lst, seq)
print(len(final))
print(final)

回答 24

为了节省大家的搜索和实验时间,以下是Python中的非递归置换解决方案,该解决方案也适用于Numba(自0.41版起):

@numba.njit()
def permutations(A, k):
    r = [[i for i in range(0)]]
    for i in range(k):
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r
permutations([1,2,3],3)
[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

给人印象的表现:

%timeit permutations(np.arange(5),5)

243 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 406 ms

%timeit list(itertools.permutations(np.arange(5),5))
15.9 µs ± 8.61 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
time: 12.9 s

因此,仅当必须从njitted函数调用它时才使用此版本,否则,请选择itertools实现。

To save you folks possible hours of searching and experimenting, here’s the non-recursive permutaions solution in Python which also works with Numba (as of v. 0.41):

@numba.njit()
def permutations(A, k):
    r = [[i for i in range(0)]]
    for i in range(k):
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r
permutations([1,2,3],3)
[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

To give an impression about performance:

%timeit permutations(np.arange(5),5)

243 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 406 ms

%timeit list(itertools.permutations(np.arange(5),5))
15.9 µs ± 8.61 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
time: 12.9 s

So use this version only if you have to call it from njitted function, otherwise prefer itertools implementation.


回答 25

我看到这些递归函数内部进行了很多迭代,而并非完全是函数递归…

因此对于那些甚至无法遵守一个循环的人来说,这是一个总的,完全不必要的完全递归解决方案

def all_insert(x, e, i=0):
    return [x[0:i]+[e]+x[i:]] + all_insert(x,e,i+1) if i<len(x)+1 else []

def for_each(X, e):
    return all_insert(X[0], e) + for_each(X[1:],e) if X else []

def permute(x):
    return [x] if len(x) < 2 else for_each( permute(x[1:]) , x[0])


perms = permute([1,2,3])

I see a lot of iteration going on inside these recursive functions, not exactly pure recursion…

so for those of you who cannot abide by even a single loop, here’s a gross, totally unnecessary fully recursive solution

def all_insert(x, e, i=0):
    return [x[0:i]+[e]+x[i:]] + all_insert(x,e,i+1) if i<len(x)+1 else []

def for_each(X, e):
    return all_insert(X[0], e) + for_each(X[1:],e) if X else []

def permute(x):
    return [x] if len(x) < 2 else for_each( permute(x[1:]) , x[0])


perms = permute([1,2,3])

回答 26

另一个解决方案:

def permutation(flag, k =1 ):
    N = len(flag)
    for i in xrange(0, N):
        if flag[i] != 0:
            continue
        flag[i] = k 
        if k == N:
            print flag
        permutation(flag, k+1)
        flag[i] = 0

permutation([0, 0, 0])

Another solution:

def permutation(flag, k =1 ):
    N = len(flag)
    for i in xrange(0, N):
        if flag[i] != 0:
            continue
        flag[i] = k 
        if k == N:
            print flag
        permutation(flag, k+1)
        flag[i] = 0

permutation([0, 0, 0])

回答 27

我的Python解决方案:

def permutes(input,offset):
    if( len(input) == offset ):
        return [''.join(input)]

    result=[]        
    for i in range( offset, len(input) ):
         input[offset], input[i] = input[i], input[offset]
         result = result + permutes(input,offset+1)
         input[offset], input[i] = input[i], input[offset]
    return result

# input is a "string"
# return value is a list of strings
def permutations(input):
    return permutes( list(input), 0 )

# Main Program
print( permutations("wxyz") )

My Python Solution:

def permutes(input,offset):
    if( len(input) == offset ):
        return [''.join(input)]

    result=[]        
    for i in range( offset, len(input) ):
         input[offset], input[i] = input[i], input[offset]
         result = result + permutes(input,offset+1)
         input[offset], input[i] = input[i], input[offset]
    return result

# input is a "string"
# return value is a list of strings
def permutations(input):
    return permutes( list(input), 0 )

# Main Program
print( permutations("wxyz") )

回答 28

def permutation(word, first_char=None):
    if word == None or len(word) == 0: return []
    if len(word) == 1: return [word]

    result = []
    first_char = word[0]
    for sub_word in permutation(word[1:], first_char):
        result += insert(first_char, sub_word)
    return sorted(result)

def insert(ch, sub_word):
    arr = [ch + sub_word]
    for i in range(len(sub_word)):
        arr.append(sub_word[i:] + ch + sub_word[:i])
    return arr


assert permutation(None) == []
assert permutation('') == []
assert permutation('1')  == ['1']
assert permutation('12') == ['12', '21']

print permutation('abc')

输出:[‘abc’,’acb’,’bac’,’bca’,’cab’,’cba’]

def permutation(word, first_char=None):
    if word == None or len(word) == 0: return []
    if len(word) == 1: return [word]

    result = []
    first_char = word[0]
    for sub_word in permutation(word[1:], first_char):
        result += insert(first_char, sub_word)
    return sorted(result)

def insert(ch, sub_word):
    arr = [ch + sub_word]
    for i in range(len(sub_word)):
        arr.append(sub_word[i:] + ch + sub_word[:i])
    return arr


assert permutation(None) == []
assert permutation('') == []
assert permutation('1')  == ['1']
assert permutation('12') == ['12', '21']

print permutation('abc')

Output: [‘abc’, ‘acb’, ‘bac’, ‘bca’, ‘cab’, ‘cba’]


回答 29

使用 Counter

from collections import Counter

def permutations(nums):
    ans = [[]]
    cache = Counter(nums)

    for idx, x in enumerate(nums):
        result = []
        for items in ans:
            cache1 = Counter(items)
            for id, n in enumerate(nums):
                if cache[n] != cache1[n] and items + [n] not in result:
                    result.append(items + [n])

        ans = result
    return ans
permutations([1, 2, 2])
> [[1, 2, 2], [2, 1, 2], [2, 2, 1]]

Using Counter

from collections import Counter

def permutations(nums):
    ans = [[]]
    cache = Counter(nums)

    for idx, x in enumerate(nums):
        result = []
        for items in ans:
            cache1 = Counter(items)
            for id, n in enumerate(nums):
                if cache[n] != cache1[n] and items + [n] not in result:
                    result.append(items + [n])

        ans = result
    return ans
permutations([1, 2, 2])
> [[1, 2, 2], [2, 1, 2], [2, 2, 1]]


随机播放DataFrame行

问题:随机播放DataFrame行

我有以下DataFrame:

    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
...
20     7     8     9     2
21    10    11    12     2
...
45    13    14    15     3
46    16    17    18     3
...

从csv文件读取DataFrame。所有具有Type1的行都在最上面,然后是具有Type2 的行,然后是具有Type3 的行,依此类推。

我想重新整理DataFrame行的顺序,以便将所有行Type混合在一起。可能的结果可能是:

    Col1  Col2  Col3  Type
0      7     8     9     2
1     13    14    15     3
...
20     1     2     3     1
21    10    11    12     2
...
45     4     5     6     1
46    16    17    18     3
...

我该如何实现?

I have the following DataFrame:

    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
...
20     7     8     9     2
21    10    11    12     2
...
45    13    14    15     3
46    16    17    18     3
...

The DataFrame is read from a csv file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc.

I would like to shuffle the order of the DataFrame’s rows, so that all Type‘s are mixed. A possible result could be:

    Col1  Col2  Col3  Type
0      7     8     9     2
1     13    14    15     3
...
20     1     2     3     1
21    10    11    12     2
...
45     4     5     6     1
46    16    17    18     3
...

How can I achieve this?


回答 0

使用Pandas的惯用方式是使用.sample数据框的方法对所有行进行采样而无需替换:

df.sample(frac=1)

frac关键字参数指定的行的分数到随机样品中返回,所以frac=1装置返回所有行(随机顺序)。


注意: 如果您希望就地改组数据帧并重置索引,则可以执行例如

df = df.sample(frac=1).reset_index(drop=True)

在此,指定drop=True可防止.reset_index创建包含旧索引条目的列。

后续注解:尽管上面的操作似乎并不就位,但是python / pandas足够聪明,不会为经过改组的对象做另一个malloc。也就是说,即使参考对象已更改(我的意思id(df_old)是与相同id(df_new)),底层C对象仍然相同。为了证明确实如此,您可以运行一个简单的内存探查器:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)

The idiomatic way to do this with Pandas is to use the .sample method of your dataframe to sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).


Note: If you wish to shuffle your dataframe in-place and reset the index, you could do e.g.

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.

Follow-up note: Although it may not look like the above operation is in-place, python/pandas is smart enough not to do another malloc for the shuffled object. That is, even though the reference object has changed (by which I mean id(df_old) is not the same as id(df_new)), the underlying C object is still the same. To show that this is indeed the case, you could run a simple memory profiler:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)


回答 1

您可以为此简单地使用sklearn

from sklearn.utils import shuffle
df = shuffle(df)

You can simply use sklearn for this

from sklearn.utils import shuffle
df = shuffle(df)

回答 2

您可以通过使用改组后的索引建立索引来改组数据帧的行。为此,您可以使用np.random.permutation(但np.random.choice也可以):

In [12]: df = pd.read_csv(StringIO(s), sep="\s+")

In [13]: df
Out[13]: 
    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
20     7     8     9     2
21    10    11    12     2
45    13    14    15     3
46    16    17    18     3

In [14]: df.iloc[np.random.permutation(len(df))]
Out[14]: 
    Col1  Col2  Col3  Type
46    16    17    18     3
45    13    14    15     3
20     7     8     9     2
0      1     2     3     1
1      4     5     6     1
21    10    11    12     2

如果要像示例中那样将索引的编号始终保持为1、2,..,n,则只需重置索引即可: df_shuffled.reset_index(drop=True)

You can shuffle the rows of a dataframe by indexing with a shuffled index. For this, you can eg use np.random.permutation (but np.random.choice is also a possibility):

In [12]: df = pd.read_csv(StringIO(s), sep="\s+")

In [13]: df
Out[13]: 
    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
20     7     8     9     2
21    10    11    12     2
45    13    14    15     3
46    16    17    18     3

In [14]: df.iloc[np.random.permutation(len(df))]
Out[14]: 
    Col1  Col2  Col3  Type
46    16    17    18     3
45    13    14    15     3
20     7     8     9     2
0      1     2     3     1
1      4     5     6     1
21    10    11    12     2

If you want to keep the index numbered from 1, 2, .., n as in your example, you can simply reset the index: df_shuffled.reset_index(drop=True)


回答 3

TL; DRnp.random.shuffle(ndarray)可以胜任。
所以,在你的情况下

np.random.shuffle(DataFrame.values)

DataFrame在后台,使用NumPy ndarray作为数据持有者。(您可以从DataFrame源代码检查)

因此,如果使用np.random.shuffle(),它将沿多维数组的第一个轴随机排列数组。但是DataFrame遗体的索引仍然没有改组。

虽然,有一些要考虑的问题。

  • 函数不返回任何内容。如果要保留原始对象的副本,则必须这样做,然后再传递给该函数。
  • sklearn.utils.shuffle(),如用户tj89所建议的那样,可以指定random_state其他选项来控制输出。您可能需要出于开发目的。
  • sklearn.utils.shuffle()是比较快的。但洗牌的轴信息(索引,列)DataFrame与沿ndarray它包含的内容。

基准结果

sklearn.utils.shuffle()和之间np.random.shuffle()

ndarray

nd = sklearn.utils.shuffle(nd)

0.10793248389381915秒 快8倍

np.random.shuffle(nd)

0.8897626010002568秒

数据框

df = sklearn.utils.shuffle(df)

0.3183923360193148秒 快3倍

np.random.shuffle(df.values)

0.9357550159329548秒

结论:如果可以将轴信息(索引,列)与ndarray一起改组,请使用sklearn.utils.shuffle()。否则,使用np.random.shuffle()

使用的代码

import timeit
setup = '''
import numpy as np
import pandas as pd
import sklearn
nd = np.random.random((1000, 100))
df = pd.DataFrame(nd)
'''

timeit.timeit('nd = sklearn.utils.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('df = sklearn.utils.shuffle(df)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(df.values)', setup=setup, number=1000)

TL;DR: np.random.shuffle(ndarray) can do the job.
So, in your case

np.random.shuffle(DataFrame.values)

DataFrame, under the hood, uses NumPy ndarray as data holder. (You can check from DataFrame source code)

So if you use np.random.shuffle(), it would shuffles the array along the first axis of a multi-dimensional array. But index of the DataFrame remains unshuffled.

Though, there are some points to consider.

  • function returns none. In case you want to keep a copy of the original object, you have to do so before you pass to the function.
  • sklearn.utils.shuffle(), as user tj89 suggested, can designate random_state along with another option to control output. You may want that for dev purpose.
  • sklearn.utils.shuffle() is faster. But WILL SHUFFLE the axis info(index, column) of the DataFrame along with the ndarray it contains.

Benchmark result

between sklearn.utils.shuffle() and np.random.shuffle().

ndarray

nd = sklearn.utils.shuffle(nd)

0.10793248389381915 sec. 8x faster

np.random.shuffle(nd)

0.8897626010002568 sec

DataFrame

df = sklearn.utils.shuffle(df)

0.3183923360193148 sec. 3x faster

np.random.shuffle(df.values)

0.9357550159329548 sec

Conclusion: If it is okay to axis info(index, column) to be shuffled along with ndarray, use sklearn.utils.shuffle(). Otherwise, use np.random.shuffle()

used code

import timeit
setup = '''
import numpy as np
import pandas as pd
import sklearn
nd = np.random.random((1000, 100))
df = pd.DataFrame(nd)
'''

timeit.timeit('nd = sklearn.utils.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(nd)', setup=setup, number=1000)
timeit.timeit('df = sklearn.utils.shuffle(df)', setup=setup, number=1000)
timeit.timeit('np.random.shuffle(df.values)', setup=setup, number=1000)


回答 4

(我没有足够的声誉在最高职位上对此发表评论,所以我希望其他人可以为我这样做。)第一种方法引起了人们的关注:

df.sample(frac=1)

进行深拷贝或只是更改数据框。我运行了以下代码:

print(hex(id(df)))
print(hex(id(df.sample(frac=1))))
print(hex(id(df.sample(frac=1).reset_index(drop=True))))

我的结果是:

0x1f8a784d400
0x1f8b9d65e10
0x1f8b9d65b70

这意味着该方法返回上一个注释中建议的相同对象。因此,此方法的确可以制作随机的副本

(I don’t have enough reputation to comment this on the top post, so I hope someone else can do that for me.) There was a concern raised that the first method:

df.sample(frac=1)

made a deep copy or just changed the dataframe. I ran the following code:

print(hex(id(df)))
print(hex(id(df.sample(frac=1))))
print(hex(id(df.sample(frac=1).reset_index(drop=True))))

and my results were:

0x1f8a784d400
0x1f8b9d65e10
0x1f8b9d65b70

which means the method is not returning the same object, as was suggested in the last comment. So this method does indeed make a shuffled copy.


回答 5

还有用的是,如果将其用于Machine_learning并且希望始终分离相同的数据,则可以使用:

df.sample(n=len(df), random_state=42)

这样可以确保您的随机选择始终可复制

What is also useful, if you use it for Machine_learning and want to seperate always the same data, you could use:

df.sample(n=len(df), random_state=42)

this makes sure, that you keep your random choice always replicatable


回答 6

AFAIK最简单的解决方案是:

df_shuffled = df.reindex(np.random.permutation(df.index))

AFAIK the simplest solution is:

df_shuffled = df.reindex(np.random.permutation(df.index))

回答 7

通过取样阵列中的这种情况下,洗牌大熊猫数据帧索引和随机那么它的顺序来设置所述阵列的数据帧的索引。现在根据索引对数据帧进行排序。这是您经过改组的数据框

import random
df = pd.DataFrame({"a":[1,2,3,4],"b":[5,6,7,8]})
index = [i for i in range(df.shape[0])]
random.shuffle(index)
df.set_index([index]).sort_index()

输出

    a   b
0   2   6
1   1   5
2   3   7
3   4   8

在上面的代码中将数据框插入我的位置。

shuffle the pandas data frame by taking a sample array in this case index and randomize its order then set the array as an index of data frame. Now sort the data frame according to index. Here goes your shuffled dataframe

import random
df = pd.DataFrame({"a":[1,2,3,4],"b":[5,6,7,8]})
index = [i for i in range(df.shape[0])]
random.shuffle(index)
df.set_index([index]).sort_index()

output

    a   b
0   2   6
1   1   5
2   3   7
3   4   8

Insert you data frame in the place of mine in above code .


回答 8

这是另一种方式:

df['rnd'] = np.random.rand(len(df)) df = df.sort_values(by='rnd', inplace=True).drop('rnd', axis=1)

Here is another way:

df['rnd'] = np.random.rand(len(df)) df = df.sort_values(by='rnd', inplace=True).drop('rnd', axis=1)