使用numpy构建两个数组的所有组合的数组

问题:使用numpy构建两个数组的所有组合的数组

我试图在尝试使用6参数函数的参数空间之前先研究它的数值行为,然后再尝试对其进行复杂的处理,因此我正在寻找一种有效的方法来执行此操作。

我的函数采用给定6维numpy数组作为输入的float值。我最初尝试做的是:

首先,我创建了一个函数,该函数接受2个数组并生成一个包含两个数组中值的所有组合的数组

from numpy import *
def comb(a,b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

然后我将reduce()其应用于同一数组的m个副本:

def combs(a,m):
    return reduce(comb,[a]*m)

然后我像这样评估我的功能:

values = combs(np.arange(0,1,0.1),6)
for val in values:
    print F(val)

这有效,但是太慢了。我知道参数的空间很大,但这不应该太慢。在此示例中,我仅采样了10 6(一百万)个点,仅花费了15秒以上的时间便创建了数组values

您知道使用numpy进行此操作的更有效的方法吗?

F如果需要,我可以修改函数接受参数的方式。

I’m trying to run over the parameters space of a 6 parameter function to study it’s numerical behavior before trying to do anything complex with it so I’m searching for a efficient way to do this.

My function takes float values given a 6-dim numpy array as input. What I tried to do initially was this:

First I created a function that takes 2 arrays and generate an array with all combinations of values from the two arrays

from numpy import *
def comb(a,b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

Then I used reduce() to apply that to m copies of the same array:

def combs(a,m):
    return reduce(comb,[a]*m)

And then I evaluate my function like this:

values = combs(np.arange(0,1,0.1),6)
for val in values:
    print F(val)

This works but it’s waaaay too slow. I know the space of parameters is huge, but this shouldn’t be so slow. I have only sampled 106 (a million) points in this example and it took more than 15 seconds just to create the array values.

Do you know any more efficient way of doing this with numpy?

I can modify the way the function F takes it’s arguments if it’s necessary.


回答 0

numpy(> 1.8.x)的较新版本中,numpy.meshgrid()提供了更快的实现:

@PV的解决方案

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid()只能用于2D,现在可以ND。在这种情况下,3D:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

请注意,最终结果的顺序略有不同。

In newer version of numpy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

@pv’s solution

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() use to be 2D only, now it is capable of ND. In this case, 3D:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final resultant is slightly different.


回答 1

这是一个纯粹的numpy实现。它比使用itertools快约5倍。


import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

Here’s a pure-numpy implementation. It’s about 5× faster than using itertools.


import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

回答 2

通常,itertools.combinations是从Python容器中获取组合的最快方法(如果您实际上确实想要组合,即无重复且无顺序的安排;那不是您的代码看起来正在做的事情,但是我做不到判断这是因为您的代码有错误还是因为您使用了错误的术语)。

如果您想要的不是组合,则itertools中的其他迭代器productpermutations可能会为您提供更好的服务。例如,看起来您的代码与以下代码大致相同:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

所有这些迭代器都会生成元组,而不是列表或numpy数组,因此,如果您的F对要专门获取一个numpy数组很挑剔,则您将不得不承担在每一步构造或清除和重新填充一个数组的额外开销。

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements WITHOUT repetitions and independent of order; that’s not what your code appears to be doing, but I can’t tell whether that’s because your code is buggy or because you’re using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array you’ll have to accept the extra overhead of constructing or clearing and re-filling one at each step.


回答 3

你可以做这样的事情

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)        
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a])

这使

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ..., 
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])

You can do something like this

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)        
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a])

which gives

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ..., 
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])

回答 4

以下numpy实现应为大约。给定答案速度的2倍:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix

The following numpy implementation should be approx. 2x the speed of the given answer:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix

回答 5

看起来您想让网格评估您的功能,在这种情况下,您可以使用numpy.ogrid(打开)或numpy.mgrid(完善):

import numpy
my_grid = numpy.mgrid[[slice(0,1,0.1)]*6]

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy
my_grid = numpy.mgrid[[slice(0,1,0.1)]*6]

回答 6

您可以使用 np.array(itertools.product(a, b))

you can use np.array(itertools.product(a, b))


回答 7

这是使用纯NumPy,没有递归,没有列表理解以及没有明确的for循环的另一种方式。它比原始答案慢20%,并且基于np.meshgrid。

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
    dim = len(mesh)  # number of dimensions
    elements = mesh[0].size  # number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
    return reshape

例如,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ..., 
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

Here’s yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It’s about 20% slower than the original answer, and it’s based on np.meshgrid.

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
    dim = len(mesh)  # number of dimensions
    elements = mesh[0].size  # number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ..., 
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

回答 8

对于一维数组(或平坦的python列表)的笛卡尔积的纯numpy实现,只需使用meshgrid(),用滚动轴transpose(),然后将形状整形为所需的输出:

 def cartprod(*arrays):
     N = len(arrays)
     return transpose(meshgrid(*arrays, indexing='ij'), 
                      roll(arange(N + 1), -1)).reshape(-1, N)

请注意,这具有最后一个轴更改最快的约定(“ C样式”或“行主要”)。

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]: 
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

如果要最快地更改第一轴(“ FORTRAN样式”或“主要列”),只需更改如下order参数reshape()reshape((-1, N), order='F')

For a pure numpy implementation of Cartesian product of 1D arrays (or flat python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired ouput:

 def cartprod(*arrays):
     N = len(arrays)
     return transpose(meshgrid(*arrays, indexing='ij'), 
                      roll(arange(N + 1), -1)).reshape(-1, N)

Note this has the convention of last axis changing fastest (“C style” or “row-major”).

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]: 
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want to change the first axis fastest (“FORTRAN style” or “column-major”), just change the order parameter of reshape() like this: reshape((-1, N), order='F')


回答 9

熊猫merge提供了一个天真的,快速的解决方案:

# given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# get dfs with same, constant index 
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z))

# get all permutations stored in a new df
df = pd.merge(x, pd.merge(y, z, left_index=True, righ_index=True),
              left_index=True, right_index=True)

Pandas merge offers a naive, fast solution to the problem:

# given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# get dfs with same, constant index 
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z))

# get all permutations stored in a new df
df = pd.merge(x, pd.merge(y, z, left_index=True, righ_index=True),
              left_index=True, right_index=True)