如何将numpy数组列表转换为单个numpy数组?

问题:如何将numpy数组列表转换为单个numpy数组?

假设我有;

LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays

我尝试转换;

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5])

我现在正在vstack上通过迭代来解决它,但是对于特别大的LIST来说确实很慢

您对最佳有效方法有何建议?

Suppose I have ;

LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays

I try to convert;

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5])

I am solving it by iteration on vstack right now but it is really slow for especially large LIST

What do you suggest for the best efficient way?


回答 0

通常,您可以沿任意轴连接整个数组序列:

numpy.concatenate( LIST, axis=0 )

但是你必须对列表中的形状和每个阵列的维度担心(用于2维3×5的输出,你需要确保它们都是2维正由-5阵列的话)。如果要将一维数组连接为二维输出的行,则需要扩展其维数。

正如Jorge的答案所指出的那样,还有stacknumpy 1.10中引入的function :

numpy.stack( LIST, axis=0 )

这采用了补充方法:在连接之前,它会为每个输入数组创建一个新视图并添加一个额外的维数(在这种情况下,在左侧,因此每个n元素1D数组将变为1 x n2D数组)。仅当所有输入数组都具有相同的形状时才有效(即使沿着串联轴也是如此)。

vstack(或等价的row_stack)通常是一个更易于使用的解决方案,因为它将采用一维和/或二维数组序列,并在将整个列表连接在一起之前,在必要时且仅在必要时才自动扩展维数。在需要新尺寸的地方,将其添加到左侧。同样,您可以一次串联整个列表,而无需进行迭代:

numpy.vstack( LIST )

语法快捷方式也显示了这种灵活的行为numpy.r_[ array1, ...., arrayN ](请注意方括号)。这对于连接几个显式命名的数组很有用,但对您的情况不利,因为此语法将不接受数组序列,例如your LIST

还有一个类似的函数column_stack和快捷方式c_[...],用于水平(列方式)堆叠,以及一个几乎类似的函数hstack -尽管出于某种原因,后者的灵活性较差(它对输入数组的维数更为严格,并试图进行串联)一维数组首尾相连,而不是将它们视为列。

最后,在垂直堆叠一维数组的特定情况下,以下内容也适用:

numpy.array( LIST )

…因为数组可以从其他数组序列中构造出来,因此在开头增加了新的维度。

In general you can concatenate a whole sequence of arrays along any axis:

numpy.concatenate( LIST, axis=0 )

but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3×5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.

As Jorge’s answer points out, there is also the function stack, introduced in numpy 1.10:

numpy.stack( LIST, axis=0 )

This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape—even along the axis of concatenation.

vstack (or equivalently row_stack) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:

numpy.vstack( LIST )

This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.

There is also an analogous function column_stack and shortcut c_[...], for horizontal (column-wise) stacking, as well as an almost-analogous function hstack—although for some reason the latter is less flexible (it is stricter about input arrays’ dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).

Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:

numpy.array( LIST )

…because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.


回答 1

从NumPy 1.10版开始,我们有了方法stack。它可以堆叠任何维度的数组(全部相等):

# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]

# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True

M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)

# This are all true    
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True

请享用,

Starting in NumPy version 1.10, we have the method stack. It can stack arrays of any dimension (all equal):

# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]

# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True

M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)

# This are all true    
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True

Enjoy,


回答 2

我检查了一些提高速度性能的方法,发现没有什么不同! 唯一的区别是,使用某些方法必须仔细检查尺寸。

定时:

|------------|----------------|-------------------|
|            | shape (10000)  |  shape (1,10000)  |
|------------|----------------|-------------------|
| np.concat  |    0.18280     |      0.17960      |
|------------|----------------|-------------------|
|  np.stack  |    0.21501     |      0.16465      |
|------------|----------------|-------------------|
| np.vstack  |    0.21501     |      0.17181      |
|------------|----------------|-------------------|
|  np.array  |    0.21656     |      0.16833      |
|------------|----------------|-------------------|

如您所见,我尝试了2个实验-使用np.random.rand(10000)np.random.rand(1, 10000) 如果我们使用2d数组,则np.stacknp.array创建附加维度-result.shape是(1,10000,10000)和(10000,1,10000),那么他们需要采取其他措施来避免这种情况。

码:

from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
    new_np = np.random.rand(10000)
    l.append(new_np)



start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')

start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')

start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')

start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')

I checked some of the methods for speed performance and find that there is no difference! The only difference is that using some methods you must carefully check dimension.

Timing:

|------------|----------------|-------------------|
|            | shape (10000)  |  shape (1,10000)  |
|------------|----------------|-------------------|
| np.concat  |    0.18280     |      0.17960      |
|------------|----------------|-------------------|
|  np.stack  |    0.21501     |      0.16465      |
|------------|----------------|-------------------|
| np.vstack  |    0.21501     |      0.17181      |
|------------|----------------|-------------------|
|  np.array  |    0.21656     |      0.16833      |
|------------|----------------|-------------------|

As you can see I tried 2 experiments – using np.random.rand(10000) and np.random.rand(1, 10000) And if we use 2d arrays than np.stack and np.array create additional dimension – result.shape is (1,10000,10000) and (10000,1,10000) so they need additional actions to avoid this.

Code:

from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
    new_np = np.random.rand(10000)
    l.append(new_np)



start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')

start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')

start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')

start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')