Question: Copy a 2D array into a 3rd dimension, N times (Python)
I’d like to copy a numpy 2D array into a third dimension. For example, given the (2D) numpy array:
import numpy as np
arr = np.array([[1,2],[1,2]])
# arr.shape = (2, 2)
convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:
new_arr = np.array([[[1,2],[1,2]],[[1,2],[1,2]],[[1,2],[1,2]]])
# new_arr.shape = (3, 2, 2)
Answer 0
Probably the cleanest way is to use np.repeat:
a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2, 2)
# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)
print(b.shape)
# (2, 2, 3)
print(b[:, :, 0])
# [[1 2]
# [1 2]]
print(b[:, :, 1])
# [[1 2]
# [1 2]]
print(b[:, :, 2])
# [[1 2]
# [1 2]]
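The question asks for shape (N, 2, 2) rather than (2, 2, N). As a minimal sketch of the same np.repeat idea with the new axis placed first (the names arr and n_copies are illustrative, not taken from the answer above):

```python
import numpy as np

arr = np.array([[1, 2], [1, 2]])
n_copies = 3  # illustrative name for N

# Insert the new axis at the front, then repeat along it.
new_arr = np.repeat(arr[np.newaxis, :, :], n_copies, axis=0)

print(new_arr.shape)
# (3, 2, 2)
```

Each new_arr[i] is then an independent copy of arr.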
Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let’s say I wanted to add a (3,) vector:
c = np.array([1, 2, 3])
to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it’s much simpler and quicker to do this:
d = a[..., None] + c[None, None, :]
Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets ‘broadcast’ out along the dimensions of size 1, giving me a result of shape (2, 2, 3):
print(d.shape)
# (2, 2, 3)
print(d[..., 0]) # a + c[0]
# [[2 3]
# [2 3]]
print(d[..., 1]) # a + c[1]
# [[3 4]
# [3 4]]
print(d[..., 2]) # a + c[2]
# [[4 5]
# [4 5]]
Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.
* Although I included them for clarity, the None indices into c aren’t actually necessary: you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:
a = np.ones((6, 1, 4, 3, 1)) # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2)) # 5 x 1 x 3 x 2
result = a + b # 6 x 5 x 4 x 3 x 2
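One way to check a broadcast shape without doing the arithmetic is np.broadcast, which only inspects the shapes. The sketch below reuses the two arrays from the example above; the array bad is a made-up counterexample whose trailing dimensions do not line up:

```python
import numpy as np

a = np.ones((6, 1, 4, 3, 1))
b = np.ones((5, 1, 3, 2))

# np.broadcast works out the resulting shape without computing a + b.
print(np.broadcast(a, b).shape)
# (6, 5, 4, 3, 2)

# Trailing dimensions must match or be 1; (2, 3) against (..., 3, 1) fails
# because 2 is neither equal to 3 nor equal to 1.
bad = np.ones((2, 3))
try:
    a + bad
    compatible = True
except ValueError:
    compatible = False
print(compatible)
# False
```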
Answer 1
Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:
import numpy as np
b = np.dstack([a]*num_repeats)
The trick is to wrap the matrix a in a single-element list, then use the * operator to duplicate the element in this list num_repeats times.
For example, if:
a = np.array([[1, 2], [1, 2]])
num_repeats = 5
This repeats the array [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):
In [110]: import numpy as np
In [111]: num_repeats = 5
In [112]: a = np.array([[1, 2], [1, 2]])
In [113]: b = np.dstack([a]*num_repeats)
In [114]: b[:,:,0]
Out[114]:
array([[1, 2],
       [1, 2]])
In [115]: b[:,:,1]
Out[115]:
array([[1, 2],
       [1, 2]])
In [116]: b[:,:,2]
Out[116]:
array([[1, 2],
       [1, 2]])
In [117]: b[:,:,3]
Out[117]:
array([[1, 2],
       [1, 2]])
In [118]: b[:,:,4]
Out[118]:
array([[1, 2],
       [1, 2]])
In [119]: b.shape
Out[119]: (2, 2, 5)
At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.
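np.dstack always places the copies along the third axis. If the copies should go along a different axis (for example the first, as in the question), np.stack accepts an axis argument. A small sketch under that assumption, reusing a and num_repeats from above:

```python
import numpy as np

a = np.array([[1, 2], [1, 2]])
num_repeats = 5

first = np.stack([a] * num_repeats, axis=0)   # copies along a new first axis
last = np.stack([a] * num_repeats, axis=-1)   # copies along a new last axis

print(first.shape)
# (5, 2, 2)
print(last.shape)
# (2, 2, 5)

# For 2D inputs, stacking on the last axis matches np.dstack.
print(np.array_equal(last, np.dstack([a] * num_repeats)))
# True
```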
Answer 2
Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim
numpy.broadcast_to, introduced in NumPy 1.10.0, lets us simply generate a 3D view into the 2D input array. The benefit is no extra memory overhead and virtually free runtime. This is essential when the arrays are big and we are okay to work with views. It also works for generic n-dim cases.
I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.
Stack along first axis
If we want to stack input arr along the first axis, the solution with np.broadcast_to to create a 3D view would be:
np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here
Stack along third/last axis
To stack input arr along the third axis, the solution to create a 3D view would be:
np.broadcast_to(arr[...,None],arr.shape+(3,))
If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be:
np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()
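One practical reason to append .copy() is that the view returned by np.broadcast_to is read-only. The following sketch illustrates this with a small made-up input (the names view, stacked and wrote_to_view are illustrative):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

view = np.broadcast_to(arr, (3,) + arr.shape)
print(view.flags.writeable)
# False

# Assigning into the read-only view raises ValueError.
try:
    view[0, 0, 0] = 99
    wrote_to_view = True
except ValueError:
    wrote_to_view = False
print(wrote_to_view)
# False

# The copy owns its memory, so each slice can be modified independently.
stacked = view.copy()
stacked[0, 0, 0] = 99
print(stacked[:, 0, 0].tolist())
# [99, 0, 0]
```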
Here’s how the stacking works for the two cases, shown with the resulting shapes for a sample input:
# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)
# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)
# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)
The same solutions work to extend an n-dim input to an n+1-dim view output along the first and last axes. Let’s explore some higher-dim cases:
3D input case:
In [58]: arr = np.random.rand(4,5,6)
# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)
# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)
4D input case:
In [61]: arr = np.random.rand(4,5,6,7)
# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)
# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)
and so on.
Timings
Let’s use a large sample 2D case, get the timings, and verify that the output is a view.
# Sample input array
In [19]: arr = np.random.rand(1000,1000)
Let’s show that the proposed solution is indeed a view. We will use stacking along the first axis (results would be very similar for stacking along the third axis):
In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True
Let’s get the timings to show that it’s virtually free:
In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop
In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop
Because the result is a view, increasing N from 3 to 3000 did not change the timings, and both are negligible. Hence, it is efficient in both memory and runtime!
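The reason the cost does not depend on N can be seen from the strides of the view. As a rough sketch (the names small and large are illustrative), the new axis has stride 0, so every index along it points at the same memory:

```python
import numpy as np

arr = np.random.rand(1000, 1000)

small = np.broadcast_to(arr, (3,) + arr.shape)
large = np.broadcast_to(arr, (3000,) + arr.shape)

# Stride 0 on the new leading axis: stepping along it never moves in memory.
print(small.strides[0], large.strides[0])
# 0 0

# The remaining strides are simply those of the input array.
print(small.strides[1:] == arr.strides)
# True
```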
Answer 3
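import numpy as np
N = 3  # number of copies
A = np.array([[1, 2], [3, 4]])
B = np.asarray([A] * N)  # shape (N, 2, 2)
Edit @Mr.F, to put the copies along the last axis instead (note that B.T reverses all axes, so it would also transpose each copy of A):
B = B.transpose(1, 2, 0)  # shape (2, 2, N)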
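Because A here is not symmetric, it is worth checking what each axis permutation does to the individual slices. A short sketch comparing .T (which reverses every axis) with transpose(1, 2, 0) (which only moves the copy axis to the end); the names reversed_axes and moved_axis are illustrative:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
N = 3
B = np.asarray([A] * N)            # shape (N, 2, 2)

reversed_axes = B.T                # reverses every axis: shape (2, 2, N)
moved_axis = B.transpose(1, 2, 0)  # only moves the copy axis to the end

# With .T, each slice along the last axis is the transpose of A.
print(np.array_equal(reversed_axes[:, :, 0], A.T))
# True

# With transpose(1, 2, 0), each slice is A itself.
print(np.array_equal(moved_axis[:, :, 0], A))
# True
```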
Answer 4
Here’s a broadcasting example that does exactly what was requested.
a = np.array([[1, 2], [1, 2]])
a=a[:,:,None]
b=np.array([1]*5)[None,None,:]
Then b*a is the desired result, and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.
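As a quick sanity check, the product with an array of ones can be compared against np.repeat. This is only a sketch; it builds the ones with np.ones rather than np.array([1]*5), and the names product and repeated are illustrative:

```python
import numpy as np

a = np.array([[1, 2], [1, 2]])
b = np.ones(5, dtype=a.dtype)

# Broadcasting (2, 2, 1) against (1, 1, 5) yields a new (2, 2, 5) array.
product = a[:, :, None] * b[None, None, :]
repeated = np.repeat(a[:, :, None], 5, axis=2)

print(product.shape)
# (2, 2, 5)
print(np.array_equal(product, repeated))
# True
```

Unlike a broadcast view, the multiplication allocates the full result in memory.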
Answer 5
This can now also be achieved using np.tile as follows:
import numpy as np
a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))
b.shape
(3, 2, 2)
b
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]]])
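If the copies should instead go along the last axis, the same np.tile call can be applied after adding a trailing axis of length 1. A minimal sketch (b_last is an illustrative name):

```python
import numpy as np

a = np.array([[1, 2], [1, 2]])

# Add a trailing axis of length 1, then tile 3 times along that axis only.
b_last = np.tile(a[:, :, None], (1, 1, 3))

print(b_last.shape)
# (2, 2, 3)
print(np.array_equal(b_last[:, :, 2], a))
# True
```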