问题:“克隆”行或列向量
有时将行或列向量“克隆”到矩阵很有用。克隆是指将行向量转换为
[1,2,3]
入矩阵
[[1,2,3]
[1,2,3]
[1,2,3]
]
或列向量,例如
[1
2
3
]
进入
[[1,1,1]
[2,2,2]
[3,3,3]
]
在matlab或octave中,这很容易做到:
x = [1,2,3]
a = ones(3,1) * x
a =
1 2 3
1 2 3
1 2 3
b = (x') * ones(1,3)
b =
1 1 1
2 2 2
3 3 3
我想以numpy重复此操作,但未成功
In [14]: x = array([1,2,3])
In [14]: ones((3,1)) * x
Out[14]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
# so far so good
In [16]: x.transpose() * ones((1,3))
Out[16]: array([[ 1., 2., 3.]])
# DAMN
# I end up with
In [17]: (ones((3,1)) * x).transpose()
Out[17]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
为什么第一种方法(In [16]
)不起作用?有没有办法以更优雅的方式在python中完成此任务?
Sometimes it is useful to “clone” a row or column vector to a matrix. By cloning I mean converting a row vector such as
[1,2,3]
Into a matrix
[[1,2,3]
[1,2,3]
[1,2,3]
]
or a column vector such as
[1
2
3
]
into
[[1,1,1]
[2,2,2]
[3,3,3]
]
In matlab or octave this is done pretty easily:
x = [1,2,3]
a = ones(3,1) * x
a =
1 2 3
1 2 3
1 2 3
b = (x') * ones(1,3)
b =
1 1 1
2 2 2
3 3 3
I want to repeat this in numpy, but unsuccessfully
In [14]: x = array([1,2,3])
In [14]: ones((3,1)) * x
Out[14]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
# so far so good
In [16]: x.transpose() * ones((1,3))
Out[16]: array([[ 1., 2., 3.]])
# DAMN
# I end up with
In [17]: (ones((3,1)) * x).transpose()
Out[17]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
Why wasn’t the first method (In [16]
) working? Is there a way to achieve this task in python in a more elegant way?
回答 0
这是一种优雅的Python方式:
>>> array([[1,2,3],]*3)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> array([[1,2,3],]*3).transpose()
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
问题[16]
似乎是转置对数组没有作用。您可能需要矩阵:
>>> x = array([1,2,3])
>>> x
array([1, 2, 3])
>>> x.transpose()
array([1, 2, 3])
>>> matrix([1,2,3])
matrix([[1, 2, 3]])
>>> matrix([1,2,3]).transpose()
matrix([[1],
[2],
[3]])
Here’s an elegant, Pythonic way to do it:
>>> array([[1,2,3],]*3)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> array([[1,2,3],]*3).transpose()
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
the problem with [16]
seems to be that the transpose has no effect for an array. you’re probably wanting a matrix instead:
>>> x = array([1,2,3])
>>> x
array([1, 2, 3])
>>> x.transpose()
array([1, 2, 3])
>>> matrix([1,2,3])
matrix([[1, 2, 3]])
>>> matrix([1,2,3]).transpose()
matrix([[1],
[2],
[3]])
回答 1
用途numpy.tile
:
>>> tile(array([1,2,3]), (3, 1))
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
或重复列:
>>> tile(array([[1,2,3]]).transpose(), (1, 3))
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
Use numpy.tile
:
>>> tile(array([1,2,3]), (3, 1))
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
or for repeating columns:
>>> tile(array([[1,2,3]]).transpose(), (1, 3))
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
回答 2
首先请注意,使用numpy的广播操作,通常不必复制行和列。见这个和这个有关描述。
但是要做到这一点,重复和换轴可能是最好的方法
In [12]: x = array([1,2,3])
In [13]: repeat(x[:,newaxis], 3, 1)
Out[13]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
In [14]: repeat(x[newaxis,:], 3, 0)
Out[14]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
这个例子是针对行向量的,但是将其应用于列向量是显而易见的。重复似乎很好地说明了这一点,但是您也可以像示例一样通过乘法来完成
In [15]: x = array([[1, 2, 3]]) # note the double brackets
In [16]: (ones((3,1))*x).transpose()
Out[16]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
First note that with numpy’s broadcasting operations it’s usually not necessary to duplicate rows and columns. See this and this for descriptions.
But to do this, repeat and newaxis are probably the best way
In [12]: x = array([1,2,3])
In [13]: repeat(x[:,newaxis], 3, 1)
Out[13]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
In [14]: repeat(x[newaxis,:], 3, 0)
Out[14]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
This example is for a row vector, but applying this to a column vector is hopefully obvious. repeat seems to spell this well, but you can also do it via multiplication as in your example
In [15]: x = array([[1, 2, 3]]) # note the double brackets
In [16]: (ones((3,1))*x).transpose()
Out[16]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
回答 3
让:
>>> n = 1000
>>> x = np.arange(n)
>>> reps = 10000
零成本分配
甲视图不采取任何附加的存储器。因此,这些声明是瞬时的:
# New axis
x[np.newaxis, ...]
# Broadcast to specific shape
np.broadcast_to(x, (reps, n))
强制分配
如果要强制内容驻留在内存中:
>>> %timeit np.array(np.broadcast_to(x, (reps, n)))
10.2 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.repeat(x[np.newaxis, :], reps, axis=0)
9.88 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.tile(x, (reps, 1))
9.97 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
这三种方法的速度大致相同。
计算方式
>>> a = np.arange(reps * n).reshape(reps, n)
>>> x_tiled = np.tile(x, (reps, 1))
>>> %timeit np.broadcast_to(x, (reps, n)) * a
17.1 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x[np.newaxis, :] * a
17.5 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x_tiled * a
17.6 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
这三种方法的速度大致相同。
结论
如果要在计算之前进行复制,请考虑使用“零成本分配”方法之一。您不会遭受“强制分配”的性能损失。
Let:
>>> n = 1000
>>> x = np.arange(n)
>>> reps = 10000
Zero-cost allocations
A view does not take any additional memory. Thus, these declarations are instantaneous:
# New axis
x[np.newaxis, ...]
# Broadcast to specific shape
np.broadcast_to(x, (reps, n))
Forced allocation
If you want force the contents to reside in memory:
>>> %timeit np.array(np.broadcast_to(x, (reps, n)))
10.2 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.repeat(x[np.newaxis, :], reps, axis=0)
9.88 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.tile(x, (reps, 1))
9.97 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
All three methods are roughly the same speed.
Computation
>>> a = np.arange(reps * n).reshape(reps, n)
>>> x_tiled = np.tile(x, (reps, 1))
>>> %timeit np.broadcast_to(x, (reps, n)) * a
17.1 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x[np.newaxis, :] * a
17.5 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x_tiled * a
17.6 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
All three methods are roughly the same speed.
Conclusion
If you want to replicate before a computation, consider using one of the “zero-cost allocation” methods. You won’t suffer the performance penalty of “forced allocation”.
回答 4
我认为使用numpy中的广播是最好的,而且速度更快
我做了如下比较
import numpy as np
b = np.random.randn(1000)
In [105]: %timeit c = np.tile(b[:, newaxis], (1,100))
1000 loops, best of 3: 354 µs per loop
In [106]: %timeit c = np.repeat(b[:, newaxis], 100, axis=1)
1000 loops, best of 3: 347 µs per loop
In [107]: %timeit c = np.array([b,]*100).transpose()
100 loops, best of 3: 5.56 ms per loop
使用广播大约快15倍
I think using the broadcast in numpy is the best, and faster
I did a compare as following
import numpy as np
b = np.random.randn(1000)
In [105]: %timeit c = np.tile(b[:, newaxis], (1,100))
1000 loops, best of 3: 354 µs per loop
In [106]: %timeit c = np.repeat(b[:, newaxis], 100, axis=1)
1000 loops, best of 3: 347 µs per loop
In [107]: %timeit c = np.array([b,]*100).transpose()
100 loops, best of 3: 5.56 ms per loop
about 15 times faster using broadcast
回答 5
一种干净的解决方案是将NumPy的外部乘积函数与向量1结合使用:
np.outer(np.ones(n), x)
给出n
重复的行。切换参数顺序以获取重复列。要获得相等数量的行和列,您可以执行
np.outer(np.ones_like(x), x)
One clean solution is to use NumPy’s outer-product function with a vector of ones:
np.outer(np.ones(n), x)
gives n
repeating rows. Switch the argument order to get repeating columns. To get an equal number of rows and columns you might do
np.outer(np.ones_like(x), x)
回答 6
您可以使用
np.tile(x,3).reshape((4,3))
瓷砖将生成矢量的代表
并重塑将使其具有您想要的形状
You can use
np.tile(x,3).reshape((4,3))
tile will generate the reps of the vector
and reshape will give it the shape you want
回答 7
如果您拥有pandas数据框,并且想要保留dtypes(甚至是类别),这是一种快速的方法:
import numpy as np
import pandas as pd
df = pd.DataFrame({1: [1, 2, 3], 2: [4, 5, 6]})
number_repeats = 50
new_df = df.reindex(np.tile(df.index, number_repeats))
If you have a pandas dataframe and want to preserve the dtypes, even the categoricals, this is a fast way to do it:
import numpy as np
import pandas as pd
df = pd.DataFrame({1: [1, 2, 3], 2: [4, 5, 6]})
number_repeats = 50
new_df = df.reindex(np.tile(df.index, number_repeats))
回答 8
import numpy as np
x=np.array([1,2,3])
y=np.multiply(np.ones((len(x),len(x))),x).T
print(y)
Yield:
[[ 1. 1. 1.]
[ 2. 2. 2.]
[ 3. 3. 3.]]
import numpy as np
x=np.array([1,2,3])
y=np.multiply(np.ones((len(x),len(x))),x).T
print(y)
yields:
[[ 1. 1. 1.]
[ 2. 2. 2.]
[ 3. 3. 3.]]