问题:numpy dot()和Python 3.5+矩阵乘法@之间的区别
我最近使用Python 3.5,注意到新的矩阵乘法运算符(@)有时与numpy点运算符的行为有所不同。例如,对于3d阵列:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
的@
运算符返回形状的阵列:
c.shape
(8, 13, 13)
当np.dot()
函数返回时:
d.shape
(8, 13, 8, 13)
如何用numpy点重现相同的结果?还有其他重大区别吗?
I recently moved to Python 3.5 and noticed the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. In example, for 3d arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
The @
operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot()
function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
回答 0
该@
运营商称阵列的__matmul__
方法,而不是dot
。此方法在API中也作为函数存在np.matmul
。
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
从文档中:
matmul
区别在于dot
两个重要方面。
- 标量不能相乘。
- 将矩阵堆栈一起广播,就好像矩阵是元素一样。
最后一点很清楚,当传递3D(或更高维)数组时,dot
和matmul
方法的行为会有所不同。从文档中引用更多内容:
对于matmul
:
如果任何一个参数为ND,N> 2,则将其视为驻留在最后两个索引中的一组矩阵,并进行相应广播。
对于np.dot
:
对于2-D数组,它等效于矩阵乘法,对于1-D数组,其等效于向量的内积(无复共轭)。对于N维,它是a的最后一个轴和b的倒数第二个轴的和积
The @
operator calls the array’s __matmul__
method, not dot
. This method is also present in the API as the function np.matmul
.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul
differs from dot
in two important ways.
- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that dot
and matmul
methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:
For matmul
:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot
:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
回答 1
@ajcr的答案说明了dot
and matmul
(由@
符号调用)之间的区别。通过看一个简单的例子,可以清楚地看到两者在“矩阵堆栈”或张量上进行操作时的行为有何不同。
为了弄清差异,采用4×4数组,然后将dot
乘积和matmul
乘积返回3x4x2的“矩阵堆栈”或张量。
import numpy as np
fourbyfour = np.array([
[1,2,3,4],
[3,2,1,4],
[5,4,6,7],
[11,12,13,14]
])
threebyfourbytwo = np.array([
[[2,3],[11,9],[32,21],[28,17]],
[[2,3],[1,9],[3,21],[28,7]],
[[2,3],[1,9],[3,21],[28,7]],
])
print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,twobyfourbythree)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,twobyfourbythree)))
每个操作的结果如下所示。注意点积如何
… a的最后一个轴与b的倒数第二个和的乘积
以及如何通过一起广播矩阵来形成矩阵乘积。
4x4*3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4*3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
The answer by @ajcr explains how the dot
and matmul
(invoked by the @
symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on ‘stacks of matricies’ or tensors.
To clarify the differences take a 4×4 array and return the dot
product and matmul
product with a 3x4x2 ‘stack of matricies’ or tensor.
import numpy as np
fourbyfour = np.array([
[1,2,3,4],
[3,2,1,4],
[5,4,6,7],
[11,12,13,14]
])
threebyfourbytwo = np.array([
[[2,3],[11,9],[32,21],[28,17]],
[[2,3],[1,9],[3,21],[28,7]],
[[2,3],[1,9],[3,21],[28,7]],
])
print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is,
…a sum product over the last axis of a and the second-to-last of b
and how the matrix product is formed by broadcasting the matrix together.
4x4*3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4*3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
回答 2
仅供参考,@
其numpy的等价物dot
,并matmul
都大致一样快。(用我的一个项目perfplot创建的图。)
复制剧情的代码:
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)
return A, x
def at(data):
A, x = data
return A @ x
def numpy_dot(data):
A, x = data
return numpy.dot(A, x)
def numpy_matmul(data):
A, x = data
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2 ** k for k in range(12)],
logx=True,
logy=True,
)
Just FYI, @
and its numpy equivalents dot
and matmul
are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)
return A, x
def at(data):
A, x = data
return A @ x
def numpy_dot(data):
A, x = data
return numpy.dot(A, x)
def numpy_matmul(data):
A, x = data
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2 ** k for k in range(15)],
)
回答 3
在数学上,我认为numpy中的点更有意义
点(a,b)_ {i,j,k,a,b,c} =
因为当a和b是向量时它给出点积,或者当a和b是矩阵时给出矩阵乘积
对于numpy中的matmul操作,它由点结果的一部分组成,可以定义为
> matmul(a,b)_ {i,j,k,c} =
因此,您可以看到matmul(a,b)返回的数组形状较小,从而减少了内存消耗,并在应用程序中更有意义。特别是结合广播,您可以获得
matmul(a,b)_ {i,j,k,l} =
例如。
从以上两个定义中,您可以看到使用这两个操作的要求。假设a.shape =(s1,s2,s3,s4)和b.shape =(t1,t2,t3,t4)
要使用点(a,b),您需要
- t3 = s4 ;
要使用matmul(a,b),您需要
- t3 = s4
- t2 = s2或t2和s2之一为1
- t1 = s1或t1和s1之一为1
使用以下代码说服自己。
代码样例
import numpy as np
for it in xrange(10000):
a = np.random.rand(5,6,2,4)
b = np.random.rand(6,4,3)
c = np.matmul(a,b)
d = np.dot(a,b)
#print 'c shape: ', c.shape,'d shape:', d.shape
for i in range(5):
for j in range(6):
for k in range(2):
for l in range(3):
if not c[i,j,k,l] == d[i,j,k,j,l]:
print it,i,j,k,l,c[i,j,k,l]==d[i,j,k,j,l] #you will not see them
In mathematics, I think the dot in numpy makes more sense
dot(a,b)_{i,j,k,a,b,c} =
since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices
As for matmul operation in numpy, it consists of parts of dot result, and it can be defined as
>matmul(a,b)_{i,j,k,c} =
So, you can see that matmul(a,b) returns an array with a small shape, which has smaller memory consumption and make more sense in applications. In particular, combining with broadcasting, you can get
matmul(a,b)_{i,j,k,l} =
for example.
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4)
Use the following piece of code to convince yourself.
Code sample
import numpy as np
for it in xrange(10000):
a = np.random.rand(5,6,2,4)
b = np.random.rand(6,4,3)
c = np.matmul(a,b)
d = np.dot(a,b)
#print 'c shape: ', c.shape,'d shape:', d.shape
for i in range(5):
for j in range(6):
for k in range(2):
for l in range(3):
if not c[i,j,k,l] == d[i,j,k,j,l]:
print it,i,j,k,l,c[i,j,k,l]==d[i,j,k,j,l] #you will not see them
回答 4
这是与的比较,np.einsum
以显示索引的投影方式
np.allclose(np.einsum('ijk,ijk->ijk', a,b), a*b) # True
np.allclose(np.einsum('ijk,ikl->ijl', a,b), a@b) # True
np.allclose(np.einsum('ijk,lkm->ijlm',a,b), a.dot(b)) # True
Here is a comparison with np.einsum
to show how the indices are projected
np.allclose(np.einsum('ijk,ijk->ijk', a,b), a*b) # True
np.allclose(np.einsum('ijk,ikl->ijl', a,b), a@b) # True
np.allclose(np.einsum('ijk,lkm->ijlm',a,b), a.dot(b)) # True
回答 5
我对MATMUL和DOT的经验
尝试使用MATMUL时,我经常收到“ ValueError:传递的值的形状为(200,1),索引表示(200,3)”。我想要一个快速的解决方法,并发现DOT可以提供相同的功能。使用DOT我没有任何错误。我得到正确的答案
与MATMUL
X.shape
>>>(200, 3)
type(X)
>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012, 0.95071431, 0.73199394])
YY = np.matmul(X,w)
>>> ValueError: Shape of passed values is (200, 1), indices imply (200, 3)"
与DOT
YY = np.dot(X,w)
# no error message
YY
>>>array([ 2.59206877, 1.06842193, 2.18533396, 2.11366346, 0.28505879, …
YY.shape
>>> (200, )
My experience with MATMUL and DOT
I was constantly getting “ValueError: Shape of passed values is (200, 1), indices imply (200, 3)” when trying to use MATMUL. I wanted a quick workaround and found DOT to deliver the same functionality. I don’t get any error using DOT. I get the correct answer
with MATMUL
X.shape
>>>(200, 3)
type(X)
>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012, 0.95071431, 0.73199394])
YY = np.matmul(X,w)
>>> ValueError: Shape of passed values is (200, 1), indices imply (200, 3)"
with DOT
YY = np.dot(X,w)
# no error message
YY
>>>array([ 2.59206877, 1.06842193, 2.18533396, 2.11366346, 0.28505879, …
YY.shape
>>> (200, )
声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。