The @ operator calls the array’s __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul differs from dot in two important ways.
Multiplication by scalars is not allowed.
Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:
For matmul:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
The answer by @ajcr explains how the dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on ‘stacks of matricies’ or tensors.
To clarify the differences take a 4×4 array and return the dot product and matmul product with a 3x4x2 ‘stack of matricies’ or tensor.
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)return A, x
def at(data):
A, x = data
return A @ x
def numpy_dot(data):
A, x = data
return numpy.dot(A, x)def numpy_matmul(data):
A, x = data
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2** k for k in range(12)],
logx=True,
logy=True,)
Just FYI, @ and its numpy equivalents dot and matmul are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)
return A, x
def at(data):
A, x = data
return A @ x
def numpy_dot(data):
A, x = data
return numpy.dot(A, x)
def numpy_matmul(data):
A, x = data
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2 ** k for k in range(15)],
)
import numpy as np
for it in xrange(10000):
a = np.random.rand(5,6,2,4)
b = np.random.rand(6,4,3)
c = np.matmul(a,b)
d = np.dot(a,b)#print 'c shape: ', c.shape,'d shape:', d.shapefor i in range(5):for j in range(6):for k in range(2):for l in range(3):ifnot c[i,j,k,l]== d[i,j,k,j,l]:print it,i,j,k,l,c[i,j,k,l]==d[i,j,k,j,l]#you will not see them
In mathematics, I think the dot in numpy makes more sense
dot(a,b)_{i,j,k,a,b,c} =
since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices
As for matmul operation in numpy, it consists of parts of dot result, and it can be defined as
>matmul(a,b)_{i,j,k,c} =
So, you can see that matmul(a,b) returns an array with a small shape,
which has smaller memory consumption and make more sense in applications.
In particular, combining with broadcasting, you can get
matmul(a,b)_{i,j,k,l} =
for example.
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4)
To use dot(a,b) you need
t3=s4;
To use matmul(a,b) you need
t3=s4
t2=s2, or one of t2 and s2 is 1
t1=s1, or one of t1 and s1 is 1
Use the following piece of code to convince yourself.
Code sample
import numpy as np
for it in xrange(10000):
a = np.random.rand(5,6,2,4)
b = np.random.rand(6,4,3)
c = np.matmul(a,b)
d = np.dot(a,b)
#print 'c shape: ', c.shape,'d shape:', d.shape
for i in range(5):
for j in range(6):
for k in range(2):
for l in range(3):
if not c[i,j,k,l] == d[i,j,k,j,l]:
print it,i,j,k,l,c[i,j,k,l]==d[i,j,k,j,l] #you will not see them
X.shape
>>>(200,3)
type(X)>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012,0.95071431,0.73199394])
YY = np.matmul(X,w)>>>ValueError:Shape of passed values is(200,1), indices imply (200,3)"
与DOT
YY = np.dot(X,w)# no error message
YY
>>>array([2.59206877,1.06842193,2.18533396,2.11366346,0.28505879,…
YY.shape
>>>(200,)
I was constantly getting “ValueError: Shape of passed values is (200, 1), indices imply (200, 3)” when trying to use MATMUL. I wanted a quick workaround and found DOT to deliver the same functionality. I don’t get any error using DOT. I get the correct answer
with MATMUL
X.shape
>>>(200, 3)
type(X)
>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012, 0.95071431, 0.73199394])
YY = np.matmul(X,w)
>>> ValueError: Shape of passed values is (200, 1), indices imply (200, 3)"