Question: np.mean() vs np.average() in Python NumPy?
I notice that
In [30]: np.mean([1, 2, 3])
Out[30]: 2.0
In [31]: np.average([1, 2, 3])
Out[31]: 2.0
However, there should be some differences, since after all they are two different functions.
What are the differences between them?
Answer 0
np.average takes an optional weights parameter. If it is not supplied, the two functions are equivalent. Take a look at the source code: Mean, Average
np.mean:
    try:
        mean = a.mean
    except AttributeError:
        return _wrapit(a, 'mean', axis, dtype, out)
    return mean(axis, dtype, out)
np.average:
    ...
    if weights is None:
        avg = a.mean(axis)
        scl = avg.dtype.type(a.size/avg.size)
    else:
        # code that does weighted mean here
    if returned:  # returned is another optional argument
        scl = np.multiply(avg, 0) + scl
        return avg, scl
    else:
        return avg
    ...
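As a rough illustration of the source excerpt above, the sketch below checks that the two functions agree when no weights are passed, and shows what the `returned` argument adds. The array `data` is made up for the example, and the exact internals may differ between NumPy versions.

```python
import numpy as np

# Hypothetical sample data, used only for illustration.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Without weights, both calls compute the same arithmetic mean per column.
plain_mean = np.mean(data, axis=0)
plain_avg = np.average(data, axis=0)

# returned=True also returns the sum of weights; with no weights given,
# that is the number of elements averaged for each output value.
avg, weight_sum = np.average(data, axis=0, returned=True)

print(plain_mean, plain_avg, weight_sum)
```

Here each column average is taken over two rows, so `weight_sum` holds 2.0 for every column.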
Answer 1
np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).
np.average can compute a weighted average if the weights parameter is supplied.
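A minimal sketch of both points, using made-up values: `np.average` with `weights`, and `np.mean` with its `dtype` and `out` options. The variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Hypothetical values and weights for illustration.
values = np.array([1.0, 2.0, 3.0])
weights = np.array([3.0, 1.0, 1.0])

# Weighted average: sum(values * weights) / sum(weights) = 8 / 5
weighted = np.average(values, weights=weights)
manual = np.sum(values * weights) / np.sum(weights)

# np.mean options: choose the computation dtype, or write into an existing array.
table = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
mean_f32 = np.mean(table, dtype=np.float32)   # computed and returned as float32
out = np.empty(3, dtype=np.float64)
np.mean(table, axis=0, out=out)               # column means stored in `out`

print(weighted, manual, mean_f32, out)
```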
Answer 2
In some versions of NumPy there is another important difference that you must be aware of:
average does not take masks into account, so it computes the average over the whole set of data.
mean takes masks into account, so it computes the mean only over unmasked values.
g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)
np.average(f)
Out: 34.0
np.mean(f)
Out: 2.0
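Since the behaviour of `np.average` on masked arrays reportedly depends on the NumPy version, one way to avoid the ambiguity is to use the masked-array functions explicitly. The sketch below reuses `g` and `f` from the answer; the weights list is a made-up addition for illustration.

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)   # masks 55, 66, 77

# Mask-aware versions: only the unmasked values 1, 2, 3 contribute.
masked_mean = f.mean()
masked_avg = np.ma.average(f)

# Mask-aware weighted average; weights at masked positions are ignored.
# (1*1 + 2*1 + 3*2) / (1 + 1 + 2) = 2.25
masked_weighted = np.ma.average(f, weights=[1, 1, 2, 9, 9, 9])

# Explicitly ignoring the mask by averaging the underlying data.
unmasked_mean = np.mean(f.data)

print(masked_mean, masked_avg, masked_weighted, unmasked_mean)
```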
Answer 3
In your invocation, the two functions are the same. average can compute a weighted average though.
Doc links: mean and average
Answer 4
In addition to the differences already noted, there's another extremely important difference that I just now discovered the hard way: unlike np.mean, np.average doesn't allow the dtype keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5 file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64':
>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')
m1 = np.average(T, axis=(0,1)) # garbage
m2 = np.mean(T, axis=(0,1)) # the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64') # correct results
Unfortunately, unless you know what to look for, you can't necessarily tell your results are wrong. I will never use np.average again for this reason but will always use np.mean(.., dtype='float64') on any large array. If I want a weighted average, I'll compute it explicitly using the product of the weight vector and the target array and then either np.sum or np.mean, as appropriate (with appropriate precision as well).
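The explicit weighted average described above might look something like the following sketch. The helper name `weighted_mean_f64` is hypothetical, and it assumes the weights broadcast against the data; it also builds a float64 temporary the size of the data, which may matter for very large arrays.

```python
import numpy as np

def weighted_mean_f64(a, w, axis=None):
    """Weighted mean accumulated in float64 (hypothetical helper).

    Assumes `w` is broadcastable to the shape of `a`.
    """
    a = np.asarray(a)
    w = np.asarray(w)
    # Form the products in float64, then sum them in float64.
    num = np.sum(np.multiply(a, w, dtype=np.float64), axis=axis)
    # Sum the weights over the same axes, also in float64.
    den = np.sum(np.broadcast_to(w, a.shape), axis=axis, dtype=np.float64)
    return num / den

# Small single-precision example standing in for a large array.
a = np.arange(12, dtype=np.float32).reshape(3, 4)
w = np.array([1, 2, 3, 4], dtype=np.float32)

result = weighted_mean_f64(a, w, axis=1)
plain = np.mean(a, axis=1, dtype=np.float64)  # unweighted, float64 accumulator
print(result, plain)
```

On a small array like this the float32 and float64 results agree; the difference only becomes visible when many single-precision values are accumulated.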