np.mean() vs np.average() in Python NumPy?

Question: np.mean() vs np.average() in Python NumPy?

I notice that

In [30]: np.mean([1, 2, 3])
Out[30]: 2.0

In [31]: np.average([1, 2, 3])
Out[31]: 2.0

However, there should be some differences, since after all they are two different functions.

What are the differences between them?


Answer 0

np.average takes an optional weights parameter. If it is not supplied, the two functions are equivalent. Take a look at the source code for mean and average:

np.mean:

try:
    mean = a.mean
except AttributeError:
    return _wrapit(a, 'mean', axis, dtype, out)
return mean(axis, dtype, out)

np.average:

...
if weights is None:
    avg = a.mean(axis)
    scl = avg.dtype.type(a.size/avg.size)
else:
    # code that computes the weighted mean goes here

if returned:  # returned is another optional argument
    scl = np.multiply(avg, 0) + scl
    return avg, scl
else:
    return avg
...
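
The `returned` branch in the excerpt above is easy to overlook, so here is a minimal sketch of what it yields. The array `a` and the weights `w` are made-up illustration values, not part of the original answer, and the exact internals may differ between NumPy versions.

```python
import numpy as np

# Hypothetical 2x3 input: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6, dtype=np.float64).reshape(2, 3)

# Without weights, np.average falls back to the plain mean.
# With returned=True it also returns the "sum of weights", which in the
# unweighted case is the number of elements averaged per output entry.
avg, scl = np.average(a, axis=0, returned=True)
same_as_mean = np.array_equal(avg, np.mean(a, axis=0))

# With weights along axis 0, the second return value is the weight total.
w = np.array([3.0, 1.0])
wavg, wsum = np.average(a, axis=0, weights=w, returned=True)

print(avg, scl)    # column means and the per-column element count
print(wavg, wsum)  # weighted column means and the per-column weight sum
```

With no weights, the second return value is simply a.size / avg.size, matching the `scl` line in the source excerpt.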

Answer 1

np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).

np.average can compute a weighted average if the weights parameter is supplied.
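
As a rough illustration of the two points above, the sketch below contrasts the weights parameter of np.average with the dtype and out options of np.mean. The sample values are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
weights = np.array([3.0, 1.0, 1.0])  # illustrative weights, not from the question

plain = np.mean(data)                         # arithmetic mean
unweighted = np.average(data)                 # same value when no weights are given
weighted = np.average(data, weights=weights)  # (3*1 + 1*2 + 1*3) / 5

# Options that np.mean offers: choose the computation dtype,
# or write the result into a preallocated array.
as_f32 = np.mean(data, dtype=np.float32)
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
col_means = np.empty(3)
np.mean(matrix, axis=0, out=col_means)

print(plain, unweighted, weighted)
print(as_f32.dtype, col_means)
```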


Answer 2

In some versions of NumPy there is another important difference that you must be aware of:

average does not take masks into account, so it computes the average over the whole data set.

mean takes masks into account, so it computes the mean only over the unmasked values.

g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)

np.average(f)
Out: 34.0

np.mean(f)
Out: 2.0
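
Because the behavior of np.average on masked arrays depends on the NumPy version, one way to avoid the ambiguity is to be explicit about which behavior is wanted. The following is a sketch under that assumption: it uses np.ma.average, the mask-aware counterpart in the np.ma module, and reads the underlying data directly when the mask should be ignored.

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)  # masks 55, 66 and 77

# Mask-aware results: only 1, 2 and 3 contribute.
masked_mean = np.mean(f)
masked_avg = np.ma.average(f)

# Deliberately ignoring the mask: average over all six raw values.
raw_mean = np.mean(f.data)

print(masked_mean, masked_avg, raw_mean)
```

Calling np.ma.average or operating on f.data states the intent in the code, so the result does not hinge on how a given release of np.average treats masks.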

Answer 3

In your invocation, the two functions are the same.

average can compute a weighted average though.

Doc links: mean and average


Answer 4

In addition to the differences already noted, there’s another extremely important difference that I just now discovered the hard way: unlike np.mean, np.average doesn’t allow the dtype keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5 file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64':

>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')

m1 = np.average(T, axis=(0,1))                #  garbage
m2 = np.mean(T, axis=(0,1))                   #  the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64')  # correct results

Unfortunately, unless you know what to look for, you can’t necessarily tell your results are wrong. I will never use np.average again for this reason but will always use np.mean(.., dtype='float64') on any large array. If I want a weighted average, I’ll compute it explicitly using the product of the weight vector and the target array and then either np.sum or np.mean, as appropriate (with appropriate precision as well).
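
The explicit weighted average described above could look roughly like the sketch below. The helper name weighted_mean_f64 is hypothetical, and the small random array is only a stand-in for the large array T from the h5 file; it will not reproduce the precision problem by itself.

```python
import numpy as np

def weighted_mean_f64(values, weights, axis=None):
    """Hypothetical helper: weighted mean accumulated in float64.

    `weights` must be broadcastable to the shape of `values`.
    """
    values = np.asarray(values)
    w = np.asarray(weights, dtype=np.float64)
    # float32 * float64 promotes to float64, so the products keep double precision.
    numerator = np.sum(values * w, axis=axis, dtype=np.float64)
    denominator = np.sum(np.broadcast_to(w, values.shape), axis=axis, dtype=np.float64)
    return numerator / denominator

# Small single-precision stand-in for T (the real array is far larger).
rng = np.random.default_rng(0)
T = rng.random((8, 8, 3), dtype=np.float32)

m3 = np.mean(T, axis=(0, 1), dtype=np.float64)   # float64 accumulator, as in the answer
uniform = np.ones((8, 8, 1))                     # equal weights over axes 0 and 1
wm = weighted_mean_f64(T, uniform, axis=(0, 1))  # equal weights should reproduce m3

print(m3)
print(wm)
```

Note that values * w allocates a full float64 temporary, so for an array too large to fit in memory this would need to be applied chunk by chunk.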