问题:列表的标准偏差

我想找到几个(Z)列表的第一,第二,…个数字的均值和标准差。例如,我有

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_Rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

现在,我要获取的均值和std *_Rank[0],的均值和std *_Rank[1]
(即:所有(A..Z)_rank列表中第一个数字
的均值和std;来自的第二个数字的均值和std所有(A..Z)_rank列表;
第三个数字的均值和std …;等等)。

I want to find mean and standard deviation of 1st, 2nd,… digits of several (Z) lists. For example, I have

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_Rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

Now I want to take the mean and std of *_Rank[0], the mean and std of *_Rank[1], etc.
(ie: mean and std of the 1st digit from all the (A..Z)_rank lists;
the mean and std of the 2nd digit from all the (A..Z)_rank lists;
the mean and std of the 3rd digit…; etc).


回答 0

从Python 3.4 / PEP450开始statistics module,标准库中提供了一个,该库提供了一种stdev用于计算像您这样的可迭代对象的标准偏差的方法

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952

Since Python 3.4 / PEP450 there is a statistics module in the standard library, which has a method stdev for calculating the standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952

回答 1

我将A_Rank等人放入二维NumPy数组中,然后使用numpy.mean()numpy.std()计算均值和标准差:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

I would put A_Rank et al into a 2D NumPy array, and then use numpy.mean() and numpy.std() to compute the means and the standard deviations:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

回答 2

这是一些纯Python代码,可用于计算均值和标准差。

以下所有代码均基于statisticsPython 3.4+中的模块。

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

注意:为提高浮点求和时的准确性,该statistics模块使用了自定义函数,_sum而不是sum我使用的内置函数。

现在我们有例如:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
0.1

Here’s some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I’ve used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
0.1

回答 3

在Python 2.7.1中,您可以使用numpy.std()以下方法计算标准差:

  • 人口标准:仅使用numpy.std()数据列表之外的其他参数即可。
  • 示例std:您需要将ddof(即Delta自由度)设置为1,如以下示例所示:

numpy.std(<您的列表>,ddof = 1

计算中使用的除数为N-ddof,其中N表示元素数。默认情况下,ddof为零。

它计算样本std而不是总体std。

In Python 2.7.1, you may calculate standard deviation using numpy.std() for:

  • Population std: Just use numpy.std() with no additional arguments besides to your data list.
  • Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N – ddof, where N represents the number of elements. By default ddof is zero.

It calculates sample std rather than population std.


回答 4

在python 2.7中,您可以使用NumPy numpy.std()给出总体标准差

在Python 3.4中返回样本标准偏差。该pstdv()功能是一样的numpy.std()

In python 2.7 you can use NumPy’s numpy.std() gives the population standard deviation.

In Python 3.4 returns the sample standard deviation. The pstdv() function is the same as numpy.std().


回答 5

使用python,以下是几种方法:

import statistics as st

n = int(input())
data = list(map(int, input().split()))

方法1-使用功能

stdev = st.pstdev(data)

方法2:计算方差并求平方根

variance = st.pvariance(data)
devia = math.sqrt(variance)

方法3:使用基本数学

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in X]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

注意:

  • variance 计算样本总体的方差
  • pvariance 计算整个人口的方差
  • 相似的差异stdevpstdev

Using python, here are few methods:

import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach1 – using a function

stdev = st.pstdev(data)

Approach2: calculate variance and take square root of it

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach3: using basic math

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in X]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

  • variance calculates variance of sample population
  • pvariance calculates variance of entire population
  • similar differences between stdev and pstdev

回答 6

纯python代码:

from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))

pure python code:

from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))

回答 7

其他答案涵盖了如何在python中充分执行std dev,但没有人解释如何进行您所描述的怪异遍历。

我将假设AZ是整个人口。如果没有,请参阅Ome关于如何从样本推断的答案。

因此,要获得每个列表的第一位数字的标准差/均值,您将需要如下所示:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

为了缩短代码并将其通用化为第n个数字,请使用我为您生成的以下函数:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]] 

现在,您可以像这样简单地从AZ获取所有n个位置的stdd和均值:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))

The other answers cover how to do std dev in python sufficiently, but no one explains how to do the bizarre traversal you’ve described.

I’m going to assume A-Z is the entire population. If not see Ome‘s answer on how to inference from a sample.

So to get the standard deviation/mean of the first digit of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth digit use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]] 

Now you can simply get the stdd and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。