pandas DataFrame：用列的平均值替换nan值

Question 1

I’ve got a pandas DataFrame filled mostly with real numbers, but there is a few nan values in it as well.

How can I replace the nans with averages of columns where they are?

This question is very similar to this one: numpy array: replace nan values with average of columns but, unfortunately, the solution given there doesn’t work for a pandas DataFrame.

Question 2

You can simply use to fill the nan‘s directly:

In [27]: df 
Out[27]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

In [28]: df.mean()
Out[28]: 
A   -0.151121
B   -0.231291
C   -0.530307
dtype: float64

In [29]: df.fillna(df.mean())
Out[29]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

The docstring of fillna says that value should be a scalar or a dict, however, it seems to work with a Series as well. If you want to pass a dict, you could use df.mean().to_dict().

Question 3

Try:

sub2['income'].fillna((sub2['income'].mean()), inplace=True)

Question 4

In [16]: df = DataFrame(np.random.randn(10,3))

In [17]: df.iloc[3:5,0] = np.nan

In [18]: df.iloc[4:6,1] = np.nan

In [19]: df.iloc[5:8,2] = np.nan

In [20]: df
Out[20]: 
          0         1         2
0  1.148272  0.227366 -2.368136
1 -0.820823  1.071471 -0.784713
2  0.157913  0.602857  0.665034
3       NaN -0.985188 -0.324136
4       NaN       NaN  0.238512
5  0.769657       NaN       NaN
6  0.141951  0.326064       NaN
7 -1.694475 -0.523440       NaN
8  0.352556 -0.551487 -1.639298
9 -2.067324 -0.492617 -1.675794

In [22]: df.mean()
Out[22]: 
0   -0.251534
1   -0.040622
2   -0.841219
dtype: float64

Apply per-column the mean of that columns and fill

In [23]: df.apply(lambda x: x.fillna(x.mean()),axis=0)
Out[23]: 
          0         1         2
0  1.148272  0.227366 -2.368136
1 -0.820823  1.071471 -0.784713
2  0.157913  0.602857  0.665034
3 -0.251534 -0.985188 -0.324136
4 -0.251534 -0.040622  0.238512
5  0.769657 -0.040622 -0.841219
6  0.141951  0.326064 -0.841219
7 -1.694475 -0.523440 -0.841219
8  0.352556 -0.551487 -1.639298
9 -2.067324 -0.492617 -1.675794

Question 5

# To read data from csv file
Dataset = pd.read_csv('Data.csv')

X = Dataset.iloc[:, :-1].values

# To calculate mean use imputer class
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

Question 6

If you want to impute missing values with mean and you want to go column by column, then this will only impute with the mean of that column. This might be a little more readable.

sub2['income'] = sub2['income'].fillna((sub2['income'].mean()))

Question 7

Directly use df.fillna(df.mean()) to fill all the null value with mean

If you want to fill null value with mean of that column then you can use this

suppose x=df['Item_Weight'] here Item_Weight is column name

here we are assigning (fill null values of x with mean of x into x)

df['Item_Weight'] = df['Item_Weight'].fillna((df['Item_Weight'].mean()))

If you want to fill null value with some string then use

here Outlet_size is column name

df.Outlet_Size = df.Outlet_Size.fillna('Missing')

Question 8

Another option besides those above is:

df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.mean()))

It’s less elegant than previous responses for mean, but it could be shorter if you desire to replace nulls by some other column function.

Question 9

Pandas: How to replace NaN (nan) values with the average (mean), median or other statistics of one column

Say your DataFrame is df and you have one column called nr_items. This is: df['nr_items']

If you want to replace the values of your column df['nr_items'] with the mean of the column:

Use method .fillna():

mean_value=df['nr_items'].mean() df['nr_item_ave']=df['nr_items'].fillna(mean_value)

I have created a new df column called nr_item_ave to store the new column with the NaN values replaced by the mean value of the column.

You should be careful when using the mean. If you have outliers is more recommendable to use the median

Question 10

using sklearn library preprocessing class

from sklearn.impute import SimpleImputer
missingvalues = SimpleImputer(missing_values = np.nan, strategy = 'mean', axis = 0)
missingvalues = missingvalues.fit(x[:,1:3])
x[:,1:3] = missingvalues.transform(x[:,1:3])

Note: In the recent version parameter missing_values value change to np.nan from NaN

pandas DataFrame：用列的平均值替换nan值

问题：pandas DataFrame：用列的平均值替换nan值

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

JavaScript函数类似于Python range（）

如何将pandas DataFrame的第一列作为系列？

嵌套列表上的列表理解？

您如何在Tkinter的事件循环中运行自己的代码？

如何在NumPy数组中添加额外的列

如何遍历给定目录中的文件？

pandas DataFrame：用列的平均值替换nan值

问题：pandas DataFrame：用列的平均值替换nan值

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

相关文章

排行榜展示

文章展示