How to find which columns contain any NaN value in a Pandas DataFrame - Python 实用宝典

Question: How to find which columns contain any NaN value in a Pandas DataFrame

Given a pandas DataFrame that may contain NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?

Answer 0

UPDATE: using Pandas 0.22.0

Newer Pandas versions (0.21.0 and later) have the methods DataFrame.isna() and DataFrame.notna():

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

As a list of column names:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

To select those columns (the ones containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
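The same steps as one self-contained sketch. The frame is rebuilt by hand from the printed output (an assumption, since the answer doesn't show how `df` was created); the NaN detection itself uses only the calls shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame printed above (values copied from the output)
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series: True for every column holding at least one NaN
has_nan = df.isna().any()

# Column labels as a plain Python list
nan_cols = df.columns[has_nan].tolist()

# Sub-frame containing only those columns
nan_frame = df.loc[:, has_nan]

print(nan_cols)  # ['a', 'b']
```

Computing the boolean mask once and reusing it for both the list and the `.loc` selection avoids scanning the frame twice.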

OLD answer:

Try using isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

Or the clearer version proposed by @root:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
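A small sketch confirming that the old `sum() > 0` test and the clearer `any()` version give the same answer; the frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: "x" and "z" contain NaN, "y" does not
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [4, 5, 6],
    "z": [np.nan, np.nan, 9.0],
})

# Old approach: count NaNs per column, then compare with zero
via_sum = pd.isnull(df).sum() > 0

# Clearer approach: any() asks directly whether each column has a True
via_any = df.isnull().any()

# Both are boolean Series indexed by column name
print(via_sum.equals(via_any))  # True
```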

To select a subset (all columns containing at least one NaN value):

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

Answer 1

You can use df.isnull().sum(). It lists every column together with its total number of NaNs.
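A short sketch of that, plus one extra filtering step (an addition, not part of the answer) that keeps only the columns with a non-zero count. The frame and column names are made up.

```python
import numpy as np
import pandas as pd

# Illustrative frame (made-up data); None in a text column also counts as missing
df = pd.DataFrame({
    "age": [25, np.nan, 40, np.nan],
    "city": ["Oslo", "Rome", None, "Lima"],
    "score": [1.5, 2.0, 3.5, 4.0],
})

# NaN count for every column, including columns with zero NaNs
counts = df.isnull().sum()

# Keep only the columns that actually have missing values
missing = counts[counts > 0]

print(missing.index.tolist())  # ['age', 'city']
```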

Answer 2

I had too many columns to inspect visually on screen, so a short list comprehension that filters and returns the offending columns is:

nan_cols = [i for i in df.columns if df[i].isnull().any()]

in case that's helpful to anyone.
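A sketch comparing the list comprehension with the vectorized form from Answer 0; both keep the original column order. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [9.5, np.nan, 7.0],
    "qty": [3, 1, 2],
    "note": ["ok", None, None],
})

# The list comprehension from the answer: checks one column at a time
nan_cols = [i for i in df.columns if df[i].isnull().any()]

# Vectorized equivalent: one isnull() call over the whole frame
nan_cols_vec = df.columns[df.isnull().any()].tolist()

print(nan_cols)  # ['price', 'note']
```

The comprehension is easy to extend with extra per-column conditions; the vectorized version avoids a Python-level loop on very wide frames.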

Answer 3

In datasets with a large number of columns, it's even better to see how many columns contain null values and how many don't.

print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))

print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))

print("Total no. of columns in the dataframe")
print(len(df.columns))
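The three counts can be wrapped in a small helper; `null_column_summary` is a hypothetical name and the frame is made-up data. Because `notna()` is the negation of `isna()`, every column lands in exactly one of the first two groups, so those two counts always add up to the total.

```python
import numpy as np
import pandas as pd


def null_column_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count columns with and without nulls."""
    with_nulls = len(df.columns[df.isna().any()])
    without_nulls = len(df.columns[df.notna().all()])
    return {
        "with_nulls": with_nulls,
        "without_nulls": without_nulls,
        "total": len(df.columns),
    }


df = pd.DataFrame({
    "a": [1, np.nan],       # one null
    "b": [np.nan, np.nan],  # all null
    "c": [1, 2],            # no nulls
})

summary = null_column_summary(df)
print(summary)  # {'with_nulls': 2, 'without_nulls': 1, 'total': 3}
```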

For example, my dataframe had 82 columns, of which 19 contained at least one null value.

You can also remove columns and rows automatically, based on how many nulls they hold. The code below first drops every column whose null count is greater than the number of columns, then drops every remaining row that contains a null:

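# Drop columns whose null count is greater than the number of columns
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
# Then drop every row that still contains a null, and renumber the index
df = df.dropna(axis=0).reset_index(drop=True)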

Note: the code above removes all of your null values. If you need to keep them, process them first.
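To see what those two lines actually do, here is a run on a tiny made-up frame. The threshold compares each column's null count with the number of columns, so with 3 columns any column holding more than 3 nulls is dropped; rows with nulls in the surviving columns are dropped afterwards.

```python
import numpy as np
import pandas as pd

# Made-up frame: 3 columns, 5 rows
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],           # 1 null: kept (not > 3)
    "b": [np.nan, np.nan, np.nan, np.nan, 1.0],  # 4 nulls: dropped (> 3)
    "c": [10, 20, 30, 40, 50],                   # no nulls
})

# Step 1 (from the answer): drop columns whose null count exceeds the column count
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)

# Step 2 (from the answer): drop rows that still contain a null, renumber index
df = df.dropna(axis=0).reset_index(drop=True)

print(df.columns.tolist())  # ['a', 'c']
print(len(df))              # 4
```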

Answer 4

I use these three lines of code to print the names of the columns that contain at least one null value, along with their null counts:

for column in dataframe:
    if dataframe[column].isnull().any():
        print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))
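A variant sketch that computes all counts with a single isnull().sum() and returns the messages instead of printing them, which can be handy for logging or tests. `null_report` is a hypothetical helper name and the data is made up.

```python
import numpy as np
import pandas as pd


def null_report(dataframe: pd.DataFrame) -> list[str]:
    """Hypothetical helper: one message per column that has nulls."""
    counts = dataframe.isnull().sum()  # computed once for all columns
    return [
        f"{column} has {count} null values"
        for column, count in counts.items()
        if count > 0
    ]


dataframe = pd.DataFrame({
    "name": ["a", None, "c"],
    "value": [np.nan, np.nan, 3.0],
    "flag": [True, False, True],
})

for line in null_report(dataframe):
    print(line)
```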

Answer 5

Both of these should work:

df.isnull().sum()
df.isna().sum()

The DataFrame methods isna() and isnull() are completely identical; isnull() is an alias of isna().

Note: empty strings ('') are not considered NA, so isna() returns False for them.
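A quick check of both points on a made-up frame. The `replace("", np.nan)` step is an addition (not from the answer) showing one way to treat empty strings as missing when that is what you want.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "", None], "city": ["", "Oslo", "Rome"]})

# isna() and isnull() give identical results
print(df.isna().equals(df.isnull()))  # True

# Only the real missing value (None) counts; "" does not
print(df.isna().sum().tolist())  # [1, 0]

# If empty strings should count as missing, convert them first
cleaned = df.replace("", np.nan)
print(cleaned.isna().sum().tolist())  # [2, 1]
```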

Answer 6

This worked for me:

1. To get the columns that have at least one null value (column names only):

data.columns[data.isnull().any()]

2. To get the null count of each column that has at least one null value:

data[data.columns[data.isnull().any()]].isnull().sum()

[Optional] 3. To get the null count as a percentage of the number of rows:

data[data.columns[data.isnull().any()]].isnull().sum() * 100 / data.shape[0]
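A sketch running the three steps on made-up data and combining the count and percentage into one table; the combined `summary` frame and the `mean()` shortcut are additions for readability, not part of the answer. The shortcut works because the mean of a boolean column is the fraction of True values.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "a": [1, np.nan, np.nan, 4],
    "b": [1, 2, 3, 4],
    "c": [np.nan, 2, 3, 4],
})

# Steps 1-3 from the answer
cols = data.columns[data.isnull().any()]
counts = data[cols].isnull().sum()
percent = counts * 100 / data.shape[0]

# Equivalent shortcut for the percentages
alt = data.isnull().mean()[cols] * 100

# Count and percentage side by side
summary = pd.DataFrame({"null_count": counts, "null_percent": percent})
print(summary)
```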
