## Question: Drop rows with all zeros in a pandas data frame

I can use the `pandas` `dropna()` method to remove rows with some or all columns set to `NA`. Is there an equivalent function for dropping rows in which every column has the value 0?

``````
P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0
``````

In this example, we would like to drop the first 4 rows from the data frame.

thanks!

## Answer 0

It turns out this can be nicely expressed in a vectorized fashion:

``````
> import pandas as pd
> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
``````

## Answer 1

One-liner. No transpose needed:

``````
df.loc[~(df==0).all(axis=1)]
``````

And for those who like symmetry, this also works…

``````
df.loc[(df!=0).any(axis=1)]
``````
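
The two expressions select the same rows because "not all entries are zero" is the same condition as "at least one entry is non-zero". A minimal sketch checking this on the question's data (the frame is retyped from the question, and using `P` as the index is an assumption about the original layout):

```python
import pandas as pd

# Data retyped from the question; using P as the index is an assumption.
df = pd.DataFrame(
    {
        "kt": [0, 0, 0, 0, 1.1],
        "b": [0, 0, 0, 0, 3],
        "tt": [0, 0, 0, 0, 4.5],
        "mky": [0, 0, 0, 0, 2.3],
        "depth": [0, 0, 0, 0, 9.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="P"),
)

not_all_zero = ~(df == 0).all(axis=1)  # rows that are not entirely zero
any_nonzero = (df != 0).any(axis=1)    # rows with at least one non-zero entry

# Check whether the two masks agree, then filter with one of them.
same_mask = not_all_zero.equals(any_nonzero)
result = df.loc[any_nonzero]

print(same_mask)
print(result)
```

Which form to use is mostly a matter of taste; the second avoids the negation.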

## Answer 2

I look up this question about once a month and always have to dig out the best answer from the comments:

``````
df.loc[(df!=0).any(axis=1)]  # recent pandas versions require axis to be passed by keyword
``````

Thanks Dan Allan!

## Answer 3

Replace the zeros with `nan` and then drop the rows with all entries as `nan`. After that replace `nan` with zeros.

``````
import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
``````
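
One thing to keep in mind with this round trip, sketched below on a made-up frame (the data is hypothetical, not from the question): any `NaN` that was already present in the data ends up as `0`, and integer columns are upcast to float once zeros are replaced by `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is all zeros, row 2 has a genuine missing value.
df = pd.DataFrame({"a": [0, 1, np.nan], "b": [0, 2, 3]})

out = df.replace(0, np.nan)          # zeros become NaN
out = out.dropna(how="all", axis=0)  # drop rows where every entry is NaN
out = out.replace(np.nan, 0)         # NaN back to 0, including the original NaN

print(out)
```

If the data can contain real missing values, a boolean mask such as `(df != 0).any(axis=1)` leaves them untouched.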

## Answer 4

I think this solution is the shortest, although it filters on a single column only:

``````
df = df[df['ColName'] != 0]
``````

## Answer 5

A couple of solutions I found helpful while looking this up, especially for larger data sets:

``````
df[(df.sum(axis=1) != 0)]       # 30% faster
df[df.values.sum(axis=1) != 0]  # 3X faster
``````

Continuing with the example from @U2EF1:

``````
In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]:
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop
``````

On a larger dataset:

``````
In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
``````
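
A caveat on the `sum`-based filters: a row whose non-zero values cancel out (for example `1` and `-1`) sums to zero and gets dropped as well. A small sketch on hypothetical data, next to a NumPy comparison that checks each entry instead of the row total (it was not timed here, so no speed claim is made for it):

```python
import pandas as pd

# Hypothetical data: row 1 is non-zero, but its values cancel out.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, -1, 3]})

by_sum = df[df.values.sum(axis=1) != 0]      # drops rows 0 and 1
by_entry = df[(df.values != 0).any(axis=1)]  # drops only row 0

print(by_sum)
print(by_entry)
```

The `sum` shortcut is safe when all values are known to be non-negative, as in the 0/1 benchmark data above.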

## Answer 6

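``````
import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

# True for rows whose absolute values sum to zero, i.e. all-zero rows
temp = df.abs().sum(axis=1) == 0
# drop() expects index labels, so turn the boolean mask into labels first
df = df.drop(df.index[temp])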
``````

Result:

``````
>>> df
   a  b
2  1 -1
``````

## Answer 7

You can use a quick `lambda` function to check if all the values in a given row are `0`. Then you can use the result of applying that `lambda` as a way to choose only the rows that match or don’t match that condition:

``````
import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3),
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

# Zero out two rows so there is something to drop
df.loc[['one', 'three']] = 0

print(df)
# Keep only the rows where not every value is 0
print(df.loc[~df.apply(lambda row: (row==0).all(), axis=1)])
``````

Yields:

``````
              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
``````

## Answer 8

Another alternative:

``````
# Is there anything in this row non-zero?
# df != 0 --> which entries are non-zero? T/F
# (df != 0).any(axis=1) --> does each row contain any non-zero entry? One T/F per row.
# Compare df.shape before and after filtering to confirm a subset was taken.

any_nonzero_mask = (df != 0).any(axis=1)  # True for rows with at least one non-zero entry
df = df[any_nonzero_mask]                 # keep only those rows
``````

## Answer 9

For me, `df.loc[(df!=0).any(axis=0)]` did not work: it returned the dataset unchanged.

Instead, I used `df.loc[:, (df!=0).any(axis=0)]`, which dropped all the columns that contain only 0 values.

Using `.all()` instead of `.any()` dropped every column that contains any zero value in my dataset.
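
A small sketch of the column-wise versions on a hypothetical frame, to make the difference between `.any()` and `.all()` concrete. With `axis=0` the mask holds one flag per column, so it goes in the column position of `.loc`:

```python
import pandas as pd

# Hypothetical data: "a" has no zeros, "b" has some zeros, "c" is all zeros.
df = pd.DataFrame({"a": [1, 1, 2], "b": [0, 0, 3], "c": [0, 0, 0]})

col_any = (df != 0).any(axis=0)  # True for columns with at least one non-zero value
col_all = (df != 0).all(axis=0)  # True for columns with no zero at all

cols_not_all_zero = df.loc[:, col_any]  # drops only the all-zero column "c"
cols_without_zero = df.loc[:, col_all]  # drops "b" and "c"

print(cols_not_all_zero)
print(cols_without_zero)
```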

## Answer 10

``````
# The column names must be separated by commas; adjacent string literals would be concatenated
df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]
``````

Try this command; it works perfectly.

## Answer 11

To drop all rows that contain a 0 in any column:

``````
new_df = df[df.loc[:]!=0].dropna()
``````
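
For comparison, a sketch on hypothetical data of what this expression keeps versus the all-zero filter asked about in the question. Masking with `df != 0` turns every zero into `NaN`, and `dropna()` with its default `how='any'` then removes each row that contains at least one of them:

```python
import pandas as pd

# Hypothetical data: row 0 is all zeros, row 1 has a single zero.
df = pd.DataFrame({"a": [0, 0, 1], "b": [0, 2, 3]})

masked = df[df != 0]                         # zeros become NaN
any_zero_dropped = masked.dropna()           # default how='any': rows 0 and 1 are removed
all_zero_dropped = df[(df != 0).any(axis=1)] # only the all-zero row 0 is removed

print(any_zero_dropped)
print(all_zero_dropped)
```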