Remove NaN values from an array

Question

I want to figure out how to remove NaN values from my array. My array looks something like this:

x = [1400, 1500, 1600, nan, nan, nan, 1700]  # Not in this exact configuration

How can I remove the NaN values from x?


Answer 0

If you’re using numpy for your arrays, you can also use

x = x[numpy.logical_not(numpy.isnan(x))]

Equivalently

x = x[~numpy.isnan(x)]

[Thanks to chbrown for the added shorthand]

Explanation

The inner function, numpy.isnan, returns a boolean (logical) array that is True everywhere x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array that is True everywhere x is a valid number.

Lastly, we use this logical array to index into the original array x, retrieving just the non-NaN values.
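A minimal runnable sketch of the steps described above, assuming x is a 1-D float NumPy array built from the question's values (np.nan stands in for the bare nan in the question):

```python
import numpy as np

# The question's values, with np.nan standing in for the bare `nan`
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])

is_nan = np.isnan(x)   # boolean array: True where x is NaN
keep = ~is_nan         # logical not: True where x is a valid number
                       # (np.logical_not(is_nan) gives the same array)

x = x[keep]            # boolean indexing keeps only the True positions
print(x)
```

Note that the result is a new array containing only the selected elements; the positions of the removed NaNs are not preserved.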


Answer 1

list(filter(lambda v: v == v, x))

works for both lists and NumPy arrays, since v != v is true only for NaN. (In Python 3, filter returns an iterator, hence the list() call.)
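A short sketch of why the v == v trick works, assuming a 1-D input. For a 2-D array, iterating yields whole rows, and row == row is itself an array whose truth value is ambiguous, so filter would raise a ValueError there.

```python
import numpy as np

nan = float("nan")
print(nan == nan)  # False: NaN is the only value that is not equal to itself

values = [1400, 1500, 1600, nan, nan, nan, 1700]

# Keep v only when v == v, i.e. when v is not NaN
clean_list = list(filter(lambda v: v == v, values))

# Iterating a 1-D array yields NumPy float scalars, which follow the same rule;
# wrap the filtered result in np.array to get an array back
arr = np.array(values)
clean_arr = np.array(list(filter(lambda v: v == v, arr)))
```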


Answer 2

Try this:

import math
print([value for value in x if not math.isnan(value)])

For more, read up on list comprehensions.
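A self-contained version of the comprehension, assuming x holds only numbers (math.isnan raises a TypeError for values such as None or strings). It also works on a 1-D NumPy array, because NumPy float scalars are accepted by math.isnan, but the comprehension always produces a list:

```python
import math
import numpy as np

x = [1400, 1500, 1600, float("nan"), float("nan"), float("nan"), 1700]

# math.isnan accepts ints and floats, so mixed int/float lists are fine
cleaned = [value for value in x if not math.isnan(value)]
print(cleaned)

# Same comprehension over a 1-D array; convert back if an array is needed
arr = np.array(x)
cleaned_arr = np.array([value for value in arr if not math.isnan(value)])
```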


Answer 3

For me, @jmetz’s answer didn’t work, but pandas’ isnull() did.

import pandas as pd

x = x[~pd.isnull(x)]
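The answer does not say why numpy.isnan failed. One common cause (an assumption here) is an object-dtype array, for example one that mixes numbers with None. pd.isnull works elementwise on such arrays and treats both None and NaN as missing:

```python
import numpy as np
import pandas as pd

# An object-dtype array mixing numbers, None and NaN (assumed scenario)
x = np.array([1400, None, 1600, np.nan, 1700], dtype=object)

mask = pd.isnull(x)  # boolean ndarray: True for None and NaN
x = x[~mask]
print(x)
```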

Answer 4

Doing the above:

x = x[~numpy.isnan(x)]

or

x = x[numpy.logical_not(numpy.isnan(x))]

I found that assigning the result back to the same variable (x) did not remove the actual NaN values, and I had to use a different variable. Assigning the result to a different variable removed the NaNs, e.g.

y = x[~numpy.isnan(x)]
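The answer does not say why reusing x failed. One possible explanation (an assumption, not something the answer confirms) is that boolean indexing always builds a new array: x = x[...] only rebinds the name x, while any other name or container that still refers to the original array keeps seeing the NaNs. A minimal sketch:

```python
import numpy as np

original = np.array([1400, 1500, np.nan, 1700])
x = original           # a second name for the same array object

x = x[~np.isnan(x)]    # boolean indexing returns a NEW array; x is rebound to it

print(x)               # NaN removed
print(original)        # the original array object still contains the NaN
```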

Answer 5

As shown by others,

x[~numpy.isnan(x)]

works, but it raises a TypeError if the array’s dtype is not a native numeric type, for example object. In that case you can use pandas:

x[~pandas.isna(x)]

or, equivalently,

x[~pandas.isnull(x)]
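A sketch of the fallback described above, wrapped in a hypothetical helper named drop_missing (the name is illustrative, not part of NumPy or pandas). It tries numpy.isnan first and switches to pandas.isna when NumPy raises a TypeError, as it does for dtype=object:

```python
import numpy as np
import pandas as pd


def drop_missing(arr):
    """Hypothetical helper: drop NaN (and, for object arrays, None) from a 1-D array."""
    try:
        mask = np.isnan(arr)   # fast path for native float dtypes
    except TypeError:
        mask = pd.isna(arr)    # object dtype: isnan is not supported
    return arr[~mask]


floats = np.array([1.0, np.nan, 3.0])
objects = np.array([1.0, None, np.nan, "a"], dtype=object)

print(drop_missing(floats))
print(drop_missing(objects))
```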

Answer 6

The accepted answer flattens 2-D arrays into 1-D. I present a solution here using the pandas dropna() functionality. It works for 1-D and 2-D arrays; in the 2-D case you can choose whether to drop the rows or the columns containing np.nan.

import pandas as pd
import numpy as np

def dropna(arr, *args, **kwargs):
    # Wrap the array in a DataFrame so pandas' dropna() does the work;
    # extra arguments (axis, how, thresh, ...) are forwarded unchanged
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwargs).values
    # A 1-D input becomes a single-column DataFrame, so flatten it back
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan], [1700, 1800, np.nan]])


print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')

print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')

# For comparison: the accepted answer's boolean mask applied to the 2-D array y
print('\n\n'+'='*20+' y[np.logical_not(np.isnan(y))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\nboolean mask:\n',y[np.logical_not(np.isnan(y))],sep='')

Result:

==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== y[np.logical_not(np.isnan(y))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

boolean mask:
[1400. 1500. 1600.    0. 1700. 1800.]
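Because the helper forwards its extra arguments to pandas’ DataFrame.dropna, its other options also work. A small sketch using how="all" (a standard DataFrame.dropna parameter), which drops only rows that are entirely NaN; the helper is repeated so the block runs on its own:

```python
import numpy as np
import pandas as pd


def dropna(arr, *args, **kwargs):
    # Same helper as in the answer: forwards arguments to DataFrame.dropna
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwargs).values
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped


z = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]])

any_nan = dropna(z)             # default how="any": drop rows containing a NaN
all_nan = dropna(z, how="all")  # drop only rows in which every value is NaN
print(any_nan)
print(all_nan)
```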

Answer 7

If you’re using numpy:

# first build a boolean mask that is True where the values are finite
# (np.isfinite is False for NaN, inf and -inf)
ii = np.isfinite(x)

# second, select those values
x = x[ii]
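The difference from the isnan-based answers is that np.isfinite also removes positive and negative infinity. A small comparison, assuming a 1-D float array:

```python
import numpy as np

x = np.array([1400, np.nan, 1600, np.inf, -np.inf, 1700])

only_nan_removed = x[~np.isnan(x)]   # inf and -inf are kept
all_nonfinite_removed = x[np.isfinite(x)]  # NaN, inf and -inf are removed
print(only_nan_removed)
print(all_nonfinite_removed)
```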

Answer 8

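The simplest way is:

numpy.nan_to_num(x)

Note that this replaces NaN with 0.0 (and infinities with large finite numbers) rather than removing those entries, so the array keeps its original length.

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html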
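A quick sketch contrasting replacement with removal, using the question’s values. The nan= keyword of numpy.nan_to_num chooses the replacement value; it requires NumPy 1.17 or later.

```python
import numpy as np

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])

replaced = np.nan_to_num(x)          # NaN -> 0.0; length stays 7
custom = np.nan_to_num(x, nan=-1.0)  # NaN -> -1.0 (NumPy >= 1.17)
removed = x[~np.isnan(x)]            # for comparison: NaNs actually dropped

print(replaced)
print(custom)
print(removed)
```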


Answer 9

This is my approach to filtering out the rows of a 2-D ndarray “X” that contain NaNs or infs.

I create a map of rows without any NaN or inf as follows:

idx = np.where((~np.isnan(X) & ~np.isinf(X)).all(axis=1))

idx is a tuple. Its first element (idx[0]) contains the indices of the rows in which no NaN or inf was found.

Then:

filtered_X = X[idx[0]]

filtered_X contains the rows of X without any NaN or inf.
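A runnable check of row-wise NaN/inf filtering, assuming X is a 2-D float array. It uses np.isfinite(X).all(axis=1), which combines the isnan and isinf tests in one call. np.where on the resulting 1-D row mask returns a 1-tuple of row indices, and indexing with the mask directly gives the same rows.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.inf],
              [5.0, 6.0]])

finite_rows = np.isfinite(X).all(axis=1)  # True for rows with no NaN or inf
idx = np.where(finite_rows)               # 1-tuple: (array of row indices,)
filtered_X = X[idx[0]]

same_rows = X[finite_rows]                # equivalent, without np.where
print(filtered_X)
```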


Answer 10

@jmetz’s answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns from matrices.

To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.
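A small sketch of the same reduction along both axes, assuming a 2-D float array: any(axis=1) collapses each row to one boolean (used to drop rows), and any(axis=0) collapses each column (used with x[:, ...] to drop columns).

```python
import numpy as np

x = np.array([[1.0, 2.0, np.nan],
              [4.0, 5.0, 6.0],
              [np.nan, 8.0, 9.0]])

nan_mask = np.isnan(x)

rows_kept = x[~nan_mask.any(axis=1)]     # drop rows containing any NaN
cols_kept = x[:, ~nan_mask.any(axis=0)]  # drop columns containing any NaN
print(rows_kept)
print(cols_kept)
```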