How to remove all rows in a numpy.ndarray that contain non-numeric values

Question: How do I remove all rows in a numpy.ndarray that contain non-numeric values?

Basically, I’m doing some data analysis. I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, by being NaN, or by being written as the string "NA").

I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?


Answer 0

>>> import numpy as np
>>> a = np.array([[1,2,3], [4,5,np.nan], [7,8,9]])
>>> a
array([[  1.,   2.,   3.],
       [  4.,   5.,  nan],
       [  7.,   8.,   9.]])

>>> a[~np.isnan(a).any(axis=1)]
array([[ 1.,  2.,  3.],
       [ 7.,  8.,  9.]])

Then reassign the result to a: a = a[~np.isnan(a).any(axis=1)]
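To see what each part of that one-liner produces, here is a small script that builds the same array and prints the intermediate masks before filtering. The variable names (nan_mask, row_has_nan, keep) are just illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])

nan_mask = np.isnan(a)              # same shape as a: True where an entry is NaN
row_has_nan = nan_mask.any(axis=1)  # shape (m,): True if any entry in that row is NaN
keep = ~row_has_nan                 # invert: True for rows that contain no NaN

print(row_has_nan)  # [False  True False]
print(keep)         # [ True False  True]

a = a[keep]         # boolean indexing keeps only the rows where keep is True
print(a.shape)      # (2, 3)
```

Note that the row mask has one entry per row (length m), not per column.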

Explanation: np.isnan(a) returns a boolean array of the same shape as a, with True where the value is NaN and False elsewhere. .any(axis=1) reduces this m*n array to a length-m array by applying a logical OR across each row, so an entry is True if its row contains at least one NaN. ~ inverts True/False, and a[ ] then selects only those rows of the original array whose entry in the bracketed mask is True.
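The approach above assumes a is already a float array. The question also mentions values stored as the string "NA"; if the dataset was loaded as strings (or as an object array), np.isnan raises a TypeError. A minimal sketch, assuming the data is a 2-D array of strings: convert each entry to float, map anything that can't be parsed ("NA", empty fields) to NaN, then apply the same row filter. The helper to_float_or_nan is hypothetical, not a NumPy function.

```python
import numpy as np

def to_float_or_nan(value):
    """Hypothetical helper: parse value as a float, or return NaN if it can't be
    parsed (e.g. "NA", "", None)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan

# Example data read as strings; "NA" and "" stand in for missing values
raw = np.array([["1", "2", "3"],
                ["4", "NA", "6"],
                ["7", "8", ""],
                ["10", "11", "12"]])

# Apply the helper element-wise; otypes=[float] makes the result a float64 array
numeric = np.vectorize(to_float_or_nan, otypes=[float])(raw)

# Same filter as in the answer: drop every row that contains at least one NaN
clean = numeric[~np.isnan(numeric).any(axis=1)]
print(clean)
```

np.vectorize is only a convenience wrapper around a Python loop, so it isn't fast on large inputs. Also, strings like "inf" parse to infinity, which np.isnan does not flag; if those rows should be dropped too, ~np.isfinite(numeric) can be used in place of np.isnan(numeric).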