用None替换Pandas或Numpy Nan以与MysqlDB一起使用

问题:用None替换Pandas或Numpy Nan以与MysqlDB一起使用

我正在尝试使用MysqlDB将Pandas数据帧(或可以使用numpy数组)写入mysql数据库。MysqlDB似乎不理解’nan’,我的数据库抛出一个错误,说nan不在字段列表中。我需要找到一种将’nan’转换为NoneType的方法。

有任何想法吗?

I am trying to write a Pandas dataframe (or can use a numpy array) to a mysql database using MysqlDB . MysqlDB doesn’t seem understand ‘nan’ and my database throws out an error saying nan is not in the field list. I need to find a way to convert the ‘nan’ into a NoneType.

Any ideas?


回答 0

@bogatron正确,您可以使用where,值得注意的是您可以在熊猫本机执行此操作:

df1 = df.where(pd.notnull(df), None)

注意:这会将所有列的dtype更改为object

例:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]: 
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]: 
      0
0     1
1  None

注意:您不能执行的操作dtype是使用astype,然后使用DataFrame fillna方法来重铸DataFrame 以允许所有数据类型,请执行以下操作:

df1 = df.astype(object).replace(np.nan, 'None')

遗憾的是这个没有,也没有使用replace,用作品None这个(关闭)的问题


顺便说一句,值得注意的是,对于大多数用例,您不需要将NaN替换为None,请参阅有关熊猫中NaN和None之间的区别的问题。

但是,在这种特定情况下,您似乎可以这样做(至少在回答此问题时)。

@bogatron has it right, you can use where, it’s worth noting that you can do this natively in pandas:

df1 = df.where(pd.notnull(df), None)

Note: this changes the dtype of all columns to object.

Example:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]: 
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]: 
      0
0     1
1  None

Note: what you cannot do recast the DataFrames dtype to allow all datatypes types, using astype, and then the DataFrame fillna method:

df1 = df.astype(object).replace(np.nan, 'None')

Unfortunately neither this, nor using replace, works with None see this (closed) issue.


As an aside, it’s worth noting that for most use cases you don’t need to replace NaN with None, see this question about the difference between NaN and None in pandas.

However, in this specific case it seems you do (at least at the time of this answer).


回答 1

df = df.replace({np.nan: None})

这个Github问题归功于这个家伙。

df = df.replace({np.nan: None})

Credit goes to this guy here on this Github issue.


回答 2

您可以在numpy数组中替换nanNone

>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print y
[1.0 None 3.0]
>>> print type(y[1])
<type 'NoneType'>

You can replace nan with None in your numpy array:

>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print y
[1.0 None 3.0]
>>> print type(y[1])
<type 'NoneType'>

回答 3

经过绊脚,这对我有用:

df = df.astype(object).where(pd.notnull(df),None)

After stumbling around, this worked for me:

df = df.astype(object).where(pd.notnull(df),None)

回答 4

只是@Andy Hayden的答案的补充:

由于DataFrame.mask是的相对孪生子DataFrame.where,因此它们具有完全相同的签名,但含义相反:

  • DataFrame.where对于替换条件为False的很有用
  • DataFrame.mask用于替换条件为True的值。

所以在这个问题上,使用df.mask(df.isna(), other=None, inplace=True)可能会更直观。

Just an addition to @Andy Hayden’s answer:

Since DataFrame.mask is the opposite twin of DataFrame.where, they have the exactly same signature but with opposite meaning:

  • DataFrame.where is useful for Replacing values where the condition is False.
  • DataFrame.mask is used for Replacing values where the condition is True.

So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.


回答 5

另外除了:更换倍数和转换从柱背面的类型时要小心对象浮动。如果您想确定自己None的不会退回到np.NaN‘s’,请使用@ andy-hayden的建议pd.where。替换仍然会出错的说明:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"a": [1, np.NAN, np.inf]})

In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf

In [5]: df.replace({np.NAN: None})
Out[5]:
      a
0     1
1  None
2   inf

In [6]: df.replace({np.NAN: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN

In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN

Another addition: be careful when replacing multiples and converting the type of the column back from object to float. If you want to be certain that your None‘s won’t flip back to np.NaN‘s apply @andy-hayden’s suggestion with using pd.where. Illustration of how replace can still go ‘wrong’:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"a": [1, np.NAN, np.inf]})

In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf

In [5]: df.replace({np.NAN: None})
Out[5]:
      a
0     1
1  None
2   inf

In [6]: df.replace({np.NAN: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN

In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN

回答 6

很老,但我偶然发现了同样的问题。尝试这样做:

df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)

Quite old, yet I stumbled upon the very same issue. Try doing this:

df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)