Get pandas.read_csv to read empty values as empty string instead of nan

Question: Get pandas.read_csv to read empty values as empty string instead of nan

I’m using the pandas library to read in some CSV data. In my data, certain columns contain strings. The string "nan" is a possible value, as is an empty string. I managed to get pandas to read "nan" as a string, but I can’t figure out how to get it not to read an empty value as NaN. Here’s sample data and output:

One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven

>>> pandas.read_csv('test.csv', na_values={'One': [], "Three": []})
    One  Two  Three
0    a    1    one
1    b    2    two
2  NaN    3  three
3    d    4    nan
4    e    5   five
5  nan    6    NaN
6    g    7  seven

It correctly reads "nan" as the string "nan", but still reads the empty cells as NaN. I tried passing in str in the converters argument to read_csv (with converters={'One': str}), but it still reads the empty cells as NaN.

I realize I can fill the values after reading, with fillna, but is there really no way to tell pandas that an empty cell in a particular CSV column should be read as an empty string instead of NaN?


Answer 0

I added a ticket to add an option of some sort here:

https://github.com/pydata/pandas/issues/1450

In the meantime, result.fillna('') should do what you want.

EDIT: In the development version (to be 0.8.0 final), if you specify an empty list of na_values, empty strings will stay empty strings in the result.
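As a rough illustration of the fillna workaround, here is a minimal sketch that loads the question's sample data from an in-memory string (the use of io.StringIO and the names CSV_TEXT, df and filled are assumptions made for this example, not part of the original answer). It uses default parsing, so both the empty cells and the literal text "nan" become NaN before being filled.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv.
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

# Default parsing: empty cells and the text "nan" are both read as NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Replace every NaN with an empty string after reading.
filled = df.fillna("")

print(filled["One"].tolist())    # ['a', 'b', '', 'd', 'e', '', 'g']
print(filled["Three"].tolist())  # ['one', 'two', 'three', '', 'five', '', 'seven']
```

Note that with default parsing this approach cannot tell an empty cell apart from the text "nan": both end up as empty strings.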


Answer 1

I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.

Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:

pd.read_csv('test.csv', keep_default_na=False)

This issue is more clearly explained in

That was fixed on Aug 19, 2012 for Pandas version 0.9 in
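To make the effect of keep_default_na=False concrete, the following sketch reads the question's sample data from an in-memory string (io.StringIO and the names CSV_TEXT and df are assumptions for this example; the answer itself reads test.csv). With the default NA markers switched off and no na_values given, both the empty cells and the text "nan" should come through as plain strings.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv.
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

# Turn off the built-in list of NA markers ('', 'nan', 'NA', ...).
df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

print(df["One"].tolist())    # ['a', 'b', '', 'd', 'e', 'nan', 'g']
print(df["Three"].tolist())  # ['one', 'two', 'three', 'nan', 'five', '', 'seven']
print(int(df.isna().sum().sum()))  # 0
```

One thing to keep in mind: the setting applies to every column, so an empty cell in a numeric column would also stay an empty string, and that column would then be read as strings rather than numbers.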


Answer 2

We have a simple argument in Pandas read_csv for this:

Use:

df = pd.read_csv('test.csv', na_filter=False)

Pandas documentation clearly explains how the above argument works.

Link
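A short sketch of what na_filter=False does on the question's sample data, again read from an in-memory string (io.StringIO and the names CSV_TEXT and df are assumptions for this example). The flag switches off missing-value detection entirely, so nothing in the file is converted to NaN.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv.
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

# Disable missing-value detection for the whole file.
df = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

print(df["One"].tolist())    # ['a', 'b', '', 'd', 'e', 'nan', 'g']
print(df["Three"].tolist())  # ['one', 'two', 'three', 'nan', 'five', '', 'seven']
print(int(df.isna().sum().sum()))  # 0
```

On this data the result matches keep_default_na=False. As far as the pandas documentation describes it, the difference is that na_filter=False also causes any na_values you pass to be ignored, so it fits best when no column needs NaN detection at all.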