How to get rid of the "Unnamed: 0" column in a pandas DataFrame?

Question 1

Sometimes when I read a CSV into a DataFrame, I get an unwanted index-like column named Unnamed: 0.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?

Answer 1

It's the index column. Pass index=False to to_csv so that it is not written out; see the docs.

Example:

In [37]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
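
For a reproducible version of the comparison above, here is a minimal sketch that uses a small fixed DataFrame instead of random data. The variable names (csv_with_index, naive, no_index, as_index) are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Small fixed frame so the round trip is easy to inspect.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# By default to_csv() writes the index as a first column with an empty header.
csv_with_index = df.to_csv()
naive = pd.read_csv(io.StringIO(csv_with_index))

# Option 1: do not write the index at all.
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Option 2: keep the file as is, but read the first column back as the index.
as_index = pd.read_csv(io.StringIO(csv_with_index), index_col=0)

print(list(naive.columns))     # includes 'Unnamed: 0'
print(list(no_index.columns))
print(list(as_index.columns))
```

Option 1 fixes the file itself; option 2 is the one to reach for when the file already exists and you only control the reading side.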

Answer 2

This issue most likely manifests because your CSV was saved along with its RangeIndex (which usually doesn’t have a name). The fix would actually need to be done when saving the DataFrame, but this isn’t always an option.

Avoiding the Problem: `read_csv` with `index_col` argument

IMO, the simplest solution is to read the unnamed column as the index. Pass index_col=[0] to pd.read_csv, which reads the first column in as the index.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')

pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by passing index=False when writing the CSV, provided the DataFrame's index carries no information you need to keep:

df.to_csv('file.csv', index=False)

But as mentioned above, this isn't always an option.

Stopgap Solution: Filtering with `str.match`

If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
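
If the same cleanup is needed in several places, the filter can be wrapped in a small function. This is only a sketch: drop_unnamed_columns is a hypothetical helper name, and it assumes all column labels are strings (the .str accessor does not work on non-string labels).

```python
import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns whose label starts with 'Unnamed'."""
    # str.match anchors at the start of each label; assumes string labels.
    keep = ~df.columns.str.match("Unnamed")
    return df.loc[:, keep].copy()


raw = pd.DataFrame({"Unnamed: 0": [0, 1], "a": ["x", "x"], "b": ["x", "x"]})
cleaned = drop_unnamed_columns(raw)
print(list(cleaned.columns))
```

Returning a copy leaves the original frame untouched, which is convenient when the raw data is still needed for debugging.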

Answer 3

Another possible cause is a CSV that was written improperly, so that every row ends with a comma. That leaves you with an extra column named Unnamed: x (where x is the column's position) at the end of your data when you read it into a DataFrame.
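
The trailing-comma case can be reproduced with a short sketch. The CSV text below is made up for illustration, and it assumes the header line also ends with a comma, just like the data rows.

```python
import io

import pandas as pd

# Header and rows all end with a stray comma, so every line has a fourth, empty field.
text = "A,B,C,\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(io.StringIO(text))
print(list(df.columns))  # the empty header field becomes 'Unnamed: 3'

# The extra column holds only missing values; drop it by its name pattern.
cleaned = df.loc[:, ~df.columns.str.match("Unnamed")]
print(list(cleaned.columns))
```

Here the unnamed column appears last rather than first, which is a quick way to tell this cause apart from a saved index.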

Answer 4

To get rid of all Unnamed columns, you can also use a regex, such as df.drop(df.filter(regex="Unname"), axis=1, inplace=True)
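
A slightly more explicit variant of the same idea is sketched below. It is a variation rather than the answer's exact call: it anchors the pattern with ^ so that only labels starting with "Unnamed" match, and it passes the matching labels to drop through the columns argument. The sample data is invented.

```python
import pandas as pd

df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": [1, 2], "Unnamed: 3": [None, None]})

# filter(regex=...) selects the matching columns; .columns gives their labels.
unnamed = df.filter(regex="^Unnamed").columns

# Reassigning instead of using inplace=True keeps the step easy to chain.
df = df.drop(columns=unnamed)
print(list(df.columns))
```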

Answer 5

Simply delete that column using: del df['column_name']
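
For the column in this question, that could look like the sketch below. The membership check is an addition here, since del raises a KeyError when the column is not present.

```python
import pandas as pd

df = pd.DataFrame({"Unnamed: 0": [0, 1], "a": [1, 2]})

# del raises KeyError if the column is absent, so check first.
if "Unnamed: 0" in df.columns:
    del df["Unnamed: 0"]

print(list(df.columns))
```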
