Python 实用宝典

Question: Skip rows during CSV import in pandas

I’m trying to import a .csv file using pandas.read_csv(); however, I don’t want to import the 2nd row of the data file (the row with index = 1 for 0-indexing).

I can’t see how not to import it because the arguments used with the command seem ambiguous:

From the pandas website:

“skiprows : list-like or integer

Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.”

If I put skiprows=1 in the arguments, how does it know whether to skip the first row or skip the row with index 1?

Answer 0

You can try it yourself:

>>> import pandas as pd
>>> from io import StringIO
>>> s = """1, 2
... 3, 4
... 5, 6"""
>>> pd.read_csv(StringIO(s), skiprows=[1], header=None)
   0  1
0  1  2
1  5  6
>>> pd.read_csv(StringIO(s), skiprows=1, header=None)
   0  1
0  3  4
1  5  6
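For the case in the question, where the file has a header line followed by an unwanted second line, the same comparison can be sketched as below. The sample data and column names are made up for illustration, and Python 3 with pandas is assumed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: file line 0 is the header, line 1 is a units row to drop.
raw = "time,level\ns,m\n0,1.5\n1,1.7\n"

# A list skips exactly those 0-indexed file lines, so the header survives.
by_index = pd.read_csv(StringIO(raw), skiprows=[1])

# An integer skips that many lines from the top of the file, so the real
# header is lost and the units row is used as the header instead.
by_count = pd.read_csv(StringIO(raw), skiprows=1)

print(list(by_index.columns))  # ['time', 'level']
print(list(by_count.columns))  # ['s', 'm']
```

Both frames contain the same two data rows; only the column labels differ.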

Answer 1

I don’t have the reputation to comment yet, but I want to add to alko’s answer for further reference.

From the docs:

skiprows: A collection of numbers for rows in the file to skip. Can also be an integer to skip the first n rows
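To make the two readings of that sentence concrete, here is a small sketch; the helper name read and the five-line sample are my own, not from the docs. It passes an integer, a list, and a range to skiprows and shows which file lines remain.

```python
from io import StringIO

import pandas as pd

# Five one-field lines; their file line numbers are 0 through 4.
raw = "a\nb\nc\nd\ne\n"


def read(skiprows):
    """Return the remaining values after applying the given skiprows."""
    frame = pd.read_csv(StringIO(raw), header=None, skiprows=skiprows)
    return frame[0].tolist()


first_two_skipped = read(2)            # integer: skip the first 2 lines
lines_0_and_2_skipped = read([0, 2])   # collection: skip file lines 0 and 2
lines_1_to_3_skipped = read(range(1, 4))  # a range is also accepted as a collection

print(first_two_skipped)       # ['c', 'd', 'e']
print(lines_0_and_2_skipped)   # ['b', 'd', 'e']
print(lines_1_to_3_skipped)    # ['a', 'e']
```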

Answer 2

I ran into the same issue with skiprows while reading a CSV file. I was passing skip_rows=1, which does not work; the parameter is named skiprows.

A simple example shows how to use skiprows while reading a CSV file.

import pandas as pd

# skiprows=1 skips the first line of the file and starts reading from the second line
df = pd.read_csv('my_csv_file.csv', skiprows=1)

# print the data frame
print(df)
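If you want to see the difference without a file on disk, the sketch below uses an in-memory sample (the data is made up). As far as I can tell, read_csv rejects the misspelled keyword with a TypeError rather than silently ignoring it.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: one junk line, then a header and one data row.
raw = "junk line\nx,y\n1,2\n"

error_message = None
try:
    # Misspelled keyword: read_csv has no parameter called skip_rows.
    pd.read_csv(StringIO(raw), skip_rows=1)
except TypeError as exc:
    error_message = str(exc)
    print("rejected:", error_message)

# Correct spelling: the junk line is skipped and 'x,y' becomes the header.
df = pd.read_csv(StringIO(raw), skiprows=1)
print(df)
```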

Answer 3

All of these answers miss one important point: the n’th line is the n’th line in the file, not the n’th row in the dataset. I have a situation where I download some antiquated stream gauge data from the USGS. The head of the dataset is commented with ‘#’, the first line after that holds the labels, next comes a line that describes the data types, and last comes the data itself. I never know how many comment lines there are, but I know what the first couple of rows are. Example:

# ----------------------------- WARNING ----------------------------------
# Some of the data that you have obtained from this U.S. Geological Survey database
# may not have received Director’s approval. …
agency_cd site_no datetime tz_cd 139719_00065 139719_00065_cd
5s 15s 20d 6s 14n 10s
USGS 08041780 2018-05-06 00:00 CDT 1.98 A

It would be nice if there was a way to automatically skip the n’th row as well as the n’th line.

As a note, I was able to fix my issue with:

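import pandas as pd
# fully commented lines are ignored, so header=0 is the first non-comment line (the labels)
ds = pd.read_csv(fname, comment='#', sep='\t', header=0, parse_dates=True)
# the data-type line was read as the first data row (index 0); drop it
ds.drop(0, inplace=True)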
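Here is a self-contained sketch of that workaround on a miniature, made-up version of such a file (the second data row and the shortened comment lines are invented for illustration). Because the data-type line is read as a data row, the columns come back as strings, so the sketch converts the ones it needs afterwards; that conversion step is my addition.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of the file: '#' comment lines, a label line,
# a data-type line, then tab-separated data.
raw = (
    "# ---- WARNING ----\n"
    "# Some of the data may not have received approval.\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t139719_00065\t139719_00065_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\n"
    "USGS\t08041780\t2018-05-06 00:00\tCDT\t1.98\tA\n"
    "USGS\t08041780\t2018-05-06 00:15\tCDT\t1.97\tA\n"
)

# Fully commented lines are ignored, so header=0 picks the label line
# no matter how many comment lines precede it.
ds = pd.read_csv(StringIO(raw), comment="#", sep="\t", header=0)

# Row 0 is the data-type line; drop it and renumber the remaining rows.
ds = ds.drop(0).reset_index(drop=True)

# The data-type line made these columns strings, so convert them explicitly.
ds["139719_00065"] = pd.to_numeric(ds["139719_00065"])
ds["datetime"] = pd.to_datetime(ds["datetime"])

print(ds)
```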

Answer 4

skiprows=[1] will skip the second line, not the first one.

Answer 5

Also be sure that your file is actually a CSV file. For example, if you had an .xls file and simply changed the file extension to .csv, the file won’t import and will give the error above. To check whether this is your problem, open the file in Excel; it will likely say:

“The file format and extension of ‘Filename.csv’ don’t match. The file could be corrupted or unsafe. Unless you trust its source, don’t open it. Do you want to open it anyway?”

To fix the file: open the file in Excel, click “Save As”, choose the file format to save as (use .csv), then replace the existing file.

This was my problem, and fixed the error for me.
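If you would rather check from Python than open Excel, a rough sketch is to look at the first bytes of the file. The helper name looks_like_excel is hypothetical, and the check only covers the two container signatures I know of (the OLE2 header used by legacy .xls and the ZIP header used by .xlsx), so treat it as a quick heuristic rather than a full validator.

```python
import os
import tempfile

XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 container (legacy .xls)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP container (.xlsx)


def looks_like_excel(path):
    """Return True if the file starts with a known Excel container signature."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    return head.startswith(XLS_MAGIC) or head.startswith(ZIP_MAGIC)


# Demonstration with two temporary files: a real CSV and a fake "renamed .xls".
with tempfile.TemporaryDirectory() as folder:
    real_csv = os.path.join(folder, "real.csv")
    renamed_xls = os.path.join(folder, "renamed.csv")
    with open(real_csv, "w") as handle:
        handle.write("a,b\n1,2\n")
    with open(renamed_xls, "wb") as handle:
        handle.write(XLS_MAGIC + b"\x00" * 16)
    results = (looks_like_excel(real_csv), looks_like_excel(renamed_xls))

print(results)  # (False, True)
```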
