How to ignore the first line of data when processing CSV data?

Question: How to ignore the first line of data when processing CSV data?

I am asking Python to print the minimum number from a column of CSV data, but the top row holds the column numbers, and I don’t want Python to take the top row into account. How can I make sure Python ignores the first line?

This is the code so far:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1                
    datatype = float          
    data = (datatype(column) for row in incsv)   
    least_value = min(data)

print least_value

Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.


Answer 0

You could use an instance of the csv module’s Sniffer class to detect whether a header row is present, together with the built-in next() function to skip over the first row only when necessary:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)

Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

    data = (float(row[1]) for row in reader)

Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

with open('all16.csv', 'rb') as file:

Answer 1

To skip the first line just call:

next(inf)

Files in Python are iterators over lines.
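
As a minimal sketch of this approach, assuming the header occupies exactly one physical line: the file object is advanced once before it is handed to csv.reader. The sample data and the io.StringIO stand-in for the open file are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an open file such as `inf`.
inf = io.StringIO("id,value\n1,3.5\n2,1.25\n3,9.0\n")

next(inf)                  # consume the header line from the file itself
incsv = csv.reader(inf)    # the reader now starts at the first data row
least_value = min(float(row[1]) for row in incsv)
print(least_value)
```

If the header could contain a quoted field with an embedded newline, calling next() on the reader rather than on the file is probably the safer choice, since the reader skips a whole CSV row instead of one physical line.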


Answer 2

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Skip the unwanted line on the file object first, then pass the file object to csv.DictReader.

import csv

with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)

    # {row_index: row}, with the column names taken from the next line
    data = dict(enumerate(csv.DictReader(tmp)))

Answer 3

Borrowed from the Python Cookbook, a more concise template might look like this:

import csv

with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)  # first row: the column names
    for row in f_csv:
        print(row)  # Process row ...

Answer 4

You would normally use next(incsv), which advances the iterator one row, so you skip the header. Another option (say you wanted to skip 30 rows) is itertools.islice:

from itertools import islice

for row in islice(incsv, 30, None):
    print(row)  # process
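
A small self-contained sketch of the islice approach, using made-up in-memory data with two leading lines to ignore; the variable rows_to_skip is an assumption introduced for the example.

```python
import csv
import io
from itertools import islice

# Hypothetical data: two lines of preamble, then the rows of interest.
raw = io.StringIO("exported by tool\nid,value\n1,3.5\n2,1.25\n")
incsv = csv.reader(raw)

rows_to_skip = 2  # assumed number of leading rows to ignore
data_rows = list(islice(incsv, rows_to_skip, None))
print(data_rows)
```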

Answer 5

Use csv.DictReader instead of csv.reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"] etc.
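
A minimal sketch of the DictReader approach, assuming (as in the question) that the header row simply holds column numbers. The in-memory data is hypothetical.

```python
import csv
import io

# Hypothetical data whose header row holds the column labels "0" and "1".
csvfile = io.StringIO("0,1\na,3.5\nb,1.25\nc,9.0\n")

reader = csv.DictReader(csvfile)      # first row becomes the field names
least_value = min(float(row["1"]) for row in reader)
print(least_value)
```

Because the header row is consumed as field names, it never shows up among the data rows, so no explicit skipping is needed.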


Answer 6

The ‘pandas’ package might be more relevant than ‘csv’. The code below reads a CSV file, interpreting the first line as the column headers by default, and finds the minimum of each column.

import pandas as pd

data = pd.read_csv('all16.csv')
data.min()

Answer 7

Well, my mini wrapper library would do the job as well.

>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])

Alternatively, if you know the header of the column at index one, for example “Column 1”, you can do this instead:

>>> min(data.column["Column 1"])

Answer 8

For me the easiest way to go is to use range.

import csv

with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Start at index 1 to skip the header row
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])

Answer 9

Because this is related to something I was doing, I’ll share here.

What if you’re not sure whether there’s a header, and you also don’t feel like importing Sniffer and other things?

If your task is basic, such as printing or appending to a list or array, you could just use an if statement:

import csv

array = []
# Let's say there are 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # Read the first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    # that is true only when the first line is data rather than a header
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
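
As one possible "suitable conditional", here is a rough sketch that treats the first row as a header when none of its cells parse as a number. The helper names looks_like_header and load_rows, and the sample data, are hypothetical; the heuristic is an assumption that only makes sense for files whose data rows contain at least one numeric cell.

```python
import csv
import io

def looks_like_header(row):
    """Assumed heuristic: a row is a header if no cell parses as a number."""
    for cell in row:
        try:
            float(cell)
        except ValueError:
            continue          # not numeric, keep checking the other cells
        return False          # found a numeric cell, so this looks like data
    return True

def load_rows(csvfile):
    """Return the data rows, dropping the first row only if it looks like a header."""
    reader = csv.reader(csvfile)
    rows = []
    first_line = next(reader, None)   # None if the file is empty
    if first_line is not None and not looks_like_header(first_line):
        rows.append(first_line)
    rows.extend(reader)
    return rows

with_header = load_rows(io.StringIO("name,score\nann,3\nbob,5\n"))
without_header = load_rows(io.StringIO("ann,3\nbob,5\n"))
print(with_header == without_header)
```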

Answer 10

The documentation for the Python 3 CSV module provides this example:

with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...

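The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method, passing it a sample of the file's text, to determine whether the file has a header line. If it does, skip the first row before iterating over the CSV rows. You can do it like this:

import csv

with open('example.csv', newline='') as csvfile:
    has_header = csv.Sniffer().has_header(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile)
    if has_header:
        next(reader)  # skip the header row
    for data_row in reader:
        print(data_row)  # do something with the row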

Answer 11

I would use tail to get rid of the unwanted first line:

tail -n +2 "$INFIL" | whatever_script.py
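
For a pipeline like this to work, the script has to read its CSV data from standard input. A rough sketch of what such a script might contain, where min_of_column and the sample data are hypothetical names used only for illustration:

```python
import csv
import io

def min_of_column(stream, column=1):
    """Return the smallest float in the given column of a header-less CSV stream."""
    reader = csv.reader(stream)
    return min(float(row[column]) for row in reader)

# In the real script you would call min_of_column(sys.stdin) after `import sys`;
# here an in-memory stream stands in for the data piped in by tail.
piped = io.StringIO("a,3.5\nb,1.25\nc,9.0\n")
result = min_of_column(piped)
print(result)
```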

Answer 12

Just add [1:]

Example below:

data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]

That works for me in IPython.


Answer 13

Python 3.X

Handles UTF8 BOM + HEADER

It was quite frustrating that the csv module could not easily get the header; there is also a bug with the UTF-8 BOM (the first character in the file). This works for me using only the csv module:

import csv

def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM (this assumes the file always starts with one).
        txt = f.read()[1:]

    # Remove header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]

    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))

    for row in csv_rows:
        value = row[INDEX_HERE]
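
An alternative sketch, not the approach used above: Python's standard "utf-8-sig" codec strips a leading BOM when one is present and leaves the text alone otherwise, so the header can then be read with next(). The temporary file and its contents are hypothetical and exist only to make the example self-contained.

```python
import csv
import os
import tempfile

# Write a small hypothetical CSV file that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "bom_example.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    f.write("name,value\nalpha,3.5\nbeta,1.25\n")

# Reading with "utf-8-sig" drops the BOM if there is one and is harmless otherwise.
with open(path, newline="", encoding="utf-8-sig") as f:
    reader = csv.reader(f)
    header = next(reader)      # header row, without a stray BOM character
    rows = list(reader)

print(header, rows)
```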

Answer 14

I would convert csvreader to a list, then pop the first element:

import csv

with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
    data.pop(0)             # Remove the first row

    for row in data:
        print(row)

Answer 15

Python 2.x

csvreader.next()

Return the next row of the reader’s iterable object as a list, parsed according to the current dialect.

csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # prints every row from the second one onward

Python 3.x

csvreader.__next__()

Return the next row of the reader’s iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).

csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
    print(row) # prints every row from the second one onward
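
A short sketch of the recommended next(reader) form. Passing a default as the second argument is an addition not shown above; it keeps an empty file from raising StopIteration. The helper name data_rows and the in-memory data are hypothetical.

```python
import csv
import io

def data_rows(csvfile):
    """Return all rows after the first; an empty file yields an empty list."""
    reader = csv.reader(csvfile)
    next(reader, None)   # the default suppresses StopIteration on empty input
    return list(reader)

print(data_rows(io.StringIO("id,value\n1,3.5\n")))
print(data_rows(io.StringIO("")))
```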