使用Python编辑CSV文件时跳过标题

问题:使用Python编辑CSV文件时跳过标题

我正在使用以下引用的代码使用Python编辑CSV。代码中调用的函数构成了代码的上部。

问题:我希望下面引用的代码从第二行开始编辑csv,我希望它排除包含标题的第一行。现在,它仅在第一行上应用函数,并且我的标题行正在更改。

in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
    row[13] = handle_color(row[10])[1].replace(" - ","").strip()
    row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
    row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
    row[10] = handle_gb(row[10])[0].strip()
    row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
    row[15] = handle_addon(row[10])[1].strip()
    row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
    writer.writerow(row)
in_file.close()    
out_file.close()

我试图通过将row变量初始化为来解决此问题,1但没有成功。

请帮助我解决这个问题。

I am using below referred code to edit a csv using Python. Functions called in the code form upper part of the code.

Problem: I want the below referred code to start editing the csv from 2nd row, I want it to exclude 1st row which contains headers. Right now it is applying the functions on 1st row only and my header row is getting changed.

in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
    row[13] = handle_color(row[10])[1].replace(" - ","").strip()
    row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
    row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
    row[10] = handle_gb(row[10])[0].strip()
    row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
    row[15] = handle_addon(row[10])[1].strip()
    row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
    writer.writerow(row)
in_file.close()    
out_file.close()

I tried to solve this problem by initializing row variable to 1 but it didn’t work.

Please help me in solving this issue.


回答 0

您的reader变量是可迭代的,通过循环它可以检索行。

要使其在循环前跳过一项,只需调用next(reader, None)并忽略返回值即可。

您还可以稍微简化代码;使用打开的文件作为上下文管理器可以自动关闭它们:

with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
   reader = csv.reader(infile)
   next(reader, None)  # skip the headers
   writer = csv.writer(outfile)
   for row in reader:
       # process each row
       writer.writerow(row)

# no need to close, the files are closed automatically when you get to this point.

如果您想将标头写入未经处理的输出文件,也很容易,请将输出传递next()writer.writerow()

headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)

Your reader variable is an iterable, by looping over it you retrieve the rows.

To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.

You can also simplify your code a little; use the opened files as context managers to have them closed automatically:

with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
   reader = csv.reader(infile)
   next(reader, None)  # skip the headers
   writer = csv.writer(outfile)
   for row in reader:
       # process each row
       writer.writerow(row)

# no need to close, the files are closed automatically when you get to this point.

If you wanted to write the header to the output file unprocessed, that’s easy too, pass the output of next() to writer.writerow():

headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)

回答 1

解决此问题的另一种方法是使用DictReader类,该类“跳过”标题行并将其用于允许命名索引。

给定“ foo.csv”,如下所示:

FirstColumn,SecondColumn
asdf,1234
qwer,5678

像这样使用DictReader:

import csv
with open('foo.csv') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row['FirstColumn'])  # Access by column header instead of column number
        print(row['SecondColumn'])

Another way of solving this is to use the DictReader class, which “skips” the header row and uses it to allowed named indexing.

Given “foo.csv” as follows:

FirstColumn,SecondColumn
asdf,1234
qwer,5678

Use DictReader like this:

import csv
with open('foo.csv') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row['FirstColumn'])  # Access by column header instead of column number
        print(row['SecondColumn'])

回答 2

在做 row=1不会改变任何东西,因为您只会用循环的结果覆盖它。

您要next(reader)跳过一行。

Doing row=1 won’t change anything, because you’ll just overwrite that with the results of the loop.

You want to do next(reader) to skip one row.