问题:计算CSV Python中有多少行?

我正在使用python(Django Framework)读取CSV文件。如您所见,我仅从该CSV中提取了2行。我一直在尝试将CS​​V的总行数存储在变量中。

如何获得总行数?

file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(fileObject.next()) 

我努力了:

len(fileObject)
fileObject.length

I’m using python (Django Framework) to read a CSV file. I pull just 2 lines out of this CSV as you can see. What I have been trying to do is store in a variable the total number of rows the CSV also.

How can I get the total number of rows?

file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(fileObject.next()) 

I have tried:

len(fileObject)
fileObject.length

回答 0

您需要计算行数:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

使用sum()与生成器表达使一个有效的计数器,从而避免在存储器中存储整个文件。

如果您已经开始阅读两行,那么您需要将这两行加到总计中;已读取的行不计在内。

You need to count the number of rows:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

Using sum() with a generator expression makes for an efficient counter, avoiding storing the whole file in memory.

If you already read 2 rows to start with, then you need to add those 2 rows to your total; rows that have already been read are not being counted.


回答 1

2018-10-29编辑

谢谢你的意见。

我测试了几种代码来获取csv文件中的行数(以速度为单位)。最好的方法如下。

with open(filename) as f:
    sum(1 for line in f)

这是经过测试的代码。

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

结果如下。

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

总之,sum(1 for line in f)是最快的。但是与可能没有太大区别len(f.readlines())

sample_submission.csv 是30.2MB,具有3100万个字符。

2018-10-29 EDIT

Thank you for the comments.

I tested several kinds of code to get the number of lines in a csv file in terms of speed. The best method is below.

with open(filename) as f:
    sum(1 for line in f)

Here is the code tested.

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

The result was below.

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

In conclusion, sum(1 for line in f) is fastest. But there might not be significant difference from len(f.readlines()).

sample_submission.csv is 30.2MB and has 31 million characters.


回答 2

为此,您需要像我的示例一样有一些代码:

file = open("Task1.csv")
numline = len(file.readlines())
print (numline)

希望对大家有帮助。

To do it you need to have a bit of code like my example here:

file = open("Task1.csv")
numline = len(file.readlines())
print (numline)

I hope this helps everyone.


回答 3

上面的一些建议计算了csv文件中的LINES数量。但是某些CSV文件将包含带引号的字符串,这些字符串本身包含换行符。MS CSV文件通常用\ r \ n分隔记录,但在带引号的字符串中单独使用\ n。

对于这样的文件,计算文件中的文本行(由换行符分隔)将导致太大的结果。因此,为了获得准确的计数,您需要使用csv.reader来读取记录。

Several of the above suggestions count the number of LINES in the csv file. But some CSV files will contain quoted strings which themselves contain newline characters. MS CSV files usually delimit records with \r\n, but use \n alone within quoted strings.

For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.


回答 4

首先,您必须使用open打开文件

input_file = open("nameOfFile.csv","r+")

然后使用csv.reader打开csv

reader_file = csv.reader(input_file)

最后,您可以使用“ len”指令获取行数

value = len(list(reader_file))

总代码是这样的:

input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

请记住,如果要重复使用csv文件,则必须创建一个input_file.fseek(0),因为当您使用reader_file的列表时,它将读取所有文件,并且文件中的指针会更改其位置

First you have to open the file with open

input_file = open("nameOfFile.csv","r+")

Then use the csv.reader for open the csv

reader_file = csv.reader(input_file)

At the last, you can take the number of row with the instruction ‘len’

value = len(list(reader_file))

The total code is this:

input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

Remember that if you want to reuse the csv file, you have to make a input_file.fseek(0), because when you use a list for the reader_file, it reads all file, and the pointer in the file change its position


回答 5

row_count = sum(1 for line in open(filename)) 为我工作。

注意:sum(1 for line in csv.reader(filename))似乎要计算第一行的长度

row_count = sum(1 for line in open(filename)) worked for me.

Note : sum(1 for line in csv.reader(filename)) seems to calculate the length of first line


回答 6

numline = len(file_read.readlines())
numline = len(file_read.readlines())

回答 7

当实例化一个csv.reader对象并遍历整个文件时,可以访问提供行数的名为line_num的实例变量:

import csv
with open('csv_path_file') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass
    print(csv_reader.line_num)

when you instantiate a csv.reader object and you iter the whole file then you can access an instance variable called line_num providing the row count:

import csv
with open('csv_path_file') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass
    print(csv_reader.line_num)

回答 8

import csv
count = 0
with open('filename.csv', 'rb') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1

print count
import csv
count = 0
with open('filename.csv', 'rb') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1

print count

回答 9

使用“列表”适合更可行的对象。

然后,您可以计数,跳过,变异,直到您的内心渴望:

list(fileObject) #list values

len(list(fileObject)) # get length of file lines

list(fileObject)[10:] # skip first 10 lines

Use “list” to fit a more workably object.

You can then count, skip, mutate till your heart’s desire:

list(fileObject) #list values

len(list(fileObject)) # get length of file lines

list(fileObject)[10:] # skip first 10 lines

回答 10

您还可以使用经典的for循环:

import pandas as pd
df = pd.read_csv('your_file.csv')

count = 0
for i in df['a_column']:
    count = count + 1

print(count)

You can also use a classic for loop:

import pandas as pd
df = pd.read_csv('your_file.csv')

count = 0
for i in df['a_column']:
    count = count + 1

print(count)

回答 11

可能想在命令行中尝试以下简单的操作:

sed -n '$=' filename 要么 wc -l filename

might want to try something as simple as below in the command line:

sed -n '$=' filename

or

wc -l filename

回答 12

这适用于csv和所有基于Unix的操作系统中包含字符串的文件:

import os

numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])

如果csv文件包含一个字段行,则可以从numOfLines上面扣除一个:

numOfLines = numOfLines - 1

This works for csv and all files containing strings in Unix-based OSes:

import os

numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])

In case the csv file contains a fields row you can deduct one from numOfLines above:

numOfLines = numOfLines - 1

回答 13

我认为我们可以改善最佳答案,我正在使用:

len = sum(1 for _ in reader)

此外,我们不应忘记pythonic代码在项目中并非总是具有最佳性能。例如:如果我们可以在同一数据集中同时执行更多操作,最好在同一个气泡中全部进行操作,而不是制作两个或多个pythonic气泡。

I think we can improve the best answer a little bit, I’m using:

len = sum(1 for _ in reader)

Moreover, we shouldnt forget pythonic code not always have the best performance in the project. In example: If we can do more operations at the same time in the same data set Its better to do all in the same bucle instead make two or more pythonic bucles.


回答 14

尝试

data = pd.read_csv("data.csv")
data.shape

在输出中,您可以看到类似(aa,bb)的图形,其中aa是行数

try

data = pd.read_csv("data.csv")
data.shape

and in the output you can see something like (aa,bb) where aa is the # of rows


回答 15

import pandas as pd
data = pd.read_csv('data.csv') 
totalInstances=len(data)
import pandas as pd
data = pd.read_csv('data.csv') 
totalInstances=len(data)

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。