


We have a large raw data file that we would like to trim to a specified size. I am experienced in .NET C#, but I would like to do this in Python, both to simplify things and out of interest.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?

Answer 0

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]

Here's another way (works in both Python 2 and 3):

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))  # stops early if the file has fewer than N lines
print(head)
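
Since the original goal is to trim a large file rather than just inspect it, the islice approach can be extended to write the kept lines straight to a new file. The following is a minimal sketch, assuming Python 3; the helper name trim_file and its parameters are made up for illustration and are not part of the answer above. On the OS question: the main platform difference is line endings, and passing newline="" keeps whatever endings the source file already has.

```python
from itertools import islice


def trim_file(src, dst, n):
    """Copy the first n lines of src to dst (hypothetical helper)."""
    # newline="" turns off newline translation (Python 3), so "\n" and
    # "\r\n" line endings are copied through unchanged on any OS.
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        # islice yields at most n lines and stops early on a short file.
        fout.writelines(islice(fin, n))
```

Only the lines being copied pass through memory, so this should stay cheap even when the source file is very large.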

Answer 1

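N = 10
with open("file.txt", "r") as file:  # "r" opens the file for reading; a file opened in "a" (append) mode cannot be read
    for i in range(N):
        line = next(file).strip()  # raises StopIteration if the file has fewer than N lines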

Answer 2



If you want to read the first lines with minimal code and performance is not a concern, you can use .readlines(), which returns a list of all lines, and then slice that list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put the slice you want here

print(firstNlines)

Note: the whole file is read into memory, so this is not the best choice from a performance point of view. It is, however, easy to use, quick to write and easy to remember, which makes it very convenient for a one-off calculation.

One advantage compared to the other answers is that you can easily select any range of lines, e.g. lines 11 to 30 with [10:30], the last 10 lines with [-10:], everything except the last 10 with [:-10], or every second line with [::2].
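
If the file is too large to load completely, most of the same range selections can be approximated lazily. The following is a sketch only: line_range is a hypothetical helper name, and it relies on itertools.islice, which accepts start, stop and step but not negative indices, so slices such as [-10:] still need a different approach.

```python
from itertools import islice


def line_range(path, start, stop, step=1):
    """Return the equivalent of readlines()[start:stop:step] lazily."""
    with open(path) as f:
        # Lines before `start` are read and discarded one at a time;
        # reading stops at `stop`, so the rest of the file is never loaded.
        return list(islice(f, start, stop, step))
```

For example, line_range(path, 10, 30) corresponds to the [10:30] slice above, and line_range(path, 0, None, 2) to [::2].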

Answer 3


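I read the first N rows with pandas. The performance is probably not the best, but it is convenient. Note that read_csv parses the file as CSV and, by default, treats the first line as a header, so nrows counts data rows. For example, with N=1000: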

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)

Answer 4



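The file object does not expose a dedicated method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    # readline() returns '' at end of file, so a short file pads the list with empty strings
    lines.extend(f.readline() for i in range(N))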
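
Because readline() returns an empty string at end of file instead of raising an exception, a file with fewer than N lines would leave empty strings in the result. One possible way to guard against that is sketched below; the function name read_first_lines is an assumption made for illustration.

```python
def read_first_lines(path, n):
    """Return up to n lines, stopping cleanly at end of file."""
    lines = []
    with open(path) as f:
        for _ in range(n):
            line = f.readline()
            if not line:  # '' means end of file; a blank line is '\n'
                break
            lines.append(line)
    return lines
```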

Answer 5


Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object. It subclasses the built-in file type, so it works in Python 2 only.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]


f = File('path/to/file', 'r')
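
In Python 3 the built-in file type no longer exists, so the class above cannot be used as is. A rough equivalent written as plain functions might look like the sketch below. This swaps in a different technique for tail: instead of seeking backwards from the end, it streams the file once through a bounded collections.deque, which is simpler but has to read the whole file.

```python
from collections import deque
from itertools import islice


def head(path, n=1):
    """Return up to the first n lines of the file."""
    with open(path) as f:
        return list(islice(f, n))


def tail(path, n=1):
    """Return up to the last n lines of the file."""
    with open(path) as f:
        # A deque with maxlen=n discards older lines as new ones arrive,
        # so at most n lines are held in memory at any time.
        return list(deque(f, maxlen=n))
```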

Answer 6


The two most intuitive ways of doing this would be:

  1. Iterate on the file line-by-line, and break after N lines.

  2. Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print(line)
        counter += 1
        if counter == N:
            break

# Method 2:
with open("fileName", "r") as f:
    for i in range(N):
        line = next(f)  # raises StopIteration if the file has fewer than N lines
        print(line)

The bottom line is: as long as you don't use readlines() or otherwise read the whole file into memory, you have plenty of options.

Answer 7


The most convenient way, in my opinion:

print([s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT])

This solution is based on a list comprehension. The file object returned by open() supports the iteration interface. enumerate() wraps it and yields (index, item) tuples; we keep the items whose index is within the accepted range (if i < LINE_COUNT) and print the result. Note that the comprehension still iterates over the entire file, because the condition only filters lines and does not stop the loop.

Enjoy the Python. ;)
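
A comprehension can also be made to stop after the requested number of lines. One way, sketched here under the assumption of Python 3 (where zip is lazy), is to pair the file with range(n); first_lines is a hypothetical name used only for this example.

```python
def first_lines(path, n):
    """Return up to n lines using a comprehension that stops early."""
    with open(path) as f:
        # zip() ends as soon as range(n) is exhausted, so reading stops
        # after n lines instead of scanning the rest of the file.
        return [line for _, line in zip(range(n), f)]
```

Putting range(n) first matters: zip asks its arguments for items in order, so no extra line is consumed from the file once the count runs out.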

Answer 8


For the first N lines (e.g. N = 5), simply do:

with open("data_file", "r") as file:
    for i in range(N):
        print(next(file))

Answer 9

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except and works on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)  # collect the line before checking the limit
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print(rval)
    print(len(rval))

Answer 10


If you have a really big file, and assuming you want the output to be a numpy array, np.genfromtxt can be slow enough to freeze your computer. This is so much better in my experience:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        for j, line in enumerate(f):
            if j == maxrows:  # stop once maxrows lines have been read
                break
            values = [float(s) for s in line.split()]
            rows.append(np.array(values, dtype=np.double))
    return np.vstack(rows)  # convert list of vectors to array

Answer 11

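Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class. readlines() accepts an optional size hint, so the top-rated answer above can be approximated as:

    with open("datafile") as myfile:
        head = myfile.readlines(N)
    print(head)

Be aware that N here is a size hint in bytes/characters rather than a line count: reading stops once the total length of the lines read so far exceeds the hint, so this does not return exactly the first N lines. (You don't have to worry about the file being shorter than expected, since no StopIteration exception is raised.)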
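
The difference between a size hint and a line count is easy to check on a small in-memory example. The sketch below uses io.StringIO as a stand-in for a real text file (an assumption made to keep it self-contained) and compares readlines(7) with islice(..., 7) on ten short lines.

```python
import io
from itertools import islice

text = "aaaa\n" * 10           # ten lines of five characters each
buf = io.StringIO(text)        # in-memory stand-in for a text file

# The argument is a size hint, not a line count: reading stops once the
# accumulated length of the lines read passes the hint.
by_hint = buf.readlines(7)

buf.seek(0)                    # rewind before the second read
by_count = list(islice(buf, 7))

print(len(by_hint), len(by_count))
```

Here by_count always holds seven lines, while by_hint holds only as many lines as it takes to pass seven characters.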

Answer 12


This worked for me:

f = open("history_export.csv", "r")
line = 5
for x in range(line):
    a = f.readline()  # a holds the current line; '' once the end of the file is reached
f.close()

Answer 13

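This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    # islice(inf, N, N+M) skips the first N lines and yields the next M;
    # use islice(inf, N) to get the first N lines instead
    for line in islice(inf, N, N+M):
        print(line)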

Answer 14

fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f:  # count the lines in the file
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter number of lines: "))

if num_lines_input > num_lines:
    print("Don't have", num_lines_input, "lines, printing as many as possible")

# print the requested lines, or the whole file if it is shorter
with open(fname, "r") as f:
    for x in range(min(num_lines_input, num_lines)):
        print(f.readline(), end="")

print("Total lines in the text", num_lines)

Answer 15


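import subprocess

# "head -n 3" prints the first 3 lines ("tail -n 3" would print the last 3)
p = subprocess.Popen(["head", "-n", "3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print(output)

This method worked for me. It relies on the external head command, so it is only available on Unix-like systems.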
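
For code that also has to run where no head command is installed, the external call can be wrapped with a pure-Python fallback. This is only a sketch, assuming Python 3.5 or later for subprocess.run; head_via_command is a hypothetical name and not part of the answer above.

```python
import shutil
import subprocess
from itertools import islice


def head_via_command(path, n):
    """Return the first n lines of path as a single string."""
    if shutil.which("head"):
        # Each argument is its own list item, so no shell is involved.
        result = subprocess.run(
            ["head", "-n", str(n), path],
            stdout=subprocess.PIPE,
            universal_newlines=True,  # decode the output to str
            check=True,               # raise if head exits with an error
        )
        return result.stdout
    # Fallback for systems without a head command.
    with open(path) as f:
        return "".join(islice(f, n))
```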