Reading only specific lines

Question: Reading only specific lines

I’m using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?

Thanks


Answer 0

If the file to read is big, and you don’t want to read the whole file into memory at once:

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass  # 26th line: process it here
    elif i == 29:
        pass  # 30th line: process it here
    elif i > 29:
        break
fp.close()

Note that i == n-1 for the nth line.


In Python 2.6 or later:

with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            pass  # 26th line: process it here
        elif i == 29:
            pass  # 30th line: process it here
        elif i > 29:
            break
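
The loop above can be wrapped in a small reusable helper. This is only a minimal sketch and not part of the original answer: the function name read_lines_by_number, the 1-based numbering and the temporary demo file are all assumptions made for illustration.

```python
import os
import tempfile


def read_lines_by_number(path, wanted):
    """Return {line_number: text} for the 1-based line numbers in `wanted`."""
    wanted = set(wanted)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    with open(path) as fp:
        # start=1 makes the counter match human-style line numbers
        for number, line in enumerate(fp, start=1):
            if number in wanted:
                found[number] = line.rstrip("\n")
            if number >= last:
                break  # stop reading once the last wanted line has been seen
    return found


# Demo on a throwaway 40-line file (hypothetical data)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line %d\n" % n for n in range(1, 41)))
result = read_lines_by_number(tmp.name, [26, 30])
os.remove(tmp.name)
print(result)
```

Because the loop breaks at the largest requested line number, the rest of a large file is never read.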

Answer 1

The quick answer:

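f = open('filename')
lines = f.readlines()
f.close()
print(lines[25])  # 26th line
print(lines[29])  # 30th line

or:

lines = [25, 29]  # zero-based indices of the wanted lines
i = 0
f = open('filename')
for line in f:
    if i in lines:
        print(line)
    i += 1
f.close()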

There is a more elegant solution for extracting many lines: linecache (courtesy of “python: how to jump to a particular line in a huge text file?”, a previous stackoverflow.com question).

Quoting the python documentation linked above:

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

Change the 4 to your desired line number, and you’re on. Note that 4 returns the fourth line, because linecache counts lines from 1.
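
A quick way to check how linecache numbers lines is to run it on a small file with known contents. The sketch below is an illustration only; the temporary file and the variable names are assumptions, not part of the original answer. Keep in mind that linecache caches the lines of the whole file in memory.

```python
import linecache
import os
import tempfile

# Hypothetical four-line sample file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\nfourth\n")

fourth = linecache.getline(tmp.name, 4)    # line numbers start at 1
missing = linecache.getline(tmp.name, 99)  # out of range: empty string, no exception
linecache.clearcache()                     # drop the cached file contents
os.remove(tmp.name)

print(repr(fourth))
print(repr(missing))
```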

If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok’s advice and use enumerate().

To conclude:

  • Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
  • Use linecache for a more elegant solution, which will be quite fast for reading many files, possibly repeatedly.
  • Take @Alok’s advice and use enumerate() for files which could be very large and won’t fit into memory. Note that this method might be slow because the file is read sequentially.

Answer 2

A fast and compact approach could be:

def picklines(thefile, whatlines):
  return [x for i, x in enumerate(thefile) if i in whatlines]

This accepts any open file-like object thefile (leaving it up to the caller whether it should be opened from a disk file, or via e.g. a socket, or another file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:

def yieldlines(thefile, whatlines):
  return (x for i, x in enumerate(thefile) if i in whatlines)

This is basically only good for looping over. Note that the only difference is the use of round rather than square brackets in the return statement, which makes it a generator expression instead of a list comprehension.

Further note that despite the mention of “lines” and “file” these functions are much, much more general — they’ll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I’d suggest using more appropriately general names;-).
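
To illustrate that generality, here is a minimal sketch with a more general name. The name pick_items and the conversion of the indices to a set are assumptions added for illustration; they are not part of the original answer.

```python
def pick_items(iterable, wanted_indices):
    """Lazily yield the items whose zero-based position is in wanted_indices."""
    wanted = set(wanted_indices)  # set membership tests are fast
    return (item for i, item in enumerate(iterable) if i in wanted)


# Works on a string, a generator, or an open file alike
letters = list(pick_items("abcdefgh", [1, 3, 5]))
squares = list(pick_items((n * n for n in range(10)), {0, 9}))
print(letters)
print(squares)
```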


Answer 3

For the sake of offering another solution:

import linecache
linecache.getline('Sample.txt', Number_of_Line)

I hope this is quick and easy :)


Answer 4

If you want line 7 (list indices are zero-based, so that is index 6):

line = open("file.txt", "r").readlines()[6]

Answer 5

For the sake of completeness, here is one more option.

Let’s start with a definition from the Python docs:

slice: An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or in older versions, __getslice__() and __setslice__()).

Though the slice notation is not directly applicable to iterators in general, the itertools package contains a replacement function:

from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print(line)

# print every third line among the first 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print(line)

An additional advantage of islice is that it does not read the iterator all the way to the end, so you can do more complex things:

with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print(line)

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print(line)

And to answer the original question:

# how to read lines #26 and #30
In [365]: list(islice(range(1, 100), 25, 30, 4))
Out[365]: [26, 30]
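
The same start/stop/step arguments can be applied to an actual stream of lines. The following is a minimal sketch in which io.StringIO stands in for an open text file; the sample contents and the variable names are assumptions made for illustration.

```python
import io
from itertools import islice

# io.StringIO stands in for an open text file with 40 lines (hypothetical data)
fake_file = io.StringIO("".join("row %d\n" % n for n in range(1, 41)))

# start=25 is the zero-based index of line 26; stop=30 is exclusive;
# step=4 jumps from index 25 straight to index 29, which is line 30
picked = [line.rstrip("\n") for line in islice(fake_file, 25, 30, 4)]
print(picked)
```

The step trick only works because the two wanted lines are a fixed distance apart; for arbitrary line numbers, a membership test inside an enumerate() loop is more general.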

Answer 6

Reading files is incredibly fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the individual lines.

What most answers here do is not wrong, but it is bad style. Files should always be opened using with, as this makes sure that the file is closed again.

So you should do it like this:

with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[25])  # line 26 (indices are zero-based); or whatever you want to do with this line
print(lines[29])  # line 30; or whatever you want to do with this line

Huge files

If you happen to have a huge file and memory consumption is a concern, you can process it line by line:

with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i

Answer 7

Some of these are lovely, but it can be done much more simply:

start = 0 # some starting index
end = 5000 # some ending index
filename = 'test.txt' # some file we want to use

with open(filename) as fh:
    data = fh.readlines()[start:end]

print(data)

This simply uses list slicing. It loads the whole file, but most systems will minimise memory usage appropriately. It’s faster than most of the methods given above, and it works on my 10G+ data files. Good luck!


Answer 8

You can do a seek() call which positions your read head to a specified byte within the file. This won’t help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?) or, you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.

Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.


Answer 9

If your large text file file is strictly well-structured (meaning every line has the same length l), you could use the following to get the n-th line:

with open(file) as f:
    # n is the zero-based line index; l is the line length, line ending included
    f.seek(n*l)
    line = f.readline()
    last_pos = f.tell()

Disclaimer: this only works for files whose lines all have the same length!
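
A minimal runnable sketch of the fixed-width idea follows. It assumes every line occupies exactly 8 bytes including the newline, and it opens the file in binary mode so that offsets are plain byte counts. The record format, the constant RECORD_LEN and the helper name read_fixed_width_line are all hypothetical.

```python
import os
import tempfile

RECORD_LEN = 8  # assumption: every line is exactly 8 bytes, newline included

# Build a hypothetical file of 100 fixed-width records: rec0000 ... rec0099
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for n in range(100):
        tmp.write(("rec%04d\n" % n).encode("ascii"))


def read_fixed_width_line(path, index, record_len=RECORD_LEN):
    """Return the line at zero-based `index` by seeking straight to it."""
    with open(path, "rb") as f:  # binary mode: seek offsets are byte counts
        f.seek(index * record_len)
        return f.readline()


line_26 = read_fixed_width_line(tmp.name, 25)  # index 25 is the 26th line
os.remove(tmp.name)
print(line_26)
```

No earlier line is read at all, which is where the speed gain over the sequential approaches comes from.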


Answer 10

How about this:

>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines, 1):  # count from 1 so i is the line number
      if i > 30: break
      if i == 26: dox()  # dox and doy are placeholders for your own handlers
      if i == 30: doy()

Answer 11

If you don’t mind importing, then fileinput does exactly what you need (that is, it lets you read the line number of the current line).
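
As a minimal sketch of that idea, the loop below uses the lineno() method of the object returned by fileinput.input(), which reports the 1-based number of the line just read. The temporary file and the variable names are assumptions made for illustration.

```python
import fileinput
import os
import tempfile

# Hypothetical 40-line sample file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("entry %d\n" % n for n in range(1, 41)))

wanted = {26, 30}
picked = {}
with fileinput.input(files=(tmp.name,)) as stream:
    for line in stream:
        number = stream.lineno()  # 1-based number of the line just read
        if number in wanted:
            picked[number] = line.rstrip("\n")
        if number >= max(wanted):
            break  # nothing further is needed
os.remove(tmp.name)
print(picked)
```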


Answer 12

def getitems(iterable, items):
  items = list(items) # get a list from any iterable and make our own copy
                      # since we modify it
  if items:
    items.sort()
    for n, v in enumerate(iterable):
      if n == items[0]:
        yield v
        items.pop(0)
        if not items:
          break

print(list(getitems(open("/usr/share/dict/words"), [25, 29])))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item

Answer 13

I prefer this approach because it’s more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:

def read_specific_lines(file, lines_to_read):
   """file is any iterable; lines_to_read is an iterable containing int values"""
   lines = set(lines_to_read)
   last = max(lines)
   for n, line in enumerate(file):
      if n + 1 in lines:
          yield line
      if n + 1 > last:
          return

>>> with open(r'c:\temp\words.txt') as f:
        [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']

Answer 14

Here’s my little 2 cents, for what it’s worth ;)

def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp   = open(filename, "r")
    src  = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
    print("Line: %s\nData: %s\n" % (line[0], line[1]))

Answer 15

A small improvement on Alok Singhal’s answer: start enumerate() at 1 so that i matches the line number.

fp = open("file")
for i, line in enumerate(fp, 1):
    if i == 26:
        pass  # 26th line: process it here
    elif i == 30:
        pass  # 30th line: process it here
    elif i > 30:
        break
fp.close()

Answer 16

File objects have a .readlines() method which will give you a list of the contents of the file, one line per list item. After that, you can just use normal list slicing techniques.

http://docs.python.org/library/stdtypes.html#file.readlines


Answer 17

@OP, you can use enumerate

for n,line in enumerate(open("file")):
    if n+1 in [26,30]: # or n in [25,29]
        print(line.rstrip())

Answer 18

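file = '/path/to/file_to_be_read.txt'
with open(file) as f:
    lines = f.readlines()  # read once; a second readlines() call would return an empty list
print(lines[25])  # line 26 (zero-based index 25)
print(lines[29])  # line 30

Using the with statement, this opens the file, reads its lines and closes the file, then prints lines 26 and 30. Simple!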


Answer 19

Someone already mentioned this syntax, but it is by far the easiest way to do it:

inputFile = open("lineNumbers.txt", "r")
lines = inputFile.readlines()
print (lines[0])
print (lines[2])

Answer 20

To print line #3:

line_number = 3

with open(filename, "r") as file:
    current_line = 1
    for line in file:
        if current_line == line_number:
            print(line)  # print the current line, not the one after it
            break
        current_line += 1

Original author: Frank Hofmann


Answer 21

Fairly quick and to the point.

This prints certain lines of a text file. Create a lines2print list, then print a line only when its enumeration index is in that list. Note that enumerate() counts from zero, so index 26 is the 27th line. To get rid of the trailing '\n', use line.strip() or line.strip('\n'). I like list comprehensions and try to use them when I can. I also like the with statement for reading text files, since it prevents the file from being left open for any reason.

lines2print = [26,30] # can be a big list and order doesn't matter.

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in lines2print]

Or, if the list is small, just write it directly into the comprehension:

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in [26,30]]

Answer 22

This prints the desired line, or optionally a line above or below it.

def dline(file, no, add_sub=0):
    tf = open(file)
    for sno, line in enumerate(tf):
        if sno == no - 1 + add_sub:
            print(line)
    tf.close()

Usage: dline("D:\\dummy.txt", 6), i.e. dline(file_path, line_number, add_sub). The optional add_sub argument defaults to 0; going by the code, 1 prints the line after the requested one and -1 prints the line before it.


Answer 23

If you want to read specific lines, such as the lines after some threshold line, you can use the following code:

file = open("files.txt","r")
lines = file.readlines()  ## convert to list of lines
datas = lines[11:]  ## read the specific lines


Answer 24

f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()
f = open(filename, 'r')

lineno = 1
while lineno <= totalLines:  # <= so that the last line is processed too
    line = f.readline()

    if lineno == 26:
        doLine26Command(line)

    elif lineno == 30:
        doLine30Command(line)

    lineno += 1
f.close()

Answer 25

I think this would work:

open_file1 = open("E:\\test.txt", 'r')
read_it1 = open_file1.read()
open_file1.close()
myline1 = []
for line1 in read_it1.splitlines():
    myline1.append(line1)
print(myline1[0])