How do I read a file line by line in Python?


In pre-historic times (Python 1.4) we did:

fp = open('filename.txt')
while 1:
    line = fp.readline()
    if not line:
        break
    print line

after Python 2.1, we did:

for line in open('filename.txt').xreadlines():
    print line

before we got the convenient iterator protocol in Python 2.3, and could do:

for line in open('filename.txt'):
    print line

I’ve seen some examples using the more verbose:

with open('filename.txt') as fp:
    for line in fp:
        print line

is this the preferred method going forwards?

[edit] I get that the with statement ensures closing of the file… but why isn’t that included in the iterator protocol for file objects?


Answer 0

There is exactly one reason why the following is preferred:

with open('filename.txt') as fp:
    for line in fp:
        print line

We are all spoiled by CPython’s relatively deterministic reference-counting scheme for garbage collection. Other, hypothetical implementations of Python will not necessarily close the file “quickly enough” without the with block if they use some other scheme to reclaim memory.

In such an implementation, you might get a “too many files open” error from the OS if your code opens files faster than the garbage collector calls finalizers on orphaned file handles. The usual workaround is to trigger the GC immediately, but this is a nasty hack and it has to be done by every function that could encounter the error, including those in libraries. What a nightmare.

Or you could just use the with block.
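
As a rough sketch of the difference (assuming a hypothetical filename.txt on disk and a runtime that does not finalize objects promptly):

# Relying on garbage collection: on CPython the orphaned handle is usually
# closed right away by reference counting, but on another implementation it
# may linger until the collector runs, so a tight loop like this can hit
# the OS limit on open file descriptors.
for _ in range(100000):
    data = open('filename.txt').read()

# With the context manager, the handle is released on every iteration,
# regardless of which Python implementation runs the code.
for _ in range(100000):
    with open('filename.txt') as fp:
        data = fp.read()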

Bonus Question

(Stop reading now if you are only interested in the objective aspects of the question.)

Why isn’t that included in the iterator protocol for file objects?

This is a subjective question about API design, so I have a subjective answer in two parts.

On a gut level, this feels wrong, because it makes the iterator protocol do two separate things (iterate over lines and close the file handle), and it's often a bad idea to make a simple-looking function do two actions. In this case, it feels especially bad because iterators relate in a quasi-functional, value-based way to the contents of a file, while managing file handles is a completely separate task. Squashing both, invisibly, into one action is surprising to humans who read the code and makes it more difficult to reason about program behavior.

Other languages have essentially come to the same conclusion. Haskell briefly flirted with so-called "lazy IO", which lets you iterate over a file and have it closed automatically when you reach the end of the stream, but lazy IO is almost universally discouraged in Haskell these days, and Haskell users have mostly moved to more explicit resource management such as Conduit, which behaves more like the with block in Python.

On a technical level, there are some things you may want to do with a file handle in Python which would not work as well if iteration closed the file handle. For example, suppose I need to iterate over the file twice:

with open('filename.txt') as fp:
    for line in fp:
        ...
    fp.seek(0)
    for line in fp:
        ...

While this is a less common use case, consider the fact that I might have just added the three lines of code at the bottom to an existing code base which originally had the top three lines. If iteration closed the file, I wouldn’t be able to do that. So keeping iteration and resource management separate makes it easier to compose chunks of code into a larger, working Python program.

Composability is one of the most important usability features of a language or API.


Answer 1

Yes,

with open('filename.txt') as fp:
    for line in fp:
        print line

is the way to go.

It is not much more verbose, and it is safer.
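
Roughly speaking, the with block buys you the equivalent of this hand-written cleanup (a sketch, not the exact expansion the interpreter performs):

# Hand-written equivalent: close() runs even if the loop body raises.
fp = open('filename.txt')
try:
    for line in fp:
        print line
finally:
    fp.close()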


Answer 2

If you're turned off by the extra line, you can use a wrapper function like so:

def with_iter(iterable):
    with iterable as iter:
        for item in iter:
            yield item

for line in with_iter(open('...')):
    ...

In Python 3.3, the yield from syntax makes this even shorter:

def with_iter(iterable):
    with iterable as iter:
        yield from iter
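
Usage is the same as with the longer version above; a small sketch (process is a hypothetical per-line handler):

# Note: the wrapped file is closed when the generator is exhausted or
# explicitly closed, not at the end of a lexical block.
for line in with_iter(open('filename.txt')):
    process(line)  # hypothetical handler for each line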

Answer 3

# Python 2 only: xreadlines() was deprecated in 2.3 (iterate over the file
# object directly instead) and is gone in Python 3.
f = open('test.txt', 'r')
for line in f.xreadlines():
    print line
f.close()