Does reading an entire file leave the file handle open?

Question: Does reading an entire file leave the file handle open?

If you read an entire file with content = open('Path/to/file', 'r').read(), is the file handle left open until the script exits? Is there a more concise way to read a whole file?


Answer 0

The answer to that question depends somewhat on the particular Python implementation.

To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.

This means that the file object is garbage. The only remaining question is “When will the garbage collector collect the file object?”.

In CPython, which uses reference counting, this kind of garbage is noticed immediately, so the file object is collected (and the file closed) right away. This is not generally true of other Python implementations.
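To make the distinction concrete: read() itself never closes the handle. Only close() does, whether it is called explicitly, by a context manager, or by a finalizer. A minimal sketch, assuming a throwaway temporary file (the name demo.txt is made up for illustration) and keeping a reference to the file object so it cannot become garbage:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w") as out:
    out.write("hello\n")

# Hold a reference so the file object stays reachable.
handle = open(path, "r")
content = handle.read()
closed_after_read = handle.closed    # read() does not close the handle

handle.close()
closed_after_close = handle.closed   # only close() does

print(closed_after_read, closed_after_close)
```

In the one-liner from the question there is no such reference, so whether and when the handle gets closed is left entirely to the implementation's garbage collection.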

A better solution, to make sure that the file is closed, is this pattern:

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

This always closes the file as soon as the block ends, even if an exception occurs.
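The "even if an exception occurs" part can be checked directly. This is only a sketch: the temporary file and the simulated ValueError are assumptions added for illustration.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w") as out:
    out.write("some text\n")

captured = None
try:
    with open(path, "r") as content_file:
        captured = content_file           # keep a reference to inspect afterwards
        content = content_file.read()
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The with statement closed the file while the exception was propagating.
closed_after_error = captured.closed
print(closed_after_error)
```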

Edit: To put a finer point on it:

Other than file.__exit__(), which is called "automatically" in a with context manager setting, the only other way file.close() gets called automatically (that is, without you calling it explicitly) is via file.__del__(). This leads to the question: when does __del__() get called?

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203

In particular:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

[…]

CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types

(Emphasis mine)

But as it suggests, other implementations may behave differently. For example, PyPy has 6 different garbage collection implementations!
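One way to observe the finalizer path, at least on recent CPython 3 versions, is that a file object finalized without having been closed emits a ResourceWarning. The sketch below records such warnings. The temporary file is an assumption for illustration, and on other implementations the warning may show up later or not at all, so the count is not something to rely on.

```python
import gc
import os
import tempfile
import warnings

# Create a throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w") as out:
    out.write("data\n")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")       # ResourceWarning is ignored by default
    content = open(path, "r").read()      # the handle is never closed explicitly
    gc.collect()                          # nudge collectors that are not refcount-based

# Warnings raised when the unclosed file object was finalized, if any.
leaks = [w for w in caught if issubclass(w.category, ResourceWarning)]
print(len(leaks), "ResourceWarning(s) recorded")
```

Running a script with python -W error::ResourceWarning (or -X dev) is another way to surface handles that were left to the finalizer.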


Answer 1

You can use pathlib.

For Python 3.5 and above:

from pathlib import Path
contents = Path(file_path).read_text()

For older versions of Python use pathlib2:

$ pip install pathlib2

Then:

from pathlib2 import Path
contents = Path(file_path).read_text()

This is the actual read_text implementation:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()
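Since read_text forwards encoding to open, it is usually worth passing it explicitly instead of relying on the platform default. A small usage sketch, assuming a temporary file created just for the demonstration (file_path and its contents are made up):

```python
import tempfile
from pathlib import Path

# Create a throwaway file so the example is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
file_path = tmp_dir / "demo.txt"          # hypothetical file name
file_path.write_text("naïve café", encoding="utf-8")

# Text mode: the file is opened, decoded, read, and closed in one call.
contents = file_path.read_text(encoding="utf-8")

# Binary counterpart: returns the undecoded bytes.
raw = file_path.read_bytes()

print(contents, len(raw))
```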

Answer 2

If you need to read the file line by line to work with each line, you can use:

with open('Path/to/file', 'r') as f:
    s = f.readline()
    while s:
        # do whatever you want to
        s = f.readline()

Or, even better:

with open('Path/to/file') as f:
    for line in f:
        # do whatever you want to
        pass
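As a runnable illustration of the loop above, here is a sketch that collects the non-empty lines together with their line numbers. The temporary file and its contents are assumptions for the example.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w") as out:
    out.write("alpha\nbeta\n\ngamma\n")

non_empty = []
with open(path) as f:
    # Iterating over the file yields one line at a time, newline included,
    # so the whole file never has to be held in memory at once.
    for line_number, line in enumerate(f, start=1):
        text = line.rstrip("\n")
        if text:
            non_empty.append((line_number, text))

print(non_empty)
```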

Answer 3

Instead of retrieving the file content as a single string, it can be handy to store the content as a list of all lines the file comprises:

with open('Path/to/file', 'r') as content_file:
    content_list = content_file.read().strip().split("\n")

As can be seen, this just chains the methods .strip().split("\n") onto the read() call from the main answer in this thread.

Here, .strip() removes leading and trailing whitespace (including newlines) from the entire file string, and .split("\n") produces the actual list by splitting that string at every newline character \n.

Moreover, this way the entire file content is stored in a single variable, which might be desired in some cases, instead of looping over the file line by line as shown in a previous answer.
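One caveat: because .strip() works on the whole string, it also drops indentation on the first line and any blank lines at the end. If that matters, str.splitlines() is an alternative worth considering. The sketch below compares the two on a made-up temporary file; it is an illustration, not a claim about which one the original answer intended.

```python
import os
import tempfile

# Create a throwaway file: indented first line, one trailing blank line.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
with open(path, "w") as out:
    out.write("  first\nsecond\n\n")

with open(path, "r") as content_file:
    text = content_file.read()

# Approach from the answer: strip the whole string, then split on newlines.
stripped_list = text.strip().split("\n")

# Alternative: splitlines() keeps per-line whitespace and blank lines,
# but does not add an extra empty item for the final newline.
lines_list = text.splitlines()

print(stripped_list)
print(lines_list)
```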