Tag Archives: garbage-collection

Python garbage collector documentation

Question: Python garbage collector documentation

I’m looking for documentation that describes in detail how Python garbage collection works.

I’m interested in what is done in each step. What objects are in these 3 generations? What kinds of objects are deleted in each step? What algorithm is used to find reference cycles?

Background: I’m implementing some searches that have to finish in a small amount of time. When the garbage collector starts collecting the oldest generation, it is “much” slower than in other cases. It takes more time than is allotted for a search. I’m looking for a way to predict when it will collect the oldest generation and how long that will take.

It is easy to predict when it will collect the oldest generation with get_count() and get_threshold(). That can also be manipulated with set_threshold(). But I don’t see an easy way to decide whether it is better to force a collect() or to wait for the scheduled collection.
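
A minimal sketch of the kind of instrumentation the question describes, using only the documented gc functions mentioned above (get_count(), get_threshold(), set_threshold(), collect()). The comments about when the oldest generation is collected reflect CPython's documented behaviour in broad strokes and are not guaranteed by the language:

import gc
import time

# Allocation/collection counters per generation, and the collection thresholds.
counts = gc.get_count()                                   # e.g. (488, 3, 2)
threshold0, threshold1, threshold2 = gc.get_threshold()   # defaults: (700, 10, 10)

# In CPython, counts[1] is roughly the number of generation-0 collections since
# the last generation-1 collection, and counts[2] the number of generation-1
# collections since the last full collection; a full (generation-2) collection
# is scheduled roughly when counts[2] exceeds threshold2 (newer versions add an
# extra heuristic), so these numbers hint at how close a full collection is.
print("counts:", counts, "thresholds:", (threshold0, threshold1, threshold2))

# Option discussed in the question: force a full collection at a convenient
# moment (e.g. before a time-critical search) and measure how long it takes.
start = time.perf_counter()
unreachable = gc.collect()       # collects the oldest generation (and younger ones)
elapsed = time.perf_counter() - start
print(f"full collection found {unreachable} unreachable objects in {elapsed:.4f}s")

# Raising the thresholds postpones automatic collections; this trades memory
# for more predictable latency and is the knob set_threshold() gives you.
gc.set_threshold(threshold0, threshold1 * 10, threshold2 * 10)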


Answer 0

There’s no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments are actually very helpful. Skip down to the collect() function and the comments explain the process well (albeit in very technical terms).
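
As a practical complement to reading the C source, the gc module can also report at runtime what each collection pass does. This is a small sketch using only documented functions (gc.set_debug, gc.get_stats, gc.callbacks); the exact statistics and debug output are CPython-specific:

import gc

# Print a summary line to stderr for every collection pass (CPython-specific output).
gc.set_debug(gc.DEBUG_STATS)

# Per-generation statistics: number of collections, objects collected,
# and uncollectable objects found so far.
for generation, stats in enumerate(gc.get_stats()):
    print(generation, stats)

# Callbacks fire at the start and stop of every collection and receive the
# generation being collected -- handy for logging when a full (gen-2) pass runs.
def on_gc(phase, info):
    if info["generation"] == 2:
        print(f"gen-2 collection {phase}: {info}")

gc.callbacks.append(on_gc)
gc.collect()  # trigger a full collection so the callback and stats fire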


Is explicitly closing files important?

Question: Is explicitly closing files important?

In Python, if you either open a file without calling close(), or close the file but without using try/finally or the “with” statement, is this a problem? Or does it suffice as a coding practice to rely on Python’s garbage collection to close all files? For example, if one does this:

for line in open("filename"):
    # ... do stuff ...

… is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?
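
For reference, this is the try/finally pattern the question mentions; it guarantees close() runs even if an exception is raised inside the loop (the filename and loop body are just the question's own placeholders):

f = open("filename")
try:
    for line in f:
        pass  # ... do stuff ...
finally:
    f.close()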


Answer 0

In your example the file isn’t guaranteed to be closed before the interpreter exits. In current versions of CPython, the file will be closed at the end of the for loop because CPython uses reference counting as its primary garbage-collection mechanism, but that’s an implementation detail, not a feature of the language. Other implementations of Python aren’t guaranteed to work this way. For example, IronPython, PyPy, and Jython don’t use reference counting and therefore won’t close the file at the end of the loop.

It’s bad practice to rely on CPython’s garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn’t use reference counting you’ll need to go through all your code and make sure all your files are closed properly.

For your example use:

with open("filename") as f:
    for line in f:
        # ... do stuff ...

Answer 1

Some Pythons will close files automatically when they are no longer referenced, while others will not and it’s up to the O/S to close files when the Python interpreter exits.

Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.

So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, in CPython 3 you will now get warnings that the system had to close files for you if you didn’t do it.

Moral: Clean up after yourself. :)
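
The warning mentioned above is a ResourceWarning, which CPython ignores by default outside of development mode. A quick sketch of how to make it visible (demo.txt is just a placeholder name):

import gc
import warnings

# ResourceWarning is ignored by default; show every occurrence.
warnings.simplefilter("always", ResourceWarning)

f = open("demo.txt", "w")    # opened but never explicitly closed
f.write("some data")
del f                        # drop the last reference ...
gc.collect()                 # ... and let the collector finalize the file

# CPython then emits something like:
#   ResourceWarning: unclosed file <_io.TextIOWrapper name='demo.txt' ...>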


Answer 2

Although it is quite safe to use such a construct in this particular case, there are some caveats for generalising such practice:

  • you can potentially run out of file descriptors; although that is unlikely, imagine hunting down a bug like that (see the sketch after this list)
  • you may not be able to delete said file on some systems, e.g. win32
  • if you run anything other than CPython, you don’t know when the file is closed for you
  • if you open the file in write or read-write mode, you don’t know when the data is flushed
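
A hedged illustration of the first caveat. The per-process limit on open file descriptors is OS-dependent, so the exact number at which this fails varies; /dev/null is used only because it always exists on POSIX systems:

# Keep opening files without closing them until the OS refuses.
handles = []
try:
    while True:
        handles.append(open("/dev/null"))
except OSError as exc:
    # Typically errno EMFILE: "Too many open files".
    print(f"failed after {len(handles)} open files: {exc}")
finally:
    for h in handles:
        h.close()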

Answer 3

The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice, because you might hit the open file handle limit if you do not close files as soon as you finish using them. What if, within that for loop of yours, you open more files and leave them lingering?


Answer 4

Hi. It is very important to close your file descriptor when you are going to use the file’s content later in the same Python script. I realized this only today, after a long stretch of hectic debugging. The reason is that the content is only actually saved to the file, and the changes applied, after you close the file descriptor (or explicitly flush it)!

So suppose you have a situation where you write content to a new file and then, without closing the file descriptor, use that file (not the descriptor) in another shell command which reads its content. In that situation you will not get the content you expect from the shell command, and if you try to debug it, the bug is hard to find. You can also read more in my blog entry: http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
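
A small sketch of the failure mode described above and how closing (or flushing) avoids it. subprocess and cat merely stand in for “another shell command that reads the file”, so this assumes a POSIX-like system, and out.txt is a placeholder name:

import subprocess

f = open("out.txt", "w")
f.write("hello from python")

# At this point the data may still sit in Python's write buffer, so an
# external command can see an empty or partial file.
subprocess.run(["cat", "out.txt"])   # may print nothing

# Close (or at least flush) before handing the file to another process.
f.close()
subprocess.run(["cat", "out.txt"])   # prints: hello from python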


Answer 5

During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.

Python doesn’t flush the buffer (that is, actually write the data to the file) until it’s sure you’re done writing. One way to signal that is to close the file.

If you write to a file without closing it, the data may never make it to the target file.
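
A minimal sketch of the buffering behaviour this answer describes; buffered.txt is just a placeholder name, and the exact point at which data becomes visible depends on the buffer size and platform:

f = open("buffered.txt", "w")
f.write("important data")             # small write: stays in Python's buffer

with open("buffered.txt") as reader:  # a second reader sees nothing yet
    print(repr(reader.read()))        # likely ''

f.flush()                             # push the buffer out to the OS

with open("buffered.txt") as reader:
    print(repr(reader.read()))        # now 'important data'

f.close()                             # close() also flushes and frees the descriptor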