Tag Archives: garbage-collection

Python garbage collector documentation

Question: Python garbage collector documentation

I’m looking for documentation that describes in detail how Python garbage collection works.

I’m interested in what is done in each step. What objects are in these 3 generations? What kinds of objects are deleted in each step? What algorithm is used to find reference cycles?

Background: I’m implementing some searches that have to finish in a small amount of time. When the garbage collector starts collecting the oldest generation, it is “much” slower than in other cases. It takes more time than is allotted for a search. I’m looking for a way to predict when it will collect the oldest generation and how long that will take.

It is easy to predict when it will collect the oldest generation with get_count() and get_threshold(). That can also be manipulated with set_threshold(). But I don’t see an easy way to decide whether it is better to force a collect() or to wait for the scheduled collection.
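
A minimal sketch of the kind of instrumentation the question describes, using only the documented gc functions mentioned above (get_count(), get_threshold(), set_threshold(), collect()). The comments about when the oldest generation is collected reflect CPython's documented behaviour in broad strokes and are not guaranteed by the language:

import gc
import time

# Allocation/collection counters per generation, and the collection thresholds.
counts = gc.get_count()                                   # e.g. (488, 3, 2)
threshold0, threshold1, threshold2 = gc.get_threshold()   # defaults: (700, 10, 10)

# In CPython, counts[1] is roughly the number of generation-0 collections since
# the last generation-1 collection, and counts[2] the number of generation-1
# collections since the last full collection; a full (generation-2) collection
# is scheduled roughly when counts[2] exceeds threshold2 (newer versions add an
# extra heuristic), so these numbers hint at how close a full collection is.
print("counts:", counts, "thresholds:", (threshold0, threshold1, threshold2))

# Option discussed in the question: force a full collection at a convenient
# moment (e.g. before a time-critical search) and measure how long it takes.
start = time.perf_counter()
unreachable = gc.collect()       # collects the oldest generation (and younger ones)
elapsed = time.perf_counter() - start
print(f"full collection found {unreachable} unreachable objects in {elapsed:.4f}s")

# Raising the thresholds postpones automatic collections; this trades memory
# for more predictable latency and is the knob set_threshold() gives you.
gc.set_threshold(threshold0, threshold1 * 10, threshold2 * 10)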


Answer 0

There’s no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments are actually very helpful. Skip down to the collect() function and the comments explain the process well (albeit in very technical terms).
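
As a practical complement to reading the C source, the gc module can also report at runtime what each collection pass does. This is a small sketch using only documented functions (gc.set_debug, gc.get_stats, gc.callbacks); the exact statistics and debug output are CPython-specific:

import gc

# Print a summary line to stderr for every collection pass (CPython-specific output).
gc.set_debug(gc.DEBUG_STATS)

# Per-generation statistics: number of collections, objects collected,
# and uncollectable objects found so far.
for generation, stats in enumerate(gc.get_stats()):
    print(generation, stats)

# Callbacks fire at the start and stop of every collection and receive the
# generation being collected -- handy for logging when a full (gen-2) pass runs.
def on_gc(phase, info):
    if info["generation"] == 2:
        print(f"gen-2 collection {phase}: {info}")

gc.callbacks.append(on_gc)
gc.collect()  # trigger a full collection so the callback and stats fire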


Is explicitly closing files important?

Question: Is explicitly closing files important?

In Python, if you either open a file without calling close(), or close the file but without using try/finally or the “with” statement, is this a problem? Or does it suffice as a coding practice to rely on Python’s garbage collection to close all files? For example, if one does this:

for line in open("filename"):
    # ... do stuff ...

… is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?
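
For reference, this is the try/finally pattern the question mentions; it guarantees close() runs even if an exception is raised inside the loop (the filename and loop body are just the question's own placeholders):

f = open("filename")
try:
    for line in f:
        pass  # ... do stuff ...
finally:
    f.close()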


Answer 0

In your example the file isn’t guaranteed to be closed before the interpreter exits. In current versions of CPython, the file will be closed at the end of the for loop because CPython uses reference counting as its primary garbage-collection mechanism, but that’s an implementation detail, not a feature of the language. Other implementations of Python aren’t guaranteed to work this way. For example, IronPython, PyPy, and Jython don’t use reference counting and therefore won’t close the file at the end of the loop.

It’s bad practice to rely on CPython’s garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn’t use reference counting you’ll need to go through all your code and make sure all your files are closed properly.

For your example use:

with open("filename") as f:
    for line in f:
        # ... do stuff ...

Answer 1

Some Pythons will close files automatically when they are no longer referenced, while others will not and it’s up to the O/S to close files when the Python interpreter exits.

Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.

So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, in CPython 3 you will now get warnings that the system had to close files for you if you didn’t do it.

Moral: Clean up after yourself. :)
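
The warning mentioned above is a ResourceWarning, which CPython ignores by default outside of development mode. A quick sketch of how to make it visible (demo.txt is just a placeholder name):

import gc
import warnings

# ResourceWarning is ignored by default; show every occurrence.
warnings.simplefilter("always", ResourceWarning)

f = open("demo.txt", "w")    # opened but never explicitly closed
f.write("some data")
del f                        # drop the last reference ...
gc.collect()                 # ... and let the collector finalize the file

# CPython then emits something like:
#   ResourceWarning: unclosed file <_io.TextIOWrapper name='demo.txt' ...>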


Answer 2

Although it is quite safe to use such a construct in this particular case, there are some caveats for generalising such practice:

  • you can potentially run out of file descriptors; although that is unlikely, imagine hunting down a bug like that (see the sketch after this list)
  • you may not be able to delete said file on some systems, e.g. win32
  • if you run anything other than CPython, you don’t know when the file is closed for you
  • if you open the file in write or read-write mode, you don’t know when the data is flushed
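
A hedged illustration of the first caveat. The per-process limit on open file descriptors is OS-dependent, so the exact number at which this fails varies; /dev/null is used only because it always exists on POSIX systems:

# Keep opening files without closing them until the OS refuses.
handles = []
try:
    while True:
        handles.append(open("/dev/null"))
except OSError as exc:
    # Typically errno EMFILE: "Too many open files".
    print(f"failed after {len(handles)} open files: {exc}")
finally:
    for h in handles:
        h.close()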

Answer 3

The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice, because you might hit the open file handle limit if you do not close files as soon as you finish using them. What if, within that for loop of yours, you open more files and leave them lingering?


Answer 4

Hi. It is very important to close your file descriptor when you are going to use the file’s content later in the same Python script. I realized this only today, after a long stretch of hectic debugging. The reason is that the content is only actually saved to the file, and the changes applied, after you close the file descriptor (or explicitly flush it)!

So suppose you have a situation where you write content to a new file and then, without closing the file descriptor, use that file (not the descriptor) in another shell command which reads its content. In that situation you will not get the content you expect from the shell command, and if you try to debug it, the bug is hard to find. You can also read more in my blog entry: http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
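
A small sketch of the failure mode described above and how closing (or flushing) avoids it. subprocess and cat merely stand in for “another shell command that reads the file”, so this assumes a POSIX-like system, and out.txt is a placeholder name:

import subprocess

f = open("out.txt", "w")
f.write("hello from python")

# At this point the data may still sit in Python's write buffer, so an
# external command can see an empty or partial file.
subprocess.run(["cat", "out.txt"])   # may print nothing

# Close (or at least flush) before handing the file to another process.
f.close()
subprocess.run(["cat", "out.txt"])   # prints: hello from python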


Answer 5

During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.

Python doesn’t flush the buffer (that is, actually write the data to the file) until it’s sure you’re done writing. One way to signal that is to close the file.

If you write to a file without closing it, the data may never make it to the target file.
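
A minimal sketch of the buffering behaviour this answer describes; buffered.txt is just a placeholder name, and the exact point at which data becomes visible depends on the buffer size and platform:

f = open("buffered.txt", "w")
f.write("important data")             # small write: stays in Python's buffer

with open("buffered.txt") as reader:  # a second reader sees nothing yet
    print(repr(reader.read()))        # likely ''

f.flush()                             # push the buffer out to the OS

with open("buffered.txt") as reader:
    print(repr(reader.read()))        # now 'important data'

f.close()                             # close() also flushes and frees the descriptor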