Question: Why can't I call read() twice on an open file?

For an exercise I’m doing, I’m trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn’t seem to return the file content as a string?

Here’s the code

import re

f = open(filename)  # filename is assumed to hold the path to the input file

# get the year
match = re.search(r'Popularity in (\d+)', f.read())

if match:
  print(match.group(1))

# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())

if matches:
  # never reached: matches is always empty
  pass

Of course, I know this is not the most efficient or best way; that is not the point here. The point is: why can’t I call read() twice? Do I have to reset the file handle, or close and reopen the file, in order to do that?


Answer 0

Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time you could use readline(), readlines() or iterate through lines with for line in handle:.
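As a rough illustration of the line-oriented alternatives mentioned above, the sketch below writes a small throwaway file (its contents and the names path, first_line, remaining and stripped are made up for this example) and reads it back with readline(), readlines() and a for loop.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path) as handle:
    first_line = handle.readline()   # reads one line, including its newline
    remaining = handle.readlines()   # reads the rest as a list of lines

with open(path) as handle:
    # Iterating over the handle yields one line at a time.
    stripped = [line.rstrip("\n") for line in handle]

os.remove(path)
```

Each of these calls advances the same read cursor, which is why readlines() above only returns the lines that readline() had not already consumed.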

To answer your question directly: once a file has been read with read(), you can use seek(0) to return the read cursor to the start of the file (docs are here). If you know the file isn’t going to be too large, you can also save the read() output to a variable and use it in both your search and findall expressions.
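A minimal sketch of that behaviour, assuming a small temporary file created just for the demonstration (the file contents and variable names are illustrative, not taken from the question):

```python
import os
import tempfile

# Throwaway file standing in for the file from the exercise.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("Popularity in 1990\n")
    path = tmp.name

with open(path) as f:
    first = f.read()    # the whole file; the cursor is now at the end
    second = f.read()   # nothing left to read, so this is an empty string
    f.seek(0)           # move the cursor back to the start
    third = f.read()    # the whole file again

os.remove(path)
```

Note that the second call does not fail; it simply returns an empty string, which is why the regular expression in the question finds no matches.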

P.S. Don’t forget to close the file after you are done with it ;)
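One common way to make sure the file gets closed is a with block, which closes the handle when the block exits. The sketch below assumes a temporary file written only for the demonstration; the variable names are illustrative.

```python
import os
import tempfile

# Throwaway file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("some text\n")
    path = tmp.name

with open(path) as f:
    text = f.read()
    closed_inside = f.closed   # still open while inside the with block

closed_after = f.closed        # closed automatically once the block ends

os.remove(path)
```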


Answer 1

Yes, as above.

I’ll just write an example:

>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
>>> a.read()
#same output

Answer 2

Everyone who has answered this question so far is absolutely right: read() moves through the file, so after you’ve called it once, calling it again returns only an empty string.

What I’ll add is that in your particular case, you don’t need to seek back to the start or reopen the file. You can just store the text that you’ve read in a local variable and use it twice, or as many times as you like, in your program:

import re

f = open(filename)  # filename is assumed to hold the path to the input file
text = f.read()  # read the file once into a local variable
f.close()

# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
  print(match.group(1))

# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
  # matches is now non-empty whenever the pattern occurs in text
  print(matches)

Answer 3

The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.


Answer 4

Every open file has an associated position.
When you read() you read from that position. For example read(10) reads the first 10 bytes from a newly opened file, then another read(10) reads the next 10 bytes. read() without arguments reads all of the contents of the file, leaving the file position at the end of the file. Next time you call read() there is nothing to read.

You can use seek to move the file position. Or probably better in your case would be to do one read() and keep the result for both searches.
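The position bookkeeping can be sketched as follows. Here io.StringIO is used as a stand-in for an open text file (an assumption made to keep the example self-contained); in text mode the sizes count characters rather than bytes.

```python
import io

# io.StringIO behaves like an open text file and tracks a read position.
f = io.StringIO("abcdefghijklmnopqrstuvwxyz")

first_ten = f.read(10)   # characters 0-9
next_ten = f.read(10)    # characters 10-19, continuing from the position
rest = f.read()          # everything that is left
nothing = f.read()       # the position is at the end, so nothing is returned
```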


Answer 5

read() consumes what it reads. So you could reset the file, or seek to the start before re-reading. Or, if it suits your task, you can use read(n) to consume only n bytes.
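One possible way to use read(n) is to consume a file piece by piece. The helper name read_in_chunks and the sample data below are hypothetical, and io.BytesIO stands in for a file opened in binary mode.

```python
import io

def read_in_chunks(handle, n):
    """Yield successive pieces of at most n bytes until the handle is exhausted."""
    while True:
        chunk = handle.read(n)
        if not chunk:   # an empty result means the end has been reached
            break
        yield chunk

data = io.BytesIO(b"0123456789")
chunks = list(read_in_chunks(data, 4))
```

The last chunk is shorter than n when the data length is not a multiple of n.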


Answer 6

I always find the read method something of a walk down a dark alley. You go down a bit and stop, but if you are not counting your steps, you are not sure how far along you are. seek() gives the solution by repositioning; the other option is tell(), which returns the current position in the file. Maybe the Python file API could combine read and seek into a read_from(position, bytes) to make it simpler; until that happens, you should read this page.
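The read_from(position, bytes) idea is not part of Python's file objects, but a rough sketch of such a helper can be built from seek() and read(). The function below is hypothetical, and io.BytesIO stands in for a file opened in binary mode.

```python
import io

def read_from(handle, position, size):
    """Hypothetical helper: jump to position, then read size bytes."""
    handle.seek(position)
    return handle.read(size)

f = io.BytesIO(b"Popularity in 1990")
year = read_from(f, 14, 4)   # the four bytes starting at offset 14
where = f.tell()             # position after the read
```

Using binary mode here is a deliberate choice: in text mode, seek() offsets other than 0 are only reliable when they come from an earlier tell() call.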

