Question: Search and replace a line in a file in Python
I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
Answer 0
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh, 'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    # Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    # Remove original file
    remove(file_path)
    # Move new file
    move(abs_path, file_path)
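A possible refinement of this approach, sketched under the assumption of Python 3.3+ (for os.replace): create the temporary file in the same directory as the original, so the last step is a same-filesystem rename that overwrites the original in one operation instead of remove-then-move. The helper name replace_atomically is hypothetical, and os.replace is only documented as atomic on POSIX systems.

```python
import os
import shutil
import tempfile


def replace_atomically(file_path, pattern, subst):
    """Hypothetical helper: replace pattern with subst on every line of file_path."""
    # Put the temp file next to the original so os.replace() is a same-filesystem rename
    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as new_file, open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
        # Copy the permission bits from the original, as the answer above does
        shutil.copymode(file_path, tmp_path)
        # Overwrite the original in one step (atomic on POSIX)
        os.replace(tmp_path, file_path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the rename
        os.remove(tmp_path)
        raise
```

On POSIX systems this means other readers see either the old file or the new one, never a missing file between the remove and the move.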
Answer 1
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='')  # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line),  # for Python 2
What happens here is:
- The original file is moved to a backup file
- The standard output is redirected to the original file within the loop
- Thus any print statements write back into the original file
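To see the backup step from that list in action, here is a minimal sketch (the helper name replace_with_backup is hypothetical): passing backup='.bak' to fileinput.input() keeps the moved-aside original, whereas by default the backup file is deleted once the output file is closed.

```python
import fileinput


def replace_with_backup(path, old, new):
    """Hypothetical helper: in-place replace that keeps the original as path + '.bak'."""
    # The context-manager form of fileinput requires Python 3.2+
    with fileinput.input(path, inplace=True, backup='.bak') as f:
        for line in f:
            # While iterating, print() writes into the file being rewritten
            print(line.replace(old, new), end='')
```

For example, replace_with_backup('test.txt', 'foo', 'bar') rewrites test.txt and leaves the untouched original in test.txt.bak.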
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because, admittedly, it’s not very readable or familiar. In real (production) code it’s worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
- The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
- The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
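The two options above can be sketched as follows; this is a minimal illustration, not the answerer's code, and the function names and the '.tmp' suffix for the temporary copy are hypothetical.

```python
import os
import shutil


def replace_in_memory(path, old, new):
    """Option 1: the whole file fits comfortably in memory."""
    with open(path) as f:
        contents = f.read()
    # The read handle is closed; reopening in 'w' mode truncates the file
    with open(path, 'w') as f:
        f.write(contents.replace(old, new))


def replace_via_temp_copy(path, old, new):
    """Option 2: the file is too large for memory (needs twice the disk space)."""
    temp_path = path + '.tmp'  # hypothetical name for the temporary copy
    shutil.move(path, temp_path)
    with open(temp_path) as src, open(path, 'w') as dst:
        # Stream line by line, so only one line is held in memory at a time
        for line in src:
            dst.write(line.replace(old, new))
    shutil.copymode(temp_path, path)  # keep the original permission bits
    os.remove(temp_path)
```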
Answer 2
Here’s another example that was tested. Note that it uses the in operator and str.replace, so searchExp is matched as a literal substring rather than as a regular expression:
import fileinput
import sys

def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt", r"Hello\sWorld!$", r"Goodbye\sWorld.")
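If you actually want regular-expression patterns like the ones in that example, a minimal variant swaps the substring test for re.sub. The name replace_all_regex is hypothetical:

```python
import fileinput
import re
import sys


def replace_all_regex(file, search_exp, replace_exp):
    """Hypothetical regex flavour of replaceAll: search_exp is a regular expression."""
    pattern = re.compile(search_exp)  # compile once instead of once per line
    with fileinput.input(file, inplace=True) as f:
        for line in f:
            # re.sub leaves non-matching lines unchanged, so no 'in' test is needed;
            # '$' also matches just before the line's trailing newline
            sys.stdout.write(pattern.sub(replace_exp, line))
```

With this version, replace_all_regex("/fooBar.txt", r"Hello\sWorld!$", "Goodbye World.") treats \s and $ as regex syntax.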
Answer 3
This should work (in-place editing):
import fileinput

# Processes each file in the list `files` and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace=True):
    print(line.replace("foo", "bar"), end='')  # Python 3
    # print line.replace("foo", "bar"),  # Python 2
Answer 4
This is based on the answer by Thomas Watnedal. However, it does not exactly answer the line-by-line part of the original question, although the function can still replace on a line-by-line basis.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Also, using re.sub instead of replace allows regex replacement instead of only plain-text replacement.
Reading the file as a single string instead of line by line allows for multiline matching and replacement.
import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    with open(file, 'r') as file_handle:
        file_string = file_handle.read()

    # Use the re module to allow regex replacement (including multiline patterns)
    file_string = re.sub(pattern, subst, file_string)

    # Write contents to file.
    # Using mode 'w' truncates the file.
    with open(file, 'w') as file_handle:
        file_handle.write(file_string)
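To illustrate the multiline point, here is a sketch of the same whole-file approach with an added flags parameter (the name replace_regex_in_file and the flags argument are hypothetical additions). Because the file is one string, a pattern can contain \n, and re.MULTILINE makes ^ match at the start of every line.

```python
import re


def replace_regex_in_file(path, pattern, subst, flags=0):
    """Hypothetical helper: whole-file regex substitution with optional re flags."""
    with open(path) as f:
        text = f.read()
    # The whole file is a single string, so patterns may span line breaks
    text = re.sub(pattern, subst, text, flags=flags)
    with open(path, 'w') as f:
        f.write(text)
```

For example, replace_regex_in_file(path, r'^#.*\n', '', flags=re.MULTILINE) deletes every line starting with #, and replace_regex_in_file(path, r'foo\nbar', 'foo bar') joins a line break between foo and bar, which a line-by-line loop could not do.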
Answer 5
As lassevk suggests, write out the new file as you go. Here is some example code:
with open("a.txt") as fin, open("b.txt", "wt") as fout:
    for line in fin:
        fout.write(line.replace('foo', 'bar'))
Answer 6
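If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you’re a fan of regexes. Note that re.escape makes text match literally, while flags still lets you pass options such as re.IGNORECASE: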
import re

def replace(filePath, text, subs, flags=0):
    with open(filePath, "r+") as file:
        fileContents = file.read()
        textPattern = re.compile(re.escape(text), flags)
        fileContents = textPattern.sub(subs, fileContents)
        file.seek(0)
        file.truncate()
        file.write(fileContents)
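One subtlety: re.escape protects the search text, but subs is still treated as a re.sub replacement template, so backslashes in it are interpreted (for example, \1 refers to a group and \n becomes a newline). A sketch of a variant, assuming you want subs inserted verbatim, passes a function as the replacement instead (the name replace_literal is hypothetical):

```python
import re


def replace_literal(file_path, text, subs, flags=0):
    """Hypothetical variant: both text and subs are taken literally."""
    with open(file_path, 'r+') as f:
        contents = f.read()
        pattern = re.compile(re.escape(text), flags)
        # A callable replacement is inserted as-is, with no backslash processing
        contents = pattern.sub(lambda match: subs, contents)
        f.seek(0)
        f.truncate()
        f.write(contents)
```

For example, replace_literal(path, 'old', r'C:\new', flags=re.IGNORECASE) writes a literal backslash followed by "new", where the template form would insert a newline.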
Answer 7
A more Pythonic way would be to use context managers, as in the code below:
from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

def replace(source_file_path, pattern, substring):
    # mkstemp returns an open OS-level file descriptor plus the temp file's path
    fh, target_file_path = mkstemp()
    # Wrap the descriptor so it is closed when the with block exits
    with fdopen(fh, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
You can find the full snippet here.
Answer 8
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
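A minimal sketch of that recipe, assuming the replacement is passed in as a function so that any per-line edit can be plugged in (the names copy_with_replacement and transform are hypothetical):

```python
def copy_with_replacement(old_path, new_path, transform):
    """Copy old_path to new_path line by line, applying transform to each line first."""
    with open(old_path) as old_file, open(new_path, 'w') as new_file:
        for line in old_file:
            # The replacement happens before the line is written to the new file
            new_file.write(transform(line))
```

For example, copy_with_replacement('a.txt', 'b.txt', lambda line: line.replace('foo', 'bar')). Replacing the old file with the new one afterwards is a separate step, as in the other answers.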
Answer 9
Expanding on @Kiran’s answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import close, remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    close(fh)  # release mkstemp's descriptor; the file is reopened by path below
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
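In Python 3 the built-in open() accepts an encoding argument directly, so the codecs module is not strictly needed. A sketch of the same approach on that basis (the name replace_utf8 is hypothetical); os.fdopen forwards encoding= to open(), which also takes care of closing mkstemp's descriptor:

```python
import os
import shutil
import tempfile


def replace_utf8(source_file_path, pattern, substring):
    """Sketch: same temp-file approach, using open()'s encoding argument."""
    fh, target_file_path = tempfile.mkstemp()
    # fdopen wraps (and later closes) the descriptor returned by mkstemp
    with os.fdopen(fh, 'w', encoding='utf-8') as target_file:
        with open(source_file_path, 'r', encoding='utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    os.remove(source_file_path)
    shutil.move(target_file_path, source_file_path)
```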
Answer 10
Using hamishmcn’s answer as a template, I was able to search for lines in a file that match my regex and replace the matching text with an empty string.
import re

# Compile the pattern once, outside the loop
p = re.compile(r'[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')  # pattern
with open("in.txt", 'r') as fin, open("out.txt", 'w') as fout:  # in file, out file
    for line in fin:
        newline = p.sub('', line)  # replace matching strings with empty string
        print(newline, end='')
        fout.write(newline)
Answer 11
fileinput is quite straightforward, as mentioned in previous answers:
import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as f:
        for line in f:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
- fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path in the with statement.
- The print statement does not print anything to the console when inplace=True, because STDOUT is being forwarded to the original file.
- end='' in the print statement prevents blank lines between lines, since each line read from the file already ends with a newline.
Can be used as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Answer 12
If you remove the indentation as shown below, it will search and replace in multiple lines. See the example below:
from tempfile import mkstemp
from shutil import move
from os import close, remove

def replace(file, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    print(fh, abs_path)
    new_file = open(abs_path, 'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    # Close temp file
    new_file.close()
    close(fh)
    old_file.close()
    # Remove original file
    remove(file)
    # Move new file
    move(abs_path, file)