Tag archive: file

How to get all of the immediate subdirectories in Python

Question: How to get all of the immediate subdirectories in Python

I’m trying to write a simple Python script that will copy a index.tpl to index.html in all of the subdirectories (with a few exceptions).

I’m getting bogged down by trying to get the list of subdirectories.


Answer 0


I did some speed testing on various functions to return the full path to all current subdirectories.

tl;dr: Always use scandir:

list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]

Bonus: with scandir you can also get just the folder names by using f.name instead of f.path.

This (as well as all other functions below) will not use natural sorting. This means results will be sorted like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026
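As a sketch of what natural sorting looks like in practice (this regex-based key function is my own illustration, not taken from the linked answer):

```python
import re

def natural_key(s):
    # Split into digit and non-digit runs; digit runs compare numerically.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', s)]

folders = ['1', '10', '2']
print(sorted(folders))                   # lexicographic: ['1', '10', '2']
print(sorted(folders, key=natural_key))  # natural: ['1', '2', '10']
```

The same key= argument works on the f.path or f.name lists produced by the functions below.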




Results: scandir is 3x faster than walk, 32x faster than listdir (with filter), 35x faster than Pathlib, 36x faster than listdir, and 37x (!) faster than glob.

Scandir:           0.977
Walk:              3.011
Listdir (filter): 31.288
Pathlib:          34.075
Listdir:          35.501
Glob:             36.277

Tested on W7 x64, Python 3.8.1, with a folder containing 440 subfolders.
In case you wonder whether listdir could be sped up by not doing os.path.join() twice: yes, but the difference is basically nonexistent.
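For reference, that single-join listdir variant can be sketched like this (the function name is mine, for illustration):

```python
import os

def subdirs_single_join(path):
    # Join once per entry and reuse the full path for both the
    # isdir() test and the result list.
    result = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            result.append(full)
    return result
```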

Code:

import os
import pathlib
import timeit
import glob

path = r"<example_path>"



def a():
    list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]
    # print(len(list_subfolders_with_paths))


def b():
    list_subfolders_with_paths = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))]
    # print(len(list_subfolders_with_paths))


def c():
    list_subfolders_with_paths = []
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            list_subfolders_with_paths.append( os.path.join(root, dir) )
        break
    # print(len(list_subfolders_with_paths))


def d():
    list_subfolders_with_paths = glob.glob(path + '/*/')
    # print(len(list_subfolders_with_paths))


def e():
    list_subfolders_with_paths = list(filter(os.path.isdir, [os.path.join(path, f) for f in os.listdir(path)]))
    # print(len(list(list_subfolders_with_paths)))


def f():
    p = pathlib.Path(path)
    list_subfolders_with_paths = [x for x in p.iterdir() if x.is_dir()]
    # print(len(list_subfolders_with_paths))



print(f"Scandir:          {timeit.timeit(a, number=1000):.3f}")
print(f"Listdir:          {timeit.timeit(b, number=1000):.3f}")
print(f"Walk:             {timeit.timeit(c, number=1000):.3f}")
print(f"Glob:             {timeit.timeit(d, number=1000):.3f}")
print(f"Listdir (filter): {timeit.timeit(e, number=1000):.3f}")
print(f"Pathlib:          {timeit.timeit(f, number=1000):.3f}")

Answer 1

import os
def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

Answer 2


Why has no one mentioned glob? glob lets you use Unix-style pathname expansion, and it is my go-to function for almost anything that needs to find more than one pathname. It makes this very easy:

from glob import glob
paths = glob('*/')

Note that glob will return the directory with the final slash (as Unix would), while most path-based solutions will omit the final slash.
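If that trailing slash gets in the way, os.path.normpath strips it; a small sketch (the directory name is invented for the example):

```python
import glob
import os
import tempfile

base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "sub"))

dirs = glob.glob(os.path.join(base, "*/"))
# glob keeps the trailing separator; os.path.normpath removes it.
names = [os.path.basename(os.path.normpath(d)) for d in dirs]
print(names)  # ['sub']
```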


Answer 3


Check “Getting a list of all subdirectories in the current directory”.

Here’s a Python 3 version:

import os

dir_list = next(os.walk('.'))[1]

print(dir_list)

Answer 4

import os, os.path

To get (full-path) immediate sub-directories in a directory:

def SubDirPath (d):
    return filter(os.path.isdir, [os.path.join(d,f) for f in os.listdir(d)])

To get the latest (newest) sub-directory:

def LatestDirectory (d):
    return max(SubDirPath(d), key=os.path.getmtime)

Answer 5


os.walk is your friend in this situation.

Straight from the documentation:

walk() generates the file names in a directory tree, by walking the tree either top down or bottom up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
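Since the first tuple yielded describes the top directory itself, the immediate subdirectories fall out of a single next() call; a small sketch (directory names are invented):

```python
import os
import tempfile

base = tempfile.mkdtemp()
for name in ("a", "b"):
    os.mkdir(os.path.join(base, name))

# The first (dirpath, dirnames, filenames) tuple is for `base` itself,
# so its dirnames list holds the immediate subdirectories.
dirpath, dirnames, filenames = next(os.walk(base))
print(sorted(dirnames))  # ['a', 'b']
```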


Answer 6


This method nicely does it all in one go.

from glob import glob
subd = [s.rstrip("/") for s in glob(parent_dir+"*/")]

Answer 7


Using Twisted’s FilePath module:

from twisted.python.filepath import FilePath

def subdirs(pathObj):
    for subpath in pathObj.walk():
        if subpath.isdir():
            yield subpath

if __name__ == '__main__':
    for subdir in subdirs(FilePath(".")):
        print("Subdirectory:", subdir)

Since some commenters have asked what the advantages of using Twisted’s libraries for this is, I’ll go a bit beyond the original question here.


There’s some improved documentation in a branch that explains the advantages of FilePath; you might want to read that.

More specifically in this example: unlike the standard library version, this function can be implemented with no imports. The “subdirs” function is totally generic, in that it operates on nothing but its argument. In order to copy and move the files using the standard library, you need to depend on the “open” builtin, “listdir”, perhaps “isdir” or “os.walk” or “shutil.copy”. Maybe “os.path.join” too. Not to mention the fact that you need a string passed as an argument to identify the actual file. Let’s take a look at the full implementation which will copy each directory’s “index.tpl” to “index.html”:

def copyTemplates(topdir):
    for subdir in subdirs(topdir):
        tpl = subdir.child("index.tpl")
        if tpl.exists():
            tpl.copyTo(subdir.child("index.html"))

The “subdirs” function above can work on any FilePath-like object. Which means, among other things, ZipPath objects. Unfortunately ZipPath is read-only right now, but it could be extended to support writing.

You can also pass your own objects for testing purposes. In order to test the os.path-using APIs suggested here, you have to monkey with imported names and implicit dependencies and generally perform black magic to get your tests to work. With FilePath, you do something like this:

class MyFakePath:
    def child(self, name):
        "Return an appropriate child object"

    def walk(self):
        "Return an iterable of MyFakePath objects"

    def exists(self):
        "Return true or false, as appropriate to the test"

    def isdir(self):
        "Return true or false, as appropriate to the test"
...
subdirs(MyFakePath(...))

Answer 8


I just wrote some code to move VMware virtual machines around, and ended up using os.path and shutil to accomplish file copying between sub-directories.

def copy_client_files(file_src, file_dst):
    for file in os.listdir(file_src):
        print("Copying file: %s" % file)
        shutil.copy(os.path.join(file_src, file), os.path.join(file_dst, file))

It’s not terribly elegant, but it does work.


Answer 9


Here’s one way:

import os
import shutil

def copy_over(path, from_name, to_name):
  for path, dirname, fnames in os.walk(path):
    for fname in fnames:
      if fname == from_name:
        shutil.copy(os.path.join(path, from_name), os.path.join(path, to_name))


copy_over('.', 'index.tpl', 'index.html')

Answer 10


I have to mention the path.py library, which I use very often.

Fetching the immediate subdirectories becomes as simple as this:

my_dir.dirs()

The full working example is:

from path import Path

my_directory = Path("path/to/my/directory")

subdirs = my_directory.dirs()

NB: my_directory can still be manipulated as a string, since Path is a subclass of string, but it provides a bunch of useful methods for manipulating paths.


Answer 11


import os


def get_folders_in_directories_recursively(directory, index=0):
    folder_list = list()
    parent_directory = directory

    for path, subdirs, _ in os.walk(directory):
        if not index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)
        elif path[len(parent_directory):].count('/') + 1 == index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)

    return folder_list

The following function can be called as:

get_folders_in_directories_recursively(directory, index=1) -> gives the list of folders in the first level

get_folders_in_directories_recursively(directory) -> gives all the sub folders


Answer 12

import glob
import os

def child_dirs(path):
     cd = os.getcwd()        # save the current working directory
     os.chdir(path)          # change directory 
     dirs = glob.glob("*/")  # get all the subdirectories
     os.chdir(cd)            # change directory to the script original location
     return dirs

The child_dirs function takes a path to a directory and returns a list of the immediate subdirectories in it.

dir
 |
  -- dir_1
  -- dir_2

child_dirs('dir') -> ['dir_1', 'dir_2']

Answer 13

import pathlib


def list_dir(dir):
    path = pathlib.Path(dir)
    dirs = []
    try:
        for item in path.iterdir():
            if item.is_dir():
                dirs.append(item)
        return dirs
    except FileNotFoundError:
        print('Invalid directory')

Answer 14


A one-liner using pathlib:

list_subfolders_with_paths = [p for p in pathlib.Path(path).iterdir() if p.is_dir()]

Is explicitly closing files important?

Question: Is explicitly closing files important?

In Python, if you either open a file without calling close(), or close the file but without using try/finally or the “with” statement, is this a problem? Or does it suffice as a coding practice to rely on Python’s garbage collection to close all files? For example, if one does this:

for line in open("filename"):
    # ... do stuff ...

… is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?


Answer 0


In your example the file isn’t guaranteed to be closed before the interpreter exits. In current versions of CPython the file will be closed at the end of the for loop because CPython uses reference counting as its primary garbage collection mechanism but that’s an implementation detail, not a feature of the language. Other implementations of Python aren’t guaranteed to work this way. For example IronPython, PyPy, and Jython don’t use reference counting and therefore won’t close the file at the end of the loop.

It’s bad practice to rely on CPython’s garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn’t use reference counting you’ll need to go through all your code and make sure all your files are closed properly.

For your example use:

with open("filename") as f:
     for line in f:
        # ... do stuff ...

Answer 1


Some Pythons will close files automatically when they are no longer referenced, while others will not and it’s up to the O/S to close files when the Python interpreter exits.

Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.

So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, in CPython 3 you will now get warnings that the system had to close files for you if you didn’t do it.

Moral: Clean up after yourself. :)
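The warning mentioned above is a ResourceWarning, and it can be surfaced deliberately; a sketch that relies on CPython's reference counting to trigger the warning immediately (the file name is made up):

```python
import os
import tempfile
import warnings

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with warnings.catch_warnings(record=True) as caught:
    # ResourceWarning is ignored by default; record it instead.
    warnings.simplefilter("always", ResourceWarning)
    f = open(path, "w")
    f.write("data")
    del f  # dropped without close(); CPython warns during deallocation

unclosed = [w for w in caught if issubclass(w.category, ResourceWarning)]
print(len(unclosed) >= 1)  # True on CPython
```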


Answer 2


Although it is quite safe to use such a construct in this particular case, there are some caveats for generalising such practice:

  • your program can potentially run out of file descriptors; although that is unlikely, imagine hunting down a bug like that
  • you may not be able to delete said file on some systems, e.g. win32
  • if you run anything other than CPython, you don’t know when the file is closed for you
  • if you open the file in write or read-write mode, you don’t know when the data is flushed

Answer 3


The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice because you might hit open file handle limit if you do not close files as soon as you finish using them. What if within that for loop of yours, you open more files and leave them lingering?


Answer 4


It is very important to close your file descriptor when you are going to use the file's contents later in the same Python script; I only realized this today, after a long and hectic debugging session. The reason is that the content is only edited/removed/saved, and the changes actually reach the file, once you close the file descriptor.

So suppose you write content to a new file and then, without closing the fd, use that file (not the fd) in another shell command which reads its content. In this situation you will not get the contents you expect from the shell command, and if you try to debug it, you can't find the bug easily. You can also read more in my blog entry http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html


Answer 5


During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.

Python doesn’t flush the buffer—that is, write data to the file—until it’s sure you’re done writing. One way to do this is to close the file.

If you write to a file without closing it, the data may not make it to the target file until the buffer is flushed.
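A small sketch of that buffer life cycle (the file name is made up); flush() pushes the userspace buffer to the OS, and close() flushes as well:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

f = open(path, "w")
f.write("hello")  # may sit in the userspace buffer for now
f.flush()         # push the buffer to the OS explicitly
f.close()         # close() also flushes before releasing the fd

with open(path) as g:
    print(g.read())  # hello
```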


How can I open every file in a folder?

Question: How can I open every file in a folder?

I have a Python script, parse.py, which opens a file, say file1, and then does something, maybe printing out the total number of characters.

filename = 'file1'
f = open(filename, 'r')
content = f.read()
print(filename, len(content))

Right now, I am using stdout to direct the result to my output file – output

python parse.py >> output

However, I don’t want to handle this manually, file by file. Is there a way to take care of every single file automatically? Like:

ls | awk '{print}' | python parse.py >> output 

Then the problem is how I could read the file name from standard input. Or are there already some built-in functions that do the ls and that kind of work easily?

Thanks!


Answer 0


Os

You can list all files in the current directory using os.listdir:

import os
for filename in os.listdir(os.getcwd()):
   with open(os.path.join(os.getcwd(), filename), 'r') as f: # open in readonly mode
      # do your stuff

Glob

Or you can list only some files, depending on the file pattern using the glob module:

import glob
for filename in glob.glob('*.txt'):
   with open(os.path.join(os.getcwd(), filename), 'r') as f: # open in readonly mode
      # do your stuff

It doesn’t have to be the current directory; you can list them in any path you want:

path = '/some/path/to/file'
for filename in glob.glob(os.path.join(path, '*.txt')):
   with open(filename, 'r') as f: # glob already returned the joined path
      # do your stuff

Pipe

Or you can even use the pipe, as you specified, with fileinput:

import fileinput
for line in fileinput.input():
    # do your stuff

And then use it with piping:

ls -1 | python parse.py
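For comparison, a pathlib sketch of the same list-and-read idea (the file names are invented for the example):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
(base / "a.txt").write_text("hello")
(base / "b.txt").write_text("hi")

# Path.glob mirrors the glob module but yields Path objects,
# so opening and reading need no os.path.join at all.
sizes = {p.name: len(p.read_text()) for p in base.glob("*.txt")}
print(sizes == {"a.txt": 5, "b.txt": 2})  # True
```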

Answer 1


You should try using os.walk:

yourpath = 'path'

import os
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        # do stuff with the file
    for name in dirs:
        print(os.path.join(root, name))
        # do stuff with the directory

Answer 2


I was looking for this answer:

import os,glob
folder_path = '/some/path/to/file'
for filename in glob.glob(os.path.join(folder_path, '*.htm')):
  with open(filename, 'r') as f:
    text = f.read()
    print (filename)
    print (len(text))

You can also choose '*.txt' or another ending for your filenames.


Answer 3


You can actually just use the os module to do both:

  1. list all files in a folder
  2. sort files by file type, file name etc.

Here’s a simple example:

import os  # os module imported here
location = os.getcwd()  # get present working directory location here
counter = 0  # keep a count of all files found
csvfiles = []  # list to store all csv files found at location
filebeginwithhello = []  # list to keep all files that begin with 'hello'
otherfiles = []  # list to keep any other file that does not match the criteria

for file in os.listdir(location):
    try:
        if file.startswith("hello") and file.endswith(".csv"):  # checked first, because some files may start with hello and also be a csv file
            print("csv file found:\t", file)
            csvfiles.append(str(file))
            counter = counter + 1

        elif file.endswith(".csv"):
            print("csv file found:\t", file)
            csvfiles.append(str(file))
            counter = counter + 1

        elif file.startswith("hello"):
            print("hello files found: \t", file)
            filebeginwithhello.append(file)
            counter = counter + 1

        else:
            otherfiles.append(file)
            counter = counter + 1
    except Exception as e:
        print("No files found here!")
        raise e

print("Total files found:\t", counter)

Now you have not only listed all the files in a folder but also have them (optionally) sorted by starting name, file type, and so on. Now just iterate over each list and do your stuff.


Answer 4

import pyautogui
import keyboard
import time
import os
import pyperclip

os.chdir("target directory")

# get the current directory
cwd=os.getcwd()

files=[]

for i in os.walk(cwd):
    for j in i[2]:
        files.append(os.path.abspath(j))

os.startfile(r"C:\Program Files (x86)\Adobe\Acrobat 11.0\Acrobat\Acrobat.exe")
time.sleep(1)


for i in files:
    print(i)
    pyperclip.copy(i)
    keyboard.press('ctrl')
    keyboard.press_and_release('o')
    keyboard.release('ctrl')
    time.sleep(1)

    keyboard.press('ctrl')
    keyboard.press_and_release('v')
    keyboard.release('ctrl')
    time.sleep(1)
    keyboard.press_and_release('enter')
    keyboard.press('ctrl')
    keyboard.press_and_release('p')
    keyboard.release('ctrl')
    keyboard.press_and_release('enter')
    time.sleep(3)
    keyboard.press('ctrl')
    keyboard.press_and_release('w')
    keyboard.release('ctrl')
    pyperclip.copy('')

Answer 5


The code below looks for any text files in the directory that contains the script we are running. It opens every text file and stores the words of each text line in a list. After storing the words, it prints each word line by line:

import os, fnmatch

listOfFiles = os.listdir('.')
pattern = "*.txt"
store = []
for entry in listOfFiles:
    if fnmatch.fnmatch(entry, pattern):
        _fileName = open(entry,"r")
        if _fileName.mode == "r":
            content = _fileName.read()
            contentList = content.split(" ")
            for i in contentList:
                if i != '\n' and i != "\r\n":
                    store.append(i)

for i in store:
    print(i)

Deleting a specific line in a file using Python

Question: Deleting a specific line in a file using Python

Let’s say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?


Answer 0


First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:

with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        if line.strip("\n") != "nickname_to_delete":
            f.write(line)

You need to strip("\n") the newline character in the comparison because if your file doesn’t end with a newline character the very last line won’t either.


回答 1

仅需一次打开即可解决此问题:

with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        if i != "line you want to remove...":
            f.write(i)
    f.truncate()

此解决方案以读写模式（"r+"）打开文件，先用seek将文件指针重置到开头，然后在最后一次写入之后用truncate删除剩余内容。

Solution to this problem with only a single open:

with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        if i != "line you want to remove...":
            f.write(i)
    f.truncate()

This solution opens the file in r/w mode (“r+”) and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.


回答 2

在我看来,最好和最快的选择不是将所有内容存储在列表中并重新打开文件以将其写入,而是将文件重新写入其他位置。

with open("yourfile.txt", "r") as input:
    with open("newfile.txt", "w") as output: 
        for line in input:
            if line.strip("\n") != "nickname_to_delete":
                output.write(line)

就是这样！只需一次循环即可完成同样的操作，而且速度会快得多。

The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.

with open("yourfile.txt", "r") as input:
    with open("newfile.txt", "w") as output: 
        for line in input:
            if line.strip("\n") != "nickname_to_delete":
                output.write(line)

That’s it! In one loop and one only you can do the same thing. It will be much faster.
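A common follow-up to this rewrite-elsewhere approach, not shown in the answer above, is to swap the new file into the original's place once it is complete. A minimal sketch under that assumption; the file names and sample contents are made up for illustration:

```python
import os

# Build a small sample file so the sketch is runnable (contents are illustrative).
src = "yourfile.txt"
with open(src, "w") as f:
    f.write("alice\nnickname_to_delete\nbob\n")

tmp = src + ".tmp"  # temporary output next to the original

# Stream the original into the temporary file, dropping the unwanted line.
with open(src, "r") as infile, open(tmp, "w") as outfile:
    for line in infile:
        if line.strip("\n") != "nickname_to_delete":
            outfile.write(line)

# Swap the filtered copy into place; on the same filesystem this is
# effectively atomic, so readers never see a half-written file.
os.replace(tmp, src)
```
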


回答 3

这是对@Lother答案（我认为它应该被视为正确答案）的一个“fork”。


对于这样的文件:

$ cat file.txt 
1: october rust
2: november rain
3: december snow

Lother解决方案中的这个fork可以正常工作:

#!/usr/bin/python3.4

with open("file.txt","r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "snow" not in line:
            f.write(line)
    f.truncate()

改进之处:

  • 使用with open，从而无需再调用 f.close()
  • 用更清晰的条件判断来判定当前行中是否不包含目标字符串

This is a “fork” from @Lother’s answer (which I believe should be considered the right answer).


For a file like this:

$ cat file.txt 
1: october rust
2: november rain
3: december snow

This fork from Lother’s solution works fine:

#!/usr/bin/python3.4

with open("file.txt","r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "snow" not in line:
            f.write(line)
    f.truncate()

Improvements:

  • with open, which removes the need to call f.close()
  • a clearer check for whether the string is absent from the current line

回答 4

在第一遍中读取所有行、在第二遍中进行更改（删除特定行）的问题是：如果文件非常大，就会耗尽RAM。更好的方法是逐行读取，并将需要保留的行写入另一个文件，从而剔除不需要的行。我用这种方法处理过12-50 GB的文件，RAM使用率几乎保持不变，只有CPU周期表明处理正在进行。

The issue with reading lines in a first pass and making changes (deleting specific lines) in a second pass is that if your file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don’t need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
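A minimal sketch of the streaming approach this answer describes; the file names and the unwanted line are made up for illustration:

```python
# Build a small stand-in input; in practice this would be the multi-GB file.
unwanted = "nickname_to_delete"
with open("big_input.txt", "w") as f:
    f.write("keep me\n" + unwanted + "\nkeep me too\n")

# Read one line at a time and write the survivors: memory use stays flat
# no matter how large the input is, because only one line is held at a time.
with open("big_input.txt", "r") as src, open("big_output.txt", "w") as dst:
    for line in src:
        if line.rstrip("\n") != unwanted:
            dst.write(line)
```
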


回答 5

我喜欢这个答案中介绍的fileinput方法：Deleting a line from a text file (python)

举例来说,我有一个包含空行的文件,并且想要删除空行,这是我如何解决的方法:

import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
    if len(line) > 1:
            sys.stdout.write(line)

注意:我的空行长度为1

I liked the fileinput approach as explained in this answer: Deleting a line from a text file (python)

Say for example I have a file which has empty lines in it and I want to remove empty lines, here’s how I solved it:

import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
    if len(line) > 1:
            sys.stdout.write(line)

Note: The empty lines in my case had length 1


回答 6

如果使用Linux，则可以尝试以下方法。
假设您有一个名为animal.txt的文本文件：

$ cat animal.txt  
dog
pig
cat 
monkey         
elephant  

删除第一行:

>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt']) 

然后

$ cat animal.txt
pig
cat
monkey
elephant

If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:

$ cat animal.txt  
dog
pig
cat 
monkey         
elephant  

Delete the first line:

>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt']) 

then

$ cat animal.txt
pig
cat
monkey
elephant

回答 7

我认为，如果您将文件读入列表，就可以遍历列表来查找要删除的昵称。这样无需创建额外文件即可高效完成，但必须将结果写回源文件。

这是我可能的方法:

import os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']

我假设nicknames.csv包含如下数据:

Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...

然后将文件加载到列表中:

 nicknames = None
 with open("nicknames.csv") as sourceFile:
     nicknames = sourceFile.read().splitlines()

接下来,迭代到列表以匹配要删除的输入:

for nick in nicknames_to_delete:
     try:
         if nick in nicknames:
             nicknames.pop(nicknames.index(nick))
         else:
             print(nick + " is not found in the file")
     except ValueError:
         pass

最后,将结果写回文件:

with open("nicknames.csv", "r+", newline="") as nicknamesFile:
    nicknamesFile.seek(0)
    nicknamesFile.truncate()
    nicknamesWriter = csv.writer(nicknamesFile)
    for name in nicknames:
        nicknamesWriter.writerow([str(name)])

I think if you read the file into a list, you can iterate over the list to look for the nickname you want to get rid of. You can do it efficiently without creating additional files, but you’ll have to write the result back to the source file.

Here’s how I might do this:

import os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']

I’m assuming nicknames.csv contains data like:

Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...

Then load the file into the list:

 nicknames = None
 with open("nicknames.csv") as sourceFile:
     nicknames = sourceFile.read().splitlines()

Next, iterate over to list to match your inputs to delete:

for nick in nicknames_to_delete:
     try:
         if nick in nicknames:
             nicknames.pop(nicknames.index(nick))
         else:
             print(nick + " is not found in the file")
     except ValueError:
         pass

Lastly, write the result back to file:

with open("nicknames.csv", "r+", newline="") as nicknamesFile:
    nicknamesFile.seek(0)
    nicknamesFile.truncate()
    nicknamesWriter = csv.writer(nicknamesFile)
    for name in nicknames:
        nicknamesWriter.writerow([str(name)])

回答 8

一般来说,您不能;您必须再次写入整个文件(至少从更改到结束为止)。

在某些特定情况下,您可以做得更好-

如果所有数据元素的长度相同且没有特定顺序,并且您知道要删除的元素的偏移量,则可以将最后一项复制到要删除的项上,并在最后一项之前截断文件;

或者,您也可以在已保存的数据元素中用“这是不良数据,跳过它”的值覆盖数据块,或者在已保存的数据元素中保留“此项目已被删除”标志,这样就可以将其标记为已删除,而无需另外修改文件。

对于简短的文档(小于100 KB的内容?)来说,这可能是多余的。

In general, you can’t; you have to write the whole file again (at least from the point of change to the end).

In some specific cases you can do better than this –

if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;

or you could just overwrite the data chunk with a ‘this is bad data, skip it’ value or keep a ‘this item has been deleted’ flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.

This is probably overkill for short documents (anything under 100 KB?).
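The first trick the answer mentions (copy the last item over the one to be deleted, then truncate) can be sketched for fixed-length records as follows. The record size, file name, and contents are all invented for illustration:

```python
import os

RECORD = 8  # fixed record size in bytes (an assumption for this sketch)

# Build a demo file of three 8-byte records.
with open("records.bin", "wb") as f:
    f.write(b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC")

def delete_record(path, index):
    """Overwrite record `index` with the last record, then truncate.

    Only works when records are fixed-length and order doesn't matter.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.seek(size - RECORD)      # read the last record
        last = f.read(RECORD)
        f.seek(index * RECORD)     # overwrite the record being deleted
        f.write(last)
        f.truncate(size - RECORD)  # drop the now-duplicate tail

delete_record("records.bin", 0)
```

Note that this reorders the file: the last record now sits where the deleted one was.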


回答 9

可能您已经得到了正确的答案，但这是我的。我没有使用列表来收集未过滤的数据（readlines()方法就是这样做的），而是使用了两个文件：一个用于保存主数据，另一个用于在删除特定字符串时过滤数据。代码如下：

main_file = open('data_base.txt').read()    # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base.txt'):
    if 'your data to delete' not in line:    # remove a specific string
        main_file.write(line)                # put all strings back to your db except deleted
    else: pass
main_file.close()

希望您会发现这个有用!:)

Probably, you already got a correct answer, but here is mine. Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:

main_file = open('data_base.txt').read()    # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base.txt'):
    if 'your data to delete' not in line:    # remove a specific string
        main_file.write(line)                # put all strings back to your db except deleted
    else: pass
main_file.close()

Hope you will find this useful! :)


回答 10

将文件行保存在列表中,然后从列表中删除要删除的行,并将其余行写入新文件

with open("file_name.txt", "r") as f:
    lines = f.readlines() 
    lines.remove("Line you want to delete\n")
    with open("new_file.txt", "w") as new_f:
        for line in lines:        
            new_f.write(line)

Save the file lines in a list, then remove the line you want to delete from the list, and write the remaining lines to a new file

with open("file_name.txt", "r") as f:
    lines = f.readlines() 
    lines.remove("Line you want to delete\n")
    with open("new_file.txt", "w") as new_f:
        for line in lines:        
            new_f.write(line)

回答 11

这是从文件中删除某行的一些其他方法:

src_file = "zzzz.txt"
f = open(src_file, "r")
contents = f.readlines()
f.close()

contents.pop(idx) # remove the line item from list, by line number, starts from 0

f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()

here’s some other method to remove a/some line(s) from a file:

src_file = "zzzz.txt"
f = open(src_file, "r")
contents = f.readlines()
f.close()

contents.pop(idx) # remove the line item from list, by line number, starts from 0

f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()

回答 12

我喜欢使用fileinput和’inplace’方法的此方法:

import fileinput
for line in fileinput.input(fname, inplace =1):
    line = line.strip()
    if not 'UnwantedWord' in line:
        print(line)

它比其他答案简洁一些，而且速度也足够快。

I like this method using fileinput and the ‘inplace’ method:

import fileinput
for line in fileinput.input(fname, inplace =1):
    line = line.strip()
    if not 'UnwantedWord' in line:
        print(line)

It’s a little less wordy than the other answers and is fast enough.


回答 13

您可以使用re库

假设您能够一次性加载整个txt文件。然后定义一个不需要的昵称列表，并将它们替换为空字符串""。

# Delete unwanted characters
import re

# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)

You can use the re library

Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string “”.

# Delete unwanted characters
import re

# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
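Note that the snippet above only changes the in-memory string. A sketch of the remaining write-back step, adding `re.escape` to guard against nicknames that contain regex metacharacters; the file contents here are made up:

```python
import re

path_to_file = "nicknames.txt"  # illustrative path

# Build a demo input file.
with open(path_to_file, "w", encoding="utf-8") as f:
    f.write("SourDough\nAlice\n")

with open(path_to_file, encoding="utf-8") as f:
    text = f.read()

unwanted_nickname_list = ["SourDough"]
# re.escape protects nicknames containing regex metacharacters like '.' or '+'.
pattern = "|".join(re.escape(nick) for nick in unwanted_nickname_list)
text = re.sub(pattern, "", text)

# Persist the change back to disk.
with open(path_to_file, "w", encoding="utf-8") as f:
    f.write(text)
```

Also note that this removes the matched text but leaves the (now empty) line behind; dropping whole lines would need an extra filtering pass.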

回答 14

通过行号删除文件的特定行

将变量filenameline_to_delete替换为文件名和要删除的行号。

filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}

with open(filename) as f:
    content = f.readlines() 

for line in content:
    file_lines[initial_line] = line.strip()
    initial_line += 1

f = open(filename, "w")
for line_number, line_content in file_lines.items():
    if line_number != line_to_delete:
        f.write('{}\n'.format(line_content))

f.close()
print('Deleted line: {}'.format(line_to_delete))

输出示例:

Deleted line: 3

To delete a specific line of a file by its line number:

Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.

filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}

with open(filename) as f:
    content = f.readlines() 

for line in content:
    file_lines[initial_line] = line.strip()
    initial_line += 1

f = open(filename, "w")
for line_number, line_content in file_lines.items():
    if line_number != line_to_delete:
        f.write('{}\n'.format(line_content))

f.close()
print('Deleted line: {}'.format(line_to_delete))

Example output:

Deleted line: 3

回答 15

获取文件内容，按换行符将其拆分为一个元组。然后按行号去掉对应的元素，将结果重新拼接，并覆盖写回该文件。

Take the contents of the file, split it by newline into a tuple. Then, access your tuple’s line number, join your result tuple, and overwrite to the file.


仅读取特定行

问题:仅读取特定行

我正在使用for循环读取文件,但是我只想读取特定的行,例如26号和30号行。是否有内置功能可以实现这一目标?

谢谢

I’m using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?

Thanks


回答 0

如果要读取的文件很大，并且您不想一次性把整个文件读入内存：

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        # 26th line
    elif i == 29:
        # 30th line
    elif i > 29:
        break
fp.close()

注意，对于第n行，i == n-1。


在Python 2.6或更高版本中:

with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            # 26th line
        elif i == 29:
            # 30th line
        elif i > 29:
            break

If the file to read is big, and you don’t want to read the whole file in memory at once:

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        # 26th line
    elif i == 29:
        # 30th line
    elif i > 29:
        break
fp.close()

Note that i == n-1 for the nth line.


In Python 2.6 or later:

with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            # 26th line
        elif i == 29:
            # 30th line
        elif i > 29:
            break

回答 1

快速答案:

f=open('filename')
lines=f.readlines()
print lines[25]
print lines[29]

或者：

lines=[25, 29]
i=0
f=open('filename')
for line in f:
    if i in lines:
        print line
    i+=1

有一种提取多行的更优雅的解决方案：linecache（出自之前的stackoverflow.com问题“python: how to jump to a particular line in a huge text file?”）。

引用上面链接的python文档:

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

将4更改为所需的行号即可。注意linecache的行号从1开始计数，因此4返回的是第四行。

如果文件可能非常大，并且读入内存会造成问题，那么最好采纳@Alok的建议并使用enumerate()。

结论:

  • 对小文件，使用fileobject.readlines()或for line in fileobject作为快速解决方案。
  • 使用linecache则是更优雅的解决方案，在（可能反复）读取许多文件时也相当快。
  • 对可能非常大、无法放入内存的文件，采纳@Alok的建议使用enumerate()。注意此方法可能较慢，因为文件是按顺序读取的。

The quick answer:

f=open('filename')
lines=f.readlines()
print lines[25]
print lines[29]

or:

lines=[25, 29]
i=0
f=open('filename')
for line in f:
    if i in lines:
        print line
    i+=1

There is a more elegant solution for extracting many lines: linecache (courtesy of “python: how to jump to a particular line in a huge text file?”, a previous stackoverflow.com question).

Quoting the python documentation linked above:

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

Change the 4 to your desired line number, and you’re on. Note that linecache line numbers are one-based, so 4 returns the fourth line.

If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok’s advice and use enumerate().

To Conclude:

  • Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
  • Use linecache for a more elegant solution, which will be quite fast for reading many files, possible repeatedly.
  • Take @Alok’s advice and use enumerate() for files which could be very large, and won’t fit into memory. Note that using this method might slow because the file is read sequentially.

回答 2

一种快速而紧凑的方法可以是:

def picklines(thefile, whatlines):
  return [x for i, x in enumerate(thefile) if i in whatlines]

它接受任何打开的类文件对象thefile(无论是从磁盘文件中打开,还是应通过套接字或其他类似文件的流打开,都由调用者决定)和一组从零开始的行索引whatlines,并返回一个列表,具有较低的内存占用量和合理的速度。如果要返回的行数很大,则您可能更喜欢生成器:

def yieldlines(thefile, whatlines):
  return (x for i, x in enumerate(thefile) if i in whatlines)

后者基本上只适合用来循环。注意唯一的区别在于return语句中使用了圆括号而不是方括号，从而分别得到列表推导式和生成器表达式。

此外应注意，尽管提到了“行”和“文件”，这些函数其实通用得多：它们可以作用于任何可迭代对象（无论是打开的文件还是其他对象），按元素的序号返回由这些元素组成的列表（或生成器）。因此，我建议使用更贴切的通用名称;-)。

A fast and compact approach could be:

def picklines(thefile, whatlines):
  return [x for i, x in enumerate(thefile) if i in whatlines]

this accepts any open file-like object thefile (leaving up to the caller whether it should be opened from a disk file, or via e.g a socket, or other file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:

def yieldlines(thefile, whatlines):
  return (x for i, x in enumerate(thefile) if i in whatlines)

which is basically only good for looping upon — note that the only difference comes from using rounded rather than square parentheses in the return statement, making a list comprehension and a generator expression respectively.

Further note that despite the mention of “lines” and “file” these functions are much, much more general — they’ll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I’d suggest using more appropriately general names;-).


回答 3

为了提供另一个解决方案:

import linecache
linecache.getline('Sample.txt', Number_of_Line)

我希望这是方便快捷的:)

For the sake of offering another solution:

import linecache
linecache.getline('Sample.txt', Number_of_Line)

I hope this is quick and easy :)


回答 4

如果你要第7行

line = open("file.txt", "r").readlines()[7]

if you want line 7

line = open("file.txt", "r").readlines()[7]

回答 5

为了完整起见,这里还有一个选择。

让我们从python docs的定义开始:

切片（slice）：通常是包含序列某一部分的对象。切片使用下标符号[]创建，当给出多个数字时在数字之间用冒号分隔，例如variable_name[1:3:5]。方括号（下标）表示法在内部使用切片对象（在较早的版本中则是__getslice__()和__setslice__()）。

尽管切片符号通常不能直接用于迭代器，但itertools包中包含了替代函数：

from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print line

# print each third line until 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print line

该函数的另一个优点是，它不会把迭代器一直读到末尾。因此，您可以做更复杂的事情：

with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print line

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print line

并回答原始问题:

# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]

For the sake of completeness, here is one more option.

Let’s start with a definition from python docs:

slice An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or in older versions, __getslice__() and __setslice__()).

Though the slice notation is not directly applicable to iterators in general, the itertools package contains a replacement function:

from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print line

# print each third line until 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print line

The additional advantage of the function is that it does not read the iterator until the end. So you can do more complex things:

with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print line

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print line

And to answer the original question:

# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]

回答 6

读取文件的速度非常快。读取一个100MB的文件只需不到0.1秒（参见我的文章《Reading and Writing Files with Python》）。因此，您应该完整读取文件，然后再处理各行。

这里的大多数答案并没有错，但风格不好。打开文件应该始终使用with，因为这能确保文件最终被关闭。

因此,您应该这样做:

with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[26])  # or whatever you want to do with this line
print(lines[30])  # or whatever you want to do with this line

巨大的文件

如果碰巧有一个巨大的文件,而内存消耗是一个问题,则可以逐行处理它:

with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i

Reading files is incredibly fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the single lines.

What most answers here do is not wrong, but bad style. Opening files should always be done with with, as it makes sure that the file is closed again.

So you should do it like this:

with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[26])  # or whatever you want to do with this line
print(lines[30])  # or whatever you want to do with this line

Huge files

If you happen to have a huge file and memory consumption is a concern, you can process it line by line:

with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i

回答 7

其中一些很可爱,但是可以更简单地完成:

start = 0 # some starting index
end = 5000 # some ending index
filename = 'test.txt' # some file we want to use

with open(filename) as fh:
    data = fh.readlines()[start:end]

print(data)

这只用到了列表切片。它会加载整个文件，但大多数系统会适当地将内存使用降到最低；它比上面给出的大多数方法都快，并且在我的10G+数据文件上也能正常工作。祝好运！

Some of these are lovely, but it can be done much more simply:

start = 0 # some starting index
end = 5000 # some ending index
filename = 'test.txt' # some file we want to use

with open(filename) as fh:
    data = fh.readlines()[start:end]

print(data)

That will simply use list slicing. It loads the whole file, but most systems will minimise memory usage appropriately; it’s faster than most of the methods given above, and works on my 10G+ data files. Good luck!


回答 8

您可以调用seek()把读取位置定位到文件中的指定字节。但除非您确切知道目标行之前已写入了多少字节（字符），否则这帮不上忙。也许您的文件有严格的格式（每行都是X个字节？）；或者，如果您确实想提速，也可以自己统计字符数（记得把换行符等不可见字符算进去）。

否则,您必须按照此处已提出的许多解决方案之一,在需要的行之前先阅读每一行。

You can do a seek() call which positions your read head to a specified byte within the file. This won’t help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?) or, you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.

Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.
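One way to make `seek()` practical without fixed-width lines is to scan the file once, remembering each line's starting byte offset, and then jump directly on later reads. A sketch under those assumptions; the file name and contents are made up:

```python
# Demo input (file name is made up).
with open("indexed.txt", "w") as f:
    f.write("line one\nline two\nline three\n")

# One-time pass: record the byte offset where every line starts.
offsets = [0]
with open("indexed.txt", "rb") as f:  # binary mode: tell() works while iterating
    for line in f:
        offsets.append(f.tell())

def read_line(path, n):
    """Return the n-th line (0-based) by seeking straight to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode()
```

Binary mode is used because `tell()` is disabled while iterating a text-mode file; the one-time index makes every later lookup a single `seek` plus one `readline`.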


回答 9

如果大型文本文件file的结构严格（即每一行的长度都相同，均为l），则可以这样读取第n行：

with open(file) as f:
    f.seek(n*l)
    line = f.readline() 
    last_pos = f.tell()

免责声明：这仅适用于每行长度相同的文件！

If your large text file file is strictly well-structured (meaning every line has the same length l), you could use for n-th line

with open(file) as f:
    f.seek(n*l)
    line = f.readline() 
    last_pos = f.tell()

Disclaimer: this only works for files whose lines all have the same length!


回答 10

这个怎么样:

>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines):
      if i > 30: break
      if i == 26: dox()
      if i == 30: doy()

How about this:

>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines):
      if i > 30: break
      if i == 26: dox()
      if i == 30: doy()

回答 11

如果您不介意导入模块，那么fileinput正好能满足您的需要（也就是说，您可以读取当前行的行号）

If you don’t mind importing, then fileinput does exactly what you need (that is, you can read the line number of the current line).
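A small sketch of what that looks like; the file name and contents are made up, and `fileinput.lineno()` supplies the 1-based line number:

```python
import fileinput

# Demo input (file name is made up).
with open("sample.txt", "w") as f:
    f.write("first\nsecond\nthird\nfourth\n")

wanted = {2, 3}  # fileinput.lineno() is 1-based
found = []
for line in fileinput.input("sample.txt"):
    if fileinput.lineno() in wanted:
        found.append(line.rstrip("\n"))
    elif fileinput.lineno() > max(wanted):
        break  # stop early once past the last wanted line
fileinput.close()

print(found)  # ['second', 'third']
```
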


回答 12

def getitems(iterable, items):
  items = list(items) # get a list from any iterable and make our own copy
                      # since we modify it
  if items:
    items.sort()
    for n, v in enumerate(iterable):
      if n == items[0]:
        yield v
        items.pop(0)
        if not items:
          break

print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
def getitems(iterable, items):
  items = list(items) # get a list from any iterable and make our own copy
                      # since we modify it
  if items:
    items.sort()
    for n, v in enumerate(iterable):
      if n == items[0]:
        yield v
        items.pop(0)
        if not items:
          break

print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item

回答 13

我更喜欢这种方法，因为它更通用。也就是说，您可以把它用在文件对象、f.readlines()的结果、StringIO对象等任何可迭代对象上：

def read_specific_lines(file, lines_to_read):
   """file is any iterable; lines_to_read is an iterable containing int values"""
   lines = set(lines_to_read)
   last = max(lines)
   for n, line in enumerate(file):
      if n + 1 in lines:
          yield line
      if n + 1 > last:
          return

>>> with open(r'c:\temp\words.txt') as f:
        [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']

I prefer this approach because it’s more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:

def read_specific_lines(file, lines_to_read):
   """file is any iterable; lines_to_read is an iterable containing int values"""
   lines = set(lines_to_read)
   last = max(lines)
   for n, line in enumerate(file):
      if n + 1 in lines:
          yield line
      if n + 1 > last:
          return

>>> with open(r'c:\temp\words.txt') as f:
        [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']

回答 14

这是我的一点浅见，仅供参考;)

def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp   = open(filename, "r")
    src  = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
    print "Line: %s\nData: %s\n" % (line[0], line[1])

Here’s my little 2 cents, for what it’s worth ;)

def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp   = open(filename, "r")
    src  = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
    print "Line: %s\nData: %s\n" % (line[0], line[1])

回答 15

对Alok Singhal的答案的一个小改进

fp = open("file")
for i, line in enumerate(fp,1):
    if i == 26:
        # 26th line
    elif i == 30:
        # 30th line
    elif i > 30:
        break
fp.close()

A better and minor change for Alok Singhal’s answer

fp = open("file")
for i, line in enumerate(fp,1):
    if i == 26:
        # 26th line
    elif i == 30:
        # 30th line
    elif i > 30:
        break
fp.close()

回答 16

文件对象具有.readlines()方法,该方法将为您提供文件内容的列表,每个列表项一行。在那之后,您可以只使用常规的列表切片技术。

http://docs.python.org/library/stdtypes.html#file.readlines

File objects have a .readlines() method which will give you a list of the contents of the file, one line per list item. After that, you can just use normal list slicing techniques.

http://docs.python.org/library/stdtypes.html#file.readlines


回答 17

@OP，您可以使用enumerate

for n,line in enumerate(open("file")):
    if n+1 in [26,30]: # or n in [25,29] 
       print line.rstrip()

@OP, you can use enumerate

for n,line in enumerate(open("file")):
    if n+1 in [26,30]: # or n in [25,29] 
       print line.rstrip()

回答 18

file = '/path/to/file_to_be_read.txt'
with open(file) as f:
    lines = f.readlines()
    print lines[26]
    print lines[30]

使用with语句打开文件，将所有行读入列表后打印索引为26和30的两行，然后自动关闭文件。简单！注意readlines()只能调用一次，第二次调用会返回空列表，因此要先把结果保存到变量中。

file = '/path/to/file_to_be_read.txt'
with open(file) as f:
    lines = f.readlines()
    print lines[26]
    print lines[30]

Using the with statement, this opens the file, reads all lines into a list, prints the lines at indexes 26 and 30, then closes the file. Simple! Note that readlines() can only be called once; a second call returns an empty list, so save the result to a variable first.


回答 19

您可以使用已经有人提到过的这种语法非常简单地执行此操作,但这是迄今为止最简单的方法:

inputFile = open("lineNumbers.txt", "r")
lines = inputFile.readlines()
print (lines[0])
print (lines[2])

You can do this very simply with this syntax that someone already mentioned, but it’s by far the easiest way to do it:

inputFile = open("lineNumbers.txt", "r")
lines = inputFile.readlines()
print (lines[0])
print (lines[2])

回答 20

要打印第3行,

line_number = 3

with open(filename, "r") as file:
    current_line = 1
    for line in file:
        if current_line == line_number:
            print(line)
            break
        current_line += 1

原作者:弗兰克·霍夫曼

To print line# 3,

line_number = 3

with open(filename, "r") as file:
    current_line = 1
    for line in file:
        if current_line == line_number:
            print(line)
            break
        current_line += 1

Original author: Frank Hofmann


回答 21

简短直接。

在文本文件中打印特定的行：先创建一个“lines2print”列表，然后仅当枚举序号出现在lines2print列表中时才打印。要去掉多余的'\n'，请使用line.strip()或line.strip('\n')。我喜欢“列表推导式”，能用就用。我也喜欢用“with”方式读取文本文件，以防止文件因故一直保持打开状态。

lines2print = [26,30] # can be a big list and order doesn't matter.

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in lines2print]

或者，如果列表很小，直接把列表字面量写进推导式里即可。

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in [26,30]]

Fairly quick and to the point.

To print certain lines in a text file: create a “lines2print” list and then just print when the enumeration index is in the lines2print list. To get rid of the extra ‘\n’ use line.strip() or line.strip(‘\n’). I just like list comprehensions and try to use them when I can. I like the “with” method to read text files in order to prevent leaving a file open for any reason.

lines2print = [26,30] # can be a big list and order doesn't matter.

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in lines2print]

or if list is small just type in list as a list into the comprehension.

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in [26,30]]

回答 22

打印所需的行，或打印所需行上方/下方的行。

def dline(file, no, add_sub=0):
    tf = open(file)
    for sno, line in enumerate(tf):
        if sno == no - 1 + add_sub:
            print(line)
    tf.close()

执行方式：dline("D:\\dummy.txt", 6)，即dline("文件路径", 行号, add_sub)。add_sub为可选参数，默认值为0；传-1打印目标行的上一行，传1打印下一行。

To print the desired line, or the line above/below the required line.

def dline(file, no, add_sub=0):
    tf = open(file)
    for sno, line in enumerate(tf):
        if sno == no - 1 + add_sub:
            print(line)
    tf.close()

Execute it as dline("D:\\dummy.txt", 6), i.e. dline("file path", line_number, add_sub). The optional add_sub defaults to 0; pass -1 for the line above the searched line, or 1 for the line below.


回答 23

如果您想读取特定的行，例如从某个阈值行之后开始的行，则可以使用以下代码：

file = open("files.txt", "r")
lines = file.readlines()  ## convert to list of lines
datas = lines[11:]  ## read the specific lines

If you want to read specific lines, such as the lines starting after some threshold line, then you can use the following code:

file = open("files.txt", "r")
lines = file.readlines()  ## convert to list of lines
datas = lines[11:]  ## read the specific lines


回答 24

f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()
f = open(filename, 'r')

lineno = 1
while lineno < totalLines:
    line = f.readline()

    if lineno == 26:
        doLine26Commmand(line)

    elif lineno == 30:
        doLine30Commmand(line)

    lineno += 1
f.close()

回答 25

我认为这会工作

open_file1 = open("E:\\test.txt",'r')
read_it1 = open_file1.read()
myline1 = []
for line1 in read_it1.splitlines():
    myline1.append(line1)
print(myline1[0])

I think this would work

open_file1 = open("E:\\test.txt",'r')
read_it1 = open_file1.read()
myline1 = []
for line1 in read_it1.splitlines():
    myline1.append(line1)
print(myline1[0])

如何搜索和替换文件中的文本?

问题:如何搜索和替换文件中的文本?

如何使用Python 3搜索和替换文件中的文本?

这是我的代码:

import os
import sys
import fileinput

print ("Text to search for:")
textToSearch = input( "> " )

print ("Text to replace it with:")
textToReplace = input( "> " )

print ("File to perform Search-Replace on:")
fileToSearch  = input( "> " )
#fileToSearch = 'D:\dummy1.txt'

tempFile = open( fileToSearch, 'r+' )

for line in fileinput.input( fileToSearch ):
    if textToSearch in line :
        print('Match Found')
    else:
        print('Match Not Found!!')
    tempFile.write( line.replace( textToSearch, textToReplace ) )
tempFile.close()


input( '\n\n Press Enter to exit...' )

输入文件:

hi this is abcd hi this is abcd
This is dummy text file.
This is how search and replace works abcd

当我在上面的输入文件中搜索 'ram' 并替换为 'abcd' 时,效果很好。但反过来,即用 'ram' 替换 'abcd' 时,文件末尾会残留一些垃圾字符。

用“ ram”代替“ abcd”

hi this is ram hi this is ram
This is dummy text file.
This is how search and replace works rambcd

How do I search and replace text in a file using Python 3?

Here is my code:

import os
import sys
import fileinput

print ("Text to search for:")
textToSearch = input( "> " )

print ("Text to replace it with:")
textToReplace = input( "> " )

print ("File to perform Search-Replace on:")
fileToSearch  = input( "> " )
#fileToSearch = 'D:\dummy1.txt'

tempFile = open( fileToSearch, 'r+' )

for line in fileinput.input( fileToSearch ):
    if textToSearch in line :
        print('Match Found')
    else:
        print('Match Not Found!!')
    tempFile.write( line.replace( textToSearch, textToReplace ) )
tempFile.close()


input( '\n\n Press Enter to exit...' )

Input file:

hi this is abcd hi this is abcd
This is dummy text file.
This is how search and replace works abcd

When I search and replace 'ram' with 'abcd' in the above input file, it works like a charm. But when I do it vice versa, i.e. replacing 'abcd' with 'ram', some junk characters are left at the end.

Replacing ‘abcd’ by ‘ram’

hi this is ram hi this is ram
This is dummy text file.
This is how search and replace works rambcd

回答 0

fileinput 已经支持就地编辑。在这种情况下,它会将 stdout 重定向到该文件:

#!/usr/bin/env python3
import fileinput

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')

fileinput already supports inplace editing. It redirects stdout to the file in this case:

#!/usr/bin/env python3
import fileinput

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')
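A self-contained run of the snippet above (the file name and the search/replacement strings below are placeholders):

```python
import fileinput

filename = 'greeting.txt'
text_to_search = 'hello'
replacement_text = 'goodbye'

# Create an input file to operate on
with open(filename, 'w') as f:
    f.write('hello world\nhello again\n')

# inplace=True redirects print() into the file; backup='.bak' keeps a copy
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')

with open(filename) as f:
    print(f.read())  # prints "goodbye world" and "goodbye again"
```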

回答 1

正如 michaelb958 指出的那样,您不能在原处用长度不同的数据进行替换,因为这会使其余部分错位。我不同意其他建议从一个文件读取并写入另一个文件的回答者。相反,我会把文件读入内存,修正数据,然后在单独的步骤中将其写回同一文件。

# Read in the file
with open('file.txt', 'r') as file :
  filedata = file.read()

# Replace the target string
filedata = filedata.replace('ram', 'abcd')

# Write the file out again
with open('file.txt', 'w') as file:
  file.write(filedata)

除非要处理的文件太大而无法一次加载到内存中,或者您担心在第二步把数据写回文件的过程中进程被中断而导致数据丢失,否则这种方法都能很好地工作。

As pointed out by michaelb958, you cannot replace in place with data of a different length because this will put the rest of the sections out of place. I disagree with the other posters suggesting you read from one file and write to another. Instead, I would read the file into memory, fix the data up, and then write it out to the same file in a separate step.

# Read in the file
with open('file.txt', 'r') as file :
  filedata = file.read()

# Replace the target string
filedata = filedata.replace('ram', 'abcd')

# Write the file out again
with open('file.txt', 'w') as file:
  file.write(filedata)

This works well unless you've got a massive file that is too big to load into memory in one go, or you are concerned about potential data loss if the process is interrupted during the second step, in which you write data to the file.


回答 2

正如杰克·艾德利(Jack Aidley)张贴的文章和JF Sebastian指出的那样,此代码不起作用:

 # Read in the file
filedata = None
with file = open('file.txt', 'r') :
  filedata = file.read()

# Replace the target string
filedata.replace('ram', 'abcd')

# Write the file out again
with file = open('file.txt', 'w') :
  file.write(filedata)

但是此代码将起作用(我已经对其进行了测试):

f = open(filein,'r')
filedata = f.read()
f.close()

newdata = filedata.replace("old data","new data")

f = open(fileout,'w')
f.write(newdata)
f.close()

使用此方法,filein和fileout可以是同一文件,因为Python 3.3在打开进行写操作时会覆盖该文件。

As Jack Aidley had posted and J.F. Sebastian pointed out, this code will not work:

 # Read in the file
filedata = None
with file = open('file.txt', 'r') :
  filedata = file.read()

# Replace the target string
filedata.replace('ram', 'abcd')

# Write the file out again
with file = open('file.txt', 'w') :
  file.write(filedata)

But this code WILL work (I’ve tested it):

f = open(filein,'r')
filedata = f.read()
f.close()

newdata = filedata.replace("old data","new data")

f = open(fileout,'w')
f.write(newdata)
f.close()

Using this method, filein and fileout can be the same file, because Python 3.3 will overwrite the file upon opening for write.


回答 3

你可以这样替换

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
for line in f1:
    f2.write(line.replace('old_text', 'new_text'))
f1.close()
f2.close()

You can do the replacement like this

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
for line in f1:
    f2.write(line.replace('old_text', 'new_text'))
f1.close()
f2.close()

回答 4

您也可以使用pathlib

from pathlib2 import Path
path = Path(file_to_search)
text = path.read_text()
text = text.replace(text_to_search, replacement_text)
path.write_text(text)

You can also use pathlib.

from pathlib2 import Path
path = Path(file_to_search)
text = path.read_text()
text = text.replace(text_to_search, replacement_text)
path.write_text(text)

回答 5

使用单个with块,您可以搜索和替换文本:

with open('file.txt','r+') as f:
    filedata = f.read()
    filedata = filedata.replace('abc','xyz')
    f.seek(0)      # rewind first, otherwise the write lands at the old EOF
    f.truncate()
    f.write(filedata)

With a single with block, you can search and replace your text:

with open('file.txt','r+') as f:
    filedata = f.read()
    filedata = filedata.replace('abc','xyz')
    f.seek(0)      # rewind first, otherwise the write lands at the old EOF
    f.truncate()
    f.write(filedata)

回答 6

您的问题源于对同一文件既读又写。不要打开 fileToSearch 进行写入,而应打开一个真正的临时文件;完成并关闭 tempFile 之后,再使用 os.rename 将新文件移动到 fileToSearch 上。

Your problem stems from reading from and writing to the same file. Rather than opening fileToSearch for writing, open an actual temporary file and then after you’re done and have closed tempFile, use os.rename to move the new file over fileToSearch.
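The answer describes the fix without code; a minimal sketch might look like the following (the helper name replace_in_file is mine, and os.replace is used instead of os.rename so an existing target is also overwritten on Windows):

```python
import os
import tempfile

def replace_in_file(path, old, new):
    """Stream the file line by line into a temp file, then swap it in."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Keep the temp file on the same filesystem so the final rename is atomic
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, 'w') as tmp, open(path) as src:
        for line in src:
            tmp.write(line.replace(old, new))
    os.replace(tmp_path, path)

with open('demo.txt', 'w') as f:
    f.write('hi this is abcd\n')
replace_in_file('demo.txt', 'abcd', 'ram')
```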


回答 7

(pip install python-util)

from pyutil import filereplace

filereplace("somefile.txt","abcd","ram")

第二个参数(要替换的内容,例如 "abcd")也可以是正则表达式。
将替换所有出现之处。

(pip install python-util)

from pyutil import filereplace

filereplace("somefile.txt","abcd","ram")

The second parameter (the thing to be replaced, e.g. "abcd") can also be a regex.
Will replace all occurrences.


回答 8

我的变体,在整个文件上一次一个字。

我将其读入内存。

import os
import sys

def replace_word(infile,old_word,new_word):
    if not os.path.isfile(infile):
        print ("Error on replace_word, not a regular file: "+infile)
        sys.exit(1)

    f1=open(infile,'r').read()
    f2=open(infile,'w')
    m=f1.replace(old_word,new_word)
    f2.write(m)
    f2.close()

My variant, one word at a time on the entire file.

I read it into memory.

import os
import sys

def replace_word(infile,old_word,new_word):
    if not os.path.isfile(infile):
        print ("Error on replace_word, not a regular file: "+infile)
        sys.exit(1)

    f1=open(infile,'r').read()
    f2=open(infile,'w')
    m=f1.replace(old_word,new_word)
    f2.write(m)
    f2.close()

回答 9

我已经做到了:

#!/usr/bin/env python3

import fileinput
import os

Dir = input ("Source directory: ")
os.chdir(Dir)

Filelist = os.listdir()
print('File list: ',Filelist)

NomeFile = input ("Insert file name: ")

CarOr = input ("Text to search: ")

CarNew = input ("New text: ")

with fileinput.FileInput(NomeFile, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(CarOr, CarNew), end='')


I have done this:

#!/usr/bin/env python3

import fileinput
import os

Dir = input ("Source directory: ")
os.chdir(Dir)

Filelist = os.listdir()
print('File list: ',Filelist)

NomeFile = input ("Insert file name: ")

CarOr = input ("Text to search: ")

CarNew = input ("New text: ")

with fileinput.FileInput(NomeFile, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(CarOr, CarNew), end='')


回答 10

我稍微修改了 Jayram Singh 的帖子,把每个 '!' 字符替换成一个随每次出现而递增的数字。对于想修改每行出现多次的字符并进行迭代的人,这可能会有帮助。希望能对某人有所帮助。PS:我对编码非常陌生,如果我的帖子有任何不妥之处,我深表歉意,但这对我有用。

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
n = 1  

# if word=='!'replace w/ [n] & increment n; else append same word to     
# file2

for line in f1:
    for word in line:
        if word == '!':
            f2.write(word.replace('!', f'[{n}]'))
            n += 1
        else:
            f2.write(word)
f1.close()
f2.close()

I modified Jayram Singh's post slightly in order to replace every instance of a '!' character with a number which I wanted to increment with each instance. Thought it might be helpful to someone who wanted to modify a character that occurred more than once per line and wanted to iterate. Hope that helps someone. PS- I'm very new at coding so apologies if my post is inappropriate in any way, but this worked for me.

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
n = 1  

# if word=='!'replace w/ [n] & increment n; else append same word to     
# file2

for line in f1:
    for word in line:
        if word == '!':
            f2.write(word.replace('!', f'[{n}]'))
            n += 1
        else:
            f2.write(word)
f1.close()
f2.close()

回答 11

def word_replace(filename,old,new):
    c=0
    with open(filename,'r+',encoding ='utf-8') as f:
        a=f.read()
        b=a.split()
        for i in range(0,len(b)):
            if b[i]==old:
                c=c+1
        old=old.center(len(old)+2)
        new=new.center(len(new)+2)
        d=a.replace(old,new,c)
        f.truncate(0)
        f.seek(0)
        f.write(d)
    print('All words have been replaced!!!')

回答 12

像这样:

def find_and_replace(file, word, replacement):
  with open(file, 'r+') as f:
    text = f.read()
    f.seek(0)      # rewind so the write replaces rather than appends
    f.write(text.replace(word, replacement))
    f.truncate()   # drop any leftover tail if the replacement is shorter

Like so:

def find_and_replace(file, word, replacement):
  with open(file, 'r+') as f:
    text = f.read()
    f.seek(0)      # rewind so the write replaces rather than appends
    f.write(text.replace(word, replacement))
    f.truncate()   # drop any leftover tail if the replacement is shorter

回答 13

def findReplace(find, replace):

    import os

    src = os.path.join(os.getcwd(), os.pardir)

    for path, dirs, files in os.walk(os.path.abspath(src)):
        for name in files:
            if name.endswith('.py'):
                filepath = os.path.join(path, name)
                with open(filepath) as f:
                    s = f.read()
                s = s.replace(find, replace)
                with open(filepath, "w") as f:
                    f.write(s)

如何打开文件进行读写?

问题:如何打开文件进行读写?

有没有办法打开文件进行读写?

解决方法是,打开文件进行写入,将其关闭,然后再次打开以进行读取。但是,有没有办法打开一个文件阅读和写作?

Is there a way to open a file for both reading and writing?

As a workaround, I open the file for writing, close it, then open it again for reading. But is there a way to open a file for both reading and writing?


回答 0

在不关闭和重新打开的情况下,读取文件然后写入文件(覆盖所有现有数据)的方法如下:

with open(filename, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(output)
    f.truncate()

Here’s how you read a file, and then write to it (overwriting any existing data), without closing and reopening:

with open(filename, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(output)
    f.truncate()

回答 1

r+是同时读取和写入的规范模式。这与使用fopen()系统调用没有什么不同,因为file()/ open()只是围绕此操作系统调用的一个小包装。

r+ is the canonical mode for reading and writing at the same time. This is not different from using the fopen() system call since file() / open() is just a tiny wrapper around this operating system call.
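A short, self-contained illustration of the read-then-write pattern with r+ (the file name is illustrative):

```python
# Create a file to work with
with open('notes.txt', 'w') as f:
    f.write('alpha\n')

# One handle, opened 'r+', serves both directions
with open('notes.txt', 'r+') as f:
    existing = f.read()   # pointer starts at 0, so this reads everything
    f.write('beta\n')     # pointer is now at EOF, so this appends

with open('notes.txt') as f:
    print(f.read())  # prints "alpha" then "beta"
```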


回答 2

总结 I/O 行为

|          Mode          |  r   |  r+  |  w   |  w+  |  a   |  a+  |
| :--------------------: | :--: | :--: | :--: | :--: | :--: | :--: |
|          Read          |  +   |  +   |      |  +   |      |  +   |
|         Write          |      |  +   |  +   |  +   |  +   |  +   |
|         Create         |      |      |  +   |  +   |  +   |  +   |
|         Cover          |      |      |  +   |  +   |      |      |
| Point in the beginning |  +   |  +   |  +   |  +   |      |      |
|    Point in the end    |      |      |      |      |  +   |  +   |

以及选择模式的决策分支(原回答中为一张流程图,此处从略)

Summarize the I/O behaviors

|          Mode          |  r   |  r+  |  w   |  w+  |  a   |  a+  |
| :--------------------: | :--: | :--: | :--: | :--: | :--: | :--: |
|          Read          |  +   |  +   |      |  +   |      |  +   |
|         Write          |      |  +   |  +   |  +   |  +   |  +   |
|         Create         |      |      |  +   |  +   |  +   |  +   |
|         Cover          |      |      |  +   |  +   |      |      |
| Point in the beginning |  +   |  +   |  +   |  +   |      |      |
|    Point in the end    |      |      |      |      |  +   |  +   |

and the decision branch for choosing a mode (a flowchart image in the original answer, not reproduced here)
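A couple of the table's rows can be checked in code (the probe file name is arbitrary):

```python
import io
import os

path = 'probe.txt'
if os.path.exists(path):
    os.remove(path)

# Row 'w+': creates the file, allows writing and, after a seek, reading
with open(path, 'w+') as f:
    f.write('data')
    f.seek(0)            # rewind before reading back
    readback = f.read()

# Row 'r': reading works, writing raises io.UnsupportedOperation
with open(path, 'r') as f:
    try:
        f.write('x')
        writable = True
    except io.UnsupportedOperation:
        writable = False

print(readback, writable)  # data False
```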


回答 3

我已经尝试过类似的方法,并且可以按预期工作:

f = open("c:\\log.log", 'r+b')   # 'r+b' 要求文件已经存在
f.write(b"\x5F\x9D\x3E")         # Python 3 中二进制模式需要 bytes 字面量
f.read(100)
f.close()

其中:

f.read(size):要读取文件的内容,请调用 f.read(size),它会读取一定数量的数据并将其作为字符串返回。

另外:

f.write(string) 将 string 的内容写入文件,返回 None。

另外,如果您打开 Python 教程中关于读写文件的部分,则会发现:

'r+' 打开文件以供读取和写入。

在 Windows 上,在模式后附加 'b' 会以二进制模式打开文件,因此也有 'rb'、'wb' 和 'r+b' 之类的模式。

I have tried something like this and it works as expected:

f = open("c:\\log.log", 'r+b')   # the file must already exist for 'r+b'
f.write(b"\x5F\x9D\x3E")         # binary mode needs a bytes literal in Python 3
f.read(100)
f.close()

Where:

f.read(size) – To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string.

And:

f.write(string) writes the contents of string to the file, returning None.

Also if you open Python tutorial about reading and writing files you will find that:

‘r+’ opens the file for both reading and writing.

On Windows, ‘b’ appended to the mode opens the file in binary mode, so there are also modes like ‘rb’, ‘wb’, and ‘r+b’.


仅读取文件的第一行?

问题:仅读取文件的第一行?

如何使用Python仅将文件的第一行作为字符串?

How would you get only the first line of a file as a string with Python?


回答 0

使用 .readline() 方法(Python 2 文档、Python 3 文档):

with open('myfile.txt') as f:
    first_line = f.readline()

一些注意事项:

  1. 如文档中所述,除非它是文件中的唯一一行,否则 f.readline() 返回的字符串将包含尾随换行符。您可能希望改用 f.readline().strip() 来删除换行符。
  2. with语句在块结束时自动再次关闭文件。
  3. with语句仅在Python 2.5及更高版本中有效,而在Python 2.5中,您需要使用from __future__ import with_statement
  4. 在Python 3中,您应该为打开的文件指定文件编码。阅读更多…

Use the .readline() method (Python 2 docs, Python 3 docs):

with open('myfile.txt') as f:
    first_line = f.readline()

Some notes:

  1. As noted in the docs, unless it is the only line in the file, the string returned from f.readline() will contain a trailing newline. You may wish to use f.readline().strip() instead to remove the newline.
  2. The with statement automatically closes the file again when the block ends.
  3. The with statement only works in Python 2.5 and up, and in Python 2.5 you need to use from __future__ import with_statement
  4. In Python 3 you should specify the file encoding for the file you open. Read more…
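Note 1 in action (the file is created here so the snippet runs on its own):

```python
with open('myfile.txt', 'w') as f:
    f.write('first line\nsecond line\n')

with open('myfile.txt') as f:
    raw = f.readline()            # 'first line\n' -- newline included
    f.seek(0)                     # rewind to read the same line again
    clean = f.readline().strip()  # 'first line'  -- newline removed
```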

回答 1

infile = open('filename.txt', 'r')
firstLine = infile.readline()

回答 2

fline=open("myfile").readline().rstrip()

回答 3

应该这样做:

f = open('myfile.txt')
first = f.readline()

This should do it:

f = open('myfile.txt')
first = f.readline()

回答 4

要回到已打开文件的开头,然后返回第一行,请执行以下操作:

my_file.seek(0)
first_line = my_file.readline()

To go back to the beginning of an open file and then return the first line, do this:

my_file.seek(0)
first_line = my_file.readline()

回答 5

first_line = next(open(filename))

回答 6

这里还有很多其他答案,但是要精确回答您所提出的问题(在@MarkAmery去编辑原始问题并更改含义之前):

>>> f = open('myfile.txt')
>>> data = f.read()
>>> # I'm assuming you had the above before asking the question
>>> first_line = data.split('\n', 1)[0]

换句话说,如果您已经读入了文件(如您所说),并且在内存中有一大块数据,那么要高效地从中获取第一行,请只对换行符执行一次 split(),并从结果列表中取第一个元素。

请注意,这不包括行末的 \n 字符,但我假设您本来也不需要它(单行文件甚至可能没有这个字符)。还要注意,尽管它很短很快,但确实会复制数据,因此对于非常大的内存块,您可能不会认为它"高效"。和往常一样,这取决于具体情况…

Lots of other answers here, but to answer precisely the question you asked (before @MarkAmery went and edited the original question and changed the meaning):

>>> f = open('myfile.txt')
>>> data = f.read()
>>> # I'm assuming you had the above before asking the question
>>> first_line = data.split('\n', 1)[0]

In other words, if you’ve already read in the file (as you said), and have a big block of data in memory, then to get the first line from it efficiently, do a split() on the newline character, once only, and take the first element from the resulting list.

Note that this does not include the \n character at the end of the line, but I’m assuming you don’t want it anyway (and a single-line file may not even have one). Also note that although it’s pretty short and quick, it does make a copy of the data, so for a really large blob of memory you may not consider it “efficient”. As always, it depends…


回答 7

f1 = open("input1.txt", "r")
print(f1.readline())

被 Python 文件模式 'w+' 混淆

问题:被 Python 文件模式 'w+' 混淆

文档

模式 'r+'、'w+' 和 'a+' 打开文件进行更新(请注意 'w+' 会截断文件)。在区分二进制文件和文本文件的系统上,在模式后追加 'b' 以二进制模式打开文件;在没有此区别的系统上,添加 'b' 无效。

w+:打开一个文件进行读写。如果文件存在,则覆盖现有文件。如果该文件不存在,则创建一个新文件以进行读写。

但是,如何读取以 w+ 打开的文件?

From the doc,

Modes ‘r+’, ‘w+’ and ‘a+’ open the file for updating (note that ‘w+’ truncates the file). Append ‘b’ to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the ‘b’ has no effect.

and here

w+ : Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

But, how to read a file open with w+?


回答 0

假设您像应该的那样用 with 语句打开文件。然后,您可以像下面这样从文件中读取内容:

with open('somefile.txt', 'w+') as f:
    # Note that f has now been truncated to 0 bytes, so you'll only
    # be able to read data that you write after this point
    f.write('somedata\n')
    f.seek(0)  # Important: return to the top of the file before reading, otherwise you'll just read an empty string
    data = f.read() # Returns 'somedata\n'

请注意 f.seek(0):如果忘记了它,f.read() 调用将尝试从文件末尾读取,并返回一个空字符串。

Let’s say you’re opening the file with a with statement like you should be. Then you’d do something like this to read from your file:

with open('somefile.txt', 'w+') as f:
    # Note that f has now been truncated to 0 bytes, so you'll only
    # be able to read data that you write after this point
    f.write('somedata\n')
    f.seek(0)  # Important: return to the top of the file before reading, otherwise you'll just read an empty string
    data = f.read() # Returns 'somedata\n'

Note the f.seek(0) — if you forget this, the f.read() call will try to read from the end of the file, and will return an empty string.


回答 1

这是打开文件的不同模式的列表:

  • r

    打开一个文件以供只读。文件指针放置在文件的开头。这是默认模式。

  • rb

    打开文件以仅以二进制格式读取。文件指针放置在文件的开头。

  • r+

    打开一个文件进行读取和写入。文件指针将位于文件的开头。

  • rb+

    打开一个文件,以二进制格式读取和写入。文件指针将位于文件的开头。

  • w

    打开仅用于写入的文件。如果文件存在,则覆盖该文件。如果该文件不存在,则创建一个新文件进行写入。

  • wb

    打开一个文件,仅以二进制格式写入。如果文件存在,则覆盖该文件。如果该文件不存在,则创建一个新文件进行写入。

  • w+

    打开一个文件进行读写。如果文件存在,则覆盖现有文件。如果该文件不存在,则创建一个新文件以进行读写。

  • wb+

    打开一个文件以进行二进制格式的读写。如果文件存在,则覆盖现有文件。如果该文件不存在,则创建一个新文件以进行读写。

  • a

    打开一个文件进行追加。如果文件存在,则文件指针位于文件的末尾,即文件处于追加模式。如果该文件不存在,则创建一个新文件进行写入。

  • ab

    打开文件以二进制格式追加。如果文件存在,则文件指针位于文件的末尾,即文件处于追加模式。如果该文件不存在,则创建一个新文件进行写入。

  • a+

    打开文件以进行追加和读取。如果文件存在,则文件指针位于文件的末尾,文件以追加模式打开。如果该文件不存在,则创建一个用于读取和写入的新文件。

  • ab+

    打开一个文件,以便以二进制格式追加和读取。如果文件存在,则文件指针位于文件的末尾,文件以追加模式打开。如果该文件不存在,则创建一个用于读取和写入的新文件。

Here is a list of the different modes of opening a file:

  • r

    Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.

  • rb

    Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.

  • r+

    Opens a file for both reading and writing. The file pointer will be at the beginning of the file.

  • rb+

    Opens a file for both reading and writing in binary format. The file pointer will be at the beginning of the file.

  • w

    Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

  • wb

    Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

  • w+

    Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

  • wb+

    Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

  • a

    Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

  • ab

    Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

  • a+

    Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.

  • ab+

    Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.


回答 2

Python中的所有文件模式

  • r 用于读取
  • r+ 打开以进行读写(无法截断文件)
  • w 用于写入
  • w+ 用于写入和读取(可以截断文件)
  • rb 用于读取二进制文件。文件指针放置在文件的开头。
  • rb+ 读取或写入二进制文件
  • wb+ 写入二进制文件
  • a+ 打开以进行追加
  • ab+ 打开一个文件,以二进制格式追加和读取。如果文件存在,则文件指针位于文件的末尾,文件以追加模式打开。
  • x 打开以进行独占创建,如果文件已存在则失败(Python 3)

All file modes in Python

  • r for reading
  • r+ opens for reading and writing (cannot truncate a file)
  • w for writing
  • w+ for writing and reading (can truncate a file)
  • rb for reading a binary file. The file pointer is placed at the beginning of the file.
  • rb+ reading or writing a binary file
  • wb+ writing a binary file
  • a+ opens for appending
  • ab+ Opens a file for both appending and reading in binary. The file pointer is at the end of the file if the file exists. The file opens in the append mode.
  • x open for exclusive creation, failing if the file already exists (Python 3)

回答 3

r 用于读取

w 用于写入

r+ 用于读/写而不删除原始内容(如果文件存在),否则引发异常

w+ 用于删除原始内容,然后读取/写入(如果文件存在),否则创建文件

例如,

>>> with open("file1.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file1.txt", "w+") as f:
...   f.write("c")
... 

$ cat file1.txt 
c$
>>> with open("file2.txt", "r+") as f:
...   f.write("ab\n")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'file2.txt'
>>> with open("file2.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file2.txt", "r+") as f:
...   f.write("c")
... 

$ cat file2.txt 
cb
$

r for read

w for write

r+ for read/write without deleting the original content if file exists, otherwise raise exception

w+ for delete the original content then read/write if file exists, otherwise create the file

For example,

>>> with open("file1.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file1.txt", "w+") as f:
...   f.write("c")
... 

$ cat file1.txt 
c$
>>> with open("file2.txt", "r+") as f:
...   f.write("ab\n")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'file2.txt'

>>> with open("file2.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file2.txt", "r+") as f:
...   f.write("c")
... 

$ cat file2.txt 
cb
$

回答 4

该文件被截断,因此您可以调用read()(不会引发任何异常,与使用’w’打开时不同),但是您会得到一个空字符串。

The file is truncated, so you can call read() (no exceptions raised, unlike when opened using ‘w’) but you’ll get an empty string.
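Concretely (the file name is illustrative):

```python
# Give the file some prior contents
with open('wplus_demo.txt', 'w') as f:
    f.write('old contents\n')

# Opening with 'w+' truncates immediately, so read() succeeds but is empty
with open('wplus_demo.txt', 'w+') as f:
    emptied = f.read()

print(repr(emptied))  # ''
```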


回答 5

我怀疑有两种方法可以处理您认为要达到的目标。

1)很明显,就是打开文件以供只读,将其读入内存,然后用t打开文件,然后写入更改。

2)使用低级文件处理例程:

# Open file in RW , create if it doesn't exist. *Don't* pass O_TRUNC
 fd = os.open(filename, os.O_RDWR | os.O_CREAT)

希望这可以帮助..

I suspect there are two ways to handle what I think you’r trying to achieve.

1) which is obvious, is open the file for reading only, read it into memory then open the file with t, then write your changes.

2) use the low level file handling routines:

# Open file in RW , create if it doesn't exist. *Don't* pass O_TRUNC
 fd = os.open(filename, os.O_RDWR | os.O_CREAT)

Hope this helps..


回答 6

实际上,关于r+模式的所有其他答案都有问题。

test.in 文件内容:

hello1
ok2
byebye3

和py脚本的:

with open("test.in", 'r+')as f:
    f.readline()
    f.write("addition")

执行它,test.in的内容将更改为:

hello1
ok2
byebye3
addition

但是,当我们将脚本修改为:

with open("test.in", 'r+')as f:
    f.write("addition")

test.in 的内容会变成:

additionk2
byebye3

所以,如果我们不执行读取操作,r+ 模式会让我们从头覆盖内容;而如果我们先执行了读取操作,f.write() 就只会追加到文件末尾。

顺便说一下,如果我们在 f.write(write_content) 之前执行 f.seek(0,0),write_content 将从位置 (0,0) 开始覆盖原有内容。

Actually, there’s something wrong about all the other answers about r+ mode.

test.in file’s content:

hello1
ok2
byebye3

And the py script’s :

with open("test.in", 'r+')as f:
    f.readline()
    f.write("addition")

Execute it and the test.in‘s content will be changed to :

hello1
ok2
byebye3
addition

However, when we modify the script to :

with open("test.in", 'r+')as f:
    f.write("addition")

the test.in ends up as:

additionk2
byebye3

So, the r+ mode will let us overwrite the content from the beginning if we didn't do a read operation first. And if we do some read operation, f.write() will just append to the file.

By the way, if we f.seek(0,0) before f.write(write_content), the write_content will overwrite the file from position (0,0).
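The closing remark about f.seek(0,0) can be verified directly (a new file name is used here to avoid clobbering the example above):

```python
with open('test2.in', 'w') as f:
    f.write('hello1\nok2\n')

with open('test2.in', 'r+') as f:
    f.seek(0, 0)   # position (0, 0): offset 0 from the start
    f.write('X')   # overwrites exactly one character in place

with open('test2.in') as f:
    content = f.read()

print(content)  # prints "Xello1" then "ok2"
```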


回答 7

h4z3所述,为实际使用,有时您的数据太大而无法直接加载所有内容,或者您​​拥有生成器或实时传入的数据,则可以使用w +存储在文件中并在以后读取。

As mentioned by h4z3, For a practical use, Sometimes your data is too big to directly load everything, or you have a generator, or real-time incoming data, you could use w+ to store in a file and read later.


如何使用open with语句打开文件

问题:如何使用open with语句打开文件

我正在研究如何在 Python 中进行文件输入和输出。我编写了以下代码,从一个文件中逐行读取名称列表写入另一个文件,同时将某个名称与文件中的名称进行比对,并在匹配处附加文本。该代码有效。可以做得更好吗?

我想对输入文件和输出文件都使用 with open(... 语句,但看不出它们如何能位于同一个代码块中,这意味着我需要把名称存储在一个临时位置。

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    outfile = open(newfile, 'w')
    with open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

    outfile.close()
    return # Do I gain anything by including this?

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

I’m looking at how to do file input and output in Python. I’ve written the following code to read a list of names (one per line) from a file into another file while checking a name against the names in the file and appending text to the occurrences in the file. The code works. Could it be done better?

I’d wanted to use the with open(... statement for both input and output files but can’t see how they could be in the same block meaning I’d need to store the names in a temporary location.

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    outfile = open(newfile, 'w')
    with open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

    outfile.close()
    return # Do I gain anything by including this?

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

回答 0

Python 允许在一个 with 语句中放置多个 open() 调用,用逗号分隔。您的代码将是:

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    with open(newfile, 'w') as outfile, open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

另外,在函数末尾放置一个显式的 return 不会带来任何收益。您可以使用 return 提前退出,但函数在末尾没有 return 也会退出。(当然,对于有返回值的函数,您可以使用 return 来指定要返回的值。)

在引入 with 语句的 Python 2.5 以及 Python 2.6 中,不支持在一个 with 中使用多个 open() 项目;Python 2.7 和 Python 3.1 及更高版本才支持。

http://docs.python.org/reference/compound_stmts.html#the-with-statement http://docs.python.org/release/3.1/reference/compound_stmts.html#the-with-statement

如果要编写必须在 Python 2.5、2.6 或 3.0 中运行的代码,请像其他答案建议的那样嵌套 with 语句,或使用 contextlib.nested。

Python allows putting multiple open() statements in a single with. You comma-separate them. Your code would then be:

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    with open(newfile, 'w') as outfile, open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

And no, you don’t gain anything by putting an explicit return at the end of your function. You can use return to exit early, but you had it at the end, and the function will exit without it. (Of course with functions that return a value, you use the return to specify the value to return.)

Using multiple open() items with with was not supported in Python 2.5 when the with statement was introduced, or in Python 2.6, but it is supported in Python 2.7 and Python 3.1 or newer.

http://docs.python.org/reference/compound_stmts.html#the-with-statement http://docs.python.org/release/3.1/reference/compound_stmts.html#the-with-statement

If you are writing code that must run in Python 2.5, 2.6 or 3.0, nest the with statements as the other answers suggested or use contextlib.nested.
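As a self-contained, runnable sketch of the comma-separated form (the file names and their contents below are invented for illustration):

```python
import os
import tempfile

# Set up a throwaway input file so the example can actually run.
tmpdir = tempfile.mkdtemp()
oldfile = os.path.join(tmpdir, "old.txt")
newfile = os.path.join(tmpdir, "new.txt")
with open(oldfile, "w", encoding="utf-8") as f:
    f.write("Ada\nGrace\n")

txt = "Ada"
# Both files are opened by one with statement; both are guaranteed to be
# closed when the block exits, even if the body raises an exception.
with open(newfile, "w") as outfile, open(oldfile, "r", encoding="utf-8") as infile:
    for line in infile:
        if line.startswith(txt):
            line = line[0:len(txt)] + " - Truly a great person!\n"
        outfile.write(line)

with open(newfile, "r", encoding="utf-8") as f:
    print(f.read())
# Ada - Truly a great person!
# Grace
```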


Answer 1


Use nested blocks like this,

with open(newfile, 'w') as outfile:
    with open(oldfile, 'r', encoding='utf-8') as infile:
        # your logic goes right here

Answer 2


You can nest your with blocks. Like this:

with open(newfile, 'w') as outfile:
    with open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

This is better than your version because you guarantee that outfile will be closed even if your code encounters exceptions. Obviously you could do that with try/finally, but with is the right way to do this.

Or, as I have just learnt, you can have multiple context managers in a with statement as described by @steveha. That seems to me to be a better option than nesting.

And for your final minor question, the return serves no real purpose. I would remove it.
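That close-on-exception guarantee can be seen directly; in this sketch the write is interrupted partway through, yet the file object still ends up closed:

```python
import tempfile

# A throwaway file path for the demonstration.
tmp = tempfile.NamedTemporaryFile(mode="w", delete=False)
tmp.close()

try:
    with open(tmp.name, "w") as outfile:
        outfile.write("partial line")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

# The with statement closed the file on the way out of the block,
# just as an explicit try/finally calling outfile.close() would have.
print(outfile.closed)  # True
```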


Answer 3


Sometimes you might want to open a variable number of files and treat each one the same; you can do this with contextlib:

from contextlib import ExitStack
filenames = ['file1.txt', 'file2.txt', 'file3.txt']

with open('outfile.txt', 'a') as outfile:
    with ExitStack() as stack:
        file_pointers = [stack.enter_context(open(file, 'r')) for file in filenames]
        for fp in file_pointers:
            outfile.write(fp.read())
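Note that ExitStack requires Python 3.3 or newer. A runnable version of the same pattern, with the input files created up front so the sketch is self-contained (all file names here are illustrative):

```python
import os
import tempfile
from contextlib import ExitStack

# Create three small input files to concatenate.
tmpdir = tempfile.mkdtemp()
filenames = [os.path.join(tmpdir, n) for n in ("file1.txt", "file2.txt", "file3.txt")]
for i, name in enumerate(filenames, start=1):
    with open(name, "w") as f:
        f.write("part %d\n" % i)

outpath = os.path.join(tmpdir, "outfile.txt")
with open(outpath, "a") as outfile:
    with ExitStack() as stack:
        # enter_context registers each opened file with the stack, so all of
        # them are closed when the ExitStack exits -- even on an exception.
        file_pointers = [stack.enter_context(open(name, "r")) for name in filenames]
        for fp in file_pointers:
            outfile.write(fp.read())

with open(outpath, "r") as f:
    print(f.read())
# part 1
# part 2
# part 3
```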