Iterating through directories with Python

Question: Iterating through directories with Python

I need to iterate through the subdirectories of a given directory and search for files. If I get a file I have to open it and change the content and replace it with my own lines.

I tried this:

import os

rootdir ='C:/Users/sid/Desktop/test'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file,'r')
        lines=f.readlines()
        f.close()
        f=open(file,'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()

but I am getting an error. What am I doing wrong?


Answer 0

The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:

import os
rootdir = 'C:/Users/sid/Desktop/test'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print os.path.join(subdir, file)

If you still get errors when running the above, please provide the error message.


Updated for Python 3

import os
rootdir = 'C:/Users/sid/Desktop/test'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
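
The question does not quote the error, but the most likely cause is that os.walk yields bare file names, so open(file) looks in the current working directory instead of in subdir. A minimal sketch of the corrected loop follows. The function name overwrite_files, the trailing newline on the replacement line, and the assumption that every file is plain text are mine, not part of the original code:

```python
import os


def overwrite_files(rootdir, new_line="No you are not\n"):
    """Replace every line of every file below rootdir with new_line.

    Hypothetical helper for illustration; assumes all files are text files.
    """
    changed = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk gives only the file name; join it with subdir
            # to get a path that open() can actually find.
            path = os.path.join(subdir, name)
            with open(path, "r") as f:
                lines = f.readlines()
            with open(path, "w") as f:
                for _ in lines:
                    f.write(new_line)
            changed.append(path)
    return changed
```

Calling overwrite_files('C:/Users/sid/Desktop/test') would then rewrite every file under that directory, so it is worth trying on a copy first.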

Answer 1

Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):

from pathlib import Path

rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]

# For absolute paths instead of paths relative to the current dir
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]

Since Python 3.5, the glob module also supports recursive file finding:

import os
from glob import iglob

rootdir_glob = 'C:/Users/sid/Desktop/test/**/*' # Note the added asterisks
# This will return absolute paths, because the pattern itself is absolute
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]

The file_list from either of the above approaches can be iterated over without the need for a nested loop:

for f in file_list:
    print(f) # Replace with desired operations
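
To connect this back to the question, here is a rough sketch of the pathlib approach that also performs the replacement. The helper names list_files and replace_lines are hypothetical, and the sketch assumes the files are text files readable with the default encoding:

```python
from pathlib import Path


def list_files(rootdir):
    """Return all regular files below rootdir, sorted for stable ordering."""
    return sorted(p for p in Path(rootdir).glob("**/*") if p.is_file())


def replace_lines(path, new_line="No you are not\n"):
    """Overwrite each existing line of one text file with new_line."""
    line_count = len(path.read_text().splitlines())
    path.write_text(new_line * line_count)


def replace_all(rootdir):
    """Apply replace_lines to every file found by list_files."""
    files = list_files(rootdir)
    for path in files:
        replace_lines(path)
    return files
```

Because each Path object already carries its full location, there is no need to join a directory and a file name by hand.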

Answer 2

From Python 3.5 onward, you can use the ** pattern with glob.iglob('path/**', recursive=True), which seems to be the most Pythonic solution, i.e.:

import glob, os

for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename): # filter dirs
        print(filename)

Output:

/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...

Notes:
1 – glob.iglob

glob.iglob(pathname, recursive=False)

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

2 – If recursive is True, the pattern '**' will match any files and zero or more directories and subdirectories.

3 – If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:

>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']

4 – You can also use Path.rglob(pattern) from the pathlib module, which is the same as calling Path.glob() with **/ added in front of the given relative pattern.
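
Note 4 appears to describe the rglob method of pathlib.Path objects rather than anything in the glob module. A small sketch of how it might be used is below; the function name find_files and its default pattern are illustrative assumptions:

```python
from pathlib import Path


def find_files(rootdir, pattern="*"):
    """Recursively collect regular files under rootdir that match pattern."""
    # Path.rglob(pattern) behaves like Path.glob("**/" + pattern)
    return sorted(p for p in Path(rootdir).rglob(pattern) if p.is_file())
```

For example, find_files('/pardadox-music', '*.mod') would return Path objects for the .mod files anywhere below that directory.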