Excluding directories in os.walk - Python 实用宝典

Question: Excluding directories in os.walk

I’m writing a script that descends into a directory tree (using os.walk()) and visits each file matching a certain file extension. However, some of the directory trees my tool will be used on contain subdirectories that in turn contain a LOT of stuff that is useless for the purposes of this script, so I figured I’d add an option for the user to specify a list of directories to exclude from the traversal.

This is easy enough with os.walk(). After all, it’s up to me to decide whether I actually want to visit the respective files / dirs yielded by os.walk() or just skip them. The problem is that if I have, for example, a directory tree like this:

root--
     |
     --- dirA
     |
     --- dirB
     |
     --- uselessStuff --
                       |
                       --- moreJunk
                       |
                       --- yetMoreJunk

and I want to exclude uselessStuff and all its children, os.walk() will still descend into all the (potentially thousands of) subdirectories of uselessStuff, which, needless to say, slows things down a lot. In an ideal world, I could tell os.walk() not to bother yielding any more children of uselessStuff, but to my knowledge there is no way of doing that (is there?).

Does anyone have an idea? Maybe there’s a third-party library that provides something like that?

Answer 0

Modifying dirs in-place will prune the (subsequent) files and directories visited by os.walk:

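import os

exclude = {'New folder', 'Windows', 'Desktop'}  # directory names to prune
for root, dirs, files in os.walk(top, topdown=True):  # top: root of the tree to walk
    # Slice assignment edits the list os.walk itself uses, so excluded subtrees are never entered.
    dirs[:] = [d for d in dirs if d not in exclude]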

From help(os.walk):

When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search…
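
To tie this back to the question, here is a minimal sketch of the pruning combined with the file-extension filter the asker describes. The function name iter_matching_files and its parameters are made up for illustration; only os.walk and the slice assignment come from the answer above.

```python
import os


def iter_matching_files(top, extension, exclude):
    """Yield paths under `top` whose names end in `extension`,
    without descending into directories whose names are in `exclude`.

    Hypothetical helper for illustration; not part of the os module.
    """
    exclude = set(exclude)
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the very list os.walk recurses from,
        # so excluded subtrees are never entered.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if name.endswith(extension):
                yield os.path.join(root, name)
```

For the tree in the question, a call such as list(iter_matching_files("root", ".txt", {"uselessStuff"})) would never enter uselessStuff, moreJunk, or yetMoreJunk. Note that the match is on the bare directory name, so a directory called uselessStuff is skipped at any depth; excluding one specific path instead would need a comparison against os.path.join(root, d).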

Answer 1

… an alternative form of @unutbu’s excellent answer that reads a little more directly, given that the intent is to exclude directories, at the cost of O(n**2) vs O(n) time.

(Making a copy of the dirs list with list(dirs) is required for correct execution)

# exclude = set([...])
for root, dirs, files in os.walk(top, topdown=True):
    [dirs.remove(d) for d in list(dirs) if d in exclude]
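
The need for the copy can be seen without touching the filesystem. The sketch below uses a made-up list of directory names standing in for dirs and compares removing while iterating over the same list against iterating over a copy.

```python
exclude = {"junk1", "junk2"}

# Buggy: removing from the list being iterated shifts the remaining items
# left, so the element right after each removed one is skipped.
buggy = ["a", "junk1", "junk2", "b"]
for d in buggy:
    if d in exclude:
        buggy.remove(d)

# Safe: iterate over a snapshot made with list(...), mutate the original.
safe = ["a", "junk1", "junk2", "b"]
for d in list(safe):
    if d in exclude:
        safe.remove(d)

print(buggy)  # ['a', 'junk2', 'b']  (junk2 was skipped and survives)
print(safe)   # ['a', 'b']
```

Inside an os.walk loop, the surviving entry would mean os.walk still descends into a directory that was meant to be excluded.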
