Question: os.walk without digging into directories below

How do I limit os.walk to only return files in the directory I provide it?

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList

Answer 0

Use a walklevel helper function like this one:

import os

def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]

It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
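A variant of the same idea can measure depth with os.path.relpath instead of counting separators. The sketch below is only an illustration: walk_max_depth and its max_depth parameter are hypothetical names, not part of os. With max_depth=0 it yields only the starting directory, which is what the question asks for. Note that walklevel above, with its default level=1, also yields the immediate subdirectories; pass level=0 to get the top directory alone.

```python
import os
import tempfile

def walk_max_depth(top, max_depth=0):
    """Like os.walk(top), but descend at most max_depth levels below top."""
    top = os.path.abspath(top)
    for root, dirs, files in os.walk(top):
        yield root, dirs, files
        rel = os.path.relpath(root, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            del dirs[:]  # prune in place so os.walk stops descending here

# Build a small throwaway tree: tmp/top.txt and tmp/a/b/deep.txt
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "top.txt"), "w").close()
    open(os.path.join(tmp, "a", "b", "deep.txt"), "w").close()

    only_top = [os.path.relpath(r, tmp) for r, _, _ in walk_max_depth(tmp, 0)]
    one_down = [os.path.relpath(r, tmp) for r, _, _ in walk_max_depth(tmp, 1)]

print(only_top)  # directories visited with max_depth=0
print(one_down)  # directories visited with max_depth=1
```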


Answer 1

Don’t use os.walk.

Example:

import os

root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print(item)
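Applied to the function from the question, the listdir approach might look like the sketch below. The name list_top_level_files is hypothetical, and the _email_to_ call from the original method is left out because its behaviour is not shown in the question.

```python
import os
import tempfile

def list_top_level_files(dir_name, whitelist):
    """Return full paths of the files directly in dir_name whose extension is whitelisted."""
    matches = []
    for name in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, name)
        # isfile() needs the joined path, not the bare name from listdir()
        if os.path.isfile(path) and os.path.splitext(name)[1] in whitelist:
            matches.append(path)
    return matches

# Throwaway tree: keep.txt and skip.log at the top, nested.txt one level down
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("keep.txt", "skip.log", os.path.join("sub", "nested.txt")):
        open(os.path.join(tmp, rel), "w").close()
    found = [os.path.basename(p) for p in list_top_level_files(tmp, {".txt"})]

print(found)
```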

Answer 2

I think the solution is actually very simple: use break to do only the first iteration of the for loop (though there must be a more elegant way).

for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...

The first time you iterate over os.walk, it yields the tuple for the current directory; the next iteration yields the tuple for the next directory.

Take original script and just add a break.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList

Answer 3

The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().

The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))
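One detail worth handling when calling next() directly: os.walk yields nothing for a directory it cannot list, so a bare next() raises StopIteration in that case. A minimal sketch, assuming an empty result is an acceptable fallback (top_level is a hypothetical helper name):

```python
import os
import tempfile

def top_level(dir_name):
    """Return (root, dirs, files) for dir_name only, without descending."""
    # os.walk() ignores listing errors by default and then yields nothing,
    # so the second argument to next() supplies a fallback tuple.
    return next(os.walk(dir_name), (dir_name, [], []))

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    open(os.path.join(tmp, "sub", "b.txt"), "w").close()
    root, dirs, files = top_level(tmp)
    missing = top_level(os.path.join(tmp, "does-not-exist"))

print(dirs, files)             # entries directly under tmp
print(missing[1], missing[2])  # fallback for the missing directory
```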


Answer 4

You could use os.listdir(), which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.
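A minimal sketch of that approach, using the stat module to interpret st_mode. The function name split_entries is hypothetical; note that os.stat() follows symbolic links and raises OSError for a broken one, which this sketch does not handle.

```python
import os
import stat
import tempfile

def split_entries(dir_name):
    """Return (files, dirs): names directly in dir_name, classified with os.stat()."""
    files, dirs = [], []
    for name in sorted(os.listdir(dir_name)):
        # stat the joined path; the bare name would be resolved against the cwd
        mode = os.stat(os.path.join(dir_name, name)).st_mode
        if stat.S_ISDIR(mode):
            dirs.append(name)
        elif stat.S_ISREG(mode):
            files.append(name)
    return files, dirs

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    top_files, top_dirs = split_entries(tmp)

print(top_files, top_dirs)
```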


Answer 5

If you have more complex requirements than just the top directory (e.g. ignoring VCS directories), you can also modify the list of directories in place to prevent os.walk from recursing into them.

For example:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()

Note: be careful to mutate the list in place (dirs[:] = ...) rather than just rebind it (dirs = ...); os.walk cannot see a rebinding of the local name. Pruning this way only works with the default topdown=True.
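As a concrete illustration of in-place pruning, the sketch below skips a few version-control directories. SKIP_DIRS and walk_skipping are hypothetical names chosen for this example, and the ignore list is only a sample.

```python
import os
import tempfile

SKIP_DIRS = {".git", ".hg", ".svn"}  # hypothetical ignore list

def walk_skipping(top, skip=SKIP_DIRS):
    """Yield file paths under top, never descending into directories named in skip."""
    for root, dirs, files in os.walk(top):  # relies on the default topdown=True
        # Slice assignment mutates the list os.walk holds; "dirs = [...]" would
        # only rebind the local name and change nothing.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            yield os.path.join(root, name)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, ".git", "objects"))
    os.makedirs(os.path.join(tmp, "src"))
    for rel in (os.path.join(".git", "config"), os.path.join("src", "main.py"), "README"):
        open(os.path.join(tmp, rel), "w").close()
    kept = sorted(os.path.relpath(p, tmp) for p in walk_skipping(tmp))

print(kept)  # nothing from .git appears
```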


Answer 6

for path, dirs, files in os.walk('.'):
    print(path, dirs, files)
    del dirs[:]  # go only one level deep

Answer 7

The same idea with listdir, but shorter:

[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]

Answer 8

Felt like throwing my 2 pence in.

baselevel = len(rootdir.split(os.sep))  # os.sep instead of a hard-coded backslash
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split(os.sep))
    if curlevel <= baselevel + 1:
        pass  # do stuff

Answer 9

In Python 3, I was able to do this:

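import os
directory = "/path/to/files/"  # renamed so it does not shadow the built-in dir()

# Fetch the top-level entry once instead of walking twice
root, dirs, files = next(os.walk(directory))

# List all files immediately under this folder:
print(files)

# List all folders immediately under this folder:
print(dirs)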

Answer 10

Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

You can access the name of each entry via DirEntry.name, which is equivalent to the corresponding item in the output of os.listdir.
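A minimal sketch of the scandir approach applied to the question's whitelist filter. The function name top_level_files is hypothetical, and using scandir as a context manager assumes Python 3.6 or later.

```python
import os
import tempfile

def top_level_files(dir_name, whitelist):
    """Return names of the files directly in dir_name whose extension is whitelisted."""
    with os.scandir(dir_name) as entries:  # context-manager support needs Python 3.6+
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and os.path.splitext(entry.name)[1] in whitelist
        )

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
        open(os.path.join(tmp, rel), "w").close()
    names = top_level_files(tmp, {".txt"})

print(names)
```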


Answer 11

You could also do the following:

for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == dir_name:  # only true for files directly in dir_name
            pass  # code here

Answer 12

This is how I solved it

if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]

...

Answer 13

There is a catch when using listdir: it returns bare names, so the argument to os.path.isdir() must be a full path (the name joined to the directory), not the name alone. To pick subdirectories, do:

for dirname in os.listdir(rootdir):
    if os.path.isdir(os.path.join(rootdir, dirname)):
        print("I got a subdirectory: %s" % dirname)

The alternative is to change to the directory to do the testing without the os.path.join().


Answer 14

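You can use this snippet. Note that level counts the directories visited, not their depth, so only level = 1 restricts the walk to the top directory:

level = 1
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1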

Answer 15

Create a list of excludes and use fnmatch to skip the matching directories while processing the rest:

import fnmatch
import os

excludes = [r'a\*\b', r'c\d\e']  # raw strings, so \b is not read as a backspace escape
for root, directories, files in os.walk('Start_Folder'):
    nf_root = os.path.normpath(root)  # assumption: nf_root is the normalized root path
    if not any(fnmatch.fnmatch(nf_root, pattern) for pattern in excludes):
        ....
        do the process
        ....

The same works for 'includes':

if any(fnmatch.fnmatch(nf_root, pattern) for pattern in includes):

Answer 16

Why not simply use range and os.walk combined with zip? It is not the best solution, but it works too.

For example:

# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    pass  # logic stuff
# your later part

Works for me on Python 3.

Also, a break is simpler (see the answer from @Pieter).


Answer 17

A slight change to Alex’s answer, but using __next__():

print(next(os.walk('d:/'))[2]) or print(os.walk('d:/').__next__()[2])

where [2] selects files from the (root, dirs, files) tuple mentioned in other answers.


Answer 18

The root value changes for every directory os.walk finds. I solved that by checking whether root == dir_name. Note that this still traverses the whole tree; it only skips the processing for the other directories.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name:  # only true for the top-level folder
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList

Answer 19

import os

def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names