Question: How to get all of the immediate subdirectories in Python
I’m trying to write a simple Python script that will copy index.tpl to index.html in all of the subdirectories (with a few exceptions).
I’m getting bogged down trying to get the list of subdirectories.
Answer 0
I did some speed testing on various functions that return the full paths of all immediate subdirectories.
tl;dr:
Always use scandir:
list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]
Bonus: with scandir you can also get only the folder names by using f.name instead of f.path.
This (as well as all other functions below) does not use natural sorting. This means results may come back ordered like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026
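A minimal sketch of natural sorting applied to the scandir result, assuming folder names mix text and digits. The helper names natural_key and subfolder_names_sorted are illustrative and are not taken from the linked answer.

```python
import os
import re
import tempfile


def natural_key(name):
    """Split a name into text and integer chunks so that "2" sorts before "10"."""
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def subfolder_names_sorted(path):
    """Return the names of the immediate subdirectories of path, naturally sorted."""
    return sorted((f.name for f in os.scandir(path) if f.is_dir()), key=natural_key)


if __name__ == "__main__":
    # Throwaway demo directory containing folders named 1, 10 and 2.
    with tempfile.TemporaryDirectory() as root:
        for name in ("1", "10", "2"):
            os.mkdir(os.path.join(root, name))
        print(subfolder_names_sorted(root))
```

Because re.split with a capturing group always alternates text and digit chunks, the keys compare text with text and numbers with numbers, so mixed names such as dir2 and dir10 sort without type errors.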
Results:
scandir is 3x faster than walk, 32x faster than listdir (with filter), 35x faster than Pathlib, 36x faster than listdir and 37x (!) faster than glob.
Scandir: 0.977
Walk: 3.011
Listdir (filter): 31.288
Pathlib: 34.075
Listdir: 35.501
Glob: 36.277
Tested with W7x64, Python 3.8.1, on a folder with 440 subfolders. Times are total seconds for 1000 calls of each function.
In case you wonder whether listdir could be sped up by not doing os.path.join() twice: yes, but the difference is basically nonexistent.
Code:
import os
import pathlib
import timeit
import glob

path = r"<example_path>"

def a():
    list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]
    # print(len(list_subfolders_with_paths))

def b():
    list_subfolders_with_paths = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))]
    # print(len(list_subfolders_with_paths))

def c():
    list_subfolders_with_paths = []
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            list_subfolders_with_paths.append( os.path.join(root, dir) )
        break
    # print(len(list_subfolders_with_paths))

def d():
    list_subfolders_with_paths = glob.glob(path + '/*/')
    # print(len(list_subfolders_with_paths))

def e():
    list_subfolders_with_paths = list(filter(os.path.isdir, [os.path.join(path, f) for f in os.listdir(path)]))
    # print(len(list(list_subfolders_with_paths)))

def f():
    p = pathlib.Path(path)
    list_subfolders_with_paths = [x for x in p.iterdir() if x.is_dir()]
    # print(len(list_subfolders_with_paths))

print(f"Scandir: {timeit.timeit(a, number=1000):.3f}")
print(f"Listdir: {timeit.timeit(b, number=1000):.3f}")
print(f"Walk: {timeit.timeit(c, number=1000):.3f}")
print(f"Glob: {timeit.timeit(d, number=1000):.3f}")
print(f"Listdir (filter): {timeit.timeit(e, number=1000):.3f}")
print(f"Pathlib: {timeit.timeit(f, number=1000):.3f}")
Answer 1
import os
def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
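One way this function could feed the task from the question is sketched below. The skip parameter and the helper name copy_templates are assumptions made for illustration; the question does not say which directories are the exceptions.

```python
import os
import shutil


def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]


def copy_templates(top_dir, skip=()):
    """Copy index.tpl to index.html in each immediate subdirectory not listed in skip.

    skip is a hypothetical collection of directory names to leave untouched.
    Returns the sorted names of the directories where a copy was made.
    """
    copied = []
    for name in get_immediate_subdirectories(top_dir):
        if name in skip:
            continue
        src = os.path.join(top_dir, name, "index.tpl")
        # Directories without a template are silently ignored.
        if os.path.isfile(src):
            shutil.copy(src, os.path.join(top_dir, name, "index.html"))
            copied.append(name)
    return sorted(copied)
```

A call such as copy_templates(".", skip={"drafts"}) would then process every direct child of the current directory except one named drafts.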
Answer 2
Why has no one mentioned glob? glob lets you use Unix-style pathname expansion, and is my go-to function for almost everything that needs to find more than one path name. It makes this very easy:
from glob import glob
paths = glob('*/')
Note that glob will return each directory with the final slash (as Unix would), while most path-based solutions will omit the final slash.
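The trailing-slash difference can be smoothed over with a small wrapper. This is only a sketch: the function name immediate_subdirs is made up here, and it assumes you want full paths without the trailing separator.

```python
import glob
import os
import tempfile


def immediate_subdirs(parent):
    """Return full paths of parent's immediate subdirectories, without a trailing separator."""
    # Build a pattern like "parent/*/"; escape() protects special characters in parent.
    pattern = os.path.join(glob.escape(parent), "*", "")
    return sorted(p.rstrip(os.sep) for p in glob.glob(pattern))


if __name__ == "__main__":
    # Throwaway demo directory with two folders and one file.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        open(os.path.join(root, "f.txt"), "w").close()
        print(immediate_subdirs(root))
```

Keep in mind that a plain * pattern does not match names starting with a dot, so hidden directories are skipped by this approach.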
Answer 3
Check out “Getting a list of all subdirectories in the current directory”.
Here’s a Python 3 version:
import os
dir_list = next(os.walk('.'))[1]
print(dir_list)
Answer 4
import os, os.path
To get (full-path) immediate sub-directories in a directory:
def SubDirPath(d):
    return filter(os.path.isdir, [os.path.join(d, f) for f in os.listdir(d)])
To get the latest (newest) sub-directory:
def LatestDirectory(d):
    return max(SubDirPath(d), key=os.path.getmtime)
Answer 5
os.walk is your friend in this situation.
Straight from the documentation:
walk() generates the file names in a directory tree, by walking the tree either top down or bottom up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
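Since only the first 3-tuple describes the top directory itself, the immediate subdirectories can be read from it without walking the rest of the tree. A small sketch follows; the function name immediate_subdirs_via_walk is just for illustration.

```python
import os


def immediate_subdirs_via_walk(top):
    """Return the names of top's immediate subdirectories using only the first os.walk() tuple."""
    # next() pulls the tuple for top itself. The default covers the case where
    # top does not exist or cannot be listed, in which os.walk() yields nothing.
    dirpath, dirnames, filenames = next(os.walk(top), (top, [], []))
    return sorted(dirnames)
```

Because os.walk() is a generator, stopping after the first tuple means the deeper levels are never scanned.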
Answer 6
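This method nicely does it all in one go.
from glob import glob
# parent_dir is assumed to end with "/"; otherwise the pattern also matches siblings such as "parent_dirX/"
subd = [s.rstrip("/") for s in glob(parent_dir+"*/")]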
Answer 7
Using Twisted’s FilePath module:
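from twisted.python.filepath import FilePath
def subdirs(pathObj):
    # walk() descends recursively and also yields pathObj itself,
    # so this yields directories at every depth, not only direct children.
    for subpath in pathObj.walk():
        if subpath.isdir():
            yield subpath
if __name__ == '__main__':
    for subdir in subdirs(FilePath(".")):
        print("Subdirectory:", subdir)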
Since some commenters have asked what the advantages of using Twisted’s libraries for this are, I’ll go a bit beyond the original question here.
There’s some improved documentation in a branch that explains the advantages of FilePath; you might want to read that.
More specifically in this example: unlike the standard library version, this function can be implemented with no imports. The “subdirs” function is totally generic, in that it operates on nothing but its argument. In order to copy and move the files using the standard library, you need to depend on the “open” builtin, “listdir”, perhaps “isdir” or “os.walk” or “shutil.copy”. Maybe “os.path.join” too. Not to mention the fact that you need a string passed as an argument to identify the actual file. Let’s take a look at the full implementation which will copy each directory’s “index.tpl” to “index.html”:
def copyTemplates(topdir):
    for subdir in subdirs(topdir):
        tpl = subdir.child("index.tpl")
        if tpl.exists():
            tpl.copyTo(subdir.child("index.html"))
The “subdirs” function above can work on any FilePath-like object, which means, among other things, ZipPath objects. Unfortunately ZipPath is read-only right now, but it could be extended to support writing.
You can also pass your own objects for testing purposes. In order to test the os.path-using APIs suggested here, you have to monkey with imported names and implicit dependencies and generally perform black magic to get your tests to work. With FilePath, you do something like this:
class MyFakePath:
    def child(self, name):
        "Return an appropriate child object"
    def walk(self):
        "Return an iterable of MyFakePath objects"
    def exists(self):
        "Return true or false, as appropriate to the test"
    def isdir(self):
        "Return true or false, as appropriate to the test"
...
subdirs(MyFakePath(...))
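To make the testing idea concrete, here is a runnable sketch that needs no Twisted install. FakePath is a hypothetical in-memory test double that only borrows the walk() and isdir() method names; it assumes, as with FilePath.walk(), that a node yields itself before its descendants.

```python
def subdirs(path_obj):
    """Yield every directory found by path_obj.walk() (same shape as the function above)."""
    for subpath in path_obj.walk():
        if subpath.isdir():
            yield subpath


class FakePath:
    """In-memory stand-in exposing only the methods subdirs() relies on (hypothetical)."""

    def __init__(self, name, children=None):
        self.name = name
        # children is None for a file and a list (possibly empty) for a directory.
        self._children = children

    def isdir(self):
        return self._children is not None

    def walk(self):
        # Yield this node first, then every descendant, depth first.
        yield self
        for child in self._children or []:
            yield from child.walk()


# A tiny fake tree: two directories and two files under "top".
tree = FakePath("top", [
    FakePath("docs", [FakePath("index.tpl")]),
    FakePath("README"),
    FakePath("empty", []),
])
print([p.name for p in subdirs(tree)])
```

No filesystem is touched and nothing is patched: the function under test only sees the object it was handed.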
Answer 8
I just wrote some code to move VMware virtual machines around, and ended up using os.path and shutil to accomplish file copying between sub-directories.
import os
import shutil
def copy_client_files(file_src, file_dst):
    for file in os.listdir(file_src):
        print("Copying file: %s" % file)
        shutil.copy(os.path.join(file_src, file), os.path.join(file_dst, file))
It’s not terribly elegant, but it does work.
Answer 9
Here’s one way:
import os
import shutil
def copy_over(path, from_name, to_name):
    for path, dirname, fnames in os.walk(path):
        for fname in fnames:
            if fname == from_name:
                shutil.copy(os.path.join(path, from_name), os.path.join(path, to_name))
copy_over('.', 'index.tpl', 'index.html')
Answer 10
I have to mention the path.py library, which I use very often.
Fetching the immediate subdirectories becomes as simple as:
my_dir.dirs()
The full working example is:
from path import Path
my_directory = Path("path/to/my/directory")
subdirs = my_directory.dirs()
NB: my_directory can still be manipulated as a string, since Path is a subclass of string that also provides a bunch of useful methods for manipulating paths.
Answer 11
import os
def get_folders_in_directories_recursively(directory, index=0):
    folder_list = list()
    parent_directory = directory
    for path, subdirs, _ in os.walk(directory):
        if not index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)
        elif path[len(parent_directory):].count('/') + 1 == index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)
    return folder_list
The function above can be called as:
get_folders_in_directories_recursively(directory, index=1) -> gives the list of folders in the first level
get_folders_in_directories_recursively(directory) -> gives all the subfolders
Answer 12
import glob
import os
def child_dirs(path):
    cd = os.getcwd()  # save the current working directory
    os.chdir(path)  # change directory
    try:
        dirs = glob.glob("*/")  # get all the subdirectories
    finally:
        os.chdir(cd)  # always change back to the original working directory
    return dirs
The child_dirs function takes a path to a directory and returns a list of the immediate subdirectories in it.
dir
|
-- dir_1
-- dir_2
child_dirs('dir') -> ['dir_1/', 'dir_2/'] (glob keeps the trailing slash; the order is not guaranteed)
Answer 13
import pathlib
def list_dir(dir):
    path = pathlib.Path(dir)
    subdirs = []  # separate name, so the dir argument is not overwritten
    try:
        for item in path.iterdir():
            if item.is_dir():
                subdirs.append(item)
        return subdirs
    except FileNotFoundError:
        print('Invalid directory')
Answer 14
One-liner using pathlib:
list_subfolders_with_paths = [p for p in pathlib.Path(path).iterdir() if p.is_dir()]