内容 隐藏

问题:如何列出目录的所有文件?

如何在Python中列出目录的所有文件并将其添加到list

How can I list all files of a directory in Python and add them to a list?


回答 0

os.listdir()将为您提供目录中的所有内容- 文件目录

如果只需要文件,则可以使用os.path以下方法将其过滤掉:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

或者,您也可以使用os.walk()它为访问的每个目录生成两个列表 – 为您拆分为文件目录。如果只需要顶层目录,可以在第一次生成目录时中断

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

os.listdir() will get you everything that’s in a directory – files and directories.

If you want just files, you could either filter this down using os.path:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

or you could use os.walk() which will yield two lists for each directory it visits – splitting into files and dirs for you. If you only want the top directory you can just break the first time it yields

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

回答 1

我更喜欢使用glob模块,因为它可以进行模式匹配和扩展。

import glob
print(glob.glob("/home/adam/*.txt"))

它将返回包含查询文件的列表:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*.txt"))

It will return a list with the queried files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

回答 2

使用Python 2和3获取文件列表


os.listdir()

如何获取当前目录中的所有文件(和目录)(Python 3)

以下是在Python 3中使用oslistdir()函数仅检索当前目录中文件的简单方法。进一步的探索将演示如何返回目录中的文件夹,但是您不会在子目录中拥有该文件,因此可以使用步行-稍后讨论)。

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

glob

我发现glob更容易选择相同类型或相同的文件。看下面的例子:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob 具有列表理解

import glob

mylist = [f for f in glob.glob("*.txt")]

glob 具有功能

该函数在参数中返回给定扩展名(.txt,.docx等)的列表。

import glob

def filebrowser(ext=""):
    "Returns files with an extension"
    return [f for f in glob.glob(f"*{ext}")]

x = filebrowser(".txt")
print(x)

>>> ['example.txt', 'fb.txt', 'intro.txt', 'help.txt']

glob 扩展先前的代码

该函数现在返回与您作为参数传递的字符串匹配的文件列表

import glob

def filesearch(word=""):
    """Returns a list with all files with the word/extension in it"""
    file = []
    for f in glob.glob("*"):
        if word[0] == ".":
            if f.endswith(word):
                file.append(f)
                return file
        elif word in f:
            file.append(f)
            return file
    return file

lookfor = "example", ".py"
for w in lookfor:
    print(f"{w:10} found => {filesearch(w)}")

输出

example    found => []
.py        found => ['search.py']

获取完整的路径名 os.path.abspath

正如您所注意到的,上面的代码中没有文件的完整路径。如果需要绝对路径,则可以使用os.path模块的另一个函数,_getfullpathname将从os.listdir()中获取的文件作为参数。还有其他完整路径的方法,稍后我们将进行检查(如mexmex所建议,我将_getfullpathname替换为abspath)。

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)

 >>> ['F:\\documenti\applications.txt', 'F:\\documenti\collections.txt']

使用以下命令获取所有子目录中文件类型的完整路径名 walk

我发现这对于在许多目录中查找内容非常有用,它帮助我找到了一个我不记得其名称的文件:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir():获取当前目录中的文件(Python 2)

在Python 2中,如果要在当前目录中列出文件列表,则必须将参数指定为“。”。或os.listdir方法中的os.getcwd()。

 import os
 arr = os.listdir('.')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

进入目录树

# Method 1
x = os.listdir('..')

# Method 2
x= os.listdir('/')

获取文件:os.listdir()在特定目录中(Python 2和3)

 import os
 arr = os.listdir('F:\\python')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

使用以下命令获取特定子目录的文件 os.listdir()

import os

x = os.listdir("./content")

os.walk('.') -当前目录

 import os
 arr = next(os.walk('.'))[2]
 print(arr)

 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.'))os.path.join('dir', 'file')

 import os
 arr = []
 for d,r,f in next(os.walk("F:\\_python")):
     for file in f:
         arr.append(os.path.join(r,file))

 for f in arr:
     print(files)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

next(os.walk('F:\\') -获取完整路径-列表理解

 [os.path.join(r,file) for r,d,f in next(os.walk("F:\\_python")) for file in f]

 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk -获取完整路径-子目录中的所有文件**

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() -仅获取txt文件

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)

 >>> ['work.txt', '3ebooks.txt']

使用glob获得的文件的完整路径

如果我需要文件的绝对路径:

from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

使用os.path.isfile列表,以避免目录

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

使用pathlib在Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

list comprehension

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

或者,使用pathlib.Path()代替pathlib.Path(".")

在pathlib.Path()中使用glob方法

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

使用os.walk获取所有文件

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

仅获取具有next的文件并进入目录

 import os
 x = next(os.walk('F://python'))[2]
 print(x)

 >>> ['calculator.bat','calculator.py']

仅获取具有next的目录并进入目录

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')

 >>> ['python3','others']

使用以下命令获取所有子目录名称 walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() 从Python 3.5及更高版本开始

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

例子:

例如 1:子目录中有多少个文件?

在此示例中,我们查找所有目录及其子目录中包含的文件数。

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + "files"

print(count("F:\\python"))

>>> 'F:\\\python' : 12057 files'

例2:如何将所有文件从一个目录复制到另一个目录?

用于在计算机中排序的脚本,以查找一种类型的所有文件(默认值:pptx)并将其复制到新文件夹中。

import os
import shutil
from path import path

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

例如 3:如何获取txt文件中的所有文件

如果要使用所有文件名创建一个txt文件,请执行以下操作:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

示例:包含硬盘驱动器所有文件的txt

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

C:\的所有文件都在一个文本文件中

这是先前代码的简短版本。如果您需要从另一个位置开始,请更改开始查找文件的文件夹。这段代码在我的计算机上的文本文件上生成了50 mb的内容,其中包含完整路径的文件少于500.000行。

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{r + file}\n")

如何在一个类型的文件夹中写入所有路径的文件

使用此功能,您可以创建一个txt文件,该文件将具有要查找的文件类型的名称(例如pngfile.txt),并带有该类型所有文件的所有完整路径。我认为有时候它会很有用。

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}\n")

# looking for png file (fonts) in the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png

(新)找到所有文件并使用tkinter GUI打开它们

我只是想在这个2019年添加一个小应用程序,以在目录中搜索所有文件,并能够通过双击列表中文件的名称来打开它们。 在此处输入图片说明

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Get a list of files with Python 2 and 3


os.listdir()

How to get all the files (and directories) in the current directory (Python 3)

Following, are simple methods to retrieve only files in the current directory, using os and the listdir() function, in Python 3. Further exploration, will demonstrate how to return folders in the directory, but you will not have the file in the subdirectory, for that you can use walk – discussed later).

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

glob

I found glob easier to select the file of the same type or with something in common. Look at the following example:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob with list comprehension

import glob

mylist = [f for f in glob.glob("*.txt")]

glob with a function

The function returns a list of the given extension (.txt, .docx ecc.) in the argument

import glob

def filebrowser(ext=""):
    "Returns files with an extension"
    return [f for f in glob.glob(f"*{ext}")]

x = filebrowser(".txt")
print(x)

>>> ['example.txt', 'fb.txt', 'intro.txt', 'help.txt']

glob extending the previous code

The function now returns a list of file that matched with the string you pass as argument

import glob

def filesearch(word=""):
    """Returns a list with all files with the word/extension in it"""
    file = []
    for f in glob.glob("*"):
        if word[0] == ".":
            if f.endswith(word):
                file.append(f)
                return file
        elif word in f:
            file.append(f)
            return file
    return file

lookfor = "example", ".py"
for w in lookfor:
    print(f"{w:10} found => {filesearch(w)}")

output

example    found => []
.py        found => ['search.py']

Getting the full path name with os.path.abspath

As you noticed, you don’t have the full path of the file in the code above. If you need to have the absolute path, you can use another function of the os.path module called _getfullpathname, putting the file that you get from os.listdir() as an argument. There are other ways to have the full path, as we will check later (I replaced, as suggested by mexmex, _getfullpathname with abspath).

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)

 >>> ['F:\\documenti\applications.txt', 'F:\\documenti\collections.txt']

Get the full path name of a type of file into all subdirectories with walk

I find this very useful to find stuff in many directories, and it helped me find a file about which I didn’t remember the name:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as ‘.’ or os.getcwd() in the os.listdir method.

 import os
 arr = os.listdir('.')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

To go up in the directory tree

# Method 1
x = os.listdir('..')

# Method 2
x= os.listdir('/')

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir('F:\\python')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') – current directory

 import os
 arr = next(os.walk('.'))[2]
 print(arr)

 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.')) and os.path.join('dir', 'file')

 import os
 arr = []
 for d,r,f in next(os.walk("F:\\_python")):
     for file in f:
         arr.append(os.path.join(r,file))

 for f in arr:
     print(files)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

next(os.walk('F:\\') – get the full path – list comprehension

 [os.path.join(r,file) for r,d,f in next(os.walk("F:\\_python")) for file in f]

 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk – get full path – all files in sub dirs**

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() – get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)

 >>> ['work.txt', '3ebooks.txt']

Using glob to get the full path of the files

If I should need the absolute path of the files:

from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory

 import os
 x = next(os.walk('F://python'))[2]
 print(x)

 >>> ['calculator.bat','calculator.py']

Get only directories with next and walk in a directory

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')

 >>> ['python3','others']

Get all the subdir names with walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + "files"

print(count("F:\\python"))

>>> 'F:\\\python' : 12057 files'

Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

import os
import shutil
from path import path

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of an hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the file of C:\ in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{r + file}\n")

How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}\n")

# looking for png file (fonts) in the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png

(New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list. enter image description here

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

回答 3

import os
os.listdir("somedirectory")

将返回“ somedirectory”中所有文件和目录的列表。

import os
os.listdir("somedirectory")

will return a list of all files and directories in “somedirectory”.


回答 4

一种获取文件列表(不包含子目录)的单行解决方案:

filenames = next(os.walk(path))[2]

或绝对路径名:

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]

A one-line solution to get only list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or absolute pathnames:

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]

回答 5

从目录及其所有子目录获取完整的文件路径

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory 
    tree by walking the tree either top-down or bottom-up. For each 
    directory in the tree rooted at directory top (including top itself), 
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • 我在上述函数中提供的路径包含3个文件-其中两个在根目录中,另一个在子文件夹“ SUBFOLDER”中。您现在可以执行以下操作:
  • print full_file_paths 这将打印列表:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

如果愿意,您可以打开和阅读内容,或仅关注扩展名为“ .dat”的文件,如以下代码所示:

for f in full_file_paths:
  if f.endswith(".dat"):
    print f

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat

Getting Full File Paths From a Directory and All Its Subdirectories

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory 
    tree by walking the tree either top-down or bottom-up. For each 
    directory in the tree rooted at directory top (including top itself), 
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files— two of them in the root directory, and another in a subfolder called “SUBFOLDER.” You can now do things like:
  • print full_file_paths which will print the list:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you’d like, you can open and read the contents, or focus only on files with the extension “.dat” like in the code below:

for f in full_file_paths:
  if f.endswith(".dat"):
    print f

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat


回答 6

从3.4版开始,有内置的迭代器,它比os.listdir()

pathlib版本3.4中的新功能。

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

根据PEP 428,该pathlib库的目的是提供一个简单的类层次结构,以处理文件系统路径以及用户对其进行的常见操作。

os.scandir()3.5版中的新功能。

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

请注意,os.walk()使用os.scandir()代替了os.listdir()3.5版,并且根据PEP 471将其速度提高了2-20倍。

我也建议您阅读下面的ShadowRanger评论。

Since version 3.4 there are builtin iterators for this which are a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed got increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger’s comment below.


回答 7

初步说明

  • 尽管文件目录问题文本中的术语,但有些人可能会认为目录实际上是特殊文件
  • 该声明: ”目录的所有文件 ”可以用两种方式解释:
    1. 所有直接(或1级)后代
    2. 整个目录树中的所有子代(包括子目录中的子代)
  • 提出问题时,我认为Python 2LTS版本,但是代码示例将由Python 3.5)运行(我将使其尽可能与Python 2兼容;此外,属于我要发布的Python来自v3.5.4-除非另有说明)。结果与问题中的另一个关键字相关:“ 将它们添加到列表中 ”:

    • Python 2.2之前的版本中版本中,序列(可迭代)主要由列表(元组,集合等)表示。
    • Python 2.2中引入生成器[Python.Wiki]:Generators)的概念-由[Python 3]:yield语句提供。随着时间的流逝,对于返回/使用列表的函数,生成器对应对象开始出现
    • Python 3中,generator是默认行为
    • 不知道返回列表是否仍然是强制性的(或者生成器也可以执行),但是将生成器传递给列表构造函数会从列表构造器中创建列表(并消耗列表)。以下示例说明了[Python 3]的区别mapfunction,iterable,…
    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3


    >>> import sys
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already consumed
    ([], <class 'list'>)
  • 这些示例将基于具有以下结构的名为root_dir的目录(此示例适用于Win,但我也在Lnx上使用同一棵树):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3


解决方案

程序化方法:

  1. [Python 3]:操作系统。listdirpath =’。’

    返回一个列表,其中包含由path给出的目录中条目的名称。列表按任意顺序排列,不包括特殊条目'.''..'


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
    ['file0', 'file1']

    一个更详细的示例(code_os_listdir.py):

    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()

    注意事项

    • 有两种实现:
      • 使用生成器的生成器(当然这里似乎没有用,因为我立即将结果转换为列表)
      • 经典的(函数名称以 _old
    • 使用递归(进入子目录)
    • 对于每种实现,都有两个功能:
      • 下划线 _):“ private”(不应直接调用)-完成所有工作
      • 公共的(包装上一个):它只是从返回的条目中剥离出初始路径(如果需要)。这是一个丑陋的实现,但这是我目前唯一能想到的想法
    • 在性能方面,生成器通常要快一些(考虑这两个创建迭代时间),但是我没有在递归函数中对其进行测试,而且我还在内部生成器上迭代了函数-不知道性能如何友好的是
    • 玩弄参数以获得不同的结果


    输出

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']


  1. [Python 3]:操作系统。scandirpath =’。’Python 3.5 +,反向移植:[PyPI]:scandir

    返回与path指定的目录中的条目相对应的os.DirEntry对象的迭代器。条目以任意顺序产生,特殊条目'.''..'不包括在内。

    使用scandir()而不是listdir()可以显着提高还需要文件类型或文件属性信息的代码的性能,因为如果操作系统在扫描目录时提供了os.DirEntry对象,则该信息会公开。所有的os.DirEntry方法都可以执行系统调用,但是is_dir()is_file()通常只需要系统调用即可进行符号链接。os.DirEntry.stat()在Unix上始终需要系统调用,而在Windows上只需要一个系统调用即可。


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator :
    ...     if os.path.isfile(item.path):
    ...             print(item.name)
    ...
    file0
    file1

    注意事项

    • 类似于 os.listdir
    • 但是它也更灵活(并提供更多功能),更多Python ic(在某些情况下更快)


  1. [Python 3]:操作系统。步行top,topdown = True,onerror = None,followlinks = False

    通过自上而下或自下而上移动目录树来生成文件名。对于在目录为根的树中的每个目录顶部(包括顶部本身),它产生一个3元组(dirpathdirnamesfilenames)。


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])

    注意事项

    • 在场景下,它使用os.scandiros.listdir在旧版本上)
    • 它通过重复子文件夹来完成繁重的工作


  1. [Python 3]:glob。globpathname,*,recursive = False[Python 3]:glob。iglobpathname,*,recursive = False

    返回与pathname匹配的路径名的可能为空的列表,该列表必须是包含路径说明的字符串。路径名可以是绝对的(如/usr/src/Python-1.5/Makefile)或相对的(如../../Tools/*/*.gif),并且可以包含壳式通配符。损坏的符号链接包含在结果中(如在shell中)。

    在版本3.5中更改:使用“ **” 支持递归glob 。


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    'root_dir\\*'
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from begining
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir\
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1

    注意事项

    • 用途 os.listdir
    • 对于大树(尤其是在启用递归的情况下),iglob首选
    • 允许基于名称进行高级过滤(由于通配符)


  1. [Python 3]:类pathlib。路径* pathsegmentsPython 3.4 +,backport:[PyPI]:pathlib2

    >>> import pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath('root_dir')
    >>> root_dir_instance.name
    'root_dir'
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ['root_dir\\file0', 'root_dir\\file1']

    注意事项

    • 这是实现我们目标的一种方式
    • 这是处理路径的OOP风格
    • 提供许多功能


  1. [Python 2]:dircache.listdir(path)(仅Python 2


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list


  1. [man7]:OPENDIR(3) / [man7]:READDIR(3) / [man7]:CLOSEDIR(3)通过[Python 3]:ctypes-Python的外部函数库(特定POSIX

    ctypes是Python的外部函数库。它提供C兼容的数据类型,并允许在DLL或共享库中调用函数。它可以用于将这些库包装在纯Python中。

    code_ctypes.py

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = this_process = CDLL(None, use_errno=True)
    # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
    opendir = libc_dll.opendir
    readdir = libc_dll.readdir
    closedir = libc_dll.closedir
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if (dir_stream == 0):
            print("opendir returned NULL (errno: {:d})".format(get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type & DT_DIR:
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type & DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno():
            print("readdir returned NULL (errno: {:d})".format(get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}\n".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()

    注意事项

    • 它从libc加载三个函数(在当前进程中加载​​)并调用它们(有关更多详细信息,请检查[SO]:如何检查文件是否存在无异常?(@ CristiFati的回答))第4项的最后注释)。这将使这种方法非常接近Python / C边缘
    • LinuxDirent64ctypes的代表性结构dirent64[man7]:dirent.h(0P) (因此是DT_从我的机器常量):Ubtu 16 644.10.0-40泛型libc6的-dev的:AMD64)。在其他口味/版本上,结构定义可能会有所不同,如果是,则应更新ctypes别名,否则将产生未定义行为
    • 它以os.walk格式返回数据。我没有麻烦使其递归,但是从现有代码开始,这将是一件相当琐碎的任务
    • 一切在Win上也是可行的,数据(库,函数,结构,常量等)不同


    输出

    [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
    3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    
    ['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]


  1. [ActiveState.Docs]:win32file.FindFilesW(特定Win

    使用Windows Unicode API检索匹配文件名的列表。API FindFirstFileW / FindNextFileW / Find关闭函数的接口。


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ['dir0', 'dir1', 'dir2', 'dir3']
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ['root_dir\\file0', 'root_dir\\file1']

    注意事项


  1. 安装一些(其他)第三方软件包即可
    • 最有可能会依赖于上述一项(或多项)(可能需要进行一些自定义)


注意事项

  • 代码应具有可移植性(针对特定区域的地方-带有标记的地方除外)或交叉的:

    • 平台(NixWin,)
    • Python版本(2、3,)
  • 在上述变体中使用了多种路径样式(绝对路径,相对路径),以说明以下事实:所使用的“工具”在此方向上是灵活的

  • os.listdiros.scandir使用opendir / readdir / closedir[MS.Docs]:FindFirstFileW函数 / [MS.Docs]:FindNextFileW函数 / [MS.Docs]:FindClose函数)(通过[GitHub]:python / cpython-(master)cpython /模块/posixmodule.c

  • win32file.FindFilesW也使用那些(特定于Win的)函数(通过[GitHub]:mhammond / pywin32-(主)pywin32 / win32 / src / win32file.i

  • _get_dir_content(从第1点开始)可以使用以下任何一种方法来实现(有些需要更多的工作,有些需要更少的工作)

    • 可以执行一些高级过滤(而不只是文件目录):例如,include_folders参数可以被另一个参数(例如filter_func)代替,该函数将以路径作为参数:(filter_func=lambda x: True这不会删除)内容)和_get_dir_content内的内容类似:(if not filter_func(entry_with_path): continue如果该函数对一项失败,则将被跳过),但是代码越复杂,执行所花费的时间就越长
  • 诺娜·贝恩!由于使用了递归,因此我必须提到我在笔记本电脑(Win 10 x64)上进行了一些测试,与该问题完全无关,并且当递归级别达到(990 .. 1000)范围内的某个值时(recursionlimit -1000 (默认)),我得到了StackOverflow :)。如果目录树超过该限制(我不是FS专家,那么我什至不知道那是否可能),那可能是个问题。
    我还必须提到,我没有尝试增加递归限制,因为我在该领域没有经验(必须增加OS的堆栈之前,我可以增加多少?级别),但从理论上讲,如果目录深度大于可能的最高递归限制(在该计算机上),则总有失败的可能性

  • 代码示例仅用于说明目的。这意味着我没有考虑错误处理(我不认为有任何尝试 / 除外 / else / finally块),因此代码并不健壮(原因是:使其尽可能简单和简短) )。对于生产,还应添加错误处理

其他方法:

  1. 仅将Python用作包装器

    • 一切都使用另一种技术完成
    • 该技术是从Python调用的
    • 我所知道的最著名的味道是我所说的系统管理员方法:

      • 采用 Python(或与此相关的任何编程语言)执行Shell命令(并解析其输出)
      • 有人认为这是一个很好的技巧
      • 我认为这更像是一种la脚的解决方法(gainarie),因为操作本身是从shell在这种情况下为cmd)执行的,因此与Python无关。
      • 过滤(grep/ findstr)或输出格式化都可以在两面进行,但我不会坚持这样做。另外,我故意使用os.system代替subprocess.Popen
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1

    通常应避免这种方法,因为如果某些命令输出格式在OS版本/风格之间略有不同,则解析代码也应进行调整;例如,更不用说语言环境之间的差异了)。

Preliminary notes

  • Although there’s a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files
  • The statement: “all files of a directory” can be interpreted in two ways:
    1. All direct (or level 1) descendants only
    2. All descendants in the whole directory tree (including the ones in sub-directories)
  • When the question was asked, I imagine that Python 2, was the LTS version, however the code samples will be run by Python 3(.5) (I’ll keep them as Python 2 compliant as possible; also, any code belonging to Python that I’m going to post, is from v3.5.4 – unless otherwise specified). That has consequences related to another keyword in the question: “add them into a list“:

    • In pre Python 2.2 versions, sequences (iterables) were mostly represented by lists (tuples, sets, …)
    • In Python 2.2, the concept of generator ([Python.Wiki]: Generators) – courtesy of [Python 3]: The yield statement) – was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, generator is the default behavior
    • Not sure if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor, will create a list out of it (and also consume it). The example below illustrates the differences on [Python 3]: map(function, iterable, …)
    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    


    >>> import sys
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already consumed
    ([], <class 'list'>)
    
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I’m using the same tree on Lnx as well):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3
    


Solutions

Programmatic approaches:

  1. [Python 3]: os.listdir(path=’.’)

    Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..'


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
    ['file0', 'file1']
    

    A more elaborate example (code_os_listdir.py):

    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • There are two implementations:
      • One that uses generators (of course here it seems useless, since I immediately convert the result to a list)
      • The classic one (function names ending in _old)
    • Recursion is used (to get into subdirectories)
    • For each implementation there are two functions:
      • One that starts with an underscore (_): “private” (should not be called directly) – that does all the work
      • The public one (wrapper over previous): it just strips off the initial path (if required) from the returned entries. It’s an ugly implementation, but it’s the only idea that I could come with at this point
    • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn’t test them in recursive functions, and also I am iterating inside the function over inner generators – don’t know how performance friendly is that
    • Play with the arguments to get different results


    Output:

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']
    


  1. [Python 3]: os.scandir(path=’.’) (Python 3.5+, backport: [PyPI]: scandir)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator :
    ...     if os.path.isfile(item.path):
    ...             print(item.name)
    ...
    file0
    file1
    

    Notes:

    • It’s similar to os.listdir
    • But it’s also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)


  1. [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
    

    Notes:

    • Under the scenes, it uses os.scandir (os.listdir on older versions)
    • It does the heavy lifting by recurring in subfolders


  1. [Python 3]: glob.glob(pathname, *, recursive=False) ([Python 3]: glob.iglob(pathname, *, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).

    Changed in version 3.5: Support for recursive globs using “**”.


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    'root_dir\\*'
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from begining
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir\
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1
    

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)


  1. [Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

    >>> import pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath('root_dir')
    >>> root_dir_instance.name
    'root_dir'
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:

    • This is one way of achieving our goal
    • It’s the OOP style of handling paths
    • Offers lots of functionalities


  1. [Python 2]: dircache.listdir(path) (Python 2 only)


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list
    


  1. [man7]: OPENDIR(3) / [man7]: READDIR(3) / [man7]: CLOSEDIR(3) via [Python 3]: ctypes – A foreign function library for Python (POSIX specific)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = this_process = CDLL(None, use_errno=True)
    # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
    opendir = libc_dll.opendir
    readdir = libc_dll.readdir
    closedir = libc_dll.closedir
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if (dir_stream == 0):
            print("opendir returned NULL (errno: {:d})".format(get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type & DT_DIR:
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type & DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno():
            print("readdir returned NULL (errno: {:d})".format(get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}\n".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati’s answer) – last notes from item #4.). That would place this approach very close to the Python / C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • It returns data in the os.walk‘s format. I didn’t bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well, the data (libraries, functions, structs, constants, …) differ


    Output:

    [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
    3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    
    ['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
    


  1. [ActiveState.Docs]: win32file.FindFilesW (Win specific)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ['dir0', 'dir1', 'dir2', 'dir3']
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:


  1. Install some (other) third-party package that does the trick
    • Most likely, will rely on one (or more) of the above (maybe with slight customizations)


Notes:

  • Code is meant to be portable (except places that target a specific area – which are marked) or cross:

    • platform (Nix, Win, )
    • Python version (2, 3, )
  • Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the “tools” used are flexible in this direction

  • os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython – (master) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 – (master) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn’t strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit – 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don’t know if that is even possible), that could be a problem.
    I must also mention that I didn’t try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility for failure, if the dir depth is larger than the highest possible recursionlimit (on that machine)

  • The code samples are for demonstrative purposes only. That means that I didn’t take into account error handling (I don’t think there’s any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn’t have anything to do with Python.
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I’m not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      

    In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well; not to mention differences between locales).


回答 8

我真的很喜欢adamk的答案,建议您使用glob()同名模块中的。这使您可以与进行模式匹配*

但是,正如其他人在评论中指出的那样,它们glob()可能会因不一致的斜线方向而绊倒。为了解决这个问题,建议您使用模块中的join()expanduser()函数,也可以使用os.path模块中的getcwd()函数os

例如:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\Users\admin\*\wlp')

上面的代码很糟糕-路径已经过硬编码,并且只能在Windows上的驱动器名称和\s硬编码到路径之间使用。

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

上面的方法效果更好,但是它依赖Users于Windows上经常发现的文件夹名称,而在其他OS上却很少见。它还依赖于具有特定名称的用户admin

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

这可以在所有平台上完美运行。

另一个很好的示例,它可以在各种平台上完美运行,并且有所不同:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

希望这些示例可以帮助您了解在标准Python库模块中可以找到的一些功能的强大功能。

I really liked adamk’s answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\Users\admin\*\wlp')

The above is terrible – the path has been hardcoded and will only ever work on Windows between the drive name and the \s being hardcoded into the path.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.


回答 9

def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 
def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 

回答 10

如果您正在寻找find的Python实现,这是我经常使用的食谱:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print found_file

因此,我用它制作了一个PyPI 软件包,并且还有一个GitHub存储库。我希望有人发现它可能对该代码有用。

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print found_file

So I made a PyPI package out of it and there is also a GitHub repository. I hope that someone finds it potentially useful for this code.


回答 11

为了获得更好的结果,您可以listdir()os模块的方法与生成器一起使用(生成器是保持其状态的强大迭代器,还记得吗?)。以下代码在这两个版本上均可正常使用:Python 2和Python 3。

这是一个代码:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

listdir()方法返回给定目录的条目列表。如果给定的条目是文件,则该方法os.path.isfile()返回True。并且yield操作员退出功能但保持其当前状态,并且仅返回检测为文件的条目的名称。以上所有内容使我们可以循环生成器功能。

For greater results, you can use listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works fine with both versions: Python 2 and Python 3.

Here’s a code:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

The listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. And the yield operator quits the func but keeps its current state, and it returns only the name of the entry detected as a file. All the above allows us to loop over the generator function.


回答 12

返回绝对文件路径的列表,不会递归到子目录中

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

Returning a list of absolute filepaths, does not recurse into subdirectories

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

回答 13

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

在这里,我使用递归结构。

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.


回答 14

一位聪明的老师曾经告诉我:

当有几种确定的方法可以做某事时,没有一种方法适合所有情况。

因此,我将为问题的一个子集添加一个解决方案:很多时候,我们只想检查文件是否匹配开始字符串和结束字符串,而无需进入子目录。因此,我们想要一个返回文件名列表的函数,例如:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

如果您想先声明两个函数,可以这样做:

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

此解决方案可以使用正则表达式轻松进行一般化(pattern如果您不希望模式始终坚持文件名的开头或结尾,则可能需要添加一个参数)。

A wise teacher told me once that:

When there are several established ways to do something, none of them is good for all cases.

I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

If you care to first declare two functions, this can be done:

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

This solution could be easily generalized with regular expressions (and you might want to add a pattern argument, if you do not want your patterns to always stick to the start or end of the filename).


回答 15

使用生成器

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)

Using generators

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)

回答 16

Python 3.4+的另一个非常易读的变体是使用pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

进行更具体的说明很简单,例如,在所有子目录中仅查找不是符号链接的Python源文件:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

回答 17

这是我的通用功能。它返回文件路径而不是文件名的列表,因为我发现它更有用。它具有一些可选参数,使其具有通用性。例如,我经常将其与诸如pattern='*.txt'或那样的参数一起使用subfolders=True

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

Here’s my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

回答 18

我将提供一个示例liner,其中可以提供sourcepath和文件类型作为输入。该代码返回带有csv扩展名的文件名列表。使用万一需要返回所有文件。这还将递归扫描子目录。

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

根据需要修改文件扩展名和源路径。

I will provide a sample one liner where sourcepath and file type can be provided as input. The code returns a list of filenames with csv extension. Use . in case all files needs to be returned. This will also recursively scans the subdirectories.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.


回答 19

对于python2:pip install rglob

import rglob
file_list=rglob.rglob("/home/base/dir/", "*")
print file_list

For python2: pip install rglob

import rglob
file_list=rglob.rglob("/home/base/dir/", "*")
print file_list

回答 20

dircache是“自2.6版起不推荐使用:dircache模块已在Python 3.0中删除。”

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

dircache is “Deprecated since version 2.6: The dircache module has been removed in Python 3.0.”

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。