标签归档:fnmatch

如何使用glob.glob模块搜索子文件夹?

问题:如何使用glob.glob模块搜索子文件夹?

我想在文件夹中打开一系列子文件夹,然后找到一些文本文件并打印一些文本文件行。我正在使用这个:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

但这也无法访问子文件夹。有谁知道我也可以使用相同的命令来访问子文件夹?

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?


回答 0

在Python 3.5及更高版本中,使用新的递归**/功能:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

recursive被设置时,**随后是路径分隔匹配0或多个子目录。

在早期的Python版本中,glob.glob()无法递归列出子目录中的文件。

在这种情况下,我将改用os.walk()结合fnmatch.filter()

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

这将递归遍历您的目录,并将所有绝对路径名返回到匹配.txt文件。在这种特定情况下,fnmatch.filter()可能是矫kill过正,您也可以使用.endswith()测试:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I’d use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This’ll walk your directories recursively and return all absolute pathnames to matching .txt files. In this specific case the fnmatch.filter() may be overkill, you could also use a .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

回答 1

要在直接子目录中查找文件:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

对于遍历所有子目录的递归版本,您可以使用**和传递recursive=True 自Python 3.5之后的版本

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

这两个函数调用都返回列表。您可以用来glob.iglob()一一返回路径。或使用pathlib

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

两种方法都返回迭代器(您可以一一获取路径)。

To find files in immediate subdirectories:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverse all subdirectories, you could use ** and pass recursive=True since Python 3.5:

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).


回答 2

在这个话题上有很多困惑。让我看看是否可以澄清它(Python 3.7):

  1. glob.glob('*.txt') :匹配当前目录中所有以“ .txt”结尾的文件
  2. glob.glob('*/*.txt') :与1相同
  3. glob.glob('**/*.txt') :仅匹配直接子目录中所有以’.txt’结尾的文件,而不匹配当前目录中的所有文件
  4. glob.glob('*.txt',recursive=True) :与1相同
  5. glob.glob('*/*.txt',recursive=True) :与3相同
  6. glob.glob('**/*.txt',recursive=True):匹配当前目录和所有子目录中所有以“ .txt”结尾的文件

所以最好总是指定 recursive=True.

There’s a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt') :matches all files ending in ‘.txt’ in current directory
  2. glob.glob('*/*.txt') :same as 1
  3. glob.glob('**/*.txt') :matches all files ending in ‘.txt’ in the immediate subdirectories only, but not in the current directory
  4. glob.glob('*.txt',recursive=True) :same as 1
  5. glob.glob('*/*.txt',recursive=True) :same as 3
  6. glob.glob('**/*.txt',recursive=True):matches all files ending in ‘.txt’ in the current directory and in all subdirectories

So it’s best to always specify recursive=True.


回答 3

glob2包支持通配符和相当快

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

在我的笔记本电脑上,匹配> 60,000个文件路径大约需要2秒钟。

The glob2 package supports wild cards and is reasonably fast

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.


回答 4

您可以在Python 2.6中使用Formic

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

披露-我是该软件包的作者。

You can use Formic with Python 2.6

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure – I am the author of this package.


回答 5

这是改编版,glob.glob无需使用即可启用类似功能glob2

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            if fnmatch.filter([full_path], pattern):
                matches.append(os.path.join(root, filename))
    return matches

因此,如果您具有以下目录结构

tests/files
├── a0
   ├── a0.txt
   ├── a0.yaml
   └── b0
       ├── b0.yaml
       └── b00.yaml
└── a1

你可以做这样的事情

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

几乎fnmatch对整个文件名本身模式匹配,而不只是文件名。

Here is a adapted version that enables glob.glob like functionality without using glob2.

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            if fnmatch.filter([full_path], pattern):
                matches.append(os.path.join(root, filename))
    return matches

So if you have the following dir structure

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

Pretty much fnmatch pattern match on the whole filename itself, rather than the filename only.


回答 6

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")

并非在所有情况下都适用,请改用glob2

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")

Doesn’t works for all cases, instead use glob2

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")

回答 7

如果可以安装glob2软件包…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

所有文件名和文件夹:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  

If you can install glob2 package…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

All filenames and folders:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  

回答 8

如果您运行的是Python 3.4+,则可以使用该pathlib模块。该Path.glob()方法支持**模式,即“递归该目录和所有子目录”。它返回一个生成器,生成Path所有匹配文件的对象。

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

If you’re running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

回答 9

正如Martijn所指出的,glob只能通过**Python 3.5中引入的运算符来做到这一点。由于OP明确要求使用glob模块,因此以下代码将返回行为类似的惰性评估迭代器

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

请注意,configfiles尽管如此,您只能在此方法中重复一次。如果您需要可在多个操作中使用的配置文件的真实列表,则必须使用创建显式的配置文件list(configfiles)

As pointed out by Martijn, glob can only do this through the **operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazy evaluation iterator that behaves similarly

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).


回答 10

该命令rglob将对目录结构的最深子级别进行无限递归。如果您只想深一层,则不要使用它。

我意识到OP正在谈论使用glob.glob。我相信,这可以回答意图,即递归搜索所有子文件夹。

rglob函数最近使数据处理算法的速度提高了100倍,该算法使用文件夹结构作为数据读取顺序的固定假设。但是,由于rglob我们能够对指定父目录或该目录下的所有文件进行一次扫描,将它们的名称保存到列表(超过一百万个文件),然后使用该列表来确定我们需要在任何目录下打开哪些文件仅基于文件命名约定及其在哪个文件夹中指向将来。

The command rglob will do an infinite recursion down the deepest sub-level of your directory structure. If you only want one level deep, then do not use it, however.

I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.

The rglob function recently produced a 100x increase in speed for a data processing algorithm which was using the folder structure as a fixed assumption for the order of data reading. However, with rglob we were able to do a single scan once through all files at or below a specified parent directory, save their names to a list (over a million files), then use that list to determine which files we needed to open at any point in the future based on the file naming conventions only vs. which folder they were in.


如何使用Python计算目录中的文件数

问题:如何使用Python计算目录中的文件数

我需要计算使用Python的目录中的文件数。

我猜最简单的方法是len(glob.glob('*')),但这也将目录本身视为文件。

有什么方法可以只计算目录中的文件吗?

I need to count the number of files in a directory using Python.

I guess the easiest way is len(glob.glob('*')), but that also counts the directory itself as a file.

Is there any way to count only the files in a directory?


回答 0

os.listdir()比使用会更有效率glob.glob。要测试文件名是否是普通文件(而不是目录或其他实体),请使用os.path.isfile()

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

os.listdir() will be slightly more efficient than using glob.glob. To test if a filename is an ordinary file (and not a directory or other entity), use os.path.isfile():

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

回答 1

import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)
import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)

回答 2

对于所有类型的文件,都包含以下子目录:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

仅文件(避免使用子目录):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

For all kind of files, subdirectories included:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

Only files (avoiding subdirectories):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

回答 3

这是fnmatch派上用场的地方:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

更多详细信息:http : //docs.python.org/2/library/fnmatch.html

This is where fnmatch comes very handy:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

More details: http://docs.python.org/2/library/fnmatch.html


回答 4

如果要计算目录中的所有文件(包括子目录中的文件),则最Python化的方法是:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

我们使用的总和要比显式添加文件计数(正在等待的时间)更快

If you want to count all files in the directory – including files in subdirectories, the most pythonic way is:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

We use sum that is faster than explicitly adding the file counts (timings pending)


回答 5

import os
print len(os.listdir(os.getcwd()))
import os
print len(os.listdir(os.getcwd()))

回答 6

def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count
def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count

回答 7

我很惊讶没有人提到os.scandir

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

I am surprised that nobody mentioned os.scandir:

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

回答 8

这可以使用os.listdir并适用于任何目录:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

可以使用生成器简化此过程,并通过以下方法使速度更快一点:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

This uses os.listdir and works for any directory:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

this can be simplified with a generator and made a little bit faster with:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

回答 9

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

取自这篇文章

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

Taked from this post


回答 10

import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049
import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049

回答 11

卢克的代码重新格式化。

import os

print len(os.walk('/usr/lib').next()[2])

Luke’s code reformat.

import os

print len(os.walk('/usr/lib').next()[2])

回答 12

这是我发现有用的简单单行命令:

print int(os.popen("ls | wc -l").read())

Here is a simple one-line command that I found useful:

print int(os.popen("ls | wc -l").read())

回答 13

虽然我同意@DanielStutzbach提供的答案:os.listdir()将比使用效率更高glob.glob

但是,如果要计算文件夹中特定文件的数量,则需要额外的精度len(glob.glob())。例如,如果您要计算要使用的文件夹中的所有pdf,请执行以下操作:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

While I agree with the answer provided by @DanielStutzbach: os.listdir() will be slightly more efficient than using glob.glob.

However, an extra precision, if you do want to count the number of specific files in folder, you want to use len(glob.glob()). For instance if you were to count all the pdfs in a folder you want to use:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

回答 14

很简单:

print(len([iq for iq in os.scandir('PATH')]))

它只计算目录中的文件数,我使用列表推导技术遍历特定目录,以返回所有文件。“ len(返回列表)”返回文件数。

It is simple:

print(len([iq for iq in os.scandir('PATH')]))

it simply counts number of files in directory , i have used list comprehension technique to iterate through specific directory returning all files in return . “len(returned list)” returns number of files.


回答 15

import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)
import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)

回答 16

如果您将使用操作系统的标准外壳,则可以更快地获得结果,而不是使用纯pythonic方式。

Windows示例:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

If you’ll be using the standard shell of the operating system, you can get the result much faster rather than using pure pythonic way.

Example for Windows:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

回答 17

我找到了另一个可能是正确的答案。

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

I found another answer which may be correct as accepted answer.

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

回答 18

我用过glob.iglob类似的目录结构

data
└───train
   └───subfolder1
   |      file111.png
   |      file112.png
   |      ...
   |
   └───subfolder2
          file121.png
          file122.png
          ...
└───test
       file221.png
       file222.png

以下两个选项均返回4(如预期,即不计算子文件夹本身

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

I used glob.iglob for a directory structure similar to

data
└───train
│   └───subfolder1
│   |   │   file111.png
│   |   │   file112.png
│   |   │   ...
│   |
│   └───subfolder2
│       │   file121.png
│       │   file122.png
│       │   ...
└───test
    │   file221.png
    │   file222.png

Both of the following options return 4 (as expected, i.e. does not count the subfolders themselves)

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

回答 19

我这样做了,这返回了文件夹(Attack_Data)中文件的数量。

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

i did this and this returned the number of files in the folder(Attack_Data)…this works fine.

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

如何使用glob()递归查找文件?

问题:如何使用glob()递归查找文件?

这就是我所拥有的:

glob(os.path.join('src','*.c'))

但我想搜索src的子文件夹。这样的事情会起作用:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

但这显然是有限且笨拙的。

This is what I have:

glob(os.path.join('src','*.c'))

but I want to search the subfolders of src. Something like this would work:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

But this is obviously limited and clunky.


回答 0

Python 3.5+

由于您使用的是新的python,因此应pathlib.Path.rglobpathlib模块中使用。

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

如果您不想使用pathlib,只需使用glob.glob,但不要忘记传递recursive关键字参数。

对于匹配文件以点(。)开头的情况;例如当前目录中的文件或基于Unix的系统上的隐藏文件,请使用以下os.walk解决方案。

较旧的Python版本

对于较旧的Python版本,可os.walk用于递归遍历目录并fnmatch.filter与简单表达式匹配:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

Python 3.5+

Since you’re on a new python, you should use pathlib.Path.rglob from the the pathlib module.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

If you don’t want to use pathlib, just use glob.glob, but don’t forget to pass in the recursive keyword parameter.

For cases where matching files beginning with a dot (.); like files in the current directory or hidden files on Unix based system, use the os.walk solution below.

Older Python versions

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

回答 1

与其他解决方案类似,但是使用fnmatch.fnmatch而不是glob,因为os.walk已经列出了文件名:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print 'Found C source:', filename

另外,使用生成器可以使您处理找到的每个文件,而不是查找所有文件然后进行处理。

Similar to other solutions, but using fnmatch.fnmatch instead of glob, since os.walk already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print 'Found C source:', filename

Also, using a generator alows you to process each file as it is found, instead of finding all the files and then processing them.


回答 2

我修改了glob模块,以支持**用于递归glob,例如:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

当您想为用户提供使用**语法的能力时很有用,因此仅os.walk()不够好。

I’ve modified the glob module to support ** for recursive globbing, e.g:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.


回答 3

从Python 3.4开始,可以使用新pathlib模块中支持通配符glob()Path类之一的方法。例如:**

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

更新: 从Python 3.5开始,glob.glob()

Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().


回答 4

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch为您提供与完全相同的模式glob,因此对于glob.glob非常紧密的语义而言,这确实是一个很好的替代。迭代的版本(例如生成器),用IOW代替glob.iglob,是微不足道的改编(只是yield中间结果,而不是extend最后返回单个结果列表)。

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch gives you exactly the same patterns as glob, so this is really an excellent replacement for glob.glob with very close semantics. An iterative version (e.g. a generator), IOW a replacement for glob.iglob, is a trivial adaptation (just yield the intermediate results as you go, instead of extending a single results list to return at the end).


回答 5

对于python> = 3.5,可以使用**recursive=True

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

演示版


如果是递归的True,则模式** 将匹配任何文件以及零个或多个directoriessubdirectories。如果模式后跟一个os.sep,则仅目录和subdirectories匹配项。

For python >= 3.5 you can use **, recursive=True :

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

Demo


If recursive is True, the pattern ** will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.


回答 6

您将要用来os.walk收集符合条件的文件名。例如:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

You’ll want to use os.walk to collect filenames that match your criteria. For example:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

回答 7

这是一个具有嵌套列表推导的解决方案,os.walk而不是简单的后缀匹配glob

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

可以将其压缩为单线:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

或概括为一个函数:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

如果您确实需要完整的glob样式模式,则可以遵循Alex和Bruno的示例并使用fnmatch

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

Here’s a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

It can be compressed to a one-liner:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

or generalized as a function:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

If you do need full glob style patterns, you can follow Alex’s and Bruno’s example and use fnmatch:

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

回答 8

最近,我不得不恢复扩展名为.jpg的图片。我运行了photorec并恢复了4579个目录,其中220万个文件具有多种扩展名。使用以下脚本,我能够在几分钟内选择50133个具有.jpg扩展名的文件:

#!/usr/binenv python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

Recently I had to recover my pictures with the extension .jpg. I ran photorec and recovered 4579 directories 2.2 million files within, having tremendous variety of extensions.With the script below I was able to select 50133 files havin .jpg extension within minutes:

#!/usr/binenv python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

回答 9

考虑一下pathlib.rglob()

这就好比调用Path.glob()"**/"在给定的相对图案前面加:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)

另请参阅@taleinat的相关文章和类似的文章其他地方。

Consider pathlib.rglob().

This is like calling Path.glob() with "**/" added in front of the given relative pattern:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)

See also @taleinat’s related post here and a similar post elsewhere.


回答 10

Johan和Bruno针对上述最低要求提供了出色的解决方案。我刚刚发布了实现了Ant FileSet和Globs的Formic,它可以处理这种情况以及更复杂的情况。您的要求的实现是:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name

Johan and Bruno provide excellent solutions on the minimal requirement as stated. I have just released Formic which implements Ant FileSet and Globs which can handle this and more complicated scenarios. An implementation of your requirement is:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name

回答 11

基于其他答案,这是我当前的工作实现,它在根目录中检索嵌套的xml文件:

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

我真的很喜欢python :)

based on other answers this is my current working implementation, which retrieves nested xml files in a root directory:

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

I’m really having fun with python :)


回答 12

仅使用glob模块执行此操作的另一种方法。只需在rglob方法中添加一个起始基本目录和一个匹配模式即可,它将返回匹配文件名的列表。

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    list = []
    list.extend(glob.glob(os.path.join(base,pattern)))
    dirs = _getDirs(base)
    if len(dirs):
        for d in dirs:
            list.extend(rglob(os.path.join(base,d), pattern))
    return list

Another way to do it using just the glob module. Just seed the rglob method with a starting base directory and a pattern to match and it will return a list of matching file names.

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    list = []
    list.extend(glob.glob(os.path.join(base,pattern)))
    dirs = _getDirs(base)
    if len(dirs):
        for d in dirs:
            list.extend(rglob(os.path.join(base,d), pattern))
    return list

回答 13

或具有列表理解:

 >>> base = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
            for base, _, files in os.walk(root) 
            for f in files if f.endswith(".jpg") ] 

Or with a list comprehension:

 >>> base = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
            for base, _, files in os.walk(root) 
            for f in files if f.endswith(".jpg") ] 

回答 14

刚做这个..它将以分层方式打印文件和目录

但是我没有用过fnmatch或walk

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

Just made this.. it will print files and directory in hierarchical way

But I didn’t used fnmatch or walk

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

回答 15

那使用fnmatch或正则表达式:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

That one uses fnmatch or regular expression:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

回答 16

除了建议的答案,您还可以通过一些懒惰的生成和列表理解魔术来做到这一点:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

除了适合一行并且避免在内存中使用不必要的列表之外,这还具有很好的副作用,即您可以以类似于**运算符的方式使用它,例如,可以使用os.path.join(root, 'some/path/*.c')它来获取所有.c文件。具有此结构的src子目录。

In addition to the suggested answers, you can do this with some lazy generation and list comprehension magic:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

Besides fitting in one line and avoiding unnecessary lists in memory, this also has the nice side effect, that you can use it in a way similar to the ** operator, e.g., you could use os.path.join(root, 'some/path/*.c') in order to get all .c files in all sub directories of src that have this structure.


回答 17

对于python 3.5及更高版本

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

您可能还需要

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'

For python 3.5 and later

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

further you might need

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'

回答 18

这是Python 2.7上的有效代码。作为我的devops工作的一部分,我需要编写一个脚本,该脚本会将标有live-appName.properties的配置文件移动到appName.properties。可能还有其他扩展文件,例如live-appName.xml。

以下是用于此目的的工作代码,该代码在给定目录(嵌套级别)中查找文件,然后将其重命名(移动)为所需的文件名

def flipProperties(searchDir):
   print "Flipping properties to point to live DB"
   for root, dirnames, filenames in os.walk(searchDir):
      for filename in fnmatch.filter(filenames, 'live-*.*'):
        targetFileName = os.path.join(root, filename.split("live-")[1])
        print "File "+ os.path.join(root, filename) + "will be moved to " + targetFileName
        shutil.move(os.path.join(root, filename), targetFileName)

从主脚本调用此函数

flipProperties(searchDir)

希望这可以帮助遇到类似问题的人。

This is a working code on Python 2.7. As part of my devops work, I was required to write a script which would move the config files marked with live-appName.properties to appName.properties. There could be other extension files as well like live-appName.xml.

Below is a working code for this, which finds the files in the given directories (nested level) and then renames (moves) it to the required filename

def flipProperties(searchDir):
   print "Flipping properties to point to live DB"
   for root, dirnames, filenames in os.walk(searchDir):
      for filename in fnmatch.filter(filenames, 'live-*.*'):
        targetFileName = os.path.join(root, filename.split("live-")[1])
        print "File "+ os.path.join(root, filename) + "will be moved to " + targetFileName
        shutil.move(os.path.join(root, filename), targetFileName)

This function is called from a main script

flipProperties(searchDir)

Hope this helps someone struggling with similar issues.


回答 19

Johan Dahlin答案的简化版本,不带fnmatch

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

Simplified version of Johan Dahlin’s answer, without fnmatch.

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

回答 20

这是我的使用列表推导的解决方案在目录和所有子目录中递归搜索多个文件扩展名的解决方案:

import os, glob

def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions 
    Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d+e for d in dirs for e in exts]    
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
    print f

Here is my solution using list comprehension to search for multiple file extensions recursively in a directory and all subdirectories:

import os, glob

def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions 
    Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d+e for d in dirs for e in exts]    
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
    print f

回答 21

import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)
import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)

回答 22

我修改了此发布中的最佳答案..并最近创建了此脚本,该脚本将遍历给定目录(searchdir)中的所有文件及其下的子目录…并打印文件名,rootdir,修改/创建日期和尺寸。

希望这对某人有帮助…他们可以遍历目录并获取fileinfo。

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

I modified the top answer in this posting.. and recently created this script which will loop through all files in a given directory (searchdir) and the sub-directories under it… and prints filename, rootdir, modified/creation date, and size.

Hope this helps someone… and they can walk the directory and get fileinfo.

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

回答 23

这是一个将模式与完整路径而不只是基本文件名匹配的解决方案。

它用于fnmatch.translate将glob样式的模式转换为正则表达式,然后将其与在遍历目录时发现的每个文件的完整路径进行匹配。

re.IGNORECASE是可选的,但在Windows上是理想的,因为文件系统本身不区分大小写。(我没有费心编译正则表达式,因为文档表明它应该在内部缓存。)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename

Here is a solution that will match the pattern against the full path and not just the base filename.

It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn’t bother compiling the regex because docs indicate it should be cached internally.)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename

回答 24

我需要一个解决方案的Python 2.x中,工程上大的目录。
我结束了这一点:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

请注意,如果ls找不到任何匹配文件,您可能需要一些异常处理。

I needed a solution for python 2.x that works fast on large directories.
I endet up with this:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

Note that you might need some exception handling in case ls doesn’t find any matching file.