标签归档:glob

如何使用glob.glob模块搜索子文件夹?

问题:如何使用glob.glob模块搜索子文件夹?

我想在文件夹中打开一系列子文件夹,然后找到一些文本文件并打印一些文本文件行。我正在使用这个:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

但这也无法访问子文件夹。有谁知道我也可以使用相同的命令来访问子文件夹?

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?


回答 0

在Python 3.5及更高版本中,使用新的递归**/功能:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

recursive被设置时,**随后是路径分隔匹配0或多个子目录。

在早期的Python版本中,glob.glob()无法递归列出子目录中的文件。

在这种情况下,我将改用os.walk()结合fnmatch.filter()

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

这将递归遍历您的目录,并将所有绝对路径名返回到匹配.txt文件。在这种特定情况下,fnmatch.filter()可能是矫kill过正,您也可以使用.endswith()测试:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I’d use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This’ll walk your directories recursively and return all absolute pathnames to matching .txt files. In this specific case the fnmatch.filter() may be overkill, you could also use a .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

回答 1

要在直接子目录中查找文件:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

对于遍历所有子目录的递归版本,您可以使用**和传递recursive=True 自Python 3.5之后的版本

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

这两个函数调用都返回列表。您可以用来glob.iglob()一一返回路径。或使用pathlib

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

两种方法都返回迭代器(您可以一一获取路径)。

To find files in immediate subdirectories:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverse all subdirectories, you could use ** and pass recursive=True since Python 3.5:

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).


回答 2

在这个话题上有很多困惑。让我看看是否可以澄清它(Python 3.7):

  1. glob.glob('*.txt') :匹配当前目录中所有以“ .txt”结尾的文件
  2. glob.glob('*/*.txt') :与1相同
  3. glob.glob('**/*.txt') :仅匹配直接子目录中所有以’.txt’结尾的文件,而不匹配当前目录中的所有文件
  4. glob.glob('*.txt',recursive=True) :与1相同
  5. glob.glob('*/*.txt',recursive=True) :与3相同
  6. glob.glob('**/*.txt',recursive=True):匹配当前目录和所有子目录中所有以“ .txt”结尾的文件

所以最好总是指定 recursive=True.

There’s a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt') :matches all files ending in ‘.txt’ in current directory
  2. glob.glob('*/*.txt') :same as 1
  3. glob.glob('**/*.txt') :matches all files ending in ‘.txt’ in the immediate subdirectories only, but not in the current directory
  4. glob.glob('*.txt',recursive=True) :same as 1
  5. glob.glob('*/*.txt',recursive=True) :same as 3
  6. glob.glob('**/*.txt',recursive=True):matches all files ending in ‘.txt’ in the current directory and in all subdirectories

So it’s best to always specify recursive=True.


回答 3

glob2包支持通配符和相当快

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

在我的笔记本电脑上,匹配> 60,000个文件路径大约需要2秒钟。

The glob2 package supports wild cards and is reasonably fast

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.


回答 4

您可以在Python 2.6中使用Formic

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

披露-我是该软件包的作者。

You can use Formic with Python 2.6

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure – I am the author of this package.


回答 5

这是改编版,glob.glob无需使用即可启用类似功能glob2

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            if fnmatch.filter([full_path], pattern):
                matches.append(os.path.join(root, filename))
    return matches

因此,如果您具有以下目录结构

tests/files
├── a0
   ├── a0.txt
   ├── a0.yaml
   └── b0
       ├── b0.yaml
       └── b00.yaml
└── a1

你可以做这样的事情

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

几乎fnmatch对整个文件名本身模式匹配,而不只是文件名。

Here is a adapted version that enables glob.glob like functionality without using glob2.

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            if fnmatch.filter([full_path], pattern):
                matches.append(os.path.join(root, filename))
    return matches

So if you have the following dir structure

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

Pretty much fnmatch pattern match on the whole filename itself, rather than the filename only.


回答 6

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")

并非在所有情况下都适用,请改用glob2

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")

Doesn’t works for all cases, instead use glob2

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")

回答 7

如果可以安装glob2软件包…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

所有文件名和文件夹:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  

If you can install glob2 package…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

All filenames and folders:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  

回答 8

如果您运行的是Python 3.4+,则可以使用该pathlib模块。该Path.glob()方法支持**模式,即“递归该目录和所有子目录”。它返回一个生成器,生成Path所有匹配文件的对象。

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

If you’re running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

回答 9

正如Martijn所指出的,glob只能通过**Python 3.5中引入的运算符来做到这一点。由于OP明确要求使用glob模块,因此以下代码将返回行为类似的惰性评估迭代器

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

请注意,configfiles尽管如此,您只能在此方法中重复一次。如果您需要可在多个操作中使用的配置文件的真实列表,则必须使用创建显式的配置文件list(configfiles)

As pointed out by Martijn, glob can only do this through the **operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazy evaluation iterator that behaves similarly

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).


回答 10

该命令rglob将对目录结构的最深子级别进行无限递归。如果您只想深一层,则不要使用它。

我意识到OP正在谈论使用glob.glob。我相信,这可以回答意图,即递归搜索所有子文件夹。

rglob函数最近使数据处理算法的速度提高了100倍,该算法使用文件夹结构作为数据读取顺序的固定假设。但是,由于rglob我们能够对指定父目录或该目录下的所有文件进行一次扫描,将它们的名称保存到列表(超过一百万个文件),然后使用该列表来确定我们需要在任何目录下打开哪些文件仅基于文件命名约定及其在哪个文件夹中指向将来。

The command rglob will do an infinite recursion down the deepest sub-level of your directory structure. If you only want one level deep, then do not use it, however.

I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.

The rglob function recently produced a 100x increase in speed for a data processing algorithm which was using the folder structure as a fixed assumption for the order of data reading. However, with rglob we were able to do a single scan once through all files at or below a specified parent directory, save their names to a list (over a million files), then use that list to determine which files we needed to open at any point in the future based on the file naming conventions only vs. which folder they were in.


全局排除模式

问题:全局排除模式

我有一个目录,里面有一堆文件:eee2314asd3442…和eph

我想排除所有eph以该glob功能开头的文件。

我该怎么做?

I have a directory with a bunch of files inside: eee2314, asd3442 … and eph.

I want to exclude all files that start with eph with the glob function.

How can I do it?


回答 0

glob的模式规则不是正则表达式。相反,它们遵循标准的Unix路径扩展规则。仅有几个特殊字符:支持两种不同的通配符,并且支持字符范围[from glob ]。

因此,您可以排除某些带有模式的文件。
例如,要排除清单文件(以开头的文件_)和glob,可以使用:

files = glob.glob('files_path/[!_]*')

The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wild-cards, and character ranges are supported [from glob].

So you can exclude some files with patterns.
For example to exclude manifests files (files starting with _) with glob, you can use:

files = glob.glob('files_path/[!_]*')

回答 1

您可以扣除集合:

set(glob("*")) - set(glob("eph*"))

You can deduct sets:

set(glob("*")) - set(glob("eph*"))

回答 2

您不能使用该glob功能排除模式,Glob仅允许包含模式。通配符语法非常有限(即使[!..]字符类也必须与字符匹配,所以它是一个对于不在类中的每个字符,包含模式)。

您必须自己进行过滤;列表理解通常在这里很有效:

files = [fn for fn in glob('somepath/*.txt') 
         if not os.path.basename(fn).startswith('eph')]

You can’t exclude patterns with the glob function, globs only allow for inclusion patterns. Globbing syntax is very limited (even a [!..] character class must match a character, so it is an inclusion pattern for every character that is not in the class).

You’ll have to do your own filtering; a list comprehension usually works nicely here:

files = [fn for fn in glob('somepath/*.txt') 
         if not os.path.basename(fn).startswith('eph')]

回答 3

游戏晚了,但是您也可以将pythonfilter应用于结果glob

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)

或将Lambda替换为适当的正则表达式搜索等。

编辑:我只是意识到,如果您使用完整路径startswith将无法正常工作,因此您需要一个正则表达式

In [10]: a
Out[10]: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']

In [11]: filter(lambda x: not re.search('/eph', x), a)
Out[11]: ['/some/path/foo', 'some/path/bar']

Late to the game but you could alternatively just apply a python filter to the result of a glob:

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)

or replacing the lambda with an appropriate regex search, etc…

EDIT: I just realized that if you’re using full paths the startswith won’t work, so you’d need a regex

In [10]: a
Out[10]: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']

In [11]: filter(lambda x: not re.search('/eph', x), a)
Out[11]: ['/some/path/foo', 'some/path/bar']

回答 4

如何在遍历文件夹中的所有文件时跳过特定文件!下面的代码将跳过所有以’eph’开头的Excel文件

import glob
import re
for file in glob.glob('*.xlsx'):
    if re.match('eph.*\.xlsx',file):
        continue
    else:
        #do your stuff here
        print(file)

这样,您可以使用更复杂的正则表达式模式在文件夹中包含/排除一组特定的文件。

How about skipping the particular file while iterating over all the files in the folder! Below code would skip all excel files that start with ‘eph’

import glob
import re
for file in glob.glob('*.xlsx'):
    if re.match('eph.*\.xlsx',file):
        continue
    else:
        #do your stuff here
        print(file)

This way you can use more complex regex patterns to include/exclude a particular set of files in a folder.


回答 5

glob,我建议比较一下,pathlib过滤一个模式很简单。

from pathlib import Path

p = Path(YOUR_PATH)
filtered = [x for x in p.glob("**/*") if not x.name.startswith("eph")]

如果要过滤更复杂的模式,可以定义一个函数来执行此操作,就像:

def not_in_pattern(x):
    return (not x.name.startswith("eph")) and not x.name.startswith("epi")


filtered = [x for x in p.glob("**/*") if not_in_pattern(x)]

使用该代码,您可以过滤以eph或开头的所有文件epi

Compare with glob, I recommend pathlib, filter one pattern is very simple.

from pathlib import Path

p = Path(YOUR_PATH)
filtered = [x for x in p.glob("**/*") if not x.name.startswith("eph")]

and if you want to filter more complex pattern, you can define a function to do that, just like:

def not_in_pattern(x):
    return (not x.name.startswith("eph")) and not x.name.startswith("epi")


filtered = [x for x in p.glob("**/*") if not_in_pattern(x)]

use that code, you can filter all files that start with eph or start with epi.


回答 6

更一般而言,要排除不符合某些shell regexp的文件,可以使用module fnmatch

import fnmatch

file_list = glob('somepath')    
for ind, ii in enumerate(file_list):
    if not fnmatch.fnmatch(ii, 'bash_regexp_with_exclude'):
        file_list.pop(ind)

上面的代码将首先从给定的路径生成一个列表,然后弹出不满足正则表达式要求的约束的文件。

More generally, to exclude files that don’t comply with some shell regexp, you could use module fnmatch:

import fnmatch

file_list = glob('somepath')    
for ind, ii in enumerate(file_list):
    if not fnmatch.fnmatch(ii, 'bash_regexp_with_exclude'):
        file_list.pop(ind)

The above will first generate a list from a given path and next pop out the files that won’t satisfy the regular expression with the desired constraint.


回答 7

如公认的答案所述,您不能使用glob排除模式,因此以下是一种过滤glob结果的方法。

公认的答案可能是最好的pythonic做事方式,但是如果您认为列表理解看起来有些丑陋,并且无论如何都想使代码最大化numpythonic(就像我一样),那么您可以这样做(但是请注意,这可能效率较低)比列表理解方法):

import glob

data_files = glob.glob("path_to_files/*.fits")

light_files = np.setdiff1d( data_files, glob.glob("*BIAS*"))
light_files = np.setdiff1d(light_files, glob.glob("*FLAT*"))

(以我为例,我在一个目录中有一些图像帧,偏置帧和平面帧,而我只想要这些图像帧)

As mentioned by the accepted answer, you can’t exclude patterns with glob, so the following is a method to filter your glob result.

The accepted answer is probably the best pythonic way to do things but if you think list comprehensions look a bit ugly and want to make your code maximally numpythonic anyway (like I did) then you can do this (but note that this is probably less efficient than the list comprehension method):

import glob

data_files = glob.glob("path_to_files/*.fits")

light_files = np.setdiff1d( data_files, glob.glob("*BIAS*"))
light_files = np.setdiff1d(light_files, glob.glob("*FLAT*"))

(In my case, I had some image frames, bias frames, and flat frames all in one directory and I just wanted the image frames)


回答 8

如果字符的位置并不重要,那就是例如排除清单文件(无论它被发现_),与globre正则表达式的操作,您可以使用:

import glob
import re
for file in glob.glob('*.txt'):
    if re.match(r'.*\_.*', file):
        continue
    else:
        print(file)

或者以一种更优雅的方式- list comprehension

filtered = [f for f in glob.glob('*.txt') if not re.match(r'.*\_.*', f)]

for mach in filtered:
    print(mach)

If the position of the character isn’t important, that is for example to exclude manifests files (wherever it is found _) with glob and reregular expression operations, you can use:

import glob
import re
for file in glob.glob('*.txt'):
    if re.match(r'.*\_.*', file):
        continue
    else:
        print(file)

Or with in a more elegant way – list comprehension

filtered = [f for f in glob.glob('*.txt') if not re.match(r'.*\_.*', f)]

for mach in filtered:
    print(mach)

回答 9

您可以使用以下方法:

# Get all the files
allFiles = glob.glob("*")
# Files starting with eph
ephFiles = glob.glob("eph*")
# Files which doesnt start with eph
noephFiles = []
for file in allFiles:
    if file not in ephFiles:
        noephFiles.append(file)
# noepchFiles has all the file which doesnt start with eph.

Thank you.  

You can use the below method:

# Get all the files
allFiles = glob.glob("*")
# Files starting with eph
ephFiles = glob.glob("eph*")
# Files which doesnt start with eph
noephFiles = []
for file in allFiles:
    if file not in ephFiles:
        noephFiles.append(file)
# noepchFiles has all the file which doesnt start with eph.

Thank you.  

如何使用Python计算目录中的文件数

问题:如何使用Python计算目录中的文件数

我需要计算使用Python的目录中的文件数。

我猜最简单的方法是len(glob.glob('*')),但这也将目录本身视为文件。

有什么方法可以只计算目录中的文件吗?

I need to count the number of files in a directory using Python.

I guess the easiest way is len(glob.glob('*')), but that also counts the directory itself as a file.

Is there any way to count only the files in a directory?


回答 0

os.listdir()比使用会更有效率glob.glob。要测试文件名是否是普通文件(而不是目录或其他实体),请使用os.path.isfile()

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

os.listdir() will be slightly more efficient than using glob.glob. To test if a filename is an ordinary file (and not a directory or other entity), use os.path.isfile():

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

回答 1

import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)
import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)

回答 2

对于所有类型的文件,都包含以下子目录:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

仅文件(避免使用子目录):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

For all kind of files, subdirectories included:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

Only files (avoiding subdirectories):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

回答 3

这是fnmatch派上用场的地方:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

更多详细信息:http : //docs.python.org/2/library/fnmatch.html

This is where fnmatch comes very handy:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

More details: http://docs.python.org/2/library/fnmatch.html


回答 4

如果要计算目录中的所有文件(包括子目录中的文件),则最Python化的方法是:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

我们使用的总和要比显式添加文件计数(正在等待的时间)更快

If you want to count all files in the directory – including files in subdirectories, the most pythonic way is:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

We use sum that is faster than explicitly adding the file counts (timings pending)


回答 5

import os
print len(os.listdir(os.getcwd()))
import os
print len(os.listdir(os.getcwd()))

回答 6

def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count
def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count

回答 7

我很惊讶没有人提到os.scandir

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

I am surprised that nobody mentioned os.scandir:

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

回答 8

这可以使用os.listdir并适用于任何目录:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

可以使用生成器简化此过程,并通过以下方法使速度更快一点:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

This uses os.listdir and works for any directory:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

this can be simplified with a generator and made a little bit faster with:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

回答 9

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

取自这篇文章

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

Taked from this post


回答 10

import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049
import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049

回答 11

卢克的代码重新格式化。

import os

print len(os.walk('/usr/lib').next()[2])

Luke’s code reformat.

import os

print len(os.walk('/usr/lib').next()[2])

回答 12

这是我发现有用的简单单行命令:

print int(os.popen("ls | wc -l").read())

Here is a simple one-line command that I found useful:

print int(os.popen("ls | wc -l").read())

回答 13

虽然我同意@DanielStutzbach提供的答案:os.listdir()将比使用效率更高glob.glob

但是,如果要计算文件夹中特定文件的数量,则需要额外的精度len(glob.glob())。例如,如果您要计算要使用的文件夹中的所有pdf,请执行以下操作:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

While I agree with the answer provided by @DanielStutzbach: os.listdir() will be slightly more efficient than using glob.glob.

However, an extra precision, if you do want to count the number of specific files in folder, you want to use len(glob.glob()). For instance if you were to count all the pdfs in a folder you want to use:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

回答 14

很简单:

print(len([iq for iq in os.scandir('PATH')]))

它只计算目录中的文件数,我使用列表推导技术遍历特定目录,以返回所有文件。“ len(返回列表)”返回文件数。

It is simple:

print(len([iq for iq in os.scandir('PATH')]))

it simply counts number of files in directory , i have used list comprehension technique to iterate through specific directory returning all files in return . “len(returned list)” returns number of files.


回答 15

import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)
import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)

回答 16

如果您将使用操作系统的标准外壳,则可以更快地获得结果,而不是使用纯pythonic方式。

Windows示例:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

If you’ll be using the standard shell of the operating system, you can get the result much faster rather than using pure pythonic way.

Example for Windows:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

回答 17

我找到了另一个可能是正确的答案。

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

I found another answer which may be correct as accepted answer.

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

回答 18

我用过glob.iglob类似的目录结构

data
└───train
   └───subfolder1
   |      file111.png
   |      file112.png
   |      ...
   |
   └───subfolder2
          file121.png
          file122.png
          ...
└───test
       file221.png
       file222.png

以下两个选项均返回4(如预期,即不计算子文件夹本身

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

I used glob.iglob for a directory structure similar to

data
└───train
│   └───subfolder1
│   |   │   file111.png
│   |   │   file112.png
│   |   │   ...
│   |
│   └───subfolder2
│       │   file121.png
│       │   file122.png
│       │   ...
└───test
    │   file221.png
    │   file222.png

Both of the following options return 4 (as expected, i.e. does not count the subfolders themselves)

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

回答 19

我这样做了,这返回了文件夹(Attack_Data)中文件的数量。

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

i did this and this returned the number of files in the folder(Attack_Data)…this works fine.

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

获取目录中文件的过滤列表

问题:获取目录中文件的过滤列表

我正在尝试使用Python获取目录中的文件列表,但是我不想要所有文件的列表。

我本质上想要的是能够执行以下操作但使用Python而不执行ls的功能。

ls 145592*.jpg

如果没有内置方法,我目前正在考虑编写一个for循环以遍历an的结果。 os.listdir()并将所有匹配的文件附加到新列表中。

但是,该目录中有很多文件,因此我希望有一种更有效的方法(或内置方法)。

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following but using Python and not executing ls.

ls 145592*.jpg

If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.

However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).


回答 0

import glob

jpgFilenamesList = glob.glob('145592*.jpg')

See glob in python documenttion


回答 1

glob.glob()绝对是做到这一点的方式(根据Ignacio)。但是,如果您确实需要更复杂的匹配,则可以使用列表理解和来完成re.match(),例如:

files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg', f)]

更加灵活,但是您注意到效率更低。

glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:

files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg', f)]

More flexible, but as you note, less efficient.


回答 2

把事情简单化:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

我更喜欢这种形式的列表理解,因为它的英文读起来很好。

我将第四行读为:对于os.listdir中路径的每个fn,请仅提供与我包含的任何扩展名匹配的那些fn。

对于新手python程序员来说,可能很难真正习惯于使用列表推导进行过滤,并且对于非常大的数据集,它可能会有一些内存开销,但是对于列出目录和其他简单的字符串过滤任务,列表推导会导致更干净可记录的代码。

这种设计的唯一之处在于,它不能保护您避免犯错误,而不是传递字符串而不是列表。例如,如果您不小心将字符串转换为列表,并最终检查了字符串的所有字符,则可能最终会得到一系列误报。

但是,拥有一个易于解决的问题比解决一个难以理解的解决方案要好。

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I prefer this form of list comprehensions because it reads well in English.

I read the fourth line as: For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.

It may be hard for novice python programmers to really get used to using list comprehensions for filtering, and it can have some memory overhead for very large data sets, but for listing a directory and other simple string filtering tasks, list comprehensions lead to more clean documentable code.

The only thing about this design is that it doesn’t protect you against making the mistake of passing a string instead of a list. For example if you accidentally convert a string to a list and end up checking against all the characters of a string, you could end up getting a slew of false positives.

But it’s better to have a problem that’s easy to fix than a solution that’s hard to understand.


回答 3

另外的选择:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html


回答 4

过滤glob模块:

导入球

import glob

通配符:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

接头扩展.txt

files = glob.glob("/home/ach/*/*.txt")

一个字符

glob.glob("/home/ach/file?.txt")

编号范围

glob.glob("/home/ach/*[0-9]*")

字母范围

glob.glob("/home/ach/[a-c]*")

Filter with glob module:

Import glob

import glob

Wild Cards:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

Fiter extension .txt:

files = glob.glob("/home/ach/*/*.txt")

A single character

glob.glob("/home/ach/file?.txt")

Number Ranges

glob.glob("/home/ach/*[0-9]*")

Alphabet Ranges

glob.glob("/home/ach/[a-c]*")

回答 5

初步代码

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

解决方案1-使用“ glob”

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

解决方案2-使用“操作系统” +“ fnmatch”

版本2.1-在当前目录中查找

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

版本2.2-递归查找

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

结果

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

解决方案3使用“ pathlib”

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

笔记:

  1. 在Python 3.4上测试
  2. 仅在Python 3.4中添加了模块“ pathlib”
  3. Python 3.5添加了glob.glob https://docs.python.org/3.5/library/glob.html#glob.glob递归查找的功能 。由于我的机器安装了Python 3.4,因此尚未进行测试。

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 – use “glob”

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 – use “os” + “fnmatch”

Variant 2.1 – Lookup in current dir

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 – Lookup recursive

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 – use “pathlib”

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

Notes:

  1. Tested on the Python 3.4
  2. The module “pathlib” was added only in the Python 3.4
  3. The Python 3.5 added a feature for recursive lookup with glob.glob https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine is installed with Python 3.4, I have not tested that.

回答 6

使用os.walk递归列出您的文件

import os
root = "/home"
pattern = "145992"
alist_filter = ['jpg','bmp','png','gif'] 
path=os.path.join(root,"mydir_to_scan")
for r,d,f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            print os.path.join(root,file)

use os.walk to recursively list your files

import os
root = "/home"
pattern = "145992"
alist_filter = ['jpg','bmp','png','gif'] 
path=os.path.join(root,"mydir_to_scan")
for r,d,f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            print os.path.join(root,file)

回答 7

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

这将为您提供jpg文件及其完整路径的列表。您可以替换x[0]+"/"+ff的只是文件名。您也可以f.endswith(".jpg")用所需的任何字符串条件替换。

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

This will give you a list of jpg files with their full path. You can replace x[0]+"/"+f with f for just filenames. You can also replace f.endswith(".jpg") with whatever string condition you wish.


回答 8

您可能还需要更高级的方法(我已经实现并打包为findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print found_file

可以安装

pip install findtools

you might also like a more high-level approach (I have implemented and packaged as findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print found_file

can be installed with

pip install findtools

回答 9

“ path / to / images”中带有“ jpg”和“ png”扩展名的文件名:

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]

Filenames with “jpg” and “png” extensions in “path/to/images”:

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]

回答 10

您可以使用Python标准库3.4及更高版本中提供的pathlib

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]

You can use pathlib that is available in Python standard library 3.4 and above.

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]

回答 11

您可以定义模式并进行检查。在这里,我采用了开始和结束模式,并在文件名中查找它们。FILES包含目录中所有文件的列表。

import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r,d,FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        if PATTERN_START in FILE and PATTERN_END in FILE:
            print FILE

You can define pattern and check for it. Here I have taken both start and end pattern and looking for them in the filename. FILES contains the list of all the files in a directory.

import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r,d,FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        if PATTERN_START in FILE.startwith(PATTERN_START) and PATTERN_END in FILE.endswith(PATTERN_END):
            print FILE

回答 12

str.split()怎么样?没什么可导入的。

import os

image_names = [f for f in os.listdir(path) if len(f.split('.jpg')) == 2]

How about str.split()? Nothing to import.

import os

image_names = [f for f in os.listdir(path) if len(f.split('.jpg')) == 2]

回答 13

您可以使用subprocess.check_ouput()作为

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True) 

当然,引号之间的字符串可以是您要在shell中执行并存储输出的任何内容。

You can use subprocess.check_ouput() as

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True) 

Of course, the string between quotes can be anything you want to execute in the shell, and store the output.


如何使用glob()递归查找文件?

问题:如何使用glob()递归查找文件?

这就是我所拥有的:

glob(os.path.join('src','*.c'))

但我想搜索src的子文件夹。这样的事情会起作用:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

但这显然是有限且笨拙的。

This is what I have:

glob(os.path.join('src','*.c'))

but I want to search the subfolders of src. Something like this would work:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

But this is obviously limited and clunky.


回答 0

Python 3.5+

由于您使用的是新的python,因此应pathlib.Path.rglobpathlib模块中使用。

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

如果您不想使用pathlib,只需使用glob.glob,但不要忘记传递recursive关键字参数。

对于匹配文件以点(。)开头的情况;例如当前目录中的文件或基于Unix的系统上的隐藏文件,请使用以下os.walk解决方案。

较旧的Python版本

对于较旧的Python版本,可os.walk用于递归遍历目录并fnmatch.filter与简单表达式匹配:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

Python 3.5+

Since you’re on a new python, you should use pathlib.Path.rglob from the the pathlib module.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

If you don’t want to use pathlib, just use glob.glob, but don’t forget to pass in the recursive keyword parameter.

For cases where matching files beginning with a dot (.); like files in the current directory or hidden files on Unix based system, use the os.walk solution below.

Older Python versions

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

回答 1

与其他解决方案类似,但是使用fnmatch.fnmatch而不是glob,因为os.walk已经列出了文件名:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print 'Found C source:', filename

另外,使用生成器可以使您处理找到的每个文件,而不是查找所有文件然后进行处理。

Similar to other solutions, but using fnmatch.fnmatch instead of glob, since os.walk already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print 'Found C source:', filename

Also, using a generator alows you to process each file as it is found, instead of finding all the files and then processing them.


回答 2

我修改了glob模块,以支持**用于递归glob,例如:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

当您想为用户提供使用**语法的能力时很有用,因此仅os.walk()不够好。

I’ve modified the glob module to support ** for recursive globbing, e.g:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.


回答 3

从Python 3.4开始,可以使用新pathlib模块中支持通配符glob()Path类之一的方法。例如:**

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

更新: 从Python 3.5开始,glob.glob()

Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().


回答 4

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch为您提供与完全相同的模式glob,因此对于glob.glob非常紧密的语义而言,这确实是一个很好的替代。迭代的版本(例如生成器),用IOW代替glob.iglob,是微不足道的改编(只是yield中间结果,而不是extend最后返回单个结果列表)。

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch gives you exactly the same patterns as glob, so this is really an excellent replacement for glob.glob with very close semantics. An iterative version (e.g. a generator), IOW a replacement for glob.iglob, is a trivial adaptation (just yield the intermediate results as you go, instead of extending a single results list to return at the end).


回答 5

对于python> = 3.5,可以使用**recursive=True

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

演示版


如果是递归的True,则模式** 将匹配任何文件以及零个或多个directoriessubdirectories。如果模式后跟一个os.sep,则仅目录和subdirectories匹配项。

For python >= 3.5 you can use **, recursive=True :

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

Demo


If recursive is True, the pattern ** will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.


回答 6

您将要用来os.walk收集符合条件的文件名。例如:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

You’ll want to use os.walk to collect filenames that match your criteria. For example:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

回答 7

这是一个具有嵌套列表推导的解决方案,os.walk而不是简单的后缀匹配glob

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

可以将其压缩为单线:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

或概括为一个函数:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

如果您确实需要完整的glob样式模式,则可以遵循Alex和Bruno的示例并使用fnmatch

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

Here’s a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

It can be compressed to a one-liner:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

or generalized as a function:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

If you do need full glob style patterns, you can follow Alex’s and Bruno’s example and use fnmatch:

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

回答 8

最近,我不得不恢复扩展名为.jpg的图片。我运行了photorec并恢复了4579个目录,其中220万个文件具有多种扩展名。使用以下脚本,我能够在几分钟内选择50133个具有.jpg扩展名的文件:

#!/usr/binenv python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

Recently I had to recover my pictures with the extension .jpg. I ran photorec and recovered 4579 directories 2.2 million files within, having tremendous variety of extensions.With the script below I was able to select 50133 files havin .jpg extension within minutes:

#!/usr/binenv python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

回答 9

考虑一下pathlib.rglob()

这就好比调用Path.glob()"**/"在给定的相对图案前面加:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)

另请参阅@taleinat的相关文章和类似的文章其他地方。

Consider pathlib.rglob().

This is like calling Path.glob() with "**/" added in front of the given relative pattern:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)

See also @taleinat’s related post here and a similar post elsewhere.


回答 10

Johan和Bruno针对上述最低要求提供了出色的解决方案。我刚刚发布了实现了Ant FileSet和Globs的Formic,它可以处理这种情况以及更复杂的情况。您的要求的实现是:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name

Johan and Bruno provide excellent solutions on the minimal requirement as stated. I have just released Formic which implements Ant FileSet and Globs which can handle this and more complicated scenarios. An implementation of your requirement is:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name

回答 11

基于其他答案,这是我当前的工作实现,它在根目录中检索嵌套的xml文件:

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

我真的很喜欢python :)

based on other answers this is my current working implementation, which retrieves nested xml files in a root directory:

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

I’m really having fun with python :)


回答 12

仅使用glob模块执行此操作的另一种方法。只需在rglob方法中添加一个起始基本目录和一个匹配模式即可,它将返回匹配文件名的列表。

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    list = []
    list.extend(glob.glob(os.path.join(base,pattern)))
    dirs = _getDirs(base)
    if len(dirs):
        for d in dirs:
            list.extend(rglob(os.path.join(base,d), pattern))
    return list

Another way to do it using just the glob module. Just seed the rglob method with a starting base directory and a pattern to match and it will return a list of matching file names.

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    list = []
    list.extend(glob.glob(os.path.join(base,pattern)))
    dirs = _getDirs(base)
    if len(dirs):
        for d in dirs:
            list.extend(rglob(os.path.join(base,d), pattern))
    return list

回答 13

或具有列表理解:

 >>> base = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
            for base, _, files in os.walk(root) 
            for f in files if f.endswith(".jpg") ] 

Or with a list comprehension:

 >>> base = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
            for base, _, files in os.walk(root) 
            for f in files if f.endswith(".jpg") ] 

回答 14

刚做这个..它将以分层方式打印文件和目录

但是我没有用过fnmatch或walk

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

Just made this.. it will print files and directory in hierarchical way

But I didn’t used fnmatch or walk

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

回答 15

那使用fnmatch或正则表达式:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

That one uses fnmatch or regular expression:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

回答 16

除了建议的答案,您还可以通过一些懒惰的生成和列表理解魔术来做到这一点:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

除了适合一行并且避免在内存中使用不必要的列表之外,这还具有很好的副作用,即您可以以类似于**运算符的方式使用它,例如,可以使用os.path.join(root, 'some/path/*.c')它来获取所有.c文件。具有此结构的src子目录。

In addition to the suggested answers, you can do this with some lazy generation and list comprehension magic:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

Besides fitting in one line and avoiding unnecessary lists in memory, this also has the nice side effect, that you can use it in a way similar to the ** operator, e.g., you could use os.path.join(root, 'some/path/*.c') in order to get all .c files in all sub directories of src that have this structure.


回答 17

对于python 3.5及更高版本

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

您可能还需要

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'

For python 3.5 and later

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

further you might need

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'

回答 18

这是Python 2.7上的有效代码。作为我的devops工作的一部分,我需要编写一个脚本,该脚本会将标有live-appName.properties的配置文件移动到appName.properties。可能还有其他扩展文件,例如live-appName.xml。

以下是用于此目的的工作代码,该代码在给定目录(嵌套级别)中查找文件,然后将其重命名(移动)为所需的文件名

def flipProperties(searchDir):
   print "Flipping properties to point to live DB"
   for root, dirnames, filenames in os.walk(searchDir):
      for filename in fnmatch.filter(filenames, 'live-*.*'):
        targetFileName = os.path.join(root, filename.split("live-")[1])
        print "File "+ os.path.join(root, filename) + "will be moved to " + targetFileName
        shutil.move(os.path.join(root, filename), targetFileName)

从主脚本调用此函数

flipProperties(searchDir)

希望这可以帮助遇到类似问题的人。

This is a working code on Python 2.7. As part of my devops work, I was required to write a script which would move the config files marked with live-appName.properties to appName.properties. There could be other extension files as well like live-appName.xml.

Below is a working code for this, which finds the files in the given directories (nested level) and then renames (moves) it to the required filename

def flipProperties(searchDir):
   print "Flipping properties to point to live DB"
   for root, dirnames, filenames in os.walk(searchDir):
      for filename in fnmatch.filter(filenames, 'live-*.*'):
        targetFileName = os.path.join(root, filename.split("live-")[1])
        print "File "+ os.path.join(root, filename) + "will be moved to " + targetFileName
        shutil.move(os.path.join(root, filename), targetFileName)

This function is called from a main script

flipProperties(searchDir)

Hope this helps someone struggling with similar issues.


回答 19

Johan Dahlin答案的简化版本,不带fnmatch

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

Simplified version of Johan Dahlin’s answer, without fnmatch.

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

回答 20

这是我的使用列表推导的解决方案在目录和所有子目录中递归搜索多个文件扩展名的解决方案:

import os, glob

def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions 
    Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d+e for d in dirs for e in exts]    
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
    print f

Here is my solution using list comprehension to search for multiple file extensions recursively in a directory and all subdirectories:

import os, glob

def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions 
    Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d+e for d in dirs for e in exts]    
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
    print f

回答 21

import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)
import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)

回答 22

我修改了此发布中的最佳答案..并最近创建了此脚本,该脚本将遍历给定目录(searchdir)中的所有文件及其下的子目录…并打印文件名,rootdir,修改/创建日期和尺寸。

希望这对某人有帮助…他们可以遍历目录并获取fileinfo。

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

I modified the top answer in this posting.. and recently created this script which will loop through all files in a given directory (searchdir) and the sub-directories under it… and prints filename, rootdir, modified/creation date, and size.

Hope this helps someone… and they can walk the directory and get fileinfo.

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

回答 23

这是一个将模式与完整路径而不只是基本文件名匹配的解决方案。

它用于fnmatch.translate将glob样式的模式转换为正则表达式,然后将其与在遍历目录时发现的每个文件的完整路径进行匹配。

re.IGNORECASE是可选的,但在Windows上是理想的,因为文件系统本身不区分大小写。(我没有费心编译正则表达式,因为文档表明它应该在内部缓存。)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename

Here is a solution that will match the pattern against the full path and not just the base filename.

It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn’t bother compiling the regex because docs indicate it should be cached internally.)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename

回答 24

我需要一个解决方案的Python 2.x中,工程上大的目录。
我结束了这一点:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

请注意,如果ls找不到任何匹配文件,您可能需要一些异常处理。

I needed a solution for python 2.x that works fast on large directories.
I endet up with this:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

Note that you might need some exception handling in case ls doesn’t find any matching file.