Tag archive: filesystems

How to list only top level directories in Python?

Question: How to list only top level directories in Python?

I want to be able to list only the directories inside some folder. This means I don’t want filenames listed, nor do I want additional sub-folders.

Let’s see if an example helps. In the current directory we have:

>>> os.listdir(os.getcwd())
['cx_Oracle-doc', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'mod_python-wininst.log', 'NEWS.txt', 'pymssql-wininst.log', 'python.exe', 'pythonw.exe', 'README.txt', 'Removemod_python.exe', 'Removepymssql.exe', 'Scripts', 'tcl', 'Tools', 'w9xpopen.exe']

However, I don’t want filenames listed. Nor do I want sub-folders such as \Lib\curses. Essentially what I want works with the following:

>>> for root, dirnames, filenames in os.walk('.'):
...     print(dirnames)
...     break
...
['cx_Oracle-doc', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'Scripts', 'tcl', 'Tools']

However, I’m wondering if there’s a simpler way of achieving the same results. I get the impression that using os.walk only to return the top level is inefficient/too much.


Answer 0

Filter the result using os.path.isdir() (and use os.path.join() to get the real path):

>>> [ name for name in os.listdir(thedir) if os.path.isdir(os.path.join(thedir, name)) ]
['ctypes', 'distutils', 'encodings', 'lib-tk', 'config', 'idlelib', 'xml', 'bsddb', 'hotshot', 'logging', 'doc', 'test', 'compiler', 'curses', 'site-packages', 'email', 'sqlite3', 'lib-dynload', 'wsgiref', 'plat-linux2', 'plat-mac']
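
To see why the os.path.join() step matters, here is a minimal, self-contained sketch. The helper name top_level_dirs and the throwaway tree built in a temporary folder are assumptions made for illustration; they are not part of the answer above.

```python
import os
import tempfile

def top_level_dirs(thedir):
    """Return the sorted names of the directories directly inside thedir."""
    return sorted(
        name for name in os.listdir(thedir)
        # Join with thedir so the check does not depend on the current directory.
        if os.path.isdir(os.path.join(thedir, name))
    )

# Throwaway tree: two folders (one of them with a nested folder) and a file.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'Lib', 'curses'))
os.mkdir(os.path.join(demo_root, 'Doc'))
open(os.path.join(demo_root, 'README.txt'), 'w').close()

print(top_level_dirs(demo_root))  # ['Doc', 'Lib']
```

The nested Lib/curses folder and the README.txt file are both left out, which is what the question asks for.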

Answer 1

os.walk

Use os.walk with the next() built-in function:

next(os.walk('.'))[1]

For Python <=2.5 use:

os.walk('.').next()[1]

How this works

os.walk is a generator and calling next will get the first result in the form of a 3-tuple (dirpath, dirnames, filenames). Thus the [1] index returns only the dirnames from that tuple.
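
A small sketch of that 3-tuple, unpacked by name instead of indexed. The temporary folder and the names demo_root, dirpath, dirnames and filenames are illustrative assumptions.

```python
import os
import tempfile

# Throwaway tree: one folder with a nested folder, plus one file.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'Lib', 'curses'))
open(os.path.join(demo_root, 'NEWS.txt'), 'w').close()

# The first item yielded by os.walk() describes demo_root itself.
dirpath, dirnames, filenames = next(os.walk(demo_root))

print(dirpath == demo_root)  # True
print(dirnames)              # ['Lib']
print(filenames)             # ['NEWS.txt']
```

Because only the first item is requested, the generator never descends into Lib, so the nested curses folder is not visited.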


Answer 2

Filter the list using os.path.isdir to detect directories (on Python 3, wrap the call in list(), because filter() returns an iterator there):

list(filter(os.path.isdir, os.listdir(os.getcwd())))
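
One caveat is worth a short sketch: os.path.isdir receives bare names here, so they are resolved against the current working directory. That is fine for os.getcwd(), but it silently misses directories when you list any other folder. The temporary folder and the variable names below are assumptions for illustration.

```python
import os
import tempfile

# A folder that is not the current working directory, holding one
# subdirectory with a random (and therefore unique) name.
demo_root = tempfile.mkdtemp()
sub_name = os.path.basename(tempfile.mkdtemp(dir=demo_root))

names = os.listdir(demo_root)

# Bare names are checked relative to the current directory: nothing matches.
relative_hits = list(filter(os.path.isdir, names))

# Joining each name with its parent folder finds the subdirectory.
joined_hits = [n for n in names if os.path.isdir(os.path.join(demo_root, n))]

print(relative_hits)
print(joined_hits == [sub_name])
```

So the one-liner is safe only for the current directory; for anything else, join the names first.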

Answer 3

directories=[d for d in os.listdir(os.getcwd()) if os.path.isdir(d)]

Answer 4

Note that, instead of doing os.listdir(os.getcwd()), it’s preferable to do os.listdir(os.path.curdir). One less function call, and it’s as portable.

So, to complete the answer, to get a list of directories in a folder:

def listdirs(folder):
    return [d for d in os.listdir(folder) if os.path.isdir(os.path.join(folder, d))]

If you prefer full pathnames, then use this function:

def listdirs(folder):
    return [
        d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
        if os.path.isdir(d)
    ]
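
For a side-by-side look at the two variants, here is a small usage sketch. The functions are renamed listdir_names and listdir_paths only so that both can live in one file; those names and the temporary demo folder are assumptions, not part of the answer.

```python
import os
import tempfile

def listdir_names(folder):
    """Names of the directories directly inside folder."""
    return [d for d in os.listdir(folder)
            if os.path.isdir(os.path.join(folder, d))]

def listdir_paths(folder):
    """The same directories, each joined with folder."""
    return [
        d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
        if os.path.isdir(d)
    ]

# Throwaway tree: one folder and one file.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, 'include'))
open(os.path.join(demo_root, 'NEWS.txt'), 'w').close()

print(listdir_names(demo_root))  # ['include']
print(listdir_paths(demo_root) == [os.path.join(demo_root, 'include')])  # True
```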

Answer 5

This seems to work too (at least on Linux):

import glob, os
glob.glob('*' + os.path.sep)
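
A sketch of the same trick for an arbitrary folder, assuming a throwaway tree in a temporary directory (demo_root, pattern and names are illustrative). Two things to expect: every match keeps its trailing separator, and a plain '*' does not match names that start with a dot.

```python
import glob
import os
import tempfile

# Throwaway tree: a regular folder, a hidden folder, and a file.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, 'Scripts'))
os.mkdir(os.path.join(demo_root, '.hidden'))
open(os.path.join(demo_root, 'python.exe'), 'w').close()

# Joining with '' makes the pattern end in a path separator,
# so only directories can match.
pattern = os.path.join(demo_root, '*', '')
matches = glob.glob(pattern)

# Drop the trailing separator, then keep only the last path component.
names = sorted(os.path.basename(os.path.dirname(m)) for m in matches)
print(names)  # ['Scripts']
```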

Answer 6

Just to add that using os.listdir() does not “take a lot of processing vs very simple os.walk().next()[1]”. This is because os.walk() uses os.listdir() internally. In fact if you test them together:

>>> import timeit
>>> timeit.timeit("os.walk('.').next()[1]", "import os", number=10000)
1.1215229034423828
>>> timeit.timeit("[ name for name in os.listdir('.') if os.path.isdir(os.path.join('.', name)) ]", "import os", number=10000)
1.0592019557952881

The filtering of os.listdir() is very slightly faster.
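
The session above is Python 2. A rough Python 3 equivalent is sketched below, with next(os.walk('.')) replacing the .next() method and a smaller, arbitrarily chosen repeat count. Timings depend entirely on the machine and on the contents of the current directory, so no numbers are shown. Note also that newer Python 3 releases implement os.walk() on top of os.scandir() rather than os.listdir(), so the gap may look different from the figures quoted above.

```python
import timeit

REPEATS = 200  # arbitrary; raise it for more stable numbers

walk_time = timeit.timeit(
    "next(os.walk('.'))[1]",
    "import os",
    number=REPEATS,
)
listdir_time = timeit.timeit(
    "[name for name in os.listdir('.') if os.path.isdir(os.path.join('.', name))]",
    "import os",
    number=REPEATS,
)

print('os.walk:    %.4f s' % walk_time)
print('os.listdir: %.4f s' % listdir_time)
```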


Answer 7

A much simpler and more elegant way is to use this:

 import os
 dir_list = next(os.walk('.'))[1]
 print(dir_list)

Run this script in the folder whose subfolder names you want. It gives you only the names of the immediate subfolders, without their full paths.


Answer 8

Using a list comprehension:

[a for a in os.listdir() if os.path.isdir(a)]

I think it is the simplest way.


Answer 9

Being a newbie here I can’t yet comment directly, but here is a small correction I’d like to add to the following part of ΤΖΩΤΖΙΟΥ’s answer:

If you prefer full pathnames, then use this function:

def listdirs(folder):
    return [
        d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
        if os.path.isdir(d)
    ]

For those still on Python < 2.4: the inner construct needs to be a list comprehension instead of a generator expression (generator expressions were only added in Python 2.4), and therefore should read like this:

def listdirs(folder):
    return [
        d for d in [os.path.join(folder, d1) for d1 in os.listdir(folder)]
        if os.path.isdir(d)
    ]

Otherwise one gets a syntax error.


Answer 10

[x for x in os.listdir(somedir) if os.path.isdir(os.path.join(somedir, x))]

Answer 11

For a list of full path names I prefer this version to the other solutions here:

def listdirs(dir):
    return [os.path.join(dir, x) for x in os.listdir(dir)
            if os.path.isdir(os.path.join(dir, x))]

Answer 12

scanDir = "abc"
directories = [d for d in os.listdir(scanDir) if os.path.isdir(os.path.join(os.path.abspath(scanDir), d))]

Answer 13

A safer option that does not fail when there is no directory.

def listdirs(folder):
    if os.path.exists(folder):
        return [d for d in os.listdir(folder) if os.path.isdir(os.path.join(folder, d))]
    else:
        return []
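
Two gaps remain in the check above: os.path.exists() is also true for a regular file, and the folder can disappear between the check and the os.listdir() call. A possible variant, sketched here under the hypothetical name listdirs_or_empty, is to attempt the listing and catch OSError instead. This swaps in a different technique from the one in the answer.

```python
import os
import tempfile

def listdirs_or_empty(folder):
    """Directory names inside folder, or [] if folder cannot be listed."""
    try:
        names = os.listdir(folder)
    except OSError:
        # Missing folder, a regular file, or insufficient permissions.
        return []
    return [d for d in names if os.path.isdir(os.path.join(folder, d))]

# Throwaway tree: one subfolder and one file.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, 'sub'))
a_file = os.path.join(demo_root, 'file.txt')
open(a_file, 'w').close()

print(listdirs_or_empty(demo_root))                            # ['sub']
print(listdirs_or_empty(os.path.join(demo_root, 'missing')))   # []
print(listdirs_or_empty(a_file))                               # []
```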

Answer 14

Like so?

>>> [path for path in os.listdir(os.getcwd()) if os.path.isdir(path)]

Answer 15

Python 3.4 introduced the pathlib module into the standard library, which provides an object-oriented approach to handling filesystem paths:

from pathlib import Path

p = Path('./')
[f for f in p.iterdir() if f.is_dir()]
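
The comprehension above yields Path objects. If plain names are wanted, as in the question, each entry's .name attribute can be used. A minimal sketch, assuming a throwaway tree in a temporary folder (demo_root and dir_names are illustrative names):

```python
import tempfile
from pathlib import Path

# Throwaway tree: two folders and one file.
demo_root = Path(tempfile.mkdtemp())
(demo_root / 'tcl').mkdir()
(demo_root / 'Tools').mkdir()
(demo_root / 'LICENSE.txt').touch()

# iterdir() yields only the immediate children; keep the directories.
dir_names = sorted(child.name for child in demo_root.iterdir() if child.is_dir())
print(dir_names)  # ['Tools', 'tcl']
```

iterdir() makes no promise about ordering, hence the sorted() call; uppercase letters sort before lowercase ones, which is why 'Tools' comes first.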

Answer 16

This will exclude the files in the root itself and traverse one level of subfolders, collecting the files found there:

import os

def list_files(dir):
    found = []
    filterstr = ' '
    for root, dirs, files in os.walk(dir, topdown=True):
        if root == dir:
            # Skip the files in the top-level folder itself.
            continue
        if filterstr in root:
            # Skip anything nested below the subfolder handled last.
            # (This is a plain substring test, so a sibling such as
            # "a/bc" is also skipped right after "a/b".)
            continue
        filterstr = root
        for name in files:
            found.append(os.path.join(root, name))

    return found

Check whether a path is valid in Python without creating a file at the path’s target

Question: Check whether a path is valid in Python without creating a file at the path’s target

I have a path (including directory and file name).
I need to test if the file-name is valid, e.g. if the file-system will allow me to create a file with such a name.
The file-name has some unicode characters in it.

It’s safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more generally applicable, and apparently I went too far).

I very much do not want to have to escape anything unless I have to.

I’d post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.


Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such, this (from here):

try:
    open(filename, 'w')
except OSError:
    # handle error here
    pass

is not acceptable, because it will overwrite the existing file, which I do not want to touch (if it’s there), or create said file if it’s not.

I know I can do:

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here
        pass

But that will create the file at the filePath, which I would then have to os.unlink.

In the end, it seems like it’s spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.


As an aside, I need this to run on (at least) Windows and MacOS, so I’d like to avoid platform-specific stuff.


Answer 0

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That’s just how we roll.

A Tale of Two Questions

The question of “How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?” is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here… or, well, anywhere that I could grep.

vikki’s answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (…and then failing to reliably close) file handles.
  • Needlessly writing (…and then failing to reliably close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We’re gonna fix all that.

Question #0: What’s Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by “pathname validity.” What defines validity, exactly?

By “pathname validity,” we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By “root filesystem,” we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of “syntactic correctness,” in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That’s it.
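
As a rough illustration of those two ext4 rules taken on their own (no filesystem access at all), the check might be sketched as below. The function name, the NAME_MAX_BYTES constant and the choice to measure components as UTF-8 bytes are assumptions made for this sketch; the answer's real validation asks the operating system instead.

```python
NAME_MAX_BYTES = 255  # the per-component ext4 limit quoted above

def looks_syntactically_valid(pathname):
    """Apply only the two ext4 rules: no null bytes, no overlong components."""
    if not pathname or '\x00' in pathname:
        return False
    for component in pathname.split('/'):
        # The limit is in bytes, not characters (assumption: UTF-8 encoding).
        if len(component.encode('utf-8')) > NAME_MAX_BYTES:
            return False
    return True

print(looks_syntactically_valid('/bergtatt/ind/i/fjeldkamrene'))  # True
print(looks_syntactically_valid('a' * 256))                       # False
print(looks_syntactically_valid('\x00'))                          # False
# 128 two-byte characters make 256 bytes, which is one byte too many.
print(looks_syntactically_valid('ö' * 128))                       # False
```

The last case is the reason the limit has to be checked in bytes: a name that looks short enough in characters can still be too long once encoded.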

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I’m in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn’t. Fortunately, unrolling your own ad-hoc solution isn’t that gut-wrenching…

O.K., it actually is. It’s hairy; it’s nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin’.

We’ll soon descend into the radioactive abyss of low-level code. But first, let’s talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
      • For pathnames containing null bytes (i.e., '\x00'), instances of TypeError (newer Python 3 releases raise ValueError instead).
      • For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
        • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as “selective interpretation” of the POSIX standard.)
        • Under all other OSes, errno.ENAMETOOLONG.
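
These cases are easy to probe on your own machine. The helper below, lstat_failure (a hypothetical name), simply reports which exception os.lstat() raises. The results are platform- and version-dependent, so treat the output as something to inspect rather than a fixed expectation.

```python
import errno
import os
import tempfile

def lstat_failure(pathname):
    """Describe how os.lstat() fails for pathname, or return None on success."""
    try:
        os.lstat(pathname)
    except OSError as exc:
        # Translate the numeric code into its symbolic name where possible.
        return 'OSError', errno.errorcode.get(exc.errno, exc.errno)
    except (TypeError, ValueError) as exc:
        # Raised for embedded null bytes; which one depends on the Python version.
        return type(exc).__name__, None
    return None

existing_dir = tempfile.mkdtemp()

print(lstat_failure(existing_dir))                                   # exists
print(lstat_failure(os.path.join(existing_dir, 'missing', 'name')))  # no such directory
print(lstat_failure(os.path.join(existing_dir, 'a' * 256)))          # overlong component
print(lstat_failure('\x00'))                                         # null byte
```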

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn’t modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
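
Steps 1 and 2.1 alone can be sketched in a few lines. The function name probe_paths and its defaults (a POSIX-style '/' root and separator) are assumptions for illustration; the stat call of step 2.2 is deliberately left out.

```python
def probe_paths(pathname, root_dirname='/', sep='/'):
    """Split pathname into components and re-root each one under root_dirname."""
    return [root_dirname + part for part in pathname.split(sep)]

print(probe_paths('/troldskog/faren/vild'))
# ['/', '/troldskog', '/faren', '/vild']
```

Each resulting pathname sits directly in the root directory, so passing it to os.stat() or os.lstat() can never fail merely because a parent directory is missing.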

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn’t that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS or Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that’s stale, slow, or inaccessible, you’ve got larger problems than pathname validation.)

Lost? Great. Let’s begin. (Python 3 assumed. See “What Is Fragile Hope for 300, leycec?”)

import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" or "ValueError" exception was raised, it almost
    # certainly has the error message "embedded NUL character" or "embedded
    # null byte" indicating an invalid pathname.
    except (TypeError, ValueError):
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don’t squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one’s surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn’t Python’s fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Account Control (UAC), the ever-inimical Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a “Virtual Store” in that user’s profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It’s time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let’s leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.


Answer 1

import os

if os.path.exists(filePath):
    pass  # the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
    pass  # the file does not exist, but write privileges are given
else:
    pass  # cannot write there

Note that os.path.exists can return False for more reasons than the file simply not being there, so you might have to do finer-grained tests, such as checking whether the containing directory exists.


After my discussion with the OP, it turned out that the main problem seems to be that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed, but the OP wants to maintain as much human readability as the filesystem allows.

Sadly, I do not know of any good solution for this. However, Cecil Curry’s answer takes a closer look at detecting the problem.


Answer 2

With Python 3, how about:

try:
    with open(filename, 'x') as tempfile:  # OSError if file exists or is invalid
        pass
except OSError:
    pass  # handle error here

With the ‘x’ mode we also don’t have to worry about race conditions. See the documentation for open().

Now, this WILL create a very short-lived temporary file if it does not exist already, unless the name is invalid. If you can live with that, it simplifies things a lot. Note that the snippet above leaves the new (empty) file in place, so remove it afterwards if it was only meant as a probe.
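
As a rough sketch of how the exclusive-creation probe could be wrapped up, assuming it is acceptable to create the file and delete it again straight away: the helper name can_create is hypothetical and not part of the original answer.

```python
import os
import tempfile


def can_create(filename):
    """Return True if `filename` does not exist yet and can be created.

    Hypothetical helper: the file is created exclusively and removed again.
    """
    try:
        # 'x' raises FileExistsError (a subclass of OSError) if the file exists.
        with open(filename, 'x'):
            pass
    except (OSError, ValueError):
        # ValueError covers names containing an embedded null byte.
        return False
    os.remove(filename)  # clean up the probe file
    return True


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, 'new.txt')
        print(can_create(target))      # the probe succeeds in a fresh directory
        print(os.path.exists(target))  # and leaves nothing behind
```

There is still a window between the probe and any later real creation, so this only tells you what was possible at the moment of the call.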


Answer 3

open(filename,'r')   #2nd argument is r and not w

This opens the file, or raises an error if it doesn’t exist. If there is an error, you can then try to open the path for writing; if that fails too, you get a second error:

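def exists_or_writable(filename):  # hypothetical wrapper so the returns are valid
    try:
        open(filename, 'r').close()
        return True
    except IOError:
        try:
            open(filename, 'w').close()  # note: this creates the file
            return True
        except IOError:
            return False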

Also have a look here about permissions on Windows.


Answer 4

Try os.path.exists: it checks the path and returns True if it exists and False if not.


How can I search sub-folders using the glob.glob module?

Question: How can I search sub-folders using the glob.glob module?

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?


Answer 0

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I’d use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This will walk your directories recursively and return the full pathnames of all matching .txt files (absolute here, because path is absolute). In this specific case fnmatch.filter() may be overkill; you could also use a .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

Answer 1

To find files in immediate subdirectories:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverses all subdirectories, you can use ** and pass recursive=True (available since Python 3.5):

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).
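
To make the difference between the two pathlib calls concrete, here is a small sketch that builds a throw-away tree in a temporary directory. The file names and the helper relative_names are made up for illustration.

```python
import tempfile
from pathlib import Path


def relative_names(paths, base):
    """Return sorted POSIX-style paths relative to `base`."""
    return sorted(p.relative_to(base).as_posix() for p in paths)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / 'sub' / 'deep').mkdir(parents=True)
        for rel in ('top.txt', 'sub/mid.txt', 'sub/deep/low.txt'):
            (base / rel).touch()

        # Immediate subdirectories only: just sub/mid.txt
        print(relative_names(base.glob('*/*.txt'), base))
        # base itself and everything below it: all three files
        print(relative_names(base.rglob('*.txt'), base))
```

Because both calls return lazy iterators, wrapping them in sorted() (or list()) is what actually walks the directory tree.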


Answer 2

There’s a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt'): matches all files ending in ‘.txt’ in the current directory
  2. glob.glob('*/*.txt'): matches all files ending in ‘.txt’ in the immediate subdirectories only, but not in the current directory
  3. glob.glob('**/*.txt'): same as 2 (without recursive=True, ** behaves like *)
  4. glob.glob('*.txt',recursive=True): same as 1
  5. glob.glob('*/*.txt',recursive=True): same as 2
  6. glob.glob('**/*.txt',recursive=True): matches all files ending in ‘.txt’ in the current directory and in all subdirectories

So it’s best to always specify recursive=True.
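
The behaviour of these patterns is easy to check against a small throw-away tree. The sketch below is only an illustration: the helpers glob_in and make_tree and the file names are invented for the example, and the patterns are run relative to a temporary directory rather than the current one.

```python
import glob
import os
import tempfile


def glob_in(directory, pattern, recursive=False):
    """Run glob inside `directory`; return sorted, '/'-separated relative matches."""
    full = os.path.join(glob.escape(directory), pattern)
    matches = glob.glob(full, recursive=recursive)
    return sorted(os.path.relpath(m, directory).replace(os.sep, '/')
                  for m in matches)


def make_tree(root):
    """Create top.txt, sub/mid.txt and sub/deep/low.txt under `root`."""
    os.makedirs(os.path.join(root, 'sub', 'deep'))
    for rel in ('top.txt', 'sub/mid.txt', 'sub/deep/low.txt'):
        with open(os.path.join(root, *rel.split('/')), 'w'):
            pass


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        make_tree(root)
        for pattern, recursive in [('*.txt', False), ('*/*.txt', False),
                                   ('**/*.txt', False), ('**/*.txt', True)]:
            print(pattern, recursive, glob_in(root, pattern, recursive))
```

Only the last combination reaches sub/deep/low.txt; the two middle ones stop at the immediate subdirectory.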


Answer 3

The glob2 package supports wildcards and is reasonably fast:

import timeit

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.


Answer 4

You can use Formic with Python 2.6

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure – I am the author of this package.


Answer 5

Here is an adapted version that enables glob.glob-like functionality without using glob2.

import fnmatch
import os

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # Match the pattern against the whole path, not just the filename.
            if fnmatch.fnmatch(full_path, pattern):
                matches.append(full_path)
    return matches

So if you have the following dir structure

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

In short, this applies the fnmatch pattern to the whole path rather than to the filename only.
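
One consequence of matching against the whole path is worth seeing in isolation: in fnmatch, * is not special-cased for path separators, so it also matches across directory boundaries. The sketch below uses made-up paths (the backup directory is not part of the tree shown above) and fnmatchcase() to keep the result independent of the platform's case rules.

```python
import fnmatch

paths = [
    'tests/files/a0/a0.yaml',
    'tests/files/a0/b0/b0.yaml',
    'tests/files/a0/b0/backup/b1.yaml',  # hypothetical extra level below b0
]

# '*' may swallow '/' as well, so 'b*.yaml' also matches 'backup/b1.yaml'.
matched = [p for p in paths if fnmatch.fnmatchcase(p, '**/b0/b*.yaml')]
print(matched)
```

So a pattern that looks like it is limited to files directly inside b0 will also pick up files in deeper directories whose path happens to fit.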


Answer 6

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')

This doesn’t work in all cases; use glob2 instead:

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')

Answer 7

If you can install the glob2 package…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

All filenames and folders:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  

Answer 8

If you’re running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

Answer 9

As pointed out by Martijn, glob can only do this through the ** operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazily evaluated iterator that behaves similarly:

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).
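
The single-pass behaviour can be demonstrated on a temporary directory. This is only a sketch: the wrapper iter_txt_files is a hypothetical name for the expression above, and glob.escape() is added so that unusual characters in directory names are not treated as wildcards.

```python
import glob
import itertools
import os
import tempfile


def iter_txt_files(top):
    """Lazily yield every *.txt path at or below `top` (one glob per directory)."""
    return itertools.chain.from_iterable(
        glob.iglob(os.path.join(glob.escape(root), '*.txt'))
        for root, dirs, files in os.walk(top))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, 'sub'))
        for name in ('a.txt', os.path.join('sub', 'b.txt')):
            open(os.path.join(top, name), 'w').close()

        configfiles = iter_txt_files(top)
        first_pass = list(configfiles)   # consumes the iterator
        second_pass = list(configfiles)  # nothing left to yield
        print(len(first_pass), len(second_pass))
```

The second list() call comes back empty, which is why the result has to be materialised once if it is needed more than once.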


Answer 10

The rglob method recurses without a depth limit, all the way down to the deepest sub-level of your directory structure. If you only want to go one level deep, do not use it.

I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.

The rglob function recently produced a 100x increase in speed for a data processing algorithm that had been using the folder structure as a fixed assumption for the order of data reading. With rglob we were able to do a single scan through all files at or below a specified parent directory, save their names to a list (over a million files), and then use that list to determine which files we needed to open at any later point, based on the file naming conventions alone rather than on which folder they were in.


Get a filtered list of files in a directory

Question: Get a filtered list of files in a directory

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following but using Python and not executing ls.

ls 145592*.jpg

If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.

However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).


Answer 0

import glob

jpgFilenamesList = glob.glob('145592*.jpg')

See glob in the Python documentation.


Answer 1

glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:

import os, re
files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg', f)]

More flexible, but as you note, less efficient.
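
One detail worth checking with the regular-expression approach: re.match() only anchors at the start of the name, so text after .jpg is still accepted. The sketch below uses made-up file names (not from the original answer) to show the difference from re.fullmatch().

```python
import re

names = ['145592_a.jpg', '145592_a.jpg.bak', 'x145592.jpg', 'cat.jpg']
pattern = re.compile(r'[0-9]+.*\.jpg')

# match() anchors at the start only, so the '.bak' file slips through.
prefix_only = [n for n in names if pattern.match(n)]
# fullmatch() requires the pattern to cover the whole name.
whole_name = [n for n in names if pattern.fullmatch(n)]

print(prefix_only)
print(whole_name)
```

Appending $ to the pattern has much the same effect as fullmatch() if you prefer to keep re.match().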


Answer 2

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I prefer this form of list comprehensions because it reads well in English.

I read the fourth line as: For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.

It may be hard for novice Python programmers to get used to list comprehensions for filtering, and they can have some memory overhead for very large data sets, but for listing a directory and other simple string filtering tasks, list comprehensions lead to cleaner, more documentable code.

The one catch with this design is that it doesn’t protect you against the mistake of passing a string instead of a list. For example, if included_extensions is accidentally a single string, you end up checking against each of its characters and could get a slew of false positives.

But it’s better to have a problem that’s easy to fix than a solution that’s hard to understand.
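
A possible variation, sketched here under a few assumptions of my own: str.endswith() accepts a tuple of suffixes, so the any() loop can be folded into one call, and a small isinstance check guards against the string-instead-of-list mistake. Unlike the code above, this version also requires the dot before the extension and ignores case. The helper name filter_by_extension is hypothetical.

```python
def filter_by_extension(names, extensions):
    """Keep the names ending in one of `extensions` (given without the dot)."""
    if isinstance(extensions, str):
        # A bare string would otherwise be iterated character by character.
        extensions = [extensions]
    suffixes = tuple('.' + ext.lower() for ext in extensions)
    # str.endswith() accepts a tuple of candidate suffixes.
    return [name for name in names if name.lower().endswith(suffixes)]


names = ['a.jpg', 'b.PNG', 'notes.txt', 'gif', 'c.jpeg']
print(filter_by_extension(names, ['jpg', 'jpeg', 'png', 'gif']))
print(filter_by_extension(names, 'jpg'))
```

In real use, names would come from os.listdir(relevant_path) as in the original snippet.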


Answer 3

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html


Answer 4

Filter with glob module:

Import glob

import glob

Wild Cards:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

Filter by extension .txt:

files = glob.glob("/home/ach/*/*.txt")

A single character

glob.glob("/home/ach/file?.txt")

Number Ranges

glob.glob("/home/ach/*[0-9]*")

Alphabet Ranges

glob.glob("/home/ach/[a-c]*")
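
The wildcard forms above can be tried without touching the filesystem, because glob matches file names with the same fnmatch-style rules. The sketch below applies fnmatch.filter() to a made-up list of names; only the patterns are taken from the examples above.

```python
import fnmatch

names = ['file1.txt', 'file12.txt', 'fileA.txt', 'alpha', 'delta', 'ks_4_0']

one_char = fnmatch.filter(names, 'file?.txt')  # '?' matches exactly one character
has_digit = fnmatch.filter(names, '*[0-9]*')   # at least one digit anywhere
a_to_c = fnmatch.filter(names, '[a-c]*')       # first character in the range a-c

print(one_char)
print(has_digit)
print(a_to_c)
```

Note that file12.txt is not matched by file?.txt, since ? stands for a single character only.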

Answer 5

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 – use “glob”

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 – use “os” + “fnmatch”

Variant 2.1 – Lookup in current dir

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 – Lookup recursive

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 – use “pathlib”

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

Notes:

  1. Tested on Python 3.4
  2. The “pathlib” module was only added in Python 3.4
  3. Python 3.5 added recursive lookup to glob.glob: https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine has Python 3.4 installed, I have not tested that.

Answer 6

Use os.walk to recursively list your files:

import os
root = "/home"
pattern = "145992"
alist_filter = ['jpg','bmp','png','gif']
path=os.path.join(root,"mydir_to_scan")
for r,d,f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            # join with r (the directory being walked), not the top-level root
            print(os.path.join(r,file))

Answer 7

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

This will give you a list of jpg files with their full path. You can replace x[0]+"/"+f with f for just filenames. You can also replace f.endswith(".jpg") with whatever string condition you wish.


Answer 8

You might also like a more high-level approach (which I have implemented and packaged as findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print(found_file)

It can be installed with:

pip install findtools

Answer 9

Filenames with “jpg” and “png” extensions in “path/to/images”:

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]

Answer 10

You can use pathlib, which is available in the Python standard library from version 3.4 onward.

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]

Answer 11

You can define a pattern and check for it. Here I take a start pattern and an end pattern and look for both in the filename. FILES contains the list of all the files in a directory.

import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r, d, FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        # the name must start with PATTERN_START and end with PATTERN_END
        if FILE.startswith(PATTERN_START) and FILE.endswith(PATTERN_END):
            print(FILE)

Answer 12

How about str.split()? Nothing to import.

import os

image_names = [f for f in os.listdir(path) if len(f.split('.jpg')) == 2]
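
One caveat with the split-based test is that it accepts any name containing '.jpg' exactly once, wherever it appears, and it is case-sensitive. The comparison below is a sketch with made-up file names; has_extension is a hypothetical helper built on os.path.splitext, shown only as one possible stricter check.

```python
import os


def has_extension(filename, extensions):
    """True if the final extension of `filename`, lower-cased, is in `extensions`."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in extensions


# Made-up names covering the interesting cases.
names = ["a.jpg", "b.JPG", "c.jpg.bak", "d.jpg.jpg", "e.png"]

# The split-based check from the answer above.
split_based = [f for f in names if len(f.split('.jpg')) == 2]

# A check on the final extension only.
splitext_based = [f for f in names if has_extension(f, {".jpg"})]

print(split_based)
print(splitext_based)
```

Which behaviour is wanted depends on the directory contents; the split version is fine when every name has a single, lower-case extension.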

Answer 13

You can use subprocess.check_output() as follows:

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True) 

Of course, the string between quotes can be anything you want to execute in the shell, and store the output.


How to use glob() to find files recursively?

Question: How to use glob() to find files recursively?

This is what I have:

glob(os.path.join('src','*.c'))

but I want to search the subfolders of src. Something like this would work:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

But this is obviously limited and clunky.


Answer 0

Python 3.5+

Since you’re on a new Python, you should use pathlib.Path.rglob from the pathlib module.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

If you don’t want to use pathlib, just use glob.glob, but don’t forget to pass in the recursive keyword parameter.

To match files beginning with a dot (.), such as files in the current directory or hidden files on Unix-based systems, use the os.walk solution below.

Older Python versions

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

Answer 1

Similar to other solutions, but using fnmatch.fnmatch instead of glob, since os.walk already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print('Found C source: ' + filename)

Also, using a generator allows you to process each file as it is found, instead of finding all the files and then processing them.


Answer 2

I’ve modified the glob module to support ** for recursive globbing, e.g:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.


Answer 3

Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().


Answer 4

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch gives you exactly the same patterns as glob, so this is really an excellent replacement for glob.glob with very close semantics. An iterative version (e.g. a generator), IOW a replacement for glob.iglob, is a trivial adaptation (just yield the intermediate results as you go, instead of extending a single results list to return at the end).
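
The generator adaptation described above might look like the following sketch. The name recursive_iglob and the temporary directory tree are assumptions for illustration; only os.walk and fnmatch.filter come from the answer.

```python
import fnmatch
import os
import tempfile


def recursive_iglob(treeroot, pattern):
    """Yield matching paths one at a time instead of building a list."""
    for base, dirs, files in os.walk(treeroot):
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(base, name)


# Throwaway tree: main.c, a/util.c, a/b/deep.c and one non-matching file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("main.c",
                os.path.join("a", "util.c"),
                os.path.join("a", "b", "deep.c"),
                os.path.join("a", "readme.txt")):
        open(os.path.join(tmp, rel), "w").close()

    found = sorted(os.path.relpath(p, tmp) for p in recursive_iglob(tmp, "*.c"))
    print(found)
```

Because results are yielded as the walk proceeds, a caller can stop early without traversing the rest of the tree.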


Answer 5

For Python >= 3.5 you can use ** together with recursive=True:

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

If recursive is True, the pattern ** will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.
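
The effect of the recursive flag can be checked on a small temporary tree. This is a sketch under assumed file names (top.c, mid.c, low.c); without recursive=True, glob treats ** like a single *.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # path/top.c, path/sub/mid.c, path/sub/deeper/low.c
    os.makedirs(os.path.join(tmp, "path", "sub", "deeper"))
    for rel in ("top.c",
                os.path.join("sub", "mid.c"),
                os.path.join("sub", "deeper", "low.c")):
        open(os.path.join(tmp, "path", rel), "w").close()

    pattern = os.path.join(tmp, "path", "**", "*.c")

    # ** spans zero or more directory levels
    with_recursive = sorted(os.path.basename(p)
                            for p in glob.glob(pattern, recursive=True))
    # ** behaves like *, so exactly one directory level is matched
    without_recursive = sorted(os.path.basename(p) for p in glob.glob(pattern))

    print(with_recursive)
    print(without_recursive)
```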


Answer 6

You’ll want to use os.walk to collect filenames that match your criteria. For example:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

Answer 7

Here’s a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

It can be compressed to a one-liner:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

or generalized as a function:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

If you do need full glob style patterns, you can follow Alex’s and Bruno’s example and use fnmatch:

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

Answer 8

Recently I had to recover my pictures with the extension .jpg. I ran photorec and recovered 4579 directories containing 2.2 million files with a tremendous variety of extensions. With the script below I was able to select the 50133 files having a .jpg extension within minutes:

#!/usr/bin/env python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

Answer 9

Consider pathlib.Path.rglob().

This is like calling Path.glob() with "**/" added in front of the given relative pattern:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)
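
The equivalence stated above can be verified on a throwaway tree. The directory layout here (src/main.c, src/lib/util.c, src/lib/util.h) is an assumption made for the sketch.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "src"
    (src / "lib").mkdir(parents=True)
    (src / "main.c").touch()
    (src / "lib" / "util.c").touch()
    (src / "lib" / "util.h").touch()

    # rglob(pattern) versus glob("**/" + pattern)
    via_rglob = sorted(p.relative_to(src).as_posix() for p in src.rglob("*.c"))
    via_glob = sorted(p.relative_to(src).as_posix() for p in src.glob("**/*.c"))

    print(via_rglob)
    print(via_glob)
```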

See also @taleinat’s related post here and a similar post elsewhere.


Answer 10

Johan and Bruno provide excellent solutions on the minimal requirement as stated. I have just released Formic which implements Ant FileSet and Globs which can handle this and more complicated scenarios. An implementation of your requirement is:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name

Answer 11

Based on other answers, this is my current working implementation, which retrieves nested xml files in a root directory:

import glob
import os

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

I’m really having fun with python :)


Answer 12

Another way to do it using just the glob module. Just seed the rglob method with a starting base directory and a pattern to match and it will return a list of matching file names.

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    matches = []  # avoid shadowing the built-in list
    matches.extend(glob.glob(os.path.join(base, pattern)))
    # _getDirs already returns paths that include base, so recurse on them directly
    for d in _getDirs(base):
        matches.extend(rglob(d, pattern))
    return matches

Answer 13

Or with a list comprehension:

 >>> root = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
 ...            for base, _, files in os.walk(root) 
 ...            for f in files if f.endswith(".jpg") ] 

Answer 14

I just made this. It prints files and directories hierarchically.

I didn’t use fnmatch or os.walk, though.

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

Answer 15

This one accepts either an fnmatch pattern or a compiled regular expression:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

Answer 16

In addition to the suggested answers, you can do this with some lazy generation and list comprehension magic:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

Besides fitting in one line and avoiding unnecessary lists in memory, this also has the nice side effect, that you can use it in a way similar to the ** operator, e.g., you could use os.path.join(root, 'some/path/*.c') in order to get all .c files in all sub directories of src that have this structure.


Answer 17

For python 3.5 and later

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

Further, you might need:

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'

Answer 18

This is working code for Python 2.7. As part of my devops work, I was required to write a script that would move the config files marked with live-appName.properties to appName.properties. There could be files with other extensions as well, like live-appName.xml.

Below is working code for this, which finds the files in the given directory (at any nesting level) and then renames (moves) them to the required filename:

import fnmatch
import os
import shutil

def flipProperties(searchDir):
   print("Flipping properties to point to live DB")
   for root, dirnames, filenames in os.walk(searchDir):
      for filename in fnmatch.filter(filenames, 'live-*.*'):
        # strip the "live-" prefix to get the target name
        targetFileName = os.path.join(root, filename.split("live-")[1])
        print("File " + os.path.join(root, filename) + " will be moved to " + targetFileName)
        shutil.move(os.path.join(root, filename), targetFileName)

This function is called from a main script

flipProperties(searchDir)

Hope this helps someone struggling with similar issues.


Answer 19

Simplified version of Johan Dahlin’s answer, without fnmatch.

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

Answer 20

Here is my solution using list comprehension to search for multiple file extensions recursively in a directory and all subdirectories:

import os, glob

def _globrec(path, *exts):
    """ Glob recursively a directory and all subdirectories for multiple file extensions
    Note: On Windows, glob is case-insensitive, i.e. for '*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    # os.path.join supplies the separator, so the patterns need no leading backslash
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '*.jpg', '*.bmp', '*.png', '*.gif')
for f in my_pictures:
    print(f)

Answer 21

import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)

Answer 22

I modified the top answer in this posting and recently created this script, which loops through all files in a given directory (searchdir) and the sub-directories under it, and prints filename, rootdir, modified/creation date, and size.

Hope this helps someone walk a directory and get file info.

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

Answer 23

Here is a solution that will match the pattern against the full path and not just the base filename.

It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn’t bother compiling the regex because docs indicate it should be cached internally.)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename
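
To see what matching against a full path means in practice, the sketch below applies fnmatch.translate to a few made-up paths. The helper path_matches is hypothetical, and it uses re.match rather than re.search so that the pattern is anchored at the start of the path as well as the end; that is a deliberate difference from the function above.

```python
import fnmatch
import re


def path_matches(pattern, path, ignore_case=False):
    """Match a glob-style pattern against a whole path string (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.match(fnmatch.translate(pattern), path, flags) is not None


# In fnmatch, * also matches path separators, so it spans directory levels.
spans_directories = path_matches("src/*.c", "src/a/b/deep.c")
wrong_extension = path_matches("*.c", "src/main.h")
case_sensitive = path_matches("*.JPG", "img/photo.jpg")
case_insensitive = path_matches("*.JPG", "img/photo.jpg", ignore_case=True)

print(spans_directories, wrong_extension, case_sensitive, case_insensitive)
```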

Answer 24

I needed a solution for Python 2.x that works fast on large directories.
I ended up with this:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

Note that you might need some exception handling in case ls doesn’t find any matching file.


How do I copy a file in Python?

Question: How do I copy a file in Python?

How do I copy a file in Python?

I couldn’t find anything under os.


Answer 0

shutil has many methods you can use. One of which is:

from shutil import copyfile
copyfile(src, dst)
  • Copy the contents of the file named src to a file named dst.
  • The destination location must be writable; otherwise, an IOError exception will be raised.
  • If dst already exists, it will be replaced.
  • Special files such as character or block devices and pipes cannot be copied with this function.
  • With copy, src and dst are path names given as strings.

If you use os.path operations, use copy rather than copyfile. copyfile will only accept strings.


Answer 1

┌──────────────────┬────────┬───────────┬───────┬────────────────┐
│     Function     │ Copies │   Copies  │Can use│   Destination  │
│                  │metadata│permissions│buffer │may be directory│
├──────────────────┼────────┼───────────┼───────┼────────────────┤
│shutil.copy       │   No   │    Yes    │   No  │      Yes       │
│shutil.copyfile   │   No   │     No    │   No  │       No       │
│shutil.copy2      │  Yes   │    Yes    │   No  │      Yes       │
│shutil.copyfileobj│   No   │     No    │  Yes  │       No       │
└──────────────────┴────────┴───────────┴───────┴────────────────┘

Answer 2

copy2(src,dst) is often more useful than copyfile(src,dst) because:

  • it allows dst to be a directory (instead of the complete target filename), in which case the basename of src is used for creating the new file;
  • it preserves the original modification and access info (mtime and atime) in the file metadata (however, this comes with a slight overhead).

Here is a short example:

import shutil
shutil.copy2('/src/dir/file.ext', '/dst/dir/newname.ext') # complete target filename given
shutil.copy2('/src/file.ext', '/dst/dir') # target filename is /dst/dir/file.ext
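
Both points can be checked with temporary files. This is a sketch only: the file names and the fixed timestamp 1000000000 are arbitrary assumptions, and it relies on shutil.copy2 returning the destination path, which it does on Python 3.3 and later.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.ext")
    with open(src, "w") as fh:
        fh.write("payload")

    # Give the source an old, recognisable access/modification time.
    old_time = 1000000000
    os.utime(src, (old_time, old_time))

    dst_dir = os.path.join(tmp, "dst")
    os.mkdir(dst_dir)

    # copy2 accepts a directory target and keeps the timestamps
    copied2 = shutil.copy2(src, dst_dir)

    # copyfile needs a full target file name and copies the contents only
    plain = os.path.join(tmp, "plain.ext")
    shutil.copyfile(src, plain)

    copy2_name = os.path.basename(copied2)
    copy2_mtime = int(os.path.getmtime(copied2))
    copyfile_mtime = int(os.path.getmtime(plain))

    print(copy2_name, copy2_mtime, copyfile_mtime)
```

The copy2 result carries the old modification time, while the copyfile result carries the time the copy was made.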

Answer 3

You can use one of the copy functions from the shutil package:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Function              preserves     supports          accepts     copies other
                      permissions   directory dest.   file obj    metadata  
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
shutil.copy              ✔             ✔                 ☐           ☐
shutil.copy2             ✔             ✔                 ☐           ✔
shutil.copyfile          ☐             ☐                 ☐           ☐
shutil.copyfileobj       ☐             ☐                 ✔           ☐
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Example:

import shutil
shutil.copy('/etc/hostname', '/var/tmp/testhostname')
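
The "accepts file obj" column refers to shutil.copyfileobj, which works on open file-like objects rather than paths. A minimal sketch using in-memory streams (the data and the chunk size of 4 bytes are arbitrary choices for the example):

```python
import io
import shutil

# Any objects with read() and write() will do; BytesIO keeps the example self-contained.
source = io.BytesIO(b"hello world")
target = io.BytesIO()

# The third positional argument is the chunk size used while copying.
shutil.copyfileobj(source, target, 4)

copied = target.getvalue()
print(copied)
```

With real files, both would typically be opened in binary mode ('rb' and 'wb') inside with blocks so they are closed afterwards.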

Answer 4

In Python, you can copy files using the shutil, os, or subprocess module:


import os
import shutil
import subprocess

1) Copying files using shutil module

shutil.copyfile signature

shutil.copyfile(src_file, dest_file, *, follow_symlinks=True)

# example    
shutil.copyfile('source.txt', 'destination.txt')

shutil.copy signature

shutil.copy(src_file, dest_file, *, follow_symlinks=True)

# example
shutil.copy('source.txt', 'destination.txt')

shutil.copy2 signature

shutil.copy2(src_file, dest_file, *, follow_symlinks=True)

# example
shutil.copy2('source.txt', 'destination.txt')  

shutil.copyfileobj signature

shutil.copyfileobj(src_file_object, dest_file_object[, length])

# example
file_src = 'source.txt'
file_dest = 'destination.txt'

# open both files in binary mode; the with block closes them afterwards
with open(file_src, 'rb') as f_src, open(file_dest, 'wb') as f_dest:
    shutil.copyfileobj(f_src, f_dest)

2) Copying files using os module

os.popen signature

os.popen(cmd, mode='r', buffering=-1)

# example
# In Unix/Linux
os.popen('cp source.txt destination.txt') 

# In Windows
os.popen('copy source.txt destination.txt')

os.system signature

os.system(command)


# In Linux/Unix
os.system('cp source.txt destination.txt')  

# In Windows
os.system('copy source.txt destination.txt')

3) Copying files using subprocess module

subprocess.call signature

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)

# example (WARNING: setting `shell=True` might be a security risk)
# In Linux/Unix
status = subprocess.call('cp source.txt destination.txt', shell=True) 

# In Windows
status = subprocess.call('copy source.txt destination.txt', shell=True)

subprocess.check_output signature

subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False)

# example (WARNING: setting `shell=True` might be a security risk)
# check_output returns the command's output, not its exit status
# In Linux/Unix
output = subprocess.check_output('cp source.txt destination.txt', shell=True)

# In Windows
output = subprocess.check_output('copy source.txt destination.txt', shell=True)
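
The shell=True warning can be sidestepped by passing the command as a list, so file names are never parsed by a shell. This is a hedged sketch for Unix-like systems where a cp executable is on the PATH; the function name copy_with_cp is hypothetical and not part of the original answer.

```python
import shutil
import subprocess


def copy_with_cp(src, dst):
    """Copy src to dst by running the external cp command without a shell."""
    if shutil.which("cp") is None:
        raise RuntimeError("no cp executable found on PATH")
    # "--" marks the end of options, so names starting with "-" are handled safely.
    # check=True raises CalledProcessError if cp exits with a non-zero status.
    subprocess.run(["cp", "--", src, dst], check=True)
```

Because the arguments are passed as separate list items, a file name containing spaces needs no quoting.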


Answer 5

Copying a file is a relatively straightforward operation as shown by the examples below, but you should instead use the shutil stdlib module for that.

def copyfileobj_example(source, dest, buffer_size=1024*1024):
    """      
    Copy a file from source to dest. source and dest
    must be file-like objects, i.e. any object with a read or
    write method, like for example StringIO.
    """
    while True:
        copy_buffer = source.read(buffer_size)
        if not copy_buffer:
            break
        dest.write(copy_buffer)

If you want to copy by filename you could do something like this:

def copyfile_example(source, dest):
    # Beware, this example does not handle any edge cases!
    with open(source, 'rb') as src, open(dest, 'wb') as dst:
        copyfileobj_example(src, dst)
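
Because the loop only needs read and write methods, the same idea works on in-memory buffers. As a small sketch (the function name copy_in_memory and the tiny default buffer size are chosen here for illustration), the standard-library shutil.copyfileobj performs an equivalent chunked copy:

```python
import io
import shutil


def copy_in_memory(data, buffer_size=4):
    """Copy bytes between two in-memory file objects in small chunks."""
    source = io.BytesIO(data)
    dest = io.BytesIO()
    # The third argument is the chunk length, like buffer_size in the loop above.
    shutil.copyfileobj(source, dest, buffer_size)
    return dest.getvalue()
```

The small buffer size simply forces several read/write rounds; real code would keep a much larger value.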

Answer 6

Use the shutil module.

copyfile(src, dst)

Copy the contents of the file named src to a file named dst. The destination location must be writable; otherwise, an IOError exception will be raised (in Python 3, IOError is an alias of OSError). If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. src and dst are path names given as strings.

Take a look at filesys for all the file and directory handling functions available in standard Python modules.


Answer 7

Directory and File copy example – From Tim Golden’s Python Stuff:

http://timgolden.me.uk/python/win32_how_do_i/copy-a-file.html

import os
import shutil
import tempfile

# note: tempfile.mktemp is deprecated; it only generates a name and is kept here for brevity
filename1 = tempfile.mktemp (".txt")
open (filename1, "w").close ()
filename2 = filename1 + ".copy"
print (filename1, "=>", filename2)

shutil.copy (filename1, filename2)

if os.path.isfile (filename2): print ("Success")

dirname1 = tempfile.mktemp (".dir")
os.mkdir (dirname1)
dirname2 = dirname1 + ".copy"
print (dirname1, "=>", dirname2)

shutil.copytree (dirname1, dirname2)

if os.path.isdir (dirname2): print ("Success")

Answer 8

Firstly, I made an exhaustive cheatsheet of shutil methods for your reference.

shutil_methods = {
 'copy':['shutil.copyfileobj',
          'shutil.copyfile',
          'shutil.copymode',
          'shutil.copystat',
          'shutil.copy',
          'shutil.copy2',
          'shutil.copytree',],
 'move':['shutil.rmtree',
         'shutil.move',],
 'exception': ['exception shutil.SameFileError',
                 'exception shutil.Error'],
 'others':['shutil.disk_usage',
             'shutil.chown',
             'shutil.which',
             'shutil.ignore_patterns',]
}

Secondly, here are the copy methods explained with examples:

  1. shutil.copyfileobj(fsrc, fdst[, length]) Operates on opened file objects
In [3]: src = '~/Documents/Head+First+SQL.pdf'
In [4]: dst = '~/desktop'
In [5]: shutil.copyfileobj(src, dst)
AttributeError: 'str' object has no attribute 'read'
# copy the file object
In [7]: with open(src, 'rb') as f1,open(os.path.join(dst,'test.pdf'), 'wb') as f2:
    ...:      shutil.copyfileobj(f1, f2)
In [8]: os.stat(os.path.join(dst,'test.pdf'))
Out[8]: os.stat_result(st_mode=33188, st_ino=8598319475, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=13507926, st_atime=1516067347, st_mtime=1516067335, st_ctime=1516067345)
  2. shutil.copyfile(src, dst, *, follow_symlinks=True) Copy and rename
In [9]: shutil.copyfile(src, dst)
IsADirectoryError: [Errno 21] Is a directory: '~/desktop'
# so dst should be a filename instead of a directory name
  3. shutil.copy() Copy without preserving the metadata
In [10]: shutil.copy(src, dst)
Out[10]: '~/desktop/Head+First+SQL.pdf'
# check their metadata
In [25]: os.stat(src)
Out[25]: os.stat_result(st_mode=33188, st_ino=597749, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=13507926, st_atime=1516066425, st_mtime=1493698739, st_ctime=1514871215)
In [26]: os.stat(os.path.join(dst, 'Head+First+SQL.pdf'))
Out[26]: os.stat_result(st_mode=33188, st_ino=8598313736, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=13507926, st_atime=1516066427, st_mtime=1516066425, st_ctime=1516066425)
# st_atime, st_mtime, st_ctime changed
  4. shutil.copy2() Copy while preserving the metadata
In [30]: shutil.copy2(src, dst)
Out[30]: '~/desktop/Head+First+SQL.pdf'
In [31]: os.stat(src)
Out[31]: os.stat_result(st_mode=33188, st_ino=597749, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=13507926, st_atime=1516067055, st_mtime=1493698739, st_ctime=1514871215)
In [32]: os.stat(os.path.join(dst, 'Head+First+SQL.pdf'))
Out[32]: os.stat_result(st_mode=33188, st_ino=8598313736, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=13507926, st_atime=1516067063, st_mtime=1493698739, st_ctime=1516067055)
# Preserved st_mtime
  5. shutil.copytree()

Recursively copies an entire directory tree rooted at src and returns the destination directory.
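
As a sketch of the shutil.copytree() item (the helper name and the *.log pattern are invented for this example), copytree can be combined with shutil.ignore_patterns from the cheatsheet to leave some files out of the copy:

```python
import shutil


def copy_tree_without_logs(src_dir, dst_dir):
    """Recursively copy src_dir to dst_dir, skipping files that match *.log.

    dst_dir must not exist yet; copytree creates it and returns its path.
    """
    return shutil.copytree(src_dir, dst_dir, ignore=shutil.ignore_patterns("*.log"))
```

If dst_dir already exists, copytree raises FileExistsError by default, so the destination should be a fresh path.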


Answer 9

For small files, using only Python built-ins, you can use the following one-liner:

with open(source, 'rb') as src, open(dest, 'wb') as dst: dst.write(src.read())

As @maxschlepzig mentioned in the comments below, this is not the best approach when the file is too large or when memory is critical, because the whole file is read into memory at once; in those cases Swati’s answer should be preferred.


Answer 10

You could use os.system('cp nameoffilegeneratedbyprogram /otherdirectory/')

or as I did it,

os.system('cp '+ rawfile + ' rawdata.dat')

where rawfile is the name that I had generated inside the program.

This is a Linux-only solution.
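
Building the command by string concatenation breaks when the file name contains spaces or shell metacharacters. A portable alternative, sketched here with hypothetical helper names, is to let shutil do the copy; if the external cp is really needed, shlex.quote can escape the names first.

```python
import shlex
import shutil


def copy_raw_file(rawfile, target="rawdata.dat"):
    """Copy rawfile to target without going through a shell."""
    return shutil.copy(rawfile, target)


def cp_command(rawfile, target="rawdata.dat"):
    """Build the cp command line with both names quoted for a POSIX shell."""
    return "cp " + shlex.quote(rawfile) + " " + shlex.quote(target)
```

The string returned by cp_command could then be passed to os.system on Linux, while copy_raw_file also works on Windows.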


Answer 11

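For large files, what I did was read the file line by line and append each line to a list. Then, once the list reached a certain size, I appended its contents to the new file.

lines = []
# "copy.txt" is a placeholder name for the new file
with open("file.txt", "r") as source, open("copy.txt", "a") as output:
    for line in source:
        lines.append(line)
        if len(lines) == 1000000:
            output.writelines(lines)
            del lines[:]
    output.writelines(lines)  # write whatever is left in the buffer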

Answer 12

from subprocess import call
call("cp -p <file> <file>", shell=True)

Answer 13

As of Python 3.5, you can do the following for small files (e.g. text files, small JPEGs):

from pathlib import Path

source = Path('../path/to/my/file.txt')
destination = Path('../path/where/i/want/to/store/it.txt')
destination.write_bytes(source.read_bytes())

write_bytes will overwrite whatever was at the destination’s location.
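
For files that may not fit comfortably in memory, the same Path objects can be handed to shutil, which, as far as I know, accepts path-like objects on Python 3.6 and later. A sketch with a made-up helper name, copy_path:

```python
import shutil
from pathlib import Path


def copy_path(source: Path, destination: Path) -> Path:
    """Copy source to destination without reading the whole file into memory."""
    # copy2 streams the data and also tries to preserve the file's metadata.
    return Path(shutil.copy2(source, destination))
```

Like write_bytes, this overwrites an existing file at the destination.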


Answer 14

open(destination, 'wb').write(open(source, 'rb').read())

Open the source file in binary read mode, and write its contents to the destination file opened in binary write mode.


Answer 15

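Python provides built-in functions for easily copying files in the shutil (shell utilities) module.

The following call copies a file:

shutil.copy(src,dst)

The following call copies only the metadata (permission bits, last access time, last modification time, and flags) from src to dst, not the file contents; to copy a file together with its metadata, use shutil.copy2(src,dst) instead:

shutil.copystat(src,dst)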
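
Note that shutil.copystat transfers metadata only and leaves the destination's contents untouched. Here is a minimal sketch of that behaviour; the function name copystat_demo, the file names, and the timestamp are arbitrary choices for the example.

```python
import os
import shutil
import tempfile


def copystat_demo():
    """Return (destination contents, whether its mtime now matches the source)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        dst = os.path.join(tmp, "dst.txt")
        with open(src, "w") as f:
            f.write("source data")
        with open(dst, "w") as f:
            f.write("different data")
        os.utime(src, (1_000_000_000, 1_000_000_000))  # backdate the source

        shutil.copystat(src, dst)  # metadata only; dst must already exist

        with open(dst) as f:
            contents = f.read()
        same_mtime = int(os.stat(dst).st_mtime) == int(os.stat(src).st_mtime)
        return contents, same_mtime
```

The destination keeps its own contents while taking over the source's modification time.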