Python 实用宝典

Question 1

I want to be able to list only the directories inside some folder. This means I don’t want filenames listed, nor do I want additional sub-folders.

Let’s see if an example helps. In the current directory we have:

>>> os.listdir(os.getcwd())
['cx_Oracle-doc', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'mod_p
ython-wininst.log', 'NEWS.txt', 'pymssql-wininst.log', 'python.exe', 'pythonw.ex
e', 'README.txt', 'Removemod_python.exe', 'Removepymssql.exe', 'Scripts', 'tcl',
 'Tools', 'w9xpopen.exe']

However, I don’t want filenames listed. Nor do I want sub-folders such as \Lib\curses. Essentially what I want works with the following:

>>> for root, dirnames, filenames in os.walk('.'):
...     print dirnames
...     break
...
['cx_Oracle-doc', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'Scripts', 'tcl', 'Tools']

However, I’m wondering if there’s a simpler way of achieving the same results. I get the impression that using os.walk only to return the top level is inefficient/too much.

Question 2

Filter the result using os.path.isdir() (and use os.path.join() to get the real path):

>>> [ name for name in os.listdir(thedir) if os.path.isdir(os.path.join(thedir, name)) ]
['ctypes', 'distutils', 'encodings', 'lib-tk', 'config', 'idlelib', 'xml', 'bsddb', 'hotshot', 'logging', 'doc', 'test', 'compiler', 'curses', 'site-packages', 'email', 'sqlite3', 'lib-dynload', 'wsgiref', 'plat-linux2', 'plat-mac']

Question 3

os.walk

Use os.walk with next item function:

next(os.walk('.'))[1]

For Python <=2.5 use:

os.walk('.').next()[1]

How this works

os.walk is a generator and calling next will get the first result in the form of a 3-tuple (dirpath, dirnames, filenames). Thus the [1] index returns only the dirnames from that tuple.

Question 4

Filter the list using os.path.isdir to detect directories.

filter(os.path.isdir, os.listdir(os.getcwd()))

Question 5

directories=[d for d in os.listdir(os.getcwd()) if os.path.isdir(d)]

Question 6

Note that, instead of doing os.listdir(os.getcwd()), it’s preferable to do os.listdir(os.path.curdir). One less function call, and it’s as portable.

So, to complete the answer, to get a list of directories in a folder:

def listdirs(folder):
    return [d for d in os.listdir(folder) if os.path.isdir(os.path.join(folder, d))]

If you prefer full pathnames, then use this function:

def listdirs(folder):
    return [
        d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
        if os.path.isdir(d)
    ]

Question 7

This seems to work too (at least on linux):

import glob, os
glob.glob('*' + os.path.sep)

Question 8

Just to add that using os.listdir() does not “take a lot of processing vs very simple os.walk().next()[1]”. This is because os.walk() uses os.listdir() internally. In fact if you test them together:

>>>> import timeit
>>>> timeit.timeit("os.walk('.').next()[1]", "import os", number=10000)
1.1215229034423828
>>>> timeit.timeit("[ name for name in os.listdir('.') if os.path.isdir(os.path.join('.', name)) ]", "import os", number=10000)
1.0592019557952881

The filtering of os.listdir() is very slightly faster.

Question 9

A very much simpler and elegant way is to use this:

 import os
 dir_list = os.walk('.').next()[1]
 print dir_list

Run this script in the same folder for which you want folder names.It will give you exactly the immediate folders name only(that too without the full path of the folders).

Question 10

Using list comprehension,

[a for a in os.listdir() if os.path.isdir(a)]

I think It is the simplest way

Question 11

being a newbie here i can’t yet directly comment but here is a small correction i’d like to add to the following part of ΤΖΩΤΖΙΟΥ’s answer :

If you prefer full pathnames, then use this function:

def listdirs(folder):  
  return [
    d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
    if os.path.isdir(d)
]

for those still on python < 2.4: the inner construct needs to be a list instead of a tuple and therefore should read like this:

def listdirs(folder):  
  return [
    d for d in [os.path.join(folder, d1) for d1 in os.listdir(folder)]
    if os.path.isdir(d)
  ]

otherwise one gets a syntax error.

Question 12

[x for x in os.listdir(somedir) if os.path.isdir(os.path.join(somedir, x))]

Question 13

For a list of full path names I prefer this version to the other solutions here:

def listdirs(dir):
    return [os.path.join(os.path.join(dir, x)) for x in os.listdir(dir) 
        if os.path.isdir(os.path.join(dir, x))]

Question 14

scanDir = "abc"
directories = [d for d in os.listdir(scanDir) if os.path.isdir(os.path.join(os.path.abspath(scanDir), d))]

Question 15

A safer option that does not fail when there is no directory.

def listdirs(folder):
    if os.path.exists(folder):
         return [d for d in os.listdir(folder) if os.path.isdir(os.path.join(folder, d))]
    else:
         return []

Question 16

Like so?

>>>> [path for path in os.listdir(os.getcwd()) if os.path.isdir(path)]

Question 17

Python 3.4 introduced the pathlib module into the standard library, which provides an object oriented approach to handle filesystem paths:

from pathlib import Path

p = Path('./')
[f for f in p.iterdir() if f.is_dir()]

Question 18

-- This will exclude files and traverse through 1 level of sub folders in the root

def list_files(dir):
    List = []
    filterstr = ' '
    for root, dirs, files in os.walk(dir, topdown = True):
        #r.append(root)
        if (root == dir):
            pass
        elif filterstr in root:
            #filterstr = ' '
            pass
        else:
            filterstr = root
            #print(root)
            for name in files:
                print(root)
                print(dirs)
                List.append(os.path.join(root,name))
            #print(os.path.join(root,name),"\n")
                print(List,"\n")

    return List

Question 19

我有一个路径（包括目录和文件名）。
我需要测试文件名是否有效，例如，文件系统是否允许我创建具有该名称的文件。
文件名中包含一些Unicode字符。

可以安全地假设路径的目录段是有效且可访问的（^{我试图使这个问题更笼统地适用，并且显然我走得太远了}）。

除非必须，否则我非常不想逃脱任何东西。

我会发布一些我正在处理的示例字符，但是显然它们会被堆栈交换系统自动删除。无论如何，我想保留标准的unicode实体，例如ö，仅转义文件名中无效的内容。

这里是要抓住的地方。路径目标上可能已经（可能没有）文件。如果该文件存在，我需要保留该文件，如果不存在，则不要创建该文件。

基本上，我想检查是否可以在不实际打开写入路径的情况下写入路径（以及通常需要进行的自动文件创建/文件破坏）。

因此：

try:
    open(filename, 'w')
except OSError:
    # handle error here

从这里

这是不可接受的，因为它将覆盖我不想触摸的现有文件（如果存在），或者如果不存在则创建该文件。

我知道我可以做：

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here

但这将在处创建文件filePath，然后我将不得不os.unlink。

最后，似乎花了6或7行来完成应该简单os.isvalidpath(filePath)或相似的操作。

顺便说一句，我需要在（至少）Windows和MacOS上运行它，因此我想避免使用特定于平台的东西。

“

Question 20

I have a path (including directory and file name).
I need to test if the file-name is a valid, e.g. if the file-system will allow me to create a file with such a name.
The file-name has some unicode characters in it.

It’s safe to assume the directory segment of the path is valid and accessible (^{I was trying to make the question more gnerally applicable, and apparently I wen too far}).

I very much do not want to have to escape anything unless I have to.

I’d post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.

Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such:

try:
    open(filename, 'w')
except OSError:
    # handle error here

from here

Is not acceptable, because it will overwrite the existent file, which I do not want to touch (if it’s there), or create said file if it’s not.

I know I can do:

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here

But that will create the file at the filePath, which I would then have to os.unlink.

In the end, it seems like it’s spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.

As an aside, I need this to run on (at least) Windows and MacOS, so I’d like to avoid platform-specific stuff.

“

Question 21

tl; dr

调用is_path_exists_or_creatable()下面定义的函数。

严格地使用Python3。这就是我们的发展方向。

两个问题的故事

问题“如何测试路径名的有效性，以及对于有效路径名，这些路径的存在或可写性？” 显然是两个独立的问题。两者都很有趣，而且在这里还是我能找到的任何地方都没有收到真正令人满意的答案。

vikki的答案可能是最接近的，但有以下明显的缺点：

不必要地打开（然后无法可靠地关闭）文件句柄。
不必要的写作（ …然后无法可靠地关闭或删除）0字节文件。
忽略操作系统特定的错误，以区分不可忽略的无效路径名和可忽略的文件系统问题。毫不奇怪，这在Windows下至关重要。（见下文。）
忽略由外部进程同时（重新）移动要测试的路径名的父目录导致的竞争条件。（见下文。）
忽略此路径名导致的连接超时，该路径名位于陈旧，缓慢或暂时不可访问的文件系统上。这可能会使面向公众的服务遭受潜在的DoS驱动的攻击。（见下文。）

我们将解决所有问题。

问题＃0：路径名有效性又是什么？

在将我们脆弱的肉类衣服扔进Python般的痛苦中之前，我们可能应该定义“路径名有效性”的含义。究竟是什么定义了有效性？

“路径名有效性”是指路径名相对于当前系统的根文件系统的语法正确性，无论该路径或其父目录是否物理存在。如果路径名符合根文件系统的所有语法要求，则在此定义下语法上正确。

所谓“根文件系统”，是指：

在与POSIX兼容的系统上，文件系统已安装到根目录（/）。
在Windows中，文件系统安装到%HOMEDRIVE%，包含当前的Windows安装（通常但结肠-后缀盘符不必然C:）。

反过来，“语法正确性”的含义取决于根文件系统的类型。对于ext4（且不是大多数但与所有POSIX兼容的）文件系统，路径名称在且仅当该路径名称在语法上正确：

不包含空字节（即，\x00在Python中）。这是所有POSIX兼容文件系统的硬性要求。
包含不超过255个字节的路径组件（例如，'a'*256在Python中）。路径成分是含有不路径名的最长子串/字符（例如，bergtatt，ind，i，和fjeldkamrene在路径名/bergtatt/ind/i/fjeldkamrene）。

句法正确性。根文件系统。而已。

问题1：我们现在应如何进行路径名有效性？

令人惊讶的是，在Python中验证路径名是不直观的。我在这里与Fake Name达成坚定协议：官方os.path软件包应为此提供现成的解决方案。出于未知（可能不令人信服）的原因，事实并非如此。幸运的是，展开您自己的临时解决方案并不是那么费劲……

好的，实际上是。毛茸茸的讨厌它在发光时发出嘶哑和咯咯笑声时可能会发痒。但是你会怎么做？Nuthin’。

我们将很快进入低级代码的放射性深渊。但首先，让我们谈谈高级商店。当传递无效的路径名时，标准os.stat()和os.lstat()函数会引发以下异常：

对于驻留在不存在的目录中的路径名， FileNotFoundError。
对于现有目录中的路径名：
- 在Windows下，WindowsError其winerror属性为123（即ERROR_INVALID_NAME）的实例。
- 在所有其他操作系统下：
- 对于包含空字节（即'\x00'）的路径名，请使用的实例TypeError。
- 对于包含长度超过255个字节的路径成分的路径名，OSError其errcode属性的实例为：
  - 在SunOS和* BSD系列操作系统下，errno.ERANGE。（这似乎是操作系统级别的错误，否则称为POSIX标准的“选择性解释”。）
  - 在所有其他操作系统下，errno.ENAMETOOLONG。

至关重要的是，这意味着仅存在于现有目录中的路径名是有效的。当传递的路径名驻留在不存在的目录中时，不管这些路径名是否无效，os.stat()andos.lstat()函数都会引发通用FileNotFoundError异常。目录存在优先于路径名无效。

这是否意味着不存在的目录中的路径名无效？是的-除非我们修改这些路径名以驻留在现有目录中。但是，这甚至安全可行吗？修改路径名是否应该阻止我们验证原始路径名？

要回答这个问题，请从上面回忆一下，ext4文件系统上语法正确的路径名不包含路径组件（A）包含空字节，或（B）长度超过255个字节。因此，ext4仅当该路径名中的所有路径组件均有效时，该路径名才有效。大多数 现实世界中感兴趣的文件系统都是如此。

那根学究的见解真的对我们有帮助吗？是。它将一次验证完整路径名的较大问题减少到仅验证该路径名中的所有路径组成部分的较小问题。通过遵循以下算法，可以以跨平台方式对任意路径名进行有效验证（无论该路径名是否位于现有目录中）：

将该路径名拆分为路径组成部分（例如，将路径名/troldskog/faren/vild拆分为list ['', 'troldskog', 'faren', 'vild']）。
对于每个这样的组件：
1. 将保证与该组件一起存在的目录的路径名加入新的临时路径名（例如/troldskog）。
2. 将该路径名传递给os.stat()或os.lstat()。如果该路径名及其组件无效，则可以确保此调用引发一个暴露无效类型的异常，而不是通用FileNotFoundError异常。为什么？因为该路径名位于现有目录中。（循环逻辑是循环的。）

是否有目录保证存在？是的，但通常只有一个：根文件系统的最顶层目录（如上定义）。

将驻留在任何其他目录（因此不保证存在）中的路径名传递给竞争条件os.stat()或os.lstat()引发竞争条件，即使该目录先前已被测试存在。为什么？因为在执行该测试之后但在将该路径名传递给os.stat()or之前，无法阻止外部进程同时删除该目录os.lstat()。释放令人发疯的狗！

上述方法也有一个很大的附带好处：安全性。（是不是说好的？）具体为：

前端应用程序通过简单地将这样的路径名传递给拒绝服务（DoS）攻击os.stat()或os.lstat()容易受到拒绝的攻击，从而验证来自不受信任来源的任意路径名。恶意用户可能试图反复验证驻留在已知陈旧或缓慢的文件系统上的路径名（例如，NFS Samba共享）；在这种情况下，盲目声明传入的路径名可能最终会因连接超时而失败，或者消耗的时间和资源要比您承受失业的能力弱。

上面的方法通过仅针对根文件系统的根目录验证路径名的路径组成部分来避免这种情况。（即使这是陈旧，缓慢或无法访问的，也比路径名验证要麻烦得多。）

丢失？大。让我们开始。（假定使用Python3。请参阅“ leycec对300的脆弱希望是什么？”）

import errno, os

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" exception was raised, it almost certainly has the
    # error message "embedded NUL character" indicating an invalid pathname.
    except TypeError as exc:
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

做完了 不要斜视那个代码。（它咬。）

问题2：路径名的存在或可创建性可能无效，是吗？

在上述解决方案的基础上，测试可能无效的路径名的存在或可创建性通常很简单。这里的关键是在测试传递的路径之前调用先前定义的函数：

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

完成和完成的。除了不太一样。

问题3：Windows上可能存在无效的路径名或可写性

有一个警告。当然有。

官方os.access()文件承认：

注意：即使os.access()表明I / O操作将成功，它也可能会失败，尤其是对于网络文件系统上的操作，其权限语义可能超出通常的POSIX权限位模型。

毫不奇怪，Windows通常是这里的嫌疑人。由于在NTFS文件系统上广泛使用了访问控制列表（ACL），因此简单的POSIX权限位模型无法很好地映射到底层Windows现实。尽管这（不是问题）不是Python的错，但对于与Windows兼容的应用程序，它可能仍然值得关注。

如果是您，那么需要一个更强大的替代方案。如果传递的路径也不会存在，我们不是试图建立保证该路径的父目录被立即删除临时文件- creatability的更便携的（如昂贵的）测试：

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

但是请注意，即使这可能还不够。

多亏了用户访问控制（UAC），永远无法模仿的Windows Vista及其所有后续迭代都明显地涉及与系统目录有关的权限。当非管理员用户尝试在规范目录C:\Windows或C:\Windows\system32目录中创建文件时，UAC会从表面上允许用户这样做，同时实际上将所有创建的文件隔离到该用户配置文件中的“虚拟存储”中。（谁能想到欺骗用户会产生有害的长期后果？）

这太疯狂了。这是Windows。

证明给我看

敢吗现在该进行上述测试了。

由于NULL是面向UNIX的文件系统上路径名中唯一禁止使用的字符，因此让我们利用它来展示冷酷的事实–忽略不可忽略的Windows恶作剧，坦白地说，这同样使我感到厌烦并激怒了我：

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

超越理智。超越痛苦。您会发现Python可移植性问题。

Question 22

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That’s just how we roll.

A Tale of Two Questions

The question of “How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?” is clearly two separate questions. Both are interesting, and neither have received a genuinely satisfactory answer here… or, well, anywhere that I could grep.

vikki‘s answer probably hews the closest, but has the remarkable disadvantages of:

Needlessly opening (…and then failing to reliably close) file handles.
Needlessly writing (…and then failing to reliable close or delete) 0-byte files.
Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We’re gonna fix all that.

Question #0: What’s Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by “pathname validity.” What defines validity, exactly?

By “pathname validity,” we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By “root filesystem,” we mean:

On POSIX-compatible systems, the filesystem mounted to the root directory (/).
On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of “syntactic correctness,” in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That’s it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I’m in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn’t. Fortunately, unrolling your own ad-hoc solution isn’t that gut-wrenching…

O.K., it actually is. It’s hairy; it’s nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin’.

We’ll soon descend into the radioactive abyss of low-level code. But first, let’s talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

For pathnames residing in non-existing directories, instances of FileNotFoundError.
For pathnames residing in existing directories:
- Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
- Under all other OSes:
- For pathnames containing null bytes (i.e., '\x00'), instances of TypeError.
- For pathnames containing path components longer than 255 bytes, instances of OSError whose errcode attribute is:
  - Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as “selective interpretation” of the POSIX standard.)
  - Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn’t modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
For each such component:
1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog) .
2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn’t that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that’s stale, slow, or inaccessible, you’ve got larger problems than pathname validation.)

Lost? Great. Let’s begin. (Python 3 assumed. See “What Is Fragile Hope for 300, leycec?”)

import errno, os

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" exception was raised, it almost certainly has the
    # error message "embedded NUL character" indicating an invalid pathname.
    except TypeError as exc:
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don’t squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one’s surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn’t Python’s fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a “Virtual Store” in that user’s profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It’s time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let’s leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.

Question 23

if os.path.exists(filePath):
    #the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
    #the file does not exists but write privileges are given
else:
    #can not write there

请注意，path.exists失败的原因the file is not there可能不仅仅是，所以您可能必须进行更精细的测试，例如测试包含目录是否存在等等。

在与OP讨论之后，事实证明，主要的问题似乎是文件名可能包含文件系统不允许的字符。当然，需要将它们删除，但是OP希望在文件系统允许的范围内保持尽可能多的人可读性。

可悲的是，我不知道有什么好的解决方案。但是，塞西尔·库里（Cecil Curry）的答案更仔细地研究了发现问题。

Question 24

if os.path.exists(filePath):
    #the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
    #the file does not exists but write privileges are given
else:
    #can not write there

Note that path.exists can fail for more reasons than just the file is not there so you might have to do finer tests like testing if the containing directory exists and so on.

After my discussion with the OP it turned out, that the main problem seems to be, that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed but the OP wants to maintain as much human readablitiy as the filesystem allows.

Sadly I do not know of any good solution for this. However Cecil Curry’s answer takes a closer look at detecting the problem.

Question 25

使用Python 3，如何：

try:
    with open(filename, 'x') as tempfile: # OSError if file exists or is invalid
        pass
except OSError:
    # handle error here

使用“ x”选项，我们也不必担心比赛条件。请参阅此处的文档。

现在，如果该文件尚不存在，它将创建一个寿命很短的临时文件-除非名称无效。如果您可以忍受，那么可以简化很多事情。

Question 26

With Python 3, how about:

try:
    with open(filename, 'x') as tempfile: # OSError if file exists or is invalid
        pass
except OSError:
    # handle error here

With the ‘x’ option we also don’t have to worry about race conditions. See documentation here.

Now, this WILL create a very shortlived temporary file if it does not exist already – unless the name is invalid. If you can live with that, it simplifies things a lot.

Question 27

open(filename,'r')   #2nd argument is r and not w

将打开文件或给出错误（如果不存在）。如果有错误，那么您可以尝试写入路径，如果不能，则出现第二个错误

try:
    open(filename,'r')
    return True
except IOError:
    try:
        open(filename, 'w')
        return True
    except IOError:
        return False

也可以在这里查看有关Windows权限的信息

Question 28

open(filename,'r')   #2nd argument is r and not w

will open the file or give an error if it doesn’t exist. If there’s an error, then you can try to write to the path, if you can’t then you get a second error

try:
    open(filename,'r')
    return True
except IOError:
    try:
        open(filename, 'w')
        return True
    except IOError:
        return False

Also have a look here about permissions on windows

Question 29

尝试os.path.exists此操作将检查路径，并返回True是否存在（False如果不存在）。

Question 30

try os.path.exists this will check for the path and return True if exists and False if not.

Question 31

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?

Question 32

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I’d use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This’ll walk your directories recursively and return all absolute pathnames to matching .txt files. In this specific case the fnmatch.filter() may be overkill, you could also use a .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]

Question 33

To find files in immediate subdirectories:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverse all subdirectories, you could use ** and pass recursive=True since Python 3.5:

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).

Question 34

There’s a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

glob.glob('*.txt') :matches all files ending in ‘.txt’ in current directory
glob.glob('*/*.txt') :same as 1
glob.glob('**/*.txt') :matches all files ending in ‘.txt’ in the immediate subdirectories only, but not in the current directory
glob.glob('*.txt',recursive=True) :same as 1
glob.glob('*/*.txt',recursive=True) :same as 3
glob.glob('**/*.txt',recursive=True):matches all files ending in ‘.txt’ in the current directory and in all subdirectories

So it’s best to always specify recursive=True.

Question 35

The glob2 package supports wild cards and is reasonably fast

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.

Question 36

You can use Formic with Python 2.6

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure – I am the author of this package.

Question 37

Here is a adapted version that enables glob.glob like functionality without using glob2.

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            if fnmatch.filter([full_path], pattern):
                matches.append(os.path.join(root, filename))
    return matches

So if you have the following dir structure

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

Pretty much fnmatch pattern match on the whole filename itself, rather than the filename only.

Question 38

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")

Doesn’t works for all cases, instead use glob2

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")

Question 39

If you can install glob2 package…

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

All filenames and folders:

all_ff = glob2.glob("C:\\top_directory\\**\\**")

Question 40

If you’re running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

Question 41

As pointed out by Martijn, glob can only do this through the **operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazy evaluation iterator that behaves similarly

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).

Question 42

The command rglob will do an infinite recursion down the deepest sub-level of your directory structure. If you only want one level deep, then do not use it, however.

I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.

The rglob function recently produced a 100x increase in speed for a data processing algorithm which was using the folder structure as a fixed assumption for the order of data reading. However, with rglob we were able to do a single scan once through all files at or below a specified parent directory, save their names to a list (over a million files), then use that list to determine which files we needed to open at any point in the future based on the file naming conventions only vs. which folder they were in.

Question 43

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following but using Python and not executing ls.

ls 145592*.jpg

If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.

However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).

Question 44

import glob

jpgFilenamesList = glob.glob('145592*.jpg')

See glob in python documenttion

Question 45

glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:

files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg', f)]

More flexible, but as you note, less efficient.

Question 46

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I prefer this form of list comprehensions because it reads well in English.

I read the fourth line as: For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.

It may be hard for novice python programmers to really get used to using list comprehensions for filtering, and it can have some memory overhead for very large data sets, but for listing a directory and other simple string filtering tasks, list comprehensions lead to more clean documentable code.

The only thing about this design is that it doesn’t protect you against making the mistake of passing a string instead of a list. For example if you accidentally convert a string to a list and end up checking against all the characters of a string, you could end up getting a slew of false positives.

But it’s better to have a problem that’s easy to fix than a solution that’s hard to understand.

Question 47

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html

Question 48

Filter with `glob` module:

Import glob

import glob

Wild Cards:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

Fiter extension `.txt`:

files = glob.glob("/home/ach/*/*.txt")

A single character

glob.glob("/home/ach/file?.txt")

Number Ranges

glob.glob("/home/ach/*[0-9]*")

Alphabet Ranges

glob.glob("/home/ach/[a-c]*")

Question 49

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 – use “glob”

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 – use “os” + “fnmatch”

Variant 2.1 – Lookup in current dir

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 – Lookup recursive

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 – use “pathlib”

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

Notes:

Tested on the Python 3.4
The module “pathlib” was added only in the Python 3.4
The Python 3.5 added a feature for recursive lookup with glob.glob https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine is installed with Python 3.4, I have not tested that.

Question 50

use os.walk to recursively list your files

import os
root = "/home"
pattern = "145992"
alist_filter = ['jpg','bmp','png','gif'] 
path=os.path.join(root,"mydir_to_scan")
for r,d,f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            print os.path.join(root,file)

Question 51

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

This will give you a list of jpg files with their full path. You can replace x[0]+"/"+f with f for just filenames. You can also replace f.endswith(".jpg") with whatever string condition you wish.

Question 52

you might also like a more high-level approach (I have implemented and packaged as findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print found_file

can be installed with

pip install findtools

Question 53

Filenames with “jpg” and “png” extensions in “path/to/images”:

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]

Question 54

You can use pathlib that is available in Python standard library 3.4 and above.

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]

Question 55

You can define pattern and check for it. Here I have taken both start and end pattern and looking for them in the filename. FILES contains the list of all the files in a directory.

import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r,d,FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        if PATTERN_START in FILE.startwith(PATTERN_START) and PATTERN_END in FILE.endswith(PATTERN_END):
            print FILE

Question 56

How about str.split()? Nothing to import.

import os

image_names = [f for f in os.listdir(path) if len(f.split('.jpg')) == 2]

Question 57

You can use subprocess.check_ouput() as

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True)

Of course, the string between quotes can be anything you want to execute in the shell, and store the output.

问题：如何仅列出Python中的顶级目录？

回答 0

回答 1

步行

如何运作

os.walk

How this works

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

问题：检查路径在Python中是否有效，而无需在路径的目标位置创建文件

回答 0

tl; dr

两个问题的故事

问题＃0：路径名有效性又是什么？

问题1：我们现在应如何进行路径名有效性？

问题2：路径名的存在或可创建性可能无效，是吗？

问题3：Windows上可能存在无效的路径名或可写性

证明给我看

tl;dr

A Tale of Two Questions

Question #0: What’s Pathname Validity Again?

Question #1: How Now Shall We Do Pathname Validity?

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

Prove It

回答 1

回答 2

回答 3

回答 4

问题：如何使用glob.glob模块搜索子文件夹？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：获取目录中文件的过滤列表

回答 0

回答 1

回答 2

回答 3

回答 4

过滤glob模块：

导入球

通配符：

接头扩展.txt：

一个字符

编号范围

字母范围

Filter with glob module:

Import glob

Wild Cards:

Fiter extension .txt:

A single character

Number Ranges

Alphabet Ranges

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

过滤`glob`模块：

接头扩展`.txt`：

Filter with `glob` module:

Fiter extension `.txt`:

1）使用`shutil`模块复制文件

2）使用`os`模块复制文件

3）使用`subprocess`模块复制文件

1) Copying files using `shutil` module

2) Copying files using `os` module

3) Copying files using `subprocess` module