
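Check whether a path is valid in Python without creating a file at the path’s target

Question: Check whether a path is valid in Python without creating a file at the path’s target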

I have a path (including directory and file name).
I need to test whether the file name is valid, i.e., whether the file system will allow me to create a file with such a name.
The file name has some Unicode characters in it.

It’s safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more generally applicable, and apparently I went too far).

I very much do not want to have to escape anything unless I have to.

I’d post some of the example characters I am dealing with, but apparently they get automatically removed by the stack-exchange system. Anyways, I want to keep standard unicode entities like ö, and only escape things which are invalid in a filename.


Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such, this:

try:
    open(filename, 'w')
except OSError:
    pass  # handle error here

is not acceptable, because it will overwrite the existing file (if it’s there), which I do not want to touch, or create said file if it’s not.

I know I can do:

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        pass  # handle error here

But that will create the file at the filePath, which I would then have to os.unlink.

In the end, it seems like it’s spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.


As an aside, I need this to run on (at least) Windows and MacOS, so I’d like to avoid platform-specific stuff.


Answer 0

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That’s just how we roll.

A Tale of Two Questions

The question of “How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?” is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here… or, well, anywhere that I could grep.

vikki’s answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (…and then failing to reliably close) file handles.
  • Needlessly writing (…and then failing to reliably close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We’re gonna fix all that.

Question #0: What’s Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by “pathname validity.” What defines validity, exactly?

By “pathname validity,” we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By “root filesystem,” we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of “syntactic correctness,” in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).
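The two ext4 rules above can be checked without touching the filesystem at all. The following is a minimal sketch rather than part of the original answer: the function name `is_syntactically_valid_ext4` and the constant `NAME_MAX_BYTES` are hypothetical, and the sketch assumes `/` as the path separator and whatever filesystem encoding `os.fsencode()` reports.

```python
import os

# Assumed ext4 limit on the byte length of a single path component.
NAME_MAX_BYTES = 255

def is_syntactically_valid_ext4(pathname: str) -> bool:
    '''
    `True` if the passed pathname contains no null bytes and no path
    component longer than 255 bytes; `False` otherwise.
    '''
    if not isinstance(pathname, str) or not pathname:
        return False
    if '\x00' in pathname:
        return False
    # Lengths are measured in encoded bytes, not in characters.
    return all(
        len(os.fsencode(part)) <= NAME_MAX_BYTES
        for part in pathname.split('/')
    )
```

Because the limit applies to bytes, a name made of non-ASCII characters can hit it with far fewer than 255 characters, depending on the encoding in use.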

Syntactic correctness. Root filesystem. That’s it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I’m in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn’t. Fortunately, unrolling your own ad-hoc solution isn’t that gut-wrenching…

O.K., it actually is. It’s hairy; it’s nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin’.

We’ll soon descend into the radioactive abyss of low-level code. But first, let’s talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
      • For pathnames containing null bytes (i.e., '\x00'), instances of TypeError (or, under recent Python 3 versions, ValueError).
      • For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
        • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as “selective interpretation” of the POSIX standard.)
        • Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.
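That precedence can be observed directly. The sketch below is an illustration under assumptions, not part of the original answer: `stat_error_name` is a hypothetical helper, and the behavior described is what kernels that resolve a path one component at a time (Linux, for example) typically do.

```python
import os, tempfile

def stat_error_name(pathname: str) -> str:
    '''
    Name of the exception class raised by `os.lstat()` for the passed
    pathname, or the empty string if no exception is raised.
    '''
    try:
        os.lstat(pathname)
    except (OSError, ValueError, TypeError) as exc:
        return type(exc).__name__
    return ''

with tempfile.TemporaryDirectory() as existing_dir:
    too_long = 'a' * 256
    # The over-long component sits below a directory that does not exist, so
    # the missing directory is typically what gets reported.
    hidden = os.path.join(existing_dir, 'no_such_dir', too_long)
    print(stat_error_name(hidden))
```

On such systems the printed name is expected to be the generic FileNotFoundError, which says nothing about the over-long component hiding behind the missing directory.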

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn’t modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following this algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
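Steps 1 and 2.1 above can be sketched in isolation, before any call to os.lstat() is made. This is an illustrative sketch, not the author’s code: `probe_pathnames` is a hypothetical helper, and `root` and `sep` are parameters introduced here only so that the example behaves identically on every platform.

```python
def probe_pathnames(pathname: str, root: str = '/', sep: str = '/') -> list:
    '''
    Temporary pathnames obtained by joining each path component of the
    passed pathname directly onto a directory assumed to exist.
    '''
    # Normalize the root so that it ends with exactly one separator.
    root = root.rstrip(sep) + sep
    return [root + part for part in pathname.split(sep)]

print(probe_pathnames('/troldskog/faren/vild'))
# → ['/', '/troldskog', '/faren', '/vild']
```

Each resulting pathname lives directly under the root directory, so passing it to os.lstat() (step 2.2) never depends on whether the intermediate directories of the original pathname exist.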

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn’t that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS or Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that’s stale, slow, or inaccessible, you’ve got larger problems than pathname validation.)

Lost? Great. Let’s begin. (Python 3 assumed. See “What Is Fragile Hope for 300, leycec?”)

import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" or "ValueError" exception was raised, it almost
    # certainly has the error message "embedded NUL character" or "embedded
    # null byte" indicating an invalid pathname.
    except (TypeError, ValueError):
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don’t squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one’s surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn’t Python’s fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # In Python 3, "EnvironmentError" and "IOError" are merely aliases of
    # "OSError", so catching "OSError" covers the filesystem errors raised by
    # the above function.
    except OSError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Account Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a “Virtual Store” in that user’s profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It’s time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let’s leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.


Answer 1

import os

if os.path.exists(filePath):
    pass  # the file is there
elif os.access(os.path.dirname(filePath), os.W_OK):
    pass  # the file does not exist, but write privileges are given
else:
    pass  # cannot write there

Note that os.path.exists can return False for more reasons than the file simply not being there, so you might need finer-grained tests, such as checking whether the containing directory exists.
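
One of those finer-grained tests can be sketched as follows. This is only a minimal sketch: the function name exists_or_parent_writable is made up for illustration, and os.access is known to be an unreliable indicator of real permissions on Windows, so treat a True result as a hint rather than a guarantee.

```python
import os


def exists_or_parent_writable(file_path):
    """Return True if file_path exists, or if its parent directory
    exists and looks writable. Nothing is created or modified."""
    if os.path.exists(file_path):
        return True
    # A bare file name has an empty dirname; treat that as the current directory.
    parent = os.path.dirname(file_path) or os.curdir
    return os.path.isdir(parent) and os.access(parent, os.W_OK)
```

The explicit os.path.isdir check separates "the containing directory is missing" from "the file is missing", which a bare os.path.exists call cannot do.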


After discussing this with the OP, it turned out that the main problem seems to be that the file name might contain characters the filesystem does not allow. Of course these need to be removed, but the OP wants to keep as much human readability as the filesystem allows.

Sadly, I do not know of any good solution for this. However, Cecil Curry’s answer takes a closer look at detecting the problem.
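
For what it is worth, a rough sketch of "replace only what is forbidden" could look like the following. The character list is an assumption based on what Windows rejects in file names, and the helper name escape_invalid_chars is hypothetical. It does not deal with reserved names such as CON or NUL, trailing dots and spaces, or length limits.

```python
# Characters Windows rejects in file names; an assumption for this sketch,
# other filesystems have different (usually shorter) lists.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def escape_invalid_chars(filename, replacement='_'):
    """Replace only forbidden or control characters, keeping everything else."""
    return ''.join(
        replacement if (ch in WINDOWS_FORBIDDEN or ord(ch) < 32) else ch
        for ch in filename
    )
```

Characters such as ö pass through untouched, so the name stays as readable as the forbidden list allows.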


Answer 2

With Python 3, how about:

import os

try:
    with open(filename, 'x') as tempfile:  # OSError if file exists or is invalid
        pass
    os.remove(filename)  # we just created it, so delete the short-lived file
except OSError:
    pass  # handle error here

With the ‘x’ option we also don’t have to worry about race conditions. See documentation here.

Now, this WILL create a very short-lived temporary file if the file does not exist already, unless the name is invalid. If you can live with that, it simplifies things a lot.
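
If it matters whether the name was rejected or the file was simply already there, the two cases can be told apart, since FileExistsError is a subclass of OSError. The following is a sketch under that assumption; the function name probe_filename and its return strings are invented for illustration. ValueError is caught as well because Python raises it for names containing an embedded null byte.

```python
import os


def probe_filename(filename):
    """Return 'exists', 'creatable' or 'invalid' for filename.

    Creates and immediately removes an empty file when the name is free.
    """
    try:
        with open(filename, 'x'):
            pass
    except FileExistsError:
        return 'exists'      # an existing file is left untouched
    except (OSError, ValueError):
        return 'invalid'     # bad name, missing directory, no permission, ...
    os.remove(filename)      # clean up the file this probe just created
    return 'creatable'
```

Note that 'invalid' here lumps together every failure other than "already exists", including a missing parent directory.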


Answer 3

open(filename,'r')   #2nd argument is r and not w

This will open the file, or raise an error if it doesn’t exist. If there is an error, you can then try to open the path for writing; if that fails too, you get a second error:

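def can_read_or_write(filename):
    try:
        open(filename, 'r').close()
        return True
    except IOError:
        try:
            # Caution: this creates the file if it does not exist yet.
            open(filename, 'w').close()
            return True
        except IOError:
            return False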

Also have a look here about permissions on Windows.


Answer 4

Try os.path.exists: it checks the path and returns True if it exists and False if not.
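
A small illustration, using a throwaway temporary directory (the file names are arbitrary). Keep in mind that a False result only says the path is not there; it says nothing about whether the name would be valid or creatable.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    present = os.path.join(d, 'present.txt')
    open(present, 'w').close()              # create an empty file
    absent = os.path.join(d, 'absent.txt')  # never created

    results = (os.path.exists(present), os.path.exists(absent))

print(results)
```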


Extract a part of the filepath (a directory) in Python

Question: Extract a part of the filepath (a directory) in Python

I need to extract the name of the parent directory of a certain path. This is what it looks like:

c:\stuff\directory_i_need\subdir\file

I am modifying the content of the “file” with something that uses the directory_i_need name in it (not the path). I have created a function that will give me a list of all the files, and then…

for path in file_list:
   #directory_name = os.path.dirname(path)   # this is not what I need, that's why it is commented
   directories, files = path.split('\\')

   line_replace_add_directory = line_replace + directories  
   # this is what I want to add in the text, with the directory name at the end 
   # of the line.

How can I do that?


Answer 0

import os
## first file in current dir (with full path)
file = os.path.join(os.getcwd(), os.listdir(os.getcwd())[0])
file
os.path.dirname(file) ## directory of file
os.path.dirname(os.path.dirname(file)) ## directory of directory of file
...

And you can continue doing this as many times as necessary…

Edit: from os.path, you can use either os.path.split or os.path.basename:

dir = os.path.dirname(os.path.dirname(file)) ## dir of dir of file
## once you're at the directory level you want, with the desired directory as the final path node:
dirname1 = os.path.basename(dir) 
dirname2 = os.path.split(dir)[1] ## if you look at the documentation, this is exactly what os.path.basename does.
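
Applied to the path from the question, the same idea might look like this. The sketch uses ntpath, the Windows flavour of os.path, so that the backslash path is parsed the same way on any platform; on Windows itself, os.path behaves identically.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

path = r'c:\stuff\directory_i_need\subdir\file'

# Strip the file name, then the immediate parent, then take the last component.
grandparent = ntpath.dirname(ntpath.dirname(path))
directory_name = ntpath.basename(grandparent)
print(directory_name)
```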

Answer 1

In Python 3.4 and later, you can use the pathlib module:

>>> from pathlib import Path
>>> p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
>>> p.name
'iexplore.exe'
>>> p.suffix
'.exe'
>>> p.root
'\\'
>>> p.parts
('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')
>>> p.relative_to(r'C:\Program Files')
WindowsPath('Internet Explorer/iexplore.exe')
>>> p.exists()
True
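
The session above assumes it runs on Windows. As a sketch for the path in the question, PureWindowsPath parses Windows-style paths on any platform because pure paths never touch the filesystem (which also means exists() is not available on them).

```python
from pathlib import PureWindowsPath

# Pure paths only manipulate strings, so Windows paths can be parsed anywhere.
p = PureWindowsPath(r'c:\stuff\directory_i_need\subdir\file')

print(p.parts)        # ('c:\\', 'stuff', 'directory_i_need', 'subdir', 'file')
print(p.parts[-3])    # directory_i_need
```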

Answer 2

All you need is the parent attribute if you use pathlib.

from pathlib import Path
p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parent) 

This will output:

C:\Program Files\Internet Explorer    

In case you need all the parts (already covered in other answers), use parts:

p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parts) 

Then you will get a tuple:

('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')

Saves tons of time.
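
For the directory two levels above the file, as asked in the question, parent can be chained or indexed through parents. A minimal sketch, using PureWindowsPath so that it behaves the same on non-Windows machines:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r'c:\stuff\directory_i_need\subdir\file')

directory_name = p.parent.parent.name   # two levels above the file
same_name = p.parents[1].name           # parents[0] is the immediate parent
print(directory_name)
```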


Answer 3

First, see if you have splitunc() as an available function within os.path. The first item returned should be what you want… but I am on Linux and I do not have this function when I import os and try to use it.

Otherwise, one semi-ugly way that gets the job done is to use:

>>> pathname = "\\C:\\mystuff\\project\\file.py"
>>> pathname
'\\C:\\mystuff\\project\\file.py'
>>> print(pathname)
\C:\mystuff\project\file.py
>>> "\\".join(pathname.split('\\')[:-2])
'\\C:\\mystuff'
>>> "\\".join(pathname.split('\\')[:-1])
'\\C:\\mystuff\\project'

which shows retrieving the directory just above the file, and the directory just above that.
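
As far as I can tell, splitunc() only ever existed on Windows and is no longer present in current Python 3 releases, where splitdrive handles both drive letters and UNC shares. A small sketch using ntpath (the Windows flavour of os.path); the server and share names are made up:

```python
import ntpath  # Windows path rules, importable on any platform

# Ordinary drive-letter path.
drive, rest = ntpath.splitdrive(r'C:\mystuff\project\file.py')

# UNC path: the "drive" is the \\server\share prefix (hypothetical names).
unc_drive, unc_rest = ntpath.splitdrive(r'\\server\share\project\file.py')

print(drive, rest)
print(unc_drive, unc_rest)
```

Note that this separates the drive or share from the rest of the path; getting at a particular parent directory still needs dirname, split or similar on the remainder.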


Answer 4

This is what I did to extract the piece of the directory:

for path in file_list:
  directories = path.rsplit('\\')
  directories.reverse()
  line_replace_add_directory = line_replace+directories[2]

Thank you for your help.
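
The same lookup can be written without reversing the list, by indexing from the end. This is just a sketch; the helper name ancestor_name and its parameters are invented, and it assumes the path really has that many components.

```python
def ancestor_name(path, levels_up, sep='\\'):
    """Return the name of the directory `levels_up` levels above the file."""
    parts = path.split(sep)
    # parts[-1] is the file itself, parts[-2] its directory, and so on.
    return parts[-(levels_up + 1)]


name = ancestor_name(r'c:\stuff\directory_i_need\subdir\file', 2)
print(name)
```

Index 2 in the reversed list and index -3 in the original list refer to the same element.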


Answer 5

import os

directory = os.path.abspath('\\') # root directory
print(directory) # e.g. 'C:\'

directory = os.path.abspath('.') # current directory
print(directory) # e.g. 'C:\Users\User\Desktop'

parent_directory, directory_name = os.path.split(directory)
print(directory_name) # e.g. 'Desktop'
parent_parent_directory, parent_directory_name = os.path.split(parent_directory)
print(parent_directory_name) # e.g. 'User'

This should also do the trick.


Answer 6

You have to pass the entire path as the argument to os.path.split. See the docs. It doesn’t work like string split.
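
To make the difference concrete, here is a short comparison on the path from the question. It uses ntpath, which applies the Windows rules of os.path.split on any platform.

```python
import ntpath  # Windows rules for os.path.split, usable on any platform

path = r'c:\stuff\directory_i_need\subdir\file'

# os.path.split always returns exactly two items: (everything before, last part).
head, tail = ntpath.split(path)

# str.split returns one item per component, so unpacking into two names fails.
pieces = path.split('\\')

print(head, tail)
print(pieces)
```

This is why `directories, files = path.split('\\')` in the question raises a ValueError for any path with more than one backslash.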