Question: Extract the file name from a path, no matter what the OS or path format

Which Python library can I use to extract filenames from paths, no matter what the operating system or path format could be?

For example, I’d like all of these paths to return c:

a/b/c/
a/b/c
\a\b\c
\a\b\c\
a\b\c
a/b/../../a/b/c/
a/b/../../a/b/c

Answer 0

Using os.path.split or os.path.basename as others suggest won’t work in all cases: if you’re running the script on Linux and attempt to process a classic Windows-style path, it will fail.

Windows paths can use either a backslash or a forward slash as the path separator. Therefore, the ntpath module (which is equivalent to os.path when running on Windows) will work for all(1) paths on all platforms.

import ntpath
ntpath.basename("a/b/c")

Of course, if the path ends with a slash, the basename will be empty, so write your own function to deal with it:

def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

Verification:

>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 
...     'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']


(1) There’s one caveat: Linux filenames may contain backslashes. So on Linux, r'a/b\c' always refers to the file b\c in the a folder, while on Windows, it always refers to the c file in the b subfolder of the a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to be able to interpret it correctly. In practice it’s usually safe to assume it’s a Windows path, since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don’t create accidental security holes.
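To make the caveat concrete, here is a minimal sketch that parses the same ambiguous string with both rule sets. It calls ntpath and posixpath directly, so the result does not depend on the machine it runs on; the variable names are only illustrative.

```python
import ntpath
import posixpath

ambiguous = r'a/b\c'

# Windows rules: both "/" and "\" are separators, so the leaf is "c".
windows_leaf = ntpath.basename(ambiguous)

# POSIX rules: only "/" is a separator, so "b\c" is one file name.
posix_leaf = posixpath.basename(ambiguous)

print(repr(windows_leaf))  # 'c'
print(repr(posix_leaf))    # 'b\\c'
```

Which of the two answers is right depends on where the path came from, not on where the script happens to run.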


Answer 1

Actually, there’s a function that returns exactly what you want:

import os
print(os.path.basename(your_path))

Answer 2

os.path.split is the function you are looking for:

import os
head, tail = os.path.split("/tmp/d/a.dat")

>>> print(tail)
a.dat
>>> print(head)
/tmp/d

Answer 3

In Python 3.4+ (where pathlib is part of the standard library):

>>> from pathlib import Path    
>>> Path("/tmp/d/a.dat").name
'a.dat'

Answer 4

import os
head, tail = os.path.split('path/to/file.exe')

tail is what you want: the filename.

See the Python os module docs for details.


Answer 5

import os
file_location = '/srv/volume1/data/eds/eds_report.csv'
file_name = os.path.basename(file_location)  # eds_report.csv
location = os.path.dirname(file_location)    # /srv/volume1/data/eds

Answer 6

In your example you will also need to strip the slash from the right side to return c:

>>> import os
>>> path = 'a/b/c/'
>>> path = path.rstrip(os.sep) # strip the slash from the right side
>>> os.path.basename(path)
'c'

Second level:

>>> os.path.basename(os.path.dirname(path))
'b'

Update: I think lazyr has provided the right answer. My code will not work with Windows-like paths on Unix systems or, vice versa, with Unix-like paths on Windows systems.


Answer 7

fname = r"C:\Windows\paint.exe".split("\\")[-1]

This returns paint.exe.

Change the separator passed to split to match your path format or OS.
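If you would rather not pick the separator by hand, one option is to normalize the string first. This is only a sketch: the helper name last_part is made up here, and it assumes that every backslash is a separator, which is not true for Linux file names that contain a literal backslash.

```python
def last_part(path):
    """Return the last component, treating both "/" and "\\" as separators."""
    # Assumption: backslashes are separators, never part of a file name.
    normalized = path.replace("\\", "/").rstrip("/")
    return normalized.rsplit("/", 1)[-1]


print(last_part(r"C:\Windows\paint.exe"))  # paint.exe
print(last_part("a/b/c/"))                 # c
```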


Answer 8

If you want to get the file names in a directory automatically, you can do:

import glob
import os

for f in glob.glob('/your/path/*'):
    print(os.path.split(f)[-1])

Answer 9

If your file path does not end with “/” and its directories are separated by “/”, use the following code. Paths generally don’t end with “/”.

import os
path_str = "/var/www/index.html"
print(os.path.basename(path_str))

But in some cases, such as URLs, the path ends with “/”; then use the following code:

import os
path_str = "/home/some_str/last_str/"
split_path = path_str.rsplit("/",1)
print(os.path.basename(split_path[0]))

But when your path is separated by “\”, as you generally find in Windows paths, you can use the following code:

import os
path_str = "c:\\var\\www\\index.html"
print(os.path.basename(path_str))  # prints index.html only when run on Windows

import os
path_str = "c:\\home\\some_str\\last_str\\"
split_path = path_str.rsplit("\\", 1)
print(os.path.basename(split_path[0]))  # prints last_str only when run on Windows

You can combine both into one function by checking the OS type and returning the result.
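One way to fold these cases into a single helper is sketched below. The name last_component and the windows flag are hypothetical, not part of the answer. Instead of branching on the OS by hand, the sketch picks ntpath or posixpath explicitly and falls back to the running platform when no flag is given.

```python
import ntpath
import os
import posixpath


def last_component(path_str, windows=None):
    """Return the last path component, ignoring trailing separators."""
    if windows is None:
        # Default to the conventions of the platform the code runs on.
        windows = os.name == "nt"
    module = ntpath if windows else posixpath
    trailing = "\\/" if windows else "/"
    return module.basename(path_str.rstrip(trailing))


print(last_component("/home/some_str/last_str/", windows=False))      # last_str
print(last_component("c:\\home\\some_str\\last_str\\", windows=True))  # last_str
```

Passing windows=True on a Unix machine is what lets Windows-style paths be handled there; relying on the default only covers paths native to the current platform.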


Answer 10

This works on both Linux and Windows, using only the standard library:

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']

def path_leaf(path):
    return path.strip('/').strip('\\').split('/')[-1].split('\\')[-1]

[path_leaf(path) for path in paths]

Results:

['c', 'c', 'c', 'c', 'c', 'c', 'c']

Answer 11

Here’s a regex-only solution, which seems to work with any OS path on any OS.

No other module is needed, and no preprocessing is needed either:

import re

def extract_basename(path):
  """Extracts basename of a given path. Should Work with any OS Path on any OS"""
  basename = re.search(r'[^\\/]+(?=[\\/]?$)', path)
  if basename:
    return basename.group(0)


paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']

print([extract_basename(path) for path in paths])
# ['c', 'c', 'c', 'c', 'c', 'c', 'c']


extra_paths = ['C:\\', 'alone', '/a/space in filename', 'C:\\multi\nline']

print([extract_basename(path) for path in extra_paths])
# ['C:', 'alone', 'space in filename', 'multi\nline']

Update:

If you only want a potential filename, if present (i.e., /a/b/ is a dir and so is c:\windows\), change the regex to: r'[^\\/]+(?![\\/])$'. For the “regex challenged,” this changes the positive lookahead for some sort of slash to a negative lookahead, causing pathnames that end with said slash to return nothing instead of the last sub-directory in the pathname. Of course, there is no guarantee that the potential filename actually refers to a file; for that, os.path.isdir() or os.path.isfile() would need to be used.

This will match as follows:

/a/b/c/             # nothing, pathname ends with the dir 'c'
c:\windows\         # nothing, pathname ends with the dir 'windows'
c:hello.txt         # matches 'c:hello.txt' as a whole; the drive prefix
                    # is kept because ':' is not treated as a separator
~it_s_me/.bashrc    # matches potential filename '.bashrc'
c:\windows\system32 # matches potential filename 'system32', except
                    # that is obviously a dir. os.path.isdir()
                    # should be used to tell us for sure
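A small sketch of the negative-lookahead variant follows, wrapped in a helper whose name (potential_filename) is made up for illustration. Note that a drive-relative path such as c:hello.txt comes back whole, because the colon is not one of the excluded characters.

```python
import re


def potential_filename(path):
    """Return the trailing component, or None if the path ends with a slash."""
    match = re.search(r'[^\\/]+(?![\\/])$', path)
    return match.group(0) if match else None


samples = ['/a/b/c/', 'c:\\windows\\', '~it_s_me/.bashrc',
           'c:\\windows\\system32', 'c:hello.txt']
print([potential_filename(p) for p in samples])
# [None, None, '.bashrc', 'system32', 'c:hello.txt']
```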



Answer 12

Maybe just my all-in-one solution, without importing anything new (the tempfile module is only used here to create a temporary file to work with :D):

import tempfile
abc = tempfile.NamedTemporaryFile(dir='/tmp/')
abc.name
abc.name.replace("/", " ").split()[-1] 

abc.name is a string like '/tmp/tmpks5oksk7', so I can replace each / with a space using .replace("/", " ") and then call split(). That returns a list, and I get its last element with [-1].

No module needs to be imported for the extraction itself.


Answer 13

I have never seen double-backslashed paths; do they exist? The built-in functions of the Python os module fail for those. All the others work, including the caveat you gave, with os.path.normpath():

import os

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c', 'a/./b/c', r'a\b/c']
for path in paths:
    print(os.path.basename(os.path.normpath(path)))
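Which rules os.path applies depends on the platform the code runs on, so the outcome for the backslash paths differs between Linux and Windows. The sketch below calls posixpath and ntpath directly to show both behaviors on any machine; the variable names are only illustrative.

```python
import ntpath
import posixpath

paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']

# POSIX rules: a backslash is an ordinary character, so the
# backslash-only paths come back unchanged as a single "file name".
posix_results = [posixpath.basename(posixpath.normpath(p)) for p in paths]

# Windows rules: normpath converts "/" to "\" and drops the trailing
# separator, so every path yields "c".
nt_results = [ntpath.basename(ntpath.normpath(p)) for p in paths]

print(posix_results)
print(nt_results)
```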

Answer 14

The Windows separator can be in a Unix filename or Windows Path. The Unix separator can only exist in the Unix path. The presence of a Unix separator indicates a non-Windows path.

The following will strip (cut trailing separator) by the OS specific separator, then split and return the rightmost value. It’s ugly, but simple based on the assumption above. If the assumption is incorrect, please update and I will update this response to match the more accurate conditions.

a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1]

Sample code:

b = ['a/b/c/','a/b/c','\\a\\b\\c','\\a\\b\\c\\','a\\b\\c','a/b/../../a/b/c/','a/b/../../a/b/c']

for a in b:
    print(a, a.rstrip("\\" if a.count("/") == 0 else '/').split("\\" if a.count("/") == 0 else '/')[-1])
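The same heuristic reads more easily as a small function. This is a sketch under the answer's own assumption that any "/" marks a Unix-style path; the name leaf_by_separator is hypothetical.

```python
def leaf_by_separator(path):
    """Guess the separator from the path itself, then return the last part."""
    # Assumption from the answer: a "/" anywhere means a Unix-style path.
    sep = "/" if "/" in path else "\\"
    return path.rstrip(sep).split(sep)[-1]


paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
         'a/b/../../a/b/c/', 'a/b/../../a/b/c']
print([leaf_by_separator(p) for p in paths])
# ['c', 'c', 'c', 'c', 'c', 'c', 'c']
```

Under this assumption a mixed path such as r'a/b\c' is read as Unix-style and yields 'b\c', which is wrong for Windows paths that mix both separators.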

Answer 15

For completeness’ sake, here is the pathlib solution for Python 3.4+:

>>> from pathlib import PureWindowsPath

>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 
...          'a/b/../../a/b/c/', 'a/b/../../a/b/c']

>>> [PureWindowsPath(path).name for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']

This works on both Windows and Linux.
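The choice of class matters for paths that mix separators. A short sketch comparing the two pure path flavors on the ambiguous string from the first answer (variable names are illustrative):

```python
from pathlib import PurePosixPath, PureWindowsPath

mixed = r'a/b\c'

# Windows flavor: "/" and "\" both separate components.
windows_name = PureWindowsPath(mixed).name

# POSIX flavor: "\" is an ordinary file name character.
posix_name = PurePosixPath(mixed).name

print(repr(windows_name))  # 'c'
print(repr(posix_name))    # 'b\\c'
```

Both classes are "pure", so they only parse the string and never touch the file system, which is why they can be used on any platform.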


Answer 16

In both Python 2 and 3, using the pathlib2 module:

import posixpath  # to generate unix paths
from pathlib2 import PurePath, PureWindowsPath, PurePosixPath

def path2unix(path, nojoin=True, fromwinpath=False):
    """From a path given in any format, converts to posix path format
    fromwinpath=True forces the input path to be recognized as a Windows path (useful on Unix machines to unit test Windows paths)"""
    if not path:
        return path
    if fromwinpath:
        pathparts = list(PureWindowsPath(path).parts)
    else:
        pathparts = list(PurePath(path).parts)
    if nojoin:
        return pathparts
    else:
        return posixpath.join(*pathparts)

Usage:

In [9]: path2unix('lala/lolo/haha.dat')
Out[9]: ['lala', 'lolo', 'haha.dat']

In [10]: path2unix(r'C:\lala/lolo/haha.dat')
Out[10]: ['C:\\', 'lala', 'lolo', 'haha.dat']

In [11]: path2unix(r'C:\lala/lolo/haha.dat') # works even with malformatted cases mixing both Windows and Linux path separators
Out[11]: ['C:\\', 'lala', 'lolo', 'haha.dat']

With your test case:

In [12]: testcase = paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
    ...:                      'a/b/../../a/b/c/', 'a/b/../../a/b/c']

In [14]: for t in testcase:
    ...:     print(path2unix(t)[-1])
    ...:
    ...:
c
c
c
c
c
c
c

The idea here is to convert all paths into the unified internal representation of pathlib2, with different decoders depending on the platform. pathlib2 provides PurePath, which parses the path using the conventions of the platform the code runs on (the outputs shown are what you get on Windows, where both separators are accepted). If that does not work, for example when handling Windows paths on a Unix machine, you can force the path to be recognized as a Windows path using fromwinpath=True. This splits the input string into parts; the last one is the leaf you are looking for, hence path2unix(t)[-1].

If nojoin=False is passed, the path is joined back, so that the output is simply the input string converted to Unix format, which can be useful for comparing subpaths across platforms.
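On Python 3.4 and later, the standard-library pathlib exposes the same Pure* classes, so a similar conversion can be sketched without pathlib2. The helper name win_to_posix is made up here, and it assumes the input should always be read with Windows rules.

```python
from pathlib import PureWindowsPath


def win_to_posix(path):
    """Parse with Windows rules (both separators accepted), emit "/" only."""
    return PureWindowsPath(path).as_posix()


print(win_to_posix(r'C:\lala/lolo/haha.dat'))               # C:/lala/lolo/haha.dat
print(PureWindowsPath(r'C:\lala/lolo/haha.dat').parts[-1])  # haha.dat
```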

