How to split a DOS path into its components in Python

Question: How to split a DOS path into its components in Python

I have a string variable which represents a DOS path, e.g.:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this string into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I have tried using split() and replace() but they either only process the first backslash or they insert hex numbers into the string.

I need to convert this string variable into a raw string somehow so that I can parse it.

What’s the best way to do this?

I should also add that the contents of var, i.e. the path that I’m trying to parse, is actually the return value of a command line query. It’s not path data that I generate myself. It’s stored in a file, and the command line tool is not going to escape the backslashes.


Answer 0

I’ve been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons — the possibilities for confusion are not endless, but mistakes are easily made anyway. So I’m a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people, on finding this, are tempted to take a slippery path straight to damnation. They won’t realise until one day everything falls to pieces, and they (or, more likely, somebody else) have to work out why everything has gone wrong. It turns out somebody made a filename that mixes slashes and backslashes, and some person suggests that the answer is “not to do that”. Don’t be any of these people. Except for the one who mixed up slashes and backslashes; you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

import os

folders = []
while True:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    else:
        # Nothing left to split: keep the root (if any) and stop.
        if path != "":
            folders.append(path)

        break

folders.reverse()

(This pops a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn’t want that.)
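For what it's worth, the loop above can be wrapped in a function. This is only a sketch: the name split_all is made up for illustration, and it uses ntpath (the Windows flavour of os.path, importable on any OS) on the assumption that the input is always a DOS/Windows path, even when the script itself runs on Linux or macOS.

```python
import ntpath  # Windows implementation of os.path, available on every platform


def split_all(path):
    """Split a Windows-style path into drive, root and folder/file names."""
    drive, rest = ntpath.splitdrive(path)
    parts = []
    while True:
        rest, tail = ntpath.split(rest)
        if tail:
            parts.append(tail)
        else:
            if rest:
                parts.append(rest)  # the leading "\" of an absolute path
            break
    if drive:
        parts.append(drive)
    parts.reverse()
    return parts


print(split_all(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
# ['d:', '\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```

Keeping the drive and the root as separate items is a design choice; drop either one if the caller only wants the names.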


Answer 1

I would do

import os
path = os.path.normpath(path)
path.split(os.sep)

First normalize the path string into a proper string for the OS. Then os.sep must be safe to use as a delimiter in string function split.
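As a small illustration of why normalizing first helps, here is a sketch with a deliberately messy path (mixed and doubled separators, invented for this example). It uses ntpath.normpath and ntpath.sep instead of the os.path versions, on the assumption that you want Windows rules regardless of the machine the code runs on.

```python
import ntpath  # Windows path rules, importable on any OS

# Mixed and doubled separators, made up for the example.
path = "d:/stuff\\morestuff//furtherdown\\THEFILE.txt"

normalized = ntpath.normpath(path)          # every separator becomes a single "\"
components = normalized.split(ntpath.sep)

print(normalized)
print(components)
# ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```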


Answer 2

You can simply use the most Pythonic approach (IMHO):

import os

your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print(path_list)

Which will give you:

['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The clue here is to use os.sep instead of '\\' or '/', as this makes it system independent.

To remove the colon from the drive letter (although I don’t see any reason why you would want to do that), you can write:

path_list[0] = path_list[0][0]

Answer 3

In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')

On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn’t you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to python 2 available: pathlib2
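Applied to the path from the question, a sketch along these lines gets the exact list the asker wanted. The variable names are mine, and stripping the colon from the drive is an assumption about what the asker needs; PureWindowsPath, .drive, .anchor and .parts are the real pathlib attributes.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")

drive_letter = p.drive.rstrip(":")               # "d:" -> "d"
names = p.parts[1:] if p.anchor else p.parts     # skip the anchor "d:\" if present
components = [drive_letter, *names]

print(components)
# ['d', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```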


Answer 4

The problem here starts with how you’re creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python is trying to special case these: \s, \m, \f, and \T. In your case, \f is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you’ll get the result you want.
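To see the effect for yourself, here is a minimal sketch. The first literal is a variation I made up: every backslash is doubled except the one before f, so \f is the only escape left (newer Python versions warn about unrecognised escapes such as \s, which is why the question's literal is not reused verbatim).

```python
# Only "\f" is left as an escape sequence; it becomes a form feed (0x0C).
plain = "d:\\stuff\\morestuff\furtherdown"
raw = r"d:\stuff\morestuff\furtherdown"

print(repr(plain))          # the form feed shows up as \x0c
print(plain.split("\\"))    # 'morestuff' and 'urtherdown' stay glued together
print(raw.split("\\"))      # ['d:', 'stuff', 'morestuff', 'furtherdown']
```

This also explains the "hex numbers" mentioned in the question: they are just how repr() displays the control character.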


Answer 5

For a somewhat more concise solution, consider the following:

import os

def split_path(p):
    a,b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]

Answer 6

I can’t actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python’s os.path module desperately needs this as a built-in function.


Answer 7

The functional way, with a generator.

import os

def split(path):
    (drive, head) = os.path.splitdrive(path)
    # Stop at the root separator, or when a relative path is used up.
    while head and head != os.sep:
        (head, tail) = os.path.split(head)
        yield tail

In action:

>>> print([x for x in split(os.path.normpath('/path/to/filename'))])
['filename', 'to', 'path']

Answer 8

It works for me:

>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Sure you might need to also strip out the colon from the first component, but keeping it makes it possible to re-assemble the path.

The r modifier marks the string literal as “raw”; notice how embedded backslashes are not doubled.


Answer 9

The stuff about mypath.split("\\") would be better expressed as mypath.split(os.sep). sep is the path separator for your particular platform (e.g., \ for Windows, / for Unix, etc.), and the Python build knows which one to use. If you use sep, then your code will be platform agnostic.
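One caveat, offered as a sketch rather than a correction: os.sep is the separator of the machine running the script, so it only matches the path data when the two agree. If DOS paths might be processed on a non-Windows host, the standard ntpath and posixpath modules expose each platform's separator explicitly.

```python
import ntpath
import os
import posixpath

dos_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

print(repr(ntpath.sep), repr(posixpath.sep))   # '\\' '/'
print(dos_path.split(ntpath.sep))      # five components on every platform
print(dos_path.split(posixpath.sep))   # one unsplit string: no "/" in the data
print(dos_path.split(os.sep))          # depends on where this runs
```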


Answer 10

re.split() can help a little more than string.split()

import re    
var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
re.split( r'[\\/]', var )
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

If you also want to support Linux and Mac paths, just add filter(None, result). Those paths start with '/' or '//' (for example '//mount/…' or '/var/tmp/'), so split() produces unwanted empty strings ('') that the filter removes.

import re    
var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
result = re.split( r'[\\/]', var )
list(filter( None, result ))
['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
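The same idea can be sketched for Python 3 with a compiled pattern and a list comprehension. The names SEPARATORS and split_any are invented for this example, and the trailing + in the pattern is my addition so that doubled separators collapse into one.

```python
import re

# One or more slashes of either kind count as a single separator.
SEPARATORS = re.compile(r"[\\/]+")


def split_any(path):
    """Split on / or \\ and drop the empty strings left by leading/trailing slashes."""
    return [part for part in SEPARATORS.split(path) if part]


print(split_any(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
# ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
print(split_any("//mount/var/tmp/"))
# ['mount', 'var', 'tmp']
```

Note that this throws away the difference between absolute and relative paths, which may or may not matter for your use.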

Answer 11

You can recursively os.path.split the string

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [p]

Testing this against some path strings, and reassembling the path with os.path.join

>>> for path in [
...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
...         '/path/to/file.txt',
...         'relative/path/to/file.txt',
...         r'C:\path\to\file.txt',
...         r'\\host\share\path\to\file.txt',
...     ]:
...     print(parts(path), os.path.join(*parts(path)))
... 
['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
['/', 'path', 'to', 'file.txt'] /path\to\file.txt
['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt

The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths and absolute and relative paths. Changing the last [p] to [os.path.splitdrive(p)] forces the issue by splitting the drive letter and directory root out into a tuple.

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [os.path.splitdrive(p)]

[('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
[('', '/'), 'path', 'to', 'file.txt']
[('', ''), 'relative', 'path', 'to', 'file.txt']
[('C:', '\\'), 'path', 'to', 'file.txt']
[('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']

Edit: I have realised that this answer is very similar to that given above by user1556435. I’m leaving my answer up as the handling of the drive component of the path is different.


Answer 12

Just like others explained, your problem stems from using \, which is the escape character in string literals/constants. OTOH, if you had that file path string from another source (read from a file, the console, or returned by an os function), there would be no problem splitting on '\\' (note that r'\' is not a valid literal, because a raw string cannot end in a single backslash).

And just like others suggested, if you want to use \ in a program literal, you have to either double it (\\) or prefix the whole literal with r, like r'lite\ral' or r"lite\ral", so that the parser does not convert \r into a CR (carriage return) character.

There is one more way, though: just don’t use backslash \ pathnames in your code! Since the last century, Windows has recognized and worked fine with pathnames that use the forward slash / as the directory separator. Somehow not many people know that, but it works:

>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This by the way will make your code work on Unix, Windows and Mac… because all of them do use / as directory separator… even if you don’t want to use the predefined constants of module os.


Answer 13

Let’s assume you have a file filedata.txt with this content:

d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

>>> with open("filedata.txt") as f:
...     for line in f:
...         print(line.strip().split("\\"))
... 
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']
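The same holds when the path arrives as the output of a command, which is the situation the question describes. In the sketch below the "command line tool" is a stand-in I made up (a child Python process that prints one path); the point is only that text captured at runtime already contains real, single backslashes and needs no raw-string conversion.

```python
import subprocess
import sys

# Stand-in for the real command-line tool (an assumption for this example):
# a child Python process that prints one path with unescaped backslashes.
child_code = r"print(r'd:\stuff\morestuff\furtherdown\THEFILE.txt')"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

path = result.stdout.strip()       # drop the trailing newline
components = path.split("\\")      # "\\" in source code is one backslash
print(components)
```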

Answer 14

I use the following: since it uses the os.path.basename function, it doesn’t add any slashes to the returned list. It also works with any platform’s slashes, i.e. Windows’ \\ or Unix’s /. Furthermore, it doesn’t add the \\\\ that Windows uses for server paths :)

import os

def SplitPath( split_path ):
    pathSplit_lst   = []
    while os.path.basename(split_path):
        pathSplit_lst.append( os.path.basename(split_path) )
        split_path = os.path.dirname(split_path)
    pathSplit_lst.reverse()
    return pathSplit_lst

So for '\\\\server\\folder1\\folder2\\folder3\\folder4'

you get

['server', 'folder1', 'folder2', 'folder3', 'folder4']


Answer 15

I’m not actually sure if this fully answers the question, but I had a fun time writing this little function that keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.

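from os.path import split

def components(path):
    ret = []
    while len(path) > 0:
        parent, crust = split(path)
        if parent == path:
            # Reached a root such as "/" or "C:\\"; keep it and stop,
            # otherwise the loop would never terminate.
            ret.insert(0, parent)
            break
        path = parent
        ret.insert(0, crust)

    return ret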

Answer 16

Below line of code can handle:

  1. C:/path/path
  2. C://path//path
  3. C:\path\path
  4. C:\\path\\path

path = re.split(r'[/\\]+', path)


Answer 17

A recursive one, for fun.

Not the most elegant answer, but should work everywhere:

import os

def split_path(path):
    head = os.path.dirname(path)
    tail = os.path.basename(path)
    if head == os.path.dirname(head):
        return [tail]
    return split_path(head) + [tail]

Answer 18

Use ntpath.split().
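For anyone who has not used it: ntpath is the Windows implementation of os.path and can be imported on any platform. A minimal sketch of a single call (the variable names are mine):

```python
import ntpath

path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

head, tail = ntpath.split(path)          # everything before the last "\", and the last name
drive, rest = ntpath.splitdrive(path)    # "d:" and the remainder

print(head)   # d:\stuff\morestuff\furtherdown
print(tail)   # THEFILE.txt
print(drive)  # d:
```

ntpath.split only peels off the last component, so getting every component still means calling it repeatedly on head.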