





For example, I want to join a prefix path to resource paths like /js/foo.js.

I want the resulting path to be relative to the root of the server. In the above example if the prefix was “media” I would want the result to be /media/js/foo.js.

os.path.join does this really well, but how it joins paths is OS dependent. In this case I know I am targeting the web, not the local file system.

Is there a best alternative when you are working with paths you know will be used in URLs? Will os.path.join work well enough? Should I just roll my own?

Answer 0


Since, from the comments the OP posted, it seems he doesn’t want to preserve “absolute URLs” in the join (which is one of the key jobs of urlparse.urljoin;-), I’d recommend avoiding that. os.path.join would also be bad, for exactly the same reason.

So, I’d use something like '/'.join(s.strip('/') for s in pieces) (if the leading / must also be ignored — if the leading piece must be special-cased, that’s also feasible of course;-).
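A minimal sketch of that strip-and-join idea, assuming the result should always be anchored at the server root as the question asks. The helper name join_url_path is hypothetical, not part of any library:

```python
def join_url_path(*pieces):
    """Join pieces with single slashes and anchor the result at the root."""
    # Strip slashes from both ends of every piece so none are doubled.
    cleaned = [s.strip('/') for s in pieces]
    # Skip pieces that were empty or consisted only of slashes.
    return '/' + '/'.join(s for s in cleaned if s)


print(join_url_path('media', '/js/foo.js'))  # /media/js/foo.js
```

Because every piece is stripped, a leading slash on a later piece never discards the prefix, which is the behaviour the question wants.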

Answer 1



You can use urllib.parse.urljoin:

>>> from urllib.parse import urljoin
>>> urljoin('/media/path/', 'js/foo.js')
'/media/path/js/foo.js'

But beware:

>>> urljoin('/media/path', 'js/foo.js')
'/media/js/foo.js'
>>> urljoin('/media/path', '/js/foo.js')
'/js/foo.js'

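The reason you get different results from /js/foo.js and js/foo.js is that the former begins with a slash, which signifies that it already starts at the website root. Likewise, a base without a trailing slash has its last segment replaced, which is why /media/path and /media/path/ behave differently.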

On Python 2, you have to do

from urlparse import urljoin
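If urljoin is still wanted for prefix-style joining on Python 3, one option is to normalise both arguments first: make the base end in a slash and make the resource path relative. This is only a sketch; prefix_join is a hypothetical helper name, and it assumes the resource is a plain path rather than a full URL:

```python
from urllib.parse import urljoin


def prefix_join(prefix, resource):
    """Join a prefix and a resource path so the prefix is always kept."""
    stripped = prefix.strip('/')
    # A base ending in '/' keeps its last segment when joined.
    base = '/' + stripped + '/' if stripped else '/'
    # A relative resource is appended to the base instead of replacing it.
    return urljoin(base, resource.lstrip('/'))


print(prefix_join('media', '/js/foo.js'))  # /media/js/foo.js
```

Note that urljoin still resolves ".." segments and lets a resource that carries its own scheme override the base, so this only suits trusted, path-only input.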

Answer 2



As you say, os.path.join joins paths based on the current OS. posixpath is the underlying module that is used on POSIX systems under the namespace os.path:

>>> import os, posixpath
>>> os.path.join is posixpath.join  # True on POSIX systems only
True
>>> posixpath.join('/media/', 'js/foo.js')
'/media/js/foo.js'

So you can just import and use posixpath.join for URLs instead; it is available and works on any platform.
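One caveat worth sketching: posixpath.join discards everything before a component that starts with a slash, so a resource path like /js/foo.js would drop the prefix. Stripping the leading slash first avoids that. The helper name prefix_path is hypothetical:

```python
import posixpath


def prefix_path(prefix, resource):
    """Join prefix and resource under the server root with POSIX rules."""
    # lstrip('/') keeps the resource relative so the prefix is not discarded;
    # the leading '/' component anchors the result at the root.
    return posixpath.join('/', prefix, resource.lstrip('/'))


print(posixpath.join('/media', '/js/foo.js'))  # /js/foo.js (prefix lost)
print(prefix_path('media', '/js/foo.js'))      # /media/js/foo.js
```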

Edit: @Pete’s suggestion is a good one, you can alias the import for increased readability

from posixpath import join as urljoin

Edit: I think this becomes clearer, or at least it helped me understand, if you look at the source of os.py (the code here is from Python 2.7.11, and I’ve trimmed some bits). Conditional imports in os.py pick which path module to use in the namespace os.path. All the underlying modules (posixpath, ntpath, os2emxpath, riscospath) that may be imported in os.py and aliased as path exist and can be used on all systems. os.py just picks one of them for the namespace os.path at run time, based on the current OS.

# os.py
import sys, errno

_names = sys.builtin_module_names

if 'posix' in _names:
    # ...
    from posix import *
    # ...
    import posixpath as path
    # ...

elif 'nt' in _names:
    # ...
    from nt import *
    # ...
    import ntpath as path
    # ...

elif 'os2' in _names:
    # ...
    from os2 import *
    # ...
    if sys.version.find('EMX GCC') == -1:
        import ntpath as path
    else:
        import os2emxpath as path
        from _emx_link import link
    # ...

elif 'ce' in _names:
    # ...
    from ce import *
    # ...
    # We can use the standard Windows path.
    import ntpath as path

elif 'riscos' in _names:
    # ...
    from riscos import *
    # ...
    import riscospath as path
    # ...

else:
    raise ImportError, 'no os specific module found'

Answer 3



This does the job nicely:

def urljoin(*args):
    """
    Joins given arguments into an url. Trailing but not leading slashes are
    stripped for each argument.
    """

    return "/".join(map(lambda x: str(x).rstrip('/'), args))

Answer 4




The basejoin function in the urllib package might be what you’re looking for.

basejoin = urljoin(base, url, allow_fragments=True)
    Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter.

Edit: I didn’t notice before, but urllib.basejoin seems to map directly to urlparse.urljoin, making the latter preferred.

Answer 5



Using furl (installed with pip install furl), it will be:


Answer 6






I know this is a bit more than the OP asked for. However, I had the pieces of the following URL and was looking for a simple way to join them:

>>> url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

Doing some looking around:

>>> import urlparse  # Python 2; on Python 3 the module is urllib.parse
>>> split = urlparse.urlsplit(url)
>>> split
SplitResult(scheme='https', netloc='api.foo.com', path='/orders/bartag', query='spamStatus=awaiting_spam&page=1&pageSize=250', fragment='')
>>> type(split)
<class 'urlparse.SplitResult'>
>>> dir(split)
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_asdict', '_fields', '_make', '_replace', 'count', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
>>> split[0]
'https'
>>> split = (split[:])
>>> type(split)
<type 'tuple'>

So, in addition to the path joining already covered in the other answers, I did the following to get what I was looking for:

>>> split
('https', 'api.foo.com', '/orders/bartag', 'spamStatus=awaiting_spam&page=1&pageSize=250', '')
>>> unsplit = urlparse.urlunsplit(split)
>>> unsplit
'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

According to the documentation it takes EXACTLY a 5 part tuple.

With the following tuple format:

Attribute   Index   Value                    Value if not present
scheme      0       URL scheme specifier     empty string
netloc      1       Network location part    empty string
path        2       Hierarchical path        empty string
query       3       Query component          empty string
fragment    4       Fragment identifier      empty string
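For reference, a rough Python 3 sketch of building the same URL from its five parts. It assumes the query starts out as a dict, which is an illustration choice rather than something from the answer; urlencode and urlunsplit both live in urllib.parse:

```python
from urllib.parse import urlencode, urlunsplit

# Build the query string from key/value pairs (dicts keep insertion order).
query = urlencode({'spamStatus': 'awaiting_spam', 'page': 1, 'pageSize': 250})

# urlunsplit takes exactly five parts: scheme, netloc, path, query, fragment.
url = urlunsplit(('https', 'api.foo.com', '/orders/bartag', query, ''))
print(url)
# https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250
```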

Answer 7



Rune Kaagaard provided a great and compact solution that worked for me. I expanded on it a little:

def urljoin(*args):
    trailing_slash = '/' if args[-1].endswith('/') else ''
    return "/".join(map(lambda x: str(x).strip('/'), args)) + trailing_slash

This allows all arguments to be joined regardless of leading and trailing slashes, while preserving the final slash if present.

Answer 8



To improve slightly on Alex Martelli’s response, the following will not only clean up extra slashes but also preserve a trailing (ending) slash, which can sometimes be useful:

>>> items = ["http://www.website.com", "/api", "v2/"]
>>> url = "/".join([(u.strip("/") if index + 1 < len(items) else u.lstrip("/")) for index, u in enumerate(items)])
>>> print(url)
http://www.website.com/api/v2/

It’s not as easy to read, though, and won’t clean up multiple extra trailing slashes.
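A possible variant that addresses both points, offered as a sketch rather than a drop-in replacement: it strips every item fully and then re-adds a single trailing slash if the last item had one. The function name join_keep_trailing is hypothetical:

```python
def join_keep_trailing(*items):
    """Join items with single slashes, keeping at most one trailing slash."""
    # Strip slashes from both ends of every item, including the last one.
    parts = [str(u).strip('/') for u in items]
    joined = '/'.join(p for p in parts if p)
    # Re-add exactly one trailing slash if the last item ended with any.
    if items and str(items[-1]).endswith('/'):
        joined += '/'
    return joined


print(join_keep_trailing("http://www.website.com", "/api", "v2///"))
# http://www.website.com/api/v2/
```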

Answer 9


I found things not to like about all the above solutions, so I came up with my own. This version makes sure parts are joined with a single slash and leaves leading and trailing slashes alone. No pip install, no urllib.parse.urljoin weirdness.

In [1]: from functools import reduce

In [2]: def join_slash(a, b):
   ...:     return a.rstrip('/') + '/' + b.lstrip('/')

In [3]: def urljoin(*args):
   ...:     return reduce(join_slash, args) if args else ''

In [4]: parts = ['https://foo-bar.quux.net', '/foo', 'bar', '/bat/', '/quux/']

In [5]: urljoin(*parts)
Out[5]: 'https://foo-bar.quux.net/foo/bar/bat/quux/'

In [6]: urljoin('https://quux.com/', '/path', 'to/file///', '//here/')
Out[6]: 'https://quux.com/path/to/file/here/'

In [7]: urljoin()
Out[7]: ''

In [8]: urljoin('//','beware', 'of/this///')
Out[8]: '/beware/of/this///'

In [9]: urljoin('/leading', 'and/', '/trailing/', 'slash/')
Out[9]: '/leading/and/trailing/slash/'

Answer 10


Using furl and a regex (Python 3):

>>> import re
>>> import furl
>>> p = re.compile(r'(\/)+')
>>> url = furl.furl('/media/path').add(path='/js/foo.js').url
>>> url
>>> p.sub(r"\1", url)
>>> url = furl.furl('/media/path').add(path='js/foo.js').url
>>> url
>>> p.sub(r"\1", url)
>>> url = furl.furl('/media/path/').add(path='js/foo.js').url
>>> url
>>> p.sub(r"\1", url)
>>> url = furl.furl('/media///path///').add(path='//js///foo.js').url
>>> url
>>> p.sub(r"\1", url)