Python 实用宝典

Question

I have two URLs:

url1 = "http://127.0.0.1/test1/test2/test3/test5.xml"
url2 = "../../test4/test6.xml"

How can I get the absolute URL for url2, resolved relative to url1?

Answer 1

Use urlparse.urljoin (Python 2):

>>> import urlparse
>>> urlparse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'

In Python 3, where urlparse was renamed to urllib.parse, use it as follows:

>>> import urllib.parse
>>> urllib.parse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'
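To see why "../../" lands on /test1/, here is a minimal sketch of the path part of RFC 3986 reference resolution (section 5.2: merge the paths, then remove dot segments). The function names remove_dot_segments and resolve_path are illustrative, not part of urllib.parse. The sketch only handles the path, assumes the base URL has a host (as url1 does), and ignores queries and fragments; urljoin is called only to compare results for this input.

```python
from urllib.parse import urljoin, urlsplit


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments following RFC 3986, section 5.2.4."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]  # "/./x" -> "/x"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]  # "/../x" -> "/x", and drop the last output segment
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            nxt = inp.find("/", 1 if inp.startswith("/") else 0)
            if nxt == -1:
                nxt = len(inp)
            out.append(inp[:nxt])
            inp = inp[nxt:]
    return "".join(out)


def resolve_path(base_path: str, ref_path: str) -> str:
    """Illustrative: resolve a relative path against a base path (assumes the base has a host)."""
    if ref_path.startswith("/"):
        return remove_dot_segments(ref_path)
    if not base_path:
        merged = "/" + ref_path
    else:
        # Keep the base path up to and including its last "/", then append the reference.
        merged = base_path[: base_path.rfind("/") + 1] + ref_path
    return remove_dot_segments(merged)


url1 = "http://127.0.0.1/test1/test2/test3/test5.xml"
url2 = "../../test4/test6.xml"
print(resolve_path(urlsplit(url1).path, url2))  # /test1/test4/test6.xml
print(urljoin(url1, url2))  # http://127.0.0.1/test1/test4/test6.xml
```

The merge step turns the base path into /test1/test2/test3/ plus ../../test4/test6.xml, and each "/../" then removes one directory, leaving /test1/test4/test6.xml.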

Answer 2

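If your relative path consists of multiple parts, join the parts first and then resolve the result once. Calling urljoin on each part in turn does not append it: a part without a trailing slash replaces the last segment of the base path instead. The easiest way to join the parts is posixpath.join.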

>>> import urllib.parse
>>> import posixpath
>>> url1 = "http://127.0.0.1"
>>> url2 = "test1"
>>> url3 = "test2"
>>> url4 = "test3"
>>> url5 = "test5.xml"
>>> url_path = posixpath.join(url2, url3, url4, url5)
>>> urllib.parse.urljoin(url1, url_path)
'http://127.0.0.1/test1/test2/test3/test5.xml'
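A quick way to see the "replace, not join" behaviour: chain urljoin over the bare segments and compare that with joining them first. The variable names here are just for illustration.

```python
import posixpath
from urllib.parse import urljoin

base = "http://127.0.0.1"
parts = ["test1", "test2", "test3", "test5.xml"]

# Chaining urljoin over bare segments: each one replaces the previous last segment.
chained = base
for part in parts:
    chained = urljoin(chained, part)

# Joining the segments first, then resolving once against the base.
joined = urljoin(base, posixpath.join(*parts))

print(chained)  # http://127.0.0.1/test5.xml
print(joined)   # http://127.0.0.1/test1/test2/test3/test5.xml
```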

See also: How to join components of a path when you are constructing a URL in Python

Answer 3

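Join the pieces one at a time, feeding each result back in as the new base. Every piece except the last needs a trailing slash; otherwise urljoin replaces it instead of appending to it.

from urllib.parse import urljoin  # Python 3; in Python 2 use urlparse.urljoin
es = ['http://127.0.0.1/', 'test1/', 'test4/', 'test6.xml']
base = ''
for e in es:
    base = urljoin(base, e)  # fold each piece into the running base
# base == 'http://127.0.0.1/test1/test4/test6.xml'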

Answer 4

>>> from urlparse import urljoin
>>> url1 = "http://www.youtube.com/user/khanacademy"
>>> url2 = "/user/khanacademy"
>>> urljoin(url1, url2)
'http://www.youtube.com/user/khanacademy'

Simple: because url2 starts with /, it replaces the entire path of url1.
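The same logic covers the other kinds of relative reference. A short Python 3 sketch; the extra inputs (videos, //example.com/x, https://example.org/, ?tab=1) are made-up values chosen only to show each case:

```python
from urllib.parse import urljoin

base = "http://www.youtube.com/user/khanacademy"

absolute_path = urljoin(base, "/user/khanacademy")    # leading "/": replaces the whole path
relative_path = urljoin(base, "videos")               # no "/": replaces only the last segment
network_path = urljoin(base, "//example.com/x")       # leading "//": new host, same scheme
absolute_url = urljoin(base, "https://example.org/")  # different scheme: returned unchanged
query_only = urljoin(base, "?tab=1")                  # query only: same path, new query

for result in (absolute_path, relative_path, network_path, absolute_url, query_only):
    print(result)
```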

Answer 5

In Python 3, join URLs with urllib.parse.urljoin:

from urllib.parse import urljoin
urljoin('https://10.66.0.200/', '/api/org')
# returns 'https://10.66.0.200/api/org'
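One thing to watch with this form: whether the base ends in a slash, and whether the reference starts with one, both change the result, because urljoin treats the last segment of a slash-less base like a file name. A small sketch; these variations on the URLs above are illustrative inputs:

```python
from urllib.parse import urljoin

no_slash = urljoin("https://10.66.0.200/api", "org")     # "api" is treated like a file name
with_slash = urljoin("https://10.66.0.200/api/", "org")  # "api/" is a directory, so "org" is appended
rooted = urljoin("https://10.66.0.200/api/", "/org")     # leading "/" discards the base path

print(no_slash)    # https://10.66.0.200/org
print(with_slash)  # https://10.66.0.200/api/org
print(rooted)      # https://10.66.0.200/org
```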

Answer 6

You can use functools.reduce to implement Shikhar’s method more cleanly.

>>> import urllib.parse
>>> from functools import reduce
>>> reduce(urllib.parse.urljoin, ["http://moc.com/", "path1/", "path2/", "path3/"])
'http://moc.com/path1/path2/path3/'

Note that with this method each fragment should have a trailing slash and no leading slash, marking it as a path fragment to be appended. This is also more informative: path1/ is clearly a path fragment, not the absolute path /path1/ (which would replace the whole base path) and not a bare path1, which is ambiguous and gets treated like a file name that the next join replaces.

If you need to add a trailing / to a fragment that lacks one, you could do:

uri = uri if uri.endswith("/") else f"{uri}/"
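Building on that one-liner, here is a minimal sketch of a hypothetical join_url helper (the name is mine, not a library function) that normalizes fragments before folding them with reduce. It assumes the base is directory-like and the parts are plain path fragments with no query strings or fragments.

```python
from functools import reduce
from urllib.parse import urljoin


def join_url(base: str, *parts: str) -> str:
    """Hypothetical helper: append plain path fragments to base via urljoin."""
    # Treat the base as a directory so the first fragment is appended to it.
    base = base if base.endswith("/") else f"{base}/"
    # Drop leading "/" so no fragment is resolved as an absolute path,
    # and skip fragments that are empty or consist only of slashes.
    fragments = [p.lstrip("/") for p in parts if p.strip("/")]
    # Every fragment except the last needs a trailing "/" so the next one is appended.
    fragments = [f if f.endswith("/") else f"{f}/" for f in fragments[:-1]] + fragments[-1:]
    return reduce(urljoin, fragments, base)


print(join_url("http://moc.com", "path1", "/path2", "path3/"))
# http://moc.com/path1/path2/path3/
print(join_url("http://127.0.0.1", "test1", "test4", "test6.xml"))
# http://127.0.0.1/test1/test4/test6.xml
```

The last fragment keeps whatever form you pass, so a final file name stays a file name and a final directory can keep its trailing slash.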

To learn more about URI resolution, Wikipedia has some nice examples.

Update

I just noticed that Peter Perron suggested reduce in a comment on Shikhar’s answer, but I’ll leave this here to demonstrate how it’s done.

How to join absolute and relative URLs?