Tag archive: url

Adding parameters to a given URL in Python

Question: Adding parameters to a given URL in Python


Suppose I was given a URL.
It might already have GET parameters (e.g. http://example.com/search?q=question) or it might not (e.g. http://example.com/).

And now I need to add some parameters to it like {'lang':'en','tag':'python'}. In the first case I’m going to have http://example.com/search?q=question&lang=en&tag=python and in the second — http://example.com/search?lang=en&tag=python.

Is there any standard way to do this?


Answer 0


There are a couple of quirks with the urllib and urlparse modules. Here’s a working example:

try:
    import urlparse
    from urllib import urlencode
except ImportError:  # For Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode

url = "http://stackoverflow.com/search?q=question"
params = {'lang': 'en', 'tag': 'python'}

url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))  # index 4 is the query component
query.update(params)

url_parts[4] = urlencode(query)

print(urlparse.urlunparse(url_parts))

ParseResult, the result of urlparse(), is read-only and we need to convert it to a list before we can attempt to modify its data.
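
For the example above, this prints the merged URL (on interpreters where dicts are unordered, the parameter order may differ):

http://stackoverflow.com/search?q=question&lang=en&tag=python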


Answer 1


Why

I wasn’t satisfied with any of the solutions on this page (come on, where is our favorite copy-paste thing?), so I wrote my own based on the answers here. It tries to be complete and more Pythonic. I’ve added a handler for dict and bool values in the params to be more consumer-side (JS) friendly, but it is optional; you can drop it.

How it works

Test 1: Adding new arguments, handling arrays and bool values:

url = 'http://stackoverflow.com/test'
new_params = {'answers': False, 'data': ['some','values']}

add_url_params(url, new_params) == \
    'http://stackoverflow.com/test?data=some&data=values&answers=false'

Test 2: Rewriting existing args, handling dict values:

url = 'http://stackoverflow.com/test/?question=false'
new_params = {'question': {'__X__':'__Y__'}}

add_url_params(url, new_params) == \
    'http://stackoverflow.com/test/?question=%7B%22__X__%22%3A+%22__Y__%22%7D'

Talk is cheap. Show me the code.

The code itself. I’ve tried to describe it in detail:

from json import dumps

try:
    from urllib import urlencode, unquote
    from urlparse import urlparse, parse_qsl, ParseResult
except ImportError:
    # Python 3 fallback
    from urllib.parse import (
        urlencode, unquote, urlparse, parse_qsl, ParseResult
    )


def add_url_params(url, params):
    """ Add GET params to provided URL being aware of existing.

    :param url: string of target URL
    :param params: dict containing requested params to be added
    :return: string with updated URL

    >> url = 'http://stackoverflow.com/test?answers=true'
    >> new_params = {'answers': False, 'data': ['some','values']}
    >> add_url_params(url, new_params)
    'http://stackoverflow.com/test?data=some&data=values&answers=false'
    """
    # Unquoting URL first so we don't lose existing args
    url = unquote(url)
    # Extracting URL info
    parsed_url = urlparse(url)
    # Extracting URL arguments from parsed URL
    get_args = parsed_url.query
    # Converting URL arguments to dict
    parsed_get_args = dict(parse_qsl(get_args))
    # Merging URL arguments dict with new params
    parsed_get_args.update(params)

    # Bool and dict values should be converted to json-friendly values
    # (you may throw this part away if you don't like it :)
    parsed_get_args.update(
        {k: dumps(v) for k, v in parsed_get_args.items()
         if isinstance(v, (bool, dict))}
    )

    # Converting URL arguments to a proper query string
    encoded_get_args = urlencode(parsed_get_args, doseq=True)
    # Creating a new ParseResult from the original one, with the new
    # query string swapped in; urlparse builds the same structure internally.
    new_url = ParseResult(
        parsed_url.scheme, parsed_url.netloc, parsed_url.path,
        parsed_url.params, encoded_get_args, parsed_url.fragment
    ).geturl()

    return new_url

Please be aware that there may be some issues; if you find one, please let me know and we will make this thing better.


Answer 2


You want to use URL encoding if the strings can have arbitrary data (for example, characters such as ampersands, slashes, etc. will need to be encoded).

Check out urllib.urlencode:

>>> import urllib
>>> urllib.urlencode({'lang':'en','tag':'python'})
'lang=en&tag=python'

In Python 3:

from urllib import parse
parse.urlencode({'lang':'en','tag':'python'})

Answer 3


You can also use the furl module (https://github.com/gruns/furl):

>>> from furl import furl
>>> print furl('http://example.com/search?q=question').add({'lang':'en','tag':'python'}).url
http://example.com/search?q=question&lang=en&tag=python

Answer 4


Outsource it to the battle-tested requests library.

This is how I would do it:

from requests.models import PreparedRequest
url = 'http://example.com/search?q=question'
params = {'lang':'en','tag':'python'}
req = PreparedRequest()
req.prepare_url(url, params)
print(req.url)
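
For this input, the prepared URL comes out with the new parameters appended to the existing query string (parameter order may vary):

http://example.com/search?q=question&lang=en&tag=python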

Answer 5


If you are using the requests lib:

import requests
...
params = {'tag': 'python'}
requests.get(url, params=params)
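
Note that requests merges params into any query string the URL already carries, and the final URL is available on the response object. A minimal sketch (hypothetical URL; performs a real request):

import requests

r = requests.get('http://example.com/search?q=question', params={'tag': 'python'})
print(r.url)  # http://example.com/search?q=question&tag=python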

Answer 6


Yes: use urllib.

From the examples in the documentation:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.geturl() # Prints the final URL with parameters.
>>> print f.read() # Prints the contents
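
On Python 3, the same example looks roughly like this (a sketch; requires network access):

import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
with urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params) as f:
    print(f.geturl())  # the final URL, with parameters
    print(f.read())    # the response body (bytes)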

Answer 7


Based on this answer, a one-liner for simple cases (Python 3 code):

from urllib.parse import urlparse, urlencode


url = "https://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}

url += ('&' if urlparse(url).query else '?') + urlencode(params)

or:

url += ('&', '?')[urlparse(url).query == ''] + urlencode(params)

Answer 8


I find this more elegant than the two top answers:

from urllib.parse import urlencode, urlparse, parse_qs

def merge_url_query_params(url: str, additional_params: dict) -> str:
    url_components = urlparse(url)
    original_params = parse_qs(url_components.query)
    # On Python < 3.5 you could call original_params.update(additional_params)
    # instead; dict unpacking keeps all the variables immutable.
    merged_params = {**original_params, **additional_params}
    updated_query = urlencode(merged_params, doseq=True)
    # _replace() is how you create a new NamedTuple with a changed field
    return url_components._replace(query=updated_query).geturl()

assert merge_url_query_params(
    'http://example.com/search?q=question',
    {'lang':'en','tag':'python'},
) == 'http://example.com/search?q=question&lang=en&tag=python'

The most important things I dislike in the top answers (they are nevertheless good):

  • Łukasz: having to remember the index at which the query is in the URL components
  • Sapphire64: the very verbose way of creating the updated ParseResult

What’s bad about my response is the magical-looking dict merge using unpacking, but I prefer that to updating an already existing dictionary because of my prejudice against mutability.


Answer 9


I liked Łukasz’s version, but since the urllib and urlparse functions are somewhat awkward to use in this case, I think it’s more straightforward to do something like this:

params = urllib.urlencode(params)

if urlparse.urlparse(url)[4]:
    print url + '&' + params
else:
    print url + '?' + params

Answer 10


Use the various urlparse functions to tear apart the existing URL, urllib.urlencode() on the combined dictionary, then urlparse.urlunparse() to put it all back together again.

Or just take the result of urllib.urlencode() and concatenate it to the URL appropriately.
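A minimal sketch of that recipe, using the Python 3 names (in Python 3 all of these functions live in urllib.parse):

from urllib.parse import urlparse, urlunparse, urlencode, parse_qsl

def add_params(url, extra):
    parts = urlparse(url)
    # Merge the existing query string with the new parameters
    query = dict(parse_qsl(parts.query))
    query.update(extra)
    # Put the URL back together with the combined query
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_params('http://example.com/search?q=question', {'lang': 'en'}))
# http://example.com/search?q=question&lang=en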


Answer 11


Yet another answer:

# Python 2 names; in Python 3 these functions live in urllib.parse
import urllib
import urlparse

def addGetParameters(url, newParams):
    (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(url)
    queryList = urlparse.parse_qsl(query, keep_blank_values=True)
    for key in newParams:
        queryList.append((key, newParams[key]))
    return urlparse.urlunparse((scheme, netloc, path, params, urllib.urlencode(queryList), fragment))

Answer 12


Here is how I implemented it.

import urllib

params = urllib.urlencode({'lang':'en','tag':'python'})
url = ''
if request.GET:
   url = request.url + '&' + params
else:
   url = request.url + '?' + params    

Worked like a charm. However, I would have liked a cleaner way to implement this.

Another way of implementing the above is to put it in a method:

import urllib

def add_url_param(request, **params):
    new_url = ''
    _params = dict(**params)
    _params = urllib.urlencode(_params)

    if _params:
        if request.GET:
            new_url = request.url + '&' + _params
        else:
            new_url = request.url + '?' + _params
    else:
        new_url = request.url

    return new_url

Answer 13


In Python 2.5:

import cgi
import urllib
import urlparse

def add_url_param(url, **params):
    n = 3  # index of the query component in urlsplit results
    parts = list(urlparse.urlsplit(url))
    d = dict(cgi.parse_qsl(parts[n])) # use cgi.parse_qs for list values
    d.update(params)
    parts[n] = urllib.urlencode(d)
    return urlparse.urlunsplit(parts)

url = "http://stackoverflow.com/search?q=question"
add_url_param(url, lang='en') == "http://stackoverflow.com/search?q=question&lang=en"

Python – How can I validate a URL in Python? (checking whether it is malformed)

Question: How can I validate a URL in Python? (checking whether it is malformed)


I have a URL from the user and I have to reply with the fetched HTML.

How can I check for the URL to be malformed or not?

For example :

url='google'  // Malformed
url='google.com'  // Malformed
url='http://google.com'  // Valid
url='http://google'   // Malformed

Answer 0


Django URL validation regex (source):

import re
regex = re.compile(
        r'^(?:http|ftp)s?://' # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
        r'localhost|' #localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?' # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

print(re.match(regex, "http://www.example.com") is not None) # True
print(re.match(regex, "example.com") is not None)            # False

Answer 1


Actually, I think this is the best way.

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

val = URLValidator(verify_exists=False)
try:
    val('http://www.google.com')
except ValidationError as e:
    print(e)

If you set verify_exists to True, it will actually verify that the URL exists, otherwise it will just check if it’s formed correctly.

edit: ah yeah, this question is a duplicate of this: How can I check if a URL exists with Django’s validators?
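
Note that the verify_exists argument was removed from Django in later releases; on a modern Django the same check is simply the following (a sketch; needs to run inside a configured Django project):

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

val = URLValidator()
try:
    val('http://www.google.com')
except ValidationError as e:
    print(e)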


Answer 2


Use the validators package:

>>> import validators
>>> validators.url("http://google.com")
True
>>> validators.url("http://google")
ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
>>> if not validators.url("http://google"):
...     print "not valid"
... 
not valid
>>>

Install it from PyPI with pip (pip install validators).


Answer 3


A True or False version, based on @DMfll’s answer:

try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3
    from urllib.parse import urlparse

a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
b = '/data/Python.html'
c = 532
d = u'dkakasdkjdjakdjadjfalskdjfalk'

def uri_validator(x):
    try:
        result = urlparse(x)
        return all([result.scheme, result.netloc, result.path])
    except (AttributeError, ValueError):
        # non-string input or an unparsable URL
        return False

print(uri_validator(a))
print(uri_validator(b))
print(uri_validator(c))
print(uri_validator(d))

Gives:

True
False
False
False

Answer 4


Nowadays, I use the following, based on Padam’s answer:

$ python --version
Python 3.6.5

And this is how it looks:

from urllib.parse import urlparse

def is_url(url):
  try:
    result = urlparse(url)
    return all([result.scheme, result.netloc])
  except ValueError:
    return False

Just use is_url("http://www.asdf.com").

Hope it helps!


Answer 5


Note – lepl is no longer supported, sorry (you’re welcome to use it, and I think the code below works, but it’s not going to get updates).

RFC 3696 (http://www.faqs.org/rfcs/rfc3696.html) defines how to do this (for HTTP URLs and email). I implemented its recommendations in Python using lepl (a parser library); see http://acooke.org/lepl/rfc3696.html.

To use:

> easy_install lepl
...
> python
...
>>> from lepl.apps.rfc3696 import HttpUrl
>>> validator = HttpUrl()
>>> validator('google')
False
>>> validator('http://google')
False
>>> validator('http://google.com')
True

Answer 6


I landed on this page trying to figure out a sane way to validate strings as “valid” URLs. I’ll share my solution, which uses Python 3 and no extra libraries.

See https://docs.python.org/2/library/urlparse.html if you are using Python 2.

See https://docs.python.org/3.0/library/urllib.parse.html if you are using Python 3, as I am.

import urllib.parse
from pprint import pprint

invalid_url = 'dkakasdkjdjakdjadjfalskdjfalk'
valid_url = 'https://stackoverflow.com'
tokens = [urllib.parse.urlparse(url) for url in (invalid_url, valid_url)]

for token in tokens:
    pprint(token)

min_attributes = ('scheme', 'netloc')  # add attrs to your liking
for token in tokens:
    if not all([getattr(token, attr) for attr in min_attributes]):
        error = "'{url}' string has no scheme or netloc.".format(url=token.geturl())
        print(error)
    else:
        print("'{url}' is probably a valid url.".format(url=token.geturl()))

ParseResult(scheme='', netloc='', path='dkakasdkjdjakdjadjfalskdjfalk', params='', query='', fragment='')

ParseResult(scheme='https', netloc='stackoverflow.com', path='', params='', query='', fragment='')

'dkakasdkjdjakdjadjfalskdjfalk' string has no scheme or netloc.

'https://stackoverflow.com' is probably a valid url.

Here is a more concise function:

from urllib.parse import urlparse

min_attributes = ('scheme', 'netloc')


def is_valid(url, qualifying=min_attributes):
    tokens = urlparse(url)
    return all([getattr(tokens, qualifying_attr)
                for qualifying_attr in qualifying])
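
For the two example strings used above, this gives:

>>> is_valid('https://stackoverflow.com')
True
>>> is_valid('dkakasdkjdjakdjadjfalskdjfalk')
False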

Answer 7


EDIT

As pointed out by @Kwame, the code below validates the URL even if the .com or .co etc. is not present.

As also pointed out by @Blaise, a URL like https://www.google is valid; you need to do a separate DNS check to see whether it resolves or not.

This is simple and works:

min_attr contains the basic set of attributes that need to be present to define the validity of a URL, i.e. the http:// part and the google.com part.

urlparse.scheme stores http:// and

urlparse.netloc stores the domain name google.com:

from urlparse import urlparse

def url_check(url):
    min_attr = ('scheme', 'netloc')
    try:
        result = urlparse(url)
        if all([result.scheme, result.netloc]):
            return True
        else:
            return False
    except ValueError:
        return False

all() returns True only if every element inside it is truthy. So if result.scheme and result.netloc are both present, i.e. have some value, then the URL is valid and True is returned.


Answer 8


Validate URL with urllib and Django-like regex

The Django URL validation regex was actually pretty good but I needed to tweak it a little bit for my use case. Feel free to adapt it to yours!

Python 3.7

import re
import urllib.parse

# Check https://regex101.com/r/A326u1/5 for reference
DOMAIN_FORMAT = re.compile(
    r"(?:^(\w{1,255}):(.{1,255})@|^)" # http basic authentication [optional]
    r"(?:(?:(?=\S{0,253}(?:$|:))" # check full domain length to be less than or equal to 253 (starting after http basic auth, stopping before port)
    r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+" # check for at least one subdomain (maximum length per subdomain: 63 characters), dashes in between allowed
    r"(?:[a-z0-9]{1,63})))" # check for top level domain, no dashes allowed
    r"|localhost)" # accept also "localhost" only
    r"(:\d{1,5})?", # port [optional]
    re.IGNORECASE
)
SCHEME_FORMAT = re.compile(
    r"^(http|hxxp|ftp|fxp)s?$", # scheme: http(s) or ftp(s)
    re.IGNORECASE
)

def validate_url(url: str):
    url = url.strip()

    if not url:
        raise Exception("No URL specified")

    if len(url) > 2048:
        raise Exception("URL exceeds its maximum length of 2048 characters (given length={})".format(len(url)))

    result = urllib.parse.urlparse(url)
    scheme = result.scheme
    domain = result.netloc

    if not scheme:
        raise Exception("No URL scheme specified")

    if not re.fullmatch(SCHEME_FORMAT, scheme):
        raise Exception("URL scheme must either be http(s) or ftp(s) (given scheme={})".format(scheme))

    if not domain:
        raise Exception("No URL domain specified")

    if not re.fullmatch(DOMAIN_FORMAT, domain):
        raise Exception("URL domain malformed (domain={})".format(domain))

    return url

Explanation

  • The code only validates the scheme and netloc parts of a given URL. (To do this properly, I split the URL with urllib.parse.urlparse() into the two corresponding parts, which are then matched against the corresponding regex terms.)
  • The netloc part stops before the first occurrence of a slash /, so port numbers are still part of the netloc, e.g.:

    https://www.google.com:80/search?q=python
    ^^^^^   ^^^^^^^^^^^^^^^^^
      |             |      
      |             +-- netloc (aka "domain" in my code)
      +-- scheme
    
  • IPv4 addresses are also validated

IPv6 Support

If you want the URL validator to also work with IPv6 addresses, do the following:

  • Add is_valid_ipv6(ip) from Markus Jarderot’s answer, which has a really good IPv6 validator regex
  • Add and not is_valid_ipv6(domain) to the last if

Examples

Here are some examples of the regex for the netloc (aka domain) part in action:
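
A few illustrative checks of re.fullmatch() against DOMAIN_FORMAT (hypothetical inputs chosen for this write-up, reusing the pattern defined above):

# Assumes DOMAIN_FORMAT from the snippet above is in scope
print(bool(re.fullmatch(DOMAIN_FORMAT, "www.google.com")))         # True
print(bool(re.fullmatch(DOMAIN_FORMAT, "localhost:8080")))         # True  ("localhost" plus an optional port)
print(bool(re.fullmatch(DOMAIN_FORMAT, "user:pass@example.com")))  # True  (basic-auth prefix is allowed)
print(bool(re.fullmatch(DOMAIN_FORMAT, "no-dots-here")))           # False (needs a dotted domain or "localhost")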


Answer 9


All of the above solutions recognize a string like "http://www.google.com/path,www.yahoo.com/path" as valid. This solution always works as it should:

import re

# URL-link validation
ip_middle_octet = u"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5]))"
ip_last_octet = u"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"

URL_PATTERN = re.compile(
                        u"^"
                        # protocol identifier
                        u"(?:(?:https?|ftp|rtsp|rtp|mmp)://)"
                        # user:pass authentication
                        u"(?:\S+(?::\S*)?@)?"
                        u"(?:"
                        u"(?P<private_ip>"
                        # IP address exclusion
                        # private & local networks
                        u"(?:localhost)|"
                        u"(?:(?:10|127)" + ip_middle_octet + u"{2}" + ip_last_octet + u")|"
                        u"(?:(?:169\.254|192\.168)" + ip_middle_octet + ip_last_octet + u")|"
                        u"(?:172\.(?:1[6-9]|2\d|3[0-1])" + ip_middle_octet + ip_last_octet + u"))"
                        u"|"
                        # IP address dotted notation octets
                        # excludes loopback network 0.0.0.0
                        # excludes reserved space >= 224.0.0.0
                        # excludes network & broadcast addresses
                        # (first & last IP address of each class)
                        u"(?P<public_ip>"
                        u"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
                        u"" + ip_middle_octet + u"{2}"
                        u"" + ip_last_octet + u")"
                        u"|"
                        # host name
                        u"(?:(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)"
                        # domain name
                        u"(?:\.(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)*"
                        # TLD identifier
                        u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
                        u")"
                        # port number
                        u"(?::\d{2,5})?"
                        # resource path
                        u"(?:/\S*)?"
                        # query string
                        u"(?:\?\S*)?"
                        u"$",
                        re.UNICODE | re.IGNORECASE
                       )
def url_validate(url):
    """ URL string validation
    """
    # URL_PATTERN is already compiled above; no need to compile again
    return URL_PATTERN.match(url)

How do I get everything after the last slash in a URL?

Question: How do I get everything after the last slash in a URL?


How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

I’ve tried urlparse, but that gives me the full path filename, such as page/page/12345.


Answer 0


You don’t need anything fancy; just look at the string methods in the standard library and you can easily split your URL between the ‘filename’ part and the rest:

url.rsplit('/', 1)

So you can get the part you’re interested in simply with:

url.rsplit('/', 1)[-1]
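
For example, with one of the URLs from the question:

>>> url = 'http://www.test.com/page/TEST2'
>>> url.rsplit('/', 1)
['http://www.test.com/page', 'TEST2']
>>> url.rsplit('/', 1)[-1]
'TEST2'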

Answer 1


One more (idio(ma)tic) way:

URL.split("/")[-1]

Answer 2


rsplit should be up to the task:

In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'

Answer 3


You can do it like this:

import os
head, tail = os.path.split(url)

Where tail will be your file name.


Answer 4


urlparse is fine to use if you want to (say, to get rid of any query string parameters).

import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for i in urls:
    url_parts = urllib.parse.urlparse(i)
    path_parts = url_parts[2].rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(i, path_parts[2]))

Output:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345

Answer 5

>>> import os
>>> os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
'folderD'

Answer 6


Here’s a more general, regex way of doing this:

    import re
    re.sub(r'^.+/([^/]+)$', r'\1', url)

Answer 7

extracted_url = url[url.rfind("/")+1:]

Answer 8

partition and rpartition are also handy for this sort of thing:

url.rpartition('/')[2]

First extract the path element from the URL:

from urllib.parse import urlparse
parsed= urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')

and then you can extract the last segment with string functions:

parsed.path.rpartition('/')[2]

(for this example, the result is 'PATH')


Answer 9


Split the URL and pop the last element: url.split('/').pop()


Answer 10


url = 'http://www.test.com/page/TEST2'.split('/')[4]
print url

Output: TEST2.


How do I send a POST request as JSON?

Question: How do I send a POST request as JSON?


data = {
    'ids': [12, 3, 4, 5, 6, ...]
}
urllib2.urlopen("http://abc.com/api/posts/create", urllib.urlencode(data))

I want to send a POST request, but one of the fields should be a list of numbers. How can I do that ? (JSON?)


Answer 0


If your server expects the POST request to be JSON, then you need to add a header and also serialize the data for your request.

Python 2.x

import json
import urllib2

data = {
        'ids': [12, 3, 4, 5, 6]
}

req = urllib2.Request('http://example.com/api/posts/create')
req.add_header('Content-Type', 'application/json')

response = urllib2.urlopen(req, json.dumps(data))

Python 3.x

https://stackoverflow.com/a/26876308/496445


If you don’t specify the header, it will be the default application/x-www-form-urlencoded type.


Answer 1


I recommend using the incredible requests module.

http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers

import json

import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

response = requests.post(url, data=json.dumps(payload), headers=headers)

Answer 2


For Python 3.4.2, I found the following will work:

import urllib.request
import json      

body = {'ids': [12, 14, 50]}  
myurl = "http://www.testmycode.com"

req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondataasbytes = jsondata.encode('utf-8')   # needs to be bytes
req.add_header('Content-Length', len(jsondataasbytes))
response = urllib.request.urlopen(req, jsondataasbytes)

Answer 3


This works perfectly on Python 3.5 when the URL contains a query string / parameter value:

Request URL = https://bah2.com/ws/rest/v1/concept/
Parameter value = 21f6bb43-98a1-419d-8f0c-8133669e40ca

import requests

url = 'https://bahbah2.com/ws/rest/v1/concept/21f6bb43-98a1-419d-8f0c-8133669e40ca'
data = {"name": "Value"}
r = requests.post(url, auth=('username', 'password'), verify=False, json=data)
print(r.status_code)

Answer 4


You have to add the header, or you will get an HTTP 400 error. The code works well on Python 2.6, CentOS 5.4.

Code:

    import urllib2,json

    url = 'http://www.google.com/someservice'
    postdata = {'key':'value'}

    req = urllib2.Request(url)
    req.add_header('Content-Type','application/json')
    data = json.dumps(postdata)

    response = urllib2.urlopen(req,data)

Answer 5


Here is an example of how to use urllib.request from the Python standard library.

import urllib.request
import json
from pprint import pprint

url = "https://app.close.com/hackwithus/3d63efa04a08a9e0/"

values = {
    "first_name": "Vlad",
    "last_name": "Bezden",
    "urls": [
        "https://twitter.com/VladBezden",
        "https://github.com/vlad-bezden",
    ],
}


headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

data = json.dumps(values).encode("utf-8")
pprint(data)

try:
    req = urllib.request.Request(url, data, headers)
    with urllib.request.urlopen(req) as f:
        res = f.read()
    pprint(res.decode())
except Exception as e:
    pprint(e)

Answer 6


In recent versions of the requests package, you can use the json parameter of the requests.post() method to send a JSON-serializable dict, and the Content-Type header will be set to application/json automatically. There is no need to specify the header explicitly.

import requests

payload = {'key': 'value'}

requests.post(url, json=payload)

Answer 7


This works fine for me with APIs:

import requests

data = {'Id': id, 'name': name}
r = requests.post(url='https://apiurllink', data=data)

How to join components of a path when constructing a URL in Python

Question: How to join components of a path when constructing a URL in Python


For example, I want to join a prefix path to resource paths like /js/foo.js.

I want the resulting path to be relative to the root of the server. In the above example if the prefix was “media” I would want the result to be /media/js/foo.js.

os.path.join does this really well, but how it joins paths is OS dependent. In this case I know I am targeting the web, not the local file system.

Is there a best alternative when you are working with paths you know will be used in URLs? Will os.path.join work well enough? Should I just roll my own?


Answer 0


Since, from the comments the OP posted, it seems he doesn’t want to preserve “absolute URLs” in the join (which is one of the key jobs of urlparse.urljoin;-), I’d recommend avoiding that. os.path.join would also be bad, for exactly the same reason.

So, I’d use something like '/'.join(s.strip('/') for s in pieces) (if the leading / must also be ignored — if the leading piece must be special-cased, that’s also feasible of course;-).
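
A quick demonstration of that expression (hypothetical pieces):

pieces = ['media', '/js/', 'foo.js']
print('/'.join(s.strip('/') for s in pieces))        # media/js/foo.js
# Prepend a slash if the result must stay rooted at the server:
print('/' + '/'.join(s.strip('/') for s in pieces))  # /media/js/foo.js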


Answer 1


You can use urllib.parse.urljoin:

>>> from urllib.parse import urljoin
>>> urljoin('/media/path/', 'js/foo.js')
'/media/path/js/foo.js'

But beware:

>>> urljoin('/media/path', 'js/foo.js')
'/media/js/foo.js'
>>> urljoin('/media/path', '/js/foo.js')
'/js/foo.js'

The reason you get different results from /js/foo.js and js/foo.js is because the former begins with a slash which signifies that it already begins at the website root.

On Python 2, you have to do

from urlparse import urljoin

Answer 2


Like you say, os.path.join joins paths based on the current OS. posixpath is the underlying module that is used on POSIX systems under the namespace os.path:

>>> os.path.join is posixpath.join
True
>>> posixpath.join('/media/', 'js/foo.js')
'/media/js/foo.js'

So you can just import and use posixpath.join instead for urls, which is available and will work on any platform.

Edit: @Pete’s suggestion is a good one, you can alias the import for increased readability

from posixpath import join as urljoin

Edit: I think this is made clearer, or at least it helped me understand, if you look into the source of os.py (the code here is from Python 2.7.11, plus I’ve trimmed some bits). There are conditional imports in os.py that pick which path module to use in the namespace os.path. All the underlying modules (posixpath, ntpath, os2emxpath, riscospath) that may be imported in os.py, aliased as path, exist to be used on all systems. os.py just picks one of the modules to use in the namespace os.path at run time based on the current OS.

# os.py
import sys, errno

_names = sys.builtin_module_names

if 'posix' in _names:
    # ...
    from posix import *
    # ...
    import posixpath as path
    # ...

elif 'nt' in _names:
    # ...
    from nt import *
    # ...
    import ntpath as path
    # ...

elif 'os2' in _names:
    # ...
    from os2 import *
    # ...
    if sys.version.find('EMX GCC') == -1:
        import ntpath as path
    else:
        import os2emxpath as path
        from _emx_link import link
    # ...

elif 'ce' in _names:
    # ...
    from ce import *
    # ...
    # We can use the standard Windows path.
    import ntpath as path

elif 'riscos' in _names:
    # ...
    from riscos import *
    # ...
    import riscospath as path
    # ...

else:
    raise ImportError, 'no os specific module found'

Answer 3


This does the job nicely:

def urljoin(*args):
    """
    Joins given arguments into a url. Trailing but not leading slashes are
    stripped for each argument.
    """

    return "/".join(map(lambda x: str(x).rstrip('/'), args))

Answer 4


The basejoin function in the urllib package might be what you’re looking for.

basejoin = urljoin(base, url, allow_fragments=True)
    Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter.

Edit: I didn’t notice before, but urllib.basejoin seems to map directly to urlparse.urljoin, making the latter preferred.


Answer 5


Using furl (pip install furl), it will be:

 furl.furl('/media/path/').add(path='js/foo.js')

Answer 6


I know this is a bit more than the OP asked for; however, I had the pieces of the following URL and was looking for a simple way to join them:

>>> url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

Doing some looking around:

>>> split = urlparse.urlsplit(url)
>>> split
SplitResult(scheme='https', netloc='api.foo.com', path='/orders/bartag', query='spamStatus=awaiting_spam&page=1&pageSize=250', fragment='')
>>> type(split)
<class 'urlparse.SplitResult'>
>>> dir(split)
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_asdict', '_fields', '_make', '_replace', 'count', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
>>> split[0]
'https'
>>> split = (split[:])
>>> type(split)
<type 'tuple'>

So, in addition to the path joining already covered in the other answers, to get what I was looking for I did the following:

>>> split
('https', 'api.foo.com', '/orders/bartag', 'spamStatus=awaiting_spam&page=1&pageSize=250', '')
>>> unsplit = urlparse.urlunsplit(split)
>>> unsplit
'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

According to the documentation, urlunsplit() takes exactly a 5-part tuple.

With the following tuple format:

Attribute  Index  Value                  Value if not present
scheme     0      URL scheme specifier   empty string
netloc     1      Network location part  empty string
path       2      Hierarchical path      empty string
query      3      Query component        empty string
fragment   4      Fragment identifier    empty string
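
As a sketch of why the round trip is handy, you can swap out a single component (here the query, index 3) before reassembling; Python 3 spellings shown:

from urllib.parse import urlsplit, urlunsplit

url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'
parts = list(urlsplit(url))
parts[3] = 'page=2&pageSize=250'  # replace the query component
print(urlunsplit(parts))
# https://api.foo.com/orders/bartag?page=2&pageSize=250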


Answer 7

Rune Kaagaard provided a great and compact solution that worked for me, I expanded on it a little:

def urljoin(*args):
    trailing_slash = '/' if args[-1].endswith('/') else ''
    return "/".join(map(lambda x: str(x).strip('/'), args)) + trailing_slash

This allows all arguments to be joined regardless of leading and trailing slashes, while preserving the final trailing slash if present.
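
A couple of illustrative calls (hypothetical inputs) show the behaviour:

print(urljoin('http://example.com/', '/media/', 'js/foo.js'))
# http://example.com/media/js/foo.js

print(urljoin('http://example.com', 'api', 'v2/'))
# http://example.com/api/v2/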


Answer 8

To improve slightly over Alex Martelli's response, the following will not only clean up extra slashes but also preserve trailing (ending) slashes, which can sometimes be useful:

>>> items = ["http://www.website.com", "/api", "v2/"]
>>> url = "/".join([(u.strip("/") if index + 1 < len(items) else u.lstrip("/")) for index, u in enumerate(items)])
>>> print(url)
http://www.website.com/api/v2/

It's not as easy to read, though, and won't clean up multiple extra trailing slashes.


Answer 9

I found things not to like about all the above solutions, so I came up with my own. This version makes sure parts are joined with a single slash and leaves leading and trailing slashes alone. No pip install, no urllib.parse.urljoin weirdness.

In [1]: from functools import reduce

In [2]: def join_slash(a, b):
   ...:     return a.rstrip('/') + '/' + b.lstrip('/')
   ...:

In [3]: def urljoin(*args):
   ...:     return reduce(join_slash, args) if args else ''
   ...:

In [4]: parts = ['https://foo-bar.quux.net', '/foo', 'bar', '/bat/', '/quux/']

In [5]: urljoin(*parts)
Out[5]: 'https://foo-bar.quux.net/foo/bar/bat/quux/'

In [6]: urljoin('https://quux.com/', '/path', 'to/file///', '//here/')
Out[6]: 'https://quux.com/path/to/file/here/'

In [7]: urljoin()
Out[7]: ''

In [8]: urljoin('//','beware', 'of/this///')
Out[8]: '/beware/of/this///'

In [9]: urljoin('/leading', 'and/', '/trailing/', 'slash/')
Out[9]: '/leading/and/trailing/slash/'

Answer 10

Using furl and a regex (Python 3):

>>> import re
>>> import furl
>>> p = re.compile(r'(\/)+')
>>> url = furl.furl('/media/path').add(path='/js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media/path').add(path='js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media/path/').add(path='js/foo.js').url
>>> url
'/media/path/js/foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'
>>> url = furl.furl('/media///path///').add(path='//js///foo.js').url
>>> url
'/media///path/////js///foo.js'
>>> p.sub(r"\1", url)
'/media/path/js/foo.js'

How can I join absolute and relative URLs?

Question: How can I join absolute and relative URLs?

I have two urls:

url1 = "http://127.0.0.1/test1/test2/test3/test5.xml"
url2 = "../../test4/test6.xml"

How can I get an absolute url for url2?


Answer 0

You should use urlparse.urljoin:

>>> import urlparse
>>> urlparse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'

With Python 3 (where urlparse is renamed to urllib.parse) you could use it as follow:

>>> import urllib.parse
>>> urllib.parse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'
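
Note that urljoin resolves the relative reference against the directory of the base URL, so the last path segment of the base (test5.xml above) is dropped before any ../ steps are applied:

>>> urllib.parse.urljoin(url1, 'other.xml')
'http://127.0.0.1/test1/test2/test3/other.xml'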

Answer 1

If your relative path consists of multiple parts, you have to join them separately, since urljoin would replace the relative path, not join it. The easiest way to do that is to use posixpath.

>>> import urllib.parse
>>> import posixpath
>>> url1 = "http://127.0.0.1"
>>> url2 = "test1"
>>> url3 = "test2"
>>> url4 = "test3"
>>> url5 = "test5.xml"
>>> url_path = posixpath.join(url2, url3, url4, url5)
>>> urllib.parse.urljoin(url1, url_path)
'http://127.0.0.1/test1/test2/test3/test5.xml'

See also: How to join components of a path when you are constructing a URL in Python


Answer 2

import urlparse
from functools import reduce

# Fold urljoin over the pieces. Note that every piece except the last
# needs a trailing slash, or urljoin will replace the previous segment.
es = ['http://127.0.0.1/', 'test1/', 'test4/', 'test6.xml']
print(reduce(lambda base, e: urlparse.urljoin(base, e), es))
# http://127.0.0.1/test1/test4/test6.xml

Answer 3

>>> from urlparse import urljoin
>>> url1 = "http://www.youtube.com/user/khanacademy"
>>> url2 = "/user/khanacademy"
>>> urljoin(url1, url2)
'http://www.youtube.com/user/khanacademy'

Simple.


Answer 4

For Python 3.0+, the correct way to join URLs is:

from urllib.parse import urljoin
urljoin('https://10.66.0.200/', '/api/org')
# output : 'https://10.66.0.200/api/org'

Answer 5

You can use reduce to achieve Shikhar’s method in a cleaner fashion.

>>> import urllib.parse
>>> from functools import reduce
>>> reduce(urllib.parse.urljoin, ["http://moc.com/", "path1/", "path2/", "path3/"])
'http://moc.com/path1/path2/path3/'

Note that with this method each fragment should have a trailing forward slash and no leading forward slash (to indicate it is a path fragment being joined). This is more correct/informative, telling you that path1/ is a URI path fragment and not the full path /path1/, or an unknown path1, which could be either (and gets treated as a full path).

If you need to add / to a fragment lacking it, you could do:

uri = uri if uri.endswith("/") else f"{uri}/"

To learn more about URI resolution, Wikipedia has some nice examples.

Update

I just noticed that Peter Perron commented about reduce on Shikhar's answer, but I'll leave this here to demonstrate how it's done.


URL query parameters to dict in Python

Question: URL query parameters to dict in Python

Is there a way to parse a URL (with some python library) and return a python dictionary with the keys and values of a query parameters part of the URL?

For example:

url = "http://www.example.org/default.html?ct=32&op=92&item=98"

expected return:

{'ct':32, 'op':92, 'item':98}

Answer 0

Use the urllib.parse library:

>>> from urllib import parse
>>> url = "http://www.example.org/default.html?ct=32&op=92&item=98"
>>> parse.urlsplit(url)
SplitResult(scheme='http', netloc='www.example.org', path='/default.html', query='ct=32&op=92&item=98', fragment='')
>>> parse.parse_qs(parse.urlsplit(url).query)
{'item': ['98'], 'op': ['92'], 'ct': ['32']}
>>> dict(parse.parse_qsl(parse.urlsplit(url).query))
{'item': '98', 'op': '92', 'ct': '32'}

The urllib.parse.parse_qs() and urllib.parse.parse_qsl() methods parse out query strings, taking into account that keys can occur more than once and that order may matter.

If you are still on Python 2, urllib.parse was called urlparse.
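
For instance, with a repeated key (hypothetical query string) the two differ in shape:

>>> parse.parse_qs('tag=python&tag=web')
{'tag': ['python', 'web']}
>>> parse.parse_qsl('tag=python&tag=web')
[('tag', 'python'), ('tag', 'web')]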


Answer 1

For Python 3, the values of the dict from parse_qs are in a list, because there might be multiple values. If you just want the first one:

>>> from urllib.parse import urlsplit, parse_qs
>>>
>>> url = "http://www.example.org/default.html?ct=32&op=92&item=98"
>>> query = urlsplit(url).query
>>> params = parse_qs(query)
>>> params
{'item': ['98'], 'op': ['92'], 'ct': ['32']}
>>> dict(params)
{'item': ['98'], 'op': ['92'], 'ct': ['32']}
>>> {k: v[0] for k, v in params.items()}
{'item': '98', 'op': '92', 'ct': '32'}

Answer 2

If you prefer not to use a parser:

url = "http://www.example.org/default.html?ct=32&op=92&item=98"
query = url.split("?")[1]
params = {x[0]: x[1] for x in [x.split("=") for x in query.split("&")]}

So I won't delete what's above, but it's definitely not what you should use.

I read a few of the answers and they looked a little complicated; in case you're like me, don't use my solution.

Use this:

from urllib import parse
params = dict(parse.parse_qsl(parse.urlsplit(url).query))

and for Python 2.X

import urlparse as parse
params = dict(parse.parse_qsl(parse.urlsplit(url).query))

I know this is the same as the accepted answer, just as a one-liner that can be copied.


Answer 3

For Python 2.7:

In [14]: url = "http://www.example.org/default.html?ct=32&op=92&item=98"

In [15]: from urlparse import urlparse, parse_qsl

In [16]: parse_url = urlparse(url)

In [17]: query_dict = dict(parse_qsl(parse_url.query))

In [18]: query_dict
Out[18]: {'ct': '32', 'item': '98', 'op': '92'}

Answer 4

I agree about not reinventing the wheel, but sometimes (while you're learning) it helps to build a wheel in order to understand a wheel. :) So, from a purely academic perspective, I offer this with the caveat that using a dictionary assumes that name/value pairs are unique (that the query string does not contain multiple records).

url = 'http://mypage.html?one=1&two=2&three=3'

page, query = url.split('?')

names_values_dict = dict(pair.split('=') for pair in query.split('&'))

names_values_list = [pair.split('=') for pair in query.split('&')]

I'm using version 3.6.5 in the IDLE IDE.


Answer 5

For Python 2.7 I am using the urlparse module to parse the URL query into a dict.

import urlparse

url = "http://www.example.org/default.html?ct=32&op=92&item=98"

print urlparse.parse_qs( urlparse.urlparse(url).query )
# result: {'item': ['98'], 'op': ['92'], 'ct': ['32']} 

Retrieving parameters from a URL

Question: Retrieving parameters from a URL

Given a URL like the following, how can I parse the value of the query parameters? For example, in this case I want the value of def.

/abc?def='ghi'

I am using Django in my environment; is there a method on the request object that could help me?

I tried using self.request.get('def') but it is not returning the value ghi as I had hoped.


Answer 0

Python 2:

import urlparse
url = 'http://foo.appspot.com/abc?def=ghi'
parsed = urlparse.urlparse(url)
print urlparse.parse_qs(parsed.query)['def']

Python 3:

import urllib.parse as urlparse
from urllib.parse import parse_qs
url = 'http://foo.appspot.com/abc?def=ghi'
parsed = urlparse.urlparse(url)
print(parse_qs(parsed.query)['def'])

parse_qs returns a list of values, so the above code will print ['ghi'].

Here’s the Python 3 documentation.


Answer 1

I’m shocked this solution isn’t on here already. Use:

request.GET.get('variable_name')

This will “get” the variable from the “GET” dictionary, and return the ‘variable_name’ value if it exists, or a None object if it doesn’t exist.
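
In a view, that looks like the following minimal sketch (view name hypothetical):

from django.http import HttpResponse

def my_view(request):
    value = request.GET.get('def')  # None when 'def' is absent
    return HttpResponse(value or 'missing')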


Answer 2

import urlparse
url = 'http://example.com/?q=abc&p=123'
par = urlparse.parse_qs(urlparse.urlparse(url).query)

print par['q'][0], par['p'][0]

Answer 3

For Python > 3.4:

from urllib import parse
url = 'http://foo.appspot.com/abc?def=ghi'
query_def=parse.parse_qs(parse.urlparse(url).query)['def'][0]

Answer 4

There is a new library called furl. I find this library to be most pythonic for doing url algebra. To install:

pip install furl

Code:

from furl import furl
f = furl("/abc?def='ghi'") 
print f.args['def']

Answer 5

I know this is a bit late, but since I found myself here today, I thought this might be a useful answer for others.

import urlparse
url = 'http://example.com/?q=abc&p=123'
parsed = urlparse.urlparse(url)
params = urlparse.parse_qsl(parsed.query)
for x,y in params:
    print "Parameter = "+x,"Value = "+y

With parse_qsl(), “Data are returned as a list of name, value pairs.”


Answer 6

The URL you are referring to is a query type, and I see that the request object supports a method called arguments to get the query arguments. You may also want to try self.request.get('def') directly to get your value from the object.


Answer 7

def getParams(url):
    query = url.split("?")[1]
    pairs = [pair.split("=") for pair in query.split("&")]
    return dict((k, v) for k, v in pairs)

Hope this helps


Answer 8

The urlparse module provides everything you need:

urlparse.parse_qs()
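
A quick sketch of what that returns (Python 2, query string simplified from the question):

>>> import urlparse
>>> urlparse.parse_qs('def=ghi')
{'def': ['ghi']}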


Answer 9

There's no need to do any of that. Just use:

self.request.get('variable_name')

Notice that I'm not specifying the method (GET, POST, etc.). This is well documented, and this is an example.

The fact that you use Django templates doesn't mean the handler is processed by Django as well.


Answer 10

In pure Python:

def get_param_from_url(url, param_name):
    return [i.split("=")[-1] for i in url.split("?", 1)[-1].split("&") if i.startswith(param_name + "=")][0]

Answer 11

import cgitb
cgitb.enable()  # show detailed tracebacks in the browser while debugging

import cgi
print "Content-Type: text/plain;charset=utf-8"
print
# read the 'a' and 'b' query parameters with cgi.FieldStorage (Python 2 CGI)
form = cgi.FieldStorage()
i = int(form.getvalue('a')) + int(form.getvalue('b'))
print i

Answer 12

By the way, I was having issues using parse_qs() with empty-value parameters, and learned that you have to pass the second optional parameter keep_blank_values to return parameters in a query string that contain no values. It defaults to False. Some poorly written APIs require parameters to be present even if they contain no values:

for k, v in urlparse.parse_qs(p.query, True).items():  # p = urlparse.urlparse(url)
    print k
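
A sketch of the difference with a hypothetical query string containing a blank value:

>>> urlparse.parse_qs('a=1&b=')
{'a': ['1']}
>>> urlparse.parse_qs('a=1&b=', True)
{'a': ['1'], 'b': ['']}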

Answer 13

There is a nice library, w3lib.url:

from w3lib.url import url_query_parameter
url = "/abc?def=ghi"
print url_query_parameter(url, 'def')
ghi

Answer 14

get_parsed_url = urlparse(url)  # assuming: from urlparse import urlparse
parameters = dict([part.split('=') for part in get_parsed_url[4].split('&')])

This one is simple. The variable parameters will contain a dictionary of all the parameters.


Answer 15

I see there isn’t an answer for users of Tornado:

key = self.request.query_arguments.get("key", None)

This method must work inside a handler that is derived from:

tornado.web.RequestHandler

None is the answer this method will return when the requested key can’t be found. This saves you some exception handling.
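
Note that query_arguments maps each name to a list of byte strings. If you want a single, already-decoded value with a default, RequestHandler also offers:

value = self.get_query_argument("key", default=None)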


How do I get the different parts of a Flask request's URL?

Question: How do I get the different parts of a Flask request's URL?

I want to detect if the request came from the localhost:5000 or foo.herokuapp.com host and what path was requested. How do I get this information about a Flask request?


Answer 0

You can examine the URL through several Request fields:

Imagine your application is listening on the following application root:

http://www.example.com/myapplication

And a user requests the following URI:

http://www.example.com/myapplication/foo/page.html?x=y

In this case the values of the above mentioned attributes would be the following:

    path             /foo/page.html
    full_path        /foo/page.html?x=y
    script_root      /myapplication
    base_url         http://www.example.com/myapplication/foo/page.html
    url              http://www.example.com/myapplication/foo/page.html?x=y
    url_root         http://www.example.com/myapplication/

You can easily extract the host part with the appropriate splits.
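
To answer the detection part of the question directly, a minimal sketch (route and host names hypothetical) can compare request.host and read request.path:

from flask import Flask, request

app = Flask(__name__)

@app.route('/')
@app.route('/<path:path>')
def inspect(path=''):
    # request.host is e.g. 'localhost:5000' or 'foo.herokuapp.com'
    origin = 'local' if request.host.startswith('localhost') else 'remote'
    return '%s request for %s' % (origin, request.path)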


Answer 1

Another example:

request:

curl -XGET http://127.0.0.1:5000/alert/dingding/test?x=y

then:

request.method:              GET
request.url:                 http://127.0.0.1:5000/alert/dingding/test?x=y
request.base_url:            http://127.0.0.1:5000/alert/dingding/test
request.url_charset:         utf-8
request.url_root:            http://127.0.0.1:5000/
str(request.url_rule):       /alert/dingding/test
request.host_url:            http://127.0.0.1:5000/
request.host:                127.0.0.1:5000
request.script_root:
request.path:                /alert/dingding/test
request.full_path:           /alert/dingding/test?x=y

request.args:                ImmutableMultiDict([('x', 'y')])
request.args.get('x'):       y

Answer 2

You should try:

request.url

It's supposed to always work, even on localhost (I just tried it).


Answer 3

If you are using Python, I would suggest exploring the request object:

dir(request)

Since the object exposes a __dict__ attribute:

request.__dict__

It can be printed or saved. I use it to log 404 codes in Flask:

@app.errorhandler(404)
def not_found(e):
    with open("./404.csv", "a") as f:
        f.write(f'{datetime.datetime.now()},{request.__dict__}\n')
    return send_file('static/images/Darknet-404-Page-Concept.png', mimetype='image/png')

How to percent-encode URL parameters in Python?

Question: How to percent-encode URL parameters in Python?

If I do

url = "http://example.com?p=" + urllib.quote(query)
  1. It doesn’t encode / to %2F (breaks OAuth normalization)
  2. It doesn’t handle Unicode (it throws an exception)

Is there a better library?


Answer 0

Python 2

From the docs:

urllib.quote(string[, safe])

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL. The optional safe parameter specifies additional characters that should not be quoted; its default value is '/'.

That means passing '' for safe will solve your first issue:

>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'

About the second issue, there is a bug report about it here. Apparently it was fixed in Python 3. You can work around it by encoding as UTF-8, like this:

>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller

By the way, have a look at urlencode.

Python 3

The same, except replace urllib.quote with urllib.parse.quote.


Answer 1

In Python 3, urllib.quote has been moved to urllib.parse.quote and it does handle unicode by default.

>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'

Answer 2

My answer is similar to Paolo’s answer.

I think the requests module is much better. It's based on urllib3. You can try this:

>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'

Answer 3

If you’re using django, you can use urlquote:

>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'

Note that changes to Python since this answer was published mean that this is now a legacy wrapper. From the Django 2.1 source code for django.utils.http:

A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)
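
A minimal modern equivalent using the standard library:

>>> from urllib.parse import quote
>>> quote("Müller")
'M%C3%BCller'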

Answer 4

It is better to use urlencode here. There is not much difference for a single parameter, but IMHO it makes the code clearer. (A function named quote_plus looks confusing, especially to those coming from other languages.)

In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'

In [22]: val=34

In [23]: from urllib.parse import urlencode

In [24]: encoded = urlencode(dict(p=query,val=val))

In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34

Docs

urlencode: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

quote_plus: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus