Tag archive: http

How do you send a HEAD HTTP request in Python 2?

Question: How do you send a HEAD HTTP request in Python 2?

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?


Answer 0

Edit: this answer works, but nowadays you should just use the requests library, as mentioned by other answers below.


Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There’s also a getheader(name) to get a specific header.
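
For the MIME type the question asks about, continuing the session above (the value shown is simply what this server returned):

>>> res.getheader('content-type')
'text/html; charset=ISO-8859-1'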


Answer 1

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html

Answer 2

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

Answer 3

I believe the Requests library should be mentioned as well.
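
A minimal sketch of how that would answer the original MIME-type question (the URL is the question's placeholder):

import requests

resp = requests.head("http://somedomain/foo/")
print resp.headers.get('content-type')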


Answer 4

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I've just come to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == '200'  # httplib2 returns the status as a string
assert resp[0]['content-type'] == 'text/html'
...


Answer 5

For completeness, here is a Python 3 answer equivalent to the accepted answer using httplib.

It is basically the same code, just that the library isn't called httplib anymore but http.client:

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

Answer 6

import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

Answer 7

As an aside, when using httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue a read on the response, you are unable to send another request on the connection; you will need to open a new one, or accept a long delay between requests.
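
A minimal sketch of the "open a new connection per request" workaround described above; the host and path arguments are placeholders. The status line and headers are already parsed by getresponse(), so they remain readable after the close:

def head(host, path):
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', path)
    res = conn.getresponse()
    conn.close()  # discard the connection instead of reading the body
    return res.status, res.getheaders()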


Answer 8

I have found that httplib is slightly faster than urllib2. I timed two programs, one using httplib and the other using urllib2, sending HEAD requests to 10,000 URLs. The httplib one was faster by several minutes. httplib's total stats were: real 6m21.334s, user 0m2.124s, sys 0m16.372s.

And urllib2's total stats were: real 9m1.380s, user 0m16.666s, sys 0m28.565s.

Does anybody else have input on this?


Answer 9

And yet another approach (similar to Pawel's answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbound methods at the instance level.


Answer 10

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()['content-type'], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.


Is there an easy way to request a URL in Python and not follow redirects?

Question: Is there an easy way to request a URL in Python and not follow redirects?

Looking at the source of urllib2 it looks like the easiest way to do it would be to subclass HTTPRedirectHandler and then use build_opener to override the default HTTPRedirectHandler, but this seems like a lot of (relatively complicated) work to do what seems like it should be pretty simple.


Answer 0

Here is the Requests way:

import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])

Answer 1

Dive Into Python has a good chapter on handling redirects with urllib2. Another solution is httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location

Answer 2

This is a urllib2 handler that will not follow redirects:

import urllib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)

Answer 3

The redirections keyword in the httplib2 request method is a red herring. Rather than returning the first response, it will raise a RedirectLimit exception if it receives a redirection status code. To return the initial response, you need to set follow_redirects to False on the Http object:

import httplib2
h = httplib2.Http()
h.follow_redirects = False
(response, body) = h.request("http://example.com")

Answer 4

I suppose this would help:

from httplib2 import Http

def get_html(uri, num_redirections=0):  # pass 0 to not follow redirects
    conn = Http()
    return conn.request(uri, redirections=num_redirections)

Answer 5

I second olt's pointer to Dive Into Python. Here's an implementation using urllib2 redirect handlers. More work than it should be? Maybe, shrug.

import sys
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):  
        result = urllib2.HTTPRedirectHandler.http_error_301( 
            self, req, fp, code, msg, headers)              
        result.status = code                                 
        raise Exception("Permanent Redirect: %s" % 301)

    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)              
        result.status = code                                
        raise Exception("Temporary Redirect: %s" % 302)

def main(script_name, url):
    opener = urllib2.build_opener(RedirectHandler)
    urllib2.install_opener(opener)
    print urllib2.urlopen(url).read()

if __name__ == "__main__":
    main(*sys.argv) 

Answer 6

The shortest way, however, is:

class NoRedirect(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        pass  # returning None makes urllib2 raise an HTTPError instead of following the redirect

noredir_opener = urllib2.build_opener(NoRedirect())

Python Requests library: redirect to new URL

Question: Python Requests library: redirect to new URL

I’ve been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.

In my script I am setting allow_redirects=True.

I would like to know, if the page has been redirected to something else, what the new URL is.

For example, if the start URL was: www.google.com/redirect

And the final URL is www.google.co.uk/redirected

How do I get that URL?


Answer 0

You are looking for the request history.

The response.history attribute is a list of responses that led to the final URL, which can be found in response.url.

response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")

Demo:

>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
...     print(resp.status_code, resp.url)
... 
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get

Answer 1

This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.

If you want to use allow_redirects=False and get directly to the first redirect object, rather than following a chain of them, and you just want to get the redirect location directly out of the 302 response object, then r.url won’t work. Instead, it’s the “Location” header:

r = requests.get('http://github.com/', allow_redirects=False)
r.status_code  # 302
r.url  # http://github.com, not https.
r.headers['Location']  # https://github.com/ -- the redirect destination

Answer 2

The documentation has this blurb: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

import requests

r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for 

Answer 3

I think requests.head instead of requests.get will be safer to call when handling URL redirects; check the GitHub issue here:

r = requests.head(url, allow_redirects=True)
print(r.url)

Answer 4

For Python 3.5, you can use the following code:

import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)

How do I send a POST request as JSON?

Question: How do I send a POST request as JSON?

data = {
    'ids': [12, 3, 4, 5, 6, ...]
}
urllib2.urlopen("http://abc.com/api/posts/create", urllib.urlencode(data))

I want to send a POST request, but one of the fields should be a list of numbers. How can I do that ? (JSON?)


Answer 0

If your server is expecting the POST request to be JSON, then you would need to add a header, and also serialize the data for your request…

Python 2.x

import json
import urllib2

data = {
        'ids': [12, 3, 4, 5, 6]
}

req = urllib2.Request('http://example.com/api/posts/create')
req.add_header('Content-Type', 'application/json')

response = urllib2.urlopen(req, json.dumps(data))

Python 3.x

https://stackoverflow.com/a/26876308/496445


If you don’t specify the header, it will be the default application/x-www-form-urlencoded type.
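
Since the Python 3.x section above is only a link, here is a sketch of the equivalent with urllib.request; this mirrors the Python 2 version above and is an assumption, not quoted from the linked answer:

import json
import urllib.request

data = {'ids': [12, 3, 4, 5, 6]}

req = urllib.request.Request('http://example.com/api/posts/create')
req.add_header('Content-Type', 'application/json')

# urlopen needs the body as bytes in Python 3
response = urllib.request.urlopen(req, json.dumps(data).encode('utf-8'))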


Answer 1

I recommend using the incredible requests module.

http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers

import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

response = requests.post(url, data=json.dumps(payload), headers=headers)

Answer 2

For Python 3.4.2, I found the following will work:

import urllib.request
import json      

body = {'ids': [12, 14, 50]}  
myurl = "http://www.testmycode.com"

req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondataasbytes = jsondata.encode('utf-8')   # needs to be bytes
req.add_header('Content-Length', len(jsondataasbytes))
response = urllib.request.urlopen(req, jsondataasbytes)

Answer 3

This works perfectly for Python 3.5 if the URL contains a query string / parameter value:

Request URL = https://bah2.com/ws/rest/v1/concept/
Parameter value = 21f6bb43-98a1-419d-8f0c-8133669e40ca

import requests

url = 'https://bahbah2.com/ws/rest/v1/concept/21f6bb43-98a1-419d-8f0c-8133669e40ca'
data = {"name": "Value"}
r = requests.post(url, auth=('username', 'password'), verify=False, json=data)
print(r.status_code)

Answer 4

You have to add a header, or you will get an HTTP 400 error. The code works well on Python 2.6, CentOS 5.4.

code:

import urllib2, json

url = 'http://www.google.com/someservice'
postdata = {'key': 'value'}

req = urllib2.Request(url)
req.add_header('Content-Type', 'application/json')
data = json.dumps(postdata)

response = urllib2.urlopen(req, data)

Answer 5

Here is an example of how to use urllib.request from the Python standard library.

import urllib.request
import json
from pprint import pprint

url = "https://app.close.com/hackwithus/3d63efa04a08a9e0/"

values = {
    "first_name": "Vlad",
    "last_name": "Bezden",
    "urls": [
        "https://twitter.com/VladBezden",
        "https://github.com/vlad-bezden",
    ],
}


headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

data = json.dumps(values).encode("utf-8")
pprint(data)

try:
    req = urllib.request.Request(url, data, headers)
    with urllib.request.urlopen(req) as f:
        res = f.read()
    pprint(res.decode())
except Exception as e:
    pprint(e)

Answer 6

In the latest requests package, you can use the json parameter in the requests.post() method to send a JSON dict, and the Content-Type header will be set to application/json. There is no need to specify the header explicitly.

import requests

payload = {'key': 'value'}

requests.post(url, json=payload)

Answer 7

This one works fine for me with APIs (note that passing a dict via data= sends it form-encoded, not as JSON):

import requests

data = {'Id': id, 'name': name}
r = requests.post(url='https://apiurllink', data=data)

urllib2.HTTPError: HTTP Error 403: Forbidden

Question: urllib2.HTTPError: HTTP Error 403: Forbidden

I am trying to automate download of historic stock data using python. The URL I am trying to open responds with a CSV file, but I am unable to open using urllib2. I have tried changing user agent as specified in few questions earlier, I even tried to accept response cookies, with no luck. Can you please help.

Note: The same method works for Yahoo Finance.

Code:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"

hdr = {'User-Agent':'Mozilla/5.0'}

req = urllib2.Request(site,headers=hdr)

page = urllib2.urlopen(req)

Error

File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for your assistance


Answer 0

By adding a few more headers I was able to get the data:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Actually, it works with just this one additional header:

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

Answer 1

This will work in Python 3:

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers={'User-Agent':user_agent,} 

request=urllib.request.Request(url,None,headers) #The assembled request
response = urllib.request.urlopen(request)
data = response.read() # The data you need

Answer 2

The NSE website has changed, and the older scripts are only semi-optimal for the current website. This snippet can gather daily details of a security. Details include symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantities, and the ratio of delivered vs traded in percentage. These are conveniently presented as a list of dictionaries.

Python 3.X version with requests and BeautifulSoup

from requests import get
from csv import DictReader
from bs4 import BeautifulSoup as Soup
from datetime import date
from io import StringIO 

SECURITY_NAME="3MINDIA" # Change this to get quote for another stock
START_DATE= date(2017, 1, 1) # Start date of stock quote data DD-MM-YYYY
END_DATE= date(2017, 9, 14)  # End date of stock quote data DD-MM-YYYY


BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"




def getquote(symbol, start, end):
    start = start.strftime("%-d-%-m-%Y")
    end = end.strftime("%-d-%-m-%Y")

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Referer': 'https://cssspritegenerator.com',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}

    url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    csv = DictReader(StringIO(payload))
    for row in csv:
        print({k:v.strip() for k, v in row.items()})


if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)

Besides, this is a relatively modular and ready-to-use snippet.


HTTP Error 403 in Python 3 web scraping

Question: HTTP Error 403 in Python 3 web scraping

I was trying to scrape a website for practice, but I kept on getting HTTP Error 403 (does it think I'm a bot)?

Here is my code:

#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re

webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1').read
findrows = re.compile('<tr class="- banding(?:On|Off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')

row_array = re.findall(findrows, webpage)
links = re.finall(findlink, webpate)

print(len(row_array))

iterator = []

The error I get is:

 File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 479, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 517, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Answer 0

This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it’s easily detected). Try setting a known browser user agent with:

from urllib.request import Request, urlopen

req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

This works for me.

By the way, in your code you are missing the () after .read in the urlopen line, but I think that it’s a typo.

TIP: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for some reason…


Answer 1

It's definitely blocking because of your use of urllib, based on the user agent. The same thing is happening to me with OfferUp. You can create a new class called AppURLopener which overrides the user agent with Mozilla.

import urllib.request

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

opener = AppURLopener()
response = opener.open('http://httpbin.org/user-agent')

Source


Answer 2

"This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it's easily detected)", as already mentioned by Stefano Sanfilippo.

from urllib.request import Request, urlopen
url="https://stackoverflow.com/search?q=html+error+403"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

web_byte = urlopen(req).read()

webpage = web_byte.decode('utf-8')

web_byte is a bytes object returned by the server, and the content type of the web page is mostly UTF-8. Therefore you need to decode web_byte using the decode method.

This solved the complete problem I was having while trying to scrape a website using PyCharm.

P.S. -> I use Python 3.4


Answer 3

Since the page works in a browser but not when called from within a Python program, it seems that the web app serving that URL recognizes that you are not requesting the content with a browser.

Demonstration:

curl --dump-header r.txt "http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1"

...
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access ...
</HTML>

and the content in r.txt has status line:

HTTP/1.1 403 Forbidden

Try sending a 'User-Agent' header that fakes a web client.

NOTE: the page contains an Ajax call that creates the table you probably want to parse. You'll need to check the JavaScript logic of the page, or simply use a browser debugger (like the Firebug / Net tab) to see which URL you need to call to get the table's contents.


Answer 4

You can try two ways. The details are in this link.

1) Via pip:

pip install --upgrade certifi

2) If that doesn't work, try to run the Certificates.command that comes bundled with Python 3.* for Mac (go to your Python installation location and double-click the file):

open /Applications/Python\ 3.*/Install\ Certificates.command


Answer 5

Based on the previous answer,

from urllib.request import Request, urlopen       
#specify url
url = 'https://xyz/xyz'
req = Request(url, headers={'User-Agent': 'XYZ/3.0'})
response = urlopen(req, timeout=20).read()

This worked for me by extending the timeout.


Answer 6

If you feel guilty about faking the user agent as Mozilla (see the comment in the top answer from Stefano), it could work with a non-urllib User-Agent as well. This worked for the sites I reference:

import urllib.request as urlrequest  # the urlrequest alias assumed by this answer

req = urlrequest.Request(link, headers={'User-Agent': 'XYZ/3.0'})
urlrequest.urlopen(req, timeout=10).read()

My application is to test validity by scraping the specific links that I refer to in my articles. Not a generic scraper.


Answer 7

Based on previous answers, this has worked for me with Python 3.7:

from urllib.request import Request, urlopen

req = Request('Url_Link', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)

Httpbin: an HTTP request and response service, written in Python + Flask

How to log in to a webpage and retrieve cookies for later use with Python?

Question: How to log in to a webpage and retrieve cookies for later use with Python?

I want to download and parse webpage using python, but to access it I need a couple of cookies set. Therefore I need to login over https to the webpage first. The login moment involves sending two POST params (username, password) to /login.php. During the login request I want to retrieve the cookies from the response header and store them so I can use them in the request to download the webpage /data.php.

How would I do this in python (preferably 2.6)? If possible I only want to use builtin modules.


Answer 0

import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()

resp.read() is the straight HTML of the page you want to open, and you can use opener to view any page using your session cookie.


Answer 1

Here’s a version using the excellent requests library:

from requests import session

payload = {
    'action': 'login',
    'username': USERNAME,
    'password': PASSWORD
}

with session() as c:
    c.post('http://example.com/login.php', data=payload)
    response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)

Is there any way to do an HTTP PUT in Python?

Question: Is there any way to do an HTTP PUT in Python?

I need to upload some data to a server using HTTP PUT in python. From my brief reading of the urllib2 docs, it only does HTTP POST. Is there any way to do an HTTP PUT in python?


Answer 0

I've used a variety of Python HTTP libs in the past, and I've settled on Requests as my favourite. Existing libs had pretty usable interfaces, but code can end up being a few lines too long for simple operations. A basic PUT in Requests looks like:

>>> import requests
>>> payload = {'username': 'bob', 'email': 'bob@bob.com'}
>>> r = requests.put("http://somedomain.org/endpoint", data=payload)

You can then check the response status code with:

r.status_code

or the response with:

r.content

Requests has a lot of syntactic sugar and shortcuts that'll make your life easier.


Answer 1

import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request('http://example.org', data='your_put_data')
request.add_header('Content-Type', 'your/contenttype')
request.get_method = lambda: 'PUT'
url = opener.open(request)

Answer 2

Httplib seems like a cleaner choice.

import httplib
connection =  httplib.HTTPConnection('1.2.3.4:1234')
body_content = 'BODY CONTENT GOES HERE'
connection.request('PUT', '/url/path/to/put/to', body_content)
result = connection.getresponse()
# Now result.status and result.reason contains interesting stuff

Answer 3

You should have a look at the httplib module. It should let you make whatever sort of HTTP request you want.


Answer 4

I needed to solve this problem too a while back so that I could act as a client for a RESTful API. I settled on httplib2 because it allowed me to send PUT and DELETE in addition to GET and POST. Httplib2 is not part of the standard library but you can easily get it from the cheese shop.
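
A minimal sketch of a PUT with httplib2; the URL, body, and headers are placeholders, not from the original answer:

import httplib2

h = httplib2.Http()
resp, content = h.request('http://example.org/resource', 'PUT',
                          body='BODY CONTENT GOES HERE',
                          headers={'content-type': 'text/plain'})
# httplib2 exposes the status as a string, e.g. '200'
print resp['status']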


Answer 5

You can use the requests library; it simplifies things a lot compared to the urllib2 approach. First install it from pip:

pip install requests

More on installing requests.

Then set up the PUT request:

import requests
import json
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Create your header as required
headers = {"content-type": "application/json", "Authorization": "<auth-key>" }

r = requests.put(url, data=json.dumps(payload), headers=headers)

See the quickstart for the requests library. I think this is a lot simpler than urllib2, but it does require this additional package to be installed and imported.


Answer 6

This was made better in Python 3 and is documented in the stdlib documentation.

The urllib.request.Request class gained a method=... parameter in Python 3.

Some sample usage:

import urllib.request

req = urllib.request.Request('https://example.com/', data=b'DATA!', method='PUT')
urllib.request.urlopen(req)

Answer 7

I also recommend httplib2 by Joe Gregorio. I use this regularly instead of httplib in the standard lib.


Answer 8

Have you taken a look at put.py? I’ve used it in the past. You can also just hack up your own request with urllib.


Answer 9

You can of course roll your own with the existing standard libraries at any level from sockets up to tweaking urllib.

http://pycurl.sourceforge.net/

“PyCurl is a Python interface to libcurl.”

“libcurl is a free and easy-to-use client-side URL transfer library, … supports … HTTP PUT”

“The main drawback with PycURL is that it is a relative thin layer over libcurl without any of those nice Pythonic class hierarchies. This means it has a somewhat steep learning curve unless you are already familiar with libcurl’s C API. “
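
A minimal PycURL PUT sketch under the same caveat; the URL and payload are placeholders, not from the original answer:

import io
import pycurl

data = b'BODY CONTENT GOES HERE'
buf = io.BytesIO(data)

c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://example.org/resource')
c.setopt(pycurl.UPLOAD, 1)                # UPLOAD turns the HTTP request into a PUT
c.setopt(pycurl.READFUNCTION, buf.read)   # libcurl pulls the body from this callback
c.setopt(pycurl.INFILESIZE, len(data))
c.perform()
print(c.getinfo(pycurl.RESPONSE_CODE))
c.close()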


Answer 10

If you want to stay within the standard library, you can subclass urllib2.Request:

import urllib2

class RequestWithMethod(urllib2.Request):
    def __init__(self, *args, **kwargs):
        self._method = kwargs.pop('method', None)
        urllib2.Request.__init__(self, *args, **kwargs)

    def get_method(self):
        return self._method if self._method else super(RequestWithMethod, self).get_method()


def put_request(url, data):
    opener = urllib2.build_opener(urllib2.HTTPHandler)
    request = RequestWithMethod(url, method='PUT', data=data)
    return opener.open(request)

Answer 11

A more proper way of doing this with requests would be:

import requests

payload = {'username': 'bob', 'email': 'bob@bob.com'}

try:
    response = requests.put(url="http://somedomain.org/endpoint", data=payload)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(e)
    raise

This raises an exception if there is an error in the HTTP PUT request.


How do I get HTTP headers in Flask?

Question: How do I get HTTP headers in Flask?

I am a newbie to Python, using Python Flask and generating a REST API service.

I want to check the authorization header which is sent by the client.

But I can't find a way to get the HTTP headers in Flask.

Any help with getting the HTTP Authorization header is appreciated.


Answer 0

from flask import request
request.headers.get('your-header-name')

request.headers behaves like a dictionary, so you can also get your header like you would with any dictionary:

request.headers['your-header-name']

Answer 1

Just note: the difference between the methods is that, if the header does not exist,

request.headers.get('your-header-name')

will return None and raise no exception, so you can use it like:

if request.headers.get('your-header-name'):
    ....

but the following will throw an error

if request.headers['your-header-name']:  # KeyError: 'your-header-name'
    ....

You can handle it by

if 'your-header-name' in request.headers:
   customHeader = request.headers['your-header-name']
   ....

Answer 2

If anyone's trying to fetch all the headers that were passed, then simply use:

dict(request.headers)

It gives you all the headers in a dict, from which you can do whatever ops you want. In my use case I had to forward all the headers to another API, since the Python API was a proxy. A sketch of that follows.
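
A minimal sketch of that proxy use case, assuming the Requests library and a hypothetical upstream URL (neither is part of the original answer):

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/proxy')
def proxy():
    # Forward every incoming header; in practice you may want to drop 'Host'.
    headers = dict(request.headers)
    upstream = requests.get('http://upstream.example.com/endpoint', headers=headers)
    return upstream.text, upstream.status_code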


Answer 3

Let's see how we get the params, headers and body in Flask. I'm gonna explain with the help of Postman.

The param keys and values are reflected in the API endpoint. For example, key1 and key2 in the endpoint: https://127.0.0.1/upload?key1=value1&key2=value2

from flask import Flask, request
app = Flask(__name__)

@app.route('/upload')
def upload():

  key_1 = request.args.get('key1')
  key_2 = request.args.get('key2')
  print(key_1)
  #--> value1
  print(key_2)
  #--> value2

After params, let’s now see how to get the headers:

  header_1 = request.headers.get('header1')
  header_2 = request.headers.get('header2')
  print(header_1)
  #--> header_value1
  print(header_2)
  #--> header_value2

Now let's see how to get the body:

  file_name = request.files['file'].filename
  ref_id = request.form['referenceId']
  print(ref_id)
  #--> WWB9838yb3r47484

So we fetch the uploaded files with request.files and the text fields with request.form.