Question: What is the quickest way to HTTP GET in Python?
What is the quickest way to HTTP GET in Python if I know the content will be a string? I am searching the documentation for a quick one-liner like:

contents = url.get("http://example.com/foo/bar")

But all I can find using Google are httplib and urllib – and I am unable to find a shortcut in those libraries.

Does standard Python 2.5 have a shortcut in some form as above, or should I write a function url_get?

I would prefer not to capture the output of shelling out to wget or curl.
Answer 0
Python 3:
import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()
Python 2:
import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()
Documentation for urllib.request and read.
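Note that in Python 3, read() returns bytes rather than str. Since the question asks for the content as a string, you need to decode it yourself; the utf-8 encoding below is an assumption, so substitute whatever the server actually sends (e.g. per its Content-Type header):

import urllib.request
# .read() yields bytes in Python 3; decode to get a str
contents = urllib.request.urlopen("http://example.com/foo/bar").read().decode("utf-8")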
Answer 1
You could use a library called requests.
import requests
r = requests.get("http://example.com/foo/bar")
It's quite easy. You can then do things like this:
>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)
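Since the question asks for the content as a string: requests decodes the body for you, so r.text is a str (using the encoding requests detects) while r.content is raw bytes. A minimal sketch:

import requests

r = requests.get("http://example.com/foo/bar")
r.raise_for_status()   # raises an HTTPError for 4xx/5xx responses
contents = r.text      # decoded str; use r.content for raw bytes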
Answer 2
If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object:
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
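If you make more than one request, it's usually better to keep the Http object around rather than create an anonymous one each time, since httplib2 can then reuse connections and cache responses. A small sketch; the ".cache" directory name is just an example:

import httplib2

# Passing a directory to Http() enables httplib2's response caching
h = httplib2.Http(".cache")
resp, content = h.request("http://example.com/foo/bar")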
Answer 3
Have a look at httplib2, which – next to a lot of very useful features – provides exactly what you want.
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
Where content would be the response body (as a string), and resp would contain the status and response headers.
It isn’t included with a standard Python install (though it only requires standard Python), but it’s definitely worth checking out.
Answer 4
It’s simple enough with the powerful urllib3 library.
Import it like this:
import urllib3
http = urllib3.PoolManager()
And make a request like this:
response = http.request('GET', 'https://example.com')
print(response.data) # Raw data.
print(response.data.decode('utf-8')) # Text.
print(response.status) # Status code.
print(response.headers['Content-Type']) # Content type.
You can add headers too:
response = http.request('GET', 'https://example.com', headers={
'key1': 'value1',
'key2': 'value2'
})
More info can be found on the urllib3 documentation.
urllib3 is much safer and easier to use than the builtin urllib.request or http modules, and is stable.
Answer 5
theller’s solution for wget is really useful; however, I found it does not print out the progress throughout the download. It’s perfect if you add one line after the print statement in reporthook.
import sys, urllib

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total file size
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
Answer 6
Here is a wget script in Python:
# From python cookbook, 2nd edition, page 487
import sys, urllib

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total file size
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
Answer 7
This solution works (for me) without any further imports – also with https:
try:
    import urllib2 as urlreq  # Python 2.x
except ImportError:
    import urllib.request as urlreq  # Python 3.x

req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()
I often have difficulty grabbing the content when not specifying a “User-Agent” in the header information. Then the requests are usually rejected with something like urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.
Answer 8
How to also send headers
Python 3:
import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)
Python 2:
import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)
Answer 9
If you are working with HTTP APIs specifically, there are also more convenient choices such as Nap.
For example, here’s how to get gists from Github since May 1st 2014:
from nap.url import Url
api = Url('https://api.github.com')
gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())
More examples: https://github.com/kimmobrunfeldt/nap#examples
Answer 10
Excellent solutions, Xuan and Theller. For it to work with Python 3, make the following changes:
import sys, urllib.request

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total file size
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c))
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()
Also, the URL you enter should be preceded by “http://”; otherwise it returns an unknown url type error.
Answer 11
For Python >= 3.6, you can use dload:

import dload
t = dload.text(url)

For JSON:

j = dload.json(url)

Install:

pip install dload
Answer 12
Actually, in Python we can read from URLs just like from files; here is an example of reading JSON from an API.
import json
from urllib.request import urlopen

def get_some_key(url):
    # urlopen returns a file-like object, so json.load can parse it directly
    with urlopen(url) as f:
        resp = json.load(f)
    return resp['some_key']
Answer 13
If you want a lower-level API:
import http.client

conn = http.client.HTTPSConnection('example.com')
conn.request('GET', '/')
resp = conn.getresponse()   # an http.client.HTTPResponse
content = resp.read()       # raw bytes
conn.close()
text = content.decode('utf-8')
print(text)
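The response object also exposes the status line and headers, which is handy when working this close to the protocol. A small sketch along the same lines:

import http.client

conn = http.client.HTTPSConnection('example.com')
conn.request('GET', '/')
resp = conn.getresponse()
print(resp.status, resp.reason)        # e.g. 200 OK
print(resp.getheader('Content-Type'))  # a single header value
conn.close()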