问题:如何在Python 2中发送HEAD HTTP请求?

我在这里尝试做的是获取给定URL的标头,以便确定MIME类型。我希望能够查看是否http://somedomain/foo/将返回例如HTML文档或JPEG图像。因此,我需要弄清楚如何发送HEAD请求,以便无需下载内容就可以读取MIME类型。有人知道这样做的简单方法吗?

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?


回答 0

编辑:此答案有效,但是现在您应该使用下面其他答案中提到的请求库。


使用httplib

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

还有一个getheader(name)获取特定的标头。

edit: This answer works, but nowadays you should just use the requests library as mentioned by other answers below.


Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There’s also a getheader(name) to get a specific header.


回答 1

urllib2可用于执行HEAD请求。这比使用httplib更好,因为urllib2为您解析URL,而不是要求您将URL分为主机名和路径。

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

头可以通过response.info()像以前一样使用。有趣的是,您可以找到重定向到的URL:

>>> print response.geturl()
http://www.google.com.au/index.html

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html

回答 2

强制Requests方式:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

回答 3

我认为也应该提及Requests库。

I believe the Requests library should be mentioned as well.


回答 4

只是:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

编辑:我刚开始意识到有httplib2:D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == 200
assert resp[0]['content-type'] == 'text/html'
...

连结文字

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I’ve just came to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == 200
assert resp[0]['content-type'] == 'text/html'
...

link text


回答 5

为了完整起见,有一个与使用httplib接受的答案等效的Python3答案。

基本相同的代码只是该库不再被称为httplib而是http.client

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

For completeness to have a Python3 answer equivalent to the accepted answer using httplib.

It is basically the same code just that the library isn’t called httplib anymore but http.client

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

回答 6

import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

回答 7

顺便说一句,当使用httplib时(至少在2.5.2上),尝试读取HEAD请求的响应将阻塞(在读取行中),随后失败。如果您未在响应中发出已读消息,则无法在连接上发送另一个请求,则需要打开一个新请求。或者接受两次请求之间的长时间延迟。

As an aside, when using the httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue read on the response, you are unable to send another request on the connection, you will need to open a new one. Or accept a long delay between requests.


回答 8

我发现httplib比urllib2快一点。我为两个程序计时-一个使用httplib,另一个使用urllib2-将HEAD请求发送到10,000个URL。httplib的速度快了几分钟。 httplib的总统计为:实际6m21.334s用户0m2.124s sys 0m16.372s

的urllib2的总的统计是:实9m1.380s用户0m16.666s SYS 0m28.565s

有人对此有意见吗?

I have found that httplib is slightly faster than urllib2. I timed two programs – one using httplib and the other using urllib2 – sending HEAD requests to 10,000 URL’s. The httplib one was faster by several minutes. httplib‘s total stats were: real 6m21.334s user 0m2.124s sys 0m16.372s

And urllib2‘s total stats were: real 9m1.380s user 0m16.666s sys 0m28.565s

Does anybody else have input on this?


回答 9

还有另一种方法(类似于Pawel的答案):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

只是为了避免在实例级别使用无限制的方法。

And yet another approach (similar to Pawel answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbounded methods at instance level.


回答 10

可能更容易:使用urllib或urllib2。

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info()是类似字典的对象,因此您可以执行f.info()[‘content-type’]等。

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

文档指出,通常不直接使用httplib。

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()[‘content-type’], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。