问题:Python请求库重定向新网址
我一直在浏览Python Requests文档,但是看不到我要实现的功能。
在我的脚本中,我正在设置allow_redirects=True
。
我想知道页面是否已重定向到其他内容,新的URL是什么。
例如,如果起始URL为: www.google.com/redirect
最终的URL是 www.google.co.uk/redirected
我如何获得该URL?
I’ve been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.
In my script I am setting allow_redirects=True
.
I would like to know if the page has been redirected to something else, what is the new URL.
For example, if the start URL was: www.google.com/redirect
And the final URL is www.google.co.uk/redirected
How do I get that URL?
回答 0
您正在寻找请求历史记录。
该response.history
属性是导致最终到达网址的响应列表,可以在中找到response.url
。
response = requests.get(someurl)
if response.history:
print("Request was redirected")
for resp in response.history:
print(resp.status_code, resp.url)
print("Final destination:")
print(response.status_code, response.url)
else:
print("Request was not redirected")
演示:
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
... print(resp.status_code, resp.url)
...
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
You are looking for the request history.
The response.history
attribute is a list of responses that led to the final URL, which can be found in response.url
.
response = requests.get(someurl)
if response.history:
print("Request was redirected")
for resp in response.history:
print(resp.status_code, resp.url)
print("Final destination:")
print(response.status_code, response.url)
else:
print("Request was not redirected")
Demo:
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
... print(resp.status_code, resp.url)
...
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
回答 1
这回答了一个稍有不同的问题,但是由于我自己一直坚持这个问题,所以我希望它对其他人可能有用。
如果要使用allow_redirects=False
并直接到达第一个重定向对象,而不是遵循它们的链,而只想直接从302响应对象中获取重定向位置,则r.url
则将无法使用。相反,它是“ Location”标头:
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code # 302
r.url # http://github.com, not https.
r.headers['Location'] # https://github.com/ -- the redirect destination
This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.
If you want to use allow_redirects=False
and get directly to the first redirect object, rather than following a chain of them, and you just want to get the redirect location directly out of the 302 response object, then r.url
won’t work. Instead, it’s the “Location” header:
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code # 302
r.url # http://github.com, not https.
r.headers['Location'] # https://github.com/ -- the redirect destination
回答 2
该文档具有以下内容:https: //requests.readthedocs.io/zh/master/user/quickstart/#redirection-and-history
import requests
r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for
回答 3
我觉得requests.head代替requests.get会更安全的处理URL重定向时调用,检查GitHub的问题在这里:
r = requests.head(url, allow_redirects=True)
print(r.url)
I think requests.head instead of requests.get will be more safe to call when handling url redirect,check the github issue here:
r = requests.head(url, allow_redirects=True)
print(r.url)
回答 4
对于python3.5,您可以使用以下代码:
import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)
For python3.5, you can use the following code:
import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)