Question: Max retries exceeded with URL in requests

I’m trying to get the content of App Store > Business:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I try the range with (0, 2) it works, but when I use a range of 100 it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

Answer 0

What is happening here is that the iTunes server refuses your connection (you are sending too many requests from the same IP address in a short period of time):

Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8

The error trace is misleading; it should say something like “No connection could be made because the target machine actively refused it”.

There is an issue about this in the python-requests lib on GitHub; check it out here.

To overcome this issue (it is not so much an issue as a misleading debug trace), you should catch connection-related exceptions like so:

try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    # the server refused the connection; record the failure instead of crashing
    page1 = None
    print("Connection refused")

Another way to overcome this problem is to leave a sufficient time gap between requests to the server; this can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):

from time import sleep
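
As a rough sketch, the question's loop could pace itself like this (the one-second delay and the urls list are assumptions made to keep the snippet self-contained):

from time import sleep

import requests

urls = ["https://itunes.apple.com/in/genre/ios-business/id6000?mt=8"]  # placeholder list of URLs to fetch

for u in urls:
    page1 = requests.get(u)
    sleep(1)  # assumed pause; increase it if the server still refuses connections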

All in all, requests is an awesome Python lib; hope that solves your problem.


Answer 1

Just use requests' features:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


session = requests.Session()
# retry connection errors up to 3 times, with an increasing delay between attempts
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
# use the retrying adapter for every http:// and https:// URL on this session
session.mount('http://', adapter)
session.mount('https://', adapter)

session.get(url)

This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor will apply an increasing delay between attempts, to avoid failing again if the server enforces a periodic request quota.

Take a look at requests.packages.urllib3.util.retry.Retry; it has many options to simplify retries.
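
For instance, a sketch of a fuller Retry policy (the total count and the status_forcelist codes are assumptions, not requirements):

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=5,                                     # overall cap on retries
    connect=3,                                   # retries for connection errors
    backoff_factor=0.5,                          # grows the delay between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # assumed set of retryable HTTP codes
)
session.mount('http://', HTTPAdapter(max_retries=retry))
session.mount('https://', HTTPAdapter(max_retries=retry))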


Answer 2

Just do this,

Paste the following code in place of page = requests.get(url):

import time

page = None
while page is None:
    # retry until the request succeeds; catching only ConnectionError
    # keeps Ctrl-C and other errors from being swallowed
    try:
        page = requests.get(url)
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")

You’re welcome :)


Answer 3

pip install pyopenssl seemed to solve it for me.

https://github.com/requests/requests/issues/4246


Answer 4

I had a similar problem, but the following code worked for me.

url = <some REST url>    
page = requests.get(url, verify=False)

verify=False disables SSL certificate verification. Try/except can be added as usual.
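
Note that with verify=False requests emits an InsecureRequestWarning on every call; a sketch that silences it (the URL is a placeholder):

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# acknowledge and suppress the warning that verify=False triggers
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

page = requests.get("https://example.com/api", verify=False)  # placeholder URL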


Answer 5

It is always good to implement exception handling. It not only helps avoid an unexpected exit of the script, but can also help to log errors and info notifications. When using Python requests I prefer to catch exceptions like this:

import requests

while True:  # keep retrying until the request succeeds
    try:
        res = requests.get(address, timeout=30)  # address: the URL to request
        break
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error. Make sure you are connected to the Internet. Technical details given below.\n")
        print(str(e))
        renewIPadress()
        continue
    except requests.Timeout as e:
        print("OOPS!! Timeout Error")
        print(str(e))
        renewIPadress()
        continue
    except requests.RequestException as e:
        print("OOPS!! General Error")
        print(str(e))
        renewIPadress()
        continue
    except KeyboardInterrupt:
        print("Someone closed the program")
        break

Here renewIPadress() is a user-defined function which can change the IP address if it gets blocked. You can go without this function.
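
For illustration only, a purely hypothetical stub for renewIPadress() that rotates through a proxy pool (the proxy addresses are placeholders; a real version might instead reconnect a VPN or reset a dynamic-IP connection):

import itertools

_proxy_pool = itertools.cycle([
    "http://proxy1.example.com:8080",  # placeholder proxies
    "http://proxy2.example.com:8080",
])

proxies = {}  # pass this dict to requests.get(..., proxies=proxies)

def renewIPadress():
    new_proxy = next(_proxy_pool)
    proxies["http"] = new_proxy
    proxies["https"] = new_proxy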


Answer 6

Specifying the proxy in a corporate environment solved it for me.

page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})

The full error is:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
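
If the target URL is https://, the proxies mapping also needs an "https" entry; a sketch reusing the proxy address from above (still a placeholder):

proxies = {
    "http": "http://111.233.225.166:1234",
    "https": "http://111.233.225.166:1234",
}
page = requests.get("https://www.google.com", proxies=proxies)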


Answer 7

I wasn't able to make it work on Windows even after installing pyopenssl and trying various Python versions (while it worked fine on Mac), so I switched to urllib, and it works on Python 3.6 (from python.org) and 3.7 (Anaconda):

from urllib.request import urlopen

html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)

Answer 8

When I was writing a Selenium browser test script, I encountered this error when calling driver.quit() before a JS API call. Remember that quitting the webdriver is the last thing to do!
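
A sketch of the safe ordering (the page and script are placeholders):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")               # placeholder page
driver.execute_script("return document.title")  # finish all JS/API work first...
driver.quit()                                   # ...and quit the webdriver last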


Answer 9

Adding my own experience for those who are experiencing this in the future. My specific error was

Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

It turns out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
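
If you suspect the same cause, a quick check of the per-process open-file limit on Unix-like systems (the resource module is POSIX-only):

import resource

# the soft limit is what the process is currently held to; the hard limit is the ceiling
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limits:", soft, hard)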


Answer 10

Adding my own experience:

r = requests.get(download_url)

This happened when I tried to download a file specified in the URL.

The error was

HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

I corrected it by adding verify=False to the call as follows:

r = requests.get(download_url + filename, verify=False)
open(filename, 'wb').write(r.content)

Answer 11

Add headers to this request; the site may be rejecting clients that do not present a browser-like User-Agent.

headers = {
    'Referer': 'https://itunes.apple.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
}

requests.get(ap, headers=headers)
