Python 实用宝典

Question 1

I’m gathering statistics on a list of websites and I’m using requests for it for simplicity. Here is my code:

data=[]
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r= requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )

Now, I want requests.get to timeout after 10 seconds so the loop doesn’t get stuck.

This question has been of interest before too but none of the answers are clean. I will be putting some bounty on this to get a nice answer.

I hear that maybe not using requests is a good idea but then how should I get the nice things requests offer. (the ones in the tuple)

Question 2

What about using eventlet? If you want to timeout the request after 10 seconds, even if data is being received, this snippet will work for you:

import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)

Question 3

Set the timeout parameter:

r = requests.get(w, verify=False, timeout=10) # 10 seconds

As long as you don’t set stream=True on that request, this will cause the call to requests.get() to timeout if the connection takes more than ten seconds, or if the server doesn’t send data for more than ten seconds.

Question 4

UPDATE: https://requests.readthedocs.io/en/master/user/advanced/#timeouts

In new version of requests:

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.

r = requests.get('https://github.com', timeout=None)

My old (probably outdated) answer (which was posted long time ago):

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

import requests from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)

requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven’t actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)

2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst

If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))

kevinburke has requested it to be merged into the main requests project, but it hasn’t been accepted yet.

Question 5

`timeout = int(seconds)`

Since requests >= 2.4.0, you can use the timeout argument, i.e:

requests.get('https://duckduckgo.com/', timeout=10)

Note:

timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds ( more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.

Question 6

To create a timeout you can use signals.

The best way to solve this case is probably to

Set an exception as the handler for the alarm signal
Call the alarm signal with a ten second delay
Call the function inside a try-except-finally block.
The except block is reached if the function timed out.
In the finally block you abort the alarm, so it’s not singnaled later.

Here is some example code:

import signal
from time import sleep

class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass

def _timeout(signum, frame):
    """ Raise an TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.

    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()

# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:    
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)

There are some caveats to this:

It is not threadsafe, signals are always delivered to the main thread, so you can’t put this in any other thread.
There is a slight delay after the scheduling of the signal and the execution of the actual code. This means that the example would time out even if it only slept for ten seconds.

But, it’s all in the standard python library! Except for the sleep function import it’s only one import. If you are going to use timeouts many places You can easily put the TimeoutException, _timeout and the singaling in a function and just call that. Or you can make a decorator and put it on functions, see the answer linked below.

You can also set this up as a “context manager” so you can use it with the with statement:

import signal
class Timeout():
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    def _timeout(signum, frame):
        """ Raise an TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.

        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException

# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')

One possible down side with this context manager approach is that you can’t know if the code actually timed out or not.

Sources and recommended reading:

The documentation on signals
This answer on timeouts by @David Narayan. He has organized the above code as a decorator.

Question 7

Try this request with timeout & error handling:

import requests
try: 
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e: 
    print e

Question 8

Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn’t work for me.

try:
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time()-start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
        TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]

The discussion is here https://redd.it/80kp1h

Question 9

This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.

Under the covers, this uses the same signals approach as referenced in your “before” post, but in a more usable and manageable way. And if the list of web sites you are monitoring is long, you might benefit from its primary feature — all kinds of ways to manage the execution of a large number of tasks.

Question 10

I believe you can use multiprocessing and not depend on a 3rd party package:

import multiprocessing
import requests

def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    # define a wrapper of `return_dict` to store the result.
    def function(return_dict):
        return_dict['value'] = func(*args, **kwargs)

    p = multiprocessing.Process(target=function, args=(return_dict,))
    p.start()

    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)

    # If thread is still active, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']

call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)

The timeout passed to kwargs is the timeout to get any response from the server, the argument timeout is the timeout to get the complete response.

Question 11

timeout = (connection timeout, data read timeout) or give a single argument(timeout=1)

import requests

try:
    req = requests.request('GET', 'https://www.google.com',timeout=(1,1))
    print(req)
except requests.ReadTimeout:
    print("READ TIME OUT")

Question 12

this code working for socketError 11004 and 10060……

# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()


bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

a = QApplication([])

z = TimeOutModel(bb, 500)
print 'timeout'

a.exec_()

Question 13

Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signaling required:

import pycurl
import StringIO

url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc() # error generated on timeout
    pass # or just pass if you don't want to print the error

Question 14

In case you’re using the option stream=True you can do this:

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt'), 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

The solution does not need signals or multiprocessing.

Question 15

Just another one solution (got it from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads)

Before upload you can find out the content size:

TOO_LONG = 10*1024*1024  # 10 Mb
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)
print (r.headers['content-length'])
# 1073741824  

if int(r.headers['content-length']) < TOO_LONG:
    # upload content:
    content = r.content

But be careful, a sender can set up incorrect value in the ‘content-length’ response field.

Question 16

If it comes to that, create a watchdog thread that messes up requests’ internal state after 10 seconds, e.g.:

closes the underlying socket, and ideally
triggers an exception if requests retries the operation

Note that depending on the system libraries you may be unable to set deadline on DNS resolution.

Question 17

Well, I tried many solutions on this page and still faced instabilities, random hangs, poor connections performance.

I’m now using Curl and i’m really happy about it’s “max time” functionnality and about the global performances, even with such a poor implementation :

content=commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

Here, I defined a 6 seconds max time parameter, englobing both connection and transfer time.

I’m sure Curl has a nice python binding, if you prefer to stick to the pythonic syntax :)

Question 18

There is a package called timeout-decorator that you can use to time out any python function.

@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1,10):
        time.sleep(1)
        print("{} seconds have passed".format(i))

It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-thread environment).

Question 19

I’m using requests 2.2.1 and eventlet didn’t work for me. Instead I was able use gevent timeout instead since gevent is used in my service for gunicorn.

import gevent
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)
try:
    with gevent.Timeout(5):
        ret = requests.get(url)
        print ret.status_code, ret.content
except gevent.timeout.Timeout as e:
    print "timeout: {}".format(e.message)

Please note that gevent.timeout.Timeout is not caught by general Exception handling. So either explicitly catch gevent.timeout.Timeout or pass in a different exception to be used like so: with gevent.Timeout(5, requests.exceptions.Timeout): although no message is passed when this exception is raised.

Question 20

I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:

resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content

You can read the full explanation here

Python 实用宝典

python请求超时。获取整个响应

问题：python请求超时。获取整个响应

回答 0

回答 1

回答 2

回答 3

`timeout = int(seconds)`

`timeout = int(seconds)`

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

回答 17

回答 18

有趣好用的Python教程