Tag archive: request

Pandas read_csv from url

Question: Pandas read_csv from url

I am using Python 3.4 with IPython and have the following code. I’m unable to read a csv-file from the given URL:

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

I have the following error

“Expected file path name or file-like object, got <class 'bytes'> type”

How can I fix this?


Answer 0

Update

From pandas 0.19.2 you can now just pass the url directly.


Just as the error suggests, pandas.read_csv needs a file-like object as the first argument.

If you want to read the CSV from a string, you can use io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).

Also, the URL https://github.com/cs109/2014_data/blob/master/countries.csv returns an HTML page, not the raw CSV. To get the raw CSV, use the URL behind the Raw link on the GitHub page, which is https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

Example:

import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
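A small offline sketch of a related option: the bytes from .content can also be wrapped in io.BytesIO and passed to read_csv without decoding them first. The CSV bytes below are a made-up stand-in for requests.get(url).content, so nothing is downloaded:

```python
import io

import pandas as pd

# Made-up stand-in for requests.get(url).content (raw bytes of a CSV file).
content = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\nBenin,AFRICA\n"

# As in the answer: decode the bytes to text, then wrap the text in StringIO.
df_from_text = pd.read_csv(io.StringIO(content.decode("utf-8")))

# Alternative: wrap the raw bytes in BytesIO and let read_csv decode them
# (read_csv assumes UTF-8 unless an encoding= argument is given).
df_from_bytes = pd.read_csv(io.BytesIO(content))

print(df_from_bytes)
```

For UTF-8 data both forms should give the same DataFrame; for another encoding, pass encoding=... to read_csv.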

Answer 1

In the latest version of pandas (0.19.2 at the time of writing), you can pass the URL directly:

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

Answer 2

As I commented, you need to use a StringIO object and decode the content, i.e. c=pd.read_csv(io.StringIO(s.decode("utf-8"))). If you are using requests, you need to decode because .content returns bytes; if you used .text instead, you could pass s as is: s = requests.get(url).text, then c = pd.read_csv(io.StringIO(s)).

A simpler approach is to pass the correct URL of the raw data directly to read_csv. You don’t have to pass a file-like object; you can pass a URL, so you don’t need requests at all:

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")

print(c)

Output:

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

From the docs:

filepath_or_buffer: string or file handle / StringIO

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
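To exercise the URL code path without network access, the sketch below writes a tiny CSV to a temporary directory and passes read_csv a file URL built with pathlib.Path.as_uri(). Note that as_uri() produces a URL with an empty host (file:///...) rather than the file://localhost/... form quoted above; the file name and contents are made up for the example:

```python
import pathlib
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample file standing in for a real table.
    path = pathlib.Path(tmp) / "table.csv"
    path.write_text("Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n")

    url = path.as_uri()  # e.g. file:///tmp/.../table.csv
    df = pd.read_csv(url)  # the string is recognised as a URL, not a path

print(url)
print(df)
```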


Answer 3

The problem you’re having is that the output you get into the variable ‘s’ is not a CSV but an HTML file. In order to get the raw CSV, you have to modify the URL to:

https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

Your second problem is that read_csv expects a file name (or a file-like object), not the file contents; we can solve this by using StringIO from the io module. The third problem is that requests.get(url).content delivers a byte stream; we can solve this by using requests.get(url).text instead.

The end result is this code:

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

Output:

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA
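The URL change in the first step follows a simple pattern (drop /blob/ and switch the host to raw.githubusercontent.com), so it can be sketched as a small helper. to_raw_github_url is a hypothetical name, and the sketch only handles plain github.com/<owner>/<repo>/blob/<ref>/<path> URLs:

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url: str) -> str:
    """Rewrite a github.com .../blob/<ref>/<path> page URL into the
    raw.githubusercontent.com URL that serves the file contents.

    Hypothetical helper: only the simple "blob" form is handled.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url  # leave non-GitHub URLs untouched
    segments = parts.path.strip("/").split("/")
    # Expected shape: [owner, repo, "blob", ref, *path]
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


print(to_raw_github_url("https://github.com/cs109/2014_data/blob/master/countries.csv"))
```

The returned URL can then be passed straight to pd.read_csv.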

Answer 4

import pandas as pd

url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
c = pd.read_csv(url, sep = "\t")

Answer 5

To import data from a URL in pandas, just apply the simple code below; it actually works better.

import pandas as pd
# read_table splits on tabs by default, so sep="," is needed for a CSV file
train = pd.read_table("https://urlandfile.com/dataset.csv", sep=",")
train.head()

If you have trouble with the string itself (for example, a path containing backslashes), put r before it to make it a raw string literal:

import pandas as pd
train = pd.read_table(r"https://urlandfile.com/dataset.csv", sep=",")
train.head()
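One caveat worth checking: read_table uses a tab as its default separator, so a comma-separated file needs sep="," (which makes it parse like read_csv). A small offline check, using an in-memory string in place of the hypothetical dataset.csv URL:

```python
import io

import pandas as pd

# In-memory stand-in for the contents of dataset.csv (made-up data).
data = "Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# Default sep="\t": each comma-separated line ends up in a single column.
as_table_default = pd.read_table(io.StringIO(data))

# With sep="," read_table parses the file the same way read_csv does.
as_table_comma = pd.read_table(io.StringIO(data), sep=",")
as_csv = pd.read_csv(io.StringIO(data))

print(as_table_default.columns.tolist())
print(as_table_comma.equals(as_csv))
```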

How can I mock requests and the response?

Question: How can I mock requests and the response?

I am trying to use Python’s mock package to mock Python’s requests module. What are the basic calls to get me working in the scenario below?

In my views.py, I have a function that makes a variety of requests.get() calls, with a different response each time:

def myview(request):
  res1 = requests.get('aurl')
  res2 = requests.get('burl')
  res3 = requests.get('curl')

In my test class I want to do something like this, but I cannot figure out the exact method calls:

Step 1:

# Mock the requests module
# when mockedRequests.get('aurl') is called then return 'a response'
# when mockedRequests.get('burl') is called then return 'b response'
# when mockedRequests.get('curl') is called then return 'c response'

Step 2:

Call my view

Step 3:

Verify that the response contains ‘a response’, ‘b response’, ‘c response’

How can I complete Step 1 (mocking the requests module)?


Answer 0

This is how you can do it (you can run this file as-is):

import requests
import unittest
from unittest import mock

# This is the class we want to test
class MyGreatClass:
    def fetch_json(self, url):
        response = requests.get(url)
        return response.json()

# This method will be used by the mock to replace requests.get
def mocked_requests_get(*args, **kwargs):
    class MockResponse:
        def __init__(self, json_data, status_code):
            self.json_data = json_data
            self.status_code = status_code

        def json(self):
            return self.json_data

    if args[0] == 'http://someurl.com/test.json':
        return MockResponse({"key1": "value1"}, 200)
    elif args[0] == 'http://someotherurl.com/anothertest.json':
        return MockResponse({"key2": "value2"}, 200)

    return MockResponse(None, 404)

# Our test case class
class MyGreatClassTestCase(unittest.TestCase):

    # We patch 'requests.get' with our own method. The mock object is passed in to our test case method.
    @mock.patch('requests.get', side_effect=mocked_requests_get)
    def test_fetch(self, mock_get):
        # Assert requests.get calls
        mgc = MyGreatClass()
        json_data = mgc.fetch_json('http://someurl.com/test.json')
        self.assertEqual(json_data, {"key1": "value1"})
        json_data = mgc.fetch_json('http://someotherurl.com/anothertest.json')
        self.assertEqual(json_data, {"key2": "value2"})
        json_data = mgc.fetch_json('http://nonexistenturl.com/cantfindme.json')
        self.assertIsNone(json_data)

        # We can even assert that our mocked method was called with the right parameters
        self.assertIn(mock.call('http://someurl.com/test.json'), mock_get.call_args_list)
        self.assertIn(mock.call('http://someotherurl.com/anothertest.json'), mock_get.call_args_list)

        self.assertEqual(len(mock_get.call_args_list), 3)

if __name__ == '__main__':
    unittest.main()

Important note: if your MyGreatClass class lives in a different package, say my.great.package, you have to mock my.great.package.requests.get instead of just ‘requests.get’. In that case your test case would look like this:

import unittest
from unittest import mock
from my.great.package import MyGreatClass

# This method will be used by the mock to replace requests.get
def mocked_requests_get(*args, **kwargs):
    ...  # Same as above


class MyGreatClassTestCase(unittest.TestCase):

    # Now we must patch 'my.great.package.requests.get'
    @mock.patch('my.great.package.requests.get', side_effect=mocked_requests_get)
    def test_fetch(self, mock_get):
        ...  # Same as above

if __name__ == '__main__':
    unittest.main()

Enjoy!
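Applied to the three URLs from the question, the same side_effect technique might look like the sketch below. The view is simplified (no Django request argument, and it returns the bodies so they can be checked), and fake_get, MyViewTestCase and the use of .text are assumptions made for illustration:

```python
import unittest
from unittest import mock

import requests


def myview():
    # Simplified, hypothetical stand-in for the question's view: no Django
    # request argument, and it returns the bodies so the test can check them.
    res1 = requests.get("aurl")
    res2 = requests.get("burl")
    res3 = requests.get("curl")
    return [res1.text, res2.text, res3.text]


def fake_get(url, *args, **kwargs):
    """Replacement for requests.get: pick a canned body based on the URL."""
    bodies = {"aurl": "a response", "burl": "b response", "curl": "c response"}
    body = bodies.get(url, "unhandled request %s" % url)
    # A plain Mock with .text and .status_code set is enough for this view.
    return mock.Mock(text=body, status_code=200)


class MyViewTestCase(unittest.TestCase):
    # Step 1: every requests.get(...) call is routed through fake_get.
    @mock.patch("requests.get", side_effect=fake_get)
    def test_myview(self, mock_get):
        # Step 2: call the view.
        result = myview()
        # Step 3: verify the three canned responses came back, in order.
        self.assertEqual(result, ["a response", "b response", "c response"])
        mock_get.assert_has_calls(
            [mock.call("aurl"), mock.call("burl"), mock.call("curl")]
        )
```

Run it with python -m unittest as usual; the real views.py would be patched the same way, since it also looks up requests.get at call time.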


Answer 1

Try using the responses library. Here is an example from their documentation:

import responses
import requests

@responses.activate
def test_simple():
    responses.add(responses.GET, 'http://twitter.com/api/1/foobar',
                  json={'error': 'not found'}, status=404)

    resp = requests.get('http://twitter.com/api/1/foobar')

    assert resp.json() == {"error": "not found"}

    assert len(responses.calls) == 1
    assert responses.calls[0].request.url == 'http://twitter.com/api/1/foobar'
    assert responses.calls[0].response.text == '{"error": "not found"}'

It provides quite a nice convenience over setting up all the mocking yourself.

There’s also HTTPretty:

It’s not specific to the requests library and is more powerful in some ways, though I found it doesn’t lend itself as well to inspecting the requests it intercepted, which responses does quite easily.

There’s also httmock.


Answer 2

Here is what worked for me:

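from unittest import mock  # or "import mock" for the standalone mock package
@mock.patch('requests.get', mock.Mock(side_effect = lambda k:{'aurl': 'a response', 'burl' : 'b response'}.get(k, 'unhandled request %s'%k)))
def test_myview():  # hypothetical test name; no mock argument is passed when "new" is given
    ...  # call your view here; requests.get now returns the strings above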

Answer 3

I used requests-mock to write tests for a separate module:

# module.py
import requests

class A():

    def get_response(self, url):
        response = requests.get(url)
        return response.text

And the tests:

# tests.py
import requests_mock
import unittest

from module import A


class TestAPI(unittest.TestCase):

    @requests_mock.mock()
    def test_get_response(self, m):
        a = A()
        m.get('http://aurl.com', text='a response')
        self.assertEqual(a.get_response('http://aurl.com'), 'a response')
        m.get('http://burl.com', text='b response')
        self.assertEqual(a.get_response('http://burl.com'), 'b response')
        m.get('http://curl.com', text='c response')
        self.assertEqual(a.get_response('http://curl.com'), 'c response')

if __name__ == '__main__':
    unittest.main()

Answer 4

This is how you mock requests.post; change it to your HTTP method:

import requests
from unittest.mock import Mock, patch

@patch.object(requests, 'post')
def your_test_method(self, mockpost):
    mockresponse = Mock()
    mockpost.return_value = mockresponse
    mockresponse.text = 'mock return'

    #call your target method now
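A complete, runnable version of that pattern, with a hypothetical submit function standing in for "your target method" and a made-up URL, could look like this. Because return_value is fixed, every call gets the same response; side_effect is the tool to reach for if different URLs need different responses:

```python
import unittest
from unittest.mock import Mock, patch

import requests


def submit(url, payload):
    # Hypothetical code under test: POST some data and return the body text.
    response = requests.post(url, data=payload)
    return response.text


class SubmitTestCase(unittest.TestCase):
    @patch.object(requests, "post")
    def test_submit(self, mockpost):
        # Every call to requests.post now returns this canned response.
        mockresponse = Mock()
        mockresponse.text = "mock return"
        mockpost.return_value = mockresponse

        url = "https://example.invalid/api"  # made-up URL, never contacted
        self.assertEqual(submit(url, {"a": 1}), "mock return")
        mockpost.assert_called_once_with(url, data={"a": 1})
```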

Answer 5

If you want to mock a fake response, another way to do it is to simply instantiate an instance of the base HttpResponse class, like so:

from django.http.response import HttpResponseBase

self.fake_response = HttpResponseBase()

Answer 6

One possible way to work around requests is to use the betamax library. It records all requests, and afterwards, if you make a request to the same URL with the same parameters, betamax will use the recorded request. I have been using it to test a web crawler, and it has saved me a lot of time.

import os

import requests
from betamax import Betamax
from betamax_serializers import pretty_json


WORKERS_DIR = os.path.dirname(os.path.abspath(__file__))
CASSETTES_DIR = os.path.join(WORKERS_DIR, u'resources', u'cassettes')
MATCH_REQUESTS_ON = [u'method', u'uri', u'path', u'query']

Betamax.register_serializer(pretty_json.PrettyJSONSerializer)
with Betamax.configure() as config:
    config.cassette_library_dir = CASSETTES_DIR
    config.default_cassette_options[u'serialize_with'] = u'prettyjson'
    config.default_cassette_options[u'match_requests_on'] = MATCH_REQUESTS_ON
    config.default_cassette_options[u'preserve_exact_body_bytes'] = True


class WorkerCertidaoTRT2:
    session = requests.session()

    def make_request(self, input_json):
        with Betamax(self.session) as vcr:
            vcr.use_cassette(u'google')
            response = self.session.get('http://www.google.com')

https://betamax.readthedocs.io/en/latest/


Answer 7

Just a helpful hint for those who are still struggling while converting from urllib or urllib2/urllib3 to requests AND trying to mock a response: I was getting a slightly confusing error when implementing my mock:

with requests.get(path, auth=HTTPBasicAuth('user', 'pass'), verify=False) as url:

AttributeError: __enter__

Well, of course, if I knew anything about how with works (I didn’t), I’d have known it was a vestigial, unnecessary context (from PEP 343). It is unnecessary when using the requests library, because requests does basically the same thing for you under the hood. Just remove the with and use a bare requests.get(...), and Bob’s your uncle.
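If you would rather keep the with block, the error can also be avoided on the mocking side: a plain Mock does not implement the context-manager protocol, but a MagicMock (which mock.patch creates by default) does, and you can configure what __enter__ returns. A minimal sketch, where fetch_text, the URL and the canned text are hypothetical:

```python
from unittest import mock

import requests


def fetch_text(path):
    # Hypothetical code under test that keeps the with statement.
    with requests.get(path) as response:
        return response.text


def fetch_with_magicmock():
    # mock.patch creates a MagicMock, which supports __enter__/__exit__.
    with mock.patch("requests.get") as mock_get:
        fake_response = mock.Mock(text="mocked body")
        # The object bound by "as response" is whatever __enter__ returns.
        mock_get.return_value.__enter__.return_value = fake_response
        return fetch_text("https://example.invalid/data")


def fetch_with_plain_mock():
    # Replacing requests.get with a plain Mock reproduces the problem:
    # its return value has no __enter__, so the with statement fails
    # (AttributeError on older Pythons, TypeError on 3.11+).
    with mock.patch("requests.get", new=mock.Mock()):
        return fetch_text("https://example.invalid/data")
```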


Answer 8

I will add this information since I had a hard time figuring out how to mock an async API call.

Here is what I did to mock an async call.

Here is the function I wanted to test:

async def get_user_info(headers, payload):
    return await httpx.AsyncClient().post(URI, json=payload, headers=headers)

You still need the MockResponse class:

class MockResponse:
    def __init__(self, json_data, status_code):
        self.json_data = json_data
        self.status_code = status_code

    def json(self):
        return self.json_data

Then add the MockResponseAsync class:

class MockResponseAsync:
    def __init__(self, json_data, status_code):
        self.response = MockResponse(json_data, status_code)

    async def getResponse(self):
        return self.response

Here is the test. The important thing is that I create the response beforehand, since __init__ can’t be async while the call to getResponse is async, so it all checks out.

@pytest.mark.asyncio
@patch('httpx.AsyncClient')
async def test_get_user_info_valid(self, mock_post):
    """test_get_user_info_valid"""
    # Given
    token_bd = "abc"
    username = "bob"
    payload = {
        'USERNAME': username,
        'DBNAME': 'TEST'
    }
    headers = {
        'Authorization': 'Bearer ' + token_bd,
        'Content-Type': 'application/json'
    }
    async_response = MockResponseAsync("", 200)
    mock_post.return_value.post.return_value = async_response.getResponse()

    # When
    await api_bd.get_user_info(headers, payload)

    # Then
    mock_post.return_value.post.assert_called_once_with(
        URI, json=payload, headers=headers)

If you have a better way of doing this, let me know, but I think it’s pretty clean like this.
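On Python 3.8+, unittest.mock.AsyncMock may make the MockResponseAsync wrapper unnecessary: awaiting an AsyncMock resolves to its return_value. The sketch below uses a hypothetical ApiClient class and URI in place of httpx.AsyncClient and the real endpoint so that it runs without httpx or a network; with httpx, the equivalent would presumably be setting mock_post.return_value.post = AsyncMock(return_value=...) in the test above:

```python
import asyncio
from unittest import mock

URI = "https://example.invalid/user-info"  # hypothetical endpoint


class MockResponse:
    # Same minimal response class as in the answer.
    def __init__(self, json_data, status_code):
        self.json_data = json_data
        self.status_code = status_code

    def json(self):
        return self.json_data


class ApiClient:
    # Hypothetical stand-in for httpx.AsyncClient; the real call is never made.
    async def post(self, url, json=None, headers=None):
        raise RuntimeError("the network should not be used in tests")


async def get_user_info(headers, payload):
    return await ApiClient().post(URI, json=payload, headers=headers)


def run_test():
    payload = {"USERNAME": "bob", "DBNAME": "TEST"}
    headers = {"Authorization": "Bearer abc", "Content-Type": "application/json"}
    # AsyncMock makes post() return an awaitable that resolves to return_value.
    with mock.patch.object(ApiClient, "post", new_callable=mock.AsyncMock) as mock_post:
        mock_post.return_value = MockResponse({"user": "bob"}, 200)
        response = asyncio.run(get_user_info(headers, payload))
        # The patched class attribute is not bound, so no self argument appears.
        mock_post.assert_awaited_once_with(URI, json=payload, headers=headers)
    return response
```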


Correct way to try/except using Python requests module?

Question: Correct way to try/except using Python requests module?

try:
    r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
    print e #should I also sys.exit(1) after this?

Is this correct? Is there a better way to structure this? Will this cover all my bases?


Answer 0

Have a look at the Requests exception docs. In short:

In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.

In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.

To answer your question, what you show will not cover all of your bases. You’ll only catch connection-related errors, not ones that time out.

What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can’t go on, then yes, you may abort your program by raising SystemExit (a nice way to both print an error and call sys.exit).

You can either catch the base-class exception, which will handle all cases:

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    raise SystemExit(e)

Or you can catch them separately and do different things.

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    raise SystemExit(e)

As Christian pointed out:

If you want HTTP errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError if the response was an HTTP error.

An example:

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    raise SystemExit(err)

This will print:

404 Client Error: Not Found for url: http://www.google.com/nothere
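To see that message without contacting a real server, a requests.Response can be built by hand. Setting status_code, reason and url directly is a test-only shortcut (normally requests fills them in from the server’s reply), and the URL below is made up:

```python
import requests


def check(status_code, reason, url="http://example.invalid/nothere"):
    """Return the HTTPError message raise_for_status produces, or None."""
    r = requests.Response()
    # Normally requests populates these fields from the server's reply.
    r.status_code = status_code
    r.reason = reason
    r.url = url
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        return str(err)
    return None  # statuses below 400 do not raise


print(check(404, "Not Found"))
```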

Answer 1

One additional suggestion, to be explicit: it seems best to go from specific to general down the stack of errors, so that the desired error gets caught and the specific ones don’t get masked by the general one.

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)

Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah

vs

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)     

OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
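The order matters because all of these exceptions subclass RequestException, and Python runs the first except clause whose class matches. The sketch below raises exception instances directly (no network involved) to show which clause of the specific-to-general ordering handles each one. Note that ConnectTimeout inherits from both ConnectionError and Timeout, so it lands in whichever of those two clauses comes first:

```python
from requests import exceptions as rex


def classify(exc):
    """Raise exc and report which except clause handles it."""
    try:
        raise exc
    except rex.HTTPError:
        return "http"
    except rex.ConnectionError:
        return "connection"
    except rex.Timeout:
        return "timeout"
    except rex.RequestException:
        return "other"


for exc in (rex.HTTPError(), rex.ReadTimeout(), rex.ConnectTimeout(), rex.TooManyRedirects()):
    print(type(exc).__name__, "->", classify(exc))
```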

Answer 2

The exception object also contains the original response, e.response, which can be useful if you need to see the error body in the server’s response. For example:

try:
    r = requests.post('https://somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print (e.response.text)
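To check what e.response gives you without a live API, the sketch below builds a 400 response by hand. Assigning the private _content attribute is a test-only shortcut to give the fake response a body, and the JSON error text is invented for the example:

```python
import requests


def fake_error_response():
    # Hand-built response; requests normally fills these fields in itself.
    r = requests.Response()
    r.status_code = 400
    r.reason = "Bad Request"
    r.url = "https://somerestapi.com/post-here"
    r.encoding = "utf-8"
    r._content = b'{"error": "birthday is out of range"}'  # private attribute, test-only
    return r


def error_body(response):
    """Return the server's error body if the response is an HTTP error."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # e.response is the same Response object that raised.
        return e.response.text
    return None


print(error_body(fake_error_response()))
```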