Tag Archives: httprequest

Python requests package: handling XML responses

Question: Python requests package: handling XML responses


I like the requests package very much, and its comfortable way of handling JSON responses.

Unfortunately, I did not understand whether I can also process XML responses. Does anybody have experience with handling XML responses with the requests package? Is it necessary to include another package for the XML decoding?


Answer 0


requests does not handle parsing XML responses, no. XML responses are much more complex in nature than JSON responses, and how you’d serialize XML data into Python structures is not nearly as straightforward.

Python comes with built-in XML parsers. I recommend you use the ElementTree API:

import requests
from xml.etree import ElementTree

response = requests.get(url)

tree = ElementTree.fromstring(response.content)

or, if the response is particularly large, use an incremental approach:

response = requests.get(url, stream=True)
# if the server sent a Gzip or Deflate compressed response, decompress
# as we read the raw stream:
response.raw.decode_content = True

events = ElementTree.iterparse(response.raw)
for event, elem in events:
    # do something with `elem`

The external lxml project builds on the same API to give you more features and power still.
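For comparison, here is a minimal sketch of the same incremental parse done with lxml instead (assuming lxml is installed; the URL is a placeholder):

import requests
from lxml import etree

response = requests.get('http://example.com/feed.xml', stream=True)
# decompress gzip/deflate responses as the raw stream is read
response.raw.decode_content = True

for event, elem in etree.iterparse(response.raw):
    # do something with `elem`, then free the parts already processed
    elem.clear()

lxml.etree.iterparse accepts the same file-like objects as xml.etree, so the requests side of the code does not change.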


How do I get all request headers in Django?

Question: How do I get all request headers in Django?


I need to get all the Django request headers. From what I’ve read, Django simply dumps everything into the request.META variable along with a lot of other data. What would be the best way to get all the headers that the client sent to my Django application?

I’m going to use these to build an httplib request.


Answer 0


According to the documentation, request.META is a “standard Python dictionary containing all available HTTP headers”. If you want to get all the headers, you can simply iterate through the dictionary.

Where in your code you do this depends on your exact requirement. Any place that has access to request should do.
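For instance, a minimal sketch of that iteration in a throwaway debug view (the view name is hypothetical):

from django.http import HttpResponse

def dump_meta(request):
    # request.META holds the HTTP headers plus other WSGI/server values
    lines = ['%s: %s' % (k, v) for k, v in sorted(request.META.items())]
    return HttpResponse('\n'.join(lines), content_type='text/plain')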

Update

I need to access it in a Middleware class but when I iterate over it, I get a lot of values apart from HTTP headers.

From the documentation:

With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name.

(Emphasis added)

To get the HTTP headers alone, just filter by keys prefixed with HTTP_.

Update 2

could you show me how I could build a dictionary of headers by filtering out all the keys from the request.META variable which begin with a HTTP_ and strip out the leading HTTP_ part.

Sure. Here is one way to do it.

import re

regex = re.compile('^HTTP_')
headers = {
    regex.sub('', header): value
    for header, value in request.META.items()
    if header.startswith('HTTP_')
}

Answer 1


Starting from Django 2.2, you can use request.headers to access the HTTP headers. From the documentation on HttpRequest.headers:

A case insensitive, dict-like object that provides access to all HTTP-prefixed headers (plus Content-Length and Content-Type) from the request.

The name of each header is stylized with title-casing (e.g. User-Agent) when it’s displayed. You can access headers case-insensitively:

>>> request.headers
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6', ...}

>>> 'User-Agent' in request.headers
True
>>> 'user-agent' in request.headers
True

>>> request.headers['User-Agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers['user-agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

>>> request.headers.get('User-Agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers.get('user-agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

To get all headers, you can use request.headers.keys() or request.headers.items().
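For instance, since the object is dict-like, a plain dict of all headers is one call away (a minimal sketch):

# request.headers converts directly; keys keep their title-cased names
header_dict = dict(request.headers)
# e.g. {'User-Agent': 'Mozilla/5.0 ...', 'Accept': '*/*', ...}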


Answer 2


This is another way to do it, very similar to Manoj Govindan‘s answer above:

import re
regex_http_          = re.compile(r'^HTTP_.+$')
regex_content_type   = re.compile(r'^CONTENT_TYPE$')
regex_content_length = re.compile(r'^CONTENT_LENGTH$')

request_headers = {}
for header in request.META:
    if regex_http_.match(header) or regex_content_type.match(header) or regex_content_length.match(header):
        request_headers[header] = request.META[header]

That will also grab the CONTENT_TYPE and CONTENT_LENGTH request headers, along with the HTTP_ ones. request_headers['some_key'] == request.META['some_key'].

Modify accordingly if you need to include/omit certain headers. Django lists a bunch, but not all, of them here: https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.META

Django’s algorithm for request headers:

  1. Replace hyphen - with underscore _
  2. Convert to UPPERCASE.
  3. Prepend HTTP_ to all headers in original request, except for CONTENT_TYPE and CONTENT_LENGTH.

The values of each header should be unmodified.
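As a sketch, the forward mapping from a header name to its META key, following the algorithm above (a hypothetical helper, not part of Django):

def header_to_meta_key(name):
    # e.g. 'X-Forwarded-For' -> 'HTTP_X_FORWARDED_FOR'
    key = name.upper().replace('-', '_')
    if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
        return key  # these two keep their names, without the HTTP_ prefix
    return 'HTTP_' + key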


Answer 3


request.META.get('HTTP_AUTHORIZATION') /python3.6/site-packages/rest_framework/authentication.py

you can get that from this file though…


Answer 4


I don’t think there is an easy way to get only the HTTP headers. You have to iterate through the request.META dict to get everything you need.

django-debug-toolbar takes the same approach to show header information. Have a look at this file responsible for retrieving header information.


Answer 5


If you want to get a client key from the request header, you can try the following:

from rest_framework.authentication import BaseAuthentication
from rest_framework import exceptions
from apps.authentication.models import CerebroAuth

class CerebroAuthentication(BaseAuthentication):
    def authenticate(self, request):
        client_id = request.META.get('HTTP_AUTHORIZATION')
        if not client_id:
            raise exceptions.AuthenticationFailed('Client key not provided')
        client_id = client_id.split()
        if len(client_id) == 1 or len(client_id) > 2:
            msg = 'Invalid secret key header. No credentials provided.'
            raise exceptions.AuthenticationFailed(msg)
        try:
            client = CerebroAuth.objects.get(client_id=client_id[1])
        except CerebroAuth.DoesNotExist:
            raise exceptions.AuthenticationFailed('No such client')
        return (client, None)

Answer 6


For what it’s worth, it appears your intent is to use the incoming HTTP request to form another HTTP request. Sort of like a gateway. There is an excellent module django-revproxy that accomplishes exactly this.

The source is a pretty good reference on how to accomplish what you are trying to do.
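As a rough sketch of what that looks like, based on django-revproxy’s documented ProxyView (the class name and upstream URL here are placeholders):

from revproxy.views import ProxyView

class UpstreamProxyView(ProxyView):
    # forwards the incoming request, headers included, to the upstream host
    upstream = 'http://example-upstream.local'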


Answer 7

<b>request.META</b><br>
{% for k_meta, v_meta in request.META.items %}
  <code>{{ k_meta }}</code> : {{ v_meta }} <br>
{% endfor %}

Answer 8


You can simply use HttpRequest.headers from Django 2.2 onward. The following example is taken directly from the official Django documentation, under the Request and response objects section.

>>> request.headers
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6', ...}

>>> 'User-Agent' in request.headers
True
>>> 'user-agent' in request.headers
True

>>> request.headers['User-Agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers['user-agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

>>> request.headers.get('User-Agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers.get('user-agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

Python requests POST with parameter data

Question: Python requests POST with parameter data


This is the raw request for an API call:

POST http://192.168.3.45:8080/api/v2/event/log?sessionKey=b299d17b896417a7b18f46544d40adb734240cc2&format=json HTTP/1.1
Accept-Encoding: gzip,deflate
Content-Type: application/json
Content-Length: 86
Host: 192.168.3.45:8080
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)

{"eventType":"AAS_PORTAL_START","data":{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}"""

This request returns a success (2xx) response.

Now I am trying to post this request using requests:

>>> import requests
>>> headers = {'content-type' : 'application/json'}
>>> data ={"eventType":"AAS_PORTAL_START","data{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}
>>> url = "http://192.168.3.45:8080/api/v2/event/log?sessionKey=9ebbd0b25760557393a43064a92bae539d962103&format=xml&platformId=1"
>>> requests.post(url,params=data,headers=headers)
<Response [400]>

Everything looks fine to me and I am not quite sure what I am posting wrong to get a 400 response.


Answer 0


params is for GET-style URL parameters; data is for POST-style body information. It is perfectly legal to provide both types of information in a request, and your request does so too, but you encoded the URL parameters into the URL already.

Your raw post contains JSON data though. requests can handle JSON encoding for you, and it’ll set the correct Content-Type header too; all you need to do is pass the Python object to be encoded as JSON into the json keyword argument.

You could split out the URL parameters as well:

params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

then post your data with:

import requests

url = 'http://192.168.3.45:8080/api/v2/event/log'

data = {"eventType": "AAS_PORTAL_START", "data": {"uid": "hfe3hf45huf33545", "aid": "1", "vid": "1"}}
params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

requests.post(url, params=params, json=data)

The json keyword is new in requests version 2.4.2; if you still have to use an older version, encode the JSON manually using the json module and post the encoded result as the data keyword argument; you will have to explicitly set the Content-Type header in that case:

import requests
import json

headers = {'content-type': 'application/json'}
url = 'http://192.168.3.45:8080/api/v2/event/log'

data = {"eventType": "AAS_PORTAL_START", "data": {"uid": "hfe3hf45huf33545", "aid": "1", "vid": "1"}}
params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

requests.post(url, params=params, data=json.dumps(data), headers=headers)

Answer 1


Set data to this:

data ={"eventType":"AAS_PORTAL_START","data":{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}

Answer 2


Assign the response to a value and test its attributes. These should tell you something useful.

response = requests.post(url, params=data, headers=headers)
response.status_code
response.text

  • status_code should just reconfirm the code you were given before, of course

Asynchronous requests with Python requests

Question: Asynchronous requests with Python requests


I tried the sample provided within the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

Answer 0


Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I’ve left this answer as is to reflect the original question which was about using requests < v0.13.0.


To do multiple tasks with async.map asynchronously, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests / actions

Example:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

Answer 1


async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

installation:

$ pip install grequests

usage:

build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

send the stack

grequests.map(rs)

result looks like

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests doesn’t seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server.
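That said, a pool size can be passed to grequests.map via its size argument (a minimal sketch; to the best of my knowledge, size is the underlying gevent pool size):

import grequests

urls = ['http://httpbin.org/delay/1'] * 10
rs = (grequests.get(u) for u in urls)

# at most 3 requests in flight at a time
responses = grequests.map(rs, size=3)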


Answer 2


I tested both requests-futures and grequests. grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# the counters must exist before the loop increments them
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

Answer 3


Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don’t want to involve gevent, it’s a good option.


Answer 4


I have a lot of issues with most of the answers posted – they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic on the execution of the request, making it difficult to handle errors. If they do not fall into one of the above categories, they’re 3rd party libraries or deprecated.

Some of the solutions work alright purely for HTTP requests, but fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, as well as providing enough flexibility for complex and use-case-specific error handling.

import asyncio
import requests

loop = asyncio.get_event_loop()

def do_thing(params):
    async def get_rpc_info_and_do_chores(id):
        # do things; perform_grpc_call and do_chores are placeholders
        response = perform_grpc_call(id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things; URL is a placeholder
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

How it works is simple. You create a series of tasks you’d like to run asynchronously, and then ask a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, and no missing functionality.


Answer 5


I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

The docs are here: http://pythonhosted.org/simple-requests/


Answer 6

from threading import Thread
import urllib2

# `requests` here is a list of request URIs, not the requests library
threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...

Answer 7


If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async
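A minimal sketch, assuming requests-async keeps the familiar requests-style API behind async/await:

import requests_async as requests

async def fetch():
    # same call shape as requests.get, but awaitable
    response = await requests.get('http://example.com')
    print(response.status_code)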


Answer 8


I have been using python requests for async calls against github’s gist API for some time.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.


Answer 9


You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

If you want a functional syntax, the gamla lib wraps this into get_async.

Then you can do


await gamla.map(gamla.get_async(10), ["http://google.com", "http://wikipedia.org"])

The 10 is the timeout in seconds.

(disclaimer: I am its author)


Answer 10


I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted.

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html
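For flavor, a minimal sketch in the classic Python 2 / old Twisted style the linked post uses, with getPage returning a Deferred (getPage is deprecated in modern Twisted; the URL is a placeholder):

from twisted.internet import reactor
from twisted.web.client import getPage

def print_body(body):
    print body

d = getPage('http://example.com')
d.addCallback(print_body)
# stop the reactor whether the request succeeded or failed
d.addBoth(lambda _: reactor.stop())
reactor.run()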