问题:带有Python请求的异步请求
我尝试了python 请求库文档中提供的示例。
使用async.map(rs)
,我得到了响应代码,但是我想获得所请求的每个页面的内容。例如,这不起作用:
out = async.map(rs)
print out[0].content
I tried the sample provided within the documentation of the requests library for python.
With async.map(rs)
, I get the response codes, but I want to get the content of each page requested. This, for example, does not work:
out = async.map(rs)
print out[0].content
回答 0
注意
下面的答案是不适用于请求v0.13.0 +。编写此问题后,异步功能已移至grequests。但是,您可以将其替换requests
为grequests
下面的内容,它应该可以工作。
我已经留下了这个答案,以反映原始问题,该问题与使用请求<v0.13.0有关。
要async.map
异步执行多个任务,您必须:
- 为每个对象定义一个函数(您的任务)
- 将该函数添加为请求中的事件挂钩
- 调用
async.map
所有请求/操作的列表
例:
from requests import async
# If using requests > v0.13.0, use
# from grequests import async
urls = [
'http://python-requests.org',
'http://httpbin.org',
'http://python-guide.org',
'http://kennethreitz.com'
]
# A simple task to do to each response object
def do_something(response):
print response.url
# A list to hold our things to do via async
async_list = []
for u in urls:
# The "hooks = {..." part is where you define what you want to do
#
# Note the lack of parentheses following do_something, this is
# because the response will be used as the first argument automatically
action_item = async.get(u, hooks = {'response' : do_something})
# Add the task to our list of things to do via async
async_list.append(action_item)
# Do our list of things to do via async
async.map(async_list)
Note
The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests
with grequests
below and it should work.
I’ve left this answer as is to reflect the original question which was about using requests < v0.13.0.
To do multiple tasks with async.map
asynchronously you have to:
- Define a function for what you want to do with each object (your task)
- Add that function as an event hook in your request
- Call
async.map
on a list of all the requests / actions
Example:
from requests import async
# If using requests > v0.13.0, use
# from grequests import async
urls = [
'http://python-requests.org',
'http://httpbin.org',
'http://python-guide.org',
'http://kennethreitz.com'
]
# A simple task to do to each response object
def do_something(response):
print response.url
# A list to hold our things to do via async
async_list = []
for u in urls:
# The "hooks = {..." part is where you define what you want to do
#
# Note the lack of parentheses following do_something, this is
# because the response will be used as the first argument automatically
action_item = async.get(u, hooks = {'response' : do_something})
# Add the task to our list of things to do via async
async_list.append(action_item)
# Do our list of things to do via async
async.map(async_list)
回答 1
async
现在是一个独立的模块:grequests
。
看到这里:https : //github.com/kennethreitz/grequests
那里:通过Python发送多个HTTP请求的理想方法?
安装:
$ pip install grequests
用法:
建立一个堆栈:
import grequests
urls = [
'http://www.heroku.com',
'http://tablib.org',
'http://httpbin.org',
'http://python-requests.org',
'http://kennethreitz.com'
]
rs = (grequests.get(u) for u in urls)
发送堆栈
grequests.map(rs)
结果看起来像
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
grequests似乎没有为并发请求设置限制,即当多个请求发送到同一服务器时。
async
is now an independent module : grequests
.
See here : https://github.com/kennethreitz/grequests
And there: Ideal method for sending multiple HTTP requests over Python?
installation:
$ pip install grequests
usage:
build a stack:
import grequests
urls = [
'http://www.heroku.com',
'http://tablib.org',
'http://httpbin.org',
'http://python-requests.org',
'http://kennethreitz.com'
]
rs = (grequests.get(u) for u in urls)
send the stack
grequests.map(rs)
result looks like
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
grequests don’t seem to set a limitation for concurrent requests, ie when multiple requests are sent to the same server.
回答 2
我同时测试了request-futures和grequests。Grequests速度更快,但是会带来Monkey补丁和依赖关系的其他问题。request-futures比grequests慢几倍。我决定将自己的请求简单地包装到ThreadPoolExecutor中,这几乎与grequests一样快,但是没有外部依赖项。
import requests
import concurrent.futures
def get_urls():
return ["url1","url2"]
def load_url(url, timeout):
return requests.get(url, timeout = timeout)
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
except Exception as exc:
resp_err = resp_err + 1
else:
resp_ok = resp_ok + 1
I tested both requests-futures and grequests. Grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests into ThreadPoolExecutor and it was almost as fast as grequests, but without external dependencies.
import requests
import concurrent.futures
def get_urls():
return ["url1","url2"]
def load_url(url, timeout):
return requests.get(url, timeout = timeout)
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
except Exception as exc:
resp_err = resp_err + 1
else:
resp_ok = resp_ok + 1
回答 3
也许要求-未来是另一种选择。
from requests_futures.sessions import FuturesSession
session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
办公文件中也建议使用此功能。如果您不想参与gevent,那将是一个很好的选择。
maybe requests-futures is another choice.
from requests_futures.sessions import FuturesSession
session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
It is also recommended in the office document. If you don’t want involve gevent, it’s a good one.
回答 4
我在发布的大多数答案中都遇到了很多问题-它们要么使用已过时的库,这些库已被移植以具有有限的功能,要么为解决方案的执行提供了太多魔力,因此难以处理错误。如果它们不属于上述类别之一,则说明它们是第三方库或已弃用。
某些解决方案完全可以在http请求中正常工作,但是对于任何其他种类的请求(这都是荒谬的),这些解决方案都不够。这里不需要高度定制的解决方案。
简单地使用python内置库asyncio
足以执行任何类型的异步请求,并为复杂的和用例特定的错误处理提供足够的流动性。
import asyncio
loop = asyncio.get_event_loop()
def do_thing(params):
async def get_rpc_info_and_do_chores(id):
# do things
response = perform_grpc_call(id)
do_chores(response)
async def get_httpapi_info_and_do_chores(id):
# do things
response = requests.get(URL)
do_chores(response)
async_tasks = []
for element in list(params.list_of_things):
async_tasks.append(loop.create_task(get_chan_info_and_do_chores(id)))
async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(ch_id)))
loop.run_until_complete(asyncio.gather(*async_tasks))
它是如何工作的很简单。您正在创建一系列要异步执行的任务,然后要求循环执行这些任务并在完成时退出。没有多余的库,无需维护,也无需缺少功能。
I have a lot of issues with most of the answers posted – they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic on the execution of the request, making it difficult to error handle. If they do not fall into one of the above categories, they’re 3rd party libraries or deprecated.
Some of the solutions works alright purely in http requests, but the solutions fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.
Simply using the python built-in library asyncio
is sufficient enough to perform asynchronous requests of any type, as well as providing enough fluidity for complex and usecase specific error handling.
import asyncio
loop = asyncio.get_event_loop()
def do_thing(params):
async def get_rpc_info_and_do_chores(id):
# do things
response = perform_grpc_call(id)
do_chores(response)
async def get_httpapi_info_and_do_chores(id):
# do things
response = requests.get(URL)
do_chores(response)
async_tasks = []
for element in list(params.list_of_things):
async_tasks.append(loop.create_task(get_chan_info_and_do_chores(id)))
async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(ch_id)))
loop.run_until_complete(asyncio.gather(*async_tasks))
How it works is simple. You’re creating a series of tasks you’d like to occur asynchronously, and then asking a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, no lack of functionality required.
回答 5
我知道这已经关闭了一段时间,但我认为推广另一个基于请求库的异步解决方案可能很有用。
list_of_requests = ['http://moop.com', 'http://doop.com', ...]
from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
print response.content
这些文档在这里:http : //pythonhosted.org/simple-requests/
I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.
list_of_requests = ['http://moop.com', 'http://doop.com', ...]
from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
print response.content
The docs are here: http://pythonhosted.org/simple-requests/
回答 6
threads=list()
for requestURI in requests:
t = Thread(target=self.openURL, args=(requestURI,))
t.start()
threads.append(t)
for thread in threads:
thread.join()
...
def openURL(self, requestURI):
o = urllib2.urlopen(requestURI, timeout = 600)
o...
from threading import Thread
threads=list()
for requestURI in requests:
t = Thread(target=self.openURL, args=(requestURI,))
t.start()
threads.append(t)
for thread in threads:
thread.join()
...
def openURL(self, requestURI):
o = urllib2.urlopen(requestURI, timeout = 600)
o...
回答 7
回答 8
I have been using python requests for async calls against github’s gist API for some time.
For an example, see the code here:
https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72
This style of python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.
回答 9
您可以使用httpx
它。
import httpx
async def get_async(url):
async with httpx.AsyncClient() as client:
return await client.get(url)
urls = ["http://google.com", "http://wikipedia.org"]
# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
如果您需要功能语法,则gamla lib 会将其包装到中get_async
。
那你可以做
await gamla.map(gamla.get_async(10), ["http://google.com", "http://wikipedia.org"])
在10
以秒为单位的超时时间。
(免责声明:我是它的作者)
You can use httpx
for that.
import httpx
async def get_async(url):
async with httpx.AsyncClient() as client:
return await client.get(url)
urls = ["http://google.com", "http://wikipedia.org"]
# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
if you want a functional syntax, the gamla lib wraps this into get_async
.
Then you can do
await gamla.map(gamla.get_async(10), ["http://google.com", "http://wikipedia.org"])
The 10
is the timeout in seconds.
(disclaimer: I am its author)
回答 10