Tag Archives: httprequest

Python requests package: handling XML responses

Question: Python requests package: handling XML responses


I like the requests package very much, and its comfortable way of handling JSON responses.

Unfortunately, I did not understand whether I can also process XML responses. Does anybody have experience with handling XML responses with the requests package? Is it necessary to include another package for the XML decoding?


Answer 0


requests does not handle parsing XML responses, no. XML responses are much more complex in nature than JSON responses, and how you’d serialize XML data into Python structures is not nearly as straightforward.

Python comes with built-in XML parsers. I recommend you use the ElementTree API:

import requests
from xml.etree import ElementTree

response = requests.get(url)

tree = ElementTree.fromstring(response.content)

or, if the response is particularly large, use an incremental approach:

response = requests.get(url, stream=True)
# if the server sent a Gzip or Deflate compressed response, decompress
# as we read the raw stream:
response.raw.decode_content = True

events = ElementTree.iterparse(response.raw)
for event, elem in events:
    # do something with `elem`

The external lxml project builds on the same API to give you more features and power still.
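For comparison, here is a minimal sketch of the same incremental parse done with lxml instead (assuming lxml is installed; the URL is a placeholder):

import requests
from lxml import etree

response = requests.get('http://example.com/feed.xml', stream=True)
# decompress gzip/deflate responses as the raw stream is read
response.raw.decode_content = True

for event, elem in etree.iterparse(response.raw):
    # do something with `elem`, then free the parts already processed
    elem.clear()

lxml.etree.iterparse accepts the same file-like objects as xml.etree, so the requests side of the code does not change.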


How do I get all request headers in Django?

Question: How do I get all request headers in Django?


I need to get all the Django request headers. From what I’ve read, Django simply dumps everything into the request.META variable along with a lot of other data. What would be the best way to get all the headers that the client sent to my Django application?

I’m going to use these to build an httplib request.


Answer 0


According to the documentation, request.META is a “standard Python dictionary containing all available HTTP headers”. If you want to get all the headers, you can simply iterate through the dictionary.

Where in your code you do this depends on your exact requirement. Any place that has access to request should do.
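For instance, a minimal sketch of that iteration in a throwaway debug view (the view name is hypothetical):

from django.http import HttpResponse

def dump_meta(request):
    # request.META holds the HTTP headers plus other WSGI/server values
    lines = ['%s: %s' % (k, v) for k, v in sorted(request.META.items())]
    return HttpResponse('\n'.join(lines), content_type='text/plain')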

Update

I need to access it in a Middleware class but when I iterate over it, I get a lot of values apart from HTTP headers.

From the documentation:

With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name.

(Emphasis added)

To get the HTTP headers alone, just filter by keys prefixed with HTTP_.

Update 2

could you show me how I could build a dictionary of headers by filtering out all the keys from the request.META variable which begin with a HTTP_ and strip out the leading HTTP_ part.

Sure. Here is one way to do it.

import re

regex = re.compile('^HTTP_')
headers = {
    regex.sub('', header): value
    for header, value in request.META.items()
    if header.startswith('HTTP_')
}

Answer 1


Starting from Django 2.2, you can use request.headers to access the HTTP headers. From the documentation on HttpRequest.headers:

A case insensitive, dict-like object that provides access to all HTTP-prefixed headers (plus Content-Length and Content-Type) from the request.

The name of each header is stylized with title-casing (e.g. User-Agent) when it’s displayed. You can access headers case-insensitively:

>>> request.headers
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6', ...}

>>> 'User-Agent' in request.headers
True
>>> 'user-agent' in request.headers
True

>>> request.headers['User-Agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers['user-agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

>>> request.headers.get('User-Agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers.get('user-agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

To get all headers, you can use request.headers.keys() or request.headers.items().
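For instance, since the object is dict-like, a plain dict of all headers is one call away (a minimal sketch):

# request.headers converts directly; keys keep their title-cased names
header_dict = dict(request.headers)
# e.g. {'User-Agent': 'Mozilla/5.0 ...', 'Accept': '*/*', ...}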


Answer 2


This is another way to do it, very similar to Manoj Govindan‘s answer above:

import re
regex_http_          = re.compile(r'^HTTP_.+$')
regex_content_type   = re.compile(r'^CONTENT_TYPE$')
regex_content_length = re.compile(r'^CONTENT_LENGTH$')

request_headers = {}
for header in request.META:
    if regex_http_.match(header) or regex_content_type.match(header) or regex_content_length.match(header):
        request_headers[header] = request.META[header]

That will also grab the CONTENT_TYPE and CONTENT_LENGTH request headers, along with the HTTP_ ones. request_headers['some_key'] == request.META['some_key'].

Modify accordingly if you need to include/omit certain headers. Django lists a bunch, but not all, of them here: https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.META

Django’s algorithm for request headers:

  1. Replace hyphen - with underscore _
  2. Convert to UPPERCASE.
  3. Prepend HTTP_ to all headers in original request, except for CONTENT_TYPE and CONTENT_LENGTH.

The values of each header should be unmodified.
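As a sketch, the forward mapping from a header name to its META key, following the algorithm above (a hypothetical helper, not part of Django):

def header_to_meta_key(name):
    # e.g. 'X-Forwarded-For' -> 'HTTP_X_FORWARDED_FOR'
    key = name.upper().replace('-', '_')
    if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
        return key  # these two keep their names, without the HTTP_ prefix
    return 'HTTP_' + key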


Answer 3


request.META.get('HTTP_AUTHORIZATION') /python3.6/site-packages/rest_framework/authentication.py

you can get that from this file though…


Answer 4


I don’t think there is an easy way to get only the HTTP headers. You have to iterate through the request.META dict to get everything you need.

django-debug-toolbar takes the same approach to show header information. Have a look at this file responsible for retrieving header information.


Answer 5


If you want to get a client key from the request header, you can try the following:

from rest_framework.authentication import BaseAuthentication
from rest_framework import exceptions
from apps.authentication.models import CerebroAuth

class CerebroAuthentication(BaseAuthentication):
    def authenticate(self, request):
        client_id = request.META.get('HTTP_AUTHORIZATION')
        if not client_id:
            raise exceptions.AuthenticationFailed('Client key not provided')
        client_id = client_id.split()
        if len(client_id) == 1 or len(client_id) > 2:
            msg = 'Invalid secret key header. No credentials provided.'
            raise exceptions.AuthenticationFailed(msg)
        try:
            client = CerebroAuth.objects.get(client_id=client_id[1])
        except CerebroAuth.DoesNotExist:
            raise exceptions.AuthenticationFailed('No such client')
        return (client, None)

Answer 6


For what it’s worth, it appears your intent is to use the incoming HTTP request to form another HTTP request. Sort of like a gateway. There is an excellent module django-revproxy that accomplishes exactly this.

The source is a pretty good reference on how to accomplish what you are trying to do.
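As a rough sketch of what that looks like, based on django-revproxy’s documented ProxyView (the class name and upstream URL here are placeholders):

from revproxy.views import ProxyView

class UpstreamProxyView(ProxyView):
    # forwards the incoming request, headers included, to the upstream host
    upstream = 'http://example-upstream.local'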


Answer 7

<b>request.META</b><br>
{% for k_meta, v_meta in request.META.items %}
  <code>{{ k_meta }}</code> : {{ v_meta }} <br>
{% endfor %}

Answer 8


You can simply use HttpRequest.headers from Django 2.2 onward. The following example is taken directly from the official Django documentation, under the Request and response objects section.

>>> request.headers
{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6', ...}

>>> 'User-Agent' in request.headers
True
>>> 'user-agent' in request.headers
True

>>> request.headers['User-Agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers['user-agent']
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

>>> request.headers.get('User-Agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)
>>> request.headers.get('user-agent')
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)

Python requests POST with parameter data

Question: Python requests POST with parameter data


This is the raw request for an API call:

POST http://192.168.3.45:8080/api/v2/event/log?sessionKey=b299d17b896417a7b18f46544d40adb734240cc2&format=json HTTP/1.1
Accept-Encoding: gzip,deflate
Content-Type: application/json
Content-Length: 86
Host: 192.168.3.45:8080
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)

{"eventType":"AAS_PORTAL_START","data":{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}"""

This request returns a success (2xx) response.

Now I am trying to post this request using requests:

>>> import requests
>>> headers = {'content-type' : 'application/json'}
>>> data ={"eventType":"AAS_PORTAL_START","data{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}
>>> url = "http://192.168.3.45:8080/api/v2/event/log?sessionKey=9ebbd0b25760557393a43064a92bae539d962103&format=xml&platformId=1"
>>> requests.post(url,params=data,headers=headers)
<Response [400]>

Everything looks fine to me and I am not quite sure what I am posting wrong to get a 400 response.


Answer 0


params is for GET-style URL parameters; data is for POST-style body information. It is perfectly legal to provide both types of information in a request, and your request does so too, but you encoded the URL parameters into the URL already.

Your raw post contains JSON data though. requests can handle JSON encoding for you, and it’ll set the correct Content-Type header too; all you need to do is pass the Python object to be encoded as JSON into the json keyword argument.

You could split out the URL parameters as well:

params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

then post your data with:

import requests

url = 'http://192.168.3.45:8080/api/v2/event/log'

data = {"eventType": "AAS_PORTAL_START", "data": {"uid": "hfe3hf45huf33545", "aid": "1", "vid": "1"}}
params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

requests.post(url, params=params, json=data)

The json keyword is new in requests version 2.4.2; if you still have to use an older version, encode the JSON manually using the json module and post the encoded result as the data keyword argument; you will have to explicitly set the Content-Type header in that case:

import requests
import json

headers = {'content-type': 'application/json'}
url = 'http://192.168.3.45:8080/api/v2/event/log'

data = {"eventType": "AAS_PORTAL_START", "data": {"uid": "hfe3hf45huf33545", "aid": "1", "vid": "1"}}
params = {'sessionKey': '9ebbd0b25760557393a43064a92bae539d962103', 'format': 'xml', 'platformId': 1}

requests.post(url, params=params, data=json.dumps(data), headers=headers)

Answer 1


Set data to this:

data ={"eventType":"AAS_PORTAL_START","data":{"uid":"hfe3hf45huf33545","aid":"1","vid":"1"}}

Answer 2


Assign the response to a value and test its attributes. These should tell you something useful.

response = requests.post(url, params=data, headers=headers)
response.status_code
response.text

  • status_code should just reconfirm the code you were given before, of course

Asynchronous requests with Python requests

Question: Asynchronous requests with Python requests


I tried the sample provided within the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

Answer 0


Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I’ve left this answer as is to reflect the original question which was about using requests < v0.13.0.


To do multiple tasks with async.map asynchronously, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests / actions

Example:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

Answer 1


async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

installation:

$ pip install grequests

usage:

build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

send the stack

grequests.map(rs)

result looks like

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests doesn’t seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server.
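That said, a pool size can be passed to grequests.map via its size argument (a minimal sketch; to the best of my knowledge, size is the underlying gevent pool size):

import grequests

urls = ['http://httpbin.org/delay/1'] * 10
rs = (grequests.get(u) for u in urls)

# at most 3 requests in flight at a time
responses = grequests.map(rs, size=3)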


Answer 2


I tested both requests-futures and grequests. grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# the counters must exist before the loop increments them
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

Answer 3


Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don’t want to involve gevent, it’s a good option.


Answer 4


I have a lot of issues with most of the answers posted – they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic on the execution of the request, making it difficult to handle errors. If they do not fall into one of the above categories, they’re 3rd party libraries or deprecated.

Some of the solutions work alright purely for HTTP requests, but fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, as well as providing enough flexibility for complex and use-case-specific error handling.

import asyncio
import requests

loop = asyncio.get_event_loop()

def do_thing(params):
    async def get_rpc_info_and_do_chores(id):
        # do things; perform_grpc_call and do_chores are placeholders
        response = perform_grpc_call(id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things; URL is a placeholder
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

How it works is simple. You create a series of tasks you’d like to run asynchronously, and then ask a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, and no missing functionality.


Answer 5


I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

The docs are here: http://pythonhosted.org/simple-requests/


Answer 6

from threading import Thread
import urllib2

# `requests` here is a list of request URIs, not the requests library
threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...

Answer 7


If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async
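A minimal sketch, assuming requests-async keeps the familiar requests-style API behind async/await:

import requests_async as requests

async def fetch():
    # same call shape as requests.get, but awaitable
    response = await requests.get('http://example.com')
    print(response.status_code)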


Answer 8


I have been using python requests for async calls against github’s gist API for some time.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.


Answer 9


You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

If you want a functional syntax, the gamla lib wraps this into get_async.

Then you can do


await gamla.map(gamla.get_async(10), ["http://google.com", "http://wikipedia.org"])

The 10 is the timeout in seconds.

(disclaimer: I am its author)


Answer 10


I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted.

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html
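For flavor, a minimal sketch in the classic Python 2 / old Twisted style the linked post uses, with getPage returning a Deferred (getPage is deprecated in modern Twisted; the URL is a placeholder):

from twisted.internet import reactor
from twisted.web.client import getPage

def print_body(body):
    print body

d = getPage('http://example.com')
d.addCallback(print_body)
# stop the reactor whether the request succeeded or failed
d.addBoth(lambda _: reactor.stop())
reactor.run()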