Question: Sending a "User-agent" using the Requests library in Python

I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure if it is okay to send this as part of the headers, as in the code below:

debug = {'verbose': sys.stderr}
user_agent = {'User-agent': 'Mozilla/5.0'}
response  = requests.get(url, headers = user_agent, config=debug)

The debug information isn’t showing the headers being sent during the request.

Is it acceptable to send this information in the header? If not, how can I send it?


Answer 0

The user-agent should be specified as a field in the header.

Here is a list of HTTP header fields; you’d probably be interested in the request-specific fields, which include User-Agent.

If you’re using requests v2.13 and newer

The simplest way to do what you want is to create a dictionary and specify your headers directly, like so:

import requests

url = 'SOME URL'

headers = {
    'User-Agent': 'My User Agent 1.0',
    'From': 'youremail@domain.com'  # This is another valid field
}

response = requests.get(url, headers=headers)

If you’re using requests v2.12.x and older

Older versions of requests clobbered default headers, so you’d want to do the following to preserve default headers and then add your own to them.

import requests

url = 'SOME URL'

# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()

# Update the headers with your custom ones
# You don't have to worry about case-sensitivity with
# the dictionary keys, because default_headers uses a custom
# CaseInsensitiveDict implementation within requests' source code.
headers.update(
    {
        'User-Agent': 'My User Agent 1.0',
    }
)

response = requests.get(url, headers=headers)
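The question also mentions that the debug output does not show the outgoing headers. One way to check what would be sent, without touching the network, is to build a prepared request and inspect its headers. This is only a minimal sketch: the URL is a placeholder chosen for illustration (it is not from the original), and the header value is reused from the examples above.

```python
import requests

url = 'https://example.com/'  # placeholder URL, an assumption for illustration

# Start from the library defaults and override the User-Agent
headers = requests.utils.default_headers()
headers.update({'User-Agent': 'My User Agent 1.0'})

# Build the request without sending it, then look at the final headers
prepared = requests.Request('GET', url, headers=headers).prepare()

# Header lookups are case-insensitive, so both spellings find the same value
print(prepared.headers['User-Agent'])
print(prepared.headers['user-agent'])
```

After a real call, the same information should be available on the response object as response.request.headers.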

Answer 1

It’s more convenient to use a session; this way you don’t have to remember to set headers each time:

session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})

session.get('https://httpbin.org/headers')

By default, a session also manages cookies for you. In case you want to disable that, see this question.
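To see how session-level headers combine with headers passed for a single request, the request can be prepared through the session and inspected before anything is sent. This is a rough sketch rather than a definitive recipe; the extra 'From' header is borrowed from the first answer purely for illustration.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})

# Headers given for one request are merged on top of the session's headers
req = requests.Request(
    'GET',
    'https://httpbin.org/headers',
    headers={'From': 'youremail@domain.com'},
)
prepared = session.prepare_request(req)  # nothing is sent here

print(prepared.headers['User-Agent'])
print(prepared.headers['From'])
```

Passing prepared to session.send() would then perform the actual request with exactly these headers.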

