Tag Archives: http-request

How do I send cookies in a POST request with the Python Requests library?

Question: How do I send cookies in a POST request with the Python Requests library?

I’m trying to use the Requests library to send cookies with a post request, but I’m not sure how to actually set up the cookies based on its documentation. The script is for use on Wikipedia, and the cookie(s) that need to be sent are of this form:

enwiki_session=17ab96bd8ffbe8ca58a78657a918558e; path=/; domain=.wikipedia.com; HttpOnly

However, the requests documentation quickstart gives this as the only example:

cookies = dict(cookies_are='working')

How can I encode a cookie like the one above using this library? Do I need to make it with Python’s standard cookie library, then send it along with the POST request?


Answer 0

The latest release of Requests will build CookieJars for you from simple dictionaries.

import requests

cookies = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}

r = requests.post('http://wikipedia.org', cookies=cookies)

Enjoy :)
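If you also need to attach the path and domain attributes shown in the question, Requests provides a RequestsCookieJar you can fill yourself. A minimal sketch (the domain value here is an assumption; adjust it to the site you are targeting):

import requests

# Build a cookie jar so the cookie carries explicit domain/path attributes.
jar = requests.cookies.RequestsCookieJar()
jar.set('enwiki_session', '17ab96bd8ffbe8ca58a78657a918558e',
        domain='.wikipedia.org', path='/')

r = requests.post('http://wikipedia.org', cookies=jar)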


Answer 1

Just to extend on the previous answer, if you are linking two requests together and want to send the cookies returned from the first one to the second one (for example, maintaining a session alive across requests) you can do:

import requests
r1 = requests.post('http://www.yourapp.com/login')
r2 = requests.post('http://www.yourapp.com/somepage', cookies=r1.cookies)
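If the two requests are part of one logical session, another option (not from the original answer) is to let a requests.Session carry the cookies for you:

import requests

# A Session stores cookies set by earlier responses and
# sends them automatically on later requests.
s = requests.Session()
s.post('http://www.yourapp.com/login')
r2 = s.post('http://www.yourapp.com/somepage')  # login cookies are sent automatically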

Answer 2

If you want to pass the cookie on to the browser, you have to append it to the headers that are sent back. If you’re using WSGI:

import requests
...


def application(environ, start_response):
    # The Set-Cookie header value must be a string, not a dict
    cookie = 'enwiki_session=17ab96bd8ffbe8ca58a78657a918558; Path=/; HttpOnly'
    response_headers = [('Content-type', 'text/plain')]
    response_headers.append(('Set-Cookie', cookie))
    start_response('200 OK', response_headers)
...

    return [bytes(post_env)]

By passing an auth user/password to my Python WSGI script and passing the cookies on to the browser, I am able to authenticate successfully with Bugzilla and TWiki hosted on the same domain my script runs on. This lets me open the Bugzilla and TWiki pages in the same browser and be authenticated. I’m trying to do the same with SuiteCRM, but I’m having trouble getting SuiteCRM to accept the session cookies obtained from the Python script, even though it authenticates successfully.
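For the first half of that flow (logging in with Requests and handing the resulting cookies back to the browser), a minimal sketch might look like the following; the login URL and form field names are placeholders for whatever Bugzilla/TWiki expects, not something taken from the original answer:

import requests

# Log in with Requests; the cookies returned by the server end up in r.cookies.
r = requests.post('https://bugzilla.example.com/index.cgi',
                  data={'Bugzilla_login': 'user', 'Bugzilla_password': 'secret'})

# Turn each cookie into a Set-Cookie header for the WSGI response.
set_cookie_headers = [('Set-Cookie', '%s=%s; Path=/' % (c.name, c.value))
                      for c in r.cookies]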


Proxies with the Python ‘Requests’ module

Question: Proxies with the Python ‘Requests’ module

Just a short, simple one about the excellent Requests module for Python.

I can’t seem to find in the documentation what the variable ‘proxies’ should contain. When I send it a dict with a standard “IP:PORT” value, it rejects it, asking for 2 values. So I guess (because this doesn’t seem to be covered in the docs) that the first value is the IP and the second the port?

The docs mention this only:

proxies – (optional) Dictionary mapping protocol to the URL of the proxy.

So I tried this… what should I be doing?

proxy = { ip: port}

and should I convert these to some type before putting them in the dict?

r = requests.get(url,headers=headers,proxies=proxy)

Answer 0

The proxies dict syntax is {"protocol": "ip:port", ...}. With it you can specify different (or the same) proxies for requests using the http, https, and ftp protocols:

http_proxy  = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxyDict = { 
              "http"  : http_proxy, 
              "https" : https_proxy, 
              "ftp"   : ftp_proxy
            }

r = requests.get(url, headers=headers, proxies=proxyDict)

Deduced from the requests documentation:

Parameters:
method – method for the new Request object.
url – URL for the new Request object.

proxies – (optional) Dictionary mapping protocol to the URL of the proxy.


On linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables:

export HTTP_PROXY=10.10.1.10:3128
export HTTPS_PROXY=10.10.1.11:1080
export FTP_PROXY=10.10.1.10:3128

On Windows:

set http_proxy=10.10.1.10:3128
set https_proxy=10.10.1.11:1080
set ftp_proxy=10.10.1.10:3128

Thanks to Jay for pointing this out:
The syntax changed with requests 2.0.0.
You’ll need to add a scheme to the URL: https://2.python-requests.org/en/latest/user/advanced/#proxies


Answer 1

I have found that urllib has some really good code to pick up the system’s proxy settings and they happen to be in the correct form to use directly. You can use this like:

import urllib.request
import requests

...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())

It works really well and urllib knows about getting Mac OS X and Windows settings as well.
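If you want every request to reuse those system settings, one option (not part of the original answer) is to copy them onto a Session:

import urllib.request
import requests

s = requests.Session()
# Reuse the system proxy settings for every request made through this session.
s.proxies.update(urllib.request.getproxies())

r = s.get('http://example.org')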


Answer 2

You can refer to the proxy documentation here.

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "https://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

To use HTTP Basic Auth with your proxy, use the http://user:password@host.com/ syntax:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"
}
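Requests can also talk to SOCKS proxies using the same proxies dict with a socks5 scheme; note this needs the optional SOCKS extra installed (pip install requests[socks]), and the address below is only an example:

import requests

# Requires: pip install requests[socks]
proxies = {
    "http": "socks5://user:pass@10.10.1.10:1080",
    "https": "socks5://user:pass@10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)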

Answer 3

The accepted answer was a good start for me, but I kept getting the following error:

AssertionError: Not supported proxy scheme None

The fix was to specify the http:// scheme in the proxy URL, thus:

http_proxy  = "http://194.62.145.248:8080"
https_proxy  = "https://194.62.145.248:8080"
ftp_proxy   = "10.10.1.10:3128"

proxyDict = {
              "http"  : http_proxy,
              "https" : https_proxy,
              "ftp"   : ftp_proxy
            }

I’d be interested to know why the original works for some people but not for me.

Edit: I see the main answer is now updated to reflect this :)


Answer 4

If you’d like to persist cookies and session data, you’d best do it like this:

import requests

proxies = {
    'http': 'http://user:pass@10.10.1.0:3128',
    'https': 'https://user:pass@10.10.1.0:3128',
}

# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies

# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
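Proxies set on the session apply to every request it makes. If a single call needs a different proxy, a per-request proxies argument still takes precedence; a small sketch (the addresses are examples):

import requests

s = requests.Session()
s.proxies = {'http': 'http://user:pass@10.10.1.0:3128'}

# The per-request proxies argument overrides the session-level setting for this call.
r = s.get('http://www.showmemyip.com/',
          proxies={'http': 'http://user:pass@10.10.2.0:3128'})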

Answer 5

8 years late. But I like:

import os
import requests

os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['NO_PROXY'] = os.environ['no_proxy'] = '127.0.0.1,localhost,.local'

r = requests.get('https://example.com')  # , verify=False
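Requests only picks those variables up because trust_env is enabled by default on sessions; if you ever need the opposite (ignore any proxy environment variables), a short sketch:

import requests

s = requests.Session()
s.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY/NO_PROXY (and .netrc) for this session

r = s.get('https://example.com')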

Answer 6

Here is my basic Python class for the requests module, with some proxy configuration and a stopwatch!

import requests
import time
class BaseCheck():
    def __init__(self, url):
        self.http_proxy  = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy   = "http://user:pw@proxy:8080"
        self.proxyDict = {
                      "http"  : self.http_proxy,
                      "https" : self.https_proxy,
                      "ftp"   : self.ftp_proxy
                    }
        self.url = url
        def makearr(tsteps):
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = { 'start': 0, 'end': 0 }
            steps = tsteps
        makearr(['init','check'])
        def starttime(typ = ""):
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()
    def __str__(self):
        return str(self.url)
    def getrequests(self):
        g = requests.get(self.url, proxies=self.proxyDict)
        print(g.status_code)
        print(g.content)
        print(self.url)
        stemps['init']['end'] = time.time()
        # print(stemps['init']['end'] - stemps['init']['start'])
        x = stemps['init']['end'] - stemps['init']['start']
        print(x)


test=BaseCheck(url='http://google.com')
test.getrequests()

Answer 7

I just made a proxy grabber that can also connect with the same grabbed proxy without any input. Here it is:

#Import Modules

from termcolor import colored
from selenium import webdriver
import requests
import os
import sys
import time

#Proxy Grab

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)
driver.get("https://www.sslproxies.org/")
tbody = driver.find_element_by_tag_name("tbody")
cell = tbody.find_elements_by_tag_name("tr")
for column in cell:

        column = column.text.split(" ")
        print(colored(column[0]+":"+column[1],'yellow'))
driver.quit()
print("")

os.system('clear')
os.system('cls')

#Proxy Connection

print(colored('Getting Proxies from graber...','green'))
time.sleep(2)
os.system('clear')
os.system('cls')
proxy = {"http": "http://"+ column[0]+":"+column[1]}
url = 'https://mobile.facebook.com/login'
r = requests.get(url,  proxies=proxy)
print("")
print(colored('Connecting using proxy' ,'green'))
print("")
sts = r.status_code

Answer 8

It’s a bit late, but here is a wrapper class that simplifies scraping proxies and then making an HTTP POST or GET:

ProxyRequests

https://github.com/rootVIII/proxy_requests

Answer 9

Here is some code showing how to fetch proxies from the site “https://free-proxy-list.net” and store the data in a file compatible with tools like “Elite Proxy Switcher” (format IP:PORT):

##PROXY_UPDATER – get free proxies from https://free-proxy-list.net/

from lxml.html import fromstring
import requests
from itertools import cycle
import traceback
import re

######################FIND PROXIES#########################################
def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url)
    parser = fromstring(response.text)
    proxies = set()
    for i in parser.xpath('//tbody/tr')[:299]:   #299 proxies max
        proxy = ":".join([i.xpath('.//td[1]/text()') 
        [0],i.xpath('.//td[2]/text()')[0]])
        proxies.add(proxy)
    return proxies



######################write to file in format   IP:PORT######################
try:
    proxies = get_proxies()
    f=open('proxy_list.txt','w')
    for proxy in proxies:
        f.write(proxy+'\n')
    f.close()
    print ("DONE")
except:
    print ("MAJOR ERROR")

Using headers with the get method of the Python Requests library

Question: Using headers with the get method of the Python Requests library

So I recently stumbled upon this great library for handling HTTP requests in Python; found here http://docs.python-requests.org/en/latest/index.html.

I love working with it, but I can’t figure out how to add headers to my get requests. Help?


Answer 0

According to the API, the headers can all be passed in using requests.get:

r = requests.get("http://www.example.com/", headers={"content-type": "text"})
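In practice the headers dict usually carries things like a User-Agent or Accept header; a minimal sketch (the URL and values are just examples):

import requests

headers = {
    "User-Agent": "my-app/0.0.1",
    "Accept": "application/json",
}

r = requests.get("https://api.example.com/items", headers=headers)
print(r.status_code)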

Answer 1

Seems pretty straightforward, according to the docs on the page you linked (emphasis mine). A combined usage sketch follows the parameter list below.

requests.get(url, params=None, headers=None, cookies=None, auth=None, timeout=None)

Sends a GET request. Returns Response object.

Parameters:

  • url – URL for the new Request object.
  • params – (optional) Dictionary of GET Parameters to send with the Request.
  • headers – (optional) Dictionary of HTTP Headers to send with the Request.
  • cookies – (optional) CookieJar object to send with the Request.
  • auth – (optional) AuthObject to enable Basic HTTP Auth.
  • timeout – (optional) Float describing the timeout of the request.
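A sketch exercising several of those parameters together (the URL and values are placeholders; note that current Requests versions take a plain (user, password) tuple for auth rather than an AuthObject):

import requests

r = requests.get(
    "http://www.example.com/search",
    params={"q": "python"},                   # GET parameters
    headers={"Accept": "application/json"},   # HTTP headers
    cookies={"session": "abc123"},            # cookies
    auth=("user", "pass"),                    # basic HTTP auth
    timeout=5.0,                              # timeout in seconds
)
print(r.status_code)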

Answer 2

This answer taught me that you can set headers for an entire session:

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

Bonus: Sessions also handle cookies.