如何使用Python登录网页并检索Cookie以供以后使用?

问题:如何使用Python登录网页并检索Cookie以供以后使用?

我想使用python下载和解析网页,但是要访问它,我需要设置一些cookie。因此,我需要先通过https登录到网页。登录时刻需要将两个POST参数(用户名,密码)发送到/login.php。在登录请求期间,我想从响应头中检索cookie并将其存储,以便可以在请求中使用它们来下载网页/data.php。

我将如何在python(最好是2.6)中做到这一点?如果可能,我只想使用内置模块。

I want to download and parse webpage using python, but to access it I need a couple of cookies set. Therefore I need to login over https to the webpage first. The login moment involves sending two POST params (username, password) to /login.php. During the login request I want to retrieve the cookies from the response header and store them so I can use them in the request to download the webpage /data.php.

How would I do this in python (preferably 2.6)? If possible I only want to use builtin modules.


回答 0

import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()

resp.read()是您要打开的页面的纯HTML,您可以使用opener会话cookie查看任何页面。

import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()

resp.read() is the straight html of the page you want to open, and you can use opener to view any page using your session cookie.


回答 1

这是使用优秀请求库的版本:

from requests import session

payload = {
    'action': 'login',
    'username': USERNAME,
    'password': PASSWORD
}

with session() as c:
    c.post('http://example.com/login.php', data=payload)
    response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)

Here’s a version using the excellent requests library:

from requests import session

payload = {
    'action': 'login',
    'username': USERNAME,
    'password': PASSWORD
}

with session() as c:
    c.post('http://example.com/login.php', data=payload)
    response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)