Question: How do I "log in" to a website using Python's Requests module?
I am trying to post a request to log in to a website using the Requests module in Python, but it's not really working. I'm new to this… so I can't figure out if I should make my username and password cookies or use some type of HTTP authorization thing I found (??).
from pyquery import PyQuery
import requests
url = 'http://www.locationary.com/home/index2.jsp'
So now, I think I’m supposed to use “post” and cookies….
ck = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
r = requests.post(url, cookies=ck)
content = r.text
q = PyQuery(content)
title = q("title").text()
print(title)
I have a feeling that I’m doing the cookies thing wrong…I don’t know.
If it doesn’t log in correctly, the title of the home page should come out to “Locationary.com” and if it does, it should be “Home Page.”
If you could maybe explain a few things about requests and cookies to me and help me out with this, I would greatly appreciate it. :D
Thanks.
…It still didn’t really work yet. Okay…so this is what the home page HTML says before you log in:
</td><td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_email.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="text" name="inUserName" id="inUserName" size="25"></td>
<td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_password.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="password" name="inUserPass" id="inUserPass"></td>
So I think I’m doing it right, but the output is still “Locationary.com”
2nd EDIT:
I want to be able to stay logged in for a long time and whenever I request a page under that domain, I want the content to show up as if I were logged in.
Answer 0
If the information you want is on the page you are directed to immediately after login…
Let's call your ck variable payload instead, like in the python-requests docs:
payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)
Otherwise…
See https://stackoverflow.com/a/17633072/111362 below.
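To see why the original cookies=ck attempt does not act like submitting the form, it may help to build both requests without sending them and compare what each would put on the wire. This is only a sketch: the names as_cookies and as_form are made up for illustration, and the credentials are the placeholders from the question.

```python
import requests

url = 'http://www.locationary.com/home/index2.jsp'
fields = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}

# Build (but do not send) both variants so they can be inspected offline.
as_cookies = requests.Request('POST', url, cookies=fields).prepare()
as_form = requests.Request('POST', url, data=fields).prepare()

# cookies= puts the values in a Cookie header and leaves the body empty.
print(as_cookies.headers.get('Cookie'))
print(as_cookies.body)

# data= sends them as a URL-encoded form body, which is what a login form does.
print(as_form.headers.get('Content-Type'))
print(as_form.body)
```

The server reads the login fields from the form body, so only the data= variant resembles what the browser sends when the form is submitted.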
Answer 1
I know you’ve found another solution, but for those like me who find this question, looking for the same thing, it can be achieved with requests as follows:
Firstly, as Marcus did, check the source of the login form to get three pieces of information – the url that the form posts to, and the name attributes of the username and password fields. In his example, they are inUserName and inUserPass.
Once you've got that, you can use a requests.Session() instance to make a POST request to the login URL with your login details as a payload. Making requests from a session instance is essentially the same as using requests normally; it simply adds persistence, allowing you to store and use cookies etc.
Assuming your login attempt was successful, you can simply use the session instance to make further requests to the site. The cookie that identifies you will be used to authorise the requests.
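The persistence described above can be observed without touching the network. In this rough sketch the cookie name JSESSIONID and its value are invented stand-ins for whatever the real site sets after a successful login; the point is only that a cookie stored on the session is attached to later requests automatically.

```python
import requests

s = requests.Session()

# Stand-in for a cookie the server would set on login (name and value are made up).
s.cookies.set('JSESSIONID', 'abc123', domain='www.locationary.com', path='/')

# Prepare a later request through the session, without sending it.
req = requests.Request('GET', 'http://www.locationary.com/home/index2.jsp')
prepared = s.prepare_request(req)

# The session has added its stored cookie to the outgoing headers.
print(prepared.headers.get('Cookie'))
```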
Example
import requests

# Fill in your details here to be posted to the login form.
payload = {
    'inUserName': 'username',
    'inUserPass': 'password'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('LOGIN_URL', data=payload)
    # Print the HTML returned, or something more intelligent, to see if it's a successful login page.
    print(p.text)

    # An authorised request.
    r = s.get('A protected web page url')
    print(r.text)
    # etc...
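For the "something more intelligent" check, one option is to look at the page title, since the question says the title is "Home Page" after a successful login and "Locationary.com" otherwise. The helper names below (TitleParser, page_title, looks_logged_in) are hypothetical, and the sketch uses the standard library's html.parser instead of PyQuery so it has no extra dependencies.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def looks_logged_in(html):
    # Assumption taken from the question: the title is "Home Page" once logged in.
    return page_title(html) == 'Home Page'


print(looks_logged_in('<html><head><title>Home Page</title></head></html>'))
print(looks_logged_in('<html><head><title>Locationary.com</title></head></html>'))
```

Inside the with block, looks_logged_in(p.text) could replace the plain print of the response body.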
Answer 2
Let me try to make it simple. Suppose the URL of the site is http://example.com/ and you need to sign in by filling in a username and password. Go to the login page, say http://example.com/login.php, view its source code and search for the action URL. It will be in the form tag, something like
<form name="loginform" method="post" action="userinfo.php">
Now take userinfo.php and make the absolute URL, which will be 'http://example.com/userinfo.php', then run a simple Python script:
import requests
url = 'http://example.com/userinfo.php'
values = {'username': 'user',
          'password': 'pass'}
r = requests.post(url, data=values)
print(r.content)
I hope that this helps someone somewhere someday.
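The step of turning the form's action attribute into an absolute URL can also be done in code rather than by hand. A small sketch using the standard library's urljoin, with the example.com values from this answer (the variable names are only illustrative):

```python
from urllib.parse import urljoin

# The page the login form was found on, and the form's action attribute.
login_page = 'http://example.com/login.php'
action = 'userinfo.php'

# Resolve the action relative to the page, the same way a browser would.
post_url = urljoin(login_page, action)
print(post_url)
```

This also copes with actions that start with a slash or are already absolute, so the result can be passed straight to requests.post.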
Answer 3
Find out the names of the inputs used on the website's form for the username <...name=username.../> and password <...name=password../>, and replace them in the script below. Also replace the URL to point at the desired site to log into.
login.py
#!/usr/bin/env python
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
payload = { 'username': 'user@email.com', 'password': 'blahblahsecretpassw0rd' }
url = 'https://website.com/login.html'
requests.post(url, data=payload, verify=False)
The use of disable_warnings(InsecureRequestWarning) silences the InsecureRequestWarning that the script would otherwise print when, with verify=False, it logs into sites with unverified SSL certificates.
Extra:
To run this script from the command line on a UNIX-based system, place it in a directory, e.g. ~/home/scripts, and add this directory to your path in ~/.bash_profile or a similar file used by the terminal.
# Custom scripts
export CUSTOM_SCRIPTS="$HOME/home/scripts"
export PATH=$CUSTOM_SCRIPTS:$PATH
Then create a link to this python script inside home/scripts/login.py
ln -s ~/home/scripts/login.py ~/home/scripts/login
Close your terminal, start a new one, run login
Answer 4
The requests.Session() solution assisted with logging into a form with CSRF protection (as used in Flask-WTF forms). Check if a csrf_token is required as a hidden field and add it to the payload with the username and password:
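import requests
from bs4 import BeautifulSoup

# Placeholder: base URL of the site being logged into (not given in the original).
server_name = 'http://example.com'

payload = {
    'email': 'email@example.com',
    'password': 'passw0rd'
}

with requests.Session() as sess:
    # Fetch the sign-in page first so the session gets its cookie and the CSRF token.
    res = sess.get(server_name + '/signin')
    signin = BeautifulSoup(res.text, 'html.parser')
    payload['csrf_token'] = signin.find('input', id='csrf_token')['value']
    res = sess.post(server_name + '/auth/login', data=payload)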
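Some forms carry more than one hidden field, so it can be handy to copy all of them into the payload rather than looking up csrf_token by id. The sketch below is an illustration only: it swaps BeautifulSoup for the standard library's html.parser, the HiddenInputParser and hidden_fields names are hypothetical, and the sample HTML is invented to stand in for a fetched sign-in page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Invented sample standing in for the HTML of a fetched sign-in page.
sample = '''
<form method="post" action="/auth/login">
  <input id="csrf_token" name="csrf_token" type="hidden" value="token-from-server">
  <input name="email" type="text">
  <input name="password" type="password">
</form>
'''

payload = {'email': 'email@example.com', 'password': 'passw0rd'}
payload.update(hidden_fields(sample))
print(payload)
```

In the session code above, hidden_fields(res.text) would take the place of the sample string.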