Question: Is there a way to use PhantomJS in Python?
I want to use PhantomJS in Python. I googled the problem but couldn’t find a proper solution.
os.popen() looked like a good choice, but I couldn’t pass some arguments to it.
Using subprocess.Popen() may be a proper solution for now. I’d like to know whether there is a better one.
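One way around the argument problem described in the question is to hand subprocess a list instead of a single command string, so every argument reaches PhantomJS unmodified. The following is only a minimal sketch, assuming the phantomjs binary is on the PATH; the helper names build_phantomjs_command and run_phantomjs are illustrative and not part of any library.

```python
import subprocess


def build_phantomjs_command(script, *script_args, options=()):
    """Return an argv list: phantomjs [options] script [script args]."""
    return ["phantomjs", *options, script, *[str(a) for a in script_args]]


def run_phantomjs(script, *script_args, options=(), timeout=60):
    """Run a PhantomJS script and return its decoded standard output.

    Assumes the phantomjs binary is on PATH. Raises CalledProcessError
    if the script exits with a non-zero status.
    """
    command = build_phantomjs_command(script, *script_args, options=options)
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.decode("utf-8")
```

A call might look like run_phantomjs("render.js", "http://example.com", options=["--ignore-ssl-errors=true"]), where render.js is a hypothetical script of your own. Because the command is a list, no shell quoting is needed.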
Answer 0
The easiest way to use PhantomJS in Python is via Selenium. The simplest installation method is:
- Install NodeJS
- Install phantomjs using Node’s package manager:
npm -g install phantomjs-prebuilt
- Install selenium (in your virtualenv, if you are using one)
After installation, using PhantomJS is as simple as:
from selenium import webdriver
driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()
If your system path environment variable isn’t set correctly, you’ll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:
driver = webdriver.PhantomJS() # or add to your PATH
… with the following:
driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
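If you would rather not hard-code the path, one option is to look it up at run time. This is a rough sketch that first searches the PATH and then falls back to a list of locations you supply; the function name find_phantomjs and its extra_candidates parameter are my own invention, not a Selenium API.

```python
import os
import shutil


def find_phantomjs(extra_candidates=()):
    """Return a path to a phantomjs executable, or None if none is found."""
    # First choice: whatever the PATH environment variable resolves to.
    found = shutil.which("phantomjs")
    if found:
        return found
    # Fallback: explicit locations supplied by the caller.
    for candidate in extra_candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The result can then be passed as executable_path to webdriver.PhantomJS(), after checking that it is not None.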
Answer 1
PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.
A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:
from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    # pages are opened through the session returned by ghost.start()
    page, extra_resources = session.open("http://jeanphi.me")
    assert page.http_status == 200 and 'jeanphix' in session.content
Answer 2
Now that GhostDriver comes bundled with PhantomJS, it has become even more convenient to use it through Selenium.
I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess the standalone installation didn’t provide these features earlier, but as of v1.9 it very much does.
- Install PhantomJS (http://phantomjs.org/download.html). If you are on Linux, the following instructions will help: https://stackoverflow.com/a/14267295/382630
- Install Selenium using pip.
Now you can use it like this:
import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing
driver.quit()
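If the processing step raises an exception, driver.quit() is never reached and the PhantomJS process can be left running. One possible way to guard against that is a small context manager; the name quitting is made up for this sketch and is not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        # Runs on normal exit and when the with-body raises.
        driver.quit()


# Usage, assuming Selenium and PhantomJS are installed as described above:
#
#     import selenium.webdriver
#     with quitting(selenium.webdriver.PhantomJS()) as driver:
#         driver.get('http://google.com')
#         # do some processing
```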
Answer 3
Here’s how I test JavaScript using PhantomJS and Django:
mobile/test_no_js_errors.js:
var page = require('webpage').create(),
    system = require('system'),
    url = system.args[1],
    status_code;

page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log(' ', item.file, ':', item.line);
    });
};

page.onResourceReceived = function(resource) {
    if (resource.url == url) {
        status_code = resource.status;
    }
};

page.open(url, function (status) {
    if (status == "fail" || status_code != 200) {
        console.log("Error: " + status_code + " for url: " + url);
        phantom.exit(1);
    }
    phantom.exit(0);
});
mobile/tests.py:
import subprocess
from django.test import LiveServerTestCase

class MobileTest(LiveServerTestCase):
    def test_mobile_js(self):
        args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
        # check_output returns bytes on Python 3; b"" also equals "" on Python 2
        result = subprocess.check_output(args)
        self.assertEqual(result, b"")  # No output means no error
Run tests:
manage.py test mobile
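One thing to be aware of: because test_no_js_errors.js exits with status 1 on failure, subprocess.check_output raises CalledProcessError in that case, and the messages the script printed end up on the exception's output attribute instead of in result. If you want both the exit code and the text, something along these lines may help. It is only a sketch, and run_and_capture is a hypothetical helper name.

```python
import subprocess


def run_and_capture(args):
    """Run a command; return (exit_code, decoded stdout) without raising."""
    try:
        output = subprocess.check_output(args)
        return 0, output.decode("utf-8")
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the command printed before exiting.
        return exc.returncode, exc.output.decode("utf-8")
```

Inside the test you could then write code, output = run_and_capture(args) followed by self.assertEqual(code, 0, output), so the JavaScript error text shows up in the failure message.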
Answer 4
The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I’ve put here to save others time:
Install PhantomJS
As @Vivin-Paliath points out, it’s a standalone project, not part of Node.
Mac:
brew install phantomjs
Ubuntu:
sudo apt-get install phantomjs
etc.
Set up a virtualenv (if you haven’t already):
virtualenv mypy # doesn't have to be "mypy". Can be anything.
. mypy/bin/activate
If your machine has both Python 2 and 3, you may need to run virtualenv-3.6 mypy or similar.
Install selenium:
pip install selenium
Try a simple test, like this one borrowed from the docs:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.PhantomJS()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
Answer 5
This is what I do with Python 3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.
import subprocess

js_file = "<your js file for phantom>"  # path to your PhantomJS script
command = "phantomjs --ignore-ssl-errors=true " + js_file
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
    output, errors = process.communicate(timeout=30)
except subprocess.TimeoutExpired as e:
    print("\t\tException: %s" % e)
    process.kill()
    # reap the killed process and collect any partial output
    output, errors = process.communicate()
# output arrives as bytes; decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
    phantom_output += out_line.decode('utf-8')
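To run the same pattern over a whole list, the timeout handling can be wrapped in a function so that one slow site does not stop the batch. This is a sketch under my own assumptions: the names run_with_timeout and run_all are hypothetical, and commands are passed as argument lists rather than shell strings.

```python
import subprocess


def run_with_timeout(command, timeout):
    """Run command (an argv list); return decoded stdout, or None on timeout."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        output, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # reap the killed process
        return None
    return output.decode("utf-8")


def run_all(commands, timeout=30):
    """Run each named command in turn; timed-out entries map to None."""
    results = {}
    for name, command in commands.items():
        results[name] = run_with_timeout(command, timeout)
    return results
```

Each value in commands would be something like ["phantomjs", "--ignore-ssl-errors=true", script, url], with script and url being whatever you are processing.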
Answer 6
If using Anaconda, install with:
conda install PhantomJS
In your script:
from selenium import webdriver
driver = webdriver.PhantomJS()
Works perfectly.
Answer 7
If you are using Buildout, you can easily automate the installation process that Pykler describes using the gp.recipe.node recipe.
[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs
That part installs node.js as a binary (at least on my system) and then uses npm to install PhantomJS. Finally, it creates an entry point bin/phantomjs, which you can use to call the PhantomJS webdriver. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)
driver = webdriver.PhantomJS('bin/phantomjs')