Question: Get the HTML source of a WebElement in Selenium WebDriver using Python
I’m using the Python bindings to run Selenium WebDriver:
from selenium import webdriver
wd = webdriver.Firefox()
I know I can grab a webelement like so:
elem = wd.find_element_by_css_selector('#my-id')
And I know I can get the full page source with…
wd.page_source
But is there any way to get the “element source”?
elem.source # <-- returns the HTML as a string
The Selenium WebDriver docs for Python are basically non-existent, and I don’t see anything in the code that seems to enable that functionality.
Any thoughts on the best way to access the HTML of an element (and its children)?
Answer 0
You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the element itself.
Python:
element.get_attribute('innerHTML')
Java:
elem.getAttribute("innerHTML");
C#:
element.GetAttribute("innerHTML");
Ruby:
element.attribute("innerHTML")
JS:
element.getAttribute('innerHTML');
PHP:
$element->getAttribute('innerHTML');
Tested and works with ChromeDriver.
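As a small illustration of the Python line above, the choice between the two attributes can be wrapped in a helper. This is only a sketch: the name element_html and its include_self flag are made up for this example, and the only Selenium call it relies on is the get_attribute method already shown in this answer.

```python
def element_html(element, include_self=False):
    """Return the markup of a Selenium WebElement as a string.

    include_self=False reads innerHTML (the element's children only).
    include_self=True reads outerHTML (the element's own tag plus children).
    The helper name and flag are hypothetical, chosen for this sketch.
    """
    attribute = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(attribute)


# Usage with a live WebElement `elem` (requires a running browser session):
#   element_html(elem)                     # children only
#   element_html(elem, include_self=True)  # including the element's own tag
```

The helper accepts any object that exposes get_attribute, so it does not need to import Selenium itself.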
Answer 1
There is no really straightforward way to get the HTML source of a WebElement; you will have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.
WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
Answer 2
Sure, we can get all of the HTML source code with the script below in Selenium Python:
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
If you want to save it to a file:
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    # In Python 3, open the file as UTF-8 text and write the string directly.
    f.write(source_code)
I suggest saving to a file because the source code is very long.
Answer 3
In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that returns the entire page source.
Answer 4
Using the attribute method is, in fact, easier and more straightforward.
Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).
The same concept applies if you want to get other attributes tied to the element. For example, if I wanted the String of an element, I would use element.attribute(String).
Answer 5
This looks outdated, but I'll leave it here anyway. The correct way to do it in your case:
elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)
or
html = elem.get_attribute('innerHTML')
Both work for me (selenium-server-standalone-2.35.0).
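Note that the snippet above targets Selenium 2.x. Selenium 4.x releases dropped the find_element_by_* helpers in favor of find_element(By.CSS_SELECTOR, ...), so on a current install the same idea might look like the sketch below. The function name element_inner_html is an assumption made for illustration; the string "css selector" is the value of Selenium's By.CSS_SELECTOR constant, used here directly so the function body needs no import.

```python
def element_inner_html(driver, css_selector):
    """Return the innerHTML of the first element matching css_selector.

    `driver` is expected to be a Selenium WebDriver instance. The function
    name is hypothetical, chosen for this sketch.
    """
    # "css selector" is what By.CSS_SELECTOR evaluates to; with Selenium
    # imported you would normally write driver.find_element(By.CSS_SELECTOR, ...).
    elem = driver.find_element("css selector", css_selector)
    # Run JavaScript in the page; the element is passed in as arguments[0].
    return driver.execute_script("return arguments[0].innerHTML;", elem)


# Usage (requires a running browser session):
#   from selenium import webdriver
#   wd = webdriver.Firefox()
#   wd.get("https://example.com")
#   html = element_inner_html(wd, "#my-id")
```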
Answer 6
Java with Selenium 2.53.0
driver.getPageSource();
Answer 7
Answer 8
This works seamlessly for me.
element.get_attribute('innerHTML')
Answer 9
innerHTML returns the markup inside the selected element, and outerHTML returns that markup together with the selected element itself.
Example:
Suppose your element is as below:
<tr id="myRow"><td>A</td><td>B</td></tr>
innerHTML output:
<td>A</td><td>B</td>
outerHTML output:
<tr id="myRow"><td>A</td><td>B</td></tr>
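The same distinction can be reproduced without a browser. The sketch below is only a stand-in: it parses the row as XML with Python's standard xml.etree.ElementTree rather than reading a live DOM through Selenium, and the helper names inner_html and outer_html are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def outer_html(node):
    """Serialize the node itself together with everything inside it."""
    return ET.tostring(node, encoding="unicode")


def inner_html(node):
    """Serialize only the node's contents: leading text plus each child."""
    children = "".join(ET.tostring(child, encoding="unicode") for child in node)
    return (node.text or "") + children


row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')
print(inner_html(row))  # <td>A</td><td>B</td>
print(outer_html(row))  # <tr id="myRow"><td>A</td><td>B</td></tr>
```

A real browser serializes its own DOM, so whitespace and attribute formatting from Selenium may differ from what an XML parser produces.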
Live example:
http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm
Below is the syntax required for the different bindings. Change innerHTML to outerHTML as required.
Python:
element.get_attribute('innerHTML')
Java:
elem.getAttribute("innerHTML");
If you want the whole page's HTML, use the code below (Java):
driver.getPageSource();
Answer 10
WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
This code also works for getting JavaScript from the source!
Answer 11
In a PHPUnit Selenium test it looks like this:
$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');
Answer 12
If you are interested in a solution for Remote Control in Python, here is how to get innerHTML:
innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")
Answer 13
The method I prefer for getting the rendered HTML is the following:
driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)
However, the above returns only the text content and strips all the tags (including nested ones). If you are interested in getting the HTML markup as well, use the method below.
print(body_html.get_attribute("innerHTML"))