Tag archive: selenium-webdriver

Getting the HTML source of a WebElement in Selenium WebDriver using Python

Question: Getting the HTML source of a WebElement in Selenium WebDriver using Python

I’m using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with…

wd.page_source

But is there any way to get the “element source”?

elem.source   # <-- returns the HTML as a string

The Selenium WebDriver docs for Python are basically non-existent, and I don’t see anything in the code that seems to enable that functionality.

Any thoughts on the best way to access the HTML of an element (and its children)?


Answer 0

You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JS:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

Tested and works with the ChromeDriver.
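
As a minimal Python sketch of the attribute approach above: the helper name `element_html` and its `include_self` flag are hypothetical, not part of Selenium. It only assumes the object passed in exposes `get_attribute`, as a Selenium WebElement does.

```python
def element_html(element, include_self=True):
    """Return an element's markup as a string.

    `element` is assumed to be a Selenium WebElement; anything exposing
    get_attribute(name) works. The helper itself is hypothetical.
    """
    # outerHTML includes the element's own tag; innerHTML only its contents.
    name = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(name)
```

With the `elem` from the question, this would be called as `element_html(elem)` for the element plus its children, or `element_html(elem, include_self=False)` for the children only.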


Answer 1

There is not really a straightforward way of getting the HTML source code of a WebElement. You will have to use JS. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element); 

Answer 2

Sure, we can get all the HTML source code with the script below in Selenium Python:

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

If you want to save it to a file:

# Text mode with an explicit encoding; write() takes the str directly
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

I suggest saving to a file because the source code can be very long.
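
On Python 3, a file opened in text mode expects a `str` rather than bytes, so one option is to let the file layer do the encoding. A minimal sketch, assuming a hypothetical helper name `save_html` (not from the answer):

```python
from pathlib import Path


def save_html(source_code, path):
    """Write an HTML string to `path` as UTF-8 and return the path written."""
    target = Path(path)
    # write_text opens the file in text mode and encodes the str as UTF-8.
    target.write_text(source_code, encoding="utf-8")
    return target
```

It would be called as `save_html(source_code, 'c:/html_source_code.html')` with the string obtained from `get_attribute("outerHTML")`.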


Answer 3

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that returns the entire page source.


Answer 4

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).

The same concept applies if you want to get other attributes tied to the element. For example, if I wanted the String of an element, I would use element.attribute(String).


Answer 5

This question looks dated, but I'll leave this here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both work for me (selenium-server-standalone-2.35.0).


Answer 6

Java with Selenium 2.53.0

driver.getPageSource();

Answer 7

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

It describes this Java method:

java.lang.String    getText() 

But unfortunately it’s not available in Python. So you can translate the method names from Java to Python and try a different approach using the existing methods, without getting the whole page source…

E.g.

 my_id = elem[0].get_attribute('my-id')

Answer 8

This works seamlessly for me.

element.get_attribute('innerHTML')

Answer 9

innerHTML returns the markup inside the selected element, while outerHTML returns that markup together with the selected element itself.

Example: suppose your element is as follows:

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML output:

<td>A</td><td>B</td>

outerHTML output:

<tr id="myRow"><td>A</td><td>B</td></tr>
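
The distinction can also be reproduced without a browser. The sketch below is only an illustration using Python's standard `xml.etree.ElementTree` as a stand-in, not Selenium or a real DOM; the function names `inner_markup` and `outer_markup` are hypothetical, and it works here because the sample row happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def outer_markup(elem):
    # Serialize the element together with its own tag (the outerHTML analogue).
    return ET.tostring(elem, encoding="unicode")


def inner_markup(elem):
    # Leading text plus each serialized child (the innerHTML analogue).
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')
print(inner_markup(row))
print(outer_markup(row))
```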

Live example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below is the syntax for the different bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want the whole page's HTML, use the code below:

driver.getPageSource();

Answer 10

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);

This code really works to get JavaScript from source as well!


Answer 11

In a PHPUnit Selenium test, it looks like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');

Answer 12

If you are interested in a solution for Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

Answer 13

The method I prefer for getting the rendered HTML is the following:

driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

However, the above method removes all the tags (including nested ones) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.

# The Python bindings spell this get_attribute, not getAttribute
print(body_html.get_attribute("innerHTML"))