Python Selenium访问HTML源-Python 实用宝典

问题：Python Selenium访问HTML源

如何使用Selenium模块和Python在变量中获取HTML源代码？

我想做这样的事情：

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    # Do something
else:
    # Do something else

我怎样才能做到这一点？我不知道如何访问HTML源。

How can I get the HTML source in a variable using the Selenium module with Python?

I wanted to do something like this:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    # Do something
else:
    # Do something else

How can I do this? I don’t know how to access the HTML source.

回答 0

您需要访问page_source属性：

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    # do something
else:
    # do something else

You need to access the page_source property:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    # do something
else:
    # do something else

回答 1

借助Selenium2Library，您可以使用 get_source()

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("localhost:7080", "firefox")
source = s.get_source()

With Selenium2Library you can use get_source()

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("localhost:7080", "firefox")
source = s.get_source()

回答 2

driver.page_source将帮助您获取页面源代码。您可以检查页面源中是否存在文本。

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("some url")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

如果要将页面源存储在变量中，请在driver.get之后添加以下行：

var_pgsource=driver.page_source

并将if条件更改为：

if "your text here" in var_pgsource:

driver.page_source will help you get the page source code. You can check if the text is present in the page source or not.

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("some url")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

If you want to store the page source in a variable, add below line after driver.get:

var_pgsource=driver.page_source

and change the if condition to:

if "your text here" in var_pgsource:

回答 3

通过使用页面源，您将获得完整的HTML代码。
因此，首先确定需要检索数据或单击元素的代码或标记块。

options = driver.find_elements_by_name_("XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()

您可以按名称，XPath，ID，链接和CSS路径找到元素。

By using the page source you will get the whole HTML code.
So first decide the block of code or tag in which you require to retrieve the data or to click the element..

options = driver.find_elements_by_name_("XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()

You can find the elements by name, XPath, id, link and CSS path.

回答 4

要回答有关获取用于urllib 的URL的问题，只需执行以下JavaScript代码：

url = browser.execute_script("return window.location;")

To answer your question about getting the URL to use for urllib, just execute this JavaScript code:

url = browser.execute_script("return window.location;")

回答 5

您可以简单地使用该WebDriver对象，并通过其@property字段访问页面源代码page_source…

试试这个代码片段:-)

from selenium import webdriver
driver = webdriver.Firefox('path/to/executable')
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')

You can simply use the WebDriver object, and access to the page source code via its @property field page_source…

Try this code snippet :-)

from selenium import webdriver
driver = webdriver.Firefox('path/to/executable')
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')

回答 6

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')

现在您可以应用BeautifulSoup函数来提取数据…

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')

Now you can apply BeautifulSoup function to extract data…

回答 7

我建议使用urllib获取源代码，如果要解析，请使用Beautiful Soup之类的东西。

import urllib

url = urllib.urlopen("http://example.com") # Open the URL.
content = url.readlines() # Read the source and save it to a variable.

I’d recommend getting the source with urllib and, if you’re going to parse, use something like Beautiful Soup.

import urllib

url = urllib.urlopen("http://example.com") # Open the URL.
content = url.readlines() # Read the source and save it to a variable.

声明：本站所有文章，如无特殊说明或标注，均为本站原创发布。任何个人或组织，在未征得本站同意时，禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益，可联系我们进行处理。

Python Selenium访问HTML源

问题：Python Selenium访问HTML源

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

Virgilio-您的数据科学E-Learning新导师

如何从数据框的单元格获取值？

Python 来算算一线城市的二手房价格指数相关性

IPython 官方存储库包含网站、文档构建等内容

pip安装：请检查该目录的权限和所有者

‘setdefault’dict方法的用例

Python Selenium访问HTML源

问题：Python Selenium访问HTML源

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

相关文章

排行榜展示

文章展示