Tag archive: automated-tests

How can I scroll a web page using Selenium WebDriver in Python?

Question: How can I scroll a web page using Selenium WebDriver in Python?

I am currently using Selenium WebDriver to parse a Facebook user's friends page and extract all IDs from the AJAX script. But I need to scroll down to load all the friends. How can I scroll down in Selenium? I am using Python.


Answer 0

You can use

driver.execute_script("window.scrollTo(0, Y)") 

where Y is the vertical position to scroll to, in pixels (on a Full HD monitor, 1080 is one screen height). (Thanks to @lukeis)

You can also use

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

to scroll to the bottom of the page.

If you want to scroll down a page with infinite loading, like social network sites such as Facebook (thanks to @Cuong Tran):

import time

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
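
One caveat with the loop above: on a feed that keeps loading forever it never exits. Below is a minimal sketch of the same loop wrapped in a function with a cap on the number of scrolls. The names scroll_to_bottom and max_rounds are made up for this illustration (they are not part of Selenium), and driver is assumed to be any object with an execute_script method.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed. max_rounds is a hypothetical
    safety cap added for this sketch.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume the bottom was reached
        last_height = new_height
    return rounds
```

Usage would look like scroll_to_bottom(driver, pause=1.0), with the pause tuned to how quickly the page loads new content.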

Another method (thanks to Juanse) is to select an element and send it a PAGE_DOWN key press (Keys comes from selenium.webdriver.common.keys):

label.send_keys(Keys.PAGE_DOWN)

Answer 1

If you want to scroll down to the bottom of an infinitely loading page (like linkedin.com), you can use this code:

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

Reference: https://stackoverflow.com/a/28928684/1316860


Answer 2

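You can use send_keys to simulate an END (or PAGE_DOWN) key press, which normally scrolls the page:

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# find_element(By.TAG_NAME, ...) also works on Selenium 4.3+, where the
# older driver.find_element_by_tag_name('html') spelling was removed.
html = driver.find_element(By.TAG_NAME, 'html')
html.send_keys(Keys.END)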

Answer 3

Same method as shown here:

In Python you can just use

driver.execute_script("window.scrollTo(0, Y)")

(Y is the vertical position you want to scroll to)


Answer 4

element = driver.find_element_by_xpath("xpath of the li you are trying to access")

# Reading this property scrolls the element into view as a side effect
element.location_once_scrolled_into_view

This helped when I was trying to access an 'li' that was not visible.


Answer 5

For my purpose, I wanted to scroll down further relative to the window's current position. My solution was similar and used window.scrollY:

driver.execute_script("window.scrollTo(0, window.scrollY + 200)")

This scrolls to the current vertical scroll position plus 200 pixels.
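
If a single jump is not enough, the same relative idea can be repeated in a loop. A rough sketch, assuming the page scrolls the main window; scroll_in_steps and its parameters are hypothetical names, and it uses the standard window.scrollBy instead of computing scrollY + 200 by hand:

```python
import time


def scroll_in_steps(driver, step=200, pause=0.1, max_steps=1000):
    """Scroll down `step` pixels at a time until the offset stops changing.

    Returns the number of scrollBy calls made. max_steps is a safety cap
    introduced for this sketch.
    """
    steps = 0
    last_y = driver.execute_script("return window.scrollY")
    while steps < max_steps:
        # Extra arguments to execute_script show up as arguments[0], ...
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        steps += 1
        time.sleep(pause)
        y = driver.execute_script("return window.scrollY")
        if y == last_y:
            break  # the offset did not move: bottom reached
        last_y = y
    return steps
```

Stepping like this is slower than jumping straight to document.body.scrollHeight, but it can help on pages that only load content when intermediate positions actually become visible.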


Answer 6

This is how you scroll down the webpage:

driver.execute_script("window.scrollTo(0, 1000);")

Answer 7

The easiest way I found to solve that problem was to select a label and then send it a PAGE_DOWN key press:

label.send_keys(Keys.PAGE_DOWN)

Hope it works!


Answer 8

None of these answers worked for me, at least not for scrolling down a Facebook search results page, but after a lot of testing I found this solution:

while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Text of the first <div> on the page
    divs = driver.find_element_by_tag_name('div').text
    if 'End of Results' in divs:
        print('end')
        break

Answer 9

When working with YouTube, the floating elements make document.body.scrollHeight return 0, so instead of “return document.body.scrollHeight” try “return document.documentElement.scrollHeight”. Adjust the scroll pause time to your internet speed; otherwise the loop will run only once and then break.

import time

SCROLL_PAUSE_TIME = 1

# Get scroll height.
# last_height = driver.execute_script("return document.body.scrollHeight")
# does not work here because of the floating web elements on YouTube.
last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0,document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        print("break")
        break
    last_height = new_height

Answer 10

I was looking for a way of scrolling through a dynamic webpage, and automatically stopping once the end of the page is reached, and found this thread.

The post by @Cuong Tran, with one main modification, was the answer that I was looking for. I thought that others might find the modification helpful (it has a pronounced effect on how the code works), hence this post.

The modification is to move the statement that captures the last page height inside the loop (so that each check is comparing to the previous page height).

So, the code below:

Continuously scrolls down a dynamic webpage (.scrollTo()), only stopping when, for one iteration, the page height stays the same.

(There is another modification, where the break statement is inside another condition (in case the page ‘sticks’) which can be removed).

    SCROLL_PAUSE_TIME = 0.5


    while True:

        # Get scroll height
        ### This is the difference. Moving this *inside* the loop
        ### means that it checks if scrollTo is still scrolling 
        last_height = driver.execute_script("return document.body.scrollHeight")

        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:

            # try again (can be removed)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)

            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight")

            # check if the page height has remained the same
            if new_height == last_height:
                # if so, you are done
                break
            # if not, move on to the next loop
            else:
                last_height = new_height
                continue

Answer 11

This code scrolls to the bottom but doesn’t require that you wait each time. It’ll continually scroll, and then stop at the bottom (or time out).

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://example.com')

pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
run_time, max_run_time = 0, 1
while True:
    iteration_start = time.time()
    # Scroll webpage, the 100 allows for a more 'aggressive' scroll
    driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')

    scrolled = post_scroll_height != pre_scroll_height
    timed_out = run_time >= max_run_time

    if scrolled:
        run_time = 0
        pre_scroll_height = post_scroll_height
    elif not scrolled and not timed_out:
        run_time += time.time() - iteration_start
    elif not scrolled and timed_out:
        break

# closing the driver is optional 
driver.close()

This is much faster than waiting 0.5-3 seconds each time for a response when that response could take as little as 0.1 seconds.
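
The same idea of not waiting longer than needed can also be written as a small polling helper. This is only a sketch under assumptions: wait_for_growth and scroll_until_stable are hypothetical names, and the timeout and poll values are guesses to tune per site.

```python
import time


def wait_for_growth(driver, old_height, timeout=1.0, poll=0.05):
    """Poll the page height until it exceeds old_height or timeout expires.

    Returns the new height, or None if the page did not grow in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight")
        if height > old_height:
            return height
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scroll_until_stable(driver, timeout=1.0, poll=0.05):
    """Scroll to the bottom repeatedly; stop once a scroll loads nothing new.

    Returns the final page height.
    """
    height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        new_height = wait_for_growth(driver, height, timeout, poll)
        if new_height is None:
            return height
        height = new_height
```

Each round returns as soon as the height grows, so a fast response costs only one poll interval, while a slow one is still given up to timeout seconds.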


Answer 12

Scrolling pages that load more content as you scroll, for example Medium, Quora, etc.

import time

PAUSE_SECONDS = 3  # assumed value; tune it for your connection

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
    # Wait for the page to load. implicitly_wait() only sets the timeout for
    # element lookups and does not pause, so sleep explicitly instead.
    time.sleep(PAUSE_SECONDS)
    new_height = driver.execute_script("return document.body.scrollHeight")

    if new_height == last_height:
        break
    last_height = new_height
driver.quit()

Answer 13

If you want to scroll within a particular view/frame (WebElement), all you need to do is replace “body” with the particular element that you intend to scroll within. I get that element via “getElementById” in the example below:

self.driver.execute_script('window.scrollTo(0, document.getElementById("page-manager").scrollHeight);')

This is the case on YouTube, for example.
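
Note that window.scrollTo still moves the main window. If the element itself is the scrollable container, a different technique is to set its scrollTop. A sketch, assuming element is a WebElement you already located (scroll_container_to_bottom is a made-up helper name):

```python
def scroll_container_to_bottom(driver, element):
    """Scroll `element` (a scrollable container) to its own bottom.

    Returns the container's scrollTop after scrolling. The WebElement is
    handed to the script as arguments[0].
    """
    return driver.execute_script(
        "arguments[0].scrollTop = arguments[0].scrollHeight;"
        "return arguments[0].scrollTop;",
        element,
    )
```

Whether this or the window-level call is the right one depends on which element owns the scrollbar on the page in question.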


Answer 14

The scrollTo() function didn’t work for me anymore. This is what I used instead, and it worked fine.

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

Answer 15

driver.execute_script("document.getElementById('your ID Element').scrollIntoView();")

It’s working in my case.
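
If you already hold a WebElement in Python, you do not need to look it up again by ID inside the script: execute_script passes extra arguments to the script as arguments[0], arguments[1], and so on. A small sketch (scroll_into_view is a hypothetical helper name):

```python
def scroll_into_view(driver, element, align_to_top=True):
    """Scroll the page so that `element` becomes visible.

    align_to_top is forwarded to the DOM scrollIntoView(alignToTop) call:
    True aligns the element with the top of the viewport, False with the
    bottom.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView(arguments[1]);", element, align_to_top
    )
```

This also works for elements that have no id attribute at all, since any locator can be used to find them first.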


Running Selenium WebDriver Python bindings in Chrome

Question: Running Selenium WebDriver Python bindings in Chrome

I ran into a problem while working with Selenium. For my project, I have to use Chrome. However, I can’t connect to that browser after launching it with Selenium.

For some reason, Selenium can’t find Chrome by itself. This is what happens when I try to launch Chrome without including a path:

Traceback (most recent call last):
  File "./obp_pb_get_csv.py", line 73, in <module>
    browser = webdriver.Chrome() # Get local session of chrome
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 46, in __init__
    self.service.start()
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/service.py", line 58, in start
    and read up at http://code.google.com/p/selenium/wiki/ChromeDriver")
selenium.common.exceptions.WebDriverException: Message: 'ChromeDriver executable needs to be available in the path.                 Please download from http://code.google.com/p/selenium/downloads/list                and read up at http://code.google.com/p/selenium/wiki/ChromeDriver'

To solve this problem, I then included the Chromium path in the code that launches Chrome. However, the interpreter fails to find a socket to connect to:

Traceback (most recent call last):
  File "./obp_pb_get_csv.py", line 73, in <module>
    browser = webdriver.Chrome('/usr/bin/chromium') # Get local session of chrome
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 46, in __init__
    self.service.start()
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/service.py", line 64, in start
    raise WebDriverException("Can not connect to the ChromeDriver")
selenium.common.exceptions.WebDriverException: Message: 'Can not connect to the ChromeDriver'

I also tried solving the problem by launching chrome with:

chromium --remote-shell-port=9222

However, this did not work either.

PS. Here’s some information about my system:

www-client: chromium 15.0.874.121  
dev-lang:   python 2.7.2-r3 Selenium 2.11.1  
OS:         GNU/Linux Gentoo Kernel 3.1.0-gentoo-r1

Answer 0

You need to make sure the standalone ChromeDriver binary (which is different than the Chrome browser binary) is either in your path or available in the webdriver.chrome.driver environment variable.

See http://code.google.com/p/selenium/wiki/ChromeDriver for full information on how to wire things up.

Edit:

Right, this seems to be a bug in the Python bindings with respect to reading the chromedriver binary from the path or the environment variable. It seems that if chromedriver is not in your path, you have to pass it in as an argument to the constructor.

import os
from selenium import webdriver

chromedriver = "/Users/adam/Downloads/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://stackoverflow.com")
driver.quit()
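
To check up front whether the driver binary can be found, something like the following sketch may help. find_chromedriver and make_chrome are hypothetical helper names; the lookup uses only the standard library, and make_chrome assumes a Selenium 4 install where the path is passed through a Service object rather than positionally as in the snippet above.

```python
import os
import shutil


def find_chromedriver(explicit_path=None):
    """Return a usable chromedriver path, or None if none is found.

    An explicitly given path is checked on its own; without one, the
    PATH environment variable is searched.
    """
    if explicit_path:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        return None
    return shutil.which("chromedriver")


def make_chrome(explicit_path=None):
    """Start Chrome with the located driver (assumes Selenium 4)."""
    # Imported lazily so find_chromedriver() can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = find_chromedriver(explicit_path)
    if path is None:
        raise RuntimeError("chromedriver not found; install it or pass its path")
    return webdriver.Chrome(service=Service(executable_path=path))
```

Calling find_chromedriver() before constructing the driver turns the "needs to be available in the path" exception from the question into a plain None check.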

Answer 1

For Linux

  1. Check that you have the latest version of the Chrome browser installed: chromium-browser --version
  2. If not, install the latest version: sudo apt-get install chromium-browser
  3. Get the appropriate version of the Chrome driver from here
  4. Unzip chromedriver.zip
  5. Move the file to the /usr/bin directory: sudo mv chromedriver /usr/bin
  6. Go to the /usr/bin directory: cd /usr/bin
  7. Now run something like sudo chmod a+x chromedriver to mark it executable.
  8. Finally, you can execute the code.

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("http://www.google.com")
    print(driver.page_source)
    driver.quit()

Answer 2

Mac OSX only

An easier way to get going (assuming you already have homebrew installed, which you should, if not, go do that first and let homebrew make your life better) is to just run the following command:

brew install chromedriver

That should put the chromedriver in your path and you should be all set.


Answer 3

For Windows

Download ChromeDriver from this direct link OR get the latest version from this page.

Paste the chromedriver.exe file in your C:\Python27\Scripts folder.

This should work now:

from selenium import webdriver
driver = webdriver.Chrome()

Answer 4

For Windows, place chromedriver.exe under <Install Dir>/Python27/Scripts/


Answer 5

There are 2 ways to run Selenium python tests in Google Chrome. I’m considering Windows (Windows 10 in my case):

Prerequisite: Download the latest Chrome Driver from: https://sites.google.com/a/chromium.org/chromedriver/downloads

Way 1:

i) Extract the downloaded zip file in a directory/location of your choice
ii) Set the executable path in your code as below:

self.driver = webdriver.Chrome(executable_path=r'D:\Selenium_RiponAlWasim\Drivers\chromedriver_win32\chromedriver.exe')

Way 2:

i) Simply paste the chromedriver.exe under /Python/Scripts/ (In my case the folder was: C:\Python36\Scripts)
ii) Now write the simple code as below:

self.driver = webdriver.Chrome()

Answer 6

For an IDE on Windows:

If your path doesn’t work, you can try adding chromedriver.exe to your project, like in this project structure.

Then you should load chromedriver.exe in your main file. In my case, I loaded it in driver.py.

def get_chrome_driver():
    # chrome_options expects an options object, not a bare string
    options = webdriver.ChromeOptions()
    options.add_argument('--no-startup-window')
    return webdriver.Chrome("..\\content\\engine\\chromedriver.exe",
                            chrome_options=options)

.. means the parent directory of driver.py

. means the directory where driver.py is located

Hope this will be helpful.


Get HTML source of WebElement in Selenium WebDriver using Python

Question: Get HTML source of WebElement in Selenium WebDriver using Python

I’m using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with…

wd.page_source

But is there any way to get the “element source”?

elem.source   # <-- returns the HTML as a string

The selenium webdriver docs for Python are basically non-existent and I don’t see anything in the code that seems to enable that functionality.

Any thoughts on the best way to access the HTML of an element (and its children)?


Answer 0

You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the current element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JS:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

Tested and works with the ChromeDriver.
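
For Python, the two attributes could be wrapped in one small helper. This is only a sketch: element_html and include_self are names invented here, and element is assumed to be a WebElement (anything with a get_attribute method works).

```python
def element_html(element, include_self=False):
    """Return the element's markup as a string.

    include_self=False -> innerHTML (children only)
    include_self=True  -> outerHTML (the element's own tag plus children)
    """
    name = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(name)
```

With the element from the question, element_html(elem, include_self=True) would stand in for the hypothetical elem.source.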


Answer 1

There is not really a straight-forward way of getting the html source code of a webelement. You will have to use JS. I am not too sure about python bindings but you can easily do like this in Java. I am sure there must be something similar to JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element); 

Answer 2

Sure, we can get all of the HTML source code with the script below in Selenium Python:

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

If you want to save it to a file:

# Python 3: open in text mode with an explicit encoding instead of
# encoding the string by hand
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

I suggest saving to a file because the source code can be very long.


Answer 3

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.


Answer 4

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).

The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the String of an element, element.attribute(String).


Answer 5

Looks outdated, but let it be here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0)


Answer 6

Java with Selenium 2.53.0

driver.getPageSource();

Answer 7

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

The Java method is described there:

java.lang.String    getText() 

There is no getText() method in the Python bindings (the closest equivalent is the text property). So you can translate the method names from Java to Python and try another approach using the available methods, without getting the whole page source…

E.g.

 my_id = elem[0].get_attribute('my-id')

Answer 8

This works seamlessly for me.

element.get_attribute('innerHTML')

Answer 9

innerHTML returns the markup inside the selected element, while outerHTML returns that markup together with the selected element itself.

Example: suppose your element is as below

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML output

<td>A</td><td>B</td>

outerHTML output

<tr id="myRow"><td>A</td><td>B</td></tr>

Live example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below you will find the syntax required by the different bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want the HTML of the whole page, use the code below:

driver.getPageSource();

Answer 10

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);

This code really works to get JavaScript from source as well!


Answer 11

In a PHPUnit Selenium test it looks like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');

Answer 12

If you are interested in a solution for Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

Answer 13

The method I prefer for getting the rendered HTML is the following:

driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

However, the above method removes all the tags (including nested tags) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.

print(body_html.get_attribute("innerHTML"))