如何在python中使用Selenium Webdriver滚动网页?

问题:如何在python中使用Selenium Webdriver滚动网页?

我目前正在使用Selenium Webdriver通过Facebook用户朋友页面进行解析,并从AJAX脚本中提取所有ID。但是我需要向下滚动才能得到所有的朋友。如何在Selenium中向下滚动。我正在使用python。

I am currently using selenium webdriver to parse through facebook user friends page and extract all ids from the AJAX script. But I need to scroll down to get all the friends. How can I scroll down in Selenium. I am using python.


回答 0

您可以使用

driver.execute_script("window.scrollTo(0, Y)") 

其中Y是高度(在全高清显示器上为1080)。(感谢@lukeis)

您也可以使用

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

滚动到页面底部。

如果您想滚动到无限加载的页面,例如社交网络页面,facebook等(感谢@Cuong Tran)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

另一种方法(感谢Juanse)是,选择一个对象,然后

label.sendKeys(Keys.PAGE_DOWN);

You can use

driver.execute_script("window.scrollTo(0, Y)") 

where Y is the height (on a fullhd monitor it’s 1080). (Thanks to @lukeis)

You can also use

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

to scroll to the bottom of the page.

If you want to scroll to a page with infinite loading, like social network ones, facebook etc. (thanks to @Cuong Tran)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

another method (thanks to Juanse) is, select an object and

label.sendKeys(Keys.PAGE_DOWN);

回答 1

如果要向下滚动到无限页面的底部(例如linkedin.com),可以使用以下代码:

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

参考:https : //stackoverflow.com/a/28928684/1316860

If you want to scroll down to bottom of infinite page (like linkedin.com), you can use this code:

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

Reference: https://stackoverflow.com/a/28928684/1316860


回答 2

您可以send_keys用来模拟END(或PAGE_DOWN)按键(通常会滚动页面):

from selenium.webdriver.common.keys import Keys
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)

You can use send_keys to simulate an END (or PAGE_DOWN) key press (which normally scroll the page):

from selenium.webdriver.common.keys import Keys
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)

回答 3

如图相同的方法在这里

在python中,您可以使用

driver.execute_script("window.scrollTo(0, Y)")

(Y是您要滚动到的垂直位置)

same method as shown here:

in python you can just use

driver.execute_script("window.scrollTo(0, Y)")

(Y is the vertical position you want to scroll to)


回答 4

element=find_element_by_xpath("xpath of the li you are trying to access")

element.location_once_scrolled_into_view

当我尝试访问不可见的“ li”时,这很有帮助。

element=find_element_by_xpath("xpath of the li you are trying to access")

element.location_once_scrolled_into_view

this helped when I was trying to access a ‘li’ that was not visible.


回答 5

出于我的目的,我想向下滚动更多,同时牢记窗口的位置。我的解决方案是相似的,并使用window.scrollY

driver.execute_script("window.scrollTo(0, window.scrollY + 200)")

它将转到当前的y滚动位置+ 200

For my purpose, I wanted to scroll down more, keeping the windows position in mind. My solution was similar and used window.scrollY

driver.execute_script("window.scrollTo(0, window.scrollY + 200)")

which will go to the current y scroll position + 200


回答 6

这是您向下滚动网页的方式:

driver.execute_script("window.scrollTo(0, 1000);")

This is how you scroll down the webpage:

driver.execute_script("window.scrollTo(0, 1000);")

回答 7

我发现解决该问题的最简单方法是选择一个标签,然后发送:

label.sendKeys(Keys.PAGE_DOWN);

希望它能起作用!

The easiest way i found to solve that problem was to select a label and then send:

label.sendKeys(Keys.PAGE_DOWN);

Hope it works!


回答 8

这些答案都不适合我,至少不是向下滚动Facebook搜索结果页面有效,但经过大量测试,我发现此解决方案:

while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Divs=driver.find_element_by_tag_name('div').text
    if 'End of Results' in Divs:
        print 'end'
        break
    else:
        continue

None of these answers worked for me, at least not for scrolling down a facebook search result page, but I found after a lot of testing this solution:

while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Divs=driver.find_element_by_tag_name('div').text
    if 'End of Results' in Divs:
        print 'end'
        break
    else:
        continue

回答 9

使用youtube时,浮动元素的滚动高度为“ 0”,因此请不要使用“ return document.body.scrollHeight”,而是尝试使用此“ return document.documentElement.scrollHeight” ,根据您的互联网调整滚动暂停时间速度,否则它将只运行一次,然后在此之后中断。

SCROLL_PAUSE_TIME = 1

# Get scroll height
"""last_height = driver.execute_script("return document.body.scrollHeight")

this dowsnt work due to floating web elements on youtube
"""

last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0,document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
       print("break")
       break
    last_height = new_height

When working with youtube the floating elements give the value “0” as the scroll height so rather than using “return document.body.scrollHeight” try using this one “return document.documentElement.scrollHeight” adjust the scroll pause time as per your internet speed else it will run for only one time and then breaks after that.

SCROLL_PAUSE_TIME = 1

# Get scroll height
"""last_height = driver.execute_script("return document.body.scrollHeight")

this dowsnt work due to floating web elements on youtube
"""

last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0,document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
       print("break")
       break
    last_height = new_height

回答 10

我正在寻找一种滚动浏览动态网页的方法,并在到达页面末尾并发现该线程时自动停止。

@Cuong Tran的帖子进行了主要修改,是我正在寻找的答案。我认为其他人可能会发现此修改很有用(它对代码的工作方式有明显影响),因此,本文发布了。

修改是移动捕获循环最后一页高度的语句(以便使每项检查都与上一页高度进行比较)。

因此,下面的代码:

连续向下滚动动态网页(.scrollTo()),仅在一次迭代中页面高度保持不变时停止。

(还有另一种修改,其中break语句位于另一个可以删除的条件内(如果页面为“ sticks”)。

    SCROLL_PAUSE_TIME = 0.5


    while True:

        # Get scroll height
        ### This is the difference. Moving this *inside* the loop
        ### means that it checks if scrollTo is still scrolling 
        last_height = driver.execute_script("return document.body.scrollHeight")

        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:

            # try again (can be removed)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)

            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight")

            # check if the page height has remained the same
            if new_height == last_height:
                # if so, you are done
                break
            # if not, move on to the next loop
            else:
                last_height = new_height
                continue

I was looking for a way of scrolling through a dynamic webpage, and automatically stopping once the end of the page is reached, and found this thread.

The post by @Cuong Tran, with one main modification, was the answer that I was looking for. I thought that others might find the modification helpful (it has a pronounced effect on how the code works), hence this post.

The modification is to move the statement that captures the last page height inside the loop (so that each check is comparing to the previous page height).

So, the code below:

Continuously scrolls down a dynamic webpage (.scrollTo()), only stopping when, for one iteration, the page height stays the same.

(There is another modification, where the break statement is inside another condition (in case the page ‘sticks’) which can be removed).

    SCROLL_PAUSE_TIME = 0.5


    while True:

        # Get scroll height
        ### This is the difference. Moving this *inside* the loop
        ### means that it checks if scrollTo is still scrolling 
        last_height = driver.execute_script("return document.body.scrollHeight")

        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:

            # try again (can be removed)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)

            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight")

            # check if the page height has remained the same
            if new_height == last_height:
                # if so, you are done
                break
            # if not, move on to the next loop
            else:
                last_height = new_height
                continue

回答 11

该代码滚动到底部,但不需要您每次都等待。它会不断滚动,然后在底部停止(或超时)

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://example.com')

pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
run_time, max_run_time = 0, 1
while True:
    iteration_start = time.time()
    # Scroll webpage, the 100 allows for a more 'aggressive' scroll
    driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')

    scrolled = post_scroll_height != pre_scroll_height
    timed_out = run_time >= max_run_time

    if scrolled:
        run_time = 0
        pre_scroll_height = post_scroll_height
    elif not scrolled and not timed_out:
        run_time += time.time() - iteration_start
    elif not scrolled and timed_out:
        break

# closing the driver is optional 
driver.close()

这比每次等待0.5-3秒等待响应要快得多,因为该响应可能需要0.1秒

This code scrolls to the bottom but doesn’t require that you wait each time. It’ll continually scroll, and then stop at the bottom (or timeout)

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://example.com')

pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
run_time, max_run_time = 0, 1
while True:
    iteration_start = time.time()
    # Scroll webpage, the 100 allows for a more 'aggressive' scroll
    driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')

    scrolled = post_scroll_height != pre_scroll_height
    timed_out = run_time >= max_run_time

    if scrolled:
        run_time = 0
        pre_scroll_height = post_scroll_height
    elif not scrolled and not timed_out:
        run_time += time.time() - iteration_start
    elif not scrolled and timed_out:
        break

# closing the driver is optional 
driver.close()

This is much faster than waiting 0.5-3 seconds each time for a response, when that response could take 0.1 seconds


回答 12

滚动加载页面。示例:中,定额等

last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
        # Wait to load the page.
        driver.implicitly_wait(30) # seconds
        new_height = driver.execute_script("return document.body.scrollHeight")
    
        if new_height == last_height:
            break
        last_height = new_height
        # sleep for 30s
        driver.implicitly_wait(30) # seconds
    driver.quit()

scroll loading pages. Example: medium, quora,etc

last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
        # Wait to load the page.
        driver.implicitly_wait(30) # seconds
        new_height = driver.execute_script("return document.body.scrollHeight")
    
        if new_height == last_height:
            break
        last_height = new_height
        # sleep for 30s
        driver.implicitly_wait(30) # seconds
    driver.quit()

回答 13

如果要在特定视图/框架(WebElement)中滚动,则只需将“ body”替换为要在其中滚动的特定元素。我在下面的示例中通过“ getElementById”获得该元素:

self.driver.execute_script('window.scrollTo(0, document.getElementById("page-manager").scrollHeight);')

例如,在YouTube上就是这种情况。

if you want to scroll within a particular view/frame (WebElement), what you only need to do is to replace “body” with a particular element that you intend to scroll within. i get that element via “getElementById” in the example below:

self.driver.execute_script('window.scrollTo(0, document.getElementById("page-manager").scrollHeight);')

this is the case on YouTube, for example…


回答 14

ScrollTo()功能不再起作用。这是我使用的,效果很好。

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

The ScrollTo() function doesn’t work anymore. This is what I used and it worked fine.

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

回答 15

driver.execute_script("document.getElementById('your ID Element').scrollIntoView();")

它适合我的情况。

driver.execute_script("document.getElementById('your ID Element').scrollIntoView();")

it’s working for my case.