Python 实用宝典

Question 1

I am currently using selenium webdriver to parse through facebook user friends page and extract all ids from the AJAX script. But I need to scroll down to get all the friends. How can I scroll down in Selenium. I am using python.

Question 2

You can use

driver.execute_script("window.scrollTo(0, Y)")

where Y is the height (on a fullhd monitor it’s 1080). (Thanks to @lukeis)

You can also use

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

to scroll to the bottom of the page.

If you want to scroll to a page with infinite loading, like social network ones, facebook etc. (thanks to @Cuong Tran)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

another method (thanks to Juanse) is, select an object and

label.sendKeys(Keys.PAGE_DOWN);

Question 3

If you want to scroll down to bottom of infinite page (like linkedin.com), you can use this code:

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

Reference: https://stackoverflow.com/a/28928684/1316860

Question 4

You can use send_keys to simulate an END (or PAGE_DOWN) key press (which normally scroll the page):

from selenium.webdriver.common.keys import Keys
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)

Question 5

same method as shown here:

in python you can just use

driver.execute_script("window.scrollTo(0, Y)")

(Y is the vertical position you want to scroll to)

Question 6

element=find_element_by_xpath("xpath of the li you are trying to access")

element.location_once_scrolled_into_view

this helped when I was trying to access a ‘li’ that was not visible.

Question 7

For my purpose, I wanted to scroll down more, keeping the windows position in mind. My solution was similar and used window.scrollY

driver.execute_script("window.scrollTo(0, window.scrollY + 200)")

which will go to the current y scroll position + 200

Question 8

This is how you scroll down the webpage:

driver.execute_script("window.scrollTo(0, 1000);")

Question 9

The easiest way i found to solve that problem was to select a label and then send:

label.sendKeys(Keys.PAGE_DOWN);

Hope it works!

Question 10

None of these answers worked for me, at least not for scrolling down a facebook search result page, but I found after a lot of testing this solution:

while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Divs=driver.find_element_by_tag_name('div').text
    if 'End of Results' in Divs:
        print 'end'
        break
    else:
        continue

Question 11

When working with youtube the floating elements give the value “0” as the scroll height so rather than using “return document.body.scrollHeight” try using this one “return document.documentElement.scrollHeight” adjust the scroll pause time as per your internet speed else it will run for only one time and then breaks after that.

SCROLL_PAUSE_TIME = 1

# Get scroll height
"""last_height = driver.execute_script("return document.body.scrollHeight")

this dowsnt work due to floating web elements on youtube
"""

last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0,document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
       print("break")
       break
    last_height = new_height

Question 12

I was looking for a way of scrolling through a dynamic webpage, and automatically stopping once the end of the page is reached, and found this thread.

The post by @Cuong Tran, with one main modification, was the answer that I was looking for. I thought that others might find the modification helpful (it has a pronounced effect on how the code works), hence this post.

The modification is to move the statement that captures the last page height inside the loop (so that each check is comparing to the previous page height).

So, the code below:

Continuously scrolls down a dynamic webpage (.scrollTo()), only stopping when, for one iteration, the page height stays the same.

(There is another modification, where the break statement is inside another condition (in case the page ‘sticks’) which can be removed).

    SCROLL_PAUSE_TIME = 0.5


    while True:

        # Get scroll height
        ### This is the difference. Moving this *inside* the loop
        ### means that it checks if scrollTo is still scrolling 
        last_height = driver.execute_script("return document.body.scrollHeight")

        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:

            # try again (can be removed)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)

            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight")

            # check if the page height has remained the same
            if new_height == last_height:
                # if so, you are done
                break
            # if not, move on to the next loop
            else:
                last_height = new_height
                continue

Question 13

This code scrolls to the bottom but doesn’t require that you wait each time. It’ll continually scroll, and then stop at the bottom (or timeout)

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://example.com')

pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
run_time, max_run_time = 0, 1
while True:
    iteration_start = time.time()
    # Scroll webpage, the 100 allows for a more 'aggressive' scroll
    driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')

    scrolled = post_scroll_height != pre_scroll_height
    timed_out = run_time >= max_run_time

    if scrolled:
        run_time = 0
        pre_scroll_height = post_scroll_height
    elif not scrolled and not timed_out:
        run_time += time.time() - iteration_start
    elif not scrolled and timed_out:
        break

# closing the driver is optional 
driver.close()

This is much faster than waiting 0.5-3 seconds each time for a response, when that response could take 0.1 seconds

Question 14

scroll loading pages. Example: medium, quora,etc

last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
        # Wait to load the page.
        driver.implicitly_wait(30) # seconds
        new_height = driver.execute_script("return document.body.scrollHeight")
    
        if new_height == last_height:
            break
        last_height = new_height
        # sleep for 30s
        driver.implicitly_wait(30) # seconds
    driver.quit()

Question 15

if you want to scroll within a particular view/frame (WebElement), what you only need to do is to replace “body” with a particular element that you intend to scroll within. i get that element via “getElementById” in the example below:

self.driver.execute_script('window.scrollTo(0, document.getElementById("page-manager").scrollHeight);')

this is the case on YouTube, for example…

Question 16

The ScrollTo() function doesn’t work anymore. This is what I used and it worked fine.

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

Question 17

driver.execute_script("document.getElementById('your ID Element').scrollIntoView();")

it’s working for my case.

Question 18

I ran into a problem while working with Selenium. For my project, I have to use Chrome. However, I can’t connect to that browser after launching it with Selenium.

For some reason, Selenium can’t find Chrome by itself. This is what happens when I try to launch Chrome without including a path:

Traceback (most recent call last):
  File "./obp_pb_get_csv.py", line 73, in <module>
    browser = webdriver.Chrome() # Get local session of chrome
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 46, in __init__
    self.service.start()
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/service.py", line 58, in start
    and read up at http://code.google.com/p/selenium/wiki/ChromeDriver")
selenium.common.exceptions.WebDriverException: Message: 'ChromeDriver executable needs to be available in the path.                 Please download from http://code.google.com/p/selenium/downloads/list                and read up at http://code.google.com/p/selenium/wiki/ChromeDriver'

To solve this problem, I then included the Chromium path in the code that launches Chrome. However, the interpreter fails to find a socket to connect to:

Traceback (most recent call last):
  File "./obp_pb_get_csv.py", line 73, in <module>
    browser = webdriver.Chrome('/usr/bin/chromium') # Get local session of chrome
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 46, in __init__
    self.service.start()
  File "/usr/lib64/python2.7/site-packages/selenium/webdriver/chrome/service.py", line 64, in start
    raise WebDriverException("Can not connect to the ChromeDriver")
selenium.common.exceptions.WebDriverException: Message: 'Can not connect to the ChromeDriver'

I also tried solving the problem by launching chrome with:

chromium --remote-shell-port=9222

However, this did not work either.

PS. Here’s some information about my system:

www-client: chromium 15.0.874.121  
dev-lang:   python 2.7.2-r3 Selenium 2.11.1  
OS:         GNU/Linux Gentoo Kernel 3.1.0-gentoo-r1

Question 19

You need to make sure the standalone ChromeDriver binary (which is different than the Chrome browser binary) is either in your path or available in the webdriver.chrome.driver environment variable.

see http://code.google.com/p/selenium/wiki/ChromeDriver for full information on how wire things up.

Edit:

Right, seems to be a bug in the Python bindings wrt reading the chromedriver binary from the path or the environment variable. Seems if chromedriver is not in your path you have to pass it in as an argument to the constructor.

import os
from selenium import webdriver

chromedriver = "/Users/adam/Downloads/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://stackoverflow.com")
driver.quit()

Question 20

For Linux

Check you have installed latest version of chrome brwoser-> chromium-browser -version
If not, install latest version of chrome sudo apt-get install chromium-browser
get appropriate version of chrome driver from here
Unzip the chromedriver.zip
Move the file to /usr/bin directory sudo mv chromedriver /usr/bin
Goto /usr/bin directory cd /usr/bin
Now, you would need to run something like sudo chmod a+x chromedriver to mark it executable.

finally you can execute the code.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.google.com")
print driver.page_source.encode('utf-8')
driver.quit()
display.stop()

Question 21

Mac OSX only

An easier way to get going (assuming you already have homebrew installed, which you should, if not, go do that first and let homebrew make your life better) is to just run the following command:

brew install chromedriver

That should put the chromedriver in your path and you should be all set.

Question 22

For windows

Download ChromeDriver from this direct link OR get the latest version from this page.

Paste the chromedriver.exe file in your C:\Python27\Scripts folder.

This should work now:

from selenium import webdriver
driver = webdriver.Chrome()

Question 23

For windows, please have the chromedriver.exe placed under <Install Dir>/Python27/Scripts/

Question 24

There are 2 ways to run Selenium python tests in Google Chrome. I’m considering Windows (Windows 10 in my case):

Prerequisite: Download the latest Chrome Driver from: https://sites.google.com/a/chromium.org/chromedriver/downloads

Way 1:

i) Extract the downloaded zip file in a directory/location of your choice
ii) Set the executable path in your code as below:

self.driver = webdriver.Chrome(executable_path='D:\Selenium_RiponAlWasim\Drivers\chromedriver_win32\chromedriver.exe')

Way 2:

i) Simply paste the chromedriver.exe under /Python/Scripts/ (In my case the folder was: C:\Python36\Scripts)
ii) Now write the simple code as below:

self.driver = webdriver.Chrome()

Question 25

For Windows’ IDE:

If your path doesn’t work, you can try to add the chromedriver.exe to your project, like in this project structure.

Then you should load the chromedriver.exe in your main file. As for me, I loaded the driver.exe in driver.py.

def get_chrome_driver():
return webdriver.Chrome("..\\content\\engine\\chromedriver.exe",
                            chrome_options='--no-startup-window')

.. means driver.py's upper directory

. means the directory where the driver.py is located

Hope this will be helpful.

问题：如何在python中使用Selenium Webdriver滚动网页？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

问题：在Chrome中运行Selenium WebDriver python绑定

回答 0

回答 1

回答 2

回答 3

对于窗户

For windows

回答 4

回答 5

回答 6

问题：使用Python在Selenium WebDriver中获取WebElement的HTML源

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

有趣好用的Python教程