Tag archive: selenium

Clear text from textarea with Selenium

Question: Clear text from textarea with Selenium

I’ve got some tests where I’m checking that the proper error message appears when the text in certain fields is invalid. One check for validity is that a certain textarea element is not empty.

If this textarea already has text in it, how can I tell selenium to clear the field?

Something like:

driver.get_element_by_id('foo').clear_field()

Answer 0

driver.find_element_by_id('foo').clear()

Answer 1

You can use

 webElement.clear();

If this element is a text entry element, this will clear the value.

Note that the events fired by this event may not be as you’d expect. In particular, we don’t fire any keyboard or mouse events. If you want to ensure keyboard events are fired, consider using something like sendKeys(CharSequence). E.g.:

 webElement.sendKeys(Keys.BACK_SPACE); //do repeatedly, e.g. in while loop

or:

 webElement.sendKeys(Keys.CONTROL + "a");
 webElement.sendKeys(Keys.DELETE);

Answer 2

I ran into a field where .clear() did not work. Using a combination of the first two answers worked for this field.

from selenium.webdriver.common.keys import Keys

#...your code (I was using python 3)

driver.find_element_by_id('foo').send_keys(Keys.CONTROL + "a");
driver.find_element_by_id('foo').send_keys(Keys.DELETE);
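
A minimal Python sketch of that fallback, assuming only that the element exposes clear(), get_attribute() and send_keys() the way Selenium's WebElement does. The helper name clear_field is hypothetical, and the two key constants are the raw code points behind Keys.CONTROL and Keys.DELETE, written out so the snippet needs no imports; in real code you would probably import Keys instead.

```python
# Raw WebDriver key codes (the values behind Keys.CONTROL and Keys.DELETE).
CONTROL = "\ue009"
DELETE = "\ue017"


def clear_field(element):
    """Clear a text field, falling back to select-all + delete.

    `element` is assumed to behave like a Selenium WebElement.
    Returns True if the keyboard fallback was needed.
    """
    element.clear()
    # If clear() worked, the value attribute is now empty.
    if not element.get_attribute("value"):
        return False
    # Fallback from the answer above: Ctrl+A, then Delete.
    element.send_keys(CONTROL + "a")
    element.send_keys(DELETE)
    return True
```

On macOS the select-all shortcut uses Command rather than Control, so the chord may need adjusting there.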

Answer 3

In the most recent Selenium version, use:

driver.find_element_by_id('foo').clear()

Answer 4

For Java:

driver.findElement(By.id("foo")).clear();

or

webElement.clear();

If this element is a text entry element, this will clear the value.


Answer 5

This is the general syntax:

driver.find_element_by_id('Locator value').clear()
driver.find_element_by_name('Locator value').clear()

Answer 6

After a plain call to clear(), the corresponding input/textarea component can still hold its old value in the DOM, so any following change to that component (e.g. filling it with a new value) may not be processed in time.

If you take a look at the Selenium source code, you’ll find that the clear() method is documented with the following comment:

/** If this element is a text entry element, this will clear the value. Has no effect on other elements. Text entry elements are INPUT and TEXTAREA elements. Note that the events fired by this event may not be as you’d expect. In particular, we don’t fire any keyboard or mouse events. If you want to ensure keyboard events are fired, consider using something like {@link #sendKeys(CharSequence…)} with the backspace key. To ensure you get a change event, consider following with a call to {@link #sendKeys(CharSequence…)} with the tab key. */

So using this helpful hint to clear an input/textarea (component that already has a value) AND assign a new value to it, you’ll get some code like the following:

public void waitAndClearFollowedByKeys(By by, CharSequence keys) {
    LOG.debug("clearing element");
    wait(by, true).clear();
    sendKeys(by, Keys.BACK_SPACE.toString() + keys);
}

public void sendKeys(By by, CharSequence keysToSend) {
    WebElement webElement = wait(by, true);
    LOG.info("sending keys '{}' to {}", escapeProperly(keysToSend), by);
    webElement.sendKeys(keysToSend);
    LOG.info("keys sent");
}

private String escapeProperly(CharSequence keysToSend) {
    String result = "" + keysToSend;
    result = result.replace(Keys.TAB, "\\t");
    result = result.replace(Keys.ENTER, "\\n");
    result = result.replace(Keys.RETURN, "\\r");

    return result;
}

Sorry for this code being Java and not Python. Also, I had to leave out an additional waitUntilPageIsReady() method that would make this post way too long.

Hope this helps you on your journey with Selenium!
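
Since the code above is Java, here is a rough Python equivalent of waitAndClearFollowedByKeys, without the waiting and logging. It assumes an element that behaves like a Selenium WebElement. The function name clear_and_type and the press_tab flag are made up for illustration, and the key constants are the raw code points behind Keys.BACK_SPACE and Keys.TAB.

```python
BACK_SPACE = "\ue003"  # same value as Keys.BACK_SPACE
TAB = "\ue004"         # same value as Keys.TAB


def clear_and_type(element, text, press_tab=False):
    """Clear `element`, then type `text` so keyboard events fire.

    The leading backspace follows the hint in the clear() Javadoc;
    the optional trailing tab is there to provoke a change event.
    Returns the key sequence that was sent.
    """
    element.clear()
    keys = BACK_SPACE + text
    if press_tab:
        keys += TAB
    element.send_keys(keys)
    return keys
```

Whether the trailing tab is wanted depends on the form, since it also moves focus to the next field.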


Answer 7

In my experience, this turned out to be the most efficient:

driver.find_element_by_css_selector('foo').send_keys(u'\ue009' + u'\ue003')

We are sending Ctrl + Backspace to delete all characters from the input. You can also replace backspace with delete.

EDIT: removed Keys dependency


Answer 8

driver.find_element_by_xpath("path").send_keys(Keys.CONTROL + u'\ue003') worked great with Firefox.

  • u'\ue003' is BACK_SPACE, for those like me who never remember it.

How to select a drop-down menu value with Selenium using Python?

Question: How to select a drop-down menu value with Selenium using Python?

I need to select an element from a drop-down menu.

For example:

<select id="fruits01" class="select" name="fruits">
  <option value="0">Choose your fruits:</option>
  <option value="1">Banana</option>
  <option value="2">Mango</option>
</select>

1) First I have to click on it. I do this:

inputElementFruits = driver.find_element_by_xpath("//select[id='fruits']").click()

2) After that I have to select the right element, let’s say Mango.

I tried to do it with inputElementFruits.send_keys(...) but it did not work.


Answer 0

Unless your click is firing some kind of ajax call to populate your list, you don’t actually need to execute the click.

Just find the element and then enumerate the options, selecting the option(s) you want.

Here is an example:

from selenium import webdriver
b = webdriver.Firefox()
b.find_element_by_xpath("//select[@name='element_name']/option[text()='option_text']").click()

You can read more in:
https://sqa.stackexchange.com/questions/1355/unable-to-select-an-option-using-seleniums-python-webdriver
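
The "find the element, then enumerate the options" approach can be sketched as follows. This is an illustration only: click_option_by_text is a hypothetical helper, select_element is assumed to be a WebElement for the <select>, and the "tag name" string is the value of By.TAG_NAME in the Selenium 4 Python bindings (older versions would call find_elements_by_tag_name instead).

```python
def click_option_by_text(select_element, wanted_text):
    """Click the first <option> whose visible text equals `wanted_text`.

    Returns True if an option was clicked, False if none matched.
    """
    # "tag name" is the locator strategy string behind By.TAG_NAME.
    for option in select_element.find_elements("tag name", "option"):
        if option.text.strip() == wanted_text:
            option.click()
            return True
    return False
```

Returning False instead of raising lets a test decide for itself whether a missing option is an error.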


Answer 1

Selenium provides a convenient Select class to work with select -> option constructs:

from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get('url')

select = Select(driver.find_element_by_id('fruits01'))

# select by visible text
select.select_by_visible_text('Banana')

# select by value 
select.select_by_value('1')


Answer 2

First, import the Select class and create an instance of it. You can then call the select methods on that instance to choose options from the drop-down list. Here is the code:

from selenium.webdriver.support.select import Select

select_fr = Select(driver.find_element_by_id("fruits01"))
select_fr.select_by_index(0)

Answer 3

I hope this code will help you.

from selenium.webdriver.support.ui import Select

dropdown element with id

ddelement= Select(driver.find_element_by_id('id_of_element'))

dropdown element with xpath

ddelement= Select(driver.find_element_by_xpath('xpath_of_element'))

dropdown element with css selector

ddelement= Select(driver.find_element_by_css_selector('css_selector_of_element'))

Selecting ‘Banana’ from a dropdown

  1. Using the index of the drop-down

ddelement.select_by_index(1)

  2. Using the value of the drop-down

ddelement.select_by_value('1')

  3. Using the text displayed in the drop-down

ddelement.select_by_visible_text('Banana')


Answer 4

I tried many things, but my drop-down was inside a table and I was not able to perform a simple select operation. Only the solution below worked. Here I am highlighting the drop-down element and pressing the down arrow until the desired value is reached:

ARROW_DOWN = u'\ue015'

# identify the drop-down element
elem = browser.find_element_by_name(objectVal)
for option in elem.find_elements_by_tag_name('option'):
    if option.text == value:
        break
    # not there yet: move the selection one option down
    elem.send_keys(ARROW_DOWN)

Answer 5

You don’t have to click anything. Find the element by XPath or whatever you prefer, and then use send keys.

For your example, HTML:

<select id="fruits01" class="select" name="fruits">
    <option value="0">Choose your fruits:</option>
    <option value="1">Banana</option>
    <option value="2">Mango</option>
</select>

Python:

fruit_field = browser.find_element_by_xpath("//select[@name='fruits']")
fruit_field.send_keys("Mango")

That’s it.


Answer 6

You can use a CSS selector combination as well:

driver.find_element_by_css_selector("#fruits01 [value='1']").click()

Change the 1 in the attribute = value css selector to the value corresponding with the desired fruit.
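
If the value changes between tests, the selector string can be built by a small helper. The function name option_css is hypothetical, and the sketch assumes ids and values without quotes or other characters that would need CSS escaping. It also adds the option tag name to the selector, which is slightly stricter than the one in the answer.

```python
def option_css(select_id, option_value):
    """Build a CSS selector for one <option> inside a <select> with an id."""
    return "#{} option[value='{}']".format(select_id, option_value)


print(option_css("fruits01", "2"))  # prints: #fruits01 option[value='2']
```

The resulting string would be passed to driver.find_element_by_css_selector(...) exactly as in the answer.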


Answer 7

from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Ie(".\\IEDriverServer.exe")
driver.get("https://test.com")
# Select only accepts a <select> element, so the XPath must target one
select = Select(driver.find_element_by_xpath("//select[@name='n_name']"))
select.select_by_index(2)

This works fine.


Answer 8

This works with the option value:

from selenium import webdriver
b = webdriver.Firefox()
b.find_element_by_xpath("//select[@class='class_name']/option[@value='option_value']").click()

Answer 9

This way you can step through all the options of any drop-down.

import time
from selenium.webdriver.support.ui import Select

# driver is an existing WebDriver instance
driver.get("https://www.spectrapremium.com/en/aftermarket/north-america")

print("The title is: " + driver.title)

year_select = Select(driver.find_element_by_css_selector('#year'))

# select every option in turn, pausing so the page can react
for index in range(len(year_select.options)):
    year_select.select_by_index(index)
    time.sleep(1)

Answer 10

The best way to work with drop-down selection is the selenium.webdriver.support.ui.Select class, but sometimes it does not work as expected because of design issues or other problems in the HTML.

In that situation you can use execute_script() as an alternative, as below:

option_visible_text = "Banana"
select = driver.find_element_by_id("fruits01")

#now use this to select option from dropdown by visible text 
driver.execute_script("var select = arguments[0]; for(var i = 0; i < select.options.length; i++){ if(select.options[i].text == arguments[1]){ select.options[i].selected = true; } }", select, option_visible_text);
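
One thing to be aware of: setting selected from JavaScript does not by itself notify the page, so handlers listening for change may not run. The sketch below wraps the same loop in a helper and dispatches a change event afterwards. select_by_text_js is a hypothetical name, and driver and select_element are assumed to be a WebDriver and the <select> WebElement. Whether the page needs the event at all depends on the site.

```python
SELECT_BY_TEXT_JS = """
var select = arguments[0], wanted = arguments[1];
for (var i = 0; i < select.options.length; i++) {
    if (select.options[i].text === wanted) {
        select.options[i].selected = true;
        select.dispatchEvent(new Event('change', {bubbles: true}));
        return true;
    }
}
return false;
"""


def select_by_text_js(driver, select_element, visible_text):
    """Select an option by visible text via JavaScript.

    Returns whatever the script returns: True if an option matched.
    """
    return driver.execute_script(SELECT_BY_TEXT_JS, select_element, visible_text)
```

The return value makes it possible to assert in a test that the option actually existed.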

Answer 11

As per the HTML provided:

<select id="fruits01" class="select" name="fruits">
  <option value="0">Choose your fruits:</option>
  <option value="1">Banana</option>
  <option value="2">Mango</option>
</select>

To select an <option> element from a drop-down menu you have to use the Select class. Moreover, because you have to interact with the drop-down, you should use WebDriverWait with element_to_be_clickable().

To select the <option> with the text Mango, you can use any of the following locator strategies:

  • Using ID attribute and select_by_visible_text() method:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select
    
    select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "fruits01"))))
    select.select_by_visible_text("Mango")
    
  • Using CSS-SELECTOR and select_by_value() method:

    select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select.select[name='fruits']"))))
    select.select_by_value("2")
    
  • Using XPATH and select_by_index() method:

    select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@class='select' and @name='fruits']"))))
    select.select_by_index(2)
    

Answer 12

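import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.Select;

public class ListBoxMultiple {

public static void main(String[] args) throws InterruptedException {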
    System.setProperty("webdriver.chrome.driver", "./drivers/chromedriver.exe");
    WebDriver driver=new ChromeDriver();
    driver.get("file:///C:/Users/Amitabh/Desktop/hotel2.html");//open the website
    driver.manage().window().maximize();


    WebElement hotel = driver.findElement(By.id("maarya"));//get the element

    Select sel=new Select(hotel);//for handling list box
    //isMultiple
    if(sel.isMultiple()){
        System.out.println("it is multi select list");
    }
    else{
        System.out.println("it is single select list");
    }
    //select option
    sel.selectByIndex(1);// you can select by index values
    sel.selectByValue("p");//you can select by value
    sel.selectByVisibleText("Fish");// you can also select by visible text of the options
    //deselect option but this is possible only in case of multiple lists
    Thread.sleep(1000);
    sel.deselectByIndex(1);
    sel.deselectAll();

    //getOptions
    List<WebElement> options = sel.getOptions();

    int count=options.size();
    System.out.println("Total options: "+count);

    for(WebElement opt:options){ // getting text of every elements
        String text=opt.getText();
        System.out.println(text);
        }

    //select all options
    for(int i=0;i<count;i++){
        sel.selectByIndex(i);
        Thread.sleep(1000);
    }

    driver.quit();

}

}


Wait until page is loaded with Selenium WebDriver for Python

Question: Wait until page is loaded with Selenium WebDriver for Python

I want to scrape all the data of a page implemented with infinite scroll. The following Python code works.

for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)

This means every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated contents. But, this may not be time efficient. The page may finish loading the new contents within 5 seconds. How can I detect whether the page finished loading the new contents every time I scroll down? If I can detect this, I can scroll down again to see more contents once I know the page finished loading. This is more time efficient.


Answer 0

The webdriver will wait for a page to load by default via .get() method.

As you may be looking for some specific element as @user227215 said, you should use WebDriverWait to wait for an element located in your page:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")

I have used it for checking alerts. You can use any other locator strategy to find the element.

EDIT 1:

I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it’s your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.
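
To make the idea of an explicit wait concrete, here is a stripped-down polling loop in the spirit of WebDriverWait(...).until(...). It is not Selenium's implementation: wait_until, its parameters and the use of the built-in TimeoutError are choices made for this sketch.

```python
import time


def wait_until(driver, condition, timeout=10.0, poll=0.5, ignored=()):
    """Call condition(driver) until it returns something truthy.

    Exceptions listed in `ignored` count as "not ready yet".
    Raises TimeoutError if nothing truthy arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition(driver)
            if result:
                return result
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

A condition is just a callable taking the driver, which is why both expected_conditions objects and plain lambdas can be passed to the real until().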


Answer 1

Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles’ comment:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")

This matches the example in the documentation. Here is a link to the documentation for By.


Answer 2

Below are three methods:

readyState

Checking page readyState (not reliable):

def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'

The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.

id

Comparing new page ids with the old one:

def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False

It’s possible that comparing ids is not as effective as waiting for stale reference exceptions.

staleness_of

Using staleness_of method:

@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))

For more details, check Harry’s blog.


Answer 3

As mentioned in the answer from David Cullen, I’ve always seen recommendations to use a line like the following one:

element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)

It was difficult for me to find a list of all the possible locators that can be used with By, so I thought it would be useful to provide one here. According to Web Scraping with Python by Ryan Mitchell:

ID

Used in the example; finds elements by their HTML id attribute

CLASS_NAME

Used to find elements by their HTML class attribute. Why is this function CLASS_NAME not simply CLASS? Using the form object.CLASS would create problems for Selenium’s Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.

CSS_SELECTOR

Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.

LINK_TEXT

Finds HTML tags by the text they contain. For example, a link that says “Next” can be selected using (By.LINK_TEXT, "Next").

PARTIAL_LINK_TEXT

Similar to LINK_TEXT, but matches on a partial string.

NAME

Finds HTML tags by their name attribute. This is handy for HTML forms.

TAG_NAME

Finds HTML tags by their tag name.

XPATH

Uses an XPath expression … to select matching elements.
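
For reference, in the Python bindings the By constants are plain strings, so a locator is just a (strategy, value) tuple. The mapping below lists those strings next to made-up example values, loosely based on a fruits drop-down. The dictionary name EXAMPLE_LOCATORS is only for illustration.

```python
# Each value is a (strategy, value) tuple as accepted by find_element()
# and by expected conditions such as presence_of_element_located().
EXAMPLE_LOCATORS = {
    "ID": ("id", "fruits01"),
    "CLASS_NAME": ("class name", "select"),
    "CSS_SELECTOR": ("css selector", "select#fruits01"),
    "LINK_TEXT": ("link text", "Next"),
    "PARTIAL_LINK_TEXT": ("partial link text", "Nex"),
    "NAME": ("name", "fruits"),
    "TAG_NAME": ("tag name", "select"),
    "XPATH": ("xpath", "//select[@name='fruits']"),
}

for name, (strategy, value) in EXAMPLE_LOCATORS.items():
    print("By.{:<18} -> ({!r}, {!r})".format(name, strategy, value))
```

In real code it is still clearer to write By.ID and friends than the raw strings.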


Answer 4

From selenium/webdriver/support/wait.py:

driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))

Answer 5

On a side note, instead of scrolling down 100 times, you can check whether the DOM has stopped changing (this is for the case where the bottom of the page is lazy-loaded via AJAX):

import logging
import time

def scrollDown(driver, value):
    # Scroll the window down by `value` pixels
    driver.execute_script("window.scrollBy(0," + str(value) + ")")

# Scroll down the page until its source stops changing
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)  # give lazy-loaded content time to arrive
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
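
A variation on the same idea, as a minimal sketch rather than a drop-in replacement: instead of comparing the whole page source, it compares document.body.scrollHeight between scrolls and stops once the height no longer grows. The name scroll_until_stable and the max_rounds cap are assumptions added here for illustration; they are not part of Selenium or of the answer above.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed. `max_rounds` is a hypothetical
    safety cap so that an endlessly growing feed cannot loop forever.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded by the last scroll
        last_height = new_height
    return max_rounds
```

The driver argument only needs an execute_script method, so any WebDriver instance should work; the pause probably needs tuning for the site in question.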

Answer 6

Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once in the session, and it basically tells the driver to wait up to the given amount of time until each command can be executed.

driver = webdriver.Chrome()
driver.implicitly_wait(10)

So if you set a wait time of 10 seconds, it will execute the command as soon as possible and wait up to 10 seconds before it gives up. I’ve used this in similar scroll-down scenarios, so I don’t see why it wouldn’t work in your case. Hope this is helpful.

To be able to fix this answer, I have to add new text. Be sure to use a lower case ‘w’ in implicitly_wait.
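
To make the "as soon as possible, but give up after the timeout" behaviour concrete, here is a rough pure-Python illustration of that polling pattern. It is not Selenium's implementation; poll_until, find, timeout and interval are hypothetical names chosen for this sketch.

```python
import time


def poll_until(find, timeout=10.0, interval=0.5):
    """Call `find()` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result  # found: return right away instead of waiting out the timeout
        if time.monotonic() >= deadline:
            raise TimeoutError("nothing found within {} seconds".format(timeout))
        time.sleep(interval)
```

The point of the sketch is only that the timeout is an upper bound: a lookup that succeeds on the first try returns immediately.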


Answer 7

How about putting WebDriverWait in a while loop and catching the exception?

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
while True:
    try:
        # presence_of_element_located expects a (By, value) locator tuple, not an element
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break # leave the loop once the element is present
    except TimeoutException:
        print("Loading took too much time! Trying again")

Answer 8

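Here I did it using a rather simple form:

import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("url")
searchTxt = None
while searchTxt is None:
    try:
        searchTxt = browser.find_element(By.NAME, 'NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except NoSuchElementException:
        time.sleep(0.5)  # element not there yet; retry shortly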

Answer 9

You can do that very simply with this function:

def page_has_loaded(driver):
    # document.readyState is "loading", "interactive" or "complete"
    return driver.execute_script("return document.readyState") == "complete"

and when you want to do something after the page has finished loading, you can use:

import time
from selenium import webdriver

driver = webdriver.Firefox()  # pass your own options here if you need them
driver.get("https://www.google.com/")

while not page_has_loaded(driver):
    time.sleep(0.1)  # avoid a busy loop while the page loads

driver.execute_script("alert('page is loaded')")

How do I find an element that contains specific text in Selenium WebDriver (Python)?

Question: How do I find an element that contains specific text in Selenium WebDriver (Python)?

I’m trying to test a complicated JavaScript interface with Selenium (using the Python interface, and across multiple browsers). I have a number of buttons of the form:

<div>My Button</div>

I’d like to be able to search for buttons based on “My Button” (or non-case-sensitive, partial matches such as “my button” or “button”).

I’m finding this amazingly difficult, to the extent to which I feel like I’m missing something obvious. The best thing I have so far is:

driver.find_elements_by_xpath('//div[contains(text(), "' + text + '")]')

This is case-sensitive, however. The other thing I’ve tried is iterating through all the divs on the page, and checking the element.text property. However, every time you get a situation of the form:

<div class="outer"><div class="inner">My Button</div></div>

div.outer also has “My Button” as the text. To fix that, I’ve tried looking to see if div.outer is the parent of div.inner, but I couldn’t figure out how to do that (element.get_element_by_xpath('..') returns an element’s parent, but it tests not equal to div.outer).

Also, iterating through all the elements on the page seems to be really slow, at least using the Chrome webdriver.

Ideas?


Edit: This question is a bit vague. I asked (and answered) a more specific version here: How to get text of an element in Selenium WebDriver (via the Python API), without including child element text?


Answer 0

Try the following:

driver.find_elements_by_xpath("//*[contains(text(), 'My Button')]")

Answer 1

You could try an XPath expression like:

'//div[contains(text(), "{0}") and @class="inner"]'.format(text)

Answer 2

You can also use it with the Page Object pattern, e.g.:

Try this code:

@FindBy(xpath = "//*[contains(text(), 'Best Choice')]")
WebElement buttonBestChoice;

Answer 3

//* matches any HTML tag. If the same text appears in both a button and a div, //* will match both, so it will not work as expected. If you need to select a specific element, name its HTML tag in the expression, like:

driver.find_element_by_xpath("//div[contains(text(),'Add User')]")
driver.find_element_by_xpath("//button[contains(text(),'Add User')]")

Answer 4

Interestingly, virtually all answers revolve around XPath’s contains() function, neglecting the fact that it is case-sensitive, contrary to the OP’s request.
If you need case insensitivity, that is achievable in XPath 1.0 (the version contemporary browsers support), though it’s not pretty: use the translate() function. It substitutes each source character with its desired form, using a translation table.

Constructing a table of all upper-case characters effectively transforms the node’s text to its lower-case form, allowing case-insensitive matching (here’s just the predicate):

[
  contains(
    translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'my button'
  )
]
# will match a source text like "mY bUTTon"

The full python call:

driver.find_elements_by_xpath("//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZЙ', 'abcdefghijklmnopqrstuvwxyzй'), 'my button')]")

Naturally, this approach has its drawbacks: as given, it’ll work only for Latin text; if you want to cover other Unicode characters, you’ll have to add them to the translation table. I’ve done that in the sample above, where the last character is the Cyrillic symbol "Й".
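
Building that expression by hand is error-prone, so a small helper can assemble it. This is a sketch under stated assumptions: contains_ignore_case and its parameters are names invented here, not part of Selenium, and apostrophes in the search text are simply rejected rather than escaped.

```python
ASCII_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ASCII_LOWER = "abcdefghijklmnopqrstuvwxyz"


def contains_ignore_case(text, tag="*", extra_upper="", extra_lower=""):
    """Build an XPath 1.0 expression matching `text` case-insensitively.

    `extra_upper`/`extra_lower` extend the translation table pairwise,
    e.g. extra_upper="Й", extra_lower="й".
    """
    if len(extra_upper) != len(extra_lower):
        raise ValueError("extra_upper and extra_lower must have the same length")
    if "'" in text:
        raise ValueError("this sketch does not handle apostrophes in the search text")
    upper = ASCII_UPPER + extra_upper
    lower = ASCII_LOWER + extra_lower
    # The needle is lower-cased in Python; characters outside the table are
    # NOT folded by translate(), so add them to the table if you need them.
    return "//{0}[contains(translate(text(), '{1}', '{2}'), '{3}')]".format(
        tag, upper, lower, text.lower()
    )
```

The returned string can then be passed to the driver's XPath lookup, for example contains_ignore_case("My Button", tag="div").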


And if we lived in a world where browsers supported XPath 2.0 and up (🤞, but not happening any time soon ☹️), we could have used the functions lower-case() (though it is not fully locale-aware) and matches() (for regex searches, with the case-insensitive 'i' flag).


Answer 5

In the HTML which you have provided:

<div>My Button</div>

The text My Button is the innerHTML and has no whitespace around it, so you can easily use text() as follows:

my_element = driver.find_element_by_xpath("//div[text()='My Button']")

Note: text() selects all text node children of the context node


Text with leading/trailing spaces

In case the relevant text contains whitespace, either at the beginning:

<div>   My Button</div>

or at the end:

<div>My Button   </div>

or at both ends:

<div> My Button </div>

In these cases you have two options:

  • You can use the contains() function, which determines whether the first argument string contains the second argument string and returns boolean true or false, as follows:

    my_element = driver.find_element_by_xpath("//div[contains(., 'My Button')]")

  • You can use the normalize-space() function, which strips leading and trailing whitespace from a string, replaces sequences of whitespace characters with a single space, and returns the resulting string, as follows:

    driver.find_element_by_xpath("//div[normalize-space()='My Button']")


XPath for variable text

In case the text is a variable, you can use:

foo= "foo_bar"
my_element = driver.find_element_by_xpath("//div[.='" + foo + "']")
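
Plain string concatenation like this breaks as soon as the variable contains a quote character. XPath 1.0 has no escape character inside string literals, so one common workaround is to pick the other quote type or fall back to concat(). The helpers below are a sketch; xpath_literal and div_with_text are hypothetical names, not Selenium functions.

```python
def xpath_literal(value):
    """Return `value` as an XPath 1.0 string literal, whatever quotes it contains."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote types are present: join single-quoted pieces with a
    # double-quoted apostrophe, using concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"


def div_with_text(value):
    """Build the //div[.=...] expression from the answer for an arbitrary string."""
    return "//div[.=" + xpath_literal(value) + "]"
```

For the foo_bar example, div_with_text(foo) produces the same expression as the concatenation above.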

Answer 6

wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//*[contains(text(), 'YourTextHere')]")));
assertNotNull(driver.findElement(By.xpath("//*[contains(text(), 'YourTextHere')]")));
String yourButtonName = driver.findElement(By.xpath("//*[contains(text(), 'YourTextHere')]")).getAttribute("innerText");
assertTrue(yourButtonName.equalsIgnoreCase("YourTextHere"));

Answer 7

Similar problem: Find <button>Advanced...</button>

Maybe this will give you some ideas (please transfer the concept from Java to Python):

wait.until(ExpectedConditions.elementToBeClickable(//
    driver.findElements(By.tagName("button")).stream().filter(i -> i.getText().equals("Advanced...")).findFirst().get())).click();
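
One way the concept might carry over to Python, as a minimal sketch: collect the candidate elements, then pick the first one whose text matches. first_with_text is a hypothetical helper name; unlike the Java findFirst().get(), it returns None instead of throwing when nothing matches.

```python
def first_with_text(elements, text):
    """Return the first element whose .text equals `text`, or None if there is none."""
    return next((el for el in elements if el.text == text), None)
```

With a real driver it could be fed the list of button elements found by tag name and the string "Advanced...", typically inside or after an explicit wait, and the result clicked if it is not None.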

Answer 8

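Use driver.find_elements_by_xpath with the matches() regex function for a case-insensitive search of an element by its text. Note that matches() belongs to XPath 2.0; as the answer above about translate() points out, browsers implement XPath 1.0, so this expression is likely to be rejected when run against a real browser.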

driver.find_elements_by_xpath("//*[matches(.,'My Button', 'i')]")

Answer 9

Try this. It’s very easy:

driver.getPageSource().contains("text to search");

This really worked for me in Selenium WebDriver.


Can a website detect when you are using Selenium with chromedriver?

Question: Can a website detect when you are using Selenium with chromedriver?

I’ve been testing out Selenium with Chromedriver and I noticed that some pages can detect that you’re using Selenium even though there’s no automation at all. Even when I’m just browsing manually, using Chrome through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I’ve checked my user agent and my browser fingerprint, and they are exactly identical to those of the normal Chrome browser.

When I browse to these sites in normal chrome everything works fine, but the moment I use Selenium I’m detected.

In theory chromedriver and chrome should look literally exactly the same to any webserver, but somehow they can detect it.

If you want some test code, try this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery")
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print('arguments done')
driver.get('http://stubhub.com')

If you browse around stubhub you’ll get redirected and ‘blocked’ within one or two requests. I’ve been investigating this and I can’t figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth, I’ve noticed that the ‘fake browser’s’ requests often have ‘no-cache’ in the response header.

EDIT:

Results like this one, “Is there a way to detect that I’m in a Selenium Webdriver page from Javascript”, suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint of selenium is identical to the fingerprint when using chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

It’s identical in Selenium and in Chrome.

EDIT:

VPNs work for a single use but get detected after I load the first page. Clearly some javascript is being run to detect Selenium.


Answer 0

For Mac Users

Replacing cdc_ variable using Vim or Perl

You can use vim or, as @Vic Seedoubleyew has pointed out in the answer by @Erti-Chris Eelmaa, perl to replace the cdc_ variable in chromedriver (see the post by @Erti-Chris Eelmaa to learn more about that variable). Using vim or perl saves you from having to recompile source code or use a hex editor. Make sure to make a copy of the original chromedriver before attempting to edit it. Also, the methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim /path/to/chromedriver

After running the line above, you’ll probably see a bunch of gibberish. Do the following:

  1. Search for cdc_ by typing /cdc_ and pressing return.
  2. Enable editing by pressing a.
  3. Delete any amount of $cdc_lasutopfhvcZLmcfl and replace what was deleted with an equal number of characters. If you don’t, chromedriver will fail.
  4. After you’re done editing, press esc.
  5. To save the changes and quit, type :wq! and press return.
  6. If you don’t want to save the changes, but you want to quit, type :q! and press return.
  7. You’re done.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Using Perl

The line below replaces cdc_ with dog_:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string has the same number of characters as the search string, otherwise the chromedriver will fail.

Perl Explanation

s///g denotes that you want to search for a string and replace it globally with another string (replaces all occurrences).

e.g., s/string/replacement/g

So,

s/// denotes searching for and replacing a string.

cdc_ is the search string.

dog_ is the replacement string.

g is the global key, which replaces every occurrence of the string.

How to check if the Perl replacement worked

The following line will print every occurrence of the search string cdc_:

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

If this returns nothing, then cdc_ has been replaced.

Conversely, you can use this:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

to see if your replacement string, dog_, is now in the chromedriver binary. If it is, the replacement string will be printed to the console.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.
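
For readers who prefer Python to Perl, the same check and same-length replacement can be sketched on raw bytes. This is an illustration under assumptions, not a tested recipe for any particular chromedriver version: count_marker, replace_same_length and patch_copy are hypothetical helper names, and the written copy does not inherit the executable permission of the original.

```python
def count_marker(data: bytes, marker: bytes) -> int:
    """Count non-overlapping occurrences of `marker` in `data`."""
    return data.count(marker)


def replace_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every `old` with `new`, refusing replacements that change the size."""
    if len(old) != len(new):
        raise ValueError("replacement must be exactly as long as the search string")
    return data.replace(old, new)


def patch_copy(src_path, dst_path, old=b"cdc_", new=b"dog_"):
    """Write a patched copy of `src_path` to `dst_path`; the original is left untouched.

    Returns the number of occurrences that were replaced.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    found = count_marker(data, old)
    with open(dst_path, "wb") as dst:
        dst.write(replace_same_length(data, old, new))
    return found
```

The length check mirrors the warning above: a replacement of a different size shifts every later byte in the binary.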


Wrapping Up

After altering the chromedriver binary, make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you’ve already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, or what have you.


Answer 1

Basically, the way the Selenium detection works is that they test for predefined JavaScript variables which appear when running with Selenium. The bot detection scripts usually look for anything containing the word “selenium” / “webdriver” in any of the variables (on the window object), and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are on. All the different browsers expose different things.

For me, I used Chrome, so all that I had to do was to ensure that $cdc_ no longer existed as a document variable, and voilà (download the chromedriver source code, modify it, and recompile with $cdc_ under a different name).

this is the function I modified in chromedriver:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(Note the comment: all I did was turn $cdc_ into randomblabla_.)

Here is a pseudo-code which demonstrates some of the techniques that bot networks might use:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

According to user @szx, it is also possible to simply open chromedriver.exe in a hex editor and do the replacement manually, without actually doing any compiling.
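
To see what the /\$[a-z]dc_/ test in the pseudo-code would and would not flag, the name check can be reproduced in Python. This is only a sketch of the name-matching step: suspicious_document_keys is a hypothetical name, and the extra cache_ property check from the pseudo-code is left out.

```python
import re

# Same pattern as in the pseudo-code: "$", one lowercase letter, then "dc_"
CACHE_KEY = re.compile(r"\$[a-z]dc_")


def suspicious_document_keys(keys):
    """Return the property names that the pattern would flag."""
    # search() mirrors the unanchored behaviour of JavaScript's String.match
    return [key for key in keys if CACHE_KEY.search(key)]
```

A renamed key such as randomblabla_ no longer matches, which is the whole point of the modification described above.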


Answer 2

As we’ve already figured out in the question and the posted answers, there is an anti-web-scraping and bot detection service called “Distil Networks” in play here. And, according to the company CEO’s interview:

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It’ll take time and additional challenges to understand exactly how they are detecting Selenium, but here is what we can say for sure at the moment:

  • it’s not related to the actions you take with Selenium: once you navigate to the site, you get immediately detected and banned. I’ve tried adding artificial random delays between actions and pausing after the page is loaded; nothing helped
  • it’s not about the browser fingerprint either: I tried it in multiple browsers, with and without clean profiles, and in incognito mode; nothing helped
  • since, according to the hint in the interview, this was “reverse engineering”, I suspect this is done with some JS code executed in the browser that reveals that the browser is automated via Selenium WebDriver

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, what I haven’t experimented with is older Selenium and older browser versions. In theory, there could be something implemented in or added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If this is the case, we might detect (yeah, let’s detect the detector) at what point/version the relevant change was made and look into the changelog and changesets; maybe this could give us more information on where to look and what it is they use to detect a webdriver-powered browser. It’s just a theory that needs to be tested.


Answer 3

Example of how it’s implemented on wellsfargo.com:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

Answer 4

Result of obfuscating the JavaScript

I have checked the chromedriver source code. It injects some JavaScript files into the browser.
Every JavaScript file at this link is injected into the web pages: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

So I reverse engineered the files and obfuscated the JS by hex editing. After that I was sure that no JavaScript variables, function names or fixed strings could be used to uncover Selenium activity. But some sites and reCaptcha still detect Selenium!
Maybe they check for the modifications caused by chromedriver’s JS execution :)


Edit 1:

Chrome ‘navigator’ parameters modification

I discovered that some parameters in ‘navigator’ reveal the use of chromedriver. These are the parameters:

  • “navigator.webdriver”: in non-automated mode it is ‘undefined’; in automated mode it is ‘true’.
  • “navigator.plugins”: in headless Chrome it has a length of 0, so I added some fake elements to fool the plugin-length check.
  • “navigator.languages”: was set to the default Chrome value ‘[“en-US”, “en”, “es”]’.
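To see what a page could read from these three properties, here is a minimal Python sketch. It assumes only that `driver` has an `execute_script(js)` method (as a Selenium WebDriver does); the names `NAVIGATOR_PROBE`, `read_navigator_flags` and `looks_automated` are made up for illustration, and the heuristic is a guess at what a detector might do, not any real service's logic.

```python
# JavaScript probe that reads the three navigator properties listed above.
NAVIGATOR_PROBE = """
return {
    webdriver: navigator.webdriver,
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || [])
};
"""


def read_navigator_flags(driver):
    """Return the navigator values a detection script could inspect.

    `driver` is any object exposing execute_script(js), for example a
    Selenium WebDriver instance.
    """
    raw = driver.execute_script(NAVIGATOR_PROBE)
    return {
        "webdriver": raw.get("webdriver"),
        "plugin_count": raw.get("pluginCount"),
        "languages": list(raw.get("languages") or []),
    }


def looks_automated(flags):
    """Hypothetical heuristic: any single hint marks the session as automated."""
    return bool(flags["webdriver"]) or flags["plugin_count"] == 0
```

Running `read_navigator_flags(driver)` before and after a spoofing attempt is a quick way to check whether the values really changed from the page's point of view.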

So what I needed was a Chrome extension to run JavaScript on the web pages. I made an extension with the JS code provided in the article and used another article to add the zipped extension to my project. I successfully changed the values, but still nothing changed!

I didn’t find other variables like these, but that doesn’t mean they don’t exist. reCaptcha still detects chromedriver, so there must be more variables to change. The next step would be reverse engineering the detector services, which I don’t want to do.

Now I’m not sure whether it is worth spending more time on this automation process or whether I should look for alternative methods!


Answer 5

Try using Selenium with a specific Chrome user profile. That way you can run it as a specific user and define anything you want. When you do so, it runs as a ‘real’ user; look at the Chrome process with a process explorer and you’ll see the difference in the tags.

For example:

import os
from selenium import webdriver

username = os.getenv("USERNAME")
userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
# Raw string, so the backslashes in the Windows path are not read as escapes.
chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
# Keyword arguments as in the original answer (older Selenium releases).
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome tag list here


Answer 6

partial interface Navigator { readonly attribute boolean webdriver; };

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor’s Draft of WebDriver. This heavily implies that, at the very least, future iterations of Selenium’s drivers will be identifiable in order to prevent misuse. Ultimately, it’s hard to tell without the source code what exactly causes ChromeDriver specifically to be detectable.


Answer 7

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn’t find it in the new one except for some very vague wording in the appendices.

A test for it is in the Selenium code in the file fingerprint_test.js, where the comment at the end says “Currently only implemented in firefox”, but some simple grepping did not turn up any code in that direction, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.


Answer 8

In addition to the great answer of @Erti-Chris Eelmaa – there’s the annoying window.navigator.webdriver, and it is read-only. Even if you change its value to false, it will still read true. That’s why a browser driven by automated software can still be detected. MDN

The variable is managed by the --enable-automation flag in Chrome. chromedriver launches Chrome with that flag, and Chrome sets window.navigator.webdriver to true. You can find it here. You need to add the flag to “exclude switches”. For instance (golang):

package main

import (
    "fmt"
    "log"

    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {
    caps := selenium.Capabilities{
        "browserName": "chrome",
    }

    // Exclude the --enable-automation switch that chromedriver passes by default.
    chromeCaps := chrome.Capabilities{
        Path:            "/path/to/chrome-binary",
        ExcludeSwitches: []string{"enable-automation"},
    }
    caps.AddChrome(chromeCaps)

    wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
    if err != nil {
        log.Fatal(err)
    }
    defer wd.Quit()
}
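For readers not using Go, the same idea can be shown as the raw capabilities that end up being sent to chromedriver. This is only a sketch in Python using the standard library: the helper names are hypothetical, and it assumes chromedriver's `goog:chromeOptions` capability with its `excludeSwitches` and `binary` fields, which is what client libraries normally fill in for you.

```python
import json


def chrome_capabilities(binary_path=None, exclude_switches=("enable-automation",)):
    """Build a capabilities dict that asks chromedriver to drop some switches."""
    chrome_options = {"excludeSwitches": list(exclude_switches)}
    if binary_path:
        # Optional path to a specific Chrome binary.
        chrome_options["binary"] = binary_path
    return {"browserName": "chrome", "goog:chromeOptions": chrome_options}


def new_session_payload(capabilities):
    """Wrap the capabilities in the W3C "new session" request body."""
    return json.dumps({"capabilities": {"alwaysMatch": capabilities}})
```

Printing `new_session_payload(chrome_capabilities("/path/to/chrome-binary"))` shows the JSON body; in practice you would let your Selenium bindings build and send it rather than posting it by hand.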

Answer 9

It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work. In reality, what you are asking is how to evade bot detection. That is not what Selenium WebDriver is for. It is for testing your own web application, not for hitting other web applications. It is possible, but basically you’d have to look at what a WAF looks for in its rule set and specifically avoid it with Selenium if you can. Even then, it might still not work, because you don’t know which WAF they are using. You took the right first step, which is faking the user agent. If that didn’t work, though, then a WAF is in place and you probably need to get trickier.

Edit: Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.


Answer 10

Even if you are sending all the right data (e.g. Selenium doesn’t show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it’ll also help you verify whether there are any specific parameters that indicate you’re running in Selenium.


Answer 11

The bot detection I’ve seen seems more sophisticated or at least different than what I’ve read through in the answers below.

EXPERIMENT 1:

  1. I open a browser and web page with Selenium from a Python console.
  2. The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
  3. I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
  4. I press the left mouse button again (remember, cursor is above a given link).
  5. The link opens normally, as it should.

EXPERIMENT 2:

  1. As before, I open a browser and the web page with Selenium from a Python console.

  2. This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.

  3. The link doesn’t open, but I am taken to a sign up page.

IMPLICATIONS:

  • opening a web browser via Selenium doesn’t preclude me from appearing human
  • moving the mouse like a human is not necessary to be classified as human
  • clicking something via Selenium with an offset still raises the alarm

Seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don’t care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? Would be interesting to hear if anyone has any insights.


Answer 12

One more thing I found is that some websites use a platform that checks the User Agent. If the value contains “HeadlessChrome”, the behavior can be weird when using headless mode.

The workaround is to override the user agent value, for example in Java:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
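A rough Python equivalent, for anyone on the Python bindings. The helper names are made up for this sketch; it only builds the `--user-agent=` switch string from whatever UA the headless browser reports, swapping the “HeadlessChrome” token for “Chrome”.

```python
def strip_headless_marker(user_agent):
    """Replace the "HeadlessChrome" token with "Chrome" in a UA string."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def user_agent_argument(user_agent):
    """Build the Chrome command-line switch that overrides the UA string."""
    return "--user-agent=" + strip_headless_marker(user_agent)
```

The resulting string can be passed to `options.add_argument(...)` on a `ChromeOptions` object, the same way the Java snippet uses `addArguments`.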

Answer 13

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}
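The commented-out lines in that snippet hide the marker names behind decodeURIComponent. As a quick check, here is a small Python sketch that decodes the same percent-encoded strings with the standard library; `ENCODED_MARKERS` and `decode_markers` are names invented for this example.

```python
from urllib.parse import unquote

# Percent-encoded strings copied from the commented-out lines above.
ENCODED_MARKERS = [
    "%77%65%62%64%72%69%76%65%72",
    "%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72",
    "%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E",
]


def decode_markers(encoded):
    """Decode each percent-encoded marker into its plain-text name."""
    return [unquote(item) for item in encoded]


if __name__ == "__main__":
    for name in decode_markers(ENCODED_MARKERS):
        print(name)
```

The decoded names match the plain-text checks in the uncommented lines, so the encoding is only there to make the script harder to search for.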

Answer 14

Write an HTML page with the following code. You will see that Selenium applies a webdriver attribute to the DOM, visible in the outerHTML.

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

Answer 15

I’ve found changing the javascript “key” variable like this:

//Fools the website into believing a human is navigating it
        ((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

works for some websites when using Selenium WebDriver along with Google Chrome, since many sites check for this variable in order to avoid being scraped by Selenium.


Answer 16

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

But since this is a Selenium-only problem, it’s better just to use something else. Selenium is supposed to make things like this easier, not way harder.


Answer 17

You can try to use the parameter “enable-automation”

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

But I want to warn you that this ability was fixed in ChromeDriver 79.0.3945.16, so you should probably use an older version of Chrome.

Also, as another option, you can try using InternetExplorerDriver instead of Chrome. In my experience, IE is not blocked at all, even without any hacks.

For more info, take a look here:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76


Get HTML source of WebElement in Selenium WebDriver using Python

Question: Get HTML source of WebElement in Selenium WebDriver using Python

I’m using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with…

wd.page_source

But is there any way to get the “element source”?

elem.source   # <-- returns the HTML as a string

The selenium webdriver docs for Python are basically non-existent and I don’t see anything in the code that seems to enable that functionality.

Any thoughts on the best way to access the HTML of an element (and its children)?


Answer 0

You can read the innerHTML attribute to get the source of the element’s content, or outerHTML to get the source including the current element.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JS:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

Tested and works with the ChromeDriver.


Answer 1

There is not really a straightforward way of getting the HTML source code of a webelement. You will have to use JS. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element); 

Answer 2

Sure, we can get all the HTML source code with the script below in Selenium Python:

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

If you want to save it to a file:

# Open the file with an explicit encoding and write the string directly (Python 3).
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

I suggest saving to a file because the source code is very long.


Answer 3

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.


Answer 4

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).

The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the String of an element, element.attribute(String).


Answer 5

This may look outdated, but I’ll leave it here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0)


Answer 6

Java with Selenium 2.53.0

driver.getPageSource();

Answer 7

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

The Java method described there is:

java.lang.String    getText() 

But unfortunately it’s not available in Python. So you can translate the method names to Python from Java and try another logic using present methods without getting the whole page source…

E.g.

 my_id = elem[0].get_attribute('my-id')

Answer 8

This works seamlessly for me.

element.get_attribute('innerHTML')

Answer 9

innerHTML returns the markup inside the selected element, while outerHTML returns that inner markup together with the selected element itself.

Example :- Now suppose your Element is as below

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML element Output

<td>A</td><td>B</td>

outerHTML element Output

<tr id="myRow"><td>A</td><td>B</td></tr>

Live Example :-

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below you will find the syntax required by the different bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want the whole page HTML, use the code below :-

driver.getPageSource();
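The innerHTML/outerHTML choice above can be wrapped in a tiny Python helper. This is just a sketch: `element_html` is a hypothetical name, and it assumes only that `element` has a `get_attribute(name)` method, as a Selenium WebElement does in the Python bindings.

```python
def element_html(element, include_self=False):
    """Return an element's markup as a string.

    With include_self=False the element's children are returned (innerHTML);
    with include_self=True the element's own tag is included (outerHTML).
    """
    name = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(name)
```

For the `<tr id="myRow">` example, `element_html(row)` would give the two `<td>` cells and `element_html(row, include_self=True)` would give the whole row.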

Answer 10

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return      arguments[0].innerHTML;", element); 

This code really works to get JavaScript from source as well!


Answer 11

And in PHPUnit selenium test it’s like this:

$text = $this->byCssSelector('.some-class-nmae')->attribute('innerHTML');

Answer 12

If you are interested in a solution for Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

Answer 13

The method I prefer for getting the rendered HTML is the following:

driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

However, the above method removes all the tags (including nested tags) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.

# The Python bindings name this method get_attribute, not getAttribute.
print(body_html.get_attribute("innerHTML"))

Selenium using Python - Geckodriver executable needs to be in PATH

Question: Selenium using Python - Geckodriver executable needs to be in PATH

I’m new to programming; I started with Python about 2 months ago and am going through Sweigart’s Automate the Boring Stuff with Python. I’m using IDLE and have already installed the selenium module and the Firefox browser. Whenever I try to run the webdriver function, I get this:

from selenium import webdriver
browser = webdriver.Firefox()

Exception :-

Exception ignored in: <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x00000249C0DA1080>>
Traceback (most recent call last):
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 163, in __del__
    self.stop()
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 135, in stop
    if self.process is None:
AttributeError: 'Service' object has no attribute 'process'
Exception ignored in: <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x00000249C0E08128>>
Traceback (most recent call last):
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 163, in __del__
    self.stop()
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 135, in stop
    if self.process is None:
AttributeError: 'Service' object has no attribute 'process'
Traceback (most recent call last):
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 64, in start
    stdout=self.log_file, stderr=self.log_file)
  File "C:\Python\Python35\lib\subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "C:\Python\Python35\lib\subprocess.py", line 1224, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    browser = webdriver.Firefox()
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 135, in __init__
    self.service.start()
  File "C:\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 71, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH. 

I think I need to set the path for geckodriver, but I’m not sure how. Can anyone tell me how I would do this?


Answer 0

selenium.common.exceptions.WebDriverException: Message: ‘geckodriver’ executable needs to be in PATH.

First, download the latest geckodriver executable from https://github.com/mozilla/geckodriver/releases so that Selenium can drive a current version of Firefox.

The Selenium client bindings try to locate the geckodriver executable on the system PATH, so you need to add the directory containing the executable to that path.

  • On Unix systems with a bash-compatible shell, you can append it to the search path like this:

    export PATH=$PATH:/path/to/directory/of/executable/downloaded/in/previous/step

  • On Windows, update the Path system variable to include the full path of the directory that contains geckodriver, either manually or from the command line. (Don’t forget to restart your system afterwards so that the change takes effect.) The principle is the same as on Unix.

Now you can run your code exactly as before:

from selenium import webdriver

browser = webdriver.Firefox()

selenium.common.exceptions.WebDriverException: Message: Expected browser binary location, but unable to find binary in default location, no ‘moz:firefoxOptions.binary’ capability provided, and no binary flag set on the command line

The exception says that Selenium looked for Firefox in its default location and could not find it, which means Firefox is installed somewhere else. Pass the location of the installed Firefox binary explicitly:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

binary = FirefoxBinary('path/to/installed firefox binary')
browser = webdriver.Firefox(firefox_binary=binary)
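
Before changing anything, it can help to confirm what Selenium will actually find. The sketch below is not part of the original answer. It uses Python’s standard shutil.which to look up geckodriver on the search path, which is roughly the lookup the error message refers to. The helper name find_driver is made up for illustration.

```python
import shutil


def find_driver(name="geckodriver", path=None):
    """Return the full path of `name` if it is on the search path, else None.

    `path` is an optional PATH-style string; None means use the real PATH.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver is not on PATH; webdriver.Firefox() would fail to start it")
    else:
        print("geckodriver found at", location)
```

If this prints a location but Selenium still fails, the script is probably running in a different environment (another shell, IDE, or service) with a different PATH.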

Answer 1

This solved it for me.

from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'your\path\geckodriver.exe')
driver.get('http://inventwithpython.com')
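
A note for readers on newer releases: the executable_path and firefox_binary keyword arguments shown in these answers belong to the Selenium 2/3 API. In Selenium 4 they were deprecated, and later 4.x releases removed them in favour of Service and Options objects. The following is a minimal sketch assuming Selenium 4 is installed. The function name make_firefox and both path arguments are placeholders, not something from the original answers.

```python
def make_firefox(driver_path, firefox_binary=None):
    """Start Firefox with an explicit geckodriver path (Selenium 4 style).

    Imports are local so that merely defining this helper does not
    require Selenium to be installed.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    if firefox_binary is not None:
        # Only needed when Firefox is not installed in a default location.
        options.binary_location = firefox_binary

    # Service replaces the old executable_path keyword of webdriver.Firefox.
    service = Service(executable_path=driver_path)
    return webdriver.Firefox(service=service, options=options)
```

Usage would look like driver = make_firefox(r'your\path\geckodriver.exe'), followed by the same driver.get(...) call as above.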

Answer 2

These steps solved it for me on Ubuntu with Firefox 50.

  1. Download geckodriver

  2. Copy geckodriver into /usr/local/bin

You do NOT need to add the following:

firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
firefox_capabilities['binary'] = '/usr/bin/firefox'
browser = webdriver.Firefox(capabilities=firefox_capabilities)

Answer 3

The answer by @saurabh solves the issue, but doesn’t explain why Automate the Boring Stuff with Python doesn’t include those steps.

This is because the book is based on selenium 2.x, and the Firefox driver for that series does not need the gecko driver. The Gecko interface for driving the browser was not available when that series was being developed.

The latest version in the selenium 2.x series is 2.53.6 (see e.g. this answer for an easier view of the versions).

The 2.53.6 version page doesn’t mention gecko at all, but since version 3.0.2 the documentation has explicitly stated that you need to install the gecko driver.

If after an upgrade (or install on a new system), your software that worked fine before (or on your old system) doesn’t work anymore and you are in a hurry, pin the selenium version in your virtualenv by doing

pip install selenium==2.53.6

Of course, the long-term solution for development is to set up a new virtualenv with the latest version of selenium, install the gecko driver and test whether everything still works as expected. The major version bump might introduce other API changes that are not covered by your book, though, so you might want to stick with the older selenium until you are confident that you can fix any discrepancies between the selenium2 and selenium3 APIs yourself.
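
To see which side of that split an environment is on, you can look at the installed version string. The sketch below is a simplification and not from the original answer. It assumes a plain dotted version such as "2.53.6" and treats every release from major version 3 onward as needing geckodriver, which matches the explanation above but ignores any legacy-driver switches. The function name is hypothetical.

```python
def needs_geckodriver(selenium_version):
    """Return True if this Selenium version is assumed to need geckodriver.

    Assumption: a dotted version string whose first field is the major
    version; 2.x uses the built-in Firefox driver, 3 and later do not.
    """
    major = int(selenium_version.split(".")[0])
    return major >= 3
```

With selenium installed, calling needs_geckodriver(selenium.__version__) after import selenium tells you whether the pinned 2.53.6 behaviour still applies.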


Answer 4

On macOS with Homebrew already installed you can simply run the Terminal command

$ brew install geckodriver

Because Homebrew has already extended the PATH, there is no need to modify any startup scripts.


Answer 5

To set up geckodriver for Selenium Python:

Pass the geckodriver path to the Firefox driver, as in the code below (note the raw string, so that the backslashes are not treated as escapes):

self.driver = webdriver.Firefox(executable_path=r'D:\Selenium_RiponAlWasim\geckodriver-v0.18.0-win64\geckodriver.exe')

Download geckodriver for your OS from https://github.com/mozilla/geckodriver/releases, extract it into a folder of your choice, and set the path as shown above.

I’m using Python 3.6.2 and Selenium WebDriver 3.4.3 in Windows 10.

Another way to set up geckodriver:

i) Simply paste the geckodriver.exe under /Python/Scripts/ (In my case the folder was: C:\Python36\Scripts)
ii) Now write the simple code as below:

self.driver = webdriver.Firefox()

Answer 6

If you are using Anaconda, all you have to do is activate your virtual environment and then install geckodriver using the following command:

    conda install -c conda-forge geckodriver

Answer 7

Ubuntu 18.04+ and the newest release of geckodriver

This should work for other *nix varieties as well.

export GV=v0.26.0
wget "https://github.com/mozilla/geckodriver/releases/download/$GV/geckodriver-$GV-linux64.tar.gz"
tar xvzf geckodriver-$GV-linux64.tar.gz 
chmod +x geckodriver
sudo cp geckodriver /usr/local/bin/

For macOS, change the archive name to:

geckodriver-$GV-macos.tar.gz
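
The same URL pattern can be built in Python if you would rather script the download there. This is only a sketch of the naming scheme used in the shell commands above. The function name is made up, and it only covers the two .tar.gz platforms mentioned in this answer (linux64 and macos).

```python
def geckodriver_url(version, platform):
    """Build the GitHub release URL for a geckodriver .tar.gz archive.

    `version` is a tag such as "v0.26.0"; `platform` is the file-name
    suffix, e.g. "linux64" or "macos" (both taken from the answer above).
    """
    base = "https://github.com/mozilla/geckodriver/releases/download"
    return f"{base}/{version}/geckodriver-{version}-{platform}.tar.gz"
```

Keeping the version in one argument avoids the mistake of mixing two different versions in the directory and file parts of the URL.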

Answer 8

I see the discussion still covers the old way of setting up geckodriver: downloading the binary and configuring the path manually.

This can be done automatically with webdriver-manager:

pip install webdriver-manager

Now the code in the question works with just the following change:

from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())

Answer 9

The easiest way for Windows!
Download the latest version of geckodriver from here. Add the geckodriver.exe file to the Python directory (or any other directory that is already in PATH). This should solve the problem (tested on Windows 10).


Answer 10

Steps for macOS:

The simple solution is to download GeckoDriver and add it to your system PATH. You can use either of the two approaches:

Short Method:

1) Download and unzip Geckodriver.

2) Pass the path when creating the driver:

driver = webdriver.Firefox(executable_path='/your/path/to/geckodriver')

Long Method:

1) Download and unzip Geckodriver.

2) Open .bash_profile. If you haven’t created it yet, you can do so with the command touch ~/.bash_profile. Then open it with open ~/.bash_profile

3) Assuming the geckodriver file is in your Downloads folder, add the following lines to .bash_profile. PATH entries are directories, so the entry is the folder that contains the file:

PATH="/Users/<your-name>/Downloads:$PATH"
export PATH

This prepends the folder containing GeckoDriver to your system PATH, which tells the system where to find GeckoDriver when it runs your Selenium scripts.

4) Save .bash_profile and reload it. This applies the values immediately, without a reboot. To do this, run the following command:

source ~/.bash_profile

5) That’s it. You can now run your Python script.


Answer 11

Some additional input/clarification for future readers of this thread:

The following is enough to resolve it on Windows 7, Python 3.6, selenium 3.11:

@dsalaj’s earlier note in this thread for Unix applies to Windows as well; you can avoid changing the PATH environment variable at the Windows level and restarting the system.

(1) Download geckodriver (as described earlier in this thread) and place the unzipped geckodriver.exe in X:\Folder\of\your\choice

(2) Python code sample:

import os
os.environ["PATH"] += os.pathsep + r'X:\Folder\of\your\choice'

from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://localhost:8000')
assert 'Django' in browser.title

Notes: (1) It may take about 10 seconds for the above code to open up the Firefox browser for the specified url.
(2) The python console would show the following error if there’s no server already running at the specified url or serving a page with the title containing the string ‘Django’: selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=connectionFailure&u=http%3A//localhost%3A8000/&c=UTF-8&f=regular&d=Firefox%20can%E2%80%9
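
The os.environ change above can be wrapped in a small helper so that repeated runs do not keep growing PATH. This is a sketch rather than the answer’s own code, and the name add_to_path is invented. It prepends instead of appending, on the assumption that you want this copy of geckodriver to win over any other on PATH, and the change only lasts for the current Python process.

```python
import os


def add_to_path(directory, env=None):
    """Prepend `directory` to PATH for the current process only.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Nothing is persisted, and a directory that is already listed is not
    added a second time.
    """
    if env is None:
        env = os.environ
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]
```

Call add_to_path(r'X:\Folder\of\your\choice') before creating webdriver.Firefox(), in the same place as the os.environ line in the sample above.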


Answer 12

I’ve actually discovered you can use the latest geckodriver without putting it in the system path. Currently I’m using

https://github.com/mozilla/geckodriver/releases/download/v0.12.0/geckodriver-v0.12.0-win64.zip

Firefox 50.1.0

Python 3.5.2

Selenium 3.0.2

Windows 10

I’m running a VirtualEnv (which I manage using PyCharm; I assume it uses pip to install everything).

In the following code I pass a specific path for geckodriver using the executable_path parameter (I discovered this by looking in Lib\site-packages\selenium\webdriver\firefox\webdriver.py). I suspect that the order of the arguments when calling the webdriver matters, which is why executable_path comes last in my code (second-to-last line, off to the far right).

You may also notice that I use a custom Firefox profile to get around the sec_error_unknown_issuer problem that you will run into if the site you’re testing has an untrusted certificate. See “How to disable Firefox’s untrusted connection warning using Selenium?”

After investigation, it turned out that the Marionette driver is incomplete and still in progress, and no amount of setting capabilities or profile options for dismissing or accepting certificates was going to work. So it was just easier to use a custom profile.

Anyway, here’s the code showing how I got geckodriver to work without it being on the path:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True

#you probably don't need the next 3 lines they don't seem to work anyway
firefox_capabilities['handleAlerts'] = True
firefox_capabilities['acceptSslCerts'] = True
firefox_capabilities['acceptInsecureCerts'] = True

#In the next line I'm using a specific FireFox profile because
# I wanted to get around the sec_error_unknown_issuer problems with the new Firefox and Marionette driver
# I create a FireFox profile where I had already made an exception for the site I'm testing
# see https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles#w_starting-the-profile-manager

ffProfilePath = r'D:\Work\PyTestFramework\FirefoxSeleniumProfile'
profile = webdriver.FirefoxProfile(profile_directory=ffProfilePath)
geckoPath = r'D:\Work\PyTestFramework\geckodriver.exe'
browser = webdriver.Firefox(firefox_profile=profile, capabilities=firefox_capabilities, executable_path=geckoPath)
browser.get('http://stackoverflow.com')

Answer 13

I’m using Windows 10 and this worked for me:

  1. Download geckodriver from here. Download the right version for the computer you are using.
  2. Unzip the file you just downloaded and cut/copy the “.exe” file it contains
  3. Navigate to C:{your python root folder}. Mine was C:\Python27. Paste the geckodriver.exe file in this folder.
  4. Restart your development environment.
  5. Try running the code again, it should work now.

Answer 14

Consider installing a containerized Firefox:

docker pull selenium/standalone-firefox
docker run --rm -d -p 5555:4444 --shm-size=2g selenium/standalone-firefox

Connect using webdriver.Remote:

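from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote('http://localhost:5555/wd/hub', DesiredCapabilities.FIREFOX)
driver.set_window_size(1280, 1024)
driver.get('https://toolbox.googleapps.com/apps/browserinfo/')
driver.save_screenshot('info.png')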
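
The container takes a moment to start, so connecting immediately after docker run can fail. One way to handle that, sketched here as an assumption and not as part of the original answer, is to wait until the published port (5555 in the command above) accepts TCP connections. The helper name wait_for_port is hypothetical. An open port only shows that something is listening, not that the Selenium server inside has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, call wait_for_port('localhost', 5555) and only create webdriver.Remote(...) if it returns True.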

Answer 15

It’s really rather sad that none of the books published on Selenium/Python, and few of the comments on this issue found via Google, clearly explain the pathing logic for setting this up on a Mac (everything is Windows!). The YouTube videos all pick up “after” you’ve got the path set up (in my mind, the cheap way out!). So, for you wonderful Mac users, use the following to edit your bash path files:

$ touch ~/.bash_profile; open ~/.bash_profile

Then add paths something like this:

# Setting PATH for geckodriver
PATH="/usr/bin/geckodriver:${PATH}"
export PATH

# Setting PATH for Selenium firefox
PATH="~/Users/yourNamePATH/VEnvPythonInterpreter/lib/python2.7/site-packages/selenium/webdriver/firefox/:${PATH}"
export PATH

# Setting PATH for executable on firefox driver
PATH="/Users/yournamePATH/VEnvPythonInterpreter/lib/python2.7/site-packages/selenium/webdriver/common/service.py:${PATH}"
export PATH

This worked for me. My concern is when will the Selenium Windows community start playing the real game and include us Mac users into their arrogant club membership.


Answer 16

Selenium answers this question in their DESCRIPTION.rst

Drivers
=======

Selenium requires a driver to interface with the chosen browser. Firefox,
for example, requires `geckodriver <https://github.com/mozilla/geckodriver/releases>`_, which needs to be installed before the below examples can be run. Make sure it's in your `PATH`, e. g., place it in `/usr/bin` or `/usr/local/bin`.

Failure to observe this step will give you an error `selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

Basically just download the geckodriver, unpack it and move the executable to your /usr/bin folder


Answer 17

For Windows users

Use the original code as is:

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://www.google.com")

Then download the driver from mozilla/geckodriver.

Place it in a fixed, permanent location. As an example, I put it in:

C:\Python35

Then go to the environment variables of the system, in the grid of “System variables” look for Path variable and add:

;C:\Python35\geckodriver

geckodriver, not geckodriver.exe.


Answer 18

On Raspberry Pi I had to create from ARM driver and set the geckodriver and log path in:

sudo nano /usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py

def __init__(self, firefox_profile=None, firefox_binary=None,
             timeout=30, capabilities=None, proxy=None,
             executable_path="/PATH/gecko/geckodriver",
             firefox_options=None,
             log_path="/PATH/geckodriver.log"):

Answer 19

If you use a virtual environment on Windows 10 (it may be the same on other systems), you just need to put geckodriver.exe into the following folder in your virtual environment directory:

…\my_virtual_env_directory\Scripts\geckodriver.exe


Answer 20

from webdriverdownloader import GeckoDriverDownloader # vs ChromeDriverDownloader vs OperaChromiumDriverDownloader
gdd = GeckoDriverDownloader()
gdd.download_and_install()
#gdd.download_and_install("v0.19.0")

This will give you the path to your geckodriver.exe on Windows.

from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\Users\username\bin\geckodriver.exe')
driver.get('https://www.amazon.com/')

Answer 21

This works for me on Mac 10.12.1 with Python 2.7.10 :)

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

def download(url):
    firefox_capabilities = DesiredCapabilities.FIREFOX
    firefox_capabilities['marionette'] = True
    browser = webdriver.Firefox(capabilities=firefox_capabilities,
                                executable_path=r'/Users/Do01/Documents/crawler-env/geckodriver')
    browser.get(url)
    return browser.page_source

Answer 22

I am using Windows 10 and Anaconda2. I tried setting the system path variable, but that didn’t work. Then I simply added the geckodriver.exe file to the Anaconda2/Scripts folder, and everything works great now. For me the path was:

C:\Users\Bhavya\Anaconda2\Scripts


Answer 23

If you want to add the driver paths on windows 10:

  1. Right click on the “This PC” icon and select “Properties”

  2. Click on “Advanced System Settings”

  3. Click on “Environment Variables” at the bottom of the screen
  4. In the “User Variables” section highlight “Path” and click “Edit”
  5. Add the path to the variable by clicking “New”, typing in the path of the driver you are adding, and pressing Enter.
  6. Once you are done entering the path, click “OK”
  7. Keep clicking “OK” until you have closed out all the screens

Answer 24

Visit Gecko Driver and get the URL for the gecko driver from the Downloads section.

Clone this repo: https://github.com/jackton1/script_install.git

cd script_install

Run

./installer --gecko-driver https://github.com/mozilla/geckodriver/releases/download/v0.18.0/geckodriver-v0.25.0-linux64.tar.gz

Answer 25

  1. Ensure you have the correct version of the driver (geckodriver), x86 or 64-bit.
  2. Ensure you are checking the right environment. For example, the job may be running in a Docker container while the environment you are checking is the host OS.

Answer 26

For me it was enough just to install geckodriver in the same environment:

$ brew install geckodriver

and the code did not change:

from selenium import webdriver
browser = webdriver.Firefox()

Answer 27

To add my 5 cents: it is also possible to run echo $PATH (Linux) and just move geckodriver to one of the listed folders of your liking. If the target is a system folder (not one in a virtual environment), the driver becomes globally accessible.


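InstaPy - 📷 Instagram Bot - Tool for automated Instagram interactions

InstaPy

Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram. Implemented in Python using the Selenium module.

Twitter of InstaPy | Twitter of Tim | Discord Channel | How it works (FreeCodingCamp) |
Talk about automating your Instagram | Talk about doing Open-Source work | Listen to the “Talk Python to me”-Episode

Newsletter: Sign Up for the Newsletter here!
Official Video Guide: Get it here!
Guide to Bot Creation: Learn to Build your own Bots with the Creators of InstaPy
Our Hands-On Data Visualization Workshop: Learn to create insightful Visualizations from Scratch!

Learn Automation from scratch: The School of Automation
Learn the skills to build your own InstaPy: Automating Social Media Interactions

Find the complete documentation at InstaPy.org

Credits

Community

An active and supportive community is what every open-source project needs to sustain itself. Together we have reached every continent and most countries in the world!
Thank you all for being part of the InstaPy community ✌️

Contributors

This project exists thanks to all the people who contribute. [Contribute]

Backers

Thank you to all our backers! 🙏 [Become a backer]


Disclaimer: Please note that this is a research project. I am by no means responsible for any usage of this tool. Use it on your behalf. I’m also not responsible if your accounts get banned due to the extensive use of this tool.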

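Selenium - A browser automation framework and ecosystem

Selenium is an umbrella project encapsulating a variety of tools and libraries that enable web browser automation. Selenium specifically provides an infrastructure for the W3C WebDriver specification, a platform- and language-neutral coding interface compatible with all major web browsers.

The project is made possible by volunteer contributors who have generously donated thousands of hours to code development and upkeep.

Selenium’s source code is made available under the Apache 2.0 license.

Documentation

Narrative documentation:

API documentation:

Pull Requests

Please read CONTRIBUTING.md before submitting your pull requests.

Requirements

  • Bazelisk, a Bazel wrapper that automatically downloads the version of Bazel specified in the .bazelversion file and transparently passes all command-line arguments through to the real Bazel binary
  • The latest version of the Java 11 OpenJDK
  • java and jar on the PATH (make sure you use the java executable from the JDK, not the JRE)
    • To test this, try running the command javac. This command won’t exist if you only have the JRE installed. If you’re met with a list of command-line options, you’re referencing the JDK properly
  • Python 3.7+
  • python on the PATH
  • The tox automation project for Python: pip install tox
  • macOS users should have the latest version of Xcode installed, including the command-line tools. The following command should work:
xcode-select --install
  • Users of Apple Silicon Macs should add build --host_platform=//:rosetta to their .bazelrc.local file. We are working to make sure this isn’t required in the long run
  • Windows users should have the latest version of the Visual Studio command line tools and build tools installed
    • The BAZEL_VS environment variable should point to the location of the build tools, e.g. C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools
    • The BAZEL_VC environment variable should point to the location of the command line tools, e.g. C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC
    • The BAZEL_VC_FULL_VERSION environment variable should contain the version of the installed command line tools, e.g. 14.27.29110

Optional Requirements

  • Ruby 2.0

Internet Explorer Driver

If you plan to compile the IE driver, you also need:

The build will work on any platform, but the tests for IE will be skipped silently if you are not building on Windows.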

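Building

Bazel

Bazel was built by the fine folks at Google. Bazel manages dependency downloads, generates the Selenium binaries, executes tests, and does it all rather quickly.

More detailed instructions for getting Bazel running are below, but if you can successfully build the java and javascript folders without errors, you can be confident that you have the correct binaries on your system.

Before Building

Ensure that you have Firefox installed and the latest geckodriver on your $PATH. You may have to update this from time to time.

Common Build Targets

To build the most commonly used modules of Selenium from source, execute this command from the root project folder:

bazel build java/...

If you have some extra time on your hands, you can run this command to get extra confidence that your build is successful. This will do a lot more work to build all the javascript artifacts:

bazel build java/... javascript/...

If you’re making changes to the java/ or javascript/ folders in this project, and this command executes without errors, you should be able to create a PR of your changes. (See also CONTRIBUTING.md)

Build Details

  • Bazel files are called BUILD.bazel
  • crazyfun build files are called build.desc. This is an older build system, still in use in the project mostly for the Ruby bindings

The order in which the modules are built is determined by the build system. If you want to build an individual module (assuming all dependent modules have previously been built), try the following:

bazel test javascript/atoms:test

In this case, javascript/atoms is the module directory, and test is a target in that directory’s BUILD.bazel file.

As you see build targets scroll past in the log, you may want to run them individually.

Common Tasks (Bazel)

To build the bulk of the Selenium binaries from source, run the following command from the root folder:

bazel build java/... javascript/...

To build the grid deployment jar, run this command:

bazel build grid

To run tests within a particular area of the project, use the “test” command, followed by the folder or target. Tests are tagged as “small”, “medium”, or “large”, and can be filtered with the --test_size_filters option:

bazel test --test_size_filters=small,medium java/...

Bazel’s “test” command will run the tests in the package, including integration tests. Expect test java/... to launch browsers and consume a considerable amount of time and resources.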

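Editing Code

Most of the team use either IntelliJ IDEA or VS.Code for their day-to-day editing. If you’re working in IntelliJ, we highly recommend installing the Bazel IJ plugin, which is documented on its own site.

If you do use IntelliJ and the Bazel plugin, there is a project view checked into the tree in scripts/ij.bazelproject, which will make it easier to get up and running and editing code :)

Tour

The code base is generally segmented around the languages used to write the components. Selenium makes extensive use of JavaScript, so let’s start there. Working on the JavaScript is easy. First of all, start the development server:

bazel run debug-server

Now navigate to http://localhost:2310/javascript. You’ll find the contents of the javascript/ directory being shown. We use the Closure Library for developing much of the JavaScript, so now navigate to http://localhost:2310/javascript/atoms/test.

The tests in this directory are files with names ending in _test.html. Click on one to load the page and run the test.

Maven POM files

Here is the public Selenium Maven repository.

Build Output

bazel creates a top-level group of directories, with the bazel- prefix on each directory.

Help with go

More general, but basic, help for go:

./go --help

go is just a wrapper around Rake, so you can use the standard commands such as rake -T to get more information about available targets.

Maven per se

If it is not clear already, Selenium is not built with Maven. It is built with bazel, though that is invoked with go as outlined above, so you do not really have to learn too much about that.

That said, it is possible to relatively quickly build Selenium pieces for Maven to use. You are only really going to want to do this when you are testing the cutting edge of Selenium development (which we welcome) against your application. Here is the quickest way to build and deploy into your local maven repository (~/.m2/repository), while skipping Selenium’s own tests:

./go maven-install

The maven jars should now be in your local ~/.m2/repository.

Useful Resources

Refer to the Build Instructions wiki page for the last word on building the bits and pieces of Selenium.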

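Running Browser Tests on Linux

In order to run browser tests, you first need to install the browser-specific drivers, such as geckodriver, chromedriver, or edgedriver. These need to be on your PATH.

By default, Bazel runs these tests in your current X-server UI. If you prefer, you can alternatively run them in a virtual or nested X-server.

  1. Run the X server Xvfb :99 or Xnest :99
  2. Run a window manager, for example, DISPLAY=:99 jwm
  3. Run the tests you are interested in:
bazel test --test_env=DISPLAY=:99 //java/... --test_tag_filters=chrome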
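
If you drive these runs from a script, the command from step 3 can be assembled as an argument list. This is only an illustrative sketch. The function name and its defaults are assumptions, and the flags are exactly the ones shown in the command above.

```python
def bazel_test_command(target="//java/...", display=":99", tag="chrome"):
    """Return the argument list for the Bazel test invocation from step 3."""
    return [
        "bazel",
        "test",
        f"--test_env=DISPLAY={display}",  # display of the Xvfb/Xnest server
        target,
        f"--test_tag_filters={tag}",  # restrict the run to one browser tag
    ]
```

Pass the result to subprocess.run(..., check=True) from the project root once the X server and window manager from steps 1 and 2 are running.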

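An easy way to run tests in a virtual X-server is to use Bazel’s --run_under functionality:

bazel test --run_under="xvfb-run -a" //java/... --test_tag_filters=chrome

Bazel Installation/Troubleshooting

MacOS

bazelisk

Bazelisk is a Mac-friendly launcher for Bazel. To install, follow these steps:

brew tap bazelbuild/tap && \
brew uninstall bazel; \
brew install bazelbuild/tap/bazelisk

Xcode

If you’re getting errors that mention Xcode, you’ll need to install the command-line tools.

Bazel for Mac requires some additional steps to configure properly. First things first: use the Bazelisk project (courtesy of philwo), a pure Golang implementation of Bazel. In order to install Bazelisk, first verify that your Xcode will cooperate by executing the following command:

xcode-select -p

If the value is /Applications/Xcode.app/Contents/Developer/, you can proceed with the bazelisk installation. If, however, the return value is /Library/Developer/CommandLineTools/, you’ll need to redirect the Xcode system to the correct value.

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer/
sudo xcodebuild -license

The first command will prompt you for a password. The second step requires you to read a new Xcode license, and then accept it by typing “agree”.

(Thanks to this thread for these steps)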
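
The xcode-select check above can also be scripted. The sketch below is not part of the README and its function names are hypothetical. It encodes the one rule stated above, that a developer directory of /Library/Developer/CommandLineTools means a redirect is needed, and the second helper only works on macOS, where xcode-select exists.

```python
import subprocess

# Value that, per the text above, means xcode-select must be redirected.
CLT_ONLY = "/Library/Developer/CommandLineTools"


def xcode_needs_redirect(developer_dir):
    """Return True if the `xcode-select -p` output points at the standalone CLT."""
    return developer_dir.strip().rstrip("/") == CLT_ONLY


def current_developer_dir():
    """Run `xcode-select -p` (macOS only) and return its raw output."""
    result = subprocess.run(
        ["xcode-select", "-p"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Mac, xcode_needs_redirect(current_developer_dir()) returning True means the two sudo commands above still need to be run.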