标签归档:selenium-chromedriver

错误消息:“’chromedriver’可执行文件必须在路径中可用”

问题:错误消息:“’chromedriver’可执行文件必须在路径中可用”

我正在将硒与python结合使用,并已从以下站点下载了适用于Windows计算机的chromedriver:http ://chromedriver.storage.googleapis.com/index.html?path=2.15 /

下载zip文件后,我将zip文件解压缩到我的下载文件夹中。然后,我将可执行二进制文件(C:\ Users \ michael \ Downloads \ chromedriver_win32)的路径放入环境变量“路径”中。

但是,当我运行以下代码时:

  from selenium import webdriver

  driver = webdriver.Chrome()

…我不断收到以下错误消息:

WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at     http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver

但是-如上所述-可执行文件在路径中是(!)…这里发生了什么?

I am using selenium with python and have downloaded the chromedriver for my windows computer from this site: http://chromedriver.storage.googleapis.com/index.html?path=2.15/

After downloading the zip file, I unpacked the zip file to my downloads folder. Then I put the path to the executable binary (C:\Users\michael\Downloads\chromedriver_win32) into the Environment Variable “Path”.

However, when I run the following code:

  from selenium import webdriver

  driver = webdriver.Chrome()

… I keep getting the following error message:

WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at     http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver

But – as explained above – the executable is(!) in the path … what is going on here?


回答 0

您可以测试它是否确实在PATH中,如果您打开cmd并输入chromedriver(假设您的chromedriver可执行文件仍以此命名),然后按Enter。如果Starting ChromeDriver 2.15.322448显示,则PATH设置正确,并且还有其他问题。

另外,您可以像这样使用chromedriver的直接路径:

 driver = webdriver.Chrome('/path/to/chromedriver') 

因此,在您的特定情况下:

 driver = webdriver.Chrome("C:/Users/michael/Downloads/chromedriver_win32/chromedriver.exe")

You can test if it actually is in the PATH, if you open a cmd and type in chromedriver (assuming your chromedriver executable is still named like this) and hit Enter. If Starting ChromeDriver 2.15.322448 is appearing, the PATH is set appropriately and there is something else going wrong.

Alternatively you can use a direct path to the chromedriver like this:

 driver = webdriver.Chrome('/path/to/chromedriver') 

So in your specific case:

 driver = webdriver.Chrome("C:/Users/michael/Downloads/chromedriver_win32/chromedriver.exe")

回答 1

我看到讨论仍在讨论通过下载二进制文件并手动配置路径来设置chromedriver的旧方法。

可以使用webdriver-manager自动完成

pip install webdriver-manager

现在,问题中的上述代码将可以在下面的更改中简单地工作,

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

可以使用相同的方法来设置Firefox,Edge和二进制文件。

I see the discussions still talk about the old way of setting up chromedriver by downloading the binary and configuring the path manually.

This can be done automatically using webdriver-manager

pip install webdriver-manager

Now the above code in the question will work simply with below change,

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

The same can be used to set Firefox, Edge and ie binaries.


回答 2

pycharm社区版的情况与此相同,因此对于cmd,必须重新启动ide才能重新加载路径变量。重新启动您的ide,应该没问题。

Same situation with pycharm community edition, so, as for cmd, you must restart your ide in order to reload path variables. Restart your ide and it should be fine.


回答 3

在Linux(Ubuntu或Debian)上:

sudo apt install chromium-chromedriver

在macOS上安装https://brew.sh/然后执行

brew cask install chromedriver

On Ubuntu:

sudo apt install chromium-chromedriver

On Debian:

sudo apt install chromium-driver

On macOS install https://brew.sh/ then do

brew cask install chromedriver

回答 4

r对于原始字符串,我们必须添加路径字符串,以字符串之前的字母开头。我以这种方式进行了测试,并且有效。

driver = webdriver.Chrome(r"C:/Users/michael/Downloads/chromedriver_win32/chromedriver.exe")

We have to add path string, begin with the letter r before the string, for raw string. I tested this way, and it works.

driver = webdriver.Chrome(r"C:/Users/michael/Downloads/chromedriver_win32/chromedriver.exe")

回答 5

一些额外的输入/说明,供以后使用该线程的读者使用,以避免修改PATH env。Windows级别的变量并重新启动Windows系统:(从https://stackoverflow.com/a/49851498/9083077复制我的答案,适用于Chrome):

(1)下载chromedriver(如本主题前面所述),然后将(解压缩的)chromedriver.exe放在X:\ Folder \ of \ your \ choice中

(2)Python代码示例:

import os;
os.environ["PATH"] += os.pathsep + r'X:\Folder\of\your\choice';

from selenium import webdriver;
browser = webdriver.Chrome();
browser.get('http://localhost:8000')
assert 'Django' in browser.title

注意:(1)示例代码(在引用的答案中)可能需要5秒钟打开Firefox浏览器以获取指定的URL。(2)如果尚无服务器在指定的url上运行或提供标题为字符串’Django’的页面,则python控制台将显示以下错误:在browser.title AssertionError中断言’Django’。

Some additional input/clarification for future readers of this thread, to avoid tinkering with the PATH env. variable at the Windows level and restart of the Windows system: (copy of my answer from https://stackoverflow.com/a/49851498/9083077 as applicable to Chrome):

(1) Download chromedriver (as described in this thread earlier) and place the (unzipped) chromedriver.exe at X:\Folder\of\your\choice

(2) Python code sample:

import os;
os.environ["PATH"] += os.pathsep + r'X:\Folder\of\your\choice';

from selenium import webdriver;
browser = webdriver.Chrome();
browser.get('http://localhost:8000')
assert 'Django' in browser.title

Notes: (1) It may take about 5 seconds for the sample code (in the referenced answer) to open up the Firefox browser for the specified url. (2) The python console would show the following error if there’s no server already running at the specified url or serving a page with the title containing the string ‘Django’: assert ‘Django’ in browser.title AssertionError


回答 6

对于Linux和OSX

步骤1:下载chromedriver

# You can find more recent/older versions at http://chromedriver.storage.googleapis.com/
# Also make sure to pick the right driver, based on your Operating System
wget http://chromedriver.storage.googleapis.com/81.0.4044.69/chromedriver_mac64.zip

第2步:将chromedriver添加到 /usr/local/bin

unzip chromedriver_mac64.zip
cp chromedriver /usr/local/bin

您现在应该可以运行

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://localhost:8000')

没有任何问题

For Linux and OSX

Step 1: Download chromedriver

# You can find more recent/older versions at http://chromedriver.storage.googleapis.com/
# Also make sure to pick the right driver, based on your Operating System
wget http://chromedriver.storage.googleapis.com/81.0.4044.69/chromedriver_mac64.zip

For debian: wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip

Step 2: Add chromedriver to /usr/local/bin

unzip chromedriver_mac64.zip
sudo mv chromedriver /usr/local/bin
sudo chown root:root /usr/local/bin/chromedriver
sudo chmod +x /usr/local/bin/chromedriver

You should now be able to run

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://localhost:8000')

without any issues


回答 7

解压缩chromedriver时,请务必指定确切位置,以便以后进行跟踪。在下面,您将为您的操作系统找到合适的chromedriver,然后将其解压缩到一个确切的位置,稍后可以在代码中将其作为参数提供。

wget http://chromedriver.storage.googleapis.com/2.10/chromedriver_linux64.zip unzip chromedriver_linux64.zip -d /home/virtualenv/python2.7.9/

When you unzip chromedriver, please do specify an exact location so that you can trace it later. Below, you are getting the right chromedriver for your OS, and then unzipping it to an exact location, which could be provided as argument later on in your code.

wget http://chromedriver.storage.googleapis.com/2.10/chromedriver_linux64.zip unzip chromedriver_linux64.zip -d /home/virtualenv/python2.7.9/


回答 8

如果您正在使用机器人框架RIDE。然后,您可以Chromedriver.exe从其官方网站下载并将此.exe文件保存在C:\Python27\Scripts目录中。现在将此路径作为环境变量提及。C:\Python27\Scripts\chromedriver.exe

重新启动计算机,然后再次运行相同的测试用例。您不会再遇到此问题。

If you are working with robot framework RIDE. Then you can download Chromedriver.exe from its official website and keep this .exe file in C:\Python27\Scripts directory. Now mention this path as your environment variable eg. C:\Python27\Scripts\chromedriver.exe.

Restart your computer and run same test case again. You will not get this problem again.


回答 9

根据说明,您需要在实例化webdriver时包括ChromeDriver的路径。Chrome例如:

driver = webdriver.Chrome('/path/to/chromedriver')

According to the instruction, you need to include the path to ChromeDriver when instantiating webdriver.Chrome eg.:

driver = webdriver.Chrome('/path/to/chromedriver')

回答 10

在将chromedriver添加到路径之前,请确保它与浏览器的版本相同。

如果不是,则需要匹配版本:更新/降级chrome,以及升级/降级webdriver。

我建议您尽可能多地更新Chrome版本,并匹配网络驱动程序。

要更新Chrome:

  • 在右上角,单击三个点。
  • 点击help->About Google Chrome
  • 更新版本并重新启动chrome

然后从此处下载兼容版本:http : //chromedriver.chromium.org/downloads

注意:最新的chromedriver并不总是与最新版本的chrome匹配!

现在,您可以将其添加到PATH中:

  1. 在您计算机的某个位置创建一个新文件夹,您将在其中放置Web驱动程序。我创建了一个命名的文件夹webdriversC:\Program Files

  2. 复制文件夹路径。就我而言C:\Program Files\webdrivers

  3. 右键单击this PC-> properties

  1. 在右键上 Advanced System settings
  2. 请点击 Environment Variables
  3. 在中System variables,单击,path然后单击edit
  4. 点击 new
  5. 粘贴之前复制的路径
  6. 在所有窗口上单击确定

而已!我使用了pycharm,不得不重新打开它。也许与其他IDE或终端相同。

Before you add the chromedriver to your path, make sure it’s the same version as your browser.

If not, you will need to match versions: either update/downgrade you chrome, and upgrade/downgrade your webdriver.

I recommend updating your chrome version as much as possible, and the matching the webdriver.

To update chrome:

  • On the top right corner, click on the three dots.
  • click help -> About Google Chrome
  • update the version and restart chrome

Then download the compatible version from here: http://chromedriver.chromium.org/downloads .

Note: The newest chromedriver doesn’t always match the newest version of chrome!

Now you can add it to the PATH:

  1. create a new folder somewhere in your computer, where you will place your web drivers. I created a folder named webdrivers in C:\Program Files

  2. copy the folder path. In my case it was C:\Program Files\webdrivers

  3. right click on this PC -> properties:

  1. On the right click Advanced System settings
  2. Click Environment Variables
  3. In System variables, click on path and click edit
  4. click new
  5. paste the path you copied before
  6. click OK on all the windows

Thats it! I used pycharm and I had to reopen it. Maybe its the same with other IDEs or terminals.


回答 11

如果您完全确定PATH设置正确,可以尝试重新启动计算机,如果它无法正常工作。

就Windows 7而言,我总是在WebDriverException上出现错误:消息:对于chromedriver,gecodriver,IEDriverServer。我很确定我有正确的路径。重启电脑,一切正常

Could try to restart computer if it doesn’t work after you are quite sure that PATH is set correctly.

In my case on windows 7, I always got the error on WebDriverException: Message: for chromedriver, gecodriver, IEDriverServer. I am pretty sure that i have correct path. Restart computer, all work


回答 12

就我而言,当我将chromedriver文件复制到c:\ Windows文件夹时,此错误消失了。这是因为Windows目录位于python脚本检查chromedriver可用性的路径中。

In my case, this error disappears when I have copied chromedriver file to c:\Windows folder. Its because windows directory is in the path which python script check for chromedriver availability.


回答 13

如果使用远程解释器,则还必须检查是否定义了其可执行文件PATH。在我的情况下,从远程Docker解释器切换到本地解释器解决了问题。

If you are using remote interpreter you have to also check if its executable PATH is defined. In my case switching from remote Docker interpreter to local interpreter solved the problem.


回答 14

我遇到了与您相同的问题。我正在使用PyCharm编写程序,我认为问题出在PyCharm中而不是OS中。我解决了该问题,方法是进行脚本配置,然后手动编辑环境变量中的PATH。希望对您有所帮助!

I encountered the same problem as yours. I’m using PyCharm to write programs, and I think the problem lies in environment setup in PyCharm rather than the OS. I solved the problem by going to script configuration and then editing the PATH in environment variables manually. Hope you find this helpful!


回答 15

C:\ Windows处添加webdriver(chromedriver.exe或geckodriver.exe)。这对我来说很有效

Add the webdriver(chromedriver.exe or geckodriver.exe) here C:\Windows. This worked in my case


回答 16

最好的方法可能是获取当前目录并将剩余地址附加到该目录。像这样的代码(Windows上的Word。在Linux上,您可以使用pwd行): webdriveraddress = str(os.popen("cd").read().replace("\n", ''))+'\path\to\webdriver'

The best way is maybe to get the current directory and append the remaining address to it. Like this code(Word on windows. On linux you can use something line pwd): webdriveraddress = str(os.popen("cd").read().replace("\n", ''))+'\path\to\webdriver'


回答 17

当我下载chromedriver.exe时,我只是将其移动到PATH文件夹C:\ Windows \ System32 \ chromedriver.exe中,却遇到了完全相同的问题。

对我来说,解决方案是只更改PATH中的文件夹,因此我将其移到了PATH中也位于Pycharm Community bin文件夹中。例如:

  • C:\ Windows \ System32 \ chromedriver.exe->给我exceptions
  • C:\ Program Files \ JetBrains \ PyCharm Community Edition 2019.1.3 \ bin \ chromedriver.exe->运行正常

When I downloaded chromedriver.exe I just move it in PATH folder C:\Windows\System32\chromedriver.exe and had exact same problem.

For me solution was to just change folder in PATH, so I just moved it at Pycharm Community bin folder that was also in PATH. ex:

  • C:\Windows\System32\chromedriver.exe –> Gave me exception
  • C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.3\bin\chromedriver.exe –> worked fine

回答 18

Mac Mojave运行机器人测试框架和Chrome 77时出现了此问题。这解决了问题。感谢@Navarasu将我指向正确的轨道。

$ pip install webdriver-manager --user # install webdriver-manager lib for python
$ python # open python prompt

接下来,在python提示符下:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())

# ctrl+d to exit

这导致以下错误:

Checking for mac64 chromedriver:xx.x.xxxx.xx in cache
There is no cached driver. Downloading new one...
Trying to download new driver from http://chromedriver.storage.googleapis.com/xx.x.xxxx.xx/chromedriver_mac64.zip
...
TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
  • 我现在得到了最新的下载链接
    • 将chromedriver下载并解压缩到所需位置
    • 例如: ~/chromedriver/chromedriver

~/.bash_profile用编辑器打开并添加:

export PATH="$HOME/chromedriver:$PATH"

打开新的终端窗口,ta-da🎉

Had this issue with Mac Mojave running Robot test framework and Chrome 77. This solved the problem. Kudos @Navarasu for pointing me to the right track.

$ pip install webdriver-manager --user # install webdriver-manager lib for python
$ python # open python prompt

Next, in python prompt:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())

# ctrl+d to exit

This leads to the following error:

Checking for mac64 chromedriver:xx.x.xxxx.xx in cache
There is no cached driver. Downloading new one...
Trying to download new driver from http://chromedriver.storage.googleapis.com/xx.x.xxxx.xx/chromedriver_mac64.zip
...
TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
  • I now got the newest download link
    • Download and unzip chromedriver to where you want
    • For example: ~/chromedriver/chromedriver

Open ~/.bash_profile with editor and add:

export PATH="$HOME/chromedriver:$PATH"

Open new terminal window, ta-da 🎉


回答 19

我在Webdriver 3.8.0(Chrome 73.0.3683.103和ChromeDriver 73.0.3683.68)上遇到了此问题。我做完之后问题就消失了

pip install -U selenium

将Webdriver升级到3.14.1。

I had this problem on Webdriver 3.8.0 (Chrome 73.0.3683.103 and ChromeDriver 73.0.3683.68). The problem disappeared after I did

pip install -U selenium

to upgrade Webdriver to 3.14.1.


回答 20

最好的确定方法是在这里:

下载并解压缩chromedriver并将chromedriver.exe放入C:\ Python27 \ Scripts中,然后您无需提供驱动程序的路径,只需

driver= webdriver.Chrome()

您无需添加路径或其他任何操作

Best way for sure is here:

Download and unzip chromedriver and put ‘chromedriver.exe’ in C:\Python27\Scripts and then you need not to provide the path of driver, just

driver= webdriver.Chrome()

You are done no need to add paths or anything


回答 21

检查您的Chrome驱动程序的路径,它可能无法从那里获取。只需复制即可将驱动程序位置粘贴到代码中。

Check the path of your chrome driver, it might not get it from there. Simply Copy paste the driver location into the code.


回答 22

(对于Mac用户)我有同样的问题,但是我通过以下简单方法解决了:您必须将chromedriver.exe放在执行脚本的同一文件夹中,然后在pyhton中编写以下指令:

导入操作系统

os.environ [“ PATH”] + = os.pathsep + r’X:/您的/文件夹/脚本/’

(for Mac users) I have the same problem but i solved by this simple way: You have to put your chromedriver.exe in the same folder to your executed script and than in pyhton write this instruction :

import os

os.environ[“PATH”] += os.pathsep + r’X:/your/folder/script/’


网站可以检测到何时在chromedriver中使用硒吗?

问题:网站可以检测到何时在chromedriver中使用硒吗?

我一直在使用Chromedriver测试Selenium,但我注意到有些页面可以检测到您正在使用Selenium,即使根本没有自动化。即使当我只是通过Selenium和Xephyr使用chrome手动浏览时,我也经常得到一个页面,指出检测到可疑活动。我已经检查了用户代理和浏览器指纹,它们与普通的chrome浏览器完全相同。

当我以普通的chrome浏览到这些站点时,一切正常,但是当我使用Selenium时,我被检测到。

从理论上讲,chromedriver和chrome在任何Web服务器上看起来都应该完全相同,但是它们可以通过某种方式检测到它。

如果您想要一些测试代码,请尝试以下方法:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery");
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print 'arguments done'
driver.get('http://stubhub.com')

如果浏览stubhub,您将在一个或两个请求中被重定向和“阻止”。我一直在对此进行调查,无法弄清楚他们如何分辨用户正在使用Selenium。

他们是怎么做到的呢?

编辑更新:

我在Firefox中安装了Selenium IDE插件,当我在普通的Firefox浏览器中仅使用附加插件访问stubhub.com时就被禁止了。

编辑:

当我使用Fiddler来回查看HTTP请求时,我注意到“假浏览器”的请求通常在响应标头中具有“ no-cache”。

编辑:

像这样的结果是否有办法从Javascript检测到我在Selenium Webdriver页面中,这表明应该没有办法检测何时使用Webdriver。但这证据表明并非如此。

编辑:

该站点将指纹上载到他们的服务器,但是我检查了一下,硒的指纹与使用chrome时的指纹相同。

编辑:

这是它们发送到服务器的指纹有效载荷之一

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

硒和铬相同

编辑:

VPN只能使用一次,但是在加载第一页后会被检测到。显然,正在运行一些JavaScript来检测Selenium。

I’ve been testing out Selenium with Chromedriver and I noticed that some pages can detect that you’re using Selenium even though there’s no automation at all. Even when I’m just browsing manually just using chrome through Selenium and Xephyr I often get a page saying that suspicious activity was detected. I’ve checked my user agent, and my browser fingerprint, and they are all exactly identical to the normal chrome browser.

When I browse to these sites in normal chrome everything works fine, but the moment I use Selenium I’m detected.

In theory chromedriver and chrome should look literally exactly the same to any webserver, but somehow they can detect it.

If you want some testcode try out this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery");
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print 'arguments done'
driver.get('http://stubhub.com')

If you browse around stubhub you’ll get redirected and ‘blocked’ within one or two requests. I’ve been investigating this and I can’t figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth I’ve noticed that the ‘fake browser\’s’ requests often have ‘no-cache’ in the response header.

EDIT:

results like this Is there a way to detect that I’m in a Selenium Webdriver page from Javascript suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint of selenium is identical to the fingerprint when using chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

Its identical in selenium and in chrome

EDIT:

VPNs work for a single use but get detected after I load the first page. Clearly some javascript is being run to detect Selenium.


回答 0

对于Mac用户

cdc_使用Vim或Perl 替换变量

您可以使用vim,或如@Vic Seedoubleyew在@ Erti-Chris Eelmaa的答案中指出的那样perl,替换中的cdc_变量chromedriver请参阅@ Erti-Chris Eelmaa的帖子以了解有关该变量的更多信息)。使用vimperl防止您不得不重新编译源代码或使用十六进制编辑器。chromedriver在尝试编辑原件之前,请确保对其进行复印。另外,以下方法也在上进行了测试chromedriver version 2.41.578706


使用Vim

vim /path/to/chromedriver

在上面的代码行之后,您可能会看到一堆乱码。请执行下列操作:

  1. cdc_通过键入/cdc_并按进行搜索return
  2. 按启用编辑a
  3. 删除任意数量的,$cdc_lasutopfhvcZLmcfl然后用相等数量的字符替换删除的内容。如果您不这样做,chromedriver将会失败。
  4. 编辑完成后,按esc
  5. 要保存更改并退出,请键入:wq!并按return
  6. 如果您不想保存更改,但要退出,请键入:q!并按return
  7. 你完成了。

转到更改后的chromedriver双击。terminal应打开一个窗口。如果killed在输出中看不到,则说明您成功更改了驱动程序。


使用Perl

下面的行替换cdc_dog_

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

确保替换字符串的字符数与搜索字符串的字符数相同,否则chromedriver将失败。

Perl说明

s///g 表示您要搜索一个字符串并将其全局替换为另一个字符串(替换所有出现的字符串)。

例如, s/string/replacment/g

所以,

s/// 表示搜索并替换字符串。

cdc_ 是搜索字符串。

dog_ 是替换字符串。

g 是全局键,它将替换每次出现的字符串。

如何检查Perl替代品是否有效

以下行将打印每次出现的搜索字符串cdc_

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

如果没有返回任何内容,cdc_则已被替换。

相反,您可以使用以下代码:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

查看替换字符串,dog_现在是否在chromedriver二进制文件中。如果是这样,替换字符串将被打印到控制台。

转到更改后的chromedriver双击。terminal应打开一个窗口。如果killed在输出中看不到,则说明您成功更改了驱动程序。


包起来

更改chromedriver二进制文件后,请确保更改后的二进制文件的名称chromedriverchromedriver,并且原始二进制文件已从其原始位置移动或重命名。


我对这种方法的经验

以前,我在尝试登录时在网站上被检测到我,但是用cdc_相同大小的字符串替换后,我得以登录。但是就像其他人所说的那样,如果已经被检测到,则可能会被阻止即使使用此方法后,还有其他原因。因此,您可能必须尝试使用​​VPN,其他网络或具有什么功能的站点访问检测到您的站点。

For Mac Users

Replacing cdc_ variable using Vim or Perl

You can use vim, or as @Vic Seedoubleyew has pointed out in the answer by @Erti-Chris Eelmaa, perl, to replace the cdc_ variable in chromedriver(See post by @Erti-Chris Eelmaa to learn more about that variable). Using vim or perl prevents you from having to recompile source code or use a hex-editor. Make sure to make a copy of the original chromedriver before attempting to edit it. Also, the methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim /path/to/chromedriver

After running the line above, you’ll probably see a bunch of gibberish. Do the following:

  1. Search for cdc_ by typing /cdc_ and pressing return.
  2. Enable editing by pressing a.
  3. Delete any amount of $cdc_lasutopfhvcZLmcfl and replace what was deleted with an equal amount characters. If you don’t, chromedriver will fail.
  4. After you’re done editing, press esc.
  5. To save the changes and quit, type :wq! and press return.
  6. If you don’t want to save the changes, but you want to quit, type :q! and press return.
  7. You’re done.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Using Perl

The line below replaces cdc_ with dog_:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string has the same number of characters as the search string, otherwise the chromedriver will fail.

Perl Explanation

s///g denotes that you want to search for a string and replace it globally with another string (replaces all occurrences).

e.g., s/string/replacment/g

So,

s/// denotes searching for and replacing a string.

cdc_ is the search string.

dog_ is the replacement string.

g is the global key, which replaces every occurrence of the string.

How to check if the Perl replacement worked

The following line will print every occurrence of the search string cdc_:

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

If this returns nothing, then cdc_ has been replaced.

Conversely, you can use the this:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

to see if your replacement string, dog_, is now in the chromedriver binary. If it is, the replacement string will be printed to the console.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Wrapping Up

After altering the chromedriver binary, make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you’ve already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, or what have you.


回答 1

基本上,硒检测的工作方式是,它们检测与selenium一起运行时出现的预定义javascript变量。僵尸程序检测脚本通常会在任何变量中(在窗口对象上)查找包含单词“ selenium” /“ webdriver”的内容,并记录名为$cdc_和的变量$wdc_。当然,所有这些取决于您所使用的浏览器。所有不同的浏览器都公开不同的内容。

对我来说,我使用了chrome,所以,要做的就是确保$cdc_不再存在作为文档变量,然后瞧瞧(下载chromedriver源代码,修改chromedriver并$cdc_以不同的名称重新编译。)

这是我在chromedriver中修改的功能:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(注意评论,我所做的我转过身$cdc_randomblabla_

这是一个伪代码,演示了僵尸网络可能使用的一些技术:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

根据用户@szx,也可以在十六进制编辑器中简单地打开chromedriver.exe,然后手动进行替换,而无需进行任何编译。

Basically the way the selenium detection works, is that they test for pre-defined javascript variables which appear when running with selenium. The bot detection scripts usually look anything containing word “selenium” / “webdriver” in any of the variables (on window object), and also document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are on. All the different browsers expose different things.

For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn’t exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)

this is the function I modified in chromedriver:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(note the comment, all I did I turned $cdc_ to randomblabla_.

Here is a pseudo-code which demonstrates some of the techniques that bot networks might use:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

according to user @szx, it is also possible to simply open chromedriver.exe in hex editor, and just do the replacement manually, without actually doing any compiling.


回答 2

正如我们已经在问题和发布的答案中弄清楚的那样,这里有一个反Web 爬网和一个名为“ Distil Networks”的Bot检测服务。而且,根据公司首席执行官的采访

即使他们可以创建新的机器人,我们还是想出了一种方法来识别Selenium,即他们正在使用的工具,因此,无论Selenium在该机器人上迭代多少次,我们都将阻止它。我们现在使用Python和许多不同的技术来做到这一点。一旦我们发现一种类型的漫游器出现了某种模式,那么我们就会对他们使用的技术进行反向工程并将其识别为恶意软件。

要了解它们如何精确地检测硒,需要时间和其他挑战,但是目前我们可以肯定地说些什么:

  • 它与您对硒采取的措施无关-一旦导航到该站点,便会立即被发现并被禁止。我尝试在动作之间添加人为的随机延迟,在页面加载后暂停-没有任何帮助
  • 这也不是关于浏览器指纹的-在具有干净配置文件而不是隐身模式的多个浏览器中尝试过-没有任何帮助
  • 因为根据采访中的提示,这是“逆向工程”,所以我怀疑这是通过在浏览器中执行一些JS代码完成的,这表明这是通过Selenium Webdriver自动化的浏览器

决定将其发布为答案,因为显然:

网站可以检测到何时在chromedriver中使用硒吗?

是。


另外,我还没有尝试过使用较旧的硒和较旧的浏览器版本-从理论上讲,Distil Networks僵尸检测程序当前依赖于某个特定点,硒中可能实现或添加了某些东西。然后,如果是这种情况,我们可能会检测到(是的,让我们检测检测器)在哪个点/版本上进行了相关更改,调查变更日志和变更集,并且可能会为我们提供有关在哪里查看的更多信息。以及它们用于检测由Webdriver驱动的浏览器的功能。这只是一个需要检验的理论。

As we’ve already figured out in the question and the posted answers, there is an anti Web-scraping and a Bot detection service called “Distil Networks” in play here. And, according to the company CEO’s interview:

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It’ll take time and additional challenges to understand how exactly they are detecting Selenium, but what can we say for sure at the moment:

  • it’s not related to the actions you take with selenium – once you navigate to the site, you get immediately detected and banned. I’ve tried to add artificial random delays between actions, take a pause after the page is loaded – nothing helped
  • it’s not about browser fingerprint either – tried it in multiple browsers with clean profiles and not, incognito modes – nothing helped
  • since, according to the hint in the interview, this was “reverse engineering”, I suspect this is done with some JS code being executed in the browser revealing that this is a browser automated via selenium webdriver

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, what I haven’t experimented with is older selenium and older browser versions – in theory, there could be something implemented/added to selenium at a certain point that Distil Networks bot detector currently relies on. Then, if this is the case, we might detect (yeah, let’s detect the detector) at what point/version a relevant change was made, look into changelog and changesets and, may be, this could give us more information on where to look and what is it they use to detect a webdriver-powered browser. It’s just a theory that needs to be tested.


回答 3

在wellsfargo.com上如何实施的示例:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

Example of how it’s implemented on wellsfargo.com:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

回答 4

混淆JavaScript结果

我已经检查了chromedriver源代码。这会将一些javascript文件注入浏览器。
此链接上的每个javascript文件都会注入到以下网页: https : //chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

因此,我使用了逆向工程,并通过十六进制编辑来模糊化js文件。现在,我确定不再使用JavaScript变量,函数名称和固定字符串来发现硒的活动。但是仍然有些站点和reCaptcha可以检测到硒!
也许他们检查由chromedriver js执行引起的修改:)


编辑1:

Chrome“导航器”参数修改

我发现“导航器”中有一些参数可以简要介绍chromedriver的使用。这些是参数:

  • “ navigator.webdriver”在非自动模式下为’undefined’。在自动模式下,它是“ true”。
  • “ navigator.plugins”在无头chrome上的长度为0。因此,我添加了一些假元素来欺骗插件长度检查过程。
  • navigator.languages”设置为默认镶边值'[“ en-US”,“ en”,“ es”]’。

因此,我需要一个Chrome扩展程序来在网页上运行javascript。我使用本文提供的js代码进行了扩展,并使用另一篇文章将压缩扩展添加到我的项目中。我已经成功更改了值;但是仍然没有改变!

我没有找到其他像这样的变量,但这并不意味着它们不存在。reCaptcha仍然检测到chromedriver,因此应该有更多变量要更改。在下一步应的检测服务,逆向工程,我不想做的事。

现在,我不确定是否值得在此自动化过程上花费更多时间或寻找替代方法!

Obfuscating JavaScripts result

I have checked the chromedriver source code. That injects some javascript files to the browser.
Every javascript file on this link is injected to the web pages: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

So I used reverse engineering and obfuscated the js files by Hex editing. Now i was sure that no more javascript variable, function names and fixed strings were used to uncover selenium activity. But still some sites and reCaptcha detect selenium!
Maybe they check the modifications that are caused by chromedriver js execution 🙂


Edit 1:

Chrome ‘navigator’ parameters modification

I discovered there are some parameters in ‘navigator’ that briefly uncover using of chromedriver. These are the parameters:

  • “navigator.webdriver” On non-automated mode it is ‘undefined’. On automated mode it’s ‘true’.
  • “navigator.plugins” On headless chrome has 0 length. So I added some fake elements to fool the plugin length checking process.
  • navigator.languages” was set to default chrome value ‘[“en-US”, “en”, “es”]’ .

So what i needed was a chrome extension to run javascript on the web pages. I made an extension with the js code provided in the article and used another article to add the zipped extension to my project. I have successfully changed the values; But still nothing changed!

I didn’t find other variables like these but it doesn’t mean that they don’t exist. Still reCaptcha detects chromedriver, So there should be more variables to change. The next step should be reverse engineering of the detector services that i don’t want to do.

Now I’m not sure does it worth to spend more time on this automation process or search for alternative methods!


回答 5

尝试将selenium与chrome的特定用户配置文件一起使用,以这种方式,您可以将其用作特定用户并定义所需的任何内容。这样做时,它将以“实际”用户身份运行,请使用一些进程浏览器查看chrome进程。您会看到标签的区别。

例如:

username = os.getenv("USERNAME")
userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = "C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome标签列表在这里

Try to use selenium with a specific user profile of chrome, That way you can use it as specific user and define any thing you want, When doing so it will run as a ‘real’ user, look at chrome process with some process explorer and you’ll see the difference with the tags.

For example:

username = os.getenv("USERNAME")
userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = "C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome tag list here


回答 6

partial interface Navigator { readonly attribute boolean webdriver; };

Navigator界面的webdriver IDL属性必须返回webdriver-active标志的值,该标志最初为false。

此属性使网站可以确定用户代理受WebDriver的控制,并且可以用于帮助减轻拒绝服务攻击。

直接取自2017年W3C编辑的WebDriver草案。这在很大程度上意味着,至少可以确定硒驱动程序的未来迭代,以防止滥用。最终,如果没有源代码,很难说出到底是什么导致chrome驱动程序可检测到。

partial interface Navigator { readonly attribute boolean webdriver; };

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor’s Draft of WebDriver. This heavily implies that at the very least, future iterations of selenium’s drivers will be identifiable to prevent misuse. Ultimately, it’s hard to tell without the source code, what exactly causes chrome driver in specific to be detectable.


回答 7

据说window.navigator.webdriver === true如果使用webdriver 会设置Firefox 。这是根据较早的规范之一(例如:archive.org)得出的,但是我在新的规范中找不到它,除了附录中一些非常模糊的措词。

对它的测试是在文件fingerprint_test.js中的硒代码中,其末尾的注释显示“当前仅在firefox中实现”,但是我无法通过一些简单的grep方式识别出该方向上的任何代码,在当前(41.0.2)Firefox发行树或Chromium树中。

从2015年1月起,我还发现了有关firefox驱动程序b82512999938中有关指纹的较早提交的评论。Selenium GIT-master仍在昨天下载的Selenium GIT-master中javascript/firefox-driver/extension/content/server.js添加了注释,该注释链接到当前w3c Webdriver规范中措辞略有不同的附录。

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn’t find it in the new one except for some very vague wording in the appendices.

A test for it is in the selenium code in the file fingerprint_test.js where the comment at the end says “Currently only implemented in firefox” but I wasn’t able to identify any code in that direction with some simple greping, neither in the current (41.0.2) Firefox release-tree nor in the Chromium-tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.


回答 8

除了@ Erti-Chris Eelmaa的出色答案-令人讨厌window.navigator.webdriver,它是只读的。如果将其值更改为它的事件false仍然会存在true。因此,仍然可以检测到由自动化软件驱动的浏览器。 MDN

该变量由--enable-automationchrome中的标志管理。chromedriver使用该标志启动chrome并将chrome设置window.navigator.webdrivertrue。你可以在这里找到它。您需要将标记添加到“排除开关”中。例如(golang):

package main

import (
    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {

caps := selenium.Capabilities{
    "browserName": "chrome",
}

chromeCaps := chrome.Capabilities{
    Path:            "/path/to/chrome-binary",
    ExcludeSwitches: []string{"enable-automation"},
}
caps.AddChrome(chromeCaps)

wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
}

Additionally to the great answer of @Erti-Chris Eelmaa – there’s annoying window.navigator.webdriver and it is read-only. Event if you change the value of it to false it will still have true. Thats why the browser driven by automated software can still be detected. MDN

The variable is managed by the flag --enable-automation in chrome. The chromedriver launches chrome with that flag and chrome sets the window.navigator.webdriver to true. You can find it here. You need to add to “exclude switches” the flag. For instance (golang):

package main

import (
    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {

caps := selenium.Capabilities{
    "browserName": "chrome",
}

chromeCaps := chrome.Capabilities{
    Path:            "/path/to/chrome-binary",
    ExcludeSwitches: []string{"enable-automation"},
}
caps.AddChrome(chromeCaps)

wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
}

回答 9

听起来好像它们在Web应用程序防火墙后面。看一下modsecurity和owasp,看看它们是如何工作的。实际上,您要问的是如何进行漫游器检测规避。这不是Selenium Web驱动程序的用途。它用于测试您的Web应用程序,而不打其他Web应用程序。有可能,但基本上,您必须查看WAF在其规则集中查找的内容,并且如果可以的话,特别要避免使用硒。即使那样,它仍然可能无法正常工作,因为您不知道他们在使用什么WAF。您做了正确的第一步,就是伪造用户代理。如果仍然不能解决问题,那么WAF已经到位,您可能需要变得更加棘手。

编辑:点取自其他答案。确保首先正确设置了用户代理。可能是它撞到了本地Web服务器,还是嗅探了流量。

It sounds like they are behind a web application firewall. Take a look at modsecurity and owasp to see how those work. In reality, what you are asking is how to do bot detection evasion. That is not what selenium web driver is for. It is for testing your web application not hitting other web applications. It is possible, but basically, you’d have to look at what a WAF looks for in their rule set and specifically avoid it with selenium if you can. Even then, it might still not work because you don’t know what WAF they are using. You did the right first step, that is faking the user agent. If that didn’t work though, then a WAF is in place and you probably need to get more tricky.

Edit: Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.


回答 10

即使您发送了所有正确的数据(例如,Selenium并未显示为扩展名,您也具有合理的分辨率/位深度&c),但仍有许多服务和工具可以分析访问者的行为,以确定访问者的行为是否演员是用户或自动化系统。

例如,访问一个站点然后立即通过将鼠标直接移到相关按钮上不到一秒钟立即执行一些操作,这实际上是用户不会做的。

作为调试工具,使用https://panopticlick.eff.org/这样的站点来检查浏览器的独特性可能也很有用。它还将帮助您验证是否有任何特定参数表明您正在Selenium中运行。

Even if you are sending all the right data (e.g. Selenium doesn’t show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it’ll also help you verify whether there are any specific parameters that indicate you’re running in Selenium.


回答 11

我所看到的漫游器检测似乎比我在下面的答案中读到的东西更加复杂或至少有所不同。

实验1:

  1. 我从Python控制台使用Selenium打开浏览器和网页。
  2. 鼠标已经位于特定的位置,我知道该链接将在页面加载后出现。我从不动鼠标。
  3. 我按下了鼠标左键一次(这是从运行Python的控制台到浏览器的焦点)。
  4. 我再次按下鼠标左键(记住,光标在给定链接的上方)。
  5. 链接会正常打开,应该打开。

实验2:

  1. 和以前一样,我从Python控制台使用Selenium打开浏览器和网页。

  2. 这次,我没有使用鼠标单击,而是使用Selenium(在Python控制台中)单击具有随机偏移量的相同元素。

  3. 链接没有打开,但是我被带到了注册页面。

含义:

  • 通过Selenium打开网络浏览器并不会阻止我出现人类
  • 像人一样移动鼠标并不一定要归类为人
  • 通过Selenium单击具有偏移量的内容仍会引发警报

似乎很神秘,但是我想他们可以确定某个动作是否源自Selenium,而他们并不关心浏览器本身是否通过Selenium打开。还是可以确定窗口是否具有焦点?听到有人有任何见识会很有趣。

The bot detection I’ve seen seems more sophisticated or at least different than what I’ve read through in the answers below.

EXPERIMENT 1:

  1. I open a browser and web page with Selenium from a Python console.
  2. The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
  3. I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
  4. I press the left mouse button again (remember, cursor is above a given link).
  5. The link opens normally, as it should.

EXPERIMENT 2:

  1. As before, I open a browser and the web page with Selenium from a Python console.

  2. This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.

  3. The link doesn’t open, but I am taken to a sign up page.

IMPLICATIONS:

  • opening a web browser via Selenium doesn’t preclude me from appearing human
  • moving the mouse like a human is not necessary to be classified as human
  • clicking something via Selenium with an offset still raises the alarm

Seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don’t care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? Would be interesting to hear if anyone has any insights.


回答 12

我发现的另一件事是,某些网站使用检查用户代理的平台。如果该值包含:“ HeadlessChrome”,则在使用无头模式时,该行为可能会很奇怪。

解决方法是覆盖用户代理值,例如在Java中:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");

One more thing I found is that some websites uses a platform that checks the User Agent. If the value contains: “HeadlessChrome” the behavior can be weird when using headless mode.

The workaround for that will be to override the user agent value, for example in Java:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");

回答 13

一些站点正在检测到此:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}

回答 14

用以下代码编写一个html页面。您将看到,在DOM硒中,在externalHTML中应用了webdriver属性

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

Write an html page with the following code. You will see that in the DOM selenium applies a webdriver attribute in the outerHTML

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

回答 15

我发现这样更改javascript“ key”变量:

//Fools the website into believing a human is navigating it
        ((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

在将Selenium Webdriver和Google Chrome结合使用时,某些网站可以使用,因为许多网站都会检查此变量,以避免被Selenium废弃。

I’ve found changing the javascript “key” variable like this:

//Fools the website into believing a human is navigating it
        ((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

works for some websites when using Selenium Webdriver along with Google Chrome, since many sites check for this variable in order to avoid being scrapped by Selenium.


回答 16

在我看来,使用Selenium做到这一点的最简单方法是拦截XHR,后者将发送回浏览器指纹。

但这是仅硒的问题,因此最好使用其他方法。硒应该使这种事情变得容易,而不是更困难。

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

But since this is a Selenium-only problem, its better just to use something else. Selenium is supposed to make things like this easier, not way harder.


回答 17

您可以尝试使用参数“启用自动化”

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

但是,我想提醒您,此功能已在ChromeDriver 79.0.3945.16中修复。因此,您可能应该使用旧版的chrome。

另外,作为另一个选择,您可以尝试使用InternetExplorerDriver而不是Chrome。对于我来说,IE不会在没有任何黑客的情况下完全阻止。

有关更多信息,请尝试在这里查看:

Selenium Webdriver:修改navigator.webdriver标志以防止硒检测

Chrome v76中无法隐藏“ Chrome正在由自动化软件控制”信息栏

You can try to use the parameter “enable-automation”

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

But, I want to warn that this ability was fixed in ChromeDriver 79.0.3945.16. So probably you should use older versions of chrome.

Also, as another option, you can try using InternetExplorerDriver instead of Chrome. As for me, IE does not block at all without any hacks.

And for more info try to take a look here:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76