I want to get the content from the website below. If I use a browser like Firefox or Chrome I can get the real page I want, but if I use the Python requests package (or the wget command) it returns a totally different HTML page. I assume the developer of the website has put some blocks in place for this, so the question is:
How do I fake a browser visit by using python requests or command wget?
from fake_useragent import UserAgent
import requests

ua = UserAgent()
print(ua.chrome)
header = {'User-Agent': str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)
Output:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>
The root of the answer is that the person asking the question needs a JavaScript interpreter to get what they are after. What I have found is that I am often able to get all of the information I want from a website as JSON, before it is interpreted by JavaScript. This has saved me a ton of time compared with parsing HTML and hoping each page is in the same format.
So when you get a response from a website using requests, really look at the HTML/text, because you might find the JavaScript's JSON in the footer, ready to be parsed.
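For illustration, here is a minimal sketch of that approach; the URL and the window.__DATA__ variable name are hypothetical, since every site embeds its JSON differently:

import json
import re
import requests

headers = {'User-Agent': 'Mozilla/5.0'}
html = requests.get('https://www.example.com/page', headers=headers).text

# look for a JSON blob assigned to a script variable (hypothetical name)
match = re.search(r'window\.__DATA__\s*=\s*(\{.*?\});', html, re.DOTALL)
if match:
    data = json.loads(match.group(1))
    print(list(data.keys()))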
I utilized BeautifulSoup to allow me to parse any website for images. If you will be doing much web scraping (or intend to use my tool), I suggest you run sudo pip install beautifulsoup4. Information on BeautifulSoup is available here.
For convenience here is my code:
from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib
# use this image scraper from the location that
#you want to save scraped images to
def make_soup(url):
html = urlopen(url).read()
return BeautifulSoup(html)
def get_images(url):
soup = make_soup(url)
#this makes a list of bs4 element tags
images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
print 'Downloading images to current working directory.'
#compile our unicode list of image links
image_links = [each.get('src') for each in images]
for each in image_links:
filename=each.split('/')[-1]
urllib.urlretrieve(each, filename)
return image_links
#a standard call looks like this
#get_images('http://www.wookmark.com')
from urllib.error import HTTPError
from urllib.request import urlretrieve
try:
urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
print(err) # something wrong with local path
except HTTPError as err:
print(err) # something wrong with url
Or, if the additional requirement of requests is acceptable and it is an http(s) URL:
def load_requests(source_url, sink_path):
"""
Load a file from an URL (e.g. http).
Parameters
----------
source_url : str
Where to load the file from.
sink_path : str
Where the loaded file is stored.
"""
import requests
r = requests.get(source_url, stream=True)
if r.status_code == 200:
with open(sink_path, 'wb') as f:
for chunk in r:
f.write(chunk)
I made a script expanding on Yup.'s script. I fixed some things. It will now bypass 403: Forbidden problems. It won't crash when an image fails to be retrieved. It tries to avoid corrupted previews. It gets the right absolute URLs. It gives out more information. It can be run with an argument from the command line.
# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are
from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time
def make_soup(url):
req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
html = urllib2.urlopen(req)
return BeautifulSoup(html, 'html.parser')
def get_images(url):
soup = make_soup(url)
images = [img for img in soup.findAll('img')]
print (str(len(images)) + " images found.")
print 'Downloading images to current working directory.'
image_links = [each.get('src') for each in images]
for each in image_links:
try:
filename = each.strip().split('/')[-1].strip()
src = urljoin(url, each)
print 'Getting: ' + filename
response = requests.get(src, stream=True)
# delay to avoid corrupted previews
time.sleep(1)
with open(filename, 'wb') as out_file:
shutil.copyfileobj(response.raw, out_file)
except:
            print ' An error occurred. Continuing.'
print 'Done.'
if __name__ == '__main__':
url = sys.argv[1]
get_images(url)
Answer 7
Using the requests library
import requests
import shutil, os

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

currentDir = os.getcwd()
path = os.path.join(currentDir, 'Images')  # saving images to Images folder

def ImageDl(url):
    attempts = 0
    while attempts < 5:  # retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url, headers=headers, stream=True, timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path, filename), 'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw, f)
                print(filename)
                break
        except Exception as e:
            attempts += 1
            print(e)

ImageDl(url)
# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are
from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time
def make_soup(url):
req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
html = urllib.request.urlopen(req)
return BeautifulSoup(html, 'html.parser')
def get_images(url):
soup = make_soup(url)
images = [img for img in soup.findAll('img')]
print (str(len(images)) + " images found.")
print('Downloading images to current working directory.')
image_links = [each.get('src') for each in images]
for each in image_links:
try:
filename = each.strip().split('/')[-1].strip()
src = urljoin(url, each)
print('Getting: ' + filename)
response = requests.get(src, stream=True)
# delay to avoid corrupted previews
time.sleep(1)
with open(filename, 'wb') as out_file:
shutil.copyfileobj(response.raw, out_file)
except:
            print(' An error occurred. Continuing.')
print('Done.')
if __name__ == '__main__':
get_images('http://www.wookmark.com')
Answer 10
Something a bit fresh for Python 3, using Requests:
Comments are in the code. It's a ready-to-use function.
import requests
from os import path
def get_image(image_url):
"""
Get image based on url.
:return: Image name if everything OK, False otherwise
"""
image_name = path.split(image_url)[1]
try:
image = requests.get(image_url)
    except OSError:  # A little too wide, but works OK with no additional imports needed. Catches all connection problems
return False
if image.status_code == 200: # we could have retrieved error page
base_dir = path.join(path.dirname(path.realpath(__file__)), "images") # Use your own path or "" to use current working directory. Folder must exist.
with open(path.join(base_dir, image_name), "wb") as f:
f.write(image.content)
return image_name
get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")
import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg').content
with open('file_name.jpg', 'wb') as handler:
    handler.write(img_data)
Here’s a short snippet using the SoupStrainer class in BeautifulSoup:
import httplib2
from bs4 import BeautifulSoup, SoupStrainer
http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')
for link in BeautifulSoup(response, parse_only=SoupStrainer('a')):
if link.has_attr('href'):
print(link['href'])
The BeautifulSoup documentation is actually quite good, and covers a number of typical scenarios.
Edit: Note that I used the SoupStrainer class because it's a bit more efficient (memory- and speed-wise), if you know what you're parsing in advance.
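As a small illustration of that point, a minimal sketch: the strainer can filter on attributes as well as tag names, so only the tags you care about are ever parsed (note that html5lib ignores parse_only; use html.parser or lxml):

import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
response, content = http.request('http://www.nytimes.com')

# parse nothing but <a> tags that actually carry an href
only_links = SoupStrainer('a', href=True)
soup = BeautifulSoup(content, 'html.parser', parse_only=only_links)
for link in soup.find_all('a'):
    print(link['href'])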
Answer 1
For completeness sake, the BeautifulSoup 4 version, making use of the encoding supplied by the server as well:
from bs4 import BeautifulSoup
import urllib.request
parser = 'html.parser' # or 'lxml' (preferred) or 'html5lib', if installed
resp = urllib.request.urlopen("http://www.gpsbasecamp.com/national-parks")
soup = BeautifulSoup(resp, parser, from_encoding=resp.info().get_param('charset'))
for link in soup.find_all('a', href=True):
print(link['href'])
or the Python 2 version:
from bs4 import BeautifulSoup
import urllib2
parser = 'html.parser' # or 'lxml' (preferred) or 'html5lib', if installed
resp = urllib2.urlopen("http://www.gpsbasecamp.com/national-parks")
soup = BeautifulSoup(resp, parser, from_encoding=resp.info().getparam('charset'))
for link in soup.find_all('a', href=True):
print link['href']
and a version using the requests library, which as written will work in both Python 2 and 3:
from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests
parser = 'html.parser' # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get("http://www.gpsbasecamp.com/national-parks")
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
for link in soup.find_all('a', href=True):
print(link['href'])
The soup.find_all('a', href=True) call finds all <a> elements that have an href attribute; elements without the attribute are skipped.
BeautifulSoup 3 stopped development in March 2012; new projects really should use BeautifulSoup 4, always.
Note that you should leave decoding the HTML from bytes to BeautifulSoup. You can inform BeautifulSoup of the character set found in the HTTP response headers to assist in decoding, but this can be wrong and conflict with the <meta> header info found in the HTML itself, which is why the above uses the BeautifulSoup internal class method EncodingDetector.find_declared_encoding() to make sure that such embedded encoding hints win over a misconfigured server.
With requests, the response.encoding attribute defaults to Latin-1 if the response has a text/* mimetype, even if no character set was returned. This is consistent with the HTTP RFCs but painful when used with HTML parsing, so you should ignore that attribute when no charset is set in the Content-Type header.
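A minimal sketch of that last point, assuming you want requests itself to guess the encoding from the body rather than fall back to Latin-1 (the snippet above sidesteps the issue entirely by passing resp.content, the raw bytes, to BeautifulSoup):

import requests

resp = requests.get("http://www.gpsbasecamp.com/national-parks")
if 'charset' not in resp.headers.get('content-type', '').lower():
    # no charset declared; let requests detect it from the body instead of assuming Latin-1
    resp.encoding = resp.apparent_encoding
text = resp.text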
Others have recommended BeautifulSoup, but it’s much better to use lxml. Despite its name, it is also for parsing and scraping HTML. It’s much, much faster than BeautifulSoup, and it even handles “broken” HTML better than BeautifulSoup (their claim to fame). It has a compatibility API for BeautifulSoup too if you don’t want to learn the lxml API.
There’s no reason to use BeautifulSoup anymore, unless you’re on Google App Engine or something where anything not purely Python isn’t allowed.
lxml.html also supports CSS3 selectors so this sort of thing is trivial.
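For instance, a rough CSS-selector equivalent of the XPath example below, as a minimal sketch (this assumes the cssselect package is installed, which lxml's .cssselect() method relies on):

import requests
import lxml.html

dom = lxml.html.fromstring(requests.get('http://www.nytimes.com').content)
for a in dom.cssselect('a[href]'):  # same elements as the XPath '//a[@href]'
    print(a.get('href'))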
An example with lxml and xpath would look like this:
import urllib
import lxml.html
connection = urllib.urlopen('http://www.nytimes.com')
dom = lxml.html.fromstring(connection.read())
for link in dom.xpath('//a/@href'): # select the url in href for all a tags(links)
print link
Answer 3
import urllib2
import BeautifulSoup
request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
soup = BeautifulSoup.BeautifulSoup(response)
for a in soup.findAll('a'):
if 'national-park' in a['href']:
print 'found a url with national-park in the link'
Answer 4
The following code is to retrieve all the links available in a webpage using urllib2 and BeautifulSoup4:
import urllib2
from bs4 import BeautifulSoup
url = urllib2.urlopen("http://www.espncricinfo.com/").read()
soup = BeautifulSoup(url)
for line in soup.find_all('a'):
print(line.get('href'))
Answer 5
Under the hood BeautifulSoup now uses lxml. Requests, lxml & list comprehensions makes a killer combo.
import requests
import lxml.html
dom = lxml.html.fromstring(requests.get('http://www.nytimes.com').content)
[x for x in dom.xpath('//a/@href') if '//' in x and 'nytimes.com' not in x]
In the list comp, the "if '//' in x and 'nytimes.com' not in x" condition is a simple method to scrub the URL list of the site's 'internal' navigation URLs, etc.
Answer 6
just for getting the links, without B.soup and regex:
import urllib2
url="http://www.somewhere.com"
page=urllib2.urlopen(url)
data=page.read().split("</a>")
tag="<a href=\""
endtag="\">"
for item in data:
if "<a href" in item:
try:
ind = item.index(tag)
item=item[ind+len(tag):]
end=item.index(endtag)
except: pass
else:
print item[:end]
for more complex operations, of course BSoup is still preferred.
Answer 7
This script does what you're looking for, but it also resolves the relative links to absolute links.
import urllib
import lxml.html
import urlparse
def get_dom(url):
connection = urllib.urlopen(url)
return lxml.html.fromstring(connection.read())
def get_links(url):
return resolve_links((link for link in get_dom(url).xpath('//a/@href')))
def guess_root(links):
for link in links:
if link.startswith('http'):
parsed_link = urlparse.urlparse(link)
scheme = parsed_link.scheme + '://'
netloc = parsed_link.netloc
return scheme + netloc
def resolve_links(links):
root = guess_root(links)
for link in links:
if not link.startswith('http'):
link = urlparse.urljoin(root, link)
yield link
for link in get_links('http://www.google.com'):
print link
To find all the links, in this example we will use the urllib2 module together with the re module.
One of the most powerful functions in the re module is re.findall(). While re.search() is used to find the first match for a pattern, re.findall() finds all the matches and returns them as a list of strings, with each string representing one match.
import urllib2
import re
#connect to a URL
website = urllib2.urlopen(url)
#read html code
html = website.read()
#use re.findall to get all the links
links = re.findall('"((http|ftp)s?://.*?)"', html)
print links
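To make the re.search() versus re.findall() distinction above concrete, here is a small self-contained sketch (the sample HTML string is made up for illustration):

import re

html = '<a href="http://example.com">one</a> <a href="https://example.org">two</a>'

# re.search: first match only, returned as a match object (or None)
first = re.search('"((http|ftp)s?://.*?)"', html)
print(first.group(1))  # http://example.com

# re.findall: every match; because the pattern has two groups, each match is a tuple
print(re.findall('"((http|ftp)s?://.*?)"', html))
# [('http://example.com', 'http'), ('https://example.org', 'http')]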
Answer 9
Why not use regular expressions:
import urllib2
import re
url ="http://www.somewhere.com"
page = urllib2.urlopen(url)
page = page.read()
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)for link in links:print('href: %s, HTML text: %s'%(link[0], link[1]))
Links can be within a variety of attributes, so you could pass a list of those attributes to select.
For example, with the src and href attributes (here I am using the starts-with ^ operator to specify that either of these attribute values starts with http; you can tailor this as required):
from bs4 import BeautifulSoup as bs
import requests
r = requests.get('https://stackoverflow.com/')
soup = bs(r.content, 'lxml')
links = [item['href'] if item.get('href') is not None else item['src'] for item in soup.select('[href^="http"], [src^="http"]') ]
print(links)
Here's an example using @ars' accepted answer and the BeautifulSoup4, requests, and wget modules to handle the downloads.
import requests
import wget
import os
from bs4 import BeautifulSoup, SoupStrainer
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/eeg-mld/eeg_full/'
file_type = '.tar.gz'
response = requests.get(url)
for link in BeautifulSoup(response.content, 'html.parser', parse_only=SoupStrainer('a')):
if link.has_attr('href'):
if file_type in link['href']:
full_path = url + link['href']
wget.download(full_path)
Answer 12
I found the answer by @Blairg23 working, after the following correction (covering the scenario where it failed to work correctly):
for link in BeautifulSoup(response.content, 'html.parser', parse_only=SoupStrainer('a')):
if link.has_attr('href'):
if file_type in link['href']:
            full_path = urlparse.urljoin(url, link['href'])  # the urlparse module needs to be imported
wget.download(full_path)
For Python 3:
urllib.parse.urljoin has to be used in order to obtain the full URL instead.
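For instance, a minimal sketch of urllib.parse.urljoin in Python 3, reusing the archive URL from the example above (the file name is made up for illustration):

from urllib.parse import urljoin

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/eeg-mld/eeg_full/'
# a relative name resolves against the trailing directory of `url`
print(urljoin(url, 'some_file.tar.gz'))
# a leading slash resolves against the site root instead
print(urljoin(url, '/other/path/file.tar.gz'))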
BeautifulSoup's own parser can be slow. It might be more feasible to use lxml, which is capable of parsing directly from a URL (with some limitations mentioned below).
import lxml.html
doc = lxml.html.parse(url)
links = doc.xpath('//a[@href]')
for link in links:
print link.attrib['href']
The code above will return the links as is, and in most cases they would be relative links or absolute links from the site root. Since my use case was to extract only a certain type of link, below is a version that converts the links to full URLs and which optionally accepts a glob pattern like *.mp3. It won't handle single and double dots in the relative paths, though, but so far I haven't had the need for it. If you need to parse URL fragments containing ../ or ./ then urlparse.urljoin might come in handy.
NOTE: Direct lxml url parsing doesn’t handle loading from https and doesn’t do redirects, so for this reason the version below is using urllib2 + lxml.
#!/usr/bin/env python
import sys
import urllib2
import urlparse
import lxml.html
import fnmatch
try:
import urltools as urltools
except ImportError:
sys.stderr.write('To normalize URLs run: `pip install urltools --user`')
urltools = None
def get_host(url):
p = urlparse.urlparse(url)
return "{}://{}".format(p.scheme, p.netloc)
if __name__ == '__main__':
url = sys.argv[1]
host = get_host(url)
glob_patt = len(sys.argv) > 2 and sys.argv[2] or '*'
doc = lxml.html.parse(urllib2.urlopen(url))
links = doc.xpath('//a[@href]')
for link in links:
href = link.attrib['href']
if fnmatch.fnmatch(href, glob_patt):
            if not href.startswith(('http://', 'https://', 'ftp://')):
if href.startswith('/'):
href = host + href
else:
parent_url = url.rsplit('/', 1)[0]
href = urlparse.urljoin(parent_url, href)
if urltools:
href = urltools.normalize(href)
print href
import urllib2
from bs4 import BeautifulSoup
a=urllib2.urlopen('http://dir.yahoo.com')
code=a.read()
soup=BeautifulSoup(code)
links=soup.findAll("a")
#To get href part alone
print links[0].attrs['href']
Answer 15
There can be many duplicate links together with both external and internal links. To differentiate between the two and just get unique links using sets:
# Python 3.
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.espncricinfo.com/"
resp = urllib.request.urlopen(url)
# Get server encoding per recommendation of Martijn Pieters.
soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))
external_links = set()
internal_links = set()
for line in soup.find_all('a'):
link = line.get('href')
if not link:
continue
if link.startswith('http'):
external_links.add(link)
else:
internal_links.add(link)
# Depending on usage, full internal links may be preferred.
full_internal_links = {
urllib.parse.urljoin(url, internal_link)
for internal_link in internal_links
}
# Print all unique external and full internal links.
for link in external_links.union(full_internal_links):
print(link)
<select id="fruits01" class="select" name="fruits">
  <option value="0">Choose your fruits:</option>
  <option value="1">Banana</option>
  <option value="2">Mango</option>
</select>
Selenium provides a convenient Select class to work with select -> option constructs:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Firefox()
driver.get('url')
select = Select(driver.find_element_by_id('fruits01'))
# select by visible text
select.select_by_visible_text('Banana')
# select by value
select.select_by_value('1')
First you need to import the Select class, and then you need to create an instance of the Select class.
After creating the instance of the Select class, you can perform select methods on that instance to select an option from the dropdown list.
Here is the code
from selenium.webdriver.support.select import Select
select_fr = Select(driver.find_element_by_id("fruits01"))
select_fr.select_by_index(0)
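Select also offers selection by value and by visible text. A minimal sketch using the <select id="fruits01"> markup shown earlier ('url' is a placeholder for the page containing it):

from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get('url')  # the page containing the <select id="fruits01"> markup above

select_fr = Select(driver.find_element_by_id("fruits01"))
select_fr.select_by_index(1)                # second <option> (0-based), i.e. Banana
select_fr.select_by_value("1")              # <option value="1">, also Banana
select_fr.select_by_visible_text("Banana")  # match on the text shown to the user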
I tried many things, but my dropdown was inside a table and I was not able to perform a simple select operation. Only the solution below worked. Here I am highlighting the dropdown element and pressing the down arrow until I get the desired value:
#identify the drop down element
elem = browser.find_element_by_name(objectVal)
for option in elem.find_elements_by_tag_name('option'):
if option.text == value:
break
else:
ARROW_DOWN = u'\ue015'
elem.send_keys(ARROW_DOWN)
Answer 5
You do not need to click anything. Find the element using XPath or whichever method you choose, and then use send keys.
For example, with this HTML:
<select id="fruits01"class="select" name="fruits"><option value="0">Choose your fruits:</option><option value="1">Banana</option><option value="2">Mango</option></select>
In this way you can select all the options in any dropdowns.
driver.get("https://www.spectrapremium.com/en/aftermarket/north-america")
print( "The title is : " + driver.title)
inputs = Select(driver.find_element_by_css_selector('#year'))
input1 = len(inputs.options)
for items in range(input1):
inputs.select_by_index(items)
time.sleep(1)
The best way is to use the selenium.webdriver.support.ui.Select class to work with dropdown selections, but sometimes it does not work as expected due to design issues or other problems in the HTML.
In this type of situation you can also, as an alternate solution, use execute_script() as below:
option_visible_text = "Banana"
select = driver.find_element_by_id("fruits01")
#now use this to select option from dropdown by visible text
driver.execute_script("var select = arguments[0]; for(var i = 0; i < select.options.length; i++){ if(select.options[i].text == arguments[1]){ select.options[i].selected = true; } }", select, option_visible_text);
Answer 11
As per the HTML provided:
<select id="fruits01"class="select" name="fruits"><option value="0">Choose your fruits:</option><option value="1">Banana</option><option value="2">Mango</option></select>
To select an <option> element from an html-select menu you have to use the Select class. Moreover, as you have to interact with the drop-down menu, you have to induce WebDriverWait for element_to_be_clickable().
To select the <option> with text as Mango from the dropdown, you can use either of the following locator strategies:
Using the ID attribute and the select_by_visible_text() method:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "fruits01"))))
select.select_by_visible_text("Mango")
Using XPATH and the select_by_index() method:
select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@class='select' and @name='fruits']"))))
select.select_by_index(2)
Answer 12
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.Select;

public class ListBoxMultiple {

    public static void main(String[] args) throws InterruptedException {
// TODO Auto-generated method stub
System.setProperty("webdriver.chrome.driver", "./drivers/chromedriver.exe");
WebDriver driver=new ChromeDriver();
driver.get("file:///C:/Users/Amitabh/Desktop/hotel2.html");//open the website
driver.manage().window().maximize();
WebElement hotel = driver.findElement(By.id("maarya"));//get the element
Select sel=new Select(hotel);//for handling list box
//isMultiple
if(sel.isMultiple()){
System.out.println("it is multi select list");
}
else{
System.out.println("it is single select list");
}
//select option
sel.selectByIndex(1);// you can select by index values
sel.selectByValue("p");//you can select by value
sel.selectByVisibleText("Fish");// you can also select by visible text of the options
//deselect option but this is possible only in case of multiple lists
Thread.sleep(1000);
sel.deselectByIndex(1);
sel.deselectAll();
//getOptions
List<WebElement> options = sel.getOptions();
int count=options.size();
System.out.println("Total options: "+count);
for(WebElement opt:options){ // getting text of every elements
String text=opt.getText();
System.out.println(text);
}
//select all options
for(int i=0;i<count;i++){
sel.selectByIndex(i);
Thread.sleep(1000);
}
driver.quit();
    }
}
I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this:
soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
if (div["class"] == "stylelistrow"):
print div
I get an error on the same line “after” the script finishes.
File "./beautifulcoding.py", line 130, in getlanguage
if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
return self._getAttrMap()[key]
KeyError: 'class'
To be clear, this selects only the p tags that have both the strikeout and body classes.
To find tags matching any of a set of classes (the union, not the intersection), you can give a list to the class_ keyword argument (as of 4.1.2):
soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list)
Also note that findAll has been renamed from the camelCase to the more Pythonic find_all.
This works for me to access the class attribute (on BeautifulSoup 4, contrary to what the documentation says). The KeyError comes from a list being returned, not a dictionary.
for hit in soup.findAll(name='span'):
print hit.contents[1]['class']
for div in mydivs:
try:
clazz = div["class"]
except KeyError:
clazz = ""
if (clazz == "stylelistrow"):
print div
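Since bs4 returns the class attribute as a list and raises KeyError when it is absent, a membership test with .get() handles both cases at once; a minimal sketch:

from bs4 import BeautifulSoup

soup = BeautifulSoup(sdata)  # sdata: the HTML string from the question, assumed defined
for div in soup.find_all('div'):
    # .get() returns a default instead of raising KeyError,
    # and in bs4 "class" is a list of class names
    if "stylelistrow" in div.get("class", []):
        print(div)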
Answer 12
Alternatively, we can use lxml; it supports XPath and is very fast!
from lxml import html, etree
attr = html.fromstring(html_text)  # passing the raw html
handles = attr.xpath('//div[@class="stylelistrow"]')  # xpath expression to find that specific class
for each in handles:
    print(etree.tostring(each))  # printing the html as a string
Answer 13
This should work:
soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    if div.find(class_="stylelistrow"):
        print div
In other answers, findAll is used on the soup object itself, but I needed a way to do a find by class name on objects nested inside a specific element extracted from the object I obtained after doing findAll.
If you are trying to do a search inside nested HTML elements to get objects by class name, try below –
# parse html
page_soup = soup(web_page.read(), "html.parser")
# filter out items matching class name
all_songs = page_soup.findAll("li", "song_item")
# traverse through all_songs
for song in all_songs:
# get text out of span element matching class 'song_name'
# doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
song.find("span", "song_name").text
Points to note:
I'm not explicitly defining the search to be on the 'class' attribute, as in findAll("li", {"class": "song_item"}), since it's the only attribute I'm searching on, and BeautifulSoup will by default search the class attribute if you don't explicitly tell it which attribute you want to match on.
When you do a findAll or find, the resulting object is of class bs4.element.ResultSet which is a subclass of list. You can utilize all methods of ResultSet, inside any number of nested elements (as long as they are of type ResultSet) to do a find or find all.
Replace 'totalcount' with your class name and 'span' with the tag you are looking for. Also, if your class contains multiple names separated by spaces, just choose one and use it.
P.S. This finds the first element with the given criteria. If you want to find all elements, then replace 'find' with 'find_all'.
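The snippet this answer refers to is not included above; here is a minimal sketch of what such a find-by-class call looks like, with 'span' and 'totalcount' taken from the surrounding text:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')  # html_doc: your page HTML, assumed already fetched

first = soup.find('span', class_='totalcount')      # first matching element (or None)
every = soup.find_all('span', class_='totalcount')  # all matching elements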