soup =BeautifulSoup(sdata)
mydivs = soup.findAll('div')for div in mydivs:if(div["class"]=="stylelistrow"):print div
脚本完成后的同一行出现错误。
File"./beautifulcoding.py", line 130,in getlanguage
if(div["class"]=="stylelistrow"):File"/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599,in __getitem__
return self._getAttrMap()[key]KeyError:'class'
I’m having trouble parsing HTML elements with “class” attribute using Beautifulsoup. The code looks like this
soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
if (div["class"] == "stylelistrow"):
print div
I get an error on the same line “after” the script finishes.
File "./beautifulcoding.py", line 130, in getlanguage
if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
return self._getAttrMap()[key]
KeyError: 'class'
soup =BeautifulSoup(sdata)
class_list =["stylelistrow"]# can add any other classes to this list.# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list)
To be clear, this selects only the p tags that are both strikeout and body class.
To find for the intersection of any in a set of classes (not the intersection, but the union), you can give a list to the class_ keyword argument (as of 4.1.2):
soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list)
Also note that findAll has been renamed from the camelCase to the more Pythonic find_all.
This works for me to access the class attribute (on beautifulsoup 4, contrary to what the documentation says). The KeyError comes a list being returned not a dictionary.
for hit in soup.findAll(name='span'):
print hit.contents[1]['class']
for div in mydivs:
try:
clazz = div["class"]
except KeyError:
clazz = ""
if (clazz == "stylelistrow"):
print div
回答 12
或者,我们可以使用lxml,它支持xpath并且非常快!
from lxml import html, etree
attr = html.fromstring(html_text)#passing the raw html
handles = attr.xpath('//div[@class="stylelistrow"]')#xpath exresssion to find that specific classfor each in handles:print(etree.tostring(each))#printing the html as string
Alternatively we can use lxml, it support xpath and very fast!
from lxml import html, etree
attr = html.fromstring(html_text)#passing the raw html
handles = attr.xpath('//div[@class="stylelistrow"]')#xpath exresssion to find that specific class
for each in handles:
print(etree.tostring(each))#printing the html as string
回答 13
这应该工作:
soup =BeautifulSoup(sdata)
mydivs = soup.findAll('div')for div in mydivs:if(div.find(class_ =="stylelistrow"):print div
# parse html
page_soup = soup(web_page.read(),"html.parser")# filter out items matching class name
all_songs = page_soup.findAll("li","song_item")# traverse through all_songsfor song in all_songs:# get text out of span element matching class 'song_name'# doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
song.find("span","song_name").text
In other answers the findAll is being used on the soup object itself, but I needed a way to do a find by class name on objects inside a specific element extracted from the object I obtained after doing findAll.
If you are trying to do a search inside nested HTML elements to get objects by class name, try below –
# parse html
page_soup = soup(web_page.read(), "html.parser")
# filter out items matching class name
all_songs = page_soup.findAll("li", "song_item")
# traverse through all_songs
for song in all_songs:
# get text out of span element matching class 'song_name'
# doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
song.find("span", "song_name").text
Points to note:
I’m not explicitly defining the search to be on ‘class’ attribute findAll("li", {"class": "song_item"}), since it’s the only attribute I’m searching on and it will by default search for class attribute if you don’t exclusively tell which attribute you want to find on.
When you do a findAll or find, the resulting object is of class bs4.element.ResultSet which is a subclass of list. You can utilize all methods of ResultSet, inside any number of nested elements (as long as they are of type ResultSet) to do a find or find all.
replace ‘totalcount’ with your class name and ‘span’ with tag you are looking for. Also, if your class contains multiple names with space, just choose one and use.
P.S. This finds the first element with given criteria. If you want to find all elements then replace ‘find’ with ‘find_all’.