How to find children of nodes using BeautifulSoup - Python 实用宝典

Question: How to find children of nodes using BeautifulSoup

I want to get all the <a> tags which are children of <li>:

<div>
<li class="test">
    <a>link1</a>
    <ul> 
       <li>  
          <a>link2</a> 
       </li>
    </ul>
</li>
</div>

I know how to find an element with a particular class like this:

soup.find("li", { "class" : "test" })

But I don't know how to find all <a> tags which are direct children of <li class="test"> and not any that are nested deeper.

For example, I want to select only:

<a>link1</a>

Answer 0

Try this:

li = soup.find('li', {'class': 'test'})
# recursive=False limits the search to direct children of li
children = li.findChildren("a", recursive=False)
for child in children:
    print(child)
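
To check this end to end, here is a minimal sketch. It assumes the `bs4` package is installed and uses the built-in `html.parser`; the names `html`, `direct_links` and `direct_texts` are illustrative and not part of the original answer. It uses `find_all`, the current Beautiful Soup 4 name for the same search.

```python
from bs4 import BeautifulSoup

# The markup from the question
html = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

# recursive=False restricts the search to direct children of li,
# so the <a> nested inside the inner <ul> is skipped
direct_links = li.find_all("a", recursive=False)
direct_texts = [a.get_text() for a in direct_links]
print(direct_texts)
```

Only the first link is a direct child of the outer <li>, so it should be the only one printed.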

Answer 1

There's a very small section in the docs that shows how to find/find_all direct children.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

In your case, since you want link1, which is the first direct child:

# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)

If you want all direct children:

# for all direct children
soup.find("li", { "class" : "test" }).find_all("a", recursive=False)
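
One caveat with chaining: `soup.find(...)` returns `None` when nothing matches, so the chained call then raises an `AttributeError`. A small wrapper can guard against that. This is only a sketch; the helper name `direct_child_links` and its parameters are hypothetical, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def direct_child_links(soup, css_class):
    """Return the <a> tags that are direct children of the first
    <li> carrying css_class, or an empty list if no such <li> exists.

    Hypothetical helper for illustration only.
    """
    li = soup.find("li", class_=css_class)
    if li is None:
        return []
    return li.find_all("a", recursive=False)


markup = '<li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li>'
soup = BeautifulSoup(markup, "html.parser")

found = [a.get_text() for a in direct_child_links(soup, "test")]
missing = direct_child_links(soup, "absent")
print(found, missing)
```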

Answer 2

Perhaps you want to do:

soup.find("li", { "class" : "test" }).find('a')

Answer 3

Try this:

li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> descendants of li

Other reminders:

The find method returns only the first matching element. The find_all method returns all matching descendants in a list. Both search the whole subtree by default; pass recursive=False to restrict the search to direct children.
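
The difference between the default recursive search and a direct-children search can be seen side by side. This is a minimal sketch on the question's markup, assuming `bs4` with the built-in `html.parser`; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

markup = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""

soup = BeautifulSoup(markup, "html.parser")
li = soup.find("li", class_="test")

# Default: every <a> anywhere below li, in document order
all_descendants = [a.get_text() for a in li.find_all("a")]

# recursive=False: only <a> tags whose parent is li itself
direct_children = [a.get_text() for a in li.find_all("a", recursive=False)]

# find() stops at the first match in document order
first_match = li.find("a").get_text()

print(all_descendants, direct_children, first_match)
```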

Answer 4

"How to find all <a> which are children of <li class=test> but not any others?"

Given the HTML below (I added another <a> to show the difference between select and select_one):

<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>

The solution is to use the child combinator (>), placed between two CSS selectors:

>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]

If you want only the first matching child:

>>> soup.select_one('li.test > a')
<a>link1</a>
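
For comparison, the sketch below runs the child combinator next to the descendant combinator (a plain space) on the same markup. It assumes a Beautiful Soup 4 installation whose `select` support is available (recent releases rely on the `soupsieve` package for this); the variable names are illustrative.

```python
from bs4 import BeautifulSoup

markup = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>
"""

soup = BeautifulSoup(markup, "html.parser")

# "li.test > a": <a> tags whose parent is the matching <li>
child_texts = [a.get_text() for a in soup.select("li.test > a")]

# "li.test a": <a> tags at any depth below the matching <li>
descendant_texts = [a.get_text() for a in soup.select("li.test a")]

# select_one returns the first match, or None if there is none
first_child_text = soup.select_one("li.test > a").get_text()

print(child_texts, descendant_texts, first_child_text)
```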

Answer 5

Yet another method: create a filter function that returns True for all desired tags:

def my_filter(tag):
    # .get() avoids a KeyError when the parent <li> has no class attribute
    return (tag.name == 'a' and
        tag.parent.name == 'li' and
        'test' in tag.parent.get('class', []))

Then just call find_all with the filter as its argument:

for a in soup(my_filter): # or soup.find_all(my_filter)
    print(a)
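
The same idea can be generalised with a small factory that builds the filter for any child tag, parent tag and parent class. This is only a sketch: `make_child_filter` and its parameters are hypothetical names, not part of Beautiful Soup. It looks up the parent's class with `.get("class", [])` so that a parent without a class attribute does not raise a `KeyError`.

```python
from bs4 import BeautifulSoup


def make_child_filter(child_name, parent_name, parent_class):
    """Build a filter matching child_name tags whose direct parent is a
    parent_name tag carrying parent_class. Hypothetical helper."""
    def child_filter(tag):
        parent = tag.parent
        return (
            tag.name == child_name
            and parent is not None
            and parent.name == parent_name
            # With html.parser, class is stored as a list of class names
            and parent_class in parent.get("class", [])
        )
    return child_filter


markup = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>
"""

soup = BeautifulSoup(markup, "html.parser")
matches = soup.find_all(make_child_filter("a", "li", "test"))
match_texts = [a.get_text() for a in matches]
print(match_texts)
```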
