bs4.FeatureNotFound:找不到具有您请求的功能的树构建器:lxml。您需要安装解析器库吗?

问题:bs4.FeatureNotFound:找不到具有您请求的功能的树构建器:lxml。您需要安装解析器库吗?

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

以上输出在我的终端上。我在Mac OS 10.7.x上。我有Python 2.7.1,并按照本教程操作获得了Beautiful Soup和lxml,它们都已成功安装并与位于此处的单独测试文件一起使用。在导致此错误的Python脚本中,我包含以下行: from pageCrawler import comparePages 在pageCrawler文件中,我包含以下两行: from bs4 import BeautifulSoup from urllib2 import urlopen

找出问题所在以及如何解决的任何帮助将不胜感激。

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above outputs on my Terminal. I am on Mac OS 10.7.x. I have Python 2.7.1, and followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line: from pageCrawler import comparePages And in the pageCrawler file I have included the following two lines: from bs4 import BeautifulSoup from urllib2 import urlopen

Any help in figuring out what the problem is and how it can be solved would much be appreciated.


回答 0

我怀疑这与BS将用于读取HTML的解析器有关。他们的文档在这里,但是如果您像我(在OSX上)一样,可能会遇到一些麻烦,需要做一些工作:

您会注意到,在上面的BS4文档页面中,他们指出,默认情况下,BS4将使用Python内置的HTML解析器。假设您使用的是OSX,则Apple捆绑的Python版本是2.7.2,它对字符格式不宽容。我遇到了同样的问题,因此我升级了Python版本以解决此问题。在virtualenv中执行此操作可以最大程度地减少对其他项目的破坏。

如果这样做听起来很痛苦,则可以切换到LXML解析器:

pip install lxml

然后尝试:

soup = BeautifulSoup(html, "lxml")

根据您的情况,这可能就足够了。我发现这很烦人,需要升级我的Python版本。使用的virtualenv,您可以迁移的包很容易。

I have a suspicion that this is related to the parser that BS will use to read the HTML. They document is here, but if you’re like me (on OSX) you might be stuck with something that requires a bit of work:

You’ll notice that in the BS4 documentation page above, they point out that by default BS4 will use the Python built-in HTML parser. Assuming you are in OSX, the Apple-bundled version of Python is 2.7.2 which is not lenient for character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.


回答 1

对于安装了bs4的基本开箱即用的python,则可以使用以下命令处理xml

soup = BeautifulSoup(html, "html5lib")

但是,如果您想使用formatter =’xml’,则需要

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

For basic out of the box python with bs4 installed then you can process your xml with

soup = BeautifulSoup(html, "html5lib")

If however you want to use formatter=’xml’ then you need to

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

回答 2

我首选内置python html解析器,不安装不依赖

soup = BeautifulSoup(s, "html.parser")

I preferred built in python html parser, no install no dependencies

soup = BeautifulSoup(s, "html.parser")


回答 3

我正在使用Python 3.6,并且在这篇文章中有相同的原始错误。运行命令后:

python3 -m pip install lxml

它解决了我的问题

I am using Python 3.6 and I had the same original error in this post. After I ran the command:

python3 -m pip install lxml

it resolved my problem


回答 4

运行以下三个命令,以确保已安装所有相关的软件包:

pip install bs4
pip install html5lib
pip install lxml

然后根据需要重新启动Python IDE。

那应该处理与这个问题有关的任何事情。

Run these three commands to make sure that you have all the relevant packages installed:

pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE, if needed.

That should take care of anything related to this issue.


回答 5

您可以使用以下代码代替使用lxml和html.parser:

soup = BeautifulSoup(html, 'html.parser')

Instead of using lxml use html.parser, you can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')

回答 6

尽管BeautifulSoup默认情况下支持HTML解析器。但是,如果您想使用任何其他第三方Python解析器,则需要安装该外部解析器,例如(lxml)。

soup_object= BeautifulSoup(markup,"html.parser") #Python HTML parser

但是,如果您未将任何解析器指定为参数,则会收到一条警告,提示您未指定解析器。

soup_object= BeautifulSoup(markup) #Warnning

要使用任何其他外部解析器,您需要先安装它,然后再指定它。喜欢

pip install lxml

soup_object= BeautifulSoup(markup,'lxml') # C dependent parser 

外部解析器具有c和python依赖关系,这可能有一些优点和缺点。

Although BeautifulSoup supports the HTML parser by default If you want to use any other third-party Python parsers you need to install that external parser like(lxml).

soup_object= BeautifulSoup(markup,"html.parser") #Python HTML parser

But if you don’t specified any parser as parameter you will get an warning that no parser specified.

soup_object= BeautifulSoup(markup) #Warnning

To use any other external parser you need to install it and then need to specify it. like

pip install lxml

soup_object= BeautifulSoup(markup,'lxml') # C dependent parser 

External parser have c and python dependency which may have some advantage and disadvantage.


回答 7

我遇到了同样的问题。我发现原因是我有一个过时的python 6软件包。

>>> import html5lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

升级六个软件包将解决此问题:

sudo pip install six=1.10.0

I encountered the same issue. I found the reason is that I had a slightly-outdated python six package.

>>> import html5lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six=1.10.0

回答 8

在python环境中安装LXML分析器。

pip install lxml

您的问题将得到解决。您还可以使用内置的python软件包,其用法与以下相同:

soup = BeautifulSoup(s,  "html.parser")

注意:在Python3中,“ HTMLParser”模块已重命名为“ html.parser”

Install LXML parser in python environment.

pip install lxml

Your problem will be resolve. You can also use built-in python package for the same as:

soup = BeautifulSoup(s,  "html.parser")

Note: The “HTMLParser” module has been renamed to “html.parser” in Python3


回答 9

在某些参考中,使用第二个而不是第一个:

soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')

In some references, use the second instead of the first:

soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')

回答 10

由于使用了解析器,因此出现错误。通常,如果您具有HTML文件/代码,则需要使用html5lib(文档可在此处找到);如果您具有XML文件/数据,则需要使用lxml(文档可在此处找到)。您也可以将其lxml用于HTML文件/代码,但有时会出现上述错误。因此,最好根据数据/文件的类型明智地选择软件包。您也可以使用html_parser内置模块。但是,这有时有时也不起作用。

有关何时使用哪个软件包的更多详细信息,请参见此处的详细信息

The error is coming because of the parser you are using. In general, if you have HTML file/code then you need to use html5lib(documentation can be found here) & in-case you have XML file/data then you need to use lxml(documentation can be found here). You can use lxml for HTML file/code also but sometimes it gives an error as above. So, better to choose the package wisely based on the type of data/file. You can also use html_parser which is built-in module. But, this also sometimes do not work.

For more details regarding when to use which package you can see the details here


回答 11

空白参数将导致警告,提示您最好使用该参数。
汤= BeautifulSoup(html)

————— // UserWarning:未明确指定解析器,因此我正在为此系统使用最佳的HTML解析器(“ html5lib”)。通常这不是问题,但是如果您在另一个系统或不同的虚拟环境中运行此代码,则它可能使用不同的解析器并且行为不同。 ——- /

python –version Python 3.7.7

PyCharm 19.3.4 CE

Blank parameter will result in a warning for best available.
soup = BeautifulSoup(html)

—————/UserWarning: No parser was explicitly specified, so I’m using the best available HTML parser for this system (“html5lib”). This usually isn’t a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.———————-/

python –version Python 3.7.7

PyCharm 19.3.4 CE