在Python中使用XML模式进行验证

问题:在Python中使用XML模式进行验证

我在另一个文件中有一个XML文件和一个XML模式,我想验证我的XML文件是否遵循该模式。如何在Python中执行此操作?

我希望使用标准库,但如有必要,我可以安装第三方软件包。

I have an XML file and an XML schema in another file and I’d like to validate that my XML file adheres to the schema. How do I do this in Python?

I’d prefer something using the standard library, but I can install a third-party package if necessary.


回答 0

我假设您的意思是使用XSD文件。令人惊讶的是,没有很多支持此功能的python XML库。但是,lxml确实可以。使用lxml检查验证。该页面还列出了如何使用lxml与其他架构类型进行验证。

I am assuming you mean using XSD files. Surprisingly there aren’t many python XML libraries that support this. lxml does however. Check Validation with lxml. The page also lists how to use lxml to validate with other schema types.


回答 1

至于“纯python”解决方案:包索引列出:

  • pyxsd,描述说它使用xml.etree.cElementTree,它不是“纯python”(但包含在stdlib中),但是源代码表明它回落到xml.etree.ElementTree,因此将其视为纯python。尚未使用它,但是根据文档,它确实进行模式验证。
  • minixsv:“用“纯” Python编写的轻量级XML模式验证器”。但是,描述中说“当前支持XML模式标准的子集”,所以这可能还不够。
  • XSV,我认为它用于W3C的在线xsd验证器(它似乎仍使用旧的pyxml包,我认为该包不再维护)

As for “pure python” solutions: the package index lists:

  • pyxsd, the description says it uses xml.etree.cElementTree, which is not “pure python” (but included in stdlib), but source code indicates that it falls back to xml.etree.ElementTree, so this would count as pure python. Haven’t used it, but according to the docs, it does do schema validation.
  • minixsv: ‘a lightweight XML schema validator written in “pure” Python’. However, the description says “currently a subset of the XML schema standard is supported”, so this may not be enough.
  • XSV, which I think is used for the W3C’s online xsd validator (it still seems to use the old pyxml package, which I think is no longer maintained)

回答 2

使用流行的库lxml的 Python3中的简单验证器的示例

安装lxml

pip install lxml

如果出现类似“在库libxml2中找不到函数xmlCheckVersion的错误。是否已安装libxml2?”的错误。,请先尝试执行以下操作:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

最简单的验证器

让我们创建最简单的validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

然后编写并运行main.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

一点点的OOP

为了验证多个文件,不需要每次都创建一个XMLSchema对象,因此:

验证器

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

现在,我们可以按以下方式验证目录中的所有文件:

main.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

有关更多选项,请阅读此处:使用lxml进行验证

An example of a simple validator in Python3 using the popular library lxml

Installation lxml

pip install lxml

If you get an error like “Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?”, try to do this first:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

The simplest validator

Let’s create simplest validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

then write and run main.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

A little bit of OOP

In order to validate more than one file, there is no need to create an XMLSchema object every time, therefore:

validator.py

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

Now we can validate all files in the directory as follows:

main.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

For more options read here: Validation with lxml


回答 3

http://pyxb.sourceforge.net/上的PyXB程序包可从XML模式文档生成Python的验证绑定。它处理几乎每种模式构造并支持多个命名空间。

The PyXB package at http://pyxb.sourceforge.net/ generates validating bindings for Python from XML schema documents. It handles almost every schema construct and supports multiple namespaces.


回答 4

您可以通过两种方式(实际上还有更多)来执行此操作。
1.使用lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. 从命令行使用xmllint。xmllint已安装在许多Linux发行版中。

>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml

There are two ways(actually there are more) that you could do this.
1. using lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. Use xmllint from the commandline. xmllint comes installed in many linux distributions.

>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml


回答 5

您可以使用xmlschema Python软件包轻松地针对XML Schema(XSD)验证XML文件或树。它是纯Python,可在PyPi使用,并且没有很多依赖项。

示例-验证文件:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

如果文件未针对XSD进行验证,则该方法将引发异常。然后,该异常包含一些违规细节。

如果要验证许多文件,则只需加载一次XSD:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

如果您不需要exceptions,可以这样验证:

if xsd.is_valid('doc.xml'):
    print('do something useful')

另外,xmlschema可直接在文件对象和内存XML树(使用xml.etree.ElementTree或lxml创建)中工作。例:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))

You can easily validate an XML file or tree against an XML Schema (XSD) with the xmlschema Python package. It’s pure Python, available on PyPi and doesn’t have many dependencies.

Example – validate a file:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

The method raises an exception if the file doesn’t validate against the XSD. That exception then contains some violation details.

If you want to validate many files you only have to load the XSD once:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

If you don’t need the exception you can validate like this:

if xsd.is_valid('doc.xml'):
    print('do something useful')

Alternatively, xmlschema directly works on file objects and in memory XML trees (either created with xml.etree.ElementTree or lxml). Example:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))

回答 6

lxml提供etree.DTD

来自http://lxml.de/api/lxml.tests.test_dtd-pysrc.html上的测试

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root))