Question: Creating a simple XML file using Python

What are my options if I want to create a simple XML file in Python? (library-wise)

The XML I want looks like:

<root>
 <doc>
     <field1 name="blah">some value1</field1>
     <field2 name="asdfasd">some value2</field2>
 </doc>

</root>

Answer 0

These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.

The available options for that are:

  • ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
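  • cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5; deprecated in Python 3.3 and removed in Python 3.9, because xml.etree.ElementTree now uses the C accelerator automatically whenever it is available)
  • LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well as XPath, CSS Selectors, and more)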

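Here’s an example of how to generate your example document using the stdlib ElementTree (on Python 2.5 to 3.8 you can import xml.etree.cElementTree instead; that module no longer exists in Python 3.9+):

import xml.etree.ElementTree as ET

# Build the tree: <root> contains a single <doc> element
root = ET.Element("root")
doc = ET.SubElement(root, "doc")

# Keyword arguments become attributes; .text sets the element's text content
ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"

# Wrap the root element in an ElementTree and serialize it to disk
tree = ET.ElementTree(root)
tree.write("filename.xml")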

I’ve tested it and it works, but I’m assuming whitespace isn’t significant. If you need “prettyprint” indentation, let me know and I’ll look up how to do that. (It may be an LXML-specific option. I don’t use the stdlib implementation much)
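
For the "prettyprint" indentation mentioned above, here is a minimal sketch that stays within the standard library. It assumes Python 3.9 or newer, where xml.etree.ElementTree.indent() was added; the helper name build_doc is made up for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET


def build_doc():
    """Build the example <root><doc>...</doc></root> tree (hypothetical helper)."""
    root = ET.Element("root")
    doc = ET.SubElement(root, "doc")
    ET.SubElement(doc, "field1", name="blah").text = "some value1"
    ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"
    return root


root = build_doc()
# ET.indent() (Python 3.9+) adds whitespace to the tree in place
ET.indent(root, space="  ")
# encoding="unicode" makes tostring() return str instead of bytes
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

On older Python versions, lxml's pretty_print=True or xml.dom.minidom's toprettyxml() are the usual alternatives.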

As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but in the event you’re in a situation where you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:

  • LXML clearly wins for serializing (generating) XML
  • As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.

Answer 1

The lxml library includes a very convenient syntax for XML generation, called the E-factory. Here’s how I’d make the example you give:

#!/usr/bin/python
import lxml.etree
import lxml.builder    

E = lxml.builder.ElementMaker()
ROOT = E.root
DOC = E.doc
FIELD1 = E.field1
FIELD2 = E.field2

the_doc = ROOT(
        DOC(
            FIELD1('some value1', name='blah'),
            FIELD2('some value2', name='asdfasd'),
            )   
        )   

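# encoding="unicode" returns str rather than bytes, so print() shows clean XML on Python 3
print(lxml.etree.tostring(the_doc, pretty_print=True, encoding="unicode"))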

Output:

<root>
  <doc>
    <field1 name="blah">some value1</field1>
    <field2 name="asdfasd">some value2</field2>
  </doc>
</root>

It also supports adding to an already-made node, e.g. after the above you could say

the_doc.append(FIELD2('another value again', name='hithere'))

Answer 2

Yattag (http://www.yattag.org/ or https://github.com/leforestier/yattag) provides an interesting API for creating such XML documents (and also HTML documents).

It uses context managers and the with keyword.

from yattag import Doc, indent

doc, tag, text = Doc().tagtext()

with tag('root'):
    with tag('doc'):
        with tag('field1', name='blah'):
            text('some value1')
        with tag('field2', name='asdfasd'):
            text('some value2')

result = indent(
    doc.getvalue(),
    indentation = ' '*4,
    newline = '\r\n'
)

print(result)

So you will get:

<root>
    <doc>
        <field1 name="blah">some value1</field1>
        <field2 name="asdfasd">some value2</field2>
    </doc>
</root>

Answer 3

For the simplest choice, I’d go with minidom: http://docs.python.org/library/xml.dom.minidom.html . It is built into the Python standard library and is straightforward to use in simple cases.

Here’s a pretty easy-to-follow tutorial: http://www.boddie.org.uk/python/XML_intro.html
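
As a rough illustration of what the minidom route looks like for the document in the question, here is a minimal sketch. It uses only xml.dom.minidom calls from the standard library; the variable names and the two-space indent are arbitrary choices for this example.

```python
from xml.dom import minidom

# Create an empty document and the <root>/<doc> skeleton
dom = minidom.Document()
root = dom.createElement("root")
dom.appendChild(root)
doc = dom.createElement("doc")
root.appendChild(doc)

# Each field is an element with a name attribute and one text node child
fields = [("field1", "blah", "some value1"),
          ("field2", "asdfasd", "some value2")]
for tag, name, value in fields:
    field = dom.createElement(tag)
    field.setAttribute("name", name)
    field.appendChild(dom.createTextNode(value))
    doc.appendChild(field)

# toprettyxml() returns the whole document as an indented string
xml_text = dom.toprettyxml(indent="  ")
print(xml_text)
```

The DOM API is more verbose than ElementTree for the same result, which is worth weighing before picking it for anything larger.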


Answer 4

For such a simple XML structure, you may not want to involve a full-blown XML module. Consider a string template for the simplest structures, or Jinja for something a little more complex. Jinja can handle looping over a list of data to produce the inner XML of your document list. That is a bit trickier with raw Python string templates.

For a Jinja example, see my answer to a similar question.

Here is an example of generating your XML with string templates.

import string
from xml.sax.saxutils import escape

inner_template = string.Template('    <field${id} name="${name}">${value}</field${id}>')

outer_template = string.Template("""<root>
 <doc>
${document_list}
 </doc>
</root>
 """)

data = [
    (1, 'foo', 'The value for the foo document'),
    (2, 'bar', 'The <value> for the <bar> document'),
]

inner_contents = [inner_template.substitute(id=id, name=name, value=escape(value)) for (id, name, value) in data]
result = outer_template.substitute(document_list='\n'.join(inner_contents))
print(result)

Output:

<root>
 <doc>
    <field1 name="foo">The value for the foo document</field1>
    <field2 name="bar">The &lt;value&gt; for the &lt;bar&gt; document</field2>
 </doc>
</root>

The downer of the template approach is that you won’t get escaping of < and > for free. I danced around that problem by pulling in a util (escape) from xml.sax.saxutils.
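
One gap worth noting: in the template above only the element text goes through escape(), while the name attribute is inserted as-is. A minimal sketch of covering attributes too, using quoteattr() from the same xml.sax.saxutils module, might look like this. The template string and the sample values are invented for illustration.

```python
import string
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# quoteattr() returns the value already wrapped in quotes,
# so the template has no quotes around ${name}
field_template = string.Template("<field${id} name=${name}>${value}</field${id}>")

raw_name = 'a "quoted" & <odd> name'
raw_value = "1 < 2 & 3 > 2"

line = field_template.substitute(
    id=1,
    name=quoteattr(raw_name),   # escapes &, <, > and picks a safe quote style
    value=escape(raw_value),    # escapes &, <, > in element text
)
print(line)

# Parsing the result back is a quick check that it is well-formed
parsed = ET.fromstring(line)
```

If the parse succeeds and parsed.get("name") equals raw_name, the escaping round-trips correctly.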


Answer 5

I just finished writing an XML generator using bigh_29’s Templates method. It’s a nice way of controlling what you output without too many objects getting ‘in the way’.

As for the tags and values, I used two arrays: one gives each tag name and its position in the output XML, and the other references a parameter file containing the same list of tags. The parameter file also holds, for each tag, the position number in the corresponding input (CSV) file that the data will be taken from. This way, if the position of the data coming in from the input file changes, the program doesn’t have to change; it dynamically works out the data field position from the appropriate tag in the parameter file.
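
The answer does not show its code, so the following is only a guess at the general shape of such a parameter-driven generator, not the author's implementation. Every name here (csv_positions, output_tags, row_to_xml), the tag names, and the sample CSV row are hypothetical, and the parameter file is replaced by an in-memory dict to keep the sketch self-contained.

```python
import csv
import io
import string
from xml.sax.saxutils import escape

# Hypothetical stand-in for the parameter file: tag name -> CSV column position
csv_positions = {"title": 2, "author": 0, "year": 1}
# Hypothetical order in which the tags appear in the output XML
output_tags = ["title", "author", "year"]

field_template = string.Template("    <${tag}>${value}</${tag}>")
record_template = string.Template("  <record>\n${fields}\n  </record>")


def row_to_xml(row):
    """Render one CSV row, looking up each tag's column via csv_positions."""
    fields = [
        field_template.substitute(tag=tag, value=escape(row[csv_positions[tag]]))
        for tag in output_tags
    ]
    return record_template.substitute(fields="\n".join(fields))


# Made-up input data; a real program would open the CSV file instead
sample_csv = io.StringIO("alice,2020,Example <Title>\n")
records = [row_to_xml(row) for row in csv.reader(sample_csv)]
xml_text = "<root>\n" + "\n".join(records) + "\n</root>"
print(xml_text)
```

If the input columns are reordered, only csv_positions needs updating; the templates and the rendering function stay the same, which is the property the answer describes.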

