Question: Pretty printing XML in Python
What is the best way (or are the various ways) to pretty print XML in Python?
Answer 0
import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname)  # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
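For readers on Python 3, the same approach might look like the sketch below. The helper name pretty_xml and the sample document are illustrative assumptions, not part of the answer; parseString() and toprettyxml(indent=...) are standard minidom calls.

```python
import xml.dom.minidom


def pretty_xml(xml_string, indent="  "):
    """Return xml_string re-indented by minidom (hypothetical helper name)."""
    dom = xml.dom.minidom.parseString(xml_string)
    return dom.toprettyxml(indent=indent)


if __name__ == "__main__":
    # Sample input made up for illustration.
    print(pretty_xml("<root><item>1</item><item>2</item></root>"))
```

Note that toprettyxml() always prepends an XML declaration, and that the default indent is a tab unless indent is given.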
Answer 1
lxml is recent, actively updated, and includes a pretty-print option:
import lxml.etree as etree

x = etree.parse("filename")
print(etree.tostring(x, pretty_print=True, encoding="unicode"))
Check out the lxml tutorial: http://lxml.de/tutorial.html
Answer 2
Another solution is to borrow this indent function, for use with the ElementTree library that has been built in to Python since 2.5. Here's what that would look like:
from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
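On Python 3.9 and later, the standard library ships a comparable helper, xml.etree.ElementTree.indent(), so the hand-written function may not be needed there. A minimal sketch, assuming Python 3.9+ and a made-up sample document:

```python
import xml.etree.ElementTree as ET

# Sample document invented for illustration.
root = ET.fromstring(
    "<issues><issue><id>1</id><title>Example</title></issue></issues>"
)

# ET.indent() rewrites the .text/.tail whitespace in place (Python 3.9+).
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

The space argument controls the indentation unit, so passing "\t" would indent with tabs instead.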
Answer 3
Here's my (hacky?) solution to get around the ugly text node problem.
import re

# doc is an xml.dom.minidom Document
uglyXml = doc.toprettyxml(indent='  ')
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub(r'>\g<1></', uglyXml)
print(prettyXml)
The above code will produce:
<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>
Instead of this:
<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>
Disclaimer: There are probably some limitations.
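To see what the regular expression does in isolation, here is a small sketch that applies it to a hard-coded string rather than to a live minidom document. The sample data is an assumption for illustration; the pattern is the one from the answer, written with raw strings so that \s and \g reach the regex engine unchanged.

```python
import re

# Output in the style that older minidom versions produced (sample data).
ugly_xml = (
    '<?xml version="1.0" ?>\n'
    "<issues>\n"
    "  <issue>\n"
    "    <id>\n"
    "      1\n"
    "    </id>\n"
    "  </issue>\n"
    "</issues>\n"
)

# Collapse ">\n   text\n   </" into ">text</", leaving nested elements alone.
text_re = re.compile(r">\n\s+([^<>\s].*?)\n\s+</", re.DOTALL)
pretty_xml = text_re.sub(r">\g<1></", ugly_xml)
print(pretty_xml)
```

Because the pattern works on serialized text rather than on the parsed tree, it can misfire on unusual content such as CDATA sections or mixed text and child elements, which is likely the kind of limitation the disclaimer has in mind.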
Answer 4
As others pointed out, lxml has a pretty printer built in.
Be aware, though, that by default it changes CDATA sections to normal text, which can have nasty results.
Here's a Python function that preserves the content of the input file and only changes the indentation (notice the strip_cdata=False). Furthermore, it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):
from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')
Example usage:
prettyPrintXml('some_folder/some_file.xml')
Answer 5
BeautifulSoup has an easy-to-use prettify() method.
It indents one space per indentation level. It works much better than lxml's pretty_print and is short and sweet.
from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
print(bs.prettify())
Answer 6
If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.
Note that this method uses a program external to Python, which makes it sort of a hack.
import subprocess

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml)
    return output

print(pretty_print_xml(data))
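On Python 3 the same idea can be written with subprocess.run(). This is a sketch under a few assumptions: xmllint is on the PATH, the input is passed as bytes, and "-" is used so xmllint reads standard input instead of relying on /dev/stdin.

```python
import shutil
import subprocess


def pretty_print_xml(xml_bytes):
    """Pipe XML bytes through `xmllint --format -` and return the formatted bytes."""
    result = subprocess.run(
        ["xmllint", "--format", "-"],  # "-" tells xmllint to read standard input
        input=xml_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if xmllint rejects the input
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("xmllint") is None:
        print("xmllint is not installed")
    else:
        print(pretty_print_xml(b"<a><b>1</b></a>").decode())
```

With check=True, malformed input surfaces as an exception rather than as silently empty output.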
Answer 7
I tried to edit "ade"'s answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.
def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
Answer 8
If you're using a DOM implementation, each has its own form of pretty-printing built in:
# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)
If you're using something else without its own pretty-printer, or those pretty-printers don't quite do it the way you want, you'd probably have to write or subclass your own serialiser.
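For the minidom case, pretty-printed output can also be written straight to a stream with writexml(), which takes the indentation unit and newline as arguments. A minimal sketch with an invented sample document; io.StringIO stands in for any object with a write() method:

```python
import io
import xml.dom.minidom

# Sample document made up for illustration.
document = xml.dom.minidom.parseString("<a><b>1</b><c/></a>")

stream = io.StringIO()  # a file opened for text writing would work too
document.writexml(stream, indent="", addindent="  ", newl="\n")

serialized = stream.getvalue()
print(serialized)
```

Here addindent is the unit added per nesting level and newl is the line terminator; leaving both empty would give compact, unindented output.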
Answer 9
I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, e.g. if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:
def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
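The effect of the 'xmlcharrefreplace' error handler is easiest to see on a tiny document. The sketch below is a Python 3 variant of the workaround; the function name to_pretty_bytes and the sample element are assumptions for illustration.

```python
import xml.dom.minidom


def to_pretty_bytes(doc, encoding):
    """Pretty-print doc, replacing unencodable characters with &#N; references."""
    text = doc.toprettyxml(indent="  ")
    # Advertise the target encoding in the XML declaration.
    text = text.replace(
        '<?xml version="1.0" ?>',
        '<?xml version="1.0" encoding="%s"?>' % encoding,
    )
    return text.encode(encoding, "xmlcharrefreplace")


# U+03B2 (Greek small letter beta) has no latin-1 representation.
doc = xml.dom.minidom.parseString("<greek>\u03b2</greek>")
encoded = to_pretty_bytes(doc, "latin-1")
print(encoded)
```

The β is written as the numeric character reference &#946;, which any XML parser turns back into the original character.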
Answer 10
from yattag import indent

pretty_string = indent(ugly_string)
It won't add spaces or newlines inside text nodes, unless you ask for it with:
indent(mystring, indent_text = True)
You can specify what the indentation unit should be and what the newline should look like.
pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)
The documentation is on the http://www.yattag.org homepage.
Answer 11
I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.
def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings
Answer 12
Answer 13
Here's a Python 3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries, unlike most other implementations.
import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()])  # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)
I found how to fix the common newline issue here.
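The newline issue comes from whitespace-only text nodes in input that is already indented: minidom writes each of them back on a line of its own. The string-based sketch below shows the filter at work without touching any files; the helper name reindent and the sample input are assumptions for illustration.

```python
import xml.dom.minidom


def reindent(xml_text, indent="  "):
    """Re-indent XML that may already contain indentation (hypothetical helper)."""
    pretty = xml.dom.minidom.parseString(xml_text).toprettyxml(indent=indent)
    # Existing whitespace-only text nodes come back as blank lines; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


already_indented = "<a>\n  <b>1</b>\n</a>"
print(reindent(already_indented))
```

One caveat of filtering by line: blank lines that are part of genuine text content would be dropped as well.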
Answer 14
You can use the popular external library xmltodict; with unparse and pretty=True you will get the best result:
import xmltodict

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)
full_document=False suppresses the <?xml version="1.0" encoding="UTF-8"?> declaration at the top.
Answer 15
Take a look at the vkbeautify module.
It is a Python version of my very popular JavaScript/Node.js plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be strings or files in any combination. It is very compact and doesn't have any dependencies.
Examples:
import vkbeautify as vkb

vkb.xml(text)
vkb.xml(text, 'path/to/dest/file')
vkb.xml('path/to/src/file')
vkb.xml('path/to/src/file', 'path/to/dest/file')
Answer 16
If you don't want to have to reparse, an alternative is the xmlpp.py library with its get_pprint() function. It worked nicely and smoothly for my use cases, without having to reparse to an lxml ElementTree object.
Answer 17
You can try this variation…
Install BeautifulSoup and the backend lxml (parser) libraries:
user$ pip3 install lxml bs4
Process your XML document:
from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc:
    # Parse the whole document at once; prettify() returns the indented string
    print(BeautifulSoup(doc, 'lxml-xml').prettify())
Answer 18
I had this problem and solved it like this:
def write_xml_file(self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print:
        # etree indents with two spaces; swap each pair for the requested indent
        pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)
In my code this method is called like this:
try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')
        # create some xml content using etree ...
        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')
except IOError:
    print("Error while writing in log file!")
This works only because etree by default uses two spaces to indent, which I find does not emphasize the indentation enough and is therefore not pretty. I couldn't find any etree setting or function parameter to change the standard indent. I like how easy etree is to use, but this was really annoying me.
Answer 19
For converting an entire XML document to a pretty XML document
(for example, assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file and want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftool-ing of .odt/.ods files, such as I'm implementing here):
import xml.dom.minidom

with open("./content.xml", 'r') as file:
    xml_string = file.read()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

with open("./content_new.xml", 'w') as file:
    file.write(pretty_xml_as_string)
References:
- Thanks to Ben Noland's answer on this page, which got me most of the way there.
Answer 20
import re
import xml.dom.minidom as mmd
from lxml import etree

xml_root = etree.parse(xml_file_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    # Drop whitespace between tags only, so attributes and text stay intact
    ugly_xml = re.sub(r'>\s+<', '><', plain_xml)
    good_xml = mmd.parseString(ugly_xml)
    print(good_xml.toprettyxml(indent='  '))
It works well for XML containing Chinese!
Answer 21
If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:
import subprocess

def makePretty(filepath):
    # Pass the arguments as a list so the path is never interpreted by a shell
    prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
    with open(filepath, "wb") as outfile:
        outfile.write(prettyXML)
As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.
Answer 22
I solved this with a few lines of code: opening the file, going through it and adding indentation, then saving it again. I was working with small XML files and did not want to add dependencies or more libraries for the user to install. Anyway, here is what I ended up with:
f = open(file_name,'r')
xml = f.read()
f.close()

# Removing old indentations
raw_xml = ''
for line in xml:
    raw_xml += line

xml = raw_xml

new_xml = ''
indent = '    '
deepness = 0

for i in range((len(xml))):
    new_xml += xml[i]
    if(i<len(xml)-3):
        simpleSplit = xml[i:(i+2)] == '><'
        advancSplit = xml[i:(i+3)] == '></'
        end = xml[i:(i+2)] == '/>'
        start = xml[i] == '<'

        if(advancSplit):
            deepness += -1
            new_xml += '\n' + indent*deepness
            simpleSplit = False
            deepness += -1
        if(simpleSplit):
            new_xml += '\n' + indent*deepness
        if(start):
            deepness += 1
        if(end):
            deepness += -1

f = open(file_name,'w')
f.write(new_xml)
f.close()
It works for me; perhaps someone will have some use for it :)