如何将xml字符串转换为字典?-Python 实用宝典

如何将xml字符串转换为字典?

我有一个程序可以从套接字读取xml文档。我将xml文档存储在一个字符串中,我想将其直接转换为Python字典,就像在Django的simplejson库中一样。 举个例子: str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person" dic_xml = convert_to_dic(str) 然后dic_xml看起来像{'person' : { 'name' : 'john', 'age' : 20 } }

问题:如何将xml字符串转换为字典?

我有一个程序可以从套接字读取xml文档。我将xml文档存储在一个字符串中,我想将其直接转换为Python字典,就像在Django的simplejson库中一样。

举个例子:

str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person"
dic_xml = convert_to_dic(str)

然后dic_xml看起来像{'person' : { 'name' : 'john', 'age' : 20 } }

I have a program that reads an xml document from a socket. I have the xml document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django's simplejson library.

Take as an example:

str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person"
dic_xml = convert_to_dic(str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }


回答 0

这是某人创建的一个很棒的模块。我已经使用过几次了。 http://code.activestate.com/recipes/410469-xml-as-dictionary/

这是网站上的代码,以防链接损坏。

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

用法示例:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

//或者,如果要使用XML字符串:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

This is a great module that someone created. I've used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

//Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

回答 1

xmltodict(完全公开:我写了它)确实做到了:

xmltodict.parse("""
<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

xmltodict (full disclosure: I wrote it) does exactly that:

xmltodict.parse("""
<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

回答 2

以下XML-to-Python-dict片段分析了此XML-to-JSON“规范”之后的实体以及属性。这是处理XML所有情况的最通用的解决方案。

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

它用于:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

此示例的输出(根据上面链接的“规范”)应为:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

不一定很漂亮,但它是明确的,而更简单的XML输入会导致更简单的JSON。:)


更新资料

如果要进行相反的操作从JSON / dict发出XML字符串,则可以使用:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(d))

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification". It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked "specification") should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. 🙂


Update

If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(d))

回答 3

这个轻量级的版本虽然不可配置,但是很容易根据需要进行定制,并且可以在旧的python中工作。它也是严格的-意味着无论属性是否存在,结果都是相同的。

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

所以:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

结果是:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid - meaning the results are the same regardless of the existence of attributes.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

回答 4

PicklingTools库的最新版本(1.3.0和1.3.1)支持将XML转换为Python dict的工具。

可从此处下载文件: PicklingTools 1.3.1

没有为转换颇有几分文档在这里:文档中详细的所有XML和Python字典之间转换时将产生的决定和问题描述(也有一些边缘情况:属性,列表,匿名列表,匿名多数转换器无法处理的dict,eval等)。通常,这些转换器易于使用。如果“ example.xml”包含:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

然后将其转换为字典:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expect XML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

有一些可以在C ++和Python中进行转换的工具:C ++和Python可以进行相同的转换,但是C ++的速度要快60倍左右

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don't handle). In general, though, the converters are easy to use. If an 'example.xml' contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expect XML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python do indentical conversion, but the C++ is about 60x faster


回答 5

您可以使用lxml轻松完成此操作。首先安装它:

[sudo] pip install lxml

这是我编写的递归函数,可以为您完成繁重的工作:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

以下变体保留了父键/元素:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

如果只想返回一个子树并将其转换为dict,则可以使用Element.find()获取该子树,然后对其进行转换:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

请在此处查看lxml文档。我希望这有帮助!

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

See the lxml docs here. I hope this helps!


回答 6

免责声明:此经过修改的XML解析器受到Adam Clark 的启发。原始XML解析器适用于大多数简单情况。但是,它不适用于某些复杂的XML文件。我逐行调试了代码,最后解决了一些问题。如果您发现一些错误,请告诉我。我很高兴修复它。

class XmlDictConfig(dict):  
    '''   
    Note: need to add a root into if no exising    
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specialy for attribtes
                elementattrib= element.items()
                if element.text:           
                    elementattrib.append((element.tag,element.text ))     # add tag:text if there exist
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)    

Disclaimer: This modified XML parser was inspired by Adam Clark The original XML parser works for most of simple cases. However, it didn't work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find some bugs, please let me know. I am glad to fix it.

class XmlDictConfig(dict):  
    '''   
    Note: need to add a root into if no exising    
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specialy for attribtes
                elementattrib= element.items()
                if element.text:           
                    elementattrib.append((element.tag,element.text ))     # add tag:text if there exist
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)    

回答 7

def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}
def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}

回答 8

最容易使用的XML XML解析器是ElementTree(从2.5x开始,在标准库xml.etree.ElementTree中)。我认为没有什么可以完全满足您的要求。使用ElementTree编写某些内容来完成您想要的事情,这很简单,但是为什么要转换为字典,为什么不直接使用ElementTree。

The easiest to use XML parser for Python is ElementTree (as of 2.5x and above it is in the standard library xml.etree.ElementTree). I don't think there is anything that does exactly what you want out of the box. It would be pretty trivial to write something to do what you want using ElementTree, but why convert to a dictionary, and why not just use ElementTree directly.


回答 9

来自http://code.activestate.com/recipes/410469-xml-as-dictionary/的代码效果很好,但是,如果在层次结构中的给定位置存在多个相同的元素,它将覆盖它们。

我在两者之间添加了一个垫片,以查看在self.update()之前该元素是否已经存在。如果是这样,则弹出现有条目并从现有条目和新条目中创建一个列表。随后的所有重复项都将添加到列表中。

不知道是否可以更妥善地处理此问题,但它的工作原理是:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip()})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements that are the same at a given place in the hierarchy it just overrides them.

I added a shim between that looks to see if the element already exists before self.update(). If so, pops the existing entry and creates a lists out of the existing and the new. Any subsequent duplicates are added to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip()})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)

回答 10

从@ K3 --- rnc 响应(最适合我),我添加了一些小修改以从XML文本中获得OrderedDict(有时顺序很重要):

def etree_to_ordereddict(t):
d = OrderedDict()
d[t.tag] = OrderedDict() if t.attrib else None
children = list(t)
if children:
    dd = OrderedDict()
    for dc in map(etree_to_ordereddict, children):
        for k, v in dc.iteritems():
            if k not in dd:
                dd[k] = list()
            dd[k].append(v)
    d = OrderedDict()
    d[t.tag] = OrderedDict()
    for k, v in dd.iteritems():
        if len(v) == 1:
            d[t.tag][k] = v[0]
        else:
            d[t.tag][k] = v
if t.attrib:
    d[t.tag].update(('@' + k, v) for k, v in t.attrib.iteritems())
if t.text:
    text = t.text.strip()
    if children or t.attrib:
        if text:
            d[t.tag]['#text'] = text
    else:
        d[t.tag] = text
return d

在@ K3 --- rnc示例中,可以使用它:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

希望能帮助到你 ;)

From @K3---rnc response (the best for me) I've added a small modifications to get an OrderedDict from an XML text (some times order matters):

def etree_to_ordereddict(t):
d = OrderedDict()
d[t.tag] = OrderedDict() if t.attrib else None
children = list(t)
if children:
    dd = OrderedDict()
    for dc in map(etree_to_ordereddict, children):
        for k, v in dc.iteritems():
            if k not in dd:
                dd[k] = list()
            dd[k].append(v)
    d = OrderedDict()
    d[t.tag] = OrderedDict()
    for k, v in dd.iteritems():
        if len(v) == 1:
            d[t.tag][k] = v[0]
        else:
            d[t.tag][k] = v
if t.attrib:
    d[t.tag].update(('@' + k, v) for k, v in t.attrib.iteritems())
if t.text:
    text = t.text.strip()
    if children or t.attrib:
        if text:
            d[t.tag]['#text'] = text
    else:
        d[t.tag] = text
return d

Following @K3---rnc example, you can use it:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

Hope it helps 😉


回答 11

这是ActiveState解决方案的链接-以及代码再次消失的代码。

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError:
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
    if n.nodeType == n.TEXT_NODE:
        t += n.nodeValue
    else:
        raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "method" set to "true",
    then it's children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the nodes children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {} 
    for n in node.childNodes:
    if n.nodeType != n.ELEMENT_NODE:
        continue
    if n.getAttribute("multiple") == "true":
        # node with multiple children:
        # put them in a list
        l = []
        for c in n.childNodes:
            if c.nodeType != n.ELEMENT_NODE:
            continue
        l.append(nodeToDic(c))
            dic.update({n.nodeName:l})
        continue

    try:
        text = getTextFromNode(n)
    except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName:nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName:text})
    continue
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print dic["Config"]["Name"]
    print
    for item in dic["Config"]["Items"]:
    print "Item's Name:", item["Name"]
    print "Item's Value:", item["Value"]

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

Here's a link to an ActiveState solution - and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError:
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
    if n.nodeType == n.TEXT_NODE:
        t += n.nodeValue
    else:
        raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "method" set to "true",
    then it's children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the nodes children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {} 
    for n in node.childNodes:
    if n.nodeType != n.ELEMENT_NODE:
        continue
    if n.getAttribute("multiple") == "true":
        # node with multiple children:
        # put them in a list
        l = []
        for c in n.childNodes:
            if c.nodeType != n.ELEMENT_NODE:
            continue
        l.append(nodeToDic(c))
            dic.update({n.nodeName:l})
        continue

    try:
        text = getTextFromNode(n)
    except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName:nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName:text})
    continue
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print dic["Config"]["Name"]
    print
    for item in dic["Config"]["Items"]:
    print "Item's Name:", item["Name"]
    print "Item's Value:", item["Value"]

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

回答 12

在某一时刻,我不得不解析和编写仅包含没有属性的元素的XML,因此从XML到dict的1:1映射很容易。如果别人也不需要属性,这就是我想出的:

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if result.get(element.tag):
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    result = ElementTree.Element(element.keys()[0])
    for key, value in element[element.keys()[0]].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring).getroot())

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

At one point I had to parse and write XML that only consisted of elements without attributes so a 1:1 mapping from XML to dict was possible easily. This is what I came up with in case someone else also doesnt need attributes:

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if result.get(element.tag):
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    result = ElementTree.Element(element.keys()[0])
    for key, value in element[element.keys()[0]].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring).getroot())

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

回答 13

@dibrovsd:如果xml具有多个具有相同名称的标签,则解决方案将不起作用

根据您的想法,我对代码进行了一些修改,并将其编写为常规节点而不是root用户:

from collections import defaultdict
def xml2dict(node):
    d, count = defaultdict(list), 1
    for i in node:
        d[i.tag + "_" + str(count)]['text'] = i.findtext('.')[0]
        d[i.tag + "_" + str(count)]['attrib'] = i.attrib # attrib gives the list
        d[i.tag + "_" + str(count)]['children'] = xml2dict(i) # it gives dict
     return d

@dibrovsd: Solution will not work if the xml have more than one tag with same name

On your line of thought, I have modified the code a bit and written it for general node instead of root:

from collections import defaultdict
def xml2dict(node):
    d, count = defaultdict(list), 1
    for i in node:
        d[i.tag + "_" + str(count)]['text'] = i.findtext('.')[0]
        d[i.tag + "_" + str(count)]['attrib'] = i.attrib # attrib gives the list
        d[i.tag + "_" + str(count)]['children'] = xml2dict(i) # it gives dict
     return d

回答 14

我修改了我的口味的答案之一,并使用同一标签处理多个值,例如考虑以下保存在XML.xml文件中的xml代码。

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

和在python中

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text != None) else  ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

输出是

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

I have modified one of the answers to my taste and to work with multiple values with the same tag for example consider the following xml code saved in XML.xml file

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

and in python

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text != None) else  ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

the output is

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

回答 15

我有一个递归方法,可从lxml元素获取字典

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))

I have a recursive method to get a dictionary from a lxml element

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))

本文由 Python 实用宝典 作者:Python实用宝典 发表,其版权均为 Python 实用宝典 所有,文章内容系作者个人观点,不代表 Python 实用宝典 对观点赞同或支持。如需转载,请注明文章来源。
15

抱歉,评论已关闭!