Tag archive: minidom

Get Element value with minidom with Python

Question: Get Element value with minidom with Python

I am creating a GUI frontend for the Eve Online API in Python.

I have successfully pulled the XML data from their server.

I am trying to grab the value from a node called “name”:

from xml.dom.minidom import parse
dom = parse("C:\\eve.xml")
name = dom.getElementsByTagName('name')
print(name)

This seems to find the node, but the output is below:

[<DOM Element: name at 0x11e6d28>]

How could I get it to print the value of the node?


Answer 0

It should just be

name[0].firstChild.nodeValue
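
As a minimal, self-contained sketch of that one-liner: the XML string below is a made-up stand-in (the real Eve Online response will look different), and `parseString` is used instead of `parse` only so the example does not depend on a file on disk.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the real Eve Online response will differ.
SAMPLE_XML = "<character><name>Example Pilot</name></character>"

dom = parseString(SAMPLE_XML)
name = dom.getElementsByTagName("name")  # a NodeList, not a single element

# The element itself has no value; its text lives in a child Text node.
first_name_text = name[0].firstChild.nodeValue
print(first_name_text)
```

Note that `firstChild` is `None` for an empty element such as `<name/>`, so real code may want to check for that before reading `nodeValue`.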

Answer 1

If it’s the text part you want, probably something like this:

from xml.dom.minidom import parse
dom = parse("C:\\eve.xml")
name = dom.getElementsByTagName('name')

print(" ".join(t.nodeValue for t in name[0].childNodes if t.nodeType == t.TEXT_NODE))

The text part of a node is considered a node in itself, placed as a child node of the one you asked for. Thus you will want to go through all its children and find all child nodes that are text nodes. A node can have several text nodes, for example:

<name>
  blabla
  <somestuff>asdf</somestuff>
  znylpx
</name>

You want both ‘blabla’ and ‘znylpx’; hence the " ".join(). You might want to replace the space with a newline, or perhaps with nothing.
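
One thing worth knowing when trying this on the example document: the text nodes also carry the surrounding newlines and indentation. The sketch below is one possible variation, not part of the original answer. It strips each piece before joining; `direct_text` is a hypothetical helper name introduced only for illustration.

```python
from xml.dom.minidom import parseString

# Same shape as the example document above.
MIXED_XML = """<name>
  blabla
  <somestuff>asdf</somestuff>
  znylpx
</name>"""


def direct_text(element, sep=" "):
    """Join the stripped text of an element's direct Text children."""
    parts = (
        child.nodeValue.strip()
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )
    # Skip pieces that were whitespace only.
    return sep.join(p for p in parts if p)


name = parseString(MIXED_XML).getElementsByTagName("name")
joined = direct_text(name[0])
print(joined)
```

The text inside `<somestuff>` is deliberately left out here, because it belongs to a nested element rather than to a direct text child of `<name>`.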


Answer 2

You can use something like this. It worked for me:

from xml.dom.minidom import parse

doc = parse('C:\\eve.xml')
my_node_list = doc.getElementsByTagName("name")  # all <name> elements
my_n_node = my_node_list[0]                      # first match
my_child = my_n_node.firstChild                  # its Text child node
my_text = my_child.data                          # the text itself
print(my_text)

Answer 3

I know this question is pretty old now, but I thought you might have an easier time with ElementTree:

from xml.etree import ElementTree as ET
import datetime

f = ET.XML(data)  # data: the XML string pulled from the server

for element in f:
    if element.tag == "currentTime":
        # Handle time data was pulled
        currentTime = datetime.datetime.strptime(element.text, "%Y-%m-%d %H:%M:%S")
    if element.tag == "cachedUntil":
        # Handle time until next allowed update
        cachedUntil = datetime.datetime.strptime(element.text, "%Y-%m-%d %H:%M:%S")
    if element.tag == "result":
        # Process list of skills
        pass

I know that’s not super specific, but I just discovered it, and so far it’s a lot easier to get my head around than minidom (where so many nodes are essentially whitespace).

For instance, you have the tag name and the actual text together, just as you’d probably expect:

>>> element[0]
<Element currentTime at 40984d0>
>>> element[0].tag
'currentTime'
>>> element[0].text
'2010-04-12 02:45:45'
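
To make that concrete, here is a small sketch that runs on its own. The `SAMPLE` string is invented for illustration and only mimics the tag names used above; it is not real API output. It also uses `findtext()`, which looks an element up by path instead of looping over every child.

```python
from xml.etree import ElementTree as ET
import datetime

# Hypothetical sample shaped like the tags used above; not real API output.
SAMPLE = """<eveapi>
  <currentTime>2010-04-12 02:45:45</currentTime>
  <result><name>Example Pilot</name></result>
  <cachedUntil>2010-04-12 03:45:45</cachedUntil>
</eveapi>"""

root = ET.fromstring(SAMPLE)

# findtext() returns the text of the first element matching the path.
current_time = datetime.datetime.strptime(
    root.findtext("currentTime"), "%Y-%m-%d %H:%M:%S"
)
pilot_name = root.findtext("result/name")
print(current_time, pilot_name)
```

Whitespace between elements does not appear as separate child nodes here, which is the main reason this tends to feel simpler than minidom.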

Answer 4

The above answer is correct, namely:

name[0].firstChild.nodeValue

However, for me, as for others, the value was further down the tree:

name[0].firstChild.firstChild.nodeValue

To find this I used the following:

def scandown(elements, indent):
    # Recursively print each node's name, value and children, indented by depth.
    for el in elements:
        print("   " * indent + "nodeName: " + str(el.nodeName))
        print("   " * indent + "nodeValue: " + str(el.nodeValue))
        print("   " * indent + "childNodes: " + str(el.childNodes))
        scandown(el.childNodes, indent + 1)

# doc is the parsed minidom document
scandown(doc.getElementsByTagName('text'), 0)

Running this on a simple SVG file created with Inkscape gave me:

nodeName: text
nodeValue: None
childNodes: [<DOM Element: tspan at 0x10392c6d0>]
   nodeName: tspan
   nodeValue: None
   childNodes: [<DOM Text node "'MY STRING'">]
      nodeName: #text
      nodeValue: MY STRING
      childNodes: ()
nodeName: text
nodeValue: None
childNodes: [<DOM Element: tspan at 0x10392c800>]
   nodeName: tspan
   nodeValue: None
   childNodes: [<DOM Text node "'MY WORDS'">]
      nodeName: #text
      nodeValue: MY WORDS
      childNodes: ()

I used xml.dom.minidom; the various fields are explained on the MiniDom Python page.


Answer 5

I had a similar case; what worked for me was:

name.firstChild.childNodes[0].data

XML is supposed to be simple, and it really is. I don’t know why Python’s minidom made it so complicated, but that’s how it’s made.


Answer 6

Here is a slight modification of Henrik’s answer for multiple nodes (i.e. when getElementsByTagName returns more than one instance):

images = xml.getElementsByTagName("imageUrl")
for i in images:
    print(" ".join(t.nodeValue for t in i.childNodes if t.nodeType == t.TEXT_NODE))

Answer 7

The question has been answered; my contribution is to clarify one thing that may confuse beginners.

Some of the suggested, correct answers use firstChild.data and others use firstChild.nodeValue instead. In case you are wondering what the difference between them is, remember that they do the same thing, because nodeValue is just an alias for data.

The reference to my statement can be found as a comment on the source code of minidom:

#nodeValue is an alias for data
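
A quick way to convince yourself of this, sketched with a made-up one-element document: read the same text node through both names, then write through one and read through the other.

```python
from xml.dom.minidom import parseString

# Hypothetical one-element document used only for this check.
text_node = parseString("<name>Example Pilot</name>").documentElement.firstChild

# Both attribute names read the same underlying string.
same_value = text_node.data == text_node.nodeValue

# Writing through one name is visible through the other.
text_node.nodeValue = "Renamed Pilot"
after_write = text_node.data
print(same_value, after_write)
```

This holds for text nodes; on an element node, `nodeValue` is `None` and there is no `data` attribute at all.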


Answer 8

It’s a tree, and there may be nested elements. Try:

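from xml.dom import Node

def innerText(node, sep=''):
    # Concatenate the text of node and all of its descendants.
    t = ""
    for curNode in node.childNodes:
        if curNode.nodeType == Node.TEXT_NODE:
            t += sep + curNode.nodeValue
        elif curNode.nodeType == Node.ELEMENT_NODE:
            # Recurse with a plain function call; minidom elements have no innerText method.
            t += sep + innerText(curNode, sep=sep)
    return t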