XML
***

.. highlight:: python

.. important:: Check out the :doc:`snippets/xml` snippets page.

Links
=====

- `What's New in Python 2.5`_
  Working with XML through ElementTree
- `ElementTree Overview`_
- `Python Library Reference- The ElementTree XML API`_
- http://codespeak.net/lxml/
  lxml is the most feature-rich and easy-to-use library for working with XML
  and HTML in the Python language.
- `XML building library`_

Note
====

*The core components (of ElementTree) are also shipped with Python 2.5 and
later*...

Sample
======

Read
----

.. note:: The :doc:`snippets/xml` snippets page has a Python 3 example which
   iterates over attributes **and** tags.

.. tip:: The :doc:`snippets/xml` snippets page has an example using
   ``xmltodict``.  If the file can fit in memory this is probably an easier
   option.

An older Python 2 example which iterates over tags::

  from xml.etree import ElementTree as ET

  tree = ET.parse('pom.xml')
  r = tree.getroot()

  def trav(node, indent=0):
      for c in node.getchildren():
          print ' '*indent, c.tag, ':', c.text
          trav(c, indent+1)

  trav(r)

...using the ``trav`` function above, we can iterate over a string::

  >>> xml = '799188'
  >>> tree = ET.fromstring(xml)
  >>> trav(tree)

Another (slightly confusing) sample::

  >>> testtext = """
  ... hello world. foo!
  ... """
  >>> testtext
  '\n hello world. foo!\n '
  >>> tree = ET.fromstring(testtext)
  >>> len(tree)
  1
  >>> tree[0].text
  'hello world. '
  >>> tree[0][0].text
  'foo!'
  >>> for italicNode in tree.findall('.//i'):
  ...     print italicNode.text
  ...
  foo!
  >>> ET.tostring(tree)
  'hello world. foo!\n '
  >>>

Create
------

::

  from xml.etree import ElementTree as ET

  root = ET.Element('html')
  head = ET.SubElement(root, 'head')
  title = ET.SubElement(head, 'title')
  title.text = 'Page Title'
  body = ET.SubElement(root, 'body')
  body.set('bgcolor', '#ffffff')
  body.text = 'Hello World!'
  tree = ET.ElementTree(root)
  tree.write('temp.xml')

Encoding
--------

e.g.
using the ``tree`` object from the *Create* sample (above)::

  tree.write('out.xml', encoding="UTF-8")

`Introducing ElementTree 1.3, XML Output`_

Pretty Print
------------

We can produce a *pretty print* using this function::

  def indent(elem, level=0):
      i = "\n" + level*" "
      if len(elem):
          if not elem.text or not elem.text.strip():
              elem.text = i + " "
          if not elem.tail or not elem.tail.strip():
              elem.tail = i
          for elem in elem:
              indent(elem, level+1)
          if not elem.tail or not elem.tail.strip():
              elem.tail = i
      else:
          if level and (not elem.tail or not elem.tail.strip()):
              elem.tail = i

e.g. using the ``tree`` object from the *Create* sample (above)::

  indent(tree.getroot())
  tree.write('pretty.xml', encoding="ISO-8859-1")

- `Element Library Functions, prettyprint`_
- `Gentlemen indent your XML!`_
- `ActiveState, Recipe 576750: Pretty-print XML`_

  ::

    #!/usr/bin/env python
    import xml.dom.minidom as md
    import sys

    pretty_print = lambda f: '\n'.join([line for line in md.parse(open(f)).toprettyxml(indent=' '*2).split('\n') if line.strip()])

    if __name__ == "__main__":
        if len(sys.argv)>=2:
            print pretty_print(sys.argv[1])
        else:
            sys.exit("Usage: %s [xmlfile]" % sys.argv[0])

``find`` and ``findall``
========================

For this example we will parse a standard Maven ``pom.xml`` file.

To find elements using *XPath like* syntax, we first need to know the
namespace::

  from xml.etree import ElementTree as ET
  tree = ET.parse('sample-app/pom.xml')
  root = tree.getroot()
  for element in root: print element.tag
     ...:
  {http://maven.apache.org/POM/4.0.0}modelVersion
  {http://maven.apache.org/POM/4.0.0}groupId
  {http://maven.apache.org/POM/4.0.0}artifactId
  ...

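The namespace can also be read programmatically, because ElementTree stores
each tag as ``{namespace-uri}localname``. The following is a minimal Python 3
sketch, not part of the original samples: the inline ``POM`` string is a
hypothetical stand-in for a real ``pom.xml``, and ``split_tag`` is a helper
name invented for this illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical stand-in for a Maven pom.xml (not the original sample-app file).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>sample-app</artifactId>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
    </dependency>
  </dependencies>
</project>"""


def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' if there is no namespace."""
    if tag.startswith('{'):
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return '', tag


root = ET.fromstring(POM)

# The root tag carries the default namespace declared by xmlns="...".
namespace, root_name = split_tag(root.tag)

# Local names of the root's direct children, with the namespace stripped.
child_names = [split_tag(child.tag)[1] for child in root]

print(namespace)
print(root_name)
print(child_names)
```

Splitting the tag this way avoids hard-coding the namespace URI, at the cost
of assuming that the elements of interest share the root element's namespace.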
Don't forget to include the namespace when searching for elements::

  e = tree.find('{http://maven.apache.org/POM/4.0.0}artifactId')
  e.text
  'sample-app'

To find all matching elements anywhere in the XML file, prefix the query with
``.//``::

  e = tree.findall('.//{http://maven.apache.org/POM/4.0.0}artifactId')
  for i in e: print i.text
     ....:
  sample-app
  junit

To search down through a specific path::

  e = tree.find('{http://maven.apache.org/POM/4.0.0}dependencies/{http://maven.apache.org/POM/4.0.0}dependency/{http://maven.apache.org/POM/4.0.0}artifactId')
  e.text
  'junit'


.. _`ActiveState, Recipe 576750: Pretty-print XML`: http://code.activestate.com/recipes/576750/
.. _`Element Library Functions, prettyprint`: http://effbot.org/zone/element-lib.htm#prettyprint
.. _`ElementTree Overview`: http://effbot.org/zone/element-index.htm
.. _`Gentlemen indent your XML!`: http://infix.se/2007/02/06/gentlemen-indent-your-xml
.. _`Introducing ElementTree 1.3, XML Output`: http://effbot.org/zone/elementtree-13-intro.htm
.. _`Python Library Reference- The ElementTree XML API`: http://docs.python.org/lib/module-xml.etree.ElementTree.html
.. _`What's New in Python 2.5`: http://www.onlamp.com/pub/a/python/2006/10/26/python-25.html?page=4
.. _`XML building library`: http://github.com/galvez/xmlwitch/
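
Returning to the ``find`` and ``findall`` searches: repeating the full
``{http://maven.apache.org/POM/4.0.0}`` prefix in every path step is verbose.
In current Python versions ``find`` and ``findall`` accept an optional
``namespaces`` mapping, so a short prefix can be used instead. This is a
minimal Python 3 sketch under stated assumptions: the inline ``POM`` string is
hypothetical sample data, and the prefix ``m`` is an arbitrary choice that
does not need to appear in the document itself.

```python
from xml.etree import ElementTree as ET

# Hypothetical stand-in for a Maven pom.xml (not the original sample-app file).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>sample-app</artifactId>
  <dependencies>
    <dependency>
      <artifactId>junit</artifactId>
    </dependency>
  </dependencies>
</project>"""

# 'm' is an arbitrary prefix chosen for the queries below.
NS = {'m': 'http://maven.apache.org/POM/4.0.0'}

root = ET.fromstring(POM)

# Direct child of the root element.
first = root.find('m:artifactId', NS)

# Every artifactId anywhere below the root, in document order.
all_ids = [e.text for e in root.findall('.//m:artifactId', NS)]

# A specific path, one prefixed step per level.
nested = root.find('m:dependencies/m:dependency/m:artifactId', NS)

print(first.text)
print(all_ids)
print(nested.text)
```

The mapping only affects how the query is written; the elements returned
still report their tags in the full ``{uri}name`` form.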