<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>learning python &#187; ElementTree</title>
	<atom:link href="http://www.learningpython.com/tag/elementtree/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.learningpython.com</link>
	<description>one man's journey into python...</description>
	<lastBuildDate>Mon, 26 Apr 2010 01:21:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Elegant XML parsing using the ElementTree Module</title>
		<link>http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/</link>
		<comments>http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/#comments</comments>
		<pubDate>Wed, 07 May 2008 16:21:48 +0000</pubDate>
		<dc:creator>selsine</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[ElementTree]]></category>

		<guid isPermaLink="false">http://www.learningpython.com/?p=74</guid>
		<description><![CDATA[
			
				
			
		
Mark Mruss
Note: This article was first published in the October 2007 issue of Python Magazine
XML is everywhere.  It seems you can&#8217;t do much these days unless you utilize XML in one way or another. Fortunately, Python developers have a new tool in our standard arsenal: the ElementTree module. This article aims to introduce you to [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Mark Mruss</strong></p>
<p><strong>Note:</strong> This article was first published in the October 2007 issue of <a href="http://www.pythonmagazine.com">Python Magazine</a></p>
<p>XML is everywhere.  It seems you can&#8217;t do much these days unless you utilize XML in one way or another. Fortunately, Python developers have a new tool in our standard arsenal: the ElementTree module. This article aims to introduce you to reading, writing, saving, and loading XML using the ElementTree module.</p>
<ol>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#ReadingXMLdata">Reading XML data</a></li>
<li><a href="#Listing1">Listing 1</a></li>
<li><a href="#Listing2">Listing 2</a></li>
<li><a href="#ReadingXMLAttributes">Reading XML Attributes</a></li>
<li><a href="#WritingXML">Writing XML</a></li>
<li><a href="#Listing3">Listing 3</a></li>
<li><a href="#WritingXMLAttributes">Writing XML Attributes</a></li>
<li><a href="#ReadingXMLFiles">Reading XML Files</a></li>
<li><a href="#WritingXMLDatatoaFile">Writing XML Data to a File</a></li>
<li><a href="#ReadingfromtheWeb">Reading from the Web</a></li>
<li><a href="#Conclusion">Conclusion</a></li>
</ol>
<p><span id="more-74"></span><!--more--></p>
<h2><a href="#Introduction">Introduction</a></h2>
<p>It seems like everyone needs to parse XML these days.  They&#8217;re either saving their own information in XML or loading in someone else&#8217;s data.  This is why I was glad to learn that, as of Python 2.5, the <em>ElementTree</em> XML toolkit is included in the standard library as part of the <code>xml</code> package.</p>
<p>What I like about the <em>ElementTree</em> module is that it just seems to make sense.  This might seem like a strange thing to say about an XML module, but I&#8217;ve had to parse enough XML in my time to know that if an XML module makes sense the first time you use it, it&#8217;s probably a keeper. The <em>ElementTree</em> module allows me to work with XML data in a way that is similar to how I <em>think</em> about XML data.</p>
<p>A subset of the full <em>ElementTree</em> module is available in the Python 2.5 standard library as <code>xml.etree</code>, but you don&#8217;t have to use Python 2.5 in order to use the <em>ElementTree</em> module. If you are still using an older version of Python (1.5.2 or later) you can simply download the module from its website and manually install it on your system.  The website also has very easy to follow installation instructions, which you should consult to avoid issues while installing <em>ElementTree</em>.</p>
<p>In general, the <em>ElementTree</em> module treats XML data as a list of lists.  All XML has a root element that will have zero or more subelements (or child elements). Each of those subelements may in turn have subelements of their own.  The best way to think about this is with a brief example.</p>
<p>First let&#8217;s take a look at some sample XML data:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-default">&lt;</span><span class="hl-identifier">root</span><span class="hl-default">&gt;
&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">One</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;
&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">Two</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;
&lt;/</span><span class="hl-identifier">root</span><span class="hl-default">&gt;</span></pre></div></div>
<p>Here we have a root element with two child elements. Each child element has some text associated with it, seen here as &#8220;One&#8221; and &#8220;Two&#8221;. If we examine the XML as a hierarchical list of lists, we see that we have one element, &#8220;root&#8221;, in our root list.  Within the &#8220;root&#8221; element we have a list containing two subelements, &#8220;child&#8221; and &#8220;child&#8221;. The two &#8220;child&#8221; elements would then contain empty lists representing their lack of subelements. Not too complicated so far, is it?</p>
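<p>The list-of-lists picture can be checked directly in code. The following is a minimal sketch, assuming the standard library import and the <code>ET.XML</code> function that the next section introduces; the variable names are illustrative only.</p>

```python
from xml.etree import ElementTree as ET

# The sample document from above, parsed into an Element.
root = ET.XML("<root><child>One</child><child>Two</child></root>")

# An Element supports len() and indexing, much like a list of its children.
child_count = len(root)        # number of direct subelements
first_tag = root[0].tag        # tag name of the first child
grandchildren = len(root[0])   # a leaf element has no subelements

print(child_count)
print(first_tag)
print(grandchildren)
```

<p>The root reports two children, and each child reports zero, which matches the empty inner lists described above.</p>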
<h2><a href="#ReadingXMLdata">Reading XML data</a></h2>
<p>Now let&#8217;s use the <em>ElementTree</em> package to parse this XML and print the text data associated with each child element.  To start, we&#8217;ll create a Python file with the contents shown in Listing 1.</p>
<p><strong><a href="#Listing1">Listing 1</a></strong></p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#!/usr/bin/env python

</span><span class="hl-reserved">def </span><span class="hl-identifier">main</span><span class="hl-brackets">()</span><span class="hl-default">:
	</span><span class="hl-reserved">pass

if </span><span class="hl-identifier">__name__</span><span class="hl-default"> == </span><span class="hl-quotes">&quot;</span><span class="hl-string">__main__</span><span class="hl-quotes">&quot;</span><span class="hl-default">:
	</span><span class="hl-identifier">main</span><span class="hl-brackets">()</span></pre></div></div>
<p>This is basically a template that I use for many of my simple &#8220;*.py&#8221; files.  It doesn&#8217;t actually do anything except set up the script so that when the file is run, the <code>main</code> function will be executed. Some people like to use the Python interactive interpreter for simple hacking like this. Personally, I prefer having my code stored in a handy file so I can make simple changes and re-run the entire script when I am just playing around.</p>
<p>The first thing that we need to do in our Python code is import the <em>ElementTree</em> module:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-reserved">from </span><span class="hl-identifier">xml</span><span class="hl-default">.</span><span class="hl-identifier">etree </span><span class="hl-reserved">import </span><span class="hl-identifier">ElementTree as ET</span></pre></div></div>
<p><strong>Note</strong>: If you are not using Python 2.5 and have installed the <em>ElementTree</em> module on your own, you should import the <em>ElementTree</em> module as follows:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-reserved">from </span><span class="hl-identifier">elementtree </span><span class="hl-reserved">import </span><span class="hl-identifier">ElementTree as ET</span></pre></div></div>
<p>This will import the ElementTree section of the module into your program aliased as ET.  However, you don&#8217;t have to import <em>ElementTree</em> using an alias; you can simply import it and access it as <code>ElementTree</code>. Using ET is demonstrated in the Python 2.5 &#8220;What&#8217;s new&#8221; documentation[1] and I think it&#8217;s a great way to eliminate some keystrokes.</p>
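<p>If the same script has to run both with the standard library version and with a separately installed copy, the two imports can be combined. This is only a sketch of one common pattern, not something the <em>ElementTree</em> documentation prescribes:</p>

```python
try:
    # Python 2.5 and later: ElementTree ships with the standard library.
    from xml.etree import ElementTree as ET
except ImportError:
    # Older interpreters: fall back to the separately installed package.
    from elementtree import ElementTree as ET

# Either way, the rest of the script can refer to the module as ET.
module_name = ET.__name__
print(module_name)
```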
<p>Now we&#8217;ll begin writing code in the <code>main</code> function.  The first step is to load the XML data described above.  Normally you will be working with a file or URL; for now we want to keep this simple and load the XML data directly from the text:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-identifier">element</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">XML</span><span class="hl-brackets">(
       </span><span class="hl-quotes">&quot;</span><span class="hl-string">&lt;root&gt;&lt;child&gt;One&lt;/child&gt;&lt;child&gt;Two&lt;/child&gt;&lt;/root&gt;</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
<p>The <code>XML</code> function is described in the <em>ElementTree</em> documentation as follows: &#8220;Parses an XML document from a string constant. This function can be used to embed &#8220;XML literals&#8221; in Python code&#8221;[2].</p>
<p>Be careful here! The <code>XML</code> function returns an Element object, and not an ElementTree object as one might expect. Element objects are used to represent XML elements, whereas the ElementTree object is used to represent the entire XML document. Element objects <em>may</em> represent the entire XML document if they are the root element but will not if they are a subelement. ElementTree objects also add &#8220;some extra support for serialization to and from standard XML.&#8221;[3] The Element object that is returned represents the <code>&lt;root&gt;</code> element in our XML data.</p>
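<p>The difference between the two object types can be seen in a short sketch. It assumes the standard library import; wrapping an Element with the <code>ElementTree</code> class and reading it back with <code>getroot</code> is shown here purely as an illustration of how the two relate.</p>

```python
from xml.etree import ElementTree as ET

element = ET.XML("<root><child>One</child><child>Two</child></root>")

# XML() hands back the root Element itself, not a document wrapper.
root_tag = element.tag
print(root_tag)

# Wrapping the Element in an ElementTree gives a document-level object.
tree = ET.ElementTree(element)

# getroot() returns the very same Element that was wrapped.
same_object = tree.getroot() is element
print(same_object)
```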
<p>Thankfully, an Element object is iterable, so we can use a <code>for</code> loop to loop through all of its child elements:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-reserved">for </span><span class="hl-identifier">subelement </span><span class="hl-reserved">in </span><span class="hl-identifier">element</span><span class="hl-default">:</span></pre></div></div>
<p>This will give us all the child elements in the root element.  As mentioned earlier, each element in the XML tree is represented as an Element object, so as we iterate through the root element&#8217;s child elements we are getting Element objects with which to work. In other words, each pass through the <code>for</code> loop gives us the next child element, in the form of an Element object, until there are no more children left. In order to print out the text associated with an Element object we simply have to access the Element object&#8217;s <code>text</code> attribute:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-reserved">for </span><span class="hl-identifier">subelement </span><span class="hl-reserved">in </span><span class="hl-identifier">element</span><span class="hl-default">:
       </span><span class="hl-reserved">print </span><span class="hl-identifier">subelement</span><span class="hl-default">.</span><span class="hl-identifier">text</span></pre></div></div>
<p>To recap, have a look at the code in Listing 2.</p>
<p><strong><a href="#Listing2">Listing 2</a></strong></p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#!/usr/bin/env python

</span><span class="hl-reserved">from </span><span class="hl-identifier">xml</span><span class="hl-default">.</span><span class="hl-identifier">etree </span><span class="hl-reserved">import </span><span class="hl-identifier">ElementTree as ET

</span><span class="hl-reserved">def </span><span class="hl-identifier">main</span><span class="hl-brackets">()</span><span class="hl-default">:
	</span><span class="hl-identifier">element</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">XML</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">&lt;root&gt;&lt;child&gt;One&lt;/child&gt;&lt;child&gt;Two&lt;/child&gt;&lt;/root&gt;</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
	</span><span class="hl-reserved">for </span><span class="hl-identifier">subelement </span><span class="hl-reserved">in </span><span class="hl-identifier">element</span><span class="hl-default">:
		</span><span class="hl-reserved">print </span><span class="hl-identifier">subelement</span><span class="hl-default">.</span><span class="hl-identifier">text

</span><span class="hl-reserved">if </span><span class="hl-identifier">__name__</span><span class="hl-default"> == </span><span class="hl-quotes">&quot;</span><span class="hl-string">__main__</span><span class="hl-quotes">&quot;</span><span class="hl-default">:
	</span><span class="hl-comment"># Someone is launching this directly
	</span><span class="hl-identifier">main</span><span class="hl-brackets">()</span></pre></div></div>
<p>Once you run the code you should get the following output:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-identifier">One
Two</span></pre></div></div>
<p>If an XML element does not have any text associated with it, like our root element, the Element object&#8217;s <code>text</code> attribute will be set to <code>None</code>. If you want to check if an element had any text associated with it, you can do the following:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-reserved">if </span><span class="hl-identifier">element</span><span class="hl-default">.</span><span class="hl-identifier">text </span><span class="hl-reserved">is not None</span><span class="hl-default">:
       </span><span class="hl-reserved">print </span><span class="hl-identifier">element</span><span class="hl-default">.</span><span class="hl-identifier">text</span></pre></div></div>
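<p>The same check can be wrapped in a small helper so that missing text never reaches the caller as <code>None</code>. The helper name <code>text_or_default</code> and the empty second child are assumptions made for this sketch; they are not part of <em>ElementTree</em> or of the original sample data.</p>

```python
from xml.etree import ElementTree as ET

def text_or_default(element, default=""):
    """Return the element's text, or default when the element has none."""
    if element.text is not None:
        return element.text
    return default

# Sample data with one empty child (an assumption for this sketch).
root = ET.XML("<root><child>One</child><child/></root>")

# The root holds only subelements, so its text attribute is None.
root_has_no_text = root.text is None
print(root_has_no_text)

# The second child is empty, so the helper supplies the default.
texts = [text_or_default(child, "(empty)") for child in root]
print(texts)
```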
<h2><a href="#ReadingXMLAttributes">Reading XML Attributes</a></h2>
<p>Let&#8217;s alter the XML that we are working with to add attributes to the elements and look at how we would parse that information.</p>
<p>If the XML uses attributes in addition to, or instead of, inner text they can be accessed using the Element object&#8217;s <code>attrib</code> attribute.  The <code>attrib</code> attribute is a Python dictionary and is relatively easy to use:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-reserved">def </span><span class="hl-identifier">main</span><span class="hl-brackets">()</span><span class="hl-default">:
       </span><span class="hl-identifier">element</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">XML</span><span class="hl-brackets">(
               </span><span class="hl-quotes">'</span><span class="hl-string">&lt;root&gt;&lt;child val=&quot;One&quot;/&gt;&lt;child val=&quot;Two&quot;/&gt;&lt;/root&gt;</span><span class="hl-quotes">'</span><span class="hl-brackets">)
       </span><span class="hl-reserved">for </span><span class="hl-identifier">subelement </span><span class="hl-reserved">in </span><span class="hl-identifier">element</span><span class="hl-default">:
               </span><span class="hl-reserved">print </span><span class="hl-identifier">subelement</span><span class="hl-default">.</span><span class="hl-identifier">attrib</span></pre></div></div>
<p>When you run the code you get the following output:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-default">{</span><span class="hl-quotes">'</span><span class="hl-string">val</span><span class="hl-quotes">'</span><span class="hl-default">: </span><span class="hl-quotes">'</span><span class="hl-string">One</span><span class="hl-quotes">'</span><span class="hl-default">}
{</span><span class="hl-quotes">'</span><span class="hl-string">val</span><span class="hl-quotes">'</span><span class="hl-default">: </span><span class="hl-quotes">'</span><span class="hl-string">Two</span><span class="hl-quotes">'</span><span class="hl-default">}</span></pre></div></div>
<p>These are the attributes for each child element stored in a dictionary. Being able to work with an XML element&#8217;s attributes as a Python dictionary is a great feature and fits well with the dynamic nature of XML attributes.</p>
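<p>Because <code>attrib</code> is an ordinary dictionary, individual values can be looked up by name. A minimal sketch, assuming the same sample data; the attribute name <code>missing</code> is made up to show what happens when an attribute is absent.</p>

```python
from xml.etree import ElementTree as ET

element = ET.XML('<root><child val="One"/><child val="Two"/></root>')

# Collect one attribute from every child via ordinary dictionary indexing.
values = [child.attrib["val"] for child in element]
print(values)

# Element.get() returns a default instead of raising KeyError
# when the attribute is absent ("missing" is a made-up attribute name).
fallback = element[0].get("missing", "n/a")
print(fallback)
```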
<h2><a href="#WritingXML">Writing XML</a></h2>
<p>Now that we&#8217;ve tried our hand at reading XML, let&#8217;s try creating some. If you understand the reading process, you should have no trouble understanding the creation process because it works in much the same manner. What we are going to do in this example is recreate the XML data that we were working with above.</p>
<p>The first step is to create our <code>&lt;root&gt;</code> element:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#create the root &lt;root&gt;
</span><span class="hl-identifier">root_element</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">Element</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">root</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
<p>After this code is executed, the variable <code>root_element</code> is an Element object, just like the Element objects that we used earlier to parse the XML.</p>
<p>The next step is to create the two child elements. There are two ways to do this.</p>
<p>In the first approach, if you know exactly what you are creating, it&#8217;s easiest to use the <code>SubElement</code> function, which creates an Element object that is a subelement (or child) of another Element object:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#create the first child &lt;child&gt;One&lt;/child&gt;
</span><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">SubElement</span><span class="hl-brackets">(</span><span class="hl-identifier">root_element</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
<p>This will create a <code>&lt;child&gt;</code> Element that is a child of <code>root_element</code>.  We then need to set the text associated with that element.  To do this we use the same <code>text</code> attribute that we used in the first parsing example. However, instead of simply reading the <code>text</code> attribute we set its value:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">One</span><span class="hl-quotes">&quot;</span></pre></div></div>
<p>The second approach to creating a child element is to create an Element object separately (rather than as a subelement) and append it to a parent Element object.  The results are exactly the same &#8211; this is simply a different approach that may come in handy when creating your XML, or working with two sets of XML data.</p>
<p>First we create an Element object in the same way that we created the root element:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#create the second child &lt;child&gt;Two&lt;/child&gt;
</span><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">Element</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
</span><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">Two</span><span class="hl-quotes">&quot;</span></pre></div></div>
<p>This creates the <code>child</code> Element object and sets its text to &#8220;Two&#8221;.  We then append it to the root element: </p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#now append
</span><span class="hl-identifier">root_element</span><span class="hl-default">.</span><span class="hl-identifier">append</span><span class="hl-brackets">(</span><span class="hl-identifier">child</span><span class="hl-brackets">)</span></pre></div></div>
<p>Pretty simple!  Now, if we want to look at the contents of our <code>root_element</code> (or any other Element object for that matter) we can use the handy <code>tostring</code> function. It does exactly what it says that it does: it converts an Element object into a human readable string.</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-comment">#Let's see the results
</span><span class="hl-reserved">print </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">tostring</span><span class="hl-brackets">(</span><span class="hl-identifier">root_element</span><span class="hl-brackets">)</span></pre></div></div>
<p><strong><a href="#Listing3">Listing 3</a></strong></p>
<div class="hl-surround" style="height:280px;"><div class="hl-main"><pre><span class="hl-comment">#!/usr/bin/env python

</span><span class="hl-reserved">from </span><span class="hl-identifier">xml</span><span class="hl-default">.</span><span class="hl-identifier">etree </span><span class="hl-reserved">import </span><span class="hl-identifier">ElementTree as ET

</span><span class="hl-reserved">def </span><span class="hl-identifier">main</span><span class="hl-brackets">()</span><span class="hl-default">:
	</span><span class="hl-comment">#create the root &lt;root&gt;
	</span><span class="hl-identifier">root_element</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">Element</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">root</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
	</span><span class="hl-comment">#create the first child &lt;child&gt;One&lt;/child&gt;
	</span><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">SubElement</span><span class="hl-brackets">(</span><span class="hl-identifier">root_element</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
	</span><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">One</span><span class="hl-quotes">&quot;
	</span><span class="hl-comment">#create the second child &lt;child&gt;Two&lt;/child&gt;
	</span><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">Element</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
	</span><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">Two</span><span class="hl-quotes">&quot;
	</span><span class="hl-comment">#now append
	</span><span class="hl-identifier">root_element</span><span class="hl-default">.</span><span class="hl-identifier">append</span><span class="hl-brackets">(</span><span class="hl-identifier">child</span><span class="hl-brackets">)
	</span><span class="hl-comment">#Let's see the results
	</span><span class="hl-reserved">print </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">tostring</span><span class="hl-brackets">(</span><span class="hl-identifier">root_element</span><span class="hl-brackets">)

</span><span class="hl-reserved">if </span><span class="hl-identifier">__name__</span><span class="hl-default"> == </span><span class="hl-quotes">&quot;</span><span class="hl-string">__main__</span><span class="hl-quotes">&quot;</span><span class="hl-default">:
	</span><span class="hl-comment"># Someone is launching this directly
	</span><span class="hl-identifier">main</span><span class="hl-brackets">()</span></pre></div></div>
<p>To recap, have a look at the code in Listing 3. When you run this code you will get the following output:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-default">&lt;</span><span class="hl-identifier">root</span><span class="hl-default">&gt;&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">One</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">Two</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;&lt;/</span><span class="hl-identifier">root</span><span class="hl-default">&gt;</span></pre></div></div>
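<p>The claim that both ways of attaching a child give the same result can be checked by serializing each tree and comparing the output. This is a sketch only; the two function names are invented for the comparison.</p>

```python
from xml.etree import ElementTree as ET

def build_with_subelement():
    """Create each child already attached to the root."""
    root = ET.Element("root")
    for text in ("One", "Two"):
        child = ET.SubElement(root, "child")
        child.text = text
    return root

def build_with_append():
    """Create each child on its own, then append it to the root."""
    root = ET.Element("root")
    for text in ("One", "Two"):
        child = ET.Element("child")
        child.text = text
        root.append(child)
    return root

# Serializing both trees gives identical output.
same = ET.tostring(build_with_subelement()) == ET.tostring(build_with_append())
print(same)
```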
<h2><a href="#WritingXMLAttributes">Writing XML Attributes</a></h2>
<p>If you want to create the XML with attributes (as illustrated in the second reading example), you can use the Element object&#8217;s <code>set</code> method.  To add the <code>val</code> attribute to the first element, use the following:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">set</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">val</span><span class="hl-quotes">&quot;</span><span class="hl-code">,</span><span class="hl-quotes">&quot;</span><span class="hl-string">One</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
<p>You may also set attributes when you create Element objects:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">Element</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-code">, </span><span class="hl-identifier">val</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">One</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
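<p>Both ways of writing attributes can be combined in one short script. A minimal sketch that rebuilds the attribute version of the sample data and then reads the attributes back:</p>

```python
from xml.etree import ElementTree as ET

root = ET.Element("root")

# Set the attribute after the element exists.
first = ET.SubElement(root, "child")
first.set("val", "One")

# Pass the attribute as a keyword argument at creation time.
second = ET.Element("child", val="Two")
root.append(second)

# Both children now carry a val attribute in their attrib dictionaries.
values = [child.attrib for child in root]
print(values)
```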
<h2><a href="#ReadingXMLFiles">Reading XML Files</a></h2>
<p>Most of the time you won&#8217;t be working with XML data that you explicitly create in your code; instead, you will usually read the XML data in from a data source, work with it, and then save it back out when you are done. Fortunately, configuring <em>ElementTree</em> to work with different data sources is very easy.  For example, let&#8217;s take the XML data that we first used and save it into a file named <code>our.xml</code> in the same location as our Python file.</p>
<p>There are a few methods that we can use to load XML data from a file. We are going to use the <code>parse</code> function. This function is nice because it will accept, as a parameter, the path to a file OR a &#8220;file-like&#8221; object.  The term &#8220;file-like&#8221; is used on purpose because the object does not have to be a file object per se &#8211; it simply has to be an object that behaves in a file-like manner. A &#8220;file-like&#8221; object is an object that implements a &#8220;file-like&#8221; interface, meaning that it shares many (if not all) methods with the file object. If an object is &#8220;file-like&#8221; this fact will usually be prominently mentioned in its documentation.</p>
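<p>To make the &#8220;file-like&#8221; idea concrete, here is a sketch that hands <code>parse</code> an in-memory object instead of a path. It assumes a Python version that provides the standard <code>io</code> module, which is newer than Python 2.5; the variable names are illustrative.</p>

```python
import io
from xml.etree import ElementTree as ET

# io.BytesIO is an in-memory object with the same read() interface as a file.
fake_file = io.BytesIO(b"<root><child>One</child><child>Two</child></root>")

# parse() accepts it in place of a path and returns an ElementTree.
tree = ET.parse(fake_file)

# getroot() gives the root Element, which can be iterated as before.
texts = [child.text for child in tree.getroot()]
print(texts)
```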
<p>The first thing that we need to do in order to load the XML data is to determine the full path to the <code>our.xml</code> file. To calculate this, we determine the full path of our Python source file, strip the filename from it, and then append <code>our.xml</code> to the path. This is rather simple given that the <code>__file__</code> attribute (available in Python 2.2 and later) holds the path and filename of our Python source file. Although the <code>__file__</code> attribute may be a relative path, we can use it to calculate the absolute path using the standard <em>os</em> module:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-reserved">import </span><span class="hl-identifier">os</span></pre></div></div>
<p>We then call the <code>abspath</code> function to get the absolute path:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">abspath</span><span class="hl-brackets">(</span><span class="hl-identifier">__file__</span><span class="hl-brackets">)</span></pre></div></div>
<p>However, since we only want the directory name (not the full path and filename of our Python source file) we have to strip off the filename:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">dirname</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)</span></pre></div></div>
<p>Now that we have the directory in which the <code>our.xml</code> file resides, all we have to do is append the <code>our.xml</code> filename to the <code>xml_file</code> variable.  However, instead of just doing something like:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">xml_file</span><span class="hl-default"> += </span><span class="hl-quotes">&quot;</span><span class="hl-string">/our.xml</span><span class="hl-quotes">&quot;</span></pre></div></div>
<p>we will use the <em>os</em> module to join the two paths so that the resulting path is always correct regardless of what operating system our code is executed on:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">join</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">our.xml</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)</span></pre></div></div>
<p><strong>Note</strong>: If you have any trouble understanding what any of the code used to determine the path of <code>our.xml</code> is doing, try printing out <code>xml_file</code> after each of the above lines and it should become clear.</p>
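<p>The three path steps can be traced in one place, roughly as follows. This is a sketch for Python 3 that uses a hypothetical script path (<code>scripts/xml_demo.py</code>) in place of <code>__file__</code> so that it can be run from anywhere, including an interactive session where <code>__file__</code> is not defined:</p>

```python
import os

# Hypothetical relative script path, standing in for __file__.
script_path = os.path.join("scripts", "xml_demo.py")

step1 = os.path.abspath(script_path)    # absolute path to the script
step2 = os.path.dirname(step1)          # directory containing the script
step3 = os.path.join(step2, "our.xml")  # our.xml next to the script

# Print the value after each step, as the note above suggests.
for label, value in (("abspath", step1), ("dirname", step2), ("join", step3)):
    print("%-8s %s" % (label, value))
```

<p>The exact output depends on the current working directory and the operating system, which is the reason for letting <code>os.path</code> do the work.</p>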
<p>We now have the full path to the <code>our.xml</code> file.  In order to load its XML data we simply pass the path to the <code>parse</code> function:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">tree</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">parse</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)</span></pre></div></div>
<p>We now have an ElementTree object instance that represents our XML file.</p>
<p>Since we are working with files, we should watch out for incorrect paths, I/O errors, or the parse function failing for any other reason.  If you wish to be extra careful, you can wrap the parse function in a try/except block in order to catch any exceptions that may be thrown:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-reserved">try</span><span class="hl-default">:
       </span><span class="hl-identifier">tree</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">parse</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)
</span><span class="hl-reserved">except Exception</span><span class="hl-default">, </span><span class="hl-identifier">inst</span><span class="hl-default">:
       </span><span class="hl-reserved">print </span><span class="hl-quotes">&quot;</span><span class="hl-string">Unexpected error opening %s: %s</span><span class="hl-quotes">&quot;</span><span class="hl-default"> % </span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-identifier">inst</span><span class="hl-brackets">)
       </span><span class="hl-reserved">return</span></pre></div></div>
<p>In the except block, I catch the Exception base class so that I catch any and all exceptions that may be thrown (in the case of a missing file it will most likely be an <code>IOError</code> exception).</p>
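<p>If you would rather tell a missing file apart from malformed XML, one option is to catch the two failure types separately. The sketch below is written for Python 3, where <code>IOError</code> is an alias of <code>OSError</code>; <code>load_tree</code> is a hypothetical helper name, not part of <em>ElementTree</em>:</p>

```python
import io
from xml.etree import ElementTree as ET


def load_tree(source):
    """Return (tree, error); exactly one of the two is None."""
    try:
        return ET.parse(source), None
    except OSError as inst:
        # Missing file, permission problem, or other I/O failure.
        return None, "I/O error: %s" % inst
    except ET.ParseError as inst:
        # The data was read but is not well-formed XML.
        return None, "Parse error: %s" % inst


# A path that does not exist, and a document with a mismatched tag.
_, missing = load_tree("no-such-directory/our.xml")
_, malformed = load_tree(io.StringIO("<root><child>One</root>"))
print(missing)
print(malformed)
```

<p>Catching the base <code>Exception</code> class, as the article does, remains the simpler choice when you only need to report that something went wrong.</p>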
<h2><a href="#WritingXMLDatatoaFile">Writing XML Data to a File</a></h2>
<p>Now that we know how to read in XML data, we should look at how to write XML data out to a file.  Let&#8217;s assume that after reading in the <code>our.xml</code> file we want to add another item to the data that we just read in:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">SubElement</span><span class="hl-brackets">(</span><span class="hl-identifier">tree</span><span class="hl-code">.</span><span class="hl-identifier">getroot</span><span class="hl-brackets">()</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
</span><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">Three</span><span class="hl-quotes">&quot;</span></pre></div></div>
<p>Notice that in order to add a child to the root element we used the ElementTree object&#8217;s <code>getroot</code> function. The <code>getroot</code> function simply returns the root Element object of the XML data.</p>
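<p>The same steps can be tried without touching a file. The following Python 3 sketch builds an assumed copy of the <code>our.xml</code> data in memory, adds the third child through <code>getroot</code>, and serializes the result so the change is visible:</p>

```python
from xml.etree import ElementTree as ET

# Build a tree in memory that mirrors our.xml before the change.
root = ET.fromstring("<root><child>One</child><child>Two</child></root>")
tree = ET.ElementTree(root)

# getroot() hands back the same Element object the tree wraps.
assert tree.getroot() is root

# SubElement creates the new element and appends it to its parent.
child = ET.SubElement(tree.getroot(), "child")
child.text = "Three"

texts = [c.text for c in tree.getroot().findall("child")]
serialized = ET.tostring(root, encoding="unicode")
print(texts)
print(serialized)
```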
<p>Now that we have a third child element, let&#8217;s write the XML data back out to <code>our.xml</code>. Thanks to <em>ElementTree</em> this is a painless experience:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-identifier">tree</span><span class="hl-default">.</span><span class="hl-identifier">write</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)</span></pre></div></div>
<p>That&#8217;s it!</p>
<p>If we want to be really careful when writing the XML data out to a file, we&#8217;ll watch out for exceptions. However, most of the time the <code>write</code> method will succeed without throwing an exception; it is more important to be sure that the path used is correct.  Often, instead of getting the exception that you expect, you end up with an XML file stored in some unexpected location on your hard drive because your path was incorrect or you did not specify the full path.  But, as is often the case when programming, better safe than sorry:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-reserved">try</span><span class="hl-default">:
       </span><span class="hl-identifier">tree</span><span class="hl-default">.</span><span class="hl-identifier">write</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)
</span><span class="hl-reserved">except Exception</span><span class="hl-default">, </span><span class="hl-identifier">inst</span><span class="hl-default">:
       </span><span class="hl-reserved">print </span><span class="hl-quotes">&quot;</span><span class="hl-string">Unexpected error writing to file %s: %s</span><span class="hl-quotes">&quot;</span><span class="hl-default"> % </span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-identifier">inst</span><span class="hl-brackets">)
       </span><span class="hl-reserved">return</span></pre></div></div>
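<p>One way to guard against the stray-file problem is to confirm that the path is absolute before writing, and then read the file back. The sketch below assumes Python 3 and writes into a temporary directory; the <code>encoding</code> and <code>xml_declaration</code> arguments are optional extras that the article&#8217;s code does not use:</p>

```python
import os
import tempfile
from xml.etree import ElementTree as ET

root = ET.fromstring("<root><child>One</child></root>")
tree = ET.ElementTree(root)

# Write into a temporary directory so the example never touches real data.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "our.xml")

    # An absolute path rules out the file landing somewhere unexpected.
    path_is_absolute = os.path.isabs(out_path)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

    with open(out_path, "rb") as handle:
        written = handle.read()

    # Parse the file again to confirm the content survived the round trip.
    reloaded = ET.parse(out_path).getroot()
    reloaded_texts = [c.text for c in reloaded]

print(written.decode("utf-8"))
```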
<p>To recap, you can find all of the code from this section in Listing 4.</p>
<p><strong><a href="#Listing4">Listing 4</a></strong></p>
<div class="hl-surround" style="height:280px;"><div class="hl-main"><pre><span class="hl-comment">#!/usr/bin/env python

</span><span class="hl-reserved">from </span><span class="hl-identifier">xml</span><span class="hl-default">.</span><span class="hl-identifier">etree </span><span class="hl-reserved">import </span><span class="hl-identifier">ElementTree as ET
</span><span class="hl-reserved">import </span><span class="hl-identifier">os

</span><span class="hl-reserved">def </span><span class="hl-identifier">main</span><span class="hl-brackets">()</span><span class="hl-default">:

	</span><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">abspath</span><span class="hl-brackets">(</span><span class="hl-identifier">__file__</span><span class="hl-brackets">)
	</span><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">dirname</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)
	</span><span class="hl-identifier">xml_file</span><span class="hl-default"> = </span><span class="hl-identifier">os</span><span class="hl-default">.</span><span class="hl-identifier">path</span><span class="hl-default">.</span><span class="hl-identifier">join</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">our.xml</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)

	</span><span class="hl-reserved">try</span><span class="hl-default">:
		</span><span class="hl-identifier">tree</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">parse</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)
	</span><span class="hl-reserved">except Exception</span><span class="hl-default">, </span><span class="hl-identifier">inst</span><span class="hl-default">:
		</span><span class="hl-reserved">print </span><span class="hl-quotes">&quot;</span><span class="hl-string">Unexpected error opening %s: %s</span><span class="hl-quotes">&quot;</span><span class="hl-default"> % </span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-identifier">inst</span><span class="hl-brackets">)
		</span><span class="hl-reserved">return

	</span><span class="hl-identifier">child</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">SubElement</span><span class="hl-brackets">(</span><span class="hl-identifier">tree</span><span class="hl-code">.</span><span class="hl-identifier">getroot</span><span class="hl-brackets">()</span><span class="hl-code">, </span><span class="hl-quotes">&quot;</span><span class="hl-string">child</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
	</span><span class="hl-identifier">child</span><span class="hl-default">.</span><span class="hl-identifier">text</span><span class="hl-default"> = </span><span class="hl-quotes">&quot;</span><span class="hl-string">Three</span><span class="hl-quotes">&quot;

	</span><span class="hl-reserved">try</span><span class="hl-default">:
		</span><span class="hl-identifier">tree</span><span class="hl-default">.</span><span class="hl-identifier">write</span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-brackets">)
	</span><span class="hl-reserved">except Exception</span><span class="hl-default">, </span><span class="hl-identifier">inst</span><span class="hl-default">:
		</span><span class="hl-reserved">print </span><span class="hl-quotes">&quot;</span><span class="hl-string">Unexpected error writing to file %s: %s</span><span class="hl-quotes">&quot;</span><span class="hl-default"> % </span><span class="hl-brackets">(</span><span class="hl-identifier">xml_file</span><span class="hl-code">, </span><span class="hl-identifier">inst</span><span class="hl-brackets">)
		</span><span class="hl-reserved">return

if </span><span class="hl-identifier">__name__</span><span class="hl-default"> == </span><span class="hl-quotes">&quot;</span><span class="hl-string">__main__</span><span class="hl-quotes">&quot;</span><span class="hl-default">:
	</span><span class="hl-comment"># Someone is launching this directly
	</span><span class="hl-identifier">main</span><span class="hl-brackets">()</span></pre></div></div>
<p>When you run the code and take a look at the <code>our.xml</code> file, you should see that the third child element has been added:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-default">&lt;</span><span class="hl-identifier">root</span><span class="hl-default">&gt;
&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">One</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;
&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">Two</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;
&lt;</span><span class="hl-identifier">child</span><span class="hl-default">&gt;</span><span class="hl-identifier">Three</span><span class="hl-default">&lt;/</span><span class="hl-identifier">child</span><span class="hl-default">&gt;
&lt;/</span><span class="hl-identifier">root</span><span class="hl-default">&gt;</span></pre></div></div>
<h2><a href="#ReadingfromtheWeb">Reading from the Web</a></h2>
<p>Working with a local file is very useful, but you might also be in a situation where you have to work with an XML file that is located on the Internet, perhaps an RSS feed.  Fortunately, since the <code>parse</code> function explained above works with file-like objects, loading a URL is very easy.</p>
<p>First off, you need to import the <em>urllib</em> module, a standard module that allows you to open URLs in a manner similar to opening local files:</p>
<div class="hl-surround" style="height:28px;"><div class="hl-main"><pre><span class="hl-reserved">import </span><span class="hl-identifier">urllib</span></pre></div></div>
<p>To open a URL and parse the XML it returns, we use:</p>
<div class="hl-surround" ><div class="hl-main"><pre><span class="hl-identifier">feed</span><span class="hl-default"> = </span><span class="hl-identifier">urllib</span><span class="hl-default">.</span><span class="hl-identifier">urlopen</span><span class="hl-brackets">(</span><span class="hl-quotes">&quot;</span><span class="hl-string">http://pythonmagazine.com/c/news/atom</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">)
</span><span class="hl-identifier">tree</span><span class="hl-default"> = </span><span class="hl-identifier">ET</span><span class="hl-default">.</span><span class="hl-identifier">parse</span><span class="hl-brackets">(</span><span class="hl-identifier">feed</span><span class="hl-brackets">)</span></pre></div></div>
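<p>In Python 3 the <code>urlopen</code> function lives in <code>urllib.request</code> rather than directly in <em>urllib</em>. The sketch below shows the equivalent call under that assumption; <code>parse_url</code> is a hypothetical helper, and a local <code>file://</code> URL with made-up feed content is used so that the example runs without a network connection:</p>

```python
import os
import pathlib
import tempfile
import urllib.request
from xml.etree import ElementTree as ET


def parse_url(url, timeout=10):
    """Open url and parse the response body as XML (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return ET.parse(response)


# Demonstrate with a local file:// URL so the example runs offline;
# an http:// or https:// feed URL would be passed the same way.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feed.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<feed><title>Example</title></feed>")
    tree = parse_url(pathlib.Path(path).as_uri())

title = tree.getroot().findtext("title")
print(title)
```

<p>As with local files, network access can fail, so the same try/except approach shown earlier applies here.</p>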
<h2><a href="#Conclusion">Conclusion</a></h2>
<p>And that&#8217;s that!  This concludes our brief introduction to XML parsing using the <em>ElementTree</em> module. Hopefully throughout this article you have seen how easy it is to create and manipulate XML using <em>ElementTree</em> &#8230;and I&#8217;ve only scratched the surface.  For more information take a look at the official Python documentation and some of the great examples on the effbot website. I&#8217;m sure you&#8217;ll be an XML wizard in no time.</p>
<p>[1] <a href="http://docs.python.org/whatsnew/modules.html#SECTION0001420000000000000000">http://docs.python.org/whatsnew/modules.html#SECTION0001420000000000000000</a><br />
[2] <a href="http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function">http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function</a><br />
[3] <a href="http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class">http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
