The experiences of a software developer as he wades through the dynamic world of technology. Discussions of new industry developments and current technologies he finds himself wrapped up in.

Monday, July 31, 2006

XSLT Key is the Key

Anyone who has used XSLT (eXtensible Stylesheet Language Transformations) knows that it can be a very powerful template language. XSLT is generally used for the transformation of an XML document into another type of document. I've used it to transform XML into other formats of XML, HTML, SQL, and even NAnt build files. As you start to use XSLT, it can be a little difficult to grasp the 'template' way of thinking, but with a little practice (as I've discovered), you'll find that it becomes quite simple. And as you work more and more with the technology, you'll start to discover the different elements and functions that will help you write more efficiently.

During my early XSLT experience, I found the key() function to be indispensable. W3Schools defines this function as follows:
The key() function returns a node-set from the document, using the index specified by an <xsl:key> element.

This definition may not make much sense on its own, so I'll provide an example. Let's say we have the XML document below, which describes countries, territories, and cities in the world. The document also contains telephone area codes for some of the territories.

<?xml version="1.0" encoding="ISO-8859-1"?>
<World>
   <Country name="Canada">
      <Territory name="British Columbia" />
         <City name="Vancouver" />
      <Territory name="Alberta" />
         <City name="Edmonton" />
      <Territory name="Saskatchewan" />
         <City name="Saskatoon" />
      <Territory name="Manitoba" />
         <City name="Winnipeg" />
      <Territory name="Ontario" />
         <City name="Toronto" />
      <Territory name="Quebec" />
         <City name="Montreal" />
      <Territory name="New Brunswick" />
         <City name="Moncton" />
      <Territory name="Newfoundland and Labrador" />
         <City name="St. John's" />
      <Territory name="Prince Edward Island" />
         <City name="Charlottetown" />
      <Territory name="Nova Scotia" />
         <City name="Halifax" />
   </Country>
   <Country name="United States of America">
      <Territory name="New York" />
         <City name="Albany" />
      <Territory name="California" />
         <City name="Los Angeles" />
      <Territory name="Arizona" />
         <City name="Phoenix" />
      <Territory name="Florida" />
         <City name="Miami" />
   </Country>
   <Telephone area-code="306" territory="Saskatchewan" />
   <Telephone area-code="519" territory="Ontario" />
   <Telephone area-code="905" territory="Ontario" />
   <Telephone area-code="780" territory="Alberta" />
</World>

Now let's say that we want to transform this document into the one below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<Phonebook>
   <Country name="Canada">
      <AreaCode value="780" province="Alberta" />
      <AreaCode value="306" province="Saskatchewan" />
      <AreaCode value="519" province="Ontario" />
      <AreaCode value="905" province="Ontario" />
   </Country>
   <Country name="United States of America" />
</Phonebook>

The XSLT used to do this would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   <xsl:template match="/">
      <Phonebook>
         <xsl:apply-templates select="World/Country" mode="CountryMatch" />
      </Phonebook>
   </xsl:template>
   <xsl:template match="Country" mode="CountryMatch">
      <Country>
         <xsl:attribute name="name">
            <xsl:value-of select="@name" />
         </xsl:attribute>
         <xsl:apply-templates select="Territory" mode="TerritoryMatch" />
      </Country>
   </xsl:template>
   <xsl:template match="Territory" mode="TerritoryMatch">
      <xsl:variable name="CurrentTerritory" select="@name" />
      <xsl:for-each select="../../Telephone[@territory=$CurrentTerritory]">
         <AreaCode>
            <xsl:attribute name="value">
               <xsl:value-of select="@area-code" />
            </xsl:attribute>
            <xsl:attribute name="province">
               <xsl:value-of select="$CurrentTerritory" />
            </xsl:attribute>
         </AreaCode>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>



This works just fine, and you may never have a problem with it. However, as the documents you are transforming grow in size, this technique may not perform as efficiently as you would hope. The line that may cause a problem is:

<xsl:for-each select="../../Telephone[@territory=$CurrentTerritory]">

Walking back up the tree and filtering every Telephone node with a predicate can prove to be very inefficient when dealing with large documents, because the search is repeated from scratch for every territory. Suppose you had an XML document that contained all of the countries of the world, all of their territories (states, provinces, etc.), and all of their cities with over 50,000 people. This would be a very large document. Now imagine the work the XSLT engine would have to perform to process the statement above. For each territory in each country, the processor would have to back up two node levels and then compare the territory against every Telephone node. Unless the processor optimizes the expression on its own, that is roughly the number of territories times the number of Telephone nodes in comparisons - that's a lot of work.

I ran into this very problem recently when I was working with extremely large XML documents - around 350 MB. The transformations were part of a nightly NAnt build cycle, and some were taking over an hour to finish. After looking into the problem, I discovered that the culprit was a statement similar to the one above. After a little research, I realized that I should have been using the XSLT key() function. As stated earlier, the key() function (in conjunction with the xsl:key element) finds the nodes that have a given value for a named key. So, to make use of the key() function, the XSLT document from above would look slightly different.

At the top level of the stylesheet, as a direct child of the xsl:stylesheet element, you would define a key.


<xsl:key name="MatchedTelephone" match="World/Telephone" use="@territory" />


The 'name' attribute simply holds the name of your key. The 'match' attribute holds a match pattern identifying the nodes to be indexed, which is where the lookups will take place. And finally, the 'use' attribute is an expression, evaluated for each matched node, whose value is what a lookup is compared against.
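
To make the roles of 'match' and 'use' a little more concrete, here is a minimal sketch of a second key against the same World document. The key name TerritoryByCountry is hypothetical and not part of the original stylesheet; it only illustrates that 'use' is evaluated relative to each matched node (here, each Territory looks up to its parent Country's name).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Hypothetical key: index every Territory by the name of its parent Country.
        'match' picks the nodes to index; 'use' is evaluated for each of them. -->
   <xsl:key name="TerritoryByCountry" match="Territory" use="../@name"/>

   <xsl:template match="/">
      <!-- key() returns every Territory whose key value equals 'Canada' -->
      <xsl:for-each select="key('TerritoryByCountry', 'Canada')">
         <xsl:value-of select="@name"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should list the names of the Canadian territories, one per line, without any predicate filtering in the select expression.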

Now that we have a key defined, we can re-write our problem statement using the key() function as follows:


<xsl:for-each select="key('MatchedTelephone', $CurrentTerritory)">



What we are essentially doing here is saying: for each of the World/Telephone elements whose 'territory' attribute is equal to the name of our current Territory, create an AreaCode element.
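
Putting the two pieces together, the complete stylesheet might look something like the sketch below. It is assembled from the fragments above rather than copied from the stylesheet I actually ran, and it assumes the AreaCode element should carry the 'province' attribute shown in the target document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

   <!-- xsl:key is a top-level element: a direct child of xsl:stylesheet -->
   <xsl:key name="MatchedTelephone" match="World/Telephone" use="@territory"/>

   <xsl:template match="/">
      <Phonebook>
         <xsl:apply-templates select="World/Country" mode="CountryMatch"/>
      </Phonebook>
   </xsl:template>

   <xsl:template match="Country" mode="CountryMatch">
      <Country>
         <xsl:attribute name="name">
            <xsl:value-of select="@name"/>
         </xsl:attribute>
         <xsl:apply-templates select="Territory" mode="TerritoryMatch"/>
      </Country>
   </xsl:template>

   <xsl:template match="Territory" mode="TerritoryMatch">
      <xsl:variable name="CurrentTerritory" select="@name"/>
      <!-- Key lookup replaces ../../Telephone[@territory=$CurrentTerritory] -->
      <xsl:for-each select="key('MatchedTelephone', $CurrentTerritory)">
         <AreaCode>
            <xsl:attribute name="value">
               <xsl:value-of select="@area-code"/>
            </xsl:attribute>
            <xsl:attribute name="province">
               <xsl:value-of select="$CurrentTerritory"/>
            </xsl:attribute>
         </AreaCode>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

The only differences from the first version are the xsl:key declaration and the select expression of the xsl:for-each; the templates themselves are unchanged.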

Once I made these changes, the transformation of the large document went from over an hour to less than 5 minutes. The effects were incredible, to say the least. Just keep in mind that key() only searches the document that contains the current context node, so it will not find nodes in an external document loaded with the document() function unless the context is first switched to that document. As I said earlier, if you're dealing with relatively small documents, you probably won't notice any performance gain, but once you get into large documents, the performance benefits are huge.
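
One common XSLT 1.0 workaround for the external-document caveat is to change the context node to the external document before calling key(). The sketch below assumes the Telephone elements live in a separate, hypothetical file named telephones.xml; neither that file nor this layout is part of the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

   <!-- Key declarations apply to every document the stylesheet reads,
        including documents loaded with document() -->
   <xsl:key name="MatchedTelephone" match="Telephone" use="@territory"/>

   <xsl:template match="/">
      <Phonebook>
         <xsl:apply-templates select="World/Country/Territory"/>
      </Phonebook>
   </xsl:template>

   <xsl:template match="Territory">
      <!-- Capture the name first, because the context node changes below -->
      <xsl:variable name="CurrentTerritory" select="@name"/>
      <!-- Hypothetical file; a relative URI is resolved against the stylesheet -->
      <xsl:for-each select="document('telephones.xml')">
         <!-- The context is now the external document, so key() searches it -->
         <xsl:for-each select="key('MatchedTelephone', $CurrentTerritory)">
            <AreaCode value="{@area-code}" province="{$CurrentTerritory}"/>
         </xsl:for-each>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

The outer xsl:for-each iterates over a single node (the root of the external document) and exists only to move the context there.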


2 Comments:

Anonymous Anonymous said...

very useful article and well explained - thanks

4:07 AM

 
Anonymous Deepa said...

That is real good explanation. Really useful as the w3 definition is not making much sense.

10:03 AM

 
