Let's Get Technical: July 2006

Monday, July 31, 2006

XSLT Key is the Key

Anyone who has used XSLT (eXtensible Stylesheet Language Transformations) knows that it can be a very powerful template language. XSLT is generally used to transform an XML document into another type of document. I've used it to transform XML into other XML formats, HTML, SQL, and even NAnt build files. When you first start using XSLT, it can be a little difficult to grasp the 'template' way of thinking, but with a little practice (as I've discovered), you'll find that it becomes quite simple. And as you work more and more with the technology, you'll start to discover the elements and functions that help you write more efficient stylesheets.

During my early XSLT experience, I found the key() function to be indispensable. W3Schools defines this function as follows:

The key() function returns a node-set from the document, using the index specified by an <xsl:key> element.

This definition may not make much sense on its own, so I'll provide an example. Let's say we have the XML document below, which describes countries, territories, and cities in the world. The document also contains telephone area codes for some of the territories.

<?xml version="1.0" encoding="ISO-8859-1"?>
<World>
   <Country name="Canada">
      <Territory name="British Columbia">
         <City name="Vancouver" />
      </Territory>
      <Territory name="Alberta">
         <City name="Edmonton" />
      </Territory>
      <Territory name="Saskatchewan">
         <City name="Saskatoon" />
      </Territory>
      <Territory name="Manitoba">
         <City name="Winnipeg" />
      </Territory>
      <Territory name="Ontario">
         <City name="Toronto" />
      </Territory>
      <Territory name="Quebec">
         <City name="Montreal" />
      </Territory>
      <Territory name="New Brunswick">
         <City name="Moncton" />
      </Territory>
      <Territory name="Newfoundland and Labrador">
         <City name="St. John's" />
      </Territory>
      <Territory name="Prince Edward Island">
         <City name="Charlottetown" />
      </Territory>
      <Territory name="Nova Scotia">
         <City name="Halifax" />
      </Territory>
   </Country>
   <Country name="United States of America">
      <Territory name="New York">
         <City name="Albany" />
      </Territory>
      <Territory name="California">
         <City name="Los Angeles" />
      </Territory>
      <Territory name="Arizona">
         <City name="Phoenix" />
      </Territory>
      <Territory name="Florida">
         <City name="Miami" />
      </Territory>
   </Country>
   <Telephone area-code="306" territory="Saskatchewan" />
   <Telephone area-code="519" territory="Ontario" />
   <Telephone area-code="905" territory="Ontario" />
   <Telephone area-code="780" territory="Alberta" />
</World>

Now let's say that we want to transform this document into the one below.

<?xml version="1.0" encoding="UTF-8"?>
<Phonebook>
   <Country name="Canada">
      <AreaCode value="780" territory="Alberta" />
      <AreaCode value="306" territory="Saskatchewan" />
      <AreaCode value="519" territory="Ontario" />
      <AreaCode value="905" territory="Ontario" />
   </Country>
   <Country name="United States of America" />
</Phonebook>

The XSLT used to do this would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   <xsl:template match="/">
      <Phonebook>
         <xsl:apply-templates select="World/Country" mode="CountryMatch" />
      </Phonebook>
   </xsl:template>
   <xsl:template match="Country" mode="CountryMatch">
      <Country>
         <xsl:attribute name="name">
            <xsl:value-of select="@name" />
         </xsl:attribute>
         <xsl:apply-templates select="Territory" mode="TerritoryMatch" />
      </Country>
   </xsl:template>
   <xsl:template match="Territory" mode="TerritoryMatch">
      <xsl:variable name="CurrentTerritory" select="@name" />
      <xsl:for-each select="../../Telephone[@territory=$CurrentTerritory]">
         <AreaCode>
            <xsl:attribute name="value">
               <xsl:value-of select="@area-code" />
            </xsl:attribute>
            <xsl:attribute name="territory">
               <xsl:value-of select="$CurrentTerritory" />
            </xsl:attribute>
         </AreaCode>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>

This works just fine, and you may never have a problem with it. However, as the documents you transform grow in size, this technique may not perform as efficiently as you would hope. The line that can cause trouble is:

<xsl:for-each select="../../Telephone[@territory=$CurrentTerritory]">

Walking back up the tree and filtering every Telephone element can be very inefficient with large documents. Suppose you had an XML document that contained all of the countries of the world, all of their territories (states, provinces, etc.), and all of their cities with more than 50,000 people. This would be a very large document. Now imagine the work the XSLT processor would have to do to evaluate the problem statement. For each territory in each country, the processor backs up two levels to World and then compares the 'territory' attribute of every Telephone element against the current territory's name. The number of comparisons grows with the number of territories multiplied by the number of Telephone elements, so doubling both roughly quadruples the work. That's a lot of work.
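To make that cost concrete, here is a rough analogy sketched in Java. It is not how any particular XSLT processor works internally. The Telephone class and the sample data are hypothetical stand-ins for the elements in the sample document. The first method re-examines every Telephone for every territory, as the ../../Telephone[...] predicate asks the processor to do. The second builds a lookup table once and then does a single lookup per territory. That is roughly the idea behind the key() approach described next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupCost {
    /** Hypothetical stand-in for <Telephone area-code="..." territory="..." />. */
    static final class Telephone {
        final String areaCode;
        final String territory;

        Telephone(String areaCode, String territory) {
            this.areaCode = areaCode;
            this.territory = territory;
        }
    }

    /** Predicate style: every territory rescans every Telephone. Returns {matches, comparisons}. */
    static long[] scanEveryTime(List<String> territories, List<Telephone> phones) {
        long matches = 0, comparisons = 0;
        for (String territory : territories) {
            for (Telephone phone : phones) {
                comparisons++;
                if (phone.territory.equals(territory)) matches++;
            }
        }
        return new long[] {matches, comparisons};
    }

    /** Index style: one pass builds a map, then one lookup per territory. Returns {matches, steps}. */
    static long[] buildIndexOnce(List<String> territories, List<Telephone> phones) {
        Map<String, List<Telephone>> byTerritory = new HashMap<>();
        long steps = 0;
        for (Telephone phone : phones) {
            byTerritory.computeIfAbsent(phone.territory, k -> new ArrayList<>()).add(phone);
            steps++;
        }
        long matches = 0;
        for (String territory : territories) {
            matches += byTerritory.getOrDefault(territory, Collections.<Telephone>emptyList()).size();
            steps++;
        }
        return new long[] {matches, steps};
    }

    public static void main(String[] args) {
        List<String> territories = Arrays.asList("Alberta", "Saskatchewan", "Manitoba", "Ontario");
        List<Telephone> phones = Arrays.asList(
            new Telephone("306", "Saskatchewan"), new Telephone("519", "Ontario"),
            new Telephone("905", "Ontario"), new Telephone("780", "Alberta"));
        System.out.println(Arrays.toString(scanEveryTime(territories, phones)));  // [4, 16]
        System.out.println(Arrays.toString(buildIndexOnce(territories, phones))); // [4, 8]
    }
}
```

With n territories and m Telephone elements, the first approach performs n × m comparisons and the second roughly n + m steps. That is 16 versus 8 for this tiny sample, but 1,000,000 versus 2,000 with a thousand of each.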

I ran into this very problem recently while working with extremely large XML documents (around 350 MB). The transformations were part of a nightly NAnt build cycle, and some were taking over an hour to finish. After looking into the problem, I discovered that the culprit was a statement similar to the one above. After a little research, I realized that I should have been using the XSLT key() function. As stated earlier, key() works in conjunction with the xsl:key element and returns the nodes that have a given value for a named key. To make use of the key() function, the XSLT document from above would look slightly different.

At the top level of the stylesheet (as a direct child of the xsl:stylesheet element, outside any template), you would define a key.

<xsl:key name="MatchedTelephone" match="World/Telephone" use="@territory" />

The 'name' attribute simply holds the name of your key. The 'match' attribute holds a match pattern identifying the nodes to be indexed, in this case every Telephone element under World. Finally, the 'use' attribute is an expression, evaluated relative to each matched node, whose value becomes that node's key value. Here, each Telephone element is indexed by its 'territory' attribute.

Now that we have a key defined, we can re-write our problem statement using the key() function as follows:

<xsl:for-each select="key('MatchedTelephone', $CurrentTerritory)">

What we are essentially doing here is saying, for each of the World/Telephone elements where the 'territory' attribute is equal to our current 'territory', create an AreaCode element.
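To check that the keyed stylesheet produces the same output as the original, the sketch below runs both versions through the XSLT processor bundled with the JDK, using the standard javax.xml.transform API. It works from an abridged copy of the sample document. To keep it short, it uses attribute value templates such as {@name} in place of the xsl:attribute blocks. The only differences between the two versions are the optional xsl:key declaration and the for-each select expression. The KeyVsPredicate class and its helper names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class KeyVsPredicate {
    // Abridged copy of the sample World document.
    static final String WORLD =
        "<World><Country name='Canada'>"
      + "<Territory name='Alberta'><City name='Edmonton'/></Territory>"
      + "<Territory name='Saskatchewan'><City name='Saskatoon'/></Territory>"
      + "<Territory name='Ontario'><City name='Toronto'/></Territory>"
      + "</Country>"
      + "<Telephone area-code='306' territory='Saskatchewan'/>"
      + "<Telephone area-code='519' territory='Ontario'/>"
      + "<Telephone area-code='905' territory='Ontario'/>"
      + "<Telephone area-code='780' territory='Alberta'/>"
      + "</World>";

    // Condensed version of the post's stylesheet; only the key declaration and select differ.
    static String stylesheet(String keyDeclaration, String telephoneSelect) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + keyDeclaration
            + "<xsl:template match='/'><Phonebook>"
            + "<xsl:apply-templates select='World/Country' mode='CountryMatch'/>"
            + "</Phonebook></xsl:template>"
            + "<xsl:template match='Country' mode='CountryMatch'>"
            + "<Country name='{@name}'>"
            + "<xsl:apply-templates select='Territory' mode='TerritoryMatch'/>"
            + "</Country></xsl:template>"
            + "<xsl:template match='Territory' mode='TerritoryMatch'>"
            + "<xsl:variable name='CurrentTerritory' select='@name'/>"
            + "<xsl:for-each select=\"" + telephoneSelect + "\">"
            + "<AreaCode value='{@area-code}' territory='{$CurrentTerritory}'/>"
            + "</xsl:for-each></xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Original approach: walk back up and filter every Telephone.
    static final String PREDICATE =
        stylesheet("", "../../Telephone[@territory=$CurrentTerritory]");

    // Keyed approach: top-level xsl:key plus a key() lookup.
    static final String KEYED = stylesheet(
        "<xsl:key name='MatchedTelephone' match='World/Telephone' use='@territory'/>",
        "key('MatchedTelephone', $CurrentTerritory)");

    /** Applies an XSLT 1.0 stylesheet to an XML string and returns the serialized result. */
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String viaPredicate = transform(PREDICATE, WORLD);
        String viaKey = transform(KEYED, WORLD);
        System.out.println(viaKey);
        System.out.println("identical output: " + viaPredicate.equals(viaKey));
    }
}
```

Both versions visit the Telephone elements in document order, so the output should be identical. The speed difference only shows up on much larger inputs.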

Once I made these changes, the transformation of the large document dropped from over an hour to less than 5 minutes. The effect was incredible, to say the least. Just keep in mind that key() only searches the document containing the current context node. It won't find nodes in an external document loaded with the document() function unless you first change the context to a node in that document (for example, with xsl:for-each). As I mentioned earlier, if you're dealing with relatively small documents, you probably won't notice much of a performance gain, but once you get into large documents, the performance benefits are huge.

Labels: performance, xslt

Wednesday, July 26, 2006

The XPath to Cleaner Java Code

It was about three years ago that I got heavily involved in XML. The company I was working for at the time took the leap into SOA, and XML was a major component. I can remember the dilemmas I faced when trying to parse the XML messages being passed around from system to system. There were many options for dealing with the XML, and after evaluating many of them, I decided to use the Document Object Model (DOM).

Granted, using the DOM gives you a lot of power, but writing the code to actually navigate an XML document can be a real hassle. And once you do finally get your logic in place, trying to come back to look at that code a few months later can bring on a serious migraine. No one said maintaining such code would be easy.

Now that I look back on those days, I often wonder why I didn't choose another API such as Xalan or Saxon. Sure, I'd have introduced a dependency on an external engine, locking myself into its API, but I would have had the advantage of using the XML Path Language, more commonly referred to as XPath. Since joining my new company, I have done extensive work with XPath (mostly while writing XSLT in a .NET 2.0 environment), and only then did I fully realize its power. It's safe to say that I didn't fully appreciate it during my initial evaluation.

For those who aren't familiar with it, XPath is a powerful query language for extracting information from an XML document. Just as SQL is a query language optimized for extracting data from a relational database, XPath is optimized for navigating an XML document to find the information you're interested in. An article in IBM's technical XML library on their developerWorks network made a great analogy:

If you send someone out to purchase a gallon of milk, what would you rather tell that person? "Please go buy a gallon of milk." Or, "Exit the house through the front door. Turn left at the sidewalk. Walk three blocks. Turn right. Walk one half block. Turn right and enter the store. Go to aisle four. Walk five meters down the aisle. Turn left. Pick up a gallon jug of milk. Bring it to the checkout counter. Pay for it. Then retrace your steps home." That's ridiculous. Most adults are intelligent enough to procure the milk on their own with little more instruction than "Please go buy a gallon of milk."

This gives a great sense of how much simpler it can be to write an XPath expression than to write complicated DOM code. Let's say I had an XML document that contained a list of cities, within a list of territories, within a list of countries. If I wanted to find all of the cities listed in Ontario, my XPath statement would look something like this (assuming 'world' is the root element of the XML document, and that each country and territory stores its name in a 'name' attribute):

/world/country[@name='Canada']/territory[@name='Ontario']/city
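If you want to try a query like that one, here is a minimal sketch using the javax.xml.xpath package, which ships with Java 5 and later. The small world document is hypothetical sample data, and the sketch assumes names are stored in a 'name' attribute. The OntarioCities class name is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OntarioCities {
    // Hypothetical sample document; names live in a 'name' attribute.
    static final String WORLD =
        "<world><country name='Canada'>"
      + "<territory name='Ontario'><city name='Toronto'/><city name='Ottawa'/></territory>"
      + "<territory name='Alberta'><city name='Edmonton'/></territory>"
      + "</country></world>";

    /** Evaluates the XPath query and returns the name of each matching city. */
    static List<String> citiesInOntario(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList cities = (NodeList) xpath.evaluate(
                "/world/country[@name='Canada']/territory[@name='Ontario']/city",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < cities.getLength(); i++) {
                names.add(((Element) cities.item(i)).getAttribute("name"));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(citiesInOntario(WORLD)); // [Toronto, Ottawa]
    }
}
```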

I'm not even going to get into writing the DOM code for something like this. I think you can imagine what it would look like. I sure can, because I got plenty of practice writing it during the project I mentioned earlier. If you don't have many lookups to do, using the DOM directly is fine, and as a developer it is your decision to weigh your options. Do you want to write code that will be difficult to maintain, or do you want to get locked into an external API and its particular engine?

Well, with Sun's introduction of Java 5, this decision may have gotten a lot easier. This release of the Java platform includes the javax.xml.xpath package, which has everything you need to perform XPath queries right out of the box. No longer will you have to rely on an external engine to take advantage of XPath. I just wish it had been available when I was trying to make that decision three years ago.
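As a small taste of the package, the sketch below compiles one XPath expression, count(city), and reuses it against every territory element. It uses XPathConstants.NUMBER to get each count back as a Double. The sample document and the CityCounts class are hypothetical. Compiling once is simply a common pattern when the same expression is evaluated many times; the API does not require it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CityCounts {
    // Hypothetical sample document.
    static final String WORLD =
        "<world><country name='Canada'>"
      + "<territory name='Ontario'><city name='Toronto'/><city name='Ottawa'/></territory>"
      + "<territory name='Alberta'><city name='Edmonton'/></territory>"
      + "</country></world>";

    /** Maps each territory name to its number of city children, in document order. */
    static Map<String, Integer> citiesPerTerritory(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Compiled once, then evaluated with each territory element as the context node.
            XPathExpression countCities = xpath.compile("count(city)");
            NodeList territories = (NodeList) xpath.evaluate("/world/country/territory",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (int i = 0; i < territories.getLength(); i++) {
                Element territory = (Element) territories.item(i);
                Double n = (Double) countCities.evaluate(territory, XPathConstants.NUMBER);
                counts.put(territory.getAttribute("name"), n.intValue());
            }
            return counts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(citiesPerTerritory(WORLD)); // {Ontario=2, Alberta=1}
    }
}
```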

For further information on using javax.xml.xpath, check out the IBM developerWorks article "The Java XPath API" and Sun's official Javadocs for Java 5.

Labels: java, java xpath api, xpath

Thursday, July 20, 2006

AJAX Still Gaining Momentum

If you are a developer and haven't heard of AJAX (Asynchronous JavaScript and XML), perhaps it's time you crawled out from under that rock. AJAX technologies have been around for a few years now, but within the last 12 months or so they have really started to take off, and all of the major players seem to have embraced the idea. If you head to Sun's Java site, the landing page is filled with links to AJAX articles, tips, and tools. In fact, today the site contains an announcement that Sun has joined the OpenAJAX Alliance and the Dojo Foundation. I guess Sun thinks AJAX is relevant enough to "actively participate in these two communities to help drive open standards for AJAX programming and increase interoperability across AJAX technologies." And let's not forget the other major players. Microsoft's Atlas framework is offered for free and integrates seamlessly with ASP.NET 2.0. Google wasn't about to miss the boat either, recently releasing the Google Web Toolkit. With GWT, "You write your front end in the Java programming language, and the GWT compiler converts your Java classes to browser-compliant JavaScript and HTML."

I really think that AJAX applications are going to continue to flood the web. All the coolest stuff out there seems to make use of it in one way or another, such as Google Maps, Gmail, Google Calendar, and Zip.ca. As these applications find a place on the internet, gone are the days of differentiating between desktop applications and web applications, or at least we'll be a lot closer.

I have yet to dive into the world of AJAX from a hands-on perspective, but I have read a lot of material on the subject and have started to put together some preliminary work. Actually, I take that back. I did a significant amount of "AJAX" development about three years ago (before it was actually called AJAX). I developed a web application for three agriculturalists that allowed users to create quotes for crop hail insurance, compare prices, and finally purchase the insurance. Users could change different values and have all of the information on the page update, without an annoying page refresh.

The book Professional Ajax describes the hidden frame technique and considers it the first asynchronous request/response model for web applications:

As developers began to understand how to manipulate frames, a new technique emerged to facilitate client-server communication. The hidden frame technique involved setting up a frameset where one frame was set to a width or height of 0 pixels, and its sole purpose was to initiate communication with the server. The hidden frame would contain an HTML form with specific form fields that could be dynamically filled out by JavaScript and submitted back to the server. When the frame returned, it would call another JavaScript function to notify the original that data had been returned. The hidden frame technique represented the first asynchronous request/response model for web applications.

Even though this is far from current AJAX frameworks, which make use of an AJAX engine, I think it is safe to say that I was an early adopter of this revolutionary model. As I learn more about how to implement it, I can't help but wonder how much cooler I could make the hail insurance system mentioned above. I've pitched the idea to the three guys who own the company, and they showed real interest once I explained the advantages it could offer. It also makes for a much easier sell when you show them some of the revolutionary web apps that are already out there.

Labels: ajax, google web toolkit, javascript, microsoft atlas, openajax alliance

Tuesday, July 18, 2006

We Have Lift-off

Information Technology has been a huge part of my life for some time now. I've been programming since grade school, so I guess it was only natural that this would blossom into a career. I'm thirty-one now and have been in the IT business professionally for about seven years. My career saw me move halfway across the country and then back to where I grew up, letting me experience different parts of the country as well as gain valuable technical skills.

To start my career off, I was part of the IT department of a large insurance / financial company, and I have recently joined a small, cutting edge software company. Needless to say, I have seen two extremes of the different IT work environments. I think testing out the waters of two different types of companies will help me make the final decision on what type of company I would like to spend the rest of my career with. At this point, it's a tough decision since both environments have so many different advantages.

Anyway, I didn't want to get into my life story; I just wanted to start off with a brief background. In terms of technical work, I've had the privilege of experiencing a wide spectrum of technologies. I started off on a development team building a fat client using Java Swing. I then moved on to building distributed systems leveraging J2EE and web services. I now find myself wrapped up in the .NET world, focusing heavily on data syndication. In the meantime, I always find time for a few consulting side projects, usually J2EE web systems.

I'm lucky to say that I wake up every day eager to get to the office to see what awaits me. I see no sign of slowing down anytime soon, and if that time ever comes, I guess it will be time to find a new career. Be sure to check back (or better yet, subscribe to this feed) for my upcoming technical articles.

Take care.

Labels: software development, technology
