Let's Get Technical: DB2 9 and pureXML : Part 1

I'm not going to start this post by speaking about how important XML is, or how it is the backbone of the service oriented architectures of so many enterprises. I'm not going to get into this, not because I disagree with these statements, but because I think they are generally accepted notions in today's IT industry. I think what is important to talk about is how we can more efficiently work with this technology, and how we can further integrate it into the enterprise information archictecture.

IBM is not about to be left behind and they have recently released DB2 9 (code-named Viper) which includes something they have coined 'pureXML'.

XML data requires the same coveted qualities of service that relational databases provide: high availability, reliability, protection, and performance. The pureXML™ technology in DB2 9 unlocks the latent potential of XML by providing simple efficient access to XML with the same levels of security, integrity, and resiliency taken for granted with relational data. With pureXML innovation in DB2 9, your organization can expect breakthrough increases in availability, speed, and versatility of your XML information, along with dramatically reduced development and administrative costs.

With the addition of a new 'xml' datatype you can now store XML content in a table. Having this option allows you to overcome many issues that arise when working with XML. Having the ability to store XML in a table, rather than files, we can now write more scaleable applications, and deal with concurrency and security issues more efficiently.

Besides these advantages, the addition of pureXML let's us actually query the XML data that is stored in the table with regular SQL (SQL/XML a.k.a. SQL with XML Extensions) or XQuery. In this first post of a two part series, I will explain how to query data stored in an XML column of a DB2 9 table using SQL and SQL/XML.

Let's say that we have a table defined as follows:

CREATE TABLE Catalog (
   id INT PRIMARY KEY NOT NULL,
   catalogname VARCHAR(40),
   company VARCHAR(40),
   contents XML
)

Now let's say that the'contents' XML column contains the following data:

<?xml version="1.0" encoding="UTF-8"?>
<Catalog>
   <Product>
      <ProductID>100</ProductID>
      <ProductName>Columbian Coffee</ProductName>
      <ProductDescription>A bold blend of Columbia's finest beans.</ProductDescription>
      <Price>7.99</Price>
   </Product>
   <Product>
      <ProductID>200</ProductID>
      <ProductName>Kona Coffee</ProductName>
      <ProductDescription>A mild blend of Hawaii's tropical java.</ProductDescription>
      <Price>10.99</Price>
   </Product>
</Catalog>

Despite having the new XML column as part of the 'Catalog' table, you can still make regular SQL queries such as:

SELECT * FROM Catalog
WHERE company = 'World Coffes Inc.'

Obviously this query will return all of the rows where 'company' is equal to 'World Coffee Inc.'. Keep in mind that the data as stored in the XML column, 'contents' will also be returned by the query, letting the programmer handle parsing the markup in some traditional manner - which frankly, isn't very exciting.

But would you say if I could return the names of all of the products contained in the 'contents' XML offered by 'World Coffee Inc.'. Now that is pretty exciting!

Before we write a query to do this, let's define SQL/XML.

As the name implies, SQL/XML is designed to bridge between the SQL and XML worlds. It evolved as part of the SQL standard effort and now includes specifications for embedding XQuery or XPath expressions within SQL statements. XPath is a language for navigating XML documents to find elements or attributes. XQuery includes support for XPath.

Now, back to the query:

   SELECT xmlquery('$c/Catalog/Product/ProductName
   PASSING contents as "c")
   FROM Catalog
   WHERE company = 'World Coffee Inc.'

Now to explain what this query actually means. The first line uses the built in pureXML function, xmlquery to state what part of the XML we are interested in. We pass in the XPath query, Catalog/Product/ProductName, to specify what node we are interested in - in this case, the ProductName.

The PASSING contents as "c" simply lets us specify the table's 'contents' column as the source for the XML. The $c which is plugged in front of our XPath statement is simply a variable reference to the PASSING clause.

The rest of the query is nothing more than regular SQL syntax.

Let's look at another example where we want to return all 'catalognames' that contain products with prices over $10.

   SELECT catalogname
   FROM Catalog
   WHERE xmlexists(
      '$c/Catalog/Product[Price > 10]'
      PASSING Catalog.contents as "c"
   )

You can probably figure out the nuances of this query based on what you learned from the previous example. Something that you may not recognize is the xmlexists function which evaluates to 'true' if the XPath expression, Catalog/Product[Price > 10] is valid. So we are essentially ensuring that we are retrieving only the 'catalogname' where the product's price is greater than $10.

These two examples only scratch the surface of what can be accomplished in DB2 9 and pureXML, but I hope it has given you a taste of just how effective they can be. By downloading some of this XML processing to the database level, developers can take advantage of the database engine's power.

In Part 2 of DB2 9 and pureXML I will introduce some examples of querying XML data with XQuery and see why DB2 9 is considered a bilingual product. If you're interested in further reading, check out the IBM Redbook - DB2 9 : pureXML Overview and Fast Start.

Labels: db2, ibm, purexml, sql, xml, xpath, xquery

Let's Get Technical

Monday, October 23, 2006

DB2 9 and pureXML : Part 1

0 Comments:

About Me

Previous Posts

Subscribe By Email