XML Bestiary: IndexingXPathNavigator

| 7 Comments | 18 TrackBacks

Here I go again with another experimental creature from my XML Bestiary: IndexingXPathNavigator. This one was inspired by Mark Fussell's "Indexing XML, XML ids and a better GetElementByID method on the XmlDocument class". I've been experimenting with Mark's extended XmlDocument, played a bit with XPathDocument and "idkey()" extension function Mark was talking about. Finally I came to a conclusion that 1)XPath is the way to go (that' not the first time I say it, right? :) and thus what should be extended is XPathNavigator; 2)no need to reinvent the wheel as XSLT's keys is proved excellent stuff.
That is what IndexingXPathNavigator is - XPathNavigator, augmented with XSLT keys functionality: it supports declaring keys, lazy or eager indexing and retriving indexed nodes via key() function all as per familiar and proved XSLT semantics.

Intro

IndexingXPathNavigator is a wrapper around any XPathNavigator, augmenting it with indexing functionality. Such architecture allows seamless indexing of any IXPathNavigable XML store, such as XPathDocument, XmlDocument or XmlDataDocument. The overall semantics behind should be very familiar to any developer acquainted with XSLT as IndexingXPathNavigator behavior literally follows XSLT 1.0 specification.

The main scenario when IndexingXPathNavigator is meant to be used is uniform repetitive XPath selections from loaded in-memory XML document, such as selecting orders by orderID from an XmlDocument. Using IndexingXPathNavigator with preindexed selections allows drastically decrease selection time and to achieve O(n) perf.

How it works

First one have to declare keys, according to which the indexing occurs. This is done using
IndexingXPathNavigator.AddKey(string keyname, string match, string use) method, where
keyname is the name of the key,
match is an XPath pattern, defining the nodes to which this key is applicable and
use is an XPath expression used to determine the value of the key for each matching node.
E.g. to index XML document by "id" attribute value, one can declare a key, named "idKey" as follows: AddKey("idKey", "*", "@id"). This method works identically to XSLT's <xsl:key> instruction. Essentially a key can be thought as a collection of node-value pairs. During indexing IndexingXPathNavigator builds a Hashtable for each named key, which contains nodes and associated values and allows to retrieve nodes by key value. You may define as many keys as you need including keys with the same name.

After all keys are declared, IndexingXPathNavigator is ready for indexing. Indexing process is performed as follows - each node in XML document is matched against all key definitions. For each matching node, key value is calculated and this node-value pair is added into appropriate Hashtable. As can be seen indexing is not a cheap operation, it involves walking through the whole XML tree, multiple node matching and XPath expression evaluating. That's the usual indexing price. Indexing can be done in either lazy (first access time) or eager (before any selections) manner.

After indexing IndexingXPathNavigator is ready for node retrieving. IndexingXPathNavigator augments XPath with standard XSLT's key(string keyname, object keyValue) function, which allows to retrieve nodes directly from built indexes (Hashtables) by key value. The function is implemented as per XSLT spec.

Usage sample

Here is common usage pattern, which I used also to test the performance of the IndexingXPathNavigator. There is an XML document, northwind.xml, which is XML version of the Northwind database. It contains lots of orders, each one consisting of order ID, order date and shipping address:
<Item>
  <OrderID> 10952</OrderID>
  <OrderDate> 4/15/96</OrderDate>
  <ShipAddress> Obere Str. 57</ShipAddress>
</Item>
The aim is to select shipping address for an order by order ID. Here is how it can be implemented with IndexingXPathNavigator:
XPathDocument doc = new XPathDocument("test/northwind.xml");
IndexingXPathNavigator inav = new IndexingXPathNavigator(
  doc.CreateNavigator());
//Declare a key named "orderKey", which matches Item elements and 
//whose key value is value of child OrderID element
inav.AddKey("orderKey", "OrderIDs/Item", "OrderID"); 
//Indexing
inav.BuildIndexes();
//Selection
XPathNodeIterator ni =  nav.Select("key('orderKey', ' 10330')/ShipAddress");
while (ni.MoveNext())
  Console.WriteLine(ni.Current.Value);

Performance test

Here are results of my testing. northwind.xml is 240Kb and contains 830 orders. I'm searching shipping address for an order, whose OrderID is 10330 (it's almost in the middle of the XML document) using regular XPathNavigator and IndexingXPathNavigator (code above, full code in the download). My PC is 533MHz/256M RAM Win2K box, here are the results:
Loading XML document: 167.12 ms
Regular selection, warming...
Regular selection: 1000 times, total time 5371.79 ms, 1000 nodes selected
Regular selection, testing...
Regular selection: 1000 times, total time 5181.80 ms, 1000 nodes selected
Building IndexingXPathNavigator:   1.03 ms
Adding keys:   5.16 ms
Indexing:  58.21 ms
Indexed selection, warming...
Indexed selection: 1000 times, total time 515.90 ms, 1000 nodes selected
Indexed selection, testing...
Indexed selection: 1000 times, total time 476.06 ms, 1000 nodes selected
As can be seen, average selection time for regular XPath selection is 5.181 ms, while for indexed selection it's 0.476 ms. One order of magnitude faster! Note additionally that XML document is very simple and regular and I used /ROOT/CustomerIDs/OrderIDs/Item[OrderID=' 10330']/ShipAddress XPath for regular selection, which is almost linear search and is probably the most effective from XPath point of view. With more complex XML structure and XPath expressions such as //Item[OrderID=' 10330']/ShipAddress the difference would be even more striking.

Download

I wanted to make IndexingXPathNavigator a subproject of MVP XML library, but failed. I forgot I can't work with CVS over SSH from my work due to closed SSH port. Grrrmm. Ok, after all it's experimental implementation. Let's see if there will be any substantial feedback first.

Full source code along with perf testing. As usual two download locations available: local one and from GotDotNet (will update later). IndexingXPathNavigator homepage is http://www.tkachenko.com/dotnet/IndexingXPathNavigator.html.


I really like this one. Probably that's what my next article is going to be about. What do you think? I'm cap in hand waiting for comments.

Related Blog Posts

18 TrackBacks

TrackBack URL: http://www.tkachenko.com/cgi-bin/mt-tb.cgi/196

Take Outs for 16 March from Enjoy Every Sandwich on March 17, 2004 9:14 AM

You have been Taken Out! Thanks for the post. Read More

Take Outs for 16 March 2004 Read More

TITLE: re: Toward a Generic XML Content Indexing System URL: http://weblogs.asp.net/jezell/archive/2004/03/21/93420.aspx IP: 66.129.67.202 BLOG NAME: Jesse Ezell Blog DATE: 03/21/2004 11:51:52 AM Read More

IndexingXPathNavigator reloaded from Daniel Cazzulino .NET and XML digress on April 19, 2004 7:02 PM

TITLE: IndexingXPathNavigator reloaded URL: http://weblogs.asp.net/cazzu/archive/2004/04/19/116001.aspx IP: 66.129.67.203 BLOG NAME: Daniel Cazzulino .NET and XML digress DATE: 04/19/2004 07:02:10 PM Read More

TITLE: Mvp.Xml project: IndexingXPathNavigator reloaded URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/116001.aspx IP: 66.129.67.202 BLOG NAME: DATE: 04/28/2004 02:33:16 AM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/2004/05/15/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 05/15/2004 06:02:41 PM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 05/31/2004 09:15:59 PM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 06/22/2004 07:29:03 PM Read More

TITLE: IndexingXPathNavigator reloaded URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/116001.aspx IP: 66.129.67.202 BLOG NAME: DATE: 06/22/2004 07:36:48 PM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 06/24/2004 10:23:29 PM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 07/04/2004 04:40:36 PM Read More

TITLE: IndexingXPathNavigator reloaded URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/116001.aspx IP: 66.129.67.203 BLOG NAME: DATE: 07/09/2004 03:27:32 PM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 07/14/2004 07:15:37 AM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 08/02/2004 11:22:29 AM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 08/02/2004 11:24:17 AM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: DATE: 08/02/2004 11:24:53 AM Read More

TITLE: Mvp.Xml project: packed XML cool utilities URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/132532.aspx IP: 66.129.67.202 BLOG NAME: eXtensible mind DATE: 11/14/2004 04:28:14 AM Read More

TITLE: IndexingXPathNavigator reloaded URL: http://weblogs.asp.net/cazzu/archive/0001/01/01/116001.aspx IP: 66.129.67.203 BLOG NAME: eXtensible mind DATE: 11/14/2004 04:30:49 AM Read More

7 Comments

The IndexingXPathNavigator is now part of the Mvp.Xml project: http://mvp-xml.sf.net

I like this a lot. Perfect for scenarios where you are caching reasonably big xml, non-volatile structures and hitting them frequently.

We do this quite a lot with a number of our solutions.

No, I believe such "beast" is really useful one in a scenario when only pure XPath querying takes place. IndexingXPathNavigator is preffered here over XSLT, because XSLT would index document on each run, while IndexingXPathNavigator can build indexes once and then is ready for queries effectively as many times as needed.

> Btw, talking about variables in XPath
> Visualizer - it's also possible to provide
> custom variables in XPath expressions with
> XPathNavigator.
> It leads to "parametrized compiled XPath
> expressions". That could be useful...

Why shold one have to use such "beasts"? Only to avoid the necessity to learn XSLT? :)

Oleg, you and I will use the true thing -- XSLT. There are other people, who need cludges ... In the meantime in doing so they are going to cost a lot to their employers.

Cheers,
Dimitre.

Yeah, that reminds me it too.
Btw, talking about variables in XPath Visualizer - it's also possible to provide custom variables in XPath expressions with XPathNavigator.
It leads to "parametrized compiled XPath expressions". That could be useful when evaluating the same expression many times - no need to compile it over and over again, instead it's possible to provide different variable values via context.

Nice Oleg,

This reminds me of the XPath Visualizer, which supports user-defined xsl:key and xsl:variable and references to them (keys are referenced by the key() function) in XPath expressions.

This was implemented three years ago and remains one of the most powerful features of the XPath Visualizer.

I am especially glad to see your conclusion that "XPath is the way to go".

And of course, it was quite clear that using xsl:key there's no real need for DTDs, ID, IDREF, IDREFS and the id() function.

Why are standing around waiting? Go, Man, Go! This is great stuff!

Leave a comment