February 16, 2004

RE: Streaming XPath and ForwardXPathNavigator

OK, Dare clarified things a great deal in his "Combining XPath-based Filtering with Pull-based XML Parsing" post:

"Actually Oleg is closer and yet farther from the truth than he realizes. Although I wrote about a hypothetical ForwardOnlyXPathNavigator in my article entitled Can One Size Fit All? for XML Journal my planned article ..."

So it's a forward-only XPath subset, and BizTalk's XPathReader isn't hidden. Nice to hear.
I wonder who this guy is. He's definitely an expert in the area. Why doesn't he blog? I'm looking forward to seeing the article; what a pity the XML dev center is postponed.

"When the article describing the XPathReader is done it will provide source and if there is interest I'll create a GotDotNet Workspace for the project although it is unlikely I nor the dev who originally wrote the code will have time to maintain it."

I'm volunteering here. I think it's an important option to have for XML processing under .NET.

Meanwhile, Daniel has released the XSE stuff at last (btw, I'm wondering whether I should adopt the hype-before-release strategy too :). Really interesting. But I still believe XPath (a forward-only subset, of course) is the way to go.

Anyway, here is the ForwardXPathNavigator I was talking about: ForwardXPathNavigator.zip. It was written by my buddy and fellow dev Vladimir Nesterovsky. And here are some basic samples.

Selecting feed titles from the RSSBandit feed list (a pure forward-only selection):

XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed/r:title");
expr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    Console.WriteLine(ni.Current.Value);
}

Obviously, ForwardXPathNavigator doesn't allow you to peek at following or preceding nodes. All it stores is the current node the XmlReader is positioned at and some details about its direct ancestors. As Dare pointed out, expressions such as /r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title are not supported, because they cannot be evaluated in a forward-only manner: the predicate can only be decided after the reader has already moved past the nodes to be selected. That wasn't ForwardXPathNavigator's goal anyway. Such a query can in fact be done in a forward-only way to some extent, but not without help from the host environment. E.g. to select the most viewed feeds, one can select each feed, store its title, then calculate count(r:stories-recently-viewed/r:story) and determine whether the feed is popular enough to be selected:

XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new 
    XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed");
expr.SetContext(nsm);
XPathExpression countExpr = 
    nav.Compile("count(r:stories-recently-viewed/r:story)");
countExpr.SetContext(nsm);
XPathExpression titleExpr = 
    nav.Compile("string(r:title)");
titleExpr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
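while (ni.MoveNext()) {
    // Store the title first, while the reader is still at the start of the feed.
    string title = ni.Current.Evaluate(titleExpr) as string;
    // Then count the recently viewed stories and decide in the host code.
    if ((double)ni.Current.Evaluate(countExpr) > 20)
        Console.WriteLine(title);
}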

Not so elegant (mostly because of the lack of an XPathNavigator.Select(string, XmlNamespaceManager) method), but still feasible. Btw, introducing an extension function that could control ForwardXPathNavigator's cache would be quite interesting, something like /r:feeds/r:feed[ext:store(r:title)][count(r:stories-recently-viewed)>10]/r:title. It's a pity XPath doesn't allow creating variables...

As I said, ForwardXPathNavigator keeps track of some ancestor node details (name, attributes, etc.), thus enabling some limited backward selections, such as /r:feeds/r:feed[r:title='The XML Files']/@category! I'm going to provide a small aspx page where ForwardXPathNavigator can be tested online by anyone interested.
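
To illustrate why remembering ancestor details is enough for such a "backward" selection, here is a minimal sketch that uses nothing but a plain XmlReader and a stack of open elements. It is not ForwardXPathNavigator's actual implementation (I'm only guessing at the general idea); the AncestorTrackingSketch class, its ElementInfo helper and the SelectCategories method are hypothetical names made up for this example, and I use XmlReader.Create and generics rather than the XmlTextReader from the samples above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration only; not part of ForwardXPathNavigator.
public static class AncestorTrackingSketch
{
    const string Ns = "http://www.25hoursaday.com/2003/RSSBandit/feeds/";

    // Everything we remember about an open element: its name, its attributes
    // and its direct text so far. Nothing else from the document is kept.
    sealed class ElementInfo
    {
        public string LocalName;
        public string NamespaceUri;
        // Keyed by local name only; good enough for a sketch.
        public Dictionary<string, string> Attributes = new Dictionary<string, string>();
        public StringBuilder Text = new StringBuilder();
    }

    static bool Is(ElementInfo e, string localName)
    {
        return e.LocalName == localName && e.NamespaceUri == Ns;
    }

    // Emulates /r:feeds/r:feed[r:title = wantedTitle]/@category in a single pass.
    public static List<string> SelectCategories(TextReader xml, string wantedTitle)
    {
        var result = new List<string>();
        var open = new List<ElementInfo>(); // stack of currently open elements

        using (XmlReader r = XmlReader.Create(xml))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element)
                {
                    var info = new ElementInfo { LocalName = r.LocalName, NamespaceUri = r.NamespaceURI };
                    bool isEmpty = r.IsEmptyElement; // read before moving to attributes
                    while (r.MoveToNextAttribute())
                        info.Attributes[r.LocalName] = r.Value;
                    r.MoveToElement();
                    if (!isEmpty)
                        open.Add(info); // empty elements produce no EndElement node
                }
                else if ((r.NodeType == XmlNodeType.Text || r.NodeType == XmlNodeType.CDATA)
                         && open.Count > 0)
                {
                    open[open.Count - 1].Text.Append(r.Value);
                }
                else if (r.NodeType == XmlNodeType.EndElement)
                {
                    ElementInfo closed = open[open.Count - 1];
                    open.RemoveAt(open.Count - 1);
                    // After the pop, open[0] and open[1] are the ancestors of 'closed'.
                    if (open.Count == 2
                        && Is(open[0], "feeds") && Is(open[1], "feed") && Is(closed, "title")
                        && closed.Text.ToString() == wantedTitle)
                    {
                        // The feed's start tag is long gone, but its attributes were kept.
                        string category;
                        if (open[1].Attributes.TryGetValue("category", out category))
                            result.Add(category);
                    }
                }
            }
        }
        return result;
    }
}
```

Calling AncestorTrackingSketch.SelectCategories(new StreamReader("feedlist.xml"), "The XML Files") should then return the category of each matching feed. Memory use grows with the depth of the document, not with its size, which is the whole point of the forward-only approach.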

Tomorrow I'll keep spinning the topic by presenting XmlUpdater (which is based on ForwardXPathNavigator), a SAX-filter-like approach to modifying XML on the fly.