Warriors of Streaming XPath Order


Daniel writes about performant (and inevitably streaming) XML processing, introducing XSEReader (aka Xml Streaming Events Reader). He hasn't published the implementation itself yet, only teasing with samples of its usage, but I think I get the idea. Basically, I know what he's talking about: I've been playing with such beasts, making all kinds of mistakes, and I finally came up with a solution that I think is good but haven't published yet. Why? Because I'm tired of publishing spoilers :) It's based on "ForwardOnlyXPathNavigator", aka XPathNavigator over XmlReader, which Dare is going to write about in the MSDN XML Dev Center, and I'm waiting until that's published.

Maybe I'm mistaken, but anyway, here is the idea: "ForwardOnlyXPathNavigator" is an XPathNavigator implementation over XmlReader, which obviously supports only a forward-only subset of XPath. A fellow developer of mine wrote one, so maybe we should publish it anyway. With such a navigator it's easy to write a class (I called it XPathFilter) that lets you register callbacks for specific nodes, identified by XPath patterns. XPathFilter traverses the XML document, moving the ForwardOnlyXPathNavigator in document order, and on each node that matches any registered pattern it calls the callback method. In the callback it's possible to skip or modify the matched node, just like in an ordinary SAX filter. I've implemented an XmlUpdater class based on this technique, and it has proven effective at modifying huge XML documents on the fly. For instance, here is how I can change an element into an attribute:

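// XmlUpdater and NodeMatchedEventHandler are my own (not yet published) classes.
FileStream output = File.Create("ot2.xml");
XmlUpdater updater = new XmlUpdater(File.OpenRead("otbig.xml"), output);
// Call MyHandler for every node matching this path.
updater.AddHandler("/tstmt/book/chapter/chtitle",
    new NodeMatchedEventHandler(MyHandler));
updater.Start();
...
// Instead of copying the matched element, emit its text as a "title" attribute.
public static void MyHandler(XmlUpdater xu,
        XPathNavigator nav, XmlWriter w) {
    w.WriteAttributeString("title", nav.Value);
}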
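
To make the callback idea a bit more concrete, here is a rough sketch of the same pattern written directly against XmlReader and XmlWriter. It is not XmlUpdater and not ForwardOnlyXPathNavigator: the SimplePathUpdater class, its AddHandler/Run signatures and the Demo class are hypothetical names made up for this illustration. It assumes much less than real XPath: only absolute child paths like /a/b/c are matched (no predicates, no wildcards), a matched element is assumed to contain text only, and comments and processing instructions are dropped for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the XmlUpdater idea, not the real class.
public sealed class SimplePathUpdater
{
    private readonly Dictionary<string, Action<string, XmlWriter>> handlers =
        new Dictionary<string, Action<string, XmlWriter>>();

    // Registers a callback for an absolute element path such as "/a/b/c".
    public void AddHandler(string path, Action<string, XmlWriter> handler)
    {
        handlers[path] = handler;
    }

    // Copies reader to writer in document order, calling handlers on matches.
    public void Run(XmlReader reader, XmlWriter writer)
    {
        var open = new List<string>(); // names of the currently open elements
        reader.Read();
        while (!reader.EOF)
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    open.Add(reader.Name);
                    string path = "/" + string.Join("/", open);
                    if (handlers.TryGetValue(path, out var handler))
                    {
                        open.RemoveAt(open.Count - 1);
                        // Reads the text and leaves the reader on the node
                        // after the end tag, so we must not call Read() again.
                        string value = reader.ReadElementContentAsString();
                        handler(value, writer);
                        continue;
                    }
                    bool isEmpty = reader.IsEmptyElement;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName,
                        reader.NamespaceURI);
                    writer.WriteAttributes(reader, false);
                    if (isEmpty)
                    {
                        writer.WriteEndElement();
                        open.RemoveAt(open.Count - 1);
                    }
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    open.RemoveAt(open.Count - 1);
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                // Other node types are skipped in this sketch.
            }
            reader.Read();
        }
    }
}

public static class Demo
{
    public static string Transform(string xml)
    {
        var updater = new SimplePathUpdater();
        updater.AddHandler("/tstmt/book/chapter/chtitle",
            (value, w) => w.WriteAttributeString("title", value));

        // Ignoring whitespace keeps the parent's start tag open when the
        // handler runs, so an attribute can still be written to it.
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (var writer = XmlWriter.Create(sb, writerSettings))
        {
            updater.Run(reader, writer);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Transform(
            "<tstmt><book><chapter><chtitle>Genesis 1</chtitle>" +
            "<v>In the beginning</v></chapter></book></tstmt>"));
        // prints <tstmt><book><chapter title="Genesis 1"><v>In the beginning</v></chapter></book></tstmt>
    }
}
```

Note the limitation this sketch makes visible: turning an element into an attribute only works while the parent's start tag is still open in the writer, that is, when the matched element is the first child of its parent. Anything written before it (even whitespace) closes the start tag, and WriteAttributeString would then throw.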

After I had played enough with that stuff and implemented it, I discovered that the BizTalk 2004 Beta classes contain a much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in a much richer way than the trivial XmlReader stack does. I only wonder why they are not in System.Xml v2. Are there any reasons why they are still hidden deep inside BizTalk 2004? I'll probably have to evangelize them a bit, as I really like this idea.

Anyway, back to XSEReader. What I like about this approach is that it's a streaming, event-based one (do I still miss SAX?). What I dislike are the proprietary XPath-like patterns such as ":*" (why not *.* ?) and "^kzu:*", and XPath-like sugar such as RootedPath(), RelativePath() etc. I think XPath is the way to go; there's no need to reinvent the wheel. Anyway, let's wait until Daniel unveils all the API and implementation details.
