February 21, 2004

nxslt.exe Command Line Utility

Dummy entry to provide a single place for nxslt.exe utility comments. ...

February 20, 2004

"XQuery and XPath Formal Semantics" goes Last Call Working Draft

The "XQuery 1.0 and XPath 2.0 Formal Semantics" spec has been updated today and has reached Last Call Working Draft status. This is a document you may want to read to get a deep understanding of the semantics of the XQuery 1.0 and XPath 2.0 languages: This document defines the semantics of [XPath/XQuery] by giving ...

XQuery for simple problems only?

Here is what Michael Kay (XSLT star, developer of Saxon, author of every-XSLT-dev-bible "XSLT Programmer's Reference" and XSLT 2.0 editor) writes about XQuery: The strength of XQuery is that it is a simpler language than XSLT, which makes it much more feasible to implement efficient searching of very large XML ...

RenderX ports XEP XSL-FO formatter to .NET

RenderX, the company behind the famous XEP XSL-FO formatter, plans to release a .NET version. Great news! XEP is the best production-quality Java XSL-FO formatter I've ever seen. It's not inexpensive, but it covers XSL-FO way better than the free Apache FOP (I have to add "unfortunately", being one of ...

February 19, 2004

BizTalk 2004 launch on March 2, 2004

BizTalk Server 2004 will launch on March 2, 2004. At last! And to get us up to speed, 8 BizTalk 2004 MSDN webcasts are arranged between March 2 and March 5! Here is the first developer treat: As part of the launch there will be an MSDN BizTalk Server Developer ...

XML Bestiary Updated

I've updated my XML Bestiary in response to users' feedback and my own. First of all I renamed WritableXPathNavigator to SerializableXPathNavigator. That's a much less confusing name IMO. Besides that I unified all distributions (the same namespace, project structure etc). More beasts to come soon, I've got several growing up ...

February 18, 2004

Streaming XInclude and Intra-document References

It's definitely love-to-streaming-strikes-back day today. Here is another sample of how the streaming XML processing approach fails. The only XInclude feature still not implemented in the XInclude.NET project is intra-document references. And basically I have no idea how to implement it in the .NET pull environment (as well as Elliotte Rusty Harold has ...

XInclude allows the following constructs:

<root>
   <element id="bar"/>
   <xi:include xpointer="bar"/>
</root>

After XInclude processing the above XML should resolve to:

<root>
   <element id="bar"/>
   <element id="bar"/>
</root>

This is called an intra-document reference. An <xi:include> instruction having no href attribute refers to the same document that is currently being processed. That opens a Pandora's box of implications that basically prevents streaming XInclude processing altogether, as one obviously can't arbitrarily navigate over an XML stream, neither with XmlReader nor with SAX. "bar" as an XPointer is a shorthand pointer, pointing to the element with the "bar" ID. (Btw, XInclude processing is recursive, so the same way it may point to another <xi:include> element in the same document, causing double processing of the same <xi:include> instruction.)

As the core class of XInclude.NET, XIncludingReader, is just an XmlReader, how on earth can I go backward in a forward-only XmlReader? Seems like to implement this feature I have to cache the source XML document as a whole. Too bad.
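
To make the cache-the-whole-document option concrete, here is a minimal sketch. It is not XInclude.NET code: the IntraDocumentIncludeSketch class and its Resolve method are hypothetical names, the XInclude namespace URI is assumed to be http://www.w3.org/2001/XInclude, and an attribute literally named "id" is assumed to play the role of the ID (a real processor would rely on DTD- or schema-declared ID types). It only handles shorthand pointers and does not process includes recursively.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of XInclude.NET: resolves intra-document
// shorthand references by loading the whole document into memory.
public static class IntraDocumentIncludeSketch
{
    // Assumption: the XInclude namespace used by the source document.
    const string XIncludeNs = "http://www.w3.org/2001/XInclude";

    public static string Resolve(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Snapshot the include elements first, because the tree is modified below.
        List<XmlElement> includes = new List<XmlElement>();
        foreach (XmlNode n in doc.GetElementsByTagName("include", XIncludeNs))
            includes.Add((XmlElement)n);

        foreach (XmlElement inc in includes)
        {
            // Only xi:include without href refers to the current document.
            if (inc.HasAttribute("href")) continue;

            string id = inc.GetAttribute("xpointer");
            // Assumption: an attribute named "id" marks the target element.
            // The id is pasted into the XPath as-is, so ids containing
            // an apostrophe are not handled by this sketch.
            XmlNode target = doc.SelectSingleNode("//*[@id='" + id + "']");
            if (target == null) continue;

            // Replace the include instruction with a deep copy of the target.
            inc.ParentNode.ReplaceChild(target.CloneNode(true), inc);
        }
        return doc.OuterXml;
    }
}
```

Random access to the target is only possible here because the whole tree is in memory, which is exactly the price a forward-only reader can't pay.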

ForwardXPathNavigator vs XSE: a class vs API

Meanwhile I managed to create a simple dummy online demo of the ForwardXPathNavigator (an XPathNavigator implementation over XmlReader) I was talking about. Here it is. ...

It lets you test what ForwardXPathNavigator can and cannot select. Upload the XML document you'd like to test (please don't abuse it by loading huge ones), then enter an XPath expression and click the "RunQuery" button. I know it looks bad in Mozilla, but I have no idea how to insert a transformation result into an HTML page so it gets styled in Mozilla too. There are lots of issues, such as namespace declarations not being shown, but come on, that's not an online XPath tutorial, just a simple demo.

Talking about difference between XSE and ForwardXPathNavigator, Daniel writes:

Back to the issue, there's a fundamental difference in the approach between his class and my XSE API: his will consume the stream with a single query. Mine supports multiple handlers matching multiple elements at the same time. And it's still a pull-based API, where you have to iterate results, instead of being called when something you care happened (was matched).
Well, ForwardXPathNavigator wasn't designed to be compared with XSE! It's a simple poor man's (XmlReader-based) XPathNavigator. But as an XPathNavigator it lets you not only evaluate XPath queries, but also navigate node by node over XML. I was planning to build an XSE-like system based on ForwardXPathNavigator. Actually I must admit I didn't get much further than proving the concept and don't have code to publish yet (in the face of the brilliant XSE impl :). The idea behind XmlUpdater/XPathFilter was the following: just navigate over the XML using ForwardXPathNavigator and check whether each node matches any registered XPath pattern. On each matched node, call the callback method associated with the pattern, providing it with enough context to do what it wants: to skip the node (transparency), to modify it etc.
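
The match-and-callback loop can be sketched roughly like this. This is not the unpublished XPathFilter code: XPathFilterSketch, NodeMatched, AddHandler and Run are hypothetical names, and a standard in-memory XPathDocument navigator stands in for ForwardXPathNavigator, so the sketch shows only the pattern matching, not the streaming part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical callback type: receives the navigator positioned on the matched node.
public delegate void NodeMatched(XPathNavigator node);

// Hypothetical sketch of the XPathFilter idea, not the actual implementation.
public class XPathFilterSketch
{
    private readonly List<KeyValuePair<string, NodeMatched>> handlers =
        new List<KeyValuePair<string, NodeMatched>>();

    // Registers a callback for nodes matching the given XPath pattern.
    public void AddHandler(string pattern, NodeMatched handler)
    {
        handlers.Add(new KeyValuePair<string, NodeMatched>(pattern, handler));
    }

    public void Run(TextReader xml)
    {
        // Stand-in for ForwardXPathNavigator: an ordinary in-memory navigator.
        XPathNavigator root = new XPathDocument(xml).CreateNavigator();

        // Visit every element in document order.
        XPathNodeIterator it = root.SelectDescendants(XPathNodeType.Element, false);
        while (it.MoveNext())
        {
            foreach (KeyValuePair<string, NodeMatched> h in handlers)
            {
                // Matches() tests the current node against the pattern;
                // the pattern string is recompiled on each call, which a
                // real implementation would avoid by caching expressions.
                if (it.Current.Matches(h.Key))
                    h.Value(it.Current);
            }
        }
    }
}
```

Registering a handler then looks much like the XmlUpdater API: filter.AddHandler("/tstmt/book/chapter/chtitle", handler), followed by filter.Run(reader).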

I found pattern matching a cheap enough operation and the whole prototype quite satisfying. What I dislike is the too-fragile nature of ForwardXPathNavigator. It's forward-only, so XPath patterns and the whole application must be very carefully defined with the forward-only concept in mind, which isn't a usual concept when working with XPath, right? Funny thing: ForwardXPathNavigator may move irrevocably when you're just inspecting its properties in the debugger! The Count property of XPathNodeIterator obviously becomes unusable too. To put it another way, it's too hard to work with this stuff. And the benefits are not so striking, by the way. Maybe that's my bad design, dunno...

XSE idea

Here Daniel clarifies things about XSE: XSE is not about querying with an specific expression language/format (i.e. XPath or SXPath). XSE is just a mechanism for encapsulating state machines checking for matches against a given expression. What the expression looks like depends on the factory that creates the strategy ...

The Man's patenting XML?

Looks like Microsoft's patenting its XML investments. Recently we had a hubbub about Office 2003 schemas patenting, then XML scripting. Daniel, like many others, feels alarmed; you too? Well, I'm not. Patenting software ideas is a stupid thing, but that's a matter of the imperfect reality we live in. Everything is patented ...

New XQuery book

Michael Brundage's excellent XQuery reference book is finally available. [Via Michael Rys] Dr. Rys is talking about the just-published (February 2004) "XQuery : The XML Query Language" book. Michael Brundage is Technical Lead for XQuery processing at Microsoft and the recommendations are so weighty... I feel I want this book ...

February 16, 2004

RE: Streaming XPath and ForwardXPathNavigator

Ok, Dare clarified things a great deal in his "Combining XPath-based Filtering with Pull-based XML Parsing" post: Actually Oleg is closer and yet farther from the truth than he realizes. Although I wrote about a hypothetical ForwardOnlyXPathNavigator in my article entitled Can One Size Fit All? for XML Journal my planned article ...

So, it's a forward-only XPath subset and BizTalk's XPathReader isn't hidden. Nice to hear.
I wonder who this guy is. He's definitely an expert in the area. Why doesn't he blog? I'm looking forward to seeing the article; what a pity the XML dev center is postponed.

When the article describing the XPathReader is done it will provide source and if there is interest I'll create a GotDotNet Workspace for the project although it is unlikely I nor the dev who originally wrote the code will have time to maintain it.
I'm volunteering here. I think it's an important option to have in XML processing under .NET.

Meanwhile Daniel has released the XSE stuff at last (btw, I'm wondering whether I should adopt a hype-before-release strategy? :). Really interesting. But I still believe XPath (a forward-only subset of course) is the way to go.

Anyway, here is the ForwardXPathNavigator I was talking about - ForwardXPathNavigator.zip. It's written by my buddy dev Vladimir Nesterovsky. And here are some basic samples.

Selecting feed titles from RSSBandit feed list (pure forward-only selection):

XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed/r:title");
expr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    Console.WriteLine(ni.Current.Value);
}

Obviously ForwardXPathNavigator doesn't allow you to peek at following or preceding nodes. All it stores is the current node the XmlReader is positioned at and some details about its direct ancestors. As Dare pointed out, expressions such as /r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title are not supported, because they cannot be evaluated in a forward-only manner. That wasn't ForwardXPathNavigator's goal anyway. In fact such a query can be done in a forward-only way to some extent, but not without help from the host environment. E.g. to select the most viewed feeds, one can select each feed, store its title, then calculate count(r:stories-recently-viewed/r:story) and determine if the feed is popular enough to be selected:

XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new 
    XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed");
expr.SetContext(nsm);
XPathExpression countExpr = 
    nav.Compile("count(r:stories-recently-viewed/r:story)");
countExpr.SetContext(nsm);
XPathExpression titleExpr = 
    nav.Compile("string(r:title)");
titleExpr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    string title = ni.Current.Evaluate(titleExpr) as string;
    if ((double)ni.Current.Evaluate(countExpr) > 20)
        Console.WriteLine(title);
}

Not so elegant (mostly because of the lack of an XPathNavigator.Select(string, XmlNamespaceManager) method), but still feasible. Btw, introducing some extension function which could control ForwardXPathNavigator's cache would be quite interesting. Something like /r:feeds/r:feed[ext:store(r:title)][count(r:stories-recently-viewed)>10]/r:title. It's a pity XPath doesn't allow creating variables...

As I said, ForwardXPathNavigator keeps some track of ancestor nodes (name, attributes etc), thus enabling some limited backward selections, such as /r:feeds/r:feed[r:title='The XML Files']/@category! I'm going to provide a small aspx page where ForwardXPathNavigator can be tested online by anyone interested.
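
To illustrate why remembering ancestors is enough for such a query, here is a rough sketch over a plain XmlReader. It is not how ForwardXPathNavigator is implemented: AncestorTrackingSketch, Frame and CategoriesOf are hypothetical names, namespaces are ignored (only local names are compared), and each title is assumed to arrive as a single text node.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical sketch: answers feed[title='...']/@category in one forward pass
// by remembering the name and attributes of every open ancestor element.
public static class AncestorTrackingSketch
{
    private sealed class Frame
    {
        public string Name;
        public Dictionary<string, string> Attributes = new Dictionary<string, string>();
    }

    public static List<string> CategoriesOf(TextReader xml, string wantedTitle)
    {
        List<string> result = new List<string>();
        List<Frame> ancestors = new List<Frame>(); // used as a stack
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        Frame frame = new Frame();
                        frame.Name = reader.LocalName;
                        // Must be read before moving onto the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        while (reader.MoveToNextAttribute())
                            frame.Attributes[reader.Name] = reader.Value;
                        // Empty elements produce no EndElement, so don't push them.
                        if (!isEmpty) ancestors.Add(frame);
                        break;

                    case XmlNodeType.EndElement:
                        ancestors.RemoveAt(ancestors.Count - 1);
                        break;

                    case XmlNodeType.Text:
                        int depth = ancestors.Count;
                        // Text inside feed/title: the feed start tag is long gone
                        // from the stream, but its attributes were remembered.
                        if (depth >= 2
                            && ancestors[depth - 1].Name == "title"
                            && ancestors[depth - 2].Name == "feed"
                            && reader.Value == wantedTitle)
                        {
                            string category;
                            if (ancestors[depth - 2].Attributes.TryGetValue("category", out category))
                                result.Add(category);
                        }
                        break;
                }
            }
        }
        return result;
    }
}
```

The attribute lives on the feed start tag, which the reader has already passed by the time the title text is seen, so only the remembered ancestor details make this "backward" step possible.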

Tomorrow I'll go on spinning up the topic by presenting XmlUpdater (which is based on ForwardXPathNavigator), a SAX-filter-like approach to modifying XML on the fly.

February 15, 2004

nxslt 1.4 released

I've released version 1.4 of the nxslt.exe utility. It's a maintenance release. Changes are: Updated to EXSLT.NET 1.0.1. Updated to XInclude.NET 1.2. Updated project to Microsoft Visual Studio .NET 2003 (so now nxslt.exe can be built directly from VS.NET, no need to run nmake manually - EXSLT methods renaming such as nodeSet() to ...

Warriors of Streaming XPath Order

Daniel writes about performant (and inevitably streaming) XML processing, introducing XSEReader (aka Xml Streaming Events Reader). While he hasn't published the implementation itself yet and is only teasing with samples of its usage, I think I get the idea. Basically I know what he's talking about. I've been playing with such ...

Maybe I'm mistaken, but anyway here is the idea: "ForwardOnlyXPathNavigator" is an XPathNavigator implementation over XmlReader, which obviously supports only a forward-only XPath subset. My fellow developer wrote one, so maybe we should publish it anyway. Having such a navigator, it's easy to write a class (I called it XPathFilter) which lets you register callbacks for specific nodes, identified by an XPath pattern. XPathFilter traverses the XML document, moving ForwardOnlyXPathNavigator in document order, and on each node matching any registered pattern it calls the callback method. In the callback it's possible to skip or modify the matched node, just like in an ordinary SAX filter. I've implemented an XmlUpdater class based on this technique and it has proven to be effective at modifying huge XML documents on the fly. For instance here is how I can change an element into an attribute:

FileStream output = File.Create("ot2.xml");
XmlUpdater updater = new XmlUpdater(File.OpenRead("otbig.xml"), 
    output);
updater.AddHandler("/tstmt/book/chapter/chtitle", 
    new NodeMatchedEventHandler(MyHandler));
updater.Start();
...
public static void MyHandler(XmlUpdater xu, 
        XPathNavigator nav, XmlWriter w) {
    w.WriteAttributeString("title", nav.Value);
}

And after I had played enough with that stuff and implemented it, I discovered that the BizTalk 2004 Beta classes contain a much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in a much richer way than the trivial XmlReader stack does. I only wonder why they are not in System.Xml v2? Are there any reasons why they are still hidden deep inside BizTalk 2004? Probably I have to evangelize them a bit, as I really like this idea.

Anyway, back to XSEReader. What I like in this approach is that it's a streaming, event-based one (do I still miss SAX?). What I dislike is proprietary XPath-like patterns like ":*" (why not *.* ?), "^kzu:*", and XPath-like sugar like RootedPath(), RelativePath() etc. I think XPath is the way to go, no need to reinvent the wheel. Anyway, let's wait until Daniel unveils all the API and impl details.