February 26, 2004

Streaming XInclude gets blessing

At last some good news. The streaming subset of XInclude I was talking about gets a blessing from the W3C XML Core WG. Here is what Jonathan Marsh (MSFT, editor of XInclude) writes: It appears to be impossible to improve streamability without removing functionality from XInclude. The WG decided instead to bless ...

February 25, 2004

XInclude.NET logo contest

Well, I know I stink at graphics. Yesterday I tried to develop a logo for the XInclude.NET project and here is what I ended up with. The idea was about Lego and integration of parts into a round thing, whatever. I'd like to hear what you guys think about this ...

Bad Monday and XInclude.NET development

When your hard disk dies on Monday morning, that's a nice start to the week. Low-level tasks like recovering your data and sources, reinstalling and configuring all the stuff you cannot work without... Refreshing. Basically I've recovered already. Surprisingly I now cannot install Office 2003, it says "You've got McAfee VirusScan Enterprise installed, Office ...

Anyway, back to XML. I've been fixing one nasty bug in XInclude.NET and at the same time implemented document caching (for documents that are loaded into memory anyway, i.e. when XPointer is involved). It's very natural indeed to have several partial includes from the same document:

<xi:include href="books.xml" xpointer="xpointer(/catalog/@title)"/>
<xi:include href="books.xml" xpointer="xpointer(/catalog/@count)"/>
To select a fragment of a document identified by an XPointer pointer, XInclude.NET translates the XPointer to XPath, loads the document into an XmlDocument and evaluates the XPath selection. Obviously the XmlDocument should be cached in this scenario.
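
The caching idea can be sketched roughly like this. It's a minimal illustration, not the actual XInclude.NET code: the DocumentCache class and its members are hypothetical names, the cache key is assumed to be an already resolved URI, and the XPath strings stand for the result of the XPointer translation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not part of XInclude.NET.
public class DocumentCache
{
    // Parsed documents keyed by URI (assumed to be resolved to absolute form).
    private readonly Dictionary<string, XmlDocument> docs =
        new Dictionary<string, XmlDocument>();

    public XmlDocument Get(string uri)
    {
        XmlDocument doc;
        if (!docs.TryGetValue(uri, out doc))
        {
            doc = new XmlDocument();
            doc.Load(uri);          // parsed only on the first request
            docs.Add(uri, doc);
        }
        return doc;
    }

    // Evaluates an XPath (already translated from XPointer) on the cached document.
    public XmlNodeList Select(string uri, string xpath)
    {
        return Get(uri).SelectNodes(xpath);
    }

    public static void Main()
    {
        // A throwaway file standing in for books.xml.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<catalog title='Books' count='2'/>");

        DocumentCache cache = new DocumentCache();
        // Two partial includes, one parse.
        Console.WriteLine(cache.Select(path, "/catalog/@title")[0].Value);
        Console.WriteLine(cache.Select(path, "/catalog/@count")[0].Value);
    }
}
```

Both selections hit the same XmlDocument instance, so the second include costs only an XPath evaluation.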

Well, maybe there is also a point in caching documents that are included as a whole. It's not so common to include the same document more than once, but who knows. That means no more streaming include, but it makes it possible to implement the XInclude intra-document references I was complaining about. I think that's a nice option to have.

So I'm planning to implement a non-streaming XInclude mode for XInclude.NET too. In fact it's amazingly simple to build in-memory XML processing on top of streaming processing. Actually all I need to do is to cache the source document and pipe it to XIncludingReader via XmlNodeReader. That's it.
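
A minimal sketch of the piping part, assuming only the standard System.Xml classes: XmlNodeReader exposes an in-memory XmlDocument through the XmlReader API, so any XmlReader consumer can read the cached tree as if it were a stream. The consumer here is a plain loop; plugging in XIncludingReader is left out, since its constructor signature is not shown in this post.

```csharp
using System;
using System.Xml;

public class NodeReaderDemo
{
    public static void Main()
    {
        // The cached source document.
        XmlDocument cached = new XmlDocument();
        cached.LoadXml("<root><element id=\"bar\"/></root>");

        // Read the in-memory tree through the pull API.
        XmlReader reader = new XmlNodeReader(cached);
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                Console.WriteLine(reader.Name);
        }
    }
}
```

The tree stays available for random access while the reader walks it, which is exactly what intra-document references need.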

With that mode XInclude.NET would be able to perform XML Inclusions in a streaming way (w/o support for intra-doc refs) or a non-streaming way (full support). Users could measure and decide which mode is appropriate for their particular project.

If I weren't so lazy, it would be doable to move from XmlDocument to XPathDocument, gaining approximately 30% more perf. For that I need an XmlReader over XPathNavigator. I've seen several, so why not.

The bottom line - XInclude.NET 1.3 is coming.

February 21, 2004

nxslt.exe Command Line Utility

Dummy entry to provide single place for nxslt.exe utility comments. ...

February 20, 2004

"XQuery and XPath Formal Semantics" goes Last Call Working Draft

"XQuery 1.0 and XPath 2.0 Formal Semantics" spec has been updated today and reached Last Call Working Draft status. This is a document you may want to read to get deep understanding of semantics of XQuery 1.0 and XPath 2.0 languages: This document defines the semantics of [XPath/XQuery] by giving ...

XQuery for simple problems only?

Here is what Michael Kay (XSLT star, developer of Saxon, author of every-XSLT-dev-bible "XSLT Programmer's Reference" and XSLT 2.0 editor) writes about XQuery: The strength of XQuery is that it is a simpler language than XSLT, which makes it much more feasible to implement efficient searching of very large XML ...

RenderX ports XEP XSL-FO formatter to .NET

RenderX, the company behind the famous XEP XSL-FO formatter, plans to release a .NET version. Great news! XEP is the best production quality Java XSL-FO formatter I've ever seen. It's not inexpensive, but it covers XSL-FO way better than free Apache FOP (I have to add "unfortunately", being one of ...

February 19, 2004

BizTalk 2004 launch on March 2, 2004

BizTalk Server 2004 will launch on March 2, 2004. At last! And to get us up to speed, 8 BizTalk 2004 MSDN webcasts are arranged between March 2 and March 5! Here is the first developer treat: As part of the launch there will be an MSDN BizTalk Server Developer ...

XML Bestiary Updated

I've updated my XML Bestiary as a consequence of users' and my own feedback. First of all I renamed WritableXPathNavigator to SerializableXPathNavigator. That's a much less confusing name IMO. Besides that I unified all distributions (the same namespace, project structure etc). More beasts to come soon, I've got several growing up ...

February 18, 2004

Streaming XInclude and Intra-document References

It's definitely love-to-streaming-strikes-back day today. Here is another sample of how the streaming XML processing approach fails. The only XInclude feature still not implemented in the XInclude.NET project is intra-document references. And basically I have no idea how to implement it in .NET pull environment (as well as Elliotte Rusty Harold has ...

XInclude allows the following constructs:

<root>
   <element id="bar"/>
   <xi:include xpointer="bar"/>
</root>
After XInclude processing the above XML should resolve to
<root>
   <element id="bar"/>
   <element id="bar"/>
</root>
This is called an intra-document reference. An <xi:include> instruction with no href attribute refers to the document currently being processed. That opens a Pandora's box of implications that basically prevents streaming XInclude processing altogether, as one obviously can't arbitrarily navigate over an XML stream, neither with XmlReader nor with SAX. "bar" as an XPointer is a shorthand pointer, pointing to the element with the "bar" ID. (Btw, XInclude processing is recursive, so in the same way it may point to another <xi:include> element in the same document, causing double processing of the same <xi:include> instruction).

As the core class of XInclude.NET, XIncludingReader, is just an XmlReader, how on earth can I go backward in a forward-only XmlReader??? Seems like to implement this feature I have to cache the source XML document as a whole. Too bad.
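
To show why caching the whole document makes the problem trivial, here is a rough in-memory sketch. It is an illustration under loud assumptions, not XInclude.NET code: the class and method names are made up, an attribute literally named "id" is treated as the ID (there is no DTD to declare it), the final XInclude namespace URI is assumed, and recursion and xi:fallback are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public class IntraDocumentInclude
{
    const string XIncludeNs = "http://www.w3.org/2001/XInclude";

    public static XmlDocument Resolve(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("xi", XIncludeNs);

        // Copy the matches first, because the loop below modifies the tree.
        List<XmlElement> includes = new List<XmlElement>();
        foreach (XmlElement e in
                 doc.SelectNodes("//xi:include[not(@href)][@xpointer]", nsm))
            includes.Add(e);

        foreach (XmlElement include in includes)
        {
            // Shorthand pointer: a bare name. Quotes in it are not escaped here.
            string id = include.GetAttribute("xpointer");
            // Assumption: the "id" attribute plays the role of the ID.
            XmlNode target = doc.SelectSingleNode("//*[@id='" + id + "']");
            if (target != null)
                include.ParentNode.ReplaceChild(target.CloneNode(true), include);
        }
        return doc;
    }

    public static void Main()
    {
        XmlDocument result = Resolve(
            "<root xmlns:xi='http://www.w3.org/2001/XInclude'>" +
            "<element id='bar'/><xi:include xpointer='bar'/></root>");
        // Number of <element> children after inclusion.
        Console.WriteLine(result.SelectNodes("/root/element").Count);
    }
}
```

With the tree in memory, "going backward" is just another XPath query; the price is that the source is no longer processed as a stream.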

ForwardXPathNavigator vs XSE: a class vs API

Meanwhile I managed to create a simple dummy online demo of the ForwardXPathNavigator (an XPathNavigator implementation over XmlReader) I was talking about. Here it is. ...

It allows you to test what ForwardXPathNavigator can and cannot select. Upload the XML document you'd like to test (please don't abuse it by loading huge ones), then enter an XPath expression and click the "RunQuery" button. I know it looks bad in Mozilla, but I have no idea how to insert a transformation result into an HTML page so it gets styled in Mozilla too. There are lots of issues, such as namespace declarations not being shown etc. Come on, that's not an online XPath tutorial, just a simple demo.

Talking about difference between XSE and ForwardXPathNavigator, Daniel writes:

Back to the issue, there's a fundamental difference in the approach between his class and my XSE API: his will consume the stream with a single query. Mine supports multiple handlers matching multiple elements at the same time. And it's still a pull-based API, where you have to iterate results, instead of being called when something you care happened (was matched).
Well, ForwardXPathNavigator wasn't designed to be compared with XSE! It's a simple poor man's (XmlReader) XPathNavigator. But as an XPathNavigator it allows not only evaluating XPath queries, but navigating node by node over XML too. I was planning to build an XSE-like system based on ForwardXPathNavigator. Actually I must admit I didn't get far beyond proving the concept and don't have code to publish yet (in the face of the brilliant XSE impl :). The idea behind XmlUpdater/XPathFilter was the following: just navigate over XML using ForwardXPathNavigator and check each node to see if it matches any registered XPath pattern. On each matched node call the callback method associated with the pattern, providing it with enough context to do what it wants - to skip the node (transparency), to modify it etc.
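
The register-patterns-and-dispatch idea can be sketched with stock classes. This is not XmlUpdater or XPathFilter: PatternDispatcher and NodeMatched are hypothetical names, and an ordinary in-memory XPathDocument navigator stands in for ForwardXPathNavigator, so the sketch shows the matching loop only, not the streaming part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical callback type for illustration.
public delegate void NodeMatched(XPathNavigator node);

public class PatternDispatcher
{
    private readonly List<XPathExpression> patterns = new List<XPathExpression>();
    private readonly List<NodeMatched> callbacks = new List<NodeMatched>();

    public void AddHandler(string pattern, NodeMatched callback)
    {
        patterns.Add(XPathExpression.Compile(pattern));
        callbacks.Add(callback);
    }

    // Visits elements in document order and fires the callback
    // of every registered pattern the current node matches.
    public void Run(XPathNavigator root)
    {
        XPathNodeIterator nodes =
            root.SelectDescendants(XPathNodeType.Element, false);
        while (nodes.MoveNext())
        {
            for (int i = 0; i < patterns.Count; i++)
            {
                if (nodes.Current.Matches(patterns[i]))
                    callbacks[i](nodes.Current);
            }
        }
    }

    public static void Main()
    {
        XPathDocument doc = new XPathDocument(new StringReader(
            "<tstmt><book><chapter><chtitle>One</chtitle></chapter></book></tstmt>"));
        PatternDispatcher dispatcher = new PatternDispatcher();
        dispatcher.AddHandler("/tstmt/book/chapter/chtitle",
            delegate(XPathNavigator n) { Console.WriteLine(n.Value); });
        dispatcher.Run(doc.CreateNavigator());
    }
}
```

A real filter would also hand the callback a writer and a way to skip the node; here the callback only sees the matched navigator.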

I found pattern matching to be a cheap enough operation and the whole prototype quite satisfying. What I dislike is the too fragile nature of ForwardXPathNavigator. It's forward-only, so XPath patterns and the whole application must be defined very carefully with the forward-only concept in mind, which is not a usual concept when working with XPath, right? Funny thing - ForwardXPathNavigator may move irrevocably when you are just inspecting its properties in the debugger! The Count property of XPathNodeIterator obviously becomes unusable too. To put it another way - it's too hard to work with this stuff. And the benefits are not so striking by the way. Maybe that's my bad design, dunno...

XSE idea

Here Daniel clarifies things about XSE: XSE is not about querying with an specific expression language/format (i.e. XPath or SXPath). XSE is just a mechanism for encapsulating state machines checking for matches against a given expression. What the expression looks like depends on the factory that creates the strategy ...

The Man's patenting XML?

Looks like Microsoft's patenting its XML investments. Recently we had a hubbub about Office 2003 schemas patenting, then XML scripting. Daniel, like many others, feels alarmed, you too? Well, I'm not. Patenting software ideas is a stupid thing, but that's a matter of the imperfect reality we live in. Everything is patented ...

New XQuery book

Michael Brundage's excellent XQuery reference book is finally available. [Via Michael Rys] Dr. Rys is talking about just published (February 2004) "XQuery : The XML Query Language" book. Michael Brundage is Technical Lead for XQuery processing at Microsoft and the recommendations are so weighty... I feel I want this book ...

February 16, 2004

RE: Streaming XPath and ForwardXPathNavigator

Ok, Dare clarified things a great deal in his "Combining XPath-based Filtering with Pull-based XML Parsing" post: Actually Oleg is closer and yet farther from the truth than he realizes. Although I wrote about a hypothetical ForwardOnlyXPathNavigator in my article entitled Can One Size Fit All? for XML Journal my planned article ...

So, it's a forward-only XPath subset and BizTalk's XPathReader isn't hidden. Nice to hear.
I wonder who this guy is. He's definitely an expert in the area. Why doesn't he blog? I'm looking forward to seeing the article, what a pity the XML dev center is postponed.

When the article describing the XPathReader is done it will provide source and if there is interest I'll create a GotDotNet Workspace for the project although it is unlikely I nor the dev who originally wrote the code will have time to maintain it.
I'm volunteering here. I think it's an important option to have in XML processing under .NET.

Meanwhile Daniel has released XSE stuff at last (btw, I'm musing if I have to adopt hype-before-release strategy? :). Really interesting. But I still believe XPath (forward-only subset of course) is the way to go.

Anyway, here is ForwardXPathNavigator I was talking about - ForwardXPathNavigator.zip. It's written by my buddy dev Vladimir Nesterovsky. And here are some basic samples.

Selecting feed titles from RSSBandit feed list (pure forward-only selection):

XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed/r:title");
expr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    Console.WriteLine(ni.Current.Value);
}
Obviously ForwardXPathNavigator doesn't allow you to peek at following or preceding nodes. All it stores is the current node the XmlReader is positioned at and some details about its direct ancestors. As Dare pointed out, expressions such as /r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title are not supported, because they cannot be evaluated in a forward-only manner. That wasn't ForwardXPathNavigator's goal anyway. In fact such a query can be done in a forward-only way to some extent, but not without help from the host environment. E.g. to select the most viewed feeds, one can select each feed, store its title, then calculate count(r:stories-recently-viewed/r:story) and determine if the feed is popular enough to be selected:
XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new 
    XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", 
    "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = 
    nav.Compile("/r:feeds/r:feed");
expr.SetContext(nsm);
XPathExpression countExpr = 
    nav.Compile("count(r:stories-recently-viewed/r:story)");
countExpr.SetContext(nsm);
XPathExpression titleExpr = 
    nav.Compile("string(r:title)");
titleExpr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    string title = ni.Current.Evaluate(titleExpr) as string;
    if ((double)ni.Current.Evaluate(countExpr) > 20)
        Console.WriteLine(title);
}
Not so elegant (mostly because of the lack of an XPathNavigator.Select(string, XmlNamespaceManager) method), but still feasible. Btw, introducing some extension function which could control ForwardXPathNavigator's cache would be quite interesting. Something like /r:feeds/r:feed[ext:store(r:title)][count(r:stories-recently-viewed)>10]/r:title. It's a pity XPath doesn't allow creating variables...
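
For comparison, the same store-the-title-then-count approach can be sketched with a bare XmlReader, where the host code keeps all the state itself. This is only an illustration: the PopularFeeds helper is a hypothetical name, and the feed list layout (feeds/feed/title and feed/stories-recently-viewed/story) is assumed from the XPath expressions above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class FeedScan
{
    const string Ns = "http://www.25hoursaday.com/2003/RSSBandit/feeds/";

    // Returns titles of feeds with more than 'threshold' recently viewed stories.
    public static List<string> PopularFeeds(XmlReader r, int threshold)
    {
        List<string> result = new List<string>();
        string title = "";
        int count = 0;
        bool inTitle = false;
        while (r.Read())
        {
            switch (r.NodeType)
            {
                case XmlNodeType.Element:
                    if (r.NamespaceURI != Ns) break;
                    if (r.LocalName == "feed") { title = ""; count = 0; }
                    else if (r.LocalName == "title" && r.Depth == 2)
                        inTitle = !r.IsEmptyElement;
                    else if (r.LocalName == "story" && r.Depth == 3) count++;
                    break;
                case XmlNodeType.Text:
                    if (inTitle) title += r.Value;   // remember the title for later
                    break;
                case XmlNodeType.EndElement:
                    if (r.NamespaceURI != Ns) break;
                    if (r.LocalName == "title") inTitle = false;
                    else if (r.LocalName == "feed" && count > threshold)
                        result.Add(title);           // decide once the feed is complete
                    break;
            }
        }
        return result;
    }

    public static void Main()
    {
        string xml =
            "<feeds xmlns='" + Ns + "'>" +
            "<feed><title>A</title><stories-recently-viewed>" +
            "<story>1</story><story>2</story></stories-recently-viewed></feed>" +
            "<feed><title>B</title></feed></feeds>";
        foreach (string t in PopularFeeds(XmlReader.Create(new StringReader(xml)), 1))
            Console.WriteLine(t);
    }
}
```

The decision is postponed to the end of each feed element, which is the part a forward-only XPath predicate cannot express on its own.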

As I said, ForwardXPathNavigator keeps some track of ancestor nodes (name, attributes etc), thus enabling some limited backward selections, such as /r:feeds/r:feed[r:title='The XML Files']/@category! I'm going to provide a small aspx page where ForwardXPathNavigator can be tested online by anyone interested.

Tomorrow I'll go on spinning up the topic by presenting XmlUpdater (which is based on ForwardXPathNavigator), a SAX-filter-like approach to modifying XML on the fly.

February 15, 2004

nxslt 1.4 released

I've released nxslt.exe utility version 1.4. It's a maintenance release. Changes are: Updated to EXSLT.NET 1.0.1. Updated to XInclude.NET 1.2. Updated project to Microsoft Visual Studio .NET 2003 (so now nxslt.exe can be built directly from VS.NET, no need to run nmake manually - EXSLT methods renaming such as nodeSet() to ...

Warriors of Streaming XPath Order

Daniel writes about performant (and inevitably streaming) XML processing, introducing XSEReader (aka Xml Streaming Events Reader). While he hasn't published the implementation itself yet, only teasing with samples of its usage, I think I get the idea. Basically I know what he's talking about. I've been playing with such ...

Maybe I'm mistaken, but anyway here is the idea - "ForwardOnlyXPathNavigator" is an XPathNavigator implementation over XmlReader, which obviously supports only a forward-only XPath subset. My fellow developer wrote one, so maybe we should publish it anyway. With such a navigator it's easy to write a class (I called it XPathFilter) which allows registering callbacks for specific nodes, identified by XPath patterns. XPathFilter traverses the XML document, moving ForwardOnlyXPathNavigator in document order, and on each node matching any registered pattern it calls the callback method. In the callback it's possible to skip or modify the matched node, just like in an ordinary SAX filter. I've implemented an XmlUpdater class based on this technique and it has proven to be effective at modifying huge XML documents on the fly. For instance here is how I can change an element into an attribute:

FileStream output = File.Create("ot2.xml");
XmlUpdater updater = new XmlUpdater(File.OpenRead("otbig.xml"), 
    output); 			
updater.AddHandler("/tstmt/book/chapter/chtitle", 
    new NodeMatchedEventHandler(MyHandler));
updater.Start();
...
public static void MyHandler(XmlUpdater xu, 
        XPathNavigator nav, XmlWriter w) {						
    w.WriteAttributeString("title", nav.Value);			
}		

And after I had played enough with and implemented that stuff, I discovered that the BizTalk 2004 Beta classes contain a much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in a much richer way than the trivial XmlReader stack does. I only wonder why they are not in System.Xml v2? Are there any reasons why they are still hidden deep inside BizTalk 2004? Probably I have to evangelize them a bit as I really like this idea.

Anyway, back to XSEReader. What I like in this approach is that it's a streaming event-based one (do I still miss SAX?). What I dislike is proprietary XPath-like patterns like ":*" (why not *.* ?), "^kzu:*", XPath-like sugar like RootedPath(), RelativePath() etc. I think XPath is the way to go, no need to reinvent the wheel. Anyway, let's wait until Daniel unveils all the API and impl details.

February 10, 2004

DevDays 2004 Israel

It's been Microsoft DevDays 2004 in Israel today. Well, DevDay actually. Here are the impressions I got there: One has to get up earlier to not miss the keynote. VS.NET has cool PocketPC emulator. Code Access Security is omnipotent. Lutz Roeder's .NET Reflector may hang out in the middle of ...

February 9, 2004

On Making XML Namespaced On The Fly

This interesting trick has been discussed in microsoft.public.dotnet.xml newsgroup recently. When one has a no-namespaced XML document, such as <?xml version="1.0"?> <foo> <bar>Blah</bar> </foo> there is a trick in .NET, which allows to read such document as if it has some default namespace: <?xml version="1.0"?> <foo xmlns="http://foo.com"> <bar>Blah</bar> </foo> ...

Actually I have no idea when this could be useful, but people keep asking if this can be done (I presume they are neglecting namespaces in fact, but still I think there are real use cases for such functionality). And while it's probably not 100% clean architecturally, there is an effective and simple way. The crux is to read the XML document not as a standalone document, but as an XML fragment, providing an XmlNamespaceManager with the default namespace set up. Here is the code:

string xml = 
@"<?xml version=""1.0""?>
  <foo>
    <bar>Blah</bar>    
  </foo>";
XmlNameTable nt = new NameTable();
XmlNamespaceManager nsm = new XmlNamespaceManager(nt);
nsm.AddNamespace(String.Empty, "http://foo.com");
XmlParserContext ctx = new XmlParserContext(nt, nsm, null, XmlSpace.Default);
XmlTextReader r = new XmlTextReader(xml, XmlNodeType.Document, ctx);
//Read it to XmlDocument to test
XmlDocument doc = new XmlDocument();
doc.Load(r);
doc.Save(Console.Out);
The result is
<?xml version="1.0"?>
<foo xmlns="http://foo.com">
  <bar>Blah</bar>
</foo>

I was quite surprised to see it works even when the XML fragment type is XmlNodeType.Document; IMO it should work only for an XmlNodeType.Element typed XML fragment. With XmlNodeType.Document it looks like inter-document namespace definition nonsense, but anyway, it's a nice and effective trick.

Read more about reading XML fragments with XmlTextReader in MSDN.

February 5, 2004

XML Tips and Tricks. Conditional XPath expressions

I'm introducing another category in my blog - XML Tips and Tricks, where I'm going to post some XML, XPath, XSLT, XML Schema, XQuery etc tips and tricks. I know, many of my readers being real XML gurus know all this stuff (I encourage you to correct me when I'm wrong ...

If you are like me and addicted to writing

return a>b? a : b;
instead of
if (a>b)
    return a;
else
    return b;
then you must be used to grumbling when programming in XSLT, because XPath 1.0 doesn't support conditional expressions (XPath 2.0 does though). The most notorious sample is outputting a value into an HTML table cell - you should ensure it's not empty, otherwise the cell will collapse into nothing in a browser. So one usually ends up with the following verbose pattern:
<xsl:choose>
    <xsl:when test="price != ''">
        <xsl:value-of select="price"/>
    </xsl:when>
    <xsl:otherwise>&#xA0;</xsl:otherwise>
</xsl:choose>
Or when you need to output some value or the "n/a" string if the value is empty. Quite common requirements. Things get even worse when you need to set up a variable conditionally - the only way then is to nest an xsl:choose switch within xsl:variable, thus getting a result tree fragment instead of a nodeset.

But in fact there are tricks to address this XPath 1.0 restriction. Here they are.

For conditional nodesets the trick formula is
$nodeset1[$condition] | $nodeset2[not($condition)]

It's a union of both nodesets, filtered by mutually exclusive conditional expressions. It's easy to see that depending on the boolean value of $condition one nodeset will be selected and the other one filtered out. E.g.

<xsl:variable name="var" select="//foo[$param] | //bar[not($param)]"/>
binds $var to //foo if $param is true and to //bar otherwise.
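
The formula is easy to try outside XSLT too. Here is a small sketch using XmlNode.SelectNodes; since plain SelectNodes has no variable bindings, the condition is substituted into the expression as a literal true() or false(), which is an assumption of this sample, not part of the trick itself.

```csharp
using System;
using System.Xml;

public class ConditionalNodeSet
{
    // Builds "//foo[c] | //bar[not(c)]" with c being true() or false().
    public static XmlNodeList Select(XmlNode context, bool condition)
    {
        string c = condition ? "true()" : "false()";
        return context.SelectNodes("//foo[" + c + "] | //bar[not(" + c + ")]");
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><foo/><bar/></root>");
        Console.WriteLine(Select(doc, true)[0].Name);   // the foo branch
        Console.WriteLine(Select(doc, false)[0].Name);  // the bar branch
    }
}
```

Exactly one of the two predicates is true for any value of the condition, so the union never mixes nodes from both sides.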

For conditional strings or numbers the trick formula is more complicated:
concat( substring($s1, number(not($condition))*string-length($s1)+1),
     substring($s2, number($condition)*string-length($s2)+1) )

While it looks quite convoluted, the idea (Becker's method, after Oliver Becker) is simple - in XPath number(true()) is 1, while number(false()) is 0, and when the second argument of the substring() function is greater than the actual length of the string, an empty string is returned. Hence substring($s1, number(not($condition))*string-length($s1)+1) returns $s1 if $condition is true and an empty string otherwise. Concatenating two such expressions in a mutually exclusive way gives us a conditional string expression.
There is also another variant:
concat( substring($s1, 1, number($condition)*string-length($s1)),
     substring($s2, 1, number(not($condition))*string-length($s2)) )
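
Both variants can be checked with XPathNavigator.Evaluate. This is a small sketch under stated assumptions: the helper names are made up, the strings and the condition are pasted into the expression as literals because there are no variable bindings here, and the strings must not contain apostrophes.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public class ConditionalString
{
    // Any navigator will do, the expressions use no nodes.
    static readonly XPathNavigator Nav = new XmlDocument().CreateNavigator();

    // First variant: the start position is pushed past the end to get "".
    public static string Becker(string s1, string s2, bool condition)
    {
        string c = condition ? "true()" : "false()";
        string a = "'" + s1 + "'", b = "'" + s2 + "'";
        string expr =
            "concat(substring(" + a + ", number(not(" + c + "))*string-length(" + a + ")+1), " +
            "substring(" + b + ", number(" + c + ")*string-length(" + b + ")+1))";
        return (string)Nav.Evaluate(expr);
    }

    // Second variant: the length argument is set to zero to get "".
    public static string Variant(string s1, string s2, bool condition)
    {
        string c = condition ? "true()" : "false()";
        string a = "'" + s1 + "'", b = "'" + s2 + "'";
        string expr =
            "concat(substring(" + a + ", 1, number(" + c + ")*string-length(" + a + ")), " +
            "substring(" + b + ", 1, number(not(" + c + "))*string-length(" + b + ")))";
        return (string)Nav.Evaluate(expr);
    }

    public static void Main()
    {
        Console.WriteLine(Becker("yes", "no", true));
        Console.WriteLine(Becker("yes", "no", false));
        Console.WriteLine(Variant("yes", "no", true));
        Console.WriteLine(Variant("yes", "no", false));
    }
}
```

For a true condition the first substring starts at position 0*3+1 = 1 and returns the whole of "yes", while the second starts at 1*2+1 = 3, past the end of "no", and returns an empty string.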

In practice such expressions can be simplified a great deal though. For instance, to output price if it's not empty or "n/a" otherwise one can use just

<xsl:value-of select="concat(price, 
     substring('n/a', (price!='')*string-length('n/a')+1))"/>

Another interesting trick is to leverage the ability of the msxsl:node-set() (or exslt:node-set()) extension function to convert a string into a text node, which makes the aforementioned conditional nodeset trick usable for strings too. Here is the same sample written using this method:

<xsl:value-of select="concat(price, 
     msxsl:node-set('n/a')[current()/price=''])"/>
Well, probably enough. Hope you had fun looking at these clumsy XPath tricks. Remember that it's the very first version of the XPath language after all, and XPath 2.0 will make these tricks obsolete by bringing in support for conditional expressions, such as
<xsl:value-of select="if ($part/@discounted) 
  then $part/wholesale 
  else $part/retail"/>
Till then it's good to know these tricks.


Quote of the day:

Is it possible to transform a XML document to another XML document using XSLT? How?
:)

February 4, 2004

XML Bestiary: SerializableXPathNavigator - InnerXml/OuterXml for XPathNavigator

Dare has been talking recently about the disconnects developers may feel once they make the shift from tree based (XmlDocument) to cursor based (XPathNavigator) model. My personal XML learning curve has started with DOM (I remember those long convoluted ugly DOM navigational programs I wrote back in Y2K), then I ...

February 3, 2004

I love XmlResolvers

Did you know the XslTransform class allows a custom XmlResolver to return not only a Stream (that's all the default XmlResolver implementation, the XmlUrlResolver class, supports), but also an XPathNavigator? Sounds like an undeservedly undocumented feature. What does it give us? Really efficient advanced XML resolving scenarios such as just mentioned recently on asp.net XML forum ...

On transforming WordML to HTML again

One of the consequences of the revolutionary XML support in Microsoft Office 2003 is the possibility to unlock information in the Microsoft Office System using XML. Most likely that was a deliberate decision to open Office doors for XML technology and I'm sure that's a winning strategy. Talking about transforming WordprocessingML (WordML) to ...

February 1, 2004

MovableType 3.0 Alpha soon

Six Apart has announced MovableType 3.0 Alpha testing is about to begin. Testers such as plugin developers, web standards advocates or just Movable Type users with an active commenting community are invited. Here is a list of upcoming MT 3.0 features. I keep getting 5-10 spam comments a day, so ...

EXSLT.NET rocks

Have you noticed this thread in the microsoft.public.dotnet.xml newsgroup? A guy was trying to get a list of unique values from an XML document of 46000 records using the Muenchian grouping method. For MSXML4 it took 20 seconds, while in .NET 1.0 and 1.1 it effectively hung. Well, as we all know Muenchian method works ...