Anyway, back to XML. I've been fixing one nasty bug in XInclude.NET and at the same time implemented caching of documents (those that are loaded into memory anyway, i.e. when XPointer is involved). It's quite natural to have several partial includes from the same document:
<xi:include href="books.xml" xpointer="xpointer(/catalog/@title)"/>
<xi:include href="books.xml" xpointer="xpointer(/catalog/@count)"/>

To select a fragment of a document identified by an XPointer pointer, XInclude.NET translates the XPointer to XPath, loads the document into an XmlDocument and evaluates the XPath selection. Obviously the XmlDocument should be cached in this scenario.
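A minimal sketch of what such a cache might look like, assuming documents are keyed by their href. The class name DocumentCacheSketch and the loader delegate are hypothetical and are not taken from XInclude.NET itself; a real implementation would probably key on the resolved absolute URI.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of XInclude.NET: caches parsed documents by href.
public class DocumentCacheSketch
{
    private readonly Dictionary<string, XmlDocument> cache =
        new Dictionary<string, XmlDocument>();
    private readonly Func<string, XmlDocument> loader;

    // Number of times the loader was actually invoked (for illustration only).
    public int Loads { get; private set; }

    public DocumentCacheSketch(Func<string, XmlDocument> loader)
    {
        this.loader = loader;
    }

    // Returns the cached document for href, loading it on first request.
    public XmlDocument Get(string href)
    {
        XmlDocument doc;
        if (!cache.TryGetValue(href, out doc))
        {
            doc = loader(href);
            cache[href] = doc;
            Loads++;
        }
        return doc;
    }

    // Evaluates an XPath selection (already translated from XPointer)
    // against the cached document.
    public XmlNodeList Select(string href, string xpath)
    {
        return Get(href).SelectNodes(xpath);
    }
}
```

With this in place, the two includes above would call Select("books.xml", "/catalog/@title") and Select("books.xml", "/catalog/@count"), and the document would be parsed only once.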
Well, maybe there is also a point in caching documents that are included as a whole. It's not so common to include the same document more than once, but who knows. That means no more streaming include, but it allows implementing the XInclude intra-document references I was complaining about. I think it's a nice option to have.
So I'm planning to implement a non-streaming XInclude mode for XInclude.NET too. In fact it's amazingly simple to build in-memory XML processing on top of streaming processing. All I need to do is cache the source document and pipe it to XIncludingReader via XmlNodeReader. That's it.
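The piping idea can be sketched roughly as follows. This shows only the System.Xml part: an XmlNodeReader replays a cached XmlDocument as a forward-only reader that any XmlReader consumer can read. Handing that reader to XIncludingReader is the step described above and is left out here; the class and method names are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical sketch: replay a cached in-memory document as a forward-only stream.
public static class NodeReaderPipeSketch
{
    // Wraps the cached tree in a reader; the consumer sees an ordinary XmlReader.
    public static XmlReader OpenCached(XmlDocument cached)
    {
        return new XmlNodeReader(cached);
    }

    // Stand-in for a streaming consumer: counts element nodes while reading forward.
    public static int CountElements(XmlReader reader)
    {
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                count++;
            }
        }
        return count;
    }
}
```

Since the tree stays in memory, a fresh XmlNodeReader can be opened over it as many times as needed, which is what makes going "backward" possible.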
With that mode XInclude.NET would be able to perform XML Inclusions either in a streaming way (without support for intra-document references) or in a non-streaming way (full support). Users can measure and decide which mode is appropriate for their particular project.
If I weren't so lazy, I could move from XmlDocument to XPathDocument, gaining approximately 30% in performance. For that I need an XmlReader over XPathNavigator. I've seen several, so why not.
The bottom line - XInclude.NET 1.3 is coming.
XInclude allows the following constructs:
<root>
  <element id="bar"/>
  <xi:include xpointer="bar"/>
</root>

After XInclude processing the above XML should resolve to

<root>
  <element id="bar"/>
  <element id="bar"/>
</root>

This is called an intra-document reference. An <xi:include> instruction with no href attribute refers to the document currently being processed. That opens a Pandora's box of implications that basically prevents streaming XInclude processing altogether, as one obviously can't navigate arbitrarily over an XML stream, neither with XmlReader nor with SAX. "bar" as an XPointer is a shorthand pointer, pointing to the element with the ID "bar". (Btw, XInclude processing is recursive, so in the same way it may point to another <xi:include> element in the same document, causing double processing of the same <xi:include> instruction.)
As the core class of XInclude.NET, XIncludingReader, is just an XmlReader, how on earth can I go backward in a forward-only XmlReader??? It seems that to implement this feature I have to cache the source XML document as a whole. Too bad.
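For illustration only, here is a rough sketch of how an intra-document reference could be resolved once the whole document is cached in memory. It makes several assumptions that are not from XInclude.NET: the XInclude namespace URI used, treating any attribute named "id" as an ID (a real processor relies on DTD or schema ID typing), a pointer that contains no quotes, and no handling of recursion or fallback.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sketch: resolve <xi:include xpointer="..."/> (no href) in memory.
public static class IntraDocIncludeSketch
{
    // Assumed namespace URI; adjust to whatever the processed documents declare.
    private const string XiNs = "http://www.w3.org/2001/XInclude";

    public static XmlDocument Resolve(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("xi", XiNs);

        // Copy matches to a list first, because the tree is modified below.
        List<XmlNode> includes = new List<XmlNode>();
        foreach (XmlNode n in doc.SelectNodes("//xi:include[not(@href)][@xpointer]", nsm))
        {
            includes.Add(n);
        }

        foreach (XmlNode include in includes)
        {
            string pointer = include.Attributes["xpointer"].Value;
            // Assumption: shorthand pointer matched against an attribute named "id".
            XmlNode target = doc.SelectSingleNode("//*[@id='" + pointer + "']");
            if (target != null)
            {
                include.ParentNode.ReplaceChild(target.CloneNode(true), include);
            }
        }
        return doc;
    }
}
```

Looking up the target requires random access to nodes that may precede the include, which is exactly what a forward-only reader cannot offer.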
It lets you test what ForwardXPathNavigator can and cannot select. Upload the XML document you'd like to test (please don't abuse it by loading huge ones), then enter an XPath expression and click the "RunQuery" button. I know it looks bad in Mozilla, but I have no idea how to insert a transformation result into an HTML page so that it gets styled in Mozilla too. There are lots of issues, such as namespace declarations not being shown etc. Come on, that's not an online XPath tutorial, just a simple demo.
Talking about the difference between XSE and ForwardXPathNavigator, Daniel writes:
Back to the issue, there's a fundamental difference in the approach between his class and my XSE API: his will consume the stream with a single query. Mine supports multiple handlers matching multiple elements at the same time. And it's still a pull-based API, where you have to iterate results, instead of being called when something you care happened (was matched).

Well, ForwardXPathNavigator wasn't designed to be compared with XSE! It's simply a poor man's XPathNavigator (over XmlReader). But as an XPathNavigator it allows not only evaluating XPath queries, but navigating node by node over XML too. I was planning to build an XSE-like system based on ForwardXPathNavigator. Actually I must admit I didn't get further than proving the concept and don't have code to publish yet (in the face of the brilliant XSE impl :). The idea behind XmlUpdater/XPathFilter was the following: just navigate over XML using ForwardXPathNavigator and check whether each node matches any registered XPath pattern. On each matched node, call the callback method associated with the pattern, providing it with enough context to do what it wants - to skip the node (transparency), to modify it etc.
I found pattern matching to be a cheap enough operation and the whole prototype quite satisfying. What I dislike is the overly fragile nature of ForwardXPathNavigator. It's forward-only, so XPath patterns and the whole application must be defined very carefully with the forward-only concept in mind, which is not the usual mindset when working with XPath, right? Funny thing - ForwardXPathNavigator may move irrevocably when you are just inspecting its properties in the debugger! The Count property of XPathNodeIterator obviously becomes unusable too. To put it another way - it's too hard to work with this stuff. And the benefits are not so striking, by the way. Maybe that's my bad design, dunno...
So, it's forward-only XPath subset and BizTalk's XPathReader isn't hidden. Nice to hear.
I wonder who this guy is. He's definitely an expert in the area. Why doesn't he blog? I'm looking forward to seeing the article; what a pity the XML dev center is postponed.
When the article describing the XPathReader is done it will provide source and if there is interest I'll create a GotDotNet Workspace for the project although it is unlikely I nor the dev who originally wrote the code will have time to maintain it.

I'm volunteering here. I think it's an important option to have in XML processing under .NET.
Meanwhile Daniel has released XSE stuff at last (btw, I'm musing if I have to adopt hype-before-release strategy? :). Really interesting. But I still believe XPath (forward-only subset of course) is the way to go.
Anyway, here is ForwardXPathNavigator I was talking about - ForwardXPathNavigator.zip. It's written by my buddy dev Vladimir Nesterovsky. And here are some basic samples.
Selecting feed titles from RSSBandit feed list (pure forward-only selection):
XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = nav.Compile("/r:feeds/r:feed/r:title");
expr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
// Each MoveNext() advances the underlying XmlReader
while (ni.MoveNext())
{
    Console.WriteLine(ni.Current.Value);
}

Obviously ForwardXPathNavigator doesn't allow you to peek at following or preceding nodes. All it stores is the current node the XmlReader is positioned at and some details about its direct ancestors. As Dare pointed out, expressions such as /r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title are not supported, because they cannot be evaluated in a forward-only manner. That wasn't ForwardXPathNavigator's goal anyway. In fact such a query can be done in a forward-only way to some extent, but not without help from the host environment. E.g. to select the most viewed feeds, one can select each feed, store its title, then calculate count(r:stories-recently-viewed/r:story) and determine whether the feed is popular enough to be selected:
XmlReader r = new XmlTextReader("feedlist.xml");
ForwardXPathNavigator nav = new ForwardXPathNavigator(r);
XmlNamespaceManager nsm = new XmlNamespaceManager(nav.NameTable);
nsm.AddNamespace("r", "http://www.25hoursaday.com/2003/RSSBandit/feeds/");
XPathExpression expr = nav.Compile("/r:feeds/r:feed");
expr.SetContext(nsm);
XPathExpression countExpr = nav.Compile("count(r:stories-recently-viewed/r:story)");
countExpr.SetContext(nsm);
XPathExpression titleExpr = nav.Compile("string(r:title)");
titleExpr.SetContext(nsm);
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext())
{
    // Store the title first, then count the stories that follow it
    string title = ni.Current.Evaluate(titleExpr) as string;
    if ((double)ni.Current.Evaluate(countExpr) > 20)
        Console.WriteLine(title);
}

Not so elegant (mostly because of the lack of an XPathNavigator.Select(string, XmlNamespaceManager) method), but still feasible. Btw, introducing some extension function that could control ForwardXPathNavigator's cache would be quite interesting. Something like /r:feeds/r:feed[ext:store(r:title)][count(r:stories-recently-viewed)>10]/r:title. It's a pity XPath doesn't allow creating variables...
As I said ForwardXPathNavigator keeps some track of ancestor nodes (name, attributes etc), thus enabling some limited backward selections, such as /r:feeds/r:feed[r:title='The XML Files']/@category! I'm going to provide small aspx page where ForwardXPathNavigator can be tested online by anyone interested.
Tomorrow I'll go on spinning up the topic by presenting XmlUpdater (which is based on ForwardXPathNavigator), SAX-filter-like approach to modify XML on the fly.
Maybe I'm mistaken, but anyway here is the idea - "ForwardOnlyXPathNavigator" is an XPathNavigator implementation over XmlReader, which obviously supports only a forward-only XPath subset. My fellow developer wrote one, so maybe we should publish it anyway. With such a navigator it's easy to write a class (I called it XPathFilter) that allows registering callbacks for specific nodes, identified by an XPath pattern. XPathFilter traverses the XML document, moving ForwardOnlyXPathNavigator in document order, and on each node matching any registered pattern it calls the callback method. In the callback it's possible to skip or modify the matched node, just like in an ordinary SAX filter. I've implemented an XmlUpdater class based on this technique and it has proven to be effective at modifying huge XML documents on the fly. For instance, here is how I can change an element into an attribute:
FileStream output = File.Create("ot2.xml");
XmlUpdater updater = new XmlUpdater(File.OpenRead("otbig.xml"), output);
updater.AddHandler("/tstmt/book/chapter/chtitle", new NodeMatchedEventHandler(MyHandler));
updater.Start();
...
public static void MyHandler(XmlUpdater xu, XPathNavigator nav, XmlWriter w)
{
    // Write the matched element's text as a "title" attribute instead
    w.WriteAttributeString("title", nav.Value);
}
And after I had played enough with that stuff and implemented it, I discovered that the BizTalk 2004 Beta classes contain a much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in a much richer way than the trivial XmlReader stack does. I only wonder why they are not in System.Xml v2? Are there any reasons why they are still hidden deep inside BizTalk 2004? Probably I have to evangelize them a bit, as I really like this idea.
Anyway, back to XSEReader. What I like in this approach is that it's a streaming, event-based one (do I still miss SAX?). What I dislike is the proprietary XPath-like patterns like ":*" (why not *.* ?), "^kzu:*", and XPath-like sugar like RootedPath(), RelativePath() etc. I think XPath is the way to go, no need to reinvent the wheel. Anyway, let's wait until Daniel unveils all the API and impl details.
Actually I have no idea when this could be useful, but people keep asking if this can be done (I presume they are in fact neglecting namespaces, but still I think there are real use cases for such functionality). And while it's probably not 100% clean architecturally, there is an effective and simple way. The crux is to read the XML document not as a standalone document, but as an XML fragment, providing an XmlNamespaceManager with the default namespace set up. Here is the code:
string xml = @"<?xml version=""1.0""?>
<foo>
  <bar>Blah</bar>
</foo>";
XmlNameTable nt = new NameTable();
XmlNamespaceManager nsm = new XmlNamespaceManager(nt);
nsm.AddNamespace(String.Empty, "http://foo.com");
XmlParserContext ctx = new XmlParserContext(nt, nsm, null, XmlSpace.Default);
XmlTextReader r = new XmlTextReader(xml, XmlNodeType.Document, ctx);
//Read it to XmlDocument to test
XmlDocument doc = new XmlDocument();
doc.Load(r);
doc.Save(Console.Out);

The result is

<?xml version="1.0"?>
<foo xmlns="http://foo.com">
  <bar>Blah</bar>
</foo>
I was quite surprised to see it works even when XML fragment type is XmlNodeType.Document, IMO it should work only for XmlNodeType.Element typed XML fragment. With XmlNodeType.Document it looks like inter-document namespace definition nonsense, but anyway, nice and effective trick.
Read more about reading XML fragments with XmlTextReader in MSDN.
If you are like me and addicted to writing

return a>b? a : b;

instead of

if (a>b) return a; else return b;

then you are probably used to grumbling when programming in XSLT, because XPath 1.0 doesn't support conditional expressions (XPath 2.0 does though). The most notorious example is outputting a value into an HTML table cell - you have to ensure it's not empty, otherwise the cell will collapse into nothing in a browser. So one usually ends up with the following verbose pattern:
<xsl:choose>
  <xsl:when test="price != ''">
    <xsl:value-of select="price"/>
  </xsl:when>
  <xsl:otherwise>&#160;</xsl:otherwise>
</xsl:choose>

Or consider outputting some value, or the string "n/a" if the value is empty. These are quite common requirements. Things get even worse when you need to set a variable conditionally - the only way then is to nest an xsl:choose within xsl:variable, thus getting a result tree fragment instead of a nodeset.
But in fact there are tricks to address this XPath 1.0 restriction. Here they are.
For conditional nodesets the trick formula is
$nodeset1[$condition] | $nodeset2[not($condition)]
It's a union of both nodesets, filtered by mutually exclusive conditional expressions. It's easy to see that, depending on the boolean value of $condition, one nodeset will be selected and the other one filtered out. E.g.

<xsl:variable name="var" select="//foo[$param] | //bar[not($param)]"/>

binds $var to //foo if $param is true and to //bar otherwise.
For conditional strings or numbers the trick formula is more complicated:
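The same union trick can be tried outside of XSLT. Below is a small sketch using XmlDocument.SelectNodes; since plain SelectNodes has no variable bindings, the condition is spliced into the expression as true() or false(), which is an assumption made for this illustration only. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical sketch of the conditional nodeset trick.
public static class ConditionalNodesetSketch
{
    // Selects //foo when condition is true and //bar otherwise.
    public static XmlNodeList Pick(XmlDocument doc, bool condition)
    {
        // Stand-in for $condition: a boolean literal spliced into the XPath.
        string c = condition ? "true()" : "false()";
        string xpath = "//foo[" + c + "] | //bar[not(" + c + ")]";
        return doc.SelectNodes(xpath);
    }
}
```

Only one side of the union ever contributes nodes, because the two predicates can never both be true.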
concat( substring($s1, number(not($condition))*string-length($s1)+1),
substring($s2, number($condition)*string-length($s2)+1) )
While it looks quite convoluted, the idea (Becker's method, after Oliver Becker) is simple - in XPath number(true()) is 1, while number(false()) is 0, and when the second argument of the substring() function is greater than the actual length of the string, an empty string is returned. Hence substring($s1, number(not($condition))*string-length($s1)+1) returns $s1 if $condition is true and an empty string otherwise. Concatenating two such expressions in a mutually exclusive way gives us a conditional string expression.
There is also another variant:
concat( substring($s1, 1, number($condition)*string-length($s1)),
substring($s2, 1, number(not($condition))*string-length($s2)) )
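To see the first formula at work, here is a hedged sketch that builds it as a string and evaluates it with XPathNavigator.Evaluate. The helper names are hypothetical, the condition is spliced in as true() or false() in place of $condition, and the string literals are assumed to contain no apostrophes.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch: evaluate Becker's method for two string literals.
public static class BeckerSketch
{
    // Wraps a value as an XPath string literal (assumes no apostrophes inside).
    private static string Lit(string s)
    {
        return "'" + s + "'";
    }

    // Returns s1 when condition is true and s2 otherwise, using XPath 1.0 only.
    public static string Choose(bool condition, string s1, string s2)
    {
        string c = condition ? "true()" : "false()";
        string a = Lit(s1);
        string b = Lit(s2);
        string xpath =
            "concat(" +
            "substring(" + a + ", number(not(" + c + "))*string-length(" + a + ")+1), " +
            "substring(" + b + ", number(" + c + ")*string-length(" + b + ")+1))";

        // The expression uses no nodes, so an empty document is enough as context.
        XPathNavigator nav = new XmlDocument().CreateNavigator();
        return (string)nav.Evaluate(xpath);
    }
}
```

For a true condition the first substring() starts at position 1 and yields all of s1, while the second starts past the end of s2 and yields an empty string; for a false condition the roles swap.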
In practice such expressions can be simplified a great deal though. For instance, to output price if it's not empty or "n/a" otherwise one can use just
<xsl:value-of select="concat(price, substring('n/a', (price!='')*string-length('n/a')+1))"/>
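A quick way to check this simplified expression outside of a stylesheet is to evaluate it against a tiny document. The <item> wrapper element and the class name below are made up for the illustration.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch: evaluate the simplified "price or n/a" expression.
public static class PriceOrNaSketch
{
    private const string Expr =
        "concat(price, substring('n/a', (price!='')*string-length('n/a')+1))";

    // itemXml is expected to be a single element that may contain a <price> child.
    public static string Evaluate(string itemXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(itemXml);
        // Use the document element as the context node, so "price" is its child.
        XPathNavigator nav = doc.DocumentElement.CreateNavigator();
        return (string)nav.Evaluate(Expr);
    }
}
```

When price is non-empty, the boolean (price!='') converts to 1 and pushes the substring() start past the end of 'n/a', leaving just the price; otherwise it converts to 0 and the whole 'n/a' string is appended to the empty price.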
Another interesting trick is to leverage the ability of the msxsl:node-set() (or exslt:node-set()) extension function to convert a string into a text node, thus enabling the aforementioned conditional nodeset trick for strings too. Here is the same sample written using this method:
<xsl:value-of select="concat(price, msxsl:node-set('n/a')[current()/price=''])"/>

Well, probably enough. Hope you had fun looking at these clumsy XPath tricks. Remember that it's the very first version of the XPath language after all, and XPath 2.0 will make these tricks obsolete by bringing in support for conditional expressions, such as

<xsl:value-of select="if ($part/@discounted) then $part/wholesale else $part/retail"/>

Till then it's good to know these tricks.
Quote of the day:
Is it possible to transform a XML document to another XML document using XSLT? How?:)