November 30, 2004

The Cafes

Elliotte Rusty Harold has started a new site called (not surprisingly) "The Cafes" - for articles "longer than a typical Cafe con Leche news item, but much shorter than a full book". Here is the RSS feed. Subscribed. ...

November 29, 2004

Why is XML case-sensitive?

Sriram Krishnan asks a strange question: I see someone flaming someone else for not being XHTML compliant. Tim Bray - if you're reading this, I want to know something. Why is XML case-sensitive? No human-being ever thinks in case-sensitive terms. A is a. End of story. So now, I have a ...

Beta MSN search runs XHTML

Scoble says "MSN is XHTML". Well, not really msn.com, but MSN search (beta version) - beta.search.msn.com. Good news anyway. ...

November 28, 2004

Re-throwing exceptions - a subtle difference between Java and .NET you'd better be aware of

Here is what I learnt from Jackie Goldstein's talk on .NET Worst Practices at the .NET Deep Dive conference in Tel-Aviv last Thursday. There is a subtle but hugely important difference between how .NET and Java re-throw a caught exception, and I somehow missed it while learning .NET. Not ...

November 25, 2004

Head First books

Hey, that's cool stuff - check it out. Apparently, O'Reilly have found a new way to sell more books. It's a sort of modern version of the "X for complete idiots" series - actually they call it "Head First". The main idea, as far as I understand, is to set out ...

XML encoding pedantry

BTW, as nicely pointed out by Michael Kay, an XML document with no XML declaration, in an encoding other than UTF-8 or UTF-16, is not necessarily malformed! In fact, the XML spec allows encoding information to be provided externally (e.g. via the Content-Type HTTP header). ...

November 24, 2004

Cool article on the history of math notations

"Mathematical Notation: Past and Future" by Stephen Wolfram - amazingly interesting article. [Via Sean Gerety] ...

November 23, 2004

Another election disappointment

Well, it's not about the USA elections. It's about the elections in Ukraine, the country where I was born and grew up. The presidential elections were just terrible. Calling them fraudulent is saying nothing, they were super-fraudulent. Violence, intimidation, abuse of state resources in favor of the prime minister, frauds such as ...

November 22, 2004

Calling document("") in .NET

There was recently an interesting thread in the microsoft.public.dotnet.xml newsgroup on the document("") function call in .NET. A guy was porting some app from MSXML to .NET. Something didn't work... You know these common bitter (and usually completely lame) complaints: It is strange, this all works just fine using MSXML4 ...

First, a little digression on what the document("") call actually means. As per the XSLT 1.0 spec:

The URI reference may be relative. The base URI (see [3.2 Base URI]) of the node in the second argument node-set that is first in document order is used as the base URI for resolving the relative URI into an absolute URI. If the second argument is omitted, then it defaults to the node in the stylesheet that contains the expression that includes the call to the document function. Note that a zero-length URI reference is a reference to the document relative to which the URI reference is being resolved; thus document("") refers to the root node of the stylesheet; the tree representation of the stylesheet is exactly the same as if the XML document containing the stylesheet was the initial source document.

So it's about introspection: calling the document() function with an empty string as its only argument lets an XSLT stylesheet get its own source as an XML document and process it like any other XML document - query, transform, anything. It's an extremely useful feature, which leverages the simple fact that XSLT stylesheets are merely XML documents. Static lookup tables stored within a stylesheet are one common use.
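
To make the lookup-table idea concrete, here is a minimal sketch of such a stylesheet. Everything specific in it is made up for illustration: the lut prefix, the urn:example:lookup namespace, the country codes and the assumed input element (an item with a country attribute). It only relies on the XSLT 1.0 rule that top-level elements in a non-XSLT, non-null namespace are allowed and ignored by the processor.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lut="urn:example:lookup"
    exclude-result-prefixes="lut">

  <!-- Hypothetical static lookup table stored as top-level user data.
       The processor ignores it; document('') makes it queryable. -->
  <lut:countries>
    <lut:country code="UA">Ukraine</lut:country>
    <lut:country code="IL">Israel</lut:country>
  </lut:countries>

  <!-- Assumed input: <item country="UA"/>. Outputs the matching name. -->
  <xsl:template match="item">
    <xsl:value-of select="document('')/xsl:stylesheet/lut:countries/lut:country[@code = current()/@country]"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that current() is needed inside the predicate, because there the context node is the lut:country element rather than the item being processed.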

So that guy has an MSXML-based application working in an I/O-restricted environment (no access to the file system in particular). The XSLT stylesheet is given as a string, gets loaded into an MSXML2.DOMDocument and calls document("") to access a lookup table within its own source. Works fine in MSXML. Doesn't work in .NET. Why?

The difference between MSXML's and System.Xml's XSLT implementations here is that in MSXML, XSLT is a function of the DOM - an XSLT stylesheet must always be explicitly loaded into a DOM before calling transformNode()/transformNodeToObject() or working with Msxml2.XSLTemplate. So the XSLT implementation always has an in-memory DOM representation of the stylesheet at hand and returns it whenever document("") is called. Simple and effective, like almost everything in MSXML. In System.Xml, XSLT and DOM are completely decoupled. It's even officially recommended to avoid using the DOM (the XmlDocument class) when performing XSL transformations in .NET. Instead, the XslTransform class can be loaded from a variety of sources, such as a Stream, TextReader, XmlReader, XPathNavigator or a URI.

XslTransform loads and compiles the XSLT stylesheet into an internal representation, ready for multithreaded transformations. See the difference? There is no in-memory XSLT stylesheet floating around explicitly, so XslTransform can't simply return it whenever document("") is called. Instead, XslTransform doesn't treat document("") as a special case: the usual URI resolution machinery resolves "" to the stylesheet's base URI (as per the XSLT spec above), and the stylesheet gets fetched by that URI like any other document.

In fact, many (if not all) DOM-decoupled XSLT processors behave exactly this way. At least Saxon and Xalan (included by default in Java 2) do. Obviously nobody wants to hold XSLT sources in memory until run-time just in case there is a call to document(""). There is no magic here. XSLT is a regular compiled language. Can you imagine asking an .exe for its C++ sources? (Well, there is reflection, but that's another matter entirely and not usually the case.)

As a matter of interest, XslTransform's internal compiled XSLT structure does include the source XSLT stylesheet as an XPathNavigator. It looks like it's only used when the document() function resolves relative URIs. Hmm, I wonder if it could be made more thrifty? Oh well, anyway. Looks like they just decided not to expose it. Who can say that's an unreasonable decision - not to expose an internal structure?

So, what's the solution? How to avoid I/O when using document("") in .NET? Switching back to MSXML? No way. Here is a simple trick to reuse an in-memory XSLT stylesheet and avoid loading the XSLT source by URI. The idea is to load the stylesheet into an XPathDocument, assign it some unique base URI and resolve that URI in a custom XmlResolver.

First, a small XmlResolver:

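using System;
using System.Xml;
using System.Xml.XPath;

// Serves the in-memory stylesheet for any URI with the "my" scheme.
public class MyResolver : XmlUrlResolver {
     private XPathNavigator nav;

     public MyResolver(XPathNavigator nav) {
         this.nav = nav;
     }

     public override object GetEntity(Uri absoluteUri, string role,
             Type ofObjectToReturn) {
         // document("") resolves to the stylesheet's base URI (my://uri),
         // so hand back a copy of the in-memory navigator: no I/O.
         if (absoluteUri.Scheme == "my")
             return nav.Clone();
         else
             return base.GetEntity(absoluteUri, role, ofObjectToReturn);
     }
}
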
And here is the main part:

// Requires System, System.IO, System.Xml, System.Xml.XPath and System.Xml.Xsl
string xml = "<foo/>";
string xsl = @"
<xsl:stylesheet version=""1.0"" 
xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
 <xsl:template match=""/"">
  <xsl:value-of select=""count(document('')//*)""/>
 </xsl:template>
</xsl:stylesheet>";

XPathDocument doc = new XPathDocument(new StringReader(xml));
XslTransform xslt = new XslTransform();
// Load the stylesheet from the string, giving it a unique base URI
XPathDocument xslDoc = new XPathDocument(
     new XmlTextReader("my://uri", new StringReader(xsl)));
xslt.Load(xslDoc);

// Runtime - no I/O here: the resolver serves my://uri from memory
xslt.Transform(doc, null, Console.Out,
     new MyResolver(xslDoc.CreateNavigator()));

It's a bit tricky though and I wonder if there is any cleaner solution?

New XSLT-related blog - xsltblog.com

M. David Peterson, coordinator of the x2x2x.org community open-source project (known for the Saxon.NET, AspectXML, and xameleon projects), has started a blog at xsltblog.com. The blog's description is "An ongoing weblog of current topics from the XSLT development community & other XML/XSLT related news items. Hosted, maintained, & edited by M ...

November 17, 2004

Minor EXSLT.NET update

Just for the record: I updated EXSLT.NET to support the omit-xml-declaration attribute on the exsl:document element. If somebody desperately needs it, it's in the source repository already. ...

TopXML is reblogging

TopXML has launched the XML News Reblogger service. It's basically an XML blog and news aggregator, similar to Planet XMLhack. They aggregate selected XML-related news feeds and blogs (127 currently, including mine :) twice a day and provide a way to read all that jazz on their web site. They don't provide ...

November 16, 2004

Norman Walsh on XML 2.0

Amazing new essay by Norman Walsh on XML 2.0. Worth reading and contemplating. The crux is "simplification". XML is too complex, who knew it six years ago :) ...

Fifth anniversary of XSLT and XPath

Here is some five-year-old news: http://www.w3.org/ -- 16 November 1999 -- The World Wide Web Consortium (W3C) today releases two specifications, XSL Transformations (XSLT) and XML Path Language (XPath), as W3C Recommendations. These new specifications represent cross-industry and expert community agreement on technologies that will enable the transformation and ...

November 15, 2004

Altova: Free XSLT1.0/XSLT2.0 and XQuery1.0 Processors for Windows

Breaking news from Altova GmbH (maker of the famous XML Spy IDE): Altova has compiled a collection of free tools and technical resources to help develop solutions for today's business challenges. That includes: Altova XSLT 1.0 and 2.0 Engines, Altova XQuery Engine, XMLSpy® 2005 Home Edition, Authentic® 2005. All Windows-only apparently ...

November 7, 2004

Imprinting on "randomness"

Well, that's just a simple level 100 quiz aiming to imprint the "standard random number generators are not really random" idea on those who still lack it. What will the following C# snippet produce? System.Random rnd = new System.Random(12345); System.Random rnd2 = new System.Random(12345); for (int i=0; i<1000; i++) if (rnd.Next ...