July 10, 2005

Loading XPathDocument with XmlWriter

What I dislike in System.Xml v2.0 (and v1.X for that matter) is its poor support for push-based XML processing. Somehow it's all about pull (XmlReader), while push (XmlWriter) seems to be a second-class citizen. For instance, one can't push XML into an XPathDocument or an XSLT stylesheet into an XslCompiledTransform ...

As a matter of fact, the XSLT chaining problem can be solved using XmlDocument, which can be written to through an XmlWriter, but alas it's still huge, slow and overkill for scenarios where only a read-only XML store is required.
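For illustration, here is a minimal sketch of such chaining through an intermediate XmlDocument. It assumes .NET 2.0's XPathNavigator.AppendChild(), which returns an XmlWriter over an editable document; the two stylesheets and the XsltChainSketch class are hypothetical, made up just for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class XsltChainSketch
{
    // Hypothetical stylesheets: the first turns <a> into <b>, the second <b> into <c>.
    const string Step1 =
        @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='a'><b><xsl:value-of select='.'/></b></xsl:template>
</xsl:stylesheet>";
    const string Step2 =
        @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='b'><c><xsl:value-of select='.'/></c></xsl:template>
</xsl:stylesheet>";

    static XslCompiledTransform Compile(string xslt)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(xslt)))
            transform.Load(reader);
        return transform;
    }

    public static string Run(string inputXml)
    {
        XmlDocument input = new XmlDocument();
        input.LoadXml(inputXml);

        // The first transformation pushes its result straight into an XmlDocument.
        XmlDocument intermediate = new XmlDocument();
        using (XmlWriter writer = intermediate.CreateNavigator().AppendChild())
            Compile(Step1).Transform(input, null, writer);

        // The second transformation reads the intermediate document as its input.
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
            Compile(Step2).Transform(intermediate, null, writer);
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run("<a>hello</a>"));
    }
}
```

It works, but the intermediate result lives in a full mutable DOM even though the second step only ever reads it.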

Apparently this unfortunate state of affairs has something to do with the SAX vs XmlReader battles of the early .NET days, which XmlReader definitely won. In .NET 1.X there wasn't even a standard way to write to an XmlDocument using XmlWriter! Happily, Chris Lovett came to the rescue with XmlNodeWriter.

The ultimate solution for the XML pipelining problem in .NET would be an XmlWriterReader - a component that bridges XmlWriter and XmlReader. It can be implemented either by efficiently caching the whole stream of XmlWriter events internally and replaying them afterwards through an XmlReader, or by running a synchronized XmlWriter/XmlReader pair on two threads. The good news is that such a component will soon be implemented for the Mvp.Xml library. Stay tuned.
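Until such a component exists, the caching idea can be approximated naively. The sketch below is not the XmlWriterReader described above, just an assumption-laden stand-in: it serializes the XmlWriter output into a MemoryStream and then re-parses it with an XmlReader, so it pays for a full serialize/parse round trip instead of caching events. The BufferedBridgeSketch class and its Build method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class BufferedBridgeSketch
{
    public static XPathDocument Build()
    {
        MemoryStream buffer = new MemoryStream();

        // Push side: any XmlWriter-based producer writes into the buffer.
        // Disposing the writer flushes it; the stream stays open because
        // XmlWriterSettings.CloseOutput is false by default.
        using (XmlWriter writer = XmlWriter.Create(buffer))
        {
            writer.WriteStartElement("foo");
            writer.WriteAttributeString("attr", "value");
            writer.WriteString("content");
            writer.WriteEndElement();
        }

        // Pull side: rewind the buffer and hand an XmlReader to the consumer.
        buffer.Position = 0;
        using (XmlReader reader = XmlReader.Create(buffer))
        {
            return new XPathDocument(reader);
        }
    }

    static void Main()
    {
        Console.WriteLine(Build().CreateNavigator().OuterXml);
    }
}
```

The whole document is written out as text and parsed again, which is exactly the overhead a real event-caching or two-threaded bridge would avoid.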

Still, I wonder why all these hurdles exist. Take the XPathDocument class. The XSLT chaining problem could be solved by an XPathDocument loadable from an XmlWriter. It's not. It accepts a URI, Stream, TextReader or XmlReader. But if you look inside XPathDocument, you can see that it's constructed using the XPathDocumentBuilder class, which implements XmlWriter! Put another way: XPathDocument is internally constructed using nothing but an XmlWriter, yet somehow it's impossible to populate it with your own XmlWriter. Weird, huh?

To prove it, here is a little hackery showing that it's feasible to populate an XPathDocument through an XmlWriter (it's a crude hack that relies on internal types, so don't use it):

//Namespaces needed: System, System.Reflection, System.Xml, System.Xml.XPath

//Locate the internal XPathDocumentBuilder type in the System.Xml assembly
Type xpathDocBuilderType = 
  typeof(XPathDocument).Assembly.GetType(
    "MS.Internal.Xml.Cache.XPathDocumentBuilder");
//Create an empty XPathDocument via its non-public parameterless constructor
XPathDocument doc = 
  (XPathDocument)Activator.CreateInstance(
    typeof(XPathDocument), 
    BindingFlags.NonPublic | BindingFlags.Instance, 
    null, new object[] { }, null);
//Create XPathDocumentBuilder bound to that document
ConstructorInfo xpathDocBuilderCtor = 
  xpathDocBuilderType.GetConstructors()[0];
XmlWriter xpathDocBuilder = 
  (XmlWriter)xpathDocBuilderCtor.Invoke(
    new object[] { doc, null, "", null });

//Populate XPathDocument
xpathDocBuilder.WriteStartElement("foo");
xpathDocBuilder.WriteAttributeString("attr", "value");
xpathDocBuilder.WriteString("content");
xpathDocBuilder.WriteEndElement();

//Done
Console.WriteLine(doc.CreateNavigator().OuterXml);
The output is:
<foo attr="value">content</foo>

I wonder why this useful functionality isn't exposed. Apparently the reason is the added complexity: it would require exposing XPathDocumentBuilder and probably moving to the Builder pattern for constructing XPathDocument altogether. OK, I've opened a suggestion at the MSDN Feedback Center; let's see what the Microsofties say.