December 12, 2005

Zvon's XSLT 2.0 tutorial

Miloslav Nic has announced the first snapshot of XSLT 2.0 tutorial at Zvon. Good stuff. I remember 5 year ago I was learning XSLT 1.0 using Zvon's tutorial... ...

On making noise about XSLT 2.0 and Microsoft

Dare thinks I'm making fruitless noise asking people if they need XSLT 2.0: I'm not sure how an informal survey in a blog would convince Microsoft one way or the other about implementing a technology. A business case to convince a product team to do something usually involves showing them ...

Processing XML in .NET: Antipatterns

I run into this article "Harnessing the BackPack API" by Michael K. Campbell in the new and very cool "XML 4 Fun" column at MSDN. The article is otherwise brilliant and really fun, but XML processing code samples are not so good. It's actually a great collection of XML processing ...

Here is a code snippet:

private void SinglePageReturned(string pageData)
{
    // TODO: add try/catch etc

    byte[] data = Encoding.UTF8.GetBytes(pageData);
    MemoryStream stream = new MemoryStream(data);
    XPathDocument input = new XPathDocument(stream);

    XslCompiledTransform xsl = new XslCompiledTransform();
    XmlNode stylesheet = this.LoadTransformDocument();
    xsl.Load(stylesheet);

    MemoryStream ms = new MemoryStream();
    StreamWriter sw = new StreamWriter(ms);

    xsl.Transform(input, null, sw);

    XmlDocument page = new XmlDocument();
    byte[] bytes = ms.ToArray();
    string transformedXml = Encoding.UTF8.GetString(bytes);
    page.LoadXml(transformedXml);
    sw.Dispose();
    ms.Close();
    ms.Dispose();

    XmlNode node = page.SelectSingleNode("/");
    Page output = this.GetPageFromXml(node);

    this.AddPage(output);
}
Can you see what's wrong here? There is approximately sizeof(input) + sizeof(stylesheet)*3 + sizeof(xslt output)*4 wasted memory here!

First antipattern is loading XML from a string. Somehow people think System.Xml is too stupid to handle encoding issues so they have to decode string into bytes and only then pass it to a System.Xml API:

byte[] data = Encoding.UTF8.GetBytes(pageData);
MemoryStream stream = new MemoryStream(data);
XPathDocument input = new XPathDocument(stream);
So what this code does is actually copying pageData string in memory into a byte array. Pure waste of memory. Don't do that - System.Xml is smart enough and mere StringReader is enough here. Here is a better one:
XPathDocument input = new XPathDocument(new StringReader(pageData));

Second antipattern found here is loading XslCompiledTransform class:

XslCompiledTransform xsl = new XslCompiledTransform();
XmlNode stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);
Somehow people believe that XslTransform/XslCompiledTransform needs XSLT stylesheet fully loaded in memory as XmlDocument. That's so MSXML-ish and that is so wrong in .NET 2.0. Here is why. When loading XSLT stylesheet XslCompiledTransform merely reads it via XmlReader API and builds internal representation - AST tree aka QIL tree. All XslCompiledTransform needs is XmlReader over stylesheet document, no more. URI, Stream or TextReader is ok too. If you pass XmlDocument then XslCompiledTransform still reads it via XmlReader, so don't waste memory, loading XML into in-memory store like XmlDocument is quite expensive and takes in average thrice of XML size. Never load XSLT styleshet into XmlDocument to load it into XslCompiledTransform unless you absolutely have to, e.g. for editing stylesheet before loading. Something like this is much better:
XslCompiledTransform xsl = new XslCompiledTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);

Third antipattern found in this code is about transforming into XmlDocument. Somehow people believe some interim buffering is necessary:

MemoryStream ms = new MemoryStream();
StreamWriter sw = new StreamWriter(ms);

xsl.Transform(input, null, sw);

XmlDocument page = new XmlDocument();
byte[] bytes = ms.ToArray();
string transformedXml = Encoding.UTF8.GetString(bytes);
page.LoadXml(transformedXml);
sw.Dispose();
ms.Close();
ms.Dispose();
So here we can see transformation is done into a temporary byte buffer, then it's decoded into a string and then loaded into XmlDocument. Terrible. Again - pure waste of memory, XslCompiledTransform is pretty much capable of outputting transformation results directly into XmlDocument:
XmlDocument page = new XmlDocument();
using (XmlWriter writer = page.CreateNavigator().AppendChild())
{
   xslt.Transform(src, null, writer);
}
Never use any interim buffers when you need to transform into XmlDocument. Just trasform into it.

XmlNode node = page.SelectSingleNode("/");
This is just weird and looks like expensive variant of
XmlNode node = page;
because page is XmlDocument and SelectSingleNode("/") selects root node, which is XmlDocument node in DOM.

And final antipattern found here is getting XSL transformation result as XmlReader. That code above did a transformation into XmlDocument just to be able to read it then as XmlReader:

Page page = (Page)s.Deserialize(new XmlNodeReader(input));
Don't do that, again that's a waste of memory. XmlSerializer needs XmlReader, not fully loaded into memory XmlDocument. A bit more effective is to transform into a byte array and then read it by XmlReader:
MemoryStream pageBuf = new MemoryStream();
xsl.Transform(input, null, pageBuf);
Page page = (Page)s.Deserialize(XmlReader.Create(pageBuf));
And this still uses interim buffer wasting memory. The ultimate approach is to make use of MvpXslTransform from the Mvp.Xml v2.0 library. MvpXslTransform class is a wrapper around XslCompiledTransform and supports effective transformations into XmlReader:
MvpXslTransform xsl = new MvpXslTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);
XmlReader pageReader = xsl.Transform(new XmlInput(input), null);
Page page = (Page)s.Deserialize(pageReader);
A small XML processing antipatterns summary:
  • Don't mess with encodings when having XML in a string, just use StringReader/StringWriter
  • Don't load XSLT stylesheet into XmlDocument in order to load it into XslCompiledTransform - just use URI, Stream, TextReader or XmlReader
  • Don't allocate any temporary buffers when transformting into XmlDocument - just transform directly into it
  • Don't load XML into XmlDocument only to read it as XmlReader. If you need XSL transformation result as XmlReader, use MvpXslTransform class from the Mvp.Xml library