December 15, 2005

IE7 to adopt orange Firefox RSS feed icon

This is surprisingly cool news - the Microsoft RSS team has decided to adopt the orange RSS feed icon used in Firefox for IE7 too. The guys from Mozilla happily allowed the use of the icon. Here is what Jane from the Microsoft RSS team writes: I’m excited to announce ...

XInclude and Mvp.Xml Library in Microsoft products

By the way, this is sort of a milestone for the Mvp.Xml project - Microsoft has released the Guidance Automation Toolkit (GAT) and Guidance Automation Extensions (GAX) for Visual Studio 2005, which use and include the recently released Mvp.Xml library v2.0, particularly our XInclude implementation. This is the first Microsoft product using ...

A short summary on what this stuff is:

The Guidance Automation Toolkit is an extension to Visual Studio 2005 which allows architects to author rich, integrated user experiences for reusable assets including frameworks, components and patterns. The resulting Guidance Packages composed of templates, wizards and recipes help developers build solutions in a way consistent with the architecture guidance. The Guidance Automation Extensions for Visual Studio 2005 is a runtime component that must be installed to use the Guidance Automation Toolkit itself, as well as to use any guidance packages built using the Guidance Automation Toolkit. For more information, see Introduction to the Guidance Automation Toolkit.

And note XInclude amongst the new features:

New in this release

The December 2005 CTP of the Guidance Automation Toolkit and Guidance Automation Extensions is a minor update to the previous May 2005 CTP. In addition to being updated to work on the final release of Visual Studio 2005, a number of new features have been added. These include:

o Integration with the T4 Text Templating Engine (which is also used by the DSL Toolkit)
o T4 Templates can now be associated with Item Templates
o Recipe references can now be placed on cascading menus
o Two new extensibility points have been added: the Action Execution Service and the Action Coordination Service
o XInclude can be used in recipe definition files to reference XML fragments stored in external files

December 13, 2005

"Schema-Aware Queries and Stylesheets" article from Michael Kay

In his latest article, "Schema-Aware Queries and Stylesheets", Michael Kay explains how useful XML Schema-awareness is for XQuery queries and XSLT stylesheets. ...

December 12, 2005

Zvon's XSLT 2.0 tutorial

Miloslav Nic has announced the first snapshot of the XSLT 2.0 tutorial at Zvon. Good stuff. I remember 5 years ago I was learning XSLT 1.0 using Zvon's tutorial... ...

On making noise about XSLT 2.0 and Microsoft

Dare thinks I'm making fruitless noise asking people if they need XSLT 2.0: I'm not sure how an informal survey in a blog would convince Microsoft one way or the other about implementing a technology. A business case to convince a product team to do something usually involves showing them ...

Processing XML in .NET: Antipatterns

I ran into the article "Harnessing the BackPack API" by Michael K. Campbell in the new and very cool "XML 4 Fun" column at MSDN. The article is otherwise brilliant and really fun, but the XML processing code samples are not so good. It's actually a great collection of XML processing ...

Here is a code snippet:

private void SinglePageReturned(string pageData)
{
    // TODO: add try/catch etc

    byte[] data = Encoding.UTF8.GetBytes(pageData);
    MemoryStream stream = new MemoryStream(data);
    XPathDocument input = new XPathDocument(stream);

    XslCompiledTransform xsl = new XslCompiledTransform();
    XmlNode stylesheet = this.LoadTransformDocument();
    xsl.Load(stylesheet);

    MemoryStream ms = new MemoryStream();
    StreamWriter sw = new StreamWriter(ms);

    xsl.Transform(input, null, sw);

    XmlDocument page = new XmlDocument();
    byte[] bytes = ms.ToArray();
    string transformedXml = Encoding.UTF8.GetString(bytes);
    page.LoadXml(transformedXml);
    sw.Dispose();
    ms.Close();
    ms.Dispose();

    XmlNode node = page.SelectSingleNode("/");
    Page output = this.GetPageFromXml(node);

    this.AddPage(output);
}

Can you see what's wrong here? Approximately sizeof(input) + sizeof(stylesheet)*3 + sizeof(xslt output)*4 of memory is wasted here!

The first antipattern is loading XML from a string. Somehow people think System.Xml is too stupid to handle encoding issues, so they encode the string into bytes and only then pass it to a System.Xml API:

byte[] data = Encoding.UTF8.GetBytes(pageData);
MemoryStream stream = new MemoryStream(data);
XPathDocument input = new XPathDocument(stream);

What this code actually does is copy the pageData string into a byte array in memory. A pure waste of memory. Don't do that - System.Xml is smart enough, and a mere StringReader is enough here. Here is a better version:

XPathDocument input = new XPathDocument(new StringReader(pageData));
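
To illustrate the point, here is a minimal self-contained sketch. The class name, the ReadTitle helper and the sample XML are made up for this illustration and are not taken from the article. It loads XML that carries an encoding declaration and a non-ASCII character straight from a string; since the string is already decoded text, no byte array or MemoryStream detour should be needed:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class StringReaderSketch
{
    // Loads XML held in a string through a StringReader and returns
    // the text of /page/title. No byte[] or MemoryStream is involved.
    public static string ReadTitle(string pageData)
    {
        XPathDocument input = new XPathDocument(new StringReader(pageData));
        XPathNavigator nav = input.CreateNavigator();
        return nav.SelectSingleNode("/page/title").Value;
    }

    static void Main()
    {
        // Hypothetical sample data with an XML declaration and a non-ASCII character.
        string pageData =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<page><title>Caf\u00e9</title></page>";
        Console.WriteLine(ReadTitle(pageData));
    }
}
```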

The second antipattern found here is the way the XslCompiledTransform class is loaded:

XslCompiledTransform xsl = new XslCompiledTransform();
XmlNode stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);

Somehow people believe that XslTransform/XslCompiledTransform needs the XSLT stylesheet fully loaded in memory as an XmlDocument. That's so MSXML-ish, and it is so wrong in .NET 2.0. Here is why. When loading an XSLT stylesheet, XslCompiledTransform merely reads it via the XmlReader API and builds an internal representation - an AST, aka the QIL tree. All XslCompiledTransform needs is an XmlReader over the stylesheet document, no more. A URI is OK too, and so is a Stream or TextReader once wrapped in an XmlReader. If you pass an XmlDocument, XslCompiledTransform still reads it via an XmlReader, so don't waste memory: loading XML into an in-memory store like XmlDocument is quite expensive and takes on average three times the size of the XML. Never load an XSLT stylesheet into an XmlDocument just to load it into XslCompiledTransform unless you absolutely have to, e.g. for editing the stylesheet before loading. Something like this is much better:

XslCompiledTransform xsl = new XslCompiledTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);

The third antipattern found in this code is about transforming into an XmlDocument. Somehow people believe some interim buffering is necessary:

MemoryStream ms = new MemoryStream();
StreamWriter sw = new StreamWriter(ms);

xsl.Transform(input, null, sw);

XmlDocument page = new XmlDocument();
byte[] bytes = ms.ToArray();
string transformedXml = Encoding.UTF8.GetString(bytes);
page.LoadXml(transformedXml);
sw.Dispose();
ms.Close();
ms.Dispose();

Here the transformation is done into a temporary byte buffer, which is then decoded into a string, which is then loaded into an XmlDocument. Terrible. Again, a pure waste of memory: XslCompiledTransform is perfectly capable of outputting transformation results directly into an XmlDocument:

XmlDocument page = new XmlDocument();
using (XmlWriter writer = page.CreateNavigator().AppendChild())
{
    xsl.Transform(input, null, writer);
}

Never use any interim buffers when you need to transform into an XmlDocument. Just transform into it.

XmlNode node = page.SelectSingleNode("/");

This is just weird and looks like an expensive variant of

XmlNode node = page;

because page is an XmlDocument and SelectSingleNode("/") selects the root node, which in the DOM is the XmlDocument node itself.
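
This is easy to check with a small sketch. The class name, the helper and the sample XML are made up for this illustration. It loads a document and compares the node returned by SelectSingleNode("/") with the document object by reference:

```csharp
using System;
using System.Xml;

static class RootNodeSketch
{
    // Returns true when SelectSingleNode("/") hands back the very same
    // object as the XmlDocument it was called on.
    public static bool RootIsDocument(string xml)
    {
        XmlDocument page = new XmlDocument();
        page.LoadXml(xml);
        XmlNode node = page.SelectSingleNode("/");
        return object.ReferenceEquals(node, page);
    }

    static void Main()
    {
        // Hypothetical sample data.
        Console.WriteLine(RootIsDocument("<page><title>Hello</title></page>"));
    }
}
```

If the XPath query and the plain assignment are indeed equivalent, this prints True, and the query only adds the cost of evaluating an XPath expression.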

And the final antipattern found here is getting the XSL transformation result as an XmlReader. The code above transformed into an XmlDocument just to be able to read it back through an XmlReader:

Page page = (Page)s.Deserialize(new XmlNodeReader(input));

Don't do that; again, that's a waste of memory. XmlSerializer needs an XmlReader, not an XmlDocument fully loaded into memory. A bit more efficient is to transform into a memory buffer and then read it with an XmlReader:

MemoryStream pageBuf = new MemoryStream();
xsl.Transform(input, null, pageBuf);
pageBuf.Position = 0; // rewind so the reader starts at the beginning of the output
Page page = (Page)s.Deserialize(XmlReader.Create(pageBuf));

And this still uses an interim buffer, wasting memory. The ultimate approach is to make use of MvpXslTransform from the Mvp.Xml v2.0 library. The MvpXslTransform class is a wrapper around XslCompiledTransform and supports efficient transformations into an XmlReader:

MvpXslTransform xsl = new MvpXslTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);
XmlReader pageReader = xsl.Transform(new XmlInput(input), null);
Page page = (Page)s.Deserialize(pageReader);

A short summary of the XML processing antipatterns:
  • Don't mess with encodings when you have XML in a string - just use StringReader/StringWriter
  • Don't load an XSLT stylesheet into an XmlDocument in order to load it into XslCompiledTransform - just use a URI, Stream, TextReader or XmlReader
  • Don't allocate any temporary buffers when transforming into an XmlDocument - just transform directly into it
  • Don't load XML into an XmlDocument only to read it as an XmlReader. If you need the XSL transformation result as an XmlReader, use the MvpXslTransform class from the Mvp.Xml library
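
Putting the first three points together, the pipeline could look roughly like the sketch below. This is an illustration under assumptions, not the code from the article: the class name, the sample XML and the inline stylesheet are hypothetical, and the stylesheet is read from a string here only to keep the example self-contained (a URI passed to Load would do as well).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class DirectTransformSketch
{
    // Hypothetical stylesheet that maps <page><title/> to <Page><Title/>.
    public const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/page'>" +
        "<Page><Title><xsl:value-of select='title'/></Title></Page>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static XmlDocument Transform(string pageData, string stylesheet)
    {
        // 1. XML in a string: a StringReader is enough, no byte[] copy.
        XPathDocument input = new XPathDocument(new StringReader(pageData));

        // 2. Stylesheet: hand XslCompiledTransform an XmlReader, not an XmlDocument.
        XslCompiledTransform xsl = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xsl.Load(reader);
        }

        // 3. Output: write straight into the XmlDocument, no interim buffer.
        XmlDocument page = new XmlDocument();
        using (XmlWriter writer = page.CreateNavigator().AppendChild())
        {
            xsl.Transform(input, null, writer);
        }
        return page;
    }

    static void Main()
    {
        // Hypothetical sample data.
        XmlDocument page = Transform("<page><title>Hello</title></page>", Stylesheet);
        Console.WriteLine(page.DocumentElement.OuterXml);
    }
}
```

The fourth point is not covered here because it relies on MvpXslTransform from the Mvp.Xml library rather than on the .NET Framework alone.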

December 11, 2005

XSLT 2.0 and Microsoft Unofficial Survey

Continuing with the business cases Microsoft seeks before implementing XSLT 2.0, I'm trying to gather some opinion statistics among developers working with XML and XSLT. So I'm holding this survey at the XML Lab site: Would you like to have XSLT 2.0 implementation in the .NET Framework? The possible answers are ...