December 29, 2005

First XML Podcast

Ok, this is gonna be the first podcast I want to subscribe to. It's "A Weekly XML Industry Podcast Hosted by Kurt Cagle and M. David Peterson". The first real pilot segment can be found here. It's about OASIS Open Document and Microsoft's Open XML formats. The podcast feed is here ...

December 26, 2005

Ignore this post

No Need to Click Here - I'm just claiming my feed at Feedster feedster:54ff7668410b28a62e65b7362c37778c ...

December 21, 2005

MSXML6 SDK documentation online

MSXML6 SDK documentation is online now. In fact it's "multi-version" documentation, which covers MSXML3 through MSXML6. So, what's new in MSXML6? Looks like it's security tightening, XML Schema support improvements and removals: What's New in MSXML 6.0 MSXML 6.0 shipped with SQL Server 2005. It also shipped as a Web ...

December 20, 2005

The Rise of XSLT Compilation

Slowly, gradually and without much buzz, both modern managed platforms, Java and .NET, have switched to compiling XSLT implementations by default. First, Java 5.0 made the compiling Apache XSLTC processor the default transformer in JAXP 1.3 (instead of the interpreting Apache XALAN). Then Microsoft released .NET 2.0 with new ...

Both Java and .NET declare the same reason for adopting XSLT compilation - performance. Here is a snippet from JAXP 1.3 documentation:

o XSLTC, the fast, compiling transformer, which is now the default engine for XSLT processing.
The XSLTC transformer generates a transformation engine, or translet, from an XSL stylesheet. This approach separates the interpretation of stylesheet instructions from their runtime application to XML data.

XSLTC works by compiling a stylesheet into Java byte code (translets), which can then be used to perform XSLT transformations. This approach greatly improves the performance of XSLT transformations where a given stylesheet is compiled once and used many times. It also generates an extremely lightweight translet, because only the XSLT instructions that are actually used by the stylesheet are included.
And here is what the Microsoft XML Team writes about XslCompiledTransform:
To improve XSLT execution performance in the .NET Framework version 2.0, the XslTransform class has been replaced with a new XSLT 1.0 implementation: the XslCompiledTransform class. XslCompiledTransform compiles XSLT stylesheets to Microsoft Intermediate Language (MSIL) methods and then executes them. Execution time of the new processor is on average 4 times better than XslTransform and matches the speed of MSXML, the native XML processor.
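Both quotes rest on the same "compile once, use many times" idea. A minimal C# sketch of it with XslCompiledTransform follows; the inline stylesheet, the class name CompileOnceDemo and the helpers Compile and Apply are made up for illustration, not taken from either vendor's documentation.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class CompileOnceDemo
{
    // Hypothetical stylesheet, inlined so the sample is self-contained.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>" +
        "</xsl:stylesheet>";

    // Compiles the stylesheet once; the returned object is then reused.
    public static XslCompiledTransform Compile()
    {
        XslCompiledTransform xsl = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xsl.Load(reader);
        }
        return xsl;
    }

    // Runs the already compiled stylesheet against one input document.
    public static string Apply(XslCompiledTransform xsl, string xml)
    {
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (StringWriter output = new StringWriter())
        {
            xsl.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        XslCompiledTransform xsl = Compile(); // the expensive step, done once
        foreach (string name in new string[] { "Java", ".NET" })
        {
            Console.WriteLine(Apply(xsl, "<greeting to='" + name + "'/>"));
        }
    }
}
```

The point of the sketch is only that Load sits outside the loop: the compilation cost is paid once, and every later Transform call runs the compiled code.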

Is it true that only XSLT compilation can provide the best XML transformation performance on managed platforms like Java and .NET? I have no fresh benchmark results, but AFAIR XSLTC was always one of the fastest XSLT processors, undeservedly underused because of its unique processing model. And Microsoft also claims that the new XslCompiledTransform now matches the speed of MSXML4. But what about Saxon? It's an interpreting XSLT engine and it's pretty fast. I believe Saxon is fast only due to numerous very smart and unique optimizations, and so it can't beat an optimizing compiling XSLT processor.

The idea that the ideal XSLT engine is an optimizing compiling one sounds pretty obvious. XSLT was and is meant to be compiled, not interpreted, and despite the fact that for years there was only a single semi-experimental compiling XSLT engine around - Sun's XSLTC (now Apache XSLTC) - XSLT 2.0 still looks more like a traditional compiled language than a dynamic one.

Ok, but what about the future? I think it's safe to say that in the future XSLT compilation will be even more pervasive. The Apache community (with IBM behind the contributing developers) has chosen XSLTC, not XALAN, as the basis for its future XSLT 2.0 implementation, and I have no doubt that Microsoft will implement XSLT 2.0 only as a compiling engine too. And I love it. I predict that in the near future we will be compiling XSLT stylesheets as we do ordinary Java or C# classes and calling "translets" at run time like usual classes, without bothering to load stylesheet sources first.

Btw, it should be noted that for Java users the switch to another default XSLT engine went mostly unnoticed thanks to JAXP, while Microsoft has no JAXP analog, so users have to migrate to XslCompiledTransform explicitly by modifying their code. I'll address that in a separate post though.

RSS Bandit Nightcrawler release in Russian

RSS Bandit users who are interested in Russian localization were probably disappointed to find no Russian language support in the Nightcrawler release. Sorry about that, I was too late for the deadline. The good news is that an RSS Bandit bugfix release with Russian localization is expected really soon - most likely before ...

December 15, 2005

IE7 to adopt orange Firefox RSS feed icon

This is surprisingly cool news - the Microsoft RSS Team decided to adopt the orange RSS feed icon used in Firefox for IE7 too. The guys from Mozilla happily allowed the usage of the icon. Here is what Jane from the Microsoft RSS team writes: I’m excited to announce ...

XInclude and Mvp.Xml Library in Microsoft products

By the way, this is sort of a milestone for the Mvp.Xml project - Microsoft has released the Guidance Automation Toolkit (GAT) and Guidance Automation Extensions (GAX) for Visual Studio 2005, which use and include the recently released Mvp.Xml library v2.0, particularly our XInclude implementation. This is the first Microsoft product using ...

A short summary on what this stuff is:

The Guidance Automation Toolkit is an extension to Visual Studio 2005 which allows architects to author rich, integrated user experiences for reusable assets including frameworks, components and patterns. The resulting Guidance Packages composed of templates, wizards and recipes help developers build solutions in a way consistent with the architecture guidance. The Guidance Automation Extensions for Visual Studio 2005 is a runtime component that must be installed to use the Guidance Automation Toolkit itself, as well as to use any guidance packages built using the Guidance Automation Toolkit. For more information, see Introduction to the Guidance Automation Toolkit.
And note XInclude amongst new features:
New in this release

The December 2005 CTP of the Guidance Automation Toolkit and Guidance Automation Extensions is a minor update to the previous May 2005 CTP. In addition to being updated to work on the final release of Visual Studio 2005, a number of new features have been added. These include:

o Integration with the T4 Text Templating Engine (which is also used by the DSL Toolkit)
o T4 Templates can now be associated with Item Templates
o Recipe references can now be placed on cascading menus
o Two new extensibility points have been added: the Action Execution Service and the Action Coordination Service
o XInclude can be used in recipe definition files to reference XML fragments stored in external files

December 13, 2005

"Schema-Aware Queries and Stylesheets" article from Michael Kay

In the latest article "Schema-Aware Queries and Stylesheets" Michael Kay explains how useful XML Schema-awareness is for XQuery queries and XSLT stylesheets. ...

December 12, 2005

Zvon's XSLT 2.0 tutorial

Miloslav Nic has announced the first snapshot of the XSLT 2.0 tutorial at Zvon. Good stuff. I remember 5 years ago I was learning XSLT 1.0 using Zvon's tutorial... ...

On making noise about XSLT 2.0 and Microsoft

Dare thinks I'm making fruitless noise asking people if they need XSLT 2.0: I'm not sure how an informal survey in a blog would convince Microsoft one way or the other about implementing a technology. A business case to convince a product team to do something usually involves showing them ...

Processing XML in .NET: Antipatterns

I ran into the article "Harnessing the BackPack API" by Michael K. Campbell in the new and very cool "XML 4 Fun" column at MSDN. The article is otherwise brilliant and really fun, but the XML processing code samples are not so good. It's actually a great collection of XML processing ...

Here is a code snippet:

private void SinglePageReturned(string pageData)
{
    // TODO: add try/catch etc

    byte[] data = Encoding.UTF8.GetBytes(pageData);
    MemoryStream stream = new MemoryStream(data);
    XPathDocument input = new XPathDocument(stream);

    XslCompiledTransform xsl = new XslCompiledTransform();
    XmlNode stylesheet = this.LoadTransformDocument();
    xsl.Load(stylesheet);

    MemoryStream ms = new MemoryStream();
    StreamWriter sw = new StreamWriter(ms);

    xsl.Transform(input, null, sw);

    XmlDocument page = new XmlDocument();
    byte[] bytes = ms.ToArray();
    string transformedXml = Encoding.UTF8.GetString(bytes);
    page.LoadXml(transformedXml);
    sw.Dispose();
    ms.Close();
    ms.Dispose();

    XmlNode node = page.SelectSingleNode("/");
    Page output = this.GetPageFromXml(node);

    this.AddPage(output);
}
Can you see what's wrong here? Approximately sizeof(input) + sizeof(stylesheet)*3 + sizeof(xslt output)*4 of memory is wasted here!

The first antipattern is loading XML from a string. Somehow people think System.Xml is too stupid to handle encoding issues, so they encode the string into bytes and only then pass it to a System.Xml API:

byte[] data = Encoding.UTF8.GetBytes(pageData);
MemoryStream stream = new MemoryStream(data);
XPathDocument input = new XPathDocument(stream);
So what this code actually does is copy the pageData string into a byte array in memory. Pure waste of memory. Don't do that - System.Xml is smart enough, and a mere StringReader is enough here. Here is a better one:
XPathDocument input = new XPathDocument(new StringReader(pageData));

The second antipattern found here is in how the stylesheet is loaded into XslCompiledTransform:

XslCompiledTransform xsl = new XslCompiledTransform();
XmlNode stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);
Somehow people believe that XslTransform/XslCompiledTransform needs the XSLT stylesheet fully loaded in memory as an XmlDocument. That's so MSXML-ish and that is so wrong in .NET 2.0. Here is why. When loading an XSLT stylesheet, XslCompiledTransform merely reads it via the XmlReader API and builds an internal representation - an AST, aka QIL tree. All XslCompiledTransform needs is an XmlReader over the stylesheet document, no more. A URI is ok too, and a Stream or TextReader can be wrapped with XmlReader.Create. If you pass an XmlDocument, XslCompiledTransform still reads it via XmlReader, so don't waste memory: loading XML into an in-memory store like XmlDocument is quite expensive and takes on average three times the XML size. Never load an XSLT stylesheet into XmlDocument just to load it into XslCompiledTransform unless you absolutely have to, e.g. for editing the stylesheet before loading. Something like this is much better:
XslCompiledTransform xsl = new XslCompiledTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);

The third antipattern found in this code is about transforming into XmlDocument. Somehow people believe some interim buffering is necessary:

MemoryStream ms = new MemoryStream();
StreamWriter sw = new StreamWriter(ms);

xsl.Transform(input, null, sw);

XmlDocument page = new XmlDocument();
byte[] bytes = ms.ToArray();
string transformedXml = Encoding.UTF8.GetString(bytes);
page.LoadXml(transformedXml);
sw.Dispose();
ms.Close();
ms.Dispose();
So here the transformation is done into a temporary byte buffer, which is then decoded into a string and only then loaded into XmlDocument. Terrible. Again - pure waste of memory. XslCompiledTransform is perfectly capable of outputting transformation results directly into XmlDocument:
XmlDocument page = new XmlDocument();
using (XmlWriter writer = page.CreateNavigator().AppendChild())
{
   xsl.Transform(input, null, writer);
}
Never use any interim buffers when you need to transform into XmlDocument. Just transform into it.

XmlNode node = page.SelectSingleNode("/");
This is just weird and looks like an expensive variant of
XmlNode node = page;
because page is an XmlDocument and SelectSingleNode("/") selects the root node, which is the XmlDocument node in DOM.
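For the skeptical, the claim is easy to check. Here is a small sketch (the class and method names are invented for the example) that compares the node returned by SelectSingleNode("/") with the document object itself:

```csharp
using System;
using System.Xml;

public static class RootNodeDemo
{
    // Returns true when SelectSingleNode("/") hands back the document itself.
    public static bool RootIsDocument(string xml)
    {
        XmlDocument page = new XmlDocument();
        page.LoadXml(xml);
        XmlNode node = page.SelectSingleNode("/");
        return object.ReferenceEquals(node, page);
    }

    public static void Main()
    {
        Console.WriteLine(RootIsDocument("<page/>"));
    }
}
```

If the two references are the same object, as the DOM mapping of the XPath root node suggests, the XPath query buys nothing over using page directly.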

And the final antipattern found here is getting the XSL transformation result as XmlReader. The code above did a transformation into XmlDocument just to be able to read it afterwards as XmlReader:

Page page = (Page)s.Deserialize(new XmlNodeReader(input));
Don't do that; again, that's a waste of memory. XmlSerializer needs an XmlReader, not an XmlDocument fully loaded into memory. A bit more effective is to transform into a byte buffer and then read it with an XmlReader:
MemoryStream pageBuf = new MemoryStream();
xsl.Transform(input, null, pageBuf);
pageBuf.Position = 0; // rewind, otherwise the reader starts at the end of the buffer
Page page = (Page)s.Deserialize(XmlReader.Create(pageBuf));
And this still uses an interim buffer, wasting memory. The ultimate approach is to make use of MvpXslTransform from the Mvp.Xml v2.0 library. The MvpXslTransform class is a wrapper around XslCompiledTransform and supports effective transformations into XmlReader:
MvpXslTransform xsl = new MvpXslTransform();
XmlReader stylesheet = this.LoadTransformDocument();
xsl.Load(stylesheet);
XmlReader pageReader = xsl.Transform(new XmlInput(input), null);
Page page = (Page)s.Deserialize(pageReader);
A small XML processing antipatterns summary:
  • Don't mess with encodings when you have XML in a string, just use StringReader/StringWriter
  • Don't load an XSLT stylesheet into XmlDocument in order to load it into XslCompiledTransform - just use a URI or an XmlReader (created over a Stream or TextReader if needed)
  • Don't allocate any temporary buffers when transforming into XmlDocument - just transform directly into it
  • Don't load XML into XmlDocument only to read it as XmlReader. If you need the XSL transformation result as XmlReader, use the MvpXslTransform class from the Mvp.Xml library
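Put together, the first three points might look like the sketch below. It is not the code from the MSDN article: the class DirectTransformDemo, the method TransformToDocument and the tiny inline stylesheet are assumptions made for the example, and the Mvp.Xml part is left out to keep it limited to System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class DirectTransformDemo
{
    // Hypothetical stylesheet that turns <page title='...'/> into <Page><Title>...</Title></Page>.
    public const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/page'>" +
        "<Page><Title><xsl:value-of select='@title'/></Title></Page>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static XmlDocument TransformToDocument(string pageData, TextReader stylesheetSource)
    {
        // XML already in a string: a StringReader is enough, no byte[] round trip.
        XPathDocument input = new XPathDocument(new StringReader(pageData));

        // The stylesheet is read through an XmlReader, never loaded into XmlDocument.
        XslCompiledTransform xsl = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(stylesheetSource))
        {
            xsl.Load(stylesheet);
        }

        // The result is written straight into the XmlDocument, no interim buffer.
        XmlDocument page = new XmlDocument();
        using (XmlWriter writer = page.CreateNavigator().AppendChild())
        {
            xsl.Transform(input, null, writer);
        }
        return page;
    }

    public static void Main()
    {
        XmlDocument page = TransformToDocument(
            "<page title='Home'/>", new StringReader(Stylesheet));
        Console.WriteLine(page.DocumentElement.OuterXml);
    }
}
```

In real code the stylesheet would more likely come from a file or URI, and the compiled XslCompiledTransform would be cached rather than rebuilt on every call.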

December 11, 2005

XSLT 2.0 and Microsoft Unofficial Survey

Still looking for the business cases Microsoft wants before implementing XSLT 2.0, I'm trying to gather some opinion statistics among developers working with XML and XSLT. So I'm holding this survey at the XML Lab site: Would you like to have XSLT 2.0 implementation in the .NET Framework? The possible answers are ...

December 9, 2005

eXml - extended ASP.NET XML Web server control v1.0 released

I'm glad to announce the first release of eXml - an extended ASP.NET Xml Web Server Control. eXml is a free open-source ASP.NET 2.0 Web server control extending and improving the standard ASP.NET XML Web server control. The eXml Web server control uses the new .NET 2.0 XSLT processor - the XslCompiledTransform class - to perform ...

December 8, 2005

A business case for XSLT 2.0?

If you are using XSLT and you think that XSLT 2.0 would provide you some real benefits, please drop a line of comment with a short explanation pleeeease. I'm collecting some arguments for XSLT 2.0, some real world scenarios that are hard with XSLT 1.0, some business cases when XSLT ...

December 5, 2005

nxslt v2.0 released

nxslt v2.0 (aka nxslt2) is available for download. This is the first nxslt release for .NET 2.0. nxslt is a free feature-rich command line utility that lets you perform XSL Transformations (XSLT) using the .NET Framework 2.0 XSLT implementation - the System.Xml.Xsl.XslCompiledTransform class. nxslt is compatible with Microsoft's MSXSL.EXE tool and additionally supports ...

nxslt2 is a free open source tool, subject to the BSD license and is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. You can use it for any commercial or noncommercial purposes, including distributing derivative works.

I decided to name the v2 version nxslt2, so in the release you will find the nxslt2.exe file. That's because I still do lots of .NET 1.1 development, so I need nxslt.exe to remain a .NET 1.1 tool. If you don't like the nxslt2.exe name - well, just rename it.

Some smart people proposed to contribute patches to the nxslt codebase in order to add new features they need. That's just great. To enable this I moved the nxslt sources to the Mvp.Xml Project CVS Repository. Now if you need the nxslt tool to have a new feature, you can either just suggest it and wait for me to implement it, or you can check out the nxslt sources, add the new feature and send a patch to me. Do whatever works for you, and thanks in advance for contributions.