Recently in XML Tips and Tricks Category

I found this gem in David Carlisle's blog. Smart Javascript trick allows to mask msxsl:node-set() extension function as exsl:node-set() and so you can easily write crossbrowser XSLT stylesheets using exsl:node-set() functionality. Opera 9, Internet Explorer 6-7 and Firefox 3 are covered, but sadly Firefox 2 is out of the game. Julian Reschke came with a nice trick using Javascript expressiveness:

<xsl:stylesheet
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl">
  

<msxsl:script language="JScript" implements-prefix="exslt">
 this['node-set'] =  function (x) {
  return x;
  }
</msxsl:script>
...

Very smart.

That reminds me old days of mine when I came with a similar trick for EXSLT extension functions implemented in C# (for EXSLT.NET project). Except that C# isn't so dynamic as Javascript so I had to escape to renaming method names in MSIL bytecode. That trick still drives EXSLT.NET (now module of the Mvp.Xml library).

 By the way just to remind you - .NET (XslCompiledTransform) supports exsl:node-set() function natively.

In ASP.NET when you building a server control that includes an HTTP handler you have this problem - the HTTP handler has to be registered in Web.config. That means it's not enough that your customer developer drops control on her Web form and sets up its properties. One more step is required - manual editing of the config, which is usability horror.

How do you make your customer aware she needs to perform this additional action? Documentation? Yes, but who reads documentation on controls? I know I never, I usually just drop it on the page and poke around its properties to figure out what I need to set up to make it working asap.

So here is nice trick how to avoid manual Web.config editing (found it in the ScriptAculoUs autocomplete web control).

  1. Make sure your control has a designer.
  2. In your control's designer class override ControlDesigner.GetDesignTimeHtml() method, which is called each time your control needs to be represented in design mode.
  3. In the GetDesignTimeHtml() method check if your HTTP handler in already registered in Web.config and if it isn't - just register it.
Here is a sample code that worth hundred words: 
using System;
using System.Web.UI.Design;
using System.Security.Permissions;
using System.Configuration;
using System.Web.Configuration;
using System.Windows.Forms;

namespace XMLLab.WordXMLViewer
{
    [SecurityPermission(SecurityAction.Demand, 
        Flags = SecurityPermissionFlag.UnmanagedCode)]
    public class WordXMLViewerDesigner : ControlDesigner
    {
        private void RegisterImageHttpHandler()
        {
            IWebApplication webApplication = 
                (IWebApplication)this.GetService(typeof(IWebApplication));

            if (webApplication != null)
            {
                Configuration configuration = webApplication.OpenWebConfiguration(false);
                if (configuration != null)
                {
                    HttpHandlersSection section = 
                        (HttpHandlersSection)configuration.GetSection(
                        "system.web/httpHandlers");
                    if (section == null)
                    {
                        section = new HttpHandlersSection();
                        ConfigurationSectionGroup group = 
                            configuration.GetSectionGroup("system.web");
                        if (group == null)
                        {
                            configuration.SectionGroups.Add("system.web", 
                                new ConfigurationSectionGroup());
                        }
                        group.Sections.Add("httpHandlers", section);
                    }
                    section.Handlers.Add(Action);
                    configuration.Save(ConfigurationSaveMode.Minimal);
                }
            }
        }


        private bool IsHttpHandlerRegistered()
        {
            IWebApplication webApplication = 
                (IWebApplication)this.GetService(typeof(IWebApplication));

            if (webApplication != null)
            {
                Configuration configuration = 
                    webApplication.OpenWebConfiguration(true);

                if (configuration != null)
                {
                    HttpHandlersSection section = 
                        (HttpHandlersSection)configuration.GetSection(
                        "system.web/httpHandlers");

                    if ((section != null) && (section.Handlers.IndexOf(Action) >= 0))
                        return true;
                }
            }
            return false;
        }


        static HttpHandlerAction Action
        {
            get
            {
                return new HttpHandlerAction(
                    "image.ashx", 
                    "XMLLab.WordXMLViewer.ImageHandler, XMLLab.WordXMLViewer", 
                    "*"
                );
            }
        }

        public override string GetDesignTimeHtml(DesignerRegionCollection regions)
        {
            if (!IsHttpHandlerRegistered() && 
                (MessageBox.Show(
                "Do you want to automatically register the HttpHandler needed by this control in the web.config?", 
                "Confirmation", MessageBoxButtons.YesNo, 
                MessageBoxIcon.Exclamation) == DialogResult.Yes))
                RegisterImageHttpHandler();
            return base.CreatePlaceHolderDesignTimeHtml("Word 2003 XML Viewer");
        }
    }
}
Obviously it only works if your control gets rendered at least once in Design mode, which isn't always the case. Some freaks (including /me) prefer to work with Web forms in Source mode, so you still need to write in the documentation how to update Web.config to make your control working.

Back in 2005 I was writing about speeding up Muenchian grouping in .NET 1.X. I was comparing three variants of the Muenchian grouping (using generate-id(), count() and set:distinct()). The conclusion was that XslTransform class in .NET 1.X really sucks when grouping using generate-id(), performs better with count() and the best with EXSLT set:distinct().

Here is that old graph:

Today a reader reminded me I forgot to post similar results for .NET 2.0 and its new shiny XslCompiledTransform engine. So here it is. I was running simple XSLT stylesheet doing Muenchian grouping. Input documents contain 415, 830, 1660, 3320, 6640, 13280, 26560 and 53120 orders to be grouped.

Besides being pretty damn faster that XslTransform, XslCompiledTransform shows expected results - there is no difference in a way you are doing Muenchian grouping in .NET 2.0 - all three variants I was testing are performing excellent with very very close results. Old XslTransform was full of bad surprises. Just switching to count() instead of generate-id() provided 7x performance boost in grouping. That was bad. Anybody digging into XslTransform sources knows how ridiculously badly generate-id() was implemented. Now XslCompiledTransform shows no surprises - works as expected. No tricks needed. That's a sign of a good quality software.

Reporting errors in XSLT stylesheets is a task that almost nobody gets done right. Including me - error reporting in nxslt sucks in a big way. Probably that's because I'm just lazy bastard. But also lets face it - XslCompiledTransform API doesn't help here.

Whenever there are XSLT loading (compilation) errors XslCompiledTransform.Load() method throws an XsltException containing description of the first error encountered by the compiler. But as a matter of fact internally XslCompiledTransform holds list of all errors and warnings (internal Errors property). It's just kept internal who knows why. Even Microsoft own products such as Visual Studio don't use this important information when reporting XSLT errors - Visual Studio's XML editor also displays only first error. That sucks.

Anyway here is a piece of code written by Anton Lapounov, one of the guys behind XslCompiledTransform. It shows how to use internal Errors list via reflection (just remember you would need FullTrust for that) to report all XSLT compilation errors and warnings. The code is in the public domain - feel free to use it.  I'm going to incorporate it into the next nxslt release. I'd modify it a little bit though - when for some reason (e.g. insufficient permissions) errors info isn't available you still have XsltException with at least first error info.

private void Run(string[] args) {
    XslCompiledTransform xslt = new XslCompiledTransform();
    try {
        xslt.Load(args[0]);
    }
    catch (XsltException) {
        string errors = GetCompileErrors(xslt);
        if (errors == null) {
            // Failed to obtain list of compile errors
            throw;
        }
        Console.Write(errors);
    }
}

// True to output full file names, false to output user-friendly file names
private bool fullPaths = false;

// Cached value of Environment.CurrentDirectory
private string currentDir = null;

/// 
/// Returns user-friendly file name. First, it tries to obtain a file name
/// from the given uriString.
/// Then, if fullPaths == false, and the file name starts with the current
/// directory path, it removes that path from the file name.
/// 
private string GetFriendlyFileName(string uriString) {
    Uri uri;

    if (uriString == null ||
        uriString.Length == 0 ||
        !Uri.TryCreate(uriString, UriKind.Absolute, out uri) ||
        !uri.IsFile
    ) {
        return uriString;
    }

    string fileName = uri.LocalPath;

    if (!fullPaths) {
        if (currentDir == null) {
            currentDir = Environment.CurrentDirectory;
            if (currentDir[currentDir.Length - 1] != Path.DirectorySeparatorChar) {
                currentDir += Path.DirectorySeparatorChar;
            }
        }

        if (fileName.StartsWith(currentDir, StringComparison.OrdinalIgnoreCase)) {
            fileName = fileName.Substring(currentDir.Length);
        }
    }

    return fileName;
}

private string GetCompileErrors(XslCompiledTransform xslt) {
    try {
        MethodInfo methErrors = typeof(XslCompiledTransform).GetMethod(
            "get_Errors", BindingFlags.NonPublic | BindingFlags.Instance);

        if (methErrors == null) {
            return null;
        }

        CompilerErrorCollection errorColl = 
            (CompilerErrorCollection) methErrors.Invoke(xslt, null);
        StringBuilder sb = new StringBuilder();

        foreach (CompilerError error in errorColl) {
            sb.AppendFormat("{0}({1},{2}) : {3} {4}: {5}",
                GetFriendlyFileName(error.FileName),
                error.Line,
                error.Column,
                error.IsWarning ? "warning" : "error",
                error.ErrorNumber,
                error.ErrorText
            );
            sb.AppendLine();
        }
        return sb.ToString();
    }
    catch {
        // MethodAccessException or SecurityException may happen 
        //if we do not have enough permissions
        return null;
    }
}

Feel the difference - here is nxslt2 output:

An error occurred while compiling stylesheet 'file:///D:/projects2005/Test22/Test22/test.xsl': 
System.Xml.Xsl.XslLoadException: Name cannot begin with the '1' character, hexadecimal value 0x31.

And here is Anton's code output:

test.xsl(11,5) : error : Name cannot begin with the '1' character, hexadecimal value 0x31.
test.xsl(12,5) : error : Name cannot begin with the '0' character, hexadecimal value 0x30.
test.xsl(13,5) : error : The empty string '' is not a valid name.
test.xsl(14,5) : error : The ':' character, hexadecimal value 0x3A, cannot be included in a name.
test.xsl(15,5) : error : Name cannot begin with the '-' character, hexadecimal value 0x2D.

It's surprisingly easy in .NET 2.0. Obviously it can't be done with pure XSLT, but an extension function returning line number for a node takes literally two lines. The trick is to use XPathDocument, not XmlDocument to store source XML to be transformed.

The key is IXmlLineInfo interface. Every XPathNavigator over XPathDocument implements this interface and provides line number and line position for every node in a document. Here is a small sample:

using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public class Test
{
  static void Main()
  {
    XPathDocument xdoc = new XPathDocument("books.xml");
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load("foo.xslt", XsltSettings.TrustedXslt,
      new XmlUrlResolver());
    xslt.Transform(xdoc, null, Console.Out);
  }
}
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:ext="http://example.com/ext" 
extension-element-prefixes="ext">

  <ms:script implements-prefix="ext" 
  xmlns:ms="urn:schemas-microsoft-com:xslt" language="C#">
    public int line(XPathNavigator node)
    {
      IXmlLineInfo lineInfo = node as IXmlLineInfo;
      return lineInfo != null ? lineInfo.LineNumber : 0;
    }
  </ms:script>
  
  <xsl:template match="/">
    <foo>
      <xsl:value-of select="ext:line(//book)">
    </foo>
  </xsl:template>
</xsl:stylesheet>

Ability to report line info is another reason to choose XPathDocument as a store for your XML (in read-only scenarios such as query or transformation) - in addition to better performance and smaller memory footprint.

If you really need the same, but with XmlDocument, you have to extend DOM.

When working with XPath be it in XSLT or C# or Javascript, apostrophes and quotes in string literals is the most annoying thing that drives people crazy. Classical example is selections like "foo[bar="Tom's BBQ"]. This one actually can be written correctly as source.selectNodes("foo[bar=\"Tom's BBQ\"]"), but what if your string is something crazy as A'B'C"D" ? XPath syntax doesn't allow such value to be used as a string literal altogether- it just can't be surrounded with neither apostrophes nor quotes. How do you eliminate such annoyances? 

The solution is simple: don't build XPath expressions concatenating strings. Use variables as you would do in any other language. Say no to

selectNodes("foo[bar=\"Tom's BBQ\"]") 
and say yes to
selectNodes("foo[bar=$var]")

How do you implement this in .NET? System.Xml.XPath namespace provides all functionality you need in XPathExpression/IXsltContextVariable classes, but using them directly is pretty much cumbersome and too geeky for the majority of developers who just love SelectNodes() method for its simplicity.

The Mvp.Xml project comes to rescue providing XPathCache class:

XPathCache.SelectSingleNode("//foo[bar=$var]",
    doc, new XPathVariable("var", "A'B'C\"D\""))

And this is not only stunningly simple, but safe - remember XPath injection attacks?

You can download latest Mvp.Xml v2.0 drop at our new project homepage at the Codeplex.

Yeah, I know it's an old problem and all are tired of this one, but it's still newsgroups' hit. Sometimes XSLT is the off-shelf solution (not really perf-friendly though), but <xsl:output indent="yes"/> is just ignored in MSXML. In .NET one can leverage XmlTextWriter's formatting capabilities, but what in MSXML? Well, as apparently many forgot MSXML implements SAX2 and includes MXXMLWriter class, which implements XML pretty-printing and is also SAX ContentHandler, so can handle SAXXMLReader's events. That's all needed to pretty-print XML document in a pretty streaming way:

<html>
   <head>
      <title>MXXMLWriter sample.</title>
      <script type="text/javascript">
      var reader = new ActiveXObject("Msxml2.SAXXMLReader.4.0");
      var writer = new ActiveXObject("Msxml2.MXXMLWriter.4.0");        
      writer.indent = true;
      writer.standalone = true;
      reader.contentHandler = writer;            
      reader.putProperty("http://xml.org/sax/properties/lexical-handler", writer);
      reader.parseURL("source.xml");
      alert(writer.output);           
      </script>
   </head>
   <body>
      <p>MXXMLWriter sample.</p>
   </body>
</html>

This is small trick for newbies looking for a way to get URI of a source XML and the stylesheet from within XSLT stylesheet.

As a matter of interest - how would you implement breadth-first tree traversal in XSLT? Traditional algorithm is based on using a queue and hence isn't particularly suitable here. Probably it's feasible to emulate a queue with temporary trees, but I think that's going to be quite ineffective. Being not procedural, but declarative language XSLT needs different approach. Here is what I came up with:

This interesting trick has been discussed in microsoft.public.dotnet.xml newsgroup recently. When one has a no-namespaced XML document, such as

<?xml version="1.0"?>
<foo>
    <bar>Blah</bar>    
</foo>
there is a trick in .NET, which allows to read such document as if it has some default namespace:
<?xml version="1.0"?>
<foo xmlns="http://foo.com">
    <bar>Blah</bar>    
</foo>

I'm introducing another category in my blog - XML Tips and Tricks, where I'm going to post some XML, XPath, XSLT, XML Schema, XQuery etc tips and tricks. I know, many of my readers being real XML gurus know all this stuff (I encourage to correct me when I'm wrong or proposing better versions though), but I hope it would be interesting for the rest and may attract new readers.

Here is the first instalment - conditional XPath expressions.