October 25, 2004

Implementing XML Base in .NET

XML Base is a tiny W3C Recommendation, just a couple of pages long. It lets you define base URIs for parts of XML documents via the semantically predefined xml:base attribute (similar to the HTML BASE element). It's an XML Core spec, standing in line with "Namespaces in XML" and XML InfoSet. Published back ...

So what is XML Base all about? It introduces the xml:base attribute with predefined semantics (just like xml:space or xml:lang) for manipulating base URIs. The xml:base attribute can be inserted anywhere in any XML document to give an element and its descendants a base URI other than the base URI of the document or external entity. One purpose is to provide a native XML way to define base URIs. Another is to keep relative URIs resolvable: when document A is included into document B in some different location, relative URIs in the content of A would break. The xml:base attribute keeps them identifying the same resources. If you still don't get it, take a look at the sample in the "Preserving Base URI" section of the "Combining XML Documents with XInclude" article at the MSDN Xml Dev Center. So it's basically XML's analog of HTML's BASE tag.
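To make the resolution rule concrete, here is a minimal sketch using System.Uri. The class name, the Resolve helper and the example.org URIs are all made up for illustration; the only assumption is that an xml:base value is resolved against whatever base URI is currently in scope.

```csharp
using System;

public static class XmlBaseResolutionSketch
{
    // Resolves an xml:base value against the base URI currently in scope.
    // With no base in scope, the value must itself be an absolute URI,
    // otherwise the Uri constructor throws.
    public static Uri Resolve(Uri current, string xmlBase)
    {
        return current == null ? new Uri(xmlBase) : new Uri(current, xmlBase);
    }

    // Resolves a relative reference found in content (e.g. a file name)
    // against the base URI in scope for the element that carries it.
    public static Uri ResolveReference(Uri scopeBase, string reference)
    {
        return new Uri(scopeBase, reference);
    }
}
```

With a hypothetical document at http://example.org/docs/catalog.xml, an element carrying xml:base="files/" gets the base URI http://example.org/docs/files/, and a file1.xml reference inside it resolves to http://example.org/docs/files/file1.xml. An absolute xml:base value simply replaces the base URI in scope.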

System.Xml supports base URIs all over the infrastructure; the only problem is that basic syntax-level facilities such as XmlTextReader and XmlTextWriter ignore the xml:base attribute when parsing and writing XML. Can we add such support in a transparent way? Sure. Let's take XmlTextReader and extend it so that each time it gets positioned on an element which bears an xml:base attribute, the BaseURI property gets updated to reflect it. Here it is:

using System;
using System.Collections;
using System.Xml;

public class XmlBaseAwareXmlTextReader : XmlTextReader 
{
    private XmlBaseState _state = new XmlBaseState();
    private Stack _states = null;
    
    //Add more constructors as needed    
    public XmlBaseAwareXmlTextReader(string uri)
        : base(uri) 
    {
        _state.BaseUri = new Uri(base.BaseURI);
    }

    public override string BaseURI
    {
        get
        {
            return _state.BaseUri==null? "" : _state.BaseUri.AbsoluteUri;
        }
    }

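    // Set when an empty element such as <x xml:base="..."/> opened a scope.
    // It has no EndElement node, so its scope is closed on the next Read().
    private bool _popOnNextRead = false;

    public override bool Read()
    {
        if (_popOnNextRead)
        {
            //Leave the scope of the empty element read last time
            _state = (XmlBaseState)_states.Pop();
            _popOnNextRead = false;
        }
        bool baseRead = base.Read();
        if (baseRead)
        {
            if (base.NodeType == XmlNodeType.Element &&
                base.HasAttributes)
            {
                string baseAttr = GetAttribute("xml:base");
                if (baseAttr == null)
                    return baseRead;
                Uri newBaseUri = null;
                if (_state.BaseUri == null)
                    newBaseUri = new Uri(baseAttr);
                else
                    newBaseUri = new Uri(_state.BaseUri, baseAttr);
                if (_states == null)
                    _states = new Stack();
                //Push current state and allocate new one
                _states.Push(_state);
                _state = new XmlBaseState(newBaseUri, base.Depth);
                _popOnNextRead = base.IsEmptyElement;
            }
            else if (base.NodeType == XmlNodeType.EndElement)
            {
                //_states stays null until the first xml:base is seen
                if (_states != null && _states.Count > 0 &&
                    base.Depth == _state.Depth)
                {
                    //Pop previous state
                    _state = (XmlBaseState)_states.Pop();
                }
            }
        }
        return baseRead;
    }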
}

internal class XmlBaseState 
{
    public XmlBaseState() {}
    public XmlBaseState(Uri baseUri, int depth) 
    {
        this.BaseUri = baseUri;
        this.Depth = depth;
    }
    public Uri BaseUri;
    public int Depth;
}
Simple, huh? Now let's test it. Suppose I have a collection of XML documents in the "d:/Files" directory and a catalog XML file, such as
<catalog>
  <files xml:base="file:///d:/Files/">
    <file name="file1.xml"/>
  </files>
</catalog>
As you can see, the xml:base attribute here sets the base URI for the files element subtree to file:///d:/Files/, so file names are resolved relative to that folder no matter where the catalog file is actually placed. (Of course I could use absolute URIs instead, but hardcoding absolute URIs in every single place easily leads to a maintenance nightmare in any real system.)

When this document is loaded into an XPathDocument via XmlBaseAwareXmlTextReader, base URIs are preserved as per the XML Base spec:

XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/catalog");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
ni = nav.Select("/catalog/files/file");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
outputs
file:///D:/projects/Test/foo.xml
file:///d:/Files/
Unfortunately XmlDocument doesn't seem to be as smart as XPathDocument in this respect and only supports the base URI of the document and external entities. Too bad, too bad.
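If you are stuck with XmlDocument, one possible workaround is to compute the effective base URI yourself by walking up the ancestor chain. The sketch below is only an illustration: XmlBaseForDom and GetBaseUri are hypothetical names, and it assumes the document base URI is the right starting point, so nodes that came from external entities are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmlBaseForDom
{
    private const string XmlNamespace = "http://www.w3.org/XML/1998/namespace";

    // Returns the effective base URI of an element (or one of its child nodes),
    // or "" when neither the document nor any ancestor defines one.
    public static string GetBaseUri(XmlNode node)
    {
        // Collect xml:base values from the node up to the root, innermost first.
        List<string> values = new List<string>();
        for (XmlNode n = node; n != null; n = n.ParentNode)
        {
            XmlElement e = n as XmlElement;
            if (e != null && e.HasAttribute("base", XmlNamespace))
                values.Add(e.GetAttribute("base", XmlNamespace));
        }

        // Start from the document's own base URI, if it has one.
        XmlDocument doc = (node as XmlDocument) ?? node.OwnerDocument;
        Uri current = (doc == null || string.IsNullOrEmpty(doc.BaseURI))
            ? null
            : new Uri(doc.BaseURI);

        // Apply the collected values outermost first, as the reader above does.
        for (int i = values.Count - 1; i >= 0; i--)
            current = current == null ? new Uri(values[i]) : new Uri(current, values[i]);

        return current == null ? "" : current.AbsoluteUri;
    }
}
```

Calling XmlBaseForDom.GetBaseUri on a <file> element selected with SelectSingleNode should then give the value an xml:base-aware API would report for that node.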

OK, that was an abstract test; now consider some XSLT processing. I load files by name for some processing using the document() function. Recall that by default (with a single string argument) the document() function resolves relative URIs against the XSLT stylesheet's base URI (strictly speaking, against the base URI of the XSLT instruction which contains the document() call). To resolve URIs against some other base URI, the second argument is used. So I'm going to pass <file> elements to the document() function as the second argument, so that URIs are resolved relative to their base URI (which is defined via the xml:base attribute on their parent element <files>):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="files">
    <files>
      <xsl:apply-templates/>
    </files>
  </xsl:template>
  <xsl:template match="file">
    <xsl:copy-of select="document(@name, .)"/>
  </xsl:template>
</xsl:stylesheet>
The code is as simple as
XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XslTransform xslt = new XslTransform();
xslt.Load("foo.xsl");
xslt.Transform(doc, null, Console.Out);
The result is
<files>
  <para>File 1 content</para>
</files>
As you can see, when using XmlBaseAwareXmlTextReader with XPathDocument one can get XML Base support for XPath and XSLT.
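To see why the second argument matters, here is a rough sketch of the difference in plain C#. It only mimics the URI arithmetic with XmlUrlResolver.ResolveUri; the class name, method name and example.org URIs are hypothetical, and the real XSLT processor may take a different path internally.

```csharp
using System;
using System.Xml;

public static class DocumentFunctionSketch
{
    // Resolves href against the given base URI, roughly what document()
    // has to do before it can load the referenced file.
    public static Uri ResolveAgainst(string baseUri, string href)
    {
        XmlUrlResolver resolver = new XmlUrlResolver();
        return resolver.ResolveUri(new Uri(baseUri), href);
    }
}
```

Resolving "file1.xml" against a stylesheet at http://example.org/xslt/foo.xsl yields http://example.org/xslt/file1.xml, while resolving it against a node base URI of http://example.org/files/ yields http://example.org/files/file1.xml. That second result is the behavior the two-argument call asks for.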

Alternatively I could implement XmlBaseAwareXmlTextReader as an XmlReader rather than as an XmlTextReader (if you know the difference). XML Base can be implemented for XML writing in the same simple way, as XmlBaseAwareXmlTextWriter. Similar classes are used in XInclude.NET, and I'm also going to add XmlBaseAwareXmlTextReader and XmlBaseAwareXmlTextWriter to our collection of custom XML tools in the MVP.XML project.

Update: XmlBaseAwareXmlTextReader is now part of the Common module of the MVP.XML library.

Did you know? XSLT 1.0 and XSLT 2.0 can be mixed

I missed that point somehow: The trouble is that XSLT allows regions of a stylesheet to belong to different versions. In XSLT 1.0, you can put an xsl:version attribute on any literal result element to indicate the version of XSLT used in the content of that element. In XSLT 2.0 ...