October 27, 2004

XSL-FO to WordML stylesheet

Jirka Kosek has announced a tool (an XSLT stylesheet, actually) for converting XSL-FO documents to WordML. Get it at http://fo2wordml.sourceforge.net. ...

October 25, 2004

Implementing XML Base in .NET

XML Base is a tiny W3C Recommendation, just a couple of pages. It facilitates defining base URIs for parts of XML documents via the semantically predefined xml:base attribute (similar to the HTML BASE element). It's an XML Core spec, on a par with "Namespaces in XML" and the XML Infoset. Published back ...

So what is XML Base all about? It introduces the xml:base attribute with predefined semantics (just like xml:space or xml:lang) for manipulating base URIs. The xml:base attribute can be placed on any element in any XML document to give that element and its descendants a base URI other than the base URI of the document or external entity. One purpose is to provide a native XML way to define base URIs. Another is to keep relative URIs resolvable: when document A is included into document B at a different location, relative URIs in the content of A would otherwise break. An xml:base attribute on the included content keeps them identifying the same resources. If you still don't get it, take a look at the sample in the "Preserving Base URI" section of the "Combining XML Documents with XInclude" article at the MSDN Xml Dev Center. So it's basically XML's analog of HTML's BASE tag.
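To make the inclusion scenario concrete, here is a minimal sketch of the URI arithmetic involved, using nothing but System.Uri. The example.org URIs, the class name and the Resolve helper are made up for illustration; this only shows how a relative reference resolves against different base URIs, not how any particular parser applies xml:base.

```csharp
using System;

public static class XmlBaseResolutionSketch
{
    // Resolves a relative reference against whatever base URI is in scope.
    // (Hypothetical helper, not part of System.Xml.)
    public static string Resolve(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative).AbsoluteUri;
    }

    public static void Main()
    {
        // Document A on its own: the reference resolves next to A.
        Console.WriteLine(Resolve("http://example.org/docs/a.xml", "img/pic.png"));
        // prints http://example.org/docs/img/pic.png

        // A's content included into B, no xml:base: the reference now
        // resolves next to B and points at the wrong resource.
        Console.WriteLine(Resolve("http://example.org/other/b.xml", "img/pic.png"));
        // prints http://example.org/other/img/pic.png

        // Included content carries xml:base="http://example.org/docs/a.xml":
        // the base URI in scope is A's again, so the reference is intact.
        Console.WriteLine(Resolve("http://example.org/docs/a.xml", "img/pic.png"));
        // prints http://example.org/docs/img/pic.png
    }
}
```

The second call is the broken case described above; restoring the original base URI on the included element is all it takes to fix it.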

System.Xml supports base URIs all over the infrastructure; the only problem is that basic syntax-level facilities such as XmlTextReader and XmlTextWriter ignore the xml:base attribute when parsing and writing XML. Can we add such support in a transparent way? Sure. Let's take XmlTextReader and extend it so that each time it gets positioned on an element that bears an xml:base attribute, the BaseURI property is updated to reflect it. Here it is:

using System;
using System.Collections;
using System.Xml;

public class XmlBaseAwareXmlTextReader : XmlTextReader 
{
    private XmlBaseState _state = new XmlBaseState();
    private Stack _states = null;
    
    //Add more constructors as needed    
    public XmlBaseAwareXmlTextReader(string uri)
        : base(uri) 
    {
        _state.BaseUri = new Uri(base.BaseURI);
    }

    public override string BaseURI
    {
        get
        {
            return _state.BaseUri==null? "" : _state.BaseUri.AbsoluteUri;
        }
    }

    public override bool Read()
    {   
        bool baseRead = base.Read();
        if (baseRead) 
        {
            if (base.NodeType == XmlNodeType.Element &&
                base.HasAttributes) 
            {
                string baseAttr = GetAttribute("xml:base");
                if (baseAttr == null)
                    return baseRead;                
                Uri newBaseUri = null;
                if (_state.BaseUri == null)
                    newBaseUri = new Uri(baseAttr);        
                else
                    newBaseUri = new Uri(_state.BaseUri, baseAttr);                        
                if (_states == null)
                    _states = new Stack();
                //Push current state and allocate new one
                _states.Push(_state); 
                _state = new XmlBaseState(newBaseUri, base.Depth);
            }
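            else if (base.NodeType == XmlNodeType.EndElement) 
            {
                //_states stays null until the first xml:base is seen,
                //so check it before touching Count
                if (_states != null && _states.Count > 0 &&
                    base.Depth == _state.Depth) 
                {
                    //Pop previous state
                    _state = (XmlBaseState)_states.Pop();
                }
            }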
        }
        return baseRead;            
    }     
}

internal class XmlBaseState 
{
    public XmlBaseState() {}
    public XmlBaseState(Uri baseUri, int depth) 
    {
        this.BaseUri = baseUri;
        this.Depth = depth;
    }
    public Uri BaseUri;
    public int Depth;
}
Simple, huh? Now let's test it. Suppose I have a collection of XML documents in the "d:/Files" directory and a catalog XML file, such as
<catalog>
  <files xml:base="file:///d:/Files/">
    <file name="file1.xml"/>
  </files>
</catalog>
As you can see, the xml:base attribute here sets the base URI of the files element subtree to file:///d:/Files/, so file names are resolved relative to that folder no matter where the catalog file is actually placed. (Of course I could use absolute URIs instead, but hardcoding absolute URIs in every single place easily leads to a maintenance nightmare in any real system.)

Loading this document into an XPathDocument via XmlBaseAwareXmlTextReader shows that base URIs are preserved as per the XML Base spec:

XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/catalog");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
ni = nav.Select("/catalog/files/file");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
outputs
file:///D:/projects/Test/foo.xml
file:///d:/Files/
Unfortunately XmlDocument doesn't seem to be as smart as XPathDocument in this respect and only supports the base URI of the document and external entities. Too bad, too bad.

OK, that was an abstract test; now consider some XSLT processing: I load files by name using the document() function. Recall that by default (with a single argument) the document() function resolves relative URIs relative to the XSLT stylesheet's base URI (strictly speaking, relative to the base URI of the XSLT instruction that contains the document() call). To resolve URIs relative to some other base URI, the second argument is used. So I'm going to pass <file> elements to the document() function as the second argument, so that URIs are resolved relative to their base URI (which is defined via the xml:base attribute on their parent element <files>):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="files">
    <files>
      <xsl:apply-templates/>
    </files>
  </xsl:template>
  <xsl:template match="file">
    <xsl:copy-of select="document(@name, .)"/>
  </xsl:template>
</xsl:stylesheet>
The code is as simple as
XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XslTransform xslt = new XslTransform();
xslt.Load("foo.xsl");
xslt.Transform(doc, null, Console.Out);
The result is
<files>
  <para>File 1 content</para>
</files>
As you can see, using XmlBaseAwareXmlTextReader with XPathDocument gives you XML Base support for XPath and XSLT.

Alternatively I could implement XmlBaseAwareXmlTextReader as an XmlReader, not as an XmlTextReader (if you know the difference). And in the same simple way XML Base can be implemented for XML writing as XmlBaseAwareXmlTextWriter. Similar classes are used in XInclude.NET, and I'm also going to add XmlBaseAwareXmlTextReader and XmlBaseAwareXmlTextWriter to our collection of custom XML tools in the MVP.XML project.

Update: XmlBaseAwareXmlTextReader is now part of the Common module of the MVP.XML library.

Did you know? XSLT 1.0 and XSLT 2.0 can be mixed

I missed that point somehow: The trouble is that XSLT allows regions of a stylesheet to belong to different versions. In XSLT 1.0, you can put an xsl:version attribute on any literal result element to indicate the version of XSLT used in the content of that element. In XSLT 2.0 ...

October 24, 2004

Indenting attributes with XmlTextWriter

XmlTextWriter in .NET 1.X only supports indentation of the following node types: DocumentType, Element, Comment, ProcessingInstruction, and CDATA. No attributes. So how do you get attributes indented anyway? If you can, wait for .NET 2.0 with its cool XmlWriterSettings.NewLineOnAttributes; otherwise, here is a hack to get attributes indented with XmlTextWriter ...

Well, XmlWriter isn't a particularly low-level writer; it's an abstract, XML-oriented API, so its implementation XmlTextWriter won't let you just override the WriteStartAttribute() method and inject indentation characters before each attribute: that would be rejected as an attempt to write an attribute after content has already been written. But when instantiating XmlTextWriter on top of some TextWriter, one can inject the indentation before each attribute directly into that underlying TextWriter. It doesn't look particularly clean, but anyway:

using System.IO;
using System.Xml;

public class AttributeIndentingXmlTextWriter : XmlTextWriter 
{
    private TextWriter w;
    private int depth;

    //Add constructors as needed
    public AttributeIndentingXmlTextWriter(TextWriter w)
        : base(w) 
    {
        this.w = w;
    }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)    
    {
        depth ++;
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteFullEndElement()
    {
        depth--;
        base.WriteFullEndElement();
    }    

    public override void WriteEndElement()
    {
        depth--;
        base.WriteEndElement();
    }    

    public override void WriteStartAttribute(string prefix, 
        string localName, string ns)
    {
        if (base.Formatting == Formatting.Indented) 
        {   
            w.WriteLine();
            //Start at 1: XmlTextWriter itself writes one space before the attribute
            for (int i=1; i<Indentation*depth; i++)
                w.Write(IndentChar);
        }
        base.WriteStartAttribute(prefix, localName, ns);
    }
}
Usage:
XmlTextWriter w = 
  new AttributeIndentingXmlTextWriter(Console.Out);
w.Formatting = Formatting.Indented;
w.WriteStartDocument();
w.WriteStartElement("foo");
w.WriteAttributeString("attr1", "value1");
w.WriteAttributeString("attr2", "value2");
w.WriteAttributeString("attr3", "value3");
w.WriteStartElement("bar");
w.WriteAttributeString("attr1", "value1");
w.WriteAttributeString("attr2", "value2");
w.WriteAttributeString("attr3", "value3");
w.WriteString("some text");
w.WriteEndElement();
w.WriteEndElement();
w.WriteEndDocument();
The result is as follows:
<foo
  attr1="value1"
  attr2="value2"
  attr3="value3">
  <bar
    attr1="value1"
    attr2="value2"
    attr3="value3">some text</bar>
</foo>
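
For comparison, the .NET 2.0 route mentioned at the top needs no subclassing at all. The following is a rough sketch, assuming .NET 2.0 or later; the class name and the Write helper are made up for illustration, and the exact whitespace of the output is up to the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NewLineOnAttributesSketch
{
    // Writes a small document with each attribute on its own line.
    // (Hypothetical helper for illustration.)
    public static string Write()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;               // indent elements
        settings.NewLineOnAttributes = true;  // put each attribute on a new line
        settings.OmitXmlDeclaration = true;   // keep the output short

        StringWriter sw = new StringWriter();
        using (XmlWriter w = XmlWriter.Create(sw, settings))
        {
            w.WriteStartElement("foo");
            w.WriteAttributeString("attr1", "value1");
            w.WriteAttributeString("attr2", "value2");
            w.WriteElementString("bar", "some text");
            w.WriteEndElement();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Write());
    }
}
```

Unlike the hack above, this goes through XmlWriter.Create() rather than XmlTextWriter, so it is not an option on .NET 1.X.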

Samples are templates

DonXML writes on viral coding examples in presentations on using XML in .NET: Joe Fawcett (fellow XML MVP) came across a great example (from the Microsoft.Public.Xml newsgroup) of one of my biggest pet peeves, "We (the community) are doing a very poor job teaching the average developer how to use ...