May 30, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0

When developing a custom XmlReader or XmlWriter in .NET 2.0, there are at least three options:

  1. implement XmlReader/XmlWriter from scratch
  2. extend one of the concrete XmlReader/XmlWriter implementations and override only the methods you need
  3. implement XmlReader/XmlWriter by wrapping one of the concrete XmlReader/XmlWriter implementations and overriding only the methods you need

...

The first way is for full-blown XmlReaders/XmlWriters, which need to implement every aspect of XML reading/writing in a different way, e.g. XmlCsvReader, which reads CSV as XML, or XslReader, which streamlines XSLT output. This is the cleanest but also the hardest way: XmlReader has 26 abstract members (and XmlWriter has 24) that you would have to implement. The second and third options are for override-style custom readers/writers. When you only need to partially modify XmlReader/XmlWriter behavior, reimplementing it from scratch is overkill. As this is the most common scenario, I'll concentrate on these two options.

OOP taught us to inherit from the class whose behaviour you want to modify and override its virtual members. That's the easiest and most popular way of writing a custom XmlReader/XmlWriter. For example, let's say you are receiving XML documents from your business partners and one particularly annoying partner keeps sending the element <value>, while according to the schema you expect it to be <price>. So you need some renaming plumbing: an XmlReader that reads <value> as <price> and an XmlWriter that writes <price> as <value>. Here are implementations extending the standard XmlTextReader and XmlTextWriter classes:

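using System.IO;   // TextWriter, used by RenamingXmlWriter
using System.Xml;

public class RenamingXmlReader : XmlTextReader
{
    //Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(file) { }

    // True for an unprefixed <value> start tag or </value> end tag.
    // End tags are included so both tags of the element report the same name.
    private bool IsRenamedElement
    {
        get
        {
            return (NodeType == XmlNodeType.Element ||
                NodeType == XmlNodeType.EndElement) &&
                base.Name == "value";
        }
    }

    public override string Name
    {
        get { return IsRenamedElement ? "price" : base.Name; }
    }

    public override string LocalName
    {
        get { return IsRenamedElement ? "price" : base.LocalName; }
    }
}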

public class RenamingXmlWriter : XmlTextWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(output) { }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}
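
To see the renaming reader in action without touching the file system, here is a minimal sketch. StringRenamingXmlReader and RenamingDemo are hypothetical names made up for this illustration; the reader is just a variant of the class above with a TextReader constructor, so that it can be fed from a string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical variant of the renaming reader that reads from any TextReader.
public class StringRenamingXmlReader : XmlTextReader
{
    public StringRenamingXmlReader(TextReader input)
        : base(input) { }

    // True when positioned on an unprefixed <value> start or end tag.
    private bool IsRenamedElement
    {
        get
        {
            return (NodeType == XmlNodeType.Element ||
                NodeType == XmlNodeType.EndElement) &&
                base.Name == "value";
        }
    }

    public override string Name
    {
        get { return IsRenamedElement ? "price" : base.Name; }
    }

    public override string LocalName
    {
        get { return IsRenamedElement ? "price" : base.LocalName; }
    }
}

public static class RenamingDemo
{
    // Returns the names of all start tags as seen through the renaming reader.
    public static List<string> ElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (XmlReader reader = new StringRenamingXmlReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

Calling RenamingDemo.ElementNames("<order><value>10</value></order>") should yield "order" followed by "price", i.e. the consumer never sees the partner's <value> element name.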

Looks nice, but there are a couple of serious drawbacks to this approach:
  1. XmlTextReader and XmlTextWriter are kinda legacy classes. They were introduced in .NET 1.0 with some deviations from the XML standard and, as usually happens, Microsoft now has to support those deviations for the sake of backwards compatibility. That means there are some aspects where XmlTextReader/XmlTextWriter behave differently from an XmlReader/XmlWriter created via the Create() method. In short, the Create() method creates more conformant XmlReader/XmlWriter instances than XmlTextReader/XmlTextWriter. E.g. XmlTextWriter does not check for the following:
    • Invalid characters in attribute and element names.
    • Unicode characters that do not fit the specified encoding. If the Unicode characters do not fit the specified encoding, the XmlTextWriter does not escape the Unicode characters into character entities.
    • Duplicate attributes.
    • Characters in the DOCTYPE public identifier or system identifier.
    And XmlTextReader doesn't check the character range for numeric entities and so allows &#0; unless the Normalization property (off by default) is turned on, etc.

    So beware that when deriving from XmlTextReader/XmlTextWriter you are gonna inherit all their crappy legacy weirdness too.

  2. Since .NET 1.1 XmlTextReader has a FullTrust inheritance demand, i.e. only fully trusted classes can derive from XmlTextReader. That means the RenamingXmlReader above won't work in a partially trusted environment such as ASP.NET or ClickOnce. I ran into this unpleasant issue with my free eXml Web server control: since XmlBaseAwareXmlTextReader derives from XmlTextReader (just because I was too lazy when creating it), the eXml control cannot do XInclude unless working under FullTrust.
So actually I wouldn't recommend deriving from XmlTextReader/XmlTextWriter.
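
To make the first drawback a bit more concrete, here is a minimal sketch comparing the two kinds of writers on one of the listed checks: invalid characters in element names. ConformanceDemo and its method names are made up for this illustration. As far as I know the writer returned by Create() throws an ArgumentException here, but the sketch also catches XmlException in case an implementation reports the problem that way.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ConformanceDemo
{
    // Writes an element whose name contains a space (illegal in XML)
    // with the legacy XmlTextWriter and returns whatever it produced.
    public static string LegacyWriterOutput()
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.WriteStartElement("bad name");
        writer.WriteEndElement();
        writer.Close();
        return output.ToString();
    }

    // Tries the same thing with a writer obtained from XmlWriter.Create()
    // and reports whether the invalid name was rejected.
    public static bool CreatedWriterRejectsBadName()
    {
        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            try
            {
                writer.WriteStartElement("bad name");
                return false;
            }
            catch (ArgumentException)
            {
                return true;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }
}
```

With the legacy writer the bogus name goes straight into the output, so you end up with a document no conformant parser will accept, while the writer created via Create() refuses to write it in the first place.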

The third approach is to wrap an XmlReader/XmlWriter created via the Create() method and override the methods you need. This approach is used by .NET itself. It requires a little more code, but as a result you get a clean design and an easily composable implementation. I'll cover it tomorrow 'cause Tel-Aviv traffic jams are waiting for me now.