May 31, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0, Part 2

This is second part of the post. Find first part here. So what is a better way of creating custom XmlReader/XmlWriter in .NET 2.0? Here is the idea - have an utility wrapper class, which wraps XmlReader/XmlWriter and does nothing else. Then derive from this class and override methods you ...

Btw, most likely the Mvp.Xml project will be rehosted to the CodePlex. SourceForge is ok, but the idea is that more Microsoft-friendly environment will induce more contributors both XML MVPs and others.

+ XmlWrappingReader class.

+ XmlWrappingWriter class.

Now instead of deriving from legacy XmlTextReader/XmlTextWriter you derive from XmlWrappingReader/XmlWrappingWriter passing to their constructors XmlReader/XmlWriter created via Create() factory method:
public class RenamingXmlReader : XmlWrappingReader
{
    //Provide as many contructors as you need
    public RenamingXmlReader(string file)
        : base(XmlReader.Create(file)) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}
public class RenamingXmlWriter : XmlWrappingWriter
{
    //Provide as many contructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(XmlWriter.Create(output)) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

That's it. Not much different from previous option in terms of coding, but free of weird XmlTextReader/XmlTextWriter legacy.

AFAIK there is still one problem with this approach though, which is DTD validation. I mean cascading validating XmlReader on top of custom XmlReader scenario. E.g. if you need to resolve XInclude or rename couple of elements and validate resulting XML against DTD in one shot. In .NET 2.0 if you want to DTD validate XML that comes from another XmlReader that reader must be an instance of XmlTextReader. That's undocumented limitation and it was left sort of deliberatly - after all who cares about DTD nowadays? XML Schema validation is not affected by this limitation.

May 30, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0

When developing custom XmlReader or XmlWriter in .NET 2.0 there is at least three options: implement XmlReader/XmlWriter extend one of concrete XmlReader/XmlWriter implementations and override only methods you need implement XmlReader/XmlWriter by wrapping one of concrete XmlReader/XmlWriter implementations and overriding only methods you need ...

First way is a for full-blown XmlReader/XmlWriters, which need to implement each aspects of XML reading/writing in a different way, e.g. XmlCsvReader, which reads CSV as XML or XslReader, which streamlines XSLT output. This is the most clean while the hardest way - XmlReader has 26 (and XmlWriter - 24) abstract members you would have to implement. Second and third options are for override-style custom readers/writers. When you only need to partially modify XmlReader/XmlWriter behavior it's an overkill to reimplement it from scratch. As this is the most usual scenario I'll concentrate on these two options.

OOP taught us - inherit the class whose behaviour you want to modify and override its virtual members. That's the easiest and most popular way of writing custom XmlReader/XmlWriter. For example let's say you are receiving XML documents from your business partners and one particularly annoying one keeps sending element <value>, while according to the schema you expect it to be <price>. So you need renaming plumbing - XmlReader that reads <value> as <price> and XmlWriter that writes <price> as <value>. Here are implementations extending standard XmlTextReader and XmlTextWriter classes:

public class RenamingXmlReader : XmlTextReader
{
    //Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(file) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}

public class RenamingXmlWriter : XmlTextWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(output) { }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}
Looks nice, but there is a couple of serious drawbacks in this approach though:
  1. XmlTextReader and XmlTextWriter are kinda legacy classes. They were introduced in .NET 1.0 with some deviations from the XML standard and as it usually happens now Microsoft have to support those deviations for the sake of backwards compatibility. That means that there some aspects where XmlTextReader/XmlTextWriter and XmlReader/XmlWriter created via Create() method behave differently, in short - Create() method creates more conformant XmlReader/XmlWriter instances than XmlTextReader/XmlTextWriter, e.g. XmlTextWriter does not check for the following:
    • Invalid characters in attribute and element names.
    • Unicode characters that do not fit the specified encoding. If the Unicode characters do not fit the specified encoding, the XmlTextWriter does not escape the Unicode characters into character entities.
    • Duplicate attributes.
    • Characters in the DOCTYPE public identifier or system identifier.
    And XmlTextReader doesn't check character range for numeric entities and so allows � unless Normalization property (off by default) is turned on etc.

    So beware that when deriving from XmlTextReader/XmlTextWriter you are gonna inherit all their crappy legacy weirdnesses too.

  2. Since .NET 1.1 XmlTextReader has FullTrust inheritance demand, i.e. only fully trusted classes can derive from XmlTextReader. That means that above RenaminXmlReader won't work in partially trusted environment such as ASP.NET or ClickOnce. I run into this unpleasant issue with my free eXml Web server control - since XmlBaseAwareXmlTextReader derives from XmlTextReader (just because I was too lazy when creating it) eXml control cannot do XInclude unless working under FullTrust.
So actually I wouldn't recommend deriving from XmlTextReader/XmlTextWriter.

A third approach is to wrap XmlReader/XmlWriter created via Create() method and override methods you need. This approach is used by .NET itself. This requires a little bit more code, but as a result you get clean design and easily composable implementation. I'll cover it tomorrow 'cause Tel-Aviv traffic jams are waiting for me now.

May 29, 2006

On XmlReader/XmlWriter design changes in .NET 2.0

From .NET 1.X experience Microsoft seems finally figured out that providing a set of concrete poorly composable XmlReader and XmlWriter implementations (XmlTextReader, XmlTextWriter, XmlValidatingReader, XmlNodeReader) and emphasizing on programming with concrete classes instead of anstract XmlReader/Xmlwriter was really bad idea. One notorious horrible sample was XmlValidatingReader accepting abstract XmlReader instance ...

Well, unfortunately when you do something wrong for years you can't fix things "in a one swift stroke". Sadly, but MSDN documentation itself (even System.Xml namespace documentation!) is still full of samples using "not recommended" XmlTextReader and XmlTextWriters. And if you don't follow you words youself why would others do? Reality is that .NET developers didn't notice that shy recommendation and keep programming into concrete XML reader/writer classes.

So it wasn't a surprise for me to see fresh MSDN Magazine (06/2006) article "Five Ways to Emit Test Results as XML" promoting using XmlTextWriter class once again and not even mentioning XmlWriter.Create() pattern. Hey, I thought it's a requirement for MSDN authors to read MSDN documentation on a subject? Apparently it's not.

Another thing that bothers me is that XmlReader/XmlWriter factory has no extensibility points whatsoever. So that's not an option for non-Microsoft XML parsers and writers like XIncludingReader or MultiXmlTextWriter to be created in that factory. There is no way to plug in support for XML Base into a system without writing actual code that instantiates XmlBaseAwareXmlTextReader. Microsoft still considers providing abstract factory for XML readers/writers too geeky and similar (in terms or weirdness) to providing abstract factory for string class. Well, it probably would take another couple of .NET versions to grow up before .NET would come to extensibility Java already has for years.