On creating custom XmlReaders/XmlWriters in .NET 2.0

When developing a custom XmlReader or XmlWriter in .NET 2.0 there are at least three options:

1. Implement XmlReader/XmlWriter from scratch.
2. Extend one of the concrete XmlReader/XmlWriter implementations and override only the methods you need.
3. Implement XmlReader/XmlWriter by wrapping one of the concrete XmlReader/XmlWriter implementations and overriding only the methods you need.

The first way is for full-blown XmlReaders/XmlWriters that need to implement every aspect of XML reading/writing in a different way, e.g. XmlCsvReader, which reads CSV as XML, or XslReader, which streamlines XSLT output. This is the cleanest but also the hardest way: XmlReader has 26 (and XmlWriter 24) abstract members you would have to implement. The second and third options are for override-style custom readers/writers. When you only need to partially modify XmlReader/XmlWriter behavior, reimplementing it from scratch is overkill. As this is the most usual scenario, I'll concentrate on these two options.

OOP taught us: inherit the class whose behaviour you want to modify and override its virtual members. That's the easiest and most popular way of writing a custom XmlReader/XmlWriter. For example, let's say you are receiving XML documents from your business partners and one particularly annoying one keeps sending a <value> element where, according to the schema, you expect <price>. So you need some renaming plumbing: an XmlReader that reads <value> as <price> and an XmlWriter that writes <price> as <value>. Here are implementations extending the standard XmlTextReader and XmlTextWriter classes:

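using System.IO;
using System.Xml;

public class RenamingXmlReader : XmlTextReader
{
    // Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(file) { }

    // True when positioned on a start or end tag. Both are renamed so that
    // <price> is not reported as being closed by </value>.
    private bool IsOnElementTag
    {
        get
        {
            return NodeType == XmlNodeType.Element ||
                NodeType == XmlNodeType.EndElement;
        }
    }

    public override string Name
    {
        get
        {
            return (IsOnElementTag && base.Name == "value") ?
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (IsOnElementTag && base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}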

public class RenamingXmlWriter : XmlTextWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(output) { }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

Looks nice, but there are a couple of serious drawbacks to this approach:

1. XmlTextReader and XmlTextWriter are kinda legacy classes. They were introduced in .NET 1.0 with some deviations from the XML standard and, as usually happens, Microsoft now has to support those deviations for the sake of backwards compatibility. That means there are some aspects where XmlTextReader/XmlTextWriter and an XmlReader/XmlWriter created via the Create() method behave differently. In short, Create() produces more conformant XmlReader/XmlWriter instances than XmlTextReader/XmlTextWriter. E.g. XmlTextWriter does not check for the following:
- Invalid characters in attribute and element names.
- Unicode characters that do not fit the specified encoding. If the Unicode characters do not fit the specified encoding, the XmlTextWriter does not escape the Unicode characters into character entities.
- Duplicate attributes.
- Characters in the DOCTYPE public identifier or system identifier.
And XmlTextReader doesn't check the character range for numeric entities, and so allows &#0; unless the Normalization property (off by default) is turned on, etc.
So beware that when deriving from XmlTextReader/XmlTextWriter you are gonna inherit all their crappy legacy weirdnesses too.
2. Since .NET 1.1 XmlTextReader has a FullTrust inheritance demand, i.e. only fully trusted classes can derive from XmlTextReader. That means the RenamingXmlReader above won't work in a partially trusted environment such as ASP.NET or ClickOnce. I ran into this unpleasant issue with my free eXml Web server control: since XmlBaseAwareXmlTextReader derives from XmlTextReader (just because I was too lazy when creating it), the eXml control cannot do XInclude unless it is running under FullTrust.
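
To make the conformance drawback a bit more tangible, here is a small sketch comparing how the two kinds of writer treat an element name containing a space, which is not a legal XML name. The class and method names (ConformanceDemo, WriteWithXmlTextWriter, CreateWriterRejects) are made up for this illustration, and it only exercises the first item of the list above.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class ConformanceDemo
{
    // Writes a single empty element with the legacy XmlTextWriter
    // and returns whatever text it produced.
    public static string WriteWithXmlTextWriter(string elementName)
    {
        StringWriter sw = new StringWriter();
        using (XmlTextWriter writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement(elementName);
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Tries to start the same element on a writer obtained from
    // XmlWriter.Create(); returns true if the writer refused the name.
    public static bool CreateWriterRejects(string elementName)
    {
        XmlWriter writer = XmlWriter.Create(new StringWriter());
        try
        {
            writer.WriteStartElement(elementName);
            return false;
        }
        catch (ArgumentException)
        {
            // The conformant writer validates element names up front.
            return true;
        }
    }
}
```

Calling ConformanceDemo.WriteWithXmlTextWriter("bad name") should hand back markup that no XML parser will accept, while ConformanceDemo.CreateWriterRejects("bad name") should report that the Create()-based writer refused to write it.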

So actually I wouldn't recommend deriving from XmlTextReader/XmlTextWriter.

The third approach is to wrap an XmlReader/XmlWriter created via the Create() method and override the methods you need. This approach is used by .NET itself. It requires a little bit more code, but as a result you get a clean design and an easily composable implementation. I'll cover it tomorrow 'cause Tel-Aviv traffic jams are waiting for me now.
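
As a rough preview of the wrapping idea, here is a minimal sketch of what such a writer might look like. It is an assumption about the general shape, not the implementation from the follow-up post: DelegatingXmlWriter, PriceToValueWriter and WrappingDemo are hypothetical names, and only the writer side is shown. The base class forwards every abstract XmlWriter member to an inner writer, so a subclass overrides just the one method it cares about.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical base class: forwards everything to the wrapped writer.
public class DelegatingXmlWriter : XmlWriter
{
    private readonly XmlWriter inner;

    public DelegatingXmlWriter(XmlWriter inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public override WriteState WriteState { get { return inner.WriteState; } }
    public override void Close() { inner.Close(); }
    public override void Flush() { inner.Flush(); }
    public override string LookupPrefix(string ns) { return inner.LookupPrefix(ns); }
    public override void WriteStartDocument() { inner.WriteStartDocument(); }
    public override void WriteStartDocument(bool standalone) { inner.WriteStartDocument(standalone); }
    public override void WriteEndDocument() { inner.WriteEndDocument(); }
    public override void WriteDocType(string name, string pubid, string sysid, string subset) { inner.WriteDocType(name, pubid, sysid, subset); }
    public override void WriteStartElement(string prefix, string localName, string ns) { inner.WriteStartElement(prefix, localName, ns); }
    public override void WriteEndElement() { inner.WriteEndElement(); }
    public override void WriteFullEndElement() { inner.WriteFullEndElement(); }
    public override void WriteStartAttribute(string prefix, string localName, string ns) { inner.WriteStartAttribute(prefix, localName, ns); }
    public override void WriteEndAttribute() { inner.WriteEndAttribute(); }
    public override void WriteCData(string text) { inner.WriteCData(text); }
    public override void WriteComment(string text) { inner.WriteComment(text); }
    public override void WriteProcessingInstruction(string name, string text) { inner.WriteProcessingInstruction(name, text); }
    public override void WriteEntityRef(string name) { inner.WriteEntityRef(name); }
    public override void WriteCharEntity(char ch) { inner.WriteCharEntity(ch); }
    public override void WriteWhitespace(string ws) { inner.WriteWhitespace(ws); }
    public override void WriteString(string text) { inner.WriteString(text); }
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) { inner.WriteSurrogateCharEntity(lowChar, highChar); }
    public override void WriteChars(char[] buffer, int index, int count) { inner.WriteChars(buffer, index, count); }
    public override void WriteRaw(char[] buffer, int index, int count) { inner.WriteRaw(buffer, index, count); }
    public override void WriteRaw(string data) { inner.WriteRaw(data); }
    public override void WriteBase64(byte[] buffer, int index, int count) { inner.WriteBase64(buffer, index, count); }
}

// Hypothetical renaming writer: same rule as RenamingXmlWriter above,
// but layered over any XmlWriter instead of deriving from XmlTextWriter.
public class PriceToValueWriter : DelegatingXmlWriter
{
    public PriceToValueWriter(XmlWriter inner) : base(inner) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

public static class WrappingDemo
{
    // Writes <item><price>9.99</price></item> through the wrapper
    // and returns the text that reached the underlying writer.
    public static string Run()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        StringWriter sw = new StringWriter();
        using (XmlWriter writer = new PriceToValueWriter(XmlWriter.Create(sw, settings)))
        {
            writer.WriteStartElement("item");
            writer.WriteElementString("price", "9.99");
            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

Because the inner writer comes from XmlWriter.Create(), the wrapper picks up its conformance checks, and several such wrappers can be stacked on top of each other.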

3 Comments

Igor | October 18, 2006 11:14 AM | Reply

Oleg, what can I say but.... thanks!!!
You just saved my day. I haven't looked at this blog to see if you posted a reply for a while now, and as I was actively frustrated about the workaround I made (transforming to MemoryStream, converting to string, adding the doctype manually and then binary writing the final string to response) I thought I'd have a look here again and... you have an answer, and it works perfectly!
Thanks again.

Oleg Tkachenko | September 28, 2006 4:48 PM | Reply

Igor, it's a bit tricky. XslCompiledTransform stores information about output doctype in the OutputSettings property. You have to pass this value to your custom XmlWriter:

1. Have a constructor accepting XmlWriterSettings in your custom XmlWriter class:

public class XhtmlWriter : XmlWrappingWriter
{
    // The settings are handed to the wrapped writer, which is what
    // actually emits the doctype.
    public XhtmlWriter(TextWriter output, XmlWriterSettings settings)
        : base(XmlWriter.Create(output, settings)) { }
}

2. Then when transforming, pass XslCompiledTransform.OutputSettings property to your custom XmlWriter constructor:

xslt.Transform(doc, null, new XhtmlWriter(Console.Out, xslt.OutputSettings));

That's it.
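
For anyone who wants to see the effect in isolation, the following is a minimal sketch that leaves the custom writer out entirely and only compares a plain XmlWriter with one created from XslCompiledTransform.OutputSettings. The stylesheet, the input document and the OutputSettingsDemo name are assumptions made up for the illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical demo class, not part of the original comment.
public static class OutputSettingsDemo
{
    // Made-up stylesheet that requests an XHTML doctype via xsl:output.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' doctype-public='-//W3C//DTD XHTML 1.0 Strict//EN' " +
        "doctype-system='http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'/>" +
        "<xsl:template match='/'><html><body/></html></xsl:template>" +
        "</xsl:stylesheet>";

    // Runs the transformation into an XmlWriter that is created either
    // with or without the stylesheet's OutputSettings.
    public static string Transform(bool passOutputSettings)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xsl);
        }

        StringWriter sw = new StringWriter();
        XmlWriter writer = passOutputSettings
            ? XmlWriter.Create(sw, xslt.OutputSettings)
            : XmlWriter.Create(sw);
        using (writer)
        using (XmlReader input = XmlReader.Create(new StringReader("<doc/>")))
        {
            xslt.Transform(input, writer);
        }
        return sw.ToString();
    }
}
```

Comparing OutputSettingsDemo.Transform(true) with OutputSettingsDemo.Transform(false) should show the doctype declaration only in the first result, which is the behaviour the two steps above rely on.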

Igor | September 18, 2006 11:21 AM | Reply

I am trying to create an XmlWriter for .net 2.0 that outputs valid xhtml.
The problem with using XslCompiledTransform and a stylesheet that has output method set to 'xml' is that it 'helpfully' collapses all tags without content, eg: <script...></script> becomes <script />. This in turn totally breaks the page when opened in the browser.
So I implemented my own XmlWriter, deriving from XmlTextWriter and overriding WriteStartElement and WriteEndElement. It defines the tags that must have the closing tags, and ensures that they indeed get the full closing tag. This much works perfectly. But for some reason, this derived XmlTextWriter did not output the <?xml... ?> processing instruction, nor the doctype section.
After some digging around I found your post about XmlWrappingWriter and changed my code to derive from this class, rather than from XmlTextWriter. As a result, the xml processing instruction now does get written, but I still cannot get it to output the doctype declaration. I find this all quite confusing; one would expect that by overriding the appropriate methods the desired handling of output can be achieved, but the WriteDocType method never gets called with my custom XmlWriter. Do you have any idea about what is going on here?
