May 31, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0, Part 2

This is the second part of the post. Find the first part here. So what is a better way of creating a custom XmlReader/XmlWriter in .NET 2.0? Here is the idea: have a utility wrapper class that wraps an XmlReader/XmlWriter and does nothing else. Then derive from this class and override the methods you ...

Btw, most likely the Mvp.Xml project will be rehosted on CodePlex. SourceForge is ok, but the idea is that a more Microsoft-friendly environment will attract more contributors, both XML MVPs and others.

+ XmlWrappingReader class.

+ XmlWrappingWriter class.

Now, instead of deriving from the legacy XmlTextReader/XmlTextWriter, you derive from XmlWrappingReader/XmlWrappingWriter and pass to their constructors an XmlReader/XmlWriter created via the Create() factory method:

public class RenamingXmlReader : XmlWrappingReader
{
    //Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(XmlReader.Create(file)) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}

public class RenamingXmlWriter : XmlWrappingWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(XmlWriter.Create(output)) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

That's it. Not much different from the previous option in terms of coding, but free of the weird XmlTextReader/XmlTextWriter legacy.
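To make the idea concrete, here is a rough sketch of what such a do-nothing wrapper could look like. This is my illustration, not the Mvp.Xml source: the names WrappingReaderSketch, RenamingReader and WrappingDemo are made up for this example, and it uses current C# syntax for brevity. Every member simply forwards to the wrapped reader, so a subclass overrides only what it wants to change.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical minimal wrapper (NOT the Mvp.Xml XmlWrappingReader):
// every member just delegates to the wrapped reader.
public class WrappingReaderSketch : XmlReader
{
    private readonly XmlReader inner;

    protected WrappingReaderSketch(XmlReader inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public override XmlNodeType NodeType => inner.NodeType;
    public override string Name => inner.Name;
    public override string LocalName => inner.LocalName;
    public override string NamespaceURI => inner.NamespaceURI;
    public override string Prefix => inner.Prefix;
    public override bool HasValue => inner.HasValue;
    public override string Value => inner.Value;
    public override int Depth => inner.Depth;
    public override string BaseURI => inner.BaseURI;
    public override bool IsEmptyElement => inner.IsEmptyElement;
    public override int AttributeCount => inner.AttributeCount;
    public override bool EOF => inner.EOF;
    public override ReadState ReadState => inner.ReadState;
    public override XmlNameTable NameTable => inner.NameTable;

    public override string GetAttribute(int i) => inner.GetAttribute(i);
    public override string GetAttribute(string name) => inner.GetAttribute(name);
    public override string GetAttribute(string name, string ns) => inner.GetAttribute(name, ns);
    public override bool MoveToAttribute(string name) => inner.MoveToAttribute(name);
    public override bool MoveToAttribute(string name, string ns) => inner.MoveToAttribute(name, ns);
    public override bool MoveToFirstAttribute() => inner.MoveToFirstAttribute();
    public override bool MoveToNextAttribute() => inner.MoveToNextAttribute();
    public override bool MoveToElement() => inner.MoveToElement();
    public override bool ReadAttributeValue() => inner.ReadAttributeValue();
    public override bool Read() => inner.Read();
    public override string LookupNamespace(string prefix) => inner.LookupNamespace(prefix);
    public override void ResolveEntity() => inner.ResolveEntity();
    public override void Close() => inner.Close();
}

// Same renaming rule as in the post: an unprefixed <value> start tag is reported as <price>.
public class RenamingReader : WrappingReaderSketch
{
    public RenamingReader(XmlReader inner) : base(inner) { }

    public override string Name =>
        NodeType == XmlNodeType.Element && base.Name == "value" ? "price" : base.Name;

    public override string LocalName =>
        NodeType == XmlNodeType.Element && base.LocalName == "value" ? "price" : base.LocalName;
}

public static class WrappingDemo
{
    // Returns the name of the document element as seen through the renaming reader.
    public static string FirstElementName(string xml)
    {
        using (XmlReader reader = new RenamingReader(XmlReader.Create(new StringReader(xml))))
        {
            reader.MoveToContent();
            return reader.Name;
        }
    }
}
```

Because the constructor takes any XmlReader, such readers stack freely: the inner reader can itself be another wrapper, which is the composability XmlTextReader-derived classes lack.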

AFAIK there is still one problem with this approach though, which is DTD validation. I mean the scenario of cascading a validating XmlReader on top of a custom XmlReader, e.g. if you need to resolve XInclude or rename a couple of elements and validate the resulting XML against a DTD in one shot. In .NET 2.0, if you want to DTD-validate XML that comes from another XmlReader, that reader must be an instance of XmlTextReader. That's an undocumented limitation and it was left in sort of deliberately - after all, who cares about DTD nowadays? XML Schema validation is not affected by this limitation.

May 30, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0

When developing a custom XmlReader or XmlWriter in .NET 2.0 there are at least three options:
  1. implement XmlReader/XmlWriter
  2. extend one of the concrete XmlReader/XmlWriter implementations and override only the methods you need
  3. implement XmlReader/XmlWriter by wrapping one of the concrete XmlReader/XmlWriter implementations and overriding only the methods you need ...

The first way is for full-blown XmlReaders/XmlWriters, which need to implement every aspect of XML reading/writing in a different way, e.g. XmlCsvReader, which reads CSV as XML, or XslReader, which streamlines XSLT output. This is the cleanest but also the hardest way - XmlReader has 26 (and XmlWriter 24) abstract members you would have to implement. The second and third options are for override-style custom readers/writers. When you only need to partially modify XmlReader/XmlWriter behavior, it's overkill to reimplement it from scratch. As this is the most common scenario, I'll concentrate on these two options.

OOP taught us - inherit the class whose behaviour you want to modify and override its virtual members. That's the easiest and most popular way of writing a custom XmlReader/XmlWriter. For example, let's say you are receiving XML documents from your business partners and one particularly annoying partner keeps sending a <value> element, while according to the schema you expect it to be <price>. So you need some renaming plumbing - an XmlReader that reads <value> as <price> and an XmlWriter that writes <price> as <value>. Here are implementations extending the standard XmlTextReader and XmlTextWriter classes:

public class RenamingXmlReader : XmlTextReader
{
    //Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(file) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}

public class RenamingXmlWriter : XmlTextWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(output) { }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

Looks nice, but this approach has a couple of serious drawbacks:
  1. XmlTextReader and XmlTextWriter are kinda legacy classes. They were introduced in .NET 1.0 with some deviations from the XML standard and, as usually happens, Microsoft now has to support those deviations for the sake of backwards compatibility. That means there are some aspects where XmlTextReader/XmlTextWriter and an XmlReader/XmlWriter created via the Create() method behave differently. In short, the Create() method creates more conformant XmlReader/XmlWriter instances than XmlTextReader/XmlTextWriter, e.g. XmlTextWriter does not check for the following:
    • Invalid characters in attribute and element names.
    • Unicode characters that do not fit the specified encoding. If the Unicode characters do not fit the specified encoding, the XmlTextWriter does not escape the Unicode characters into character entities.
    • Duplicate attributes.
    • Characters in the DOCTYPE public identifier or system identifier.
    And XmlTextReader doesn't check the character range for numeric entities, and so allows &#0; unless the Normalization property (off by default) is turned on, etc.

    So beware that when deriving from XmlTextReader/XmlTextWriter you are gonna inherit all their crappy legacy weirdnesses too.

  2. Since .NET 1.1 XmlTextReader has had a FullTrust inheritance demand, i.e. only fully trusted classes can derive from XmlTextReader. That means that the above RenamingXmlReader won't work in a partially trusted environment such as ASP.NET or ClickOnce. I ran into this unpleasant issue with my free eXml Web server control - since XmlBaseAwareXmlTextReader derives from XmlTextReader (just because I was too lazy when creating it), the eXml control cannot do XInclude unless working under FullTrust.

So actually I wouldn't recommend deriving from XmlTextReader/XmlTextWriter.
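The first drawback is easy to try out. Below is a small probe, a sketch only, that feeds the same duplicate attribute to both kinds of writer and reports whether the writer objected. The class and method names are made up for illustration. Going by the list above, XmlTextWriter is expected to let the duplicate through, while the writer from Create() should refuse it; check the legacy result on your own runtime rather than taking my word for it.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriterConformanceProbe
{
    // Tries to write <root a="1" a="2" /> and reports whether the writer rejected it.
    public static bool RejectsDuplicateAttribute(XmlWriter writer)
    {
        try
        {
            writer.WriteStartElement("root");
            writer.WriteAttributeString("a", "1");
            writer.WriteAttributeString("a", "2"); // same attribute name again
            writer.WriteEndElement();
            writer.Flush();
            return false; // nothing complained: not well-formed XML went out
        }
        catch (Exception e) when (e is XmlException || e is ArgumentException || e is InvalidOperationException)
        {
            return true; // the writer enforced well-formedness
        }
    }

    // Convenience wrappers so both flavours can be probed with one call each.
    public static bool LegacyWriterRejects()
    {
        return RejectsDuplicateAttribute(new XmlTextWriter(new StringWriter()));
    }

    public static bool CreatedWriterRejects()
    {
        return RejectsDuplicateAttribute(XmlWriter.Create(new StringWriter()));
    }
}
```

Printing LegacyWriterRejects() next to CreatedWriterRejects() shows the difference in one line of output.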

A third approach is to wrap XmlReader/XmlWriter created via Create() method and override methods you need. This approach is used by .NET itself. This requires a little bit more code, but as a result you get clean design and easily composable implementation. I'll cover it tomorrow 'cause Tel-Aviv traffic jams are waiting for me now.

May 29, 2006

On XmlReader/XmlWriter design changes in .NET 2.0

From the .NET 1.X experience Microsoft seems to have finally figured out that providing a set of concrete, poorly composable XmlReader and XmlWriter implementations (XmlTextReader, XmlTextWriter, XmlValidatingReader, XmlNodeReader) and emphasizing programming with concrete classes instead of the abstract XmlReader/XmlWriter was a really bad idea. One notoriously horrible example was XmlValidatingReader accepting an abstract XmlReader instance ...

Well, unfortunately when you do something wrong for years you can't fix things "in one swift stroke". Sadly, the MSDN documentation itself (even the System.Xml namespace documentation!) is still full of samples using the "not recommended" XmlTextReader and XmlTextWriter. And if you don't follow your own words yourself, why would others? The reality is that .NET developers didn't notice that shy recommendation and keep programming against concrete XML reader/writer classes.

So it wasn't a surprise for me to see a fresh MSDN Magazine (06/2006) article, "Five Ways to Emit Test Results as XML", promoting the XmlTextWriter class once again and not even mentioning the XmlWriter.Create() pattern. Hey, I thought it was a requirement for MSDN authors to read the MSDN documentation on a subject? Apparently it's not.
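For the record, the recommended pattern is not any harder. Here is a minimal sketch of emitting a test result through XmlWriter.Create(). The element and attribute names (results, test, name, passed) and the TestResultEmitter class are invented for this example and are not taken from the article.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TestResultEmitter
{
    // Builds a tiny result document using only the abstract XmlWriter API.
    public static string Emit(string testName, bool passed)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // keep the output short for the demo

        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("results");
            writer.WriteStartElement("test");
            writer.WriteAttributeString("name", testName);
            writer.WriteAttributeString("passed", XmlConvert.ToString(passed));
            writer.WriteEndElement(); // </test>
            writer.WriteEndElement(); // </results>
        }
        return output.ToString();
    }
}
```

The calling code never names a concrete writer class, so swapping in indentation, another encoding or a wrapping writer is a settings or constructor change, not a rewrite.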

Another thing that bothers me is that the XmlReader/XmlWriter factory has no extensibility points whatsoever. So there is no way for non-Microsoft XML parsers and writers like XIncludingReader or MultiXmlTextWriter to be created by that factory. There is no way to plug support for XML Base into a system without writing actual code that instantiates XmlBaseAwareXmlTextReader. Microsoft still considers providing an abstract factory for XML readers/writers too geeky and similar (in terms of weirdness) to providing an abstract factory for the string class. Well, it will probably take another couple of .NET versions before .NET gets the extensibility Java has had for years.

May 22, 2006

XLinq Overview, Overview Diff and Reference online at XLinq.net

I've uploaded HTML versions of the XLinq Overview, XLinq Overview Diff (Sep 2005/May 2006) and XLinq SDK Reference to the XLinq.net portal. I don't feel it's right that I have to install a heavy preview-quality package on my system just to be able to read this stuff. Or maybe I ...

May 21, 2006

Draft 1.3 of the Ecma Office Open XML formats standard

Via Brian Jones we learn that the Ecma International Technical Committee (TC45) has published draft version 1.3 of the Ecma Office Open XML File Formats Standard. This is a 4000-page document specifying the new Office XML format (an alternative to the Oasis OpenOffice/OpenDocument XML format) to be used by Microsoft starting with Office ...

May 16, 2006

CodePlex

Microsoft has launched CodePlex Beta - kinda revamped GotDotNet, based on Team Foundation Server: CodePlex is an online software development environment for open and shared source developers to create, host and manage projects throughout the project lifecycle. It has been written from the ground up in C# using .NET 2.0 ...

May 12, 2006

On experimenting with LINQ CTP without screwing up Visual Studio (C# only)

The LINQ May 2006 CTP installs the C# 3.0 compiler and a new C# language service into Visual Studio 2005. New syntax, keywords, Intellisense for extension methods and all that jazz. This essentially disables the native C# 2.0 compiler and C# language service. If you installed LINQ on Virtual PC - big deal. But ...

May 11, 2006

Saxon.NET and System.Xml.Xsl

I really enjoy seeing Michael Kay talking about working with XML in .NET. That makes me feel I might be not a freak after all. ...

Mike writes about benchmarking Saxon.NET vs System.Xml.Xsl.XslTransform v1.1. .NET seems to be three times faster, but then Saxon.NET converts the .NET DOM into a Saxon tree before each transformation. So Mike went on to implement a lightweight wrapper around the .NET DOM to allow Saxon to transform the .NET DOM almost directly, and while working on that he found Microsoft's XML documentation "woefully inadequate". Whoha! I know the feeling! Microsoft documentation is written for end users, not for vendors. Just step out a bit from the mainstream (say, try to implement a custom XmlReader) - and you're lost. In too many places MSDN says nothing about how methods behave, and even experimenting doesn't always help. Developing solid software this way sucks a lot. That's the reality, I know, but I wish it would change.

Anyway, I should say the benchmark code used isn't optimal, of course. Transforming to XmlReader never was the fastest method in .NET 1.1. When the result needs to be in DOM, it's better to use XmlNodeWriter to transform directly into DOM. A DOM source isn't the fastest transformation source either, for both .NET and Saxon (did I tell you DOM sucks?). XPathDocument easily gives a 30-50% perf improvement. And of course System.Xml.Xsl.XslTransform sucks while System.Xml.Xsl.XslCompiledTransform rocks. It would be interesting to benchmark Saxon.NET with the DOM wrapper vs XslCompiledTransform.
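For reference, this is roughly the combination I mean: an XPathDocument as the source and XslCompiledTransform as the engine. It's a sketch, not a benchmark harness; the stylesheet, the item element and the TransformDemo class are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class TransformDemo
{
    // Hypothetical stylesheet: outputs the number of <item> elements as plain text.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>" +
        "</xsl:stylesheet>";

    public static string CountItems(string xml)
    {
        // Compile the stylesheet once; a real application would cache this instance.
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        // XPathDocument is a read-only store built for XPath/XSLT, unlike XmlDocument (DOM).
        XPathDocument source = new XPathDocument(new StringReader(xml));

        StringWriter result = new StringWriter();
        xslt.Transform(source, null, result);
        return result.ToString();
    }
}
```

Calling TransformDemo.CountItems with a small document containing two item elements should give back the text 2.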

But after all Mike is right saying that

Not that Saxon is primarily competing on performance - it's the productivity benefits in XSLT 2.0 that will really influence people - but it would be nice if it's in the same ballpark.

Hey and what about running XslCompiledTransform over XLINQ's XDocument? That should be the killer. Have you seen recent LINQ CTP?

May 8, 2006

VB LINQ: getting a bit closer to C# a bit further from SQL

According to Paul Vick, the VB9 LINQ query syntax will be switched from the SQL-like "Select/From" to the "From/Select" order (Paul calls it "Yoda style" :) used by C# since the beginning of LINQ. While for VB it's quite natural to follow SQL syntax (so as not to trouble poor busy VB developers), working Intellisense is ...

May 1, 2006

Rendering WordML documents in ASP.NET

Here is one easy way: Go to xmllab.net, get the free eXml Web server control and the modified Microsoft WordML2HTML XSLT stylesheet, version 1.3. Drop the eXml control onto a Web form, assign the DocumentSource property (the WordML document you want to render) and the TransformSource property (wordml2html-.NET-script.xslt): Create a new folder to store external images In code behind ...