May 31, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0, Part 2

This is second part of the post. Find first part here.

So what is a better way of creating custom XmlReader/XmlWriter in .NET 2.0? Here is the idea - have an utility wrapper class, which wraps XmlReader/XmlWriter and does nothing else. Then derive from this class and override methods you are interested in. These utility wrappers are called XmlWrapingReader and XmlWrapingWriter. They are part of System.Xml namespace, but unfortunately they are internal ones - Microsoft XML team has considered making them public, but in the Whidbey release rush decided to postpone this issue. Ok, happily these classes being pure wrappers have no logic whatsoever so anybody who needs them can indeed create them in a 10 minutes. But to save you that 10 minutes I post these wrappers here. I will include XmlWrapingReader and XmlWrapingWriter into the next Mvp.Xml library release.

Btw, most likely the Mvp.Xml project will be rehosted to the CodePlex. SourceForge is ok, but the idea is that more Microsoft-friendly environment will induce more contributors both XML MVPs and others.

+ XmlWrappingReader class.

+ XmlWrappingWriter class.

Now instead of deriving from legacy XmlTextReader/XmlTextWriter you derive from XmlWrappingReader/XmlWrappingWriter passing to their constructors XmlReader/XmlWriter created via Create() factory method:
public class RenamingXmlReader : XmlWrappingReader
{
    //Provide as many contructors as you need
    public RenamingXmlReader(string file)
        : base(XmlReader.Create(file)) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}
public class RenamingXmlWriter : XmlWrappingWriter
{
    //Provide as many contructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(XmlWriter.Create(output)) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

That's it. Not much different from previous option in terms of coding, but free of weird XmlTextReader/XmlTextWriter legacy.

AFAIK there is still one problem with this approach though, which is DTD validation. I mean cascading validating XmlReader on top of custom XmlReader scenario. E.g. if you need to resolve XInclude or rename couple of elements and validate resulting XML against DTD in one shot. In .NET 2.0 if you want to DTD validate XML that comes from another XmlReader that reader must be an instance of XmlTextReader. That's undocumented limitation and it was left sort of deliberatly - after all who cares about DTD nowadays? XML Schema validation is not affected by this limitation.

May 30, 2006

On creating custom XmlReaders/XmlWriters in .NET 2.0

When developing custom XmlReader or XmlWriter in .NET 2.0 there is at least three options:

  1. implement XmlReader/XmlWriter
  2. extend one of concrete XmlReader/XmlWriter implementations and override only methods you need
  3. implement XmlReader/XmlWriter by wrapping one of concrete XmlReader/XmlWriter implementations and overriding only methods you need

First way is a for full-blown XmlReader/XmlWriters, which need to implement each aspects of XML reading/writing in a different way, e.g. XmlCsvReader, which reads CSV as XML or XslReader, which streamlines XSLT output. This is the most clean while the hardest way - XmlReader has 26 (and XmlWriter - 24) abstract members you would have to implement. Second and third options are for override-style custom readers/writers. When you only need to partially modify XmlReader/XmlWriter behavior it's an overkill to reimplement it from scratch. As this is the most usual scenario I'll concentrate on these two options.

OOP taught us - inherit the class whose behaviour you want to modify and override its virtual members. That's the easiest and most popular way of writing custom XmlReader/XmlWriter. For example let's say you are receiving XML documents from your business partners and one particularly annoying one keeps sending element <value>, while according to the schema you expect it to be <price>. So you need renaming plumbing - XmlReader that reads <value> as <price> and XmlWriter that writes <price> as <value>. Here are implementations extending standard XmlTextReader and XmlTextWriter classes:

public class RenamingXmlReader : XmlTextReader
{
    //Provide as many constructors as you need
    public RenamingXmlReader(string file)
        : base(file) { }

    public override string Name
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.Name == "value") ? 
                "price" : base.Name;
        }
    }

    public override string LocalName
    {
        get
        {
            return (NodeType == XmlNodeType.Element && 
                base.LocalName == "value") ?
                "price" : base.LocalName;
        }
    }
}

public class RenamingXmlWriter : XmlTextWriter
{
    //Provide as many constructors as you need
    public RenamingXmlWriter(TextWriter output)
        : base(output) { }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)
    {
        if (string.IsNullOrEmpty(prefix) && localName == "price")
        {
            localName = "value";
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}
Looks nice, but there is a couple of serious drawbacks in this approach though:
  1. XmlTextReader and XmlTextWriter are kinda legacy classes. They were introduced in .NET 1.0 with some deviations from the XML standard and as it usually happens now Microsoft have to support those deviations for the sake of backwards compatibility. That means that there some aspects where XmlTextReader/XmlTextWriter and XmlReader/XmlWriter created via Create() method behave differently, in short - Create() method creates more conformant XmlReader/XmlWriter instances than XmlTextReader/XmlTextWriter, e.g. XmlTextWriter does not check for the following:
    • Invalid characters in attribute and element names.
    • Unicode characters that do not fit the specified encoding. If the Unicode characters do not fit the specified encoding, the XmlTextWriter does not escape the Unicode characters into character entities.
    • Duplicate attributes.
    • Characters in the DOCTYPE public identifier or system identifier.
    And XmlTextReader doesn't check character range for numeric entities and so allows � unless Normalization property (off by default) is turned on etc.

    So beware that when deriving from XmlTextReader/XmlTextWriter you are gonna inherit all their crappy legacy weirdnesses too.

  2. Since .NET 1.1 XmlTextReader has FullTrust inheritance demand, i.e. only fully trusted classes can derive from XmlTextReader. That means that above RenaminXmlReader won't work in partially trusted environment such as ASP.NET or ClickOnce. I run into this unpleasant issue with my free eXml Web server control - since XmlBaseAwareXmlTextReader derives from XmlTextReader (just because I was too lazy when creating it) eXml control cannot do XInclude unless working under FullTrust.
So actually I wouldn't recommend deriving from XmlTextReader/XmlTextWriter.

A third approach is to wrap XmlReader/XmlWriter created via Create() method and override methods you need. This approach is used by .NET itself. This requires a little bit more code, but as a result you get clean design and easily composable implementation. I'll cover it tomorrow 'cause Tel-Aviv traffic jams are waiting for me now.

May 29, 2006

On XmlReader/XmlWriter design changes in .NET 2.0

From .NET 1.X experience Microsoft seems finally figured out that providing a set of concrete poorly composable XmlReader and XmlWriter implementations (XmlTextReader, XmlTextWriter, XmlValidatingReader, XmlNodeReader) and emphasizing on programming with concrete classes instead of anstract XmlReader/Xmlwriter was really bad idea. One notorious horrible sample was XmlValidatingReader accepting abstract XmlReader instance and downcasting it silently to XmlTextReader inside. In .NET 2.0 Microsoft (with a huge diffidence) is trying to bring some order to that mess:

  1. XmlReader and XmlWriter now follow factory method design pattern by providing static Create() method which is now recommended way of creating XmlReader and XmlWriter instances.
  2. While not being marked as obsolete or deprecated or not recommended, concrete implementations like XmlTextReader and XmlTextWriter are now just wrappers for internal classes used to implement Create() factory method.
  3. I was said that Microsoft will be "moving away from the XmlTextReader and XmlValidating reader" and "emphasize programming directly to the XmlReader and will provide an implementation of the factory design patterns which returns different XmlReader instances based on which features the user is interested.".

Well, unfortunately when you do something wrong for years you can't fix things "in a one swift stroke". Sadly, but MSDN documentation itself (even System.Xml namespace documentation!) is still full of samples using "not recommended" XmlTextReader and XmlTextWriters. And if you don't follow you words youself why would others do? Reality is that .NET developers didn't notice that shy recommendation and keep programming into concrete XML reader/writer classes.

So it wasn't a surprise for me to see fresh MSDN Magazine (06/2006) article "Five Ways to Emit Test Results as XML" promoting using XmlTextWriter class once again and not even mentioning XmlWriter.Create() pattern. Hey, I thought it's a requirement for MSDN authors to read MSDN documentation on a subject? Apparently it's not.

Another thing that bothers me is that XmlReader/XmlWriter factory has no extensibility points whatsoever. So that's not an option for non-Microsoft XML parsers and writers like XIncludingReader or MultiXmlTextWriter to be created in that factory. There is no way to plug in support for XML Base into a system without writing actual code that instantiates XmlBaseAwareXmlTextReader. Microsoft still considers providing abstract factory for XML readers/writers too geeky and similar (in terms or weirdness) to providing abstract factory for string class. Well, it probably would take another couple of .NET versions to grow up before .NET would come to extensibility Java already has for years.

May 22, 2006

XLinq Overview, Overview Diff and Reference online at XLinq.net

I've uploaded HTML versions of the XLinq Overview, XLinq Overview Diff (Sep 2005/May 2006) and XLinq SDK Reference to the XLinq.net portal. I don't fee it's right that I have to install heavy preview-quality package into my system just to be able to read these stuff. Or may be I just used to MSDN online. Diff is also cool for lazy/busy devs like me. Anyway. Btw, XLinq Overview link at the LINQ Project homepage points to the old September 2005 version.

May 21, 2006

Draft 1.3 of the Ecma Office Open XML formats standard

Via Brian Jones we learn that the Ecma International Technical Committee (TC45) has published draft version 1.3 of the Ecma Office Open XML File Formats Standard. This is 4000 pages document specifying new (alternative to Oasis OpenOffice/OpenDocument XML format) Office XML format to be used by Microsoft starting with Office 2007.

As a matter of interest:

  • The draft is available in PDF, which was created by Word 2007
  • The draft also available in Open XML format itself, which one will be use once Office 2007 Beta 2 is out
  • The document is huge and specifies everything down to the "Maple Muffins" border style kinda details
  • These guys help MIcrosoft in creating Ecma Office Open XML format: Apple, Barclays Capital, BP, The British Library, Essilor, Intel, Microsoft, NextPage, Novell, Statoil, and Toshiba

May 16, 2006

CodePlex

Microsoft has launched CodePlex Beta - kinda revamped GotDotNet, based on Team Foundation Server:

CodePlex is an online software development environment for open and shared source developers to create, host and manage projects throughout the project lifecycle. It has been written from the ground up in C# using .NET 2.0 technology with Team Foundation Server on the back end. CodePlex is open to the public free of charge.

CodePlex includes the following features:

* Release Management
* Work Item Tracking
* Source Code Dissemination
* Wiki-based Project Team Communications
* Project Forums
* News Feed Aggregation

My experience with GotDotNet was a nasty one and I don't expect its successor to bring SourceForge quality any soon, but anyway CodePlex looks interesting and promising. It already hosts several interesting projects including CodePlex itself, "Atlas" Control Toolkit, Commerce Starter Kit, IronPython etc.

I'm sure my next open-source .NET related project will be hosted at the CodePlex.

May 12, 2006

On experimenting with LINQ CTP without screwing up Visual Studio (C# only)

LINQ May 2006 CTP installs C# 3.0 compiler and new C# language service into Visual Studio 2005. New syntax, keywords, Intellisense for extension methods and all that jazz. This essensially disables native C# 2.0 compiler and C# language service. If you installed LINQ on Virtual PC - big deal. But if not and you want to switch C# back to 2.0 - there is a solution. Folder bin contains two little scripts called "Install C# IDE Support.vbs" and "Uninstall C# IDE Support.vbs". Just run latter one and your native C# 2.0 is back. Somehow there are only scripts for C#.

May 11, 2006

Saxon.NET and System.Xml.Xsl

I really enjoy seeing Michael Kay talking about working with XML in .NET. That makes me feel I might be not a freak after all.

Mike writes about benchmarking Saxon.NET vs System.Xml.Xsl.XslTransform v1.1. .NET seems to be thrice faster, but then Saxon.NET converts .NET DOM into a Saxon tree before each transformation. So Mike went to implement a lightweight wrapper around .NET DOM to allow Saxon to transform .NET DOM almost directly and while working on that he found Microsoft XML documentation "woefully inadequate". Whoha! I know the feeling! Microsoft documentation is written for end users, not for vendors. Just step out a bit from the mainstream (say try to implement custom XmlReader) - and you lost. In too many places MSDN says nothing about how methods behave and even experimenting don't always help. Developing solid software this way sucks a lot. That's a reality I know, but I wish it changes.

Anyway, I should say the benchmark code used isn't an optimal one of course. Transforming to XmlReader never was the fastest method in .NET 1.1. When the result needs to be in DOM, it's better to use XmlNodeWriter to transform directly into DOM. DOM source isn't fastest transformation source either for both .NET and Saxon (did I tell you DOM sucks?) XPathDocument easily gives 30-50% perf improvement. And of course System.Xml.Xsl.XslTransform sucks while System.Xml.Xsl.XslCompiledTransform rocks. That would be interesting to benchmark Saxon.NET with DOM wrapper vs XslCompiledTransform.

But after all Mike is right saying that

Not that Saxon is primarily competing on performance - it's the productivity benefits in XSLT 2.0 that will really influence people - but it would be nice if it's in the same ballpark.

Hey and what about running XslCompiledTransform over XLINQ's XDocument? That should be the killer. Have you seen recent LINQ CTP?

May 8, 2006

VB LINQ: getting a bit closer to C# a bit further from SQL

According to Paul Vick VB9 LINQ query syntax will be switched from SQL-like "Select/From" to "From/Select" (Paul calls it "Yoda style" :) used by C# since the begining of LINQ. While for VB it's quite natural to follow SQL syntax (so not troubling poor busy VB developers), working Intellisense is more important. Anyway, I like it.

The changes should be in the next LINQ CTP, which is expected ... (wink wink) really really soon!

May 1, 2006

Rendering WordML documents in ASP.NET

Here is one easy way:

  1. Go to xmllab.net, get free eXml Web server control and modified Microsoft's WordML2HTML XSLT stylesheet, version 1.3.
  2. Drop eXml control onto a Web form, assign DocumentSource property (WordML document you want to render), TransformSource property(wordml2html-.NET-script.xslt): <xmllab:eXml ID="EXml1" runat="server" DocumentSource="~/TestDocument.xml" TransformSource="~/wordml2html-.NET-script.xslt"/>
  3. Create new folder to store external images
  4. In code behind allow XSLT scripting and pass couple XSLT parameters - real path to above image directory and its virtual name:
    protected void Page_Load(object sender, EventArgs e)
    {
      EXml1.XsltSettings = System.Xml.Xsl.XsltSettings.TrustedXslt;
      EXml1.TransformArgumentList = 
        new System.Xml.Xsl.XsltArgumentList();
      EXml1.TransformArgumentList.AddParam(
        "base-dir-for-images", "", MapPathSecure("~/images"));
      EXml1.TransformArgumentList.AddParam(
        "base-virtual-dir-for-images", "", "images");        
    }
    
Done.

I had to add these two parameters so the WordML2HTML stylesheet could export images there and then refer to exported images in HTML. If you don't pass these parameters images will be exported into current directory - while that's ok when running WordML2HTML transformation in a command line, that's bad idea for ASP.NET environment.

Enjoy!