Recently in System.Xml v2.0 Category

In .NET 2.0 the ValidationType.Auto value is obsolete. What's worse, it doesn't work for XmlReaders created via the XmlReader.Create() factory method. So how do you validate against a DTD and/or a schema, i.e. against the DTD if the document has a DOCTYPE and against a schema if one is applicable? The answer: chain two XmlReaders, the first set up for DTD validation and the second for schema validation.
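A minimal sketch of the chaining idea, assuming an inline sample document and a shared event handler (both made up for illustration). The outer reader also has to allow DTDs, otherwise it rejects the DOCTYPE node handed over by the inner reader. ProhibitDtd is the .NET 2.0 setting; later framework versions mark it obsolete in favor of DtdProcessing.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ChainedValidation
{
    static void Main()
    {
        // Hypothetical document: the DTD requires a <title> child,
        // so the empty <book/> is invalid against it.
        string xml =
            "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]><book/>";

        ValidationEventHandler report = delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("{0}: {1}", e.Severity, e.Message);
        };

        // Inner reader: DTD validation. DTDs are prohibited by default, so allow them.
        XmlReaderSettings dtdSettings = new XmlReaderSettings();
        dtdSettings.ProhibitDtd = false;
        dtdSettings.ValidationType = ValidationType.DTD;
        dtdSettings.ValidationEventHandler += report;

        // Outer reader: schema validation driven by inline schemas and
        // xsi:schemaLocation hints, if the document carries any.
        XmlReaderSettings xsdSettings = new XmlReaderSettings();
        xsdSettings.ProhibitDtd = false;
        xsdSettings.ValidationType = ValidationType.Schema;
        xsdSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema
                                     | XmlSchemaValidationFlags.ProcessSchemaLocation
                                     | XmlSchemaValidationFlags.ReportValidationWarnings;
        xsdSettings.ValidationEventHandler += report;

        using (XmlReader dtdReader = XmlReader.Create(new StringReader(xml), dtdSettings))
        using (XmlReader reader = XmlReader.Create(dtdReader, xsdSettings))
        {
            // Reading through the outer reader drives both validators.
            while (reader.Read()) { }
        }
    }
}
```

Both validators report through the same handler here only to keep the sketch short; separate handlers would let you tell DTD problems from schema problems.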

I've seen some people saying, and some leaving comments, that the newly introduced XPathNavigator.SelectSingleNode() method is actually a wrapper around XPathNavigator.Select() and so it provides no performance benefits. This is both true and false. It's true that you won't get any performance boost from moving to XPathNavigator.SelectSingleNode(), because it's really just a wrapper around XPathNavigator.Select() that returns the first selected node. But it's false that there is something wrong with it. There is no performance boost because XPathNavigator.Select() is efficient itself: it never selects all nodes up front, returning instead an XPathNodeIterator, which selects nodes only as its MoveNext() method is called. So there is no perf difference, both are very fast, and XPathNavigator.SelectSingleNode() is mostly about code elegance and convenience for a coder.

This is a real hidden gem in .NET 2.0 that everybody (including me) has pretty much overlooked. The XmlSchemaValidator class from the System.Xml.Schema namespace is a push-based W3C XML Schema validation engine. Push-based means a different processing model - the opposite of the pull-based one. Think about how you work with XmlWriter (push) and XmlReader (pull). With .NET 1.X's XmlValidatingReader (now obsolete) and .NET 2.0's XmlReader with validation enabled, you read XML to make sure it's valid. With XmlSchemaValidator you do the opposite - you ask it to validate XML bits using the ValidateElement, ValidateAttribute, ValidateText etc. methods.
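Here is a rough sketch of what pushing XML bits into XmlSchemaValidator can look like. The one-element schema and the sample value are assumptions made up for the example; the calls themselves are the class's public validation methods.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class PushValidation
{
    static void Main()
    {
        // Hypothetical schema, inlined to keep the sketch self-contained.
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='price' type='xs:decimal'/>" +
            "</xs:schema>";

        NameTable names = new NameTable();
        XmlSchemaSet schemas = new XmlSchemaSet(names);
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        schemas.Compile();

        XmlNamespaceManager nsManager = new XmlNamespaceManager(names);
        XmlSchemaValidator validator = new XmlSchemaValidator(
            names, schemas, nsManager, XmlSchemaValidationFlags.None);

        // Without a handler, validation errors surface as exceptions.
        validator.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("{0}: {1}", e.Severity, e.Message);
        };

        // Push the pieces of <price>not-a-number</price> one by one.
        validator.Initialize();
        validator.ValidateElement("price", "", null);
        validator.ValidateEndOfAttributes(null);
        validator.ValidateText("not-a-number");
        validator.ValidateEndElement(null);
        validator.EndValidation();
    }
}
```

Note that no XML document exists anywhere in this program: the validator only sees the sequence of calls, which is what makes it usable from inside a writer, a builder or any other XML producer.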

Erik Saltwell explains what the XslCompiledTransform.OutputSettings property is and why it exists. It's a really fresh, clean and powerful design once you get it. I didn't at first.

In .NET 2.0 XPathNavigator finally has a SelectSingleNode() method! MSXML and XmlDocument (XmlNode actually) have had it forever, and it's so widely used because it's soooo handy. Despite its name, XPathNavigator.SelectSingleNode() returns not a node but the node equivalent in XPathNavigator's data model - an XPathNavigator. And this method is even better than XmlNode's, because it has an overload accepting a compiled XPathExpression, so when running within a loop you don't have to pay the XPath compilation price on each iteration. That's another reason to switch completely to the XPathNavigator API when processing XML in .NET 2.0.
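To illustrate the compiled-expression overload, here is a small sketch. The orders document and the "id" expression are invented for the example.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

class CompiledSelect
{
    static void Main()
    {
        // Hypothetical sample data.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<orders><order><id>1</id></order><order><id>2</id></order></orders>");
        XPathNavigator nav = doc.CreateNavigator();

        // Compile once, outside the loop.
        XPathExpression idExpr = XPathExpression.Compile("id");

        foreach (XPathNavigator order in nav.Select("/orders/order"))
        {
            // Evaluate the precompiled expression relative to each order.
            XPathNavigator id = order.SelectSingleNode(idExpr);
            if (id != null)
                Console.WriteLine(id.Value);
        }
    }
}
```

SelectSingleNode() returns null when nothing matches, hence the check before touching the result.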

nxslt 1.6 and nxslt2 Beta1 are available for download. For those not familiar with nxslt: nxslt is a free, feature-rich .NET XSLT command line utility.

nxslt 1.6 is the next version for the .NET 1.X Frameworks. New features include optional source XML or stylesheet, pretty printing, ASCII-only escaped output and support for the "omit-xml-declaration" attribute of the exsl:document extension element.

nxslt2 Beta1 is the first beta version of the next major nxslt release. nxslt2 uses the new XSLT 1.0 processor in the .NET 2.0 Framework - the System.Xml.Xsl.XslCompiledTransform class. Hence it requires .NET 2.0 Beta2 (which you can download here) or higher. As a first beta version, nxslt2 Beta1 is quite limited - no support for XInclude, EXSLT, multiple output or embedded stylesheets. As soon as I port EXSLT.NET and XInclude.NET to .NET 2.0 I'll update nxslt2.

The new Microsoft XSLT processor (XslCompiledTransform) is great stuff. It compiles XSLT to MSIL and runs as fast as MSXML4. I'll be writing about it a lot soon. With nxslt2 you can give it a whirl.

I was wrong in my last post. Here is how one can output HTML with XslCompiledTransform when XmlResolver needs to be passed to Transform() method.

using (XmlReader src = XmlReader.Create("../../source.xml"))
{
  XslCompiledTransform xslt = new XslCompiledTransform();
  xslt.Load("../../style.xslt"); // stylesheet path is a placeholder
  // Key line: create the writer from the stylesheet's output settings
  XmlWriter result = XmlWriter.Create(Console.Out, xslt.OutputSettings);
  xslt.Transform(src, null, result, new XmlUrlResolver());
}
The key line is the XmlWriter.Create() call. One just needs to pass XslCompiledTransform's OutputSettings (after the XSLT stylesheet is loaded) to the XmlWriter.Create() method, and the resulting XmlWriter will output transformation results according to the <xsl:output> settings in the XSLT stylesheet. Really nice once you get it.

I'm porting the nxslt utility to .NET 2.0 with XslCompiledTransform as the XSLT processor, and I just found out that the XslCompiledTransform API is really severely broken. I wrote before that the only Transform() method overload that accepts an XmlResolver outputs to an XmlWriter. So if you want to create HTML and have some control over document() function resolving (or just provide user credentials), you are out of luck with XslCompiledTransform. Quite a common scenario, isn't it? Too bad, the XML Team should hire better testers.

What I dislike in System.Xml v2.0 (and v1.X for that matter) is the poor support for push-based XML processing. Somehow it's all about pull - XmlReader, while push - XmlWriter seems to be a second-class citizen. For instance, one can't populate an XPathDocument with XML, or load an XSLT stylesheet into XslCompiledTransform, via XmlWriter. One can't deserialize an object from XML if the XML is represented as an XmlWriter, etc. In a nutshell: XML producers in .NET write XML into XmlWriter, while XML consumers read XML from XmlReader. The problem with this mix of pull and push arises when one tries to pipeline an XML producer and an XML consumer, i.e. a component that writes to XmlWriter and another one that reads from XmlReader. Ooops! Think about feeding XML created with XSLT into SqlXml, or deserializing an object from XML modified by XSLT, or chaining two XSLT transformations when output from the first one goes as input to the second one, or generating an XSLT stylesheet on the fly. Most of these problems can't be solved in .NET 2.0 in a streaming fashion and require interim buffering of the whole XML, effectively killing scalability and performance. Look here and here. I'm really sorry to see interim buffering with serializing/reparsing involved as an XML pipelining solution in a modern technology like .NET 2.0.

A series of articles by Alex Homer on reading and writing XML in .NET Version 2.0 has been published:

  1. Reading and Writing XML in .NET Version 2.0 - Part 1
  2. Reading and Writing XML in .NET Version 2.0 - Part 2
Excellent articles. Part 3 is expected too, according to Alex's site. Yeah, I just found out Alex Homer has a site and even a very interesting blog (with no RSS feed and not so frequently updated though). Alex, you really need an RSS feed on your site!

The Microsoft XML Team has posted a response, "Comparing XML Performance", to the Sun XML Mark 1.0 benchmark and the accompanying whitepaper from the Sun XML Performance Team, which asserted that Java significantly outperforms .NET in XML processing performance.

kzu says he has broken the mark of 100 bugs filed to the MSDN Feedback Center. That's impressive. My numbers are humble - only 15 bugs and suggestions. Gotta be more active here. I spent a day working on an adapter to my homegrown XSLT test suite for the XslCompiledTransform class, and the very first run brought up a bunch of issues. Now I have to analyze the log and file bugs. I'm happy I've already found quite a significant one - XslCompiledTransform thinks NaN is equal to NaN. To put it another way, "number('foo')=number('bar')" evaluates to true! That's really bad, because that property of NaN (it is not equal to anything, including itself) is the basis for a quite widespread XSLT 1.0 technique for determining whether a value is a number: "number($val) = number($val)" is true if and only if $val is a number.
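For anyone who wants to try the number test themselves, here is a small sketch. The stylesheet, the parameter name "val" and the sample values are assumptions for illustration. It runs the "number($val) = number($val)" check through XslCompiledTransform for a numeric and a non-numeric value.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class NaNCheck
{
    static void Main()
    {
        // Hypothetical stylesheet that outputs the result of the number test.
        string xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output method='text'/>" +
            "<xsl:param name='val'/>" +
            "<xsl:template match='/'>" +
            "<xsl:value-of select='number($val) = number($val)'/>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader r = XmlReader.Create(new StringReader(xsl)))
            xslt.Load(r);

        // XPath 1.0 rules call for true on "42" and false on "foo" (NaN != NaN).
        foreach (string val in new string[] { "42", "foo" })
        {
            XsltArgumentList args = new XsltArgumentList();
            args.AddParam("val", "", val);
            using (XmlReader src = XmlReader.Create(new StringReader("<dummy/>")))
            {
                Console.Write("{0}: ", val);
                xslt.Transform(src, args, Console.Out);
                Console.WriteLine();
            }
        }
    }
}
```

A processor with the bug described above would report true for both values.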

Another improvement System.Xml 2.0 brings, from the how-come-I-didn't-make-it-before department, is that the XPathNavigator class now implements IXPathNavigable. Sounds obvious, huh? In both the common and the OOP sense, of course XPathNavigator should be IXPathNavigable, but somehow in .NET 1.0 and 1.1 it is not. (And by the way, I still wonder how come XmlNodeReader doesn't implement the IHasXmlNode interface. Too bad I made this suggestion too late and now we must wait another year or two for this.) Anyway, these 2 lines of code:

public virtual XPathNavigator CreateNavigator()
{
    return this.Clone();
}
worked magic on the XslCompiledTransform API. 4 redundant Transform() overloads down! Now input to XSLT is either a string (URL), an XmlReader or an IXPathNavigable.

And if you aren't familiar with IXPathNavigable - don't bother. Just remember that you can pass XmlDocument, XPathDocument, XmlDataDocument or XPathNavigator objects as is to the Transform() method, as all these classes implement IXPathNavigable. API simplicity is invaluable and I'm glad version 2.0 of System.Xml looks simpler than the previous ones.
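A short sketch of what that looks like in practice. The stylesheet and the sample documents are made up for the example; the same Transform() overload takes all three input types.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

class NavigableInputs
{
    static void Main()
    {
        // Hypothetical stylesheet that prints the name of the document element.
        string xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/'><xsl:value-of select='name(*)'/></xsl:template>" +
            "</xsl:stylesheet>";

        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader r = XmlReader.Create(new StringReader(xsl)))
            xslt.Load(r);

        XmlDocument dom = new XmlDocument();
        dom.LoadXml("<fromXmlDocument/>");
        XPathDocument xpathDoc = new XPathDocument(new StringReader("<fromXPathDocument/>"));
        XPathNavigator nav = xpathDoc.CreateNavigator();

        // All three are IXPathNavigable, so one overload serves them all.
        IXPathNavigable[] inputs = new IXPathNavigable[] { dom, xpathDoc, nav };
        foreach (IXPathNavigable input in inputs)
        {
            xslt.Transform(input, null, Console.Out);
            Console.WriteLine();
        }
    }
}
```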

Another handy feature implemented in .NET 2.0 Beta2 is that the XmlReader class now implements the IDisposable interface and so can be closed automatically when used with the "using" statement in C#:

using (XmlReader r = XmlReader.Create("../../source.xml"))
{
  while (r.Read()) { /* process the current node */ }
}
Really handy. And implemented in literally a couple of lines. It's a pity we don't have such simple but useful stuff in .NET 1.1.

The .NET XSLT API is traditionally ugly. The XslTransform class (obsoleted in .NET 2.0) had 8 Load() methods and 9 Transform() ones in .NET 1.0. In .NET 1.1 - 11 Load() methods (5 of them obsoleted) and 18 Transform() (9 obsoleted). Huge mess. The brand new XslCompiledTransform in .NET 2.0 Beta2 has just 6 Load() methods and 14 Transform() ones, none obsoleted so far. Sounds good, but does this pile of method overloads cover all use cases? Unfortunately not.

This one little improvement in System.Xml 2.0 Beta2 is sooo cool anyway: the XPathNodeIterator class at last implements IEnumerable! Such unification with the .NET iteration model means we can finally iterate over nodes in an XPath selection using the standard foreach statement:

XmlDocument doc = new XmlDocument();
XPathNavigator nav = doc.CreateNavigator();
foreach (XPathNavigator node in nav.Select("/orders/order"))
{
    // work with node here
}
Compare this to what we have to write in .NET 1.X:
XmlDocument doc = new XmlDocument();
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/orders/order");
while (ni.MoveNext())
{
    XPathNavigator node = ni.Current;
    // work with node here
}
Needless to say, this is a case where just a dozen lines of code can radically simplify a class's usage and improve overall developer productivity. How come this wasn't done in .NET 1.1 I have no idea.

And how come the MSDN documentation for the class still doesn't mention this cool feature - I have no idea either.

More security changes were made to XSLT in .NET 2.0 Beta2. When working with the XslCompiledTransform class:

The document() function is disabled by default. To enable it, one has to pass an XsltSettings object with the EnableDocumentFunction property set to true to the XslCompiledTransform.Load() method:

XslCompiledTransform xslt = new XslCompiledTransform();
XsltSettings settings = new XsltSettings();
settings.EnableDocumentFunction = true;            
xslt.Load("style.xslt", settings, new XmlUrlResolver());
or, using the XsltSettings constructor:
XslCompiledTransform xslt = new XslCompiledTransform();
XsltSettings settings = new XsltSettings(true, false);            
xslt.Load("style.xslt", settings, new XmlUrlResolver());
(the first argument of the XsltSettings constructor controls whether the document() function is enabled).
Or even (for fully trusted stylesheets):
XslCompiledTransform xslt = new XslCompiledTransform();                        
xslt.Load("style.xslt", XsltSettings.TrustedXslt, new XmlUrlResolver());
Note that one must then provide an instance of the XmlResolver class to the XslCompiledTransform.Load() method. It's used to resolve the stylesheet URI and xsl:include/xsl:import statements and somehow cannot be null, so there doesn't seem to be any way to disable xsl:include/xsl:import, even though the documentation describes them as merely enabled by default. Weird.

And even if the document() function was enabled at compile time, one can suppress it by providing null as the XmlResolver to the XslCompiledTransform.Transform() method. And btw, only one Transform() overload accepts an XmlResolver, which is also weird, because it requires an XmlReader, and what if I've got an IXPathNavigable as the source XML?

Script blocks are disabled by default too. Use the same XsltSettings class (its EnableScript property) to enable them.
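As a quick way to see how these switches relate, here is a small sketch that prints the two properties for the predefined settings objects and for one built with the constructor. The Print helper is made up for the example.

```csharp
using System;
using System.Xml.Xsl;

class SettingsDemo
{
    // Hypothetical helper: dumps both security switches of a settings object.
    static void Print(string label, XsltSettings s)
    {
        Console.WriteLine("{0}: document()={1}, script={2}",
            label, s.EnableDocumentFunction, s.EnableScript);
    }

    static void Main()
    {
        Print("Default", XsltSettings.Default);          // both switched off
        Print("TrustedXslt", XsltSettings.TrustedXslt);  // both switched on
        Print("Script only", new XsltSettings(false, true));
    }
}
```

The constructor arguments are (enableDocumentFunction, enableScript), in that order.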

I had a conversation with somebody about how EXSLT.NET worked around the hyphenated EXSLT function names problem and if there are better ways to solve it. Here is a suggestion for Microsoft: give us more control over exposing .NET methods as extension functions and make it declarative.

Yep, no DTD is allowed by default in the .NET 2.0 Beta2:

XmlReaderSettings.ProhibitDtd Property (System.Xml)
Gets or sets a value indicating whether to prohibit document type definition (DTD) processing.

Return Value
true to prohibit DTD processing; otherwise false. The default is true.

This setting can be useful in preventing certain denial of service attacks. When set to true, the reader throws an System.Xml.XmlException when any DTD content is encountered.

That surely contradicts the "Allow all XML syntax" gospel, but it looks like Microsoft takes security very seriously nowadays. Well, at least Microsoft's XML team. Most likely it was a hard decision, but maybe not, since what are the options here in the face of the billion laughs attack? What if a 1Kb well-formed XML document can hog all your CPU and memory when you just open it in a browser that processes DTDs, such as IE?

Well, sure, it's just a default value and it can be changed. But defaults are more than just defaults, and I bet most .NET 2.0 applications won't accept XML with a DTD. That's sort of a milestone in XML history.
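A minimal sketch of the default and of opting out of it, assuming a made-up sample document with a harmless internal entity. ProhibitDtd is the .NET 2.0 property quoted above; later framework versions mark it obsolete in favor of DtdProcessing.

```csharp
using System;
using System.IO;
using System.Xml;

class DtdDefault
{
    static void Main()
    {
        // Hypothetical document with an internal DTD subset.
        string xml = "<!DOCTYPE doc [<!ENTITY greeting 'hello'>]><doc>&greeting;</doc>";

        // Default settings: DTD content makes the reader throw.
        try
        {
            using (XmlReader r = XmlReader.Create(new StringReader(xml)))
            {
                while (r.Read()) { }
            }
            Console.WriteLine("Accepted with default settings");
        }
        catch (XmlException e)
        {
            Console.WriteLine("Rejected by default: " + e.Message);
        }

        // Explicit opt-in: allow DTD processing for input you trust.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ProhibitDtd = false;
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            r.MoveToContent();
            Console.WriteLine(r.ReadElementContentAsString());
        }
    }
}
```

Opting in like this only makes sense for XML from a trusted source, which is exactly the trade-off the default is meant to force you to think about.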

I'm studying the new XSLT 1.0 implementation provided by Microsoft in .NET 2.0 Beta2 - the XslCompiledTransform class. The guys who wrote it are my good friends and excellent developers, but let me complain a little bit, not because I'm a complainer, but to try to make this cool piece of software even better.