May 31, 2005

Effective XML: Dumping XML content while reading it from a stream

A fellow MVP asked whether there is a way to dump XML content while reading it from a stream, without buffering the whole XML document. Here is the scenario: an XML document is being read from an HttpWebResponse stream and needs to be passed as an XmlReader to an XmlSerializer ...

using System;
using System.IO;
using System.Xml;

public class DumpingXmlTextReader : XmlTextReader
{
  private XmlWriter dump;

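  //Add more constructors as needed
  public DumpingXmlTextReader(string url, XmlWriter dump)
    :base(url) 
  {
    this.dump = dump;
  }

  //Reads from a Stream, e.g. a file or HttpWebResponse stream
  public DumpingXmlTextReader(Stream input, XmlWriter dump)
    :base(input)
  {
    this.dump = dump;
  }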
    
  /// <summary>
  /// Overridden XmlReader.Read() method: reads the next node and dumps it
  /// </summary>    
  public override bool Read()
  {
    bool baseRead = base.Read();
    if (baseRead) 
    {
      WriteShallowNode(this, dump);   
    }
    return baseRead;
  }
    
  /// <summary>
  /// Auxiliary method to dump the node the XmlReader is positioned at.
  /// Thanks to Mark Fussell, 
  /// http://blogs.msdn.com/mfussell/archive/2005/02/12/371546.aspx
  /// </summary>    
  static void WriteShallowNode( XmlReader reader, XmlWriter writer )
  {
    if ( reader == null )
    {
      throw new ArgumentNullException("reader");
    }

    if ( writer == null )
    {
      throw new ArgumentNullException("writer");
    }   
      
    switch ( reader.NodeType )
    {
      case XmlNodeType.Element:
        writer.WriteStartElement( reader.Prefix, reader.LocalName, 
          reader.NamespaceURI );
        writer.WriteAttributes( reader, true );
        if ( reader.IsEmptyElement )
        {
          writer.WriteEndElement();
        }
        break;

      case XmlNodeType.Text:
        writer.WriteString( reader.Value );
        break;

      case XmlNodeType.Whitespace:
      case XmlNodeType.SignificantWhitespace:
        writer.WriteWhitespace(reader.Value);
        break;

      case XmlNodeType.CDATA:
        writer.WriteCData( reader.Value );
        break;

      case XmlNodeType.EntityReference:
        writer.WriteEntityRef(reader.Name);
        break;

      case XmlNodeType.XmlDeclaration:
      case XmlNodeType.ProcessingInstruction:
        writer.WriteProcessingInstruction( reader.Name, reader.Value );
        break;

      case XmlNodeType.DocumentType:
        writer.WriteDocType( reader.Name, 
          reader.GetAttribute( "PUBLIC" ), reader.GetAttribute( "SYSTEM" ), 
          reader.Value );
        break;

      case XmlNodeType.Comment:
        writer.WriteComment( reader.Value );
        break;

      case XmlNodeType.EndElement:
        writer.WriteFullEndElement();
        break;
    }
  }
}
Not rocket science, as you can see; pretty straightforward. The core method, WriteShallowNode, which dumps the node the reader is currently positioned at, I borrowed from Mark Fussell's post on "Combining the XmlReader and XmlWriter classes for simple streaming transformations".

And here is a usage sample. I'm reading XML from a file stream (imagine it's an HttpWebResponse stream instead), feeding it to an XmlSerializer and dumping its content at the same time. And note: the XML content never gets buffered as a whole; the processing is purely forward-only, non-caching streaming.

//Requires System, System.IO, System.Text, System.Xml and System.Xml.Serialization
//Prepare dumping writer
XmlTextWriter dumpWriter = new XmlTextWriter("dump.xml", Encoding.UTF8);
dumpWriter.Formatting = Formatting.Indented;
PurchaseOrder po = null;
try
{
  using (FileStream fs = File.OpenRead("PurchaseOrder.xml")) 
  {
    //Reads and dumps XML content node-by-node to the dumpWriter
    XmlReader reader = new DumpingXmlTextReader(fs, dumpWriter);
    XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
    po = (PurchaseOrder)serializer.Deserialize(reader);
    reader.Close();
  }
}
finally
{
  //Close dumping writer even on failure, the XML dump is in dump.xml
  dumpWriter.Close();
}
//Deserialization went ok
Console.WriteLine(po.Account);
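The sample assumes a PurchaseOrder class that the post doesn't show. For experimenting, any XmlSerializer-compatible type with an Account member will do; here is a minimal hypothetical stand-in (the element names are assumptions, not the original schema), deserialized through an XmlReader as in the sample, minus the dumping part.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the PurchaseOrder type used in the sample;
// only the Account member comes from the sample, everything else is assumed.
[XmlRoot("PurchaseOrder")]
public class PurchaseOrder
{
    public string Account;
}

public static class PurchaseOrderDemo
{
    // Deserializes a PurchaseOrder through an XmlReader, as the usage sample does.
    public static PurchaseOrder Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return (PurchaseOrder)serializer.Deserialize(reader);
        }
    }
}
```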

I wonder whether this is a rare use case or whether we need such a class in a utility library, e.g. in the Mvp.Xml library?

May 29, 2005

Did you know? MIT's OpenCourseWare - MIT Courses online for free

Hey, self-learners, did you know that you can take MIT courses online for free? MIT's OpenCourseWare makes it possible. That's a really cool resource for self-education. [Via Wesner Moise] ...

May 26, 2005

Altsoft released Xml2PDF formatting engine version 2.3, now supporting WordML

Altsoft N.V. has announced the release of the Xml2PDF formatting engine version 2.3, now supporting WordML. Altsoft Xml2PDF is a .NET-based formatting engine for converting various XML-based formats to PDF. It supports XSL-FO, SVG, XHTML, WordML and XML+XSLT as input and generates PDF as output. The prices ...

May 25, 2005

Mvp.Xml Project Statistics

SourceForge has fixed the stat system and now we can analyze Mvp.Xml project statistics. The numbers are good - 8-15K hits/mo and 700-800 downloads/mo, not bad for a 1.0 release. ...

Hibiscus time

Hibiscus blooms again at our home. Actually, outdoors it blooms non-stop all year here in Israel, serving as a perfect undemanding but beautiful hedge. It's everywhere, in every color and every form; there are even Hibiscus trees. But my home Hibiscus blooms two or three months in a row, then ...

70-316 Exam: Passed. Got MCAD.

So yesterday I passed the 70-316 exam ("Developing and Implementing Windows-based Applications with Microsoft Visual C# .NET and Microsoft Visual Studio .NET"). A week of slack preparation and, unpleasantly surprising, too many questions on darned DataSets, but anyway I got 900 out of 1000. Now that I passed these three exams (70-315, 70-316 ...

May 22, 2005

Netscape 8 Breaks XSLT in Internet Explorer?

Some users report that after installing Netscape 8, Internet Explorer and other IE-based browsers such as Avant Browser stop applying XSLT stylesheets, even the default stylesheet used to render XML documents. That probably has something to do with the "Firefox or IE6 rendering" feature in Netscape. Beware. If you do make ...

May 18, 2005

Jonathan Marsh is blogging

Jonathan Marsh, one of Microsoft's representatives at the W3C, an editor of the XML Base, XPointer, XInclude and xml:id specs and of some XQuery 1.0 and XPath 2.0 specs, and, by the way, the original author of defaultss.xsl, which Internet Explorer uses to display XML documents, is blogging. His ...

70-320 exam: passed

Well, I recently decided I need to be an MCAD. That requires passing three exams. I passed one (70-315) in December and yesterday I went for the second one, 70-320 (Developing XML Web Services and Server Components with Microsoft Visual C# and the Microsoft .NET Framework). Should admit ...

May 16, 2005

System.Xml 2.0: XmlReader is now IDisposable

Another handy feature implemented in .NET 2.0 Beta2 is that the XmlReader class now implements the IDisposable interface, so it can be closed automatically when used with the "using" statement in C#: using (XmlReader r = XmlReader.Create("../../source.xml")) { while (r.Read()) Console.WriteLine(r.NodeType); } Really handy. And implemented in literally a couple of lines. It's a ...
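The using statement is shorthand for a try/finally block that calls Dispose(). A small sketch showing both forms side by side, assuming an in-memory document instead of the ../../source.xml file, plus a check that a disposed reader reports itself as closed (the class and method names are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml;

public static class UsingReaderDemo
{
    // Counts element nodes; the using statement disposes (closes) the reader.
    public static int CountWithUsing(string xml)
    {
        int count = 0;
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            while (r.Read())
                if (r.NodeType == XmlNodeType.Element) count++;
        }
        return count;
    }

    // Roughly what the compiler generates for the using statement:
    // Dispose() runs in finally, so the reader is closed even on exceptions.
    public static int CountWithTryFinally(string xml)
    {
        int count = 0;
        XmlReader r = XmlReader.Create(new StringReader(xml));
        try
        {
            while (r.Read())
                if (r.NodeType == XmlNodeType.Element) count++;
        }
        finally
        {
            if (r != null) ((IDisposable)r).Dispose();
        }
        return count;
    }

    // After disposal the reader reports ReadState.Closed.
    public static ReadState StateAfterDispose(string xml)
    {
        XmlReader r = XmlReader.Create(new StringReader(xml));
        using (r)
        {
            r.Read();
        }
        return r.ReadState;
    }
}
```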

May 11, 2005

Blogging engine running XSLT?

Dave Pawson is investigating whether there is enough interest in a community-developed web logging system based on XSLT processing and Atom8. He welcomes any feedback. I've been using XSLT-powered online publishing systems such as DocBook Website and definitely see enough potential behind Dave's idea. ...

May 10, 2005

.NET XSLT API - broken again?

The .NET XSLT API is traditionally ugly. The XslTransform class (obsoleted in .NET 2.0) had 8 Load() methods and 9 Transform() methods in .NET 1.0. In .NET 1.1 it has 11 Load() methods (5 of them obsoleted) and 18 Transform() methods (9 obsoleted). A huge mess. The brand new XslCompiledTransform in .NET 2.0 Beta2 has just ...

For instance, there is no Transform() method that accepts both an IXPathNavigable and an XmlResolver. That means one can't transform an already loaded XmlDocument or XPathDocument and provide an XmlResolver instance (say, with user credentials set) at the same time. The only workaround is to pass an XmlReader over the XmlDocument or XPathDocument, which leads to a ridiculous situation: I've got a loaded XPathDocument, but have to pass an XmlReader over it, so XslCompiledTransform builds yet another temporary copy of this XPathDocument. What a waste of memory!
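A sketch of that workaround, assuming an already loaded XPathDocument and an XmlUrlResolver carrying credentials: XPathNavigator.ReadSubtree() exposes the document as an XmlReader, so the Transform(XmlReader, XsltArgumentList, XmlWriter, XmlResolver) overload can be used (for an XmlDocument, an XmlNodeReader would play the same role). The ResolverWorkaround class and its methods are hypothetical names, and the duplicate in-memory copy complained about above still happens inside Transform().

```csharp
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class ResolverWorkaround
{
    // Hypothetical helper: a resolver carrying credentials, used when
    // the stylesheet's document() calls fetch protected URIs.
    public static XmlResolver CreateResolver(ICredentials credentials)
    {
        XmlUrlResolver resolver = new XmlUrlResolver();
        resolver.Credentials = credentials;
        return resolver;
    }

    // Transforms an already loaded XPathDocument while still passing an XmlResolver,
    // by exposing the document as an XmlReader (ReadSubtree() from the root node).
    // For an XmlDocument, new XmlNodeReader(doc) would serve the same purpose.
    // Transform() then builds its own internal copy of the input from this reader.
    public static string Transform(XslCompiledTransform xslt, XPathDocument doc, XmlResolver resolver)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings ws = new XmlWriterSettings();
        ws.OmitXmlDeclaration = true;
        using (XmlReader input = doc.CreateNavigator().ReadSubtree())
        using (XmlWriter output = XmlWriter.Create(sb, ws))
        {
            xslt.Transform(input, null, output, resolver);
        }
        return sb.ToString();
    }
}
```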

Another uncovered case: the only Transform() method accepting an XmlResolver outputs to an XmlWriter. But what if I transform to HTML or text? There is no workaround here.

Why is this stuff so complicated? The core problem is that the input and output of an XSLT processor can take a variety of forms. Generally speaking, XSLT input can be a string (document URI), Stream, TextReader, XmlReader or IXPathNavigable, and output can be a string (URI), Stream, TextWriter or XmlWriter. 5 times 4 = 20. Add an optional XsltArgumentList and an optional XmlResolver, and it's 20 times 2 times 2 = 80 Transform() methods for maximum usability. That's of course crazy. Encapsulation usually helps: a Stream can be encapsulated into a TextReader, a TextReader into an XmlReader and an XmlReader into an IXPathNavigable. Output isn't so easy: Stream, TextWriter and XmlWriter are all really needed. And recall usability... So it's still too many methods.
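The input-side encapsulation chain can be sketched as below (the helper names are hypothetical): each step wraps the previous one, so in principle a single IXPathNavigable parameter could cover the string, Stream, TextReader and XmlReader cases. The overload arithmetic is included too: with the input side collapsed to one type and the three output types kept, the count drops from 5 × 4 × 2 × 2 = 80 to 1 × 3 × 2 × 2 = 12, still quite a few. Note that the last step builds an XPathDocument, i.e. exactly the kind of full in-memory tree criticized in the previous paragraph.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class InputEncapsulation
{
    // Stream -> TextReader: decode bytes into characters.
    public static TextReader ToTextReader(Stream s) { return new StreamReader(s); }

    // TextReader -> XmlReader: parse characters into XML nodes.
    public static XmlReader ToXmlReader(TextReader tr) { return XmlReader.Create(tr); }

    // XmlReader -> IXPathNavigable: load the nodes into an in-memory tree.
    public static IXPathNavigable ToNavigable(XmlReader r) { return new XPathDocument(r); }

    // The whole chain: a Stream ends up as the single input type.
    public static IXPathNavigable FromStream(Stream s)
    {
        using (XmlReader r = ToXmlReader(ToTextReader(s)))
        {
            return ToNavigable(r);
        }
    }

    // Overload count for a Transform() family: every input type times every
    // output type, times optional XsltArgumentList, times optional XmlResolver.
    public static int OverloadCount(int inputTypes, int outputTypes)
    {
        return inputTypes * outputTypes * 2 * 2;
    }
}
```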

I wonder if the XML Team considered the javax.xml.transform.Result approach?
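For comparison, a rough C# sketch of a javax.xml.transform.Result-style design; TransformResult and ResultTransform are hypothetical types, not part of System.Xml. Each output kind becomes a subclass, so one Transform() entry point accepts all of them and the output variety stops multiplying the overload count. Here the subclasses simply forward to the existing XslCompiledTransform overloads.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical output abstraction, loosely modeled on javax.xml.transform.Result.
public abstract class TransformResult
{
    // Each subclass knows which existing Transform() overload to call.
    internal abstract void Run(XslCompiledTransform xslt, IXPathNavigable input, XsltArgumentList args);

    public static TransformResult From(Stream s) { return new StreamResult(s); }
    public static TransformResult From(TextWriter w) { return new TextResult(w); }
    public static TransformResult From(XmlWriter w) { return new XmlResult(w); }

    private sealed class StreamResult : TransformResult
    {
        private readonly Stream target;
        public StreamResult(Stream target) { this.target = target; }
        internal override void Run(XslCompiledTransform xslt, IXPathNavigable input, XsltArgumentList args)
        { xslt.Transform(input, args, target); }
    }

    private sealed class TextResult : TransformResult
    {
        private readonly TextWriter target;
        public TextResult(TextWriter target) { this.target = target; }
        internal override void Run(XslCompiledTransform xslt, IXPathNavigable input, XsltArgumentList args)
        { xslt.Transform(input, args, target); }
    }

    private sealed class XmlResult : TransformResult
    {
        private readonly XmlWriter target;
        public XmlResult(XmlWriter target) { this.target = target; }
        internal override void Run(XslCompiledTransform xslt, IXPathNavigable input, XsltArgumentList args)
        { xslt.Transform(input, args, target); }
    }
}

public static class ResultTransform
{
    // One entry point regardless of the output kind.
    public static void Transform(XslCompiledTransform xslt, IXPathNavigable input,
        XsltArgumentList args, TransformResult result)
    {
        if (xslt == null) throw new ArgumentNullException("xslt");
        if (result == null) throw new ArgumentNullException("result");
        result.Run(xslt, input, args);
    }
}
```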

May 9, 2005

foreach and XPathNodeIterator - finally together

This one little improvement in System.Xml 2.0 Beta2 is sooo cool anyway: the XPathNodeIterator class at last implements IEnumerable! Such unification with the .NET iteration model means we can finally iterate over the nodes in an XPath selection using the standard foreach statement: XmlDocument doc = new XmlDocument(); doc.Load("orders.xml"); XPathNavigator nav = doc.CreateNavigator(); foreach ...
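A minimal self-contained sketch of such a loop; the orders/order/@id structure is an assumed example, and the XML is loaded from a string rather than from orders.xml. Each item yielded by the iterator is an XPathNavigator positioned on a selected node.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class ForeachIteratorDemo
{
    // Collects the id attribute of every order (assumed document structure).
    public static List<string> OrderIds(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator nav = doc.CreateNavigator();
        List<string> ids = new List<string>();
        // XPathNodeIterator implements IEnumerable, so foreach works directly.
        foreach (XPathNavigator node in nav.Select("/orders/order/@id"))
        {
            ids.Add(node.Value);
        }
        return ids;
    }
}
```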

May 8, 2005

Security changes in .NET 2.0's XSLT

More security changes were made to XSLT in .NET 2.0 Beta2. When working with the XslCompiledTransform class, the document() function is disabled by default. To enable it, one has to pass an XsltSettings instance with its EnableDocumentFunction property set to true to the XslCompiledTransform.Load() method: XslCompiledTransform xslt = new XslCompiledTransform(); XsltSettings settings = new XsltSettings(); settings.EnableDocumentFunction ...
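A minimal sketch of loading a stylesheet with document() enabled, assuming the stylesheet arrives as a string (the DocumentFunctionDemo name is made up). XsltSettings is a class with boolean properties; there is also the XsltSettings.TrustedXslt preset, which enables both document() and embedded script, so switching on only the feature actually needed seems the safer habit.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class DocumentFunctionDemo
{
    // Loads a stylesheet with the document() function switched on.
    public static XslCompiledTransform Load(string stylesheetXml)
    {
        XsltSettings settings = new XsltSettings();   // document() and script both off
        settings.EnableDocumentFunction = true;       // opt in to document() only
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader r = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            // The resolver handles xsl:import/xsl:include while loading.
            xslt.Load(r, settings, new XmlUrlResolver());
        }
        return xslt;
    }
}
```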

May 5, 2005

Declarative way to expose methods as XSLT extension functions?

I had a conversation with somebody about how EXSLT.NET worked around the problem of hyphenated EXSLT function names and whether there are better ways to solve it. Here is a suggestion for Microsoft: give us more control over exposing .NET methods as extension functions, and make it declarative. ...

Currently, when one exposes an object as an extension object to XSLT, all its public instance and static methods become accessible as XSLT extension functions. One has no control over which methods are exposed and which aren't. Nor does one have any control over how object methods are exposed to XSLT:

  • no custom names: you call a function in XSLT by its CLR name, while XSLT and CLR function syntax requirements and naming conventions are fairly different. Hence the aforementioned EXSLT.NET problem: many EXSLT functions are named in XSL style, with hyphens in their names, but .NET method names cannot contain hyphens. As a consequence, in EXSLT.NET we had to resort to MSIL-level hacking.
  • no control over marshalling between the CLR and XSLT type systems, which are very different
  • no control over memoization (result caching)
  • no control over code access: security is all-or-nothing, everybody with FullTrust can use an extension object and nobody else can
  • no overriding or priority mechanism as with XSLT 2.0 stylesheet functions
  • no optional arguments
  • no control over the function namespace. Think about it: the namespace an extension function belongs to is defined not by the person who implements it, but by the one who uses it.
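For reference, this is roughly how an extension object is wired up today with XslCompiledTransform and XsltArgumentList.AddExtensionObject; the CircleCalc class, the urn:example:calc namespace and the input document are hypothetical. It illustrates two of the points above: the XSLT function name is simply the CLR method name, and the namespace is picked by the caller at registration time.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical extension object: ALL its public methods become callable from XSLT.
public class CircleCalc
{
    public double Circumference(double radius) { return Math.PI * 2 * radius; }
}

public static class ExtensionObjectDemo
{
    // The function is called by its CLR name; a hyphenated name such as
    // calc:circle-circumference could not be mapped this way.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:calc='urn:example:calc'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'><xsl:value-of select='calc:Circumference(number(/r))'/></xsl:template>" +
        "</xsl:stylesheet>";

    public static double Run(double radius)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(XmlReader.Create(new StringReader(Stylesheet)));

        XsltArgumentList args = new XsltArgumentList();
        // The namespace URI is chosen here, by the caller, not by CircleCalc's author.
        args.AddExtensionObject("urn:example:calc", new CircleCalc());

        string input = "<r>" + radius.ToString(CultureInfo.InvariantCulture) + "</r>";
        StringWriter output = new StringWriter();
        xslt.Transform(new XPathDocument(new StringReader(input)), args, output);
        return double.Parse(output.ToString(), CultureInfo.InvariantCulture);
    }
}
```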
When working with Web Services, .NET doesn't expose every public method of your class as a web method, right? Instead, Web methods have to be marked with WebMethodAttribute, which, along with a bunch of other attributes, controls how the Web Service interface is exposed. Wouldn't it be nice to see the same declarative approach applied to XSLT extension objects? I mean something like
/// <summary>
/// Calculates the circumference of a circle given the radius.
/// </summary>
[XsltExtensionObject(Namespace="http://example.com/calculate")]
public class Calculate
{
    private double circ = 0;

    [XsltExtensionFunction(Name="circumference", Memoization=true)]
    public double Circumference(double radius)
    {
        circ = Math.PI*2*radius;
        return circ;
    }
}
Benefits are evident, aren't they? You might say: dream on, who cares? OK, but I believe such stuff is inevitable as XML penetrates into the core of modern programming languages such as C# or Java.
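To make the idea a bit more concrete, here is a hypothetical sketch: the two attributes from the proposal declared as ordinary .NET attribute classes (they do not exist in System.Xml), plus the kind of reflection lookup a processor honoring them might perform. Only marked methods are visible, and the XSLT-side name, hyphens included, is decoupled from the CLR method name. Memoization is declared but not implemented here.

```csharp
using System;
using System.Reflection;

// Hypothetical attributes from the proposal; NOT part of System.Xml.
[AttributeUsage(AttributeTargets.Class)]
public sealed class XsltExtensionObjectAttribute : Attribute
{
    public string Namespace;
}

[AttributeUsage(AttributeTargets.Method)]
public sealed class XsltExtensionFunctionAttribute : Attribute
{
    public string Name;
    public bool Memoization;   // declared only; no caching is implemented in this sketch
}

[XsltExtensionObject(Namespace = "http://example.com/calculate")]
public class Calculate
{
    [XsltExtensionFunction(Name = "circumference", Memoization = true)]
    public double Circumference(double radius) { return Math.PI * 2 * radius; }

    // A hyphenated XSLT name mapped onto a normal CLR method name.
    [XsltExtensionFunction(Name = "circle-area")]
    public double CircleArea(double radius) { return Math.PI * radius * radius; }

    // Public but unmarked, so it would stay hidden from XSLT.
    public void Reset() { }
}

// Hypothetical lookup a processor might do when binding a function call.
public static class ExtensionLookup
{
    public static string GetNamespace(Type t)
    {
        XsltExtensionObjectAttribute a = (XsltExtensionObjectAttribute)
            Attribute.GetCustomAttribute(t, typeof(XsltExtensionObjectAttribute));
        return a == null ? null : a.Namespace;
    }

    // Finds the method exposed under the given XSLT name, considering marked methods only.
    public static MethodInfo Find(Type t, string xsltName)
    {
        foreach (MethodInfo m in t.GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static))
        {
            XsltExtensionFunctionAttribute a = (XsltExtensionFunctionAttribute)
                Attribute.GetCustomAttribute(m, typeof(XsltExtensionFunctionAttribute));
            if (a != null && a.Name == xsltName) return m;
        }
        return null;
    }
}
```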

Oh well, I filed the above as a suggestion to the MSDN Product Feedback Center; go vote for it if you like the idea. Any comments would be appreciated too.

May 4, 2005

.NET 2.0 prohibits DTD in XML by default

Yep, no DTD is allowed by default in .NET 2.0 Beta2:

XmlReaderSettings.ProhibitDtd Property (System.Xml)
Gets or sets a value indicating whether to prohibit document type definition (DTD) processing.
Return Value: true to prohibit DTD processing; otherwise false. The default is true.
Remarks: This setting can be useful in preventing ...
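A small sketch of what this default means in practice, assuming an in-memory document with an internal DTD subset (the DtdDemo name is made up): with default XmlReaderSettings the reader refuses the DOCTYPE and throws an XmlException, while setting ProhibitDtd to false lets it through. Later .NET versions mark ProhibitDtd obsolete in favor of the DtdProcessing property, hence the pragma.

```csharp
using System.IO;
using System.Xml;

public static class DtdDemo
{
    // Internal DTD subset declaring an entity used in the content.
    const string Doc = "<!DOCTYPE root [<!ENTITY e 'hi'>]><root>&e;</root>";

    // Returns the root element's text, or null if the reader refused the DTD.
    public static string Read(bool allowDtd)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        if (allowDtd)
        {
#pragma warning disable 618 // ProhibitDtd is obsolete in later .NET versions
            settings.ProhibitDtd = false;
#pragma warning restore 618
        }
        try
        {
            using (XmlReader r = XmlReader.Create(new StringReader(Doc), settings))
            {
                XmlDocument doc = new XmlDocument();
                doc.Load(r);
                return doc.DocumentElement.InnerText;
            }
        }
        catch (XmlException)
        {
            // DTD prohibited: the reader throws as soon as it meets <!DOCTYPE
            return null;
        }
    }
}
```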

May 3, 2005

Changing value of an XSLT variable (don't laugh!)

Can one change the value of a variable in XSLT? Unexpected answer: yes, when debugging XSLT in Visual Studio 2005 Beta2. I'm not even sure it's a bug. Actually, it can be quite useful to be able to change some values when debugging, right? Do other XSLT debuggers ...

XML Tools in Visual Studio 2005 video from the Channel 9

Finally something interesting from Channel 9: Ken Levy shows cool new XML tools in Visual Studio 2005, such as the improved XML editor, XML IntelliSense, schemas, XML code snippets, XSLT editing and debugging etc. etc. etc. A 41-minute video, but still worth watching. ...

May 2, 2005

Microsoft licensed Mvp.Xml library

On behalf of the Mvp.Xml project team, our one and only lawyer, XML MVP Daniel Cazzulino aka kzu, has signed a license for Microsoft to use and distribute the Mvp.Xml library. That effectively means Microsoft can (and actually wants to) use and distribute XInclude.NET and the rest of the Mvp.Xml ...

May 1, 2005

Erik Meijer on Helping Programmers Program Better

Erik Meijer, one of the designers of the Haskell98 and C-omega languages, will be presenting an interesting webcast on Tuesday, May 03, 2005 10:00 AM (GMT-08:00): MSDN Webcast: Language Design: Helping Programmers Program Better (Level 300). One of the greatest challenges programmers face is translating the concepts in their head into a ...

XQuery in .NET 2.0 Petition - too late, guys!

Almost 6 months after it was announced that Microsoft won't ship an XQuery implementation in .NET 2.0, StylusStudio (maker of the namesake XML IDE) decided to run an online petition, "XQuery for all", to urge Microsoft to change its mind. Well, as a marketing action it's OK, but the petition itself ...