May 31, 2005

Effective XML: Dumping XML content while reading it from a stream

A fellow MVP asked whether there is a way to dump XML content while reading it from a stream, without buffering the whole XML document. Here is the scenario: an XML document is read from an HttpWebResponse stream and needs to be passed as an XmlReader to an XmlSerializer ...

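using System;
using System.IO;
using System.Xml;

public class DumpingXmlTextReader : XmlTextReader 
{
  private XmlWriter dump;

  //Add more constructors as needed
  public DumpingXmlTextReader(string url, XmlWriter dump)
    :base(url) 
  {
    this.dump = dump;
  }

  //Stream-based constructor, as used by the sample below
  public DumpingXmlTextReader(Stream input, XmlWriter dump)
    :base(input) 
  {
    this.dump = dump;
  }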
    
  /// <summary>
  /// Overridden XmlReader's Read() method: reads the next node and dumps it
  /// </summary>    
  public override bool Read()
  {
    bool baseRead = base.Read();
    if (baseRead) 
    {
      WriteShallowNode(this, dump);   
    }
    return baseRead;
  }
    
  /// <summary>
  /// Auxiliary method to dump the node the XmlReader is positioned at.
  /// Thanks to Mark Fussell, 
  /// http://blogs.msdn.com/mfussell/archive/2005/02/12/371546.aspx
  /// </summary>    
  static void WriteShallowNode( XmlReader reader, XmlWriter writer )
  {
    if ( reader == null )
    {
      throw new ArgumentNullException("reader");
    }

    if ( writer == null )
    {
      throw new ArgumentNullException("writer");
    }   
      
    switch ( reader.NodeType )
    {
      case XmlNodeType.Element:
        writer.WriteStartElement( reader.Prefix, reader.LocalName, 
          reader.NamespaceURI );
        writer.WriteAttributes( reader, true );
        if ( reader.IsEmptyElement )
        {
          writer.WriteEndElement();
        }
        break;

      case XmlNodeType.Text:
        writer.WriteString( reader.Value );
        break;

      case XmlNodeType.Whitespace:
      case XmlNodeType.SignificantWhitespace:
        writer.WriteWhitespace(reader.Value);
        break;

      case XmlNodeType.CDATA:
        writer.WriteCData( reader.Value );
        break;

      case XmlNodeType.EntityReference:
        writer.WriteEntityRef(reader.Name);
        break;

      case XmlNodeType.XmlDeclaration:
      case XmlNodeType.ProcessingInstruction:
        writer.WriteProcessingInstruction( reader.Name, reader.Value );
        break;

      case XmlNodeType.DocumentType:
        writer.WriteDocType( reader.Name, 
          reader.GetAttribute( "PUBLIC" ), reader.GetAttribute( "SYSTEM" ), 
          reader.Value );
        break;

      case XmlNodeType.Comment:
        writer.WriteComment( reader.Value );
        break;

      case XmlNodeType.EndElement:
        writer.WriteFullEndElement();
        break;
    }
  }
}
It's not rocket science, as you can see; it's pretty straightforward. The core method, WriteShallowNode, which dumps the node the reader is currently positioned at, is borrowed from Mark Fussell's post on "Combining the XmlReader and XmlWriter classes for simple streaming transformations".

And here is a usage sample. I read XML from a file stream (imagine it's an HttpWebResponse stream instead), feed it to an XmlSerializer and dump its content at the same time. Note that the XML content never gets buffered as a whole: the processing is purely forward-only, non-caching streaming.

//Prepare dumping writer
XmlTextWriter dumpWriter = new XmlTextWriter("dump.xml", Encoding.UTF8);
dumpWriter.Formatting = Formatting.Indented;
PurchaseOrder po = null;
try
{
  using (FileStream fs = File.OpenRead("PurchaseOrder.xml")) 
  {
    //Reads and dumps XML content node-by-node to the dumpWriter
    XmlReader reader = new DumpingXmlTextReader(fs, dumpWriter);
    XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));      
    po = (PurchaseOrder)serializer.Deserialize(reader);
  }
}
finally
{
  //Close dumping writer even if deserialization fails, 
  //the XML dump is in dump.xml
  dumpWriter.Close();
}
//Deserialization went ok
Console.WriteLine(po.Account);
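
For the HttpWebResponse scenario mentioned above, the same pattern might look roughly like the sketch below. It is only an illustration under a few assumptions: the URL is a made-up placeholder, DumpingXmlTextReader is the class above with a constructor that accepts a Stream (which the file sample already presumes), and PurchaseOrder is the same serializable type. It has to be compiled together with those two classes.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

class HttpDumpSample
{
  static void Main()
  {
    //Hypothetical URL, for illustration only
    WebRequest request = 
      WebRequest.Create("http://example.com/PurchaseOrder.xml");

    //Prepare dumping writer
    XmlTextWriter dumpWriter = new XmlTextWriter("dump.xml", Encoding.UTF8);
    dumpWriter.Formatting = Formatting.Indented;
    try
    {
      using (WebResponse response = request.GetResponse())
      using (Stream stream = response.GetResponseStream())
      {
        //Assumes the Stream-based DumpingXmlTextReader constructor
        XmlReader reader = new DumpingXmlTextReader(stream, dumpWriter);
        XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
        PurchaseOrder po = (PurchaseOrder)serializer.Deserialize(reader);
        Console.WriteLine(po.Account);
      }
    }
    finally
    {
      //Whatever was read before a failure still ends up in dump.xml
      dumpWriter.Close();
    }
  }
}
```

Closing the writer in a finally block is a deliberate choice here: the dump tends to be most useful exactly when deserialization fails partway through the response.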

I wonder whether this is a rare use case, or whether we need such a class in a utility library, e.g. in the Mvp.Xml library?