June 24, 2004

Efficient subtree transformation with SubtreeXPathNavigator

Daniel has implemented the SubtreeXPathNavigator I was talking about. That's way cool stuff, I really like it. Now I'm not sure about XmlNodeNavigator: do we need it in the Mvp.Xml library, or had we better remove it so as not to confuse users with different forms of the same navigator? I feel a bit ...

June 22, 2004

Microsoft Research RSS Feeds

Here are some new feeds worth subscribing to: Microsoft Research News and Headlines Feed, Microsoft Research Downloads Feed and Microsoft Research Publications Feed. Cool. [Via Roy Osherove] ...

June 21, 2004

Validating Doctype-less documents against DTD

Here is another interesting puzzle to solve: how would you validate a Doctype-less XML document (one that has no Doctype declaration) against a DTD? ...

XmlValidatingReader needs a Doctype to validate a document against a DTD; the only way to make it do so is to give it XML with a Doctype. OK, my first impulse was to write a simple custom XmlReader that exposes a synthetic Doctype while reading Doctype-less documents:
XmlTextReader -> DoctypeAppendingXmlReader -> XmlValidatingReader
That's a bit nontrivial, as the public and system identifiers must be exposed as synthetic attributes (PUBLIC and SYSTEM) too, but it's quite doable with a small state machine. Unfortunately, it doesn't work: XmlValidatingReader still doesn't validate, even when given a Doctype. A bit of reflection revealed that, when encountering a Doctype, XmlValidatingReader asks the underlying XmlTextReader for the parsed DTD via an internal property. That is null, of course, because XmlTextReader never saw the synthetic Doctype. No way, it won't work.

OK, then what? Another approach is to modify the XML before validation by inserting a Doctype. No XmlDocument or XSLT here, please; that's a job for an XmlReader-XmlWriter pipeline. Here is the modifying code:

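//Needs: using System.Text; using System.Xml;
XmlReader r = new XmlTextReader("foo.xml");
XmlWriter w = new XmlTextWriter("foo2.xml", Encoding.UTF8);
bool hasDoctype = false;
r.Read(); //position the reader on the first node
while (!r.EOF) 
{
    if (r.NodeType == XmlNodeType.DocumentType)
        hasDoctype = true;
    else if (r.NodeType == XmlNodeType.Element && !hasDoctype) 
    {
        //The document element is about to be written - insert Doctype
        w.WriteDocType(r.Name, null, "foo.dtd", null);
        hasDoctype = true;
    }
    //WriteNode() copies the current node (an element with its whole
    //subtree) and leaves the reader on the next node, so an extra
    //Read() here would silently skip that node
    w.WriteNode(r, false);
}
r.Close();
w.Close();
//Now let's validate the modified document
XmlValidatingReader vr = new XmlValidatingReader(
    new XmlTextReader("foo2.xml"));
vr.ValidationType = ValidationType.DTD;
//No ValidationEventHandler attached, so the first validity error throws
while (vr.Read());
vr.Close();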
This works fine, but it requires a temporary buffer (file, whatever) to hold the modified version of the document. Not really satisfying. Any other ideas? I think a better solution exists. For example, if the document is going to be loaded into some in-memory XML store during validation anyway, some sort of in-memory validation might help, but I doubt that trick will work for DTD validation.
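If the temporary file is the main objection, the copy loop above can write into a MemoryStream instead, and the validating reader can read straight from it. The sketch below assumes .NET 2.0 or later (XmlReader.Create with XmlReaderSettings rather than XmlValidatingReader). The DoctypeValidator class and its methods are hypothetical names, and the DTD is passed as an internal subset only so that the sketch runs without a foo.dtd on disk. It still buffers the whole modified document, just in memory rather than in a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: inserts a Doctype into a Doctype-less document in memory
// and then validates the buffered copy against that DTD.
public static class DoctypeValidator
{
    // Copies the document from r into a MemoryStream, writing a Doctype
    // right before the document element if the source has none.
    public static MemoryStream AddDoctype(XmlReader r, string systemId, string internalSubset)
    {
        MemoryStream buffer = new MemoryStream();
        XmlTextWriter w = new XmlTextWriter(buffer, Encoding.UTF8);
        bool hasDoctype = false;
        r.Read();
        while (!r.EOF)
        {
            if (r.NodeType == XmlNodeType.DocumentType)
                hasDoctype = true;
            else if (r.NodeType == XmlNodeType.Element && !hasDoctype)
            {
                w.WriteDocType(r.Name, null, systemId, internalSubset);
                hasDoctype = true;
            }
            // Copies the node (and an element's subtree), then advances r.
            w.WriteNode(r, false);
        }
        // Flush rather than Close: Close would also close the MemoryStream.
        w.Flush();
        buffer.Position = 0;
        return buffer;
    }

    // Validates against the Doctype's DTD; an empty list means "valid".
    public static List<string> Validate(Stream xml)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;  // the DTD must be read
        settings.ValidationType = ValidationType.DTD;
        settings.XmlResolver = new XmlUrlResolver();   // only needed for an external systemId
        // Collect validity errors instead of stopping at the first one.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);
        using (XmlReader vr = XmlReader.Create(xml, settings))
        {
            while (vr.Read()) { }
        }
        return errors;
    }
}
```

For an external DTD, pass its system identifier and a null internal subset. A MemoryStream has no base URI, so an absolute path or URI for the DTD is the safer choice there.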

June 20, 2004

Non-Extractive XML Parsing

Well, I'm working on shrinking the "Items for Read" folder in RSS Bandit. Still many items to catch up on, but anyway. XML.com has published the article "Non-Extractive Parsing for XML" by Jimmy Zhang. In it, Jimmy proposes another approach to XML parsing: a "non-extractive" style of tokenization ...
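To make "non-extractive" a bit more concrete: a tokenizer in this style records where each token sits in the original buffer (an offset and a length) instead of copying it out into a new string, and creates a string only when somebody asks for one. The toy sketch below illustrates just that idea for start-tag names. It is not the parser from the article, the names Token and NonExtractiveSketch are made up for illustration, and it assumes well-formed input with no markup inside comments or CDATA sections.

```csharp
using System;
using System.Collections.Generic;

// A token remembers where it lives in the original buffer; it holds no copy.
public struct Token
{
    public readonly int Offset;
    public readonly int Length;
    public Token(int offset, int length) { Offset = offset; Length = length; }
}

public static class NonExtractiveSketch
{
    // Records the name of every start tag as (offset, length).
    // End tags, comments, PIs and declarations are skipped.
    public static List<Token> StartTagNames(string xml)
    {
        List<Token> tokens = new List<Token>();
        int i = 0;
        while ((i = xml.IndexOf('<', i)) >= 0)
        {
            int start = i + 1;
            if (start < xml.Length && (xml[start] == '/' || xml[start] == '!' || xml[start] == '?'))
            {
                i = start;
                continue;
            }
            int end = start;
            while (end < xml.Length && !char.IsWhiteSpace(xml[end]) && xml[end] != '>' && xml[end] != '/')
                end++;
            tokens.Add(new Token(start, end - start)); // no Substring call here
            i = end;
        }
        return tokens;
    }

    // A string is materialized only on demand.
    public static string Text(string xml, Token t)
    {
        return xml.Substring(t.Offset, t.Length);
    }
}
```

Each token is just two integers, so no per-name string is allocated until Text() is called.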

June 15, 2004

How to add a reference to an XSLT stylesheet while writing DataSet data to XML

Say you've got a DataSet and you want to save its data as XML. The DataSet.WriteXml() method does that perfectly well. But what if you need the saved XML to reference an XSLT stylesheet via an xml-stylesheet processing instruction, such as <?xml-stylesheet type="text/xsl" href="foo.xsl"?>? Of course you can load ...

The idea is trivial: the DataSet.WriteXml() method accepts an XmlWriter, so we can design a custom XmlWriter and encapsulate the PI-adding logic in it. The xml-stylesheet PI must be placed in the document's prolog, so here is when to add it: when the XmlWriter.WriteStartElement() method is called and the writer state is WriteState.Prolog or WriteState.Start (which means the document element's start tag is about to be written). Trivial and robust. Here is the implementation:

using System;
using System.Text;
using System.Xml;

public sealed class AddPIXmlTextWriter : XmlTextWriter 
{
  private string stylesheet;

  //Constructors - add more as needed
  public AddPIXmlTextWriter(string filename, 
    Encoding enc, string stylesheet) 
  : base(filename, enc) 
  {
    this.stylesheet = stylesheet;
  }

  public override void WriteStartElement(string prefix, 
    string localName, string ns)
  {
    if (WriteState == WriteState.Prolog || 
        WriteState == WriteState.Start) 
    {
      //It's time to add the PI
      base.WriteProcessingInstruction("xml-stylesheet", 
        String.Format("type=\"text/xsl\" href=\"{0}\"", stylesheet));
    }
    base.WriteStartElement(prefix, localName, ns);
  }
}
And here is a usage sample:
    DataSet ds = new DataSet();
    ds.ReadXml("feedlist.xml");
    ...
    XmlTextWriter w = new AddPIXmlTextWriter("foo.xml", Encoding.UTF8, "foo.xsl");
    w.Formatting = Formatting.Indented;
    ds.WriteXml(w);
    w.Close(); //flushes buffered output and closes foo.xml
And the resulting foo.xml starts with:
<?xml-stylesheet type="text/xsl" href="foo.xsl"?>
<feeds refresh-rate="900000" xmlns="http://www.25hoursaday.com/2003/RSSBandit/feeds/">
  <feed category="blogs">
  ...
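As the "add more as needed" comment suggests, other constructors can be added the same way. The sketch below adds a TextWriter-based one (XmlTextWriter itself has an XmlTextWriter(TextWriter) constructor to chain to) so the result can land in a StringWriter instead of a file. The extra constructor and the StylesheetDemo helper are illustrative additions, not part of the original class.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

public sealed class AddPIXmlTextWriter : XmlTextWriter
{
    private readonly string stylesheet;

    public AddPIXmlTextWriter(string filename, Encoding enc, string stylesheet)
        : base(filename, enc)
    {
        this.stylesheet = stylesheet;
    }

    // Illustrative extra constructor: write to any TextWriter.
    public AddPIXmlTextWriter(TextWriter output, string stylesheet)
        : base(output)
    {
        this.stylesheet = stylesheet;
    }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        // Start or Prolog state: the document element's start tag is next.
        if (WriteState == WriteState.Prolog || WriteState == WriteState.Start)
        {
            base.WriteProcessingInstruction("xml-stylesheet",
                String.Format("type=\"text/xsl\" href=\"{0}\"", stylesheet));
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}

public static class StylesheetDemo
{
    // Serializes a DataSet to a string, with the xml-stylesheet PI in the prolog.
    public static string SaveWithStylesheet(DataSet ds, string stylesheet)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new AddPIXmlTextWriter(sw, stylesheet);
        w.Formatting = Formatting.Indented;
        ds.WriteXml(w);
        w.Flush();
        return sw.ToString();
    }
}
```

Note that the stylesheet value goes into the PI as is, so an href containing a double quote would need escaping first.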

June 9, 2004

Microsoft Security Bulletin RSS Feed

RSS is making its way. TechNet's security team has announced the first version of an RSS feed for its security bulletins: Microsoft Security Bulletin RSS Feed. ...

June 8, 2004

MSDN still suggests ineffective XSLT pipelining

Reading the wonderful "Chapter 9 - Improving XML Performance":

Split Complex Transformations into Several Stages
You can incrementally transform an XML document by using multiple XSLT style sheets to generate the final required output. This process is referred to as pipelining and is particularly beneficial for complex transformations over large XML ...

June 6, 2004

I'm back

So I'm back. That was a crazy trip: Tel-Aviv, Prague, Berlin, Amsterdam, Paris, Bavaria, Prague and back to Tel-Aviv. Bad weather was chasing us, but fortunately it was mostly warm enough even for us sun-accustomed Israelis. My mailbox did overflow, and all incoming mail was bounced from 06/01 to 06/03. If you tried to send me something on those days, you may ...