Why Does the Common XmlReader Usage Pattern Ignore NameTable?

Somehow it happened that one of the most commonly used XmlReader usage patterns ignores NameTable. That's really unfortunate! Everybody, including Microsofties, MVPs and of course zillions of users, blindly follows it, needlessly slowing down XmlReader parsing.

I'm talking about the "keep reading till element foo" pattern we are all familiar with:

while (reader.Read()) {
  if (reader.NodeType == XmlNodeType.Element &&
      reader.Name == "foo") {
    ...
  }
}
The comparison of reader.Name with "foo" is the crux here. The reader.Name property returns the parsed element name, atomized in the XmlReader's NameTable, while the "foo" string doesn't belong to any NameTable. That means a regular string comparison (reference check, then length, then char-by-char) occurs, which is slower than it has to be. That's not how it was meant to be! The "Object Comparison Using XmlNameTable with XmlReader" article of the .NET Framework Developer's Guide suggests a different usage pattern:
object cust = reader.NameTable.Add("Customer");
while (reader.Read())
{
   // The "if" uses efficient pointer comparison.
   if (cust == reader.Name)   
   {
      ...
   }
}
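Both strings compared here belong to the same NameTable, which takes the comparison down to a single cheap pointer comparison! Note that cust is declared as object, so == here is a reference comparison rather than the overloaded string value comparison.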
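To make the pattern concrete, here is a minimal self-contained sketch that counts "price" elements by reference comparison. It is only an illustration: the class name NameTableDemo, the method CountPrices, and the inline sample XML are assumptions made for this example, and it uses XmlReader.Create over a StringReader rather than any particular reader from the article.

```csharp
using System;
using System.IO;
using System.Xml;

static class NameTableDemo
{
    // Hypothetical helper: counts <price> elements by comparing references,
    // relying on the reader atomizing every name through its NameTable.
    public static int CountPrices(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Atomize the name once, before the loop.
            string price = reader.NameTable.Add("price");
            int count = 0;
            while (reader.Read())
            {
                // Same NameTable, so equal names are the same string instance.
                if (reader.NodeType == XmlNodeType.Element &&
                    object.ReferenceEquals(price, reader.LocalName))
                {
                    count++;
                }
            }
            return count;
        }
    }

    static void Main()
    {
        // Made-up sample document, not the books.xml used in the measurements.
        const string xml =
            "<books><book><price>10</price></book><book><price>12</price></book></books>";
        Console.WriteLine(CountPrices(xml)); // prints 2
    }
}
```

Using object.ReferenceEquals instead of == on an object-typed variable makes the intent explicit and avoids the compiler warning about a possibly unintended reference comparison.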

And what do you think Sun does in their XML Processing Performance Java and .NET comparison? The same reader.Name != "LineItemCountValue" stuff! It would be interesting to rerun their tests with such lines fixed.

According to my rough measurements, this unfortunate usage pattern costs about 1-20% of the parsing time, depending on many factors. In my test I parse the books.xml document and count the "price" elements.

The result on my Win2K box is:

D:\projects\Test\bin\Release>Test.exe
Warming up...
Testing...
Time with NameTable: 1308.86 ms
Time with no NameTable: 1403.60 ms
Benchmarking is really fragile stuff and I'm sure the results will differ drastically from machine to machine, but basically what I wanted to say is that something needs to be done to fix this particular XmlReader usage pattern so that it doesn't ignore the great NameTable idea. I encourage fellow MVPs, XmlInsiders and others not to post XmlReader samples where NameTable is neglected.
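For anyone who wants to repeat the measurement, a rough harness could look like the sketch below. This is not the original test program: the class NameTableBenchmark, its method names, the synthetic document built by BuildXml (a stand-in for books.xml), and the repeat counts are all assumptions chosen for illustration, and the timings it prints will vary with the runtime and the machine.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

static class NameTableBenchmark
{
    // Builds a synthetic catalog; an assumed stand-in for books.xml.
    public static string BuildXml(int books)
    {
        StringBuilder sb = new StringBuilder("<catalog>");
        for (int i = 0; i < books; i++)
            sb.Append("<book><title>T</title><price>9.99</price></book>");
        return sb.Append("</catalog>").ToString();
    }

    // Atomized name, compared by reference.
    public static int CountWithNameTable(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            object price = reader.NameTable.Add("price");
            int count = 0;
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element && price == (object)reader.Name)
                    count++;
            return count;
        }
    }

    // String literal, compared by value.
    public static int CountWithoutNameTable(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            int count = 0;
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "price")
                    count++;
            return count;
        }
    }

    // Runs the given counter several times and returns elapsed milliseconds.
    static double TimeMs(Func<string, int> counter, string xml, int repeats)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
            counter(xml);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        string xml = BuildXml(10000);

        Console.WriteLine("Warming up...");
        TimeMs(CountWithNameTable, xml, 5);
        TimeMs(CountWithoutNameTable, xml, 5);

        Console.WriteLine("Testing...");
        Console.WriteLine("Time with NameTable: {0:F2} ms", TimeMs(CountWithNameTable, xml, 50));
        Console.WriteLine("Time with no NameTable: {0:F2} ms", TimeMs(CountWithoutNameTable, xml, 50));
    }
}
```

Both methods parse the same in-memory document, so the only intended difference between the two timings is the name comparison itself.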

3 TrackBacks

re: XmlNameTable: The Shiftstick of System.Xml (http://blogs.msdn.com/mfussell/archive/2004/04/28/122138.aspx) from Mark Fussell's WebLog on April 28, 2004 6:59 PM

Xml and the Nametable from ComputerZen.com - Scott Hanselman on March 7, 2006 12:23 PM

6 Comments

Ah I see. Now that I can live with. :)
Thanks

Well, I'm not sure exactly which comparison the switch statement performs; the CLI spec doesn't say that clearly.
But I believe that doesn't matter, because string comparison also performs an object (reference) comparison as the very first check. (Then it checks whether either operand is null, then compares the string lengths, and only then does the char-by-char comparison take place.)

I don't understand. Do you think that the switch statement would use object reference comparison rather than a string comparison?

I think that's the same issue. I don't see why it shouldn't work.

I may be overlooking something obvious, but usually there will be many, many nodes that need processing, each requiring a statement like:

if (cust == reader.Name)

inside the while loop in the sample code. That means many, many if..else/else if statements. Not good. A switch statement would be the obvious choice, but unless I'm mistaken, it only works on integral or string expressions. So while the above may work faster due to the object comparison rather than the string (char-by-char) comparison, I wonder if the same applies inside a switch statement. And although cust is typed as object, NameTable.Add returns a string after all. I don't know whether that is relevant, but wouldn't switch do a string comparison rather than a reference comparison? Or maybe it won't run at all.

object cust = reader.NameTable.Add("Customer");
object emp = reader.NameTable.Add("Employee");
object last = reader.NameTable.Add("Lastname");

while (reader.Read())
{
    switch (reader.Name)
    {
        case cust:
            // do customer
            break;
        case emp:
            // do employee
            break;
        case last:
            // do lastname
            break;
        default:
            break;
    }
}

Any ideas?

Really good tip Oleg! Now that I read it, it looks SOO obvious!
