March 7, 2004

Why Does the Common XmlReader Usage Pattern Ignore NameTable?

Somehow it happened that one of the most commonly used XmlReader usage patterns ignores NameTable. That's really unfortunate! Everybody, including Microsofties, MVPs and of course zillions of users blindly follow it, carelessly slowing down XmlReader's parsing speed. ...

I'm talking about the "keep reading till element foo" pattern we are all familiar with:

while (reader.Read()) {
  if (reader.NodeType==XmlNodeType.Element &&
    reader.Name=="foo") {
    // process the foo element here
  }
}
The reader.Name=="foo" comparison is the crux here. The reader.Name property returns the parsed element name as a string atomized in the XmlReader's NameTable, while the "foo" string doesn't belong to any NameTable. The two are therefore never the same object, so the cheap reference check can't succeed and the usual string comparison (pointers/length/char-by-char) occurs, which is slower than it needs to be. That's not how it was meant to be! The "Object Comparison Using XmlNameTable with XmlReader" article of the .NET Framework Developer's Guide suggests a different usage pattern:
object cust = reader.NameTable.Add("Customer");
while (reader.Read()) {
   // The "if" uses efficient pointer comparison.
   if (cust == reader.Name) {
      // process the Customer node here
   }
}
Both strings compared here belong to the same NameTable, and cust is typed as object, so the comparison comes down to a single cheap pointer comparison!
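To make the pattern concrete, here is a minimal, self-contained sketch of it. The NameTableDemo class, the CountElements helper and the inline XML are made up for illustration; only XmlTextReader, NameTable.Add and the reference comparison come from the pattern described above.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not taken from the Developer's Guide.
public static class NameTableDemo
{
    // Counts elements named elementName using reference comparison only.
    public static int CountElements(string xml, string elementName)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Atomize the name once, before the loop. Add returns the
            // NameTable's own instance of the string.
            object atom = reader.NameTable.Add(elementName);
            int count = 0;
            while (reader.Read())
            {
                // Casting to object forces a reference comparison; it is
                // valid because reader.Name comes from the same NameTable.
                if (reader.NodeType == XmlNodeType.Element &&
                    atom == (object)reader.Name)
                {
                    count++;
                }
            }
            return count;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string xml = "<books><book><price>8.99</price></book>" +
                     "<book><price>12.50</price></book></books>";
        Console.WriteLine(CountElements(xml, "price")); // prints 2
    }
}
```

The important detail is that the name is added to the reader's own NameTable. An atom taken from some other NameTable instance would be a different object and the reference comparison would silently never match.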

And what do you think Sun does in their XML Processing Performance Java and .NET comparison? The same reader.Name != "LineItemCountValue" stuff! It would be interesting to rerun their tests with such lines fixed.

According to my rough measurements, this unfortunate usage pattern costs about 1-20% of the parsing time, depending on many factors. Below is my test: I'm parsing the books.xml document and counting "price" elements.
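A measurement of this kind can be sketched roughly as follows. This is only an illustration under my own assumptions: the NameTableBenchmark class, the synthetic document built by BuildXml (a stand-in for books.xml), the document size and the run count are all hypothetical, and Stopwatch is used for timing.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical benchmark harness; names and sizes are illustrative only.
public static class NameTableBenchmark
{
    // Builds a synthetic document as a stand-in for books.xml.
    public static string BuildXml(int books)
    {
        StringBuilder sb = new StringBuilder("<bookstore>");
        for (int i = 0; i < books; i++)
            sb.Append("<book><title>T</title><price>9.99</price></book>");
        return sb.Append("</bookstore>").ToString();
    }

    // Compares against a name atomized in the reader's NameTable.
    public static int CountWithNameTable(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        object price = reader.NameTable.Add("price");
        int count = 0;
        while (reader.Read())
            if (reader.NodeType == XmlNodeType.Element &&
                price == (object)reader.Name)
                count++;
        reader.Close();
        return count;
    }

    // Compares against a plain string literal (the common pattern).
    public static int CountWithStringCompare(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        int count = 0;
        while (reader.Read())
            if (reader.NodeType == XmlNodeType.Element &&
                reader.Name == "price")
                count++;
        reader.Close();
        return count;
    }

    public static void Main()
    {
        string xml = BuildXml(10000);   // assumed document size
        const int runs = 50;            // assumed number of repetitions

        Console.WriteLine("Warming up...");
        CountWithNameTable(xml);
        CountWithStringCompare(xml);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) CountWithNameTable(xml);
        sw.Stop();
        Console.WriteLine("Time with NameTable: {0:F2} ms",
            sw.Elapsed.TotalMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) CountWithStringCompare(xml);
        sw.Stop();
        Console.WriteLine("Time with no NameTable: {0:F2} ms",
            sw.Elapsed.TotalMilliseconds);
    }
}
```

Both methods must return the same count; only the timing should differ, and by how much depends heavily on the document, the runtime and the machine.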

The result on my Win2K box is:

Warming up...
Time with NameTable: 1308.86 ms
Time with no NameTable: 1403.60 ms
Benchmarking is really fragile stuff and I'm sure the results will differ drastically from machine to machine, but basically what I wanted to say is that something needs to be done to fix this particular XmlReader usage pattern so that it doesn't ignore the great NameTable idea. I encourage fellow MVPs, XmlInsiders and others not to post XmlReader samples where NameTable is neglected.