Why XmlReader Usage Pattern Ignores NameTable?

Somehow, one of the most commonly used XmlReader usage patterns ignores the NameTable. That's really unfortunate! Everybody, including Microsofties, MVPs and of course zillions of users, blindly follows it, needlessly slowing down XmlReader's parsing speed.

I'm talking about the "keep reading till element foo" pattern we are all familiar with:

while (reader.Read()) {
  if (reader.NodeType==XmlNodeType.Element && 
    reader.Name=="foo") {
      ...
    }
}

The reader.Name=="foo" comparison is the crux here. The reader.Name property returns the parsed element name as a string atomized in the XmlReader's NameTable, while the "foo" string doesn't belong to any NameTable. That means the usual string comparison (pointers, then lengths, then char-by-char) occurs, which is slower than it needs to be. That's not how it was meant to be! The "Object Comparison Using XmlNameTable with XmlReader" article of the .NET Framework Developer's Guide suggests a different usage pattern:

object cust = reader.NameTable.Add("Customer");
while (reader.Read())
{
   // The "if" uses efficient pointer comparison.
   if (cust == reader.Name)   
   {
      ...
   }
}

Both strings compared here belong to the same NameTable, and because cust is declared as object, == is a reference comparison. That takes the whole check down to a single cheap pointer comparison!
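The same idea extends to more than one element name. The following is only a minimal sketch: the class name AtomizedNameDemo, the CountElements method and the sample XML are made up for illustration, and it uses Object.ReferenceEquals to make the pointer comparison explicit instead of relying on a variable typed as object.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of the original article.
class AtomizedNameDemo
{
    // Returns { number of <price> elements, number of <title> elements }.
    public static int[] CountElements(string xml)
    {
        NameTable nt = new NameTable();
        // Add() atomizes the name and returns the table's own string instance.
        string price = nt.Add("price");
        string title = nt.Add("title");

        int prices = 0, titles = 0;
        // Hand the same NameTable to the reader so its names are atomized there.
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                // Pure reference comparison; no char-by-char fallback.
                if (Object.ReferenceEquals(reader.Name, price)) prices++;
                else if (Object.ReferenceEquals(reader.Name, title)) titles++;
            }
        }
        return new int[] { prices, titles };
    }

    static void Main()
    {
        // Made-up sample document.
        string xml = "<book><title>A</title><price>1</price><price>2</price></book>";
        int[] counts = CountElements(xml);
        Console.WriteLine("price: {0}, title: {1}", counts[0], counts[1]);
    }
}
```

Note that the names are added to the table before the reader is created and the same table is passed to the reader's constructor. Strings atomized in a different NameTable would never be reference-equal to the reader's names.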

And what do you think Sun does in their XML Processing Performance Java and .NET comparison? The same reader.Name != "LineItemCountValue" stuff! It would be interesting to run their tests with such lines fixed.

According to my rough measurements, this unfortunate usage pattern costs about 1-20% of the parsing time, depending on many factors. Below is my test. I'm parsing the books.xml document and counting "price" elements.

using System;
using System.Xml;
using System.IO;
using System.Runtime.InteropServices;

class Class1
{
    [STAThread]
    static void Main(string[] args)
    {
        // Load the document once so that file I/O is not part of the timing.
        StreamReader sr = new StreamReader("books.xml");
        string xml = sr.ReadToEnd();
        sr.Close();
        int num = 1000;
        PerfTest test = new PerfTest();

        Console.WriteLine("Warming up...");
        TestWithNoNameTable(xml, num);
        TestWithNameTable(xml, num);

        Console.WriteLine("Testing...");
        test.Start();
        TestWithNameTable(xml, num);
        Console.WriteLine("Time with NameTable: {0, 6:f2} ms", test.Stop());
        test.Start();
        TestWithNoNameTable(xml, num);
        Console.WriteLine("Time with no NameTable: {0, 6:f2} ms", test.Stop());
    }

    public static void TestWithNoNameTable(string xml, int num)
    {
        int counter = 0;
        for (int i=0; i<num; i++)
        {
            XmlTextReader r = new XmlTextReader(new StringReader(xml));
            while (r.Read())
            {
                // Compares against a string that is not in the reader's NameTable.
                if (r.NodeType == XmlNodeType.Element && r.Name.Equals("price"))
                    counter++;
            }
            r.Close();
        }
    }

    public static void TestWithNameTable(string xml, int num)
    {
        int counter = 0;
        XmlNameTable nt = new NameTable();
        // Add (not Get) atomizes the name; Get returns null for a name
        // that is not in the table yet.
        string key = nt.Add("price");
        for (int i=0; i<num; i++)
        {
            XmlTextReader r = new XmlTextReader(new StringReader(xml), nt);
            while (r.Read())
            {
                // Both names come from the same NameTable, so a match is
                // detected by the initial reference check.
                if (r.NodeType == XmlNodeType.Element && r.Name == key)
                    counter++;
            }
            r.Close();
        }
    }
}

public class PerfTest
{
    [DllImport("kernel32.dll", EntryPoint = "QueryPerformanceCounter", CharSet = CharSet.Unicode)]
    extern static bool QueryPerformanceCounter(out long perfcount);

    [DllImport("kernel32.dll", EntryPoint = "QueryPerformanceFrequency", CharSet = CharSet.Unicode)]
    extern static bool QueryPerformanceFrequency(out long frequency);

    long startTime;
    long stopTime;

    public void Start()
    {
        QueryPerformanceCounter(out this.startTime);
    }

    // Returns the elapsed time since Start() in milliseconds.
    public float Stop()
    {
        QueryPerformanceCounter(out this.stopTime);
        long frequency;
        QueryPerformanceFrequency(out frequency);
        float diff = (stopTime - startTime);
        return diff*1000f/(float)frequency;
    }
}

The result on my Win2K box is:

D:\projects\Test\bin\Release>Test.exe
Warming up...
Testing...
Time with NameTable: 1308.86 ms
Time with no NameTable: 1403.60 ms

Benchmarking is really fragile stuff and I'm sure the results will differ drastically in other setups, but basically what I wanted to say is that something needs to be done to fix this particular XmlReader usage pattern so that it doesn't ignore the great NameTable idea. I encourage fellow MVPs, XmlInsiders and others not to post XmlReader samples where the NameTable is neglected.

3 TrackBacks


re: XmlNameTable: The Shiftstick of System.Xml from Mark Fussell's WebLog on April 28, 2004 6:59 PM (http://blogs.msdn.com/mfussell/archive/2004/04/28/122138.aspx)

Xml and the Nametable from ComputerZen.com - Scott Hanselman on March 7, 2006 12:23 PM

6 Comments

Jiho Han | March 18, 2004 11:25 PM | Reply

Ah I see. Now that I can live with. :)
Thanks

Oleg Tkachenko | March 18, 2004 5:05 PM | Reply

Well, I'm not sure exactly which comparison the switch statement performs; the CLI spec doesn't say that clearly.
But I believe that doesn't matter, because string comparison also performs an object comparison as the very first check. (Then it checks whether either operand is null, then compares the string lengths, and only then does the char-by-char comparison take place.)

Jiho Han | March 18, 2004 4:26 PM | Reply

I don't understand. Do you think that the switch statement would use object reference comparison rather than a string comparison?

Oleg Tkachenko | March 17, 2004 2:38 PM | Reply

I think that's the same issue. I don't see why it shouldn't work.

Jiho Han | March 16, 2004 5:18 PM | Reply

I may be overlooking something obvious but usually there will be many, many nodes that need processing to require statements like:

if (cust == reader.Name)

inside the while loop in the sample code. That means many, many if..else/else if statements. Not good. A switch statement would be an obvious choice but, unless I'm mistaken, it only works on integral or string expressions. So while the above may work faster due to the object comparison rather than string (char-by-char) comparison, I wonder if the same applies inside a switch statement. And although cust is typed as object, NameTable.Add returns a string object after all. I don't know whether that is relevant, but wouldn't switch do a string comparison rather than a reference comparison? Or maybe it won't run at all.

object cust = reader.NameTable.Add("Customer");
object emp = reader.NameTable.Add("Employee");
object last = reader.NameTable.Add("Lastname");

while(reader.Read())
{
    switch(reader.Name)
    {
        case cust:
            //do customer
            break;
        case emp:
            //do employee
            break;
        case last:
            //do lastname
            break;
        default:
            break;
    }
}

Any ideas?

Daniel Cazzulino | March 11, 2004 5:44 AM | Reply

Really good tip Oleg! Now that I read it, it looks SOO obvious!
