Signs on the Sand: March 7, 2004 - March 13, 2004 Archives

March 10, 2004

random:random-sequence() at large

OK, I've implemented the EXSLT Random module, which consists of a single function, random:random-sequence(), for the EXSLT.NET library. Here is how it looks now:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:random="http://exslt.org/random" exclude-result-prefixes="random">
    <xsl:template match="/">
        10 random numbers: 
        <xsl:for-each select="random:random-sequence(10)">
            <xsl:value-of select="format-number(., '##0.000')"/>
            <xsl:if test="position() != last()">, 
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

The result is

10 random numbers: 
    0.311, 
    0.398, 
    0.698, 
    0.929, 
    0.418, 
    0.523, 
    0.667, 
    0.215, 
    0.915, 
    0.007

The function accepts an optional count of random numbers to generate (1 by default) and an optional seed (DateTime.Now.Ticks by default), and returns a node-set of <random> elements, each containing one generated random number.
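
To make the two optional parameters concrete, here is a minimal C# sketch of how a function with this contract might be written. It is not the EXSLT.NET source: the RandomSequenceSketch class, the <randoms> wrapper element and the way the long seed is folded into an int are my assumptions for illustration only.

```csharp
using System;
using System.Xml;

// Hypothetical illustration only; not the EXSLT.NET implementation.
public static class RandomSequenceSketch
{
    // Builds 'count' <random> elements, each holding one number in [0, 1).
    public static XmlNodeList RandomSequence(int count, long seed)
    {
        // System.Random takes an int seed, so the long is truncated (assumption).
        Random rnd = new Random(unchecked((int)seed));
        XmlDocument doc = new XmlDocument();
        // <randoms> is a made-up wrapper so the elements have a parent.
        XmlElement root = doc.CreateElement("randoms");
        doc.AppendChild(root);
        for (int i = 0; i < count; i++)
        {
            XmlElement item = doc.CreateElement("random");
            item.InnerText = XmlConvert.ToString(rnd.NextDouble());
            root.AppendChild(item);
        }
        return root.ChildNodes;
    }

    // Defaults described above: one number, seeded from the current ticks.
    public static XmlNodeList RandomSequence()
    {
        return RandomSequence(1, DateTime.Now.Ticks);
    }

    public static void Main()
    {
        foreach (XmlNode node in RandomSequence(10, 42))
        {
            Console.WriteLine(node.InnerText);
        }
    }
}
```

Passing the same seed twice yields the same sequence, which is handy for repeatable tests; omitting it gives a different sequence on each run.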

EXSLT.NET team members are encouraged to review my implementation in the project's source repository; if nobody objects, we can release EXSLT.NET version 1.1.

8:39 PM | Comments (0) | TrackBack | #XML in .NET

MSDN XML Dev Center Tagline

Dare is looking for suggestions on what the tagline of the MSDN XML Dev Center (which is about two weeks from being launched) should be. I stink at naming and have almost nothing to suggest. Anyway, here are my document-centric-minded slogans: Marking up the world; The universal data format; The ...

3:17 PM | Comments (0) | TrackBack | #XML

The C# Frequently Asked Questions blog

Interesting new blog at blogs.msdn.com - "C# Frequently Asked Questions", where the C# team posts answers to common C# questions. Subscribed. Why doesn't C# support default parameters? Why doesn't C# support multiple inheritance? Why doesn't C# support #define macros? Ask your question here. ...

2:33 PM | Comments (0) | TrackBack | #Blogging

RE: Workspaces bug tracker changes coming soon

Watch out for some improvements in the Workspaces bug tracker next week (Tuesday 3/16/04). GotDotNet Workspaces are about to be updated. Improvements: better bug search, separating bugs by a custom field (such as build number), customization of bug display, ability to export bug lists to XML, file attachments. Not ...

2:24 PM | Comments (0) | TrackBack | #Software

March 9, 2004

"Influences on the design of XQuery" article

Great article "XQuery from the Experts: Influences on the design of XQuery" by Don Chamberlin. It's an excerpt from a chapter of the book "XQuery from the Experts: A Guide to the W3C XML Query Language". Good reading. Why the relational data model doesn't fit XML, why SQL can't be used to ...

2:42 PM | Comments (3) | TrackBack | #XQuery

Saxon goes commercial

That's a milestone in the life of XSLT technology: Saxon, the most famous Java XSLT processor, goes commercial. Here is what Michael Kay (author of Saxon and XSLT 2.0 editor) writes: In March 2004 I founded Saxonica Limited to provide ongoing development and support of Saxon as a commercial venture. My ...

10:42 AM | Comments (0) | TrackBack | #XSLT

XOR Trick and Declarative Programming

Rick Schaut writes about the stupidity of the XOR trick these days: So, not only is the XOR swap stupid because it's obscure, it's stupid because, with modern optimizing compilers, the eventual result often ends up being contrary to the intended result of using the coding trick in the first place ...

10:22 AM | Comments (0) | TrackBack | #Software

March 7, 2004

Why XmlReader Usage Pattern Ignores NameTable?

Somehow it happened that one of the most commonly used XmlReader usage patterns ignores NameTable. That's really unfortunate! Everybody, including Microsofties, MVPs and of course zillions of users, blindly follows it, carelessly slowing down XmlReader's parsing speed.

I'm talking about the "keep reading till element foo" pattern we are all familiar with:

while (reader.Read()) {
  if (reader.NodeType==XmlNodeType.Element && 
    reader.Name=="foo") {
      ...
    }
}

The reader.Name=="foo" comparison is the crux here. The reader.Name property returns the parsed element name, atomized in the XmlReader's NameTable, while the "foo" string doesn't belong to any NameTable. That means an ordinary string comparison (pointers, then length, then char-by-char) occurs, which is obviously slower. That's not how it was meant to be! The "Object Comparison Using XmlNameTable with XmlReader" article of the .NET Framework Developer's Guide suggests a different usage pattern:

object cust = reader.NameTable.Add("Customer");
while (reader.Read())
{
   // The "if" uses efficient pointer comparison.
   if (cust == reader.Name)   
   {
      ...
   }
}

Both strings compared here belong to the same NameTable, which reduces the comparison to a single cheap pointer comparison!
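
Here is a small self-contained sketch of that idea, in case the atomization part sounds like magic. The NameTableSketch class, its method names and the inline XML are made up for illustration; only NameTable, XmlTextReader and Object.ReferenceEquals are real framework pieces. It shows that adding equal strings to a NameTable yields one shared instance, and that the reader hands back that same instance, so comparing references is sufficient.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative helper; names here are hypothetical, not from any library.
public static class NameTableSketch
{
    // Counts elements named localName, comparing atomized names by reference.
    public static int CountElements(string xml, string localName)
    {
        NameTable nt = new NameTable();
        // Add() returns the single atomized instance stored in the table.
        object key = nt.Add(localName);
        int count = 0;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt);
        try
        {
            while (reader.Read())
            {
                // reader.LocalName is atomized in the same NameTable,
                // so object == object (a reference check) is enough.
                if (reader.NodeType == XmlNodeType.Element &&
                    key == (object)reader.LocalName)
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    // True when two separately built strings with equal content
    // atomize to one and the same object.
    public static bool AtomizesToSameInstance(string text)
    {
        NameTable nt = new NameTable();
        string first = nt.Add(text);
        string second = nt.Add(new string(text.ToCharArray()));
        return Object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        string xml = "<books><book><price>10</price></book>" +
                     "<book><price>12</price></book></books>";
        Console.WriteLine(CountElements(xml, "price"));       // prints 2
        Console.WriteLine(AtomizesToSameInstance("price"));   // prints True
    }
}
```

Note that the key is obtained with Add(), not Get(): Get() returns null when the string has not been added to the table yet, and a null key would never match anything.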

And what do you think Sun does in their XML Processing Performance Java and .NET comparison? The same reader.Name != "LineItemCountValue" stuff! It would be interesting to run their tests with such lines fixed.

According to my rough measurements, this unfortunate usage pattern costs about 1-20% of the parsing time, depending on many factors. Below is my test. I'm parsing the books.xml document, counting "price" elements.

using System;
using System.Xml;
using System.IO;
using System.Runtime.InteropServices;

class Class1 {
  [STAThread]
  static void Main(string[] args) {
    StreamReader sr = new StreamReader("books.xml");
    string xml = sr.ReadToEnd();
    sr.Close();
    int num = 1000;
    PerfTest test = new PerfTest();
    Console.WriteLine("Warming up...");
    TestWithNoNameTable(xml, num);
    TestWithNameTable(xml, num);
    Console.WriteLine("Testing...");
    test.Start();
    TestWithNameTable(xml, num);
    Console.WriteLine("Time with NameTable: {0, 6:f2} ms", test.Stop());
    test.Start();
    TestWithNoNameTable(xml, num);
    Console.WriteLine("Time with no NameTable: {0, 6:f2} ms", test.Stop());
  }

  // Compares each element name with a plain string that is not in the NameTable.
  public static void TestWithNoNameTable(string xml, int num) {
    int counter = 0;
    for (int i=0; i<num; i++) {
      XmlTextReader r = new XmlTextReader(new StringReader(xml));
      while (r.Read()) {
        if (r.NodeType == XmlNodeType.Element && r.Name.Equals("price"))
          counter++;
      }
      r.Close();
    }
  }

  // Shares one NameTable across readers and compares atomized names by reference.
  public static void TestWithNameTable(string xml, int num) {
    int counter = 0;
    XmlNameTable nt = new NameTable();
    // Add() atomizes the name; Get() would return null while the table is still empty.
    string key = nt.Add("price");
    for (int i=0; i<num; i++) {
      XmlTextReader r = new XmlTextReader(new StringReader(xml), nt);
      while (r.Read()) {
        // Casting to object forces a reference comparison instead of String's == operator.
        if (r.NodeType == XmlNodeType.Element && (object)r.Name == (object)key)
          counter++;
      }
      r.Close();
    }
  }
}

// High-resolution timer based on the Win32 performance counter.
public class PerfTest {
  [DllImport("kernel32.dll", EntryPoint = "QueryPerformanceCounter", CharSet = CharSet.Unicode)]
  extern static bool QueryPerformanceCounter(out long perfcount);
  [DllImport("kernel32.dll", EntryPoint = "QueryPerformanceFrequency", CharSet = CharSet.Unicode)]
  extern static bool QueryPerformanceFrequency(out long frequency);

  long startTime;
  long stopTime;

  public void Start() {
    QueryPerformanceCounter(out this.startTime);
  }

  // Returns the elapsed time in milliseconds since Start().
  public float Stop() {
    QueryPerformanceCounter(out this.stopTime);
    long frequency;
    QueryPerformanceFrequency(out frequency);
    float diff = (stopTime - startTime);
    return diff*1000f/(float)frequency;
  }
}

The result on my Win2K box is:

D:\projects\Test\bin\Release>Test.exe
Warming up...
Testing...
Time with NameTable: 1308.86 ms
Time with no NameTable: 1403.60 ms

Benchmarking is really fragile stuff and I'm sure the results will differ drastically, but basically what I wanted to say is that something needs to be done to fix this particular XmlReader usage pattern so that it doesn't ignore the great NameTable idea. I encourage fellow MVPs, XmlInsiders and others not to post XmlReader samples in which NameTable is neglected.

3:49 PM | Comments (6) | TrackBack | #XML in .NET