March 31, 2004

Undeservedly forgotten: IE Tools for Validating XML and Viewing XSLT Output

This tool is undeservedly forgotten, but frequently asked about and usually hard to find (somehow it keeps moving around MSDN, breaking the links). I'm talking about "Internet Explorer Tools for Validating XML and Viewing XSLT Output". IE out of the box doesn't allow you to validate XML, the only way is to ...

Chernobyl chronicles

Ivan posts a link to the "GHOST TOWN" - a story of a real girl riding a motorbike through the closed Chernobyl area, where the nuclear power plant exploded back in 1986. Lots of fantastic photos. Abandoned cities 18 years after the disaster. Deadly amazing and sad story. I've been ...

March 30, 2004

Xerces.NET???

Maybe I missed something, but it looks like Travis Bright is converting the Apache Xerces XML parser to .NET. I wonder what for? Aha, he's PM for the Java Language Conversion Assistant (JLCA). That explains it. Btw, one day I stumbled across CSS parsing in .NET. Java version of the product I've ...

Woohoo! (MSDN XML DevCenter)

Well, of course the breaking news today is all about the recently launched MSDN XML Developer Center. I should admit I've been checking http://msdn.com/xml several times a day for the last few weeks :) At last it's up and it looks just great! Somebody said it's like a blessing for XML. Kinda true. Of course the ...

Mark's article "What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release" looks brilliant. It's a big one and I haven't read it yet, but it should be familiar to me as I just finished reading his "First Look at ADO.NET and System Xml v 2.0" book. So more comments to come. Btw, that book will be "book-of-the-month" in the recently launched .NET book club.

Addition from Dare - the XSD Inference API will be promoted into core System.Xml. Nice. Here we can see how a tool that proved useful makes it into the core API.

Additions from Mark:

Many methods and properties on the XmlReader, XmlWriter and XPathNavigator abstract classes now have default implementations added making it easier to implement your custom versions over datastores.
Wow! At last! That was my top feature request.
XPathNodeIterator implements the IEnumerable interface.
Really nice one. It makes XPathNodeIterator more consistent.

Well done, sincere congratulations to Dare and to all involved, let's make XML Dev Center rock!

PS. Small web design comment - the front page looks bad in Mozilla.

Visual Studio.NET Wallpaper?

Apparently it's possible to set a background image in the Visual Studio .NET text editor via an undocumented API. Interesting exercise. [Via Mike Gunderloy] ...

RE: Do we need SAX for .NET? (or does Java ports to C# make sense?)

Daniel says he's disappointed in the SAX.NET project I was writing about. Unlike lazy me, he downloaded it and inspected the implementation. Well, I mostly agree with him. This piece of direct thoughtless porting of a complex, convoluted Java API to .NET looks weird and kinda unnatural. "namespace System.Xml.Sax {" isn't what I ...

Binary evil strikes back - W3C launches XML Binary Characterization WG

W3C announced the creation of the XML Binary Characterization Working Group. Chartered for a year, the group will analyze and develop use cases and measurements for alternate encodings of XML. Its goal is to determine if serialized binary XML transmission and formats are feasible. The WG has been created as ...

EXSLT.NET and ASP.NET security woes

A long and convoluted discussion about security problems of using EXSLT.NET in ASP.NET took place on the EXSLT.NET message board. Here I'd like to give a short summary. ...

A guy said he's getting a SecurityException whenever he tries to use EXSLT.NET in ASP.NET, while it works from the command line with nxslt.exe. We use reflection in EXSLT.NET code (e.g. to return a nodeset from the extension functions using this recipe) and as Dare noted in his "EXSLT: Enhancing the Power of XSLT" article, that imposes Full Trust security requirements. That's why at first I decided it was EXSLT.NET's fault and that I should find out how to get rid of reflection in the code. That was too hasty; happily, I then realized that many people, including me, do use EXSLT.NET in ASP.NET and it works fine.

Subsequent investigation shed some light. Using reflection, and even the heavy sin of instantiating an internal class, in an XSLT extension object in fact doesn't require Full Trust, but only ReflectionPermission with appropriate flags. Moreover, that just doesn't matter, because using any extension object in XSLT requires real Full Trust due to improved Code Access Security in .NET 1.1. So no matter what we do in EXSLT.NET code, your code (including your XSLT stylesheet) must be fully trusted to be able to use any XSLT extensions. That undoubtedly makes sense; otherwise some cool XSLT stylesheet could inspect your hard disk while styling an RSS feed.

The bottom line - to avoid SecurityException when using EXSLT.NET functions in ASP.NET, load your XSLT stylesheet from a fully trusted place, such as the local file system (read more about Code Access Security). If you want to load a stylesheet from a remote web site, you need to make that site fully trusted using either the caspol.exe utility or the .NET Framework Configuration tool (Mscorcfg.msc).

March 29, 2004

New Google's Skin

Looks like Google got a new site skin. I like it. Lightweight and clean. ...

March 28, 2004

Interesting BizTalk webcast will take place April 8th - enroll now

This webcast is going to be a really interesting one: MSDN Webcast: Real-World BizTalk Server 2004 Editing and Mapping Techniques - Level 200 This session is a deep dive on the BizTalk Server Editor and Mapper. Learn how to model flat-files and EDI-files. Learn how to deal with complex mapping scenarios ...

MSN chases Google, now with MSN toolbar

Maybe I missed the train, but look what I discovered in the recent "Microsoft This Week" newsletter: MSN toolbar. It looks exactly like the Google toolbar; moreover, what's funny, the http://toolbar.msn.com and http://toolbar.google.com pages are just twins! After all that's a good move. I hope the competition is going to be ...

March 25, 2004

Visual Studio 2005 Community Technology Preview is here

Visual Studio 2005 Community Technology Preview March 2004 - Full DVD available for MSDN subscribers! ...

First year in Blogland

Arrrgh, I missed that day - on 20 March my blog passed the one-year mark. Here is what I wrote a year ago: Well, blogging is really infectious disease and finally I got the infection. I have installed Movabletype engine on my site quite easily (c'mon, it's cgi based) and here ...

March 22, 2004

SAX for .NET?

Hey, SAX for .NET topic is becoming hot. I was aware of one implementation (to be unveiled really soon), being developed by my fellow MVP/XmlInsider, but apparently there is another one, by Karl Waclawek. Here is what he writes in xml-dev mail list: The SAX dot NET project on SourceForge ...

March 21, 2004

Should I use elements or attributes?

Here is a definitive answer: "Beginners always ask this question. Those with a little experience express their opinions passionately. Experts tell you there is no right answer." - Mike Kay ...

March 17, 2004

Transforming WordML to HTML: Support for Images

Update: this post is outdated, see "WordML2HTML with support for images stylesheet updated" for updates. Here is a new version of WordML2HTML XSLT stylesheet, developed by Microsoft for Word 2003 Beta2 and adapted by me to Word 2003 RTM. I called this version "1.1-.NET-script". Here is why. Along with some ...

Word 2003 allows you to save a document as a "Single File Web Page" (*.mht aka Web Archive file) or as a usual HTML document. In the latter case all images embedded in the Word document are saved into the documentName_files directory. Here is the challenge - how to implement that with XSLT. Obviously we need an extension function to decode the (Base64) embedded image and write it to a directory. That's not a big deal, but the problem is that it makes the XSLT stylesheet non-portable, which is why I need different versions for .NET, MSXML etc. Second problem - a WordML document doesn't store its file name, so the question is how to name the directory where the decoded images are saved. I introduced a global stylesheet parameter called docName to address the issue. If the docName parameter isn't provided, it defaults to the first 10 characters of the document title, with spaces removed.
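The directory naming rule is small but easy to get subtly wrong, so here is a minimal C# sketch of it. It only mirrors what the stylesheet does in XSLT; ImageDirSketch and DirectoryFor are hypothetical names used for illustration and are not part of the stylesheet or of nxslt.exe.

```csharp
using System;

public static class ImageDirSketch
{
    // Hypothetical helper: returns the name of the directory for decoded images.
    public static string DirectoryFor(string docName, string title)
    {
        string stem;
        if (!string.IsNullOrEmpty(docName))
        {
            // An explicit docName parameter always wins.
            stem = docName;
        }
        else
        {
            // Fall back to the first 10 characters of the title, spaces removed.
            string head = title.Length > 10 ? title.Substring(0, 10) : title;
            stem = head.Replace(" ", "");
        }
        return stem + "_files";
    }
}
```

With docName=test this gives test_files, the directory name used in the run shown further down.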

To run the transformation I used the nxslt.exe command line utility. Download it for free if you don't have it yet.

So I created a test Word 2003 document with a couple of images:
[Image: Test Word document with a couple of images]
Saved it as XML to the test.xml file and ran the transformation to HTML with the following command line:

nxslt.exe test.xml d:\xsl\Word2HTML-.NET-script.xsl -o test.html docName=test
As a result, the XSLT transformation created the test.html document and the test_files directory containing two decoded images. Here is how it looks in a browser:
[Image: Word document transformed into HTML.]

The implementation is a very simple one. Here it is:

<msxsl:script language="c#" implements-prefix="ext">
  // Decodes the Base64 image held in the first node of bindata, writes it to
  // dirname/filename (creating the directory if needed) and returns the
  // relative path to be used in the img src attribute.
  public string decodePicture(XPathNodeIterator bindata, string dirname, string filename) {
    if (bindata.MoveNext()) {
      System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(dirname);
      if (!di.Exists)
        di.Create();
      using (System.IO.FileStream fs = 
        System.IO.File.Create(System.IO.Path.Combine(di.FullName, filename))) {
        byte[] data = Convert.FromBase64String(bindata.Current.Value);
        fs.Write(data, 0, data.Length);
      }
      return dirname + "/" + filename;
    }
    else 
      return "";
  }
</msxsl:script>
<xsl:template match="w:pict">
  <xsl:variable name="dir">
    <xsl:choose>
      <xsl:when test="$docName != ''">
        <xsl:value-of select="$docName"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- We need something unique instead of document name -->
        <!-- Let's take first 10 characters of title -->
        <xsl:value-of select="translate(substring($p.docInfo/o:Title, 1, 10), ' ', '')"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>_files</xsl:text>		
  </xsl:variable>
  <img 
  src="{ext:decodePicture(w:binData, $dir, substring-after(w:binData/@w:name, 'wordml://'))}" 
  alt="{v:shape/v:imagedata/@o:title}" style="{v:shape/@style}" 
  title="{v:shape/v:imagedata/@o:title}"/>
</xsl:template>
Not rocket science indeed. Yes, Sal, WMZ images are not supported, I have no idea how to convert them to GIF.

Download the stylesheet here and give it a shot. Again - this stylesheet requires the .NET XSLT engine. Any comments/requests/bug reports are welcome.

March 16, 2004

XML Bestiary: IndexingXPathNavigator

Here I go again with another experimental creature from my XML Bestiary: IndexingXPathNavigator. This one was inspired by Mark Fussell's "Indexing XML, XML ids and a better GetElementByID method on the XmlDocument class". I've been experimenting with Mark's extended XmlDocument, played a bit with XPathDocument and "idkey()" extension function Mark ...

Intro

IndexingXPathNavigator is a wrapper around any XPathNavigator, augmenting it with indexing functionality. Such an architecture allows seamless indexing of any IXPathNavigable XML store, such as XPathDocument, XmlDocument or XmlDataDocument. The overall semantics should be very familiar to any developer acquainted with XSLT, as IndexingXPathNavigator behavior literally follows the XSLT 1.0 specification.

The main scenario IndexingXPathNavigator is meant for is uniform repetitive XPath selections from a loaded in-memory XML document, such as selecting orders by orderID from an XmlDocument. Using IndexingXPathNavigator with preindexed selections drastically decreases selection time: each key lookup becomes a hash lookup (roughly constant time) instead of an O(n) scan of the document.

How it works

First one has to declare keys, according to which the indexing occurs. This is done using the
IndexingXPathNavigator.AddKey(string keyname, string match, string use) method, where
keyname is the name of the key,
match is an XPath pattern defining the nodes to which this key is applicable, and
use is an XPath expression used to determine the value of the key for each matching node.
E.g. to index an XML document by "id" attribute value, one can declare a key named "idKey" as follows: AddKey("idKey", "*", "@id"). This method works identically to XSLT's <xsl:key> instruction. Essentially a key can be thought of as a collection of node-value pairs. During indexing IndexingXPathNavigator builds a Hashtable for each named key, which contains nodes and associated values and allows retrieving nodes by key value. You may define as many keys as you need, including keys with the same name.

After all keys are declared, IndexingXPathNavigator is ready for indexing. Indexing is performed as follows - each node in the XML document is matched against all key definitions. For each matching node, the key value is calculated and the node-value pair is added to the appropriate Hashtable. As can be seen, indexing is not a cheap operation: it involves walking through the whole XML tree, multiple node matching and XPath expression evaluation. That's the usual price of indexing. Indexing can be done in either a lazy (at first access) or an eager (before any selections) manner.

After indexing, IndexingXPathNavigator is ready for node retrieval. IndexingXPathNavigator augments XPath with the standard XSLT key(string keyname, object keyValue) function, which retrieves nodes directly from the built indexes (Hashtables) by key value. The function is implemented as per the XSLT spec.
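To make the indexing idea more tangible, here is a minimal standalone sketch of building one key index over an XPathNavigator. It is not the IndexingXPathNavigator source: KeyIndexSketch and Build are hypothetical names, a generic Dictionary stands in for the Hashtable, the select argument is an ordinary XPath expression evaluated from the root rather than a true XSLT match pattern, and the use expression is assumed to return a node-set.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.XPath;

public static class KeyIndexSketch
{
    // Builds a map from key value to the nodes carrying that value.
    public static Dictionary<string, List<XPathNavigator>> Build(
        XPathNavigator root, string select, string use)
    {
        Dictionary<string, List<XPathNavigator>> index =
            new Dictionary<string, List<XPathNavigator>>();
        XPathNodeIterator nodes = root.Select(select);
        while (nodes.MoveNext())
        {
            // Clone, because the iterator reuses its Current navigator.
            XPathNavigator node = nodes.Current.Clone();
            // As in xsl:key, every node returned by 'use' contributes a key value.
            XPathNodeIterator values = node.Select(use);
            while (values.MoveNext())
            {
                string key = values.Current.Value;
                List<XPathNavigator> bucket;
                if (!index.TryGetValue(key, out bucket))
                {
                    bucket = new List<XPathNavigator>();
                    index[key] = bucket;
                }
                bucket.Add(node);
            }
        }
        return index;
    }
}
```

Once built, a lookup such as index["42"] costs one hash probe instead of a walk over the tree, which is where the speedup comes from. The price is the full pass over the document made by Build.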

Usage sample

Here is a common usage pattern, which I also used to test the performance of IndexingXPathNavigator. There is an XML document, northwind.xml, which is an XML version of the Northwind database. It contains lots of orders, each one consisting of an order ID, order date and shipping address:
<Item>
  <OrderID> 10952</OrderID>
  <OrderDate> 4/15/96</OrderDate>
  <ShipAddress> Obere Str. 57</ShipAddress>
</Item>
The aim is to select the shipping address for an order by order ID. Here is how it can be implemented with IndexingXPathNavigator:
XPathDocument doc = new XPathDocument("test/northwind.xml");
IndexingXPathNavigator inav = new IndexingXPathNavigator(
  doc.CreateNavigator());
//Declare a key named "orderKey", which matches Item elements and 
//whose key value is value of child OrderID element
inav.AddKey("orderKey", "OrderIDs/Item", "OrderID"); 
//Indexing
inav.BuildIndexes();
//Selection
XPathNodeIterator ni = inav.Select("key('orderKey', ' 10330')/ShipAddress");
while (ni.MoveNext())
  Console.WriteLine(ni.Current.Value);

Performance test

Here are the results of my testing. northwind.xml is 240 KB and contains 830 orders. I'm searching for the shipping address of the order whose OrderID is 10330 (it's almost in the middle of the XML document) using a regular XPathNavigator and IndexingXPathNavigator (code above, full code in the download). My PC is a 533 MHz/256 MB RAM Win2K box; here are the results:
Loading XML document: 167.12 ms
Regular selection, warming...
Regular selection: 1000 times, total time 5371.79 ms, 1000 nodes selected
Regular selection, testing...
Regular selection: 1000 times, total time 5181.80 ms, 1000 nodes selected
Building IndexingXPathNavigator:   1.03 ms
Adding keys:   5.16 ms
Indexing:  58.21 ms
Indexed selection, warming...
Indexed selection: 1000 times, total time 515.90 ms, 1000 nodes selected
Indexed selection, testing...
Indexed selection: 1000 times, total time 476.06 ms, 1000 nodes selected
As can be seen, the average selection time for regular XPath selection is 5.182 ms, while for indexed selection it's 0.476 ms. One order of magnitude faster! Note additionally that the XML document is very simple and regular and I used the /ROOT/CustomerIDs/OrderIDs/Item[OrderID=' 10330']/ShipAddress XPath for regular selection, which is an almost linear search and is probably the most efficient from the XPath point of view. With a more complex XML structure and XPath expressions such as //Item[OrderID=' 10330']/ShipAddress the difference would be even more striking.
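Since indexing has an up-front cost, it is worth asking how many selections it takes to win that cost back. The following is a small arithmetic sketch using only the timings printed above. BreakEvenSketch and its method are hypothetical names, and the result applies to this one machine and document only.

```csharp
using System;

public static class BreakEvenSketch
{
    // Smallest number of selections after which the one-off setup cost
    // (building the navigator, adding keys, indexing) has paid for itself.
    public static int BreakEven(double setupMs, double regularMs, double indexedMs)
    {
        double savedPerSelection = regularMs - indexedMs;
        if (savedPerSelection <= 0)
            throw new ArgumentException("indexed selection must be faster than regular selection");
        return (int)Math.Ceiling(setupMs / savedPerSelection);
    }
}
```

Plugging in the numbers from the run above, setup is 1.03 + 5.16 + 58.21 = 64.40 ms and each indexed selection saves about 5.182 - 0.476 = 4.706 ms, so BreakEven(64.40, 5.1818, 0.47606) returns 14. Past roughly 14 selections on this box the indexed approach comes out ahead; for a handful of one-off queries a regular selection is cheaper.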

Download

I wanted to make IndexingXPathNavigator a subproject of the MVP XML library, but failed. I forgot I can't work with CVS over SSH from my work due to a closed SSH port. Grrrmm. Ok, after all it's an experimental implementation. Let's see if there will be any substantial feedback first.

Full source code along with perf testing. As usual two download locations available: local one and from GotDotNet (will update later). IndexingXPathNavigator homepage is http://www.tkachenko.com/dotnet/IndexingXPathNavigator.html.


I really like this one. Probably that's what my next article is going to be about. What do you think? I'm cap in hand waiting for comments.

MovableType 3.0 on the horizon

Here is what MovableType blogging engine team writes: We're taking our first steps towards the release of Movable Type 3.0. The pre-beta version has just finished its initial two rounds of alpha testing and we're now opening the testing to a larger audience ... What's new includes: "significant change to ...

RE: Changes in the Workspaces releases area

Hey, good news about GotDotNet Workspaces again! Changes on the releases section scheduled for tomorrow include: per-release download count (AT LAST!!!), no more zero-byte/corrupt downloads (I hope), no more Passport sign-in for downloads (great), off-site hosting of releases (cool). Really sweet. [Via Andy Oakley] ...

March 14, 2004

ASP.NET XML syntax?

XAML, Windows Forms Markup Language (WFML), Report Definition Language (RDL), Relational Schema Definition (RSD) and Mapping Schema Definition (MSD). You get the idea what the trend is nowadays. It's all XML. What I'm wondering though - why is there still no alternative XML-based ASP.NET syntax? How long ASP.NET will stay ...

RE: Opera As An RSS Reader

Hey, apparently the recent Opera browser beta has an RSS reader embedded. Here are some screenshots - here and here. I like that trend. [Via 10x More Productive Blog] ...

March 10, 2004

random:random-sequence() at large

Ok, I've implemented the EXSLT Random module, which consists of a single function, random:random-sequence(), for the EXSLT.NET library. Here is how it looks now: ...

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:random="http://exslt.org/random" exclude-result-prefixes="random">
    <xsl:template match="/">
    	10 random numbers: 
    		<xsl:for-each select="random:random-sequence(10)">
    		<xsl:value-of select="format-number(., '##0.000')"/>
    		<xsl:if test="position() != last()">, 
    		</xsl:if>
    	</xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
The result is
10 random numbers: 
    0.311, 
    0.398, 
    0.698, 
    0.929, 
    0.418, 
    0.523, 
    0.667, 
    0.215, 
    0.915, 
    0.007
The function accepts an optional number of random numbers to generate (1 by default) and an optional seed (DateTime.Now.Ticks by default) and returns a nodeset of <random> elements, each one containing a generated random number.
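For readers who prefer code to prose, the core of such a function can be sketched in a few lines of C#. This is an illustration of the described behavior, not the EXSLT.NET source: RandomSequenceSketch and Generate are hypothetical names, and the sketch returns a plain array of doubles instead of a nodeset of <random> elements.

```csharp
using System;

public static class RandomSequenceSketch
{
    // count: how many numbers to generate (1 by default).
    // seed: null means "seed from the clock", as described above.
    public static double[] Generate(int count = 1, int? seed = null)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");
        // System.Random takes an int seed, so the long tick count is truncated.
        int actualSeed = seed ?? unchecked((int)DateTime.Now.Ticks);
        Random rng = new Random(actualSeed);
        double[] result = new double[count];
        for (int i = 0; i < count; i++)
            result[i] = rng.NextDouble(); // uniform, 0 inclusive to 1 exclusive
        return result;
    }
}
```

Passing an explicit seed makes the sequence repeatable, which is handy for tests; leaving it out gives a different sequence on each run.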

EXSLT.NET team members are encouraged to review my implementation in the project's source repository and if nobody objects we can release EXSLT.NET version 1.1.

MSDN XML Dev Center Tagline

Dare is looking for suggestions on what the tagline of the MSDN XML Dev Center (which is about two weeks from being launched) should be. I stink at naming and have almost nothing to suggest. Anyway, here are my document-centric-minded slogans: Marking up the world The universal data format The ...

The C# Frequently Asked Questions blog

Interesting new blog at blogs.msdn.com - "C# Frequently Asked Questions", where the C# team posts answers to common C# questions. Subscribed. Why doesn't C# support default parameters? Why doesn't C# support multiple inheritance? Why doesn't C# support #define macros? Ask your question here. ...

RE: Workspaces bug tracker changes coming soon

Watch out for some improvements in the Workspaces bug tracker next week (Tuesday 3/16/04). GotDotNet Workspaces are about to be updated. Improvements: better bug search, separating bugs by a custom field (such as build number), customization of bug display, ability to export bug lists to XML, file attachments. Not ...

March 9, 2004

"Influences on the design of XQuery" article

Great article "XQuery from the Experts: Influences on the design of XQuery" by Don Chamberlin. It's an excerpt from a chapter of "XQuery from the Experts: A Guide to the W3C XML Query Language" book. Good reading. Why relational data model doesn't fit XML, why SQL can't be used to ...

Saxon goes commercial

That's a milestone in XSLT technology life - the most famous Java XSLT processor Saxon goes commercial. Here is what Michael Kay (author of Saxon and XSLT 2.0 editor) writes: In March 2004 I founded Saxonica Limited to provide ongoing development and support of Saxon as a commercial venture. My ...

XOR Trick and Declarative Programming

Rick Schaut writes about stupidity of the XOR trick these days: So, not only is the XOR swap stupid because it's obscure, it's stupid because, with modern optimizing compilers, the eventual result often ends up being contrary to the intended result of using the coding trick in the first place ...

March 7, 2004

Why XmlReader Usage Pattern Ignores NameTable?

Somehow it happened that one of the most commonly used XmlReader usage patterns ignores NameTable. That's really unfortunate! Everybody, including Microsofties, MVPs and of course zillions of users blindly follow it, carelessly slowing down XmlReader's parsing speed. ...

I'm talking about the "keep reading till element foo" pattern we are all familiar with:

while (reader.Read()) {
  if (reader.NodeType==XmlNodeType.Element && 
    reader.Name=="foo") {
      ...
    }
}
The reader.Name=="foo" comparison is the crux here. The reader.Name property returns the parsed element name, which is atomized in the XmlReader's NameTable, while the "foo" string doesn't belong to any NameTable. That means a usual string comparison (pointers/length/char-by-char) occurs, which is obviously slow. That's not how it was meant to be! The "Object Comparison Using XmlNameTable with XmlReader" article of the .NET Framework Developer's Guide suggests a different usage pattern:
object cust = reader.NameTable.Add("Customer");
while (reader.Read())
{
   // The "if" uses efficient pointer comparison.
   if (cust == reader.Name)   
   {
      ...
   }
}
Both strings compared here belong to the same NameTable, thus taking the comparison down to a single cheap pointer comparison!

And what do you think Sun does in their "XML Processing Performance Java and .NET" comparison? The same reader.Name != "LineItemCountValue" stuff! It would be interesting to run their tests with such lines fixed.

According to my rough measurements this unfortunate usage pattern costs about 1-20% of the parsing time depending on many factors. Below is my testing. I'm parsing the books.xml document, counting "price" elements.
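The test harness itself isn't reproduced on this page. As a rough idea of what the NameTable-aware side of such a counting loop can look like, here is a minimal sketch. It is an illustration and not the original test code: NameTableSketch and CountElements are hypothetical names, and it uses XmlReader.Create, which assumes .NET 2.0 or later.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NameTableSketch
{
    // Counts elements with the given local name using reference comparison
    // against a string atomized in the reader's own NameTable.
    public static int CountElements(TextReader xml, string localName)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            // Add returns the atomized instance the reader will hand back
            // whenever it parses this name.
            object atom = reader.NameTable.Add(localName);
            int count = 0;
            while (reader.Read())
            {
                // The cast to object forces a reference (pointer) comparison
                // instead of a char-by-char string comparison.
                if (reader.NodeType == XmlNodeType.Element &&
                    (object)reader.LocalName == atom)
                    count++;
            }
            return count;
        }
    }
}
```

The important detail is that the name is added to the same NameTable the reader uses. A string atomized in some other NameTable would never compare equal by reference.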

The result on my Win2K box is:

D:\projects\Test\bin\Release>Test.exe
Warming up...
Testing...
Time with NameTable: 1308.86 ms
Time with no NameTable: 1403.60 ms
Benchmarking is really fragile stuff and I'm sure the results will differ drastically, but basically what I wanted to say is that something needs to be done to fix this particular usage pattern of XmlReader so that it doesn't ignore the great NameTable idea. I encourage fellow MVPs, XmlInsiders and others not to post XmlReader samples where NameTable is neglected.

March 4, 2004

On XML Catalogs

XML.com has published a good article, "Using XML Catalogs with JAXP". XML Catalogs are successors of SGML Catalogs; in simple words, it's a system for defining how resource identifiers (URIs or Public Identifiers) are resolved in XML. If you are .NET minded - it's about having an XML document (called a catalog), where ...

Very useful. I got hooked on catalogs when working with big DTDs such as DocBook. Validating against a huge DTD or schema while loading it from the Web isn't a good idea, and here catalogs are a feature one cannot work without. It's a shame there is still no .NET implementation. Basically that's been on my to-do list for almost a year, still close to the bottom :( There was some showstopper related to PUBLIC identifiers, but I don't remember exactly what the problem was. It's still tempting to implement it. Probably that's going to be my next pet project after I finish the EXSLT article I'm writing. Anybody interested in participating?

March 3, 2004

Visual Studio .NET Shortcut Keys

I'm sure many of you know this page, but for the rest - here is a useful link to the default Visual Studio .NET shortcut keys. I like this stuff. My favorite one is CTRL + TAB to navigate between open files. [Via Jason Mauss] ...

March 2, 2004

EXSLT Random module for EXSLT.NET

I'm going to implement the EXSLT Random module for the EXSLT.NET lib. It contains a single extension function: number+ random:random-sequence(number?, number?) The function returns a sequence of random numbers between 0 and 1 (as text nodes obviously). The first argument is the number of random numbers to generate (1 by default) and the ...

RE: Announcing: BizTalk Server 2004 Developer Competition!!!!

Hey, look at what Scott Woodgate writes: Let the first ever BizTalk Server Developer Competition commence. We are giving away cash prizes totalling $25,000 USD including a huge $15,000 USD first prize. The purpose of the BizTalk Server 2004 developer competition is to highlight and reward programming excellence using BizTalk ...

XInclude Tough Destiny

Dare writes: We were planning to add support for xml:base to the core XML parser as part of implementing XInclude but given that it recently went from being a W3C candidate recommendation to going back to being a W3C working draft (partly due to a number of the architectural ...