March 30, 2005

wbloggar site hacked?

I'm setting up a new notebook (an HP Pavilion ze2070ea, a really, really nice one) and just realized I can't download one of the most important pieces of software I just can't live without: w.bloggar, a blogging client. Its site seems to have been hacked and defaced. When you go to ...

Generating Word documents using Java

Dino Chiesa of Microsoft shows how to dynamically generate WordML documents using Java and XSLT. Yep, that's not a typo: Microsoft, WordML and Java. XML serves as the peacemaker again. And he even provides a working JSP demo. Cool. ...

Baby photos

Here are the first photos of our little Catherine: Catherine's gallery. I think I'm starting to feel something unusual when I look at her. That's really amazing. ...

Happy resource wasters

The day can't go well when it starts like this. That's really sad to see. The guy, who "used exceptions quite extensively to pass messages from the database all the way to the client", tested (no, "tested") the cost of throwing exceptions in .NET on his desktop using this "test": Sub ...

March 27, 2005

MSDN Webcast: Making the Most of XQuery with SQL Server 2005 (Level 300) - 4/26/2005

Michael Rys will be presenting the MSDN Webcast "Making the Most of XQuery with SQL Server 2005 (Level 300)" on 4/26/2005. This session provides an introduction to XQuery and the data modification language as implemented in SQL Server 2005, and shows you how to get the most from XQuery. Learn how ...

March 22, 2005

RSS Bandit v1.3.0.26 is here

Dare writes: This is the final release of the version formerly codenamed "Wolverine". This is the most significant release to date and has a ton of cool features. Enjoy. Get it here; this version rocks. I did the Russian translation again, so all typos and lame Russian words are my ...

The daughter!

On Saturday, March 19, at 11:55AM the following transformation occurred:
<xsl:template match="family[@surname='Tkachenko' and husband/@fname='Oleg' and wife/@fname='Alenka']">
  <xsl:copy>
    <xsl:copy-of select="@* | husband | wife"/>
    <child sex="female" birth-date="2005-03-19"/>
  </xsl:copy>
</xsl:template>
For those who can't read XSLT, here's the English translation: last Saturday at 11:55AM my dear wife gave birth to our lovely, wonderful ...

March 15, 2005

MSDN Webcast on managing XML in SQL Server 2005 by Michael Rys

Michael Rys will be presenting an MSDN webcast, "Managing XML Data on the Database with SQL Server 2005 and Visual Studio 2005 (Level 300)", on April 5, 2005. This session explores advanced concepts and techniques for working with XML data types using Microsoft SQL Server 2005 and Visual Studio ...

March 14, 2005

"XML Hacks" book review

"XML Hacks" by Michael Fitzgerald is not a newly published book (July 2004); it's just that sometimes, when your reading queue is implemented as a priority queue, you read not what you'd like to but what you have to. My overall final rating is . That's the first book ...

March 12, 2005

MSDN upcoming webcasts RSS feed

For those who, like me, missed it: you can finally stay informed about upcoming MSDN webcasts in the most natural way, via an RSS feed! ...

VB6 is dead? Amen

http://classicvb.org/petition: 1378 developers, including 203 MVPs, signed a petition to bring VB6 into Visual Studio .NET. Sounds crazy, huh? While I understand the pain of backward-compatibility issues, I'd rather sign a petition against it. ...

March 10, 2005

How to speed up Muenchian grouping in .NET

The Muenchian grouping technique is the de facto standard way of grouping in XSLT 1.0. It uses keys and is usually very fast, efficient and scalable. There used to be some problems with Muenchian grouping in .NET, though; in particular, its speed was in question. To put it another way ...

Muenchian grouping includes a step that selects unique nodes, i.e. the first node of each group. Usually this is done with the generate-id() or count() function. There is another way to select nodes with unique values, though: EXSLT's set:distinct() function, supported by EXSLT.NET. So I measured the performance and scalability of all three methods.
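To make that selection step concrete, here is a minimal Python sketch (a model of the idea, not how System.Xml or EXSLT.NET actually implement anything). It treats an xsl:key as a dictionary from a value to its nodes in document order, then picks the first node of each group in the three ways described above: an identity test (the generate-id() idiom), a one-member union test (the count() idiom) and a first-seen-value scan (what set:distinct() returns). The sample data and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample shaped like the Northwind <orders> elements;
# the IDs, countries and cities are made up for illustration.
SAMPLE = """<root>
  <orders OrderID="1" ShipCountry="Germany" ShipCity="Aachen"/>
  <orders OrderID="2" ShipCountry="France" ShipCity="Lyon"/>
  <orders OrderID="3" ShipCountry="Germany" ShipCity="Berlin"/>
  <orders OrderID="4" ShipCountry="Brazil" ShipCity="Campinas"/>
  <orders OrderID="5" ShipCountry="France" ShipCity="Reims"/>
</root>"""


def build_key(nodes, attr):
    """Model of <xsl:key>: value -> nodes with that value, in document order."""
    index = defaultdict(list)
    for node in nodes:
        index[node.get(attr)].append(node)
    return index


def first_by_identity(nodes, key, attr):
    # generate-id() = generate-id(key(...)[1]): is this node its group's first?
    return [n for n in nodes if n is key[n.get(attr)][0]]


def first_by_union_count(nodes, key, attr):
    # count(. | key(...)[1]) = 1: the union of the node and its group's first
    # node has exactly one member only when both are the same node.
    return [n for n in nodes if len({id(n), id(key[n.get(attr)][0])}) == 1]


def first_by_distinct(nodes, attr):
    # set:distinct(): the first node (in document order) for each distinct value.
    seen, result = set(), []
    for n in nodes:
        if n.get(attr) not in seen:
            seen.add(n.get(attr))
            result.append(n)
    return result


def group(nodes, attr="ShipCountry"):
    key = build_key(nodes, attr)
    leaders = first_by_identity(nodes, key, attr)
    # All three selection strategies must pick the same group leaders.
    assert leaders == first_by_union_count(nodes, key, attr)
    assert leaders == first_by_distinct(nodes, attr)
    # Outer loop over group leaders, inner loop over the key lookup.
    return [(leader.get(attr), [o.get("OrderID") for o in key[leader.get(attr)]])
            for leader in leaders]


orders = ET.fromstring(SAMPLE).findall("orders")
for country, ids in group(orders):
    print(country, ids)
```

Running it prints Germany ['1', '3'], France ['2', '5'] and Brazil ['4'], one per line. The point of the key is that each group lookup is a hash-table hit rather than a scan over preceding siblings. How well a particular XSLT processor realizes that in practice is exactly what the measurements below probe.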

The source XML is an XML dump of the Orders table from the Northwind sample database, containing 415 orders:

<root>
  <orders OrderID="10249" CustomerID="TOMSP" EmployeeID="6" 
  OrderDate="1996-07-05T00:00:00" RequiredDate="1996-08-16T00:00:00" 
  ShippedDate="1996-07-10T00:00:00" ShipVia="1" Freight="11.61" 
  ShipName="Toms Spezialitäten" ShipAddress="Luisenstr. 48" ShipCity="Münster"
  ShipPostalCode="44087" ShipCountry="Germany" />
  <!-- 414 more orders -->
</root>
To expose scalability issues, I created bigger documents by repeatedly doubling the number of orders (while keeping OrderID values unique), which gave me documents with 415, 830, 1660, 3320, 6640 and 13280 orders (from 135 KB to 4.5 MB). The task is to group orders by their ShipCountry value. Here is the first stylesheet (classic Muenchian grouping with generate-id()):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:key name="countryKey" match="orders" use="@ShipCountry"/>
  <xsl:template match="root">
    <table border="1">
      <tr>
        <th>Order ID</th>
        <th>Ship City</th>
      </tr>
      <xsl:for-each select="
      orders[generate-id()=generate-id(key('countryKey', @ShipCountry)[1])]">
        <tr>
          <th colspan="2">
            <xsl:value-of select="@ShipCountry"/>
          </th>
        </tr>
        <xsl:for-each select="key('countryKey',@ShipCountry)">
          <tr>
            <td>
              <xsl:value-of select="@OrderID"/>
            </td>
            <td>
              <xsl:value-of select="@ShipCity"/>
            </td>
          </tr>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
Pretty trivial. The second version uses the count() function instead of generate-id(); here is the relevant part:
      <xsl:for-each select="
      orders[count(.| key('countryKey', @ShipCountry)[1]) = 1]">
And the third version uses the set:distinct() function:
      <xsl:for-each select="set:distinct(orders/@ShipCountry)/.." xmlns:set="http://exslt.org/sets">
Here are the results I got running all three stylesheets against the six XML documents above on my ancient Dell workstation (P3 600MHz) using nxslt.exe:
Transformation time (ms) by XML document size (number of orders to group):

Grouping technique                             415       830      1660      3320      6640     13280
Muenchian grouping (with generate-id())    151.722   407.619  1318.676  5290.962  27773.98  130860.1
Muenchian grouping (with count())           97.238   190.086   462.075  1401.199  4193.143  14015.86
Muenchian grouping (with set:distinct())    94.499   155.035   276.465   687.494  1104.554  2503.871


The graph view works better:
[Figure: Testing results]

As can be seen, in .NET 1.1 Muenchian grouping using generate-id() is not only the slowest but also shows the worst scalability. The likely reason is a poor implementation of the generate-id() function. The count() function performs much better, but still shows some scalability issues. And finally, Muenchian grouping using the set:distinct() function is the winner here, both in speed and in scalability. Sublinear running time (32 times more orders took only about 26.5 times longer), amazing. Kudos to Dimitre Novatchev for optimizing the set:distinct() implementation in EXSLT.NET.
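To put numbers on "scalability", here is a small Python sketch that takes the timings from the results table above and computes, for each technique, the time growth factor per doubling of the input and a crude two-point log-log slope k, assuming time grows roughly like size^k. The helper names are my own, and the slope uses only the smallest and largest documents, so treat it as a rough indicator rather than a fitted model.

```python
import math

# Transformation times in ms, copied from the results table above.
SIZES = [415, 830, 1660, 3320, 6640, 13280]
TIMES = {
    "generate-id()": [151.722, 407.619, 1318.676, 5290.962, 27773.98, 130860.1],
    "count()": [97.238, 190.086, 462.075, 1401.199, 4193.143, 14015.86],
    "set:distinct()": [94.499, 155.035, 276.465, 687.494, 1104.554, 2503.871],
}


def doubling_ratios(times):
    """Growth factor of the running time for each doubling of the input."""
    return [b / a for a, b in zip(times, times[1:])]


def loglog_slope(sizes, times):
    """Two-point estimate of k in time ~ size**k (first vs. last document).
    k < 1 is sublinear, k = 1 linear, k = 2 quadratic."""
    return math.log(times[-1] / times[0]) / math.log(sizes[-1] / sizes[0])


for name, times in TIMES.items():
    ratios = ", ".join(f"{r:.2f}" for r in doubling_ratios(times))
    print(f"{name:15} slope={loglog_slope(SIZES, times):.2f}  ratios: {ratios}")
```

With these numbers the slope comes out at roughly 1.95 for generate-id() (close to quadratic), about 1.43 for count() and about 0.95 for set:distinct(). The per-doubling ratios show the set:distinct() curve is not perfectly smooth (one step grows by about 2.5x), but over the whole range it stays just under linear.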

The bottom line: if you are looking for ways to speed up XSLT grouping under .NET 1.x, use Muenchian grouping with the set:distinct() function from EXSLT.NET to get the best performance and scalability. If EXSLT.NET is not an option, use Muenchian grouping with the count() function, which sucks less in .NET than generate-id() does.

I wonder what the results would be in .NET 2.0. Stay tuned, guys.

Microsoft Certification Second Shot Offer - if you fail, try second time for free

A nice promotional offer from Microsoft for those interested in Microsoft IT Pro or Developer certification. Register for the offer by May 31, get the promotional code and then take your exam. If you fail, you can use the promotional code to retake the exam for free! ...

March 6, 2005

XmlTextReader video tutorial from Dan Wahlin

The second video installment in Dan Wahlin's XML API fundamentals series, this time on my favorite class, XmlTextReader. A good way to grasp the basics. ...

March 3, 2005

Indexing XML article, part 1 - done

I finally finished that article and sent it to the MSDN XML Dev Center. It's a two-part article discussing various aspects of XML indexing. In the first part I covered techniques for indexing standalone XML documents: XML IDs, XSLT keys and IndexingXPathNavigator. The next part will be completely focused on XML ...

Another interview with Michael Rys on Microsoft SQL Server 2005 and XQuery

Ivan Pedruzzi (Stylus Studio) has interviewed Michael Rys about XQuery, Yukon and XML technologies at Microsoft. A really interesting one; read it here. ...

Microsoft SQLXML is not SQL/XML, which is written with a slash. SQL/XML is the common name for part 14 of the official ANSI SQL 2003 standard, which defines an xml data type, operations on the xml data type, a set of xml publishing functions, mapping rules from relational to xml data, and so on. SQL Server 2005's new XML data type is based on this standard.
Hey, nice clarification. On XQuery support in SQL Server 2005:
So, we've chosen to focus on what is considered to be the more stable parts of the specification, which I believe address the majority of the use cases - I think that those who take advantage of SQL Server 2005's XQuery support functionality will reap the benefits that XQuery has to offer. Obviously, some customers will want more, but they should understand that we're committed to delivering more extensive support for XQuery over time as the XQuery specification becomes an official W3C recommendation.
That should be safe. So get prepared, because:
Currently, the plan is to ship Microsoft SQL Server 2005 sometime this summer though the exact date has not been publicly announced.
Now for the long-awaited news: MSXML is back, alive and kicking! I knew it would happen.
As far as MSXML is concerned, we don't have a lot of requests for XQuery. And with Version 6.0, we think that MSXML will become the XML tools library for the native C and C++ code environment, and we'll continue to support it.
On the internals of XQuery in SQL Server 2005:
If you get down to the nuts and bolts of it, SQL Server's XQuery processing facilities rely on the underlying SQL query processor -- they're really quite integrated and there's no easy way to separate the two. But this actually brings up an interesting point about SQL Server's XQuery processor architecture - that under the hood, we're able to leverage our top-of-the-line relational query optimization engine technologies. As XQuery is a declarative language, we're able to do a pretty advanced job at decomposing and optimizing the underlying query. We've brought several innovations to this space to make optimization of XQuery expressions possible, and there is much more that can be done. [See Microsoft's recent SIGMOD 2004 and VLDB 2004 papers for more information - Ed.]
The papers are: "ORDPATHs: Insert-Friendly XML Node Labels" and "Indexing XML Data Stored in a Relational Database". ORDPATH is a hierarchical labeling scheme implemented in SQL Server 2005. Both articles are worth reading and contemplating.
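For readers who haven't met ORDPATH yet, here is a deliberately simplified Python sketch of the labeling idea as I understand it from the SIGMOD 2004 paper. A label is a sequence of integers; children are initially numbered with odd ordinals (1, 3, 5, ...); a node inserted between two adjacent siblings with no free odd ordinal "carets in" with an even component that doesn't count as a tree level; document order is plain lexicographic order, and an ancestor's label is a prefix of its descendant's label. This sketch skips the compressed binary encoding that SQL Server actually stores and handles only the simplest insertion case (siblings that differ only in their last component). The function names are hypothetical and have nothing to do with any SQL Server API.

```python
from typing import List, Tuple

Label = Tuple[int, ...]


def initial_children(parent: Label, count: int) -> List[Label]:
    """Initial load: the i-th child gets the odd ordinal 2*i + 1."""
    return [parent + (2 * i + 1,) for i in range(count)]


def insert_between(left: Label, right: Label) -> Label:
    """Label for a new sibling between two adjacent siblings.
    Simplified: both labels must share a parent and differ only in an odd
    last component; careted neighbours are not handled here."""
    assert left[:-1] == right[:-1] and left[-1] < right[-1]
    a, b = left[-1], right[-1]
    if b - a > 2:
        # An unused odd ordinal is still free between the two siblings.
        return left[:-1] + (a + 2,)
    # No odd slot left: caret in with the even value a + 1, then start a
    # fresh odd ordinal under it. The even component adds no tree level.
    return left[:-1] + (a + 1, 1)


def depth(label: Label) -> int:
    """Tree depth counts only odd components (negative odds included)."""
    return sum(1 for c in label if c % 2 != 0)


def is_ancestor(a: Label, d: Label) -> bool:
    """An ancestor's label is a proper prefix of its descendant's label."""
    return len(a) < len(d) and d[:len(a)] == a


root = (1,)
kids = initial_children(root, 3)         # [(1, 1), (1, 3), (1, 5)]
new = insert_between(kids[1], kids[2])   # (1, 4, 1): careted in
print(sorted(kids + [new]))              # Python tuple order = document order
```

The nice property this illustrates is that an insertion never relabels existing nodes, while document order and ancestor checks remain simple comparisons on the labels.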

And when it comes to XQuery support on the client side, it's just what I suspected:

If I was to summarize the state of XQuery support at Microsoft, I'd say that we think it's a great server-side technology for accessing large amounts of XML, its declarative nature makes it particularly well suited for optimization, and we support it 100%. On the client-side, we're still investigating here- we haven't seen our users crying out for declarative languages on the client side just yet.
Fair enough.

And finally, some really good news:

The XQuery working group - we're currently preparing a final last call, and there's light at the end of the tunnel. I see the process leading to a recommendation, probably in Q1 2006.
Now that sounds like a call to action. XQuery in SQL Server will be here soon, so it's time to start learning and playing with this amazing technology. And that's what I'm gonna do.