May 19, 2004

Wearing tourist hat

That's all, folks. I'm on vacation from tomorrow for two weeks. We're going to fly to Prague, stay there for a few days and then take a car trip across Europe. No laptop. I'll read my mail occasionally though (I'm afraid otherwise my mailbox will explode with all that spam and I ...

May 17, 2004

xmlhack.com to take a rest

From the daily-bad-news department: "That's it for now" from xmlhack.com, a good news site for XML developers. It's been a lot of fun writing XMLhack since 1999, but it's time for us to take a rest. At least: Every endeavour will be made to keep XMLhack content online at the ...

May 16, 2004

An Introduction to the XQuery (and XPath 2.0) Type System by Michael Rys

Michael Rys (PM for the SQL Server Engine support of XQuery) is trying to bring some order into the confusion around the XQuery 1.0 and XPath 2.0 type system. His first instalment in the series introduces the terminology and general concepts. Read it here. More to come, so it's worth staying tuned. ...

Improving XML Performance

Here is another must-read: "Chapter 9 - Improving XML Performance" of the "Improving .NET Application Performance and Scalability" guide from the Microsoft Patterns and Practices group. Here are the objectives: Optimize XML processing design, Parse XML documents efficiently, Validate XML documents efficiently, Optimize your ...

May 13, 2004

XSLT and XPath Optimization

Here is an interesting paper, "XSLT and XPath Optimization" by Michael Kay. It's the material from Michael's talk at the recent XML Europe conference. In this paper Michael reveals details of the XSLT and XPath optimizations performed internally by SAXON (XSLT and XQuery processor): This paper describes the main techniques used by the ...

Interesting quotes:

Parsing of XQuery is considerably more complex than XPath parsing, because of the need to switch between the different grammatical styles used in the query prolog, in conventional expressions, and in the XML-like element constructors. XSLT parsing, of course, is trivial, because the difficult part is done by an XML parser.

Most of the important static optimizations are done during the second phase, analyze(). These fall into a number of categories:

Early evaluation of constant sub-expressions (known, for some curious reason, as "constant folding"). This is another optimization that cannot be done if the result is sensitive to node identity. Constant sub-expressions are surprisingly common in XSLT, because global variables often have a constant value; and it is often possible to get rid of large chunks of stylesheet code by pre-evaluating a condition such as <xsl:if test="(system-property('xsl:vendor')='Xalan')">.

Local rewrites such as replacing count(X) = 0 by empty(X). (This avoids the need to distinguish a sequence with a thousand items from one with a thousand and one). These rewrites are local in the sense that they only require looking in the immediate vicinity of a node in the abstract syntax tree.

Early binding of polymorphic operators (especially comparison operators such as "=" and arithmetic operators such as "+") based on the types of their operands.

Adding code to do run-time type-checking and type conversion (such as atomization and numeric promotion) if static analysis shows that it is necessary. If static analysis shows that the supplied value will always have the required type, then this code is not generated.

Non-local rewrites. The most significant of these in Saxon is moving an expression out of a loop if it does not depend on any variables (range variables or context variables) that are set within the loop. A "loop" here can be a predicate, the right-hand side of the "/" operator, or the action part of a "for" expression. (Saxon does not at present do this optimization at the XSLT level, only at the XPath/XQuery level). This optimization too needs to be aware that creative expressions cannot always be safely moved.

Ordering rewrites. These fall into two categories: adding a sort, and removing it. This relates primarily to the requirement to deliver the results of certain expressions in document order. Saxon first adds a sort operator to the tree if the semantics require it and the expression is not "naturally sorted" (as determined using rules such as the peer/subtree rule given earlier). Then it removes the sort operator if the expression is used in a context where ordering is immaterial, for example an argument of one of the functions count(), max(), or of course unordered(). However, the scope for this is limited because in many contexts where there is no dependency on ordering, there is still a requirement to eliminate duplicates, and the simplest way to eliminate duplicates is by sorting.

Elimination of common subexpressions. Saxon currently does this in only a few special cases, for example it rewrites the expression A/B/C | A/B/D as A/B/*[self::C or self::D]. In general, elimination of common subexpressions is greatly complicated by the need to consider dependencies on the context.

Firstly, there's a range of techniques that come under the heading of parallelism. Xalan, for example, has the parser and tree builder for the source document running in parallel with the transformation engine: if the stylesheet needs access to nodes that aren't there yet, the transformation engine waits until the parser has delivered them. The real saving here would come if it was also possible to discard parts of the tree once the stylesheet has finished with them. Unfortunately no-one seems close to solving this problem, even though many stylesheets do process the source document in a broadly sequential way.
etc...
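
To make two of the quoted categories a bit more concrete, here is a minimal Python sketch of constant folding and of the local rewrite count(X) = 0 to empty(X), done on a toy expression tree. This is only an illustration and not Saxon's code: the node classes (Num, Path, Call, BinOp) and the optimize() function are names made up for this example, and a real processor would also have to handle types, context dependencies and node identity.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Toy abstract syntax tree. All class names here are hypothetical,
# chosen for illustration only.

@dataclass(frozen=True)
class Num:
    value: float          # numeric literal

@dataclass(frozen=True)
class Path:
    text: str             # opaque path expression, e.g. "//item"

@dataclass(frozen=True)
class Call:
    name: str             # function name, e.g. "count"
    args: tuple           # argument expressions

@dataclass(frozen=True)
class BinOp:
    op: str               # "+", "-", "*" or "="
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Path, Call, BinOp]

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(e):
    """Rewrite the tree bottom-up, applying two static optimizations."""
    if isinstance(e, Call):
        return Call(e.name, tuple(optimize(a) for a in e.args))
    if isinstance(e, BinOp):
        left, right = optimize(e.left), optimize(e.right)
        # Constant folding: both operands are literals, so evaluate now.
        if e.op in ARITH and isinstance(left, Num) and isinstance(right, Num):
            return Num(ARITH[e.op](left.value, right.value))
        # Local rewrite: count(X) = 0  ->  empty(X).
        # Only the node and its immediate children are inspected.
        if (e.op == "=" and isinstance(left, Call) and left.name == "count"
                and len(left.args) == 1 and right == Num(0)):
            return Call("empty", left.args)
        return BinOp(e.op, left, right)
    return e  # literals and paths are left unchanged


# count(//item) = 2 - 2: folding turns "2 - 2" into 0,
# which then lets the local rewrite fire.
expr = BinOp("=", Call("count", (Path("//item"),)), BinOp("-", Num(2), Num(2)))
print(optimize(expr))
```

Because the tree is rewritten bottom-up, folding the right-hand side first exposes the literal 0 that the count/empty rewrite looks for, which shows how the simple rewrites feed each other.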

On XPathReader

Finally, XPathReader has been unveiled at the MSDN XML DC in the "Extreme XML: Combining XPath with the XmlReader" article by Dare Obasanjo and Howard Hao. A really, really interesting solution, already used in BizTalk internals to optimize XML pipeline processing. Need to play with it more. I like XPathReader. It reminds ...

Mono beta1

The Mono project (an open source implementation of the .NET framework for Linux, Unix and Windows) has reached the Beta 1 stage. They say Mono 1.0 may be released as early as this summer. Now to the funny part. I was reading the Release Notes while downloading the release and found myself in the contributors list :) Well ...

May 12, 2004

Transforming only a part of an XML document

Daniel writes about transforming a portion of an XML document using XPathNavigatorReader. That's a common unpleasant surprise for MSXML-experienced people, who are used to the fooNode.transformNode() method, where only fooNode and its descendants are visible in XSLT. In .NET that's different - no matter which DOM node you pass to the XslTransform.Transform() method, the ...

May 4, 2004

Random excuses

I've been blogging sparsely lately due to a trivial lack of time. I'm taking two MVP academy courses (advanced C# and ASP.NET level 200) simultaneously, trying to catch up with what I've promised for the Mvp.Xml and XInclude.NET projects, preparing a new article and working on my new pet project, which I'm going ...

RSS Bandit v1.2.0.112 Released

A new version of RSS Bandit has been released today. Among the new features: Atom 0.3 support, the ability to synchronize installations (killer!), 5 translations including a Russian one made by me, and lots more! Updated. I haven't checked the new features yet, but I can say the responsiveness is also improved! This ...

Free WordML chapter of the "Office 2003 XML" book

As Evan Lenz pointed out, O'Reilly put Chapter 2 ("The WordprocessingML Vocabulary") of the "Office 2003 XML" book online. Here it is (88-page PDF). Excellent introduction to WordML. Those who want to learn WordML - go read it (or buy the book). ...

May 2, 2004

RenderX XEP for .NET is released

RenderX has announced the first release (somehow it's v3.0 :) of XEP.NET, an XSL-FO formatter for .NET. XEP.NET is a Visual J#.NET port of RenderX XEP, an XSL formatter for Java; its functionality and XSL FO support level are identical to the Java version. The XEP.NET core is wrapped in ...