August 3, 2005

Saxon 8.5, new optimizations and abilities

Michael Kay has released v8.5 of the Saxon XSLT and XQuery processor. This new release implements some very interesting optimizations (available only in the commercial version, though) and new abilities, one of which is probably worth implementing in the EXSLT.NET module. ...

Optimizations include:

* hash join optimization for both XSLT and XQuery (can give fantastic speed-up when processing large documents)
Hash join is a well-known technique, implemented by many RDBMS engines (including SQL Server), for optimizing set-matching operations. You can read about hash join here.
* a binary disk representation of validated source documents, reducing the document loading costs when the same document is used repeatedly by many transformations
So here comes a binary PSVI representation. A proprietary one for sure, but portable binary is an oxymoron anyway.
* a mechanism for sequential XSLT processing of input documents without reading the whole document into memory, making it feasible to process very large documents provided the transformation is serial in nature
That reminds me of what Mike said a year ago:
Firstly, there's a range of techniques that come under the heading of parallelism. Xalan, for example, has the parser and tree builder for the source document running in parallel with the transformation engine: if the stylesheet needs access to nodes that aren't there yet, the transformation engine waits until the parser has delivered them. The real saving here would come if it was also possible to discard parts of the tree once the stylesheet has finished with them. Unfortunately no-one seems close to solving this problem, even though many stylesheets do process the source document in a broadly sequential way.
Looks like he solved it. It would be interesting to measure the performance boost this optimization provides.
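
To make the "discard parts of the tree once you are done with them" idea concrete, here is a minimal Python sketch. It is not how Saxon does it, only an illustration of serial processing with the standard library's iterparse. The <orders>/<order> element names, the amount attribute and the sum_amounts function are made up for the example, and it assumes the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source, record_tag="order"):
    """Sum the 'amount' attribute of every record element, one record at a time."""
    total = 0.0
    # iterparse reports each element while the document is still being parsed.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            total += float(elem.get("amount", "0"))
            # Assumption: records are direct children of the root, so clearing
            # the root discards the records that have already been processed.
            root.clear()
    return total


# Hypothetical sample document, small enough to read here.
sample = io.BytesIO(
    b"<orders><order amount='1.5'/><order amount='2.5'/><order amount='4'/></orders>"
)
print(sum_amounts(sample))  # prints 8.0
```

Memory use stays roughly flat however many records the input holds, but only because this "transformation" never looks back at an earlier record, which is exactly the "serial in nature" condition mentioned above.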

Additionally, the free Saxon version can now process a whole directory of files using this syntax:

document("dir?recurse=yes;select=*.html;parser=org.ccil.cowan.tagsoup.Parser")
This returns all *.html files in the "dir" directory, processed recursively and converted to XML using the TagSoup parser. It seems to be pretty useful and a piece of cake to implement at the same time. I've heard users ask for wildcards in the document() function many times, and every time the answer was: go write a custom resolver and combine the documents somehow.
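
Just to show how little is involved, here is a rough Python sketch of the file selection part. It is my own illustration, not Saxon's code: parse_collection_uri and list_matching_files are hypothetical names, treating a missing recurse option as "no" is an assumption of the sketch, and the parser option is only parsed, never used (the HTML-to-XML conversion is left out).

```python
import fnmatch
import os


def parse_collection_uri(uri):
    """Split 'dir?key=value;key=value' into (directory, options dict)."""
    directory, _, query = uri.partition("?")
    options = {}
    for pair in query.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    return directory, options


def list_matching_files(uri):
    """Return the sorted paths under the directory that match 'select'."""
    directory, options = parse_collection_uri(uri)
    pattern = options.get("select", "*")
    # Assumption of this sketch: no recursion unless recurse=yes is given.
    recurse = options.get("recurse", "no") == "yes"
    matches = []
    for root, dirs, files in os.walk(directory):
        matches.extend(
            os.path.join(root, name)
            for name in files
            if fnmatch.fnmatch(name, pattern)
        )
        if not recurse:
            dirs.clear()  # stops os.walk from descending into subdirectories
    return sorted(matches)
```

Calling list_matching_files("dir?recurse=yes;select=*.html") would give the list of files to load; a real implementation would then hand each file to the configured parser and return the resulting document nodes.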

If anybody needs this functionality (wildcards and recursive processing for the document() function, plus the ability to load HTML documents), speak up and I'll go and implement it for the EXSLT.NET module.

Gobo Eiffel XSLT - XSLT 2.0 Processor written in Eiffel

Colin Paul Adams has announced Gobo Eiffel XSLT, a free XSLT 2.0 processor written in Eiffel. Gexslt is intended to conform to the requirements for a Basic-level XSLT 2.0 processor and is still under development. A compiled Win32 version can be downloaded at http://www.gobosoft.com/download/gobo34.zip. ...