March 30, 2008

Crowdsourcing in action: results

I wrote earlier about the pilot the Library of Congress is running with Flickr. I also measured the number of tags, notes and comments, repeating the measurement several times over the last two months. Here are some numeric results:

Library of Congress pilot on Flickr

As expected, although tags, notes and comments are still coming in, overall the lines are almost flat after 50 days.

Averages: 4.85 unique tags, 0.39 notes and 1.34 comments per photo.

The Library of Congress blog shared some real results:

And because we government-types love to talk about results, there are some tangible outcomes of the Flickr pilot to report: As of this writing, 68 of our bibliographic records have been modified thanks to this project and all of those awesome Flickr members.

Well, that's not very impressive, but they must be happy, since they have posted 50 more photos.

March 27, 2008

Generating HTML excerpts

Here is another interesting problem: how do you generate HTML excerpts that preserve HTML structure and style? Say you have a long XHTML text:

<b>This is a <span style="color: #888">very long</span> text.</b>

In a browser it looks like this:

This is a very long text.

The text is 25 characters long. Now you need to generate a short excerpt by cutting it down to 15 characters while preserving the HTML structure and style:

<b>This is a <span style="color: #888">very ...</span></b>

So in a browser it would look like this:

This is a very ...

I solved it in XSLT 1.0 using an ugly (but efficient) recursive template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Maximum number of text characters to keep in the excerpt -->
  <xsl:param name="max-len" select="15"/>

  <xsl:template match="/">
    <xsl:call-template name="trim"/>
  </xsl:template>

  <!-- Processes $nodes in document order; $rlen is the number of
       text characters emitted so far -->
  <xsl:template name="trim">
    <xsl:param name="rlen" select="0"/>
    <xsl:param name="nodes" select="*"/>

    <xsl:choose>
      <!-- The whole node fits: copy it and continue with the remaining
           siblings only (its own descendants were just copied) -->
      <xsl:when test="$rlen + string-length($nodes[1]) &lt;= $max-len">
        <xsl:copy-of select="$nodes[1]"/>
        <xsl:if test="$nodes[2]">
          <xsl:call-template name="trim">
            <xsl:with-param name="rlen" select="$rlen + string-length($nodes[1])"/>
            <xsl:with-param name="nodes" select="$nodes[position() != 1]"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <!-- A text node overflows: cut it, append "..." and stop -->
      <xsl:when test="$nodes[1]/self::text()">
        <xsl:value-of select="substring($nodes[1], 1, $max-len - $rlen)"/>
        <xsl:text>...</xsl:text>
      </xsl:when>
      <!-- An element overflows: rebuild it with its attributes and trim
           its content; the cut happens somewhere inside it -->
      <xsl:otherwise>
        <xsl:if test="$nodes[1]/node()">
          <xsl:element name="{name($nodes[1])}" 
                       namespace="{namespace-uri($nodes[1])}">
            <xsl:copy-of select="$nodes[1]/@*"/>
            <xsl:call-template name="trim">
              <xsl:with-param name="rlen" select="$rlen"/>
              <xsl:with-param name="nodes" select="$nodes[1]/node()"/>
            </xsl:call-template>
          </xsl:element>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

But I'm not happy with this solution. There must be a more elegant way. The problem just smells of FXSL. Hopefully Dimitre can show me how FXSL can do it with beauty and style.

I also wonder how you would do it with XLinq.
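The same budget-tracking recursion can be sketched outside XSLT. Below is a minimal Python illustration using the standard xml.etree.ElementTree module; it is a swapped-in example, not XLinq and not a port of any published solution, and the names `trim_html`, `_trim_into` and `text_length` are hypothetical. It assumes well-formed, namespace-free input (namespaced XHTML would come back with ElementTree's generated `ns0:` prefixes). Like the template above, it copies a node whole when its text fits the remaining budget, cuts the first text node that overflows and appends "...", and rebuilds an overflowing element with its attributes before descending into it.

```python
import copy
import xml.etree.ElementTree as ET

ELLIPSIS = "..."


def text_length(elem: ET.Element) -> int:
    """String-value length of elem: all descendant text, excluding elem's own tail."""
    return len("".join(elem.itertext()))


def _trim_into(src: ET.Element, dst: ET.Element, budget: int):
    """Copy the content of src into the empty element dst.

    Returns the remaining character budget, or None once the text has
    been cut (nothing after the cut point may be emitted).
    """
    # Leading text node of src.
    text = src.text or ""
    if len(text) > budget:
        dst.text = text[:budget] + ELLIPSIS
        return None
    dst.text = src.text
    budget -= len(text)

    for child in src:
        size = text_length(child)
        if size > budget:
            # The child overflows: rebuild it with its attributes and
            # recurse; the cut necessarily happens inside it.
            shell = ET.SubElement(dst, child.tag, dict(child.attrib))
            _trim_into(child, shell, budget)
            return None
        # The whole child fits: copy it verbatim (its tail is handled below).
        clone = copy.deepcopy(child)
        clone.tail = None
        dst.append(clone)
        budget -= size

        # ElementTree keeps the text that follows a child in child.tail,
        # so treat it as a separate text node.
        tail = child.tail or ""
        if len(tail) > budget:
            clone.tail = tail[:budget] + ELLIPSIS
            return None
        clone.tail = child.tail
        budget -= len(tail)
    return budget


def trim_html(xhtml: str, max_len: int) -> str:
    """Return an excerpt of at most max_len text characters, keeping markup."""
    root = ET.fromstring(xhtml)
    out = ET.Element(root.tag, dict(root.attrib))
    _trim_into(root, out, max_len)
    return ET.tostring(out, encoding="unicode")
```

With the sample text, `trim_html('<b>This is a <span style="color: #888">very long</span> text.</b>', 15)` returns `<b>This is a <span style="color: #888">very ...</span></b>`. Because a child that fits is copied whole, nested markup such as `<i>c<em>d</em></i>` is kept intact rather than visited twice.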

March 21, 2008

Generating Java using XSLT

We are working on yet another language migration tool and have once again faced the problem of generating Java source code. Unfortunately, Java doesn't have anything similar to .NET's CodeDOM, so we had to build our own Java generator. This time our development platform is XSLT 2.0. Yes, we are converting COOL:Gen (an obscure, model-based 4GL) to Java using XSLT 2.0.

XSLT 2.0 rocks, by the way. This is the first time I've written production code in XSLT 2.0, and it's an amazing experience. Suddenly everything is so easy, everything is possible, no hassle. Despite poor authoring support (the Eclipse XSLT editor sucks; Visual Studio 2008 with the XSLT 2.0 schema is OK, but it cannot run Saxon), the lack of a debugger and some Saxon quirks, I had a blast practicing XSLT 2.0 for real.

At first I generated Java beans the simple way: xsl:output method="text", producing Java sources as plain text. Obviously it sucked big time. I spent a week and got it done, but the code was far too cumbersome and fragile. Generating code while simultaneously coping with Java syntax and formatting is hard. An additional layer of indirection was desperately needed.

One of the smart guys I work with came up with a simple but brilliant idea. Vladimir took the Java 6 ANTLR grammar and converted it to XML Schema. Then he developed a generic serializer (also in XSLT 2.0, of course) that converts an XML document conforming to this Java XML schema (he called it JXOM, for Java XML Object Model) into nicely formatted, optimized Java 6 source code.

Then I rebuilt my Java bean generator on top of JXOM in just one day. Building Java as XML is so much easier and cleaner; I believe it's even easier than using System.CodeDom in .NET (although CodeDom can obviously do more than generate C# or VB sources).
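To make the two-layer idea concrete, here is a minimal Python sketch of the same split. The generator step only builds an XML tree and never touches Java syntax; a separate serializer owns braces, semicolons and indentation. The `class`/`field` vocabulary and the names `build_bean` and `serialize` are hypothetical; they are not JXOM's schema, which is derived from the full Java 6 grammar, nor its API.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini vocabulary for illustration only (not JXOM's schema).


def build_bean(class_name, fields):
    """Generator step: describe a Java bean as an XML tree (no Java syntax here)."""
    cls = ET.Element("class", name=class_name)
    for name, java_type in fields:
        ET.SubElement(cls, "field", name=name, type=java_type)
    return cls


def serialize(cls, indent="    "):
    """Serializer step: the only place that knows Java syntax and formatting."""
    lines = [f"public class {cls.get('name')} {{"]
    fields = cls.findall("field")
    for f in fields:
        lines.append(f"{indent}private {f.get('type')} {f.get('name')};")
    for f in fields:
        name, java_type = f.get("name"), f.get("type")
        getter = "get" + name[0].upper() + name[1:]
        lines.append("")
        lines.append(f"{indent}public {java_type} {getter}() {{")
        lines.append(f"{indent * 2}return {name};")
        lines.append(f"{indent}}}")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `serialize(build_bean("Customer", [("name", "String"), ("id", "int")]))` yields a `Customer` class with two private fields and their getters. Changing the brace style or indentation touches only `serialize`, which is the point of the extra layer.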

Anyway, if you are interested in Java generation, check out JXOM. It's a really easy way to generate Java (even Java 6.0) using XSLT. It's freely available and it just works. Here are more links:

  1. Java xml object model
  2. Xslt for the jxom (Java xml object model)
  3. jxom update

JXOM is ready to use, but it is still under active development. Any feedback is highly appreciated at Vladimir and Arthur Nesterovsky's blog.

March 6, 2008

Sergey Dubinets is blogging

Sergey Dubinets, the guy behind the Microsoft XSLT engine and tools, is blogging. Subscribed. Highly recommended.

More XSLT bloggers from Microsoft: