March 27, 2008

Generating HTML excerpts

Here is another interesting problem: how do you generate HTML excerpts while preserving HTML structure and style? Say you have a long XHTML text:

<b>This is a <span style="color: #888">very long</span> text.</b>

In a browser it looks like this:

This is a very long text.

The text is 25 characters long. Now you need to generate a short excerpt: cut the text down to 15 characters, followed by an ellipsis, while preserving HTML structure and style:

<b>This is a <span style="color: #888">very ...</span></b>

So in a browser it would look like this:

This is a very ...

I solved it in XSLT 1.0 using an ugly (but efficient) recursive template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="max-len" select="15"/>

  <xsl:template match="/">
    <xsl:call-template name="trim"/>
  </xsl:template>

  <xsl:template name="trim">
    <xsl:param name="rlen" select="0"/>
    <xsl:param name="nodes" select="*"/>

    <xsl:choose>
      <xsl:when test="$rlen + string-length($nodes[1]) &lt;= $max-len">
        <xsl:copy-of select="$nodes[1]"/>
        <xsl:if test="$nodes[2]">
          <xsl:call-template name="trim">
            <!-- $nodes[1] was copied whole above, so continue with its remaining siblings only -->
            <xsl:with-param name="rlen" select="$rlen + string-length($nodes[1])"/>
            <xsl:with-param name="nodes" select="$nodes[position() != 1]"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <xsl:when test="$nodes[1]/self::text()">
        <xsl:value-of select="substring($nodes[1], 1, $max-len - $rlen)"/>
        <xsl:text>...</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:if test="$nodes[1]/node()">
          <xsl:element name="{name($nodes[1])}" 
                       namespace="{namespace-uri($nodes[1])}">
            <xsl:copy-of select="$nodes[1]/@*"/>
            <xsl:call-template name="trim">
              <xsl:with-param name="rlen" select="$rlen"/>
              <xsl:with-param name="nodes" select="$nodes[1]/node()"/>
            </xsl:call-template>
          </xsl:element>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
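
To make the recursion easier to trace outside an XSLT processor, here is a rough Python sketch of the same idea using the standard xml.etree.ElementTree module. It is not a translation of the stylesheet: the names excerpt, _trim and _cut are made up for illustration, and it assumes the input is well-formed XML with a single root element. Like the template, it keeps a running character budget, copies whatever fits, and cuts the first text node that does not fit.

```python
import xml.etree.ElementTree as ET

ELLIPSIS = "..."


def _cut(text, budget):
    """Return (kept_text, remaining_budget, truncated) for one text node."""
    text = text or ""
    if len(text) <= budget:
        return text, budget - len(text), False
    # The text does not fit: keep what the budget allows and mark the cut.
    return text[:budget] + ELLIPSIS, 0, True


def _trim(elem, budget):
    """Copy elem while spending budget; return (copy, budget, truncated)."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text, budget, done = _cut(elem.text, budget)
    for child in elem:
        if done:
            break  # everything after the cut point is dropped
        new_child, budget, done = _trim(child, budget)
        new.append(new_child)
        if not done:
            # In ElementTree, text following a child lives in child.tail.
            new_child.tail, budget, done = _cut(child.tail, budget)
    return new, budget, done


def excerpt(root, max_len):
    """Return a truncated copy of root holding at most max_len text characters."""
    new_root, _, _ = _trim(root, max_len)
    return new_root


if __name__ == "__main__":
    src = '<b>This is a <span style="color: #888">very long</span> text.</b>'
    short = excerpt(ET.fromstring(src), 15)
    print(ET.tostring(short, encoding="unicode"))
    # prints: <b>This is a <span style="color: #888">very ...</span></b>
```

The sketch counts characters the way string-length() does on text nodes, whitespace included, and appends the ellipsis inside whichever element holds the cut text, so the open tags around the cut point are still closed properly.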

But I'm not happy with this XSLT solution. There must be a more elegant way. The problem just smells of FXSL. Hopefully Dimitre can show me how FXSL can do it with beauty and style.

I also wonder how you would do it with XLinq.

...