Generating HTML excerpts

Here is another interesting problem: how do you generate HTML excerpts while preserving HTML structure and style? Say you have a long XHTML text:

<b>This is a <span style="color: #888">very long</span> text.</b>

In a browser it looks like this:

This is a very long text.

The text is 25 characters long. Now you need to generate a short excerpt - cut it down to 15 characters, while preserving HTML structure and style:

<b>This is a <span style="color: #888">very ...</span></b>

So in a browser it would look like this:

This is a very ...
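To make the character counts concrete, here is a minimal sketch that measures only the text a browser would show, ignoring the markup. It assumes Python's standard `xml.etree.ElementTree` module and well-formed XHTML input; the helper name `visible_length` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def visible_length(xhtml: str) -> int:
    """Count the characters of text content, ignoring all markup."""
    root = ET.fromstring(xhtml)
    return len("".join(root.itertext()))

full = '<b>This is a <span style="color: #888">very long</span> text.</b>'
short = '<b>This is a <span style="color: #888">very ...</span></b>'

print(visible_length(full))   # 25
print(visible_length(short))  # 18: 15 characters of text plus "..."
```

Note that the 15-character limit applies to the text before the ellipsis, so the excerpt itself displays 18 characters.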

I solved it in XSLT 1.0 using an ugly (but efficient) recursive template:

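<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="max-len" select="15"/>

  <xsl:template match="/">
    <xsl:call-template name="trim"/>
  </xsl:template>

  <!-- rlen: text length emitted so far; nodes: nodes still to process -->
  <xsl:template name="trim">
    <xsl:param name="rlen" select="0"/>
    <xsl:param name="nodes" select="*"/>
    <xsl:choose>
      <!-- The first node fits entirely: copy it and continue with the rest -->
      <xsl:when test="$rlen + string-length($nodes[1]) &lt;= $max-len">
        <xsl:copy-of select="$nodes[1]"/>
        <xsl:if test="$nodes[2]">
          <xsl:call-template name="trim">
            <xsl:with-param name="rlen" select="$rlen + string-length($nodes[1])"/>
            <xsl:with-param name="nodes" select="$nodes[position() != 1]|$nodes[1]/*"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <!-- A text node that does not fit: cut it and stop -->
      <xsl:when test="$nodes[1]/self::text()">
        <xsl:value-of select="substring($nodes[1], 1, $max-len - $rlen)"/>
        <xsl:text>...</xsl:text>
      </xsl:when>
      <!-- An element that does not fit: recreate it and trim its children -->
      <xsl:otherwise>
        <xsl:if test="$nodes[1]/node()">
          <xsl:element name="{name($nodes[1])}" namespace="{namespace-uri($nodes[1])}">
            <xsl:copy-of select="$nodes[1]/@*"/>
            <xsl:call-template name="trim">
              <xsl:with-param name="rlen" select="$rlen"/>
              <xsl:with-param name="nodes" select="$nodes[1]/node()"/>
            </xsl:call-template>
          </xsl:element>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>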

But I'm not happy with this solution. There must be a more elegant way. The problem just smells of FXSL. Hopefully Dimitre can show me how FXSL can do it with beauty and style.

I also wonder how you would do it with XLinq.
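For comparison, here is a minimal sketch of the same recursive idea outside XSLT. It is not an XLinq or FXSL answer: it assumes Python's standard `xml.etree.ElementTree`, well-formed input, and the hypothetical names `excerpt` and `_trim`. The tree is walked in document order with a shrinking character budget; the text that exceeds the budget is cut, and everything after it is dropped.

```python
import xml.etree.ElementTree as ET

def excerpt(xhtml: str, max_len: int = 15, ellipsis: str = "...") -> str:
    """Return xhtml cut to max_len text characters, markup preserved."""
    root = ET.fromstring(xhtml)
    _trim(root, max_len, ellipsis)
    return ET.tostring(root, encoding="unicode")

def _trim(elem, budget, ellipsis):
    """Trim elem in place; return the remaining budget, or -1 once the cut is made."""
    text = elem.text or ""
    if len(text) > budget:
        # The element's leading text overflows: cut it and drop all children.
        elem.text = text[:budget] + ellipsis
        for child in list(elem):
            elem.remove(child)
        return -1
    budget -= len(text)
    children = list(elem)
    for i, child in enumerate(children):
        budget = _trim(child, budget, ellipsis)
        tail = child.tail or ""
        if budget < 0:
            # The cut happened inside this child: drop the text after it.
            child.tail = None
        elif len(tail) > budget:
            # The text following this child overflows: cut it here.
            child.tail = tail[:budget] + ellipsis
            budget = -1
        else:
            budget -= len(tail)
        if budget < 0:
            # Everything after the cut point is removed.
            for later in children[i + 1:]:
                elem.remove(later)
            return -1
    return budget

print(excerpt('<b>This is a <span style="color: #888">very long</span> text.</b>'))
# <b>This is a <span style="color: #888">very ...</span></b>
```

ElementTree stores the text that follows an element in that element's `tail`, which is why the loop checks `child.tail` separately. Input that already fits within `max_len` is returned without an ellipsis.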

1 Comment

You need to remove "|$nodes[1]/*" from the first xsl:when, otherwise some nodes will be duplicated in the output. For XLinq the following approach may work:

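var xdoc = XDocument.Parse("<b>This is a <span style=\"color: #888\">very long</span> text.</b>");
int maxLength = 15;
// Find the first text node that pushes the running total past maxLength.
var last = xdoc.DescendantNodes().OfType<XText>().FirstOrDefault(n => (maxLength -= n.Value.Length) < 0);
if (last != null) {
    // maxLength is negative here; adding the node's length back gives the characters still allowed.
    last.Value = last.Value.Substring(0, maxLength + last.Value.Length) + "...";
    // Drop everything after the cut: later siblings of the text node, then of each ancestor.
    last.NodesAfterSelf().Remove();
    last.Ancestors().SelectMany(n => n.NodesAfterSelf()).Remove();
}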

As an alternative, one may override the XmlReader.Read() method to stop returning nodes after maxLength characters have been read in text nodes.
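The streaming idea can be sketched as follows. This is only an analogue in Python's standard `xml.sax` module, not an XmlReader subclass; the names `ExcerptHandler` and `excerpt_stream` are hypothetical. The handler re-emits markup as events arrive, and once the character budget is used up it closes the elements that are still open and ignores every later event.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class ExcerptHandler(xml.sax.ContentHandler):
    """Re-emit markup until max_len text characters have been seen."""

    def __init__(self, max_len=15, ellipsis="..."):
        super().__init__()
        self.remaining = max_len
        self.ellipsis = ellipsis
        self.out = []        # output fragments, joined at the end
        self.open_tags = []  # names of elements not yet closed
        self.done = False    # set once the cut has been made

    def startElement(self, name, attrs):
        if self.done:
            return
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        self.out.append(f"<{name}{attr_text}>")
        self.open_tags.append(name)

    def endElement(self, name):
        if self.done:
            return
        self.out.append(f"</{name}>")
        self.open_tags.pop()

    def characters(self, content):
        if self.done:
            return
        if len(content) <= self.remaining:
            self.out.append(escape(content))
            self.remaining -= len(content)
            return
        # Budget exceeded: cut the text, close open elements, ignore the rest.
        self.out.append(escape(content[:self.remaining]) + self.ellipsis)
        for name in reversed(self.open_tags):
            self.out.append(f"</{name}>")
        self.done = True

def excerpt_stream(xhtml, max_len=15):
    handler = ExcerptHandler(max_len)
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return "".join(handler.out)

print(excerpt_stream('<b>This is a <span style="color: #888">very long</span> text.</b>'))
# <b>This is a <span style="color: #888">very ...</span></b>
```

In this sketch the parser still reads the rest of the input and the handler simply discards it; stopping the parse early would need an extra step such as raising an exception. Empty elements come out as a start and end tag pair rather than in self-closing form.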
