July 25, 2004

Breadth-first tree traversal in XSLT

As a matter of interest - how would you implement breadth-first tree traversal in XSLT? Traditional algorithm is based on using a queue and hence isn't particularly suitable here. Probably it's feasible to emulate a queue with temporary trees, but I think that's going to be quite ineffective. Being not ...

Let's say we've got the following XML:

<a>
  <b>
    <d/>
    <e/>
  </b>  
  <c>
    <f/>
    <g/>
  </c>
</a>
Easy to see that being traversed in a breadth-first way, the sequence of visited nodes would be a, b, c, d, e, f, g. How?

Strightforward declarative solution is to traverse nodes level by level - just select all nodes at a level i, then i+1 etc. till maximum depth level is reached:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">  
    <xsl:call-template name="BFT">
      <xsl:with-param name="tree" select="/descendant::*"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="tree" select="/.."/>
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
       select="$tree[count(ancestor::*)=$depth]"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
        <xsl:call-template name="BFT">
          <xsl:with-param name="tree" select="$tree"/>
          <xsl:with-param name="depth" select="$depth + 1"/>       
        </xsl:call-template>
    </xsl:if>    
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">    
    <xsl:value-of select="name()"/>    
  </xsl:template>
</xsl:stylesheet>
Not so effective, but anyway. For each depth level we need to count all ancestors of each node in the list. Looks like worst case running time of the above implementation is O(maxDepth*n*maxDepth) = O(maxDepth2*n), where n is the number of nodes.

Using keys it can be improved to O(maxDepth*n + maxDepth*O(1)) = O(maxDepth*(n+1)), where maxDepth*n is an indexing price and maxDepth*O(1) is running time of retrieving nodes from the index by depth level value (provided keys implementation is based on a hashtable):

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="depthMap" match="*" use="count(ancestor::*)"/>
  <xsl:template match="/">
    <xsl:call-template name="BFT"/>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
      select="key('depthMap', $depth)"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
      <xsl:call-template name="BFT">
        <xsl:with-param name="depth" select="$depth+1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">
    <xsl:value-of select="name()"/>
  </xsl:template>
</xsl:stylesheet>

O(maxDepth*n) is definitely better than O(maxDepth2*n), but still worse than procedural O(n). In the worst case (deep lean tree) it gets even O(n2). But on average XML trees, which are usually wide and rather shallow, it's close to procedural algorithm's running time.

More ideas?

SgmlReader and namespaces

It's obvious, but I didn't realize that till recently - Chris Lovett's SgmlReader doesn't supprot namespaces. Why? SgmlReader is SGML reader in the first place and you know, there is no namespaces in SGML. So whenever you want to cheat and process malformed XML with SgmlReader - beware of namespaces. ...