July 30, 2004

Another form of blog comment spam: indirect referencing

I just got several instances of what I believe is another resourceful form of blog comment spam. It looked like an ordinar spam, somehow making it through MT-Blacklist system I've got installed and after "Name: free government grants" I was aready clicking on "De-spam using MT-Blacklist" link, but then I ...

July 28, 2004

Tell me who are you and what are you processing

This is small trick for newbies looking for a way to get URI of a source XML and the stylesheet from within XSLT stylesheet. ...

Unfortunately neither XPath 1.0 nor XSLT1.0 don't provide any solutions for the problem. Usual answer is "pass it as a parameter". That's a good one, but not always suitable (e.g. when transforming client side with <?xml-stylesheet?> PI). Next answer is "use Saxon's saxon:system-id() extension function or write your own". Latter is what I'm going to illustrate.

Simple, ain't it:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="http://mycompany.com/mynamespace">

  <msxsl:script language="javascript" 
    implements-prefix="ext">
    function uri(nodelist) {
      return nodelist[0].url;
    }
  </msxsl:script>
  
  <xsl:template match="/">
    <p>Currently processing XML document URI is 
      <tt><xsl:value-of select="ext:uri(/)"/></tt></p>
    <p>Currently processing XSLT stylesheet URI is 
      <tt><xsl:value-of select="ext:uri(document(''))"/></tt></p>
  </xsl:template>  
</xsl:stylesheet>
The result (try http://www.tkachenko.com/samples/detect-uri.xml) is:
Currently processing XML document URI is http://www.tkachenko.com/samples/detect-uri.xml

Currently processing XSLT stylesheet URI is http://www.tkachenko.com/samples/detect-uri.xsl

PS. Of course extension functions are not portable and above works only in IE/MSXML3+.

PPS. Of course it only works well when XML documents are loaded from a URI, not generated on the fly.

PPPS. XPath 2.0 will fix the problem providing fn:document-uri and fn:base-uri() functions.

July 25, 2004

Breadth-first tree traversal in XSLT

As a matter of interest - how would you implement breadth-first tree traversal in XSLT? Traditional algorithm is based on using a queue and hence isn't particularly suitable here. Probably it's feasible to emulate a queue with temporary trees, but I think that's going to be quite ineffective. Being not ...

Let's say we've got the following XML:

<a>
  <b>
    <d/>
    <e/>
  </b>  
  <c>
    <f/>
    <g/>
  </c>
</a>
Easy to see that being traversed in a breadth-first way, the sequence of visited nodes would be a, b, c, d, e, f, g. How?

Strightforward declarative solution is to traverse nodes level by level - just select all nodes at a level i, then i+1 etc. till maximum depth level is reached:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">  
    <xsl:call-template name="BFT">
      <xsl:with-param name="tree" select="/descendant::*"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="tree" select="/.."/>
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
       select="$tree[count(ancestor::*)=$depth]"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
        <xsl:call-template name="BFT">
          <xsl:with-param name="tree" select="$tree"/>
          <xsl:with-param name="depth" select="$depth + 1"/>       
        </xsl:call-template>
    </xsl:if>    
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">    
    <xsl:value-of select="name()"/>    
  </xsl:template>
</xsl:stylesheet>
Not so effective, but anyway. For each depth level we need to count all ancestors of each node in the list. Looks like worst case running time of the above implementation is O(maxDepth*n*maxDepth) = O(maxDepth2*n), where n is the number of nodes.

Using keys it can be improved to O(maxDepth*n + maxDepth*O(1)) = O(maxDepth*(n+1)), where maxDepth*n is an indexing price and maxDepth*O(1) is running time of retrieving nodes from the index by depth level value (provided keys implementation is based on a hashtable):

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="depthMap" match="*" use="count(ancestor::*)"/>
  <xsl:template match="/">
    <xsl:call-template name="BFT"/>
  </xsl:template>
  <!-- Breadth-first traversal template -->
  <xsl:template name="BFT">
    <xsl:param name="depth" select="0"/>
    <xsl:variable name="nodes-at-this-depth" 
      select="key('depthMap', $depth)"/>
    <xsl:apply-templates select="$nodes-at-this-depth"/>
    <xsl:if test="count($nodes-at-this-depth)>0">
      <xsl:call-template name="BFT">
        <xsl:with-param name="depth" select="$depth+1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
  <!-- Actual node processing -->
  <xsl:template match="*">
    <xsl:value-of select="name()"/>
  </xsl:template>
</xsl:stylesheet>

O(maxDepth*n) is definitely better than O(maxDepth2*n), but still worse than procedural O(n). In the worst case (deep lean tree) it gets even O(n2). But on average XML trees, which are usually wide and rather shallow, it's close to procedural algorithm's running time.

More ideas?

SgmlReader and namespaces

It's obvious, but I didn't realize that till recently - Chris Lovett's SgmlReader doesn't supprot namespaces. Why? SgmlReader is SGML reader in the first place and you know, there is no namespaces in SGML. So whenever you want to cheat and process malformed XML with SgmlReader - beware of namespaces. ...