Unfortunately neither XPath 1.0 nor XSLT1.0 don't provide any solutions for the problem. Usual answer is "pass it as a parameter". That's a good one, but not always suitable (e.g. when transforming client side with <?xml-stylesheet?> PI). Next answer is "use Saxon's saxon:system-id() extension function or write your own". Latter is what I'm going to illustrate.
Simple, ain't it:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:ext="http://mycompany.com/mynamespace"> <msxsl:script language="javascript" implements-prefix="ext"> function uri(nodelist) { return nodelist[0].url; } </msxsl:script> <xsl:template match="/"> <p>Currently processing XML document URI is <tt><xsl:value-of select="ext:uri(/)"/></tt></p> <p>Currently processing XSLT stylesheet URI is <tt><xsl:value-of select="ext:uri(document(''))"/></tt></p> </xsl:template> </xsl:stylesheet>The result (try http://www.tkachenko.com/samples/detect-uri.xml) is:
PS. Of course extension functions are not portable and above works only in IE/MSXML3+.
PPS. Of course it only works well when XML documents are loaded from a URI, not generated on the fly.
PPPS. XPath 2.0 will fix the problem providing fn:document-uri and fn:base-uri() functions.
Let's say we've got the following XML:
<a> <b> <d/> <e/> </b> <c> <f/> <g/> </c> </a>Easy to see that being traversed in a breadth-first way, the sequence of visited nodes would be a, b, c, d, e, f, g. How?
Strightforward declarative solution is to traverse nodes level by level - just select all nodes at a level i, then i+1 etc. till maximum depth level is reached:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <xsl:call-template name="BFT"> <xsl:with-param name="tree" select="/descendant::*"/> </xsl:call-template> </xsl:template> <!-- Breadth-first traversal template --> <xsl:template name="BFT"> <xsl:param name="tree" select="/.."/> <xsl:param name="depth" select="0"/> <xsl:variable name="nodes-at-this-depth" select="$tree[count(ancestor::*)=$depth]"/> <xsl:apply-templates select="$nodes-at-this-depth"/> <xsl:if test="count($nodes-at-this-depth)>0"> <xsl:call-template name="BFT"> <xsl:with-param name="tree" select="$tree"/> <xsl:with-param name="depth" select="$depth + 1"/> </xsl:call-template> </xsl:if> </xsl:template> <!-- Actual node processing --> <xsl:template match="*"> <xsl:value-of select="name()"/> </xsl:template> </xsl:stylesheet>Not so effective, but anyway. For each depth level we need to count all ancestors of each node in the list. Looks like worst case running time of the above implementation is O(maxDepth*n*maxDepth) = O(maxDepth2*n), where n is the number of nodes.
Using keys it can be improved to O(maxDepth*n + maxDepth*O(1)) = O(maxDepth*(n+1)), where maxDepth*n is an indexing price and maxDepth*O(1) is running time of retrieving nodes from the index by depth level value (provided keys implementation is based on a hashtable):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:key name="depthMap" match="*" use="count(ancestor::*)"/> <xsl:template match="/"> <xsl:call-template name="BFT"/> </xsl:template> <!-- Breadth-first traversal template --> <xsl:template name="BFT"> <xsl:param name="depth" select="0"/> <xsl:variable name="nodes-at-this-depth" select="key('depthMap', $depth)"/> <xsl:apply-templates select="$nodes-at-this-depth"/> <xsl:if test="count($nodes-at-this-depth)>0"> <xsl:call-template name="BFT"> <xsl:with-param name="depth" select="$depth+1"/> </xsl:call-template> </xsl:if> </xsl:template> <!-- Actual node processing --> <xsl:template match="*"> <xsl:value-of select="name()"/> </xsl:template> </xsl:stylesheet>
O(maxDepth*n) is definitely better than O(maxDepth2*n), but still worse than procedural O(n). In the worst case (deep lean tree) it gets even O(n2). But on average XML trees, which are usually wide and rather shallow, it's close to procedural algorithm's running time.
More ideas?
One could ask - btw, can ASCII encoded XML document to contain arbitrary Unicode characters? While it sounds a bit contradictory, the answer is of course - sure it can, because encoding of an XML document (that one which you can see in XML declaration) is sort of "transfer encoding", while character range for any XML document is always the whole set of legal characters of Unicode and ISO/IEC 10646 (more strict definition). XML syntax allows any (but legal in XML) character to be written as numeric character reference, e.g. ב (Hebrew letter BET). So the solution for the given problem of narrowing the encoding is to encode all characters that don't fit into target encoding as numeric character references. Basically that's what XSLT 1.0 spec requires from XSLT processors:
It is possible that the result tree will contain a character that cannot be represented in the encoding that the XSLT processor is using for output. In this case, if the character occurs in a context where XML recognizes character references (i.e. in the value of an attribute node or text node), then the character should be output as a character reference; otherwise (for example if the character occurs in the name of an element) the XSLT processor should signal an error.
Well, unfortunately native .NET 1.X XSLT processor - XslTransform class doesn't support that yet. So let's see how we can get this done. The first solution that comes in my mind is simple custom XmlWriter, which filters output text and encodes all non ASCII characters as numeric character references. Just like SAX filter for those SAX minded. Here is the implementation:
public sealed class ASCIIXmlTextWriter : XmlTextWriter { //Constructors - add more as needed public ASCIIXmlTextWriter(string url) : base(url, Encoding.ASCII) {} public override void WriteString(string text) { StringBuilder sb = new StringBuilder(text.Length); foreach (char c in text) { if (c > 0x0080) { sb.Append(""); sb.Append((int)c); sb.Append(';'); } else { sb.Append(c); } } base.WriteRaw(sb.ToString()); } }Let's test it. Source XML:
<?xml version="1.0" encoding="UTF-8"?> <message>English, Русский (Russian), עברית (Hebrew)</message>Dummy stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <xsl:copy-of select="message"/> </xsl:template> </xsl:stylesheet>The code:
XPathDocument doc = new XPathDocument("foo.xml"); XslTransform xslt = new XslTransform(); xslt.Load("foo.xsl"); ASCIIXmlTextWriter writer = new ASCIIXmlTextWriter("out.xml"); xslt.Transform(doc, null, writer, null); writer.Close();And here is the result (note that all non ASCII charcters have been written as numeric char references):
<message>English, Русский (Russian), עברית (Hebrew)</message>Which looks fine both in IE and Mozilla:
Above simple solution handles equally well both text and attribute values, but not comments and PIs (but that's easy as well). It's neither optimized nor tested, but you've got the idea.