July 12, 2007

Producing XHTML using XSLT in .NET

Producing XHTML with an XSLT 1.0 processor is tough. No wonder: XSLT 1.0 is so old that it was published even before XHTML 1.0. While XHTML is just XML, the XHTML 1.0 spec defines (in its Appendix C) a set of very specific formatting rules called "HTML Compatibility Guidelines". Their goal is to make XHTML render correctly in HTML browsers (such as Internet Explorer :).

The guidelines say, for instance, that elements with a non-empty content model (such as <p>) must never be serialized in minimized form (<p />), while elements with an empty content model (such as <br>) must never be serialized in full form (<br></br>). The minimized form should also have a space before the slash (<br /> rather than <br/>).

XML doesn't care about such nonsense, but HTML browsers might get confused, so XHTML generation has to be smart about it. XSLT 1.0 processors, however, can only output text, HTML or XML (XSLT 2.0 processors add an xhtml output method), and the xml output method knows nothing about XHTML content models. That's why generating XHTML with an XSLT 1.0 processor is tough.
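The content-model rules above boil down to knowing which elements the XHTML 1.0 DTDs declare EMPTY. Here is a minimal C# sketch of that decision for an element that has no content. The XhtmlRules class and its methods are hypothetical illustrations (not part of Mvp.Xml), and the sketch assumes lowercase local names already known to be in the XHTML namespace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not Mvp.Xml API): encodes the Appendix C rules
// for serializing an XHTML element that has no content.
static class XhtmlRules
{
    // Elements declared EMPTY in the XHTML 1.0 DTDs.
    // basefont and isindex are Transitional-only; frame is Frameset-only.
    static readonly HashSet<string> EmptyElements = new HashSet<string>
    {
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"
    };

    // True if the element's content model is EMPTY.
    public static bool IsEmptyElement(string localName) =>
        EmptyElements.Contains(localName);

    // Minimized form (with a space before the slash) for EMPTY elements,
    // full start/end tag pair for everything else.
    public static string SerializeEmpty(string localName) =>
        IsEmptyElement(localName)
            ? "<" + localName + " />"
            : "<" + localName + "></" + localName + ">";
}

class Demo
{
    static void Main()
    {
        foreach (string name in new[] { "p", "title", "br", "hr" })
            Console.WriteLine(XhtmlRules.SerializeEmpty(name));
    }
}
```

Running it prints <p></p>, <title></title>, <br /> and <hr />, one per line.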

I implemented a simple solution to the problem in version 2.3 of the Mvp.Xml library. Here is a sample that says it all:

XSLT stylesheet (XSLTFile1.xslt):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" 
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title/>
      </head>
      <body>
        <p>Para element must have end tag even if empty:</p>
        <p/>
        <p>These elements must not have end tags:</p>
        <p>
          <br></br>
          <hr></hr>
          <img src="foo.jpg" alt="bar"></img>
        </p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

The code:

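using System;
using System.Xml.XPath;
using System.IO;
using Mvp.Xml.Common.Xsl;

class Program
{
  static void Main(string[] args)
  {
    // The stylesheet ignores its input, so any well-formed placeholder
    // document will do (an empty string is not well-formed XML).
    XPathDocument doc = new XPathDocument(
      new StringReader("<dummy/>"));
    MvpXslTransform xslt = new MvpXslTransform();
    // Relative to bin\Debug, assuming a default Visual Studio project.
    xslt.Load("../../XSLTFile1.xslt");
    // Serialize according to the XHTML HTML Compatibility Guidelines.
    xslt.EnforceXHTMLOutput = true;
    xslt.Transform(new XmlInput(doc), null,
      new XmlOutput(Console.Out));
  }
}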

The result (the encoding in the XML declaration is simply that of the console, DOS-862 here):

<?xml version="1.0" encoding="DOS-862"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
    <p>Para element must have end tag even if empty:</p>
    <p></p>
    <p>These elements must not have end tags:</p>
    <p>
      <br />
      <hr />
      <img src="foo.jpg" alt="bar" />
    </p>
  </body>
</html>

If for some weird reason you don't want to use the MvpXslTransform class, you can stay with XslCompiledTransform and just write the output via the XhtmlWriter class:

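using System;
using System.Xml.XPath;
using System.Xml.Xsl;
using System.Xml;
using System.IO;
using Mvp.Xml.Common;

class Program
{
  static void Main(string[] args)
  {
    // Any well-formed placeholder input works; the stylesheet ignores it.
    XPathDocument doc = new XPathDocument(
      new StringReader("<dummy/>"));
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load("../../XSLTFile1.xslt");
    // XhtmlWriter wraps a regular XmlWriter configured with the
    // stylesheet's xsl:output settings; disposing it flushes the output.
    using (XmlWriter writer = new XhtmlWriter(
      XmlWriter.Create(Console.Out, xslt.OutputSettings)))
    {
      xslt.Transform(doc, null, writer);
    }
  }
}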
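What a wrapping writer such as XhtmlWriter has to decide can be sketched without Mvp.Xml. The following is a hypothetical, minimal C# re-serializer (not the library's implementation; XhtmlCopy, Copy and ToXhtml are illustrative names). It copies an XmlReader into an XmlWriter and, for each element in the XHTML namespace, calls WriteEndElement (which minimizes an element with no content) for EMPTY elements and WriteFullEndElement (which always writes an end tag) for everything else. It assumes input without a DOCTYPE, since the default XmlReaderSettings prohibit DTDs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical sketch, not Mvp.Xml's XhtmlWriter.
static class XhtmlCopy
{
    const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    // Elements declared EMPTY in the XHTML 1.0 DTDs.
    static readonly HashSet<string> EmptyElements = new HashSet<string>
    {
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"
    };

    static void CloseElement(XmlWriter writer, string localName, string ns)
    {
        if (ns == XhtmlNs && EmptyElements.Contains(localName))
            writer.WriteEndElement();     // "<br />" if nothing was written inside
        else
            writer.WriteFullEndElement(); // always "<p></p>", never "<p />"
    }

    public static void Copy(XmlReader reader, XmlWriter writer)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    string name = reader.LocalName, ns = reader.NamespaceURI;
                    bool isEmpty = reader.IsEmptyElement; // e.g. <p/> in the input
                    writer.WriteStartElement(reader.Prefix, name, ns);
                    writer.WriteAttributes(reader, true);  // returns reader to the element
                    if (isEmpty) CloseElement(writer, name, ns);
                    break;
                case XmlNodeType.EndElement:
                    CloseElement(writer, reader.LocalName, reader.NamespaceURI);
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                // XML declarations and processing instructions are skipped.
            }
        }
    }

    public static string ToXhtml(string xml)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            Copy(reader, writer);
        } // disposing the writer flushes it into the StringWriter
        return output.ToString();
    }
}

class Demo
{
    static void Main()
    {
        string input = "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
                       "<p/><br></br></html>";
        Console.WriteLine(XhtmlCopy.ToXhtml(input));
    }
}
```

For that input the sketch returns <html xmlns="http://www.w3.org/1999/xhtml"><p></p><br /></html>. Elements outside the XHTML namespace always get a full end tag, which is a deliberate simplification of this sketch.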
...