Generating Word documents using XSLT


The world is getting better. And Word too! Word 2003 Beta 2 now understands not only those *.doc files, but XML as well. Everything is as it should be in the open XML world (which makes some people suspicious): there is a WordML vocabulary, and its schema (a well-documented one, btw) is available as part of the Microsoft Word XML Content Development Kit Beta 2. Given that, it's natural to assume that Word documents can now be queried using XPath or XQuery, as well as transformed and generated using XSLT. Isn't that fantastic?

So here is a "Hello Word!" XSLT stylesheet, which generates a minimal but still valid Word 2003 document:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
        <w:wordDocument
            xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
            <w:body>
                <w:p>
                    <w:r>
                        <w:t>Hello Word!</w:t>
                    </w:r>
                </w:p>
            </w:body>
        </w:wordDocument>
    </xsl:template>
</xsl:stylesheet>
That <?mso-application progid="Word.Document"?> processing instruction is an important one: that's how Windows recognizes an XML document as a Word document. It seems that only the XML document prolog is parsed when looking for this PI. Good idea, I think.
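The same PI is easy to look for from XSLT as well. What follows is only a sketch of such a check, and says nothing about how Windows itself does the detection: it assumes the input is any XML file and reports whether an mso-application PI appears before the document element. The `following-sibling::*` predicate is my own addition, there to ignore a PI that comes after the document element.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:choose>
            <!-- PIs in the prolog are children of the root node;
                 the predicate keeps only those placed before the document element -->
            <xsl:when test="processing-instruction('mso-application')[following-sibling::*]">
                <xsl:text>mso-application: </xsl:text>
                <xsl:value-of select="processing-instruction('mso-application')[following-sibling::*]"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>no mso-application PI in the prolog</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Run against the output of the "Hello Word!" stylesheet, this should report the PI's content, progid="Word.Document".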

Now let's try something more interesting: transforming an XML document into a formatted Word document containing a heading, italic text and a link. Consider the following source document:

<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<chapter title="XSLT Programming">
    <para>It's <i>very</i> simple. Just ask <link url="http://google.com">Google</link>.</para>
</chapter>

Then the XSLT stylesheet (quite a big one, due to the verbose element-based WordML syntax):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
    <xsl:template match="/">
        <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
        <w:wordDocument>
            <xsl:apply-templates/>
        </w:wordDocument>
    </xsl:template>
    <xsl:template match="chapter">
        <o:DocumentProperties>
            <o:Title>
                <xsl:value-of select="@title"/>
            </o:Title>
        </o:DocumentProperties>
        <w:styles>
            <w:style w:type="paragraph" w:styleId="Heading3">
                <w:name w:val="heading 3"/>
                <w:pPr>
                    <w:pStyle w:val="Heading3"/>
                    <w:keepNext/>
                    <w:spacing w:before="240" w:after="60"/>
                    <w:outlineLvl w:val="2"/>
                </w:pPr>
                <w:rPr>
                    <w:rFonts w:ascii="Arial" w:h-ansi="Arial"/>
                    <w:b/>
                    <w:sz w:val="26"/>
                </w:rPr>
            </w:style>
            <w:style w:type="character" w:styleId="Hyperlink">
                <w:rPr>
                    <w:color w:val="0000FF"/>
                    <w:u w:val="single"/>
                </w:rPr>
            </w:style>
        </w:styles>
        <w:body>
            <w:p>
                <w:pPr>
                    <w:pStyle w:val="Heading3"/>
                </w:pPr>
                <w:r>
                    <w:t>
                        <xsl:value-of select="@title"/>
                    </w:t>
                </w:r>
            </w:p>
            <xsl:apply-templates/>
        </w:body>
    </xsl:template>
    <xsl:template match="para">
        <w:p>
            <xsl:apply-templates/>
        </w:p>
    </xsl:template>
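    <xsl:template match="i">
        <w:r>
            <w:rPr>
                <w:i/>
            </w:rPr>
            <!-- Emit w:t directly: applying templates here would hit the
                 text() template and nest a second w:r inside this run -->
            <w:t xml:space="preserve"><xsl:value-of select="."/></w:t>
        </w:r>
    </xsl:template>
    <xsl:template match="text()">
        <w:r>
            <w:t xml:space="preserve"><xsl:value-of select="."/></w:t>
        </w:r>
    </xsl:template>
    <xsl:template match="link">
        <w:hlink w:dest="{@url}">
            <w:r>
                <w:rPr>
                    <w:rStyle w:val="Hyperlink"/>
                    <w:i/>
                </w:rPr>
                <!-- Same reason as in the i template: one run per link -->
                <w:t xml:space="preserve"><xsl:value-of select="."/></w:t>
            </w:r>
        </w:hlink>
    </xsl:template>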
</xsl:stylesheet>
And here is the resulting WordML document, opened in Word 2003:

[Image: Generated Word Document]

Not bad.
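The claim from the beginning of the post, that WordML can be queried with XPath, works in the other direction too. Here is a minimal sketch that pulls plain text back out of a WordML document, one line per paragraph. It assumes the document uses the same Beta 2 namespace URI as the stylesheets above (the xmlns:w value has to match whatever URI the document really declares), and it deliberately ignores styles, properties and anything that is not a w:t inside a w:p.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Visit every paragraph in the document body -->
        <xsl:for-each select="//w:body//w:p">
            <!-- Concatenate the text of all runs in this paragraph -->
            <xsl:for-each select=".//w:t">
                <xsl:value-of select="."/>
            </xsl:for-each>
            <!-- End each paragraph with a line break -->
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the document generated above, this should give the chapter title on the first line and the paragraph text on the second.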


8 TrackBacks

TrackBack URL: http://www.tkachenko.com/cgi-bin/mt-tb.cgi/27

Great link on using XSLT to make word documents. "It seems to be one of the holy grails, producing a...

Todays links from InsultConsult on May 7, 2003 4:13 PM

Oh joy, even more links: Disassembling Java Classes negative(): minus minus is plus. (Nice compilation.) Java, XML, and Databases. Digitally Imported New website and more channels (streaming MP3). GoogleRSS (Who knows what this actually does...) Guido...

Oleg Tkachenko: Generating Word documents using XSLT

Even though the depressed journalist hasn't answered my musings about James Hetfield's sexual orientation or the views of Maus's lead singer, it must be said that the internet can be really neat. Just now, while browsing, I came across Oleg Tkachenko's blog...

(SOURCE: "donb") Hmmm, maybe WordML is not as closed as people claim.

RE: Let's talk t, p, and r from John R. Durant's WebLog on February 11, 2004 6:47 PM

http://weblogs.asp.net/johnrdurant/archive/2004/02/11/71378.aspx

More on Word and XML from Steven's [Mostly] Tech Notebook on July 9, 2004 5:02 PM

Via Don Box, Oleg Tkachenko's Generating Word documents using XSLT.

re: Generating Word documents with XML and XSLT from B# .NET Blog on September 4, 2004 6:29 PM

http://blogs.bartdesmet.net/bart/archive/2004/09/04/384.aspx

15 Comments

Ok, I'm closing comments on this page due to severe spamming.

Interesting to see Microsoft playing catch-up. The open-source Office alternative OpenOffice.org (http://www.openoffice.org) is based on XML and has been around for years.

Nelson, you need something like /contract/sections/section[@number='section1']/sectionTerm[ @termid='term1']/term

Hello,
Does anybody know how to get a child node that has a given attribute using the selectSingleNode method?

I'm trying to get the "sectionTerm" node with attribute termid="term1" under the section with attribute number="section1" from the following
XML file (I have to use [ in place of < because the tag names won't show if I use <):


......
[contract][sections]
[section number="section1"]
[sectionTerm termid = "term1"]
[term]Hello[/term]
[/sectionTerm]
[sectionTerm termid = "term2"]
[term]Goodbye[/term]
[/sectionTerm]
[/section]

[section number="section2"]
[sectionTerm termid = "term1"]
[term]Hello[/term]
[/sectionTerm]
[sectionTerm termid = "term2"]
[term]Goodbye[/term]
[/sectionTerm]
[/section]
[/sections]
[/contract]
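For what it's worth, the path suggested in the reply can be tried out in a tiny XSLT stylesheet. This is only a sketch, under the assumption that the square brackets are turned back into angle brackets and that contract is the document element (the "......" in the question hides whatever comes before it, so that may not hold). The same expression string is what would be handed to selectSingleNode.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Both predicates filter by attribute value:
             first the section, then the sectionTerm inside it -->
        <xsl:value-of select="/contract/sections/section[@number='section1']/sectionTerm[@termid='term1']/term"/>
    </xsl:template>
</xsl:stylesheet>
```

With the sample data from the question, the selected term is the one containing "Hello". To get the sectionTerm node itself rather than its term child, drop the trailing /term step.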

I'm a newbie in WordML, how do you handle images?

This is pretty interesting. I agree with the author.

Cris, afaik Word 2003 holds images embedded within the WordML document, obviously Base64-encoded. It's the w:pict element; take a look at the WordML schema. So it also seems to be quite feasible.

Yeah, sure, I've been thinking about XSL-FO2WordML and WordML2XSL-FO, but I'm still in the research phase. While I know XSL-FO well, I'm a newbie in WordML.
But that really sounds tempting...

Using XSL-FO as a unified formatting language for documents: can any WordML be transformed to FO, and can any FO be transformed to WordML? In other words, is there an equivalence (semantic or functional, whatever that means in formal terms, I am not 100% sure) between the two formatting languages?

I don't know. Have you thought about that already? I think a definite answer requires some time-consuming research...

How do you handle images, and how do you make them local so users can edit them and see them when an internet connection is not available?

Very Cool,

I will wait until more tools are available!

Thanks for the info, Oleg.

Hans Braumüller
-- + --
Mail Art Networking Visual & Virtual Poet
http://braumueller.crosses.net

Oh, Goggle, funny typo, thanks, fixed. Btw, the goggle.com site does exist, but I don't advise browsing it due to nasty spam popup windows.
As for Word, I'm impressed by these new possibilities too. Let's just wait for the release and for people to upgrade.

Wow. I wish I understood that! It seems to be one of the holy grails, producing a valid word document *without* using word :-)

You do know you wrote Goggle, right?