May 4, 2003
Generating Word documents using XSLT
The world is getting better. And so is Word! Word 2003 Beta 2 now understands not only those *.doc files, but XML as well. It's all as it should be in the open XML world (which makes some people suspicious): there is the WordML vocabulary, and its schema (a well-documented one, btw) is available as part of the Microsoft Word XML Content Development Kit Beta 2. Having said that, it's natural to go on and assume that Word documents can now be queried using XPath or XQuery, as well as transformed and generated using XSLT. Isn't it fantastic?
So here is a "Hello Word!" XSLT stylesheet, which generates a minimal, yet still valid, Word 2003 document:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:processing-instruction
        name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument
        xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
      <w:body>
        <w:p>
          <w:r>
            <w:t>Hello Word!</w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:wordDocument>
  </xsl:template>
</xsl:stylesheet>
That <?mso-application progid="Word.Document"?> processing instruction is an important one: that's how Windows recognizes an XML document as a Word document. It seems they parse only the XML document prolog looking for this PI. Good idea, I think.
Now let's try something more interesting: transforming an XML document into a formatted Word document containing a heading, italic text and a link. Consider the following source doc:
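For reference, here is roughly what the "Hello Word!" stylesheet should produce, with the PI sitting in the prolog before the root element. This is a sketch of the expected serialization, not captured output: whether an XML declaration is written, which encoding it names, and how the result is indented all depend on the XSLT processor and its output settings (the indentation here is added for readability).

```xml
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello Word!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
```

Since the template matches the document root and ignores its content, any well-formed XML file can serve as the transformation input.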
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<chapter title="XSLT Programming">
  <para>It's <i>very</i> simple. Just ask <link
      url="http://google.com">Google</link>.</para>
</chapter>
Then the XSLT stylesheet (quite a big one, due to the verbose element-based WordML syntax):
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <xsl:template match="/">
    <xsl:processing-instruction
        name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument>
      <xsl:apply-templates/>
    </w:wordDocument>
  </xsl:template>
  <xsl:template match="chapter">
    <o:DocumentProperties>
      <o:Title>
        <xsl:value-of select="@title"/>
      </o:Title>
    </o:DocumentProperties>
    <w:styles>
      <w:style w:type="paragraph" w:styleId="Heading3">
        <w:name w:val="heading 3"/>
        <w:pPr>
          <w:pStyle w:val="Heading3"/>
          <w:keepNext/>
          <w:spacing w:before="240" w:after="60"/>
          <w:outlineLvl w:val="2"/>
        </w:pPr>
        <w:rPr>
          <w:rFonts w:ascii="Arial" w:h-ansi="Arial"/>
          <w:b/>
          <w:sz w:val="26"/>
        </w:rPr>
      </w:style>
      <w:style w:type="character" w:styleId="Hyperlink">
        <w:rPr>
          <w:color w:val="0000FF"/>
          <w:u w:val="single"/>
        </w:rPr>
      </w:style>
    </w:styles>
    <w:body>
      <w:p>
        <w:pPr>
          <w:pStyle w:val="Heading3"/>
        </w:pPr>
        <w:r>
          <w:t>
            <xsl:value-of select="@title"/>
          </w:t>
        </w:r>
      </w:p>
      <xsl:apply-templates/>
    </w:body>
  </xsl:template>
  <xsl:template match="para">
    <w:p>
      <xsl:apply-templates/>
    </w:p>
  </xsl:template>
  <xsl:template match="i">
    <w:r>
      <w:rPr>
        <w:i/>
      </w:rPr>
      <xsl:apply-templates/>
    </w:r>
  </xsl:template>
  <xsl:template match="text()">
    <w:r>
      <w:t xml:space="preserve"><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>
  <xsl:template match="link">
    <w:hlink w:dest="{@url}">
      <w:r>
        <w:rPr>
          <w:rStyle w:val="Hyperlink"/>
          <w:i/>
        </w:rPr>
        <xsl:apply-templates/>
      </w:r>
    </w:hlink>
  </xsl:template>
</xsl:stylesheet>
And the resulting WordML document, opened in Word 2003:
Not bad.
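Since the result is plain XML, it can also be read back with XSLT and XPath. Below is a minimal sketch that goes the other way and dumps the text of a WordML document, one line per paragraph. It assumes the document uses the same Beta 2 namespace URI as the stylesheets in this post (a document with a different WordML namespace would need the `xmlns:w` declaration changed), and it ignores everything except `w:t` text.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <!-- Plain text output: no markup, no XML declaration -->
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Visit every paragraph anywhere under the document body -->
    <xsl:for-each select="w:wordDocument/w:body//w:p">
      <!-- Concatenate all text in the paragraph, including text
           inside hyperlinks (w:hlink) -->
      <xsl:for-each select=".//w:t">
        <xsl:value-of select="."/>
      </xsl:for-each>
      <!-- End each paragraph with a line feed -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The descendant step (`//w:p`) is a deliberate choice here: it still finds paragraphs that are nested deeper than direct children of `w:body`, for example inside table cells.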
Comments
Ok, I'm closing comments on this page due to severe spamming.
Posted by: Oleg Tkachenko at March 1, 2004 11:13 AM

Interesting to see Microsoft playing catchup. Open Source Office alternative OpenOffice.org http://www.openoffice.org is based on xml and has been around for years.
Posted by: Jez Nicholson at January 23, 2004 6:31 PM

Thanks ! Good work :)
Posted by: Kristopher Gora at December 26, 2003 10:02 AM

Nelson, you need something like /contract/sections/section[@number='section1']/sectionTerm[ @termid='term1']/term
Posted by: Oleg Tkachenko at December 1, 2003 3:10 PM

med, see "Generating images in WordprocessingML" at http://www.tkachenko.com/blog/archives/000106.html
Posted by: Oleg Tkachenko at December 1, 2003 3:05 PM

Hello, I try to get node "sectionTerm" with attribute termid = "term1" under section which has attribute number="section1" from following
[section number="section2"]

I'm a newbie in WordML, how do you handle images?
Posted by: med at November 30, 2003 9:18 PM

This is pretty interesting. I agree with the author.
Posted by: dns at October 12, 2003 2:02 PM

Cris, afaik Word 2003 holds images embedded within WordML document, obviously Base64 encoded. It's w:pict element, take a look into WordML schema. So it also seems to be quite feasible.
Posted by: Oleg Tkachenko at July 13, 2003 7:58 PM

Yeah, sure I've been thinking about XSL-FO2WordML and WordML2XSL-FO, but I'm still in research phase. While I know XSL-FO well, I'm newbie in WordML.

using XSL:FO as unified formatting language for documents, can any WordML be transformed to FO and can any FO be transformed to WordNL, in other words, is there (semantic, or functional, whatever that means in formal terms, I am not 100% sure) equivalence between two formatting languages? I don't know that, did you think of that already? I think definite answer requires some time consuming research...
Posted by: viktor gritsenko at July 13, 2003 7:05 PM

how do you handle images and making them local images so users can edit images and see them if internet connection is not available.
Posted by: cris at July 10, 2003 8:53 PM

Very Cool, i will wait until more tools are avaible! Thanks for the info, Oleg.
Hans Braumller

Oh, Goggle, funny typo, thanks, fixed. btw, goggle.com site does exist, but I don't advise to browse it due to nasty spam popup windows.

Wow. I wish I understood that! It seems to be one of the holy grails, producing a valid word document *without* using word :-) You do know you wrote Goggle, right?
Posted by: Dan F at May 5, 2003 1:25 PM

Comments on this post are closed, sorry...

Listed below are links to weblogs that reference this post:
Generating Word documents using XSLT from Brad's Blog
Todays links from InsultConsult
Generating Word documents using XSLT from Liudvikas Bukys
XML - Interneti - Mail Art from zzzzzzzzzzzzzzzz
Signs on the Sand: Generating Word documents using XSLT from Roland Tanglao's Weblog
RE: Let's talk t, p, and r from John R. Durant's WebLog
More on Word and XML from Steven's [Mostly] Tech Notebook
re: Generating Word documents with XML and XSLT from B# .NET Blog