January 9, 2006

WordML2HTML stylesheet with image support updated

Almost two years ago I published a post, "Transforming WordML to HTML: Support for Images", showing how to hack Microsoft's WordML2HTML stylesheet to support images. People kept telling me that it doesn't support some weird image formats or images in headers. Moreover, I realized it had a bug and didn't work with ...

The starting point is a Word 2003 document with images in both the body and the header.

Magic XSLT transformation:

nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html
produces test.html and a directory containing decoded images.
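For readers curious what "decoding images" means here, the idea can be sketched outside XSLT. This is not the stylesheet's code, only a minimal Python illustration. It assumes that WordML stores each picture as base64 text in a w:binData element, that the element's w:name attribute looks like "wordml://01000001.gif", and that header images are stored the same way as body images. The function name extract_images and the fallback file name are hypothetical.

```python
import base64
import os
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_images(wordml_path, out_dir):
    """Decode every w:binData element of a WordML file into out_dir.

    Returns the list of file paths written. Hypothetical helper, not
    part of the WordML2HTML stylesheet.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    tree = ET.parse(wordml_path)
    # iter() walks the whole document, so images nested in headers
    # are found along with those in the body.
    for i, node in enumerate(tree.iter("{%s}binData" % W_NS)):
        name = node.get("{%s}name" % W_NS, "")
        # Assumed form: "wordml://01000001.gif"; keep only the file name.
        filename = os.path.basename(name.replace("wordml://", ""))
        if not filename:
            filename = "image%d.bin" % i  # fallback name, an assumption
        # b64decode ignores the line breaks Word puts inside the payload.
        data = base64.b64decode(node.text or "")
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)
    return written
```

Calling extract_images("test.xml", "test_files") would write one file per embedded image into test_files (a directory name chosen here only for illustration). The HTML output then only needs to point its img elements at those files.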

Download the stylesheet at the XML Lab downloads page. Any comments are welcome.
