January 9, 2006

WordML2HTML with support for images stylesheet updated

Almost two years ago I published a post, "Transforming WordML to HTML: Support for Images", showing how to hack the Microsoft WordML2HTML stylesheet to support images. People kept telling me it didn't support some weird image formats or header images. Moreover, I realized it had a bug and didn't work with .NET 2.0. So I finally updated that damn stylesheet. This time I took another Microsoft WordML2HTML stylesheet as a base: the one that comes with the Word 2003 XML Viewer tool. I think it's a better one. Anyway, I added a couple of templates to it, so images now get decoded and saved externally, and headers and footers are processed too (only the odd-page header and footer of each section, to be precise).

Note: this stylesheet uses an embedded C# script to decode images, so it only works with .NET XSLT processors such as XslTransform (.NET 1.1) or XslCompiledTransform (.NET 2.0). You can also run it with the nxslt/nxslt2 command line tool. Here is a small demo.

The source Word 2003 document, with images in the body and header:

Magic XSLT transformation:

nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html
produces test.html and a directory containing decoded images:

Download the stylesheet at the XML Lab downloads page. Any comments are welcome.
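
For anyone curious what "decoded and saved externally" amounts to, here is a rough sketch of the same idea in Python instead of the stylesheet's embedded C#. It is not part of the stylesheet and not how the stylesheet is implemented; it only assumes that images live in w:binData elements as base64 text with a w:name attribute of the form "wordml://filename". The function name extract_images is made up for this illustration.

```python
import base64
import os
import xml.etree.ElementTree as ET

# Namespace bound to the "w" prefix in Word 2003 XML (WordML) documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_images(wordml_text, out_dir):
    """Decode every w:binData element in a WordML string and save it in out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    root = ET.fromstring(wordml_text)
    written = []
    for bindata in root.iter("{%s}binData" % W_NS):
        # w:name is assumed to look like "wordml://01000001.png".
        name = bindata.get("{%s}name" % W_NS, "")
        _, sep, filename = name.partition("wordml://")
        # basename() keeps a crafted name from escaping out_dir.
        filename = os.path.basename(filename)
        if not sep or not filename:
            continue  # no usable file name, skip this element
        os.makedirs(out_dir, exist_ok=True)
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as f:
            # b64decode ignores the line breaks Word puts inside the data.
            f.write(base64.b64decode(bindata.text or ""))
        written.append(path)
    return written
```

Calling extract_images(open("test.xml", encoding="utf-8").read(), "test_files") would write one file per embedded image; the HTML itself still has to come from the XSLT transformation.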

January 9, 2006 6:43 PM | #Office , #XML , #XSLT
Comments

<msxsl:script language="c#" implements-prefix="ext">
  // Decodes the first base64 node in bindata, saves it as dirname/filename
  // and returns the relative path for img/@src ("" if there is no data).
  public string decodePicture(XPathNodeIterator bindata, string dirname, string filename) {
    if (bindata.MoveNext()) {
      System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(dirname);
      if (!di.Exists)
        di.Create();
      using (System.IO.FileStream fs =
          System.IO.File.Create(System.IO.Path.Combine(di.FullName, filename))) {
        byte[] data = Convert.FromBase64String(bindata.Current.Value);
        fs.Write(data, 0, data.Length);
      }
      return dirname + "/" + filename;
    }
    else
      return "";
  }
</msxsl:script>
<xsl:template match="w:pict">
  <!-- Directory for decoded images: document name (or title prefix) + "_files" -->
  <xsl:variable name="dir">
    <xsl:choose>
      <xsl:when test="$docName != ''">
        <xsl:value-of select="$docName"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- We need something unique instead of document name -->
        <!-- Let's take first 10 characters of title -->
        <xsl:value-of select="translate(substring($p.docInfo/o:Title, 1, 10), ' ', '')"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>_files</xsl:text>
  </xsl:variable>
  <img
    src="{ext:decodePicture(w:binData, $dir, substring-after(w:binData/@w:name, 'wordml://'))}"
    alt="{v:shape/v:imagedata/@o:title}" style="{v:shape/@style}"
    title="{v:shape/v:imagedata/@o:title}"/>
</xsl:template>
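
The naming rules in that template can be spelled out as a small Python sketch. This only mirrors the XPath expressions shown above (translate/substring for the directory, substring-after for the file name); the function names images_dir and image_filename are hypothetical and do not exist in the stylesheet.

```python
def images_dir(doc_name, title):
    """Mirror the xsl:choose above: prefer doc_name, else a slice of the title."""
    if doc_name:
        base = doc_name
    else:
        # translate(substring(title, 1, 10), ' ', ''):
        # take the first 10 characters, then drop the spaces.
        base = title[:10].replace(" ", "")
    return base + "_files"


def image_filename(bindata_name):
    """Mirror substring-after(@w:name, 'wordml://'): '' if the prefix is absent."""
    _, sep, rest = bindata_name.partition("wordml://")
    return rest if sep else ""


print(images_dir("test", "ignored"))            # directory when docName is set
print(images_dir("", "My Word Document"))       # fallback built from the title
print(image_filename("wordml://01000001.png"))  # file name taken from w:name
```

One consequence of the fallback: two documents without a docName whose titles share the same first ten characters would end up with the same image directory.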

Posted by: mehmet at May 11, 2008 2:24 PM

When I try to use the transform with the linked xml file, I get this exception:

"Attribute and namespace nodes cannot be added to the parent element after a text, comment, pi, or sub-element node has already been added."

Does anyone have any idea why I would get this exception? I just opened my sample Word doc, saved it as XML, and tried to open it in the sample website. The XML file is here:

http://www.sendspace.com/file/045o89

And the full exception is here:

http://pastebin.com/m7fac7065

Posted by: Seth at May 1, 2008 6:46 PM

It should be able to support .wmf files...

Posted by: DReTeN at December 12, 2007 9:41 PM

Biju, can you send me a sample WordML file?

Posted by: Oleg Tkachenko at September 29, 2007 4:22 PM

Hi,
If my WordML document has blank lines, they are suppressed after converting it with the following command:
"nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html"
and so the formatting is not correct. You can see this even in your example above.

The line between "Header text and image" and the image has been suppressed after the conversion. Any help or insight on this would be much appreciated.

Posted by: Biju at September 27, 2007 7:14 AM

Thanks for your great job!
I have a question.
Does the new script (WordML2HTML XSLT stylesheet, v1.3-.NET-script) you updated in 2006 support .wmf images?
I found that when I use Word 2003 to insert a Microsoft Math formula object, save the document as .xml, successfully transform it to .html, and open it in IE, a problem appears: the formula isn't shown, only an error cross, while the other images, including header images, are shown correctly.

Posted by: lifeboat at September 16, 2007 11:20 AM

I am wondering the same thing about text boxes. Could you please post a reply?

Posted by: Orhan at September 7, 2007 1:15 AM

Sorry, busy. I'll look at this problem this weekend.

Posted by: Oleg Tkachenko at June 28, 2007 11:05 PM

Hi, in my last post I gave you the link to download the sample code. Did you get a chance to look at it?
Is there any possibility of help from your side on this?
Please reply.
P.S. If you need any information from my side, please let me know.

Posted by: Alok Tayal at June 28, 2007 7:50 AM

Alok, can you provide a minimal sample?

Posted by: Oleg Tkachenko at June 7, 2007 4:42 PM

If my WordML document has blank lines, they are suppressed after converting it with the following command:
"nxslt2 test.xml wordml2html-.NET-script.xslt -o test.html"
and so the formatting is not correct. You can see this even in your example above.

The line between "Header text and image" and the image has been suppressed after the conversion. Any help or insight on this would be much appreciated.

Posted by: Alok Tayal at June 7, 2007 12:36 PM

Sorry, what are Word Templates?

Posted by: Oleg Tkachenko at February 11, 2007 6:39 PM

The XSLT is awesome, thank you!! Do you plan to add support for Word Templates into it?

Posted by: J at February 10, 2007 2:00 AM

Interesting. )

Posted by: maximum at December 7, 2006 8:37 AM

Christian, have you tried the latest stylesheet?

Posted by: Oleg Tkachenko at September 28, 2006 4:17 PM

The conversion to HTML is generally ok, but it doesn't save the pictures.

I have tried to debug nxslt2, but with no luck. I cannot run it in debug mode at all.

Posted by: Christian at September 1, 2006 11:41 AM

Vassilij, probably it doesn't.

Posted by: Oleg Tkachenko at June 28, 2006 10:22 PM

Hi
Does this stylesheet support text boxes? If not, how can I retrieve the information that is in a text box?

Posted by: Vassilij at June 17, 2006 11:06 PM

Hi
Could that XSLT run in a Java application?
Thanks

Posted by: trungnt at February 6, 2006 10:15 AM

Trackback Pings

Listed below are links to weblogs that reference this post:

Signs on the Sand: WordML2HTML with support for images stylesheet updated from XSLT:Blog[@author = 'M. David Peterson']/Code-of-the-Day
Tracked on January 10, 2006 1:49 AM

Transforming WordML to HTML: Support for Images from Signs on the Sand
Tracked on January 10, 2006 11:40 AM