March 17, 2004
Transforming WordML to HTML: Support for Images

Update: this post is outdated; see "WordML2HTML with support for images stylesheet updated" for updates.

Here is a new version of the WordML2HTML XSLT stylesheet, developed by Microsoft for Word 2003 Beta2 and adapted by me to Word 2003 RTM. I called this version "1.1-.NET-script", and here is why. Along with some bug fixes (a typo with w:rStyle, an empty <title> in the generated HTML, etc.) I implemented basic support for images. That required an XSLT extension function, which I implemented in .NET with <msxsl:script>. MHT and MSXML/JScript versions are coming soon.

Word 2003 can save a document as a "Single File Web Page" (*.mht, aka Web Archive file) or as a regular HTML document. In the latter case all images embedded in the Word document are saved into a documentName_files directory. Here is the challenge: how to implement that in XSLT. Obviously we need an extension function to decode the (Base64) embedded image and write it to a directory. That's not a big deal, but it makes the stylesheet non-portable, which is why I need different versions for .NET, MSXML, etc.

The second problem is that a WordML document doesn't store its file name, so what should the directory for the decoded images be called? I introduced a global stylesheet parameter called docName to address this. If the docName parameter isn't provided, it defaults to the first 10 characters of the document title (with spaces removed).

To run the transformation I used the nxslt.exe command line utility. Download it for free if you don't have it yet. I created a test Word 2003 document with a couple of images and ran:

nxslt.exe test.xml d:\xsl\Word2HTML-.NET-script.xsl -o test.html docName=test

As a result, the transformation created test.html and a test_files directory containing the two decoded images. Here is how it looks in a browser:

[Screenshot: test.html rendered in a browser]
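nxslt.exe is just one way to run the stylesheet. Below is a minimal sketch of doing the same from your own .NET code, assuming .NET Framework 2.0 or later with XslCompiledTransform; the class and method names (WordMLRunner, Run) are illustrative only, while docName is the parameter the stylesheet defines. XslCompiledTransform disables msxsl:script by default, so script has to be enabled explicitly, and .NET Core / .NET 5+ do not support script blocks at all.

```csharp
using System.Xml;
using System.Xml.Xsl;

// Illustrative sketch (hypothetical names): run the WordML2HTML stylesheet
// from .NET code instead of nxslt.exe. Requires .NET Framework for msxsl:script.
public static class WordMLRunner
{
    public static void Run(string wordmlPath, string xslPath, string htmlPath, string docName)
    {
        var xslt = new XslCompiledTransform();

        // XsltSettings(enableDocumentFunction, enableScript):
        // the stylesheet's <msxsl:script> block needs enableScript = true
        var settings = new XsltSettings(false, true);
        xslt.Load(xslPath, settings, new XmlUrlResolver());

        // Equivalent of the docName=test argument on the nxslt.exe command line
        var args = new XsltArgumentList();
        args.AddParam("docName", "", docName);

        // Use the stylesheet's own xsl:output settings (HTML output method)
        using (XmlWriter writer = XmlWriter.Create(htmlPath, xslt.OutputSettings))
        {
            xslt.Transform(wordmlPath, args, writer);
        }
    }
}
```

Like the nxslt.exe run, decodePicture resolves the relative directory name (for example test_files) against the process's current directory rather than the folder of the output file, so run it from the folder where test.html should end up.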
The implementation is very simple. Here it is:
<msxsl:script language="c#" implements-prefix="ext">
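public string decodePicture(XPathNodeIterator bindata, string dirname, string filename) {
    // bindata is the w:binData node-set passed from XSLT; it must be
    // advanced with MoveNext() before Current can be read
    if (bindata.MoveNext()) {
        // Create the image directory (e.g. test_files) if it doesn't exist yet
        System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(dirname);
        if (!di.Exists)
            di.Create();
        // Decode the Base64 content of w:binData and write the raw bytes to the file
        using (System.IO.FileStream fs =
            System.IO.File.Create(System.IO.Path.Combine(di.FullName, filename))) {
            byte[] data = Convert.FromBase64String(bindata.Current.Value);
            fs.Write(data, 0, data.Length);
        }
        // Relative URL, used as the src attribute of the generated img element
        return dirname + "/" + filename;
    }
    else
        // No w:binData element: emit an empty src
        return "";
}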
</msxsl:script>
<xsl:template match="w:pict">
  <xsl:variable name="dir">
    <xsl:choose>
      <xsl:when test="$docName != ''">
        <xsl:value-of select="$docName"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- We need something unique instead of the document name -->
        <!-- Let's take the first 10 characters of the title, with spaces removed -->
        <xsl:value-of select="translate(substring($p.docInfo/o:Title, 1, 10), ' ', '')"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>_files</xsl:text>
  </xsl:variable>
  <img
    src="{ext:decodePicture(w:binData, $dir, substring-after(w:binData/@w:name, 'wordml://'))}"
    alt="{v:shape/v:imagedata/@o:title}" style="{v:shape/@style}"
    title="{v:shape/v:imagedata/@o:title}"/>
</xsl:template>
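To check where images will land, or to try the decoding step outside an XSLT run, the w:pict template's logic can be restated in plain C#. This is a sketch with hypothetical helpers (WordMLPictures, ImageDirName, DecodePicture are not part of the stylesheet); DecodePicture takes the Base64 text of w:binData and its w:name value as strings instead of an XPathNodeIterator.

```csharp
using System;
using System.IO;

// Hypothetical helpers restating the w:pict template in plain C#.
public static class WordMLPictures
{
    // Mirrors the xsl:choose that builds $dir: docName if given, otherwise
    // translate(substring(title, 1, 10), ' ', ''), then "_files" appended.
    public static string ImageDirName(string docName, string title)
    {
        string baseName;
        if (!string.IsNullOrEmpty(docName))
        {
            baseName = docName;
        }
        else
        {
            string t = title ?? "";
            // First 10 characters, then spaces removed (C# counts UTF-16 code units)
            baseName = t.Substring(0, Math.Min(10, t.Length)).Replace(" ", "");
        }
        return baseName + "_files";
    }

    // Standalone variant of decodePicture working on strings.
    public static string DecodePicture(string base64, string binDataName, string dirname)
    {
        // Same as substring-after(@w:name, 'wordml://'): empty if the prefix is missing
        const string prefix = "wordml://";
        string name = binDataName ?? "";
        int idx = name.IndexOf(prefix, StringComparison.Ordinal);
        string filename = idx >= 0 ? name.Substring(idx + prefix.Length) : "";
        if (filename.Length == 0)
            return "";

        // No-op if the directory already exists
        Directory.CreateDirectory(dirname);

        // FromBase64String ignores whitespace such as line breaks in the payload
        byte[] data = Convert.FromBase64String(base64);
        File.WriteAllBytes(Path.Combine(dirname, filename), data);

        // Relative URL for the generated img src attribute
        return dirname + "/" + filename;
    }
}
```

For example, with an empty docName and a title of "My Test Document" the directory is MyTestDo_files. Unlike the original, DecodePicture returns an empty src when w:name lacks the wordml:// prefix instead of trying to create a file with an empty name.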
Not exactly rocket science. And yes, Sal, WMZ images are not supported; I have no idea how to convert them to GIF.
Download the stylesheet here and give it a shot. Again, this stylesheet requires the .NET XSLT engine. Comments, requests and bug reports are welcome.
March 17, 2004 8:55 PM
| #Office
Comments
Suppose that instead of specifying the path, the image contents are in the XML file. How can I retrieve the image back into the doc file? Reply to my mail saravanan_article@yahoo.com, and also tell me how to run the application: is it required to write an XSL transformation for that? Posted by: Saravanan at May 12, 2006 1:42 PM
Fergal, for header images take a look at http://www.tkachenko.com/blog/archives/000556.html Posted by: Oleg Tkachenko at January 10, 2006 11:42 AM
Silly bug, Thomas. The bindata argument must be moved next (MoveNext()) before it can be used. Somehow it works in .NET 1.1, but not in .NET 2.0.
Super work!!! But why doesn't it function with nxslt2? Posted by: Thomas at January 4, 2006 11:18 AM
Firstly, great job, this is absolutely brilliant! Thanks Fergal Posted by: Fergal at October 28, 2005 6:26 PM
Hello Oleg!
Nope, sorry, too busy currently. Posted by: Oleg Tkachenko at March 17, 2005 5:15 PM
Any new solutions on WMZ images? Posted by: Peter Lukan at March 11, 2005 12:56 PM
Sorry, I made a mistake. Here is the valid XSL:
Now the last file, "show_image.php" Posted by: Francois Levasseur at February 20, 2005 4:24 AM
This is great when you use the .NET Framework on a Windows platform (or Mono on any platform). Here is a solution to do that in PHP5. I have 3 files: word2html.xsl, wordml_preview_html.php and show_image.php. In word2xsl put:
Well, it's an access denied error, which obviously has nothing to do with XSLT. How do you run XSLT? It looks like it's trying to create an image on disk and security doesn't allow that. Posted by: Oleg Tkachenko at September 26, 2004 1:46 PM
Hi there, sorry about the delayed response. To reproduce the problem, just place an image into the header section of a Word document. It doesn't seem to matter if it's JPEG or GIF. Posted by: Stephajn at July 27, 2004 8:39 PM
Stephajn, how can I reproduce that problem? Posted by: Oleg Tkachenko at July 15, 2004 9:19 AM
Very handy. Images don't seem to be picked up if they're in the header section of a WordML document though. Posted by: Stephajn at July 14, 2004 7:49 PM
Well, sounds like a broken WordML document? That's interesting to see; can you send it to me? Posted by: Oleg Tkachenko at June 27, 2004 12:38 PM
Oleg, I'm getting this exception in your 'ext:decodePicture' function: System.Xml.Xsl.XsltException: Function 'ext:decodePicture()' has failed. ---> Sy
Great work! Thank you. Posted by: Claus Conrad at April 27, 2004 10:05 PM
Very cool. Kudos Posted by: Me at March 19, 2004 3:05 AM
Listed below are links to weblogs that reference this post:
Take Outs for 17 March 2004 from Enjoy Every Sandwich
Converting WordprocessingML into HTML (for easy viewing) from Brian Jones: Office XML Formats