Don Box on WordML and XHTML+CSS

Don Box talks about his weekend hobby: getting XHTML+CSS out of WordML. Using XSLT, of course. And it works: Don blogs in Word and publishes in XHTML. The resulting HTML isn't optimal yet (see the page source), but it proves the concept perfectly.

Well, according to Don, WordML and XHTML are twins separated at birth:

Our chat confirmed for me that WordML and XHTML+CSS are more alike than they are different.
Both use a fairly small number of structural markup elements and use annotations on those elements to influence formatting.
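The quote's point, that both vocabularies are a few structural elements plus formatting annotations, suggests a direct element-for-element mapping. Don's own transformation is XSLT. The snippet below is not his stylesheet; it is a minimal Python (ElementTree) sketch of the same idea. It maps each `w:p` to `<p>`, or to `<hN>` when its `w:pStyle` annotation is `HeadingN` (an assumed naming convention), and drops run formatting entirely.

```python
import xml.etree.ElementTree as ET

# WordML (Word 2003 XML) namespace and the w:val attribute in Clark notation.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}
VAL = f"{{{W}}}val"


def paragraph_text(p):
    """Concatenate the text of every w:t inside a paragraph, ignoring run formatting."""
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))


def block_tag(p):
    """Choose an XHTML tag from the w:pStyle annotation (assumed: HeadingN -> hN)."""
    style = p.find("w:pPr/w:pStyle", NS)
    style_id = style.get(VAL, "") if style is not None else ""
    suffix = style_id[len("Heading"):]
    if style_id.startswith("Heading") and suffix.isdigit() and 1 <= int(suffix) <= 6:
        return f"h{int(suffix)}"
    return "p"


def wordml_to_xhtml(wordml):
    """Map every w:p under w:body to one XHTML block element."""
    root = ET.fromstring(wordml)
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    # Descendant search, so wrapper elements between w:body and w:p don't matter.
    for p in root.iterfind(".//w:body//w:p", NS):
        ET.SubElement(body, block_tag(p)).text = paragraph_text(p)
    return ET.tostring(html, encoding="unicode")


# A tiny hand-written WordML fragment (illustrative, not produced by Word).
SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
    "<w:body>"
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "</w:body></w:wordDocument>"
)

if __name__ == "__main__":
    print(wordml_to_xhtml(SAMPLE))
```

Note how the two runs of the second paragraph collapse into one `<p>`. Keeping their formatting would mean emitting `<span>`s with CSS, which is exactly where the mapping stops being trivial.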
But there are reasons why transforming WordML to XHTML+CSS isn't trivial: WordML's non-semantic nature (no semantic markup at all, with wacky lists as another example), different whitespace handling, style inheritance, and tabs. I'm sure there are many more differences we don't see yet.
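Style inheritance alone shows the problem. A WordML style can point to its parent with `w:basedOn`, while a CSS class has no such link, so the chain has to be flattened. The sketch below assumes only two run toggles (`w:b`, `w:i`) matter and that an explicit "off" is spelled `off`, `false` or `0`. The style IDs in the sample are made up, and Word's real resolution rules (document defaults, paragraph properties, direct formatting, toggle semantics) are more involved than this.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}
VAL = f"{{{W}}}val"
STYLE_ID = f"{{{W}}}styleId"

# Run-property toggles mapped to CSS: (property, value when on, value when off).
TOGGLES = {"b": ("font-weight", "bold", "normal"), "i": ("font-style", "italic", "normal")}
OFF_VALUES = {"off", "false", "0"}  # assumption: how an explicit "off" is spelled


def load_styles(wordml):
    """Index the w:style definitions under w:styles by w:styleId."""
    root = ET.fromstring(wordml)
    return {s.get(STYLE_ID): s for s in root.iterfind("w:styles/w:style", NS)}


def style_chain(styles, style_id):
    """Follow w:basedOn links from a style to its root ancestor (cycle-safe)."""
    chain = []
    while style_id in styles and style_id not in chain:
        chain.append(style_id)
        based_on = styles[style_id].find("w:basedOn", NS)
        style_id = based_on.get(VAL) if based_on is not None else None
    return chain


def effective_props(styles, style_id):
    """Merge toggles from the root ancestor down, so derived styles override."""
    props = {}
    for sid in reversed(style_chain(styles, style_id)):
        rpr = styles[sid].find("w:rPr", NS)
        if rpr is None:
            continue
        for tag, (prop, on, off) in TOGGLES.items():
            el = rpr.find(f"w:{tag}", NS)
            if el is not None:
                props[prop] = off if el.get(VAL) in OFF_VALUES else on
    return props


def css_rule(styles, style_id):
    """Flatten a style's inheritance chain into one CSS class rule."""
    body = "; ".join(f"{k}: {v}" for k, v in sorted(effective_props(styles, style_id).items()))
    return f".{style_id} {{ {body} }}"


SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
    "<w:styles>"
    '<w:style w:type="paragraph" w:styleId="Normal"/>'
    '<w:style w:type="paragraph" w:styleId="Heading1">'
    '<w:basedOn w:val="Normal"/><w:rPr><w:b/></w:rPr></w:style>'
    '<w:style w:type="paragraph" w:styleId="Lead">'
    '<w:basedOn w:val="Heading1"/><w:rPr><w:b w:val="off"/><w:i/></w:rPr></w:style>'
    "</w:styles></w:wordDocument>"
)

if __name__ == "__main__":
    styles = load_styles(SAMPLE)
    for sid in styles:
        print(css_rule(styles, sid))
```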

Anyway, I fully agree with Don's conclusion: WordML is just another XML vocabulary, and processing it (generating, aggregating, querying or transforming) is as easy as processing any other XML. With one small string attached: "Provided we know and understand WordML well enough."
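To illustrate "just another XML vocabulary": querying and aggregating WordML needs nothing beyond a generic XML API. The sketch below uses Python's ElementTree to pull a heading outline and count paragraphs per style. It assumes `HeadingN` style IDs for headings and counts paragraphs without `w:pStyle` as "Normal". The "knowing WordML well enough" part is precisely those two assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}
VAL = f"{{{W}}}val"


def style_of(p):
    """Paragraph style id; paragraphs without w:pStyle count as "Normal" (assumption)."""
    style = p.find("w:pPr/w:pStyle", NS)
    return style.get(VAL, "Normal") if style is not None else "Normal"


def text_of(p):
    """All w:t text of a paragraph, concatenated."""
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))


def outline(wordml):
    """Query: (level, text) for every HeadingN paragraph, in document order."""
    result = []
    for p in ET.fromstring(wordml).iterfind(".//w:p", NS):
        sid = style_of(p)
        suffix = sid[len("Heading"):]
        if sid.startswith("Heading") and suffix.isdigit():
            result.append((int(suffix), text_of(p)))
    return result


def style_usage(wordml):
    """Aggregation: how many paragraphs use each paragraph style."""
    return Counter(style_of(p) for p in ET.fromstring(wordml).iterfind(".//w:p", NS))


SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Some text.</w:t></w:r></w:p>"
    '<w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Details</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>More text.</w:t></w:r></w:p>"
    "</w:body></w:wordDocument>"
)

if __name__ == "__main__":
    print(outline(SAMPLE))
    print(style_usage(SAMPLE))
```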

Speaking of lists: I really didn't realize that lists in WordML are designed to be easy to render. For the Word rendering engine it doesn't matter that a paragraph is a list item; it just processes w:listPr and the rendering is done. My semantically oriented mind resists swallowing it, but it's the naked truth. WordML isn't designed to be a semantic document markup language like DocBook or XHTML; it's a completely different, anti-semantic, fully presentational vocabulary, designed to be easily rendered by the Word rendering engine. But XML is XML, and processing WordML is not actually different from processing DocBook. Everything is possible in XML.
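Because there is no list container in WordML, only paragraphs carrying `w:listPr` (with `w:ilvl` as the nesting level and `w:ilfo` referring to a list definition), getting XHTML's nested `<ul>`/`<li>` back means rebuilding the structure yourself. The sketch below assumes every list is a bulleted `<ul>` (it ignores the numbering definitions that `w:ilfo` points to) and that consecutive list paragraphs form one list until an ordinary paragraph ends it. A stack keeps the currently open `<ul>` for each level.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}
VAL = f"{{{W}}}val"


def text_of(p):
    """All w:t text of a paragraph, concatenated."""
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))


def list_level(p):
    """Return the w:ilvl of a list paragraph, or None if it carries no w:listPr."""
    list_pr = p.find("w:pPr/w:listPr", NS)
    if list_pr is None:
        return None
    ilvl = list_pr.find("w:ilvl", NS)
    return int(ilvl.get(VAL, "0")) if ilvl is not None else 0


def paragraphs_to_xhtml(wordml):
    """Rebuild nested <ul> lists from flat paragraphs annotated with w:listPr."""
    body = ET.Element("body")
    stack = []  # stack[i] is the currently open <ul> for list level i
    for p in ET.fromstring(wordml).iterfind(".//w:p", NS):
        level = list_level(p)
        if level is None:
            stack.clear()  # an ordinary paragraph closes every open list
            ET.SubElement(body, "p").text = text_of(p)
            continue
        del stack[level + 1:]  # moving back up closes the deeper lists
        while len(stack) <= level:
            if stack:
                # Nest inside the last <li> of the enclosing list; create one if a level was skipped.
                outer = stack[-1]
                parent = outer[-1] if len(outer) else ET.SubElement(outer, "li")
            else:
                parent = body
            stack.append(ET.SubElement(parent, "ul"))
        ET.SubElement(stack[-1], "li").text = text_of(p)
    return ET.tostring(body, encoding="unicode")


# Four list paragraphs (levels 0, 1, 1, 0) followed by a plain paragraph.
SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"><w:body>'
    + "".join(
        f'<w:p><w:pPr><w:listPr><w:ilvl w:val="{lvl}"/><w:ilfo w:val="1"/></w:listPr></w:pPr>'
        f"<w:r><w:t>{txt}</w:t></w:r></w:p>"
        for txt, lvl in [("A", 0), ("B", 1), ("C", 1), ("D", 0)]
    )
    + "<w:p><w:r><w:t>E</w:t></w:r></w:p>"
    + "</w:body></w:wordDocument>"
)

if __name__ == "__main__":
    print(paragraphs_to_xhtml(SAMPLE))
```

For the sample this yields `<body><ul><li>A<ul><li>B</li><li>C</li></ul></li><li>D</li></ul><p>E</p></body>`: the semantic structure Word never stored, recovered from presentational hints.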

