February 14, 2007

XML Inclusions reversal or transclusions strike back

Kzu, who is also one of the Mvp.Xml project users, has this wild feature request: he wants to reverse XInclude resolution. The scenario is simple: you load an XML document A.xml containing XML Inclusions for B.xml and C.xml, the XInclude processor resolves the inclusions, you get a combined document, edit it, and then save it back to A.xml, B.xml and C.xml. So if you have modified an element that came from B.xml, then B.xml gets updated on save.

Well, that sounds like a reasonable feature, but how can it be done? To be able to reverse XML Inclusions one has to know exactly where each node came from, i.e. to preserve the original context in the post-XInclude document.

An inclusion that preserves context information is also known as a transclusion. Visual transclusion is traditionally associated with XLink, and technically speaking XInclude has nothing to do with it. From the XInclude 1.0 spec:

1.1 Relationship to XLink

XInclude differs from the linking features described in the [XML Linking Language], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.

XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.

Simple information item inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.

So in an ideal world I'd just suggest that Kzu use XLink instead of XInclude for transclusions. The problem though is that XLink has been basically dead for years now, and unfortunately there are no XLink implementations for .NET. That's why XInclude.

As I read the XInclude spec further I realized that the above citation about XInclude != transclusion isn't 100% true, and XInclude does preserve some pieces of context:

The inclusion history of each top-level included item is recorded in the extension property include history. The include history property is a list of element information items, representing the xi:include elements for recursive levels of inclusion. If an include history property already appears on a top-level included item, the xi:include element information item is prepended to the list. If no include history property exists, then this property is added with the single value of the xi:include element information item.

So basically for each node in a post-XInclude document it's possible to figure out its original context:

  1. If neither the node nor any of its ancestors has an "include history" property, it belongs to the including XML document.
  2. If the node or one of its ancestors has that property, then "include history" can be used to find out where this node came from.
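To make the two rules concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It is not Mvp.Xml code, and everything in it is an assumption for illustration: the in-memory DOCS table stands in for the files, the side dictionary include_history stands in for the Infoset property, and load, expand and origin are hypothetical helpers. It also only handles whole-document inclusions by href, and it never hits the spec's "prepend" case, so each history list has a single entry.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
XI_NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical in-memory stand-ins for A.xml, B.xml and C.xml.
DOCS = {
    "A.xml": f'<a {XI_NS}><own/><xi:include href="B.xml"/></a>',
    "B.xml": f'<b {XI_NS}><item/><xi:include href="C.xml"/></b>',
    "C.xml": "<c><leaf/></c>",
}

def load(href, include_history):
    """Parse a document and expand its xi:include elements recursively."""
    root = ET.fromstring(DOCS[href])
    expand(root, include_history)
    return root

def expand(parent, include_history):
    """Replace each xi:include child with the included root element and
    record the xi:include element as that root's include history."""
    for index, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            included = load(child.get("href"), include_history)
            included.tail = child.tail
            parent[index] = included
            include_history[included] = [child]
        else:
            expand(child, include_history)

def origin(node, parents, include_history, base="A.xml"):
    """Walk from node up to the root; the nearest node that carries an
    include history tells us which document the node came from."""
    while node is not None:
        if node in include_history:
            return include_history[node][-1].get("href")
        node = parents.get(node)
    return base  # rule 1: no include history anywhere on the way up

include_history = {}
combined = load("A.xml", include_history)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in combined.iter() for child in parent}

for path in ("own", "b/item", "b/c/leaf"):
    print(path, "->", origin(combined.find(path), parents, include_history))
```

The side dictionary is just the simplest thing that works in Python. In .NET the same information would more naturally live on the nodes themselves.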

Of course that only sounds simple. For starters, the Mvp.Xml XInclude implementation doesn't support "include history". XIncludingReader does keep an internal stack of xi:include elements, though, and could expose it in some way. Then "include history" has to be preserved in the XML Infoset implementation, e.g. the XML DOM, XmlDocument. That means an XIncludeXmlDocument class extending XmlDocument. And then "include history" has to be used when saving the XmlDocument. Still sounds feasible.

Problems. What about partial inclusions with XPointer? If a node was included from inside a document, its full XPath must be preserved in "include history" so it can be saved back to exactly the same location. Still feasible.
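Here is a rough sketch of what "preserve the path and save back to the same location" could look like, again in Python with xml.etree.ElementTree rather than in .NET. The helpers positional_path and save_back, the record dictionary and the sample B_XML content are all made up for illustration. A plain find() call stands in for real XPointer evaluation, and the sketch assumes un-namespaced elements and a fragment that is not the document root.

```python
import copy
import xml.etree.ElementTree as ET

def positional_path(root, target):
    """Return a path such as './chapter[2]/para[2]' leading from root to
    target, or None if target is not inside root."""
    if root is target:
        return "."
    seen = {}  # how many siblings with each tag we have passed so far
    for child in root:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        sub = positional_path(child, target)
        if sub is not None:
            return "./%s[%d]%s" % (child.tag, seen[child.tag], sub[1:])
    return None

def save_back(source_root, path, edited):
    """Replace the element found at path in the source document with the
    edited copy taken from the combined document."""
    old = source_root.find(path)
    parents = {child: parent for parent in source_root.iter() for child in parent}
    parent = parents[old]  # KeyError if old is the root; not handled here
    edited.tail = old.tail
    parent[list(parent).index(old)] = edited

# Hypothetical content of B.xml.
B_XML = ("<book><chapter><para>one</para></chapter>"
         "<chapter><para>two</para><para>three</para></chapter></book>")

source = ET.fromstring(B_XML)
fragment = source.find("chapter[2]/para[2]")  # stands in for an XPointer
record = {"href": "B.xml", "path": positional_path(source, fragment)}

# The combined document works on a copy, which then gets edited.
included = copy.deepcopy(fragment)
included.text = "three (edited)"

save_back(source, record["path"], included)
print(record["path"])
print(ET.tostring(source, encoding="unicode"))
```

A positional path like this only stays valid as long as nobody else changes B.xml between load and save, which is one more reason the editing questions below are the hard part.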

Editing the combined document opens Pandora's box. New nodes: where should they be saved? Deleted nodes: how to detect them? Nodes moved around. Multiple inclusions of the same node: how to resolve conflicts?
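I don't have answers to all of these, but the detection part at least could start from something very simple: take a snapshot of every included subtree right after loading and compare against it on save. The Python sketch below is only one possible approach, with hypothetical helpers (snapshot, classify) and a hand-built combined document. It assumes each source document is included exactly once, so it says nothing about moved nodes or conflicting multiple inclusions.

```python
import xml.etree.ElementTree as ET

def snapshot(included_roots):
    """Remember the serialized form of each included subtree at load time."""
    return {href: ET.tostring(node, encoding="unicode")
            for href, node in included_roots.items()}

def classify(combined_root, included_roots, baseline):
    """Report per source document whether its subtree was removed from the
    combined document, modified, or left unchanged."""
    still_present = set(combined_root.iter())
    status = {}
    for href, node in included_roots.items():
        if node not in still_present:
            status[href] = "removed"
        elif ET.tostring(node, encoding="unicode") != baseline[href]:
            status[href] = "modified"
        else:
            status[href] = "unchanged"
    return status

# Hand-built combined document; <b>, <c> and <d> play the role of the
# top-level included items of B.xml, C.xml and D.xml (all hypothetical).
combined = ET.fromstring("<a><b><item>1</item></b><c><leaf/></c><d/></a>")
included_roots = {
    "B.xml": combined.find("b"),
    "C.xml": combined.find("c"),
    "D.xml": combined.find("d"),
}
baseline = snapshot(included_roots)

# Simulate editing: change a value that came from B.xml, delete D.xml's subtree.
combined.find("b/item").text = "2"
combined.remove(combined.find("d"))

status = classify(combined, included_roots, baseline)
print(status)
```

With this, only the documents marked "modified" need to be rewritten on save, and "removed" tells you that the xi:include itself should probably go away from the including document.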

Well, it still sounds mostly feasible to implement transclusion on top of XInclude.

Any comments? Does anybody think it might be useful?