April 3, 2005

XML Indexing Article went live

Part 1 of my "Indexing XML" article went live at the MSDN XML Dev Center. In this article I discuss various aspects of indexing standalone XML documents - XML IDs and XSLT keys - and introduce the IndexingXPathNavigator class, part of the Mvp.Xml library, which enables lazy or eager indexing of any IXPathNavigable ...

XInclude + XML Schema validation = Ooops!

Norm Walsh writes about a very nasty problem waiting to happen when anybody naive enough tries to validate an XML document against an XML Schema after XInclude processing. Ooops, the XInclude processor adds xml:base attributes to top-level included elements, which you have to allow in the schema, most likely on every element ...

One interesting point Norm raises is how this glitch came to happen:

I think what pains me most about this situation is that XInclude was in development for just over five years. It went through eleven drafts[1] including three Candidate Recommendations.
Why didn't we notice this until several months after XInclude was a Recommendation?
Basically, I disagree. Of course this incompatibility didn't go unnoticed. Moreover, you can find the following in the "DocBook Technical Committee Meeting Minutes: 19 Nov 2002":
615587 Support xml:base
Any object to adding xml:base to the common attributes? None.
Accepted.
You get the idea of what the common solution is - just declare xml:base and xml:lang in your schema.

And what about better solutions? Norm talks about fixing either XML Schema or XInclude. I don't see why XInclude needs to be fixed here. It's clear that if a fragment comes from a different place, its base URI must be preserved somehow; otherwise, say goodbye to relative URIs in the included fragment. And xml:base is currently the only feasible facility for base URI manipulation. But when it comes to XML Schema I see plenty of room for improvement - it could either allow globally permitted attributes to be defined or allow xml: attributes to appear anywhere, just like xsi:type et al.
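To make the base URI point concrete, here is a minimal sketch in Python (standard library only) of how a relative reference inside an included fragment gets resolved. The sample document, the example.com/example.org URIs, the href attribute and the resolve_href helper are all made up for illustration; the resolution itself just applies every xml:base in scope, outermost first, and then resolves the reference against the result.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_href(root, target, href, document_uri):
    """Resolve href against the base URI in scope at target (hypothetical helper)."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    # Collect xml:base values from the target element up to the root.
    chain = []
    node = target
    while node is not None:
        value = node.get(XML_BASE)
        if value is not None:
            chain.append(value)
        node = parents.get(node)

    # Apply them outermost first, starting from the URI of the document itself.
    base = document_uri
    for value in reversed(chain):
        base = urljoin(base, value)
    return urljoin(base, href)


# A document as it might look after XInclude processing: the included chapter
# carries the xml:base that records where it came from (made-up URIs).
DOC = """<book>
  <chapter xml:base="http://example.org/chapters/ch1.xml">
    <image href="figures/fig1.png"/>
  </chapter>
  <image href="figures/cover.png"/>
</book>"""

DOCUMENT_URI = "http://example.com/book/main.xml"
root = ET.fromstring(DOC)

for image in (root.find("chapter/image"), root.find("image")):
    print(resolve_href(root, image, image.get("href"), DOCUMENT_URI))
```

Drop the xml:base attribute from the chapter and its image resolves against the including document instead of the place the fragment came from, which is exactly the breakage xml:base is there to prevent.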

What's wrong with the latter solution? xml:base, xml:lang, xml:space, xml:id - isn't it ridiculous to be forced to declare them on every element in XML Schema? They are orthogonal to validation and so XML Schema validation should be orthogonal to them.

Another interesting tidbit - this issue was reported to the MSDN Product Feedback Center by kzu, and his suggestion to add a flag to the XmlSchemaValidationFlags enum saying that xml: attributes should be ignored during validation seems to be viewed favorably by Microsoft. There is a chance that the XML Schema processor in .NET 2.0 will optionally allow xml:base and friends even if they are not declared in the schema. If you like it - go vote for this suggestion. I did.
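Just to illustrate what "ignoring xml: attributes during validation" amounts to, here is a rough sketch in Python (standard library only). It is not the .NET API and not how any schema processor actually implements it: it is an assumed preprocessing step that removes attributes in the XML namespace from a copy of the tree, so that the copy can be handed to a validator whose schema doesn't declare them. The without_xml_attributes name and the sample document are made up.

```python
import copy
import xml.etree.ElementTree as ET

# Attributes such as xml:base, xml:lang, xml:space and xml:id all live here.
XML_NS_PREFIX = "{http://www.w3.org/XML/1998/namespace}"


def without_xml_attributes(root):
    """Return a deep copy of root with all xml:* attributes removed (hypothetical helper)."""
    clone = copy.deepcopy(root)
    for elem in clone.iter():
        # Materialize the list of names first so the dict isn't mutated while iterating.
        for name in [n for n in elem.attrib if n.startswith(XML_NS_PREFIX)]:
            del elem.attrib[name]
    return clone


doc = ET.fromstring(
    '<doc xml:lang="en">'
    '<para xml:base="http://example.org/a.xml" id="p1"/>'
    '</doc>'
)
clean = without_xml_attributes(doc)

print(ET.tostring(clean, encoding="unicode"))
```

The original tree is left untouched, so xml:base and friends are still there for whatever runs after validation; only the throwaway copy goes to the validator.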