Idée fixe


All morning I've been trying to get rid of the idée fixe of writing an XmlReader/XmlWriter-based XML updater. The aim is to be able to update XML without loading it into a DOM or even an XPathDocument (which, rumor has it, is going to be editable in .NET 1.2). The idea is stream-oriented reading via XmlReader, some on-the-fly logic in between (quite limited, though: filtering, modifying values), and then writing to an XmlWriter. Cache-free and forward-only, just as XmlReader is. If you're familiar with SAX filters, you know what I'm talking about. But I want the filtering/updating logic (hmm, did you notice I'm avoiding the term "transforming"?) to be expressed declaratively.
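To make the read-filter-write pipeline concrete, here is a minimal sketch of such a filter. It uses Python's xml.sax as a stand-in for the .NET XmlReader/XmlWriter pair (the SAX filter analogy above). The names ValueUpdatingFilter and update_stream are hypothetical, the elements being updated are assumed to contain text only, and the "update logic" is a plain callback rather than anything declarative.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class ValueUpdatingFilter(XMLFilterBase):
    """Hypothetical forward-only SAX filter: every event is passed through
    unchanged, except that the text of elements named `target` is rewritten
    by `update`. Assumes target elements contain text only (no children)."""

    def __init__(self, parent, target, update):
        super().__init__(parent)
        self._target = target
        self._update = update
        self._buffer = None  # text chunks of the current target element, else None

    def startElement(self, name, attrs):
        if name == self._target:
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._buffer is not None:
            # SAX may split text into several chunks, so hold it until the element ends.
            self._buffer.append(content)
        else:
            super().characters(content)

    def endElement(self, name):
        if name == self._target and self._buffer is not None:
            super().characters(self._update("".join(self._buffer)))
            self._buffer = None
        super().endElement(name)


def update_stream(xml_bytes, target, update):
    """Reads XML bytes, filters them, and returns the rewritten document as text."""
    out = io.StringIO()
    xml_filter = ValueUpdatingFilter(xml.sax.make_parser(), target, update)
    xml_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    xml_filter.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Only the text of the element currently being rewritten is buffered; everything else flows straight from the reader to the writer.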

Obviously the key task is how to express and detect the nodes to be updated. If we go the XPath patterns way, we are generally limited to a single update per pass because of the forward-only restriction. Subsetting XPath can help, though. The only way to evaluate an XPath expression without building a tree is the so-called ForwardOnlyXPathNavigator, aka an XPathNavigator over an XmlReader. This beast is sometimes mentioned in articles, but I'm not aware of any implementation available online yet. By the way, a friend of mine wrote one almost a year ago; maybe I can get him to publish it. As the name suggests, it limits XPath to forward axes only (the subset seems to be the same as Arpan Desai's SXPath) and of course can't evaluate more than one absolute location path. It can, however, evaluate multiple relative location paths, e.g. /foo/a, then b/c in the

<foo>
    <a>
        <b>
            <c/>
        </b>
    </a>
</foo>
tree. Another way to express which nodes are to be updated is a [NodeType][NodeName] pattern, probably plus some simple attribute-based predicates. Sounds ugly, I know, but limiting the scope to a single node fits the forward-only way of thinking better.
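A rough sketch of how /foo/a and then b/c can be evaluated in a single forward-only pass, assuming only the child axis and plain element names. This is not the ForwardOnlyXPathNavigator mentioned above, just a hypothetical ForwardPathMatcher over Python SAX events that keeps a stack of open element names instead of a tree.

```python
import xml.sax


class ForwardPathMatcher(xml.sax.ContentHandler):
    """Hypothetical matcher: selects elements by an absolute child-only path
    (e.g. /foo/a) and, below each such context node, by a relative child-only
    path (e.g. b/c). Only a stack of open element names is kept."""

    def __init__(self, absolute, relative):
        super().__init__()
        self._abs = absolute.strip("/").split("/")
        self._rel = relative.split("/")
        self._stack = []            # names of currently open elements
        self._context_depth = None  # stack depth of the current context node
        self.matches = []           # (kind, path) pairs in document order

    def startElement(self, name, attrs):
        self._stack.append(name)
        if self._context_depth is None and self._stack == self._abs:
            # Entered a node selected by the absolute path: it becomes the context.
            self._context_depth = len(self._stack)
            self.matches.append(("absolute", "/" + "/".join(self._stack)))
        elif self._context_depth is not None:
            # Compare the steps below the context node with the relative path.
            if self._stack[self._context_depth:] == self._rel:
                self.matches.append(("relative", "/" + "/".join(self._stack)))

    def endElement(self, name):
        if self._context_depth is not None and len(self._stack) == self._context_depth:
            self._context_depth = None  # leaving the context node
        self._stack.pop()
```

Because it tracks a single context depth, it only ever evaluates one absolute path at a time, which is the same limitation described above.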

Another problem is how to express the update semantics. I don't see how to avoid inventing new syntax, something like:

<update match="/books/book[@title='Effective XML']">
    <set-attribute name="on-load" value="Arthur"/>
</update>
I'm not sure it's really feasible to implement, though. All unmatched nodes should be passed through to the result untouched; on a matched node, the update logic should be evaluated and then processing should go on.
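Assuming the match pattern is restricted to child steps plus one [@attr='value'] predicate on the last step (a hypothetical subset, not STX or any real specification), the update rule above could be applied in one pass roughly like this. SetAttributeFilter, parse_match and apply_update are illustrative names, and the rule is passed in as arguments rather than read from the update element itself.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical pattern subset: /step/step... with an optional [@name='value'] on the last step.
_PATTERN = re.compile(r"^((?:/[\w.-]+)+)(?:\[@([\w.-]+)='([^']*)'\])?$")


def parse_match(pattern):
    m = _PATTERN.match(pattern)
    if m is None:
        raise ValueError("unsupported pattern: " + pattern)
    steps = m.group(1).strip("/").split("/")
    return steps, m.group(2), m.group(3)


class SetAttributeFilter(XMLFilterBase):
    """Passes every SAX event through unchanged, except that elements matched
    by `match` get attribute `name` set to `value` (a set-attribute action)."""

    def __init__(self, parent, match, name, value):
        super().__init__(parent)
        self._steps, self._pred_attr, self._pred_value = parse_match(match)
        self._name, self._value = name, value
        self._stack = []  # open element names: the only state kept

    def startElement(self, name, attrs):
        self._stack.append(name)
        if self._stack == self._steps and (
            self._pred_attr is None or attrs.get(self._pred_attr) == self._pred_value
        ):
            # Matched: forward a copy of the attributes with the new one set.
            updated = dict(attrs.items())
            updated[self._name] = self._value
            attrs = AttributesImpl(updated)
        super().startElement(name, attrs)

    def endElement(self, name):
        self._stack.pop()
        super().endElement(name)


def apply_update(xml_bytes, match, name, value):
    out = io.StringIO()
    xml_filter = SetAttributeFilter(xml.sax.make_parser(), match, name, value)
    xml_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    xml_filter.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Unmatched nodes pass through untouched, and a matched book only gets its attribute list rewritten before it is forwarded, so the pass stays cache-free apart from the element-name stack.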

Yes, I'm aware of STX, but I feel uneasy about this technology. It's too coupled to SAX (CDATA nodes in the data model, ugh!), it has assignable variables, etc. No, I'm talking about a different thing, an even more lightweight one (though even more limited).

Does it make any sense, huh?

2 Comments

Yeah, XSQ does seem like a really interesting approach.

I've suffered from the idea of streaming XPath before. I like the XSQ solution (http://www.cs.umd.edu/projects/xsq/) in this space.

STX is a Good Thing once you get used to it... As for assignable variables, the non-random-access nature of the SAX stream makes them a necessity. CDATA and the like are necessary to preserve the streaming character, as events pile up otherwise. Work is underway on a new data model, by the way; you should stop by and give some feedback.
