June 2004 Archives

Daniel implemented the SubtreeXPathNavigator I was talking about. That's way cool stuff; I really like it. Now I'm not sure about XmlNodeNavigator - do we still need it in the Mvp.Xml library, or should we remove it so as not to confuse users with two forms of the same navigator?

I feel a bit guilty about the Mvp.Xml project and the June plans I announced. Sorry, I just did nothing - I've been way too busy with some other unexpected stuff I had to finish first.

Microsoft Research RSS Feeds


Here is another interesting puzzle to solve - how would you validate an XML document that has no DOCTYPE declaration against a DTD?
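One possible approach (a sketch only, not necessarily the intended answer to the puzzle): supply the missing DOCTYPE via an XmlParserContext, so that XmlValidatingReader picks up the DTD even though the document itself never declares it. The file names and the root element name below are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class DoctypelessValidation
{
    public static void Main()
    {
        // Pretend "book.xml" has no DOCTYPE but must be valid per "book.dtd".
        // The parser context plays the role of the missing declaration:
        // <!DOCTYPE book SYSTEM "book.dtd">
        XmlParserContext context = new XmlParserContext(
            null, null,
            "book",            // document type (root element) name
            null, "book.dtd",  // public and system identifiers
            null,              // no internal subset
            "", "", XmlSpace.Default);

        string xml;
        using (StreamReader sr = new StreamReader("book.xml"))
            xml = sr.ReadToEnd();

        XmlValidatingReader reader = new XmlValidatingReader(
            xml, XmlNodeType.Document, context);
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler +=
            new ValidationEventHandler(OnValidationError);

        // Validation errors are reported through the handler as we read.
        while (reader.Read()) {}
    }

    static void OnValidationError(object sender, ValidationEventArgs e)
    {
        Console.WriteLine("Validation error: " + e.Message);
    }
}
```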

Non-Extractive XML Parsing


Well, I'm working on decreasing the size of the "Items for Read" folder in RSS Bandit. There's still a lot to catch up on, but anyway. XML.com has published "Non-Extractive Parsing for XML", an article by Jimmy Zhang. In it Jimmy proposes another approach to XML parsing - a "non-extractive" style of tokenization. In simple words, his idea is to treat an XML document as a static sequence of characters, in which each XML token can be identified by an offset:length pair of integers. That would open up lots of new possibilities, such as updating part of an XML document without serializing the unchanged content (copying only the leading and trailing text buffers instead), fast addressing by offset instead of by IDs or XPath, and creating a binary index for a document (a "parse once, use many times" approach).
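To make the offset:length idea concrete, here is a toy sketch of non-extractive tokenization (far simpler than real XML parsing - it ignores entities, CDATA, comments, attribute values containing '>' and everything else that makes XML hard). Tokens are just (offset, length) pairs into the original buffer, and no substring is extracted until a token's value is actually requested:

```csharp
using System;
using System.Collections;

// A token is a slice of the original buffer, not a copied string.
struct Token
{
    public int Offset, Length;
    public bool IsTag;
    public Token(int offset, int length, bool isTag)
    { Offset = offset; Length = length; IsTag = isTag; }
}

class NonExtractiveDemo
{
    static ArrayList Tokenize(string xml)
    {
        ArrayList tokens = new ArrayList();
        int pos = 0;
        while (pos < xml.Length)
        {
            if (xml[pos] == '<')
            {
                // Record the tag's position - don't extract its text.
                int end = xml.IndexOf('>', pos);
                tokens.Add(new Token(pos, end - pos + 1, true));
                pos = end + 1;
            }
            else
            {
                // Character data up to the next tag.
                int end = xml.IndexOf('<', pos);
                if (end == -1) end = xml.Length;
                tokens.Add(new Token(pos, end - pos, false));
                pos = end;
            }
        }
        return tokens;
    }

    static void Main()
    {
        string xml = "<root><item>42</item></root>";
        foreach (Token t in Tokenize(xml))
            // Extraction happens only here, on demand:
            Console.WriteLine("{0,2} +{1,2}  {2}",
                t.Offset, t.Length, xml.Substring(t.Offset, t.Length));
    }
}
```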

While it sounds interesting (and not really new - it's a sort of remake of the idea of parsing XML with regexps), there are lots of problems with "non-extractive" parsing. XML in general doesn't fit well into that paradigm. Entities and inclusions, encoding issues, comments, CDATA and default values in DTDs all screw up the idea. Unfortunately that happens with optimization techniques quite often - they tend to oversimplify the problem. It will probably work only with a very limited subset of XML, and its fruitfulness still needs to be proven.

Another shortcoming of "non-extractive" parsing is the necessity of keeping the entire source XML document accessible (obviously offsets are meaningless with no source buffer at hand). That would mean buffering the whole (possibly huge) XML document in a streaming scenario (e.g. when you read XML from a network stream).

Still, that was interesting reading. Indexing an XML document - how does that sound? Using IndexingXPathNavigator it's possible to index an in-memory IXPathNavigable XML store and select nodes directly by key values instead of traversing the tree. That works, but there is still lots of room for development here. What about persistent indexes? What if XslTransform were able to leverage existing indexes instead of building its own (for xsl:key) on each transformation?
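The pattern behind key-based indexing can be sketched by hand (this is not the IndexingXPathNavigator API, just the underlying "parse once, index once, then look up by key" idea, with a made-up sample document):

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.XPath;

class KeyIndexDemo
{
    static void Main()
    {
        XPathDocument doc = new XPathDocument(new StringReader(
            "<orders><order id='1' total='10'/><order id='2' total='99'/></orders>"));
        XPathNavigator nav = doc.CreateNavigator();

        // Build the index once: key value -> navigator positioned on the node.
        Hashtable index = new Hashtable();
        XPathNodeIterator orders = nav.Select("/orders/order");
        while (orders.MoveNext())
            index[orders.Current.GetAttribute("id", "")] = orders.Current.Clone();

        // Subsequent lookups are hashtable hits instead of tree traversals:
        XPathNavigator hit = (XPathNavigator)index["2"];
        Console.WriteLine(hit.GetAttribute("total", ""));  // prints "99"
    }
}
```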

Say you've got a DataSet and you want to save its data as XML. The DataSet.WriteXml() method does it perfectly well. But what if you need the saved XML to carry a reference to an XSLT stylesheet - an xml-stylesheet processing instruction such as <?xml-stylesheet type="text/xsl" href="foo.xsl"?>? Of course you can load the saved XML into an XmlDocument, add the PI and then save it back, but don't you remember Jan Gray's Performance Pledge we took:

"I promise I will not ship slow code. Speed is a feature I care about. Every day I will pay attention to the performance of my code. I will regularly and methodically measure its speed and size. I will learn, build, or buy the tools I need to do this. It's my responsibility."
Come on, forget about XmlDocument. Think about perf and don't be too lazy to look for a performance-oriented solution in the first place. Here is one simple streaming solution to the problem - a small customized XmlWriter that adds the PI on the fly as the XML is being written.
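A minimal sketch of the trick (class name and stylesheet href are made up for the example): subclass XmlTextWriter and inject the xml-stylesheet processing instruction right before the first element is written.

```csharp
using System;
using System.IO;
using System.Xml;

public class XmlStylesheetWriter : XmlTextWriter
{
    private string href;
    private bool piWritten = false;

    public XmlStylesheetWriter(TextWriter w, string href) : base(w)
    {
        this.href = href;
    }

    public override void WriteStartElement(string prefix,
        string localName, string ns)
    {
        if (!piWritten)
        {
            // Emit the PI once, just before the document element.
            WriteProcessingInstruction("xml-stylesheet",
                String.Format("type=\"text/xsl\" href=\"{0}\"", href));
            piWritten = true;
        }
        base.WriteStartElement(prefix, localName, ns);
    }
}
```

With such a writer the PI costs nothing extra - the DataSet streams straight through it, e.g. dataSet.WriteXml(new XmlStylesheetWriter(Console.Out, "foo.xsl")).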

RSS is making its way. TechNet's security team has announced the first version of an RSS feed for its security bulletins: Microsoft Security Bulletin RSS Feed.

Reading the wonderful "Chapter 9 - Improving XML Performance":

Split Complex Transformations into Several Stages
You can incrementally transform an XML document by using multiple XSLT style sheets to generate the final required output. This process is referred to as pipelining and is particularly beneficial for complex transformations over large XML documents.

More Information

For more information about how to split complex transformations into several stages, see Microsoft Knowledge Base article 320847, "HOW TO: Pipeline XSLT Transformations in .NET Applications," at http://support.microsoft.com/default.aspx?scid=kb;en-us;320847.
Sounds great, but the referenced KB article 320847 is actually a huge fly in the ointment - it still suggests using a temporary MemoryStream to pipeline XSL transformations! What crap - didn't they hear about new XPathDocument(xslt.Transform(doc, args))? I reported that glitch more than a year ago and was told they were working on fixing it. Still not fixed. OK, probably it's time to use my MVP connections to get it fixed at last.
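For the record, here is what MemoryStream-free pipelining looks like, using the XslTransform.Transform overload that returns an XmlReader over the intermediate result (.NET 1.1 API; the stylesheet and document names are made up):

```csharp
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

class PipelineDemo
{
    static void Main()
    {
        XslTransform stage1 = new XslTransform();
        stage1.Load("stage1.xsl");
        XslTransform stage2 = new XslTransform();
        stage2.Load("stage2.xsl");

        XPathDocument doc = new XPathDocument("source.xml");

        // Transform() returns an XmlReader over the intermediate result,
        // which feeds the next XPathDocument directly - no byte buffer.
        XPathDocument intermediate = new XPathDocument(
            stage1.Transform(doc, null, new XmlUrlResolver()));

        stage2.Transform(intermediate, null, Console.Out,
            new XmlUrlResolver());
    }
}
```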

Another one I've stumbled upon:

XPathNavigator. The XPathNavigator abstract base class provides random, read-only access to data through XPath queries over any data store. Data stores include the XmlDocument, DataSet, and XmlDataDocument classes.
Somehow "the XML data store" XPathDocument is forgotten once again :(

I'm back


So I'm back. That was a crazy trip: Tel-Aviv-Prague-Berlin-Amsterdam-Paris-Bavaria-Prague-Tel-Aviv. Bad weather was chasing us, but fortunately it was mostly warm enough even for us sun-accustomed Israelis.

A mailbox overflow did happen, and all incoming mail was bounced during 06/01-06/03. If you tried to send me something in those days, you may want to resend it. I've just started cleaning the mailbox (3000 unread / 70% spam). And RSS Bandit (600 unread / 0% spam :) is waiting for me too. I've already removed about 100 spam comments from the blog. Recovery is in progress...