February 2004 Archives
At last some good news. The streaming subset of XInclude I was talking about has got a blessing from the W3C XML Core WG. Here is what Jonathan Marsh (MSFT, editor of XInclude) writes:
It appears to be impossible to improve streamability without removing
functionality from XInclude. The WG decided instead to bless a kind of
"streamable subset" by adding text along these lines:
The absence of a value for the href attribute, either by the appearance
of href="" or by the absence of the href attribute, represents a case
which may be incompatible with certain implementation strategies. For
instance, an XInclude processor might not have a textual representation
of the source infoset to include as parse="text", or it may be unable to
access another part of the document using parse="xml" and an xpointer
because of streamability concerns. An implementation may choose to treat
any or all absences of a value for the href attribute as resource
errors. Implementors should document the conditions under which such
resource errors occur.
A new version of the XInclude spec is going to be published soon. As they are slightly changing the syntax again (removing the accept-charset attribute), I think it will be a Working Draft again.
Well, I know I stink at graphics. Yesterday I tried to develop a logo for the XInclude.NET project and here is what I ended up with. The idea was about Lego and the integration of parts into a round thing, whatever.
I'd like to hear what you guys think about this logo.
I'm personally not really satisfied with it and I doubt I can make it better, so let's have a logo contest. You send me your logo variants (find my email in the top right corner of this page), I put them on some page, and after some time we vote for a winning logo.
Prize? Well, the XInclude.NET project doesn't have sponsors, so we can't afford anything more valuable than a "The logo design by" line in every bit of XInclude.NET documentation and of course a pile of eternal gratitude.
When your hard disk dies on Monday morning, that's a nice start to the week. Menial tasks such as recovering your data and sources, reinstalling and configuring all the stuff you cannot work without... Refreshing.
Basically I've recovered already. Surprisingly, I now cannot install Office 2003; it says "You've got McAfee VirusScan Enterprise installed, Office 2003 Pro cannot be installed on the same machine with that crap." Hmmmm... Has anybody seen that? I failed to google any workarounds.
A dummy entry to provide a single place for nxslt.exe utility comments.
The "XQuery 1.0 and XPath 2.0 Formal Semantics" spec has been updated today and has reached Last Call Working Draft status. This is a document you may want to read to get a deep understanding of the semantics of the XQuery 1.0 and XPath 2.0 languages:
This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.
Comments are due by 15 April 2004.
Here is what Michael Kay (XSLT star, developer of Saxon, author of every-XSLT-dev-bible "XSLT Programmer's Reference" and XSLT 2.0 editor) writes about XQuery:
The strength of XQuery is that it is a simpler language than XSLT, which
makes it much more feasible to implement efficient searching of very
large XML databases.
Its other strength is that for simple problems, the XQuery code is much
shorter than the XSLT code.
But for complex manipulation of in-memory XML, I would use XSLT every
time, regardless of whether you're dealing with "data" problems or
Do you agree with him?
RenderX, the company behind the famous XEP XSL-FO formatter, plans to release a .NET version. Great news! XEP is the best production-quality Java XSL-FO formatter I've ever seen. It's not inexpensive, but it covers XSL-FO way better than the free Apache FOP (I have to add "unfortunately", being one of the FOP committers).
XEP.NET is an XSL-FO formatter component for .NET,
capable of producing PDF and PostScript from XSL-FO
data. The product is based on a proven Java core, and is
fully identical in functionality to the latest Java version.
The software is 100% manageable .NET code: no native
libraries are used. It exposes standard .NET interfaces
for XML processing (XmlReader and XmlWriter).
Additionally, classes for smooth MSXML integration
are included, with source code.
The package also includes a command-line utility and
a simple GUI tool to run XSL-FO formatting.
.NET Framework 1.1 or higher;
Visual J# Redistributable 1.1 or higher.
Seeing J# in the prerequisites, I assume they have actually ported the Java code to J#. Why not?
Meanwhile RenderX is looking for beta testers.
BizTalk Server 2004 will launch on March 2, 2004.
And to get us up to speed, eight BizTalk 2004 MSDN webcasts are scheduled between March 2 and March 5!
Here is the first developer treat: As part of the launch there will be an MSDN BizTalk Server Developer Blitz with no less than eight web casts packed with information from 3/2 to 3/5. These sessions are developer orientated, full of demos and guaranteed to get you up to speed. Get your own mini-Teched on BizTalk Server for the attractive price of $0 and delivered to you in the comfort of your office/home on the same week we launch the product. Don't forget to register now - these sessions will likely fill up fast.
It's worth registering now.
I've updated my XML Bestiary in response to users' feedback and my own. First of all, I renamed WritableXPathNavigator to SerializableXPathNavigator. That's a much less confusing name IMO. Besides that, I unified all distributions (the same namespace, project structure etc.). More beasts to come soon; I've got several growing up in an incubator.
It's definitely a love-to-streaming-strikes-back day today. Here is another sample of how the streaming XML processing approach fails.
The only XInclude feature still not implemented in the XInclude.NET project is intra-document references. And basically I have no idea how to implement it in the .NET pull environment (just as Elliotte Rusty Harold has no idea how to implement it in his SAX-based implementation). What's the problem?
Meanwhile I managed to create a simple dummy online demo of the ForwardXPathNavigator (an XPathNavigator implementation over XmlReader) I was talking about. Here it is.
Here Daniel clarifies things about XSE:
XSE is not about querying with a specific expression language/format (i.e. XPath or SXPath). XSE is just a mechanism for encapsulating state machines checking for matches against a given expression. What the expression looks like depends on the factory that creates the strategy.
Therefore, the factories I showed (i.e. my RootedPath and RelativePath) are only encapsulating code generation for different FSMs, based on an expression language that fits a need. Therefore, I could even create a factory implementing SXPath and still remain in Xml Streaming Events land.
The XSE idea is to provide a callback metaphor to XML parsing, instead of the pull-model of the XmlReader. In fact, it's a sort of evolution over SAX, in that at the same time it offers both worlds: pull model directly from the XseReader, events-based for your registered handlers.
Now that's finally clear to me. And the approach starts to delight me. Really, really not bad. I need to dig into it before I can say more.
Looks like Microsoft is patenting its XML investments. Recently we had a hubbub about the patenting of the Office 2003 schemas, then XML scripting. Daniel, like many others, feels alarmed. You too?
Well, I'm not. Patenting software ideas is a stupid thing, but that's a matter of the imperfect reality we live in. Everything is patented nowadays, right up to the wheel. So if Office XML is gonna be patented, I prefer it being patented by Microsoft. After all, they are not interested in closing it (aka making it die); instead they made the Office schemas royalty-free. And one more reason: I'm sure we all don't want to find ourselves one day rewriting all Office-based solutions just because of another Eolas scrooge case, or even paying for an out-of-the-blue license to some other litigious bastards.
That all sounds reasonable if it's really defensive patenting, though; otherwise, be prepared.
Michael Brundage's excellent XQuery reference book is finally available.
[Via Michael Rys]
Dr. Rys is talking about the just-published (February 2004) book "XQuery: The XML Query Language".
Michael Brundage is Technical Lead for XQuery processing at Microsoft and the recommendations are so weighty... I feel I want this book too.
OK, Dare clarified things a great deal in his "Combining XPath-based Filtering with Pull-based XML Parsing" post:
Actually Oleg is closer and yet farther from the truth than he realizes. Although
I wrote about a hypothetical ForwardOnlyXPathNavigator in my article entitled Can
One Size Fit All? for XML Journal my planned article which should show up
when the MSDN XML Developer Center launches in a month or so won't be using it. Instead
it will be based on an XPathReader that is very similar to the one used in BizTalk
2004, in fact it was written by the same guy. The XPathReader works similarly
to Daniel Cazzulino's XseReader but uses the XPath subset described in Arpan Desai's Introduction
to Sequential XPath paper instead of adding proprietary extensions to XPath
as Daniel's does.
I've released version 1.4 of the nxslt.exe utility. It's a maintenance release. The changes are:
- Updated to EXSLT.NET 1.0.1.
- Updated to XInclude.NET 1.2.
- Updated project to Microsoft Visual Studio .NET 2003 (so now nxslt.exe can be built directly from VS.NET, no need to run nmake manually - EXSLT methods renaming such as nodeSet() to node-set() is done in postbuild script now).
- Binary download includes three nxslt.exe versions (compiled for .NET 1.0, 1.1 and 1.2).
- Usage header now indicates which .NET runtime nxslt.exe is running under:
.NET XSLT command line utility, version 1.4 (Running under .NET 1.1)
The rule is simple - nxslt.exe requires the .NET Framework version it's compiled for. By default nxslt.exe is compiled for .NET 1.1 and thus can't run under .NET 1.0. Instead, use the nxslt-.NET1.0.exe version (feel free to rename it too). For testing .NET 1.2, use the nxslt-.NET1.2.exe version.
Needless to say, I appreciate any comments|critics|suggestions|donations|not(spam).
Not too much, right? For the next nxslt.exe release (probably in March) I'm going to implement basic XSLT profiling, tracing and maybe rudimentary debugging functionality. Stay tuned.
Daniel writes about performant (and inevitably streaming) XML processing, introducing XSEReader (aka Xml Streaming Events Reader). While he hasn't published the implementation itself yet and is only teasing with samples of its usage, I think I get the idea. Basically I know what he's talking about. I've been playing with such beasts, making all kinds of mistakes, and finally I came up with a solution which I think is good, but I haven't published it yet. Why? Because I'm tired of publishing spoilers :) It's based on "ForwardOnlyXPathNavigator", aka XPathNavigator over XmlReader, which Dare is going to write about in the MSDN XML Dev Center, so I'll wait till that's published.
Microsoft DevDays 2004 took place in Israel today. Well, DevDay actually. Here are the impressions I got there:
- One has to get up earlier to not miss the keynote.
- VS.NET has a cool PocketPC emulator.
- Code Access Security is omnipotent.
- Lutz Roeder's .NET Reflector may hang in the middle of a presentation.
- WS-Security is great and Yosi Taguri is a bright speaker, but he scrolls code too fast.
- Zero Deployment is amazingly simple.
- They are really anxious about security nowadays. All attendees were given the "Writing Secure Code" book for free. Aaah, bookworm's joy. "Required reading at Microsoft. - Bill Gates" is written on the book's front cover.
This interesting trick has been discussed in the microsoft.public.dotnet.xml newsgroup recently. When one has a no-namespaced XML document, such as
there is a trick in .NET which allows reading such a document as if it had some default namespace:
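A minimal sketch of how such a trick can be done, assuming it relies on seeding the parser context with a default namespace binding before the reader starts. The class name, the method name and the namespace URI are made up for illustration; XmlNamespaceManager, XmlParserContext and the XmlTextReader fragment constructor are standard System.Xml classes.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Reads xml as if its unprefixed elements were in defaultNs and returns
    // the namespace URI the reader reports for the first element it meets.
    public static string FirstElementNamespace(string xml, string defaultNs)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(nt);
        // Bind the empty prefix, i.e. the default namespace, up front.
        nsMgr.AddNamespace(String.Empty, defaultNs);
        XmlParserContext context = new XmlParserContext(nt, nsMgr, null, XmlSpace.None);

        // The fragment constructor accepts a parser context; Document means
        // the input is treated as a complete document.
        XmlTextReader reader = new XmlTextReader(xml, XmlNodeType.Document, context);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.NamespaceURI;
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Hypothetical document and namespace, for illustration only.
        Console.WriteLine(FirstElementNamespace(
            "<catalog><book/></catalog>", "http://example.com/catalog"));
    }
}
```

The document itself is not modified; only the namespace context the reader starts with changes, so the same reader can then be fed to code that expects namespaced input.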
I'm introducing another category in my blog - XML Tips and Tricks, where I'm going to post some XML, XPath, XSLT, XML Schema, XQuery etc. tips and tricks. I know, many of my readers, being real XML gurus, know all this stuff (though I encourage you to correct me when I'm wrong or to propose better versions), but I hope it will be interesting for the rest and may attract new readers.
Here is the first instalment - conditional XPath expressions.
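For illustration, one well-known way to write a conditional expression in XPath 1.0 (not necessarily the one this instalment describes) relies on substring() returning an empty string when the start position is infinite: 1 div true() is 1, while 1 div false() is Infinity. The sketch below evaluates such an expression with XPathNavigator.Evaluate(); the sample document, the class name and the 'cheap'/'expensive' labels are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class ConditionalXPathDemo
{
    // Picks one of two strings depending on the condition /book/price < 10.
    // Only one substring() call yields a non-empty result, so concat()
    // effectively returns the selected branch.
    private const string Expr =
        "concat(" +
        "substring('cheap', 1 div (/book/price < 10)), " +
        "substring('expensive', 1 div not(/book/price < 10)))";

    public static string Classify(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return (string)nav.Evaluate(Expr);
    }

    public static void Main()
    {
        Console.WriteLine(Classify("<book><price>8</price></book>"));
        Console.WriteLine(Classify("<book><price>12</price></book>"));
    }
}
```

The same expression works unchanged inside an XSLT select attribute, which is where such a one-liner tends to be most useful.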
Dare has been talking recently about the disconnect developers may feel once they make the shift from the tree-based (XmlDocument) to the cursor-based (XPathNavigator) model. My personal XML learning curve started with DOM (I remember those long, convoluted, ugly DOM navigational programs I wrote back in Y2K), then I fell in love with SAX, and only then did I become an XmlReader and XPathNavigator fan. But despite the fact that I'm probably not an average developer (as I spend most of my time dealing exclusively with XML), I can feel the disconnect too. DOM is kinda ground zero for many of us, and not feeling it underfoot is a bit like flying in zero gravity. Hurts at first, but fun and cool once you get used to it. I think it's no accident that the DOM implementation in .NET was named XmlDocument; that reflected some basic attitude at the time, although some of us now believe DOMDocument would have been a better name.
Anyway, here is my small humble contribution to XPathNavigator appreciation - SerializableXPathNavigator. It's a really small wrapper around XPathNavigator, which extends it by adding InnerXml/OuterXml properties and WriteTo()/WriteContentTo() methods. It's an unfortunate omission that XPathNavigator doesn't have such functionality in .NET 1.0/1.1, and this adds to the disconnect devs feel, because devs do like OuterXml and use it frequently. It's fixed in .NET 2.0, but till then I propose this implementation.
Here is a local copy and here is the GotDotNet copy. Free and open source of course.
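XPathDocument doc = new XPathDocument("books.xml");
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/catalog/book[title='Creepy Crawlies']");
// Current is positioned on a matched node only after MoveNext() succeeds.
if (ni.MoveNext()) {
    SerializableXPathNavigator snav = new SerializableXPathNavigator(ni.Current);
    Console.WriteLine(snav.OuterXml);
}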
A couple of details - SerializableXPathNavigator is itself an XPathNavigator, which wraps another XPathNavigator and exposes the following additional members:
OuterXml - gets the XML markup representing the current node and all its child nodes.
InnerXml - gets the XML markup representing only the child nodes of the current node.
WriteTo(XmlWriter) - saves the current node to the specified XmlWriter.
WriteContentTo(XmlWriter) - saves all the child nodes of the current node to the specified XmlWriter.
Implementation details - see sources.
Hope you find it useful. As usual, I appreciate any comments, bug reports and criticism.
Did you know that the XslTransform class allows a custom XmlResolver to return not only a Stream (the only thing the default XmlResolver implementation, the XmlUrlResolver class, supports), but also an XPathNavigator? Sounds like an undeservedly undocumented feature. What does it give us? Really efficient advanced XML resolving scenarios, such as the one mentioned recently on the asp.net XML forum - getting access to XML fragments from within XSLT. Or looking up cached in-memory XML documents. Or constructing XML documents on the fly for XSLT, e.g. by accessing a SQL Server database from within an XSLT stylesheet and processing the result. Well, part of it could also be done with XSLT parameters and extension functions, but XmlResolver is a more powerful, flexible and elegant approach.
Here is a sample XmlFragmentResolver, which allows XSLT to get access to external XML fragments (an XML fragment, aka external general parsed entity, is well-formed XML that may have more than one root element):
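public class XmlFragmentResolver : XmlUrlResolver {
    override public object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn) {
        using (FileStream fs = File.OpenRead(absoluteUri.AbsolutePath)) {
            // XmlNodeType.Element makes the reader accept several top-level elements.
            XmlTextReader r = new XmlTextReader(fs, XmlNodeType.Element, null);
            XPathDocument doc = new XPathDocument(r);
            return doc.CreateNavigator();
        }
    }
}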
Don't forget to pass its instance to the Transform() method (in .NET 1.0, set it to the XslTransform.XmlResolver property):
xslt.Transform(doc, null, Console.Out, new XmlFragmentResolver());
And here is how then you can access XML fragments from within XSLT:
Note that you could instead load the XML fragment and pass it as a parameter, but then you would have to know statically, in advance, all the XML fragments/documents the XSLT would ever require. The XmlResolver approach allows XSLT to take over and access external documents or fragments really dynamically, e.g. when a file name cannot be known prior to the transformation.
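The cached in-memory documents scenario mentioned above can be sketched along the same lines. This is only an illustration under assumptions: the CachedDocumentResolver class and its Add() method are hypothetical names, and the cache is a plain Hashtable keyed by absolute URI. Anything not in the cache falls through to the base XmlUrlResolver.

```csharp
using System;
using System.Collections;
using System.Xml;
using System.Xml.XPath;

// Hypothetical resolver: serves pre-loaded documents from memory and
// delegates everything else to XmlUrlResolver.
public class CachedDocumentResolver : XmlUrlResolver
{
    private readonly Hashtable cache = new Hashtable();

    // Registers an in-memory document under the URI the stylesheet will ask for.
    public void Add(Uri uri, XPathDocument doc)
    {
        cache[uri.AbsoluteUri] = doc;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        XPathDocument doc = (XPathDocument)cache[absoluteUri.AbsoluteUri];
        if (doc != null)
        {
            // Return a fresh navigator instead of a Stream, so nothing is re-parsed.
            return doc.CreateNavigator();
        }
        // Not cached: let the default implementation fetch it as a Stream.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

An instance would be passed to Transform() exactly like the XmlFragmentResolver above; the stylesheet keeps calling document() as usual and never learns that the document came from memory.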
One of the consequences of the revolutionary XML support in Microsoft Office 2003 is the possibility to unlock information in the Microsoft Office System using XML. Most likely it was a deliberate decision to open Office's doors to XML technology, and I'm sure that's a winning strategy.
Talking about transforming WordprocessingML (WordML) to HTML, what's the state of the art nowadays?
There are two related activities I'm aware of, both Microsoft-rooted. First, there is the "WordML to HTML XSL Transformation" XSLT stylesheet, available for download at the Microsoft Download Center. It's a huge, well-documented, but unsupported beta XSLT stylesheet, which transforms Word 2003 Beta 2 XML documents to HTML. Its final release, which will also support images, is expected, but who knows when?
Second, Don Box is experimenting with a Wordml2XHTML+CSS transformation, mostly for the sake of his blogging workflow. He said his stylesheet is better (fewer global variables etc.). Apparently Don hasn't finished it yet, so the stylesheet isn't available.
So one stylesheet is only for Word 2003 Beta 2 documents and the second isn't ready yet. Sounds bad, huh? Here is my temporary solution - the original "WordML Beta 2 to HTML XSL Transformation" stylesheet, fixed by me to support Word 2003 RTM XML documents. As usual with Microsoft stuff, the "beta" is most likely 99% the RTM version. So I fixed the Beta 2 stylesheet a bit and it just works. In fact, the namespaces are all I've fixed so far. I'm currently testing the stylesheet with big real documents, so chances are I'll need to modify it further.
Download version 1.0 of the stylesheet here - Word2HTML-1.0.zip. Credit is due to Microsoft and personally to whoever developed the stylesheet. Any bug reports or comments are appreciated. Just post a comment to this entry.
Another idea is to implement support for images. Basically the idea is to decode images and save them as external files in an XSLT extension function, and I don't see how to do that in a portable way, so most likely I'll soon end up with two stylesheet versions - for MSXML and for .NET. Stay tuned.
Six Apart has announced MovableType 3.0 Alpha testing is about to begin. Testers such as plugin developers, web standards advocates or just Movable Type users with an active commenting community are invited. Here is a list of upcoming MT 3.0 features.
I keep getting 5-10 spam comments a day, so sure, I'd like to test the comment registration system.
Have you noticed this thread in the microsoft.public.dotnet.xml newsgroup? A guy was trying to get a list of unique values from an XML document of 46000 records, using the Muenchian grouping method. For MSXML4 it took 20 seconds, while in .NET 1.0 and 1.1 it effectively hung.
Well, as we all know, the Muenchian method unfortunately works deadly slowly in .NET. MSXML4 optimizes the generate-id($node1) = generate-id($node2) expression by comparing the nodes directly instead of generating and comparing ids. The .NET implementation isn't so sophisticated. The emerging .NET 1.1 SP1 is going to make it faster, but what's today's solution?
Enter EXSLT.NET's set:distinct() extension function. Using it the result was:
695 unique keys generated from about 46000 records in less
than 2 seconds.
Now that's really amazing. Ten times faster than MSXML4! And much more understandable - just compare these expressions:
Special kudos to Dimitre Novatchev for optimizing EXSLT.NET set functions.