September 26, 2005

What XLinq misses

XLinq is at an early stage, but what else would I like to see in it? Here are my crazy wishes. Shortcuts: in C# I want book["title"] instead of book.Element("title"). last() and position(). Literal XML just like in C-omega, not the "kinda pseudo XML literals" of VB9. Fine control over serialization ...

On XML expanded names in XLinq

Dave Remy writes about XName and expanded names in XLinq and he wants feedback. Here we go. ...

I personally just love this feature. The expanded name is a core XML notion that has existed since the early days of XML. Obviously XLinq didn't invent it: XPath, XML Namespaces, XSLT and XQuery all use the expanded name as an abstraction for an XML name. Where XLinq innovates is in providing a concrete syntax for an expanded name. In XML in general, and in XQuery in particular, an expanded name has no syntax, and you can probably guess that's not just a whim. There are some issues, you know.

First, it's all plain strings: no syntax-checking help from the C# compiler, so you are going to be informed about an invalid expanded name only at runtime. Curiously enough, at the moment XLinq isn't smart enough to detect that there is something wrong in "new XElement("{dd}d}foo")". This compiles and even runs OK :)
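To make the complaint concrete, here is a minimal sketch of the kind of check I would expect to happen when an expanded name string is parsed. ExpandedName and TryParse are hypothetical names of mine, not part of XLinq. The only real library call is System.Xml.XmlConvert.VerifyNCName, and the sketch assumes the "{namespace}localname" form and checks only the local part.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of XLinq: splits "{namespace}local"
// and verifies that the local part is a valid NCName.
static class ExpandedName
{
    public static bool TryParse(string text, out string ns, out string local)
    {
        ns = "";
        local = "";
        if (string.IsNullOrEmpty(text)) return false;

        string nsPart = "";
        string candidate = text;
        if (text[0] == '{')
        {
            int close = text.IndexOf('}');
            if (close < 0) return false;           // unterminated "{..."
            nsPart = text.Substring(1, close - 1);
            candidate = text.Substring(close + 1); // everything after the first '}'
        }
        if (candidate.Length == 0) return false;

        try
        {
            // Throws XmlException when the string is not a valid NCName,
            // e.g. when it contains '}' or ':'.
            XmlConvert.VerifyNCName(candidate);
        }
        catch (XmlException)
        {
            return false;
        }

        ns = nsPart;
        local = candidate;
        return true;
    }
}

static class Program
{
    static void Main()
    {
        string ns, local;
        Console.WriteLine(ExpandedName.TryParse("{dd}d}foo", out ns, out local));  // False
        Console.WriteLine(ExpandedName.TryParse("{urn:x}foo", out ns, out local)); // True
    }
}
```

This is still a runtime check, of course. It only shows that rejecting the broken name is cheap. It doesn't give the compile-time help I'm asking for.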

Those numerous string concatenations/validations/tokenizings...

Every other XML API, and XML itself, uses prefixes. Leaving them out is surely too dramatic a step and will confuse lots of not-so-advanced-in-XML developers. Currently XLinq seems to be way too liberal, down to not being namespace-aware: you can create an element p:foo with no namespace declared for prefix "p" and XLinq won't complain. Your customer would, though.

Unfortunately I don't believe you can ignore prefixes completely. Despite QNames in content being considered harmful, they are ubiquitous. So XLinq has to clutter its API and provide some facilities to work with QNames and prefixes, like it or not. What about a "prefix{ns-name}localname" triple syntax form? :)

Simplify is a good mantra though.

XLinq Bitter Words, Part III - Weird things

Nodes in XLinq override the ToString() method to provide a pretty-printed outer XML representation. At the same time nodes have a read-only Xml property (familiar to MSXML users and new to .NET users), which returns the raw (not pretty-printed) outer XML representation. And also at the same time, casting an element to a string returns ...

Here is how it works now:

XElement book = new XElement("foo",
    new XElement("bar", "baz"));
Console.WriteLine(book.ToString());
Console.WriteLine(book.Xml);
Console.WriteLine((string)book);

The result:

<foo>
  <bar>baz</bar>
</foo>
<foo><bar>baz</bar></foo>
baz

Actually I can live with it, but what do you think?

Another confusing thing is the XElement.SetElement() method. Setting an element to a magic null value means removing the element. So this method either sets the element's value or removes it, depending on the value provided. Hmmm, weird. That reminds me of early C functions, which used to do many different things depending on magic argument values. Are we back to realloc()-like design?
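For comparison, here is a rough sketch of what I would find less surprising: one method that only sets, and a separate one that only removes. SketchElement, SetChild, RemoveChild and HasChild are hypothetical names over a plain dictionary. This is not the XLinq API, just an illustration of splitting the two intentions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an element with named child values; not XLinq.
class SketchElement
{
    private readonly Dictionary<string, string> children =
        new Dictionary<string, string>();

    // Adds or replaces a child value. Null is rejected instead of
    // silently meaning "remove".
    public void SetChild(string name, string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        children[name] = value;
    }

    // Removal is its own operation; returns true if a child was removed.
    public bool RemoveChild(string name)
    {
        return children.Remove(name);
    }

    public bool HasChild(string name)
    {
        return children.ContainsKey(name);
    }
}

static class Program
{
    static void Main()
    {
        var book = new SketchElement();
        book.SetChild("title", "XLinq");
        Console.WriteLine(book.HasChild("title"));    // True
        Console.WriteLine(book.RemoveChild("title")); // True
        Console.WriteLine(book.HasChild("title"));    // False
    }
}
```

With this shape, a null that sneaks in by accident fails loudly instead of quietly deleting data.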

XLinq Bitter Words, Part II - Heterogeneous XML Tree

In XLinq the XML tree is exposed in a heterogeneous way, that is, nodes in a tree don't belong to a common type. Traditionally XML tree models are homogeneous, e.g. in DOM every node belongs to a specific kind of XmlNode class. But in XLinq attributes and text nodes aren't actually ...

That might actually be a really good idea. At the same time, such a design has drawbacks. The main one you can see immediately by looking at the XLinq API: those nasty object and object[] types all around the methods. Yes, lots and lots of methods accept and return anything, or arrays of anything. That's the price XLinq pays for sacrificing text and attribute nodes. What's wrong with an object[]-based API? It's loosely typed and gets no compile-time checks. You are supposed to read the API documentation to figure out what you should pass to a method or what you would get back. That's not really a good idea. I'm sure developers would try to put DataSets into the XElement constructor and then wonder why they don't come back.

It's object, so you can pass *anything*, and when you get it back it's your responsibility to figure out what you got. Hence type switches all around the code. The OOP developer in me cries "That's wrong!", but maybe I'm wrong and XLinq indeed means "anything"? XLinq seems to be escaping to object[] in its API because that's the only way to say "XNode, XAttribute and String" (actually that might change once XLinq becomes strongly typed). And the reason for the escape is obviously the lack of attribute and text nodes.

Btw, I have no idea why the XDocument constructor accepts object[] and not XNode[]. After all, XDocument can only contain XDocumentType, XDeclaration, XElement, XComment or XProcessingInstruction, and they all inherit from XNode. So currently "new XDocument("hmmm");" compiles fine but crashes at runtime. Why not catch it at compile time?
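Here is a minimal sketch of what I mean by catching it at compile time. All the types (TypedNode, TypedElement, TypedComment, TypedDocument) are hypothetical stand-ins of mine, not XLinq classes. The only point is that a params array of a common base type lets the C# compiler reject a stray string.

```csharp
using System;

// Hypothetical node hierarchy; these are not the XLinq classes.
abstract class TypedNode { }

sealed class TypedElement : TypedNode
{
    public readonly string Name;
    public TypedElement(string name) { Name = name; }
}

sealed class TypedComment : TypedNode
{
    public readonly string Text;
    public TypedComment(string text) { Text = text; }
}

sealed class TypedDocument
{
    private readonly TypedNode[] content;

    // Accepts only nodes, so anything else is a compile-time error.
    public TypedDocument(params TypedNode[] content)
    {
        this.content = content;
    }

    public int Count { get { return content.Length; } }
}

static class Program
{
    static void Main()
    {
        var doc = new TypedDocument(
            new TypedComment("draft"),
            new TypedElement("foo"));
        Console.WriteLine(doc.Count); // 2

        // The line below does not compile: a string is not a TypedNode.
        // var bad = new TypedDocument("hmmm");
    }
}
```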

I'm sure the loosely typed nature of the XLinq API hides more such "ooops" moments.

Honestly speaking, I'm not sure it's worth it. What's so wrong with attributes and text nodes that one would screw up an API just to avoid them? Unless you go nuts (just like W3C DOM) by allowing adjacent text nodes, text nodes within attributes, entity references or similar craziness, you are safe. XPath and now XQuery (like every other XML tree API) have text nodes, and I have never heard of any problems with them (well, except for whitespace-only text nodes). Not to mention that XQuery supports strongly typed elements at the same time without any trouble. Really, what's the reason for not having attributes and text as nodes in XLinq?

I'm packing for the MVP Summit, so sorry for the messy thoughts.