September 28, 2005

Aftermath thoughts on XLinq

I finally got some time (18 hours on a plane to Seattle :) to settle my mind about XLinq. Erik Meijer's excellent article, which explains XLinq from a functional programming point of view, made me change my mind on some issues I wrote about earlier, some hands-on experience and some ...

Dimitre Novatchev (father of XSLT functional programming) writes:

The fact that XLINQ is as it is reflects that it targets a completely different audience, who would never attempt to learn XPath :o)

You are a person, who doesn't really need XLINQ and the authors of XLINQ had totally different audience in mind.

Most of your remarks in the remaining "Bitter words" also reflect this fact.

Taking the fact of the completely different audience targeted by XLINQ, many of the "shortcomings" for an experienced XML professional will actually be regarded as useful features for innocent OOP-ers.
There is some truth in his words. But I don't actually want to be a "completely different audience": I like XLinq and I do believe it can make XML programming easier for the masses. While targeting innocent developers who are too lazy or afraid to learn anything new might be a good idea, abandoning everybody else who actually loves to learn and knows how to use the XML tools provided by Microsoft is definitely a bad strategy. To be successful, XLinq must be powerful enough to compete with (or even better, to complement) the native XML query/transformation technologies Microsoft has supported for years, at least in common XML processing scenarios. Erik Saltwell put it really well:
The point is, LINQ is a general purpose language for querying and transforming data, so of course there will be cases where a domain-specific query language (like xslt) will be more powerful, more compact, or more expressive. Our hope though, is that for those common cases where you don't need the extra power, this will be (as one PDC-goer put it) 'the last weird query language you'll ever have to learn.'
Well, the recipe is known:
Simple things should be simple. Complex things should be possible.

September 26, 2005

What XLinq misses

XLinq is in its early stages, but what else would I like to see in it? Here are my crazy wishes. Shortcuts: in C# I want book["title"] instead of book.Element("title"). last() and position(). Literal XML just like in C-omega, not "kinda pseudo XML literals" like in VB9. Fine control over serialization ...

On XML expanded names in XLinq

Dave Remy writes about XName and expanded names in XLinq and he wants feedback. Here we go. ...

I personally just love this feature. The expanded name is a core XML notion that has existed since the early days of XML. Obviously XLinq didn't invent it: XPath, XML Namespaces, XSLT and XQuery all use the expanded name as an abstraction for an XML name. Where XLinq innovates is in providing a concrete syntax representation for an expanded name. In XML in general and in XQuery in particular an expanded name has no syntax, and you can probably guess that's not just a whim. There are some issues, you know.

First - it's all plain strings: no syntax checking help from the C# compiler, so you are going to be informed about an invalid expanded name only at runtime. Curiously enough, at the moment XLinq isn't smart enough to detect that there is something wrong in "new XElement("{dd}d}foo")". This compiles and even runs ok :)
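
To make the "only at runtime" point concrete, here is a minimal sketch. It assumes the LINQ to XML API as it later shipped in System.Xml.Linq, not the preview bits discussed in this post; the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class ExpandedNameDemo
{
    // The expanded name is just a string, so the compiler cannot check it;
    // the only way to learn that it is malformed is to parse it at run time.
    public static bool IsValidExpandedName(string expandedName)
    {
        try
        {
            XName.Get(expandedName);
            return true;
        }
        catch (XmlException) { return false; }       // invalid local name
        catch (ArgumentException) { return false; }  // null, empty or malformed "{...}" part
    }

    public static void Main()
    {
        // Implicit string-to-XName conversion parses the "{namespace}local" form.
        XName name = "{http://example.com/ns}foo";
        Console.WriteLine(name.NamespaceName);
        Console.WriteLine(name.LocalName);

        // Compiles fine; the invalid local name "1foo" is caught only when this runs.
        Console.WriteLine(IsValidExpandedName("{http://example.com/ns}1foo"));
    }
}
```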

Those numerous string concatenations/validations/tokenizings...

Every other XML API and XML itself use prefixes. Leaving them out is surely too dramatic a step and will confuse lots of not-so-advanced-in-XML developers. Currently XLinq seems to be way too liberal, down to not being namespace-aware: you can create an element p:foo with no namespace declared for prefix "p" and XLinq won't complain. Your customer would, though.

Unfortunately I don't believe you can ignore prefixes completely: despite QNames in content being considered harmful, they are ubiquitous. So XLinq has to clutter its API and provide some facilities to work with QNames and prefixes, like it or not. What about a "prefix{ns-name}localname" triple syntax form? :)
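
For comparison, here is a hedged sketch of how prefixes can be handled in the LINQ to XML API as it later shipped (System.Xml.Linq), which is an assumption relative to the preview described above. The namespace URI and the class and method names are illustrative only.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class PrefixDemo
{
    // A prefix is requested by adding an xmlns:p declaration attribute;
    // the element name itself stays a namespace + local name pair.
    public static string Serialize()
    {
        XNamespace ns = "http://example.com/ns";
        var foo = new XElement(ns + "foo",
            new XAttribute(XNamespace.Xmlns + "p", ns.NamespaceName),
            "bar");
        return foo.ToString(SaveOptions.DisableFormatting);
    }

    // In the shipped API a name containing a colon is not accepted at all.
    public static bool PrefixedNameRejected()
    {
        try
        {
            new XElement("p:foo");
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Serialize());
        Console.WriteLine(PrefixedNameRejected());
    }
}
```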

Simplify is a good mantra though.

XLinq Bitter Words, Part III - Weird things

Nodes in XLinq override the ToString() method to provide a pretty-printed outer XML representation. At the same time nodes contain a read-only Xml property (familiar to MSXML users and new for .NET users), which returns the raw (not pretty-printed) outer XML representation. And also, at the same time, casting an element to a string returns ...

Here is how it works now:

XElement book = new XElement("foo", 
    new XElement("bar", "baz"));          
Console.WriteLine(book.ToString());
Console.WriteLine(book.Xml);
Console.WriteLine((string)book);
The result:
<foo>
  <bar>baz</bar>
</foo>
<foo><bar>baz</bar></foo>
baz
Actually I can live with it, but what do you think?
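
For readers trying this against the API as it later shipped in System.Xml.Linq (an assumption; the snippet above targets the preview), a rough equivalent of the three representations might look like the sketch below. The class and method names are made up.

```csharp
using System;
using System.Xml.Linq;

public static class SerializationDemo
{
    public static XElement Build()
    {
        return new XElement("foo",
            new XElement("bar", "baz"));
    }

    public static void Main()
    {
        XElement foo = Build();
        // Indented outer XML.
        Console.WriteLine(foo.ToString());
        // Raw outer XML, without added whitespace.
        Console.WriteLine(foo.ToString(SaveOptions.DisableFormatting));
        // The explicit cast returns the concatenated text content.
        Console.WriteLine((string)foo);
    }
}
```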

Another confusing thing is the XElement.SetElement() method. Setting an element to a magic null value means removing the element. So this method either sets the element's value or removes it, depending on the value provided. Hmmm, weird. That reminds me of early C functions which used to do many different things depending on magic argument values. Are we back to realloc()-like design?
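
The same set-or-remove behaviour can be seen in the released System.Xml.Linq API, where the method is called SetElementValue. This is a sketch under that assumption, not the preview's SetElement(); the element names are illustrative.

```csharp
using System;
using System.Xml.Linq;

public static class SetElementValueDemo
{
    public static XElement Build()
    {
        var book = new XElement("book",
            new XElement("title", "Old title"));

        book.SetElementValue("title", "New title"); // existing child: value is replaced
        book.SetElementValue("price", "9.99");      // missing child: element is added
        book.SetElementValue("title", null);        // magic null: element is removed
        return book;
    }

    public static void Main()
    {
        Console.WriteLine(Build().ToString(SaveOptions.DisableFormatting));
    }
}
```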

XLinq Bitter Words, Part II - Heterogeneous XML Tree

In XLinq the XML tree is exposed in a heterogeneous way, that is, nodes in a tree don't belong to a common type. Traditionally XML tree models are homogeneous, e.g. in DOM every node belongs to a specific kind of XmlNode class. But in XLinq attributes and text nodes aren't actually ...

That might be a really good idea, actually. At the same time such a design has drawbacks. The main one you can see immediately by looking at the XLinq API: those nasty object and object[] all around the methods. Yes, lots and lots of methods accept and return anything or arrays of anything. That's the price XLinq pays for sacrificing text and attribute nodes.

What's wrong with an object[]-based API? It's loosely typed and there are no compile-time checks. You are supposed to read the API documentation to figure out what you should pass to a method or what you would get back. That's not really a good idea. I'm sure developers would try to put a DataSet into the XElement constructor and then wonder why it doesn't come back. It's object, so you can pass *anything*, and when you get it back it's your responsibility to figure out what you got. Hence type switches all around the code. The OOP developer in me cries "That's wrong!", but maybe I'm wrong and XLinq indeed means "anything"? XLinq seems to be escaping to object[] in its API because that's the only way to say "XNode, XAttribute and String" (actually that might change once XLinq becomes strongly typed). And the reason for the escape is obviously the lack of attribute and text nodes.

Btw, I have no idea why the XDocument constructor accepts object[] and not XNode[]. After all, XDocument can only contain XDocumentType, XDeclaration, XElement, XComment or XProcessingInstruction, and they all inherit from XNode. So currently "new XDocument("hmmm");" compiles well, but crashes at runtime. Why not catch it at compile time??
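
A small sketch of that "compiles, fails at runtime" behaviour, assuming the API as it later shipped in System.Xml.Linq (where the constructor still takes params object[]). The exception type is what the released library throws as far as I know; the class and method names are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class XDocumentContentDemo
{
    // The compiler accepts any object as document content;
    // the mistake surfaces only when the constructor runs.
    public static bool StringContentFailsAtRuntime()
    {
        try
        {
            new XDocument("hmmm");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StringContentFailsAtRuntime());

        // Valid content goes through the very same loosely typed parameter.
        var doc = new XDocument(new XComment("ok"), new XElement("root"));
        Console.WriteLine(doc.Root.Name);
    }
}
```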

I'm sure the loosely typed nature of the XLinq API hides more such "ooops" moments.

Honestly speaking, I'm not sure it's worth it. What's so wrong with attributes and text nodes that one would screw up his API just to avoid them? Unless you go nuts (just like W3C DOM) by allowing adjacent text nodes, text nodes within attributes, entity references or similar craziness, you are safe. XPath and now XQuery (like every other XML tree API) have text nodes and I have never heard of any problems with them (well, except for whitespace-only text nodes). Not to mention that at the same time XQuery supports strongly typed elements without any trouble. Really, what's the reason for not having attributes and text as nodes in XLinq?

I'm packing for the MVP summit, so sorry for the messy thoughts.

September 25, 2005

"XLinq: XML Programming Refactored (The Return Of The Monoids)" Paper by Erik Meijer and Brian Beckman

Erik Meijer: XLinq: XML Programming Refactored (The Return Of The Monoids) I just posted my XML 2005 submission about XLinq on my homepage. It describes the XLinq API in somewhat detail, and informally explains the relationship between LINQ and monads. That's really good one. [Via Lambda the Ultimate] ...

September 24, 2005

MVP Summit

I'll be at the Microsoft MVP Summit in Redmond next week. This is my second year as a Microsoft MVP, but it's going to be my first MVP Summit. That should be fun. ...

September 23, 2005

XLinq Bitter Words, Part I - XML functional construction

XLinq is a new and hot technology everybody seems to be happy with. I'm going to post a different review series - not what I like, but what I dislike and want to be fixed in XLinq. Sorry in advance for the bitter words, but it's better when your friend says them ...

Functional construction of an XML tree is great stuff and definitely a big improvement over upside-down DOM-style XML building. I only wanted to note that one doesn't have to wait years for C# 3.0 to be able to build an XML tree in a natural top-down way; that was actually possible from the very beginning (here is a .NET 2.0 sample; in .NET 1.X one would need XmlNodeWriter):

    XmlDocument doc = new XmlDocument();    
    XmlWriter w = doc.CreateNavigator().AppendChild();
    w.WriteStartElement("contacts");
      w.WriteStartElement("contact");
        w.WriteElementString("name", "Patrick Hines");
        w.WriteStartElement("phone");
          w.WriteAttributeString("type", "home");
          w.WriteString("206-555-0144");
        w.WriteEndElement();
        w.WriteStartElement("phone");
          w.WriteAttributeString("type", "work");
          w.WriteString("425-555-0145");
        w.WriteEndElement();
        w.WriteStartElement("address");
          w.WriteElementString("street1", "123 Main St");
          w.WriteElementString("city", "Mercer Island");
          w.WriteElementString("state", "WA");
          w.WriteElementString("postal", "68042");
        w.WriteEndElement();
      w.WriteEndElement();
    w.WriteEndElement();
    w.Close();
I should admit XLinq's functional construction still looks shorter and more tree-friendly, but it is seriously constrained to building only an in-memory XLinq XML tree. If one day you decide that building an in-memory XML tree just to save it to disk is a waste of memory (and it is a waste), you would need to completely rewrite the XML construction code, and that's bad. XmlWriter has no limits here: you can use it for building an XML tree in memory or for writing XML directly to a stream in a fast, efficient, non-caching way. XmlWriter just does its job: it writes XML with no coupling to the result, and that's why any code that produces XML with XmlWriter is more reusable.

Ergo - please don't repeat System.Xml's mistakes: treat XmlWriter as a first-class citizen and provide a way to build XDocument/XElement with XmlWriter. That would be good architecturally and would provide smoother migration from XmlDocument v2. E.g. what about providing an editable XPathNavigator over XElement or XDocument?
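
The reuse argument can be sketched as follows: one producer method that knows only about XmlWriter, pointed at three different targets. The last target assumes the LINQ to XML API as it later shipped, where XContainer.CreateWriter() exists; nothing here is from the preview. The class and method names are made up, and the contact data is shortened from the sample above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class WriterReuseDemo
{
    // The producer is coupled to XmlWriter only, not to where the XML ends up.
    public static void WriteContact(XmlWriter w)
    {
        w.WriteStartElement("contact");
        w.WriteElementString("name", "Patrick Hines");
        w.WriteStartElement("phone");
        w.WriteAttributeString("type", "home");
        w.WriteString("206-555-0144");
        w.WriteEndElement();
        w.WriteEndElement();
    }

    // Target 1: an in-memory DOM tree.
    public static XmlDocument ToDom()
    {
        var doc = new XmlDocument();
        using (XmlWriter w = doc.CreateNavigator().AppendChild())
        {
            WriteContact(w);
        }
        return doc;
    }

    // Target 2: streamed text, no tree is built at all.
    public static string ToText()
    {
        var text = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(text, settings))
        {
            WriteContact(w);
        }
        return text.ToString();
    }

    // Target 3: a LINQ to XML tree, via the writer the shipped API exposes.
    public static XDocument ToXDocument()
    {
        var doc = new XDocument();
        using (XmlWriter w = doc.CreateWriter())
        {
            WriteContact(w);
        }
        return doc;
    }

    public static void Main()
    {
        Console.WriteLine(ToDom().DocumentElement.Name);
        Console.WriteLine(ToText());
        Console.WriteLine(ToXDocument().Root.Element("name").Value);
    }
}
```

Switching from an in-memory tree to direct streaming then means changing only which writer is created, not the construction code.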

Update. As a matter of interest, XQuery, being a truly functional language, of course supports functional composition too. Here is a sample:

element book { 
   attribute isbn {"isbn-0060229357" }, 
   element title { "Harold and the Purple Crayon"},
   element author { 
      element first { "Crockett" }, 
      element last {"Johnson" }
   }
}
Looks terribly familiar, huh?

But as Erik pointed out in the comments, neither form of composition can beat the ultimate way of building XML - the literal one. Here is what it looks like in XQuery:

<example>
   <p> Here is a query. </p>
   <eg> $b/title </eg>
   <p> Here is the result of the query. </p>
   <eg>{ $b/title }</eg>
</example>
C# doesn't support it yet, while VB sort of does. "Sort of" because the VB form of "literal XML" is not actually XML, but a confusing ASP-like mess. But I'll address that later.

To be continued...

September 22, 2005

XLinq.Net

Being excited about XLinq I couldn't stop myself from grabbing XLinq.NET domain name. I'm going to try to build a community portal for the XLinq technology. The goal is basically to push XLinq by growing a community around it. There is definitely a need for Microsoft-independent easily accessible place where ...

6 ways of associating XML Schema with XML document in VS 2005

Hmm, according to Stan Kitsis there are at least 6 ways to associate an XML Schema with an XML document in Visual Studio 2005: 1. Schemas Property on your XML document 2. Inline inside your XML document 3. xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes in your XML document 4. Open Document Window ...

"How we did XQuery in SQL Server 2005" paper

Microsoft's paper about "the experiences and the challenges in implementing XQuery in Microsoft's SQL Server 2005" is available here. [Via Michael Rys] ...

September 21, 2005

xml:id went W3C Recommendation

The little xml:id spec finally got W3C Recommendation status. I believe XML programming would be better had xml:id been done in 1998, not in 2005. Anyway. xml:id provides a mechanism for annotating elements with unique identifiers. You just set the xml:id attribute on an element and you're done, no need for DTD, XML ...

I hope Microsoft XML Team would consider adding support for xml:id into the next .NET version and into XLinq. That's really valuable addition to the XML Core.
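
Until there is built-in support, an ID lookup can be emulated by hand. Below is a minimal sketch, assuming the LINQ to XML API as it later shipped (System.Xml.Linq); FindById is a hypothetical helper, not a library method, and it does no xml:id validation or uniqueness checking.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlIdDemo
{
    // Hypothetical helper: a linear scan for the first element whose
    // xml:id attribute equals the given value.
    public static XElement FindById(XDocument doc, string id)
    {
        XName xmlId = XNamespace.Xml + "id";
        return doc.Descendants()
                  .FirstOrDefault(e => (string)e.Attribute(xmlId) == id);
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<foo xml:id=\"f42\"><bar xml:id=\"b1\">baz</bar></foo>");
        XElement bar = FindById(doc, "b1");
        Console.WriteLine(bar.Name);
    }
}
```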

And of course I cannot avoid Canonical XML (C14N) and xml:id controversy. Read Norm for the crux of the issue. In short - Canonical XML is broken. Here is an illustration:

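// Needs: using System; using System.IO; using System.Xml;
//        using System.Security.Cryptography.Xml;
string xml = @"
<foo xml:id=""f42"" xml:base=""http://foo.com"">
  <bar xml:base=""dir"">baz</bar>
</foo>";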

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
XmlDsigC14NTransform c14n = new XmlDsigC14NTransform();
c14n.LoadInput(doc.SelectNodes("/foo/bar"));
StreamReader sr = new StreamReader((Stream)c14n.GetOutput());
Console.WriteLine(sr.ReadToEnd());
Guess what the result is?
<bar xml:base="dir" xml:id="f42"></bar>
Not only is xml:id inherited by the bar element (so now you can get a different element when searching by the same ID), but xml:base is broken too (the absolute part of the base URI is lost). Too bad. Canonical XML should definitely be fixed.

September 19, 2005

Upgraded to MT 3.2

This is a repetitive pattern: once every 6 months I get tired of comment and trackback spam and go upgrade my blog engine or install some antispam plugins. This time is no different. I've been massively attacked by spam trackbacks so I had to upgrade to MovableType 3.2. Not without ...

September 14, 2005

Integrating XML into programming languages - Cobol's turn

We've heard about XML penetration into C#, Java and SQL. Now it seems that the 45-year-old programming language in which 75% of the world's business apps are written is ready to adopt XML. I'm talking about Cobol, yeah baby! In the "XML and the New COBOL" article at webservicessummit.com Barry Tauber explains ...

September 13, 2005

Little Catherine gets first chair

First chair for Catherine. Too big one, but not for long! ...

C# 3.0 chat with C# team

That's an interesting chat: C# 3.0 Language Enhancements Description: Can't attend PDC but still want to talk to the C# team? This chat is your chance! Join the C# team to discuss the newly announced C# 3.0 features like extension methods, lambda expressions, type inference, anonymous types and the .NET ...

September 8, 2005

Little known Visual Studio facts

Here are some amazing facts about Microsoft Visual Studio: Visual Studio 2005 will have 2700 commands that come from Microsoft alone, 800 of them shared ones; Visual Studio is well factored into 250 basic packages; Visual Studio is the base for 36 SKUs; Visual Studio 2003 shipped with 358 ...

September 7, 2005

XQuery in 10 min

The Stylus Studio team has published the "Learn XQuery in 10 minutes" article by Mike Kay. It smells like a Stylus Studio commercial, but it's a good intro to XQuery anyway. ...