October 31, 2004

Is the System.Net.FileWebResponse class so limited with regard to content type?

I've got a problem. It's a .NET problem. In XInclude.NET I'm fetching resources by URI using the WebRequest/WebResponse classes. Everything seems to be working fine; the only problem is this: when the URI is a file system URI, the content type property is always "application/octet-stream". Looks like it's hardcoded in the System.Net.FileWebResponse class ...

Five XQuery/XPath/XSLT working drafts updated

W3C has published fresh working drafts for XQuery/XPath/XSLT. XQuery 1.0: An XML Query Language, XML Path Language (XPath) 2.0, XQuery 1.0 and XPath 2.0 Data Model, XQuery 1.0 and XPath 2.0 Functions and Operators, XSLT 2.0 and XQuery 1.0 Serialization. These address comments received on previous drafts. XQuery 1.0. What's ...

Cw (Comega) language compiler preview (again?)

From Microsoft Research: Comega is an experimental language which extends C# with new constructs for relational and semi-structured data access and asynchronous concurrency. Cw is an extension of C# in two areas: - A control flow extension for asynchronous wide-area concurrency (formerly known as Polyphonic C#). - A data ...

October 27, 2004

XSL-FO to WordML stylesheet

Jirka Kosek has announced a tool (XSLT stylesheet actually) for converting XSL-FO documents to WordML. Get it at http://fo2wordml.sourceforge.net. ...

October 25, 2004

Implementing XML Base in .NET

XML Base is a tiny W3C Recommendation, just a couple of pages. It facilitates defining base URIs for parts of XML documents via the semantically predefined xml:base attribute (similar to the HTML BASE element). It's an XML Core spec, on a par with "Namespaces in XML" and XML InfoSet. Published back ...

So what is XML Base all about? It introduces the xml:base attribute with predefined semantics (just like xml:space or xml:lang) for manipulating base URIs. The xml:base attribute can be placed on any element in any XML document to give that element and its descendants a base URI other than the base URI of the document or external entity. One purpose is to provide a native XML way to define base URIs. Another is to keep relative URIs resolvable: when document A is included into document B at some different location, relative URIs in the content of A would otherwise break. The xml:base attribute keeps them identifying the same resources. If you still don't get it, take a look at the sample in the "Preserving Base URI" section of the "Combining XML Documents with XInclude" article at the MSDN Xml Dev Center. So it's basically XML's analog of HTML's BASE tag.
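To make the resolution rule concrete, here is a minimal sketch of how an xml:base value changes where a relative URI ends up. It uses only System.Uri; the XmlBaseSketch class, the Resolve method and the example.org URIs are made up for illustration and are not part of any library.

```csharp
using System;

public static class XmlBaseSketch
{
    // Resolves 'reference' against the base URI established by an xml:base
    // value, which is itself resolved against the enclosing base URI.
    public static Uri Resolve(Uri enclosingBase, string xmlBase, string reference)
    {
        Uri effectiveBase = new Uri(enclosingBase, xmlBase);
        return new Uri(effectiveBase, reference);
    }

    public static void Main()
    {
        // Hypothetical document location
        Uri doc = new Uri("http://example.org/docs/catalog.xml");

        // An absolute xml:base replaces the document's base URI
        Console.WriteLine(
            Resolve(doc, "http://example.org/files/", "file1.xml").AbsoluteUri);

        // A relative xml:base is resolved against the enclosing base URI first
        Console.WriteLine(
            Resolve(doc, "archive/", "file1.xml").AbsoluteUri);
    }
}
```

The first call prints http://example.org/files/file1.xml and the second http://example.org/docs/archive/file1.xml, so the same relative reference identifies different resources depending on the xml:base in scope.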

System.Xml supports base URIs throughout its infrastructure; the only problem is that basic syntax-level facilities such as XmlTextReader and XmlTextWriter ignore the xml:base attribute when parsing and writing XML. Can we add such support in a transparent way? Sure. Let's take XmlTextReader and extend it so that each time it is positioned on an element that bears an xml:base attribute, the BaseURI property is updated to reflect it. Here it is:

using System;
using System.Collections;
using System.Xml;

public class XmlBaseAwareXmlTextReader : XmlTextReader 
{
    private XmlBaseState _state = new XmlBaseState();
    private Stack _states = null;
    //True when the current state belongs to an empty element
    private bool _popOnNextRead = false;
    
    //Add more constructors as needed    
    public XmlBaseAwareXmlTextReader(string uri)
        : base(uri) 
    {
        _state.BaseUri = new Uri(base.BaseURI);
    }

    public override string BaseURI
    {
        get
        {
            return _state.BaseUri==null? "" : _state.BaseUri.AbsoluteUri;
        }
    }

    public override bool Read()
    {   
        //An empty element produces no EndElement node, so restore
        //the previous state before moving to the next node
        if (_popOnNextRead) 
        {
            _state = (XmlBaseState)_states.Pop();
            _popOnNextRead = false;
        }
        bool baseRead = base.Read();
        if (baseRead) 
        {
            if (base.NodeType == XmlNodeType.Element &&
                base.HasAttributes) 
            {
                string baseAttr = GetAttribute("xml:base");
                if (baseAttr == null)
                    return baseRead;                
                Uri newBaseUri = null;
                if (_state.BaseUri == null)
                    newBaseUri = new Uri(baseAttr);        
                else
                    newBaseUri = new Uri(_state.BaseUri, baseAttr);                        
                if (_states == null)
                    _states = new Stack();
                //Push current state and allocate new one
                _states.Push(_state); 
                _state = new XmlBaseState(newBaseUri, base.Depth);
                _popOnNextRead = base.IsEmptyElement;
            }
            else if (base.NodeType == XmlNodeType.EndElement) 
            {
                //_states stays null until the first xml:base is seen
                if (base.Depth == _state.Depth && 
                    _states != null && _states.Count > 0) 
                {
                    //Pop previous state
                    _state = (XmlBaseState)_states.Pop();
                }
            }
        }
        return baseRead;            
    }     
}

internal class XmlBaseState 
{
    public XmlBaseState() {}
    public XmlBaseState(Uri baseUri, int depth) 
    {
        this.BaseUri = baseUri;
        this.Depth = depth;
    }
    public Uri BaseUri;
    public int Depth;
}
Simple, huh? Now let's test it. Suppose I have a collection of XML documents in the "d:/Files" directory and a catalog XML file, such as
<catalog>
  <files xml:base="file:///d:/Files/">
    <file name="file1.xml"/>
  </files>
</catalog>
As you can see, the xml:base attribute here sets the base URI of the files element subtree to file:///d:/Files/, so file names are resolved relative to that folder no matter where the catalog file is actually placed. (Of course I could have used absolute URIs instead, but having absolute URIs hardcoded in every single place easily leads to a maintenance nightmare in any real system.)

When this document is loaded into an XPathDocument via XmlBaseAwareXmlTextReader, base URIs are preserved as per the XML Base spec:

XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/catalog");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
ni = nav.Select("/catalog/files/file");
if (ni.MoveNext())
  Console.WriteLine(ni.Current.BaseURI);
outputs
file:///D:/projects/Test/foo.xml
file:///d:/Files/
Unfortunately XmlDocument doesn't seem to be as smart as XPathDocument in that respect and only supports the base URI of the document and external entities. Too bad, too bad.

Ok, that was an abstract test; now consider some XSLT processing - I load files by name for some processing using the document() function. Recall that by default (with a single argument) the document() function resolves relative URIs relative to the XSLT stylesheet's base URI (strictly speaking, relative to the base URI of the XSLT instruction which contains the document() function). To resolve URIs relative to some other base URI, the second argument is used. So I'm going to pass <file> elements to the document() function as the second argument to resolve URIs relative to their base URI (which is defined via the xml:base attribute on their parent element <files>):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="files">
    <files>
      <xsl:apply-templates/>
    </files>
  </xsl:template>
  <xsl:template match="file">
    <xsl:copy-of select="document(@name, .)"/>
  </xsl:template>
</xsl:stylesheet>
The code is as simple as
XmlReader r = new XmlBaseAwareXmlTextReader("foo.xml");
XPathDocument doc = new XPathDocument(r);
XslTransform xslt = new XslTransform();
xslt.Load("foo.xsl");
xslt.Transform(doc, null, Console.Out);
The result is
<files>
  <para>File 1 content</para>
</files>
As you can see, when using XmlBaseAwareXmlTextReader with XPathDocument one can get XML Base support for XPath and XSLT.

Alternatively I could implement XmlBaseAwareXmlTextReader as an XmlReader, not as an XmlTextReader (if you know the difference). And in the same simple way XML Base can be implemented for XML writing as XmlBaseAwareXmlTextWriter. Similar classes are used in XInclude.NET and I'm also going to add XmlBaseAwareXmlTextReader and XmlBaseAwareXmlTextWriter to our collection of custom XML tools in the MVP.XML project.

Update: XmlBaseAwareXmlTextReader is now part of the Common module of the MVP.XML library.

Did you know? XSLT 1.0 and XSLT 2.0 can be mixed

I missed that point somehow: The trouble is that XSLT allows regions of a stylesheet to belong to different versions. In XSLT 1.0, you can put an xsl:version attribute on any literal result element to indicate the version of XSLT used in the content of that element. In XSLT 2.0 ...

October 24, 2004

Indenting attributes with XmlTextWriter

XmlTextWriter in .NET 1.X only supports indentation of the following node types: DocumentType, Element, Comment, ProcessingInstruction, and CDATA. No attributes. So how do you get attributes indented anyway? If you can, wait for .NET 2.0 with its cool XmlWriterSettings.NewLineOnAttributes; otherwise, here is a hack for getting attributes indented with XmlTextWriter ...

Well, XmlWriter isn't a particularly low-level writer; it's an abstract XML-oriented API, so its implementation XmlTextWriter won't let you just override the WriteStartAttribute() method and inject indentation characters before each attribute - that would be rejected with an exception as an attempt to write an attribute after content has already been written. But when XmlTextWriter is instantiated on top of some TextWriter, one can write the indentation before each attribute directly to that underlying TextWriter. It doesn't look particularly clean, but anyway:

using System.IO;
using System.Xml;

public class AttributeIndentingXmlTextWriter : XmlTextWriter 
{
    private TextWriter w;
    private int depth;

    //Add constructors as needed
    public AttributeIndentingXmlTextWriter(TextWriter w)
        : base(w) 
    {
        this.w = w;
    }

    public override void WriteStartElement(string prefix, 
        string localName, string ns)    
    {
        depth ++;
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteFullEndElement()
    {
        depth--;
        base.WriteFullEndElement();
    }    

    public override void WriteEndElement()
    {
        depth--;
        base.WriteEndElement();
    }    

    public override void WriteStartAttribute(string prefix, 
        string localName, string ns)
    {
        if (base.Formatting == Formatting.Indented) 
        {   
            w.WriteLine();
            //Start at 1: XmlTextWriter itself writes one space
            //before each attribute name
            for (int i=1; i<Indentation*depth; i++)
                w.Write(IndentChar);
        }
        base.WriteStartAttribute(prefix, localName, ns);
    }
}
Usage:
XmlTextWriter w = 
  new AttributeIndentingXmlTextWriter(Console.Out);
w.Formatting = Formatting.Indented;
w.WriteStartDocument();
w.WriteStartElement("foo");
w.WriteAttributeString("attr1", "value1");
w.WriteAttributeString("attr2", "value2");
w.WriteAttributeString("attr3", "value3");
w.WriteStartElement("bar");
w.WriteAttributeString("attr1", "value1");
w.WriteAttributeString("attr2", "value2");
w.WriteAttributeString("attr3", "value3");
w.WriteString("some text");
w.WriteEndElement();
w.WriteEndElement();
w.WriteEndDocument();
The result is as follows:
<foo
  attr1="value1"
  attr2="value2"
  attr3="value3">
  <bar
    attr1="value1"
    attr2="value2"
    attr3="value3">some text</bar>
</foo>
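For comparison, the .NET 2.0 route mentioned at the top of this post needs no subclassing at all. This is a minimal sketch, assuming .NET 2.0 or later; XmlWriterSettings, its Indent, NewLineOnAttributes and OmitXmlDeclaration properties and XmlWriter.Create are the framework's own, while the NewLineOnAttributesSketch class and its Write method are made up for illustration. I'm not reproducing the output here since the exact whitespace is up to the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NewLineOnAttributesSketch
{
    // Writes a small document with every attribute on its own line
    public static string Write()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;               // indent elements
        settings.NewLineOnAttributes = true;  // put each attribute on a new line
        settings.OmitXmlDeclaration = true;   // keep the sample output short

        StringWriter sw = new StringWriter();
        using (XmlWriter w = XmlWriter.Create(sw, settings))
        {
            w.WriteStartElement("foo");
            w.WriteAttributeString("attr1", "value1");
            w.WriteAttributeString("attr2", "value2");
            w.WriteStartElement("bar");
            w.WriteAttributeString("attr1", "value1");
            w.WriteString("some text");
            w.WriteEndElement();
            w.WriteEndElement();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Write());
    }
}
```

Note that NewLineOnAttributes only has an effect when Indent is also set to true.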

Samples are templates

DonXML writes on viral coding examples in presentations on using XML in .NET: Joe Fawcett (fellow XML MVP) came across a great example (from the Microsoft.Public.Xml newsgroup) of one of my biggest pet peeves, "We (the community) are doing a very poor job teaching the average developer how to use ...

October 20, 2004

SAX for .NET 1.0 released

Karl Waclawek has announced the first production release of the SAX for .NET library - an open source C#/.NET port of the SAX API. It contains the API and an Expat-based implementation. An AElfred-based implementation is expected soon. ...

OPath language intro

"An Introduction to "WinFS" OPath" article by Thomas Rizzo and Sean Grimaldi has been published at MSDN. Summary: WinFS introduces a query language that supports searching the information stored in WinFS called WinFS OPath. WinFS OPath combines the best of the SQL language with the best of XML style languages ...

On pretty-printing XML documents using MSXML

Yeah, I know it's an old problem and everybody is tired of it, but it's still a newsgroup hit. Sometimes XSLT is the off-the-shelf solution (not really perf-friendly though), but <xsl:output indent="yes"/> is just ignored in MSXML. In .NET one can leverage XmlTextWriter's formatting capabilities, but what about MSXML? Well ...

October 18, 2004

Dare's The XML Litmus Test

MSDN has published "The XML Litmus Test - Understanding When and Why to Use XML" article by Dare Obasanjo. Cool and useful stuff. But I believe the example of inappropriate XML usage is chosen quite poorly - in this kind of article samples must be clear and clean, while sample ...

October 17, 2004

Derek Denny-Brown is blogging

That's the sort of news that makes my day - Derek Denny-Brown is finally blogging. Derek has been working on XML/SGML for the last 9 years and is currently the dev lead for both MSXML & System.Xml. Here is his atom feed if you can't find it on that dark-colored page. Subscribed. [Via Dare] ...

October 14, 2004

Yet yet another google puzzle (last one this week, I swear)

Ok, last one: Consider a function which, for a given whole number n, returns the number of ones required when writing out all numbers between 0 and n. For example, f(13) = 6. Notice that f(1) = 1. What is the next largest n such that f(n) = n? Again ...

F# Compiler Preview

Interesting news from Microsoft Research: The F# compiler is an implementation of an ML programming language for .NET. F# is essentially an implementation of the core of the OCaml programming language (see http://caml.inria.fr). F#/OCaml/ML are mixed functional-imperative programming languages which are excellent for medium-advanced programmers and for teaching. In addition ...

October 13, 2004

Yet another google puzzle

And what about this one:
1
1 1
2 1
1 2 1 1
1 1 1 2 2 1
What is the next line? I found several solutions, one better and a couple not really, but none of them matches another property this sequence appears to be ...

XEP 4.0 released

RenderX has released a new major version of their famous XSL-FO Formatter - XEP 4.0, "with many more features and performance improvements". The engine supports the XSL Formatting Objects (XSL FO) Recommendation and the Scalable Vector Graphics (SVG) Recommendation for uniform, powerful, industry standard representation of source documents. XEP renders multi-media ...

Upcoming Changes to System.Xml in .NET Framework 2.0 Beta 2

Dare writes about "Upcoming Changes to System.Xml in .NET Framework 2.0 Beta 2". In short: No XQuery (only in SQL Server 2005 aka Yukon). New push-model XML Schema validator - XmlSchemaValidator. XPathDocument is reverted to what it was in version 1.1 of the .NET Framework. XmlReader ...

October 12, 2004

How to collect namespaces used in an XML document

A question raised in the microsoft.public.dotnet.xml newsgroup today: "How to retrieve the namespace collection of all the document namespaces for which there is at least one element in the document". The purpose is validation against different schemas. Well, the most effective way of doing it is during XML document ...

October 11, 2004

Another google puzzle

Here is another cool puzzle from google: Solve this cryptic equation, realizing of course that values for M and E could be interchanged. No leading zeros are allowed. WWWDOT - GOOGLE = DOTCOM I should admit I failed to solve it with just a pen and a piece of paper. Or ...

October 10, 2004

Aggregated by the Planet XMLhack

Oh boy, I just realized my blog is aggregated by the Planet XMLhack. Wow. Thanks for that. Must stop writing narrow-minded rubbish and start focusing on XML hacking. ...

XML Schema determined ID, XPointer and .NET

While old gray XPath 1.0 supports only DTD-determined IDs, XPointer Framework also supports schema-determined IDs - an element in an XML document can be identified by the value of an attribute or even a child element whose type is xs:ID. I've been implementing support for schema-determined IDs for the XPointer.NET/XInclude.NET library (has ...

Dan Wahlin is blogging

Dan Wahlin, author of the "XML for ASP.NET Developers" book and xmlforasp.net portal, Microsoft MVP for XML Web Services, etc, is finally blogging. Really better late than never. ...

Steve Ball has announced XSLT Standard Library 1.2.1

Steve Ball announced XSLT Standard Library version 1.2.1 - an open source, pure XSLT (extensions free) collection of commonly-used templates. New stuff includes new SVG and comparison modules and new functions in string, date-time and math modules. ...

October 5, 2004

Mark: XmlResolvers article and new edition of the "A First Look at ADO.NET and System.Xml V2.0" book

Mark Fussell: In between re-writing and updating the chapters for the beta version of my book A First Look at ADO.NET and System.Xml V2.0, I found some time to write an article on Building Custom XmlResolvers for MSDN. It's a really good article, highly recommended reading for those who still ...

October 2, 2004

Planet XMLhack - aggregating weblogs of the XML developer community

Edd Dumbill has announced planet.xmlhack.com - aggregating weblogs of the XML developer community. The weblogs are chosen to have a reasonable technical content, but because this is as much about the community as it is about the tech, expect the usual personal ramblings and digressions as well. In short, Planet ...

How to join the XQP project

Well, here are some clarifications on how to join the XQP project. You have to be registered at SourceForge.net (here is where you can get a new user account) and then send me a free-form request along with your SourceForge user name. That's it. Oh, and subscribe to the xqp-development mail ...

XInclude goes Proposed Rec

W3C published the XInclude 1.0 Proposed Recommendation. Now there is only one step left for XInclude to become a W3C Recommendation. That's what I call "just in time"! I just finished integrating XInclude.NET into the Mvp-Xml codebase, cleaning up the code and optimizing it using great goodies of Mvp-Xml such as XPathCache, XPathNavigatorReader ...