June 15, 2006

New Microsoft XML API - XmlLite

And you thought XML was done? No way. It's an alive and kicking technology. Here is one more proof: yet another new XML API from Microsoft, XmlLite. It's a native library for building high-performance, secure XML-based applications. XmlLite is small by design - it ...

Why another XML API?

XmlLite doesn't use or link to MSXML; it's a separate, standalone DLL. The reason it's separate rather than part of MSXML is probably MSXML's size and its many dependencies, which not all applications are willing to tolerate. The latest msxml6.dll is 1.3 MB and depends on mlang.dll, wininet.dll and urlmon.dll (about 700 KB each). XmlLite.dll is just 115 KB and has none of these dependencies.

How do I develop with XmlLite?

The XmlLite SDK is part of the "Microsoft® Windows® Software Development Kit (SDK) for Beta 2 of Windows Vista and WinFX Runtime Components", aka the Windows SDK. That of course doesn't mean XmlLite works only on Windows Vista (though it's expected to ship with Vista). It's a plain Win32 DLL you can use even from Visual Studio 6. So install the Windows SDK (don't forget to check "Windows Vista Headers and Libs" during installation). That gives you XmlLite.h, XmlLite.lib and the documentation, which is enough to compile and link your application. In order to run it you also need the XmlLite runtime, i.e. the DLL. Currently it only comes with the IE7 and Vista betas, but if you don't want to install either of them, here is a trick: download the latest IE7 installer but don't run it. Unzip it instead and extract xmllitesetup.exe. This is the XmlLite runtime installer, which will install XmlLite.dll on your system.

The XmlLite reader is a pull-based (as opposed to SAX, which is push-based), non-caching, forward-only XML parser. If you are not familiar with pull-based parsing, read this. Pull XML parsing is so much easier to program with that I think everybody using SAX in MSXML should consider switching to XmlLite: now there is an alternative that is not only faster but also easier to use.
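To make the pull/push difference concrete, here is a portable C++ sketch (not XmlLite code) that answers one question, "what is the text of the first <title>?", twice over the same hard-coded node list: once SAX-style with a callback, once XmlLite-style with a loop the caller controls. Node, PullReader, pushParse and the two firstTitle functions are hypothetical names; the nodes are hard-coded so that only the control flow differs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node model. A real parser derives these from the document's
// characters; here they are hard-coded so only the control flow differs.
enum class NodeType { Element, EndElement, Text };

struct Node {
    NodeType type;
    std::string name;   // element name (empty for text nodes)
    std::string value;  // text content (empty for element nodes)
};

// Push (SAX-like): the parser owns the loop and calls you back for every node.
void pushParse(const std::vector<Node>& nodes,
               const std::function<void(const Node&)>& onNode) {
    for (const Node& n : nodes) onNode(n);
}

// Pull (XmlLite-like): you own the loop and ask for the next node.
class PullReader {
public:
    explicit PullReader(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}
    // Returns false at the end of input (IXmlReader::Read returns S_FALSE there).
    bool read(Node& out) {
        if (pos_ >= nodes_.size()) return false;
        out = nodes_[pos_++];
        return true;
    }
private:
    std::vector<Node> nodes_;
    std::size_t pos_ = 0;
};

// Pull: a plain loop that returns as soon as it has the answer.
std::string firstTitlePull(PullReader& reader) {
    Node n{};
    while (reader.read(n)) {
        if (n.type == NodeType::Element && n.name == "title")
            return (reader.read(n) && n.type == NodeType::Text) ? n.value : "";
    }
    return "";
}

// Push: the same question needs state carried between callbacks, and the
// parser keeps delivering nodes after the answer is known.
std::string firstTitlePush(const std::vector<Node>& nodes) {
    std::string result;
    bool inTitle = false, done = false;
    pushParse(nodes, [&](const Node& n) {
        if (done) return;
        if (inTitle) {
            if (n.type == NodeType::Text) result = n.value;
            done = true;
        } else if (n.type == NodeType::Element && n.name == "title") {
            inTitle = true;
        }
    });
    return result;
}
```

The pull version reads top to bottom like the question itself; the push version has to encode "I just saw <title>" as a flag.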

Here is my sample (you can find more samples in the Windows SDK documentation) of extracting a value from an XML file. The sample XML document, saved as config.xml:

<config>    
    <key name="mykey" value="myval"/>
    <key name="foo" value="bar"/>
</config>
I want to read the value of the key named "foo":
#include "stdafx.h"
#include <atlbase.h>
#include <shlwapi.h>
#include "xmllite.h"

// SHCreateStreamOnFileW lives in shlwapi; CreateXmlReader in xmllite
#pragma comment(lib, "shlwapi.lib")
#pragma comment(lib, "xmllite.lib")

int _tmain(int argc, _TCHAR* argv[])
{
  HRESULT hr;
  CComPtr<IStream> pFileStream;
  CComPtr<IXmlReader> pReader;
  XmlNodeType nodeType;
  const WCHAR* pName;
  const WCHAR* pValue;

  //Open XML document
  if (FAILED(hr = SHCreateStreamOnFileW(L"config.xml",
    STGM_READ, &pFileStream)))
  {
    wprintf(L"Error opening XML document, error %08.8lx\n", hr);
    return -1;
  }

  if (FAILED(hr = CreateXmlReader(__uuidof(IXmlReader),
    (void**)&pReader, NULL)))
  {
    wprintf(L"Error creating XmlReader, error %08.8lx\n", hr);
    return -1;
  }

  if (FAILED(hr = pReader->SetInput(pFileStream)))
  {
    wprintf(L"Error setting input for XmlReader, error %08.8lx\n", hr);
    return -1;
  }

  //Read() returns S_OK for each node and S_FALSE at the end of the document
  while (S_OK == (hr = pReader->Read(&nodeType)))
  {
    switch (nodeType)
    {
    case XmlNodeType_Element:
      if (FAILED(hr = pReader->GetQualifiedName(&pName, NULL)))
      {
        wprintf(L"Error reading element name, error %08.8lx\n", hr);
        return -1;
      }
      if (wcscmp(pName, L"key") == 0)
      {
        //MoveToAttributeByName() returns S_FALSE (not a failure code) when
        //the attribute is missing, so test for S_OK explicitly
        if (S_OK == (hr =
          pReader->MoveToAttributeByName(L"name", NULL)))
        {
          if (FAILED(hr = pReader->GetValue(&pValue, NULL)))
          {
            wprintf(L"Error reading attribute value, error %08.8lx\n", hr);
            return -1;
          }
          if (wcscmp(pValue, L"foo") == 0)
          {
            //That's the element we are looking for
            if (S_OK != (hr =
              pReader->MoveToAttributeByName(L"value", NULL)))
            {
              wprintf(L"Attribute \"value\" not found, error %08.8lx\n", hr);
              return -1;
            }
            if (FAILED(hr = pReader->GetValue(&pValue, NULL)))
            {
              wprintf(L"Error reading attribute value, error %08.8lx\n", hr);
              return -1;
            }
            wprintf(L"Key \"foo\"'s value is \"%s\"\n", pValue);
          }
        }
      }
      break;
    default:
      break;
    }
  }

  //A failure code here means the document is malformed or unreadable
  if (FAILED(hr))
  {
    wprintf(L"Error reading XML document, error %08.8lx\n", hr);
    return -1;
  }
  return 0;
}

The XmlLite reader and writer work on an IStream instance. You can use the standard SHCreateStreamOnFile and CreateStreamOnHGlobal functions to read from a file or from memory, respectively. If you need something else, e.g. reading from a socket, the XmlLite SDK contains a handy sample class implementing IStream that you can start with.
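Since the reader only ever pulls bytes through the stream, it helps to see the contract a custom stream has to honor. The sketch below is portable C++ rather than COM: ByteSource is a hypothetical stand-in loosely modeled on ISequentialStream::Read(void*, ULONG, ULONG*), the read method IStream inherits. TrickleSource imitates a socket that hands back fewer bytes than requested, and the consumer keeps calling Read() until it reports zero bytes instead of assuming one call fills the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical portable stand-in for the read half of IStream.
// Not COM: no HRESULTs, no reference counting.
class ByteSource {
public:
    virtual ~ByteSource() = default;
    // Copies up to `cb` bytes into `pv` and stores the count in `*read`.
    // Returns false on error; a successful read of 0 bytes means end of data.
    virtual bool Read(void* pv, std::size_t cb, std::size_t* read) = 0;
};

// Serves bytes from memory, roughly what CreateStreamOnHGlobal gives you.
class MemorySource : public ByteSource {
public:
    explicit MemorySource(std::string data) : data_(std::move(data)) {}
    bool Read(void* pv, std::size_t cb, std::size_t* read) override {
        const std::size_t n = std::min(cb, data_.size() - pos_);
        std::memcpy(pv, data_.data() + pos_, n);
        pos_ += n;
        *read = n;
        return true;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Mimics a socket: never returns more than `maxPerCall` bytes at once.
class TrickleSource : public ByteSource {
public:
    TrickleSource(std::string data, std::size_t maxPerCall)
        : inner_(std::move(data)), max_(maxPerCall) {}
    bool Read(void* pv, std::size_t cb, std::size_t* read) override {
        return inner_.Read(pv, std::min(cb, max_), read);
    }
private:
    MemorySource inner_;
    std::size_t max_;
};

// Consumer written only against the interface, as a parser would be.
std::string drain(ByteSource& src) {
    std::string out;
    char buf[16];
    std::size_t n = 0;
    while (src.Read(buf, sizeof buf, &n) && n > 0) out.append(buf, n);
    return out;
}
```

Both sources yield the same bytes through drain(); only the number of Read() calls differs.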

XmlLite's XmlReader and XmlWriter instances are meant to be reusable: first you create an XmlReader/XmlWriter instance, then attach a stream to read from or write to using the SetInput() or SetOutput() method. At any time you can reset the XmlReader/XmlWriter by calling SetInput()/SetOutput() again and start working with a new stream.

Security. Of course, DTD processing is turned off by default, just like in .NET 2.0 and MSXML6. Beyond that, XmlLite also supports fairly advanced security features such as limiting memory consumption, maximum element depth and entity expansion. The latter makes XmlLite immune to the notorious billion laughs attack: a 1 KB well-formed XML document that kills IE, Visual Studio and almost any other tool that tries to parse it and expand its "laughing" entities. Even with DTD processing turned on, XmlLite simply stops parsing when the entity expansion limit is reached. Cool, huh?
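The arithmetic behind billion laughs is easy to check: the classic document defines one entity as "lol" (called lol0 below) and each of lol1 ... lol9 as ten references to the previous one, so expanding lol9 yields 10^9 copies of "lol", about 3 GB of text. The portable sketch models that entity table and expands it under a character budget, throwing once the budget is spent. It is a simplified illustration of why an expansion limit defeats the attack, not XmlLite's actual accounting; in XmlLite the knobs are reader properties set via IXmlReader::SetProperty, e.g. XmlReaderProperty_DtdProcessing, XmlReaderProperty_MaxElementDepth and XmlReaderProperty_MaxEntityExpansion (names as in the released xmllite.h; beta headers may differ).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One piece of an entity's replacement text: literal text, or a
// reference to another entity (isRef == true, text holds its name).
struct Part {
    bool isRef;
    std::string text;
};
using EntityTable = std::map<std::string, std::vector<Part>>;

// lol0 = "lol"; lolN = ten references to lol(N-1).
EntityTable billionLaughs(int levels) {
    EntityTable t;
    t["lol0"] = std::vector<Part>{Part{false, "lol"}};
    for (int i = 1; i <= levels; ++i)
        t["lol" + std::to_string(i)] =
            std::vector<Part>(10, Part{true, "lol" + std::to_string(i - 1)});
    return t;
}

// Depth-first expansion that charges every produced character against
// `budget` and throws as soon as the budget would be exceeded.
void expand(const EntityTable& t, const std::string& name,
            std::string& out, std::uint64_t& budget) {
    for (const Part& p : t.at(name)) {
        if (p.isRef) {
            expand(t, p.text, out, budget);
        } else {
            if (p.text.size() > budget)
                throw std::length_error("entity expansion limit exceeded");
            budget -= p.text.size();
            out += p.text;
        }
    }
}

std::string expandWithLimit(const EntityTable& t, const std::string& name,
                            std::uint64_t limit) {
    std::string out;
    expand(t, name, out, limit);  // `limit` is a copy; the caller's value is untouched
    return out;
}
```

Without the budget, expanding lol9 would try to build the full 3 GB string; with it, the expansion stops after at most `limit` characters.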

One more cool feature: XmlLite can read XML fragments (content without a single root element), which is a recommended way to store frequently updated data such as log files.
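Why fragments suit logs: if the file must be one well-formed document, every append has to rewrite the closing root tag, while a fragment (a sequence of top-level elements) just grows at the end. The portable sketch writes such entries; appendLogEntry, escapeXml, asDocument and the <entry level="..."> format are all hypothetical. On the reading side, XmlLite accepts such a file once the reader's conformance level is set to fragment, i.e. SetProperty(XmlReaderProperty_ConformanceLevel, XmlConformanceLevel_Fragment); asDocument shows the usual fallback for tools that insist on a single root.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Escapes &, <, > and " so arbitrary text can be used as element
// content or as a double-quoted attribute value.
std::string escapeXml(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Appends one top-level <entry> element. With no root element, nothing
// earlier in the file ever needs to be rewritten.
void appendLogEntry(std::ostream& log, const std::string& level,
                    const std::string& message) {
    log << "<entry level=\"" << escapeXml(level) << "\">"
        << escapeXml(message) << "</entry>\n";
}

// For consumers that only accept complete documents: wrap the fragment
// in a synthetic root element (the name "log" is arbitrary).
std::string asDocument(const std::string& fragment) {
    return "<log>" + fragment + "</log>";
}
```

For a real log file, pass a std::ofstream opened with std::ios::app as the stream.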

And that's not all. How about a random access mode, in which the XmlLite parser stores attribute positions in the stream instead of attribute values? Non-extractive XML parsing comes true. This can seriously reduce memory consumption in some scenarios. Of course, the underlying stream must be seekable for this.
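The idea behind non-extractive parsing, sketched in portable C++: record where each attribute name and value sit in the input (offset and length) instead of copying them, and slice the text out only when someone asks for it. AttrRef, scanAttributes and attributeValue are hypothetical; the scanner assumes a single well-formed start tag with double-quoted values separated by spaces, and does no entity decoding. Unlike XmlLite's random access mode (enabled, as far as I know, via XmlReaderProperty_RandomAccess), which presumably seeks back into the stream, this sketch keeps the whole input in memory, so it shows the bookkeeping rather than the memory savings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Where an attribute lives in the input buffer; nothing is copied.
struct AttrRef {
    std::size_t nameOff, nameLen;
    std::size_t valueOff, valueLen;
};

// Records name="value" positions in one start tag. Assumes well-formed
// input with double-quoted values and spaces (not tabs/newlines) before
// names; values come back exactly as written.
std::vector<AttrRef> scanAttributes(std::string_view tag) {
    std::vector<AttrRef> refs;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t eq = tag.find("=\"", pos);
        if (eq == std::string_view::npos) break;
        const std::size_t nameStart = tag.rfind(' ', eq) + 1;  // name follows a space
        const std::size_t valueStart = eq + 2;
        const std::size_t valueEnd = tag.find('"', valueStart);
        if (valueEnd == std::string_view::npos) break;  // unterminated value
        refs.push_back({nameStart, eq - nameStart, valueStart, valueEnd - valueStart});
        pos = valueEnd + 1;
    }
    return refs;
}

// Slices the value out of the original buffer only when it is asked for.
std::string_view attributeValue(std::string_view tag,
                                const std::vector<AttrRef>& refs,
                                std::string_view name) {
    for (const AttrRef& r : refs)
        if (tag.substr(r.nameOff, r.nameLen) == name)
            return tag.substr(r.valueOff, r.valueLen);
    return {};
}
```

The returned string_views point into the caller's buffer, so the buffer must outlive them, which is the in-memory analogue of requiring a seekable stream.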

Reading attribute or element values in chunks, IXmlResolver for total control over resolving external entities, IMalloc to control reader/writer memory allocation: that's all great stuff.
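CreateXmlReader's last argument (NULL in the sample above) is where a custom IMalloc goes. As a portable illustration of what such an allocator might do, here is a hypothetical CappedAllocator shaped after IMalloc's Alloc/Free pair: it refuses any request that would push live memory past a fixed cap, one way to put a hard ceiling on what a parser can consume. It is not a COM object; a real IMalloc implementation would also need reference counting, Realloc, GetSize and the rest of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator with a hard ceiling, shaped after IMalloc's
// Alloc/Free pair. Not a COM object.
class CappedAllocator {
public:
    explicit CappedAllocator(std::size_t cap) : cap_(cap) {}

    // Returns nullptr (as IMalloc::Alloc does on failure) if the request
    // would push live memory past the cap.
    void* Alloc(std::size_t cb) {
        if (cb > cap_ - inUse_) return nullptr;
        void* mem = std::malloc(sizeof(Header) + cb);
        if (mem == nullptr) return nullptr;
        Header* block = new (mem) Header{cb};
        inUse_ += cb;
        return block + 1;  // caller's memory starts just past the header
    }

    // Credits the block's size back, read from its header.
    void Free(void* p) {
        if (p == nullptr) return;
        Header* block = static_cast<Header*>(p) - 1;
        inUse_ -= block->size;
        std::free(block);
    }

    std::size_t InUse() const { return inUse_; }

private:
    // Over-aligned so that `block + 1` is suitably aligned for any type.
    struct alignas(std::max_align_t) Header { std::size_t size; };
    std::size_t cap_;
    std::size_t inUse_ = 0;
};
```

Storing the size in a header lets Free() credit the bytes back without being told the size, matching IMalloc::Free's single-pointer signature.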

Now for the dark side. IXmlReader provides the bare minimum needed for XML parsing. After all, it's XmlLite. I'm used to the luxury of the .NET XmlReader and miss utility methods such as GetAttribute(), MoveToAttribute(), MoveToContent(), ReadElementString(), ReadInnerXml(), ReadOuterXml(), ReadToDescendant(), Skip() etc. XmlLite doesn't implement them, keeping the API as minimal as possible. But I believe these methods, while being pure utilities, are essential to the very nature of pull XML parsing. Without them pull parsing degrades into pseudo-push parsing: you have to build a pseudo-push engine (I mean that "while (reader.Read())" loop with a switch inside) and hook up handlers for the nodes you are interested in, instead of reading the data you need directly. Not to mention that implementing these methods properly can be tricky and error-prone. I think I'll provide an IXmlReader helper class with those missing methods.
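As a taste of such a helper, here is a portable sketch of two of the missing methods, Skip() and ReadToDescendant(), written against a hypothetical Reader that replays a pre-built node list and exposes roughly what IXmlReader does (current node type, name, depth via GetDepth(), empty-element flag via IsEmptyElement()). The point is the depth bookkeeping that makes these methods easy to get wrong: an empty element like <key/> has no end tag to wait for. The semantics are modeled on the .NET methods of the same names as I understand them; this is not XmlLite code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class NodeType { Element, EndElement, Text };

struct Node {
    NodeType type;
    std::string name;      // element name (or the text for Text nodes)
    bool isEmpty = false;  // true for <x/>, which has no EndElement
    int depth = 0;         // filled in by Reader
};

// Hypothetical pull reader over a fixed node list; call Read() before
// Current(), as with IXmlReader.
class Reader {
public:
    explicit Reader(std::vector<Node> nodes) : nodes_(std::move(nodes)) {
        int d = 0;
        for (Node& n : nodes_) {
            if (n.type == NodeType::EndElement) --d;  // end tag sits at its start tag's depth
            n.depth = d;
            if (n.type == NodeType::Element && !n.isEmpty) ++d;  // children go one deeper
        }
    }
    bool Read() {
        if (next_ >= nodes_.size()) return false;
        cur_ = next_++;
        return true;
    }
    const Node& Current() const { return nodes_[cur_]; }

private:
    std::vector<Node> nodes_;
    std::size_t next_ = 0, cur_ = 0;
};

// Skips the current element with all its children and moves to the node
// after it. On any other node (including an empty element) it just reads.
bool Skip(Reader& r) {
    const Node& n = r.Current();
    if (n.type == NodeType::Element && !n.isEmpty) {
        const int depth = n.depth;
        while (r.Read()) {
            const Node& c = r.Current();
            if (c.type == NodeType::EndElement && c.depth == depth) break;  // matching end tag
        }
    }
    return r.Read();
}

// Moves to the next descendant element named `name`. If there is none,
// returns false; for a non-empty element the reader is then left on its end tag.
bool ReadToDescendant(Reader& r, const std::string& name) {
    const Node& start = r.Current();
    if (start.type != NodeType::Element || start.isEmpty) return false;  // no descendants
    const int depth = start.depth;
    while (r.Read()) {
        const Node& c = r.Current();
        if (c.type == NodeType::EndElement && c.depth == depth) return false;
        if (c.type == NodeType::Element && c.name == name) return true;
    }
    return false;
}
```

Both helpers rely only on depth and the empty-element flag, which is why an IXmlReader-based version needs GetDepth() and IsEmptyElement() rather than a stack of open tags.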

So meet XmlLite, the tiny but mighty third Microsoft XML library.

June 13, 2006

Bruce Eckel's general purpose XML manipulation library

Bruce Eckel doesn't like XML. But alas, it's everywhere and he has to deal with it. So, as you might expect, he goes and creates a "general purpose XML manipulation library called xmlnode" for Python. That should be easy, right? Just one class, no need for more. Alas, it doesn't ...

XSLT2/XPath2/XQuery1 fresh CRs

The W3C has released fresh Candidate Recommendation versions of XML Query 1.0, XSLT 2.0, XPath 2.0 and the supporting documents. No big changes: the xdt:* types have been moved to the xs:* namespace (damn XML Schema). See the new XQuery1/XPath2 type system below. Looks like XSLT2/XPath2/XQuery1 are moving fast toward Proposed ...

We all remember that the major arguments for Microsoft not implementing XSLT 2.0 were XQuery (they decided it was better) and XSLT2's draft status (so as not to repeat the WD-XSL story). Now that XQuery has been wiped out of .NET and XSLT2 is becoming a full Recommendation, what could the next argument against implementing it be? Probably XLinq. But as XLinq evolves, it becomes clear that XLinq doesn't really replace XSLT.

.NET provides amazing support for XSLT 1.0: developing and debugging in Visual Studio, plus one of the best XSLT processors, XslCompiledTransform. That's great, and that's why XSLT is everywhere nowadays. And you know what? We want more! XSLT 1.0 sucks, give us XSLT 2.0!

Here is the new XPath2/XQuery1 type system. I'd say it's definitely more elegant than before. Ready to go, I think.

June 12, 2006

MSDN Wiki

Hmmm, community-driven MSDN documentation... tempting. Microsoft has launched the MSDN Wiki Beta, a sort of wrapper around the MSDN documentation site that adds a "Community Content" section to the bottom of each MSDN page. Anybody can contribute any content to that section. Here is my test contribution to the "XslCompiledTransform Class ...