June 15, 2006

New Microsoft XML API - XmlLite

And you thought XML was done? No way. It's an alive and kicking technology. And here is one more proof: yet another new XML API from Microsoft - XmlLite. It's a native library for building high-performance, secure XML-based applications. The XmlLite library is small by design - it includes only a pull XML parser (a native analog of .NET's XmlReader), an XML writer (a native analog of .NET's XmlWriter) and an XML resolver (similar to .NET's XmlResolver). XmlLite is meant to be a small, simple, secure, standards-compliant but damn fast library for reading and writing XML. It's claimed to parse XML even faster than MSXML. What I found especially compelling is the similarity of the XmlLite API to .NET - no need to learn yet another way to read and write XML, it's a lite version of .NET's XmlReader/XmlWriter, but for native programming. It's "lite", so: no validation, very limited DTD processing (entity expansion and attribute defaults only), no ActiveX, no scripting languages, not thread-safe etc.

...

Why another XML API?

XmlLite doesn't use or link to MSXML, it's a separate standalone DLL. The reason it's a separate DLL and not a part of MSXML is probably the size of the MSXML DLL and its many dependencies, which not all applications are willing to tolerate. The latest msxml6.dll is 1.3 MB and it depends on mlang.dll, wininet.dll and urlmon.dll (about 700 KB each). XmlLite.dll is just 115 KB and depends on nothing.

How do I develop with XmlLite?

The XmlLite SDK is part of the "Microsoft® Windows® Software Development Kit (SDK) for Beta 2 of Windows Vista and WinFX Runtime Components", aka the Windows SDK. That of course doesn't mean XmlLite works only on Windows Vista (although it's expected to ship with Vista). It's a plain Win32 DLL you can work with even in Visual Studio 6. So - install the Windows SDK (don't forget to check the "Windows Vista Headers and Libs" option while installing). That will give you XmlLite.h, XmlLite.lib and the documentation. That's enough for compiling and linking your application. In order to run it you also need the XmlLite runtime - the DLL. Currently it only comes with the IE7 and Vista betas, but if you don't want to install either of these, here is a trick - download the latest IE7 installer, but don't run it. Unzip it instead and extract xmllitesetup.exe. This is the XmlLite runtime installer, which will install XmlLite.dll on your system.

The XmlLite reader is a pull-based (as opposed to SAX, which is push-based), non-caching, forward-only XML parser. If you are not familiar with pull-based parsing, read this. Pull XML parsing is so much easier to program with that I think everybody using SAX/MSXML should consider switching to XmlLite - now you've got an alternative which is not only faster but also better.
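
To make the pull/push difference concrete, here is a minimal portable C++ sketch. It is a toy model, not XmlLite or SAX: a pre-tokenized list of events stands in for the parser, and every name in it (Event, PullReader, pushParse, firstTitlePull, firstTitlePush) is made up for illustration. Both functions do the same job, returning the text of the first "title" element.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model only: a pre-tokenized event list stands in for a real parser.
enum class NodeKind { StartElement, EndElement, Text };

struct Event {
    NodeKind kind;
    std::string data;  // element name, or text content for Text events
};

// Push style (SAX-like): the parser owns the loop and calls the handler
// once for every event, whether the consumer cares about it or not.
void pushParse(const std::vector<Event>& events,
               const std::function<void(const Event&)>& handler) {
    for (const Event& e : events) handler(e);
}

// Pull style: the caller owns the loop and asks for the next event on demand.
class PullReader {
public:
    explicit PullReader(const std::vector<Event>& events) : events_(events) {}

    // Advances to the next event; returns false at end of input.
    bool read() {
        if (next_ >= events_.size()) return false;
        current_ = &events_[next_++];
        return true;
    }

    // Valid only after read() has returned true at least once.
    const Event& current() const {
        assert(current_ != nullptr);
        return *current_;
    }

private:
    const std::vector<Event>& events_;
    std::size_t next_ = 0;
    const Event* current_ = nullptr;
};

// Pull consumer: plain sequential code that stops reading once it has the answer.
std::string firstTitlePull(const std::vector<Event>& events) {
    PullReader reader(events);
    while (reader.read()) {
        if (reader.current().kind == NodeKind::StartElement &&
            reader.current().data == "title") {
            if (reader.read() && reader.current().kind == NodeKind::Text)
                return reader.current().data;
            return "";
        }
    }
    return "";
}

// Push consumer: same task, but the state lives in flags shared between
// callbacks, and the handler keeps being called after the answer is found.
std::string firstTitlePush(const std::vector<Event>& events) {
    bool inTitle = false;
    bool found = false;
    std::string result;
    pushParse(events, [&](const Event& e) {
        if (found) return;
        if (e.kind == NodeKind::StartElement && e.data == "title") {
            inTitle = true;
        } else if (e.kind == NodeKind::Text && inTitle) {
            result = e.data;
            found = true;
        } else {
            inTitle = false;
        }
    });
    return result;
}
```

The pull version reads top to bottom like the question it answers, while the push version has to remember where it is between callbacks. That bookkeeping is what grows out of hand in real SAX handlers.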

Here is my sample (you can find more samples in the Windows SDK documentation) of extracting a value out of an XML file. Sample XML document:

<config>    
    <key name="mykey" value="myval"/>
    <key name="foo" value="bar"/>
</config>

And I want to read the value of the key named "foo":
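#include "stdafx.h"
#include <atlbase.h>
#include <shlwapi.h>   //SHCreateStreamOnFile
#include <stdio.h>     //wprintf
#include <wchar.h>     //wcscmp
#include "xmllite.h"

#pragma comment(lib, "shlwapi.lib")
#pragma comment(lib, "xmllite.lib")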

int _tmain(int argc, _TCHAR* argv[])
{
  HRESULT hr;
  CComPtr<IStream> pFileStream;
  CComPtr<IXmlReader> pReader;
  XmlNodeType nodeType;
  const WCHAR* pName;
  const WCHAR* pValue;


  //Open XML document
  if (FAILED(hr = SHCreateStreamOnFile(L"config.xml", 
    STGM_READ, &pFileStream)))
  {
    wprintf(L"Error opening XML document, error %08.8lx", hr);
    return -1;
  }

  if (FAILED(hr = CreateXmlReader(__uuidof(IXmlReader), 
    (void**)&pReader, NULL)))
  {
    wprintf(L"Error creating XmlReader, error %08.8lx", hr);
    return -1;
  }

  if (FAILED(hr = pReader->SetInput(pFileStream)))
  {
    wprintf(L"Error setting input for XmlReader, error %08.8lx", hr);
    return -1;
  }

  while (S_OK == (hr = pReader->Read(&nodeType))) 
  {
    switch (nodeType)
    {
    case XmlNodeType_Element:
      if (FAILED(hr = pReader->GetQualifiedName(&pName, NULL)))                      
      {
        wprintf(L"Error reading element name, error %08.8lx", hr);
        return -1;
      }
      if (wcscmp(pName, L"key") == 0)
      {
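        //MoveToAttributeByName returns S_FALSE (which still passes
        //SUCCEEDED) when the attribute is missing, so test for S_OK
        if (S_OK == (hr = 
          pReader->MoveToAttributeByName(L"name", NULL)))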
        {
          if (FAILED(hr = pReader->GetValue(&pValue, NULL)))                      
          {
            wprintf(L"Error reading attribute value, error %08.8lx", hr);
            return -1;
          }
          if (wcscmp(pValue, L"foo") == 0) 
          {
            //That's an element we are looking for
            //S_FALSE here means the "value" attribute is missing
            if (S_OK != (hr = 
              pReader->MoveToAttributeByName(L"value", NULL)))
            {
              wprintf(L"Error reading attribute \"value\", error %08.8lx", hr);
              return -1;
            }
            if (FAILED(hr = pReader->GetValue(&pValue, NULL)))                      
            {
              wprintf(L"Error reading attribute value, error %08.8lx", hr);
              return -1;
            }
            wprintf(L"Key \"foo\"'s value is \"%s\"", pValue);
          }
        }    
      }
      break;
    }
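  }

  //Read() returns S_FALSE at the end of input; a failure code means a parse error
  if (FAILED(hr))
  {
    wprintf(L"Error parsing XML document, error %08.8lx", hr);
    return -1;
  }
  return 0;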
}

The XmlLite reader and writer work on an IStream instance. You can use the standard SHCreateStreamOnFile/CreateStreamOnHGlobal functions to read from a file or from memory. If you need something else, e.g. reading from a socket, the XmlLite SDK contains a handy sample of a class implementing IStream that you can start with.

Instances of XmlLite's XmlReader and XmlWriter are meant to be reusable - first you create an XmlReader/XmlWriter instance and then attach a stream to read from or write to using the SetInput() or SetOutput() methods. At any time you can reset the XmlReader/XmlWriter and start working with a new stream.

Security. Of course DTD processing is turned off by default, just like in .NET 2.0 and MSXML6. Besides that, XmlLite also supports fairly advanced security features such as limiting memory consumption, limiting maximum element depth and limiting entity expansion. The latter makes XmlLite immune to the notorious billion laughs attack - a 1 KB well-formed XML document that kills IE, Visual Studio and almost any other tool that tries to parse it and expand the laughing entities. Even with DTD support turned on, XmlLite just stops parsing when the entity expansion limit is reached. Cool, huh?
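
To show why an expansion limit defuses billion laughs, here is a small portable C++ sketch. It is not how XmlLite implements the limit (I haven't seen its internals), just the general idea: count every entity expansion against a budget and bail out when the budget is spent. The names EntityTable, makeLaughs and expandEntities are hypothetical, and the toy only understands "&name;" references inside plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Toy entity table: entity name -> replacement text (which may itself
// contain further &name; references).
using EntityTable = std::map<std::string, std::string>;

// Builds the billion-laughs shape: lol0 is "lol", and each lolN is ten
// references to lol(N-1), so lolN expands to 10^N copies of "lol".
EntityTable makeLaughs(int levels) {
    assert(levels >= 0);
    EntityTable table;
    table["lol0"] = "lol";
    for (int i = 1; i <= levels; ++i) {
        std::string body;
        for (int k = 0; k < 10; ++k)
            body += "&lol" + std::to_string(i - 1) + ";";
        table["lol" + std::to_string(i)] = body;
    }
    return table;
}

// Expands all &name; references in `text`, recursively. Every expansion
// increments `used`; once `used` exceeds `limit` the function throws
// instead of continuing to grow the output.
// (Assumption: predefined entities such as &amp; are not handled here.)
std::string expandEntities(const std::string& text, const EntityTable& entities,
                           std::size_t limit, std::size_t& used) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t amp = text.find('&', pos);
        if (amp == std::string::npos) {
            out.append(text, pos, std::string::npos);  // no more references
            break;
        }
        const std::size_t semi = text.find(';', amp);
        if (semi == std::string::npos)
            throw std::runtime_error("unterminated entity reference");

        out.append(text, pos, amp - pos);  // literal text before the reference
        const std::string name = text.substr(amp + 1, semi - amp - 1);
        const auto it = entities.find(name);
        if (it == entities.end())
            throw std::runtime_error("undefined entity: " + name);

        if (++used > limit)
            throw std::runtime_error("entity expansion limit exceeded");
        out += expandEntities(it->second, entities, limit, used);
        pos = semi + 1;
    }
    return out;
}
```

With two levels, "&lol2;" expands to 100 copies of "lol" and uses 111 expansions, well within a budget of 1000. With nine levels the same call would try to produce a billion laughs, but the budget runs out after the 1001st expansion and the function throws long before memory becomes a problem.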

One more cool feature - XmlLite can read XML fragments, which is a recommended way to store frequently updated data such as log files.

And that's not all. What about a random access mode in which the XmlLite parser stores not attribute values, but attribute positions in the stream instead? Non-extractive XML parsing comes true. This can seriously reduce memory consumption in some scenarios. Of course the underlying stream must be seekable for this.

Then there is reading attribute or element values in chunks, IXmlResolver for total control over the resolving of external entities, and IMalloc to control reader/writer memory allocation. That's all great stuff.

Now about the dark side. IXmlReader provides the bare minimum needed for XML parsing. After all, it's XmlLite. I'm used to the luxury of .NET's XmlReader and miss utility methods such as GetAttribute(), MoveToAttribute(), MoveToContent(), ReadElementString(), ReadInnerXml(), ReadOuterXml(), ReadToDescendant(), Skip() etc. XmlLite doesn't implement those, trying to keep the API as small as possible. But I believe these methods, while being pure utility, are essential to the very nature of pull XML parsing. Without them pull parsing erodes into pseudo-push parsing - you gotta build a pseudo-push engine (I mean that "while (reader.Read())" loop with a switch inside) and hook up handlers for the nodes you are interested in, instead of directly reading the data you need. Not to mention that implementing these methods properly can be tricky and error-prone. I think I'll provide an IXmlReader helper class with those missing methods.
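
As a rough idea of what such helpers involve, here is a portable C++ sketch built on a toy reader that offers nothing but read() and access to the current node. It is not the IXmlReader helper class itself and uses no XmlLite calls. MinimalReader, Node, skipSubtree, readToDescendant and readElementText are hypothetical names, and the toy assumes every element has an explicit end tag (a real reader reports empty elements like <a/> without one, which is one of the tricky bits).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class NodeKind { StartElement, EndElement, Text };

struct Node {
    NodeKind kind;
    std::string data;  // element name, or text content for Text nodes
};

// Hypothetical stand-in for a bare-minimum pull reader.
class MinimalReader {
public:
    explicit MinimalReader(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

    // Advances to the next node; returns false at end of input.
    bool read() {
        if (next_ >= nodes_.size()) return false;
        ++next_;
        return true;
    }

    // Current node; valid only after read() has returned true at least once.
    const Node& node() const {
        assert(next_ > 0);
        return nodes_[next_ - 1];
    }

private:
    std::vector<Node> nodes_;
    std::size_t next_ = 0;
};

// Skip()-like helper: on a start element, consumes the whole subtree and
// stops ON the matching end element (unlike .NET's Skip(), which moves past
// it). Returns false if the input ends before the element is closed.
bool skipSubtree(MinimalReader& r) {
    if (r.node().kind != NodeKind::StartElement) return true;
    int depth = 1;
    while (depth > 0) {
        if (!r.read()) return false;
        if (r.node().kind == NodeKind::StartElement) ++depth;
        else if (r.node().kind == NodeKind::EndElement) --depth;
    }
    return true;
}

// ReadToDescendant()-like helper: from a start element, advances to the
// first descendant start element called `name`. If there is none, stops on
// the current element's end tag and returns false.
bool readToDescendant(MinimalReader& r, const std::string& name) {
    if (r.node().kind != NodeKind::StartElement) return false;
    int depth = 1;
    while (r.read()) {
        const Node& n = r.node();
        if (n.kind == NodeKind::StartElement) {
            if (n.data == name) return true;
            ++depth;
        } else if (n.kind == NodeKind::EndElement) {
            if (--depth == 0) return false;
        }
    }
    return false;
}

// ReadElementString()-like helper: from a start element with text-only
// content, returns the concatenated text and stops on the first non-text
// node (normally the element's end tag).
std::string readElementText(MinimalReader& r) {
    std::string text;
    while (r.read() && r.node().kind == NodeKind::Text) text += r.node().data;
    return text;
}
```

With helpers like these the consumer can say what it wants directly, e.g. move to the "price" element with readToDescendant() and then call readElementText(), instead of running a Read() loop with a switch. Even in this toy, all the depth counting shows where the off-by-one bugs would hide.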

So meet XmlLite, the tiny but mighty third Microsoft XML library.