Signs on the Sand: August 13, 2006 Archives

August 13, 2006

Introducing AdSense Watch Toolbar

This Lebanon war has had a terrible impact on my personal productivity. Too much TV, too much internet, too much pain, too little work. Hope it ends soon. Anyway, I decided I needed some short victorious war, oops, I mean a small interesting project, to get me back on track. I saw the AdSense Notifier plugin for Firefox the other day and thought: cool, but I don't run Firefox 100% of the time, and I want it on the Windows taskbar, not in a browser status bar. So I did a spike project and got it working in just one night. Then I spent another two weeks polishing it. Ahhhhh, the joy of good old pure win32, MFC-free, just Windows and you and nothing in between. Unmanaged C++, LPTSTR, HWND, messages, win32 multithreading - sweet, I'm in The Old New Thing world again. The result is AdSense Watch Toolbar.

AdSense Watch is a Windows Explorer toolbar (a desk band, technically speaking), usually docked to the Windows taskbar. AdSense Watch displays your current "Google AdSense for content" report - Page impressions, Clicks, Page CTR, Page eCPM and Earnings. The data is updated automatically or on demand. More info on AdSense Watch Toolbar usage can be found at the XML Lab site.

The latest AdSense Watch installation is available at the XML Lab Downloads page. The latest version is currently 1.0b and, like any other beta software, AdSense Watch is currently free (but not open-source). AdSense Watch is written in C++ in Visual Studio 2005 and was tested on Windows 2000, Windows XP Pro and Windows Server 2003.

Any suggestions, bug reports and comments are welcome at the AdSense Watch Toolbar forum.

Sorry in advance to Allen G Holman that AdSense Watch looks similar to his great AdSense Notifier. Basic things are usually similar in any environment...

I wasn't aware of the Google AdSense API (and I'm still unaware of what it provides), so I implemented AdSense login basically using a screen-scraping technique. I tried to make the login code as robust as possible and I think I succeeded; at least AdSense Watch survived the latest changes in the Google AdSense login procedure that AdSense Notifier stumbled upon. As for report data, AdSense Watch uses CSV data for reliability.

Btw, AdSense Watch Toolbar is a Windows Explorer Desk Band, but from an implementation perspective it's not much different from an Internet Explorer toolbar, so with minimal changes (mostly WRT registration) I can actually make an IE toolbar version of AdSense Watch.

I want to investigate what the Google AdSense API offers and add more features in the next version if there is any interest in this tool.

Anyway, download AdSense Watch Toolbar for free and enjoy. Any comments are welcome!

...

6:45 PM | Comments (1) | TrackBack | #AdSense

Streaming XML filtering in Java and .NET

XML processing is changing. In Java, SAX is slowly but steadily going away, or at least moving down to the low level, and nowadays Java with StAX is not so different from .NET XmlReader. I found it pretty interesting to compare approaches to streaming XML filtering in Java and .NET. Filtering is a very useful technique for transforming XML on the fly, while the XML is being read. Filtering out parts or branches the application isn't interested in processing is a great way to simplify XML reading code, which is especially important in streaming XML processing, which usually tends to be more complicated than in-memory (XML DOM) processing.

Let's say we have this dummy XML and we want to extract "interesting data" out of it.

<root>
    <ignoreme>junk</ignoreme>
    <data>interesting data</data>
</root>

The StAX API has a dedicated built-in facility for filtering: StreamFilter/EventFilter (as often happens in the Java world, StAX is a bit overengineered and actually contains two APIs, an iterator-style one and a cursor-based one). Here is how it looks in Java with wonderful StAX:

XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader reader = xif.createXMLStreamReader(
    new StreamSource("foo.xml"));
reader = xif.createFilteredReader(reader, new StreamFilter() {
    private int ignoreDepth = 0;

    public boolean accept(XMLStreamReader reader) {
        if (reader.isStartElement()
            && reader.getLocalName().equals("ignoreme")) {
            ignoreDepth++;
            return false;
        } else if (reader.isEndElement()
            && reader.getLocalName().equals("ignoreme")) {
            ignoreDepth--;
            return false;
        }
        return (ignoreDepth == 0);
    }
});
// move to <root>
moveToNextTag(reader);
// move to <data>
moveToNextTag(reader);
// read data
System.out.println(reader.getElementText());
reader.close();

Where moveToNextTag() is a utility method doing what its name says:

do {
    reader.next();
} while (!reader.isStartElement() && !reader.isEndElement());

XMLStreamReader actually provides a nextTag() method, but weirdly enough it skips only whitespace, comments and processing instructions; on any other text (even text filtered out by an underlying filter!) it throws an exception.
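To see that nextTag() behavior in isolation, here is a minimal sketch without any filter involved. The class and method names (NextTagDemo, tagAfterRoot) are made up for illustration, and the XML is fed from a string rather than from foo.xml.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NextTagDemo {
    // Hypothetical helper: returns the local name of the tag that nextTag()
    // lands on after <root>, or "XMLStreamException" if nextTag() throws.
    static String tagAfterRoot(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // START_DOCUMENT -> <root>
                reader.nextTag(); // whitespace is skipped, other text is an error
                return reader.getLocalName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return "XMLStreamException";
        }
    }

    public static void main(String[] args) {
        // Only whitespace between the tags: nextTag() reaches <data>.
        System.out.println(tagAfterRoot("<root> <data>x</data></root>"));
        // Real text between the tags: nextTag() throws.
        System.out.println(tagAfterRoot("<root>junk<data>x</data></root>"));
    }
}
```

That is why the sample above falls back to a hand-written loop around next() instead of calling nextTag().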

Now the .NET code. Unlike StAX, .NET doesn't provide any facility for XML filtering, so the usual approach is to implement a filter as a full-blown custom XmlReader and then chain it to another XmlReader instance. As I said before, implementing a custom XmlReader still sucks even in .NET 2.0 (holy cow - 26 abstract methods, or deriving from the legacy nonconformant XmlTextReader). So I'm going to use the XmlWrappingReader helper I recommended earlier:

public class Test
{
    private class XmlFilter : XmlWrappingReader
    {
        public XmlFilter(string uri)
            : base(XmlReader.Create(uri)) { }

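        public override bool Read()
        {
            bool baseRead = base.Read();
            // Skip() already leaves the reader on the node that follows
            // </ignoreme>, so don't Read() again (that would drop that node).
            // Loop in case the following node is another <ignoreme>.
            while (baseRead && NodeType == XmlNodeType.Element &&
                LocalName == "ignoreme")
            {
                Skip();
                baseRead = !EOF;
            }
            return baseRead;
        }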
    }

    static void Main(string[] args)
    {
        XmlFilter filter = new XmlFilter("../../foo.xml");
        XmlReader r = XmlReader.Create(filter, null);
        //move to <root>
        r.MoveToContent();
        //Move to <data>
        MoveToNextTag(r);
        Console.WriteLine(r.ReadString());
    }

    private static void MoveToNextTag(XmlReader r)
    {
        do
        {
            r.Read();
        } while (r.NodeType != XmlNodeType.Element &&
                 r.NodeType != XmlNodeType.EndElement);
    }
}

Amazingly similar, but not as cool because .NET 2.0 lacks anonymous classes (expected in .NET 3.0).

In short - what I like in the Java version: built-in support for XML filtering and anonymous classes. What I don't like in the Java version: the filter can be called more than once at the same position, which means a real filter implementation must support that scenario; a very ascetic API with too few utility methods. What I like in the .NET version: lots of useful methods in XmlReader such as Skip(), ReadToXXX() etc. What I don't like: no built-in support for filters and no anonymous classes.
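For what it's worth, a Skip()-like utility is easy to hand-roll on top of XMLStreamReader. The sketch below is only an illustration: skipElement and extractData are hypothetical names, not part of StAX. It skips the <ignoreme> subtree inside the reading loop, so no state lives in a filter and the repeated accept() calls stop being a concern.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SkipElementDemo {
    // Hypothetical helper (not part of StAX): expects the reader to be on a
    // START_ELEMENT and leaves it on the matching END_ELEMENT. Note that
    // .NET's Skip() goes one step further and lands on the node after it.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Returns the text of the first <data> element that is not inside an
    // <ignoreme> subtree, or null if there is none.
    static String extractData(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (name.equals("ignoreme")) {
                        skipElement(reader);
                    } else if (name.equals("data")) {
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractData(
            "<root><ignoreme><data>junk</data></ignoreme>"
            + "<data>interesting data</data></root>"));
    }
}
```

The trade-off is that the skipping logic now sits in the consumer rather than in a reusable filter, which is pretty much what the .NET version does with Skip().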

Besides, if you work with StAX you can readily work with .NET XmlReader, and the other way around. Such unification saves developers hours of learning. I wonder if a streaming XML processing API should be standardized?

...

12:03 AM | Comments (1) | TrackBack | #XML