Signs on the Sand: January 2008 Archives

January 28, 2008

Testing XSLT

The state of the art of XSLT testing, in a simple, succinct format, by Tony Graham.

Creating a working stylesheet may seem like an end in itself, but once it’s written you may want it to run faster or you may not be sure that the output is correct (And if you are sure, how sure are you?).
Profilers, unit test frameworks, and other tools of conventional programming are similarly available for XSLT but are not widely used. This presentation surveys the available tools for ensuring the quality of your XSLT.

10:18 PM | #XML, #XSLT

January 27, 2008

Windows of Tel-Aviv

On Dizengoff street, Tel-Aviv.

10:36 PM | #Photos

Crowdsourcing in Action: The Library of Congress Photos on Flickr

The Library of Congress has launched an interesting pilot project with Flickr, which can be characterized as a crowdsourcing experiment.

They have uploaded 3115 copyright-free photos from two of their most popular collections, and in return they hope the Flickr community will enhance the collections by labeling and commenting on the images:

We want people to tag, comment and make notes on the images, just like any other Flickr photo, which will benefit not only the community but also the collections themselves. For instance, many photos are missing key caption information such as where the photo was taken and who is pictured. If such information is collected via Flickr members, it can potentially enhance the quality of the bibliographic records for the images.

Crowdsourcing is a special case of human-based computation, a technique for solving problems that computers are simply incapable of solving (or, if you wish, problems that humans cannot yet program computers to solve). The simple idea behind human-based computation is to outsource certain steps to humans. And if you outsource them to the crowd, you get crowdsourcing:

Crowdsourcing is a neologism for the act of taking a task traditionally performed by an employee or contractor, and outsourcing it to an undefined, generally large group of people, in the form of an open call. For example, the public may be invited to develop a new technology, carry out a design task, refine an algorithm or help capture, systematize or analyze large amounts of data (see also citizen science).

Think about tagging images (Google Image Labeler), answering arbitrary human questions (Yahoo! Answers), selecting the most interesting stories (Digg, reddit), inventing better algorithms (Netflix prize) or even monitoring the Texas-Mexican border.

Btw, did you know that Google didn't invent Google Image Labeler, but licensed Luis von Ahn's ESP Game? And that while the crowd is working for free on Google Image Labeler, improving Google's image search, Google never shares the collected tags? I don't think that's fair. Moreover, I think that's unfair. Results of crowdsourcing must be available to the crowd, right?

Anyway, how is the pilot going? From the Flickr blog we learn the first results:

In the 24 hours after we launched, you added over 4,000 unique tags across the collection (about 19,000 tags were added in total, for example, “Rosie the Riveter” has been added to 10 different photos so far). You left just over 500 comments (most of which were remarkably informative and helpful), and the Library has made a ton of new friends (almost overwhelming the email account at the Library, thanks to all the “Someone has made you a contact” emails)!

That was after 24 hours. Today, 10 days later, the results (according to my little script) are: 2440 comments, 570 notes, 13077 unique tags.

That's almost five times as many comments and more than three times as many unique tags. On average, that's 0.8 comments and 4.2 unique tags per image. Not bad, but not very impressive either. It will be interesting to check again in a month to see what the trend is.
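The comparison is easy to check with a few lines of Python. This is only a sketch: the 24-hour baselines are rounded from the Flickr blog's "just over 500" comments and "over 4,000" unique tags, so the ratios are approximate. It also shows that "times as many" and "percent more" are different numbers.

```python
# Figures quoted in the post; the 24-hour values are rounded (an assumption).
PHOTOS = 3115
after_24h = {"comments": 500, "unique_tags": 4000}
after_10d = {"comments": 2440, "unique_tags": 13077}

def growth(before, after):
    """Return (ratio, percent_increase) between two counts."""
    ratio = after / before
    return ratio, (ratio - 1) * 100

for key in after_24h:
    ratio, pct = growth(after_24h[key], after_10d[key])
    print(f"{key}: {ratio:.1f}x as many ({pct:.0f}% more), "
          f"{after_10d[key] / PHOTOS:.1f} per image")
```

With these rounded baselines, comments grew roughly 4.9 times (about 390% more) and unique tags roughly 3.3 times (about 230% more).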

It will also be interesting to see when the bad guys start to abuse it. Google Image Labeler was abused less than a month after its launch. And Google Image Labeler is protected from abuse by accepting only tags that both players select independently, while on Flickr there is no protection whatsoever.
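The agreement rule can be illustrated in a few lines of Python. This is a simplified sketch of the idea, not Google's or Luis von Ahn's actual implementation; the function name and the sample labels are made up for illustration.

```python
def agreed_tags(player_a, player_b):
    """Keep only the labels that both players entered (case-insensitive)."""
    def normalize(tags):
        return {t.strip().lower() for t in tags}
    return normalize(player_a) & normalize(player_b)

# Hypothetical guesses for one image; the spam label comes from one player only.
a = ["Jerusalem", "soldiers", "gate", "buy-cheap-pills"]
b = ["gate", "jerusalem", "crowd"]
print(sorted(agreed_tags(a, b)))  # ['gate', 'jerusalem']
```

A label entered by a single player never gets through, which filters out random noise, but the rule does nothing against two players who coordinate their answers.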

I also figured out that while these 3115 photos were posted to Flickr, there are about 1 million others available online in the Library of Congress's own Prints & Photographs Online Catalog, which is really astounding. Check out this picture of General Allenby's entrance into Jerusalem back in 1917:

Scanned from b&w film copy negative, no known restrictions on publication, freely available as uncompressed tiff (1,725 kilobytes). Now that's real wow.

12:30 PM | #Photos, #Web