April 21, 2008

blogs.asia

No, I don't own it. But with your help, together we can win it.

The blogs.asia domain name received more than one application (one of them was mine) during the .asia landrush period and will be auctioned via the dotasia.pool.com site soon (most likely in a couple of weeks). This will be a closed auction: only those who sent an application during the landrush period will be able to participate.

Now, I don't know how many people wanted this domain or how serious they are about bidding for it. I tried to register blogs.asia just for fun and will probably lose the auction. So if anybody has ideas about what blogs.asia could become and is willing to spend some money on it, drop me a line and we can try to get it together.

April 16, 2008

Most Popular Words 2008 (Google, Live)

I was doing some Web popularity research and found a very cool data set collected by Philipp Lenssen back in 2006 and 2003. It is basically the Google page count for 27000 English vocabulary words.

I decided to repeat the process on a wider word set via at least two search engines (Google and Live Search). So I combined Philipp's 27000+ word vocabulary with the English index of Wiktionary (a wiki-based open content dictionary) and got a quite comprehensive 74000+ word vocabulary that reflects contemporary English usage on the net. Then I collected the page count reported by Google and Live Search for each word.
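For anyone who wants to try the same kind of merge, here is a minimal sketch of combining two word lists into one de-duplicated vocabulary. The function name, the lowercasing and whitespace trimming, and the tiny sample lists are all my assumptions for illustration, not the exact procedure used for the real 74000+ word set.

```python
def merge_vocabularies(*word_lists):
    """Merge several word lists into one sorted, de-duplicated vocabulary."""
    merged = set()
    for words in word_lists:
        for word in words:
            # Assumption: normalize by trimming whitespace and lowercasing.
            cleaned = word.strip().lower()
            if cleaned:
                merged.add(cleaned)
    return sorted(merged)


# Tiny stand-in lists (hypothetical); the real inputs were far larger.
lenssen_words = ["the", "Twitter", "pod"]
wiktionary_words = ["pod", "nun ", "the", "paladin"]

vocabulary = merge_vocabularies(lenssen_words, wiktionary_words)
print(vocabulary)
```

Using a set keeps the merge simple: duplicates between the two sources collapse automatically, and sorting at the end gives a stable order for the later page count lookups.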

And here are some visualizations. Unfortunately, while Swivel can do great interactive visualizations including clouds, it only supports static graphs for embedding. So don't hesitate to click on the graphs to see a better visualization (e.g. a cloud for the top 100 words).

Top 30 most popular words by Google, Live (numbers are in billions):
Most Popular Words (Google version)    Most Popular Words (Live version)

As expected, the top is occupied by common English words and common internet-related nouns.

Top 30 most popular words by Google vs Live:

 Most Popular Words (Google vs Live)

Top 30 gainers (Google, 2006 to 2008). Good to see a 48x page count gain for "twitter"; the rest I cannot explain. Can you?

oracular x 163.6
planchette x 153.7
newsy x 93.5
posse x 81.7
nymphet x 75.2
jewelelry x 65.6
twitter x 48.6
paling x 48.2
waylain x 45.2
outmatch x 45.2
outrode x 41.6
pod x 41.0
phizog x 35.6
sinology x 29.9
overdrew x 26.7
multistorey x 26.5
nonstick x 25.6
nun x 25.4
pedicure x 24.8
pillory x 24.8
panty x 24.3
outridden x 24.0
nip x 23.2
naturism x 23.2
organdy x 23.0
piccolo x 22.0
paladin x 21.6
notability x 21.2
breadthways x 20.9
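
A gain factor like the ones above can be computed as the newer page count divided by the older one. Here is a minimal sketch under that assumption; the function name and the counts in the sample dictionaries are made up for illustration and are not the real 2006 or 2008 numbers.

```python
def top_gainers(old_counts, new_counts, limit=30):
    """Return (word, gain factor) pairs sorted by gain, largest first."""
    gains = []
    for word, old in old_counts.items():
        new = new_counts.get(word)
        # Skip words missing from the newer set or with a zero old count,
        # since the ratio is undefined for them.
        if new is not None and old > 0:
            gains.append((word, new / old))
    gains.sort(key=lambda pair: pair[1], reverse=True)
    return gains[:limit]


# Hypothetical page counts, for illustration only.
counts_2006 = {"twitter": 1000, "pod": 2000, "the": 5000, "rare": 0}
counts_2008 = {"twitter": 48600, "pod": 82000, "the": 6000, "rare": 10}

for word, factor in top_gainers(counts_2006, counts_2008, limit=2):
    print(f"{word} x {factor:.1f}")
```

A plain ratio favors words that had very small counts in the older data set, which may be part of why some obscure words rank so high.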

And finally, the top 10 longest words along with their page counts (Google, 2008):

<w c="1460">tetaumatawhakatangihangakoauaotamateaurehaeaturipukapihimaungahoronukupokaiwhenuaakitanarahu</w>
<w c="5620">taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu</w>
<w c="60">methionylglutaminylarginyltyrosylglutamylserylleucylphenylalanylal...serine</w>
<w c="62300">llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch</w>
<w c="20100">taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu</w>
<w c="285">aequeosalinocalcalinosetaceoaluminosocupreovitriolic</w>
<w c="69000">pneumonoultramicroscopicsilicovolcanoconiosis</w>
<w c="1010">hepaticocholangiocholecystenterostomies</w>
<w c="18">hepaticocholangiocholecystenterostomy</w>
<w c="74500">hippopotomonstrosesquippedaliophobia</w>

Unsurprisingly, the longest word is still the 92-letter name of a hill in New Zealand; this one is hard to beat.
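
If you want to pull out the longest words yourself, here is a minimal sketch that reads `<w c="...">word</w>` entries like the ones listed above and ranks them by length. The `<words>` root element, the function name, and the short "pod" entry with its count are my assumptions for the example; the layout of the actual raw file may differ.

```python
import xml.etree.ElementTree as ET

# Small sample in the <w c="count">word</w> shape shown in the list above.
# The <words> root and the "pod" entry are hypothetical.
SAMPLE = """<words>
  <w c="69000">pneumonoultramicroscopicsilicovolcanoconiosis</w>
  <w c="18">hepaticocholangiocholecystenterostomy</w>
  <w c="74500">hippopotomonstrosesquippedaliophobia</w>
  <w c="100">pod</w>
</words>"""


def longest_words(xml_text, limit=10):
    """Return (word, page count) pairs, longest words first."""
    root = ET.fromstring(xml_text)
    entries = [(w.text, int(w.get("c"))) for w in root.iter("w")]
    entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    return entries[:limit]


for word, count in longest_words(SAMPLE, limit=3):
    print(len(word), count, word)
```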

 

The raw data sets (page counts for 74000+ words) are available in XML format and also on Swivel (Google version, Live version), where you can play with them, visualizing and comparing in your own way. Can you come up with any more interesting visualizations or comparisons for this data set? Enjoy.