09.04.03
The Ghost Blogs of Tibet
Jenny, 22, claims to lead an angst-filled life in the Arctic Ocean (80.46 N, 36.87 E), but she really lives near Winston-Salem (36.87 N, 80.46 W).
Musically inclined shepherds in the desolate highlands of Kachi, China, might be extremely excited to learn that it's only a short camel trek to Jazz Guitar Resources, conveniently located near the Kyrgyz border, right next to the Kwik-E-Mart (40.5 N, 76.5 E). How sad they will be to find that JGR is really somewhere near Harrisburg, Pennsylvania (40.5 N, 76.5 W).
In fact, a look at the map on the GeoURL homepage reveals a dense packing of phantom American weblogs out in Tibet and the Central Asian steppes, a region not known for its lively expatriate community or access to broadband.
Latitude/longitude coordinates form a beautiful synergy with Murphy's Law. In the GeoURL scheme, for example, there are seven ways to get your coordinates wrong, yet still have them be valid. You can give the wrong sign for latitude or longitude (or both), or list the coordinates in the wrong order, or do some of each. The sign problem is especially subtle - latitude and longitude are often shown using unsigned numbers and the letters N, S, E, W to indicate hemisphere, so it's very easy to forget to add a minus sign if you live below the equator, or in the New World.
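
For the skeptical, here is a minimal sketch of where the number seven comes from: two possible sign flips and one possible swap give eight arrangements, one of which is the right answer. The function name `wrong_variants` and the range check are my own illustration, not anything GeoURL itself provides.

```python
from itertools import product

def wrong_variants(lat, lon):
    """Every valid-looking (lat, lon) pair reachable by flipping signs
    and/or swapping the two numbers, excluding the correct pair."""
    seen = set()
    for swapped, lat_sign, lon_sign in product((False, True), (1, -1), (1, -1)):
        first, second = (lon, lat) if swapped else (lat, lon)
        candidate = (lat_sign * first, lon_sign * second)
        # A pair only passes as "valid" if it fits the usual ranges.
        if abs(candidate[0]) <= 90 and abs(candidate[1]) <= 180:
            seen.add(candidate)
    seen.discard((lat, lon))
    return sorted(seen)

jenny = (36.87, -80.46)  # near Winston-Salem, as given in the post
mistakes = wrong_variants(*jenny)
print(len(mistakes))               # 7
print((80.46, 36.87) in mistakes)  # True: the Arctic Ocean listing
```

One caveat the sketch makes visible: if your longitude is beyond 90 degrees in either direction, the swapped versions fail the latitude range check, so fewer than seven of the mistakes survive as valid coordinates.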

Some of the errors are easy to spot. In a previous post, I described a suspicious cluster of blogs in the Horn of Africa, many of them in the Indian Ocean, which turned out to be a latitude/longitude transposition error made by some German and Czech bloggers. Finding a cluster of red dots in water was an easy tip-off. The inverted USA in Central Asia is similarly obvious, partly because the area in question is so sparsely inhabited, and partly because the reversed blogs actually form a mirror image of the U.S. East Coast.
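
The mirror-image observation suggests a crude check, sketched here under loud assumptions: the bounding box for the eastern U.S. is a rough guess of mine, and `suspect_mirrored` is a hypothetical helper, not part of any GeoURL tool. It flags points that sit outside the box but land inside it once the longitude sign is flipped.

```python
# Rough bounding box for the eastern U.S. These limits are an assumption
# made for illustration, not a surveyed boundary.
EAST_US = {"lat": (25.0, 48.0), "lon": (-90.0, -66.0)}

def in_box(lat, lon, box):
    """True if the point falls inside the latitude/longitude box."""
    return (box["lat"][0] <= lat <= box["lat"][1]
            and box["lon"][0] <= lon <= box["lon"][1])

def suspect_mirrored(points, box=EAST_US):
    """Points outside the box that fall inside it when longitude is negated."""
    return [(lat, lon) for lat, lon in points
            if not in_box(lat, lon, box) and in_box(lat, -lon, box)]

sample = [
    (40.5, 76.5),    # Jazz Guitar Resources as listed
    (40.5, -76.5),   # the same site with the sign corrected
    (80.46, 36.87),  # Jenny: a swap plus a sign flip, so this check misses it
]
print(suspect_mirrored(sample))  # [(40.5, 76.5)]
```

A flag here is only a hint. A site that really is in Central Asia would be flagged just the same, so this narrows down where to look and does not prove an error.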
But other cases are not so easy to disambiguate. A single pair of coordinates can give locations in Sardinia, Portugal, Tanzania, or Brazil, depending on how they are arranged. Even a language algorithm (which has its own assumptions and potential for error) won't help distinguish a Portuguese blog from a Brazilian one. European or African blogs close enough to the prime meridian can flip longitude with no real chance of detection.
But enough talk. Let's get to the bullet points.
Here's what I've learned this weekend about geographical markup:
Now for the awards portion of the post:
Starc.deviantart.com wins the Dick Cheney Prize for Most Undisclosed Location - the META tag puts Starc somewhere in Chad, the first user profile on the page says Starc is in Canada, and the second user profile says that Starc lives in Texas.
DeviantART.com itself wins the overall Most Useless Geodata award, for putting half its American users in Tibet and Central Asia.
Fourmilab wins the I Can't Believe This is Free award for best geographical website.
Further honors will go to anyone who can write in with more interesting examples of transpositions in the GeoURL data set, or point me to tools that can easily generate inverted-longitude and -latitude maps for countries and regions, to aid in the hunt for phantom blogs.
Finally, I should make it clear that I don't mean this post to heap dirt on the GeoURL project. This kind of stuff happens wherever there are many users and a potential for error. GeoURL has a large enough data set to make these patterns visible, and maps are something we can all understand. But the deeper point is that we are all fallen in the eyes of the metadata god.
5:58 PM
Maciej Ceglowski
maciej @ ceglowski.com