Sunday, 28 August 2011

A brief analysis of Yahoo captchas

Captchas, initially a huge annoyance, are now generally recognized as a necessary evil. They stop bots from abusing your services, and there are a lot of interesting variants to use. The biggest is Google's reCAPTCHA, which is so popular that even Microsoft uses it occasionally. Today my attention is on Yahoo's implementation. You'll know them; they look like this:

In a nutshell: I had to type a few of these lately, and the character distribution didn't look quite right. I grabbed a hundred captchas, laboriously typed them out, and broke the results down by character.
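A tally like this is easy to reproduce once the captchas are typed out. The sketch below assumes the transcriptions are plain strings (the sample strings are made up, not the hundred captchas from this post) and counts each character, plus the average length:

```python
from collections import Counter

def tally_characters(transcriptions):
    """Count how often each character appears across typed-out captchas."""
    counts = Counter()
    for text in transcriptions:
        counts.update(text.strip())
    return counts

def average_length(transcriptions):
    """Mean captcha length in characters."""
    return sum(len(t.strip()) for t in transcriptions) / len(transcriptions)

# Made-up sample transcriptions, not the data from this post.
samples = ["Hd7eTz3", "pVn28sA", "rFt5cJy4"]
for char, n in tally_characters(samples).most_common(5):
    print(char, n)
print(f"average length: {average_length(samples):.1f}")
```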

What you can't see here: Yahoo works with the traditional 'random combinations of letters and numbers' form of captcha. They use at least three different fonts, which are then physically skewed in a variety of ways. There's no additional visible interference between you and the letters, and the average length is 7.2 characters.

What you can see: Yahoo captchas use a relatively small subset of alphanumeric characters. A, B, F, G, H, J, L, M, T and V appear only in uppercase, while c, d, e, n, p, r, s, t, y, q and z appear only in lowercase. Of the digits, only 2, 3, 4, 5, 6, 7 and 8 appear. This leaves 8 alphanumeric characters completely unrepresented: i, k, o, q, x, 1, 9 and 0.
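Given the set of characters seen across the transcriptions, this kind of split falls out of a few set checks. This is a minimal sketch; `classify` is a hypothetical helper, and like the list above it treats "unrepresented" case-insensitively:

```python
import string

def classify(observed):
    """Hypothetical helper: split observed characters into case-only letters
    and alphanumerics that never appear in any case."""
    observed = set(observed)
    upper_only = sorted(c for c in string.ascii_uppercase
                        if c in observed and c.lower() not in observed)
    lower_only = sorted(c for c in string.ascii_lowercase
                        if c in observed and c.upper() not in observed)
    # Fold case so 'A' and 'a' count as the same character when looking for gaps.
    folded = {c.lower() for c in observed}
    missing = sorted(c for c in string.ascii_lowercase + string.digits
                     if c not in folded)
    return upper_only, lower_only, missing

upper_only, lower_only, missing = classify("ABdez23")
print("uppercase only:", upper_only)
print("lowercase only:", lower_only)
print("never seen:", missing)
```

A letter seen in both cases lands in neither of the first two lists, which makes an inconsistent transcription easy to spot.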

Most of these seem to be omitted to avoid confusion. O, o and 0 are easily mistaken for one another, so all are avoided, and the same goes for l/1 and K/X. Additionally, some two-character combinations that look similar to existing characters are avoided.
In this example the letter d is very easily mistaken for either 'cl' or 'ol' due to the font. However, c, l and o never appear in the captchas, presumably for this reason. The letter p suffers similarly, while B and 8 manage to escape despite sometimes being difficult to distinguish.

I'm not entirely sure of the strategy here. They're deliberately obfuscating the word by overlapping the characters, but at the same time dramatically reducing the number of characters that could be present. By cutting the alphanumeric character set from 62 to 28, they make it easier for OCR to defeat the very obfuscation they're relying on.
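To put rough numbers on that, assume (the post doesn't say this) that each character is picked uniformly and independently. Each character then carries log2 of the alphabet size in bits, and a whole captcha carries that times its length:

```python
import math

def guess_space_bits(alphabet_size, length):
    """Bits of entropy in a string of uniformly, independently chosen characters."""
    return length * math.log2(alphabet_size)

# Compare the full case-sensitive alphanumeric set with the reduced one,
# using 7 characters as a stand-in for the 7.2-character average.
for size in (62, 28):
    print(f"{size} chars: {math.log2(size):.2f} bits/char, "
          f"{guess_space_bits(size, 7):.1f} bits for 7 chars")
```

Under that assumption a 7-character captcha loses roughly eight bits, and, more to the point for OCR, a recogniser only has 28 classes to tell apart instead of 62.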

Wednesday, 10 August 2011

RIM: Now checking your email from the wrong country

I like the way my BlackBerry handles multiple email accounts. RIM servers effortlessly stream my Gmail traffic directly to me wherever I am, handling two-way sync, calendars and contacts with ease.

Today Google forced me to re-authenticate in Gmail a couple of times. A quick look at the recent activity list revealed why: my mail is now being fetched from IPs that Google places in Canada, the first time a foreign IP has handled my mail for an extended period.

They have Australian servers that work perfectly well, and I can't find any notices about local downtime. I'm not entirely happy with this; we'll have to see if it stays.

Edit: A week later, it's still all Canada. Either Google has reclassified all of RIM's IP space as physically in Canada, or RIM has made a fairly drastic change to how it accesses your email account.

Tuesday, 2 August 2011

A brief lesson in what privacy isn't

Boxify is getting a bit of press lately as a new site that allows sharing multiple files with others in 'boxes'. No sign-up required; you just click the 'start sharing' button, upload files, and share the URL around with others.

Here's their current front page.

Before I go further, I should mention that this kind of sharing is inherently insecure: they make no promises about keeping your data safe, and there's no password protection. Anything you put up there can be accessed by whoever has the private URL, and that's by design. The only assurance they give on that page is that your box has a 'private URL'.

That should be easy, right? All they have to do is set up robots.txt so that nobody can spider their site, after all.
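For reference, a blanket "keep out" robots.txt is two lines. The sketch below checks such a file with Python's standard `urllib.robotparser`; the example.com URLs are placeholders. Note that robots.txt is purely advisory: it keeps well-behaved crawlers such as Googlebot out of the index, but does nothing to stop anyone who already has, or guesses, a URL.

```python
from urllib.robotparser import RobotFileParser

# A "disallow everything" robots.txt, supplied inline for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def crawl_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(crawl_allowed(ROBOTS_TXT, "https://example.com/box/abc123"))  # False
```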

Unfortunately, I'm wrong.

And as a result of that:

Now, I know what you're thinking: "That's not too big a deal; Google can't find anything that isn't already linked to." That's where the second mistake comes in, one that's only obvious because of the first.

When you click on that link, you get this lovely page.
Yup. That's a public XML file with hard links to the thousand most recently uploaded files. I'm not sure why the file exists, but the fact that it's publicly accessible is just terrible. In 15 minutes you could knock up a shell script that regularly checks for and downloads every single file uploaded to the site.
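The polling half of such a script is only a few lines. The post doesn't show the feed's format, so this sketch assumes (hypothetically) that each link sits in a `<url>` element; it is written in Python rather than shell, and only tracks which links are new between polls:

```python
import xml.etree.ElementTree as ET

def new_file_links(xml_text, seen):
    """Return links not seen on earlier polls, adding them to `seen`.

    Assumes (hypothetically) every link is the text of a <url> element
    somewhere in the document.
    """
    root = ET.fromstring(xml_text)
    fresh = []
    for url_elem in root.iter("url"):
        link = (url_elem.text or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh

# Hypothetical feed contents with placeholder URLs.
SAMPLE = """<files>
  <file><url>https://example.com/f/1.jpg</url></file>
  <file><url>https://example.com/f/2.pdf</url></file>
</files>"""

seen = set()
print(new_file_links(SAMPLE, seen))  # both links on the first poll
print(new_file_links(SAMPLE, seen))  # → [] (nothing new since last poll)
```

Because the feed always holds the latest thousand uploads, polling it more often than that many files arrive is enough to miss nothing.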

So the outcome here: Boxify may turn out fine one day, but so far Loren Burton's claims of a private URL aren't holding up. If you don't want your files immediately available to the general internet, don't use Boxify.

Interesting side note: the example box they link to on the front page can also be edited by the masses. Consequently, it has naughty material for the discerning visitor.