Captchas, initially a huge annoyance, are now generally recognized as a necessary evil. They stop bots from abusing your services, and there are a lot of interesting variants to use. The biggest is Google's reCAPTCHA, which is so popular that even Microsoft uses it occasionally. Today my attention is on Yahoo's implementation. You'll know them; they look like this:
In a nutshell: I had to type a few of these lately, and the character distribution didn't look quite right. I grabbed a hundred captchas, laboriously typed them out, and broke them down by character.
What you can't see here: Yahoo works with the traditional 'random combinations of letters and numbers' form of captcha. They use at least three different fonts, which are then physically skewed in a variety of ways. There's no additional visible interference between you and the letters, and the average length is 7.2 characters.
What you can see: Yahoo captchas use a relatively small subset of alphanumeric characters. A, B, F, G, H, J, L, M, T and V appear only in uppercase, while c, d, e, n, p, r, s, t, y, q and z appear only in lowercase. Out of the numbers we have only 2, 3, 4, 5, 6, 7 and 8. This leaves 8 alphanumeric characters completely unrepresented: i, k, o, q, x, 1, 9 and 0.
Most of these seem to be omitted due to possible confusion. O, o and 0 are easily mistaken for one another, so all of them are avoided, and the same goes for l/1 and K/X. Additionally, some characters are omitted because a combination of two other characters can look similar to them.
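If you want to repeat the tally yourself, the per-character breakdown is easy to script once the captchas are typed out. This is only a rough sketch of the counting step: the three strings in `transcribed` are made-up placeholders rather than real captcha data, and `tally_characters` is just a name picked for illustration.

```python
from collections import Counter

# Placeholder transcriptions, invented for this sketch (not real captcha data).
transcribed = ["Hd7nGz", "pT4eBrs", "c3MyVF8"]


def tally_characters(captchas):
    """Count how often each character appears across all transcriptions."""
    counts = Counter()
    for text in captchas:
        counts.update(text)  # case-sensitive: 'T' and 't' are counted separately
    return counts


counts = tally_characters(transcribed)
average_length = sum(len(text) for text in transcribed) / len(transcribed)

# Most frequent characters first, ties broken alphabetically.
for char, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{char}: {n}")
print(f"average length: {average_length:.1f}")
```

Counting case-sensitively matters here, since the interesting part of the result is which letters only ever show up in one case.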
I'm not entirely sure of the strategy here. They're purposefully obfuscating the word by overlapping the characters, but at the same time dramatically reducing the number of characters that could be present. By cutting the pool of possible alphanumeric characters from 62 to 28, they're making it easier for OCR to render their technique ineffective.
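To put a rough number on that reduction, here is a back-of-the-envelope sketch. It assumes a fixed length of 7 characters (rounded down from the 7.2 average) and treats every character as independent and equally likely, which the real captchas almost certainly are not. It only measures the size of the guessing space, not how well any actual OCR would do.

```python
import string

# Full case-sensitive alphanumeric set: 26 lowercase + 26 uppercase + 10 digits.
full_alphabet = set(string.ascii_letters + string.digits)

# The eight characters reported above as never appearing, in either case.
omitted = set("ikoqx190")

# Every remaining letter reportedly appears in one case only, so the
# effective alphabet is the case-insensitive set minus the omissions.
case_insensitive = set(string.ascii_lowercase + string.digits)
reduced_size = len(case_insensitive - omitted)


def search_space(alphabet_size, length):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


length = 7  # assumption: rounded from the observed 7.2 average
full = search_space(len(full_alphabet), length)
reduced = search_space(reduced_size, length)

print(f"alphabet: {len(full_alphabet)} -> {reduced_size}")
print(f"candidates at length {length}: {full} -> {reduced}")
print(f"reduction factor: {full / reduced:.0f}")
```

Under those assumptions the space of possible answers shrinks by a factor of a couple of hundred, before a recognizer has looked at a single pixel.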