October 14th, 2008

ronin

Cracking kitten-auth with AI.


The Asirra CAPTCHA relies on the problem of distinguishing images of cats and dogs (a task that humans are very good at). The security of Asirra is based on the presumed difficulty of classifying these images automatically.

In this paper, we describe a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra. This classifier is a combination of support-vector machine classifiers trained on color and texture features extracted from images. Our classifier allows us to solve a 12-image Asirra challenge automatically with probability 10.3%. This probability of success is significantly higher than the estimate of 0.2% given in [7] for machine vision attacks. Our results suggest caution against deploying Asirra without safeguards.

We also investigate the impact of our attacks on the partial credit and token bucket algorithms proposed in [7]. The partial credit algorithm weakens Asirra considerably and we recommend against its use. The token bucket algorithm helps mitigate the impact of our attacks and allows Asirra to be deployed in a way that maintains an appealing balance between usability and security. One contribution of our work is to inform the choice of safeguard parameters in Asirra deployments.


http://www2.parc.com/csl/members/pgolle/papers/dogcat.pdf
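
A quick sanity check on the numbers in that abstract. If the classifier gets each image right independently with the same probability (my simplifying assumption, not necessarily how the paper computes it), then the chance of solving a whole 12-image challenge is just that probability raised to the 12th power. The sketch below runs that for the 82.7% accuracy quoted above, plus a hypothetical 60% classifier and a coin flip for comparison. The function name and the 60% figure are mine, purely for illustration.

```python
def challenge_success(per_image_accuracy: float, images: int = 12) -> float:
    """Chance of classifying every image in one challenge correctly,
    assuming each image is an independent trial (an assumption)."""
    return per_image_accuracy ** images


golle = challenge_success(0.827)     # per-image accuracy quoted in the abstract
baseline = challenge_success(0.60)   # hypothetical weak classifier, for comparison
coin_flip = challenge_success(0.50)  # pure guessing

for label, p in [("82.7% classifier", golle),
                 ("60% classifier", baseline),
                 ("coin flip", coin_flip)]:
    # 1/p is the expected number of attempts before one full solve
    print(f"{label}: {p:.2%} per challenge, about {1 / p:.0f} attempts per solve")
```

The 82.7% case comes out at roughly 10%, in line with the 10.3% in the abstract (the small gap is presumably rounding in the quoted accuracy). The hypothetical 60% classifier lands at about 0.2%, so going from "a bit better than guessing" to "pretty good" per image makes a huge difference once you multiply it out over 12 images.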

Golle trained his program using 8,000 images collected from the same website. Through trial and error, his software gradually learned to tell cats and dogs apart, based on a statistical analysis of color and texture in each photo. The pink of the dogs' tongues and the green of the cats' eyes provided strong clues, Golle says, but it is only by studying color and texture information from so many images that his program could attack the problem. "Machine learning is very good at aggregating information," Golle says.

http://www.technologyreview.com/web/21519/page1/

I knew this was coming, but I didn't think it would happen so soon! IMO the next thing we need to do is incorporate domain knowledge into the test somehow. For instance, a captcha can show random pictures of animals and direct the user to "click on all the African animals". Then they have to click on pictures of elephants, zebras and cheetahs but not polar bears, kangaroos or llamas. Knowing which animals live in Africa and which don't is the kind of thing that machines don't do well (yet).
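
Here's a rough sketch of how the server side of that "African animals" idea could work. Everything in it is hypothetical: the tiny hard-coded animal lists stand in for a real labeled image database, and the function names are made up. A challenge is a shuffled set of animals, and an answer passes only if the clicked set matches the African ones exactly.

```python
import random

# Hypothetical labels; a real deployment would need a large labeled image set.
AFRICAN = {"elephant", "zebra", "cheetah"}
NOT_AFRICAN = {"polar bear", "kangaroo", "llama"}


def make_challenge(n=6, rng=random):
    """Pick n distinct animals in random order to show to the user."""
    pool = sorted(AFRICAN | NOT_AFRICAN)
    return rng.sample(pool, n)


def check_answer(challenge, clicked):
    """Pass only if the user clicked exactly the African animals shown."""
    expected = {animal for animal in challenge if animal in AFRICAN}
    return set(clicked) == expected


def blind_guess_odds(n):
    """Chance of passing by clicking each of n images at random (50/50 each)."""
    return 0.5 ** n


challenge = make_challenge()
print(challenge)
print(check_answer(challenge, ["elephant", "zebra", "cheetah"]))  # all six shown, so this passes
print(check_answer(challenge, ["elephant", "llama"]))             # wrong set, fails
```

Note that a blind guesser faces the same 1-in-2^n odds as with cats vs. dogs, so the scheme only helps if the "is it African?" question really is harder for a classifier than "is it a cat?".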

  • Current Music
    Social Code - Cats And Dogs