Breaking a Visual CAPTCHA

Greg Mori (1,2) and Jitendra Malik (1)
(1) UC Berkeley Computer Vision Group
(2) Simon Fraser University


This is the homepage of the Shape Contexts based approach to break Gimpy, the CAPTCHA test used at Yahoo! to screen out bots. Our method can successfully pass that test 92% of the time. The approach we take uses general purpose algorithms that have been designed for generic object recognition. The same basic ideas have been applied to finding people in images, matching handwritten digits, and recognizing 3D objects.

News Articles

Human or Computer? Take This Test, The New York Times, December 10, 2002.
Up to the Challenge: Computer Scientists Crack a Set of AI-Based Puzzles, SIAM News, November 2002.

Quick links:

Our Approach
Related Links

A CAPTCHA in use at Yahoo


A CAPTCHA is a program that can generate and grade tests that most humans can pass but current computer programs cannot.

CAPTCHA stands for "Completely Automated Public Turing test to Tell Computers and Humans Apart". See the CAPTCHA site for more details. The concept of a CAPTCHA is motivated by real-world problems faced by internet companies such as Yahoo! and AltaVista. These companies offer free email accounts, intended for use by humans. However, they found that many online vendors were using "bots", computer programs that would sign up for thousands of email accounts, from which they could send out masses of junk email. By requiring the user to solve a CAPTCHA (in the case of Yahoo!, the word-based one called EZ-Gimpy shown above), the "bots" could be screened out.

EZ-Gimpy and Gimpy, the CAPTCHAs that we have broken, are examples of word-based CAPTCHAs. In EZ-Gimpy, the CAPTCHA used by Yahoo! (shown in the figure above), the user is presented with an image of a single word. This image has been distorted, and a cluttered, textured background has been added. The distortion and clutter are sufficient to confuse current OCR (optical character recognition) software. However, using our computer vision techniques we are able to correctly identify the word 92% of the time.

Gimpy is a more difficult variant of a word-based CAPTCHA. Ten words are presented in distortion and clutter similar to EZ-Gimpy. The words are also overlapped, providing a CAPTCHA test that can be challenging for humans in some cases. The user is required to name 3 of the 10 words in the image in order to pass the test. Our algorithm can pass this more difficult test 33% of the time.

Our Approach

The fundamental ideas behind our approach to solving Gimpy are the same as those we are using to solve generic object recognition problems. Our solution to the Gimpy CAPTCHA is just an application of a general framework that we have used to compare images of everyday objects and even find and track people in video sequences. In essence, these problems are similar. Finding the letters "T", "A", "M", "E" in an image and connecting them to read the word "TAME" is akin to finding hands, feet, elbows, and faces and connecting them to find a human. Real images of people and objects contain large amounts of clutter. Learning to deal with the adversarial clutter present in Gimpy has helped us in understanding generic object recognition problems.
Our related work on finding people and generic objects.

A high-level description of our method can be found here.
If you would like more details, see our paper from CVPR 2003.
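
The idea of finding letters and connecting them into a word can be illustrated with a toy sketch. It is not the shape-context method described in the paper: it assumes some letter detector has already produced candidate letters with horizontal positions and confidences, and it simply picks the lexicon word that can be spelled left to right with the highest total confidence. The detections, scores, lexicon, and the function names score_word and read_word are all hypothetical.

```python
from typing import List, Optional, Tuple

# (letter, x position in pixels, detector confidence); all values hypothetical.
Detection = Tuple[str, float, float]


def score_word(word: str, detections: List[Detection]) -> Optional[float]:
    """Best total confidence for spelling `word` left to right, or None.

    best[i] is the best score found so far for the first i letters of
    `word`, using only detections already scanned (sorted by x position).
    """
    ordered = sorted(detections, key=lambda d: d[1])
    best: List[Optional[float]] = [0.0] + [None] * len(word)
    for letter, _x, conf in ordered:
        # Walk backwards so a single detection fills at most one letter slot.
        for i in range(len(word), 0, -1):
            if word[i - 1] == letter and best[i - 1] is not None:
                candidate = best[i - 1] + conf
                if best[i] is None or candidate > best[i]:
                    best[i] = candidate
    return best[len(word)]


def read_word(lexicon: List[str], detections: List[Detection]) -> Optional[str]:
    """Return the lexicon word with the highest score, or None if none fits."""
    scored = [(score_word(w, detections), w) for w in lexicon]
    feasible = [(s, w) for s, w in scored if s is not None]
    return max(feasible)[1] if feasible else None


if __name__ == "__main__":
    # Clutter shows up as spurious detections ("X", a second "A").
    detections = [("T", 10, 0.9), ("A", 30, 0.8), ("X", 35, 0.4),
                  ("M", 50, 0.7), ("E", 70, 0.9), ("A", 75, 0.3)]
    print(read_word(["TAME", "TEAM", "MATE"], detections))
```

Summing confidences favours longer words, so a real system would need a better scoring rule; the sketch only shows how a lexicon can turn noisy letter candidates into a single word hypothesis.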



Below are a few examples of images analyzed using our method, and the word that was found. Correct words are shown in green, incorrect words in red. For EZ-Gimpy we ran experiments on 191 images. We were able to correctly identify the word in 176 of these images: a success rate of 92%! Our algorithm takes only a few seconds to process one image. If you would like to see our results on all 191 images, please click here.



The more difficult version of the Gimpy CAPTCHA presents an image such as the one shown below. There are 10 words (some repeated), overlaid in pairs. The test-taker is required to list 3 of the words present in the image in order to pass.

The clutter in these images, real words instead of random background textures, is much more difficult to deal with. In addition, we must find 3 words instead of just one. Our current algorithm can find 3 correct words and pass this Gimpy test 33% of the time. Note that even if we could guess a single word correctly 70% of the time, and the three guesses were independent, we would only expect to get all 3 words correct approximately 0.7 × 0.7 × 0.7 ≈ 34% of the time. Moreover, even at our 33% success rate, this CAPTCHA would still be ineffective at filtering out "bots", since they can bombard a program with thousands of requests.
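
The two probability arguments in the paragraph above can be checked with a few lines of Python. This is a minimal sketch that assumes every word guess and every attempt is independent; the function names are hypothetical, and the 0.7 and 0.33 rates are the ones quoted in the text.

```python
def all_words_correct(per_word_rate: float, words_needed: int = 3) -> float:
    """Chance that every required word is right, assuming independent guesses."""
    return per_word_rate ** words_needed


def at_least_one_pass(pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries passes."""
    return 1.0 - (1.0 - pass_rate) ** attempts


if __name__ == "__main__":
    # Three words at 70% each.
    print(f"{all_words_correct(0.7):.3f}")  # 0.343

    # A bot that passes 33% of the time and simply retries.
    for attempts in (1, 5, 10):
        print(attempts, f"{at_least_one_pass(0.33, attempts):.3f}")
```

Under these assumptions, a bot that passes one test in three is very likely to get through within ten attempts, which is why a 33% pass rate is already enough to defeat the test.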

The algorithm we use is described in detail in our paper linked above. Check out our results on the harder version of Gimpy.


Related Links

The CAPTCHA project
NuCaptcha, a new video-based CAPTCHA service Greg Mori is advising
The UC Berkeley Computer Vision Group
The SFU Vision and Media Lab
Belorussian translation of this page provided by moneyaisle
