High Level Description

1. EZ-Gimpy

Our algorithm for breaking EZ-Gimpy consists of 3 main steps:
  1. Locate possible letters at various locations
  2. Construct graph of consistent letters
  3. Look for plausible words in the graph
This page will give a high level description of these steps, using the image below as an example.

Candidate Letters

The first step is to hypothesize a set of candidate letters in the image. This is done using our shape matching techniques described in detail here. The method essentially looks at a bunch of points in the image at random, and compares these points to points on each of the 26 letters. The comparison is done in a way that is very robust to background clutter and deformation of the letters. The process usually results in 3-5 candidate letters per actual letter in the image.

In the example shown here, the "p" of profit matches well to both an "o" or a "p", the border between the "p" and the "r" look a bit like a "u", and so forth. At this stage we keep many candidates, to be sure we don't miss anything for later steps.

Consistent Letters

Next, we analyze pairs of letters to see whether or not they are "consistent", or can be used consecutively to form a word. In the example below, green lines are drawn between pairs of letters that could be chained together to form a complete word.

Plausible Words

There are many possible paths through the graph of letters constructed in the previous step. However, most of them do not form real words. For example, "pfql" is a path through the graph, but we don't need to consider it further since it isn't a real word. It turns out that the vast majority of paths through the graph are meaningless.

We select out the real words in the graph, and assign scores to them based on how well their individual letters match the image. In this example, 2 complete words are found, "roll" and "profit". We compute matching scores for each letter or these words, and find that "profit", with a score of 9.42 (lower is better, it's a distance measure) matches the image better than "roll", that has a score of 11.94.

2. Gimpy

In a Gimpy CAPTCHA, such as the one shown below, we must correctly identify 3 words in order to pass the test.

However, a naive approach like the one developed for EZ-Gimpy, looking for individual characters, is doomed to fail. Consider these small, character-sized windows taken from some Gimpy CAPTCHAs. It is very difficult to determine what letters are present.

Instead, we attempt to find entire words all at once. If used directly, this method would be very inefficient - there are about 500 words in the dictionary used for Gimpy, compared to looking for 26 characters this would be slow. We avoid the computational expense of comparing to all 500 words by anchoring about the salient parts of the word in the image. Condsider a pair of words (shown as edges) from the image above.

Often, the beginning or closing 2 letters of a word are readily recognizable. Moreover, if we can identify these two letters, the number of possible candidate words left is rather small.

We find these beginning or closing 2 letters, called "bigrams" in the same way that we searched for individual letters before. Note that of the 26*26 = 676 possible bigrams, only about 100 are actually used to open and close english words (no common words begin or end with "lq", or "zx").

We then guess at what the first word in a pair might be, in the example shown here, we correctly guess "round". We then remove the parts of the image corresponding to this word, and repeat the process of guessing bigrams and looking for the second word. For more details on exactly how the matching, removal of words, and everything else in our algorithm is done, please see our paper.



Back to Greg Mori's page