After several bots managed to bypass the CAPTCHAs on this website and on other websites I work with, I became interested in how CAPTCHA solvers work and what makes a “good” CAPTCHA. What I found is that in the age of AI, CAPTCHAs are becoming increasingly easy to crack, and where AI can’t do it, the gig economy fills the gap.
Intrigued by the concept, I decided to try building a solver of my own, and this project was born. It uses a combination of ImageMagick and ML.NET image classification to solve the CAPTCHA images it is given.
The approach is based on a few observations of Amazon’s CAPTCHA images:
- Adjacent characters are separated by at least 1 pixel of white space.
- Every CAPTCHA image is 6 characters long.
- When 7 characters are found instead of 6, one character has wrapped around from the end of the image to the beginning. This character is always the last in the sequence.
- No additional noise is added to the image other than letter skewing and position changes.
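These observations suggest a simple column-scan segmentation. The following is a minimal sketch in Python, not the project's actual code (which uses ImageMagick and ML.NET). It assumes the image has already been binarized into rows of 0 (white) and 1 (ink), and it assumes that in the 7-segment case the fragment at the left edge should be appended to the final character. The names `find_segments`, `character_columns`, and `crop` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: rows of 0 (white) / 1 (ink).
Bitmap = List[List[int]]


def find_segments(bitmap: Bitmap) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(bitmap[0]) if bitmap else 0
    segments = []
    start = None
    for x in range(width):
        has_ink = any(row[x] for row in bitmap)
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


def character_columns(segments: List[Tuple[int, int]],
                      expected: int = 6) -> List[List[int]]:
    """Turn segments into one list of column indices per character.

    Assumption: when one extra segment is found, the leftmost fragment is
    the wrapped tail of the last character and is appended to it.
    """
    columns = [list(range(start, end)) for start, end in segments]
    if len(columns) == expected + 1:
        wrapped = columns[0]
        columns = columns[1:]
        columns[-1] = columns[-1] + wrapped
    return columns


def crop(bitmap: Bitmap, columns: List[int]) -> Bitmap:
    """Keep only the given columns, then trim blank rows from top and bottom."""
    rows = [[row[x] for x in columns] for row in bitmap]
    inked = [i for i, row in enumerate(rows) if any(row)]
    return rows[inked[0]:inked[-1] + 1] if inked else []
```

Calling `find_segments`, then `character_columns`, then `crop` for each column list yields one trimmed bitmap per character, in reading order.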
Because of these observations, I was able to develop a consistent algorithm for parsing and evaluation. At a high level, the algorithm works as follows:
- Image Slicing: The image is sliced and trimmed into a series of images, each containing a single character.
- Image Resizing: Each character image is then resized to maintain consistency. While this leaves some images looking skewed, applying the same process to every image, coupled with the use of an image classification model, makes this a non-issue.
- Character Processing: Each character image is passed to the solver model, which determines which character the image contains.
- Result Compilation: The answers for each position are then joined into a string and returned to the caller.
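The resizing, classification, and compilation steps can be sketched roughly as follows. This is an illustrative Python outline rather than the project's ImageMagick and ML.NET code. The nearest-neighbour resize, the 32×32 target size, and the `classify` callable (a stand-in for the trained image classification model) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical representation: rows of 0 (white) / 1 (ink).
Bitmap = List[List[int]]


def resize_nearest(bitmap: Bitmap, width: int, height: int) -> Bitmap:
    """Nearest-neighbour resize; the aspect ratio is deliberately not preserved."""
    src_h, src_w = len(bitmap), len(bitmap[0])
    return [
        [bitmap[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def solve(characters: List[Bitmap],
          classify: Callable[[Bitmap], str],
          size: int = 32) -> str:
    """Resize each character image, classify it, and join the answers in order.

    `classify` is a placeholder for the model; `size` is an assumed value.
    """
    return "".join(
        classify(resize_nearest(char, size, size)) for char in characters
    )
```

Because every character goes through the same resize, the classifier only ever sees inputs of one shape, which is why the distortion introduced by ignoring the aspect ratio does not matter as long as training and inference share the same step.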
This project is open-source and available on my GitHub page.