A long time ago, there was an incident involving citizens' privacy in Chile, and many people, even from other countries, published posts about it [1][2][3]. Today, with all citizen data made public due to a serious lack of awareness about privacy and security, I'll describe the weak patch implemented on Servel's page to prevent mass requests for citizen information. It was a CAPTCHA solution, which I broke using two methods: a logic flaw and a design weakness.
At SPECT Research we don't usually disclose vulnerabilities in web sites publicly, but we have a strong commitment to citizens and to the issues affecting their privacy. In this case, even my own information was compromised. I think the vulnerabilities explained here are harmless, since all the citizen information extracted from Servel's site is already available on networks like Tor or by direct download, as mentioned in the referenced posts.
A simple flaw in the logic: CAPTCHA bypass
The first three queries to Servel's page with a valid RUN don't require solving a CAPTCHA. After that, a CAPTCHA is displayed and you need to answer it correctly to get citizen information. But what if, instead of submitting this new form with its new fields, you replay previous requests?
I think this logic flaw was introduced by the staff in charge of the web site. After three queries the CAPTCHA is displayed, but the new form isn't enforced. You can reuse a previous request, change the RUN, and get citizen information, bypassing the CAPTCHA.
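To make the flaw concrete, here is a minimal sketch of the server-side logic as I understand it from the behaviour described above. It is a toy in-memory model, not Servel's actual code: the class `ToyLookupServer`, the `enforce_captcha` switch, the fixed answer and the made-up RUN values are all assumptions for illustration.

```python
class ToyLookupServer:
    """Hypothetical stand-in for a lookup page; not Servel's actual code."""

    FREE_QUERIES = 3  # per the text: the first three queries need no CAPTCHA

    def __init__(self, enforce_captcha):
        self.enforce_captcha = enforce_captcha
        self.queries_seen = 0
        self.expected_answer = "1234"  # fixed for the demo only

    def lookup(self, run, captcha_answer=None):
        """Handle one query and return a short status string."""
        self.queries_seen += 1
        captcha_due = self.queries_seen > self.FREE_QUERIES
        if (captcha_due and self.enforce_captcha
                and captcha_answer != self.expected_answer):
            return "captcha required"
        # Flawed path: with enforce_captcha=False the CAPTCHA form may be
        # shown to the browser, but a request without an answer still works.
        return f"record for {run}"


runs = ["run-1", "run-2", "run-3", "run-4", "run-5"]  # made-up identifiers

flawed = ToyLookupServer(enforce_captcha=False)
fixed = ToyLookupServer(enforce_captcha=True)

flawed_results = [flawed.lookup(r) for r in runs]
fixed_results = [fixed.lookup(r) for r in runs]
```

In this model the flawed server answers all five queries, while the fixed one stops answering from the fourth query on unless the right CAPTCHA answer comes with the request. The point is that the check has to happen where the data is served, not only in the form that is displayed.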
From 62.5% to 100% of success in CAPTCHA solving
In late April this year, I published a video of solving the CAPTCHA on Servel's page with a 62.5% success rate at guessing the CAPTCHA's numbers. After that, I did a little more research to improve this percentage, building on that first work.
The software used by Servel's page was CAPTCHA Image. Its source code is available, so you can check how the CAPTCHAs are created. The initial version of the CAPTCHA solver script performed the following steps on the CAPTCHA image:
- Use the convert utility (from ImageMagick) to apply a Difference of Gaussians (DoG) filter to the image. This was the best filter for highlighting the edges in the image.
- Use pytesser (a Python wrapper for Tesseract) to read the numbers from the image. I made some modifications to the code to get the script running.
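The two steps above can be sketched roughly as follows. This is not the original script: it calls the `tesseract` command-line tool directly instead of pytesser, and the DoG kernel parameters, the page segmentation mode and the digit whitelist are my assumptions, not the values used at the time. The helper names (`dog_command`, `ocr_command`, `digits_only`, `solve`) are hypothetical.

```python
import re
import subprocess


def dog_command(src, dst, sigma1=0, sigma2=2):
    """Build an ImageMagick command that convolves src with a DoG kernel.

    The kernel arguments (radius 0, sigma1, sigma2) are illustrative guesses.
    """
    kernel = f"DoG:0,{sigma1},{sigma2}"
    return ["convert", src, "-morphology", "Convolve", kernel, dst]


def ocr_command(image):
    """Build a Tesseract command that reads one text line to stdout.

    "--psm 7" treats the image as a single line of text (Tesseract 3.x
    spelled this flag "-psm"). The whitelist asks for digits only.
    """
    return ["tesseract", image, "stdout", "--psm", "7",
            "-c", "tessedit_char_whitelist=0123456789"]


def digits_only(ocr_output):
    """Drop everything except digits from the raw OCR output."""
    return re.sub(r"\D", "", ocr_output)


def solve(src, filtered="filtered.png"):
    """Filter the image, run OCR on the result, and return the digits read.

    Requires the ImageMagick and Tesseract binaries to be installed.
    """
    subprocess.run(dog_command(src, filtered), check=True)
    result = subprocess.run(ocr_command(filtered), check=True,
                            capture_output=True, text=True)
    return digits_only(result.stdout)
```

Calling `solve("captcha.jpg")` on a saved CAPTCHA image (a hypothetical file name) would return whatever digits Tesseract managed to read, which may well be wrong for noisy images.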
With that approach I got a 62.5% success rate, so the next step was to come up with ideas to improve the results. I first thought about making the numbers in the image clearer by applying additional filters. This path was difficult: the background noise is always present and can confuse the filters, so image clarity wouldn't improve as expected.
After that closed door, I thought about flaws in the logic of CAPTCHA Image. As explained on its web site, the workflow is the following:
- CAPTCHA Image generates a random text
- It stores the text in the Session object (Session["CaptchaImageText"])
- It creates the CAPTCHA image using Session["CaptchaImageText"]
- Finally, it validates the user input against Session["CaptchaImageText"]
The page in charge of this handling is Default.aspx.cs, specifically the method Page_Load(), displayed next:
    private void Page_Load(object sender, System.EventArgs e)
    {
        if (!this.IsPostBack)
            // Create a random code and store it in the Session object.
            this.Session["CaptchaImageText"] = GenerateRandomCode();
        else
        {
            // On a postback, check the user input.
            if (this.CodeNumberTextBox.Text == this.Session["CaptchaImageText"].ToString())
            {
                // Display an informational message.
                this.MessageLabel.CssClass = "info";
                this.MessageLabel.Text = "Correct!";
            }
            else
            {
                // Display an error message.
                this.MessageLabel.CssClass = "error";
                this.MessageLabel.Text = "ERROR: Incorrect, try again.";
                // Clear the input and create a new random code.
                this.CodeNumberTextBox.Text = "";
                this.Session["CaptchaImageText"] = GenerateRandomCode();
            }
        }
    }
At the first request (a GET request), the condition at line 3 is true, so the CAPTCHA text is generated and tied to your session. Otherwise, if it's a valid PostBack request, the user input is validated as right or wrong.
It seems that if you answer the CAPTCHA correctly, the variable Session["CaptchaImageText"] is never cleared, so you could submit request after request successfully using the same initial CAPTCHA solution. In practice, however, this doesn't work. I don't know whether they added some countermeasure for that or whether CAPTCHA Image itself makes some call to generate another random text.
Recall from the CAPTCHA Image workflow that, in the third step, the CAPTCHA image is generated from Session["CaptchaImageText"] by the page JpegImage.aspx. Looking at the code in Default.aspx.cs, if you make a wrong guess the CAPTCHA text is regenerated; otherwise, it's kept. You can generate as many images as you want by requesting JpegImage.aspx with the same Session["CaptchaImageText"], and all of the images will contain the same number. This, combined with my first script, is the new approach.
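The behaviour just described can be modelled with a small toy class. This is only a sketch of the workflow as described in this post, not the CAPTCHA Image source: `ToyCaptchaSession`, the six-digit code length and the rotation range are assumptions made for illustration.

```python
import random


class ToyCaptchaSession:
    """Hypothetical model of the workflow; not the CAPTCHA Image source."""

    def __init__(self, rng):
        self.rng = rng
        self.text = self._new_code()  # plays the role of Session["CaptchaImageText"]

    def _new_code(self):
        # Six digits is an assumption; the real length may differ.
        return f"{self.rng.randrange(10**6):06d}"

    def render_image(self):
        """Model of one request to JpegImage.aspx: same text, new distortion."""
        return {"text": self.text, "rotation": self.rng.uniform(-30, 30)}

    def check(self, user_input):
        """Model of the postback branch: regenerate the text only on a miss."""
        if user_input == self.text:
            return True
        self.text = self._new_code()
        return False


session = ToyCaptchaSession(random.Random(0))
images = [session.render_image() for _ in range(5)]
texts = {img["text"] for img in images}
rotations = {img["rotation"] for img in images}
```

Here `texts` holds a single value while `rotations` holds several: every image hides the same number behind a different distortion, which is what gives an OCR script multiple chances at the same answer. Regenerating the text on every image request, rather than only on a wrong answer, would remove those extra chances.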
The script fails when it can't correctly guess the text in the image, because some CAPTCHA images have a lot of noise, or their rotation values are high and the text is very distorted. But if you can generate many images, some of them will be guessed correctly. Since the script succeeds on 62.5% of images, I could request five images and, if three of them produce the same guess, conclude with close to 100% accuracy that the guess is right.
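The voting idea can be sketched like this. It is a minimal illustration rather than the published solver: the function names and the sample guesses are made up, and the probability helper assumes each image is read correctly with probability 0.625 independently of the others, which real OCR errors may not satisfy.

```python
from collections import Counter
from math import comb


def majority_guess(guesses, needed=3):
    """Return the most common guess if it appears at least `needed` times."""
    if not guesses:
        return None
    value, count = Counter(guesses).most_common(1)[0]
    return value if count >= needed else None


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Made-up OCR readings of five images that share the same hidden number.
accepted = majority_guess(["4821", "4821", "4827", "4821", "1821"])
rejected = majority_guess(["4821", "4827", "4021", "4821", "1821"])

# Chance that at least three of five readings are correct, under the
# independence assumption stated above.
quorum_chance = prob_at_least(3, 5, 0.625)
```

With these sample readings, `accepted` is "4821" and `rejected` is `None`. Under the independence assumption, `quorum_chance` comes out at roughly 0.72, so about a quarter of the rounds would not reach three correct readings and would simply need more images of the same number. Wrong readings tend to differ from one another, which is why three matching guesses are so likely to be right.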
The following video shows the CAPTCHA Image solver running:
The source code of the CAPTCHA Image solver is available here.