Pixelating or blurring doesn’t actually work to hide text
If you’ve ever pixelated an email address or blurred a phone number before putting an image onto the internet in order to protect someone’s privacy, I’ve got bad news for you: researchers at the University of California, San Diego have found that these popular Photoshop redaction techniques can be reversed well enough to read the underlying text.
The researchers were able to recover text from a variety of redacted screenshots that they found online, said computer science professor Hovav Shacham by email. They were, for example, able to figure out the blurred email address in a screenshot of a conversation between a corrupt DEA agent and the then-CEO of Bitcoin exchange Mt. Gox.
It’s not the first time a Photoshop redaction tool has turned out to work less well than people thought. In 2007, we found out that Photoshop’s “twirly” filter was reversible. A man had posted pornographic photos of himself with young boys to the internet, “twirling” his face to protect his identity, but was busted when Interpol untwirled his photo.
Blurring and mosaic pixelation, though, are “lossy” techniques that discard some of the data, meaning you shouldn’t be able to reverse their obfuscation.
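To make “lossy” concrete, here is a minimal Python sketch of what mosaic pixelation does to a single tile: it replaces the pixels with their average brightness. The function name and the tiny 2x2 tiles are made up for illustration and are not taken from the researchers’ paper. Two different tiles can collapse to the same average, which is the discarded data, yet the average still changes when the underlying pixels change.

```python
def tile_average(tile):
    """Mean brightness of one mosaic tile, given as a list of pixel rows."""
    pixels = [value for row in tile for value in row]
    return sum(pixels) / len(pixels)

# Hypothetical 2x2 tiles (0 = black, 255 = white).
checker_a = [[0, 255], [255, 0]]
checker_b = [[255, 0], [0, 255]]
mostly_dark = [[0, 0], [0, 255]]

# Different tiles, same average: the fine detail is gone.
print(tile_average(checker_a) == tile_average(checker_b))    # True
# But the average is still determined by what was underneath.
print(tile_average(checker_a) == tile_average(mostly_dark))  # False
```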
“In many online communities, it is the norm to redact names and other sensitive text from posted screen shots,” write the researchers, specifically citing Reddit. “Mosaicing and blurring have also been used for the redaction of high-profile government documents and celebrity social media.”
They should probably stop doing that. The UC-San Diego researchers found that they could use statistical models, “so-called hidden Markov models,” to generate the blurred or pixelated versions of lots of numbers, letters, and words, to the point that their software could match an unknown redaction against known ones to figure out what it says. The biggest challenge is figuring out the font and size of the underlying text, which the researchers need for their deciphering. They say it works better than a brute-force technique for deciphering pixelated images discussed by Dheera Venkatraman in 2007.
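The matching idea can be sketched in a few lines of Python. This is not the researchers’ hidden Markov model. It is closer in spirit to the simpler brute-force approach: pixelate every candidate string the same way and keep the one that looks most like the redacted image. Everything here is an assumption made for illustration: the toy 3x5 digit font, the 2-pixel block size, the 4-digit secret, and the premise that the attacker already knows the font, size, and alignment (the hard part, according to the researchers).

```python
from itertools import product

# Hypothetical 3x5 bitmap font for digits. A real attack would have to
# reproduce the actual font and size used in the screenshot.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "001"],
}

def render(text):
    """Draw text as 5 rows of 0/1 pixels, with one blank column after each glyph."""
    return [[int(bit) for ch in text for bit in FONT[ch][row] + "0"]
            for row in range(5)]

def mosaic(image, block=2):
    """Pixelate: reduce each block x block tile to its mean value.

    Returns the list of tile means, which is all a mosaic image contains.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            tiles.append(sum(tile) / len(tile))
    return tiles

def distance(a, b):
    """Sum of squared differences between two lists of tile means."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def recover(redacted, length, block=2):
    """Try every digit string of the given length and return the closest match."""
    candidates = ("".join(digits) for digits in product("0123456789", repeat=length))
    return min(candidates,
               key=lambda text: distance(mosaic(render(text), block), redacted))

redacted = mosaic(render("4071"))   # what the attacker sees
print(recover(redacted, length=4))  # prints 4071
```

Trying all 10,000 four-digit strings is trivial here, but exhaustive search grows quickly with longer text and unknown fonts, which is where a statistical model earns its keep.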
“We conclude that hidden Markov models allow near-perfect recovery of text redacted by mosaicing or blurring for many common fonts and parameter settings, and that mosaicing and blurring are not effective choices for textual document redaction,” the researchers write in their paper, presented this week at the Privacy Enhancing Technologies Symposium in Germany.
They note that, despite the risks, there are countless blurred and pixelated images online containing redacted names, phone numbers, email addresses, passwords, credit card numbers, personal checks, and private conversations. Eek.
The researchers recognize that people like using these techniques because of the aesthetic appeal of suggesting the underlying text, but they recommend that people who truly want to redact information use black bars instead.