It’s been at least a decade since the first stories came out about people trying to redact PDF files by drawing boxes over the text they wanted to hide. Maybe you remember some of them. The recipient unlocks the document and removes the boxes. Voilà, the hidden text reappears. The producing party (often some government agency) has egg on its face. We get another one of those stories to chuckle about.
But a decade later, surely we’ve all learned our lesson, right? Everyone knows better than to redact a PDF file simply by covering the text with a black box. Don’t they?
It seems not. The Ministry of Defence (their spelling) and several other British agencies posted “redacted” PDF files on the Internet and committed one of the oldest faux pas in our business. You guessed it: They covered the text on the image but did not remove the underlying text. Anyone smart enough to highlight the area or press Ctrl-A could copy the underlying text and paste it into a Word document. All the redacted text was conveniently there for the reading.
These weren’t ordinary redactions, according to several British news outlets, among them The Daily Telegraph, the Daily Star and The Register. The hidden text covered secrets about Britain’s nuclear submarines, including expert opinions on how well the fleet could cope with a catastrophic accident.
The documents were published under the U.K.’s Freedom of Information Act. Officials “redacted” them by using Photoshop to paste a black patch over secret text, news reports said.
What’s going on here? Doesn’t everyone know how to redact a PDF file? Nope. They don’t. And it keeps happening.
Facebook suffered a similar fate when it settled claims brought against it by ConnectU, the website originally called HarvardConnect that was central to the plot line of the film The Social Network.
You probably remember. The Winklevoss brothers claimed that Mark Zuckerberg stole their idea and turned it into Facebook, making billions of dollars as a result. The parties settled the case in February 2008 for an undisclosed sum (or at least on terms they didn’t want disclosed). In June of that year, the parties participated in a court hearing, and the transcript was later published in redacted form. The intent was to block out the parts of the record that would show how much money and stock were paid to the claimants.
The court published the transcript as a PDF and made it available on the web. You can still see it posted at Justia.com. Page through it and you will see redacted sections. Here is page 24 for example:

You can see the redacted part is blank, which is what the court and the parties intended.
Guess what happens when I select the text tool and highlight the redacted section on the PDF? (I could also just press Ctrl-A and highlight the whole page.)

You see the highlights for the visible and the hidden text. The next step is to press Ctrl-C (or right click and choose copy). Then paste the copied text into a blank Word document (or text file). Now you see what you weren’t supposed to see. It looks something like this:

Let’s just say there was a lot of embarrassment here. And a lot of unfortunate press coverage for the parties.
What’s Going On Here?
Listen up, folks. This isn’t that hard. Professionals shouldn’t be making these kinds of mistakes, let alone the Ministry of Defence, and least of all with nuclear secrets. Sheesh.
PDF files are complex documents. They support multiple layers of images and text. Just because you cover the outer layer of the document doesn’t mean you have eradicated a lower layer of text.
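To see why the box doesn’t help, here is a rough Python sketch. The page content below is a simplified, made-up example (real PDFs usually compress their streams and often encode text through font tables), and extract_text and has_black_box are hypothetical helper names, not part of any PDF library. The only point is that a text extractor reads the text operators and ignores the rectangle painted on top of them.

```python
import re

# A simplified, uncompressed page content stream (hypothetical example).
# The last three lines paint a filled black rectangle over the second line of text.
PAGE_CONTENT = b"""
BT /F1 12 Tf 72 720 Td (The settlement amount is) Tj ET
BT /F1 12 Tf 72 700 Td (TOP SECRET FIGURE) Tj ET
0 0 0 rg
70 695 200 16 re
f
"""

def extract_text(content: bytes) -> list[str]:
    """Return every literal string shown with the Tj (show text) operator.

    Drawing operators such as `re` (rectangle) and `f` (fill) are ignored,
    much as copy-and-paste reads text rather than pixels.
    """
    pattern = rb"\(((?:[^()\\]|\\.)*)\)\s*Tj"
    return [m.decode("latin-1") for m in re.findall(pattern, content)]

def has_black_box(content: bytes) -> bool:
    """Report whether the stream defines a rectangle and then fills it."""
    return re.search(rb"\bre\s+f\b", content) is not None

if __name__ == "__main__":
    print("box drawn:", has_black_box(PAGE_CONTENT))
    print("text still present:", extract_text(PAGE_CONTENT))
```

The box and the text are separate instructions in the same page. Painting one does nothing to the other.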
We have worked with PDF files since the mid-1990s, both the image-plus-text formats and the native PostScript files. (In 1999, we were inducted into the Smithsonian Institution for our work using Adobe Acrobat in litigation.)
For most of those years, we have offered our users the ability to redact documents online from their browser. We start the process by converting the native files to the PDF format.
What we don’t do is draw a box over the offending text and call it good. Rather, we extract a page out of the PDF and “flatten” it. By that I mean we convert it from the complex, multi-layered PDF format to a simple one-layer format such as PNG or TIFF. Next we draw a box over the image containing the text (actually the user does this). Then we merge the box and the image together before converting it back to PDF form. The resulting page is merged into the document and saved as a redacted copy.
In this way, the redacted information is gone, both the image layer and the underlying text. There is no hidden layer to be discovered because it has been removed by the flattening of the file and the recreation of the page as a new image file. You can’t remove the box or scrape the underlying text from under the box. It simply doesn’t exist. Indeed, in most cases we then allow the user to OCR the file so that the remaining text is searchable.
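The flatten-then-redact idea can be sketched with a toy model in Python. This is an illustration only, not the code behind any real product: the Page class and flatten_and_redact function are hypothetical, pixels are just a grid of 0 (white) and 1 (ink), and a real workflow would render the page with an imaging tool.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    # What the eye sees: rows of 0 (white) and 1 (ink).
    pixels: list
    # What copy-and-paste sees: (text, x, y) entries. Hypothetical layout.
    text_layer: list = field(default_factory=list)

def flatten_and_redact(page: Page, box: tuple) -> Page:
    """Return a new image-only page with the box burned into the pixels.

    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    # Step 1: flatten. Copy only the picture; the text layer is left behind.
    image = [row[:] for row in page.pixels]
    # Step 2: merge the box into the image by overwriting the pixels under it.
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = 1
    # Step 3: rebuild the page from the image alone, with no text layer.
    return Page(pixels=image, text_layer=[])

if __name__ == "__main__":
    original = Page(
        pixels=[[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
        text_layer=[("secret", 1, 0)],
    )
    redacted = flatten_and_redact(original, (0, 0, 4, 2))
    print(redacted.pixels)
    print(redacted.text_layer)
```

After the merge, the pixels under the box are all ink, so nothing in the new page records which ones used to form letters, and there is no text layer left to copy.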
There are plenty of other ways to accomplish this. Several years ago, Adobe added a redaction feature to its paid product (not the free Acrobat Reader). If you use the tool properly, the resulting redaction will not be recoverable and there will be no hidden text awaiting prying eyes (or smart computer geeks).
What you don’t want to do is take the approach I saw in a couple of comments posted on the Web. They suggested that you place a black box over the offending text and then lock down the PDF (no changes or text scraping allowed, for example). The problem with that approach is this: There are scores of free or almost-free PDF-cracking tools. I haven’t tested them with the latest versions of Acrobat, but the last time I tried I could bust open a locked-down PDF in a few seconds. Then I could go in and remove that box.
A better option would be to save the file to TIFF and then turn it back into a PDF. This is easy to do with Acrobat and will eliminate any chance of recovering the hidden text. TIFF is a simple format and won’t support a hidden layer. Do that if you don’t have a better alternative.
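Whichever route you take, a quick smoke test before posting is cheap. The Python sketch below is only that: contains_text is a hypothetical helper that searches the raw bytes of a file and any Flate-compressed streams it can decompress. A hit proves a leak. A clean result proves nothing, because text can be hex-encoded, split across operators, remapped by fonts or stored under other filters. The simplest check remains the manual one: press Ctrl-A, copy, and paste into a text file.

```python
import re
import zlib

# Matches the body of each PDF stream object. Simplified: it assumes the
# keywords "stream" and "endstream" do not occur inside the data itself.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def contains_text(pdf_bytes: bytes, phrase: str) -> bool:
    """Return True if phrase appears in the raw file or in a Flate stream."""
    needle = phrase.encode("latin-1")  # assumes the phrase is Latin-1 text
    if needle in pdf_bytes:
        return True
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed (an image or another filter)
        if needle in data:
            return True
    return False

if __name__ == "__main__":
    # Build a tiny fake object with a compressed stream to exercise the check.
    body = zlib.compress(b"BT (TOP SECRET FIGURE) Tj ET")
    fake_pdf = b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n" + body + b"\nendstream\nendobj\n"
    print(contains_text(fake_pdf, "TOP SECRET FIGURE"))
```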
If your redactions involve secret ingredients for a favored family recipe, the simple place-a-box-over-the-words approach might do the job fine. But if you are a government official with nuclear secrets and a submarine fleet to protect, you should seek help from a pro before you post those documents on the web. You have to love this business!

[...] Article Published By: Catalyst E-Discovery Search Blog [...]