Redacting PDFs on Linux

I had an odd situation pop up where I needed to redact a PDF. It wasn’t anything nefarious or super exciting, but it did involve a complicated PDF that included images and tables in addition to text. In figuring out how to redact the PDF, I ended up learning a few things and thought it might be helpful to share those (mostly with future me in case I need to do this again).

Being a little tech savvy, I’m well aware that there are people out there who highlight text in a PDF in black and save it, thinking the text is gone. It’s not. The highlight is just an annotation drawn on top of the text, and the text is still sitting underneath it. If you want to get rid of the text, you really only have two options: (1) you literally delete the text in the PDF using a PDF editor, or (2) you convert the PDF into images after the edits are done. Once each page is an image, there is no text left to recover.

You could use LibreOffice to edit the text in a PDF. I love LibreOffice, but its ability to retain formatting when editing a PDF isn’t great. So, if you want to edit the actual text, you can give LibreOffice a go. If you want to keep the formatting but make sure the text is gone, here are some directions. (NOTE: I’m using Kubuntu 24.04, Okular 23.08.5, Xournal 0.4.8.2016, and pdf-redact-tools 0.1.2-4.)

Redacting Just Text

If you just want to redact text, you can start out by doing what I said above: highlight the text you want to delete in black. Open your PDF in Okular (I’m going to use a PDF from a recently published article to illustrate). Then, with the Annotations toolbar, select the highlighter.

Then, select the color you want to highlight with. If your text is black, you’re going to highlight it with black.

Here is me highlighting some of the text.

Obviously, the text is still there. If you save the PDF right now then try to select the text, you’ll see that you can still select it. To find out what it is, you’d just need to copy the text that is highlighted in black and paste it into a different document.
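If you would rather confirm this from the console than by copying and pasting, here is a minimal sketch. It assumes pdftotext (part of poppler-utils) is installed. The function name, file name, and phrase are all made up for illustration.

```shell
# Hypothetical helper: succeeds if PHRASE can still be extracted from PDF.
# Assumes pdftotext (from poppler-utils) is installed.
text_still_present() {
    pdf="$1"
    phrase="$2"
    # The "-" tells pdftotext to write the extracted text to stdout.
    # grep -q stays quiet, -i ignores case, -F treats the phrase literally.
    pdftotext "$pdf" - | grep -qiF -- "$phrase"
}

# Example usage (placeholder file name and phrase):
# if text_still_present highlighted.pdf "some sentence I blacked out"; then
#     echo "The text is still in the file."
# fi
```

If the function reports the phrase is still there, the black highlight has hidden the text on screen but has not removed it from the file.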

To make the text go away entirely, you really need to turn the text into an image. Conveniently, there is a software package that does that for you: pdf-redact-tools. It converts each page of the PDF into a PNG file, then combines those files back into a PDF. The resulting PDF is made up of just images, so there is no text left to select, copy, and paste somewhere else. This is done via the command line:

pdf-redact-tools -s NAMEOFPDF.pdf

The first part of the command calls the program (pdf-redact-tools). The second part, “-s”, tells the program to “sanitize” the PDF, which means it will convert each page to a PNG file, then combine them into a new PDF. The last part is just the name of the PDF you want to convert. Make sure you have navigated to where the PDF is in your console (e.g., cd /home/user/Desktop). FYI, when you run this, pdf-redact-tools will create a new version of the PDF with “final” appended to the name.
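Here is a rough sketch of wrapping that step in a small script so it also tells you where the new file ended up. Both function names are hypothetical. The exact output name (NAME-final.pdf, next to the original) is an assumption based on the “final” suffix mentioned above, so check it against what the tool actually produces on your system.

```shell
# Hypothetical helper: guesses the name of the sanitized file.
# ASSUMPTION: the tool writes NAME-final.pdf alongside NAME.pdf.
sanitized_name() {
    # ${1%.pdf} strips a trailing ".pdf" from the input name
    printf '%s-final.pdf\n' "${1%.pdf}"
}

# Hypothetical wrapper around the command from the post.
sanitize_pdf() {
    pdf="$1"
    pdf-redact-tools -s "$pdf" || return 1
    out=$(sanitized_name "$pdf")
    if [ -f "$out" ]; then
        echo "Sanitized copy: $out"
    else
        echo "Did not find $out; look in the folder for the new file." >&2
        return 1
    fi
}

# Example usage (placeholder file name):
# sanitize_pdf NAMEOFPDF.pdf
```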

And here is an image of me trying to select text in the new PDF that has been converted.

You’ll note in the last image above that the quality of the PDF has degraded. That’s because each page is now a fixed-resolution image rather than text that can be rendered sharply at any zoom level. But you will have genuinely redacted the text from the document in such a fashion that it cannot be recovered.
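If you want a quick sanity check beyond trying to select text by hand, this sketch counts how many non-whitespace characters can be pulled out of a PDF. It assumes pdftotext (from poppler-utils) is installed, and the function name and file name are invented for the example.

```shell
# Hypothetical helper: prints the number of non-whitespace characters
# that can be extracted from a PDF. Assumes pdftotext is installed.
extractable_chars() {
    # pdftotext writes to stdout when given "-".
    # The first tr drops all whitespace, including the form feeds
    # pdftotext emits between pages; wc -c counts what is left;
    # the last tr trims the padding some wc versions add.
    pdftotext "$1" - | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Example usage (placeholder file name):
# extractable_chars NAMEOFPDF-final.pdf
```

A count of 0 suggests the file has no text layer left. Anything above 0 means some text survived and is worth a closer look.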

Redacting Text and Images/Tables

The method above works great if you just need to redact text. But my situation recently was a bit more complicated as I also needed to redact tables and figures. I came up with a solution that ended up working fairly well. Instead of opening your PDF in Okular, open it in Xournal.

I use Xournal regularly to sign PDFs by inserting my signature. That led me to realize I could just insert a blank white image over the figures or tables in Xournal, which would block them out. I took an image of a white screen that you can use. I posted it below, but of course, you won’t be able to see the white screen because the background of this blog is white. But, trust me, it’s there. You can right-click on it and download it if you want to use it.
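If you would rather make your own white image than download one, here is a minimal sketch using ImageMagick. It assumes ImageMagick is installed and provides the convert command (on ImageMagick 7 the command is magick instead). The function name, default file name, and default size are arbitrary choices for illustration.

```shell
# Hypothetical helper: creates a solid white PNG.
# Assumes ImageMagick's "convert" command is available.
make_white_image() {
    out="${1:-white.png}"     # output file, defaults to white.png
    size="${2:-400x300}"      # WIDTHxHEIGHT in pixels
    # xc:white is ImageMagick's built-in solid white canvas
    convert -size "$size" xc:white "$out"
}

# Example usage:
# make_white_image white.png 800x600
```

The exact size does not matter much, since the image can be stretched or shrunk once it is placed in Xournal.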

Select the “insert image” icon in Xournal:

Then click where you want to insert the white image. Xournal will pop up a dialogue to find the image. Find the image you want to insert, and Xournal will insert it where you clicked.

One of the really nice features of doing this in Xournal is that you can expand the white image as big as you want it or shrink it as well. Here I am expanding the image to cover the figure in my PDF.

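Once you’ve covered the image, figure, table, or even text you want to remove, export the result as a PDF (in Xournal, File > Export to PDF; a regular save produces a .xoj file instead). Then run the same pdf-redact-tools command on the exported PDF to convert it into images as detailed above, wiping out the underlying content.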

When you open the new file (renamed with “final” appended to the end), you won’t be able to see or select anything where the figure or table was. It will just be blank white space.

That is how you can successfully redact content from a PDF on Linux.
