GW Micro - Unleashing the power of your mind's eye.

Protected PDFs - A Rant and Solution

2007-09-04

02:06:17 pm, by Aaron
Categories: dev, software

Before I begin, let me make something perfectly clear: GW Micro does not condone the act of hacking or circumventing security restrictions explicitly applied to protect content in Adobe PDF files. If an author set a password on a PDF document, they probably did so for a reason, and we're not in the business of defrauding those trying to safeguard their livelihood.

With that in mind, on to my rant. We tout support for protected PDFs in Window-Eyes, so what the heck am I going on about? PDF protection isn't as clear as on or off. Using Adobe Acrobat, when an author makes the conscious decision to protect a PDF document, they can choose to add a password, restrict editing and printing, restrict copying images, text, and other content, and (hold on to your seats) restrict text access for screen readers. Yep, you heard that last one correctly. Adobe provides authors with the ability (pun intended -- you'll know why in a second) to explicitly deny access to assistive technology. This aberration is clearly marked with a check box labeled, "Enable text access for screen reader devices for the visually impaired." I applaud Adobe for taking the lead in creating accessible electronic documentation by providing access to PDF documents, but I will never understand the inclusion of an option that gives someone the ability to decide whether or not accessibility should exist. That check box should never have been created, and it needs to be removed. Accessibility is something that should not be decided by a click of a mouse button from the hand of a sighted person who doesn't have the first clue as to why a blind person needs access to a PDF in the first place. Accessibility should not be optional, and that scenario is precisely the reason why I have no objections to providing a solution to access restricted information, assuming that you legally own the PDFs that you need access to.

Let me make more perfectly clear what I previously made perfectly clear: we are not looking to break the security model of PDF files. We’re not talking about removing passwords, or enabling the ability to modify the text of a PDF. We’re not trying to let you print when printing has been restricted, copy when copying has been restricted, or anything along those lines. Protected PDFs are a decent way to protect content, just like password protected Word documents, password protected ZIP files, secure web pages, emails, and so on. We are highly sensitive to the need for security, and even implement our own security models wherever we can. We are instead simply offering a way to reach text that has been unduly restricted, most likely due to the ignorance of the individual who enabled the restrictive security methods. And, once the process is all said and done, it’s really no different than printing a PDF, scanning the result, and OCR’ing into your favorite word processor. In fact, if the printing security restriction has been enabled, this trick won’t work anyway.

I think I’ve disclaimed enough, so let’s move on. Although there are various means to access protected PDF text (many of them quite actionable if you don't legally own the PDF in question), I'm going to discuss one that uses the Microsoft Office Document Imaging feature available with Microsoft Office 2003 and up. The basic gist of the process involves printing a PDF to the Microsoft Office Document Image Writer, and then using the OCR features of the Microsoft Office Document Imaging application to provide the text to Microsoft Word.

First make sure you have Microsoft Office 2003 installed along with the Microsoft Office Document Imaging feature (which, I believe, is installed by default, at least with the Professional Edition of Microsoft Office 2003). Next, make sure you have either Adobe Reader or Adobe Acrobat installed, which you would need anyway to read non-protected PDF files. Finally, you'll need the PDF file that you can't read through normal Adobe means.

Here’s the step by step:

  1. Open the restricted PDF file.
  2. Press CTRL-P to print.
  3. Select the Microsoft Office Document Image Writer from the printer name combo box, and press ENTER.
  4. Enter a file name to print to. The extension should be .MDI (for Microsoft Document Imaging Format). Once the document has printed, close the PDF file.
  5. Open the Microsoft Office Document Imaging utility (usually located in the Start Menu, under Programs, Microsoft Office, Microsoft Office Tools).
  6. Press CTRL-O to bring up the Open dialog.
  7. Type in the path and file name of the MDI you saved in step 4, and press ENTER.
  8. Press ALT-T for Tools.
  9. Arrow down to Send Text to Word, and press ENTER.
  10. Press ENTER to begin the conversion with the default options. If you’re presented with a dialog stating, “You must re-run OCR before performing this operation,” simply confirm by selecting the OK button.
    The conversion process will begin. You can use the Window-Eyes progress hot key (CTRL-INS-B by default) to interrogate the progress.
  11. Once the conversion is complete, Microsoft Word will be open (for me, it opened in the background) with the text of the PDF file available for your perusal. Once you locate the Microsoft Word window, you can close the Microsoft Office Document Imaging utility.

There are a few things to note about this process.

  • The results of an OCR are only as good as the OCR engine. OCR is never a complete replacement for the original text. In other words, don’t expect perfect text accuracy.
  • This process does not remove password protection. If you have a password protected PDF, you will still need to know the password to perform this task.
  • If a PDF author has restricted copying text, this method will enable the OCR’d text to be copied. Acrobat itself warns about this when you enable the copying restriction: “All Adobe products enforce the restrictions set by the Permissions Password. However, not all third-party products fully support and respect these settings. Recipients using such third-party products might be able to bypass some of the restrictions you have set.”
  • If the printing security restriction has been enabled, you cannot print the PDF to the Microsoft Office Document Image Writer, so this method won’t work at all.
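
For the curious, the restrictions mentioned in these notes are not separate locks. As the PDF specification describes it, the standard security handler stores them as bit flags in a single integer (the /P entry of the encryption dictionary), with separate bits for printing, copying, and text extraction for accessibility. Below is a minimal Python sketch that decodes such an integer and reports whether the print-then-OCR route is even available. The function names and sample values are hypothetical, made up for illustration, and reading the /P value out of a real file is left to whatever PDF tool you have handy.

```python
# Permission bits as laid out in the PDF specification's table of user
# access permissions. Bit positions are 1-based in the spec, so bit n
# carries the value 1 << (n - 1).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_or_extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p_value):
    """Turn a /P integer into a dict of permission name -> allowed?

    /P is usually written as a signed 32-bit number, so it is masked to
    32 bits before the individual flags are tested.
    """
    flags = p_value & 0xFFFFFFFF
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def print_then_ocr_possible(p_value):
    """The print-and-OCR method needs only the print permission."""
    return decode_permissions(p_value)["print"]


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real document.
    for sample in (-1, -3900, -3904):
        perms = decode_permissions(sample)
        allowed = [name for name, ok in perms.items() if ok]
        print(sample, "allows:", ", ".join(allowed) or "nothing")
        print("  print-then-OCR possible:", print_then_ocr_possible(sample))
```

With these made-up values, -1 has every flag set, -3900 allows printing but not accessibility extraction (exactly the situation this post complains about), and -3904 allows nothing, which is the case where the method above is a dead end.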

Although I’ve been discussing this method for use with restricted PDFs, it will also work fairly well with PDFs that contain nothing but images. If you don’t have access to another utility that boasts PDF OCR capabilities, this may be a good solution for you.

For example, I took a screen shot of a web page, and created a PDF out of it; the PDF contained nothing but an image of what was on my screen. I ran it through this process, and for the most part, the text on the web page was readable.

PDF files, in general, are very accessible despite their enigmatic stigma. Adobe even provides its own methods of tweaking accessibility settings (e.g. changing reading order, overriding tagged order, etc.). There’s even an Accessibility Quick Check in Adobe Reader (and more detailed Accessibility tools in the full Adobe Acrobat) for examining documents, and reporting problems to the PDF author.

Now you have an additional resource when you encounter a not-so-friendly PDF file that doesn’t live up to good accessibility standards.

Do you have any other tips for reading PDF files?


Comments, Pingbacks:

Comment from: manosinu [Visitor]
The screen-capturing proves solid but gruesomely tiresome if you'd want to do it manually. Try www.copistar.com that will do it automatically and create a printable pdf.
2007-10-28 @ 06:33

Comment from: web design [Visitor] · http://www.xelonline.com
How about printing the document as a text file as explained here http://sethf.com/infothought/blog/archives/000751.html
2008-01-16 @ 16:14

Comment from: peter [Visitor]
Am I missing something here?
I printed the protected document using the (print to file) MS XPS Document Writer which immediately produced a .xps file. I opened the xps file in Acrobat and then saved it as a pdf file. The protection was removed.
2008-03-02 @ 05:29

Comment from: Kris [Visitor]
Does anyone know of any software that prevents this from working? I'm looking into publishing a book online and want some protection for it. I am looking at www.locklizard.com and it seems to do everything (prevents screen capture, print limits and prevents printing to file). Does anyone know any more about it or about other software?
2008-03-19 @ 15:40

Comment from: Geno [Visitor]
I'm a college student working late on a paper and spent damn near an hour downloading shitty demo versions of pdf decryptors that only worked on half the document unless I paid $30 bucks. So thanks, you're a life saver!
2008-05-01 @ 00:10


© GW Micro, Inc. All Rights Reserved.
GW Micro, Inc.    725 Airport North Office Park    Fort Wayne, IN 46825
Ph: 260-489-3671 Fax: 260-489-2608    www.gwmicro.com    sales@gwmicro.com    support@gwmicro.com
Hours: M-F, 8a-5p, EST