How can I decrypt a PDF using PyPDF2?
Problem Description:
I am currently using PyPDF2 as a dependency.
I have encountered some encrypted files and handled
them the usual way, with the following code:
from PyPDF2 import PdfReader

reader = PdfReader(pdf_filepath)
if reader.is_encrypted:
    reader.decrypt("")
print(len(reader.pages))
My filepath looks something like "~/blah/FDJKL492019 21490 ,LFS.pdf"
reader.decrypt("") returns 1, which means it was successful. But when it reaches the page count call (PDF.getNumPages() in older PyPDF2 releases),
it still raises the error "PyPDF2.utils.PdfReadError: File has not been decrypted".
How do I get rid of this error?
I can open the PDF file just fine by double-clicking it (which opens it in Adobe Reader by default).
Solution – 1
To answer my own question:
If you have ANY spaces in your file name, then PyPDF2's decrypt function will ultimately fail despite returning a success code.
Try to stick to underscores when naming your PDFs before you run them through PyPDF2.
For example, rather than “FDJKL492019 21490 ,LFS.pdf”, use something like “FDJKL492019_21490_,LFS.pdf”.
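If renaming files by hand is impractical, the same idea can be sketched in Python. This is only an illustration of the renaming advice above: the helper names `underscore_name` and `copy_without_spaces` are hypothetical, and the sketch copies the file rather than renaming it so the original stays untouched.

```python
import shutil
from pathlib import Path


def underscore_name(filename: str) -> str:
    """Return the file name with every space replaced by an underscore."""
    return filename.replace(" ", "_")


def copy_without_spaces(path: str) -> Path:
    """Copy the PDF next to the original under a space-free name.

    expanduser() is applied because Python does not expand a leading "~"
    in a path on its own.
    """
    source = Path(path).expanduser()
    target = source.with_name(underscore_name(source.name))
    if target != source:
        shutil.copyfile(source, target)
    return target
```

The returned path can then be passed to PdfReader in place of the original one.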
Solution – 2
This error may be caused by 128-bit AES encryption on the PDF; see "Query – is there a way to bypass security restrictions on a pdf?"
One workaround is to decrypt every PDF that reports isEncrypted with "qpdf":
qpdf --password='' --decrypt input.pdf output.pdf
Even if your PDF does not appear password protected, it may still be encrypted with no password. The above snippet assumes this is the case.
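To run the same qpdf command from a Python script, one option is the standard subprocess module. The sketch below assumes the qpdf executable is on your PATH; the function names `qpdf_decrypt_command` and `decrypt_with_qpdf` are made up for this example. Passing the arguments as a list avoids shell quoting, so file names containing spaces are handed to qpdf unchanged.

```python
import subprocess


def qpdf_decrypt_command(src, dst, password=""):
    """Build the argument list for: qpdf --password=... --decrypt src dst"""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_with_qpdf(src, dst, password=""):
    """Run qpdf; raises CalledProcessError if qpdf exits with an error code."""
    subprocess.run(qpdf_decrypt_command(src, dst, password), check=True)


if __name__ == "__main__":
    # Same input and output names as the command line above.
    decrypt_with_qpdf("input.pdf", "output.pdf")
```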
Solution – 3
When you use the getNumPages() method, the error has nothing to do with whether the file has been decrypted or not.
If we take a look at the source code of getNumPages():
def getNumPages(self):
    """
    Calculates the number of pages in this PDF file.

    :return: number of pages
    :rtype: int
    :raises PdfReadError: if file is encrypted and restrictions prevent
        this action.
    """
    # Flattened pages will not work on an Encrypted PDF;
    # the PDF file's page count is used in this case. Otherwise,
    # the original method (flattened page count) is used.
    if self.isEncrypted:
        try:
            self._override_encryption = True
            self.decrypt('')
            return self.trailer["/Root"]["/Pages"]["/Count"]
        except:
            raise utils.PdfReadError("File has not been decrypted")
        finally:
            self._override_encryption = False
    else:
        if self.flattenedPages == None:
            self._flatten()
        return len(self.flattenedPages)
we will notice that the self.isEncrypted property controls the flow. The isEncrypted property is read-only and does not change even after the PDF has been decrypted.
So the easy way to handle the situation is to add the password as a keyword argument, with an empty string as the default value, and pass your password when calling getNumPages() and any other method built on top of it.
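Rather than patching the library, the same idea can be sketched as a small wrapper around the reader. The helper `page_count` below is hypothetical and is not part of PyPDF2. It assumes only that the reader exposes `is_encrypted`, `decrypt()` and `pages`, as the reader in the question does, and that `decrypt()` returns 0 when the password is rejected (the question reports 1 on success).

```python
def page_count(reader, password=""):
    """Return the number of pages, decrypting with `password` first if needed."""
    if reader.is_encrypted:
        # Assumption: a return value of 0 means the password was rejected.
        if reader.decrypt(password) == 0:
            raise ValueError("The supplied password did not decrypt the file")
    return len(reader.pages)
```

With the code from the question, this would be called as page_count(PdfReader(pdf_filepath)) or with an explicit password as the second argument.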
Solution – 4
The following code could solve this problem:
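import os
import shlex
from PyPDF2 import PdfReader

filename = "example.pdf"
reader = PdfReader(filename)
if reader.is_encrypted:
    try:
        reader.decrypt("")
        print("File Decrypted (PyPDF2)")
    except Exception:
        # PyPDF2 could not handle the encryption, so fall back to the qpdf CLI.
        # shlex.quote keeps file names with spaces intact in the shell command.
        quoted = shlex.quote(filename)
        command = (
            "cp "
            + quoted
            + " temp.pdf; qpdf --password='' --decrypt temp.pdf "
            + quoted
            + "; rm temp.pdf"
        )
        os.system(command)
        print("File Decrypted (qpdf)")
        # Re-open the file, which qpdf has overwritten with a decrypted copy.
        reader = PdfReader(filename)
else:
    print("File Not Encrypted")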
Solution – 5
You can try the PyMuPDF package; it can open encrypted files and solved my problems.
Reference: PyMuPDF Documentation
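A minimal sketch of what this might look like, assuming PyMuPDF is installed and imported under its usual module name `fitz`. The helper names `unlock` and `open_unlocked` are hypothetical; `needs_pass`, `authenticate()` and `page_count` are the document attributes used for password handling and page counting.

```python
def unlock(doc, password=""):
    """Authenticate an opened document if it still needs a password."""
    # authenticate() returns 0 when the password is rejected.
    if doc.needs_pass and not doc.authenticate(password):
        raise ValueError("The supplied password did not unlock the document")
    return doc


def open_unlocked(path, password=""):
    """Open a PDF with PyMuPDF and unlock it with `password` if required."""
    import fitz  # PyMuPDF; imported here so unlock() can be used without it

    return unlock(fitz.open(path), password)


if __name__ == "__main__":
    doc = open_unlocked("example.pdf")
    print(doc.page_count)
```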
Solution – 6
Use qpdf from Python through the pikepdf library:
import pikepdf
pdf = pikepdf.open('unextractable.pdf')
pdf.save('extractable.pdf')