Introduction
PyPDF2 is a Python package for editing and managing PDFs.
With a few lines of code you can do things such as merging several PDFs into one or splitting one PDF into separate pages.
Very often, PDF is seen as a document that is difficult to edit and requires additional tools.
Well, in this post I want to show you that handling PDFs with PyPDF2 is really simple.
Since PyPDF2 is not part of the standard library, we’ll start with the installation; then I’ll show you how to split a PDF into multiple pages and how to merge multiple files into one PDF.
Don’t worry, as usual you will find simple explanations full of examples!
PyPDF2
For this tutorial we will use PyPDF2, a Python package that allows you to read, merge and modify PDFs in a few lines of code.
Read on to learn how to install it and how to use it!
Package installation
Since we are installing an external package, I recommend that you create a virtualenv.
If you don’t know how to do it, there is a separate post with the procedure for creating a virtualenv.
Whether or not you have created the virtualenv (I recommend it), you can install PyPDF2 with the following command:
pip install PyPDF2
If you need to install a specific version, for example 2.10.0, use this command:
pip install PyPDF2==2.10.0
Feel free to replace 2.10.0 with the version you need.
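If you want to double-check which version actually ended up in your environment, a small sketch like the one below can help. It only relies on importlib.metadata from the standard library (available from Python 3.8); installed_version is a helper name I made up for this example and is not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment
        return None


# Prints the installed PyPDF2 version, or None if PyPDF2 is not installed
print(installed_version("PyPDF2"))
```

Run it inside the virtualenv you plan to use, because each environment has its own set of installed packages.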
PyPDF2 has been released for both Python 2 and Python 3, but not every release supports both, so be careful which version you install.
The documentation tells you which release is suitable for your interpreter.
For simplicity, the table below shows the version scheme.

How to extract PDF information using PyPDF2
The first very useful thing I want to show you is how to extract some information from a PDF, such as the author, title and creation date.
In particular, the PyPDF2 package gives you access to all of the following through a single attribute:
- Title
- Creator
- Producer
- Creation date
The attribute is metadata and it belongs to the PdfReader class (older PyPDF2 releases exposed the same information through the getDocumentInfo() method).
Basically we just need to read the PDF file and access this attribute to get all the information we need.
Let’s see how to use it:
from PyPDF2 import PdfReader
# read original pdf
whole_pdf = PdfReader(open(r"whole_pdf.pdf", "rb"))
# get PDF info (metadata is a property, so it is accessed without parentheses)
pdf_info = whole_pdf.metadata
print(pdf_info)
print(pdf_info.title)
print(pdf_info.creator)
print(pdf_info.producer)
print(pdf_info.creation_date)
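Not every PDF carries complete metadata, so some of these values can be missing, and metadata itself can be None when the file has no information block at all. Here is a minimal sketch of a more defensive version, assuming a PyPDF2 release where PdfReader has the metadata property. The names FIELDS, metadata_to_dict and read_pdf_metadata are hypothetical helpers written for this example.

```python
from typing import Any, Dict, Optional

# Metadata attributes collected by this sketch
FIELDS = ("title", "author", "creator", "producer")


def metadata_to_dict(info: Optional[Any]) -> Dict[str, Optional[str]]:
    """Collect a few metadata attributes, using None for anything missing."""
    # getattr with a default also covers the case where info itself is None
    return {name: getattr(info, name, None) for name in FIELDS}


def read_pdf_metadata(path: str) -> Dict[str, Optional[str]]:
    """Open a PDF, read its metadata and close the file again."""
    # Imported here so that metadata_to_dict can be tried without PyPDF2
    from PyPDF2 import PdfReader

    with open(path, "rb") as stream:
        reader = PdfReader(stream)
        return metadata_to_dict(reader.metadata)


# Example call (assumes whole_pdf.pdf exists next to the script):
# print(read_pdf_metadata("whole_pdf.pdf"))
```

Compared with the snippet above, the with block makes sure the file is closed once the metadata has been read.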
How to extract text from a PDF using PyPDF2
In the previous section we saw how PyPDF2 can be used to extract information from a PDF.
But you have probably also needed to extract the text of a PDF at some point.
Well, we have dedicated an entire post to this very common task.
Here you can find the guide for how to extract text from a PDF.
How to split PDF in multiple files using PyPDF2
In this section we will see how you can split a PDF into multiple files using the PyPDF2 Python package.
First we need to import the necessary packages and read the file we want to split.
In this example, the file that we want to read is called whole_pdf.pdf and it is located in the same directory as the Python script below.
from PyPDF2 import PdfWriter, PdfReader
# read original pdf
whole_pdf = PdfReader(open(r"whole_pdf.pdf", "rb"))
# get number of pages
pages_number = len(whole_pdf.pages)
print("Found #{} pages".format(pages_number))
The next step is to iterate over the pages of the read PDF and create a new file for each page.
This way we’re going to split the original PDF into multiple files.
# loop on pdf pages and write a new pdf for each page
for i in range(pages_number):
    # create a new pdf with the current page
    new_pdf = PdfWriter()
    new_pdf.add_page(whole_pdf.pages[i])
    # write new pdf
    file_name = "page_{}.pdf".format(i+1)
    with open(file_name, "wb") as outputStream:
        new_pdf.write(outputStream)
With these few lines of code we are therefore able to divide a PDF with several pages into different files.
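The same idea also works if you prefer files with several pages each instead of one file per page. Below is a minimal sketch under that assumption: page_chunks and split_pdf_in_chunks are hypothetical helper names, and the chunk_N.pdf file names are just an example naming scheme.

```python
from typing import List


def page_chunks(total_pages: int, chunk_size: int) -> List[range]:
    """Group the page indexes 0..total_pages-1 into consecutive chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        range(start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf_in_chunks(path: str, chunk_size: int) -> List[str]:
    """Write one PDF per chunk of pages and return the created file names."""
    # Imported here so that page_chunks can be tried without PyPDF2
    from PyPDF2 import PdfReader, PdfWriter

    written = []
    with open(path, "rb") as stream:
        reader = PdfReader(stream)
        chunks = page_chunks(len(reader.pages), chunk_size)
        for number, chunk in enumerate(chunks, start=1):
            # create a new pdf with the pages of the current chunk
            writer = PdfWriter()
            for index in chunk:
                writer.add_page(reader.pages[index])
            # write the new pdf while the source file is still open
            file_name = "chunk_{}.pdf".format(number)
            with open(file_name, "wb") as output_stream:
                writer.write(output_stream)
            written.append(file_name)
    return written


# Example call (assumes whole_pdf.pdf exists next to the script):
# split_pdf_in_chunks("whole_pdf.pdf", 3)
```

With a 5-page PDF and a chunk size of 2, this would produce three files: two with 2 pages each and a last one with the remaining page.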
How to merge PDF files into a single one using PyPDF2
In this section we will see how you can merge multiple PDF files into a single one using the PyPDF2 Python package.
The first thing we need to do is import the necessary packages and prepare a Python list with all the PDF files to be merged.
from PyPDF2 import PdfMerger
pdfs = ["my_first_pdf.pdf", "my_second_amazing_pdf.pdf", "third_one.pdf"]
Please remember to use the full path of the files and not just their names. This will help you when the PDF files are not in the same directory as the script.
At this point you just have to create a PdfMerger object of the PyPDF2 package and that’s it.
# init PdfMerger object
merger = PdfMerger()
# append to the PdfMerger all PDF files
for pdf in pdfs:
    merger.append(pdf)
# write a new PDF
merger.write("final_pdf_file.pdf")
merger.close()
As you can see from the code above, the PdfMerger object is used much like a Python list.
You can add as many PDF files as you want to your final PDF with the append method, and they end up in the order in which you appended them.
Once this is done, just write the file.
Remember that if you want to write the file to a directory other than the script’s, you have to specify the complete path and not just the file name.
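Since full paths are the safer choice, you may prefer to build the list of files from a folder instead of typing it by hand. Here is a minimal sketch, assuming all the PDFs to merge live in one directory and that alphabetical order is the order you want. The helpers collect_pdfs and merge_folder are hypothetical names for this example.

```python
from pathlib import Path
from typing import List


def collect_pdfs(directory: str) -> List[str]:
    """Return the full paths of the PDF files in a folder, sorted by name."""
    folder = Path(directory).resolve()
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


def merge_folder(directory: str, output_path: str) -> int:
    """Merge every PDF found in a folder and return how many were merged."""
    # Imported here so that collect_pdfs can be tried without PyPDF2
    from PyPDF2 import PdfMerger

    pdfs = collect_pdfs(directory)
    merger = PdfMerger()
    try:
        for pdf in pdfs:
            merger.append(pdf)
        merger.write(output_path)
    finally:
        # release the files opened by the merger even if something fails
        merger.close()
    return len(pdfs)


# Example call (both paths are placeholders):
# merge_folder("/path/to/pdfs", "/path/to/output/final_pdf_file.pdf")
```

One caveat of this sketch: if you write the output into the same folder you are reading from, the merged file will be picked up as an input the next time you run it, so a separate output directory is the simpler option.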
How to rotate pages of a PDF using PyPDF2
In the previous sections we saw how to perform several operations on PDFs with PyPDF2. In this section we will see how you can rotate all pages of a PDF by a given angle, which has to be a multiple of 90 degrees.
The general idea is as follows:
- Read the PDF to rotate
- Decide on a rotation angle, for example 90 degrees
- Use the PdfWriter object to create a new PDF with rotated pages
The code is as follows:
from PyPDF2 import PdfReader, PdfWriter
# read original pdf
whole_pdf = PdfReader(open(r"page_1.pdf", "rb"))
new_pdf = PdfWriter()
# loop on PDF pages and rotate pages
for page in whole_pdf.pages:
    page.rotate(90)
    new_pdf.add_page(page)
# write the new pdf
with open("final_pdf_rotated.pdf", "wb") as outputStream:
    new_pdf.write(outputStream)
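Sometimes only a few pages are scanned sideways and the rest are fine. The sketch below is one possible way to handle that, assuming the same rotate and add_page methods used above. normalize_angle and rotate_pdf are hypothetical helper names, and page numbers start from 1 here only because that is how people usually count pages.

```python
from typing import Iterable, Optional


def normalize_angle(angle: int) -> int:
    """Check that an angle is a multiple of 90 and map it into 0..359."""
    if angle % 90 != 0:
        raise ValueError("rotation angle must be a multiple of 90 degrees")
    return angle % 360


def rotate_pdf(
    source: str,
    target: str,
    angle: int,
    page_numbers: Optional[Iterable[int]] = None,
) -> None:
    """Rotate the selected pages (1-based), or every page if none are given."""
    # Imported here so that normalize_angle can be tried without PyPDF2
    from PyPDF2 import PdfReader, PdfWriter

    angle = normalize_angle(angle)
    selected = None if page_numbers is None else set(page_numbers)
    writer = PdfWriter()
    with open(source, "rb") as stream:
        reader = PdfReader(stream)
        for number, page in enumerate(reader.pages, start=1):
            # rotate only the requested pages, copy the others unchanged
            if selected is None or number in selected:
                page.rotate(angle)
            writer.add_page(page)
        # write the new pdf while the source file is still open
        with open(target, "wb") as output_stream:
            writer.write(output_stream)


# Example call (assumes whole_pdf.pdf exists next to the script):
# rotate_pdf("whole_pdf.pdf", "final_pdf_rotated.pdf", 90, page_numbers=[1, 3])
```

Checking the angle before opening any file means that a typo such as 45 fails immediately instead of halfway through the document.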
Conclusion
Here we are at the end of this post. As always, I hope this article will be useful to you and that you now know how to handle the most common PDF tasks with the Python package PyPDF2.
If you find that something is unclear or you have a problem that you don’t know how to solve, leave me a comment below and I will try to help you as soon as possible.
If this topic is clear to you, take a look at the latest posts!