Split PDF Python


Introduction

If you have ever wondered how to split a PDF into multiple files (one per page) using Python, you’re in the right place!
PDF is often seen as a format that is hard to edit without dedicated tools.

In this post we will see that splitting a PDF with Python is simple and takes only a few lines of code.

Split PDF using Python – PyPDF2

Splitting a PDF into multiple files using Python is very simple.
For this tutorial we will use PyPDF2, a Python package that allows you to read, merge and modify PDFs in a few lines of code.

Read on to learn how to install it and how to use it!

PyPDF2 package installation

Since we are installing an external package, I recommend that you create a virtualenv.
If you don’t know how to do it, there is a separate post on this blog that walks through how to create one.

Whether or not you have created a virtualenv (I recommend it), you can install PyPDF2 with the following command:

pip install PyPDF2

If you need to install a specific version, for example 2.10.0, use this command:

pip install PyPDF2==2.10.0

Feel free to replace 2.10.0 with the version you need.

PyPDF2 can be used with both Python 2 and Python 3, but be careful about which PyPDF2 version you install.
The documentation explains which version is suitable for your Python installation.

For simplicity, the table below shows the version scheme.

Table: Which PyPDF2 version to install based on your Python version
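If you are not sure which versions you currently have, a small check like the following can help before choosing a row in the table. It is a minimal sketch that assumes Python 3.8 or newer (for the standard-library importlib.metadata module); the function name installed_pypdf2_version is hypothetical and not part of PyPDF2.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if it is missing.

    Hypothetical helper: it only reads the package metadata written by pip.
    """
    try:
        return version("PyPDF2")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Python version as "major.minor.micro"
    print("Python:", "{}.{}.{}".format(*sys.version_info[:3]))
    # PyPDF2 version, or a note if pip has not installed it in this environment
    print("PyPDF2:", installed_pypdf2_version() or "not installed")
```

Run it inside the virtualenv you plan to use, since each environment has its own set of installed packages.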

How to split PDF in multiple files using Python

In this section we will see how you can split a PDF into multiple files using Python.
To solve this problem we will use the PyPDF2 package.

First we need to import the necessary classes and read the file we want to split.
The code uses PdfReader and PdfWriter, the class names in PyPDF2 2.x and 3.x. The older PdfFileReader/PdfFileWriter API (numPages, getPage, addPage) raises an error as of PyPDF2 3.0, so use it only with old 1.x releases.

from PyPDF2 import PdfReader, PdfWriter

# read the original PDF (PdfReader also accepts a file path)
whole_pdf = PdfReader("whole_pdf.pdf")
# get the number of pages
pages_number = len(whole_pdf.pages)
print("Found {} pages".format(pages_number))

The next step is to iterate over the pages of the PDF we just read and create a new file for each page.
This way we split the original PDF into multiple files.

# loop over the PDF pages and write a new PDF for each page
for i in range(pages_number):
    # create a new PDF that contains only the current page
    new_pdf = PdfWriter()
    new_pdf.add_page(whole_pdf.pages[i])
    # write the new PDF as page_1.pdf, page_2.pdf, ...
    file_name = "page_{}.pdf".format(i + 1)
    with open(file_name, "wb") as output_stream:
        new_pdf.write(output_stream)

With these few lines of code we can split a multi-page PDF into one file per page.
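If you need this step more than once, the loop can be wrapped in a reusable function. The sketch below is a hypothetical helper, not part of PyPDF2: the names split_pdf and page_groups, the pages_per_file parameter, and the part_N.pdf file naming are all assumptions. It relies on PdfReader and PdfWriter (PyPDF2 2.x or later), and with the default pages_per_file=1 it produces one file per page.

```python
from pathlib import Path


def page_groups(total_pages, pages_per_file):
    """Return (start, stop) page-index pairs covering all pages in chunks."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_pdf(input_path, output_dir, pages_per_file=1):
    """Split input_path into files of pages_per_file pages each.

    Hypothetical helper: writes part_1.pdf, part_2.pdf, ... into output_dir
    and returns the list of written paths.
    """
    # Imported here so page_groups() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    groups = page_groups(len(reader.pages), pages_per_file)
    for part, (start, stop) in enumerate(groups, start=1):
        # one writer per output file, holding pages start..stop-1
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        out_path = out_dir / "part_{}.pdf".format(part)
        with open(out_path, "wb") as output_stream:
            writer.write(output_stream)
        written.append(out_path)
    return written
```

For example, split_pdf("whole_pdf.pdf", "split_output") writes one file per page, while pages_per_file=2 writes two-page chunks. Keeping the chunk arithmetic in page_groups makes it easy to check on its own: page_groups(5, 2) returns [(0, 2), (2, 4), (4, 5)].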

Conclusion

Here we are at the end of this post. As always, I hope this article is useful to you and that you now know how to split a PDF with the Python package PyPDF2.
If something is unclear or you have a problem you don’t know how to solve, leave me a comment below and I will try to help you as soon as possible.

If this topic is clear to you, take a look at the latest posts!
