Guide: Integrating PDFs With Python
It’s really useful to know how to create and modify PDF files in Python. The PDF, or Portable Document Format, is one of the most common formats for sharing documents over the Internet. PDFs can contain text, images, tables, forms, and rich media like videos and animations, all in a single file.
This abundance of content types can make working with PDFs difficult. There are a lot of different kinds of data to decode when opening a PDF file! Fortunately, the Python ecosystem has some great packages for reading, manipulating, and creating PDF files. In this tutorial, you’ll learn how to concatenate and merge PDFs, rotate and crop pages, encrypt and decrypt PDFs with passwords, and create a PDF file from scratch.
Along the way, you’ll have several opportunities to deepen your understanding by following along with the examples.

Concatenating and Merging PDFs

Two common tasks when working with PDF files are concatenating and merging several PDFs into a single file.

When you concatenate two or more PDFs, you join the files one after another into a single document. For example, a company may concatenate several daily reports into one monthly report at the end of a month.

Merging two PDFs also joins the PDFs into a single file. But instead of joining the second PDF to the end of the first, merging allows you to insert it after a specific page in the first PDF. All of the first PDF’s pages after the insertion point are then pushed back so that they follow the last page of the second PDF.

In this section, you’ll learn how to concatenate and merge PDFs using the PdfFileMerger class from the PyPDF2 package.

Using the PdfFileMerger Class

The PdfFileMerger class is a lot like the PdfFileWriter class, and you can use both to write PDF files. The main difference between the two is that PdfFileMerger works with whole documents: it can append every page of an existing PDF to the end of its pages, or insert those pages at any position. Go ahead and create your first PdfFileMerger instance.
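A new PdfFileMerger object is empty. There are a couple of ways to add pages to it, and which one you use depends on what you need to accomplish: .append() concatenates every page of an existing PDF onto the end of the pages already in the merger, and .merge() inserts all of the pages of an existing PDF at a specific position. You’ll look at both methods in this section, starting with .append().

Concatenating PDFs With .append()

In this example, an employee named Peter Python has three expense reports, each stored as a separate PDF file. Peter needs to concatenate these three PDFs and submit them to his employer as a single PDF file so that he can get reimbursed for some work-related expenses. You can start by building a list of paths to the three expense report PDFs.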
Once you have the path to the directory that holds the expense reports, take a look at what’s in it. The names of the three files are listed, but they aren’t in order. Furthermore, the order of the files you see on your computer may differ from the order someone else sees. In general, the order of paths returned when you list a directory is not guaranteed, so you’ll need to sort them yourself.
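Remember that .sort() sorts a list in place, so you don’t need to assign its return value to a variable. To confirm that the sorting worked, loop over the sorted list and print each file name.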
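The listing and sorting step can be sketched with the standard library alone. This is a minimal sketch rather than the tutorial's original code: the helper name `sorted_pdf_paths` is hypothetical, and it assumes the PDFs sit directly inside the directory you pass in.

```python
from pathlib import Path


def sorted_pdf_paths(directory):
    """Return the PDF paths in `directory`, sorted by file name.

    Hypothetical helper: Path.glob() makes no promise about order,
    so the list is sorted explicitly before it is returned.
    """
    pdf_paths = list(Path(directory).glob("*.pdf"))
    pdf_paths.sort()  # list.sort() sorts in place and returns None
    return pdf_paths


if __name__ == "__main__":
    # Print the file names in sorted order to confirm the sort worked.
    for path in sorted_pdf_paths(Path.cwd()):
        print(path.name)
```

Sorting the Path objects directly works here because all of them share the same parent directory, so the comparison comes down to the file names.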
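That looks good! Now you can concatenate the three PDFs. To do that, you’ll use .append(), which adds every page of a PDF to the end of the pages already in the merger. Let’s see this in action. First, import the PdfFileMerger class and create a new instance. Now loop over the paths in the sorted list and append each PDF to the merger.

Notice that each path is converted to a string before it is passed to .append(). With all of the PDF files in the directory concatenated in the merger, the last thing you need to do is write everything to an output file. Open a new file in binary write mode, then pass the file object to the merger’s .write() method.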
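The append-then-write steps can be sketched as a small function. This is a hedged sketch, not the tutorial's original listing: the function name `concatenate_pdfs` is hypothetical, and the merger is passed in as an argument so that the code only assumes an object with .append() and .write(), as PdfFileMerger provides in PyPDF2 releases that still include it.

```python
def concatenate_pdfs(merger, pdf_paths, output_path):
    """Append every PDF in `pdf_paths` to `merger`, then write one file.

    `merger` is assumed to behave like PyPDF2's PdfFileMerger: it needs
    an .append(path) method and a .write(file_object) method.
    """
    for path in pdf_paths:
        # Paths are converted to strings before being handed to .append().
        merger.append(str(path))
    # Open the output in binary write mode and let the merger fill it.
    with open(output_path, "wb") as output_file:
        merger.write(output_file)


# Usage sketch (assumes a PyPDF2 release that still ships PdfFileMerger;
# the file names below are placeholders):
#
#     from PyPDF2 import PdfFileMerger
#     concatenate_pdfs(PdfFileMerger(), ["a.pdf", "b.pdf"], "combined.pdf")
```

Passing the merger in keeps the sketch independent of any particular library version and makes the order of the calls easy to check.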
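You now have a PDF file in your current working directory that contains all three expense reports, one after another.

Merging PDFs With .merge()

To merge two or more PDFs, use .merge(). This method is similar to .append(), except that you must specify the position in the output PDF at which to insert the new pages.

Take a look at an example. Goggle, Inc. prepared a quarterly report but forgot to include a table of contents. Peter Python noticed the mistake and quickly created a PDF with the missing table of contents. Now he needs to merge that PDF into the original report. Both the report PDF and the table of contents PDF are part of the tutorial’s example materials. In IDLE’s interactive window, import the PdfFileMerger class and set up the paths to the two PDF files.

The first thing you’ll do is append the report PDF to a new PdfFileMerger instance using .append().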
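Now that the merger contains the pages of the report, you can merge the table of contents into it at the correct location. You want to insert the table of contents after the title page and just before the introduction section. Since PDF page indices start with 0, you need to insert the table of contents after the page at index 0 and before the page at index 1. To do that, call .merge() with two arguments: the integer 1, which is the index at which the new pages are inserted, and the path to the table of contents PDF.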
Every page in the table of contents PDF is inserted before the page at index 1, and the pages that were at index 1 and beyond shift toward the end to make room. Now write the merged PDF to an output file.
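The way an insertion position shifts the later pages can be modeled with plain Python lists. This sketch only illustrates the indexing; it is not how PyPDF2 implements .merge(), and the page labels and the `merge_pages` name are made up for the example.

```python
def merge_pages(base_pages, new_pages, position):
    """Return a new list with `new_pages` inserted before index `position`.

    A plain-list model of the insertion, not the PyPDF2 implementation.
    """
    return base_pages[:position] + new_pages + base_pages[position:]


report = ["title", "introduction", "results"]  # made-up page labels
toc = ["table of contents"]

merged = merge_pages(report, toc, 1)
print(merged)  # ['title', 'table of contents', 'introduction', 'results']
```

Inserting at a position equal to the number of existing pages gives the same result as concatenation, which is why .append() can be thought of as a special case of merging.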
You now have a new PDF file in which the table of contents sits between the title page and the introduction.

Concatenating and merging PDFs are common operations. While the examples in this section are admittedly somewhat contrived, you can imagine how useful a program would be for merging thousands of PDFs or for automating routine tasks that would otherwise take a human lots of time to complete.

Check Your Understanding

As an exercise, use a PdfFileMerger instance to combine several PDF files into one. A solution has four steps: set up the paths to the PDF files, create the PdfFileMerger instance, loop over the paths and add each file to the merger, and finally write the contents of the merger to an output file.
When you’re ready, you can move on to the next section.

Rotating and Cropping PDF Pages

So far, you’ve learned how to extract text and pages from PDFs and how to concatenate and merge two or more PDF files. These are all common operations with PDFs, but there is more that you can do. In this section, you’ll learn how to rotate and crop pages in a PDF file.

Rotating Pages

You’ll start by learning how to rotate pages. For this example, you’ll use a PDF in which every odd-numbered page is rotated incorrectly. Let’s fix that. In a new IDLE interactive window, start by importing the PdfFileReader and PdfFileWriter classes. Now create a PdfFileReader instance for the PDF file. Finally, create a new PdfFileWriter instance.
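Your goal is to use the writer to create a new PDF file in which all of the pages have the correct orientation. To correct the problem, you’ll use .rotateClockwise(), a page method that takes an angle in degrees and rotates the page clockwise by that amount.

There are several ways you can go about rotating pages in the PDF. We’ll discuss two different ways of doing it. Both of them rely on .rotateClockwise() to do the rotation, but they differ in how they determine which pages to rotate.

The first technique is to loop over the indices of the pages in the PDF and check if each index corresponds to a page that needs to be rotated. If so, then you’ll call .rotateClockwise() on the page. Either way, you then add the page to the writer.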
Notice that the page gets rotated if the index is even. That might seem strange since the odd-numbered pages in the PDF are the ones that are rotated incorrectly. However, the page numbers in the PDF start with 1, whereas the page indices start with 0. That means odd-numbered PDF pages have even indices.

If that makes your head spin, don’t worry! Even after years of dealing with stuff like this, professional programmers still get tripped up by these sorts of things!

Now that you’ve rotated all the pages that needed it, you can write the content of the writer to a new file and check that everything worked.
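The index-based technique can be sketched as follows. This is an illustrative sketch under assumptions: the function name `rotate_even_indexed_pages` is hypothetical, and each page is assumed to offer a .rotateClockwise(degrees) method, as page objects do in older PyPDF2 releases.

```python
def rotate_even_indexed_pages(pages, degrees=90):
    """Rotate the pages at even indices clockwise and return all pages.

    Each page is assumed to offer a .rotateClockwise(degrees) method,
    as page objects in older PyPDF2 releases do.
    """
    output_pages = []
    for index, page in enumerate(pages):
        # Index 0 is page number 1, so even indices are odd page numbers.
        if index % 2 == 0:
            page.rotateClockwise(degrees)
        output_pages.append(page)
    return output_pages
```

Every page is returned, rotated or not, so the caller can add all of them to a writer in their original order.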
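You should now have a file in your current working directory in which all of the pages are oriented correctly. The problem with the approach you just used is that it depends on knowing ahead of time which pages need to be rotated. In a real-world scenario, it isn’t practical to go through an entire PDF taking note of which pages to rotate.

In fact, you can determine which pages need to be rotated without prior knowledge. Well, sometimes you can. Let’s see how, starting with a new PdfFileReader instance. You need to do this because you altered the pages in the old PdfFileReader instance by rotating them. By creating a new instance, you’re starting fresh.

Page objects maintain a dictionary of values containing information about the page. Yikes! Mixed in with all that nonsensical-looking stuff is a key called /Rotate, which holds the rotation of the page in degrees. You can access the /Rotate key on a page using subscript notation, just as you can with a Python dict.

If you look at the /Rotate key for the second page, you’ll see that its value differs from the value for the first page. What all this means is that the page at index 0 has a nonzero rotation value, so it has been rotated, while the page at index 1 has a rotation value of 0, so it has not been rotated at all. If you rotate the first page back using .rotateClockwise(), then its /Rotate value changes to 0.

Now that you know how to inspect the /Rotate key, you can use it to decide which pages to rotate. The first thing you need to do is reinitialize your reader and writer objects so that you get a fresh start. Now write a loop that loops over the pages in the reader, checks the value of /Rotate, and rotates the page if that value shows that the page is turned the wrong way.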
Not only is this loop slightly shorter than the loop in the first solution, but it doesn’t rely on any prior knowledge of which pages need to be rotated. You could use a loop like this to rotate pages in any PDF without ever having to open it up and look at it. To finish out the solution, write the contents of the writer to a new file.
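The /Rotate-driven loop might look like the sketch below. Several things here are assumptions for illustration: the function name `fix_rotated_pages`, the default value of -90 (a page turned 90 degrees counterclockwise), and the use of .get() so that a page without a /Rotate entry is treated as unrotated. Pages are assumed to be dict-like objects with a .rotateClockwise(degrees) method.

```python
def fix_rotated_pages(pages, wrong_rotation=-90):
    """Rotate back every page whose /Rotate entry equals `wrong_rotation`.

    Pages are assumed to be dict-like, with an optional "/Rotate" key
    and a .rotateClockwise(degrees) method. The default of -90 is an
    illustrative assumption: a page turned 90 degrees counterclockwise.
    """
    fixed = []
    for page in pages:
        # A page with no /Rotate entry is treated as not rotated.
        if page.get("/Rotate", 0) == wrong_rotation:
            page.rotateClockwise(-wrong_rotation)
        fixed.append(page)
    return fixed
```

Because the decision is based on each page's own metadata, the same loop works regardless of where the rotated pages sit in the document.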
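Now you can open the new file and compare it to the one you generated with the first technique. They should look identical.

The value of /Rotate may not always be what you expect. For example, if a paper document is scanned with the page turned on its side, then the contents of the PDF will appear rotated even though the /Rotate key may have the value 0. This is one of many quirks that can make working with PDF files frustrating. Sometimes you’ll just need to open a PDF in a PDF reader program and manually figure things out.

Cropping Pages

Another common operation with PDFs is cropping pages. You might need to do this to split a single page into multiple pages or to extract just a small portion of a page, such as a signature or a figure.

For example, suppose you have a PDF in which each page has two columns of text. Let’s split each page into two pages, one for each column. To get started, import the PdfFileReader and PdfFileWriter classes. Now create a PdfFileReader instance for the PDF file. Next, create a new PdfFileWriter instance and get the first page of the PDF.

To crop the page, you first need to know a little bit more about how pages are structured. Page objects have a .mediaBox attribute that represents a rectangular area defining the boundaries of the page. You can use IDLE’s interactive window to explore the .mediaBox before using it to crop the page.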
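The .mediaBox attribute returns a RectangleObject, which represents a rectangular area on the page as a list of four numbers. The first two numbers are the x- and y-coordinates of the lower-left corner of the rectangle, and the last two are the coordinates of the upper-right corner. All of the values are given in points, where 72 points equal 1 inch.

A RectangleObject has four properties, .lowerLeft, .lowerRight, .upperLeft, and .upperRight, that you can use to get the coordinates of each corner of the rectangle. Each property returns a tuple containing the coordinates of the specified corner, and you can access the individual coordinates with square brackets just as you would with any other Python tuple.

You can alter the coordinates of a RectangleObject by assigning a new tuple to one of its corner properties. When you change one corner, the adjacent corners adjust automatically so that the shape stays rectangular. When you alter the coordinates of the .mediaBox of a page, you effectively crop the page: only the content inside the new rectangle remains visible. Go ahead and write the cropped page to a new PDF file.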
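If you open the new file, you’ll see that part of the page has been cropped away. How would you crop the page so that just the text on the left side of the page is visible? You would need to cut the horizontal dimensions of the page in half. You can achieve this by altering the .upperRight coordinates of the .mediaBox.

First, you need to get new PdfFileReader and PdfFileWriter objects, since you’ve just altered the first page in the old reader and added it to the old writer. Now get the first page of the PDF. This time, let’s work with a copy of the first page so that the page you just extracted stays intact. You can do that by importing the copy module from Python’s standard library and using deepcopy() to make a copy of the page.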
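Now you can alter the copy without changing the properties of the original page. That way, you can use the original page later to extract the text on the right side.

Now you need to do a little bit of math. You already worked out that you need to move the upper right-hand corner of the .mediaBox to the top center of the page. First, get the current coordinates of the upper-right corner of the .mediaBox. Then create a new tuple whose first coordinate is half the value of the current one and whose second coordinate is unchanged. Finally, assign the new coordinates to the .upperRight property.

You’ve now cropped the page to contain only the text on the left side! Let’s extract the right side of the page next. First get a new copy of the original page. Move the .upperLeft corner instead of the .upperRight corner. This sets the upper-left corner to the same coordinates that you moved the upper-right corner to when extracting the left side of the page. So, the new .mediaBox is a rectangle whose upper-left corner is at the top center of the page and whose upper-right corner is at the top right of the page.

Finally, add the two cropped pages to the writer and write them to a new PDF file.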
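The corner arithmetic for splitting a page down the middle can be worked through without any PDF library. This is a sketch of the math only: the function name `split_box_in_half` is hypothetical, and the page dimensions in the example are made up.

```python
def split_box_in_half(lower_left, upper_right):
    """Return (left_box, right_box) for a page split down the middle.

    Each box is a (lower_left, upper_right) pair of (x, y) tuples in
    points. Pure arithmetic; no PDF library is involved.
    """
    x0, y0 = lower_left
    x1, y1 = upper_right
    middle_x = (x0 + x1) / 2
    left_box = ((x0, y0), (middle_x, y1))   # upper-right corner moves to top center
    right_box = ((middle_x, y0), (x1, y1))  # left edge moves to the center line
    return left_box, right_box


# Example with made-up dimensions: a page 792 points wide and 612 tall.
left, right = split_box_in_half((0, 0), (792, 612))
print(left)   # ((0, 0), (396.0, 612))
print(right)  # ((396.0, 0), (792, 612))
```

The two boxes share the vertical line at the horizontal midpoint, so together they cover the original page exactly once.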
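Now open the new file with a PDF reader. You should see two pages, the first containing the text from the left-hand side of the original page and the second containing the text from the right-hand side.

Check Your Understanding

As an exercise, create a new PDF file from the pages of an existing one, changing each page along the way with one of the techniques from this section. A solution has four steps: set up the path to the PDF file, create the reader and writer objects, loop over the pages in the reader and add each modified page to the writer, and finally write the contents of the writer to the new file.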
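Encrypting and Decrypting PDFs

Sometimes PDF files are password protected. With PyPDF2, you can work with encrypted PDF files as well as add password protection to existing PDFs.

Encrypting PDFs

You can add password protection to a PDF file using the .encrypt() method of a PdfFileWriter instance. It has two main parameters: user_pwd sets the user password, which allows opening and reading the PDF file, and owner_pwd sets the owner password, which allows opening the PDF without any restrictions, including editing.

Let’s use .encrypt() to add a password to a PDF file. First, open the PDF with a PdfFileReader instance. Now create a new PdfFileWriter instance and add the pages from the reader to it. Next, add the password by passing it to .encrypt() as user_pwd. When you set only user_pwd, the owner_pwd argument defaults to the same string, so both passwords are set. Finally, write the encrypted PDF to an output file in your home directory.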
When you open the PDF with a PDF reader, you’ll be prompted to enter a password. Enter the password you set to open the file. If you need to set a separate owner password for the PDF, then pass a second string to the owner_pwd parameter.
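Setting the passwords and saving the result can be sketched as one function. This is a hedged sketch: the name `protect_pdf` is hypothetical, and the writer is passed in so that the code only assumes an object with .encrypt(user_pwd=..., owner_pwd=...) and .write(), as PdfFileWriter provides in older PyPDF2 releases.

```python
def protect_pdf(writer, output_path, user_password, owner_password=None):
    """Encrypt `writer` and save it to `output_path`.

    `writer` is assumed to behave like PyPDF2's PdfFileWriter, with
    .encrypt(user_pwd=..., owner_pwd=...) and .write(file_object).
    """
    # When owner_pwd is None, the user password doubles as the owner password.
    writer.encrypt(user_pwd=user_password, owner_pwd=owner_password)
    with open(output_path, "wb") as output_file:
        writer.write(output_file)
```

Leaving `owner_password` at its default mirrors the single-password case, while passing a second string sets a separate owner password.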
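In that case, the user password and the owner password are two different strings. When you encrypt a PDF file with a password and attempt to open it, you must provide the password before you can view its contents. This protection extends to reading from the PDF in a Python program. Next, let’s see how to decrypt PDF files with PyPDF2.

Decrypting PDFs

To decrypt an encrypted PDF file, use the .decrypt() method of a PdfFileReader instance, which takes the password as its argument. Let’s open the encrypted PDF that you created in the previous section. First, create a new PdfFileReader instance with the path to the protected PDF.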
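Before you decrypt the PDF, check out what happens if you try to get the first page. A PdfReadError exception is raised, informing you that the PDF file has not been decrypted. Go ahead and decrypt the file now by calling .decrypt() with the password. The method returns an integer: 0 means the password was incorrect, 1 means the user password was matched, and 2 means the owner password was matched. Once you’ve decrypted the file, you can access the contents of the PDF.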
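Checking the result of decryption before reading any pages can be sketched like this. The function name `unlock` is hypothetical, raising ValueError on failure is a design choice made for this sketch, and the reader is assumed to behave like PdfFileReader in older PyPDF2 releases, where .decrypt() returns 0, 1, or 2.

```python
def unlock(reader, password):
    """Decrypt `reader` with `password` and report which password matched.

    `reader` is assumed to behave like PyPDF2's PdfFileReader, whose
    .decrypt() returns 0 on failure, 1 for the user password, and 2 for
    the owner password.
    """
    result = reader.decrypt(password)
    if result == 0:
        raise ValueError("incorrect password; the PDF is still encrypted")
    return "user" if result == 1 else "owner"
```

Failing loudly on a wrong password avoids a confusing error later, when the program first tries to read a page from a file that is still encrypted.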
Now you can extract text and crop or rotate pages to your heart’s content!

Check Your Understanding

As an exercise, use .encrypt() to add a password to an existing PDF file and save the result as a new file. A solution has five steps: set up the path to the PDF file, create the reader and writer objects, append all of the pages from the reader to the writer, use .encrypt() to set the password, and finally write the contents of the writer to the output file.
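Creating a PDF File From Scratch

The tools you’ve used so far work with existing PDF files. In this section, you’ll use ReportLab to generate PDF files from scratch. ReportLab is a full-featured solution for creating PDFs. There is a commercial version that costs money to use, but a limited-feature open source version is also available.

Installing reportlab

To get started, you need to install reportlab with pip, for example by running python3 -m pip install reportlab. You can verify the installation with python3 -m pip show reportlab, which prints the installed version.

The examples in this section were written against the version of reportlab that was current at the time of writing.

Using the Canvas Class

The main interface for creating PDFs with reportlab is the Canvas class, which lives in the reportlab.pdfgen.canvas module. Open a new IDLE interactive window and import the Canvas class.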
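When you make a new Canvas instance, you need to provide a string with the file name of the PDF you’re creating. You now have a Canvas instance that is associated with that file name, although the file itself doesn’t exist yet.

Let’s add some text to the PDF. To do that, you use .drawString(). The first two arguments passed to .drawString() determine the location on the canvas where the text is written: the first is the distance from the left edge of the canvas, and the second is the distance from the bottom edge. The values passed to .drawString() are measured in points. Since a point equals 1/72 of an inch, the coordinates (72, 72) place the text one inch from the left and one inch from the bottom of the page.

To save the PDF to a file, use .save(). You now have a PDF file in your current working directory. There are a few things to notice about the PDF you just created: the default page size is A4, which is not the same as the standard US letter page size, and the font defaults to Helvetica with a size of 12 points.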
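You’re not stuck with these settings.

Setting the Page Size

When you instantiate a Canvas object, you can change the page size with the optional pagesize parameter, which accepts a tuple of two numbers: the width and height of the page in points. For example, to set the page size to 8.5 inches wide by 11 inches tall, you would pass the tuple (612.0, 792.0), because 8.5 times 72 is 612 and 11 times 72 is 792.

If doing the math to convert points to inches or centimeters isn’t your cup of tea, then you can use the reportlab.lib.units module to help with the conversions. Go ahead and import the inch and cm objects from that module.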
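Now you can inspect each object to see what they are. Both inch and cm are floating-point values that represent the number of points contained in each unit: inch is 72.0 points, and cm is roughly 28.35 points. To use the units, multiply the unit name by the number of units that you want to convert to points. For example, 8.5 * inch and 11 * inch give the width and height in points of a page that is 8.5 inches wide by 11 inches tall.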
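The unit conversion is simple enough to reproduce by hand. The sketch below does not import reportlab: the constants `POINTS_PER_INCH` and `POINTS_PER_CM` and the helper `page_size_in_points` are names invented for this example, chosen to play the same role as the inch and cm values described above.

```python
POINTS_PER_INCH = 72.0
POINTS_PER_CM = POINTS_PER_INCH / 2.54  # 1 inch is exactly 2.54 cm


def page_size_in_points(width, height, unit=POINTS_PER_INCH):
    """Convert a page size given in `unit` to a (width, height) tuple in points."""
    return (width * unit, height * unit)


print(page_size_in_points(8.5, 11))  # (612.0, 792.0)
```

A tuple like the one printed here is the kind of value that the pagesize parameter expects.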
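By passing a tuple to pagesize, you can create any size of page that you want. However, reportlab also ships some standard page sizes that are easier to work with. The page sizes are located in the reportlab.lib.pagesizes module. For example, you can import the LETTER object from that module and pass it to the pagesize parameter. If you inspect the LETTER object, you’ll see that it’s a tuple of two floats.

The module contains many standard page sizes, including A4 (210 mm by 297 mm), LETTER (8.5 in by 11 in), LEGAL (8.5 in by 14 in), and TABLOID (11 in by 17 in). In addition to these, the module contains definitions for all of the ISO 216 standard paper sizes.

Setting Font Properties

You can also change the font, font size, and font color when you write text to the Canvas. To change the font and font size, you can use .setFont(). First, create a new Canvas instance.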
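Then set the font to Times New Roman by passing the font name "Times-Roman" and a size in points to .setFont(). Finally, write a string to the canvas with .drawString() and save the canvas.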
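With these settings, the text will be written one inch from the left side of the page and ten inches from the bottom. Open up the resulting PDF file and check it out!

There are three fonts available by default: Courier, Helvetica, and Times-Roman. Each font has bolded and italicized variants. Here’s a list of all the font variations available: Courier, Courier-Bold, Courier-BoldOblique, Courier-Oblique, Helvetica, Helvetica-Bold, Helvetica-BoldOblique, Helvetica-Oblique, Times-Bold, Times-BoldItalic, Times-Italic, and Times-Roman.

You can also set the font color using .setFillColor(), which accepts a color object such as the ones defined in the reportlab.lib.colors module.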
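Font, color, and position can be combined in one small helper. This is a sketch under assumptions: the function name `draw_styled_text` and its default font size of 18 are invented for illustration, and the canvas is passed in so that the code only assumes an object with .setFont(), .setFillColor(), .drawString(), and .save(), as reportlab's Canvas provides.

```python
def draw_styled_text(canvas, text, x, y, font_name="Times-Roman",
                     font_size=18, color=None):
    """Draw `text` at (x, y), in points, with the given font settings.

    `canvas` is assumed to behave like reportlab's Canvas, offering
    .setFont(), .setFillColor(), .drawString(), and .save().
    """
    canvas.setFont(font_name, font_size)
    if color is not None:
        canvas.setFillColor(color)
    canvas.drawString(x, y, text)
    canvas.save()  # writes the PDF file to disk


# Usage sketch (assumes reportlab is installed; "example.pdf" is a
# placeholder file name):
#
#     from reportlab.lib.colors import blue
#     from reportlab.lib.units import inch
#     from reportlab.pdfgen.canvas import Canvas
#
#     canvas = Canvas("example.pdf")
#     draw_styled_text(canvas, "Blue text", 1 * inch, 10 * inch, color=blue)
```

The font and color are set before the string is drawn because those settings apply to text drawn afterward, not to text that is already on the canvas.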
The examples in this section highlight the basics of working with the Canvas object, but they only scratch the surface. The ReportLab User Guide contains a plethora of examples of how to generate PDF documents from scratch. It’s a great place to start if you’re interested in learning more about creating PDFs with Python.

Check Your Understanding

As an exercise, create a PDF in your computer’s home directory that contains a short line of text. A solution has three steps: set up the Canvas instance, draw the string on the canvas, and finally save the canvas to write the PDF file.

When you’re ready, you can move on to the next section.

Conclusion: Create and Modify PDF Files in Python

In this tutorial, you learned how to create and modify PDF files with the PyPDF2 and reportlab packages. With PyPDF2, you learned how to concatenate and merge PDFs, rotate and crop pages, and encrypt and decrypt PDF files. You also had an introduction to creating PDF files from scratch with the reportlab package.

Happy coding!