Question: Add text to an existing PDF using Python
I need to add some extra text to an existing PDF using Python. What is the best way to go about this, and what extra modules will I need to install?
Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.
Edit: pyPDF and ReportLab look good, but neither one will allow me to edit an existing PDF. Are there any other options?
Answer 0
I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I’d share:
- read your PDF using PdfFileReader(); we’ll call this input
- create a new PDF containing your text to add using ReportLab, and save this as a string object
- read the string object using PdfFileReader(); we’ll call this text
- create a new PDF object using PdfFileWriter(); we’ll call this output
- iterate through input and apply .mergePage(text.getPage(0)) to each page you want the text added to, then use output.addPage() to add the modified pages to a new document
This works well for simple text additions. See PyPDF’s sample for watermarking a document.
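Here is some code to answer the question below:
import StringIO
from pyPdf import PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
# draw on the canvas here, e.g. can.drawString(10, 100, "Hello world")
can.save()
# rewind the buffer so it can be read back as a PDF
packet.seek(0)
text = PdfFileReader(packet)
From here you can merge the pages of your existing document with this one.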
Answer 1
Example for Python 2.7:
from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
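Example for Python 3.x (this uses the legacy PyPDF2 API; PdfFileReader, PdfFileWriter, getPage, mergePage, and addPage were removed in PyPDF2 3.0):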
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.BytesIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
# move to the beginning of the BytesIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
with open("destination.pdf", "wb") as outputStream:
    output.write(outputStream)
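The coordinates passed to drawString in the examples above are in PDF points (1/72 inch) measured from the bottom-left corner of the page, which is ReportLab's default, and the letter page size is 612 x 792 points. If it is easier to think in inches from the top-left corner, a small conversion helper along these lines may be useful. The name from_top_left and the LETTER constant are hypothetical and only for illustration; LETTER is written out by hand so the snippet needs no third-party imports.

```python
POINTS_PER_INCH = 72.0
LETTER = (612.0, 792.0)  # same values as reportlab.lib.pagesizes.letter


def from_top_left(x_in, y_in, pagesize=LETTER):
    """Convert inches from the top-left corner to points from the bottom-left."""
    _, height = pagesize
    return (x_in * POINTS_PER_INCH, height - y_in * POINTS_PER_INCH)


# One inch in from the left edge and one inch down from the top edge.
x, y = from_top_left(1, 1)
```

The resulting pair can be passed straight through, for example can.drawString(x, y, "Hello world").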
Answer 2
pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.
Answer 3
Leveraging David Dehghan’s answer above, the following works in Python 2.7.13:
from PyPDF2 import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()
# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Answer 4
cpdf will do the job from the command-line. It isn’t python, though (afaik):
cpdf -add-text "Line of text" input.pdf -o output.pdf
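Although cpdf itself is not Python, the command can still be driven from a Python script via subprocess. The following is a minimal sketch, assuming the cpdf executable is installed and on the PATH; the helper names build_cpdf_command and add_text_with_cpdf are hypothetical, and only the -add-text and -o options shown in the answer are used.

```python
import shutil
import subprocess


def build_cpdf_command(text, src, dst):
    """Build the argument list for: cpdf -add-text TEXT SRC -o DST."""
    return ["cpdf", "-add-text", text, src, "-o", dst]


def add_text_with_cpdf(text, src, dst):
    """Run cpdf to add `text` to src and write dst (assumes cpdf is on PATH)."""
    if shutil.which("cpdf") is None:
        raise RuntimeError("cpdf executable not found on PATH")
    # check=True raises CalledProcessError if cpdf exits with an error.
    subprocess.run(build_cpdf_command(text, src, dst), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text contains spaces or quotes.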
Answer 5
You may have better luck breaking the problem down into converting the PDF into an editable format, writing your changes, then converting it back into PDF. I don’t know of a library that lets you directly edit PDF, but there are plenty of converters between DOC and PDF, for example.
Answer 7
Have you tried pyPdf?
Sorry, it doesn’t have the ability to modify a page’s content.