如何在Python中创建PDF文件

问题:如何在Python中创建PDF文件

我正在一个项目中,该项目从用户那里获取一些图像,然后创建一个包含所有这些图像的PDF文件。

有什么方法或工具可以在Python中做到这一点吗?例如要从image1 + image 2 + image 3-> PDF文件创建PDF文件(或eps,ps)?

I’m working on a project which takes some images from user and then creates a PDF file which contains all of these images.

Is there any way or any tool to do this in Python? E.g. to create a PDF file (or eps, ps) from image1 + image 2 + image 3 -> PDF file?


回答 0

我建议pyPdf。它真的很好。不久前,我也写了一篇博客文章,您可以在这里找到它。

I suggest pyPdf. It works really nice. I also wrote a blog post some while ago, you can find it here.


回答 1

遵循此页面上的提示后,这是我的经验。

  1. pyPDF无法将图像嵌入文件中。它只能拆分和合并。(来源:Ctrl + F通过其文档页面)这很好,但是如果您的图像尚未嵌入PDF中,则不行。

  2. pyPDF2似乎在pyPDF之上没有任何其他文档。

  3. ReportLab非常广泛。(Userguide)但是,通过一点Ctrl + F并通过其源代码进行grepping,我得到了:

    • 首先,下载Windows安装程序源代码
    • 然后在Python命令行上尝试:

      from reportlab.pdfgen import canvas
      from reportlab.lib.units import inch, cm
      c = canvas.Canvas('ex.pdf')
      c.drawImage('ar.jpg', 0, 0, 10*cm, 10*cm)
      c.showPage()
      c.save()
      

我需要做的就是将一堆图像转换成PDF,以便检查它们的外观并进行打印。以上足以实现该目标。

ReportLab很棒,但可以从其文档中突出显示的helloworlds中受益。

Here is my experience after following the hints on this page.

  1. pyPDF can’t embed images into files. It can only split and merge. (Source: Ctrl+F through its documentation page) Which is great, but not if you have images that are not already embedded in a PDF.

  2. pyPDF2 doesn’t seem to have any extra documentation on top of pyPDF.

  3. ReportLab is very extensive. (Userguide) However, with a bit of Ctrl+F and grepping through its source, I got this:

    • First, download the Windows installer and source
    • Then try this on Python command line:

      from reportlab.pdfgen import canvas
      from reportlab.lib.units import inch, cm
      c = canvas.Canvas('ex.pdf')
      c.drawImage('ar.jpg', 0, 0, 10*cm, 10*cm)
      c.showPage()
      c.save()
      

All I needed is to get a bunch of images into a PDF, so that I can check how they look and print them. The above is sufficient to achieve that goal.

ReportLab is great, but would benefit from including helloworlds like the above prominently in its documentation.


回答 2

我建议使用Pdfkit。(安装指南

它从html文件创建pdf。我选择从Python金字塔堆栈通过2个步骤创建pdf:

  1. 使用mako模板在服务器端呈现您想要的样式和标记pdf文档
  2. pdfkit.from_string(...)通过将呈现的html作为参数传递的执行方法

这样,您将获得包含样式和图像的pdf文档。

您可以按以下方式安装它:

  • 使用点

    pip install pdfkit

  • 您还需要在Ubuntu上安装wkhtmltopdf 。

I suggest Pdfkit. (installation guide)

It creates pdf from html files. I chose it to create pdf in 2 steps from my Python Pyramid stack:

  1. Rendering server-side with mako templates with the style and markup you want for you pdf document
  2. Executing pdfkit.from_string(...) method by passing the rendered html as parameter

This way you get a pdf document with styling and images supported.

You can install it as follows :

  • using pip

    pip install pdfkit

  • You will also need to install wkhtmltopdf (on Ubuntu).

回答 3

您可以尝试使用这种方法(Python-for-PDF-Generation),也可以尝试使用PyQt,它支持打印到pdf。

用于生成PDF的Python

可移植文档格式(PDF)使您可以创建在每个平台上看起来完全相同的文档。但是,有时需要动态生成PDF文档,这可能是一个很大的挑战。幸运的是,有一些库可以提供帮助。本文研究了其中一种适用于Python的方法。

http://www.devshed.com/c/a/Python/Python-for-PDF-Generation/#whoCFCPh3TAks368.99上了解更多信息

You can try this(Python-for-PDF-Generation) or you can try PyQt, which has support for printing to pdf.

Python for PDF Generation

The Portable Document Format (PDF) lets you create documents that look exactly the same on every platform. Sometimes a PDF document needs to be generated dynamically, however, and that can be quite a challenge. Fortunately, there are libraries that can help. This article examines one of those for Python.

Read more at http://www.devshed.com/c/a/Python/Python-for-PDF-Generation/#whoCFCPh3TAks368.99


回答 4

这是仅适用于标准软件包的解决方案。matplotlib有一个PDF后端,可以将图形保存为PDF。您可以创建带有子图的图形,其中每个子图都是您的图像之一。您完全可以随意摆弄图形:添加标题,按位置播放等。完成图形后,保存为PDF。每次调用savefig都会创建另一页PDF。

下面的示例在第1页和第2页上并排绘制2张图像。

from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
from scipy.misc import imread
import os
import numpy as np

files = [ "Column0_Line16.jpg", "Column0_Line47.jpg" ]
def plotImage(f):
    folder = "C:/temp/"
    im = imread(os.path.join(folder, f)).astype(np.float32) / 255
    plt.imshow(im)
    a = plt.gca()
    a.get_xaxis().set_visible(False) # We don't need axis ticks
    a.get_yaxis().set_visible(False)

pp = PdfPages("c:/temp/page1.pdf")
plt.subplot(121)
plotImage(files[0])
plt.subplot(122)
plotImage(files[1])
pp.savefig(plt.gcf()) # This generates page 1
pp.savefig(plt.gcf()) # This generates page 2
pp.close()

Here is a solution that works with only the standard packages. matplotlib has a PDF backend to save figures to PDF. You can create a figures with subplots, where each subplot is one of your images. You have full freedom to mess with the figure: Adding titles, play with position, etc. Once your figure is done, save to PDF. Each call to savefig will create another page of PDF.

Example below plots 2 images side-by-side, on page 1 and page 2.

from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
from scipy.misc import imread
import os
import numpy as np

files = [ "Column0_Line16.jpg", "Column0_Line47.jpg" ]
def plotImage(f):
    folder = "C:/temp/"
    im = imread(os.path.join(folder, f)).astype(np.float32) / 255
    plt.imshow(im)
    a = plt.gca()
    a.get_xaxis().set_visible(False) # We don't need axis ticks
    a.get_yaxis().set_visible(False)

pp = PdfPages("c:/temp/page1.pdf")
plt.subplot(121)
plotImage(files[0])
plt.subplot(122)
plotImage(files[1])
pp.savefig(plt.gcf()) # This generates page 1
pp.savefig(plt.gcf()) # This generates page 2
pp.close()

回答 5

我在PyQt中做了很多,而且效果很好。Qt对图像,字体,样式等具有广泛的支持,所有这些都可以写到pdf文档中。

I have done this quite a bit in PyQt and it works very well. Qt has extensive support for images, fonts, styles, etc and all of those can be written out to pdf documents.


回答 6

我相信matplotlib可以将图形,文本和其他对象序列化为pdf文档。

I believe that matplotlib has the ability to serialize graphics, text and other objects to a pdf document.


回答 7

fpdf是python(也是)。并经常使用。请参阅PyPI /点子搜索。但也许将其从pyfpdf重命名为fpdf。来自功能:PNG,GIF和JPG支持(包括透明度和Alpha通道)

fpdf is python (too). And often used. See PyPI / pip search. But maybe it was renamed from pyfpdf to fpdf. From features: PNG, GIF and JPG support (including transparency and alpha channel)


回答 8

我使用rst2pdf来创建pdf文件,因为我对RST比对HTML更熟悉。它支持嵌入几乎任何种类的栅格或矢量图像。

它需要reportlab,但是我发现reportlab并不是那么直接使用(至少对我而言)。

I use rst2pdf to create a pdf file, since I am more familiar with RST than with HTML. It supports embedding almost any kind of raster or vector images.

It requires reportlab, but I found reportlab is not so straight forward to use (at least for me).


回答 9

您实际上可以尝试xhtml2pdf http://flask.pocoo.org/snippets/68/


回答 10

这取决于图像文件的格式,但是对于这里正在工作的项目,我使用了RemoteSensing.org的 LibTIFF中的tiff2pdf工具。基本上,只是使用了子进程来调用tiff2pdf.exe,并带有适当的参数来读取我拥有的tiff类型并输出我想要的pdf类型。如果它们不是tiff,则可以使用PIL将其转换为tiff,或者可以找到一种更适合于您的图像类型的工具(如果图像多种多样,则可以使用更通用的工具),例如上面提到的ReportLab。

It depends on what format your image files are in, but for a project here at work I used the tiff2pdf tool in LibTIFF from RemoteSensing.org. Basically just used subprocess to call tiff2pdf.exe with the appropriate argument to read the kind of tiff I had and output the kind of pdf I wanted. If they aren’t tiffs you could probably convert them to tiffs using PIL, or maybe find a tool more specific to your image type (or more generic if the images will be diverse) like ReportLab mentioned above.


回答 11

fpdf对我来说效果很好。比ReportLab简单得多,而且真正免费。与UTF-8一起使用。

fpdf works well for me. Much simpler than ReportLab and really free. Works with UTF-8.


回答 12

rinohtype支持(本地)嵌入PDF,PNG和JPEG图像以及其他位图格式(安装Pillow时)。

(全部披露:我是rinohtype的作者)

rinohtype supports embedding PDF, PNG and JPEG images (natively) and other bitmap formats (when Pillow is installed).

(Full disclosure: I am the author of rinohtype)


回答 13

如果您熟悉LaTex,则可能要考虑使用pylatex

pylatex的优点之一是易于控制图像质量。pdf中的图像将具有与原始图像相同的质量。使用reportlab时,我感到图像被自动压缩,并且图像质量降低了。

pylatex的缺点在于,由于它基于LaTex,因此很难将图像准确地放置在页面上所需的位置。但是,我发现在Figure类中使用position参数,有时在Subfigure中使用,可以得到足够好的结果。

用于创建带有单个图像的pdf的示例代码:

from pylatex import Document, Figure

doc = Document(documentclass="article")
with doc.create(Figure(position='p')) as fig:
fig.add_image('Lenna.png')

doc.generate_pdf('test', compiler='latexmk', compiler_args=["-pdf", "-pdflatex=pdflatex"], clean_tex=True)

除了安装pylatex(pip install pylatex)外,您还需要安装LaTex。对于Ubuntu和其他Debian系统,您可以运行sudo apt-get install texlive-full。如果您使用的是Windows,我建议您使用MixTex

If you are familiar with LaTex you might want to consider pylatex

One of the advantages of pylatex is that it is easy to control the image quality. The images in your pdf will be of the same quality as the original images. When using reportlab, I experienced that the images were automatically compressed, and the image quality reduced.

The disadvantage of pylatex is that, since it is based on LaTex, it can be hard to place images exactly where you want on the page. However, I have found that using the position argument in the Figure class, and sometimes Subfigure, gives good enough results.

Example code for creating a pdf with a single image:

from pylatex import Document, Figure

doc = Document(documentclass="article")
with doc.create(Figure(position='p')) as fig:
fig.add_image('Lenna.png')

doc.generate_pdf('test', compiler='latexmk', compiler_args=["-pdf", "-pdflatex=pdflatex"], clean_tex=True)

In addition to installing pylatex (pip install pylatex), you need to install LaTex. For Ubuntu and other Debian systems you can run sudo apt-get install texlive-full. If you are using Windows I would recommend MixTex