合并PDF文件

Question 1

Is it possible, using Python, to merge separate PDF files?

Assuming so, I need to extend this a little further. I am hoping to loop through folders in a directory and repeat this procedure.

And I may be pushing my luck, but is it possible to exclude a page that is contained in of the PDFs (my report generation always creates an extra blank page).

Question 2

Use Pypdf or its successor PyPDF2:

A Pure-Python library built as a PDF toolkit. It is capable of:
* splitting documents page by page,
* merging documents page by page,

(and much more)

Here’s a sample program that works with both versions.

#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()

if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    pdf_cat(sys.argv[1:], sys.stdout)

Question 3

You can use PyPdf2s PdfMerger class.

File Concatenation

You can simply concatenate files by using the append method.

from PyPDF2 import PdfFileMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()

You can pass file handles instead file paths if you want.

File Merging

If you want more fine grained control of merging there is a merge method of the PdfMerger, which allows you to specify an insertion point in the output file, meaning you can insert the pages anywhere in the file. The append method can be thought of as a merge where the insertion point is the end of the file.

e.g.

merger.merge(2, pdf)

Here we insert the whole pdf into the output but at page 2.

Page Ranges

If you wish to control which pages are appended from a particular file, you can use the pages keyword argument of append and merge, passing a tuple in the form (start, stop[, step]) (like the regular range function).

e.g.

merger.append(pdf, pages=(0, 3))    # first 3 pages
merger.append(pdf, pages=(0, 6, 2)) # pages 1,3, 5

If you specify an invalid range you will get an IndexError.

Note: also that to avoid files being left open, the PdfFileMergers close method should be called when the merged file has been written. This ensures all files are closed (input and output) in a timely manner. It’s a shame that PdfFileMerger isn’t implemented as a context manager, so we can use the with keyword, avoid the explicit close call and get some easy exception safety.

You might also want to look at the pdfcat script provided as part of pypdf2. You can potentially avoid the need to write code altogether.

The PyPdf2 github also includes some example code demonstrating merging.

Question 4

Merge all pdf files that are present in a dir

Put the pdf files in a dir. Launch the program. You get one pdf with all the pdfs merged.

import os
from PyPDF2 import PdfFileMerger

x = [a for a in os.listdir() if a.endswith(".pdf")]

merger = PdfFileMerger()

for pdf in x:
    merger.append(open(pdf, 'rb'))

with open("result.pdf", "wb") as fout:
    merger.write(fout)

Question 5

The pdfrw library can do this quite easily, assuming you don’t need to preserve bookmarks and annotations, and your PDFs aren’t encrypted. cat.py is an example concatenation script, and subset.py is an example page subsetting script.

The relevant part of the concatenation script — assumes inputs is a list of input filenames, and outfn is an output file name:

from pdfrw import PdfReader, PdfWriter

writer = PdfWriter()
for inpfn in inputs:
    writer.addpages(PdfReader(inpfn).pages)
writer.write(outfn)

As you can see from this, it would be pretty easy to leave out the last page, e.g. something like:

    writer.addpages(PdfReader(inpfn).pages[:-1])

Disclaimer: I am the primary pdfrw author.

Question 6

Is it possible, using Python, to merge seperate PDF files?

Yes.

The following example merges all files in one folder to a single new PDF file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter
import os

def merge(path, output_filename):
    output = PdfFileWriter()

    for pdffile in glob(path + os.sep + '*.pdf'):
        if pdffile == output_filename:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))

    print("Start writing '%s'" % output_filename)
    with open(output_filename, "wb") as f:
        output.write(f)

if __name__ == "__main__":
    parser = ArgumentParser()

    # Add more options if you like
    parser.add_argument("-o", "--output",
                        dest="output_filename",
                        default="merged.pdf",
                        help="write merged PDF to FILE",
                        metavar="FILE")
    parser.add_argument("-p", "--path",
                        dest="path",
                        default=".",
                        help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.output_filename)

Question 7

from PyPDF2 import PdfFileMerger
import webbrowser
import os
dir_path = os.path.dirname(os.path.realpath(__file__))

def list_files(directory, extension):
    return (f for f in os.listdir(directory) if f.endswith('.' + extension))

pdfs = list_files(dir_path, "pdf")

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(open(pdf, 'rb'))

with open('result.pdf', 'wb') as fout:
    merger.write(fout)

webbrowser.open_new('file://'+ dir_path + '/result.pdf')

Git Repo: https://github.com/mahaguru24/Python_Merge_PDF.git

Question 8

here, http://pieceofpy.com/2009/03/05/concatenating-pdf-with-python/, gives an solution.

similarly:

from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(input,output):
    [output.addPage(input.getPage(page_num)) for page_num in range(input.numPages)]

output = PdfFileWriter()

append_pdf(PdfFileReader(file("C:\\sample.pdf","rb")),output)
append_pdf(PdfFileReader(file("c:\\sample1.pdf","rb")),output)
append_pdf(PdfFileReader(file("c:\\sample2.pdf","rb")),output)
append_pdf(PdfFileReader(file("c:\\sample3.pdf","rb")),output)

    output.write(file("c:\\combined.pdf","wb"))

Question 9

A slight variation using a dictionary for greater flexibility (e.g. sort, dedup):

import os
from PyPDF2 import PdfFileMerger
# use dict to sort by filepath or filename
file_dict = {}
for subdir, dirs, files in os.walk("<dir>"):
    for file in files:
        filepath = subdir + os.sep + file
        # you can have multiple endswith
        if filepath.endswith((".pdf", ".PDF")):
            file_dict[file] = filepath
# use strict = False to ignore PdfReadError: Illegal character error
merger = PdfFileMerger(strict=False)

for k, v in file_dict.items():
    print(k, v)
    merger.append(v)

merger.write("combined_result.pdf")

Question 10

I used pdf unite on the linux terminal by leveraging subprocess (assumes one.pdf and two.pdf exist on the directory) and the aim is to merge them to three.pdf

 import subprocess
 subprocess.call(['pdfunite one.pdf two.pdf three.pdf'],shell=True)

合并PDF文件

问题：合并PDF文件

回答 0

回答 1

回答 2

合并目录中存在的所有pdf文件

Merge all pdf files that are present in a dir

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

从熊猫数据框列获取列表

如何通过密钥按数据组访问熊猫

您如何从stdin阅读？

如何在Fabric文件中设置目标主机

我何时应该在代码中使用pandas apply（）？

pythonic的方式做N次没有索引变量？

合并PDF文件

问题：合并PDF文件

回答 0

回答 1

回答 2

合并目录中存在的所有pdf文件

Merge all pdf files that are present in a dir

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

相关文章

排行榜展示

文章展示