Tag Archives: file-io

Merging PDF files

Question: Merging PDF files

Is it possible, using Python, to merge separate PDF files?

Assuming so, I need to extend this a little further. I am hoping to loop through folders in a directory and repeat this procedure.

And I may be pushing my luck, but is it possible to exclude a page that is contained in one of the PDFs (my report generation always creates an extra blank page)?


Answer 0

Use pyPdf or its successor PyPDF2:

A Pure-Python library built as a PDF toolkit. It is capable of:
* splitting documents page by page,
* merging documents page by page,

(and much more)

Here’s a sample program that works with both versions.

#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()

if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    pdf_cat(sys.argv[1:], sys.stdout)
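
The question also asked about excluding the extra blank page. A minimal variant of pdf_cat, assuming the unwanted blank page is always the last page of each input:

def pdf_cat_drop_last(input_files, output_stream):
    # Same as pdf_cat above, but skips the final page of every input file
    input_streams = []
    try:
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages() - 1):  # drop the last (blank) page
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()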

Answer 1

You can use PyPDF2's PdfFileMerger class.

File Concatenation

You can simply concatenate files by using the append method.

from PyPDF2 import PdfFileMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()

You can pass file handles instead of file paths if you want.

File Merging

If you want more fine-grained control of merging, there is a merge method on the PdfFileMerger, which allows you to specify an insertion point in the output file, meaning you can insert the pages anywhere in the file. The append method can be thought of as a merge where the insertion point is the end of the file.

e.g.

merger.merge(2, pdf)

Here we insert the whole pdf into the output but at page 2.

Page Ranges

If you wish to control which pages are appended from a particular file, you can use the pages keyword argument of append and merge, passing a tuple in the form (start, stop[, step]) (like the regular range function).

e.g.

merger.append(pdf, pages=(0, 3))    # first 3 pages
merger.append(pdf, pages=(0, 6, 2)) # pages 1,3, 5

If you specify an invalid range you will get an IndexError.

Note also that, to avoid files being left open, the PdfFileMerger's close method should be called when the merged file has been written. This ensures all files (input and output) are closed in a timely manner. It's a shame that PdfFileMerger isn't implemented as a context manager, so that we could use the with keyword, avoid the explicit close call, and get some easy exception safety.
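
As a workaround, here is a rough sketch using contextlib.closing from the standard library, which calls close for us when the block exits:

from contextlib import closing
from PyPDF2 import PdfFileMerger

with closing(PdfFileMerger()) as merger:
    for pdf in ['file1.pdf', 'file2.pdf', 'file3.pdf']:
        merger.append(pdf)
    merger.write("result.pdf")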

You might also want to look at the pdfcat script provided as part of pypdf2. You can potentially avoid the need to write code altogether.

The PyPdf2 github also includes some example code demonstrating merging.


Answer 2

Merge all pdf files that are present in a dir

Put the pdf files in a dir. Launch the program. You get one pdf with all the pdfs merged.

import os
from PyPDF2 import PdfFileMerger

x = [a for a in os.listdir() if a.endswith(".pdf")]

merger = PdfFileMerger()

for pdf in x:
    merger.append(open(pdf, 'rb'))

with open("result.pdf", "wb") as fout:
    merger.write(fout)

Answer 3

The pdfrw library can do this quite easily, assuming you don’t need to preserve bookmarks and annotations, and your PDFs aren’t encrypted. cat.py is an example concatenation script, and subset.py is an example page subsetting script.

The relevant part of the concatenation script — assumes inputs is a list of input filenames, and outfn is an output file name:

from pdfrw import PdfReader, PdfWriter

writer = PdfWriter()
for inpfn in inputs:
    writer.addpages(PdfReader(inpfn).pages)
writer.write(outfn)

As you can see from this, it would be pretty easy to leave out the last page, e.g. something like:

    writer.addpages(PdfReader(inpfn).pages[:-1])

Disclaimer: I am the primary pdfrw author.
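
To tie this back to the original question (walking folders and dropping each trailing blank page), here is a rough sketch; the 'reports' root and the merged.pdf output name are placeholders:

import os
from pdfrw import PdfReader, PdfWriter

for folder, _, files in os.walk('reports'):
    pdfs = sorted(f for f in files if f.lower().endswith('.pdf') and f != 'merged.pdf')
    if not pdfs:
        continue
    writer = PdfWriter()
    for name in pdfs:
        # Drop the last page of each input (the assumed blank page)
        writer.addpages(PdfReader(os.path.join(folder, name)).pages[:-1])
    writer.write(os.path.join(folder, 'merged.pdf'))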


Answer 4

Is it possible, using Python, to merge separate PDF files?

Yes.

The following example merges all files in one folder to a single new PDF file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter
import os

def merge(path, output_filename):
    output = PdfFileWriter()

    for pdffile in glob(path + os.sep + '*.pdf'):
        if pdffile == output_filename:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))

    print("Start writing '%s'" % output_filename)
    with open(output_filename, "wb") as f:
        output.write(f)

if __name__ == "__main__":
    parser = ArgumentParser()

    # Add more options if you like
    parser.add_argument("-o", "--output",
                        dest="output_filename",
                        default="merged.pdf",
                        help="write merged PDF to FILE",
                        metavar="FILE")
    parser.add_argument("-p", "--path",
                        dest="path",
                        default=".",
                        help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.output_filename)

Answer 5

from PyPDF2 import PdfFileMerger
import webbrowser
import os
dir_path = os.path.dirname(os.path.realpath(__file__))

def list_files(directory, extension):
    return (f for f in os.listdir(directory) if f.endswith('.' + extension))

pdfs = list_files(dir_path, "pdf")

merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(open(pdf, 'rb'))

with open('result.pdf', 'wb') as fout:
    merger.write(fout)

webbrowser.open_new('file://'+ dir_path + '/result.pdf')

Git Repo: https://github.com/mahaguru24/Python_Merge_PDF.git


Answer 6

This post gives a solution: http://pieceofpy.com/2009/03/05/concatenating-pdf-with-python/

Similarly:

from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(input, output):
    [output.addPage(input.getPage(page_num)) for page_num in range(input.numPages)]

output = PdfFileWriter()

append_pdf(PdfFileReader(open("C:\\sample.pdf", "rb")), output)
append_pdf(PdfFileReader(open("c:\\sample1.pdf", "rb")), output)
append_pdf(PdfFileReader(open("c:\\sample2.pdf", "rb")), output)
append_pdf(PdfFileReader(open("c:\\sample3.pdf", "rb")), output)

output.write(open("c:\\combined.pdf", "wb"))

Answer 7

A slight variation using a dictionary for greater flexibility (e.g. sort, dedup):

import os
from PyPDF2 import PdfFileMerger
# use dict to sort by filepath or filename
file_dict = {}
for subdir, dirs, files in os.walk("<dir>"):
    for file in files:
        filepath = subdir + os.sep + file
        # you can have multiple endswith
        if filepath.endswith((".pdf", ".PDF")):
            file_dict[file] = filepath
# use strict = False to ignore PdfReadError: Illegal character error
merger = PdfFileMerger(strict=False)

for k, v in file_dict.items():
    print(k, v)
    merger.append(v)

merger.write("combined_result.pdf")

Answer 8

I used pdfunite on the Linux terminal by leveraging subprocess (this assumes one.pdf and two.pdf exist in the directory), with the aim of merging them into three.pdf:

import subprocess
# Passing the arguments as a list avoids the need for shell=True
subprocess.call(['pdfunite', 'one.pdf', 'two.pdf', 'three.pdf'])

Python - write() vs writelines() and concatenated strings

Question: Python - write() vs writelines() and concatenated strings

So I'm learning Python. I am going through the lessons and ran into a problem where I had to condense a great many target.write() calls into a single write(), while having a "\n" between each user-input variable (the argument of write()).

I came up with:

nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)

If I try to do:

textdoc.write(lines)

I get an error. But if I type:

textdoc.write(line1 + "\n" + line2 + ....)

Then it works fine. Why am I unable to use a string for a newline in write() but I can use it in writelines()?

I'm on Python 2.7. When I searched Google, most of the resources I found were way over my head; I'm still a layperson.


Answer 0

  • writelines expects an iterable of strings
  • write expects a single string.

line1 + "\n" + line2 merges those strings together into a single string before passing it to write.

Note that if you have many lines, you may want to use "\n".join(list_of_lines).
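
A minimal sketch contrasting the two (note that writelines() does not insert the newlines for you):

lines = ["line1", "line2", "line3"]

with open("out.txt", "w") as f:
    f.write("\n".join(lines))               # write() takes one string

with open("out.txt", "w") as f:
    f.writelines(l + "\n" for l in lines)   # writelines() takes an iterable of strings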


Answer 1

Why am I unable to use a string for a newline in write() but I can use it in writelines()?

The idea is the following: if you want to write a single string you can do this with write(). If you have a sequence of strings you can write them all using writelines().

write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (by the way, show errors to us!).

writelines(arg) expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterator is expected to be a string. A tuple of strings is what you provided, so things worked.

The nature of the string(s) does not matter to both of the functions, i.e. they just write to the file whatever you provide them. The interesting part is that writelines() does not add newline characters on its own, so the method name can actually be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).

What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))

This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines and uses the character ‘\n’ as glue. It is more efficient than using the + operator.

Starting from the same lines sequence, ending up with the same output, but using writelines():

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)

This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and writes every item.

Edit: Another point you should be aware of:

write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():

outfile.writelines(infile.readlines())

Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won’t even work. The solution to this problem is to use iterators only. A working example:

with open('inputfile') as infile:
    with open('outputfile') as outfile:
        for line in infile:
            outfile.write(line)

This reads the input file line by line. As soon as one line is read, this line is written to the output file. Schematically spoken, there always is only one single line in memory (compared to the entire file content being in memory in case of the readlines/writelines approach).


Answer 2

If you just want to save and load a list, try pickle.

Pickle saving:

with open("yourFile","wb")as file:
 pickle.dump(YourList,file)

and loading:

with open("yourFile","rb")as file:
 YourList=pickle.load(file)

Answer 3

Actually, I think the problem is that your variable “lines” is bad. You defined lines as a tuple, but I believe that write() requires a string. All you have to change is your commas into pluses (+).

nl = "\n"
lines = line1+nl+line2+nl+line3+nl
textdoc.writelines(lines)

should work.


Answer 4

Exercise 16 from Zed Shaw’s book? You can use escape characters as follows:

paragraph1 = "%s \n %s \n %s \n" % (line1, line2, line3)
target.write(paragraph1)
target.close()

How to write a multidimensional array to a text file?

Question: How to write a multidimensional array to a text file?

In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?

This array consists of four 11 x 14 arrays, so I should format it with a nice newline, to make the reading of the file easier on others.

Edit: So I’ve tried the numpy.savetxt function. Strangely, it gives the following error:

TypeError: float argument required, not numpy.ndarray

I assume that this is because the function doesn’t work with multidimensional arrays? Any solutions as I would like them within one file?


Answer 0

If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it’s less efficient for large arrays (which yours isn’t, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions… But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions… This is probably by design, as there’s no inherently defined way to indicate additional dimensions in a text file.

E.g. This (a 2D array) works fine

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is…)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    
    # Iterating through a ndimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I’m just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))
    
# Just to check that they're the same...
assert np.all(new_data == data)

Answer 1

I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that’s not a primary concern, just pickle it.

To save it:

import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
output = open('data.pkl', 'wb')
pickle.dump(my_data, output)
output.close()

To read it back:

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

pkl_file.close()

Answer 2

If you don’t need a human-readable output, another option you could try is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.

Unlike Joe Kington’s answer, the benefit of this is that you don’t need to know the original shape of the data in the .mat file, i.e. no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably some other programs/languages as well.

Here is an example:

import numpy as np
import scipy.io

# Some test data
x = np.arange(200).reshape((4,5,10))

# Specify the filename of the .mat file
matfile = 'test_mat.mat'

# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')

# For the above line, I specified the kwarg oned_as since python (2.7 with 
# numpy 1.6.1) throws a FutureWarning.  Here, this isn't really necessary 
# since oned_as is a kwarg for dealing with 1-D arrays.

# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)

# And just to check if the data is the same:
assert np.all(x == matdata['out'])

If you forget the key that the array is named in the .mat file, you can always do:

print(matdata.keys())

And of course you can store many arrays using many more keys.

So yes – it won’t be readable with your eyes, but only takes 2 lines to write and read the data, which I think is a fair trade-off.

Take a look at the docs for scipy.io.savemat and scipy.io.loadmat and also this tutorial page: scipy.io File IO Tutorial


Answer 3

ndarray.tofile() should also work

e.g. if your array is called a:

a.tofile('yourfile.txt',sep=" ",format="%s")

Not sure how to get newline formatting though.

Edit (credit Kevin J. Black’s comment here):

Since version 1.5.0, np.savetxt() takes an optional parameter newline='\n' to allow multi-line output: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
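
For the record, a small sketch contrasting tofile() with the savetxt() route and its newline parameter:

import numpy as np

a = np.arange(10)
a.tofile('yourfile.txt', sep=" ", format="%s")          # one space-separated line
np.savetxt('yourfile2.txt', a, fmt='%s', newline='\n')  # one value per line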


Answer 4

There exist special libraries to do just that (plus wrappers for Python).
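
One library of this kind is HDF5, with the h5py wrapper for Python; a minimal sketch (h5py and the dataset name 'arr' are my assumptions, not necessarily what the author had in mind):

import h5py
import numpy as np

my_array = np.arange(616).reshape((4, 11, 14))

with h5py.File('data.h5', 'w') as f:
    f.create_dataset('arr', data=my_array)   # write the array

with h5py.File('data.h5', 'r') as f:
    restored = f['arr'][:]                   # read it back as a numpy array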

Hope this helps.


Answer 5

You can simply traverse the array in three nested loops and write their values to your file. For reading, you simply use the same exact loop construction. You will get the values in exactly the right order to fill your arrays correctly again.
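
A rough sketch of this approach for the 4x11x14 array from the question:

import numpy as np

data = np.arange(4 * 11 * 14).reshape((4, 11, 14))

# Write: one value per line, in a fixed traversal order
with open('data.txt', 'w') as f:
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            for k in range(data.shape[2]):
                f.write('%d\n' % data[i, j, k])

# Read: the same loop structure restores the original layout
restored = np.empty(data.shape, dtype=data.dtype)
with open('data.txt') as f:
    for i in range(restored.shape[0]):
        for j in range(restored.shape[1]):
            for k in range(restored.shape[2]):
                restored[i, j, k] = int(f.readline())

assert np.all(restored == data)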


Answer 6

I have a way to do it using a simple filename.write() operation. It works fine for me, but I'm dealing with arrays having ~1500 data elements.

I basically just have for loops to iterate through the file and write it to the output destination line-by-line in a csv style output.

import numpy as np

trial = np.genfromtxt("/extension/file.txt", dtype=str, delimiter=",")
num_of_columns = trial.shape[1]  # number of columns in the data

with open("/extension/file.txt", "w") as f:
    for x in range(len(trial[:, 1])):       # one iteration per row
        for y in range(num_of_columns):
            if y < num_of_columns - 1:
                f.write(trial[x][y] + ",")  # comma between elements
            else:
                f.write(trial[x][y])        # last column: no trailing comma
        f.write("\n")

The if and else branches are used to add commas between the data elements. For whatever reason, these get stripped out when reading the file in as an nd array. My goal was to output the file as a csv, so this method helps to handle that.

Hope this helps!


Answer 7

Pickle is best for these cases. Suppose you have an ndarray named x_train. You can dump it to a file and restore it using the following commands:

import pickle

### Save to file
with open("myfile.pkl", "wb") as f:
    pickle.dump(x_train, f)

### Load from file
with open("myfile.pkl", "rb") as f:
    x_temp = pickle.load(f)

Replacing a string in the contents of a file

Question: Replacing a string in the contents of a file

How can I open a file, Stud.txt, and then replace any occurrences of "A" with "Orange"?


Answer 0

with open("Stud.txt", "rt") as fin:
    with open("out.txt", "wt") as fout:
        for line in fin:
            fout.write(line.replace('A', 'Orange'))
with open("Stud.txt", "rt") as fin:
    with open("out.txt", "wt") as fout:
        for line in fin:
            fout.write(line.replace('A', 'Orange'))

Answer 1

If you’d like to replace the strings in the same file, you probably have to read its contents into a local variable, close it, and re-open it for writing:

I am using the with statement in this example, which closes the file after the with block is terminated – either normally when the last command finishes executing, or by an exception.

def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return

    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)

It is worth mentioning that if the filenames were different, we could have done this more elegantly with a single with statement.
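
For the record, a sketch of that single-with variant for two different filenames (the helper name is just for illustration):

def replace_to_new_file(src_filename, dst_filename, old_string, new_string):
    # One with statement manages both files at once
    with open(src_filename) as fin, open(dst_filename, 'w') as fout:
        fout.write(fin.read().replace(old_string, new_string))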


Answer 2

#!/usr/bin/python

with open(FileName) as f:
    newText=f.read().replace('A', 'Orange')

with open(FileName, "w") as f:
    f.write(newText)

Answer 3

Something like

file = open('Stud.txt')
contents = file.read()
replaced_contents = contents.replace('A', 'Orange')

<do stuff with the result>

Answer 4

with open('Stud.txt','r') as f:
    newlines = []
    for line in f.readlines():
        newlines.append(line.replace('A', 'Orange'))
with open('Stud.txt', 'w') as f:
    for line in newlines:
        f.write(line)

Answer 5

If you are on Linux and just want to replace the word dog with cat, you can do:

test.txt:

Hi, i am a dog and dog's are awesome, i love dogs! dog dog dogs!

Linux Command:

sed -i 's/dog/cat/g' test.txt

Output:

Hi, i am a cat and cat's are awesome, i love cats! cat cat cats!

Original Post: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands


Answer 6

Using pathlib (https://docs.python.org/3/library/pathlib.html)

from pathlib import Path
file = Path('Stud.txt')
file.write_text(file.read_text().replace('A', 'Orange'))

If input and output files were different you would use two different variables for read_text and write_text.
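
For example, a sketch of that two-file variant (Stud_new.txt is just a placeholder output name):

from pathlib import Path

src = Path('Stud.txt')
dst = Path('Stud_new.txt')
dst.write_text(src.read_text().replace('A', 'Orange'))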

If you wanted a change more complex than a single replacement, you would assign the result of read_text to a variable, process it and save the new content to another variable, and then save the new content with write_text.

If your file was large you would prefer an approach that does not read the whole file in memory, but rather process it line by line as show by Gareth Davidson in another answer (https://stackoverflow.com/a/4128192/3981273), which of course requires to use two distinct files for input and output.


Answer 7

The easiest way is to do it with regular expressions. Assuming that you want to iterate over each line in the file (wherever 'A' might occur), you do:

import re

input_file = open('C:\\full_path\\Stud.txt', 'r')
# When you write to a file opened with write permissions, it clears the file
# and writes only what you tell it to. So we have to save the contents first.

saved_input = []
for eachLine in input_file:
    saved_input.append(eachLine)
input_file.close()

# Now we change entries with 'A' to 'Orange'
for i in range(len(saved_input)):
    saved_input[i] = re.sub('A', 'Orange', saved_input[i])

# Now we open the file in write mode (clearing it) and write saved_input back to it
output_file = open('C:\\full_path\\Stud.txt', 'w')
for each in saved_input:
    output_file.write(each)
output_file.close()

Python Pandas: How to read only the first n rows of a CSV file?

Question: Python Pandas: How to read only the first n rows of a CSV file?

I have a very large data set and I can't afford to read the entire data set in. So, I'm thinking of reading only one chunk of it to train on, but I have no idea how to do it. Any thoughts will be appreciated.


Answer 0

If you only want to read the first 999,999 (non-header) rows:

read_csv(..., nrows=999999)

If you only want to read rows 1,000,000 … 1,999,999

read_csv(..., skiprows=1000000, nrows=999999)

nrows : int, default None. Number of rows of file to read. Useful for reading pieces of large files.

skiprows : list-like or integer Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file

and for large files, you’ll probably also want to use chunksize:

chunksize : int, default None Return TextFileReader object for iteration

pandas.io.parsers.read_csv documentation
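
And a rough sketch of the chunksize route; the filename and the per-chunk training step are placeholders:

import pandas as pd

reader = pd.read_csv('huge.csv', chunksize=100000)
for chunk in reader:
    train_on(chunk)  # hypothetical function that processes one DataFrame chunk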


ValueError: I/O operation on closed file

Question: ValueError: I/O operation on closed file

import csv    

with open('v.csv', 'w') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

for w, c in p.items():
    cwriter.writerow(w + c)

Here, p is a dictionary, w and c both are strings.

When I try to write to the file it reports the error:

ValueError: I/O operation on closed file.

Answer 0

Indent correctly; your for statement should be inside the with block:

import csv    

with open('v.csv', 'w') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

    for w, c in p.items():
        cwriter.writerow(w + c)

Outside the with block, the file is closed.

>>> with open('/tmp/1', 'w') as f:
...     print(f.closed)
... 
False
>>> print(f.closed)
True

Answer 1

The same error can be raised by mixing tabs and spaces in the indentation:

with open('/foo', 'w') as f:
 (spaces OR  tab) print f       <-- success
 (spaces AND tab) print f       <-- fail

Opening files in "rt" and "wt" modes

Question: Opening files in "rt" and "wt" modes

Several times here on SO I’ve seen people using rt and wt modes for reading and writing files.

For example:

with open('input.txt', 'rt') as input_file:
     with open('output.txt', 'wt') as output_file: 
         ...

I don’t see the modes documented, but since open() doesn’t throw an error – looks like it’s pretty much legal to use.

What is it for and is there any difference between using wt vs w and rt vs r?


Answer 0

t refers to the text mode. There is no difference between r and rt or w and wt since text mode is the default.

Documented here:

Character   Meaning
'r'     open for reading (default)
'w'     open for writing, truncating the file first
'x'     open for exclusive creation, failing if the file already exists
'a'     open for writing, appending to the end of the file if it exists
'b'     binary mode
't'     text mode (default)
'+'     open a disk file for updating (reading and writing)
'U'     universal newlines mode (deprecated)

The default mode is 'r' (open for reading text, synonym of 'rt').


Answer 1

The t indicates text mode, meaning that \n characters will be translated to the host OS line endings when writing to a file, and back again when reading. The flag is basically just noise, since text mode is the default.

Other than U, those mode flags come directly from the standard C library’s fopen() function, a fact that is documented in the sixth paragraph of the python2 documentation for open().

As far as I know, t is not and has never been part of the C standard, so although many implementations of the C library accept it anyway, there’s no guarantee that they all will, and therefore no guarantee that it will work on every build of python. That explains why the python2 docs didn’t list it, and why it generally worked anyway. The python3 docs make it official.


Answer 2

The ‘r’ is for reading, ‘w’ for writing and ‘a’ is for appending.

The 't' represents text mode, as opposed to binary mode.

Several times here on SO I’ve seen people using rt and wt modes for reading and writing files.

Edit: Are you sure you saw rt and not rb?

These functions generally wrap the fopen function which is described here:

http://www.cplusplus.com/reference/cstdio/fopen/

As you can see it mentions the use of b to open the file in binary mode.

The document link you provided also makes reference to this b mode:

Appending ‘b’ is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.


Answer 3

t indicates text mode

https://docs.python.org/release/3.1.5/library/functions.html#open

On Linux, there's no difference between text mode and binary mode; on Windows, however, text mode converts \n to \r\n when writing (and back when reading).

http://www.cygwin.com/cygwin-ug-net/using-textbinary.html
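
A small sketch of that difference (the byte values shown are what you would see on Windows):

with open('demo.txt', 'w') as f:    # text mode ('wt')
    f.write('a\nb')                 # stored as 'a\r\nb' on Windows

with open('demo.txt', 'rb') as f:   # binary mode: no translation on read
    print(f.read())                 # b'a\r\nb' on Windows, b'a\nb' on Linux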


Batch renaming files in a directory

Question: Batch renaming files in a directory

Is there an easy way to rename a group of files already contained in a directory, using Python?

Example: I have a directory full of *.doc files and I want to rename them in a consistent way.

X.doc -> “new(X).doc”

Y.doc -> “new(Y).doc”


Answer 0

Such renaming is quite easy, for example with os and glob modules:

import glob, os

def rename(dir, pattern, titlePattern):
    for pathAndFilename in glob.iglob(os.path.join(dir, pattern)):
        title, ext = os.path.splitext(os.path.basename(pathAndFilename))
        os.rename(pathAndFilename, 
                  os.path.join(dir, titlePattern % title + ext))

You could then use it in your example like this:

rename(r'c:\temp\xx', r'*.doc', r'new(%s)')

The above example will convert all *.doc files in c:\temp\xx dir to new(%s).doc, where %s is the previous base name of the file (without extension).


Answer 1

I prefer writing small one liners for each replace I have to do instead of making a more generic and complex code. E.g.:

This replaces all underscores with hyphens in any non-hidden file in the current directory

import os
[os.rename(f, f.replace('_', '-')) for f in os.listdir('.') if not f.startswith('.')]

Answer 2

If you don’t mind using regular expressions, then this function would give you much power in renaming files:

import re, glob, os

def renamer(files, pattern, replacement):
    for pathname in glob.glob(files):
        basename= os.path.basename(pathname)
        new_filename= re.sub(pattern, replacement, basename)
        if new_filename != basename:
            os.rename(
              pathname,
              os.path.join(os.path.dirname(pathname), new_filename))

So in your example, you could do (assuming it’s the current directory where the files are):

renamer("*.doc", r"^(.*)\.doc$", r"new(\1).doc")

but you could also roll back to the initial filenames:

renamer("*.doc", r"^new\((.*)\)\.doc", r"\1.doc")

and more.


Answer 3

I use this to simply rename all the files in the subfolders of a folder:

import os

def replace(fpath, old_str, new_str):
    for path, subdirs, files in os.walk(fpath):
        for name in files:
            if(old_str.lower() in name.lower()):
                os.rename(os.path.join(path,name), os.path.join(path,
                                            name.lower().replace(old_str,new_str)))

I am replacing all occurrences of old_str, in any case, with new_str.


Answer 4

Try: http://www.mattweber.org/2007/03/04/python-script-renamepy/

I like to have my music, movie, and picture files named a certain way. When I download files from the internet, they usually don't follow my naming convention. I found myself manually renaming each file to fit my style. This got old really fast, so I decided to write a program to do it for me.

This program can convert the filename to all lowercase, replace strings in the filename with whatever you want, and trim any number of characters from the front or back of the filename.

The program’s source code is also available.


Answer 5

我自己编写了一个python脚本。它以存在文件的目录路径和要使用的命名模式作为参数。但是,它通过给您提供的命名模式附加一个增量数字(1、2、3等)来重命名。

import os
import sys

# checking whether path and filename are given.
if len(sys.argv) != 3:
    print "Usage : python rename.py <path> <new_name.extension>"
    sys.exit()

# splitting name and extension.
name = sys.argv[2].split('.')
if len(name) < 2:
    name.append('')
else:
    name[1] = ".%s" %name[1]

# to name starting from 1 to number_of_files.
count = 1

# creating a new folder in which the renamed files will be stored.
s = "%s/pic_folder" % sys.argv[1]
try:
    os.mkdir(s)
except OSError:
    # if pic_folder is already present, use it.
    pass

try:
    for x in os.walk(sys.argv[1]):
        # skip the destination folder itself while walking.
        x[1][:] = [d for d in x[1] if d != 'pic_folder']
        for y in x[2]:
            # creating the rename pattern; os.path.join avoids the missing
            # path separator of the original "%spic_folder/..." format string.
            s = os.path.join(sys.argv[1], "pic_folder", "%s%d%s" % (name[0], count, name[1]))
            # getting the original path of the file to be renamed.
            z = os.path.join(x[0],y)
            # renaming.
            os.rename(z, s)
            # incrementing the count.
            count = count + 1
except OSError:
    pass

希望这对您有用。

I’ve written a python script on my own. It takes as arguments the path of the directory in which the files are present and the naming pattern that you want to use. However, it renames by attaching an incremental number (1, 2, 3 and so on) to the naming pattern you give.

import os
import sys

# checking whether path and filename are given.
if len(sys.argv) != 3:
    print "Usage : python rename.py <path> <new_name.extension>"
    sys.exit()

# splitting name and extension.
name = sys.argv[2].split('.')
if len(name) < 2:
    name.append('')
else:
    name[1] = ".%s" %name[1]

# to name starting from 1 to number_of_files.
count = 1

# creating a new folder in which the renamed files will be stored.
s = "%s/pic_folder" % sys.argv[1]
try:
    os.mkdir(s)
except OSError:
    # if pic_folder is already present, use it.
    pass

try:
    for x in os.walk(sys.argv[1]):
        # skip the destination folder itself while walking.
        x[1][:] = [d for d in x[1] if d != 'pic_folder']
        for y in x[2]:
            # creating the rename pattern; os.path.join avoids the missing
            # path separator of the original "%spic_folder/..." format string.
            s = os.path.join(sys.argv[1], "pic_folder", "%s%d%s" % (name[0], count, name[1]))
            # getting the original path of the file to be renamed.
            z = os.path.join(x[0],y)
            # renaming.
            os.rename(z, s)
            # incrementing the count.
            count = count + 1
except OSError:
    pass

Hope this works for you.


回答 6

在您需要执行重命名的目录中。

import os
# get the file name list to nameList
nameList = os.listdir() 
#loop through the name and rename
for fileName in nameList:
    rename=fileName[15:28]
    os.rename(fileName,rename)
#example:
#input fileName bulk like :20180707131932_IMG_4304.JPG
#output renamed bulk like :IMG_4304.JPG

Be in the directory where you need to perform the renaming.

import os
# get the file name list to nameList
nameList = os.listdir() 
#loop through the name and rename
for fileName in nameList:
    rename=fileName[15:28]
    os.rename(fileName,rename)
#example:
#input fileName bulk like :20180707131932_IMG_4304.JPG
#output renamed bulk like :IMG_4304.JPG

回答 7

import os

directoryName = "Photographs"
filePath = os.path.abspath(directoryName)
filePathWithSlash = filePath + "\\"

for counter, filename in enumerate(os.listdir(directoryName)):

    filenameWithPath = os.path.join(filePathWithSlash, filename)

    os.rename(filenameWithPath, filenameWithPath.replace(filename,"DSC_" + \
          str(counter).zfill(4) + ".jpg" ))

# e.g. filename = "photo1.jpg", directory = "c:\users\Photographs"        
# The string.replace call swaps the new filename into
# filenameWithPath, which is then passed to os.rename together
# with the current (unmodified) filenameWithPath to rename the
# file in place.

# os.listdir delivers the filename(s) from the directory
# however in attempting to "rename" the file using os 
# a specific location of the file to be renamed is required.

# this code is from Windows 
import os

directoryName = "Photographs"
filePath = os.path.abspath(directoryName)
filePathWithSlash = filePath + "\\"

for counter, filename in enumerate(os.listdir(directoryName)):

    filenameWithPath = os.path.join(filePathWithSlash, filename)

    os.rename(filenameWithPath, filenameWithPath.replace(filename,"DSC_" + \
          str(counter).zfill(4) + ".jpg" ))

# e.g. filename = "photo1.jpg", directory = "c:\users\Photographs"        
# The string.replace call swaps the new filename into
# filenameWithPath, which is then passed to os.rename together
# with the current (unmodified) filenameWithPath to rename the
# file in place.

# os.listdir delivers the filename(s) from the directory
# however in attempting to "rename" the file using os 
# a specific location of the file to be renamed is required.

# this code is from Windows 

回答 8

我有一个类似的问题,但是我想在目录中所有文件的文件名的开头添加文本,并使用类似的方法。请参见下面的示例:

import os

folder = r"R:\mystuff\GIS_Projects\Website\2017\PDF"

for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        # filename_split[0] is the base name, filename_split[1] the extension
        filename_split = os.path.splitext(filename)
        print(fullpath)
        print(filename_split[0])
        print(filename_split[1])
        os.rename(fullpath, os.path.join(root, "NewText_2017_" + filename_split[0] + filename_split[1]))

I had a similar problem, but I wanted to append text to the beginning of the file name of all files in a directory and used a similar method. See example below:

import os

folder = r"R:\mystuff\GIS_Projects\Website\2017\PDF"

for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        # filename_split[0] is the base name, filename_split[1] the extension
        filename_split = os.path.splitext(filename)
        print(fullpath)
        print(filename_split[0])
        print(filename_split[1])
        os.rename(fullpath, os.path.join(root, "NewText_2017_" + filename_split[0] + filename_split[1]))

回答 9

就我的情况而言，我的目录下有多个子目录，每个子目录里有很多图片，我想把所有子目录中的图片改名为 1.jpg ～ n.jpg

import glob
import os

def batch_rename():
    base_dir = 'F:/ad_samples/test_samples/'
    sub_dir_list = glob.glob(base_dir + '*')
    # print sub_dir_list # like that ['F:/dir1', 'F:/dir2']
    for dir_item in sub_dir_list:
        files = glob.glob(dir_item + '/*.jpg')
        i = 0
        for f in files:
            os.rename(f, os.path.join(dir_item, str(i) + '.jpg'))
            i += 1

(我自己的答案)https://stackoverflow.com/a/45734381/6329006

As for me, in my directory I have multiple subdirs, each with lots of images, and I want to rename all the subdir images to 1.jpg ~ n.jpg

import glob
import os

def batch_rename():
    base_dir = 'F:/ad_samples/test_samples/'
    sub_dir_list = glob.glob(base_dir + '*')
    # print sub_dir_list # like that ['F:/dir1', 'F:/dir2']
    for dir_item in sub_dir_list:
        files = glob.glob(dir_item + '/*.jpg')
        i = 0
        for f in files:
            os.rename(f, os.path.join(dir_item, str(i) + '.jpg'))
            i += 1

(my own answer) https://stackoverflow.com/a/45734381/6329006


回答 10

#  another regex version
#  usage example:
#  replacing an underscore in the filename with today's date
#  rename_files('..\\output', r'(.*)(_)(.*\.CSV)', r'\g<1>_20180402_\g<3>')
import os
import re

def rename_files(path, pattern, replacement):
    for filename in os.listdir(path):
        if re.search(pattern, filename):
            new_filename = re.sub(pattern, replacement, filename)
            new_fullname = os.path.join(path, new_filename)
            old_fullname = os.path.join(path, filename)
            os.rename(old_fullname, new_fullname)
            print('Renamed: ' + old_fullname + ' to ' + new_fullname)
#  another regex version
#  usage example:
#  replacing an underscore in the filename with today's date
#  rename_files('..\\output', r'(.*)(_)(.*\.CSV)', r'\g<1>_20180402_\g<3>')
import os
import re

def rename_files(path, pattern, replacement):
    for filename in os.listdir(path):
        if re.search(pattern, filename):
            new_filename = re.sub(pattern, replacement, filename)
            new_fullname = os.path.join(path, new_filename)
            old_fullname = os.path.join(path, filename)
            os.rename(old_fullname, new_fullname)
            print('Renamed: ' + old_fullname + ' to ' + new_fullname)

回答 11

如果想在编辑器（例如 vim）中修改文件名，click 库提供了 click.edit() 函数，可用于接收来自编辑器的用户输入。下面是用它批量重构目录中文件名的示例。

import click
from pathlib import Path

# current directory
direc_to_refactor = Path(".")

# list of old file paths
old_paths = list(direc_to_refactor.iterdir())

# list of old file names
old_names = [str(p.name) for p in old_paths]

# modify old file names in an editor,
# and store them in a list of new file names
new_names = click.edit("\n".join(old_names)).split("\n")

# refactor the old file names
for i in range(len(old_paths)):
    old_paths[i].replace(direc_to_refactor / new_names[i])

我编写了一个使用相同技术的命令行应用程序，它降低了此脚本的不稳定性，并提供了更多选项，例如递归重构。这是 github 页面的链接。如果您喜欢命令行应用程序，并且想对文件名做一些快速编辑，它会很有用。（我的应用程序类似于 ranger 中的 bulkrename 命令。）

If you would like to modify file names in an editor (such as vim), the click library comes with the command click.edit(), which can be used to receive user input from an editor. Here is an example of how it can be used to refactor files in a directory.

import click
from pathlib import Path

# current directory
direc_to_refactor = Path(".")

# list of old file paths
old_paths = list(direc_to_refactor.iterdir())

# list of old file names
old_names = [str(p.name) for p in old_paths]

# modify old file names in an editor,
# and store them in a list of new file names
new_names = click.edit("\n".join(old_names)).split("\n")

# refactor the old file names
for i in range(len(old_paths)):
    old_paths[i].replace(direc_to_refactor / new_names[i])

I wrote a command line application that uses the same technique, but that reduces the volatility of this script, and comes with more options, such as recursive refactoring. Here is the link to the github page. This is useful if you like command line applications, and are interested in making some quick edits to file names. (My application is similar to the “bulkrename” command found in ranger).


回答 12

这段代码可以工作

该函数接受两个参数：f_path 是待重命名文件所在的路径，new_name 是文件的新名称。

import glob2
import os


def rename(f_path, new_name):
    filelist = glob2.glob(f_path + "*.ma")  # f_path is expected to end with a path separator
    count = 0
    for file in filelist:
        print("File Count : ", count)
        filename = os.path.split(file)
        print(filename)
        new_filename = f_path + new_name + str(count + 1) + ".ma"
        os.rename(f_path+filename[1], new_filename)
        print(new_filename)
        count = count + 1

This code will work

The function takes exactly two arguments: f_path, the path containing the files to rename, and new_name, the new base name for the files.

import glob2
import os


def rename(f_path, new_name):
    filelist = glob2.glob(f_path + "*.ma")  # f_path is expected to end with a path separator
    count = 0
    for file in filelist:
        print("File Count : ", count)
        filename = os.path.split(file)
        print(filename)
        new_filename = f_path + new_name + str(count + 1) + ".ma"
        os.rename(f_path+filename[1], new_filename)
        print(new_filename)
        count = count + 1

将列表的Python列表写入csv文件

问题:将列表的Python列表写入csv文件

我有一长串以下形式的清单-

a = [[1.2,'abc',3],[1.2,'werew',4],........,[1.4,'qew',2]]

即列表中的值类型各异：浮点数、整数、字符串。如何把它写入 csv 文件，使输出的 csv 文件看起来像

1.2,abc,3
1.2,werew,4
.
.
.
1.4,qew,2

I have a long list of lists of the following form —

a = [[1.2,'abc',3],[1.2,'werew',4],........,[1.4,'qew',2]]

i.e. the values in the list are of different types — float, int, string. How do I write it to a csv file so that my output csv file looks like

1.2,abc,3
1.2,werew,4
.
.
.
1.4,qew,2

回答 0

Python的内置CSV模块可以轻松处理此问题:

import csv

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a)

这里假设您的列表如问题中那样定义为 a。您可以通过 csv.writer() 的各种可选参数来调整输出 CSV 的确切格式，如上面链接的库参考页中所述。

Python 3更新

import csv

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(a)

Python’s built-in CSV module can handle this easily:

import csv

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a)

This assumes your list is defined as a, as it is in your question. You can tweak the exact format of the output CSV via the various optional parameters to csv.writer() as documented in the library reference page linked above.

Update for Python 3

import csv

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(a)
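
补充（非原回答内容）：下面是一个小示例，演示其中两个可选参数 delimiter 和 quoting，假设仍使用问题中的列表 a：

As an aside (not part of the original answer), a minimal sketch showing two of those optional parameters, delimiter and quoting, assuming the list a from the question:

import csv

a = [[1.2, 'abc', 3], [1.2, 'werew', 4], [1.4, 'qew', 2]]

# tab-separated output with every field quoted
with open("out.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", quoting=csv.QUOTE_ALL)
    writer.writerows(a)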

回答 1

您可以使用pandas

In [1]: import pandas as pd

In [2]: a = [[1.2,'abc',3],[1.2,'werew',4],[1.4,'qew',2]]

In [3]: my_df = pd.DataFrame(a)

In [4]: my_df.to_csv('my_csv.csv', index=False, header=False)

You could use pandas:

In [1]: import pandas as pd

In [2]: a = [[1.2,'abc',3],[1.2,'werew',4],[1.4,'qew',2]]

In [3]: my_df = pd.DataFrame(a)

In [4]: my_df.to_csv('my_csv.csv', index=False, header=False)

回答 2

import csv
with open(file_path, 'a') as outcsv:   
    #configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(['number', 'text', 'number'])
    for item in a:   # 'a' is the list of rows from the question
        #Write item to outcsv
        writer.writerow([item[0], item[1], item[2]])

官方文档：http://docs.python.org/2/library/csv.html

import csv
with open(file_path, 'a') as outcsv:   
    #configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(['number', 'text', 'number'])
    for item in a:   # 'a' is the list of rows from the question
        #Write item to outcsv
        writer.writerow([item[0], item[1], item[2]])

official docs: http://docs.python.org/2/library/csv.html


回答 3

如果出于某种原因你想手工实现（不使用 csv、pandas、numpy 之类的模块）：

with open('myfile.csv','w') as f:
    for sublist in mylist:
        for item in sublist:
            f.write(str(item) + ',')  # str() handles the mixed float/int/string values
        f.write('\n')

当然，自己造轮子可能容易出错且效率低下……这通常正是要用现成模块的原因。但有时自己写一遍能帮你理解它们的工作原理，有时也确实更简单。

If for whatever reason you wanted to do it manually (without using a module like csv,pandas,numpy etc.):

with open('myfile.csv','w') as f:
    for sublist in mylist:
        for item in sublist:
            f.write(str(item) + ',')  # str() handles the mixed float/int/string values
        f.write('\n')

Of course, rolling your own version can be error-prone and inefficient … that’s usually why there’s a module for that. But sometimes writing your own can help you understand how they work, and sometimes it’s just easier.
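
补充说明（非原回答内容）：下面的小示例演示了为什么手写 CSV 容易出错：字段里一旦包含逗号，手动 join 就会多出一列，而 csv 模块会正确加引号：

A side note (not part of the original answer): a minimal sketch of why hand-rolled CSV is fragile. A field containing a comma breaks the manual join, while the csv module quotes it properly:

import csv
import io

row = [1.2, 'ab,c', 3]

# manual join: the embedded comma silently produces four fields
print(','.join(str(col) for col in row))      # 1.2,ab,c,3

# csv module: the field is quoted, so the row keeps three fields
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue().strip())                 # 1.2,"ab,c",3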


回答 4

在我非常大的列表上使用 csv.writer 花了很长时间。我决定改用 pandas，它更快，也更容易控制和理解：

 import pandas

 yourlist = [[...],...,[...]]
 df = pandas.DataFrame(yourlist)
 df.to_csv("mylist.csv")

好处是，您可以修改一些设置来生成更好的 csv 文件：

 yourlist = [[...],...,[...]]
 columns = ["abcd","bcde","cdef"] #a csv with 3 columns
 index = [i[0] for i in yourlist] #first element of every list in yourlist
 not_index_list = [i[1:] for i in yourlist]
 df = pandas.DataFrame(not_index_list, columns = columns, index = index)

 #Now you have a csv with columns and index:
 df.to_csv("mylist.csv")

Using csv.writer on my very large list took quite a while. I decided to use pandas; it was faster and easier to control and understand:

 import pandas

 yourlist = [[...],...,[...]]
 df = pandas.DataFrame(yourlist)
 df.to_csv("mylist.csv")

The good part is you can change a few things to make a better csv file:

 yourlist = [[...],...,[...]]
 columns = ["abcd","bcde","cdef"] #a csv with 3 columns
 index = [i[0] for i in yourlist] #first element of every list in yourlist
 not_index_list = [i[1:] for i in yourlist]
 df = pandas.DataFrame(not_index_list, columns = columns, index = index)

 #Now you have a csv with columns and index:
 df.to_csv("mylist.csv")

回答 5

Amber 的解决方案也适用于 numpy 数组：

from pylab import *
import csv

array_=arange(0,10,1)
list_=[array_,array_*2,array_*3]
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(list_)

Amber’s solution also works well for numpy arrays:

from pylab import *
import csv

array_=arange(0,10,1)
list_=[array_,array_*2,array_*3]
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(list_)

回答 6

如果您不想为此导入 csv 模块，可以只用 Python 内置功能把列表的列表写入 csv 文件

with open("output.csv", "w") as f:
    for row in a:
        f.write("%s\n" % ','.join(str(col) for col in row))

If you don’t want to import csv module for that, you can write a list of lists to a csv file using only Python built-ins

with open("output.csv", "w") as f:
    for row in a:
        f.write("%s\n" % ','.join(str(col) for col in row))

回答 7

创建 writer 时务必指明 lineterminator='\n'；否则，当数据来自其他 csv 文件时，每条数据行之后可能会在文件中写入多余的空行……

这是我的解决方案:

import csv

with open('csvfile', 'a') as csvfile:
    # note: the delimiter must be a single character, so a tab replaces
    # the original run of spaces here
    spamwriter = csv.writer(csvfile, delimiter='\t', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for row in data:   # write inside the with-block, while the file is still open
        spamwriter.writerow(row)

Make sure to specify lineterminator='\n' when creating the writer; otherwise, an extra empty line might be written into the file after each data line when the data comes from another csv file…

Here is my solution:

import csv

with open('csvfile', 'a') as csvfile:
    # note: the delimiter must be a single character, so a tab replaces
    # the original run of spaces here
    spamwriter = csv.writer(csvfile, delimiter='\t', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for row in data:   # write inside the with-block, while the file is still open
        spamwriter.writerow(row)

回答 8

把列表的列表用 pickle 模块转储再恢复怎么样？非常方便。

>>> import pickle
>>> 
>>> mylist = [1, 'foo', 'bar', {1, 2, 3}, [ [1,4,2,6], [3,6,0,10]]]
>>> with open('mylist', 'wb') as f:
...     pickle.dump(mylist, f) 


>>> with open('mylist', 'rb') as f:
...      mylist = pickle.load(f)
>>> mylist
[1, 'foo', 'bar', {1, 2, 3}, [[1, 4, 2, 6], [3, 6, 0, 10]]]
>>> 

How about dumping the list of list into pickle and restoring it with pickle module? It’s quite convenient.

>>> import pickle
>>> 
>>> mylist = [1, 'foo', 'bar', {1, 2, 3}, [ [1,4,2,6], [3,6,0,10]]]
>>> with open('mylist', 'wb') as f:
...     pickle.dump(mylist, f) 


>>> with open('mylist', 'rb') as f:
...      mylist = pickle.load(f)
>>> mylist
[1, 'foo', 'bar', {1, 2, 3}, [[1, 4, 2, 6], [3, 6, 0, 10]]]
>>> 

回答 9

按照示例把 newline 参数传给 csv.writer 函数时，我遇到了错误消息（newline 是 open() 的参数，而不是 csv.writer() 的）。以下代码对我有用。

 with open(strFileName, "w") as f:
    writer = csv.writer(f, delimiter=',',  quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)

I got an error message when following the examples that pass a newline parameter to the csv.writer function (newline is an argument to open(), not to csv.writer()). The following code worked for me.

 with open(strFileName, "w") as f:
    writer = csv.writer(f, delimiter=',',  quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)

Python连接文本文件

问题:Python连接文本文件

我有一个包含 20 个文件名的列表，例如 ['file1.txt', 'file2.txt', ...]。我想编写一个 Python 脚本把这些文件连接成一个新文件。我可以用 f = open(...) 打开每个文件，调用 f.readline() 逐行读取，再把每一行写入新文件。在我看来这并不“优雅”，尤其是必须逐行读/写的那部分。

在Python中是否有更“优雅”的方式来做到这一点?

I have a list of 20 file names, like ['file1.txt', 'file2.txt', ...]. I want to write a Python script to concatenate these files into a new file. I could open each file by f = open(...), read line by line by calling f.readline(), and write each line into that new file. It doesn’t seem very “elegant” to me, especially the part where I have to read/write line by line.

Is there a more “elegant” way to do this in Python?


回答 0

这样应该就行了

对于大文件:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

对于小文件:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

……还有一个我想到的有趣写法：

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    # itertools.imap is Python 2; in Python 3 use the built-in map instead
    for line in itertools.chain.from_iterable(itertools.imap(open, filenames)):
        outfile.write(line)

遗憾的是，最后这种方法会留下一些打开的文件描述符，不过 GC 最终应该会处理它们。我只是觉得它有趣。

This should do it

For large files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

For small files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

… and another interesting one that I thought of:

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    # itertools.imap is Python 2; in Python 3 use the built-in map instead
    for line in itertools.chain.from_iterable(itertools.imap(open, filenames)):
        outfile.write(line)

Sadly, this last method leaves a few open file descriptors, which the GC should take care of anyway. I just thought it was interesting


回答 1

使用shutil.copyfileobj

它会自动为您逐块读取输入文件，这样效率更高，即使某些输入文件太大而无法装入内存也能正常工作：

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

Use shutil.copyfileobj.

It automatically reads the input files chunk by chunk for you, which is more efficient, and it will work even if some of the input files are too large to fit into memory:

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

回答 2

这正是fileinput的目的:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

对于这种用例，它其实并不比手动遍历文件简单多少，但在其他情况下，拥有一个能把所有文件当作单个文件来遍历的迭代器非常方便。（此外，fileinput 会在读完每个文件后立即将其关闭，这意味着不需要对每个文件使用 with 或 close，不过这只是省一行代码，算不上什么大事。）

fileinput 中还有其他一些漂亮的功能，例如只需过滤每一行就能对文件进行就地修改（见下方示例）。


正如评论中所述、并在另一篇文章中讨论的那样，fileinput 在 Python 2.7 中无法按上面的方式工作。下面稍作修改，使代码兼容 Python 2.7：

import fileinput

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()

That’s exactly what fileinput is for:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

For this use case, it’s really not much simpler than just iterating over the files manually, but in other cases, having a single iterator that iterates over all of the files as if they were a single file is very handy. (Also, the fact that fileinput closes each file as soon as it’s done means there’s no need to with or close each one, but that’s just a one-line savings, not that big of a deal.)

There are some other nifty features in fileinput, like the ability to do in-place modifications of files just by filtering each line (a short sketch follows after this answer).


As noted in the comments, and discussed in another post, fileinput for Python 2.7 will not work as indicated. Here is a slight modification to make the code Python 2.7 compliant:

import fileinput

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()
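
补充（非原回答内容）：上面提到的“就地修改”大致如下。这是一个小示例，假设存在文件 example.txt，需要 Python 3：

A sketch of the in-place modification mentioned above (not part of the original answer), assuming a file named example.txt and Python 3:

import fileinput

# inplace=True redirects print() into the file, replacing its contents;
# here every line is rewritten with trailing whitespace stripped
with fileinput.input('example.txt', inplace=True) as f:
    for line in f:
        print(line.rstrip())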

回答 3

我对优雅并不了解,但这可行:

    import glob
    import os
    for f in glob.glob("file*.txt"):
         os.system("cat "+f+" >> OutFile.txt")

I don’t know about elegance, but this works:

    import glob
    import os
    for f in glob.glob("file*.txt"):
         os.system("cat "+f+" >> OutFile.txt")

回答 4

UNIX 命令有什么问题呢？（假设您不是在 Windows 上工作）：

ls | xargs cat | tee output.txt 就能完成这项工作（如果需要，可以用 subprocess 从 Python 调用它，见下面的示例）

What’s wrong with UNIX commands ? (given you’re not working on Windows) :

ls | xargs cat | tee output.txt does the job ( you can call it from python with subprocess if you want)
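
补充（非原回答内容）：用 subprocess 调用同一条管道的小示例（仅限 Unix，shell=True 用于支持管道）：

A minimal sketch (not part of the original answer) of calling that same pipeline from Python with subprocess (Unix only; shell=True is needed for the pipes):

import subprocess

# run the shell pipeline and fail loudly on a nonzero exit status
subprocess.run("ls | xargs cat | tee output.txt", shell=True, check=True)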


回答 5

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

一个简单的基准表明,shutil性能更好。

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

A simple benchmark shows that the shutil performs better.


回答 6

@inspectorG4dget 答案（截至 2016-03-29 的最佳答案）的一个替代方案。我用 3 个 436MB 的文件做了测试。

@inspectorG4dget 的方案：162 秒

以下方案：125 秒

from subprocess import Popen

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
# build a Windows batch file that concatenates the files with "type"
cmd = "type "
for f in filenames:
    cmd += f + " "
with open('batch.bat', 'w') as fbatch:
    fbatch.write(cmd + " > file4results.txt")
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

这个想法是创建并执行一个批处理文件，借助“古老而好用的技术”。它是半 Python 的方案，但运行更快。适用于 Windows。

An alternative to @inspectorG4dget’s answer (the best answer to date, 29-03-2016). I tested with 3 files of 436MB.

@inspectorG4dget’s solution: 162 seconds

The following solution: 125 seconds

from subprocess import Popen

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
# build a Windows batch file that concatenates the files with "type"
cmd = "type "
for f in filenames:
    cmd += f + " "
with open('batch.bat', 'w') as fbatch:
    fbatch.write(cmd + " > file4results.txt")
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

The idea is to create a batch file and execute it, taking advantage of “good old technology”. It’s semi-Python but runs faster. Works on Windows.


回答 7

如果目录中有很多文件，那么用 glob2 生成文件名列表可能比手工编写更好。

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')

If you have a lot of files in the directory then glob2 might be a better option to generate a list of filenames rather than writing them by hand.

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')

回答 8

检出File对象的.read()方法:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

您可以执行以下操作:

concat = ""
for file in files:
    concat += open(file).read()

或更“优雅”的python-way:

concat = ''.join([open(f).read() for f in files])

根据这篇文章 http://www.skymind.com/~ocrow/python_string/ ，这种写法也是最快的。

Check out the .read() method of the File object:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

You could do something like:

concat = ""
for file in files:
    concat += open(file).read()

or a more ‘elegant’ python-way:

concat = ''.join([open(f).read() for f in files])

which, according to this article: http://www.skymind.com/~ocrow/python_string/ would also be the fastest.


回答 9

如果文件不是特别大：

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write('\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

如果文件太大而无法一次性读入内存，算法就必须有所不同，例如用 read(10000) 之类的固定长度块，在循环中分块读取要复制的每个文件。

If the files are not gigantic:

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write('\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

If the files are too big to be entirely read and held in RAM, the algorithm must be a little different to read each file to be copied in a loop by chunks of fixed length, using read(10000) for example.
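
补充（非原回答内容）：按固定大小的块循环读取的一个小示例，假设块大小为 10000 字节，文件名列表仅作演示：

A minimal sketch (not part of the original answer) of that chunked loop, assuming a 10000-byte chunk size; the file names here are placeholders:

list_of_files = ['part1.bin', 'part2.bin']  # placeholder names

with open('newfile.txt', 'wb') as newf:
    for filename in list_of_files:
        with open(filename, 'rb') as hf:
            while True:
                chunk = hf.read(10000)  # read at most 10000 bytes
                if not chunk:
                    break
                newf.write(chunk)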


回答 10

import os

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + "  " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(concat)  # the original wrote path + concat, prepending the directory name by mistake

if __name__ == "__main__":
    concatFiles()
import os

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + "  " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(concat)  # the original wrote path + concat, prepending the directory name by mistake

if __name__ == "__main__":
    concatFiles()

回答 11

  import os

  files = os.listdir()
  print(files)
  print('#', tuple(files))
  name = input('Enter the inclusive file name: ')
  exten = input('Enter the type(extension): ')
  filename = name + '.' + exten
  output_file = open(filename, 'w+')
  for i in files:
    print(i)
    f_i = open(i, 'r')
    # the original printed f_j.read() first (exhausting the file)
    # and then wrote to an undefined name "outfile"
    for x in f_i:
      output_file.write(x)
    f_i.close()
  output_file.close()
  import os

  files = os.listdir()
  print(files)
  print('#', tuple(files))
  name = input('Enter the inclusive file name: ')
  exten = input('Enter the type(extension): ')
  filename = name + '.' + exten
  output_file = open(filename, 'w+')
  for i in files:
    print(i)
    f_i = open(i, 'r')
    # the original printed f_j.read() first (exhausting the file)
    # and then wrote to an undefined name "outfile"
    for x in f_i:
      output_file.write(x)
    f_i.close()
  output_file.close()