Tag archive: save

How do I save an image using PIL?

Question: How do I save an image using PIL?

I have just done some image processing using the Python Imaging Library (PIL), following a post I found earlier on performing Fourier transforms of images, and I can’t get the save function to work. The whole code runs fine, but it just won’t save the resulting image:

from PIL import Image
import numpy as np

i = Image.open("C:/Users/User/Desktop/mesh.bmp")
i = i.convert("L")
a = np.asarray(i)
b = np.abs(np.fft.rfft2(a))
j = Image.fromarray(b)
j.save("C:/Users/User/Desktop/mesh_trans",".bmp")

The error I get is the following:

save_handler = SAVE[string.upper(format)] # unknown format
    KeyError: '.BMP'

How can I save an image with Python’s PIL?


Answer 0

The file-extension error is easy to fix: either pass the format as BMP (without the dot) or pass an output name that already includes the extension. After that you will hit a second error, because PIL does not accept floating-point data when saving as BMP; you need to convert the frequency-domain data to an integer image first.

Here is a suggestion (with other minor modifications, like using fftshift and numpy.array instead of numpy.asarray) for doing the conversion for proper visualization:

import sys
import numpy
from PIL import Image

img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img)
fft_mag = numpy.abs(numpy.fft.fftshift(numpy.fft.fft2(im)))

visual = numpy.log(fft_mag)
visual = (visual - visual.min()) / (visual.max() - visual.min())

result = Image.fromarray((visual * 255).astype(numpy.uint8))
result.save('out.bmp')

Answer 1

You should be able to simply let PIL get the filetype from extension, i.e. use:

j.save("C:/Users/User/Desktop/mesh_trans.bmp")

Answer 2

Try removing the . before the .bmp (it isn’t matching BMP as expected). As you can see from the error, the save_handler is upper-casing the format you provided and then looking for a match in SAVE. However the corresponding key in that object is BMP (instead of .BMP).

I don’t know a great deal about PIL, but from some quick searching around it seems that it is a problem with the mode of the image. Changing the definition of j to:

j = Image.fromarray(b, mode='RGB')

Seemed to work for me (however note that I have very little knowledge of PIL, so I would suggest using @mmgp’s solution as s/he clearly knows what they are doing :) ). For the types of mode, I used this page – hopefully one of the choices there will work for you.


Answer 3

I know that this is old, but I’ve found that (while using Pillow) opening the file with open(fp, 'wb') and then saving to that file object works. The file has to be opened in binary mode, because image data is written as bytes. E.g.:

with open(fp, 'wb') as f:
    result.save(f)

fp being the file path, of course.


Saving images in Python at a very high quality

Question: Saving images in Python at a very high quality

How can I save Python plots at very high quality?

That is, when I keep zooming in on the object saved in a PDF file, why isn’t there any blurring?

Also, what would be the best mode to save it in?

PNG, EPS, or something else? I can’t use PDF, because a hidden number ends up in the output and messes with the Latexmk compilation.


Answer 0

If you are using Matplotlib and are trying to get good figures in a LaTeX document, save as an EPS. Specifically, try something like this after running the commands to plot the image:

plt.savefig('destination_path.eps', format='eps')

I have found that EPS files work best and the dpi parameter is what really makes them look good in a document.

To specify the orientation of the figure before saving, simply call the following before the plt.savefig call, but after creating the plot (assuming you have plotted using an axes with the name ax):

ax.view_init(elev=elevation_angle, azim=azimuthal_angle)

Where elevation_angle is a number (in degrees) specifying the polar angle (down from vertical z axis) and the azimuthal_angle specifies the azimuthal angle (around the z axis).

I find that it is easiest to determine these values by first plotting the image and then rotating it and watching the current values of the angles appear towards the bottom of the window just below the actual plot. Keep in mind that the x, y, z, positions appear by default, but they are replaced with the two angles when you start to click+drag+rotate the image.


Answer 1

Just to add my results, also using Matplotlib.

.eps made all my text bold and removed transparency. .svg gave me high-resolution pictures that actually looked like my graph.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Do the plot code
fig.savefig('myimage.svg', format='svg', dpi=1200)

I used 1200 dpi because a lot of scientific journals require images in 1200 / 600 / 300 dpi, depending on what the image is of. Convert to desired dpi and format in GIMP or Inkscape.

Obviously the dpi doesn’t matter since .svg are vector graphics and have “infinite resolution”.


Answer 2

Okay, I found spencerlyon2’s answer working. However, in case anybody would find himself/herself not knowing what to do with that one line, I had to do it this way:

beingsaved = plt.figure()

# Some scatter plots
plt.scatter(X_1_x, X_1_y)
plt.scatter(X_2_x, X_2_y)

beingsaved.savefig('destination_path.eps', format='eps', dpi=1000)

Answer 3

In case you are working with seaborn plots, instead of Matplotlib, you can save a .png image like this:

Let’s suppose you have a matrix object (either Pandas or NumPy), and you want to take a heatmap:

import seaborn as sb

image = sb.heatmap(matrix)   # This gets you the heatmap
image.figure.savefig("C:/Your/Path/ ... /your_image.png")   # This saves it

This code is compatible with the latest version of Seaborn. Other code around Stack Overflow worked only for previous versions.

Another way I like is this. I set the size of the next image as follows:

plt.subplots(figsize=(15,15))

And then later I plot the output in the console, from which I can copy-paste it where I want. (Since Seaborn is built on top of Matplotlib, there will not be any problem.)


Answer 4

You can save to a figure that is 1920×1080 (or 1080p) using:

fig = plt.figure(figsize=(19.20,10.80))

You can also go much higher or lower. The above solutions work well for printing, but these days you want the created image to go into a PNG/JPG or appear in a wide screen format.
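
The pixel size of a saved figure is the figure size in inches multiplied by the dpi, so figsize=(19.20, 10.80) only yields 1920×1080 at 100 dpi (the default in recent Matplotlib versions, as far as I know). A minimal sketch of that arithmetic follows; figsize_for_pixels is a hypothetical helper name, not part of Matplotlib:

```python
def figsize_for_pixels(width_px, height_px, dpi=100):
    """Return (width, height) in inches so that inches * dpi equals the pixel size."""
    return (width_px / dpi, height_px / dpi)


print(figsize_for_pixels(1920, 1080))           # (19.2, 10.8)
print(figsize_for_pixels(3840, 2160, dpi=200))  # (19.2, 10.8)
```

The resulting tuple can be passed as figsize to plt.figure together with the same dpi, and that dpi should also be passed to savefig so the saved file matches.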


Basic HTTP file downloading and saving to disk in Python?

Question: Basic HTTP file downloading and saving to disk in Python?

I’m new to Python and I’ve been going through the Q&A on this site, for an answer to my question. However, I’m a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

Could someone please explain a simple solution to ‘Downloading a file through http’ and ‘Saving it to disk, in Windows’, to me?

I’m not sure how to use shutil and os modules, either.

The file I want to download is under 500 MB and is a .gz archive file. If someone can explain how to extract the archive and utilise the files in it also, that would be great!

Here’s a partial solution, that I wrote from various answers combined:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out errors (beginner level) and explain any easier methods to do this?

Thanks!


Answer 0

A clean way to download a file (in Python 2) is:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.

This example uses the urllib library, and it will directly retrieve the file from a source.


Answer 1

As mentioned here:

import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.


Answer 2

I use wget.

It is a simple and good library. Here is an example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.


Answer 3

Four methods using wget, urllib and requests.

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest – 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 – 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib – 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget – 3489 function calls in 0.020 seconds


Answer 4

In Python 3, URLopener is deprecated and is no longer available as urllib.URLopener. If you try to use it, you will get an error like the one below:

url_opener = urllib.URLopener() AttributeError: module ‘urllib’ has no attribute ‘URLopener’

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)
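
Since the question also asks about extracting the .gz archive, here is a minimal Python 3 sketch using only the standard library. The function names download and gunzip are my own, and the URL in the usage comment is the placeholder from the question; this assumes the archive is a plain gzip file wrapping a single file (not a .tar.gz):

```python
import gzip
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the response body to dest_path without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def gunzip(src_path, dest_path):
    """Decompress a single-file .gz archive into dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as out:
        shutil.copyfileobj(src, out)


# Example usage (placeholder URL, left commented out):
# download("http://randomsite.com/file.gz", "file.gz")
# gunzip("file.gz", "file")
```

Copying in chunks with shutil.copyfileobj keeps memory use flat, which matters for a file of several hundred megabytes.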

Answer 5

Exotic Windows Solution

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

Answer 6

I started down this path because ESXi’s wget is not compiled with SSL and I wanted to download an OVA from a vendor’s website directly onto the ESXi host which is on the other side of the world.

I had to disable the firewall (lazy) or enable outbound HTTPS by editing the rules (proper).

Then I created this Python script:

import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()

dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi libraries are kind of pared down, but the open source weasel installer seemed to use urllib for HTTPS, so it inspired me to go down this path.


Answer 7

Another clean way to save the file is this:

import urllib.request

# Python 3: urlretrieve lives in urllib.request (in Python 2 it was urllib.urlretrieve)
urllib.request.urlretrieve("your url goes here", "output.csv")

Storing Python dictionaries

Question: Storing Python dictionaries

I’m used to bringing data in and out of Python using CSV files, but there are obvious challenges to this. Are there simple ways to store a dictionary (or sets of dictionaries) in a JSON or pickle file?

For example:

data = {}
data ['key1'] = "keyinfo"
data ['key2'] = "keyinfo2"

I would like to know both how to save this, and then how to load it back in.


Answer 0

Pickle save:

try:
    import cPickle as pickle
except ImportError:  # Python 3.x
    import pickle

with open('data.p', 'wb') as fp:
    pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)

See the pickle module documentation for additional information regarding the protocol argument.

Pickle load:

with open('data.p', 'rb') as fp:
    data = pickle.load(fp)

JSON save:

import json

with open('data.json', 'w') as fp:
    json.dump(data, fp)

Supply extra arguments, like sort_keys or indent, to get a pretty result. The argument sort_keys will sort the keys alphabetically and indent will indent your data structure with indent=N spaces.

json.dump(data, fp, sort_keys=True, indent=4)

JSON load:

with open('data.json', 'r') as fp:
    data = json.load(fp)
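
One caveat when choosing JSON over pickle: a round trip does not always return exactly the same dictionary, because JSON object keys are always strings and JSON has no tuple type. A small sketch with made-up sample data (it uses dumps/loads so no file is needed):

```python
import json

# Sample data with an integer key and a tuple value (illustrative only)
data = {"key1": "keyinfo", 2: "two", "point": (1, 2)}

text = json.dumps(data, indent=4)
restored = json.loads(text)

print(restored)  # {'key1': 'keyinfo', '2': 'two', 'point': [1, 2]}
```

If exact Python types matter, pickle preserves them; JSON is the better fit when other programs need to read the file.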

Answer 1

Minimal example, writing directly to a file:

import json
# Text mode ('w'): json.dump writes str, so 'wb' fails on Python 3
json.dump(data, open(filename, 'w'))
data = json.load(open(filename))

or safely opening / closing:

import json
with open(filename, 'w') as outfile:
    json.dump(data, outfile)
with open(filename) as infile:
    data = json.load(infile)

If you want to save it in a string instead of a file:

import json
json_str = json.dumps(data)
data = json.loads(json_str)

Answer 2

Also see the speeded-up package ujson: https://pypi.python.org/pypi/ujson

import ujson

# Text mode ('w'): on Python 3, ujson.dump writes str, not bytes
with open('data.json', 'w') as fp:
    ujson.dump(data, fp)

Answer 3

To write to a file:

import json
myfile.write(json.dumps(mydict))

To read from a file:

import json
mydict = json.loads(myfile.read())

myfile is the file object for the file that you stored the dict in.


Answer 4

If you’re after serialization, but won’t need the data in other programs, I strongly recommend the shelve module. Think of it as a persistent dictionary.

import shelve

myData = shelve.open('/path/to/file')

# Check for values.
keyVar in myData

# Set values
myData[anotherKey] = someValue

# Save the data for future use.
myData.close()
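
A runnable variant of the snippet above, as a sketch: it assumes Python 3.4 or later (where a shelf can be used as a context manager, so close() is called automatically) and writes to a temporary directory chosen only for illustration:

```python
import os
import shelve
import tempfile

# Illustrative location; any writable path works
path = os.path.join(tempfile.mkdtemp(), "mydata")

# Save: the shelf behaves like a dict and is flushed on exit from the block
with shelve.open(path) as db:
    db["key1"] = "keyinfo"
    db["numbers"] = [1, 2, 3]

# Load: reopen the same path and copy the contents into a plain dict
with shelve.open(path) as db:
    restored = dict(db)

print(restored["key1"])     # keyinfo
print(restored["numbers"])  # [1, 2, 3]
```

Note that mutating a stored list in place (for example db["numbers"].append(4)) is not persisted unless the shelf is opened with writeback=True; reassigning the key always works.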

Answer 5

If you want an alternative to pickle or json, you can use klepto.

>>> init = {'y': 2, 'x': 1, 'z': 3}
>>> import klepto
>>> cache = klepto.archives.file_archive('memo', init, serialized=False)
>>> cache        
{'y': 2, 'x': 1, 'z': 3}
>>>
>>> # dump dictionary to the file 'memo.py'
>>> cache.dump() 
>>> 
>>> # import from 'memo.py'
>>> from memo import memo
>>> print memo
{'y': 2, 'x': 1, 'z': 3}

With klepto, if you had used serialized=True, the dictionary would have been written to memo.pkl as a pickled dictionary instead of with clear text.

You can get klepto here: https://github.com/uqfoundation/klepto

dill is probably a better choice for pickling than pickle itself, as dill can serialize almost anything in Python. klepto can also use dill.

You can get dill here: https://github.com/uqfoundation/dill

The additional mumbo-jumbo on the first few lines are because klepto can be configured to store dictionaries to a file, to a directory context, or to a SQL database. The API is the same for whatever you choose as the backend archive. It gives you an “archivable” dictionary with which you can use load and dump to interact with the archive.


Answer 6

For completeness, we should include ConfigParser and configparser which are part of the standard library in Python 2 and 3, respectively. This module reads and writes to a config/ini file and (at least in Python 3) behaves in a lot of ways like a dictionary. It has the added benefit that you can store multiple dictionaries into separate sections of your config/ini file and recall them. Sweet!

Python 2.7.x example.

import ConfigParser

config = ConfigParser.ConfigParser()

dict1 = {'key1':'keyinfo', 'key2':'keyinfo2'}
dict2 = {'k1':'hot', 'k2':'cross', 'k3':'buns'}
dict3 = {'x':1, 'y':2, 'z':3}

# Make each dictionary a separate section in the configuration
config.add_section('dict1')
for key in dict1.keys():
    config.set('dict1', key, dict1[key])
   
config.add_section('dict2')
for key in dict2.keys():
    config.set('dict2', key, dict2[key])

config.add_section('dict3')
for key in dict3.keys():
    config.set('dict3', key, dict3[key])

# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()

# Read the configuration from a file
config2 = ConfigParser.ConfigParser()
config2.read('config.ini')

dictA = {}
for item in config2.items('dict1'):
    dictA[item[0]] = item[1]

dictB = {}
for item in config2.items('dict2'):
    dictB[item[0]] = item[1]

dictC = {}
for item in config2.items('dict3'):
    dictC[item[0]] = item[1]

print(dictA)
print(dictB)
print(dictC)

Python 3.X example.

import configparser

config = configparser.ConfigParser()

dict1 = {'key1':'keyinfo', 'key2':'keyinfo2'}
dict2 = {'k1':'hot', 'k2':'cross', 'k3':'buns'}
dict3 = {'x':1, 'y':2, 'z':3}

# Make each dictionary a separate section in the configuration
config['dict1'] = dict1
config['dict2'] = dict2
config['dict3'] = dict3

# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()

# Read the configuration from a file
config2 = configparser.ConfigParser()
config2.read('config.ini')

# ConfigParser objects are a lot like dictionaries, but if you really
# want a dictionary you can ask it to convert a section to a dictionary
dictA = dict(config2['dict1'] )
dictB = dict(config2['dict2'] )
dictC = dict(config2['dict3'])

print(dictA)
print(dictB)
print(dictC)

Console output

{'key2': 'keyinfo2', 'key1': 'keyinfo'}
{'k1': 'hot', 'k2': 'cross', 'k3': 'buns'}
{'z': '3', 'y': '2', 'x': '1'}

Contents of config.ini

[dict1]
key2 = keyinfo2
key1 = keyinfo

[dict2]
k1 = hot
k2 = cross
k3 = buns

[dict3]
z = 3
y = 2
x = 1
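
As the console output shows, every value comes back as a string, including the numbers from dict3. A minimal Python 3 sketch of getting integers back with the section's getint method; it uses an in-memory buffer instead of config.ini purely to keep the example self-contained:

```python
import configparser
import io

config = configparser.ConfigParser()
config["dict3"] = {"x": 1, "y": 2, "z": 3}  # values are stored as strings

# Write to an in-memory buffer (stand-in for config.ini)
buffer = io.StringIO()
config.write(buffer)

config2 = configparser.ConfigParser()
config2.read_string(buffer.getvalue())

section = config2["dict3"]
as_strings = dict(section)
as_ints = {key: section.getint(key) for key in section}

print(as_strings)  # {'x': '1', 'y': '2', 'z': '3'}
print(as_ints)     # {'x': 1, 'y': 2, 'z': 3}
```

getfloat and getboolean work the same way; for nested or mixed-type data, JSON or pickle is usually a better fit than an ini file.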

Answer 7

If you are saving to a JSON file, the best and easiest way of doing this is:

import json
with open("file.json", "wb") as f:
    f.write(json.dumps(data).encode("utf-8"))  # data is the dictionary to save

Answer 8

My use case was to save multiple JSON objects to a file and marty’s answer helped me somewhat. But to serve my use case, the answer was not complete as it would overwrite the old data every time a new entry was saved.

To save multiple entries in a file, one must check for the old content (i.e., read before write). A typical file holding JSON data will either have a list or an object as root. So I considered that my JSON file always has a list of objects and every time I add data to it, I simply load the list first, append my new data in it, and dump it back to a writable-only instance of file (w):

import json

def saveJson(url,sc): # This function writes the two values to the file
    newdata = {'url':url,'sc':sc}
    json_path = "db/file.json"

    old_list= []
    with open(json_path) as myfile:  # Read the contents first
        old_list = json.load(myfile)
    old_list.append(newdata)

    with open(json_path,"w") as myfile:  # Overwrite the whole content
        json.dump(old_list, myfile, sort_keys=True, indent=4)

    return "success"

The new JSON file will look something like this:

[
    {
        "sc": "a11",
        "url": "www.google.com"
    },
    {
        "sc": "a12",
        "url": "www.google.com"
    },
    {
        "sc": "a13",
        "url": "www.google.com"
    }
]

NOTE: It is essential to have a file named file.json with [] as initial data for this approach to work

PS: not related to original question, but this approach could also be further improved by first checking if our entry already exists (based on one or multiple keys) and only then append and save the data.
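
The two points above (the file must already contain [] and duplicates are not checked) can both be handled in a few lines. This is only a sketch: append_entry is a hypothetical name, duplicates are detected by comparing whole entries rather than selected keys, and the temporary path is for illustration:

```python
import json
import os
import tempfile


def append_entry(json_path, entry):
    """Append entry to the JSON list in json_path, skipping exact duplicates.

    Returns True if the entry was added, False if it was already present.
    """
    try:
        with open(json_path) as f:
            entries = json.load(f)
    except FileNotFoundError:
        entries = []  # first call: no file yet, start with an empty list

    if entry in entries:
        return False

    entries.append(entry)
    with open(json_path, "w") as f:
        json.dump(entries, f, sort_keys=True, indent=4)
    return True


path = os.path.join(tempfile.mkdtemp(), "file.json")
print(append_entry(path, {"url": "www.google.com", "sc": "a11"}))  # True
print(append_entry(path, {"url": "www.google.com", "sc": "a11"}))  # False
print(append_entry(path, {"url": "www.google.com", "sc": "a12"}))  # True
```

Rewriting the whole file on every call is fine for small files; for large or frequently updated data, a database or a line-per-record format would scale better.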


Saving an object (data persistence)

Question: Saving an object (data persistence)

I’ve created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?


Answer 0

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You can also define your own simple utility like the following, which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
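
A matching loader is a natural companion to save_object. The sketch below is self-contained, so it repeats save_object; load_object is a hypothetical name, and a plain dict stands in for the Company instance to keep the example independent of the class definition:

```python
import os
import pickle
import tempfile


def save_object(obj, filename):
    with open(filename, "wb") as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Return the first object pickled in filename."""
    with open(filename, "rb") as source:
        return pickle.load(source)


# Illustrative round trip in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "company1.pkl")
save_object({"name": "banana", "value": 40}, path)
restored = load_object(path)
print(restored)  # {'name': 'banana', 'value': 40}
```

When loading instances of your own classes, the class definition must be importable at load time, and pickle files should only be loaded from sources you trust.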

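Update

Since this is such a popular answer, I’d like to touch on a few slightly more advanced usage topics.

cPickle (or _pickle) vs pickle

It’s almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version provides greatly superior performance. Switching to it couldn’t be easier; just change the import statement to this: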

import cPickle as pickle

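In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary because the pickle module now does it automatically; see "What difference between pickle and _pickle in python 3?".

The rundown is that you could use something like the following to ensure that your code always uses the C version when it is available, in both Python 2 and 3: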

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

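Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. “Protocol version 0” is ASCII and therefore “human-readable”. Versions > 0 are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version. In Python 2 the default was protocol version 0, but in Python 3.8.1 it is protocol version 4. In Python 3.x the module had pickle.DEFAULT_PROTOCOL added to it, but that doesn’t exist in Python 2.

Fortunately there’s a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that’s what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing: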

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

您可以这样写:

pickle.dump(obj, output, -1)

无论哪种方式,如果您创建了一个Pickler用于多个酸洗操作的对象,则只需指定一次协议:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
   etc...

注意:如果您正在运行不同版本的Python的环境中,则可能需要显式地使用(即,硬编码)它们都可以读取的特定协议编号(较新的版本通常可以读取较早版本产生的文件) 。

多个物件

虽然泡菜文件可以包含如上述样品中,当有这些数目不详的任何数量的腌制对象的,它往往更容易将其全部保存在某种可变大小的容器,就像一个listtupledict写字一次调用即可将它们全部保存到文件中:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

然后使用以下命令恢复列表及其中的所有内容:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

主要优点是您无需知道要保存多少个对象实例即可在以后加载它们(尽管如果没有该信息可以这样做,但它需要一些专门的代码)。请参阅相关问题的答案在腌制文件中保存和加载多个对象?有关执行此操作的不同方法的详细信息。个人喜欢@Lutz Prechelt的答案。它适用于此处的示例:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as infile:  # avoid shadowing the built-in input()
    company1 = pickle.load(infile)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(infile)
    print(company2.name)  # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
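
A matching loader can be sketched along the same lines. The name load_object and the temporary file path are illustrative assumptions, not part of the original answer; the round trip simply checks that what was saved comes back intact:

```python
import os
import pickle
import tempfile


class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value


def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Hypothetical counterpart to save_object: read back a single object."""
    with open(filename, 'rb') as infile:
        return pickle.load(infile)


# Round trip through a temporary directory so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'company1.pkl')
save_object(Company('banana', 40), path)
restored = load_object(path)
print(restored.name, restored.value)
```

Keeping the open() call inside each helper means the file is always closed, even if pickling raises.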

Update

Since this is such a popular answer, I’d like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

In Python 2, it’s almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version will provide greatly superior performance. Switching to it couldn’t be easier; just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is that you could use something like the following to ensure that your code always uses the C version when it’s available, in both Python 2 and 3:

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:  # Python 3: no cPickle; pickle uses the C accelerator itself
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different Python-specific formats, called protocols, as described in the documentation. “Protocol version 0” is ASCII and therefore “human-readable”. Versions greater than 0 are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, but in Python 3.8.1 it’s protocol version 4. Python 3.x added a pickle.DEFAULT_PROTOCOL attribute to the module, which doesn’t exist in Python 2.
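
To make the difference between protocols concrete, here is a small sketch using only the standard library under Python 3. The record dictionary is made-up sample data, and the two protocol constants are printed rather than assumed because they vary between Python versions:

```python
import pickle

record = {'name': 'banana', 'value': 40}

# Protocol 0 output is plain ASCII, so it can be decoded and printed.
ascii_data = pickle.dumps(record, 0)
print(ascii_data.decode('ascii'))

# Higher protocols are binary and usually more compact.
binary_data = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
print(len(ascii_data), len(binary_data))

# Both constants are version dependent, so print rather than assume them.
print('default:', pickle.DEFAULT_PROTOCOL, 'highest:', pickle.HIGHEST_PROTOCOL)

# Whatever protocol wrote the data, loads() detects it automatically.
print(pickle.loads(ascii_data) == pickle.loads(binary_data) == record)
```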

Fortunately, there’s a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that’s what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)

Either way, you’d only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc.

Note: If you’re in an environment running different versions of Python, then you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
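
As a rough illustration of pinning a protocol, the sketch below hardcodes protocol 2, which is the newest protocol that Python 2 can still read. The constant name SHARED_PROTOCOL is made up for this example, and the right number for a given environment depends on the oldest interpreter that has to read the files:

```python
import pickle

# Assumption for illustration: protocol 2 is the newest one that both
# Python 2.3+ and every Python 3 release can read.
SHARED_PROTOCOL = 2

payload = pickle.dumps(['Apple', 'Google', 'Microsoft'], SHARED_PROTOCOL)

# Pickles written with protocol 2 or later start with the PROTO opcode
# (0x80) followed by the protocol number, which makes the pin easy to verify.
print(payload[:2])
print(pickle.loads(payload))
```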

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there’s an unknown number of them, it’s often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as infile:  # avoid shadowing the built-in input()
    tech_companies = pickle.load(infile)

The major advantage is that you don’t need to know how many object instances were saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally, I like @Lutz Prechelt’s answer the best. Here it is, adapted to the examples here:

import pickle

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

Answer 1

I think it’s a pretty strong assumption that the object is a class. What if it’s not a class? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some Python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added, because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can’t handle it. Let’s try dill. We’ll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn’t, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.
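
The “by reference” behaviour can be seen with the standard library alone. The sketch below is only an illustration under Python 3 (the describe method is made up). For a plain instance like this one, the attribute values do travel with the pickle; what is stored by reference is the class itself, and a lambda is rejected outright because it has no importable name:

```python
import pickle


class Company:
    def describe(self):
        return 'a company'


company1 = Company()
company1.name = 'banana'  # attribute added after creation

data = pickle.dumps(company1)

# The instance's attribute values are stored in the pickle...
print(b'banana' in data)
# ...and so is the class name, but not the class body.
print(b'Company' in data, b'describe' in data)

# A lambda has no importable name, so the standard pickle refuses it.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print('pickle failed:', exc)
```

This is why loading fails in a fresh session where the class is no longer defined: the pickle holds only the class’s name, and the unpickler has to look it up again.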

Actually, there’s an easier way to save all these objects, especially if you have a lot of objects you’ve created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can’t install a python package on your server, then you can’t use it.

However, if you are able to install Python packages on your system, you can get the latest development version of dill by passing git+https://github.com/uqfoundation/dill.git@master#egg=dill to pip install, and the latest released version with pip install dill.


Answer 2

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as a backend, which extends the Python pickle module to handle lambdas and all the other nice Python features.
  • It stores different objects in different files and reloads them properly.
  • It limits the cache size.
  • It allows clearing the cache.
  • It allows sharing objects between multiple runs.
  • It can take into account input files that influence the result.

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')
def myfunc(name, value):
    return Company(name, value)

Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any subsequent run, the pickled object is loaded instead. If the cachedir is preserved between Python runs, the pickled object is taken from the previous Python run.
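
To give a feel for the mechanism described above, here is a minimal standard-library sketch of a pickle-backed disk cache. It is not anycache’s actual implementation: the names disk_cache and make_company are made up for illustration, and size limits, cache clearing, and input-file tracking are all left out:

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cachedir):
    """Hypothetical decorator: cache a function's result as a pickle file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive the file name from the function name and its arguments.
            key = repr((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
            path = os.path.join(cachedir, digest + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as infile:
                    return pickle.load(infile)  # cache hit
            result = func(*args, **kwargs)  # cache miss: compute...
            with open(path, 'wb') as outfile:
                pickle.dump(result, outfile)  # ...and store
            return result
        return wrapper
    return decorator


calls = []


@disk_cache(tempfile.mkdtemp())
def make_company(name, value):
    calls.append(name)  # records how often the body actually runs
    return {'name': name, 'value': value}


first = make_company('banana', 40)
second = make_company('banana', 40)
print(first == second, len(calls))
```

The second call never runs the function body; it reads the pickle written by the first call.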

For further details, see the documentation.


Answer 3

A quick example using company1 from your question, with Python 3.

import pickle

# Save the object; the with block closes the file afterwards
with open("company1.pickle", "wb") as f:
    pickle.dump(company1, f)

# Reload the object
with open("company1.pickle", "rb") as f:
    company1_reloaded = pickle.load(f)

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the object; the with block closes the file afterwards
with open("company1.pickle", "wb") as f:
    dill.dump(company1, f)

# Reload the object
with open("company1.pickle", "rb") as f:
    company1_reloaded = dill.load(f)