保存对象（数据持久性）

Question 1

I’ve created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?

Question 2

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')

Update

Since this is such a popular answer, I’d like touch on a few slightly advanced usage topics.

`cPickle` (or `_pickle`) vs `pickle`

It’s almost always preferable to actually use the module rather than pickle because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version will provide greatly superior performance. Switching to it couldn’t be easier, just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is you could use something like the following to ensure that your code will always use the C version when it’s available in both Python 2 and 3:

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific, formats, called protocols as described in the documentation, “Protocol version 0” is ASCII and therefore “human-readable”. Versions > 0 are binary and the highest one available depends on what version of Python is being used. The default also depends on Python version. In Python 2 the default was Protocol version 0, but in Python 3.8.1, it’s Protocol version 4. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL added to it, but that doesn’t exist in Python 2.

Fortunately there’s shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that’s what you want, and you usually do), just use the literal number -1 — similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)

Either way, you’d only have specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
   etc...

Note: If you’re in an environment running different versions of Python, then you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there’s an unknown number of them, it’s often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

The major advantage is you don’t need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally I like @Lutz Prechelt’s answer the best. Here’s it adapted to the examples here:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

Question 3

I think it’s a pretty strong assumption to assume that the object is a class. What if it’s not a class? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added — because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>>

Now shut down, and restart…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>>

Oops… pickle can’t handle it. Let’s try dill. We’ll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>>

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>

It works. The reason pickle fails, and dill doesn’t, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.

Actually, there’s an easier way to save all these objects, especially if you have a lot of objects you’ve created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>>

Now shut down your computer, go enjoy an espresso or whatever, and come back later…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can’t install a python package on your server, then you can’t use it.

However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.

Question 4

You can use anycache to do the job for you. It considers all the details:

It uses dill as backend, which extends the python pickle module to handle lambda and all the nice python features.
It stores different objects to different files and reloads them properly.
Limits cache size
Allows cache clearing
Allows sharing of objects between multiple runs
Allows respect of input files which influence the result

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value)
    return Company(name, value)

Anycache calls myfunc at the first time and pickles the result to a file in cachedir using an unique identifier (depending on the function name and its arguments) as filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.

For any further details see the documentation

Question 5

Quick example using company1 from your question, with python3.

import pickle

# Save the file
pickle.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = pickle.load(open("company1.pickle", "rb"))

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the file
dill.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = dill.load(open("company1.pickle", "rb"))

保存对象（数据持久性）

问题：保存对象（数据持久性）

回答 0

更新资料

`cPickle`（或`_pickle`）与`pickle`

数据流格式（协议）

多个物件

Update

`cPickle` (or `_pickle`) vs `pickle`

Data stream formats (protocols)

Multiple Objects

回答 1

回答 2

回答 3

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

从字符串创建Pandas DataFrame

Django查询中的“ LIKE”等效SQL

TypeError：ObjectId（”）不可序列化JSON

Python中的事件系统

熊猫三向联接列上的多个数据框

如何按两列或更多列对python pandas中的dataFrame进行排序？

保存对象（数据持久性）

问题：保存对象（数据持久性）

回答 0

更新资料

cPickle（或_pickle）与pickle

数据流格式（协议）

多个物件

Update

cPickle (or _pickle) vs pickle

Data stream formats (protocols)

Multiple Objects

回答 1

回答 2

回答 3

相关文章

排行榜展示

文章展示

`cPickle`（或`_pickle`）与`pickle`

`cPickle` (or `_pickle`) vs `pickle`