标签归档:object

字典与对象-哪个更有效,为什么?

问题:字典与对象-哪个更有效,为什么?

在内存使用和CPU消耗方面,在Python中更有效的方法是-字典还是对象?

背景: 我必须将大量数据加载到Python中。我创建了一个只是字段容器的对象。创建4M实例并将其放入字典中大约需要10分钟和约6GB的内存。字典准备就绪后,只需眨眼即可访问。

示例: 为了检查性能,我编写了两个简单的程序,它们执行相同的操作-一个使用对象,另一个使用字典:

对象(执行时间〜18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

字典(执行时间约12秒):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

问题: 我做错什么了吗?字典比对象快?如果确实字典表现更好,有人可以解释为什么吗?

What is more efficient in Python in terms of memory usage and CPU consumption – Dictionary or Object?

Background: I have to load huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. After dictionary is ready, accessing it is a blink of an eye.

Example: To check the performance I wrote two simple programs that do the same – one is using objects, other dictionary:

Object (execution time ~18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

Dictionary (execution time ~12sec):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

Question: Am I doing something wrong or dictionary is just faster than object? If indeed dictionary performs better, can somebody explain why?


回答 0

您是否尝试过使用__slots__

文档中

默认情况下,新旧类的实例都有用于属性存储的字典。这浪费了具有很少实例变量的对象的空间。创建大量实例时,空间消耗会变得非常大。

可以通过__slots__在新式类定义中进行定义来覆盖默认值。该__slots__声明采用一系列实例变量,并且在每个实例中仅保留足够的空间来容纳每个变量的值。因为__dict__未为每个实例创建空间,所以节省了空间。

那么,这样既节省时间又节省内存吗?

比较计算机上的三种方法:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py(在2.6中受支持):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

运行基准测试(使用CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

使用CPython 2.6.2,包括命名的元组测试:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

因此,是的(不是很意外),使用__slots__是一种性能优化。使用命名元组的性能与相似__slots__

Have you tried using __slots__?

From the documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

Run benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.


回答 1

对象中的属性访问使用幕后的字典访问-因此,使用属性访问会增加额外的开销。另外,在对象情况下,由于例如额外的内存分配和代码执行(例如__init__方法的执行),您将承担额外的开销。

在您的代码中,如果o是一个Obj实例,o.attr则等效于o.__dict__['attr']少量的额外开销。

Attribute access in an object uses dictionary access behind the scenes – so by using attribute access you are adding extra overhead. Plus in the object case, you are incurring additional overhead because of e.g. additional memory allocations and code execution (e.g. of the __init__ method).

In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.


回答 2

您是否考虑过使用namedtuple?(python 2.4 / 2.5的链接

这是表示结构化数据的新标准方式,可为您提供元组的性能和类的便利性。

与字典相比,它的唯一缺点是(如元组)它不具有创建后更改属性的能力。

Have you considered using a namedtuple? (link for python 2.4/2.5)

It’s the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

It’s only downside compared with dictionaries is that (like tuples) it doesn’t give you the ability to change attributes after creation.


回答 3

这是python 3.6.1的@hughdbrown答案的副本,我将计数增加了5倍,并在每次运行结束时添加了一些代码来测试python进程的内存占用量。

在不愿接受投票的人之前,请注意,这种计算对象大小的方法并不准确。

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

这些是我的结果

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

我的结论是:

  1. 插槽具有最佳的内存占用,并且速度合理。
  2. dict是最快的,但使用最多的内存。

Here is a copy of @hughdbrown answer for python 3.6.1, I’ve made the count 5x larger and added some code to test the memory footprint of the python process at the end of each run.

Before the downvoters have at it, Be advised that this method of counting the size of objects is not accurate.

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

And these are my results

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

My conclusion is:

  1. Slots have the best memory footprint and are reasonable on speed.
  2. dicts are the fastest, but use the most memory.

回答 4

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

结果:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749
from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

Results:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749

回答 5

没问题。
您有没有其他属性的数据(没有方法,没有任何东西)。因此,您有一个数据容器(在本例中为字典)。

我通常更喜欢在数据建模方面进行思考。如果存在巨大的性能问题,那么我可以放弃抽象中的某些内容,但是只有非常好的理由。
编程是关于管理复杂性的,维护正确的抽象常常是实现这种结果的最有用的方法之一。

关于物体变慢的原因,我认为您的测量不正确。
您在for循环内执行的分配太少,因此看到的实例化dict(本机对象)和“ custom”对象所需的时间不同。尽管从语言角度看它们是相同的,但它们的实现却大不相同。
之后,两者的分配时间应几乎相同,因为最终成员将保留在词典中。

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and the maintaining the correct abstraction is very often one of the most useful way to achieve such result.

About the reasons an object is slower, I think your measurement is not correct.
You are performing too little assignments inside the for loop, and therefore what you see there is the different time necessary to instantiate a dict (intrinsic object) and a “custom” object. Although from the language perspective they are the same, they have quite a different implementation.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.


回答 6

如果数据结构不应包含参考周期,则还有另一种减少内存使用的方法。

让我们比较两个类:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

$ pip install recordclass

>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64 40

由于structclass基于类的类不支持循环垃圾收集,在这种情况下不需要,因此成为可能。

__slots__基于类的类相比,还有一个优点:您可以添加额外的属性:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
>>> print(sizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
48 has dict: True

There is yet another way to reduce memory usage if data structure isn’t supposed to contain reference cycles.

Let’s compare two classes:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

and

$ pip install recordclass

>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64 40

It became possible since structclass-based classes doesn’t support cyclic garbage collection, which is not needed in such cases.

There is also one advantage over __slots__-based class: you are able to add extra attributes:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
>>> print(sizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
48 has dict: True

回答 7

这是我对@ Jarrod-Chesney非常好的脚本的测试运行。为了进行比较,我还针对python2运行了它,将“ range”替换为“ xrange”。

出于好奇,我还使用OrderedDict(ordict)添加了类似的测试以进行比较。

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

因此,在两个主要版本上,@ Jarrod-Chesney的结论仍然看起来不错。

Here are my test runs of the very nice script of @Jarrod-Chesney. For comparison, I also run it against python2 with “range” replaced by “xrange”.

By curiosity, I also added similar tests with OrderedDict (ordict) for comparison.

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

So, on both major versions, the conclusions of @Jarrod-Chesney are still looking good.


保存和加载对象以及使用泡菜

问题:保存和加载对象以及使用泡菜

我正在尝试使用pickle模块保存和加载对象。
首先,我声明我的对象:

>>> class Fruits:pass
...
>>> banana = Fruits()

>>> banana.color = 'yellow'
>>> banana.value = 30

之后,我打开一个名为“ Fruits.obj”的文件(以前,我创建了一个新的.txt文件,并将其重命名为“ Fruits.obj”):

>>> import pickle
>>> filehandler = open(b"Fruits.obj","wb")
>>> pickle.dump(banana,filehandler)

完成此操作后,我关闭了会话,开始了一个新会话,然后放入下一个会话(尝试访问应该保存的对象):

file = open("Fruits.obj",'r')
object_file = pickle.load(file)

但是我有这个信息:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python31\lib\pickle.py", line 1365, in load
encoding=encoding, errors=errors).load()
ValueError: read() from the underlying stream did notreturn bytes

我不知道该怎么办,因为我不了解此消息。有人知道我如何加载对象“香蕉”吗?谢谢!

编辑: 正如你们中的一些人所说的那样:

>>> import pickle
>>> file = open("Fruits.obj",'rb')

没问题,但是我要讲的是:

>>> object_file = pickle.load(file)

我有错误:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python31\lib\pickle.py", line 1365, in load
encoding=encoding, errors=errors).load()
EOFError

I´m trying to save and load objects using pickle module.
First I declare my objects:

>>> class Fruits:pass
...
>>> banana = Fruits()

>>> banana.color = 'yellow'
>>> banana.value = 30

After that I open a file called ‘Fruits.obj'(previously I created a new .txt file and I renamed ‘Fruits.obj’):

>>> import pickle
>>> filehandler = open(b"Fruits.obj","wb")
>>> pickle.dump(banana,filehandler)

After do this I close my session and I began a new one and I put the next (trying to access to the object that it supposed to be saved):

file = open("Fruits.obj",'r')
object_file = pickle.load(file)

But I have this message:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python31\lib\pickle.py", line 1365, in load
encoding=encoding, errors=errors).load()
ValueError: read() from the underlying stream did notreturn bytes

I don´t know what to do because I don´t understand this message. Does anyone know How I can load my object ‘banana’? Thank you!

EDIT: As some of you have sugested I put:

>>> import pickle
>>> file = open("Fruits.obj",'rb')

There were no problem, but the next I put was:

>>> object_file = pickle.load(file)

And I have error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python31\lib\pickle.py", line 1365, in load
encoding=encoding, errors=errors).load()
EOFError

回答 0

至于第二个问题:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Python31\lib\pickle.py", line
 1365, in load encoding=encoding,
 errors=errors).load() EOFError

读取文件内容后,文件指针将位于文件末尾-不再有其他数据可读取。您必须倒带该文件,以便从头开始再次读取它:

file.seek(0)

但是,您通常要使用上下文管理器打开文件并从中读取数据。这样,文件将在块执行完后自动关闭,这也将帮助您将文件操作组织为有意义的块。

最后,cPickle是C语言中pickle模块的更快实现。因此:

In [1]: import cPickle

In [2]: d = {"a": 1, "b": 2}

In [4]: with open(r"someobject.pickle", "wb") as output_file:
   ...:     cPickle.dump(d, output_file)
   ...:

# pickle_file will be closed at this point, preventing your from accessing it any further

In [5]: with open(r"someobject.pickle", "rb") as input_file:
   ...:     e = cPickle.load(input_file)
   ...:

In [7]: print e
------> print(e)
{'a': 1, 'b': 2}

As for your second problem:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Python31\lib\pickle.py", line
 1365, in load encoding=encoding,
 errors=errors).load() EOFError

After you have read the contents of the file, the file pointer will be at the end of the file – there will be no further data to read. You have to rewind the file so that it will be read from the beginning again:

file.seek(0)

What you usually want to do though, is to use a context manager to open the file and read data from it. This way, the file will be automatically closed after the block finishes executing, which will also help you organize your file operations into meaningful chunks.

Finally, cPickle is a faster implementation of the pickle module in C. So:

In [1]: import cPickle

In [2]: d = {"a": 1, "b": 2}

In [4]: with open(r"someobject.pickle", "wb") as output_file:
   ...:     cPickle.dump(d, output_file)
   ...:

# pickle_file will be closed at this point, preventing your from accessing it any further

In [5]: with open(r"someobject.pickle", "rb") as input_file:
   ...:     e = cPickle.load(input_file)
   ...:

In [7]: print e
------> print(e)
{'a': 1, 'b': 2}

回答 1

以下对我有用:

class Fruits: pass

banana = Fruits()

banana.color = 'yellow'
banana.value = 30

import pickle

filehandler = open("Fruits.obj","wb")
pickle.dump(banana,filehandler)
filehandler.close()

file = open("Fruits.obj",'rb')
object_file = pickle.load(file)
file.close()

print(object_file.color, object_file.value, sep=', ')
# yellow, 30

The following works for me:

class Fruits: pass

banana = Fruits()

banana.color = 'yellow'
banana.value = 30

import pickle

filehandler = open("Fruits.obj","wb")
pickle.dump(banana,filehandler)
filehandler.close()

file = open("Fruits.obj",'rb')
object_file = pickle.load(file)
file.close()

print(object_file.color, object_file.value, sep=', ')
# yellow, 30

回答 2

您也忘记将其读取为二进制文件。

在您的写作部分中,您有:

open(b"Fruits.obj","wb") # Note the wb part (Write Binary)

在阅读部分中,您有:

file = open("Fruits.obj",'r') # Note the r part, there should be a b too

因此,将其替换为:

file = open("Fruits.obj",'rb')

它将起作用:)


至于第二个错误,很可能是由于未正确关闭/同步文件而引起的。

尝试这段代码来编写:

>>> import pickle
>>> filehandler = open(b"Fruits.obj","wb")
>>> pickle.dump(banana,filehandler)
>>> filehandler.close()

这(不变)为:

>>> import pickle
>>> file = open("Fruits.obj",'rb')
>>> object_file = pickle.load(file)

更整洁的版本将使用该with语句。

写作:

>>> import pickle
>>> with open('Fruits.obj', 'wb') as fp:
>>>     pickle.dump(banana, fp)

阅读:

>>> import pickle
>>> with open('Fruits.obj', 'rb') as fp:
>>>     banana = pickle.load(fp)

You’re forgetting to read it as binary too.

In your write part you have:

open(b"Fruits.obj","wb") # Note the wb part (Write Binary)

In the read part you have:

file = open("Fruits.obj",'r') # Note the r part, there should be a b too

So replace it with:

file = open("Fruits.obj",'rb')

And it will work :)


As for your second error, it is most likely cause by not closing/syncing the file properly.

Try this bit of code to write:

>>> import pickle
>>> filehandler = open(b"Fruits.obj","wb")
>>> pickle.dump(banana,filehandler)
>>> filehandler.close()

And this (unchanged) to read:

>>> import pickle
>>> file = open("Fruits.obj",'rb')
>>> object_file = pickle.load(file)

A neater version would be using the with statement.

For writing:

>>> import pickle
>>> with open('Fruits.obj', 'wb') as fp:
>>>     pickle.dump(banana, fp)

For reading:

>>> import pickle
>>> with open('Fruits.obj', 'rb') as fp:
>>>     banana = pickle.load(fp)

回答 3

在这种情况下,始终以二进制模式打开

file = open("Fruits.obj",'rb')

Always open in binary mode, in this case

file = open("Fruits.obj",'rb')

回答 4

您没有以二进制模式打开文件。

open("Fruits.obj",'rb')

应该管用。

对于第二个错误,文件很可能为空,这意味着您无意中清空了文件或使用了错误的文件名或其他名称。

(这是假设您确实确实关闭了会话。如果没有,则是因为您没有在写入和读取之间关闭文件)。

我测试了您的代码,它可以正常工作。

You didn’t open the file in binary mode.

open("Fruits.obj",'rb')

Should work.

For your second error, the file is most likely empty, which mean you inadvertently emptied it or used the wrong filename or something.

(This is assuming you really did close your session. If not, then it’s because you didn’t close the file between the write and the read).

I tested your code, and it works.


回答 5

看来您想跨会话保存您的类实例,并且使用pickle是一种不错的方法。但是,有一个名为的程序包klepto,将对象的保存抽象到字典接口,因此您可以选择腌制对象并将其保存到文件(如下所示),或腌制对象并将其保存到数据库,或者选择使用pickle使用json或其他许多选项。有趣的klepto是,通过抽象到通用接口,它很容易,因此您不必记住如何通过酸洗保存到文件等其他底层细节。

请注意,它适用于动态添加的类属性,而pickle无法做到这一点…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive 
>>> db = file_archive('fruits.txt')
>>> class Fruits: pass
... 
>>> banana = Fruits()
>>> banana.color = 'yellow'
>>> banana.value = 30
>>> 
>>> db['banana'] = banana 
>>> db.dump()
>>> 

然后我们重新启动…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('fruits.txt')
>>> db.load()
>>> 
>>> db['banana'].color
'yellow'
>>> 

Klepto 适用于python2和python3。

在此处获取代码:https : //github.com/uqfoundation

It seems you want to save your class instances across sessions, and using pickle is a decent way to do this. However, there’s a package called klepto that abstracts the saving of objects to a dictionary interface, so you can choose to pickle objects and save them to a file (as shown below), or pickle the objects and save them to a database, or instead of use pickle use json, or many other options. The nice thing about klepto is that by abstracting to a common interface, it makes it easy so you don’t have to remember the low-level details of how to save via pickling to a file, or otherwise.

Note that It works for dynamically added class attributes, which pickle cannot do…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive 
>>> db = file_archive('fruits.txt')
>>> class Fruits: pass
... 
>>> banana = Fruits()
>>> banana.color = 'yellow'
>>> banana.value = 30
>>> 
>>> db['banana'] = banana 
>>> db.dump()
>>> 

Then we restart…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('fruits.txt')
>>> db.load()
>>> 
>>> db['banana'].color
'yellow'
>>> 

Klepto works on python2 and python3.

Get the code here: https://github.com/uqfoundation


回答 6

您可以使用anycache为您完成这项工作。假设您有一个myfunc创建实例的函数:

from anycache import anycache

class Fruits:pass

@anycache(cachedir='/path/to/your/cache')    
def myfunc()
    banana = Fruits()
    banana.color = 'yellow'
    banana.value = 30
return banana

Anycache会myfunc在第一次调用时,cachedir使用唯一的标识符(取决于函数名和参数)作为文件名,将结果腌制到文件中。在任何连续运行中,将加载已腌制的对象。

如果在cachedir两次python运行之间保留了,则腌制的对象将从先前的python运行中获取。

函数参数也被考虑在内。重构的实现也是如此:

from anycache import anycache

class Fruits:pass

@anycache(cachedir='/path/to/your/cache')    
def myfunc(color, value)
    fruit = Fruits()
    fruit.color = color
    fruit.value = value
return fruit

You can use anycache to do the job for you. Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Fruits:pass

@anycache(cachedir='/path/to/your/cache')    
def myfunc()
    banana = Fruits()
    banana.color = 'yellow'
    banana.value = 30
return banana

Anycache calls myfunc at the first time and pickles the result to a file in cachedir using an unique identifier (depending on the the function name and the arguments) as filename. On any consecutive run, the pickled object is loaded.

If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.

The function arguments are also taken into account. A refactored implementation works likewise:

from anycache import anycache

class Fruits:pass

@anycache(cachedir='/path/to/your/cache')    
def myfunc(color, value)
    fruit = Fruits()
    fruit.color = color
    fruit.value = value
return fruit

如何在python中保存和恢复多个变量?

问题:如何在python中保存和恢复多个变量?

我需要将大约十二个对象保存到文件中,然后稍后将其还原。我尝试过使用带咸菜和搁置的for循环,但效果不佳。

编辑。
我试图保存的所有对象都在同一个类中(我之前应该提到过),但我没有意识到我可以像这样保存整个类:

import pickle
def saveLoad(opt):
    global calc
    if opt == "save":
        f = file(filename, 'wb')
        pickle.dump(calc, f, 2)
        f.close
        print 'data saved'
    elif opt == "load":
        f = file(filename, 'rb')
        calc = pickle.load(f)
    else:
        print 'Invalid saveLoad option'

I need to save about a dozen objects to a file and then restore them later. I’ve tried to use a for loop with pickle and shelve but it didn’t work right.

Edit.
All of the objects that I was trying to save were in the same class (I should have mentioned this before), and I didn’t realize that I could just save the whole class like this:

import pickle
def saveLoad(opt):
    global calc
    if opt == "save":
        f = file(filename, 'wb')
        pickle.dump(calc, f, 2)
        f.close
        print 'data saved'
    elif opt == "load":
        f = file(filename, 'rb')
        calc = pickle.load(f)
    else:
        print 'Invalid saveLoad option'

回答 0

如果需要保存多个对象,则可以简单地将它们放在单个列表或元组中,例如:

import pickle

# obj0, obj1, obj2 are created here...

# Saving the objects:
with open('objs.pkl', 'w') as f:  # Python 3: open(..., 'wb')
    pickle.dump([obj0, obj1, obj2], f)

# Getting back the objects:
with open('objs.pkl') as f:  # Python 3: open(..., 'rb')
    obj0, obj1, obj2 = pickle.load(f)

如果您有大量数据,可以通过传递protocol=-1dump(); 来减小文件大小。pickle然后将使用最佳的可用协议,而不是默认的历史记录(以及向后兼容的协议)。在这种情况下,必须以二进制模式打开文件(分别为wbrb)。

二进制模式也应该与Python 3一起使用,因为它的默认协议会生成二进制(即非文本)数据(写入模式'wb'和读取模式'rb')。

If you need to save multiple objects, you can simply put them in a single list, or tuple, for instance:

import pickle

# obj0, obj1, obj2 are created here...

# Saving the objects:
with open('objs.pkl', 'w') as f:  # Python 3: open(..., 'wb')
    pickle.dump([obj0, obj1, obj2], f)

# Getting back the objects:
with open('objs.pkl') as f:  # Python 3: open(..., 'rb')
    obj0, obj1, obj2 = pickle.load(f)

If you have a lot of data, you can reduce the file size by passing protocol=-1 to dump(); pickle will then use the best available protocol instead of the default historical (and more backward-compatible) protocol. In this case, the file must be opened in binary mode (wb and rb, respectively).

The binary mode should also be used with Python 3, as its default protocol produces binary (i.e. non-text) data (writing mode 'wb' and reading mode 'rb').


回答 1

有一个名为的内置库pickle。使用,pickle您可以将对象转储到文件中并在以后加载它们。

import pickle

f = open('store.pckl', 'wb')
pickle.dump(obj, f)
f.close()

f = open('store.pckl', 'rb')
obj = pickle.load(f)
f.close()

There is a built-in library called pickle. Using pickle you can dump objects to a file and load them later.

import pickle

f = open('store.pckl', 'wb')
pickle.dump(obj, f)
f.close()

f = open('store.pckl', 'rb')
obj = pickle.load(f)
f.close()

回答 2

您应该查看一下货架泡菜模块。如果您需要存储大量数据,最好使用数据库

You should look at the shelve and pickle modules. If you need to store a lot of data it may be better to use a database


回答 3

将多个变量保存到pickle文件的另一种方法是:

import pickle

a = 3; b = [11,223,435];
pickle.dump([a,b], open("trial.p", "wb"))

c,d = pickle.load(open("trial.p","rb"))

print(c,d) ## To verify

Another approach to saving multiple variables to a pickle file is:

import pickle

a = 3; b = [11,223,435];
pickle.dump([a,b], open("trial.p", "wb"))

c,d = pickle.load(open("trial.p","rb"))

print(c,d) ## To verify

回答 4

您可以使用klepto,它为内存,磁盘或数据库提供持久缓存。

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('foo.txt')
>>> db['1'] = 1
>>> db['max'] = max
>>> squared = lambda x: x**2
>>> db['squared'] = squared
>>> def add(x,y):
...   return x+y
... 
>>> db['add'] = add
>>> class Foo(object):
...   y = 1
...   def bar(self, x):
...     return self.y + x
... 
>>> db['Foo'] = Foo
>>> f = Foo()
>>> db['f'] = f  
>>> db.dump()
>>> 

然后,在解释器重新启动后…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('foo.txt')
>>> db
file_archive('foo.txt', {}, cached=True)
>>> db.load()
>>> db
file_archive('foo.txt', {'1': 1, 'add': <function add at 0x10610a0c8>, 'f': <__main__.Foo object at 0x10510ced0>, 'max': <built-in function max>, 'Foo': <class '__main__.Foo'>, 'squared': <function <lambda> at 0x10610a1b8>}, cached=True)
>>> db['add'](2,3)
5
>>> db['squared'](3)
9
>>> db['f'].bar(4)
5
>>> 

在此处获取代码:https : //github.com/uqfoundation

You could use klepto, which provides persistent caching to memory, disk, or database.

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('foo.txt')
>>> db['1'] = 1
>>> db['max'] = max
>>> squared = lambda x: x**2
>>> db['squared'] = squared
>>> def add(x,y):
...   return x+y
... 
>>> db['add'] = add
>>> class Foo(object):
...   y = 1
...   def bar(self, x):
...     return self.y + x
... 
>>> db['Foo'] = Foo
>>> f = Foo()
>>> db['f'] = f  
>>> db.dump()
>>> 

Then, after interpreter restart…

dude@hilbert>$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import file_archive
>>> db = file_archive('foo.txt')
>>> db
file_archive('foo.txt', {}, cached=True)
>>> db.load()
>>> db
file_archive('foo.txt', {'1': 1, 'add': <function add at 0x10610a0c8>, 'f': <__main__.Foo object at 0x10510ced0>, 'max': <built-in function max>, 'Foo': <class '__main__.Foo'>, 'squared': <function <lambda> at 0x10610a1b8>}, cached=True)
>>> db['add'](2,3)
5
>>> db['squared'](3)
9
>>> db['f'].bar(4)
5
>>> 

Get the code here: https://github.com/uqfoundation


回答 5

以下方法似乎很简单,可以与不同大小的变量一起使用:

import hickle as hkl
# write variables to filename [a,b,c can be of any size]
hkl.dump([a,b,c], filename)

# load variables from filename
a,b,c = hkl.load(filename)

The following approach seems simple and can be used with variables of different size:

import hickle as hkl
# write variables to filename [a,b,c can be of any size]
hkl.dump([a,b,c], filename)

# load variables from filename
a,b,c = hkl.load(filename)

如何为带有测试的pytest类正确设置和拆卸?

问题:如何为带有测试的pytest类正确设置和拆卸?

我正在使用硒进行端到端测试,但无法获得使用方法setup_classteardown_class方法。

我需要在setup_class方法中设置浏览器,然后执行一堆定义为类方法的测试,最后退出teardown_class方法中的浏览器。

但是从逻辑上讲,这似乎是一个糟糕的解决方案,因为实际上我的测试不适用于类,而适用于对象。我self在每个测试方法中传递参数,因此可以访问对象的vars:

class TestClass:
  
    def setup_class(cls):
        pass
        
    def test_buttons(self, data):
        # self.$attribute can be used, but not cls.$attribute?  
        pass
        
    def test_buttons2(self, data):
        # self.$attribute can be used, but not cls.$attribute?
        pass
        
    def teardown_class(cls):
        pass
    

甚至为类创建浏览器实例似乎也不正确。应该为每个对象分别创建,对吗?

因此,我需要使用__init__and __del__方法代替setup_classand teardown_class

I am using selenium for end to end testing and I can’t get how to use setup_class and teardown_class methods.

I need to set up browser in setup_class method, then perform a bunch of tests defined as class methods and finally quit browser in teardown_class method.

But logically it seems like a bad solution, because in fact my tests will not work with class, but with object. I pass self param inside every test method, so I can access objects’ vars:

class TestClass:
  
    def setup_class(cls):
        pass
        
    def test_buttons(self, data):
        # self.$attribute can be used, but not cls.$attribute?  
        pass
        
    def test_buttons2(self, data):
        # self.$attribute can be used, but not cls.$attribute?
        pass
        
    def teardown_class(cls):
        pass
    

And it even seems not to be correct to create browser instance for class.. It should be created for every object separately, right?

So, I need to use __init__ and __del__ methods instead of setup_class and teardown_class?


回答 0

根据Fixture的完成/执行拆卸代码,当前设置和拆卸的最佳做法是使用yield而不是return

import pytest

@pytest.fixture()
def resource():
    print("setup")
    yield "resource"
    print("teardown")

class TestResource:
    def test_that_depends_on_resource(self, resource):
        print("testing {}".format(resource))

运行它会导致

$ py.test --capture=no pytest_yield.py
=== test session starts ===
platform darwin -- Python 2.7.10, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
collected 1 items

pytest_yield.py setup
testing resource
.teardown


=== 1 passed in 0.01 seconds ===

编写拆卸代码的另一种方法是,将一个request-context对象接受到您的Fixture函数中,并request.addfinalizer使用执行一次或多次拆卸的函数调用其方法:

import pytest

@pytest.fixture()
def resource(request):
    print("setup")

    def teardown():
        print("teardown")
    request.addfinalizer(teardown)
    
    return "resource"

class TestResource:
    def test_that_depends_on_resource(self, resource):
        print("testing {}".format(resource))

According to Fixture finalization / executing teardown code, the current best practice for setup and teardown is to use yield instead of return:

import pytest

@pytest.fixture()
def resource():
    print("setup")
    yield "resource"
    print("teardown")

class TestResource:
    def test_that_depends_on_resource(self, resource):
        print("testing {}".format(resource))

Running it results in

$ py.test --capture=no pytest_yield.py
=== test session starts ===
platform darwin -- Python 2.7.10, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
collected 1 items

pytest_yield.py setup
testing resource
.teardown


=== 1 passed in 0.01 seconds ===

Another way to write teardown code is by accepting a request-context object into your fixture function and calling its request.addfinalizer method with a function that performs the teardown one or multiple times:

import pytest

@pytest.fixture()
def resource(request):
    print("setup")

    def teardown():
        print("teardown")
    request.addfinalizer(teardown)
    
    return "resource"

class TestResource:
    def test_that_depends_on_resource(self, resource):
        print("testing {}".format(resource))

回答 1

当您编写“定义为类方法的测试”时,您是说类方法(将其作为第一个参数的方法)还是常规方法(将实例作为第一个参数的方法)?

由于您的示例使用self了测试方法,因此我假设是后者,因此您只需要使用setup_method

class Test:

    def setup_method(self, test_method):
        # configure self.attribute

    def teardown_method(self, test_method):
        # tear down self.attribute

    def test_buttons(self):
        # use self.attribute for test

测试方法实例传递给setup_methodteardown_method,但是如果您的设置/拆卸代码不需要了解测试上下文,则可以忽略该方法。可以在这里找到更多信息

我还建议您熟悉py.test的装置,因为它们是更强大的概念。

When you write “tests defined as class methods”, do you really mean class methods (methods which receive its class as first parameter) or just regular methods (methods which receive an instance as first parameter)?

Since your example uses self for the test methods I’m assuming the latter, so you just need to use setup_method instead:

class Test:

    def setup_method(self, test_method):
        # configure self.attribute

    def teardown_method(self, test_method):
        # tear down self.attribute

    def test_buttons(self):
        # use self.attribute for test

The test method instance is passed to setup_method and teardown_method, but can be ignored if your setup/teardown code doesn’t need to know the testing context. More information can be found here.

I also recommend that you familiarize yourself with py.test’s fixtures, as they are a more powerful concept.


回答 2

这可能会有所帮助http://docs.pytest.org/en/latest/xunit_setup.html

在测试套件中,我将测试用例分组。对于安装和拆卸,我需要该类中的所有测试用例,我使用setup_class(cls)teardown_class(cls)类方法。

对于每个测试用例的设置和拆卸,我使用setup_method(method)teardown_method(methods)

例:

lh = <got log handler from logger module>

class TestClass:
    @classmethod
    def setup_class(cls):
        lh.info("starting class: {} execution".format(cls.__name__))

    @classmethod
    def teardown_class(cls):
        lh.info("starting class: {} execution".format(cls.__name__))

    def setup_method(self, method):
        lh.info("starting execution of tc: {}".format(method.__name__))

    def teardown_method(self, method):
        lh.info("starting execution of tc: {}".format(method.__name__))

    def test_tc1(self):
        <tc_content>
        assert 

    def test_tc2(self):
        <tc_content>
        assert

现在,当我运行测试时,当TestClass执行开始时,它将记录何时开始执行,何时结束执行以及方法的详细信息。

您可以在相应位置添加其他设置和拆卸步骤。

希望能帮助到你!

This might help http://docs.pytest.org/en/latest/xunit_setup.html

In my test suite, I group my test cases into classes. For the setup and teardown I need for all the test cases in that class, I use the setup_class(cls) and teardown_class(cls) classmethods.

And for the setup and teardown I need for each of the test case, I use the setup_method(method) and teardown_method(methods)

Example:

lh = <got log handler from logger module>

class TestClass:
    @classmethod
    def setup_class(cls):
        lh.info("starting class: {} execution".format(cls.__name__))

    @classmethod
    def teardown_class(cls):
        lh.info("starting class: {} execution".format(cls.__name__))

    def setup_method(self, method):
        lh.info("starting execution of tc: {}".format(method.__name__))

    def teardown_method(self, method):
        lh.info("starting execution of tc: {}".format(method.__name__))

    def test_tc1(self):
        <tc_content>
        assert 

    def test_tc2(self):
        <tc_content>
        assert

Now when I run my tests, when the TestClass execution is starting, it logs the details for when it is beginning execution, when it is ending execution and same for the methods..

You can add up other setup and teardown steps you might have in the respective locations.

Hope it helps!


回答 3

正如@Bruno所建议的那样,使用pytest固定装置是另一种解决方案,可用于两个测试类甚至是简单的测试函数。这是测试python2.7函数的示例

import pytest

@pytest.fixture(scope='function')
def some_resource(request):
    stuff_i_setup = ["I setup"]

    def some_teardown():
        stuff_i_setup[0] += " ... but now I'm torn down..."
        print stuff_i_setup[0]
    request.addfinalizer(some_teardown)

    return stuff_i_setup[0]

def test_1_that_needs_resource(some_resource):
    print some_resource + "... and now I'm testing things..."

所以,跑步 test_1...生成:

I setup... and now I'm testing things...
I setup ... but now I'm torn down...

该通知stuff_i_setup是在夹具中引用,使该对象是setuptorn down为测试它与交互。您可以想象这对于持久性对象(例如假设的数据库或某些连接)很有用,必须在每次测试运行之前清除这些持久性对象以使它们隔离。

As @Bruno suggested, using pytest fixtures is another solution that is accessible for both test classes or even just simple test functions. Here’s an example testing python2.7 functions:

import pytest

@pytest.fixture(scope='function')
def some_resource(request):
    stuff_i_setup = ["I setup"]

    def some_teardown():
        stuff_i_setup[0] += " ... but now I'm torn down..."
        print stuff_i_setup[0]
    request.addfinalizer(some_teardown)

    return stuff_i_setup[0]

def test_1_that_needs_resource(some_resource):
    print some_resource + "... and now I'm testing things..."

So, running test_1... produces:

I setup... and now I'm testing things...
I setup ... but now I'm torn down...

Notice that stuff_i_setup is referenced in the fixture, allowing that object to be setup and torn down for the test it’s interacting with. You can imagine this could be useful for a persistent object, such as a hypothetical database or some connection, that must be cleared before each test runs to keep them isolated.


回答 4

添加@classmethod装饰器后,您的代码应该可以按预期工作。

@classmethod 
def setup_class(cls):
    "Runs once per class"

@classmethod 
def teardown_class(cls):
    "Runs at end of class"

参见http://pythontesting.net/framework/pytest/pytest-xunit-style-fixtures/

Your code should work just as you expect it to if you add @classmethod decorators.

@classmethod 
def setup_class(cls):
    "Runs once per class"

@classmethod 
def teardown_class(cls):
    "Runs at end of class"

See http://pythontesting.net/framework/pytest/pytest-xunit-style-fixtures/


访问对象存储器地址

问题:访问对象存储器地址

当您object.__repr__()在Python中调用该方法时,您会得到类似以下的信息:

<__main__.Test object at 0x2aba1c0cf890> 

如果您过载了__repr__(),还有什么方法可以保留该内存地址,然后调用super(Class, obj).__repr__()并重新分配它呢?

When you call the object.__repr__() method in Python you get something like this back:

<__main__.Test object at 0x2aba1c0cf890> 

Is there any way to get a hold of the memory address if you overload __repr__(), other then calling super(Class, obj).__repr__() and regexing it out?


回答 0

Python的手册已经这样说id()

返回一个对象的“身份”,这是一个整数(或长整数),在该对象的生存期内保证是唯一且恒定的。两个不重叠生存期的对象可能具有相同的id()值。 (实施说明:这是对象的地址。)

因此,在CPython中,这将是对象的地址。但是,没有任何其他Python解释器的此类保证。

请注意,如果您正在编写C扩展名,则可以完全访问Python解释器的内部,包括直接访问对象的地址。

The Python manual has this to say about id():

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. (Implementation note: this is the address of the object.)

So in CPython, this will be the address of the object. No such guarantee for any other Python interpreter, though.

Note that if you’re writing a C extension, you have full access to the internals of the Python interpreter, including access to the addresses of objects directly.


回答 1

您可以通过以下方式重新实现默认代表:

def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )

You could reimplement the default repr this way:

def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )

回答 2

只需使用

id(object)

Just use

id(object)

回答 3

这里有一些其他答案未涵盖的问题。

首先,id仅返回:

对象的“身份”。这是一个整数(或长整数),在该对象的生存期内,此整数保证是唯一且恒定的。具有非重叠生存期的两个对象可能具有相同的id()值。


在CPython中,这恰好是指向PyObject解释器中代表对象的指针,这与on上的东西相同,显然不会成为指针。我不确定IronPython,但我怀疑在这方面,它更像是Jython,而不是CPython。因此,在大多数Python实现中,无法获得显示在其中的任何内容,如果您这样做了,则毫无用处。object.__repr__显示。但这只是CPython的实现细节,而不是一般Python的真实情况。Jython不处理指针,它处理Java引用(JVM当然可以将其表示为指针,但是您看不到它们,并且也不想这样做,因为允许GC来回移动它们)。PyPy让不同类型的对象具有不同的种类id,但最一般的只是对您已调用的对象表的索引idrepr


但是,如果您只关心CPython怎么办?毕竟,这是一个很普通的情况。

好吧,首先,您可能会注意到这id是一个整数; *如果您想要该0x2aba1c0cf890字符串而不是数字46978822895760,则必须自己设置其格式。在幕后,我相信object.__repr__最终使用printf%p格式,你没有从Python的有……但你总是可以做到这一点:

format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#18x')

*在3.x中,它是一个int。在2.x中,int如果它足够大以容纳一个指针(可能不是由于某些平台上的有符号数问题而引起的),long否则是一个错误。

除了将它们打印出来,您还能使用这些指针做什么?当然(再次假设您只关心CPython)。

所有C API函数均采用指向PyObject或相关类型的指针。对于那些相关的类型,您可以调用PyFoo_Check以确保它确实是一个Foo对象,然后使用进行强制转换(PyFoo *)p。因此,如果您要编写C扩展名,id则正是您所需要的。

如果您正在编写纯Python代码怎么办?您可以使用pythonapifrom 调用完全相同的函数ctypes


最后,提出了其他一些答案ctypes.addressof。这与这里无关。这仅适用于ctypes类似的对象c_int32(可能还有一些类似内存缓冲区的对象,如所提供的对象numpy)。而且,即使在那儿,它也没有为您提供c_int32值的地址,而是为您提供int32c_int32包装的C级地址。

话虽这么说,但通常情况下,如果您确实认为自己需要某个东西的地址,那么首先就不需要原生Python对象,而是想要一个ctypes对象。

There are a few issues here that aren’t covered by any of the other answers.

First, id only returns:

the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.


In CPython, this happens to be the pointer to the PyObject that represents the object in the interpreter, which is the same thing that object.__repr__ displays. But this is just an implementation detail of CPython, not something that’s true of Python in general. Jython doesn’t deal in pointers, it deals in Java references (which the JVM of course probably represents as pointers, but you can’t see those—and wouldn’t want to, because the GC is allowed to move them around). PyPy lets different types have different kinds of id, but the most general is just an index into a table of objects you’ve called id on, which is obviously not going to be a pointer. I’m not sure about IronPython, but I’d suspect it’s more like Jython than like CPython in this regard. So, in most Python implementations, there’s no way to get whatever showed up in that repr, and no use if you did.


But what if you only care about CPython? That’s a pretty common case, after all.

Well, first, you may notice that id is an integer;* if you want that 0x2aba1c0cf890 string instead of the number 46978822895760, you’re going to have to format it yourself. Under the covers, I believe object.__repr__ is ultimately using printf‘s %p format, which you don’t have from Python… but you can always do this:

format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#18x')

* In 3.x, it’s an int. In 2.x, it’s an int if that’s big enough to hold a pointer—which is may not be because of signed number issues on some platforms—and a long otherwise.

Is there anything you can do with these pointers besides print them out? Sure (again, assuming you only care about CPython).

All of the C API functions take a pointer to a PyObject or a related type. For those related types, you can just call PyFoo_Check to make sure it really is a Foo object, then cast with (PyFoo *)p. So, if you’re writing a C extension, the id is exactly what you need.

What if you’re writing pure Python code? You can call the exact same functions with pythonapi from ctypes.


Finally, a few of the other answers have brought up ctypes.addressof. That isn’t relevant here. This only works for ctypes objects like c_int32 (and maybe a few memory-buffer-like objects, like those provided by numpy). And, even there, it isn’t giving you the address of the c_int32 value, it’s giving you the address of the C-level int32 that the c_int32 wraps up.

That being said, more often than not, if you really think you need the address of something, you didn’t want a native Python object in the first place, you wanted a ctypes object.


回答 4

仅作为对Torsten的回应,我无法调用addressof()常规的python对象。此外,id(a) != addressof(a)。这是在CPython中,对其他什么都不知道。

>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392

Just in response to Torsten, I wasn’t able to call addressof() on a regular python object. Furthermore, id(a) != addressof(a). This is in CPython, don’t know about anything else.

>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392

回答 5

使用ctypes,您可以使用

>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L

说明文件:

addressof(C instance) -> integer
返回C实例内部缓冲区的地址

请注意,在CPython中,当前是id(a) == ctypes.addressof(a),但是ctypes.addressof应返回每个Python实现的真实地址,如果

  • 支持ctypes
  • 内存指针是一个有效的概念。

编辑:添加了有关ctypes解释器独立性的信息

With ctypes, you can achieve the same thing with

>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L

Documentation:

addressof(C instance) -> integer
Return the address of the C instance internal buffer

Note that in CPython, currently id(a) == ctypes.addressof(a), but ctypes.addressof should return the real address for each Python implementation, if

  • ctypes is supported
  • memory pointers are a valid notion.

Edit: added information about interpreter-independence of ctypes


回答 6

您可以通过以下方式获得适合该目的的东西:

id(self)

You can get something suitable for that purpose with:

id(self)

回答 7

我知道这是一个老问题,但是如果您现在仍在使用python 3编程,我实际上发现如果它是字符串,那么有一种非常简单的方法可以做到这一点:

>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296

字符串转换也不影响内存中的位置:

>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'

I know this is an old question but if you’re still programming, in python 3 these days… I have actually found that if it is a string, then there is a really easy way to do this:

>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296

string conversion does not affect location in memory either:

>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'

回答 8

虽然确实id(object)可以在默认的CPython实现中获取对象的地址,但这通常是无用的……您无法纯Python代码中的地址进行任何操作。

实际上,唯一可以使用该地址的时间是来自C扩展库…在这种情况下,获取对象的地址很简单,因为Python对象始终作为C指针传递。

While it’s true that id(object) gets the object’s address in the default CPython implementation, this is generally useless… you can’t do anything with the address from pure Python code.

The only time you would actually be able to use the address is from a C extension library… in which case it is trivial to get the object’s address since Python objects are always passed around as C pointers.


如何访问给定字符串的对象属性,该字符串对应于该属性的名称

问题:如何访问给定字符串的对象属性,该字符串对应于该属性的名称

如何设置/获取t给定的属性的值x

class Test:
   def __init__(self):
       self.attr1 = 1
       self.attr2 = 2

t = Test()
x = "attr1"

How do you set/get the values of attributes of t given by x?

class Test:
   def __init__(self):
       self.attr1 = 1
       self.attr2 = 2

t = Test()
x = "attr1"

回答 0

有称为getattr和的内置函数。setattr

getattr(object, attrname)
setattr(object, attrname, value)

在这种情况下

x = getattr(t, 'attr1')
setattr(t, 'attr1', 21)

There are built-in functions called getattr and setattr

getattr(object, attrname)
setattr(object, attrname, value)

In this case

x = getattr(t, 'attr1')
setattr(t, 'attr1', 21)

回答 1

注意:此答案非常过时。它适用new使用2008不推荐使用的模块的Python 2 。

有内置的python函数setattr和getattr。可以用来设置和获取类的属性。

一个简单的例子:

>>> from new import  classobj

>>> obj = classobj('Test', (object,), {'attr1': int, 'attr2': int}) # Just created a class

>>> setattr(obj, 'attr1', 10)

>>> setattr(obj, 'attr2', 20)

>>> getattr(obj, 'attr1')
10

>>> getattr(obj, 'attr2')
20

Note: This answer is very outdated. It applies to Python 2 using the new module that was deprecated in 2008.

There is python built in functions setattr and getattr. Which can used to set and get the attribute of an class.

A brief example:

>>> from new import  classobj

>>> obj = classobj('Test', (object,), {'attr1': int, 'attr2': int}) # Just created a class

>>> setattr(obj, 'attr1', 10)

>>> setattr(obj, 'attr2', 20)

>>> getattr(obj, 'attr1')
10

>>> getattr(obj, 'attr2')
20

回答 2

如果要将逻辑隐藏在类内部,则可能更喜欢使用通用的getter方法,如下所示:

class Test:
    def __init__(self):
        self.attr1 = 1
        self.attr2 = 2

    def get(self,varname):
        return getattr(self,varname)

t = Test()
x = "attr1"
print ("Attribute value of {0} is {1}".format(x, t.get(x)))

输出:

Attribute value of attr1 is 1

可以更好地隐藏它的另一个方法是使用magic方法__getattribute__,但是我一直遇到一个无限循环,当尝试在该方法中检索属性值时,我无法解决。

另请注意,您也可以使用vars()。在上面的示例中,您可以getattr(self,varname)通过进行交换return vars(self)[varname],但getattr根据之间的区别varssetattr是最好的

If you want to keep the logic hidden inside the class, you may prefer to use a generalized getter method like so:

class Test:
    def __init__(self):
        self.attr1 = 1
        self.attr2 = 2

    def get(self,varname):
        return getattr(self,varname)

t = Test()
x = "attr1"
print ("Attribute value of {0} is {1}".format(x, t.get(x)))

Outputs:

Attribute value of attr1 is 1

Another apporach that could hide it even better would be using the magic method __getattribute__, but I kept getting an endless loop which I was unable to resolve when trying to get retrieve the attribute value inside that method.

Also note that you can alternatively use vars(). In the above example, you could exchange getattr(self,varname) by return vars(self)[varname], but getattrmight be preferable according to the answer to What is the difference between vars and setattr?.


您如何以编程方式设置属性?

问题:您如何以编程方式设置属性?

假设我有一个python对象x和一个字符串s,如何将属性设置为son x?所以:

>>> x = SomeObject()
>>> attr = 'myAttr'
>>> # magic goes here
>>> x.myAttr
'magic'

魔术是什么?顺便说一下,这样做的目的是将对的调用缓存x.__getattr__()

Suppose I have a python object x and a string s, how do I set the attribute s on x? So:

>>> x = SomeObject()
>>> attr = 'myAttr'
>>> # magic goes here
>>> x.myAttr
'magic'

What’s the magic? The goal of this, incidentally, is to cache calls to x.__getattr__().


回答 0

setattr(x, attr, 'magic')

寻求帮助:

>>> help(setattr)
Help on built-in function setattr in module __builtin__:

setattr(...)
    setattr(object, name, value)

    Set a named attribute on an object; setattr(x, 'y', v) is equivalent to
    ``x.y = v''.

编辑:但是,您应该注意(如注释中所指出),您不能对的“纯”实例执行此操作object。但是很可能您有一个简单的对象子类,可以很好地工作。我强烈敦促OP不要创建这样的对象实例。

setattr(x, attr, 'magic')

For help on it:

>>> help(setattr)
Help on built-in function setattr in module __builtin__:

setattr(...)
    setattr(object, name, value)

    Set a named attribute on an object; setattr(x, 'y', v) is equivalent to
    ``x.y = v''.

Edit: However, you should note (as pointed out in a comment) that you can’t do that to a “pure” instance of object. But it is likely you have a simple subclass of object where it will work fine. I would strongly urge the O.P. to never make instances of object like that.


回答 1

通常,我们为此定义类。

class XClass( object ):
   def __init__( self ):
       self.myAttr= None

x= XClass()
x.myAttr= 'magic'
x.myAttr

但是,您可以在一定程度上使用setattrgetattr内置函数来执行此操作。但是,它们不适用于object直接实例。

>>> a= object()
>>> setattr( a, 'hi', 'mom' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'hi'

但是,它们确实适用于各种简单的类。

class YClass( object ):
    pass

y= YClass()
setattr( y, 'myAttr', 'magic' )
y.myAttr

Usually, we define classes for this.

class XClass( object ):
   def __init__( self ):
       self.myAttr= None

x= XClass()
x.myAttr= 'magic'
x.myAttr

However, you can, to an extent, do this with the setattr and getattr built-in functions. However, they don’t work on instances of object directly.

>>> a= object()
>>> setattr( a, 'hi', 'mom' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'hi'

They do, however, work on all kinds of simple classes.

class YClass( object ):
    pass

y= YClass()
setattr( y, 'myAttr', 'magic' )
y.myAttr

回答 2

让x成为一个对象,那么你可以通过两种方式做到这一点

x.attr_name = s 
setattr(x, 'attr_name', s)

let x be an object then you can do it two ways

x.attr_name = s 
setattr(x, 'attr_name', s)

如何在Python中创建对象的副本?

问题:如何在Python中创建对象的副本?

我想创建一个对象的副本。我希望新对象拥有旧对象的所有属性(字段的值)。但是我想拥有独立的对象。因此,如果我更改新对象的字段的值,则旧对象不应受此影响。

I would like to create a copy of an object. I want the new object to possess all properties of the old object (values of the fields). But I want to have independent objects. So, if I change values of the fields of the new object, the old object should not be affected by that.


回答 0

要获得对象的完全独立副本,可以使用copy.deepcopy()函数。

有关浅层复制和深层复制的更多详细信息,请参考该问题的其他答案以及该答案中对相关问题的很好的解释。

To get a fully independent copy of an object you can use the copy.deepcopy() function.

For more details about shallow and deep copying please refer to the other answers to this question and the nice explanation in this answer to a related question.


回答 1

如何在Python中创建对象的副本?

因此,如果我更改新对象的字段的值,则旧对象不应受此影响。

那你的意思是一个可变的对象。

在Python 3中,列表获取copy方法(在2中,您将使用切片来制作副本):

>>> a_list = list('abc')
>>> a_copy_of_a_list = a_list.copy()
>>> a_copy_of_a_list is a_list
False
>>> a_copy_of_a_list == a_list
True

浅副本

浅拷贝只是最外层容器的拷贝。

list.copy 是浅表副本:

>>> list_of_dict_of_set = [{'foo': set('abc')}]
>>> lodos_copy = list_of_dict_of_set.copy()
>>> lodos_copy[0]['foo'].pop()
'c'
>>> lodos_copy
[{'foo': {'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]

您不会获得内部对象的副本。它们是同一个对象-因此,对它们进行突变后,更改会显示在两个容器中。

深拷贝

深层副本是每个内部对象的递归副本。

>>> lodos_deep_copy = copy.deepcopy(list_of_dict_of_set)
>>> lodos_deep_copy[0]['foo'].add('c')
>>> lodos_deep_copy
[{'foo': {'c', 'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]

所做的更改不会反映在原始文档中,而只会反映在副本中。

不变的对象

不可变对象通常不需要复制。实际上,如果您尝试这样做,Python只会为您提供原始对象:

>>> a_tuple = tuple('abc')
>>> tuple_copy_attempt = a_tuple.copy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'copy'

元组甚至没有复制方法,因此让我们用一个切片尝试一下:

>>> tuple_copy_attempt = a_tuple[:]

但是我们看到它是同一对象:

>>> tuple_copy_attempt is a_tuple
True

对于字符串类似:

>>> s = 'abc'
>>> s0 = s[:]
>>> s == s0
True
>>> s is s0
True

对于冻结集,即使它们具有以下copy方法:

>>> a_frozenset = frozenset('abc')
>>> frozenset_copy_attempt = a_frozenset.copy()
>>> frozenset_copy_attempt is a_frozenset
True

何时复制不可变对象

如果需要复制可变的内部对象,则应复制不可变对象。

>>> tuple_of_list = [],
>>> copy_of_tuple_of_list = tuple_of_list[:]
>>> copy_of_tuple_of_list[0].append('a')
>>> copy_of_tuple_of_list
(['a'],)
>>> tuple_of_list
(['a'],)
>>> deepcopy_of_tuple_of_list = copy.deepcopy(tuple_of_list)
>>> deepcopy_of_tuple_of_list[0].append('b')
>>> deepcopy_of_tuple_of_list
(['a', 'b'],)
>>> tuple_of_list
(['a'],)

如我们所见,当副本的内部对象发生突变时,原始对象不会更改。

自定义对象

定制对象通常将数据存储在__dict__属性中或__slots__(类似元组的内存结构中)中。

要制作可复制的对象,请定义__copy__(对于浅层副本)和/或__deepcopy__(对于深层副本)。

from copy import copy, deepcopy

class Copyable:
    __slots__ = 'a', '__dict__'
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __copy__(self):
        return type(self)(self.a, self.b)
    def __deepcopy__(self, memo): # memo is a dict of id's to copies
        id_self = id(self)        # memoization avoids unnecesary recursion
        _copy = memo.get(id_self)
        if _copy is None:
            _copy = type(self)(
                deepcopy(self.a, memo), 
                deepcopy(self.b, memo))
            memo[id_self] = _copy 
        return _copy

请注意,deepcopy保留id(original)要复制的(或身份编号)备忘录字典。为了享受递归数据结构的良好行为,请确保您尚未制作副本,如果有,请返回该副本。

因此,让我们创建一个对象:

>>> c1 = Copyable(1, [2])

copy进行浅表复制:

>>> c2 = copy(c1)
>>> c1 is c2
False
>>> c2.b.append(3)
>>> c1.b
[2, 3]

deepcopy现在做了深刻的副本:

>>> c3 = deepcopy(c1)
>>> c3.b.append(4)
>>> c1.b
[2, 3]

How can I create a copy of an object in Python?

So, if I change values of the fields of the new object, the old object should not be affected by that.

You mean a mutable object then.

In Python 3, lists get a copy method (in 2, you’d use a slice to make a copy):

>>> a_list = list('abc')
>>> a_copy_of_a_list = a_list.copy()
>>> a_copy_of_a_list is a_list
False
>>> a_copy_of_a_list == a_list
True

Shallow Copies

Shallow copies are just copies of the outermost container.

list.copy is a shallow copy:

>>> list_of_dict_of_set = [{'foo': set('abc')}]
>>> lodos_copy = list_of_dict_of_set.copy()
>>> lodos_copy[0]['foo'].pop()
'c'
>>> lodos_copy
[{'foo': {'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]

You don’t get a copy of the interior objects. They’re the same object – so when they’re mutated, the change shows up in both containers.

Deep copies

Deep copies are recursive copies of each interior object.

>>> lodos_deep_copy = copy.deepcopy(list_of_dict_of_set)
>>> lodos_deep_copy[0]['foo'].add('c')
>>> lodos_deep_copy
[{'foo': {'c', 'b', 'a'}}]
>>> list_of_dict_of_set
[{'foo': {'b', 'a'}}]

Changes are not reflected in the original, only in the copy.

Immutable objects

Immutable objects do not usually need to be copied. In fact, if you try to, Python will just give you the original object:

>>> a_tuple = tuple('abc')
>>> tuple_copy_attempt = a_tuple.copy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'copy'

Tuples don’t even have a copy method, so let’s try it with a slice:

>>> tuple_copy_attempt = a_tuple[:]

But we see it’s the same object:

>>> tuple_copy_attempt is a_tuple
True

Similarly for strings:

>>> s = 'abc'
>>> s0 = s[:]
>>> s == s0
True
>>> s is s0
True

and for frozensets, even though they have a copy method:

>>> a_frozenset = frozenset('abc')
>>> frozenset_copy_attempt = a_frozenset.copy()
>>> frozenset_copy_attempt is a_frozenset
True

When to copy immutable objects

Immutable objects should be copied if you need a mutable interior object copied.

>>> tuple_of_list = [],
>>> copy_of_tuple_of_list = tuple_of_list[:]
>>> copy_of_tuple_of_list[0].append('a')
>>> copy_of_tuple_of_list
(['a'],)
>>> tuple_of_list
(['a'],)
>>> deepcopy_of_tuple_of_list = copy.deepcopy(tuple_of_list)
>>> deepcopy_of_tuple_of_list[0].append('b')
>>> deepcopy_of_tuple_of_list
(['a', 'b'],)
>>> tuple_of_list
(['a'],)

As we can see, when the interior object of the copy is mutated, the original does not change.

Custom Objects

Custom objects usually store data in a __dict__ attribute or in __slots__ (a tuple-like memory structure.)

To make a copyable object, define __copy__ (for shallow copies) and/or __deepcopy__ (for deep copies).

from copy import copy, deepcopy

class Copyable:
    __slots__ = 'a', '__dict__'
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __copy__(self):
        return type(self)(self.a, self.b)
    def __deepcopy__(self, memo): # memo is a dict of id's to copies
        id_self = id(self)        # memoization avoids unnecesary recursion
        _copy = memo.get(id_self)
        if _copy is None:
            _copy = type(self)(
                deepcopy(self.a, memo), 
                deepcopy(self.b, memo))
            memo[id_self] = _copy 
        return _copy

Note that deepcopy keeps a memoization dictionary of id(original) (or identity numbers) to copies. To enjoy good behavior with recursive data structures, make sure you haven’t already made a copy, and if you have, return that.

So let’s make an object:

>>> c1 = Copyable(1, [2])

And copy makes a shallow copy:

>>> c2 = copy(c1)
>>> c1 is c2
False
>>> c2.b.append(3)
>>> c1.b
[2, 3]

And deepcopy now makes a deep copy:

>>> c3 = deepcopy(c1)
>>> c3.b.append(4)
>>> c1.b
[2, 3]

回答 2

浅复制 copy.copy()

#!/usr/bin/env python3

import copy

class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]

# It copies.
c = C()
d = copy.copy(c)
d.x = [3]
assert c.x == [1]
assert d.x == [3]

# It's shallow.
c = C()
d = copy.copy(c)
d.x[0] = 3
assert c.x == [3]
assert d.x == [3]

深度复制 copy.deepcopy()

#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
c = C()
d = copy.deepcopy(c)
d.x[0] = 3
assert c.x == [1]
assert d.x == [3]

文档:https : //docs.python.org/3/library/copy.html

在Python 3.6.5上测试。

Shallow copy with copy.copy()

#!/usr/bin/env python3

import copy

class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]

# It copies.
c = C()
d = copy.copy(c)
d.x = [3]
assert c.x == [1]
assert d.x == [3]

# It's shallow.
c = C()
d = copy.copy(c)
d.x[0] = 3
assert c.x == [3]
assert d.x == [3]

Deep copy with copy.deepcopy()

#!/usr/bin/env python3
import copy
class C():
    def __init__(self):
        self.x = [1]
        self.y = [2]
c = C()
d = copy.deepcopy(c)
d.x[0] = 3
assert c.x == [1]
assert d.x == [3]

Documentation: https://docs.python.org/3/library/copy.html

Tested on Python 3.6.5.


回答 3

我相信以下内容应适用于许多行为良好的Python类:

def copy(obj):
    return type(obj)(obj)

(当然,我在这里不是在谈论“深层副本”,这是一个不同的故事,并且可能不是一个非常清晰的概念-足够深的深度?)

根据我对Python 3的测试,对于不可变的对象(例如元组或字符串),它返回相同的对象(因为不需要对不可变的对象进行浅表复制),但是对于列表或字典,它创建了独立的浅表复制。

当然,此方法仅适用于构造函数具有相应行为的类。可能的用例:制作标准Python容器类的浅表副本。

I believe the following should work with many well-behaved classed in Python:

def copy(obj):
    return type(obj)(obj)

(Of course, I am not talking here about “deep copies,” which is a different story, and which may be not a very clear concept — how deep is deep enough?)

According to my tests with Python 3, for immutable objects, like tuples or strings, it returns the same object (because there is no need to make a shallow copy of an immutable object), but for lists or dictionaries it creates an independent shallow copy.

Of course this method only works for classes whose constructors behave accordingly. Possible use cases: making a shallow copy of a standard Python container class.


super()失败,并显示错误:当父级未从对象继承时,TypeError“参数1必须为类型,而不是classobj”

问题:super()失败,并显示错误:当父级未从对象继承时,TypeError“参数1必须为类型,而不是classobj”

我收到一些我不知道的错误。任何线索我的示例代码有什么问题吗?

class B:
    def meth(self, arg):
        print arg

class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)

print C().meth(1)

我从“ super”内置方法的帮助下获得了示例测试代码。

这是错误:

Traceback (most recent call last):
  File "./test.py", line 10, in ?
    print C().meth(1)
  File "./test.py", line 8, in meth
    super(C, self).meth(arg)
TypeError: super() argument 1 must be type, not classobj

仅供参考,这是python本身的帮助(超级):

Help on class super in module __builtin__:

class super(object)
 |  super(type) -> unbound super object
 |  super(type, obj) -> bound super object; requires isinstance(obj, type)
 |  super(type, type2) -> bound super object; requires issubclass(type2, type)
 |  Typical use to call a cooperative superclass method:
 |  class C(B):
 |      def meth(self, arg):
 |          super(C, self).meth(arg)
 |

I get some error that I can’t figure out. Any clue what is wrong with my sample code?

class B:
    def meth(self, arg):
        print arg

class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)

print C().meth(1)

I got the sample test code from help of ‘super’ built-in method.

Here is the error:

Traceback (most recent call last):
  File "./test.py", line 10, in ?
    print C().meth(1)
  File "./test.py", line 8, in meth
    super(C, self).meth(arg)
TypeError: super() argument 1 must be type, not classobj

FYI, here is the help(super) from python itself:

Help on class super in module __builtin__:

class super(object)
 |  super(type) -> unbound super object
 |  super(type, obj) -> bound super object; requires isinstance(obj, type)
 |  super(type, type2) -> bound super object; requires issubclass(type2, type)
 |  Typical use to call a cooperative superclass method:
 |  class C(B):
 |      def meth(self, arg):
 |          super(C, self).meth(arg)
 |

回答 0

您的问题是类B没有声明为“新式”类。像这样更改它:

class B(object):

它会工作。

super()并且所有子类/超类的内容仅适用于新型类。我建议您养成始终(object)在任何类定义上键入它的习惯,以确保它是一种新型的类。

旧式类(也称为“经典”类)始终为type classobj;新样式类的类型为type。这就是为什么您看到错误消息的原因:

TypeError: super() argument 1 must be type, not classobj

试试看自己:

class OldStyle:
    pass

class NewStyle(object):
    pass

print type(OldStyle)  # prints: <type 'classobj'>

print type(NewStyle) # prints <type 'type'>

请注意,在Python 3.x中,所有类都是新样式。您仍然可以使用旧样式类中的语法,但是会获得新样式类。因此,在Python 3.x中,您将不会遇到此问题。

Your problem is that class B is not declared as a “new-style” class. Change it like so:

class B(object):

and it will work.

super() and all subclass/superclass stuff only works with new-style classes. I recommend you get in the habit of always typing that (object) on any class definition to make sure it is a new-style class.

Old-style classes (also known as “classic” classes) are always of type classobj; new-style classes are of type type. This is why you got the error message you saw:

TypeError: super() argument 1 must be type, not classobj

Try this to see for yourself:

class OldStyle:
    pass

class NewStyle(object):
    pass

print type(OldStyle)  # prints: <type 'classobj'>

print type(NewStyle) # prints <type 'type'>

Note that in Python 3.x, all classes are new-style. You can still use the syntax from the old-style classes but you get a new-style class. So, in Python 3.x you won’t have this problem.


回答 1

另外,如果您不能更改类B,则可以使用多重继承来修复错误。

class B:
    def meth(self, arg):
        print arg

class C(B, object):
    def meth(self, arg):
        super(C, self).meth(arg)

print C().meth(1)

Also, if you can’t change class B, you can fix the error by using multiple inheritance.

class B:
    def meth(self, arg):
        print arg

class C(B, object):
    def meth(self, arg):
        super(C, self).meth(arg)

print C().meth(1)

回答 2

如果python版本是3.X,就可以了。

我认为您的python版本是2.X,在添加此代码时,超级将可用

__metaclass__ = type

所以代码是

__metaclass__ = type
class B:
    def meth(self, arg):
        print arg
class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)
print C().meth(1)

If the python version is 3.X, it’s okay.

I think your python version is 2.X, the super would work when adding this code

__metaclass__ = type

so the code is

__metaclass__ = type
class B:
    def meth(self, arg):
        print arg
class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)
print C().meth(1)

回答 3

当我使用python 2.7时,也会遇到发布的问题。它在python 3.4下工作得很好

为了使其在python 2.7中工作,我__metaclass__ = type在程序顶部添加了该属性,并且该属性可以正常工作。

__metaclass__ :简化了从旧样式类到新样式类的过渡。

I was also faced by the posted issue when I used python 2.7. It is working very fine with python 3.4

To make it work in python 2.7 I have added the __metaclass__ = type attribute at the top of my program and it worked.

__metaclass__ : It eases the transition from old-style classes and new-style classes.


保存对象(数据持久性)

问题:保存对象(数据持久性)

我创建了一个像这样的对象:

company1.name = 'banana' 
company1.value = 40

我想保存该对象。我怎样才能做到这一点?

I’ve created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?


回答 0

您可以使用pickle标准库中的模块。这是您的示例的基本应用:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

您还可以定义自己的简单实用程序,如下所示,该实用程序打开文件并向其中写入单个对象:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')

更新资料

由于这是一个很受欢迎的答案,因此,我想谈谈一些较高级的用法主题。

cPickle(或_pickle)与pickle

实际使用cPickle模块几乎总是可取的,而不是pickle因为模块是用C编写的并且速度更快。它们之间有一些细微的差异,但是在大多数情况下它们是等效的,并且C版本将提供非常优越的性能。切换到它再简单不过,只需将import语句更改为:

import cPickle as pickle

在Python 3中,它cPickle已被重命名_pickle,但是不再需要执行此操作,因为该pickle模块现在可以自动执行此操作-请参阅python 3中的pickle和_pickle有什么区别?

总结是,您可以使用类似以下内容的代码来确保您的代码在Python 2和3中都可用时始终使用C版本:

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

数据流格式(协议)

pickle可以读写多种不同的特定于Python的格式的文件,称为文档中所述的协议,“协议版本0”为ASCII,因此“易于阅读”。> 0的版本是二进制的,可用的最高版本取决于所使用的Python版本。默认值还取决于Python版本。在Python 2中,默认值是Protocol版本,但在Python 3.8.1中,它是Protocol版本。在Python 3.x中,该模块已添加,但在Python 2中不存在。04pickle.DEFAULT_PROTOCOL

幸运的是,pickle.HIGHEST_PROTOCOL在每个调用中都有一个写速记的方法(假设这就是您想要的,并且您通常会这样做),只需使用文字数字-1-类似于通过负索引引用序列的最后一个元素。因此,与其编写:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

您可以这样写:

pickle.dump(obj, output, -1)

无论哪种方式,如果您创建了一个Pickler用于多个酸洗操作的对象,则只需指定一次协议:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
   etc...

注意:如果您正在运行不同版本的Python的环境中,则可能需要显式地使用(即,硬编码)它们都可以读取的特定协议编号(较新的版本通常可以读取较早版本产生的文件) 。

多个物件

虽然泡菜文件可以包含如上述样品中,当有这些数目不详的任何数量的腌制对象的,它往往更容易将其全部保存在某种可变大小的容器,就像一个listtupledict写字一次调用即可将它们全部保存到文件中:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

然后使用以下命令恢复列表及其中的所有内容:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

主要优点是您无需知道要保存多少个对象实例即可在以后加载它们(尽管如果没有该信息可以这样做,但它需要一些专门的代码)。请参阅相关问题的答案在腌制文件中保存和加载多个对象?有关执行此操作的不同方法的详细信息。个人喜欢@Lutz Prechelt的答案。它适用于此处的示例:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')

Update

Since this is such a popular answer, I’d like touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

It’s almost always preferable to actually use the cPickle module rather than pickle because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version will provide greatly superior performance. Switching to it couldn’t be easier, just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is you could use something like the following to ensure that your code will always use the C version when it’s available in both Python 2 and 3:

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific, formats, called protocols as described in the documentation, “Protocol version 0” is ASCII and therefore “human-readable”. Versions > 0 are binary and the highest one available depends on what version of Python is being used. The default also depends on Python version. In Python 2 the default was Protocol version 0, but in Python 3.8.1, it’s Protocol version 4. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL added to it, but that doesn’t exist in Python 2.

Fortunately there’s shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that’s what you want, and you usually do), just use the literal number -1 — similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)

Either way, you’d only have specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
   etc...

Note: If you’re in an environment running different versions of Python, then you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there’s an unknown number of them, it’s often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

The major advantage is you don’t need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally I like @Lutz Prechelt’s answer the best. Here’s it adapted to the examples here:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

回答 1

我认为,假设该对象是个,这是一个很强的假设class。如果不是,该class怎么办?还有一种假设是该对象未在解释器中定义。如果在解释器中定义该怎么办?另外,如果属性是动态添加的,该怎么办?当某些python对象__dict__在创建后向其添加了属性时,pickle就不考虑这些属性的添加(即“忘记”了它们的添加-因为pickle是通过引用对象定义进行序列化的)。

在所有这些情况,pickle并且cPickle可以可怕的失败你。

如果您要保存object(任意创建的)具有属性(在对象定义中添加,或之后添加)的属性,则最好的选择是使用dill,它可以在python中序列化几乎所有内容。

我们从上课开始…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

现在关闭,然后重新启动…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

糟糕… pickle无法处理。让我们尝试一下dill。我们将引入另一种对象类型(a lambda)以取得良好的效果。

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

现在读取文件。

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

有用。pickle失败的原因(dill并非如此)是(在大多数情况下)dill将其视为__main__一个模块,并且还可以腌制类定义而不是通过引用进行腌制(就像这样pickle做)。dill腌制a 的原因lambda是它给它起了个名字……然后就会出现腌制魔术。

实际上,有一种保存所有这些对象的简便方法,尤其是当您创建了很多对象时。只需转储整个python会话,然后稍后再返回即可。

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

现在关闭计算机,享用意式浓缩咖啡或其他任何东西,然后再回来…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

唯一的主要缺点是它dill不是python标准库的一部分。因此,如果您无法在服务器上安装python软件包,则无法使用它。

但是,如果你能够在系统上安装Python包,你可以得到最新的dillgit+https://github.com/uqfoundation/dill.git@master#egg=dill。您可以使用下载最新版本pip install dill

I think it’s a pretty strong assumption to assume that the object is a class. What if it’s not a class? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added — because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can’t handle it. Let’s try dill. We’ll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn’t, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.

Actually, there’s an easier way to save all these objects, especially if you have a lot of objects you’ve created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can’t install a python package on your server, then you can’t use it.

However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.


回答 2

您可以使用anycache为您完成这项工作。它考虑了所有细节:

  • 它使用莳萝作为后端,扩展了python pickle模块以处理lambda和所有不错的python功能。
  • 它将不同的对象存储到不同的文件,并正确地重新加载它们。
  • 限制缓存大小
  • 允许清除缓存
  • 允许在多次运行之间共享对象
  • 允许尊重会影响结果的输入文件

假设您有一个myfunc创建实例的函数:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value)
    return Company(name, value)

Anycache首次调用myfunc,并cachedir使用唯一标识符(取决于函数名称及其参数)作为文件名将结果腌制到文件中。在任何连续运行中,将加载已腌制的对象。如果在cachedir两次python运行之间保留了,则腌制的对象将从先前的python运行中获取。

有关更多详细信息,请参见文档

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as backend, which extends the python pickle module to handle lambda and all the nice python features.
  • It stores different objects to different files and reloads them properly.
  • Limits cache size
  • Allows cache clearing
  • Allows sharing of objects between multiple runs
  • Allows respect of input files which influence the result

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value)
    return Company(name, value)

Anycache calls myfunc at the first time and pickles the result to a file in cachedir using an unique identifier (depending on the function name and its arguments) as filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.

For any further details see the documentation


回答 3

使用company1您的问题和python3的快速示例。

import pickle

# Save the file
pickle.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = pickle.load(open("company1.pickle", "rb"))

但是,正如该答案指出的那样,泡菜经常失败。所以你应该真正使用dill

import dill

# Save the file
dill.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = dill.load(open("company1.pickle", "rb"))

Quick example using company1 from your question, with python3.

import pickle

# Save the file
pickle.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = pickle.load(open("company1.pickle", "rb"))

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the file
dill.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = dill.load(open("company1.pickle", "rb"))