How to override the copy/deepcopy operations for a Python object?

Question

I understand the difference between copy vs. deepcopy in the copy module. I’ve used copy.copy and copy.deepcopy before successfully, but this is the first time I’ve actually gone about overloading the __copy__ and __deepcopy__ methods. I’ve already Googled around and looked through the built-in Python modules to look for instances of the __copy__ and __deepcopy__ functions (e.g. sets.py, decimal.py, and fractions.py), but I’m still not 100% sure I’ve got it right.

Here’s my scenario:

I have a configuration object. Initially, I’m going to instantiate one configuration object with a default set of values. This configuration will be handed off to multiple other objects (to ensure all objects start with the same configuration). However, once user interaction starts, each object needs to tweak its configuration independently without affecting the others’ configurations (which says to me I’ll need to make deep copies of my initial configuration to hand around).

Here’s a sample object:

class ChartConfig(object):

    def __init__(self):

        #Drawing properties (Booleans/strings)
        self.antialiased = None
        self.plot_style = None
        self.plot_title = None
        self.autoscale = None

        #X axis properties (strings/ints)
        self.xaxis_title = None
        self.xaxis_tick_rotation = None
        self.xaxis_tick_align = None

        #Y axis properties (strings/ints)
        self.yaxis_title = None
        self.yaxis_tick_rotation = None
        self.yaxis_tick_align = None

        #A list of non-primitive objects
        self.trace_configs = []

    def __copy__(self):
        pass

    def __deepcopy__(self, memo):
        pass 

What is the right way to implement the copy and deepcopy methods on this object to ensure copy.copy and copy.deepcopy give me the proper behavior?


Answer 0

The recommendations for customizing are at the very end of the docs page:

Classes can use the same interfaces to control copying that they use to control pickling. See the description of module pickle for information on these methods. The copy module does not use the copy_reg registration module.

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Since you appear not to care about pickling customization, defining __copy__ and __deepcopy__ definitely seems like the right way to go for you.

Specifically, __copy__ (the shallow copy) is pretty easy in your case…:

def __copy__(self):
  newone = type(self)()
  newone.__dict__.update(self.__dict__)
  return newone

__deepcopy__ would be similar (accepting a memo arg too), but before the return it would have to assign newone.foo = deepcopy(self.foo, memo) for any attribute self.foo that needs deep copying (essentially attributes that are containers: lists, dicts, and non-primitive objects that hold other stuff through their __dict__s).
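
As a rough sketch of that description applied to the question's ChartConfig (trimmed to a few attributes here), the two methods might look like the following. TraceConfig is a hypothetical stand-in for the "non-primitive objects" kept in trace_configs, and registering the new object in memo is an extra precaution not spelled out in the text above.

```python
from copy import copy, deepcopy


class TraceConfig(object):
    """Hypothetical stand-in for the objects stored in trace_configs."""

    def __init__(self, color):
        self.color = color


class ChartConfig(object):
    def __init__(self):
        self.plot_title = None
        self.xaxis_title = None
        self.trace_configs = []

    def __copy__(self):
        # Shallow copy: a new object whose attributes reference the same values.
        newone = type(self)()
        newone.__dict__.update(self.__dict__)
        return newone

    def __deepcopy__(self, memo):
        newone = type(self)()
        # Register the copy before recursing so cyclic references resolve to it.
        memo[id(self)] = newone
        newone.__dict__.update(self.__dict__)
        # Only the container attribute needs an explicit deep copy here.
        newone.trace_configs = deepcopy(self.trace_configs, memo)
        return newone


default = ChartConfig()
default.plot_title = "Default"
default.trace_configs.append(TraceConfig("red"))

shallow = copy(default)
deep = deepcopy(default)

# The shallow copy shares the list; the deep copy has its own list and items.
print(shallow.trace_configs is default.trace_configs)
print(deep.trace_configs is default.trace_configs)
print(deep.trace_configs[0] is default.trace_configs[0])
```

Note that type(self)() reruns __init__ with no arguments, so this shape only fits classes whose constructor can be called that way.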


Answer 1

Putting together Alex Martelli’s answer and Rob Young’s comment you get the following code:

from copy import copy, deepcopy

class A(object):
    def __init__(self):
        print('init')
        self.v = 10
        self.z = [2,3,4]

    def __copy__(self):
        # Create the new object without calling __init__, then share attributes.
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        # Register the copy before recursing into the attributes.
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

a = A()
a.v = 11
b1, b2 = copy(a), deepcopy(a)
a.v = 12
a.z.append(5)
print(b1.v, b1.z)
print(b2.v, b2.z)

This prints:

init
11 [2, 3, 4, 5]
11 [2, 3, 4]

Here __deepcopy__ fills in the memo dict before copying the attributes, to avoid excess copying (and endless recursion) in case the object itself is referenced from one of its members.
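
To illustrate why the memo entry matters, here is a small sketch with a reference cycle. The Node class and its attribute names are made up for this example. Because the copy is registered in memo before its attributes are copied, the reference back to the root resolves to the new root instead of triggering another copy.

```python
from copy import deepcopy


class Node(object):
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.owner = None

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        # Register the copy first, so any reference back to self found
        # while copying the attributes resolves to result.
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


root = Node("root")
child = Node("child")
root.children.append(child)
child.owner = root  # cycle: root -> child -> root

root_copy = deepcopy(root)
child_copy = root_copy.children[0]

print(child_copy.owner is root_copy)  # True: the cycle is rebuilt inside the copy
print(child_copy.owner is root)       # False: nothing points back to the original
```

Without the memo[id(self)] = result line, copying child.owner would call root.__deepcopy__ again and the recursion would never terminate.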


Answer 2

Following Peter’s excellent answer, here is a way to implement a custom deepcopy with minimal alteration to the default implementation (e.g. just modifying one field, as I needed):

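from copy import deepcopy

class Foo(object):
    def __deepcopy__(self, memo):
        # Temporarily shadow __deepcopy__ on the instance so the nested
        # deepcopy() call uses the default behavior instead of recursing.
        deepcopy_method = self.__deepcopy__
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        self.__deepcopy__ = deepcopy_method
        # Rebind the method to the copy; assigning deepcopy_method as-is
        # would leave the copy calling the original's bound method.
        cp.__deepcopy__ = deepcopy_method.__func__.__get__(cp, cp.__class__)

        # custom treatments
        # for instance: cp.id = None

        return cp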
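
A usage sketch of this pattern, assuming a hypothetical Record class whose id should be reset on every copy. This variant rebinds the method to the copy and restores the original in a finally block; both are small deviations chosen here so that copies of copies keep working, not something the pattern strictly requires.

```python
from copy import deepcopy


class Record(object):
    """Hypothetical class: everything is deep-copied except that id is reset."""

    def __init__(self, id, tags):
        self.id = id
        self.tags = tags

    def __deepcopy__(self, memo):
        deepcopy_method = self.__deepcopy__
        # Shadow the method with None so deepcopy() falls back to its default.
        self.__deepcopy__ = None
        try:
            cp = deepcopy(self, memo)
        finally:
            self.__deepcopy__ = deepcopy_method
        # Bind the same function to the copy so the copy can be copied too.
        cp.__deepcopy__ = deepcopy_method.__func__.__get__(cp, cp.__class__)
        # Custom treatment: copies start without an id.
        cp.id = None
        return cp


original = Record(7, ["a", "b"])
clone = deepcopy(original)
clone_of_clone = deepcopy(clone)

print(original.id, clone.id)
print(clone.tags == original.tags, clone.tags is original.tags)
```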

Answer 3

It’s not clear from your problem why you need to override these methods, since you don’t want to do any customization to the copying methods.

Anyhow, if you do want to customize the deep copy (e.g. by sharing some attributes and copying others), here is a solution:

from copy import deepcopy


def deepcopy_with_sharing(obj, shared_attribute_names, memo=None):
    '''
    Deepcopy an object, except for a given list of attributes, which should
    be shared between the original object and its copy.

    obj is some object
    shared_attribute_names: A list of strings identifying the attributes that
        should be shared between the original and its copy.
    memo is the dictionary passed into __deepcopy__.  Ignore this argument if
        not calling from within __deepcopy__.
    '''
    assert isinstance(shared_attribute_names, (list, tuple))
    shared_attributes = {k: getattr(obj, k) for k in shared_attribute_names}

    if hasattr(obj, '__deepcopy__'):
        # Do hack to prevent infinite recursion in call to deepcopy
        deepcopy_method = obj.__deepcopy__
        obj.__deepcopy__ = None

    for attr in shared_attribute_names:
        del obj.__dict__[attr]

    # Pass memo along so objects already copied by an outer deepcopy are reused.
    clone = deepcopy(obj, memo)

    for attr, val in shared_attributes.items():
        setattr(obj, attr, val)
        setattr(clone, attr, val)

    if hasattr(obj, '__deepcopy__'):
        # Undo hack
        obj.__deepcopy__ = deepcopy_method
        del clone.__deepcopy__

    return clone



class A(object):

    def __init__(self):
        self.copy_me = []
        self.share_me = []

    def __deepcopy__(self, memo):
        return deepcopy_with_sharing(self, shared_attribute_names = ['share_me'], memo=memo)

a = A()
b = deepcopy(a)
assert a.copy_me is not b.copy_me
assert a.share_me is b.share_me

c = deepcopy(b)
assert c.copy_me is not b.copy_me
assert c.share_me is b.share_me

Answer 4

I might be a bit off on the specifics, but here goes.

From the copy docs:

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In other words: copy() will copy only the top element and leave the rest as pointers into the original structure. deepcopy() will recursively copy over everything.

That is, deepcopy() is what you need.
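
A quick way to see the difference, using a plain nested dict (made up for illustration) rather than the config class:

```python
import copy

config = {"title": "chart", "traces": [{"color": "red"}]}

shallow = copy.copy(config)
deep = copy.deepcopy(config)

# Mutate a nested object through the original.
config["traces"][0]["color"] = "blue"

print(shallow["traces"][0]["color"])  # blue: the shallow copy shares the nested dict
print(deep["traces"][0]["color"])     # red: the deep copy has its own nested dict
```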

If you need to do something really specific, you can override __copy__() or __deepcopy__(), as described in the manual. Personally, I’d probably implement a plain function (e.g. config.copy_config() or such) to make it plain that it isn’t Python standard behaviour.
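
One possible shape for such a plain function is sketched below. The name copy_config, the overrides parameter, and the trimmed-down ChartConfig are all assumptions made for this example.

```python
import copy


class ChartConfig(object):
    """Trimmed-down version of the question's class."""

    def __init__(self):
        self.plot_title = None
        self.trace_configs = []


def copy_config(config, **overrides):
    """Return an independent deep copy of config with some attributes replaced."""
    new_config = copy.deepcopy(config)
    for name, value in overrides.items():
        # Refuse names the config does not already have, to catch typos early.
        if not hasattr(new_config, name):
            raise AttributeError("unknown config attribute: %s" % name)
        setattr(new_config, name, value)
    return new_config


base = ChartConfig()
base.trace_configs.append("trace-1")

tweaked = copy_config(base, plot_title="Zoomed view")
print(base.plot_title, tweaked.plot_title)
print(tweaked.trace_configs is base.trace_configs)
```

The explicit function name makes it obvious at the call site that something beyond the standard copy behavior may happen.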


Answer 5

The copy module eventually uses the __getstate__()/__setstate__() pickling protocol, so these are also valid targets to override.

The default implementation just returns and sets the __dict__ of the instance, so you don’t have to call super() or worry about Eino Gourdin’s clever trick, above.
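
A minimal sketch of that route, assuming a hypothetical Session class with a _cache attribute that copies should not inherit. Only __getstate__ and __setstate__ are defined; copy.copy and copy.deepcopy pick them up through the default reduce machinery.

```python
import copy


class Session(object):
    """Hypothetical class whose _cache should not be carried over to copies."""

    def __init__(self, name):
        self.name = name
        self.options = {"retries": 3}
        self._cache = {"expensive": object()}

    def __getstate__(self):
        # Hand out everything except the cache.
        state = self.__dict__.copy()
        state.pop("_cache", None)
        return state

    def __setstate__(self, state):
        # Restore the attributes and give the new object an empty cache.
        self.__dict__.update(state)
        self._cache = {}


original = Session("a")
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow.options is original.options)  # True: shallow copy shares the dict
print(deep.options is original.options)     # False: deep copy has its own dict
print(shallow._cache, deep._cache)          # both copies start with an empty cache
```

Keep in mind that the same two methods also affect pickling, which may or may not be what you want.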


Answer 6

Building on Antony Hatchkins’ clean answer, here’s my version where the class in question derives from another custom class (so we need to call super):

import copy

# FooBase is the custom base class, defined elsewhere.
class Foo(FooBase):
    def __init__(self, param1, param2):
        self._base_params = [param1, param2]
        super(Foo, self).__init__(*self._base_params)

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        super(Foo, result).__init__(*self._base_params)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, copy.deepcopy(v, memo))
        super(Foo, result).__init__(*self._base_params)
        return result