Python (and Python C API): __new__ versus __init__

Question: Python (and Python C API): __new__ versus __init__

The question I’m about to ask seems to be a duplicate of “Python’s use of __new__ and __init__?”, but regardless, it’s still unclear to me exactly what the practical difference between __new__ and __init__ is.

Before you rush to tell me that __new__ is for creating objects and __init__ is for initializing objects, let me be clear: I get that. In fact, that distinction is quite natural to me, since I have experience in C++ where we have placement new, which similarly separates object allocation from initialization.

The Python C API tutorial explains it like this:

The new member is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the __new__() method. … One reason to implement a new method is to assure the initial values of instance variables.

So, yeah – I get what __new__ does, but despite this, I still don’t understand why it’s useful in Python. The example given says that __new__ might be useful if you want to “assure the initial values of instance variables”. Well, isn’t that exactly what __init__ will do?

In the C API tutorial, an example is shown where a new Type (called a “Noddy”) is created, and the Type’s __new__ function is defined. The Noddy type contains a string member called first, and this string member is initialized to an empty string like so:

static PyObject * Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    .....

    self->first = PyString_FromString("");
    if (self->first == NULL)
    {
       Py_DECREF(self);
       return NULL;
    }

    .....
}

Note that without the __new__ method defined here, we’d have to use PyType_GenericNew, which simply initializes all of the instance variable members to NULL. So the only benefit of the __new__ method is that the instance variable will start out as an empty string, as opposed to NULL. But why is this ever useful, since if we cared about making sure our instance variables are initialized to some default value, we could have just done that in the __init__ method?


Answer 0

The difference mainly arises with mutable vs immutable types.

__new__ accepts a type as the first argument, and (usually) returns a new instance of that type. Thus it is suitable for use with both mutable and immutable types.

__init__ accepts an instance as the first argument and modifies the attributes of that instance. This is inappropriate for an immutable type, as it would allow instances to be modified after creation by calling obj.__init__(*args).

Compare the behaviour of tuple and list:

>>> x = (1, 2)
>>> x
(1, 2)
>>> x.__init__([3, 4])
>>> x # tuple.__init__ does nothing
(1, 2)
>>> y = [1, 2]
>>> y
[1, 2]
>>> y.__init__([3, 4])
>>> y # list.__init__ reinitialises the object
[3, 4]

As to why they’re separate (aside from simple historical reasons): __new__ methods require a bunch of boilerplate to get right (the initial object creation, and then remembering to return the object at the end). __init__ methods, by contrast, are dead simple, since you just set whatever attributes you need to set.

Aside from __init__ methods being easier to write, and the mutable vs immutable distinction noted above, the separation can also be exploited to make calling the parent class __init__ in subclasses optional by setting up any absolutely required instance invariants in __new__. This is generally a dubious practice though – it’s usually clearer to just call the parent class __init__ methods as necessary.
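
As a minimal sketch of that last point (the class names `Resource` and `Child` are hypothetical and chosen only for illustration), an invariant established in `__new__` survives even when a subclass does not call the parent `__init__`:

```python
class Resource:
    def __new__(cls, *args, **kwargs):
        # object.__new__ accepts only the class here, so the constructor
        # arguments are deliberately not forwarded.
        self = super().__new__(cls)
        # Invariant set up at creation time: every instance has a handle list.
        self._handles = []
        return self

    def __init__(self, name="unnamed"):
        self.name = name


class Child(Resource):
    def __init__(self, size):
        # Deliberately does not call Resource.__init__.
        self.size = size


child = Child(3)
print(child._handles)          # [] (set in __new__, so it still exists)
print(hasattr(child, "name"))  # False (Resource.__init__ never ran)
```

The missing `name` attribute shows why the practice is dubious: anything left in the parent `__init__` is silently skipped.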


Answer 1

There are probably other uses for __new__, but there’s one really obvious one: you can’t customize how a subclass of an immutable type is constructed without using __new__. For example, say you wanted to create a subclass of tuple that can contain only integral values from 0 up to (but not including) size.

class ModularTuple(tuple):
    def __new__(cls, tup, size=100):
        tup = (int(x) % size for x in tup)
        return super(ModularTuple, cls).__new__(cls, tup)

You simply can’t do this with __init__: by the time __init__ runs, the tuple’s contents are already fixed, and trying to modify self there raises a TypeError because tuples do not support item assignment.
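
A short usage sketch, restating the class above so the snippet is self-contained. The `BrokenTuple` class is a hypothetical counter-example, added here only to show what happens when the same work is attempted in `__init__`:

```python
class ModularTuple(tuple):
    def __new__(cls, tup, size=100):
        tup = (int(x) % size for x in tup)
        return super().__new__(cls, tup)


class BrokenTuple(tuple):
    def __init__(self, tup, size=100):
        # By the time __init__ runs, tuple.__new__ has already fixed the contents.
        for i, x in enumerate(tup):
            self[i] = int(x) % size  # raises TypeError


print(ModularTuple([5, 150, 203]))  # (5, 50, 3)

try:
    BrokenTuple([5, 150, 203])
except TypeError as exc:
    print("TypeError:", exc)
```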


Answer 2

__new__() can return objects of types other than the class it’s bound to. __init__() only initializes an existing instance of the class.

>>> class C(object):
...   def __new__(cls):
...     return 5
...
>>> c = C()
>>> print(type(c))
<class 'int'>
>>> print(c)
5
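
A related consequence, sketched below with a hypothetical `init_calls` counter added for illustration: when `__new__` returns something that is not an instance of the class, Python does not call `__init__` at all.

```python
class C:
    init_calls = 0

    def __new__(cls):
        return 5  # not an instance of C

    def __init__(self):
        # Only runs when __new__ returns an instance of cls.
        C.init_calls += 1


c = C()
print(type(c).__name__)  # int
print(C.init_calls)      # 0
```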

Answer 3

Not a complete answer but perhaps something that illustrates the difference.

__new__ will always get called when an object has to be created. There are some situations where __init__ will not get called. One example is unpickling: objects loaded from a pickle file are allocated (__new__) but not initialised (__init__).
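
The pickle case can be observed with a small sketch. The `Tracked` class and its counters are hypothetical, and the snippet assumes it runs as a normal script or module on Python 3 with the default pickle protocol, so that pickle can locate the class by name:

```python
import pickle


class Tracked:
    new_calls = 0
    init_calls = 0

    def __new__(cls, *args, **kwargs):
        cls.new_calls += 1
        return super().__new__(cls)

    def __init__(self, value):
        type(self).init_calls += 1
        self.value = value


original = Tracked(42)
restored = pickle.loads(pickle.dumps(original))

print(restored.value)      # 42 (state restored from the pickle)
print(Tracked.new_calls)   # 2 (once for original, once while unpickling)
print(Tracked.init_calls)  # 1 (unpickling skipped __init__)
```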


Answer 4

Just want to add a word about the intent (as opposed to the behavior) of defining __new__ versus __init__.

I came across this question (among others) when I was trying to understand the best way to define a class factory. I realized that one of the ways in which __new__ is conceptually different from __init__ is the fact that the benefit of __new__ is exactly what was stated in the question:

So the only benefit of the __new__ method is that the instance variable will start out as an empty string, as opposed to NULL. But why is this ever useful, since if we cared about making sure our instance variables are initialized to some default value, we could have just done that in the __init__ method?

Considering the stated scenario, we care about the initial values of the instance variables when the instance is in reality a class itself. So, if we are dynamically creating a class object at runtime and we need to define/control something special about the subsequent instances of this class being created, we would define these conditions/properties in a __new__ method of a metaclass.
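
A minimal sketch of that idea, assuming a hypothetical metaclass `RegisteredMeta` and a hypothetical default attribute `unit` (neither comes from the question or from any library):

```python
class RegisteredMeta(type):
    """Hypothetical metaclass: stamps every class it creates with a default."""

    def __new__(mcls, name, bases, namespace):
        # Runs when the *class* object is created; here the "instance" is a class.
        namespace.setdefault("unit", "cm")
        return super().__new__(mcls, name, bases, namespace)


# Create a class dynamically at runtime by calling the metaclass directly.
Tile = RegisteredMeta("Tile", (), {"sides": 4})

print(Tile.unit)     # cm
print(Tile().sides)  # 4
```

Here the initial value is fixed before the class exists, so every later instance of `Tile` sees it without any `__init__` being involved.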

I was confused about this until I actually thought about the application of the concept rather than just the meaning of it. Here’s an example that would hopefully make the difference clear:

# I want `a` and `b` to be instances of either `Square` or `Triangle`,
# depending on the number of sides, and I want `.area()` to do the right
# thing. How do I do that without creating a Shape class whose methods
# are full of `if`s? Here is one possibility:

class Shape:
    def __new__(cls, sides, *args, **kwargs):
        # The returned object is not a Shape, so Shape.__init__ is never called.
        if sides == 3:
            return Triangle(*args, **kwargs)
        else:
            return Square(*args, **kwargs)

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return (self.base * self.height) / 2

class Square:
    def __init__(self, length):
        self.length = length

    def area(self):
        return self.length * self.length

# Usage (placed after the class definitions so the snippet runs as written)
a = Shape(sides=3, base=2, height=12)
b = Shape(sides=4, length=2)
print(a.area())  # 12.0
print(b.area())  # 4

Note this is just a demonstrative example. There are multiple ways to get a solution without resorting to a class factory approach like the one above, and even if we do choose to implement the solution in this manner, there are a few caveats left out for the sake of brevity (for instance, declaring the metaclass explicitly).

If you are creating a regular class (i.e. a non-metaclass), then __new__ doesn’t really make sense unless it is a special case like the mutable versus immutable scenario in ncoghlan’s answer (which is essentially a more specific example of the concept of defining the initial values/properties of the class/type being created via __new__, to be then initialized via __init__).