标签归档:python-internals

为什么(’x’,)中的’x’比’x’==’x’快?

问题:为什么(’x’,)中的’x’比’x’==’x’快?

>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

也适用于具有多个元素的元组,两个版本似乎线性增长:

>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532

基于此,我认为我应该完全开始in在任何地方而不是在所有地方使用==

>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

Also works for tuples with multiple elements, both versions seem to grow linearly:

>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532

Based on this, I think I should totally start using in everywhere instead of ==!


回答 0

正如我对大卫·沃尔沃(David Wolever)所提到的那样,这不仅仅是眼神。两种方法都发送到is; 你可以通过做证明

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

第一个只能如此之快,因为它通过身份检查。

为了找出为什么一个比另一个要花更长的时间,让我们追溯执行。

它们都以开头ceval.cCOMPARE_OP因为这是所涉及的字节码

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}

这会从堆栈中弹出值(从技术上讲,它只会弹出一个)

PyObject *right = POP();
PyObject *left = TOP();

并运行比较:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome 这是:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

这是路径分开的地方。该PyCmp_IN分支不

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

请注意,元组定义为

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

所以分公司

if (sqm != NULL && sqm->sq_contains != NULL)

将采用和*sqm->sq_contains,即功能(objobjproc)tuplecontains

这确实

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

…等等,那不是PyObject_RichCompareBool其他分支所采取的吗?不,那是PyObject_RichCompare

该代码路径很短,因此很可能取决于这两者的速度。让我们比较一下。

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}

代码路径PyObject_RichCompareBool几乎立即终止。对于PyObject_RichCompare,它确实

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

Py_EnterRecursiveCall/ Py_LeaveRecursiveCall组合不采取在前面的路径,但这些都是比较快的宏将递增和递减一些全局后短路。

do_richcompare 确实:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

这做一些快速的检查,以电话v->ob_type->tp_richcompare

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

哪个

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

即,此快捷方式left == right仅在…之后

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

所有路径都看起来像这样(手动递归内联,展开和修剪已知分支)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

现在,PyUnicode_CheckPyUnicode_READY相当便宜,因为他们只检查了几个领域,但它应该是显而易见的是,上面一个是较小的代码路径,它具有较少的函数调用,只有一个开关语句是只是有点薄。

TL; DR:

都派往if (left_pointer == right_pointer); 不同之处在于他们为达到目标需要做多少工作。in只是做得更少。

As I mentioned to David Wolever, there’s more to this than meets the eye; both methods dispatch to is; you can prove this by doing

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.

To find out why one would take longer than the other, let’s trace through execution.

They both start in ceval.c, from COMPARE_OP since that is the bytecode involved

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be taken.

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

…Wait, wasn’t that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let’s compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}

The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo are not taken in the previous path, but these are relatively quick macros that’ll short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks to call v->ob_type->tp_richcompare which is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right… but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

vs

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path, it has fewer function calls, only one switch statement and is just a bit thinner.

TL;DR:

Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.


回答 1

这里有三个因素在起作用,共同产生这种令人惊讶的行为。

首先:in操作员使用快捷方式并检查身份(x is y),然后再检查是否等于(x == y):

>>> n = float('nan')
>>> n in (n, )
True
>>> n == n
False
>>> n is n
True

第二:由于Python的字符串interning,两个"x"in "x" in ("x", )是相同的:

>>> "x" is "x"
True

(大警告:这是实现特定的行为!is应该永远不会被用来比较字符串,因为它有时给了惊人的答复;例如"x" * 100 is "x" * 100 ==> False

第三:在详细Veedrac的梦幻般的回答tuple.__contains__x in (y, )大致相当于(y, ).__contains__(x))获取进行标识检查的速度比点str.__eq__(再次,x == y就是大致相当于x.__eq__(y))一样。

您可以看到这一点的证据,因为x in (y, )它比逻辑上等效的要慢得多x == y

In [18]: %timeit 'x' in ('x', )
10000000 loops, best of 3: 65.2 ns per loop

In [19]: %timeit 'x' == 'x'    
10000000 loops, best of 3: 68 ns per loop

In [20]: %timeit 'x' in ('y', ) 
10000000 loops, best of 3: 73.4 ns per loop

In [21]: %timeit 'x' == 'y'    
10000000 loops, best of 3: 56.2 ns per loop

这种x in (y, )情况的速度较慢,因为在is比较失败之后,in运算符将退回到常规的相等性检查(即使用==),因此比较所需的时间与相同,因此==由于创建元组的开销,整个操作的速度变慢了。 ,遍历其成员等。

还请注意,只有在以下a in (b, )情况下更快a is b

In [48]: a = 1             

In [49]: b = 2

In [50]: %timeit a is a or a == a
10000000 loops, best of 3: 95.1 ns per loop

In [51]: %timeit a in (a, )      
10000000 loops, best of 3: 140 ns per loop

In [52]: %timeit a is b or a == b
10000000 loops, best of 3: 177 ns per loop

In [53]: %timeit a in (b, )      
10000000 loops, best of 3: 169 ns per loop

(为什么a in (b, )要比这快a is b or a == b?我猜想虚拟机指令会更少—  a in (b, )只有〜3条指令,其中a is b or a == b会有更多的VM指令)

Veedrac的答案- https://stackoverflow.com/a/28889838/71522 -进入每个过程具体是什么情况更多的细节==in,是非常值得读。

There are three factors at play here which, combined, produce this surprising behavior.

First: the in operator takes a shortcut and checks identity (x is y) before it checks equality (x == y):

>>> n = float('nan')
>>> n in (n, )
True
>>> n == n
False
>>> n is n
True

Second: because of Python’s string interning, both "x"s in "x" in ("x", ) will be identical:

>>> "x" is "x"
True

(big warning: this is implementation-specific behavior! is should never be used to compare strings because it will give surprising answers sometimes; for example "x" * 100 is "x" * 100 ==> False)

Third: as detailed in Veedrac’s fantastic answer, tuple.__contains__ (x in (y, ) is roughly equivalent to (y, ).__contains__(x)) gets to the point of performing the identity check faster than str.__eq__ (again, x == y is roughly equivalent to x.__eq__(y)) does.

You can see evidence for this because x in (y, ) is significantly slower than the logically equivalent, x == y:

In [18]: %timeit 'x' in ('x', )
10000000 loops, best of 3: 65.2 ns per loop

In [19]: %timeit 'x' == 'x'    
10000000 loops, best of 3: 68 ns per loop

In [20]: %timeit 'x' in ('y', ) 
10000000 loops, best of 3: 73.4 ns per loop

In [21]: %timeit 'x' == 'y'    
10000000 loops, best of 3: 56.2 ns per loop

The x in (y, ) case is slower because, after the is comparison fails, the in operator falls back to normal equality checking (i.e., using ==), so the comparison takes about the same amount of time as ==, rendering the entire operation slower because of the overhead of creating the tuple, walking its members, etc.

Note also that a in (b, ) is only faster when a is b:

In [48]: a = 1             

In [49]: b = 2

In [50]: %timeit a is a or a == a
10000000 loops, best of 3: 95.1 ns per loop

In [51]: %timeit a in (a, )      
10000000 loops, best of 3: 140 ns per loop

In [52]: %timeit a is b or a == b
10000000 loops, best of 3: 177 ns per loop

In [53]: %timeit a in (b, )      
10000000 loops, best of 3: 169 ns per loop

(why is a in (b, ) faster than a is b or a == b? My guess would be fewer virtual machine instructions — a in (b, ) is only ~3 instructions, where a is b or a == b will be quite a few more VM instructions)

Veedrac’s answer — https://stackoverflow.com/a/28889838/71522 — goes into much more detail on specifically what happens during each of == and in and is well worth the read.


什么时候del在python中有用?

问题:什么时候del在python中有用?

我真的想不出python为什么需要del关键字的任何原因(大多数语言似乎没有类似的关键字)。例如,可以删除变量而不是删除变量None。从字典中删除时,del可以添加一个方法。

是否有任何理由保留del在python中,或者它是Python的垃圾收集日的痕迹?

I can’t really think of any reason why python needs the del keyword (and most languages seem to not have a similar keyword). For instance, rather than deleting a variable, one could just assign None to it. And when deleting from a dictionary, a del method could be added.

Is there any reason to keep del in python, or is it a vestige of Python’s pre-garbage collection days?


回答 0

首先,除了局部变量,您还可以进行其他操作

del list_item[4]
del dictionary["alpha"]

两者都应该明显有用。其次,del在局部变量上使用可使意图更清晰。相比:

del foo

foo = None

我知道在这种情况下del foo,目的是从范围中删除变量。目前尚不清楚这样foo = None做。如果有人刚分配,foo = None我可能认为这是无效代码。但是我立即知道某个编码人员del foo正在尝试做什么。

Firstly, you can del other things besides local variables

del list_item[4]
del dictionary["alpha"]

Both of which should be clearly useful. Secondly, using del on a local variable makes the intent clearer. Compare:

del foo

to

foo = None

I know in the case of del foo that the intent is to remove the variable from scope. It’s not clear that foo = None is doing that. If somebody just assigned foo = None I might think it was dead code. But I instantly know what somebody who codes del foo was trying to do.


回答 1

这是做什么的一部分del(来自Python语言参考):

删除名称会从本地或全局命名空间中删除该名称的绑定

分配None名称不会从命名空间中删除名称的绑定。

(我想可能存在一些关于删除名称绑定是否真正有用的参数,但这是另一个问题。)

There’s this part of what del does (from the Python Language Reference):

Deletion of a name removes the binding of that name from the local or global namespace

Assigning None to a name does not remove the binding of the name from the namespace.

(I suppose there could be some debate about whether removing a name binding is actually useful, but that’s another question.)


回答 2

我发现一个del有用的地方是清理for循环中的无关变量:

for x in some_list:
  do(x)
del x

现在,您可以确定,如果在for循环之外使用x,它将是未定义的。

One place I’ve found del useful is cleaning up extraneous variables in for loops:

for x in some_list:
  do(x)
del x

Now you can be sure that x will be undefined if you use it outside the for loop.


回答 3

删除变量与将其设置为“无”不同

用删除变量名del可能很少使用,但是如果没有关键字就无法轻易实现。如果您可以通过编写来创建变量名a=1,那么理论上可以通过删除a来撤消该操作。

在某些情况下,它可以简化调试过程,因为尝试访问已删除的变量将引发NameError。

您可以删除类实例属性

Python使您可以编写如下内容:

class A(object):
    def set_a(self, a):
        self.a=a
a=A()
a.set_a(3)
if hasattr(a, "a"):
    print("Hallo")

如果您选择向类实例动态添加属性,那么您当然希望能够通过编写来撤消它

del a.a

Deleting a variable is different than setting it to None

Deleting variable names with del is probably something used rarely, but it is something that could not trivially be achieved without a keyword. If you can create a variable name by writing a=1, it is nice that you can theoretically undo this by deleting a.

It can make debugging easier in some cases as trying to access a deleted variable will raise an NameError.

You can delete class instance attributes

Python lets you write something like:

class A(object):
    def set_a(self, a):
        self.a=a
a=A()
a.set_a(3)
if hasattr(a, "a"):
    print("Hallo")

If you choose to dynamically add attributes to a class instance, you certainly want to be able to undo it by writing

del a.a

回答 4

有一个特定的示例说明您在检查异常时应使用的时间del(可能还有其他,但我知道这是sys.exc_info()一时的事)。此函数返回一个元组,引发的异常类型,消息和回溯。

前两个值通常足以诊断错误并采取措施,但第三个值包含引发异常与捕获异常之间的整个调用堆栈。特别是如果您做类似的事情

try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    if something(exc_value):
        raise

追溯,tb最终出现在调用堆栈的本地中,从而创建了无法进行垃圾回收的循环引用。因此,执行以下操作很重要:

try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    del tb
    if something(exc_value):
        raise

打破循环参考。在很多情况下,你想打电话sys.exc_info(),像元类魔术,回溯有用的,所以你必须确保你清理之前,你都不可能离开异常处理程序。如果不需要回溯,则应立即将其删除,或者直接执行以下操作:

exc_type, exc_value = sys.exc_info()[:2]

为了避免所有这一切。

There is a specific example of when you should use del (there may be others, but I know about this one off hand) when you are using sys.exc_info() to inspect an exception. This function returns a tuple, the type of exception that was raised, the message, and a traceback.

The first two values are usually sufficient to diagnose an error and act on it, but the third contains the entire call stack between where the exception was raised and where the the exception is caught. In particular, if you do something like

try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    if something(exc_value):
        raise

the traceback, tb ends up in the locals of the call stack, creating a circular reference that cannot be garbage collected. Thus, it is important to do:

try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    del tb
    if something(exc_value):
        raise

to break the circular reference. In many cases where you would want to call sys.exc_info(), like with metaclass magic, the traceback is useful, so you have to make sure that you clean it up before you can possibly leave the exception handler. If you don’t need the traceback, you should delete it immediately, or just do:

exc_type, exc_value = sys.exc_info()[:2]

To avoid it all together.


回答 5

只是另一种想法。

在像Django这样的框架中调试http应用程序时,调用堆栈充满了以前使用的无用且混乱的变量,尤其是当列表很长时,对于开发人员而言可能非常痛苦。因此,此时,命名空间控制可能会很有用。

Just another thinking.

When debugging http applications in framework like Django, the call stack full of useless and messed up variables previously used, especially when it’s a very long list, could be very painful for developers. so, at this point, namespace controlling could be useful.


回答 6

与将变量分配给None相比,显式使用“ del”也是一种更好的做法。如果尝试删除不存在的变量,则会遇到运行时错误,但如果尝试将不存在的变量设置为None,Python会静默将新变量设置为None,而将变量保留为想删除它在哪里。因此,del将帮助您尽早发现错误

Using “del” explicitly is also better practice than assigning a variable to None. If you attempt to del a variable that doesn’t exist, you’ll get a runtime error but if you attempt to set a variable that doesn’t exist to None, Python will silently set a new variable to None, leaving the variable you wanted deleted where it was. So del will help you catch your mistakes earlier


回答 7

要为以上答案添加几点: del x

x指示的定义r -> or指向对象的引用o),但del x更改r而不是o。这是对对象(而不是与关联的对象)的引用(指针)的操作x。区分ro是此处的关键。

  • 它将从中删除locals()
  • globals()如果x属于它,将其删除。
  • 将其从堆栈框架中删除(从物理上删除引用,但对象本身位于对象池中,而不位于堆栈框架中)。
  • 从当前作用域中删除它。限制局部变量定义的范围非常有用,否则会导致问题。
  • 它更多地是关于名称的声明而不是内容的定义。
  • 它影响x属于的地方,而不影响x指向的地方。内存中唯一的物理更改是这样。例如,如果x在字典或列表中,则将其(作为参考)从那里(不一定从对象池中)删除。在此示例中,它所属的字典是locals()与重叠的堆栈框架()globals()

To add a few points to above answers: del x

Definition of x indicates r -> o (a reference r pointing to an object o) but del x changes r rather than o. It is an operation on the reference (pointer) to object rather than the object associated with x. Distinguishing between r and o is key here.

  • It removes it from locals().
  • Removes it from globals() if x belongs there.
  • Removes it from the stack frame (removes the reference physically from it, but the object itself resides in object pool and not in the stack frame).
  • Removes it from the current scope. It is very useful to limit the span of definition of a local variable, which otherwise can cause problems.
  • It is more about declaration of the name rather than definition of content.
  • It affects where x belongs to, not where x points to. The only physical change in memory is this. For example if x is in a dictionary or list, it (as a reference) is removed from there(and not necessarily from the object pool). In this example, the dictionary it belongs is the stack frame (locals()), which overlaps with globals().

回答 8

使用numpy.load后强制关闭文件:

利基用法也许,但我发现它在numpy.load用于读取文件时很有用。我会不时地更新文件,并且需要将具有相同名称的文件复制到目录中。

我曾经del发布过文件,并允许我复制到新文件中。

请注意,我想避免使用with上下文管理器,因为我在命令行上玩弄图并且不想太多地按下Tab键!

看到这个问题。

Force closing a file after using numpy.load:

A niche usage perhaps but I found it useful when using numpy.load to read a file. Every once in a while I would update the file and need to copy a file with the same name to the directory.

I used del to release the file and allow me to copy in the new file.

Note I want to avoid the with context manager as I was playing around with plots on the command line and didn’t want to be pressing tab a lot!

See this question.


回答 9

del通常在__init__.py文件中看到。__init__.py文件中定义的所有全局变量都将自动“导出”(将包含在中from module import *)。避免这种情况的一种方法是定义__all__,但这会变得混乱,而且并非所有人都使用它。

例如,如果您的代码__init__.py

import sys
if sys.version_info < (3,):
    print("Python 2 not supported")

然后,您的模块将导出sys名称。你应该写

import sys
if sys.version_info < (3,):
    print("Python 2 not supported")

del sys

del is often seen in __init__.py files. Any global variable that is defined in an __init__.py file is automatically “exported” (it will be included in a from module import *). One way to avoid this is to define __all__, but this can get messy and not everyone uses it.

For example, if you had code in __init__.py like

import sys
if sys.version_info < (3,):
    print("Python 2 not supported")

Then your module would export the sys name. You should instead write

import sys
if sys.version_info < (3,):
    print("Python 2 not supported")

del sys

回答 10

作为一个例子什么del可用于,我觉得有用我的情况是这样的:

def f(a, b, c=3):
    return '{} {} {}'.format(a, b, c)

def g(**kwargs):
    if 'c' in kwargs and kwargs['c'] is None:
        del kwargs['c']

    return f(**kwargs)

# g(a=1, b=2, c=None) === '1 2 3'
# g(a=1, b=2) === '1 2 3'
# g(a=1, b=2, c=4) === '1 2 4'

这两个功能都可以在不同的封装/模块和程序员不需要知道默认值参数cf实际拥有。因此,通过将kwargs与del结合使用,可以通过将其设置为None来说“我想要c的默认值”(或者在这种情况下也可以保留它)。

您可以使用类似的方法做同样的事情:

def g(a, b, c=None):
    kwargs = {'a': a,
              'b': b}
    if c is not None:
        kwargs['c'] = c

    return f(**kwargs)

但是,我发现前面的示例更加干燥和优雅。

As an example of what del can be used for, I find it useful i situations like this:

def f(a, b, c=3):
    return '{} {} {}'.format(a, b, c)

def g(**kwargs):
    if 'c' in kwargs and kwargs['c'] is None:
        del kwargs['c']

    return f(**kwargs)

# g(a=1, b=2, c=None) === '1 2 3'
# g(a=1, b=2) === '1 2 3'
# g(a=1, b=2, c=4) === '1 2 4'

These two functions can be in different packages/modules and the programmer doesn’t need to know what default value argument c in f actually have. So by using kwargs in combination with del you can say “I want the default value on c” by setting it to None (or in this case also leave it).

You could do the same thing with something like:

def g(a, b, c=None):
    kwargs = {'a': a,
              'b': b}
    if c is not None:
        kwargs['c'] = c

    return f(**kwargs)

However I find the previous example more DRY and elegant.


回答 11

什么时候del在python中有用?

您可以使用它删除数组的单个元素,而不是切片语法x[i:i+1]=[]。例如,如果您在其中os.walk并希望删除目录中的元素,这可能会很有用。不过,我认为关键字对此无济于事,因为一个人只能制作一个[].remove(index)方法(该.remove方法实际上是搜索并删除第一个实例值)。

When is del useful in python?

You can use it to remove a single element of an array instead of the slice syntax x[i:i+1]=[]. This may be useful if for example you are in os.walk and wish to delete an element in the directory. I would not consider a keyword useful for this though, since one could just make a [].remove(index) method (the .remove method is actually search-and-remove-first-instance-of-value).


回答 12

del当使用Numpy处理大数据时,我发现对于伪手动内存管理很有用。例如:

for image_name in large_image_set:
    large_image = io.imread(image_name)
    height, width, depth = large_image.shape
    large_mask = np.all(large_image == <some_condition>)
    # Clear memory, make space
    del large_image; gc.collect()

    large_processed_image = np.zeros((height, width, depth))
    large_processed_image[large_mask] = (new_value)
    io.imsave("processed_image.png", large_processed_image)

    # Clear memory, make space
    del large_mask, large_processed_image; gc.collect()

这可能是由于Python GC无法跟上系统疯狂地交换脚本而使脚本陷入停顿状态,并且在松散的内存阈值下它可以完美平滑地运行,从而留出了很大的可用空间来浏览机器和代码,而它的工作。

I’ve found del to be useful for pseudo-manual memory management when handling large data with Numpy. For example:

for image_name in large_image_set:
    large_image = io.imread(image_name)
    height, width, depth = large_image.shape
    large_mask = np.all(large_image == <some_condition>)
    # Clear memory, make space
    del large_image; gc.collect()

    large_processed_image = np.zeros((height, width, depth))
    large_processed_image[large_mask] = (new_value)
    io.imsave("processed_image.png", large_processed_image)

    # Clear memory, make space
    del large_mask, large_processed_image; gc.collect()

This can be the difference between bringing a script to a grinding halt as the system swaps like mad when the Python GC can’t keep up, and it running perfectly smooth below a loose memory threshold that leaves plenty of headroom to use the machine to browse and code while it’s working.


回答 13

我认为del具有自己的语法的原因之一是,在某些情况下,鉴于del操作绑定或变量而不是它引用的值,用函数替换它可能很难。因此,如果要创建del的函数版本,则需要传入上下文。del foo将需要变为globals()。remove(’foo’)或locals()。remove(’foo’),从而变得混乱而且可读性较差。我仍然说,鉴于del似乎很少使用,摆脱它会很好。但是删除语言功能/缺陷可能很痛苦。也许python 4会删除它:)

I think one of the reasons that del has its own syntax is that replacing it with a function might be hard in certain cases given it operates on the binding or variable and not the value it references. Thus if a function version of del were to be created a context would need to be passed in. del foo would need to become globals().remove(‘foo’) or locals().remove(‘foo’) which gets messy and less readable. Still I say getting rid of del would be good given its seemingly rare use. But removing language features/flaws can be painful. Maybe python 4 will remove it :)


回答 14

我想详细说明可接受的答案,以突出显示将变量设置为None与使用删除变量之间的细微差别del

给定变量foo = 'bar',并提供以下函数定义:

def test_var(var):
    if var:
        print('variable tested true')
    else:
        print('variable tested false')

最初声明后,test_var(foo)Yield将variable tested true达到预期。

现在尝试:

foo = None
test_var(foo)

产生variable tested false

将此行为与:

del foo
test_var(foo)

现在提高了NameError: name 'foo' is not defined

I would like to elaborate on the accepted answer to highlight the nuance between setting a variable to None versus removing it with del:

Given the variable foo = 'bar', and the following function definition:

def test_var(var):
    if var:
        print('variable tested true')
    else:
        print('variable tested false')

Once initially declared, test_var(foo) yields variable tested true as expected.

Now try:

foo = None
test_var(foo)

which yields variable tested false.

Contrast this behavior with:

del foo
test_var(foo)

which now raises NameError: name 'foo' is not defined.


回答 15

又一小生用法:在pyroot与ROOT5或ROOT6,“删除”可以是,以去除被称为无再现有C ++对象Python对象是有用的。这允许pyroot的动态查找来找到一个同名的C ++对象,并将其绑定到python名称。因此,您可能会遇到以下情况:

import ROOT as R
input_file = R.TFile('inputs/___my_file_name___.root')
tree = input_file.Get('r')
tree.Draw('hy>>hh(10,0,5)')
R.gPad.Close()
R.hy # shows that hy is still available. It can even be redrawn at this stage.
tree.Draw('hy>>hh(3,0,3)') # overwrites the C++ object in ROOT's namespace
R.hy # shows that R.hy is None, since the C++ object it pointed to is gone
del R.hy
R.hy # now finds the new C++ object

希望此利基市场将通过ROOT7的精明对象管理而关闭。

Yet another niche usage: In pyroot with ROOT5 or ROOT6, “del” may be useful to remove a python object that referred to a no-longer existing C++ object. This allows the dynamic lookup of pyroot to find an identically-named C++ object and bind it to the python name. So you can have a scenario such as:

import ROOT as R
input_file = R.TFile('inputs/___my_file_name___.root')
tree = input_file.Get('r')
tree.Draw('hy>>hh(10,0,5)')
R.gPad.Close()
R.hy # shows that hy is still available. It can even be redrawn at this stage.
tree.Draw('hy>>hh(3,0,3)') # overwrites the C++ object in ROOT's namespace
R.hy # shows that R.hy is None, since the C++ object it pointed to is gone
del R.hy
R.hy # now finds the new C++ object

Hopefully, this niche will be closed with ROOT7’s saner object management.


回答 16

“ del”命令对于控制数组中的数据非常有用,例如:

elements = ["A", "B", "C", "D"]
# Remove first element.
del elements[:1]
print(elements)

输出:

[‘B’,’C’,’D’]

The “del” command is very useful for controlling data in an array, for example:

elements = ["A", "B", "C", "D"]
# Remove first element.
del elements[:1]
print(elements)

Output:

[‘B’, ‘C’, ‘D’]


回答 17

一旦我不得不使用:

del serial
serial = None

因为仅使用:

serial = None

释放串口的速度不够快,无法立即再次打开它。从那堂课中,我学到了del真正的意思:“现在就GC!等到完成为止”,这在很多情况下非常有用。当然,您可能会有一个system.gc.del_this_and_wait_balbalbalba(obj)

Once I had to use:

del serial
serial = None

because using only:

serial = None

didn’t release the serial port fast enough to immediately open it again. From that lesson I learned that del really meant: “GC this NOW! and wait until it’s done” and that is really useful in a lot of situations. Of course, you may have a system.gc.del_this_and_wait_balbalbalba(obj).


回答 18

del在许多语言中都等同于“未设置”,并且作为从另一种语言到python的交叉参考点。人们倾向于寻找命令,它们执行的操作与以前使用第一语言时所做的相同……将var更改为“”或没有任何操作并不会真正从范围中删除该var。它只是清空其值,该var本身的名称仍将存储在内存中…为什么?!在占用大量内存的脚本中..保持垃圾桶的“否”和“反之……”每种语言都具有某种形式的“未设置/删除” var函数..为什么不使用python?

del is the equivalent of “unset” in many languages and as a cross reference point moving from another language to python.. people tend to look for commands that do the same thing that they used to do in their first language… also setting a var to “” or none doesn’t really remove the var from scope..it just empties its value the name of the var itself would still be stored in memory…why?!? in a memory intensive script..keeping trash behind its just a no no and anyways…every language out there has some form of an “unset/delete” var function..why not python?


回答 19

python中的每个对象都有一个标识符,类型,与之关联的引用计数,当我们使用del时,引用计数会减少,当引用计数变为零时,它很可能成为垃圾回收的候选对象。与将标识符设置为“无”相比,这可以区分del。在后面的情况下,它只是意味着该对象被遗忘了(直到我们超出范围,在这种情况下,计数减少了),现在标识符仅指向其他某个对象(内存位置)。

Every object in python has an identifier, Type, reference count associated with it, when we use del the reference count is reduced, when the reference count becomes zero it is a potential candidate for getting garbage collected. This differentiates the del when compared to setting an identifier to None. In later case it simply means the object is just left out wild( until we are out of scope in which case the count is reduced) and simply now the identifier point to some other object(memory location).


time.sleep —休眠线程或进程?

问题:time.sleep —休眠线程或进程?

在Python for * nix中,是否time.sleep()阻塞线程或进程?

In Python for *nix, does time.sleep() block the thread or the process?


回答 0

它阻塞线程。如果在Python源代码中查看Modules / timemodule.c,您会看到在调用中floatsleep(),睡眠操作的实质部分包装在Py_BEGIN_ALLOW_THREADS和Py_END_ALLOW_THREADS块中,从而允许其他线程在当前线程继续执行一睡觉。您也可以使用简单的python程序进行测试:

import time
from threading import Thread

class worker(Thread):
    def run(self):
        for x in xrange(0,11):
            print x
            time.sleep(1)

class waiter(Thread):
    def run(self):
        for x in xrange(100,103):
            print x
            time.sleep(5)

def run():
    worker().start()
    waiter().start()

将打印:

>>> thread_test.run()
0
100
>>> 1
2
3
4
5
101
6
7
8
9
10
102

It blocks the thread. If you look in Modules/timemodule.c in the Python source, you’ll see that in the call to floatsleep(), the substantive part of the sleep operation is wrapped in a Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS block, allowing other threads to continue to execute while the current one sleeps. You can also test this with a simple python program:

import time
from threading import Thread

class worker(Thread):
    def run(self):
        for x in xrange(0,11):
            print x
            time.sleep(1)

class waiter(Thread):
    def run(self):
        for x in xrange(100,103):
            print x
            time.sleep(5)

def run():
    worker().start()
    waiter().start()

Which will print:

>>> thread_test.run()
0
100
>>> 1
2
3
4
5
101
6
7
8
9
10
102

回答 1

它只会休眠线程,除非您的应用程序只有一个线程,在这种情况下,它将休眠线程并有效地进程。

睡眠中的python文档未指定此内容,因此我当然可以理解混淆!

http://docs.python.org/2/library/time.html

It will just sleep the thread except in the case where your application has only a single thread, in which case it will sleep the thread and effectively the process as well.

The python documentation on sleep doesn’t specify this however, so I can certainly understand the confusion!

http://docs.python.org/2/library/time.html


回答 2

只是线程。

Just the thread.


回答 3

该线程将阻塞,但是该进程仍然有效。

在单线程应用程序中,这意味着您在睡眠时一切都被阻止了。在多线程应用程序中,只有您显式“睡眠”的线程将被阻塞,其他线程仍在进程中运行。

The thread will block, but the process is still alive.

In a single threaded application, this means everything is blocked while you sleep. In a multithreaded application, only the thread you explicitly ‘sleep’ will block and the other threads still run within the process.


回答 4

只有线程,除非您的进程具有单个线程。

Only the thread unless your process has a single thread.


回答 5

进程本身无法运行。关于执行,进程只是线程的容器。这意味着您根本无法暂停该过程。它根本不适用于过程。

Process is not runnable by itself. In regard to execution, process is just a container for threads. Meaning you can’t pause the process at all. It is simply not applicable to process.


回答 6

如果它在同一线程中执行,则它将阻塞一个线程,如果不是从主代码中执行,则它将阻塞该线程

it blocks a thread if it is executed in the same thread not if it is executed from the main code


“ is”运算符对整数的行为异常

问题:“ is”运算符对整数的行为异常

为什么以下内容在Python中表现异常?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

我正在使用Python 2.5.2。尝试使用某些不同版本的Python,Python 2.3.3似乎在99到100之间显示了上述行为。

基于以上所述,我可以假设Python是内部实现的,因此“小”整数的存储方式与大整数的存储方式不同,并且is运算符可以分辨出这种差异。为什么要泄漏抽象?当我事先不知道它们是否为数字时,比较两个任意对象以查看它们是否相同的更好方法是什么?

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.

Based on the above, I can hypothesize that Python is internally implemented such that “small” integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don’t know in advance whether they are numbers or not?


回答 0

看看这个:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

这是我在Python 2文档“普通整数对象”中发现的内容(对于Python 3也是一样):

当前的实现为-5到256之间的所有整数保留一个整数对象数组,当您在该范围内创建int时,实际上实际上是返回对现有对象的引用。因此应该可以更改1的值。我怀疑在这种情况下Python的行为是不确定的。:-)

Take a look at this:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

Here’s what I found in the Python 2 documentation, “Plain Integer Objects” (It’s the same for Python 3):

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)


回答 1

Python的“ is”运算符在使用整数时表现异常吗?

总结-让我强调一下:不要is用于比较整数。

这不是您应该有任何期望的行为。

相反,分别使用==!=比较相等和不平等。例如:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

说明

要知道这一点,您需要了解以下内容。

首先,该怎么is办?它是一个比较运算符。从文档中

运算符isis not测试对象标识:x is y当且仅当x和y是同一对象时才为true。x is not y产生反真值。

因此,以下内容是等效的。

>>> a is b
>>> id(a) == id(b)

文档中

id 返回对象的“身份”。这是一个整数(或长整数),在该对象的生存期内,此整数保证是唯一且恒定的。具有不重叠生存期的两个对象可能具有相同的id()值。

请注意,CPython(Python的参考实现)中对象的ID是内存中的位置这一事实是实现细节。Python的其他实现(例如Jython或IronPython)可以轻松地使用的不同实现id

那么用例是is什么呢? PEP8描述

与单例之类的比较None应始终使用isis not,而不应使用相等运算符。

问题

您询问并陈述以下问题(带有代码):

为什么以下内容在Python中表现异常?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

不是预期的结果。为什么会这样?这仅意味着256两者a和引用的整数值是整数b的相同实例。整数在Python中是不可变的,因此它们不能更改。这对任何代码都没有影响。不应期望。这仅仅是一个实现细节。

但是也许我们应该为每次声明一个等于256的值而在内存中没有新的单独实例感到高兴。

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

看起来我们现在有两个单独的整数实例,它们的值257在内存中。由于整数是不可变的,因此会浪费内存。希望我们不要浪费很多。我们可能不是。但是不能保证这种行为。

>>> 257 is 257
True           # Yet the literal numbers compare properly

好吧,这看起来好像您的Python特定实现正在尝试变得聪明,除非必须这样做,否则不会在内存中创建冗余值的整数。您似乎表明您正在使用Python的引用实现,即CPython。对CPython有好处。

如果CPython可以在全球范围内做到这一点甚至更好,如果它可以便宜地做到这一点(因为查找会花费一定的成本),也许还有另一种实现方式。

但是对于对代码的影响,您不必在乎整数是否是整数的特定实例。您只需要关心该实例的值是什么,就可以使用普通的比较运算符,即==

是什么is

is检查id两个对象的相同。在CPython中,id是内存中的位置,但是在另一个实现中,它可能是其他一些唯一标识的数字。要用代码重新声明:

>>> a is b

是相同的

>>> id(a) == id(b)

那我们为什么要使用is呢?

相对于说,这是一个非常快速的检查,检查两个很长的字符串的值是否相等。但是由于它适用于对象的唯一性,因此我们的用例有限。实际上,我们主要是想用它来检查None,这是一个单例(内存中一个地方存在的唯一实例)。如果有可能将其他单例合并is,我们可以创建其他单例,我们可能会与进行检查,但这相对较少。这是一个示例(将在Python 2和3中运行),例如

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

哪些打印:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

因此,我们看到,使用is和和哨兵,我们可以区分何时bar不带参数调用和何时带调用None。这些是主要的用例的is-不要没有用它来测试整数,字符串,元组,或者其他喜欢这些东西的平等。

Python’s “is” operator behaves unexpectedly with integers?

In summary – let me emphasize: Do not use is to compare integers.

This isn’t behavior you should have any expectations about.

Instead, use == and != to compare for equality and inequality, respectively. For example:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

Explanation

To know this, you need to know the following.

First, what does is do? It is a comparison operator. From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

And so the following are equivalent.

>>> a is b
>>> id(a) == id(b)

From the documentation:

id Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let’s hope we’re not wasting a lot of it. We’re probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What is does

is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use is then?

This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here’s an example (will work in Python 2 and 3) e.g.

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is – do not use it to test for equality of integers, strings, tuples, or other things like these.


回答 2

这取决于您是否要看两个事物是否相等或相同的对象。

is检查它们是否是相同的对象,而不仅仅是相等。小整数可能指向相同的内存位置以提高空间效率

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

您应该==用来比较任意对象的相等性。您可以使用__eq____ne__属性指定行为。

It depends on whether you’re looking to see if 2 things are equal, or the same object.

is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__, and __ne__ attributes.


回答 3

我来晚了,但是,您想从中获得答案吗?我将尝试以介绍性的方式对此进行说明,以便更多的人可以跟进。


关于CPython的一件好事是您实际上可以看到其来源。我将使用3.5版本的链接,但是找到相应的2.x链接是微不足道的。

在CPython中,用于创建新对象的C-API函数intPyLong_FromLong(long v)。此功能的说明是:

当前的实现为-5到256之间的所有整数保留一个整数对象数组,当您在该范围内创建int时,实际上实际上是返回对现有对象的引用。因此应该可以更改1的值。我怀疑在这种情况下Python的行为是不确定的。:-)

(我的斜体)

不了解您,但我看到了并想:让我们找到那个数组!

如果您还不熟悉实现CPython的C代码,则应 ; 一切都井井有条,可读性强。对于我们而言,我们需要在看Objects子目录中的主源代码目录树

PyLong_FromLong处理long对象,因此不难推断我们需要窥视内部longobject.c。看完内部,您可能会觉得事情很混乱。它们是,但不要担心,我们正在寻找的功能在第230行令人不寒而栗,等待我们检查出来。这是一个很小的函数,因此主体(不包括声明)可以轻松粘贴到此处:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

现在,我们不是C 主代码-haxxorz,但我们也不傻,我们可以看到这CHECK_SMALL_INT(ival);一切诱人地窥视着我们。我们可以理解,这与此有关。让我们来看看:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

因此,get_small_int如果值ival满足条件,则它是一个调用函数的宏:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

那么什么是NSMALLNEGINTSNSMALLPOSINTS?宏!他们在这里

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

所以我们的条件是if (-5 <= ival && ival < 257)通话get_small_int

接下来,让我们看一下get_small_int它的所有荣耀(好吧,我们只看它的身体,因为那是有趣的地方):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

好的,声明一个PyObject,断言先前的条件成立并执行赋值:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints看起来很像我们一直在寻找的那个数组,它是!我们只要阅读该死的文档,我们就永远知道!

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

是的,这是我们的家伙。当您要int在该范围内创建一个新[NSMALLNEGINTS, NSMALLPOSINTS)对象时,您只需返回对已预先分配的现有对象的引用。

由于引用引用的是同一对象,因此id()直接发布或检查其上的身份is将返回完全相同的内容。

但是,什么时候分配它们?

_PyLong_Init Python 初始化期间,将很乐意进入for循环为您执行此操作:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

查看源代码以阅读循环体!

希望我的解释使您现在对C的认识清楚(很明显是故意的)。


但是,257 is 257?这是怎么回事?

这实际上更容易解释,我已经尝试过这样做;这是由于Python将这个交互式语句作为一个单独的块执行:

>>> 257 is 257

在编译此语句期间,CPython将看到您有两个匹配的文字,并将使用相同的PyLongObject表示形式257。如果您自己进行编译并检查其内容,则可以看到以下内容:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

当CPython进行操作时,现在将要加载完全相同的对象:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

所以is会回来的True

I’m late but, you want some source with your answer? I’ll try and word this in an introductory manner so more folks can follow along.


A good thing about CPython is that you can actually see the source for this. I’m going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

(My italics)

Don’t know about you but I see this and think: Let’s find that array!

If you haven’t fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.

PyLong_FromLong deals with long objects so it shouldn’t be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we’re looking for is chilling at line 230 waiting for us to check it out. It’s a smallish function so the main body (excluding declarations) is easily pasted here:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

Now, we’re no C master-code-haxxorz but we’re also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let’s check it out:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

So it’s a macro that calls function get_small_int if the value ival satisfies the condition:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

So our condition is if (-5 <= ival && ival < 257) call get_small_int.

Next let’s look at get_small_int in all its glory (well, we’ll just look at its body because that’s where the interesting things are):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints looks a lot like that array we’ve been searching for, and it is! We could’ve just read the damn documentation and we would’ve know all along!:

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you’ll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.

But, when are they allocated??

During initialization in _PyLong_Init Python will gladly enter in a for loop do do this for you:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

Check out the source to read the loop body!

I hope my explanation has made you C things clearly now (pun obviously intented).


But, 257 is 257? What’s up?

This is actually easier to explain, and I have attempted to do so already; it’s due to the fact that Python will execute this interactive statement as a single block:

>>> 257 is 257

During complilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

When CPython does the operation, it’s now just going to load the exact same object:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

So is will return True.


回答 4

您可以检入源文件intobject.c,Python会缓存小整数以提高效率。每次创建对小整数的引用时,都是在引用缓存的小整数,而不是新对象。257不是一个小整数,因此它被计算为另一个对象。

最好==用于此目的。

As you can check in source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring the cached small integer, not a new object. 257 is not an small integer, so it is calculated as a different object.

It is better to use == for that purpose.


回答 5

我认为您的假设是正确的。实验id(对象的身份):

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

看来数字<= 255被当作​​文字,而上面的任何东西都被不同地对待!

I think your hypotheses is correct. Experiment with id (identity of object):

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

It appears that numbers <= 255 are treated as literals and anything above is treated differently!


回答 6

对于整数,字符串或日期时间之类的不可变值对象,对象标识并不是特别有用。最好考虑平等。身份本质上是值对象的实现细节-由于它们是不可变的,因此对同一个对象或多个对象具有多个引用之间没有有效的区别。

For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It’s better to think about equality. Identity is essentially an implementation detail for value objects – since they’re immutable, there’s no effective difference between having multiple refs to the same object or multiple objects.


回答 7

现有答案中都没有指出另一个问题。允许Python合并任何两个不可变的值,并且预先创建的小int值不是发生这种情况的唯一方法。永远不能保证 Python实现会做到这一点,但他们所做的不仅仅只是小整数。


一方面,还有一些其他预先创建的值,例如empty tuplestrbytes和一些短字符串(在CPython 3.6中,这是256个单字符Latin-1字符串)。例如:

>>> a = ()
>>> b = ()
>>> a is b
True

而且,即使是非预先创建的值也可以相同。考虑以下示例:

>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True

这不限于int值:

>>> g, h = 42.23e100, 42.23e100
>>> g is h
True

显然,CPython没有为预先创建float42.23e100。那么,这是怎么回事?

CPython的编译器将合并一些已知不变类型等的恒定值intfloatstrbytes,在相同的编译单元。对于一个模块,整个模块是一个编译单元,但是在交互式解释器中,每个语句都是一个单独的编译单元。由于cd是在单独的语句中定义的,因此不会合并它们的值。由于ef是在同一条语句中定义的,因此将合并它们的值。


您可以通过分解字节码来查看发生了什么。尝试定义一个执行该操作的函数,e, f = 128, 128然后对其进行调用dis.dis,您将看到只有一个常数值(128, 128)

>>> def f(): i, j = 258, 258
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])
4305296480, 4305296480, 4305296480

您可能会注意到,128即使字节码实际上并未使用编译器,编译器也已将其存储为常量,这使您了解了CPython编译器所做的优化很少。这意味着(非空)元组实际上不会最终合并:

>>> k, l = (1, 2), (1, 2)
>>> k is l
False

把在一个函数,dis它,看看co_consts-there是一个12两个(1, 2)共享相同的元组12,但不相同,并且((1, 2), (1, 2))具有两个不同的元组相等的元组。


CPython还有另外一个优化:字符串实习。与编译器常量折叠不同,这不限于源代码文字:

>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True

另一方面,它仅限于内部存储类型“ ascii compact”,“ compact”或“ legacy ready”str类型和字符串,并且在许多情况下,只有“ ascii compact”会被嵌入。


无论如何,不​​同实现之间,同一实现的版本之间,甚至同一实现的同一副本上运行相同代码的时间之间,关于值必须是,可能是或不能不同的规则有所不同。 。

有趣的是值得学习一个特定Python的规则。但是在代码中不值得依赖它们。唯一安全的规则是:

  • 不要编写假定两个相等但分别创建的不可变值相同的代码(不要使用x is y,请使用x == y
  • 不要编写假定两个相等但分别创建的不可变值不同的代码(不要使用x is not y,请使用x != y

或者,换句话说,仅用于is测试已记录的单例(如None)或仅在代码中的一个位置创建的单例(如_sentinel = object()成语)。

There’s another issue that isn’t pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.


For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it’s the 256 single-character Latin-1 strings). For example:

>>> a = ()
>>> b = ()
>>> a is b
True

But also, even non-pre-created values can be identical. Consider these examples:

>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True

And this isn’t limited to int values:

>>> g, h = 42.23e100, 42.23e100
>>> g is h
True

Obviously, CPython doesn’t come with a pre-created float value for 42.23e100. So, what’s going on here?

The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren’t merged. Since e and f are defined in the same statement, their values are merged.


You can see what’s going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you’ll see that there’s a single constant value (128, 128)

>>> def f(): i, j = 258, 258
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])
4305296480, 4305296480, 4305296480

You may notice that the compiler has stored 128 as a constant even though it’s not actually used by the bytecode, which gives you an idea of how little optimization CPython’s compiler does. Which means that (non-empty) tuples actually don’t end up merged:

>>> k, l = (1, 2), (1, 2)
>>> k is l
False

Put that in a function, dis it, and look at the co_consts—there’s a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.


There’s one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn’t restricted to source code literals:

>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True

On the other hand, it is limited to the str type, and to strings of internal storage kind “ascii compact”, “compact”, or “legacy ready”, and in many cases only “ascii compact” will get interned.


At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.

It can be worth learning the rules for one specific Python for the fun of it. But it’s not worth relying on them in your code. The only safe rule is:

  • Do not write code that assumes two equal but separately-created immutable values are identical (don’t use x is y, use x == y)
  • Do not write code that assumes two equal but separately-created immutable values are distinct (don’t use x is not y, use x != y)

Or, in other words, only use is to test for the documented singletons (like None) or that are only created in one place in the code (like the _sentinel = object() idiom).


回答 8

is 身份相等运算符(功能类似于id(a) == id(b));只是两个相等的数字不一定是同一对象。出于性能原因,一些小整数正好会被记住,因此它们往往是相同的(因为它们是不可变的,因此可以这样做)。

===另一方面,PHP的运算符被描述为检查相等性和类型:x == y and type(x) == type(y)根据Paulo Freitas的评论。这足以满足通用数,但不同于以荒谬方式is定义的类__eq__

class Unequal:
    def __eq__(self, other):
        return False

对于“内置”类,PHP显然允许相同的东西(我指的是在C级实现,而不是在PHP中实现)。计时器对象可能有点荒谬,它每次用作数字时,其值都不同。相当为什么要模拟Visual Basic,Now而不是显示它是带有time.time()我不知道。

Greg Hewgill(OP)发表了一条澄清的评论:“我的目标是比较对象标识,而不是价值相等。除了数字,我希望对象标识与价值相等相同。”

这将有另一个答案,因为我们必须将事物归类为数字,以选择是否与==或进行比较isCPython定义数字协议,包括PyNumber_Check,但这不能从Python本身访问。

我们可以尝试使用isinstance所有已知的数字类型,但这不可避免地是不完整的。类型模块包含一个StringTypes列表,但没有NumberTypes。从Python 2.6开始,内置数字类具有基类numbers.Number,但存在相同的问题:

import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)

顺便说一句,NumPy将产生低数字的单独实例。

我实际上不知道这个问题的答案。我想从理论上讲可以使用ctypes进行调用PyNumber_Check,但是即使该函数也已经受到参数,并且肯定不是可移植的。我们只需要对我们目前要测试的内容有所保留。

最后,此问题源于Python最初没有类型树,其谓词如Scheme number?Haskell的 类型类 Numis检查对象身份,而不是值相等。PHP的历史也很悠久,===显然is在PHP5中的对象上起作用,而在PHP4中没有。这就是跨语言(包括一种语言的版本)之间转移的越来越大的痛苦。

is is the identity equality operator (functioning like id(a) == id(b)); it’s just that two equal numbers aren’t necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).

PHP’s === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas’ comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:

class Unequal:
    def __eq__(self, other):
        return False

PHP apparently allows the same thing for “built-in” classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it’s used as a number. Quite why you’d want to emulate Visual Basic’s Now instead of showing that it is an evaluation with time.time() I don’t know.

Greg Hewgill (OP) made one clarifying comment “My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value.”

This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.

We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:

import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)

By the way, NumPy will produce separate instances of low numbers.

I don’t actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it’s certainly not portable. We’ll just have to be less particular about what we test for now.

In the end, this issue stems from Python not originally having a type tree with predicates like Scheme’s number?, or Haskell’s type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).


回答 9

字符串也会发生这种情况:

>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

现在一切似乎都很好。

>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

这也是预期的。

>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)

>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)

现在那是出乎意料的。

It also happens with strings:

>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

Now everything seems fine.

>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

That’s expected too.

>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)

>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)

Now that’s unexpected.


回答 10

Python 3.8的新增功能:Python行为的变化

现在,当将身份检查(和 )与某些类型的文字(例如字符串,整数)一起使用时,编译器会生成SyntaxWarning。这些通常在CPython中偶然地起作用,但是语言规范无法保证。该警告建议用户改用相等性测试( 和)。isis not==!=

What’s New In Python 3.8: Changes in Python behavior:

The compiler now produces a SyntaxWarning when identity checks (is and is not) are used with certain types of literals (e.g. strings, ints). These can often work by accident in CPython, but are not guaranteed by the language spec. The warning advises users to use equality tests (== and !=) instead.


是否在Python 3.6+中订购了字典?

问题:是否在Python 3.6+中订购了字典?

与以前的版本不同,字典在Python 3.6中排序(至少在CPython实现下)。这似乎是一个重大更改,但只是文档中的一小段。它被描述为CPython实现细节而不是语言功能,但这也意味着将来可能会成为标准。

在保留元素顺序的同时,新的字典实现如何比旧的实现更好?

以下是文档中的文字:

dict()现在使用PyPy率先提出的“紧凑”表示形式。与Python 3.5相比,新dict()的内存使用量减少了20%至25%。PEP 468(在函数中保留** kwarg的顺序。)由此实现。此新实现的顺序保留方面被认为是实现细节,因此不应依赖(将来可能会更改,但是希望在更改语言规范之前,先在几个发行版中使用该新dict实现该语言,为所有当前和将来的Python实现强制要求保留顺序的语义;这还有助于保留与仍旧有效的随机迭代顺序的旧版本语言(例如Python 3.5)的向后兼容性。(由INADA Naoki在发行27350最初由Raymond Hettinger提出的想法。)

2017年12月更新:Python 3.7 保证dict保留插入顺序

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it’s only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

How does the new dictionary implementation perform better than the older one while preserving element order?

Here is the text from the documentation:

dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7


回答 0

是否在Python 3.6+中订购了字典?

它们是插入顺序[1]。从Python 3.6开始,对于Python的CPython实现,字典会记住插入项目的顺序这在Python 3.6中被视为实现细节;你需要使用OrderedDict,如果你想多数民众赞成插入排序保证不同的Python的其它实现(与其他有序行为[1] )。

从Python 3.7开始,它不再是实现细节,而是成为一种语言功能。从GvR的py​​thon-dev消息中

做到这一点。裁定“裁定保留插入顺序”。谢谢!

这只是意味着您可以依靠它。如果其他Python实现希望成为Python 3.7的一致实现,则还必须提供插入顺序字典。


在保留元素顺序的同时,Python 3.6字典实现如何比旧的实现更好的性能[2]

本质上,通过保留两个数组

  • 第一个数组,按插入顺序dk_entries保存字典的条目(类型PyDictKeyEntry)。保留顺序是通过仅附加数组来实现的,在该数组中始终在末尾插入新项(插入顺序)。

  • 第二个dk_indices保留dk_entries数组的索引(即,指示中相应条目位置的值dk_entries)。该数组充当哈希表。对键进行哈希处理时,它会导致存储在其中的索引之一,dk_indices并且通过indexing获取相应的条目dk_entries。由于只有索引被保留,此数组的类型取决于字典的整体大小(范围从类型int8_t1字节)到int32_t/ int64_t4/ 8字节)上32/ 64位构建)

在以前的实现中,必须分配类型PyDictKeyEntry和大小的稀疏数组dk_size。不幸的是,由于性能原因,该阵列不允许2/3 * dk_size满载,这也导致了很多空白。(并且空白区域具有大小!)。PyDictKeyEntry

现在不是这种情况,因为仅存储了必需的条目(已插入的条目),并且保留了一个稀疏类型的数组intX_tX取决于dict的大小)2/3 * dk_size。空格从类型更改PyDictKeyEntryintX_t

因此,显然,创建一个类型PyDictKeyEntry稀疏的数组比存储ints 的稀疏数组需要更多的内存。

如果有兴趣,可以在Python-Dev上查看有关此功能的完整对话,这是一本好书。


在Raymond Hettinger提出的原始建议中,可以看到使用的数据结构的可视化效果,该可视化体现了该思想的要旨。

例如,字典:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

当前存储为[keyhash,key,value]:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

相反,数据应按以下方式组织:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

正如您现在可以从视觉上看到的那样,在原始建议中,很多空间实际上是空的,以减少冲突并加快查找速度。使用新方法,可以通过将稀疏移动到真正需要的索引中来减少所需的内存。


[1]:我说“插入有序”而不是“有序”,因为在存在OrderedDict的情况下,“有序”暗示了dict对象不提供的其他行为。OrderedDicts是可逆的,提供顺序敏感的方法,并且主要是,提供一个订单sensive相等测试(==!=)。dict目前不提供任何这些行为/方法。


[2]:新的字典实现通过更紧凑的设计而在内存方面表现更好;这是这里的主要好处。在速度方面,差异并不那么明显,在某些地方,新的dict可能会引入轻微的回归(例如,关键查找),而在其他地方(会想到迭代和调整大小),应该会提高性能。

总体而言,由于引入的紧凑性,字典的性能(尤其是在现实生活中)得以提高。

Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that’s guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

Make it so. “Dict keeps insertion order” is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.


How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it’s really required, in the indices.


[1]: I say “insertion ordered” and not “ordered” since, with the existence of OrderedDict, “ordered” suggests further behavior that the dict object doesn’t provide. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (==, !=). dicts currently don’t offer any of those behaviors/methods.


[2]: The new dictionary implementations performs better memory wise by being designed more compactly; that’s the main benefit here. Speed wise, the difference isn’t so drastic, there’s places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.


回答 1

以下是回答最初的第一个问题:

我应该在Python 3.6中使用dict还是OrderedDict在Python 3.6中使用?

我认为文档中的这句话实际上足以回答您的问题

此新实现的顺序保留方面被视为实现细节,不应依赖于此

dict并不明确表示它是有序集合,因此,如果您要保持一致并且不依赖于新实现的副作用,则应坚持使用OrderedDict

使您的代码成为未来的证明:)

有关于辩论在这里

编辑:Python 3.7将保留此功能, 请参阅

Below is answering the original first question:

Should I use dict or OrderedDict in Python 3.6?

I think this sentence from the documentation is actually enough to answer your question

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

Make your code future proof :)

There’s a debate about that here.

EDIT: Python 3.7 will keep this as a feature see


回答 2

更新:Guido van Rossum 在邮件列表宣布,从 Python 3.7开始dict,所有Python实现中必须保留插入顺序。

Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dicts in all Python implementations must preserve insertion order.


回答 3

我想添加到上面的讨论中,但没有评论的声誉。

Python 3.8尚未发布,但它甚至将包含reversed()字典上的函数(消除了的另一个区别OrderedDict。)。

现在可以使用reversed()以反向插入顺序迭代Dict和dictviews。(由RémiLapeyre在bpo-33462中贡献。) 查看python 3.8的新增功能

我没有提到相等运算符或的其他功能,OrderedDict因此它们仍然不完全相同。

I wanted to add to the discussion above but don’t have the reputation to comment.

Python 3.8 is not quite released yet, but it will even include the reversed() function on dictionaries (removing another difference from OrderedDict.

Dict and dictviews are now iterable in reversed insertion order using reversed(). (Contributed by Rémi Lapeyre in bpo-33462.) See what’s new in python 3.8

I don’t see any mention of the equality operator or other features of OrderedDict so they are still not entirely the same.


@property装饰器如何工作?

问题:@property装饰器如何工作?

我想了解内置函数的property工作原理。令我感到困惑的是,property它还可以用作装饰器,但是仅当用作内置函数时才接受参数,而不能用作装饰器。

这个例子来自文档

class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x
    def setx(self, value):
        self._x = value
    def delx(self):
        del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

property的论点是getxsetxdelx和文档字符串。

在下面的代码中property用作装饰器。它的对象是x函数,但是在上面的代码中,参数中没有对象函数的位置。

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

并且,x.setterx.deleter装饰器是如何创建的?我很困惑。

I would like to understand how the built-in function property works. What confuses me is that property can also be used as a decorator, but it only takes arguments when used as a built-in function and not when used as a decorator.

This example is from the documentation:

class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x
    def setx(self, value):
        self._x = value
    def delx(self):
        del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

property‘s arguments are getx, setx, delx and a doc string.

In the code below property is used as decorator. The object of it is the x function, but in the code above there is no place for an object function in the arguments.

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

And, how are the x.setter and x.deleter decorators created? I am confused.


回答 0

property()函数返回一个特殊的描述符对象

>>> property()
<property object at 0x10ff07940>

正是这种对象有额外的方法:

>>> property().getter
<built-in method getter of property object at 0x10ff07998>
>>> property().setter
<built-in method setter of property object at 0x10ff07940>
>>> property().deleter
<built-in method deleter of property object at 0x10ff07998>

这些充当装饰。他们返回一个新的属性对象:

>>> property().getter(None)
<property object at 0x10ff079f0>

那是旧对象的副本,但是替换了其中一个功能。

请记住,@decorator语法只是语法糖。语法:

@property
def foo(self): return self._foo

确实与

def foo(self): return self._foo
foo = property(foo)

因此foo该函数被替换property(foo),我们在上面看到的是一个特殊的对象。然后,当您使用时@foo.setter(),您正在做的就是调用property().setter上面显示的方法,该方法将返回该属性的新副本,但是这次将setter函数替换为装饰方法。

下面的序列还通过使用那些装饰器方法创建了一个全开属性。

首先,我们property仅使用getter 创建一些函数和一个对象:

>>> def getter(self): print('Get!')
... 
>>> def setter(self, value): print('Set to {!r}!'.format(value))
... 
>>> def deleter(self): print('Delete!')
... 
>>> prop = property(getter)
>>> prop.fget is getter
True
>>> prop.fset is None
True
>>> prop.fdel is None
True

接下来,我们使用该.setter()方法添加setter:

>>> prop = prop.setter(setter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is None
True

最后,我们使用以下.deleter()方法添加删除器:

>>> prop = prop.deleter(deleter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is deleter
True

最后但并非最不重要的一点是,该property对象充当描述符对象,因此它具有和.__get__(),可以.__set__().__delete__()实例属性的获取,设置和删除方法挂钩:

>>> class Foo: pass
... 
>>> prop.__get__(Foo(), Foo)
Get!
>>> prop.__set__(Foo(), 'bar')
Set to 'bar'!
>>> prop.__delete__(Foo())
Delete!

Descriptor Howto包括以下类型的纯Python示例实现property()

class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

The property() function returns a special descriptor object:

>>> property()
<property object at 0x10ff07940>

It is this object that has extra methods:

>>> property().getter
<built-in method getter of property object at 0x10ff07998>
>>> property().setter
<built-in method setter of property object at 0x10ff07940>
>>> property().deleter
<built-in method deleter of property object at 0x10ff07998>

These act as decorators too. They return a new property object:

>>> property().getter(None)
<property object at 0x10ff079f0>

that is a copy of the old object, but with one of the functions replaced.

Remember, that the @decorator syntax is just syntactic sugar; the syntax:

@property
def foo(self): return self._foo

really means the same thing as

def foo(self): return self._foo
foo = property(foo)

so foo the function is replaced by property(foo), which we saw above is a special object. Then when you use @foo.setter(), what you are doing is call that property().setter method I showed you above, which returns a new copy of the property, but this time with the setter function replaced with the decorated method.

The following sequence also creates a full-on property, by using those decorator methods.

First we create some functions and a property object with just a getter:

>>> def getter(self): print('Get!')
... 
>>> def setter(self, value): print('Set to {!r}!'.format(value))
... 
>>> def deleter(self): print('Delete!')
... 
>>> prop = property(getter)
>>> prop.fget is getter
True
>>> prop.fset is None
True
>>> prop.fdel is None
True

Next we use the .setter() method to add a setter:

>>> prop = prop.setter(setter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is None
True

Last we add a deleter with the .deleter() method:

>>> prop = prop.deleter(deleter)
>>> prop.fget is getter
True
>>> prop.fset is setter
True
>>> prop.fdel is deleter
True

Last but not least, the property object acts as a descriptor object, so it has .__get__(), .__set__() and .__delete__() methods to hook into instance attribute getting, setting and deleting:

>>> class Foo: pass
... 
>>> prop.__get__(Foo(), Foo)
Get!
>>> prop.__set__(Foo(), 'bar')
Set to 'bar'!
>>> prop.__delete__(Foo())
Delete!

The Descriptor Howto includes a pure Python sample implementation of the property() type:

class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

回答 1

文档说这只是创建只读属性的捷径。所以

@property
def x(self):
    return self._x

相当于

def getx(self):
    return self._x
x = property(getx)

Documentation says it’s just a shortcut for creating readonly properties. So

@property
def x(self):
    return self._x

is equivalent to

def getx(self):
    return self._x
x = property(getx)

回答 2

这是如何@property实现的最小示例:

class Thing:
    def __init__(self, my_word):
        self._word = my_word 
    @property
    def word(self):
        return self._word

>>> print( Thing('ok').word )
'ok'

否则,将word保留方法而不是属性。

class Thing:
    def __init__(self, my_word):
        self._word = my_word
    def word(self):
        return self._word

>>> print( Thing('ok').word() )
'ok'

Here is a minimal example of how @property can be implemented:

class Thing:
    def __init__(self, my_word):
        self._word = my_word 
    @property
    def word(self):
        return self._word

>>> print( Thing('ok').word )
'ok'

Otherwise word remains a method instead of a property.

class Thing:
    def __init__(self, my_word):
        self._word = my_word
    def word(self):
        return self._word

>>> print( Thing('ok').word() )
'ok'

回答 3

第一部分很简单:

@property
def x(self): ...

是相同的

def x(self): ...
x = property(x)
  • 反过来,这是property仅使用getter 创建a的简化语法。

下一步将使用设置器和删除器扩展此属性。并通过适当的方法来实现:

@x.setter
def x(self, value): ...

返回一个新属性,该属性继承了旧属性x以及给定的setter的所有内容。

x.deleter 以相同的方式工作。

The first part is simple:

@property
def x(self): ...

is the same as

def x(self): ...
x = property(x)
  • which, in turn, is the simplified syntax for creating a property with just a getter.

The next step would be to extend this property with a setter and a deleter. And this happens with the appropriate methods:

@x.setter
def x(self, value): ...

returns a new property which inherits everything from the old x plus the given setter.

x.deleter works the same way.


回答 4

以下内容:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, _x_set, _x_del, 
                    "I'm the 'x' property.")

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, doc="I'm the 'x' property.")
    x = x.setter(_x_set)
    x = x.deleter(_x_del)

是相同的:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x
    x = property(_x_get, doc="I'm the 'x' property.")

    def _x_set(self, value):
        self._x = value
    x = x.setter(_x_set)

    def _x_del(self):
        del self._x
    x = x.deleter(_x_del)

等同于:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

This following:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, _x_set, _x_del, 
                    "I'm the 'x' property.")

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x

    def _x_set(self, value):
        self._x = value

    def _x_del(self):
        del self._x

    x = property(_x_get, doc="I'm the 'x' property.")
    x = x.setter(_x_set)
    x = x.deleter(_x_del)

Is the same as:

class C(object):
    def __init__(self):
        self._x = None

    def _x_get(self):
        return self._x
    x = property(_x_get, doc="I'm the 'x' property.")

    def _x_set(self, value):
        self._x = value
    x = x.setter(_x_set)

    def _x_del(self):
        del self._x
    x = x.deleter(_x_del)

Which is the same as :

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

回答 5

下面是另一个示例,该示例在@property需要重构代码的情况下如何提供帮助(从此处进行总结):

假设您创建了一个Money这样的类:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

并且用户根据他/她使用的此类创建一个库

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

现在,让我们假设您决定更改您的Money类并摆脱dollarscents属性,而是决定仅跟踪总分:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

如果上述用户现在尝试像以前一样运行他/她的库

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))

这会导致错误

AttributeError:“ Money”对象没有属性“ dollars”

也就是说,现在大家谁依赖于原始的手段Money类将不得不改变所有代码行,其中dollarscents使用可以是非常痛苦……那么,怎么会这样避免?通过使用@property

就是那样:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    # Getter and setter for dollars...
    @property
    def dollars(self):
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    # And the getter and setter for cents.
    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

现在我们从图书馆打电话时

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

它会按预期工作,我们不必在库中更改任何代码!实际上,我们甚至不必知道我们依赖的库已更改。

setter可以正常工作:

money.dollars += 2
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 12 cents.

money.cents += 10
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 22 cents.

您也@property可以在抽象类中使用。我在这里举一个最小的例子。

Below is another example on how @property can help when one has to refactor code which is taken from here (I only summarize it below):

Imagine you created a class Money like this:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

and an user creates a library depending on this class where he/she uses e.g.

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

Now let’s suppose you decide to change your Money class and get rid of the dollars and cents attributes but instead decide to only track the total amount of cents:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

If the above mentioned user now tries to run his/her library as before

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))

it will result in an error

AttributeError: ‘Money’ object has no attribute ‘dollars’

That means that now everyone who relies on your original Money class would have to change all lines of code where dollars and cents are used which can be very painful… So, how could this be avoided? By using @property!

That is how:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    # Getter and setter for dollars...
    @property
    def dollars(self):
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    # And the getter and setter for cents.
    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

when we now call from our library

money = Money(27, 12)

print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 27 dollar and 12 cents.

it will work as expected and we did not have to change a single line of code in our library! In fact, we would not even have to know that the library we depend on changed.

Also the setter works fine:

money.dollars += 2
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 12 cents.

money.cents += 10
print("I have {} dollar and {} cents.".format(money.dollars, money.cents))
# prints I have 29 dollar and 22 cents.

You can use @property also in abstract classes; I give a minimal example here.


回答 6

我在这里阅读了所有文章,并意识到我们可能需要一个真实的例子。为什么实际上我们有@property?因此,考虑使用身份验证系统的Flask应用。您在中声明模型用户models.py

class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    password_hash = db.Column(db.String(128))

    ...

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)

在这段代码中,我们password使用了“隐藏”属性,当您尝试直接访问它时@property,该属性会触发AttributeError断言,而我们使用@ property.setter来设置实际的实例变量password_hash

现在,auth/views.py我们可以实例化一个用户:

...
@auth.route('/register', methods=['GET', 'POST'])
def register():
    form = RegisterForm()
    if form.validate_on_submit():
        user = User(email=form.email.data,
                    username=form.username.data,
                    password=form.password.data)
        db.session.add(user)
        db.session.commit()
...

password用户填写表单时,该属性来自注册表单。密码确认发生在前端EqualTo('password', message='Passwords must match')(如果您想知道,但这是与Flask表单相关的其他主题)。

我希望这个例子会有用

I read all the posts here and realized that we may need a real life example. Why, actually, we have @property? So, consider a Flask app where you use authentication system. You declare a model User in models.py:

class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    password_hash = db.Column(db.String(128))

    ...

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)

In this code we’ve “hidden” attribute password by using @property which triggers AttributeError assertion when you try to access it directly, while we used @property.setter to set the actual instance variable password_hash.

Now in auth/views.py we can instantiate a User with:

...
@auth.route('/register', methods=['GET', 'POST'])
def register():
    form = RegisterForm()
    if form.validate_on_submit():
        user = User(email=form.email.data,
                    username=form.username.data,
                    password=form.password.data)
        db.session.add(user)
        db.session.commit()
...

Notice attribute password that comes from a registration form when a user fills the form. Password confirmation happens on the front end with EqualTo('password', message='Passwords must match') (in case if you are wondering, but it’s a different topic related Flask forms).

I hope this example will be useful


回答 7

那里的很多人都清楚了这一点,但这是我一直在寻找的直接点。我觉得从@property装饰器开始很重要。例如:-

class UtilityMixin():
    @property
    def get_config(self):
        return "This is property"

函数“ get_config()”的调用将像这样工作。

util = UtilityMixin()
print(util.get_config)

如果您注意到我没有使用“()”括号来调用该函数。这是我在搜索@property装饰器的基本内容。这样您就可以像使用变量一样使用函数。

This point is been cleared by many people up there but here is a direct point which I was searching. This is what I feel is important to start with the @property decorator. eg:-

class UtilityMixin():
    @property
    def get_config(self):
        return "This is property"

The calling of function “get_config()” will work like this.

util = UtilityMixin()
print(util.get_config)

If you notice I have not used “()” brackets for calling the function. This is the basic thing which I was searching for the @property decorator. So that you can use your function just like a variable.


回答 8

让我们从Python装饰器开始。

Python装饰器是一个函数,可以帮助向已经定义的函数添加一些其他功能。

在Python中,一切都是对象。Python中的函数是一流的对象,这意味着它们可以被变量引用,添加到列表中,作为参数传递给另一个函数等。

考虑以下代码片段。

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

def say_bye():
    print("bye!!")

say_bye = decorator_func(say_bye)
say_bye()

# Output:
#  Wrapper function started
#  bye
#  Given function decorated

在这里,我们可以说装饰器函数修改了我们的say_hello函数,并在其中添加了一些额外的代码行。

装饰器的Python语法

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

@decorator_func
def say_bye():
    print("bye!!")

say_bye()

最后,让我们结束一个案例案例,但在此之前,让我们先讨论一些糟糕的原则。

在许多面向对象的编程语言中都使用getter和setter来确保数据封装的原理(被视为将数据与对这些数据进行操作的方法捆绑在一起)。

这些方法当然是用于获取数据的吸气剂和用于更改数据的设置器。

根据此原理,将一个类的属性设为私有,以隐藏它们并保护它们免受其他代码的侵害。

是的,@ property基本上是使用getter和setterpythonic方法。

Python有一个伟大的概念,称为属性,它使面向对象的程序员的生活变得更加简单。

让我们假设您决定创建一个可以存储摄氏温度的类。

class Celsius:
def __init__(self, temperature = 0):
    self.set_temperature(temperature)

def to_fahrenheit(self):
    return (self.get_temperature() * 1.8) + 32

def get_temperature(self):
    return self._temperature

def set_temperature(self, value):
    if value < -273:
        raise ValueError("Temperature below -273 is not possible")
    self._temperature = value

重构代码,这是我们可以通过属性实现的方法。

在Python中,property()是一个内置函数,可创建并返回属性对象。

属性对象具有三种方法,getter(),setter()和delete()。

class Celsius:
def __init__(self, temperature = 0):
    self.temperature = temperature

def to_fahrenheit(self):
    return (self.temperature * 1.8) + 32

def get_temperature(self):
    print("Getting value")
    return self.temperature

def set_temperature(self, value):
    if value < -273:
        raise ValueError("Temperature below -273 is not possible")
    print("Setting value")
    self.temperature = value

temperature = property(get_temperature,set_temperature)

这里,

temperature = property(get_temperature,set_temperature)

本可以分解为

# make empty property
temperature = property()
# assign fget
temperature = temperature.getter(get_temperature)
# assign fset
temperature = temperature.setter(set_temperature)

注意事项:

  • get_temperature仍然是属性而不是方法。

现在,您可以通过写入来获取温度值。

C = Celsius()
C.temperature
# instead of writing C.get_temperature()

我们可以进一步继续,不要定义名称get_temperatureset_temperature,因为它们是不必要的,并污染类命名空间。

解决上述问题的pythonic方法是使用@property

class Celsius:
    def __init__(self, temperature = 0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    @property
    def temperature(self):
        print("Getting value")
        return self.temperature

    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self.temperature = value

注意事项-

  1. 用于获取值的方法以“ @property”修饰。
  2. 用作设置器的方法用“ @ temperature.setter”修饰,如果该函数被称为“ x”,则必须用“ @ x.setter”修饰。
  3. 我们用相同的名称和不同数量的参数“ def temperature(self)”和“ def temperature(self,x)”编写了“两个”方法。

如您所见,该代码绝对不太优雅。

现在,让我们谈谈一个现实的实用场景。

假设您设计的类如下:

class OurClass:

    def __init__(self, a):
        self.x = a


y = OurClass(10)
print(y.x)

现在,让我们进一步假设我们的类在客户中很受欢迎,并且他们开始在程序中使用它。他们对对象进行了各种分配。

有朝一日,一个值得信赖的客户来找我们,建议“ x”的值必须在0到1000之间,这确实是一个可怕的情况!

由于属性,这很容易:我们创建属性版本“ x”。

class OurClass:

    def __init__(self,x):
        self.x = x

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, x):
        if x < 0:
            self.__x = 0
        elif x > 1000:
            self.__x = 1000
        else:
            self.__x = x

很好,不是吗:您可以从可以想象到的最简单的实现开始,并且以后可以随意迁移到属性版本,而不必更改接口!因此,属性不仅仅是吸气剂和塞特剂的替代品!

您可以在此处检查此实现

Let’s start with Python decorators.

A Python decorator is a function that helps to add some additional functionalities to an already defined function.

In Python, everything is an object. Functions in Python are first-class objects which means that they can be referenced by a variable, added in the lists, passed as arguments to another function etc.

Consider the following code snippet.

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

def say_bye():
    print("bye!!")

say_bye = decorator_func(say_bye)
say_bye()

# Output:
#  Wrapper function started
#  bye
#  Given function decorated

Here, we can say that decorator function modified our say_hello function and added some extra lines of code in it.

Python syntax for decorator

def decorator_func(fun):
    def wrapper_func():
        print("Wrapper function started")
        fun()
        print("Given function decorated")
        # Wrapper function add something to the passed function and decorator 
        # returns the wrapper function
    return wrapper_func

@decorator_func
def say_bye():
    print("bye!!")

say_bye()

Let’s Concluded everything than with a case scenario, but before that let’s talk about some oops priniciples.

Getters and setters are used in many object oriented programming languages to ensure the principle of data encapsulation(is seen as the bundling of data with the methods that operate on these data.)

These methods are of course the getter for retrieving the data and the setter for changing the data.

According to this principle, the attributes of a class are made private to hide and protect them from other code.

Yup, @property is basically a pythonic way to use getters and setters.

Python has a great concept called property which makes the life of an object-oriented programmer much simpler.

Let us assume that you decide to make a class that could store the temperature in degree Celsius.

class Celsius:
def __init__(self, temperature = 0):
    self.set_temperature(temperature)

def to_fahrenheit(self):
    return (self.get_temperature() * 1.8) + 32

def get_temperature(self):
    return self._temperature

def set_temperature(self, value):
    if value < -273:
        raise ValueError("Temperature below -273 is not possible")
    self._temperature = value

Refactored Code, Here is how we could have achieved it with property.

In Python, property() is a built-in function that creates and returns a property object.

A property object has three methods, getter(), setter(), and delete().

class Celsius:
def __init__(self, temperature = 0):
    self.temperature = temperature

def to_fahrenheit(self):
    return (self.temperature * 1.8) + 32

def get_temperature(self):
    print("Getting value")
    return self.temperature

def set_temperature(self, value):
    if value < -273:
        raise ValueError("Temperature below -273 is not possible")
    print("Setting value")
    self.temperature = value

temperature = property(get_temperature,set_temperature)

Here,

temperature = property(get_temperature,set_temperature)

could have been broken down as,

# make empty property
temperature = property()
# assign fget
temperature = temperature.getter(get_temperature)
# assign fset
temperature = temperature.setter(set_temperature)

Point To Note:

  • get_temperature remains a property instead of a method.

Now you can access the value of temperature by writing.

C = Celsius()
C.temperature
# instead of writing C.get_temperature()

We can further go on and not define names get_temperature and set_temperature as they are unnecessary and pollute the class namespace.

The pythonic way to deal with the above problem is to use @property.

class Celsius:
    def __init__(self, temperature = 0):
        self.temperature = temperature

    def to_fahrenheit(self):
        return (self.temperature * 1.8) + 32

    @property
    def temperature(self):
        print("Getting value")
        return self.temperature

    @temperature.setter
    def temperature(self, value):
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self.temperature = value

Points to Note –

  1. A method which is used for getting a value is decorated with “@property”.
  2. The method which has to function as the setter is decorated with “@temperature.setter”, If the function had been called “x”, we would have to decorate it with “@x.setter”.
  3. We wrote “two” methods with the same name and a different number of parameters “def temperature(self)” and “def temperature(self,x)”.

As you can see, the code is definitely less elegant.

Now,let’s talk about one real-life practical scenerio.

Let’s say you have designed a class as follows:

class OurClass:

    def __init__(self, a):
        self.x = a


y = OurClass(10)
print(y.x)

Now, let’s further assume that our class got popular among clients and they started using it in their programs, They did all kinds of assignments to the object.

And One fateful day, a trusted client came to us and suggested that “x” has to be a value between 0 and 1000, this is really a horrible scenario!

Due to properties it’s easy: We create a property version of “x”.

class OurClass:

    def __init__(self,x):
        self.x = x

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, x):
        if x < 0:
            self.__x = 0
        elif x > 1000:
            self.__x = 1000
        else:
            self.__x = x

This is great, isn’t it: You can start with the simplest implementation imaginable, and you are free to later migrate to a property version without having to change the interface! So properties are not just a replacement for getters and setter!

You can check this Implementation here


回答 9

property@property装饰器背后的一类。

您可以随时检查以下内容:

print(property) #<class 'property'>

我改写了示例,help(property)以显示@property语法

class C:
    def __init__(self):
        self._x=None

    @property 
    def x(self):
        return self._x

    @x.setter 
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

c = C()
c.x="a"
print(c.x)

在功能上与property()语法相同:

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, v):
        self._x = v

    def d(self):
        del self._x

    prop = property(g,s,d)

c = C()
c.x="a"
print(c.x)

如您所见,我们使用该属性的方式没有什么不同。

为了回答这个问题,@property装饰器是通过property类实现的。


因此,问题是要对该property类进行一些解释。这行:

prop = property(g,s,d)

是初始化。我们可以这样重写它:

prop = property(fget=g,fset=s,fdel=d)

的含义fgetfsetfdel

 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring

下图显示了我们从类中获得的三胞胎property

在此处输入图片说明

__get____set____delete__那里被覆盖。这是Python中描述符模式的实现。

通常,描述符是具有“绑定行为”的对象属性,其属性访问已被描述符协议中的方法所覆盖。

我们还可以使用属性settergetterdeleter方法的功能绑定属性。检查下一个示例。s2该类的方法C会将属性设置为double

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, x):
        self._x = x

    def d(self):
        del self._x

    def s2(self,x):
        self._x=x+x


    x=property(g)
    x=x.setter(s)
    x=x.deleter(d)      


c = C()
c.x="a"
print(c.x) # outputs "a"

C.x=property(C.g, C.s2)
C.x=C.x.deleter(C.d)
c2 = C()
c2.x="a"
print(c2.x) # outputs "aa"

property is a class behind @property decorator.

You can always check this:

print(property) #<class 'property'>

I rewrote the example from help(property) to show that the @property syntax

class C:
    def __init__(self):
        self._x=None

    @property 
    def x(self):
        return self._x

    @x.setter 
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

c = C()
c.x="a"
print(c.x)

is functionally identical to property() syntax:

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, v):
        self._x = v

    def d(self):
        del self._x

    prop = property(g,s,d)

c = C()
c.x="a"
print(c.x)

There is no difference how we use the property as you can see.

To answer the question @property decorator is implemented via property class.


So, the question is to explain the property class a bit. This line:

prop = property(g,s,d)

Was the initialization. We can rewrite it like this:

prop = property(fget=g,fset=s,fdel=d)

The meaning of fget, fset and fdel:

 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring

The next image shows the triplets we have, from the class property:

enter image description here

__get__, __set__, and __delete__ are there to be overridden. This is the implementation of the descriptor pattern in Python.

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol.

We can also use property setter, getter and deleter methods to bind the function to property. Check the next example. The method s2 of the class C will set the property doubled.

class C:
    def __init__(self):
        self._x=None

    def g(self):
        return self._x

    def s(self, x):
        self._x = x

    def d(self):
        del self._x

    def s2(self,x):
        self._x=x+x


    x=property(g)
    x=x.setter(s)
    x=x.deleter(d)      


c = C()
c.x="a"
print(c.x) # outputs "a"

C.x=property(C.g, C.s2)
C.x=C.x.deleter(C.d)
c2 = C()
c2.x="a"
print(c2.x) # outputs "aa"

回答 10

可以通过两种方式声明属性。

  • 为属性创建getter,setter方法,然后将它们作为参数传递给属性函数
  • 使用@property装饰器。

您可以看一下我编写的有关python属性的一些示例。

A property can be declared in two ways.

  • Creating the getter, setter methods for an attribute and then passing these as argument to property function
  • Using the @property decorator.

You can have a look at few examples I have written about properties in python.


回答 11

最好的解释可以在这里找到:Python @Property Explained –如何使用和何时使用?(完整示例)Selva Prabhakaran | 发表于十一月5,2018

它帮助我理解了为什么不仅如此。

https://www.machinelearningplus.com/python/python-property/

The best explanation can be found here: Python @Property Explained – How to Use and When? (Full Examples) by Selva Prabhakaran | Posted on November 5, 2018

It helped me understand WHY not only HOW.

https://www.machinelearningplus.com/python/python-property/


回答 12

这是另一个示例:

##
## Python Properties Example
##
class GetterSetterExample( object ):
    ## Set the default value for x ( we reference it using self.x, set a value using self.x = value )
    __x = None


##
## On Class Initialization - do something... if we want..
##
def __init__( self ):
    ## Set a value to __x through the getter / setter... Since __x is defined above, this doesn't need to be set...
    self.x = 1234

    return None


##
## Define x as a property, ie a getter - All getters should have a default value arg, so I added it - it will not be passed in when setting a value, so you need to set the default here so it will be used..
##
@property
def x( self, _default = None ):
    ## I added an optional default value argument as all getters should have this - set it to the default value you want to return...
    _value = ( self.__x, _default )[ self.__x == None ]

    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Get x = ' + str( _value ) )

    ## Return the value - we are a getter afterall...
    return _value


##
## Define the setter function for x...
##
@x.setter
def x( self, _value = None ):
    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Set x = ' + str( _value ) )

    ## This is to show the setter function works.... If the value is above 0, set it to a negative value... otherwise keep it as is ( 0 is the only non-negative number, it can't be negative or positive anyway )
    if ( _value > 0 ):
        self.__x = -_value
    else:
        self.__x = _value


##
## Define the deleter function for x...
##
@x.deleter
def x( self ):
    ## Unload the assignment / data for x
    if ( self.__x != None ):
        del self.__x


##
## To String / Output Function for the class - this will show the property value for each property we add...
##
def __str__( self ):
    ## Output the x property data...
    print( '[ x ] ' + str( self.x ) )


    ## Return a new line - technically we should return a string so it can be printed where we want it, instead of printed early if _data = str( C( ) ) is used....
    return '\n'

##
##
##
_test = GetterSetterExample( )
print( _test )

## For some reason the deleter isn't being called...
del _test.x

基本上,与C(object)示例相同,只是我改用x …我也不在__init中初始化 -…很好..我可以,但是可以删除它,因为__x被定义为一部分班上的…

输出为:

[ Test Class ] Set x = 1234
[ Test Class ] Get x = -1234
[ x ] -1234

如果我将init的self.x = 1234注释掉,则输出为:

[ Test Class ] Get x = None
[ x ] None

并且如果我在getter函数中将_default = None设置为_default = 0(因为所有的getter都应具有默认值,但不会被我所看到的属性值传递,因此您可以在此处定义它,以及它实际上还不错,因为您可以定义一次默认值并在所有地方使用它),即:def x(self,_default = 0):

[ Test Class ] Get x = 0
[ x ] 0

注意:getter逻辑只是为了让它操纵值以确保它被操纵-与print语句相同…

注意:我习惯了Lua,并且在调用单个函数时能够动态创建10个以上的助手,并且我在不使用属性的情况下为Python做了类似的事情,并且在一定程度上可以正常工作,但是,即使这些函数是在之前创建的被使用时,在创建它们之前有时仍会调用它们,这很奇怪,因为它不是以这种方式编码的。。。我更喜欢Lua元表的灵活性,而且我可以使用实际的setter / getters。而不是本质上直接访问变量…我确实喜欢用Python可以快速构建某些东西-例如gui程序。尽管没有大量其他库,虽然我正在设计的库可能无法实现-如果我在AutoHotkey中对其进行编码,则可以直接访问所需的dll调用,并且可以在Java,C#,C ++,

注意:此论坛中的代码输出已损坏-我必须在代码的第一部分中添加空格才能使其正常工作-复制/粘贴时,请确保将所有空格都转换为制表符…。我在Python中使用制表符,因为在10,000行的文件大小可以为512KB至1MB(带空格)和100至200KB(带制表符),这在文件大小和减少处理时间方面存在巨大差异。

还可以按用户调整选项卡-因此,如果您希望使用2个空格宽度,4个,8个空格或您可以做的任何事情,这意味着它对于有视力缺陷的开发人员来说是体贴的。

注意:由于论坛软件中的错误,该类中定义的所有功能均未正确缩进-如果复制/粘贴,请确保将其缩进

Here is another example:

##
## Python Properties Example
##
class GetterSetterExample( object ):
    ## Set the default value for x ( we reference it using self.x, set a value using self.x = value )
    __x = None


##
## On Class Initialization - do something... if we want..
##
def __init__( self ):
    ## Set a value to __x through the getter / setter... Since __x is defined above, this doesn't need to be set...
    self.x = 1234

    return None


##
## Define x as a property, ie a getter - All getters should have a default value arg, so I added it - it will not be passed in when setting a value, so you need to set the default here so it will be used..
##
@property
def x( self, _default = None ):
    ## I added an optional default value argument as all getters should have this - set it to the default value you want to return...
    _value = ( self.__x, _default )[ self.__x == None ]

    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Get x = ' + str( _value ) )

    ## Return the value - we are a getter afterall...
    return _value


##
## Define the setter function for x...
##
@x.setter
def x( self, _value = None ):
    ## Debugging - so you can see the order the calls are made...
    print( '[ Test Class ] Set x = ' + str( _value ) )

    ## This is to show the setter function works.... If the value is above 0, set it to a negative value... otherwise keep it as is ( 0 is the only non-negative number, it can't be negative or positive anyway )
    if ( _value > 0 ):
        self.__x = -_value
    else:
        self.__x = _value


##
## Define the deleter function for x...
##
@x.deleter
def x( self ):
    ## Unload the assignment / data for x
    if ( self.__x != None ):
        del self.__x


##
## To String / Output Function for the class - this will show the property value for each property we add...
##
def __str__( self ):
    ## Output the x property data...
    print( '[ x ] ' + str( self.x ) )


    ## Return a new line - technically we should return a string so it can be printed where we want it, instead of printed early if _data = str( C( ) ) is used....
    return '\n'

##
##
##
_test = GetterSetterExample( )
print( _test )

## For some reason the deleter isn't being called...
del _test.x

Basically, the same as the C( object ) example except I’m using x instead… I also don’t initialize in __init – … well.. I do, but it can be removed because __x is defined as part of the class….

The output is:

[ Test Class ] Set x = 1234
[ Test Class ] Get x = -1234
[ x ] -1234

and if I comment out the self.x = 1234 in init then the output is:

[ Test Class ] Get x = None
[ x ] None

and if I set the _default = None to _default = 0 in the getter function ( as all getters should have a default value but it isn’t passed in by the property values from what I’ve seen so you can define it here, and it actually isn’t bad because you can define the default once and use it everywhere ) ie: def x( self, _default = 0 ):

[ Test Class ] Get x = 0
[ x ] 0

Note: The getter logic is there just to have the value be manipulated by it to ensure it is manipulated by it – the same for the print statements…

Note: I’m used to Lua and being able to dynamically create 10+ helpers when I call a single function and I made something similar for Python without using properties and it works to a degree, but, even though the functions are being created before being used, there are still issues at times with them being called prior to being created which is strange as it isn’t coded that way… I prefer the flexibility of Lua meta-tables and the fact I can use actual setters / getters instead of essentially directly accessing a variable… I do like how quickly some things can be built with Python though – for instance gui programs. although one I am designing may not be possible without a lot of additional libraries – if I code it in AutoHotkey I can directly access the dll calls I need, and the same can be done in Java, C#, C++, and more – maybe I haven’t found the right thing yet but for that project I may switch from Python..

Note: The code output in this forum is broken – I had to add spaces to the first part of the code for it to work – when copy / pasting ensure you convert all spaces to tabs…. I use tabs for Python because in a file which is 10,000 lines the filesize can be 512KB to 1MB with spaces and 100 to 200KB with tabs which equates to a massive difference for file size, and reduction in processing time…

Tabs can also be adjusted per user – so if you prefer 2 spaces width, 4, 8 or whatever you can do it meaning it is thoughtful for developers with eye-sight deficits.

Note: All of the functions defined in the class aren’t indented properly because of a bug in the forum software – ensure you indent it if you copy / paste


回答 13

一句话:对我来说,对于Python 2.x,@property当我不继承自object

class A():
    pass

但在以下情况下有效:

class A(object):
    pass

对于Python 3,始终有效。

One remark: for me, for Python 2.x, @property didn’t work as advertised when I didn’t inherit from object:

class A():
    pass

but worked when:

class A(object):
    pass

for Python 3, worked always.


__slots__的用法?

问题:__slots__的用法?

__slots__Python 的目的是什么-尤其是关于何时要使用它,何时不使用它?

What is the purpose of __slots__ in Python — especially with respect to when I would want to use it, and when not?


回答 0

在Python中,目的是__slots__什么?在什么情况下应该避免这种情况?

TLDR:

特殊属性__slots__允许您显式说明您希望对象实例具有哪些实例属性,并具有预期的结果:

  1. 更快的属性访问。
  2. 节省内存空间

节省的空间来自

  1. 将值引用存储在插槽中而不是中__dict__
  2. 如果父类拒绝它们并且您声明,则拒绝__dict____weakref__创建__slots__

快速警告

请注意,您只应在继承树中一次声明一个特定的插槽。例如:

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

遇到错误时,Python不会反对(它应该会),否则问题可能不会显现出来,但是您的对象将比原先占用更多的空间。Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

这是因为基准站的插槽描述符的插槽与错误的插槽分开。通常不应该这样,但是可以:

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'

最大的警告是多重继承-无法将多个“具有非空插槽的父类”组合在一起。

为适应此限制,请遵循最佳做法:排除其父母(或他们的具体类)将共同继承的除一个或所有父代之外的所有抽象-给这些抽象留出空位(就像在父类中的抽象基类一样)标准库)。

有关示例,请参见下面有关多重继承的部分。

要求:

  • 要使名为in的属性__slots__实际上存储在插槽中而不是存储在插槽中__dict__,则类必须从继承object

  • 为防止创建__dict__,您必须继承,object并且继承中的所有类都必须声明,__slots__并且它们都不能具有'__dict__'条目。

如果您想继续阅读,有很多细节。

为什么使用__slots__:更快的属性访问。

Python的创建者Guido van Rossum 指出,他实际上是__slots__为了更快地访问属性而创建的。

证明可观的显着更快访问是微不足道的:

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

在Ubuntu 3.5上的Python 3.5中,插槽式访问的速度几乎快了30%。

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

在Windows上的Python 2中,我测得的速度要快15%。

为何使用__slots__:内存节省

的另一个目的__slots__是减少每个对象实例占用的内存空间。

我自己对文档的贡献清楚地说明了背后的原因

通过使用节省的空间__dict__可能很大。

SQLAlchemy将大量内存节省归因__slots__

为了验证这一点,请在Ubuntu Linux上使用Python 2.7的Anaconda发行版(带有guppy.hpy(又是堆的)和)sys.getsizeof,不__slots__声明且没有其他声明的类实例的大小为64字节。但这包括__dict__。再次感谢Python的惰性求值,在__dict__引用它之前,显然不会调用,但是没有数据的类通常是无用的。当存在时,该__dict__属性另外至少为280个字节。

相反,__slots__声明为()(无数据)的类实例只有16个字节,插槽中有一项的总字节数为56个,插槽中有两项的总数为64个字节。

对于64位Python,我说明了dict在3.6中增长的每个点(0、1和2属性除外)的for __slots____dict__(未定义插槽)在Python 2.7和3.6中以字节为单位的内存消耗:

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272   16         56 + 112 | if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

因此,尽管Python 3中的指令较小,但我们仍然可以看到__slots__实例可以很好地扩展以节省内存,这是您要使用的主要原因__slots__

只是为了完整起见,请注意,在类的命名空间中,每个插槽的一次性成本为Python 2中64字节,而在Python 3中为72字节,因为插槽使用数据描述符(如属性)称为“成员”。

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72

演示__slots__

要拒绝创建__dict__,必须子类化object

class Base(object): 
    __slots__ = ()

现在:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

或子类化另一个定义的类 __slots__

class Child(Base):
    __slots__ = ('a',)

现在:

c = Child()
c.a = 'a'

但:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

要在对有槽位的__dict__对象进行子类化时允许创建,只需添加'__dict__'__slots__(请注意,槽位是有序的,并且您不应重复父类中已经存在的槽位):

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

>>> swd.__dict__
{'c': 'c'}

或者甚至不需要__slots__在子类中声明,并且仍将使用父级的插槽,但不限制创建__dict__

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

和:

>>> ns.__dict__
{'b': 'b'}

但是,__slots__可能会导致多重继承问题:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

由于从具有两个非空插槽的父母创建子类失败:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

如果遇到此问题,则可以将其__slots__从父级中移除,或者如果您可以控制父级,则给他们留空的空位,或重构为抽象:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

添加'__dict__'__slots__以获得动态分配:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

现在:

>>> foo = Foo()
>>> foo.boink = 'boink'

因此,'__dict__'在具有插槽的情况下,我们将失去一些尺寸上的好处,因为它具有动态分配的优势,并且仍具有我们确实期望的名称的插槽。

当您从未插入槽的对象继承时,使用时会得到相同的语义__slots____slots__指向插入槽的值的名称,而其他所有值都放在实例的中__dict__

避免这样做__slots__是因为您不希望出现这种情况,因为它实际上并不是一个很好的理由- 如果需要,只需添加"__dict__"您的属性即可__slots__

如果需要该功能,可以类似地将__weakref____slots__显式添加。

子类化namedtuple时,设置为空tuple:

内置namedtuple使不可变的实例非常轻巧(本质上是元组的大小),但是要获得好处,如果您将它们子类化,则需要自己做:

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

用法:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

尝试分配意外属性会引发,AttributeError因为我们已阻止创建__dict__

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

可以__dict__通过设置off 允许创建__slots__ = (),但是不能__slots__对元组的子类型使用非空。

最大的警告:多重继承

即使多个父级的非空插槽相同,也不能一起使用:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

使用空__slots__父似乎提供了最大的灵活性,允许孩子选择阻止或允许(通过增加'__dict__'获得动态分配,见上面部分)创建的__dict__

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

你不具备有槽-因此,如果您添加它们,后来删除它们,它不应引起任何问题。

走出放在这里肢体:如果您撰写的混入或使用抽象基类,它不打算被实例化,空__slots__在那些父母似乎是在灵活性作为子类方面最好的一段路要走。

为了演示,首先,让我们创建一个我们希望在多重继承下使用的代码的类。

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

我们可以通过继承并声明预期的位置来直接使用以上内容:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

但是我们对此并不在意,这是微不足道的单一继承,我们需要另一个我们也可能继承的类,也许带有嘈杂的属性:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

现在,如果两个基地都有非空插槽,我们将无法进行以下操作。(实际上,如果我们愿意,我们可以给AbstractBase非空槽a和b,并将它们排除在下面的声明之外-将它们留在里面是错误的):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

现在,我们具有通过多重继承的功能,并且仍然可以拒绝__dict____weakref__实例化:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

其他避免插槽的情况:

  • __class__除非插槽布局相同,否则要在不具有它们的另一个类(并且不能添加它们)上执行分配时,请避免使用它们。(我对了解谁在做什么以及为什么这样做很感兴趣。)
  • 如果您想将诸如long,tuple或str之类的可变长度内建子类化,并想为其添加属性,请避免使用它们。
  • 如果您坚持通过实例变量的类属性提供默认值,请避免使用它们。

您也许可以从__slots__ 文档的其余部分(最新的3.7 dev文档)中找出更多的警告,我最近做出了很大的贡献。

对其他答案的批评

当前的最佳答案引用了过时的信息,而且非常容易波动,并且在某些重要方面未达到要求。

不要“仅__slots__在实例化许多对象时使用”

我引用:

__slots__如果要实例化大量(数百个,数千个)同一类的对象,则需要使用。”

例如,来自collections模块的抽象基类未实例化,但__slots__已为其声明。

为什么?

如果用户希望拒绝__dict____weakref__创建,则这些内容在父类中必须不可用。

__slots__ 创建接口或混入时有助于重用。

的确,许多Python用户并不是为可重用性而编写的,但是当您这样做时,可以选择拒绝不必要的空间使用是很有价值的。

__slots__ 不会破坏酸洗

腌制开槽的物体时,您可能会发现它带有误导性的抱怨TypeError

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

这实际上是不正确的。此消息来自最早的协议,这是默认协议。您可以使用-1参数选择最新的协议。在Python 2.7中为2(在2.3中引入),在3.6中为4

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

在Python 2.7中:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

在Python 3.6中

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

所以我会牢记这一点,因为这是一个已解决的问题。

评论(至2016年10月2日)被接受

第一段是一半简短的解释,一半是预测的。这是真正回答问题的唯一部分

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。这样可以为使用插槽的每个对象节省一个指令的开销

后半部分是一厢情愿的想法,并且超出了预期:

尽管有时这是一个有用的优化,但是如果Python解释器足够动态,则仅在实际向对象添加内容时才需要dict,就完全没有必要了。

Python实际上做了类似的事情,只在__dict__访问时创建,但是创建许多没有数据的对象是相当荒谬的。

第二段过分简化,错过了避免的实际原因__slots__。以下不是避免使用插槽的真正原因(出于实际原因,请参阅上面我的回答的其余部分。):

它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。

然后,它继续讨论了使用Python实现该有害目标的其他方法,而不是讨论与之相关的任何方法__slots__

第三段是更多的如意算盘。答案者甚至根本没有写过这些杂乱的内容,而是为该网站的批评者弹药。

内存使用证据

创建一些普通对象和带槽对象:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

实例化其中的一百万:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

检查guppy.hpy().heap()

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

访问常规对象及其对象,__dict__然后再次检查:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

这与Python历史一致,来自Python 2.2中的Unifying类型和类。

如果您将内置类型作为子类,则多余的空间会自动添加到实例中以容纳__dict____weakrefs__。(__dict__尽管直到使用完,它才会被初始化,因此您不必担心空字典为您创建的每个实例所占用的空间。)如果不需要此多余的空间,则可以在短语中添加“ __slots__ = []”你的班。

In Python, what is the purpose of __slots__ and what are the cases one should avoid this?

TLDR:

The special attribute __slots__ allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

  1. faster attribute access.
  2. space savings in memory.

The space savings is from

  1. Storing value references in slots instead of __dict__.
  2. Denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.

Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

Python doesn’t object when you get this wrong (it probably should), problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

This is because the Base’s slot descriptor has a slot separate from the Wrong’s. This shouldn’t usually come up, but it could:

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'

The biggest caveat is for multiple inheritance – multiple “parent classes with nonempty slots” cannot be combined.

To accommodate this restriction, follow best practices: Factor out all but one or all parents’ abstraction which their concrete class respectively and your new concrete class collectively will inherit from – giving the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

Requirements:

  • To have attributes named in __slots__ to actually be stored in slots instead of a __dict__, a class must inherit from object.

  • To prevent the creation of a __dict__, you must inherit from object and all classes in the inheritance must declare __slots__ and none of them can have a '__dict__' entry.

There are a lot of details if you wish to keep reading.

Why use __slots__: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created __slots__ for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

and

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

In Python 2 on Windows I have measured it about 15% faster.

Why use __slots__: Memory Savings

Another purpose of __slots__ is to reduce the space in memory that each object instance takes up.

My own contribution to the documentation clearly states the reasons behind this:

The space saved over using __dict__ can be significant.

SQLAlchemy attributes a lot of memory savings to __slots__.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with guppy.hpy (aka heapy) and sys.getsizeof, the size of a class instance without __slots__ declared, and nothing else, is 64 bytes. That does not include the __dict__. Thank you Python for lazy evaluation again, the __dict__ is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the __dict__ attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with __slots__ declared to be () (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for __slots__ and __dict__ (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

So, in spite of smaller dicts in Python 3, we see how nicely __slots__ scale for instances to save us memory, and that is a major reason you would want to use __slots__.

Just for completeness of my notes, note that there is a one-time cost per slot in the class’s namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called “members”.

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72

Demonstration of __slots__:

To deny the creation of a __dict__, you must subclass object:

class Base(object): 
    __slots__ = ()

now:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

Or subclass another class that defines __slots__

class Child(Base):
    __slots__ = ('a',)

and now:

c = Child()
c.a = 'a'

but:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

To allow __dict__ creation while subclassing slotted objects, just add '__dict__' to the __slots__ (note that slots are ordered, and you shouldn’t repeat slots that are already in parent classes):

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

and

>>> swd.__dict__
{'c': 'c'}

Or you don’t even need to declare __slots__ in your subclass, and you will still use slots from the parents, but not restrict the creation of a __dict__:

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

And:

>>> ns.__dict__
{'b': 'b'}

However, __slots__ may cause problems for multiple inheritance:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

Because creating a child class from parents with both non-empty slots fails:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

If you run into this problem, You could just remove __slots__ from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

Add '__dict__' to __slots__ to get dynamic assignment:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

and now:

>>> foo = Foo()
>>> foo.boink = 'boink'

So with '__dict__' in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn’t slotted, you get the same sort of semantics when you use __slots__ – names that are in __slots__ point to slotted values, while any other values are put in the instance’s __dict__.

Avoiding __slots__ because you want to be able to add attributes on the fly is actually not a good reason – just add "__dict__" to your __slots__ if this is required.

You can similarly add __weakref__ to __slots__ explicitly if you need that feature.

Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin make immutable instances that are very lightweight (essentially, the size of tuples) but to get the benefits, you need to do it yourself if you subclass them:

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

usage:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

And trying to assign an unexpected attribute raises an AttributeError because we have prevented the creation of __dict__:

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

You can allow __dict__ creation by leaving off __slots__ = (), but you can’t use non-empty __slots__ with subtypes of tuple.

Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

Using an empty __slots__ in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding '__dict__' to get dynamic assignment, see section above) the creation of a __dict__:

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

You don’t have to have slots – so if you add them, and remove them later, it shouldn’t cause any problems.

Going out on a limb here: If you’re composing mixins or using abstract base classes, which aren’t intended to be instantiated, an empty __slots__ in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let’s create a class with code we’d like to use under multiple inheritance

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

We could use the above directly by inheriting and declaring the expected slots:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

But we don’t care about that, that’s trivial single inheritance, we need another class we might also inherit from, maybe with a noisy attribute:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

Now if both bases had nonempty slots, we couldn’t do the below. (In fact, if we wanted, we could have given AbstractBase nonempty slots a and b, and left them out of the below declaration – leaving them in would be wrong):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

And now we have functionality from both via multiple inheritance, and can still deny __dict__ and __weakref__ instantiation:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

Other cases to avoid slots:

  • Avoid them when you want to perform __class__ assignment with another class that doesn’t have them (and you can’t add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
  • Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
  • Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the __slots__ documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

Critiques of other answers

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

Do not “only use __slots__ when instantiating lots of objects”

I quote:

“You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class.”

Abstract Base Classes, for example, from the collections module, are not instantiated, yet __slots__ are declared for them.

Why?

If a user wishes to deny __dict__ or __weakref__ creation, those things must not be available in the parent classes.

__slots__ contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren’t writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

__slots__ doesn’t break pickling

When pickling a slotted object, you may find it complains with a misleading TypeError:

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the -1 argument. In Python 2.7 this would be 2 (which was introduced in 2.3), and in 3.6 it is 4.

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

in Python 2.7:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

in Python 3.6

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

So I would keep this in mind, as it is a solved problem.

Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here’s the only part that actually answers the question

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the __dict__ when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid __slots__. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with __slots__.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn’t even author and contributes to ammunition for critics of the site.

Memory usage evidence

Create some normal objects and slotted objects:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

Instantiate a million of them:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

Inspect with guppy.hpy().heap():

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

Access the regular objects and their __dict__ and inspect again:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accomodate __dict__ and __weakrefs__. (The __dict__ is not initialized until you use it though, so you shouldn’t worry about the space occupied by an empty dictionary for each instance you create.) If you don’t need this extra space, you can add the phrase “__slots__ = []” to your class.


回答 1

引用雅各布·哈伦Jacob Hallen)的话

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。[这种使用__slots__消除了每个对象一个字典的开销。]尽管这有时是有用的优化,但是如果Python解释器足够动态,以至仅在实际添加了dict时才需要该字典,则完全没有必要。宾语。

不幸的是,插槽有副作用。它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。这是不好的,因为控制怪胎应该滥用元类,而静态类型之间应该滥用装饰器,因为在Python中,应该只有一种明显的方法。

使CPython足够智能来处理节省的空间__slots__是一项艰巨的任务,这可能就是为什么它不在P3k更改列表中的原因(至今)。

Quoting Jacob Hallen:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. [This use of __slots__ eliminates the overhead of one dict for every object.] While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Unfortunately there is a side effect to slots. They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies. This is bad, because the control freaks should be abusing the metaclasses and the static typing weenies should be abusing decorators, since in Python, there should be only one obvious way of doing something.

Making CPython smart enough to handle saving space without __slots__ is a major undertaking, which is probably why it is not on the list of changes for P3k (yet).


回答 2

你会想使用__slots__,如果你要实例化同一个类的对象很多(几百,几千)。__slots__仅作为内存优化工具存在。

不建议将其__slots__用于约束属性创建。

__slots__使用默认的(最旧的)泡菜协议将不能使用酸洗对象。有必要指定一个更高的版本。

python的其他一些自省功能也可能受到不利影响。

You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class. __slots__ only exists as a memory optimization tool.

It’s highly discouraged to use __slots__ for constraining attribute creation.

Pickling objects with __slots__ won’t work with the default (oldest) pickle protocol; it’s necessary to specify a later version.

Some other introspection features of python may also be adversely affected.


回答 3

每个python对象都有一个__dict__属性,该属性是包含所有其他属性的字典。例如,当您键入self.attrpython时实际上正在执行self.__dict__['attr']。您可以想象使用字典存储属性会花费一些额外的空间和时间来访问它。

但是,当您使用 __slots__,为该类创建的任何对象将没有__dict__属性。相反,所有属性访问都直接通过指针进行。

因此,如果要使用C样式结构而不是完整的类,则可以使用它__slots__来压缩对象的大小并减少属性访问时间。一个很好的例子是一个包含属性x&y的Point类。如果您有很多要点,可以尝试使用__slots__以节省一些内存。

Each python object has a __dict__ atttribute which is a dictionary containing all other attributes. e.g. when you type self.attr python is actually doing self.__dict__['attr']. As you can imagine using a dictionary to store attribute takes some extra space & time for accessing it.

However, when you use __slots__, any object created for that class won’t have a __dict__ attribute. Instead, all attribute access is done directly via pointers.

So if want a C style structure rather than a full fledged class you can use __slots__ for compacting size of the objects & reducing attribute access time. A good example is a Point class containing attributes x & y. If you are going to have a lot of points, you can try using __slots__ in order to conserve some memory.


回答 4

除了其他答案,这是使用的示例__slots__

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

因此,要实现__slots__,只需要多花一行(如果还没有,请将您的类变成新样式的类)。这样,您可以将这些类的内存占用量减少5倍,而在必要时以及必须编写自定义的pickle代码的代价是。

In addition to the other answers, here is an example of using __slots__:

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

So, to implement __slots__, it only takes an extra line (and making your class a new-style class if it isn’t already). This way you can reduce the memory footprint of those classes 5-fold, at the expense of having to write custom pickle code, if and when that becomes necessary.


回答 5

插槽对于库调用非常有用,以消除进行函数调用时的“命名方法分派”。SWIG 文档中提到了这一点。对于想要减少使用插槽的通常称为函数的函数开销的高性能库,速度要快得多。

现在,这可能与OP问题没有直接关系。它与构建扩展有关,而不是与在对象上使用slot语法有关。但这确实有助于完整了解插槽的使用情况以及它们背后的一些原因。

Slots are very useful for library calls to eliminate the “named method dispatch” when making function calls. This is mentioned in the SWIG documentation. For high performance libraries that want to reduce function overhead for commonly called functions using slots is much faster.

Now this may not be directly related to the OPs question. It is related more to building extensions than it does to using the slots syntax on an object. But it does help complete the picture for the usage of slots and some of the reasoning behind them.


回答 6

类实例的属性具有3个属性:实例,属性名称和属性值。

常规属性访问中,实例充当字典,而属性名称充当该字典中查找值的键。

instance(attribute)->值

__slots__访问,属性名称充当字典,实例充当字典中查找值的键。

属性(实例)->值

flyweight模式中,属性的名称充当字典,而值充当该字典中查找实例的键。

attribute(value)->实例

An attribute of a class instance has 3 properties: the instance, the name of the attribute, and the value of the attribute.

In regular attribute access, the instance acts as a dictionary and the name of the attribute acts as the key in that dictionary looking up value.

instance(attribute) –> value

In __slots__ access, the name of the attribute acts as the dictionary and the instance acts as the key in the dictionary looking up value.

attribute(instance) –> value

In flyweight pattern, the name of the attribute acts as the dictionary and the value acts as the key in that dictionary looking up the instance.

attribute(value) –> instance


回答 7

一个非常简单的__slot__属性示例。

问题:无 __slots__

如果__slot__我的Class中没有属性,则可以向对象添加新属性。

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

如果您看上面的示例,可以看到obj1obj2都有自己的xy属性,而python还dict为每个对象创建了一个属性(obj1obj2)。

假设我的类Test是否有成千上万个此类对象?dict为每个对象创建一个附加属性将导致我的代码中大量开销(内存,计算能力等)。

解决方案: __slots__

现在,在下面的示例中,我的类Test包含__slots__属性。现在,我无法向对象添加新属性(attribute除外x),而python不再创建dict属性。这消除了每个对象的开销,如果您有很多对象,开销可能会变得很大。

class Test:
    __slots__=("x")

obj1=Test()
obj2=Test()
obj1.x=12
print(obj1.x)  # --> 12
obj2.x=99
print(obj2.x)  # --> 99

obj1.y=28
print(obj1.y)  # --> AttributeError: 'Test' object has no attribute 'y'

A very simple example of __slot__ attribute.

Problem: Without __slots__

If I don’t have __slot__ attribute in my class, I can add new attributes to my objects.

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

If you look at example above, you can see that obj1 and obj2 have their own x and y attributes and python has also created a dict attribute for each object (obj1 and obj2).

Suppose if my class Test has thousands of such objects? Creating an additional attribute dict for each object will cause lot of overhead (memory, computing power etc.) in my code.

Solution: With __slots__

Now in the following example my class Test contains __slots__ attribute. Now I can’t add new attributes to my objects (except attribute x) and python doesn’t create a dict attribute anymore. This eliminates overhead for each object, which can become significant if you have many objects.

class Test:
    __slots__=("x")

obj1=Test()
obj2=Test()
obj1.x=12
print(obj1.x)  # --> 12
obj2.x=99
print(obj2.x)  # --> 99

obj1.y=28
print(obj1.y)  # --> AttributeError: 'Test' object has no attribute 'y'

回答 8

另一个晦涩的用法__slots__是从ProxyTypes包(以前是PEAK项目的一部分)中向对象代理添加属性。它ObjectWrapper允许您代理另一个对象,但拦截与代理对象的所有交互。它不是很常用(并且没有Python 3支持),但是我们已经使用它来实现基于龙卷风的异步实现周围的线程安全的阻塞包装,该龙卷风使用线程安全来反弹通过ioloop对代理对象的所有访问concurrent.Future对象以同步并返回结果。

默认情况下,对代理对象的任何属性访问都将为您提供代理对象的结果。如果需要在代理对象上添加属性,__slots__则可以使用。

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attributed added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", proxy.proxy_name

Another somewhat obscure use of __slots__ is to add attributes to an object proxy from the ProxyTypes package, formerly part of the PEAK project. Its ObjectWrapper allows you to proxy another object, but intercept all interactions with the proxied object. It is not very commonly used (and no Python 3 support), but we have used it to implement a thread-safe blocking wrapper around an async implementation based on tornado that bounces all access to the proxied object through the ioloop, using thread-safe concurrent.Future objects to synchronise and return results.

By default any attribute access to the proxy object will give you the result from the proxied object. If you need to add an attribute on the proxy object, __slots__ can be used.

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attributed added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", proxy.proxy_name

回答 9

您基本上没有用__slots__

在您认为自己可能需要的时间里__slots__,您实际上想使用轻量级轻量级设计模式。在这些情况下,您不再希望使用纯Python对象。相反,您希望在数组,结构或numpy数组周围使用类似Python对象的包装器。

class Flyweight(object):

    def get(self, theData, index):
        return theData[index]

    def set(self, theData, index, value):
        theData[index]= value

类似于类的包装器没有属性-它仅提供对基础数据起作用的方法。这些方法可以简化为类方法。实际上,可以将其简化为仅对底层数据数组进行操作的函数。

You have — essentially — no use for __slots__.

For the time when you think you might need __slots__, you actually want to use Lightweight or Flyweight design patterns. These are cases when you no longer want to use purely Python objects. Instead, you want a Python object-like wrapper around an array, struct, or numpy array.

class Flyweight(object):

    def get(self, theData, index):
        return theData[index]

    def set(self, theData, index, value):
        theData[index]= value

The class-like wrapper has no attributes — it just provides methods that act on the underlying data. The methods can be reduced to class methods. Indeed, it could be reduced to just functions operating on the underlying array of data.


回答 10

最初的问题是关于一般用例,而不仅仅是内存。因此,在这里应该提到的是,在实例化大量对象时,您也会获得更好的性能 -有趣的是,例如,将大型文档解析为对象或从数据库中解析时。

这是使用插槽和不使用插槽创建具有一百万个条目的对象树的比较。作为参考,在对树使用普通字典时的性能(在OSX上为Py2.7.10):

********** RUN 1 **********
1.96036410332 <class 'css_tree_select.element.Element'>
3.02922606468 <class 'css_tree_select.element.ElementNoSlots'>
2.90828204155 dict
********** RUN 2 **********
1.77050495148 <class 'css_tree_select.element.Element'>
3.10655999184 <class 'css_tree_select.element.ElementNoSlots'>
2.84120798111 dict
********** RUN 3 **********
1.84069895744 <class 'css_tree_select.element.Element'>
3.21540498734 <class 'css_tree_select.element.ElementNoSlots'>
2.59615707397 dict
********** RUN 4 **********
1.75041103363 <class 'css_tree_select.element.Element'>
3.17366290092 <class 'css_tree_select.element.ElementNoSlots'>
2.70941114426 dict

测试类(ident,来自插槽的appart):

class Element(object):
    __slots__ = ['_typ', 'id', 'parent', 'childs']
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)

class ElementNoSlots(object): (same, w/o slots)

测试代码,详细模式:

na, nb, nc = 100, 100, 100
for i in (1, 2, 3, 4):
    print '*' * 10, 'RUN', i, '*' * 10
    # tree with slot and no slot:
    for cls in Element, ElementNoSlots:
        t1 = time.time()
        root = cls('root', 'root')
        for i in xrange(na):
            ela = cls(typ='a', id=i, parent=root)
            for j in xrange(nb):
                elb = cls(typ='b', id=(i, j), parent=ela)
                for k in xrange(nc):
                    elc = cls(typ='c', id=(i, j, k), parent=elb)
        to =  time.time() - t1
        print to, cls
        del root

    # ref: tree with dicts only:
    t1 = time.time()
    droot = {'childs': []}
    for i in xrange(na):
        ela =  {'typ': 'a', id: i, 'childs': []}
        droot['childs'].append(ela)
        for j in xrange(nb):
            elb =  {'typ': 'b', id: (i, j), 'childs': []}
            ela['childs'].append(elb)
            for k in xrange(nc):
                elc =  {'typ': 'c', id: (i, j, k), 'childs': []}
                elb['childs'].append(elc)
    td = time.time() - t1
    print td, 'dict'
    del droot

The original question was about general use cases not only about memory. So it should be mentioned here that you also get better performance when instantiating large amounts of objects – interesting e.g. when parsing large documents into objects or from a database.

Here is a comparison of creating object trees with a million entries, using slots and without slots. As a reference also the performance when using plain dicts for the trees (Py2.7.10 on OSX):

********** RUN 1 **********
1.96036410332 <class 'css_tree_select.element.Element'>
3.02922606468 <class 'css_tree_select.element.ElementNoSlots'>
2.90828204155 dict
********** RUN 2 **********
1.77050495148 <class 'css_tree_select.element.Element'>
3.10655999184 <class 'css_tree_select.element.ElementNoSlots'>
2.84120798111 dict
********** RUN 3 **********
1.84069895744 <class 'css_tree_select.element.Element'>
3.21540498734 <class 'css_tree_select.element.ElementNoSlots'>
2.59615707397 dict
********** RUN 4 **********
1.75041103363 <class 'css_tree_select.element.Element'>
3.17366290092 <class 'css_tree_select.element.ElementNoSlots'>
2.70941114426 dict

Test classes (ident, appart from slots):

class Element(object):
    __slots__ = ['_typ', 'id', 'parent', 'childs']
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)

class ElementNoSlots(object): (same, w/o slots)

testcode, verbose mode:

na, nb, nc = 100, 100, 100
for i in (1, 2, 3, 4):
    print '*' * 10, 'RUN', i, '*' * 10
    # tree with slot and no slot:
    for cls in Element, ElementNoSlots:
        t1 = time.time()
        root = cls('root', 'root')
        for i in xrange(na):
            ela = cls(typ='a', id=i, parent=root)
            for j in xrange(nb):
                elb = cls(typ='b', id=(i, j), parent=ela)
                for k in xrange(nc):
                    elc = cls(typ='c', id=(i, j, k), parent=elb)
        to =  time.time() - t1
        print to, cls
        del root

    # ref: tree with dicts only:
    t1 = time.time()
    droot = {'childs': []}
    for i in xrange(na):
        ela =  {'typ': 'a', id: i, 'childs': []}
        droot['childs'].append(ela)
        for j in xrange(nb):
            elb =  {'typ': 'b', id: (i, j), 'childs': []}
            ela['childs'].append(elb)
            for k in xrange(nc):
                elc =  {'typ': 'c', id: (i, j, k), 'childs': []}
                elb['childs'].append(elc)
    td = time.time() - t1
    print td, 'dict'
    del droot

为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

问题:为什么在Python 3中“范围(1000000000000000(1000000000000001))”这么快?

据我了解,该range()函数实际上是Python 3中的一种对象类型,它会生成器一样动态生成其内容。

在这种情况下,我本以为下一行会花费过多的时间,因为要确定1个四舍五入是否在该范围内,必须生成一个四舍五入值:

1000000000000000 in range(1000000000000001)

此外:似乎无论我添加多少个零,计算多少都花费相同的时间(基本上是瞬时的)。

我也尝试过这样的事情,但是计算仍然是即时的:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

如果我尝试实现自己的范围函数,结果将不是很好!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

使range()物体如此之快的物体在做什么?


选择Martijn Pieters的答案是因为它的完整性,但也看到了abarnert的第一个答案,它很好地讨论了在Python 3中range成为完整序列的含义,以及一些有关__contains__跨Python实现的函数优化潜在不一致的信息/警告。。abarnert的其他答案更加详细,并为那些对Python 3优化背后的历史(以及xrangePython 2中缺乏优化)感兴趣的人提供了链接。pokewim的答案感兴趣的人提供了相关的C源代码和说明。

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1000000000000000 in range(1000000000000001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

If I try to implement my own range function, the result is not so nice!!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?


Martijn Pieters’ answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.


回答 0

Python 3 range()对象不会立即产生数字。它是一个智能序列对象,可按需生成数字。它包含的只是您的开始,结束和步长值,然后在对对象进行迭代时,每次迭代都会计算下一个整数。

该对象还实现了object.__contains__hook,并计算您的电话号码是否在其范围内。计算是一个(近)恒定时间运算*。永远不需要扫描范围内的所有可能整数。

range()对象文档中

所述的优点range类型通过常规listtuple是一个范围对象将始终以相同的内存(小)数量,无论它代表的范围内的大小(因为它仅存储startstopstep值,计算各个项目和子范围如所须)。

因此,您的range()对象至少可以做到:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

这仍然缺少实际range()支持的几项内容(例如.index().count()方法,哈希,相等性测试或切片),但应该可以给您一个提示。

我还简化了__contains__实现,只专注于整数测试。如果您为实物range()提供非整数值(包括的子类int),则会启动慢速扫描以查看是否存在匹配项,就好像您对所有包含的值的列表使用了包含测试一样。这样做是为了继续支持其他数字类型,这些数字类型恰好支持使用整数进行相等性测试,但也不希望同时支持整数算术。请参阅实现收容测试的原始Python问题


* 由于Python整数是无界的,所以时间接近恒定,因此数学运算也随着N的增长而及时增长,这使其成为O(log N)运算。由于所有操作均以优化的C代码执行,并且Python将整数值存储在30位块中,因此,由于此处涉及的整数大小,您会用光内存,然后再看到任何性能影响。

The Python 3 range() object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range(object):
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('Index out of range: {}'.format(i))

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.


* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.


回答 1

此处的根本误解是认为range是生成器。不是。实际上,它不是任何迭代器。

您可以很容易地说出这一点:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

如果它是一个生成器,则对其进行一次迭代将耗尽它:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

什么range实际上是,是一个序列,就像一个列表。您甚至可以测试一下:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

这意味着它必须遵循成为序列的所有规则:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

一个之间的差range和一list在于,range动态序列; 它不记得所有的价值,它只是记住它startstopstep,并根据需要创建的值__getitem__

(作为一个旁注,如果您使用print(iter(a)),则会注意到range使用与相同的listiterator类型list。它是如何工作的?A 除了listiterator使用listC的C实现这一事实外,没有使用任何其他特殊方法__getitem__,因此对于range太。)


现在,没有什么可以说Sequence.__contains__必须是恒定时间的-实际上,对于类似的明显示例list,事实并非如此。但是没有什么可以说是不可能的。与range.__contains__(val - start) % step实际进行计算和测试所有值相比,仅对其进行数学检查(,但具有一些额外的复杂性来处理否定步骤)要容易实现,那么为什么这样做会更好呢?

但是似乎没有什么语言可以保证会发生这种情况。正如Ashwini Chaudhari指出的那样,如果您给它提供一个非整数值,而不是转换为整数并进行数学测试,它将落到对所有值进行迭代并逐一进行比较的过程中。不仅因为CPython 3.2+和PyPy 3.x版本恰好包含此优化,而且这是一个显而易见的好主意且易于实现,所以Iron Iron或NewKickAssPython 3.x没有理由不能放弃它。(实际上,CPython 3.0-3.1 并未包含它。)


如果range实际上是一个生成器(如)my_crappy_range,那么以__contains__这种方式进行测试就没有意义,或者至少有一种合理的方式并不明显。如果您已经迭代了前三个值,那么生成器1仍然in是吗?测试是否应该1使其迭代并消耗所有值1(或直到第一个值>= 1)?

The fundamental misunderstanding here is in thinking that range is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

What range actually is, is a sequence, just like a list. You can even test this:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

This means it has to follow all the rules of being a sequence:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you’ll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn’t use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)


Now, there’s nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact CPython 3.0-3.1 didn’t include it.)


If range actually were a generator, like my_crappy_range, then it wouldn’t make sense to test __contains__ this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?


回答 2

使用消息来源,卢克!

在CPython中,range(...).__contains__(方法包装器)最终将委托给一个简单的计算,该计算将检查该值是否可以在该范围内。速度之所以如此,是因为我们使用关于边界的数学推理,而不是range对象的直接迭代。解释所使用的逻辑:

  1. 检查数字在start和之间stop,以及
  2. 检查步幅值是否不会“超过”我们的数字。

例如,994range(4, 1000, 2)因为:

  1. 4 <= 994 < 1000
  2. (994 - 4) % 2 == 0

完整的C代码包含在下面,由于内存管理和引用计数的详细信息,因此较为冗长,但这里存在基本思想:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

该行的“实质”在该行中提到:

/* result = ((int(ob) - start) % step) == 0 */ 

最后一点-查看range_contains代码段底部的函数。如果确切的类型检查失败,那么我们将不使用描述的巧妙算法,而是使用_PySequence_IterSearch!退回到该范围的愚蠢迭代搜索。您可以在解释器中检查此行为(我在这里使用v3.5.0):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn’t “step over” our number.

For example, 994 is in range(4, 1000, 2) because:

  1. 4 <= 994 < 1000, and
  2. (994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The “meat” of the idea is mentioned in the line:

/* result = ((int(ob) - start) % step) == 0 */ 

As a final note – look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

回答 3

为了补充Martijn的答案,这是源代码的相关部分(在C中,因为range对象是用本机代码编写的):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

因此对于PyLong对象(int在Python 3中是),它将使用该range_contains_long函数确定结果。该函数实际上检查是否ob在指定范围内(尽管在C语言中看起来更复杂)。

如果不是int对象,它将退回到迭代,直到找到(或没有)值为止。

整个逻辑可以像这样转换为伪Python:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

回答 4

如果您想知道为什么将此优化添加到range.__contains__,以及为什么将其添加到xrange.__contains__2.7:

首先,正如Ashwini Chaudhary所发现的, 发行1766304已明确打开以进行优化[x]range.__contains__接受了此修补程序并签入了3.2版本,但没有回迁到2.7版本,因为“ xrange表现得如此之久,以至于我看不到它为什么让我们提交最新的修补程序。” (当时2.7快要淘汰了。)

与此同时:

最初xrange是一个非相当序列的对象。如 3.1文档所说:

范围对象的行为很少:它们仅支持索引,迭代和 len功能。

这不是真的。一个xrange对象实际支持,与索引和自动出现一些其他的东西len *包括__contains__(通过线性搜索)。但是,没有人认为有必要在那时将它们完整地序列化。

然后,作为实现抽象基类 PEP的一部分,重要的是弄清楚应将哪些内置类型标记为实现哪些ABC和xrange/ range声称实现collections.Sequence,即使它仍仅处理相同的“非常少的行为”。在发布9213之前,没有人注意到这个问题。该问题的补丁不仅增加indexcount3.2的range,它也重新工作的优化__contains__(共享相同的数学index,并直接使用count)。** 此更改也适用于3.2,并且没有回移植到2.x,因为“这是一个添加了新方法的错误修正”。(此时,2.7已经超过了rc状态。)

因此,有两次机会可以将此优化回溯到2.7,但都被拒绝了。


*实际上,您甚至可以单独使用索引免费获得迭代,但是在2.3 xrange对象中获得了自定义迭代器。

**第一个版本实际上是重新实现了它,并且弄错了细节-例如,它将给您MyIntSubclass(2) in range(5) == False。但是Daniel Stutzbach的补丁更新版本恢复了以前的大部分代码,包括对通用代码的后备支持,_PySequence_IterSearchrange.__contains__在不应用优化的情况下会缓慢地降低3.2 之前版本的隐式使用。

If you’re wondering why this optimization was added to range.__contains__, and why it wasn’t added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because “xrange has behaved like this for such a long time that I don’t see what it buys us to commit the patch this late.” (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn’t quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same “very little behavior”. Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2’s range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because “it’s a bugfix that adds new methods”. (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.


* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach’s updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn’t apply.


回答 5

其他答案已经很好地说明了这一点,但是我想提供另一个实验来说明范围对象的性质:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

如您所见,范围对象是一个记住其范围的对象,可以多次使用(即使在其上进行迭代),而不仅仅是一次生成器。

The other answers explained it well already, but I’d like to offer another experiment illustrating the nature of range objects:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))

0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.


回答 6

这是关于一个偷懒的办法来评估和一些额外的优化range。直到实际使用时才需要计算范围内的值,或者由于额外的优化甚至不需要进一步计算。

顺便说一句,您的整数不是那么大,请考虑 sys.maxsize

sys.maxsize in range(sys.maxsize) 相当快

由于优化-比较给定的整数和范围的最小值和最大值很容易。

但:

Decimal(sys.maxsize) in range(sys.maxsize) 很慢

(在这种情况下, range,因此,如果python收到意外的Decimal,则python将比较所有数字)

您应该了解实现细节,但不应依赖它,因为将来可能会改变。

It’s all about a lazy approach to the evaluation and some extra optimization of range. Values in ranges don’t need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not such big, consider sys.maxsize

sys.maxsize in range(sys.maxsize) is pretty fast

due to optimization – it’s easy to compare given integer just with min and max of range.

but:

Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.

(in this case, there is no optimization in range, so if python receives unexpected Decimal, python will compare all numbers)

You should be aware of an implementation detail but should not be relied upon, because this may change in the future.


回答 7

TL; DR

传回的物件range()实际上是range对象。该对象实现了迭代器接口,因此您可以按顺序迭代其值,就像生成器,列表或元组一样。

但是,它实现了__contains__接口,该接口实际上是当对象出现在in操作员右侧时调用的接口。该__contains__()方法返回a bool左侧项目是否in在对象中。由于range对象知道其边界和步幅,因此在O(1)中非常容易实现。

TL;DR

The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface which is actually what gets called when an object appears on the right hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand-side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).


回答 8

  1. 由于优化,将给定的整数与最小和最大范围进行比较非常容易。
  2. 在Python3 中range()函数之所以如此之快,是因为这里我们对边界使用数学推理,而不是范围对象的直接迭代。
  3. 所以在这里解释逻辑:
    • 检查数字是否在开始和停止之间。
    • 检查步长精度值是否不超过我们的数字。
  4. 例如,997在range(4,1000,3)内是因为:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.

  1. Due to optimization, it is very easy to compare given integers just with min and max range.
  2. The reason that range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
  3. So for explaining the logic here:
    • Check whether the number is between the start and stop.
    • Check whether the step precision value doesn’t go over our number.
  4. Take an example, 997 is in range(4, 1000, 3) because:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.


回答 9

尝试x-1 in (i for i in range(x))使用较大的x值,该值使用生成器理解来避免调用range.__contains__优化。

Try x-1 in (i for i in range(x)) for large x values, which uses a generator comprehension to avoid invoking the range.__contains__ optimisation.