为什么迭代一小串字符串比一小串列表慢?

问题:为什么迭代一小串字符串比一小串列表慢?

我在玩timeit时发现,对小字符串进行简单的列表理解要比对小字符串列表进行相同的操作花费的时间更长。有什么解释吗?时间几乎是原来的1.35倍。

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

导致此情况的较低级别发生了什么?

I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single character strings. Any explanation? It’s almost 1.35 times as much time.

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

What’s happening on a lower level that’s causing this?


回答 0

TL; DR

  • 对于Python 2,一旦消除了很多开销,实际的速度差异就会接近70%(或更高)。

  • 对象创建没有错。这两种方法都不会创建新对象,因为会缓存一个字符的字符串。

  • 区别并不明显,但可能是由于对类型和格式正确的字符串索引进行了大量检查而造成的。由于很有必要检查返回的商品,因此很有可能。

  • 列表索引非常快。



>>> python3 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.388 usec per loop

>>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.436 usec per loop

这与您发现的内容不同…

然后,您必须使用Python 2。

>>> python2 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.212 usec per loop

让我们解释两个版本之间的区别。我将检查编译后的代码。

对于Python 3:

import dis

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   4           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
#>>>               3 LOAD_CONST               2 ('list_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('a')
#>>>              12 LOAD_CONST               4 ('b')
#>>>              15 LOAD_CONST               5 ('c')
#>>>              18 BUILD_LIST               3
#>>>              21 GET_ITER
#>>>              22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              25 POP_TOP
#>>>              26 LOAD_CONST               0 (None)
#>>>              29 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>  21           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('abc')
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

您会在此处看到,由于每次都建立列表,列表变体可能会变慢。

这是

 9 LOAD_CONST   3 ('a')
12 LOAD_CONST   4 ('b')
15 LOAD_CONST   5 ('c')
18 BUILD_LIST   3

部分。字符串变体仅具有

 9 LOAD_CONST   3 ('abc')

您可以检查一下是否确实有所不同:

def string_iterate():
    [item for item in ("a", "b", "c")]

dis.dis(string_iterate)
#>>>  35           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               6 (('a', 'b', 'c'))
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

这产生了

 9 LOAD_CONST               6 (('a', 'b', 'c'))

因为元组是不可变的。测试:

>>> python3 -m timeit '[x for x in ("a", "b", "c")]'
1000000 loops, best of 3: 0.369 usec per loop

太好了,赶快行动吧。

对于Python 2:

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('a')
#>>>               6 LOAD_CONST               2 ('b')
#>>>               9 LOAD_CONST               3 ('c')
#>>>              12 BUILD_LIST               3
#>>>              15 GET_ITER            
#>>>         >>   16 FOR_ITER                12 (to 31)
#>>>              19 STORE_FAST               0 (item)
#>>>              22 LOAD_FAST                0 (item)
#>>>              25 LIST_APPEND              2
#>>>              28 JUMP_ABSOLUTE           16
#>>>         >>   31 POP_TOP             
#>>>              32 LOAD_CONST               0 (None)
#>>>              35 RETURN_VALUE        

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('abc')
#>>>               6 GET_ITER            
#>>>         >>    7 FOR_ITER                12 (to 22)
#>>>              10 STORE_FAST               0 (item)
#>>>              13 LOAD_FAST                0 (item)
#>>>              16 LIST_APPEND              2
#>>>              19 JUMP_ABSOLUTE            7
#>>>         >>   22 POP_TOP             
#>>>              23 LOAD_CONST               0 (None)
#>>>              26 RETURN_VALUE        

奇怪的是,我们具有相同的列表构建,但是这样做的速度仍然更快。Python 2的运行速度异常快。

让我们删除理解和重新计时。这_ =是为了防止它被优化。

>>> python3 -m timeit '_ = ["a", "b", "c"]'
10000000 loops, best of 3: 0.0707 usec per loop

>>> python3 -m timeit '_ = "abc"'
100000000 loops, best of 3: 0.0171 usec per loop

我们可以看到初始化不足以说明版本之间的差异(这些数字很小)!因此,我们可以得出结论,Python 3的理解速度较慢。随着Python 3将理解方式更改为具有更安全的作用域,这才有意义。

好吧,现在提高基准(我只是删除不是迭代的开销)。这通过预先分配来删除迭代器的构建:

>>> python3 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.387 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
1000000 loops, best of 3: 0.368 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
10000000 loops, best of 3: 0.164 usec per loop

我们可以检查调用iter是否是开销:

>>> python3 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.099 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.1 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.0913 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.0854 usec per loop

不,不是。差别太小,尤其是对于Python 3。

因此,让我们降低整体速度,从而消除更多不必要的开销!目的是使迭代时间更长,从而节省时间。

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 3.12 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.77 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 2.32 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.09 msec per loop

这实际上并没有太大变化,但有所帮助。

因此,消除理解。开销并不是问题的一部分:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.71 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 1.36 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.27 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 935 usec per loop

这还差不多!通过使用deque迭代,我们仍然可以稍微快一些。基本上是一样的,但是速度更快

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop
>>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 805 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop

令我印象深刻的是,Unicode在字节串方面具有竞争力。我们可以通过尝试在bytesunicode两者中进行显式检查:

  • bytes

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)'                                                                    :(
    1000 loops, best of 3: 571 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127))                 for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 757 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127))                 for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 438 usec per loop
    

    在这里,您可以看到Python 3实际上比Python 2 更快

  • unicode

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(   chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 800 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [   chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 1.07 msec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 469 usec per loop
    

    同样,Python 3更快,尽管这是可以预料的(str在Python 3中引起了很多关注)。

实际上,这unicodebytes差异很小,令人印象深刻。

因此,让我们分析一下这种情况,因为它对我来说既快速又方便:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

实际上,我们可以排除蒂姆·彼得(Tim Peter)提出10次支持的答案!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

这些不是新对象!

但这值得一提:索引成本。区别可能在于索引,因此删除迭代并仅索引:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

差异似乎很小,但是至少一半的成本是间接费用:

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

因此,速度差足以决定对此负责。我认为。

那么为什么索引列表这么快呢?

好吧,我会回来给你这一点,但我的猜测是的是倒在支票实习字符串(或缓存的字符,如果它是一个独立的机构)。这将不如最佳速度快。但是我会去检查源代码(尽管我对C语言不太满意):)。


所以这是来源:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

从顶部走,我们将进行一些检查。这些无聊。然后一些分配,这也应该很无聊。第一个有趣的行是

ch = PyUnicode_READ(kind, data, index);

但是我们希望这很快,因为我们正在通过索引从连续的C数组读取数据。结果ch小于256,因此我们将在中返回缓存的字符get_latin1_char(ch)

因此,我们将运行(删除第一个检查)

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

哪里

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)),            \
     ((PyASCIIObject *)(op))->state.kind)

(这很无聊,因为断言在调试中会被忽略(因此我可以检查它们是否很快),((PyASCIIObject *)(op))->state.kind)并且(我认为)是间接调用和C级强制转换);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) :   \
     _PyUnicode_NONCOMPACT_DATA(op))

(由于类似的原因,这也很无聊,假设宏(Something_CAPITALIZED)都很快),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
    ((kind) == PyUnicode_1BYTE_KIND ? \
        ((const Py_UCS1 *)(data))[(index)] : \
        ((kind) == PyUnicode_2BYTE_KIND ? \
            ((const Py_UCS2 *)(data))[(index)] : \
            ((const Py_UCS4 *)(data))[(index)] \
        ) \
    ))

(涉及索引,但实际上一点也不慢),并且

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

这证实了我的怀疑:

  • 这被缓存:

    PyObject *unicode = unicode_latin1[ch];
  • 这应该很快。在if (!unicode)没有运行,所以它是在这种情况下相当于字面上

    PyObject *unicode = unicode_latin1[ch];
    Py_INCREF(unicode);
    return unicode;
    

坦白地说,在测试asserts 之后(通过禁用它们[我认为它可以在C级断言上运行…]),只有看起来很慢的部分是:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

哪个是:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(和以前一样快),

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(如果宏IS_ASCII很快,则很快),以及

#define _PyUnicode_NONCOMPACT_DATA(op)                  \
    (assert(((PyUnicodeObject*)(op))->data.any),        \
     ((((PyUnicodeObject *)(op))->data.any)))

(因为它是断言,间接寻址和强制转换,因此速度也很快)。

因此,我们进入(兔子洞)以:

PyUnicode_IS_ASCII

这是

#define PyUnicode_IS_ASCII(op)                   \
    (assert(PyUnicode_Check(op)),                \
     assert(PyUnicode_IS_READY(op)),             \
     ((PyASCIIObject*)op)->state.ascii)

嗯…似乎也很快…


好吧,但让我们将其与进行比较PyList_GetItem。(是的,感谢蒂姆·彼得斯(Tim Peters)为我提供了更多的工作要做:P。)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

我们可以看到,在非错误情况下,这只会运行:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op) -> ob_item[i]

哪里PyList_Check

#define PyList_Check(op) \
     PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)

TABS!TABS !!!)(issue215875分钟内修复并合并。就像…是的。该死的。他们让Skeet感到羞耻。

#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
#define PyType_FastSubclass(t,f)  PyType_HasFeature(t,f)
#ifdef Py_LIMITED_API
#define PyType_HasFeature(t,f)  ((PyType_GetFlags(t) & (f)) != 0)
#else
#define PyType_HasFeature(t,f)  (((t)->tp_flags & (f)) != 0)
#endif

因此,除非Py_LIMITED_API启用,否则通常这确实是微不足道的(两个间接调用和几个布尔检查)……???

然后是索引和强制转换(((PyListObject *)op) -> ob_item[i]),我们完成了。

因此,对列表检查肯定会更少,并且速度差异很小肯定意味着它可能是相关的。


我认为通常来说,(->)Unicode的类型检查和间接性更多。似乎我遗漏了一点,但是

TL;DR

  • The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.

  • Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.

  • The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.

  • List indexing is remarkably fast.



>>> python3 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.388 usec per loop

>>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.436 usec per loop

This disagrees with what you’ve found…

You must be using Python 2, then.

>>> python2 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.212 usec per loop

Let’s explain the difference between the versions. I’ll examine the compiled code.

For Python 3:

import dis

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   4           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
#>>>               3 LOAD_CONST               2 ('list_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('a')
#>>>              12 LOAD_CONST               4 ('b')
#>>>              15 LOAD_CONST               5 ('c')
#>>>              18 BUILD_LIST               3
#>>>              21 GET_ITER
#>>>              22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              25 POP_TOP
#>>>              26 LOAD_CONST               0 (None)
#>>>              29 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>  21           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('abc')
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

You see here that the list variant is likely to be slower due to the building of the list each time.

This is the

 9 LOAD_CONST   3 ('a')
12 LOAD_CONST   4 ('b')
15 LOAD_CONST   5 ('c')
18 BUILD_LIST   3

part. The string variant only has

 9 LOAD_CONST   3 ('abc')

You can check that this does seem to make a difference:

def string_iterate():
    [item for item in ("a", "b", "c")]

dis.dis(string_iterate)
#>>>  35           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               6 (('a', 'b', 'c'))
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

This produces just

 9 LOAD_CONST               6 (('a', 'b', 'c'))

as tuples are immutable. Test:

>>> python3 -m timeit '[x for x in ("a", "b", "c")]'
1000000 loops, best of 3: 0.369 usec per loop

Great, back up to speed.

For Python 2:

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('a')
#>>>               6 LOAD_CONST               2 ('b')
#>>>               9 LOAD_CONST               3 ('c')
#>>>              12 BUILD_LIST               3
#>>>              15 GET_ITER            
#>>>         >>   16 FOR_ITER                12 (to 31)
#>>>              19 STORE_FAST               0 (item)
#>>>              22 LOAD_FAST                0 (item)
#>>>              25 LIST_APPEND              2
#>>>              28 JUMP_ABSOLUTE           16
#>>>         >>   31 POP_TOP             
#>>>              32 LOAD_CONST               0 (None)
#>>>              35 RETURN_VALUE        

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('abc')
#>>>               6 GET_ITER            
#>>>         >>    7 FOR_ITER                12 (to 22)
#>>>              10 STORE_FAST               0 (item)
#>>>              13 LOAD_FAST                0 (item)
#>>>              16 LIST_APPEND              2
#>>>              19 JUMP_ABSOLUTE            7
#>>>         >>   22 POP_TOP             
#>>>              23 LOAD_CONST               0 (None)
#>>>              26 RETURN_VALUE        

The odd thing is that we have the same building of the list, but it’s still faster for this. Python 2 is acting strangely fast.

Let’s remove the comprehensions and re-time. The _ = is to prevent it getting optimised out.

>>> python3 -m timeit '_ = ["a", "b", "c"]'
10000000 loops, best of 3: 0.0707 usec per loop

>>> python3 -m timeit '_ = "abc"'
100000000 loops, best of 3: 0.0171 usec per loop

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense as Python 3 changed comprehensions to have safer scoping.

Well, now improve the benchmark (I’m just removing overhead that isn’t iteration). This removes the building of the iterable by pre-assigning it:

>>> python3 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.387 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
1000000 loops, best of 3: 0.368 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
10000000 loops, best of 3: 0.164 usec per loop

We can check if calling iter is the overhead:

>>> python3 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.099 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.1 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.0913 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.0854 usec per loop

No. No it is not. The difference is too small, especially for Python 3.

So let’s remove yet more unwanted overhead… by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 3.12 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.77 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 2.32 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.09 msec per loop

This hasn’t actually changed much, but it’s helped a little.

So remove the comprehension. It’s overhead that’s not part of the question:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.71 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 1.36 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.27 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 935 usec per loop

That’s more like it! We can get slightly faster still by using deque to iterate. It’s basically the same, but it’s faster:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop
>>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 805 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying bytes and unicode in both:

  • bytes

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)'                                                                    :(
    1000 loops, best of 3: 571 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127))                 for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 757 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127))                 for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 438 usec per loop
    

    Here you see Python 3 actually faster than Python 2.

  • unicode

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(   chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 800 usec per loop
    
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [   chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 1.07 msec per loop
    
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 469 usec per loop
    

    Again, Python 3 is faster, although this is to be expected (str has had a lot of attention in Python 3).

In fact, this unicodebytes difference is very small, which is impressive.

So let’s analyse this one case, seeing as it’s fast and convenient for me:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

We can actually rule out Tim Peter’s 10-times-upvoted answer!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

These are not new objects!

But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

The difference seems small, but at least half of the cost is overhead:

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

so the speed difference is sufficient to decide to blame it. I think.

So why is indexing a list so much faster?

Well, I’ll come back to you on that, but my guess is that’s is down to the check for interned strings (or cached characters if it’s a separate mechanism). This will be less fast than optimal. But I’ll go check the source (although I’m not comfortable in C…) :).


So here’s the source:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

Walking from the top, we’ll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

ch = PyUnicode_READ(kind, data, index);

but we’d hope that is fast, as we’re reading from a contiguous C array by indexing it. The result, ch, will be less than 256 so we’ll return the cached character in get_latin1_char(ch).

So we’ll run (dropping the first checks)

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

Where

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)),            \
     ((PyASCIIObject *)(op))->state.kind)

(which is boring because asserts get ignored in debug [so I can check that they’re fast] and ((PyASCIIObject *)(op))->state.kind) is (I think) an indirection and a C-level cast);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) :   \
     _PyUnicode_NONCOMPACT_DATA(op))

(which is also boring for similar reasons, assuming the macros (Something_CAPITALIZED) are all fast),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
    ((kind) == PyUnicode_1BYTE_KIND ? \
        ((const Py_UCS1 *)(data))[(index)] : \
        ((kind) == PyUnicode_2BYTE_KIND ? \
            ((const Py_UCS2 *)(data))[(index)] : \
            ((const Py_UCS4 *)(data))[(index)] \
        ) \
    ))

(which involves indexes but really isn’t slow at all) and

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

Which confirms my suspicion that:

  • This is cached:

    PyObject *unicode = unicode_latin1[ch];
    
  • This should be fast. The if (!unicode) is not run, so it’s literally equivalent in this case to

    PyObject *unicode = unicode_latin1[ch];
    Py_INCREF(unicode);
    return unicode;
    

Honestly, after testing the asserts are fast (by disabling them [I think it works on the C-level asserts…]), the only plausibly-slow parts are:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

Which are:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(fast, as before),

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(fast if the macro IS_ASCII is fast), and

#define _PyUnicode_NONCOMPACT_DATA(op)                  \
    (assert(((PyUnicodeObject*)(op))->data.any),        \
     ((((PyUnicodeObject *)(op))->data.any)))

(also fast as it’s an assert plus an indirection plus a cast).

So we’re down (the rabbit hole) to:

PyUnicode_IS_ASCII

which is

#define PyUnicode_IS_ASCII(op)                   \
    (assert(PyUnicode_Check(op)),                \
     assert(PyUnicode_IS_READY(op)),             \
     ((PyASCIIObject*)op)->state.ascii)

Hmm… that seems fast too…


Well, OK, but let’s compare it to PyList_GetItem. (Yeah, thanks Tim Peters for giving me more work to do :P.)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

We can see that on non-error cases this is just going to run:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op) -> ob_item[i]

Where PyList_Check is

#define PyList_Check(op) \
     PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)

(TABS! TABS!!!) (issue21587) That got fixed and merged in 5 minutes. Like… yeah. Damn. They put Skeet to shame.

#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
#define PyType_FastSubclass(t,f)  PyType_HasFeature(t,f)
#ifdef Py_LIMITED_API
#define PyType_HasFeature(t,f)  ((PyType_GetFlags(t) & (f)) != 0)
#else
#define PyType_HasFeature(t,f)  (((t)->tp_flags & (f)) != 0)
#endif

So this is normally really trivial (two indirections and a couple of boolean checks) unless Py_LIMITED_API is on, in which case… ???

Then there’s the indexing and a cast (((PyListObject *)op) -> ob_item[i]) and we’re done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.


I think in general, there’s just more type-checking and indirection (->) for Unicode. It seems I’m missing a point, but what?


回答 1

当您遍历大多数容器对象(列表,元组,字典,…)时,迭代器会容器中传递对象。

但是,当您遍历字符串时,必须为传递的每个字符创建一个对象-字符串不是“容器”,就如同列表是容器一样。在迭代创建对象之前,字符串中的各个字符不作为不同的对象存在。

When you iterate over most container objects (lists, tuples, dicts, …), the iterator delivers the objects in the container.

But when you iterate over a string, a new object has to be created for each character delivered – a string is not “a container” in the same sense a list is a container. The individual characters in a string don’t exist as distinct objects before iteration creates those objects.


回答 2

创建字符串的迭代器可能会招致麻烦。而数组在实例化时已经包含一个迭代器。

编辑:

>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

这是使用2.7运行的,但是在我的Mac book pro i7上。这可能是系统配置不同的结果。

You could be incurring and overhead for creating the iterator for the string. Whereas the array already contains an iterator upon instantiation.

EDIT:

>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

This was ran using 2.7, but on my mac book pro i7. This could be the result of a system configuration difference.


Python垃圾收集器文档

问题:Python垃圾收集器文档

我正在寻找详细描述python垃圾回收如何工作的文档。

我对在哪个步骤中完成操作很感兴趣。这三个集合中有哪些对象?在每个步骤中删除哪些对象?参考循环使用什么算法?

背景:我正在实施一些必须在短时间内完成的搜索。当垃圾收集器开始收集最旧的一代时,它比其他情况“慢很多”。它花费了比计划搜索更多的时间。我正在寻找如何预测何时收集最古老的一代以及需要多长时间。

很容易预测何时使用get_count()和收集最老的一代get_threshold()。也可以使用进行操纵set_threshold()。但是我看不出collect()用武力做出更好的决定或等待预定的收集会多么容易。

I’m looking for documents that describes in details how python garbage collection works.

I’m interested what is done in which step. What objects are in these 3 collections? What kinds of objects are deleted in each step? What algorithm is used for reference cycles finding?

Background: I’m implementing some searches that have to finish in small amount of time. When the garbage collector starts collecting the oldest generation, it is “much” slower than in other cases. It took more time than it is intended for searches. I’m looking how to predict when it will collect oldest generation and how long it will take.

It is easy to predict when it will collect oldest generation with get_count() and get_threshold(). That also can be manipulated with set_threshold(). But I don’t see how easy to decide is it better to make collect() by force or wait for scheduled collection.


回答 0

没有关于Python如何进行垃圾收集的明确资源(除了源代码本身),但是这3个链接应该给您一个很好的主意。

更新资料

来源实际上很有帮助。从中获得多少取决于您对C的理解程度,但是注释实际上非常有帮助。跳到该collect()功能,注释会很好地解释该过程(尽管在技术上非常严格)。

There’s no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments are actually very helpful. Skip down to the collect() function and the comments explain the process well (albeit in very technical terms).


__init __()是否应该调用父类的__init __()?

问题:__init __()是否应该调用父类的__init __()?

我在Objective-C中使用过这种结构:

- (void)init {
    if (self = [super init]) {
        // init class
    }
    return self;
}

Python是否还应该为调用父类的实现__init__

class NewClass(SomeOtherClass):
    def __init__(self):
        SomeOtherClass.__init__(self)
        # init class

对于__new__()和也是正确/错误__del__()吗?

编辑:有一个非常类似的问题:Python中的继承和重写__init__

I’m used that in Objective-C I’ve got this construct:

- (void)init {
    if (self = [super init]) {
        // init class
    }
    return self;
}

Should Python also call the parent class’s implementation for __init__?

class NewClass(SomeOtherClass):
    def __init__(self):
        SomeOtherClass.__init__(self)
        # init class

Is this also true/false for __new__() and __del__()?

Edit: There’s a very similar question: Inheritance and Overriding __init__ in Python


回答 0

在Python中,调用超类__init__是可选的。如果调用它,那么使用super标识符还是显式命名超类也是可选的:

object.__init__(self)

对于对象,由于super方法为空,因此不必严格要求调用super方法。相同__del__

另一方面,对于__new__,您确实应该调用super方法,并将其return用作新创建的对象-除非您明确希望返回其他内容。

In Python, calling the super-class’ __init__ is optional. If you call it, it is then also optional whether to use the super identifier, or whether to explicitly name the super class:

object.__init__(self)

In case of object, calling the super method is not strictly necessary, since the super method is empty. Same for __del__.

On the other hand, for __new__, you should indeed call the super method, and use its return as the newly-created object – unless you explicitly want to return something different.


回答 1

如果__init__除了在当前类中正在执行的操作之外,还需要从super 进行操作,则__init__,必须自己调用它,因为这不会自动发生。但是,如果您不需要super的__init__,任何东西,则无需调用它。例:

>>> class C(object):
        def __init__(self):
            self.b = 1


>>> class D(C):
        def __init__(self):
            super().__init__() # in Python 2 use super(D, self).__init__()
            self.a = 1


>>> class E(C):
        def __init__(self):
            self.a = 1


>>> d = D()
>>> d.a
1
>>> d.b  # This works because of the call to super's init
1
>>> e = E()
>>> e.a
1
>>> e.b  # This is going to fail since nothing in E initializes b...
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    e.b  # This is going to fail since nothing in E initializes b...
AttributeError: 'E' object has no attribute 'b'

__del__是相同的方式(但要警惕依赖于__del__完成-请考虑通过with语句代替)。

我很少使用__new__. 所有初始化方法__init__.

If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__, you must call it yourself, since that will not happen automatically. But if you don’t need anything from super’s __init__, no need to call it. Example:

>>> class C(object):
        def __init__(self):
            self.b = 1


>>> class D(C):
        def __init__(self):
            super().__init__() # in Python 2 use super(D, self).__init__()
            self.a = 1


>>> class E(C):
        def __init__(self):
            self.a = 1


>>> d = D()
>>> d.a
1
>>> d.b  # This works because of the call to super's init
1
>>> e = E()
>>> e.a
1
>>> e.b  # This is going to fail since nothing in E initializes b...
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    e.b  # This is going to fail since nothing in E initializes b...
AttributeError: 'E' object has no attribute 'b'

__del__ is the same way, (but be wary of relying on __del__ for finalization – consider doing it via the with statement instead).

I rarely use __new__. I do all the initialization in __init__.


回答 2

在Anon的回答中:
“如果__init__除了在当前类中所做的事情之外,还需要从super 进行一些事情__init__,则必须自己调用它,因为这不会自动发生”

令人难以置信:他的措辞与继承原则完全相反。


不是说“ super __init__ (…)中的某事不会自动发生”,而是它会自动发生,但不会发生,因为__init__派生类的定义覆盖了基类。__init__

那么,为什么要定义一个named_class’ __init__,因为它会覆盖有人诉诸继承时的目标?

这是因为需要定义一些在基类中未完成的事情__init__,而获得该结果的唯一可能性是将其执行置于派生类的__init__函数中。
换句话说,如果在基类__init____init__没有被覆盖,除了在基类会自动完成的事情外,还需要在基类做些什么。
并非相反。


然后,问题是__init__在实例化时不再激活存在于基类中的所需指令。为了抵消这种失活,需要做一些特殊的事情:显式调用基类’ __init__,以便保留基类执行的初始化,而不是添加__init__。这就是官方文档中所说的:

实际上,派生类中的重写方法可能想扩展而不是简单地替换相同名称的基类方法。有一种直接调用基类方法的简单方法:只需调用BaseClassName.methodname(self,arguments)。
http://docs.python.org/tutorial/classes.html#inheritance

这就是全部故事:

  • 当目标是保留基类执行的初始化(即纯继承)时,不需要任何特殊操作,必须避免__init__在派生类中定义一个函数

  • 当目的是替换由基类执行的初始化时,__init__必须在派生类中定义

  • 当目标是将过程添加到由基类执行的初始化时,__init__ 必须定义一个派生类,包括对基类的显式调用__init__


在Anon的职位上,我感到惊讶的不仅是他表达了与继承理论相反的事实,而且还有5个人绕过那个被推崇而又不掉头的家伙,而且在过去的2年中,没有人反应一个线程,其有趣的主题必须相对频繁地阅读。

In Anon’s answer:
“If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__ , you must call it yourself, since that will not happen automatically”

It’s incredible: he is wording exactly the contrary of the principle of inheritance.


It is not that “something from super’s __init__ (…) will not happen automatically” , it is that it WOULD happen automatically, but it doesn’t happen because the base-class’ __init__ is overriden by the definition of the derived-clas __init__

So then, WHY defining a derived_class’ __init__ , since it overrides what is aimed at when someone resorts to inheritance ??

It’s because one needs to define something that is NOT done in the base-class’ __init__ , and the only possibility to obtain that is to put its execution in a derived-class’ __init__ function.
In other words, one needs something in base-class’ __init__ in addition to what would be automatically done in the base-classe’ __init__ if this latter wasn’t overriden.
NOT the contrary.


Then, the problem is that the desired instructions present in the base-class’ __init__ are no more activated at the moment of instantiation. In order to offset this inactivation, something special is required: calling explicitly the base-class’ __init__ , in order to KEEP , NOT TO ADD, the initialization performed by the base-class’ __init__ . That’s exactly what is said in the official doc:

An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call BaseClassName.methodname(self, arguments).
http://docs.python.org/tutorial/classes.html#inheritance

That’s all the story:

  • when the aim is to KEEP the initialization performed by the base-class, that is pure inheritance, nothing special is needed, one must just avoid to define an __init__ function in the derived class

  • when the aim is to REPLACE the initialization performed by the base-class, __init__ must be defined in the derived-class

  • when the aim is to ADD processes to the initialization performed by the base-class, a derived-class’ __init__ must be defined , comprising an explicit call to the base-class __init__


What I feel astonishing in the post of Anon is not only that he expresses the contrary of the inheritance theory, but that there have been 5 guys passing by that upvoted without turning a hair, and moreover there have been nobody to react in 2 years in a thread whose interesting subject must be read relatively often.


回答 3

编辑:(在代码更改之后)
我们无法告诉您是否需要调用父母的__init__(或任何其他函数)。继承显然可以在没有这种调用的情况下工作。这完全取决于代码的逻辑:例如,如果所有__init__操作都在父类中完成,则可以完全跳过子类__init__

考虑以下示例:

>>> class A:
    def __init__(self, val):
        self.a = val


>>> class B(A):
    pass

>>> class C(A):
    def __init__(self, val):
        A.__init__(self, val)
        self.a += val


>>> A(4).a
4
>>> B(5).a
5
>>> C(6).a
12

Edit: (after the code change)
There is no way for us to tell you whether you need or not to call your parent’s __init__ (or any other function). Inheritance obviously would work without such call. It all depends on the logic of your code: for example, if all your __init__ is done in parent class, you can just skip child-class __init__ altogether.

consider the following example:

>>> class A:
    def __init__(self, val):
        self.a = val


>>> class B(A):
    pass

>>> class C(A):
    def __init__(self, val):
        A.__init__(self, val)
        self.a += val


>>> A(4).a
4
>>> B(5).a
5
>>> C(6).a
12

回答 4

没有硬性规定。类的文档应指出子类是否应调用超类方法。有时您想完全替换超类行为,而有时又要增强它-即在超类调用之前和/或之后调用您自己的代码。

更新:相同的基本逻辑适用于任何方法调用。构造函数有时需要特别考虑(因为它们经常设置确定行为的状态)和析构函数,因为它们并行构造函数(例如,在资源分配(例如数据库连接)中)。但是,对于render()小部件的方法可能也是如此。

进一步更新:什么是OPP?你是说OOP吗?否-一个子类经常需要知道一些关于超类的设计。不是内部实现细节-而是超类与其客户(使用类)所拥有的基本契约。这丝毫不违反OOP原则。这就是为什么protected在OOP中通常是一个有效的概念的原因(尽管在Python中当然不是)。

There’s no hard and fast rule. The documentation for a class should indicate whether subclasses should call the superclass method. Sometimes you want to completely replace superclass behaviour, and at other times augment it – i.e. call your own code before and/or after a superclass call.

Update: The same basic logic applies to any method call. Constructors sometimes need special consideration (as they often set up state which determines behaviour) and destructors because they parallel constructors (e.g. in the allocation of resources, e.g. database connections). But the same might apply, say, to the render() method of a widget.

Further update: What’s the OPP? Do you mean OOP? No – a subclass often needs to know something about the design of the superclass. Not the internal implementation details – but the basic contract that the superclass has with its clients (using classes). This does not violate OOP principles in any way. That’s why protected is a valid concept in OOP in general (though not, of course, in Python).


回答 5

海事组织,你应该给它打电话。如果您的超类是object,则不应这样做,但在其他情况下,我认为不调用它是一种exceptions。正如其他人已经回答的那样,如果您的类甚至不必重写__init__自身,例如在没有(其他)内部状态要初始化的情况下,这将非常方便。

IMO, you should call it. If your superclass is object, you should not, but in other cases I think it is exceptional not to call it. As already answered by others, it is very convenient if your class doesn’t even have to override __init__ itself, for example when it has no (additional) internal state to initialize.


回答 6

是的,您应该始终__init__显式调用基类,这是一种良好的编码习惯。忘记执行此操作可能会导致细微的问题或运行时错误。即使__init__不接受任何参数也是如此。这与其他语言不同,在其他语言中,编译器会为您隐式调用基类构造函数。Python不会那样做!

始终调用基类的主要原因_init__是基类通常可以创建成员变量并将其初始化为默认值。因此,如果不调用基类init,则不会执行任何代码,并且最终会得到没有成员变量的基类。

范例

class Base:
  def __init__(self):
    print('base init')

class Derived1(Base):
  def __init__(self):
    print('derived1 init')

class Derived2(Base):
  def __init__(self):
    super(Derived2, self).__init__()
    print('derived2 init')

print('Creating Derived1...')
d1 = Derived1()
print('Creating Derived2...')
d2 = Derived2()

打印..

Creating Derived1...
derived1 init
Creating Derived2...
base init
derived2 init

运行此代码

Yes, you should always call base class __init__ explicitly as a good coding practice. Forgetting to do this can cause subtle issues or run time errors. This is true even if __init__ doesn’t take any parameters. This is unlike other languages where compiler would implicitly call base class constructor for you. Python doesn’t do that!

The main reason for always calling base class _init__ is that base class may typically create member variable and initialize them to defaults. So if you don’t call base class init, none of that code would be executed and you would end up with base class that has no member variables.

Example:

class Base:
  def __init__(self):
    print('base init')

class Derived1(Base):
  def __init__(self):
    print('derived1 init')

class Derived2(Base):
  def __init__(self):
    super(Derived2, self).__init__()
    print('derived2 init')

print('Creating Derived1...')
d1 = Derived1()
print('Creating Derived2...')
d2 = Derived2()

This prints..

Creating Derived1...
derived1 init
Creating Derived2...
base init
derived2 init

Run this code.


迭代对应于Python中列表的字典键值

问题:迭代对应于Python中列表的字典键值

使用Python 2.7。我有一本字典,其中以球队名称为关键,对每支球队得分并允许的奔跑次数作为值列表:

NL_East = {'Phillies': [645, 469], 'Braves': [599, 548], 'Mets': [653, 672]}

我希望能够将字典提供给函数并遍历每个团队(键)。

这是我正在使用的代码。现在,我只能逐队参加。我将如何遍历每个团队并为每个团队打印预期的win_percentage?

def Pythag(league):
    runs_scored = float(league['Phillies'][0])
    runs_allowed = float(league['Phillies'][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

谢谢你的帮助。

Working in Python 2.7. I have a dictionary with team names as the keys and the amount of runs scored and allowed for each team as the value list:

NL_East = {'Phillies': [645, 469], 'Braves': [599, 548], 'Mets': [653, 672]}

I would like to be able to feed the dictionary into a function and iterate over each team (the keys).

Here’s the code I’m using. Right now, I can only go team by team. How would I iterate over each team and print the expected win_percentage for each team?

def Pythag(league):
    runs_scored = float(league['Phillies'][0])
    runs_allowed = float(league['Phillies'][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Thanks for any help.


回答 0

您有几种选择可以遍历字典。

如果迭代字典本身(for team in league),则将迭代字典的键。当使用for循环进行循环时,无论您是在dict(league)本身上循环还是在以下情况下,行为都是相同的league.keys()

for team in league.keys():
    runs_scored, runs_allowed = map(float, league[team])

您还可以通过迭代遍历键和值一次league.items()

for team, runs in league.items():
    runs_scored, runs_allowed = map(float, runs)

您甚至可以在迭代时执行元组拆包:

for team, (runs_scored, runs_allowed) in league.items():
    runs_scored = float(runs_scored)
    runs_allowed = float(runs_allowed)

You have several options for iterating over a dictionary.

If you iterate over the dictionary itself (for team in league), you will be iterating over the keys of the dictionary. When looping with a for loop, the behavior will be the same whether you loop over the dict (league) itself, or league.keys():

for team in league.keys():
    runs_scored, runs_allowed = map(float, league[team])

You can also iterate over both the keys and the values at once by iterating over league.items():

for team, runs in league.items():
    runs_scored, runs_allowed = map(float, runs)

You can even perform your tuple unpacking while iterating:

for team, (runs_scored, runs_allowed) in league.items():
    runs_scored = float(runs_scored)
    runs_allowed = float(runs_allowed)

回答 1

您也可以很容易地遍历字典:

for team, scores in NL_East.iteritems():
    runs_scored = float(scores[0])
    runs_allowed = float(scores[1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print '%s: %.1f%%' % (team, win_percentage)

You can very easily iterate over dictionaries, too:

for team, scores in NL_East.iteritems():
    runs_scored = float(scores[0])
    runs_allowed = float(scores[1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print '%s: %.1f%%' % (team, win_percentage)

回答 2

字典具有一个称为的内置函数iterkeys()

尝试:

for team in league.iterkeys():
    runs_scored = float(league[team][0])
    runs_allowed = float(league[team][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Dictionaries have a built in function called iterkeys().

Try:

for team in league.iterkeys():
    runs_scored = float(league[team][0])
    runs_allowed = float(league[team][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

回答 3

字典对象允许您迭代其项目。此外,通过模式匹配和__future__可以使事情稍微简化。

最后,您可以将逻辑从打印中分离出来,以使事情在以后的重构/调试中更加容易。

from __future__ import division

def Pythag(league):
    def win_percentages():
        for team, (runs_scored, runs_allowed) in league.iteritems():
            win_percentage = round((runs_scored**2) / ((runs_scored**2)+(runs_allowed**2))*1000)
            yield win_percentage

    for win_percentage in win_percentages():
        print win_percentage

Dictionary objects allow you to iterate over their items. Also, with pattern matching and the division from __future__ you can do simplify things a bit.

Finally, you can separate your logic from your printing to make things a bit easier to refactor/debug later.

from __future__ import division

def Pythag(league):
    def win_percentages():
        for team, (runs_scored, runs_allowed) in league.iteritems():
            win_percentage = round((runs_scored**2) / ((runs_scored**2)+(runs_allowed**2))*1000)
            yield win_percentage

    for win_percentage in win_percentages():
        print win_percentage

回答 4

列表理解可以缩短内容…

win_percentages = [m**2.0 / (m**2.0 + n**2.0) * 100 for m, n in [a[i] for i in NL_East]]

List comprehension can shorten things…

win_percentages = [m**2.0 / (m**2.0 + n**2.0) * 100 for m, n in [a[i] for i in NL_East]]

Emacs适用于Python的批量缩进

问题:Emacs适用于Python的批量缩进

如果我想在代码块中添加try / except,则在Emacs中使用Python,我经常发现我必须逐行缩进整个代码块。在Emacs中,如何立即缩进整个块。

我不是经验丰富的Emacs用户,但是发现它是通过ssh工作的最佳工具。我在命令行(Ubuntu)上使用Emacs,而不是作为gui,如果有什么不同的话。

Working with Python in Emacs if I want to add a try/except to a block of code, I often find that I am having to indent the whole block, line by line. In Emacs, how do you indent the whole block at once.

I am not an experienced Emacs user, but just find it is the best tool for working through ssh. I am using Emacs on the command line(Ubuntu), not as a gui, if that makes any difference.


回答 0

如果您正在使用Emacs编程Python,那么您可能应该使用python-mode。使用python-mode,在标记代码块之后,

C-c >C-c C-l 将区域右移4个空格

C-c <C-c C-r 将区域向左移动4个空格

如果您需要将代码缩进两个级别,或者需要一定程度的缩编,则可以在命令前加上一个参数:

C-u 8 C-c > 将区域右移8个空格

C-u 8 C-c < 将区域向左移动8个空格

另一种选择是使用M-x indent-rigidly绑定到C-x TAB

C-u 8 C-x TAB 将区域右移8个空格

C-u -8 C-x TAB 将区域向左移动8个空格

用于文本矩形而不是文本行的矩形命令也很有用。

例如,在标记矩形区域后,

C-x r o 插入空格以填充矩形区域(有效地向右移动代码)

C-x r k 杀死矩形区域(有效地将代码向左移动)

C-x r t提示输入一个字符串来替换矩形。输入C-u 8 <space>后将输入8个空格。

PS。使用Ubuntu,要将python-mode设置为所有.py文件的默认模式,只需安装该python-mode软件包。

If you are programming Python using Emacs, then you should probably be using python-mode. With python-mode, after marking the block of code,

C-c > or C-c C-l shifts the region 4 spaces to the right

C-c < or C-c C-r shifts the region 4 spaces to the left

If you need to shift code by two levels of indention, or some arbitary amount you can prefix the command with an argument:

C-u 8 C-c > shifts the region 8 spaces to the right

C-u 8 C-c < shifts the region 8 spaces to the left

Another alternative is to use M-x indent-rigidly which is bound to C-x TAB:

C-u 8 C-x TAB shifts the region 8 spaces to the right

C-u -8 C-x TAB shifts the region 8 spaces to the left

Also useful are the rectangle commands that operate on rectangles of text instead of lines of text.

For example, after marking a rectangular region,

C-x r o inserts blank space to fill the rectangular region (effectively shifting code to the right)

C-x r k kills the rectangular region (effectively shifting code to the left)

C-x r t prompts for a string to replace the rectangle with. Entering C-u 8 <space> will then enter 8 spaces.

PS. With Ubuntu, to make python-mode the default mode for all .py files, simply install the python-mode package.


回答 1

除了默认情况下indent-region映射到C-M-\的,矩形编辑命令对Python很有用。将区域标记为正常,然后:

  • C-x r tstring-rectangle):将提示您输入要插入每行的字符;非常适合插入一定数量的空格
  • C-x r kkill-rectangle):删除矩形区域;非常适合去除压痕

您也可以C-x r yyank-rectangle),但这很少有用。

In addition to indent-region, which is mapped to C-M-\ by default, the rectangle edit commands are very useful for Python. Mark a region as normal, then:

  • C-x r t (string-rectangle): will prompt you for characters you’d like to insert into each line; great for inserting a certain number of spaces
  • C-x r k (kill-rectangle): remove a rectangle region; great for removing indentation

You can also C-x r y (yank-rectangle), but that’s only rarely useful.


回答 2

indent-region映射C-M-\应该可以解决问题。

indent-region mapped to C-M-\ should do the trick.


回答 3

我一直在使用此功能来处理缩进和缩进:

(defun unindent-dwim (&optional count-arg)
  "Keeps relative spacing in the region.  Unindents to the next multiple of the current tab-width"
  (interactive)
  (let ((deactivate-mark nil)
        (beg (or (and mark-active (region-beginning)) (line-beginning-position)))
        (end (or (and mark-active (region-end)) (line-end-position)))
        (min-indentation)
        (count (or count-arg 1)))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (add-to-list 'min-indentation (current-indentation))
        (forward-line)))
    (if (< 0 count)
        (if (not (< 0 (apply 'min min-indentation)))
            (error "Can't indent any more.  Try `indent-rigidly` with a negative arg.")))
    (if (> 0 count)
        (indent-rigidly beg end (* (- 0 tab-width) count))
      (let (
            (indent-amount
             (apply 'min (mapcar (lambda (x) (- 0 (mod x tab-width))) min-indentation))))
        (indent-rigidly beg end (or
                                 (and (< indent-amount 0) indent-amount)
                                 (* (or count 1) (- 0 tab-width))))))))

然后将其分配给键盘快捷键:

(global-set-key (kbd "s-[") 'unindent-dwim)
(global-set-key (kbd "s-]") (lambda () (interactive) (unindent-dwim -1)))

I’ve been using this function to handle my indenting and unindenting:

(defun unindent-dwim (&optional count-arg)
  "Keeps relative spacing in the region.  Unindents to the next multiple of the current tab-width"
  (interactive)
  (let ((deactivate-mark nil)
        (beg (or (and mark-active (region-beginning)) (line-beginning-position)))
        (end (or (and mark-active (region-end)) (line-end-position)))
        (min-indentation)
        (count (or count-arg 1)))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (add-to-list 'min-indentation (current-indentation))
        (forward-line)))
    (if (< 0 count)
        (if (not (< 0 (apply 'min min-indentation)))
            (error "Can't indent any more.  Try `indent-rigidly` with a negative arg.")))
    (if (> 0 count)
        (indent-rigidly beg end (* (- 0 tab-width) count))
      (let (
            (indent-amount
             (apply 'min (mapcar (lambda (x) (- 0 (mod x tab-width))) min-indentation))))
        (indent-rigidly beg end (or
                                 (and (< indent-amount 0) indent-amount)
                                 (* (or count 1) (- 0 tab-width))))))))

And then I assign it to a keyboard shortcut:

(global-set-key (kbd "s-[") 'unindent-dwim)
(global-set-key (kbd "s-]") (lambda () (interactive) (unindent-dwim -1)))

回答 4

我是Emacs的新手,因此此答案可能对您毫无用处。

到目前为止,所提到的答案都没有覆盖字面量如dict或的重新缩进list。例如,如果您剪切并粘贴了以下文字并需要对其进行合理的缩进,则M-x indent-regionor或M-x python-indent-shift-rightcompany不会提供帮助:

    foo = {
  'bar' : [
     1,
    2,
        3 ],
      'baz' : {
     'asdf' : {
        'banana' : 1,
        'apple' : 2 } } }

感觉M-x indent-region应该在中做一些明智的事情python-mode,但是还不是这样。

对于将文字放在方括号中的特定情况,在相关行上使用TAB即可获得所需的内容(因为空格不起作用)。

因此,在这种情况下,我一直在快速记录键盘宏,例如<f3> C-n TAB <f4>F3,Ctrl-n(或向下箭头),TAB,F4,然后重复使用F4来应用宏可以节省几次击键。或者,您可以将C-u 10 C-x e其应用10次。

(我知道这听起来不多,但是尝试重新缩进100行垃圾文字而不丢失向下箭头,然后不得不上升5行并重复一遍;)。

I’m an Emacs newb, so this answer it probably bordering on useless.

None of the answers mentioned so far cover re-indentation of literals like dict or list. E.g. M-x indent-region or M-x python-indent-shift-right and company aren’t going to help if you’ve cut-and-pasted the following literal and need it to be re-indented sensibly:

    foo = {
  'bar' : [
     1,
    2,
        3 ],
      'baz' : {
     'asdf' : {
        'banana' : 1,
        'apple' : 2 } } }

It feels like M-x indent-region should do something sensibly in python-mode, but that’s not (yet) the case.

For the specific case where your literals are bracketed, using TAB on the lines in question gets what you want (because whitespace doesn’t play a role).

So what I’ve been doing in such cases is quickly recording a keyboard macro like <f3> C-n TAB <f4> as in F3, Ctrl-n (or down arrow), TAB, F4, and then using F4 repeatedly to apply the macro can save a couple of keystrokes. Or you can do C-u 10 C-x e to apply it 10 times.

(I know it doesn’t sound like much, but try re-indenting 100 lines of garbage literal without missing down-arrow, and then having to go up 5 lines and repeat things ;) ).


回答 5

我使用以下代码段。当选项卡处于非活动状态时,在选项卡上缩进当前行(通常如此);当选择处于非活动状态时,它将使整个区域向右缩进。

(defun my-python-tab-command (&optional _)
  "If the region is active, shift to the right; otherwise, indent current line."
  (interactive)
  (if (not (region-active-p))
      (indent-for-tab-command)
    (let ((lo (min (region-beginning) (region-end)))
          (hi (max (region-beginning) (region-end))))
      (goto-char lo)
      (beginning-of-line)
      (set-mark (point))
      (goto-char hi)
      (end-of-line)
      (python-indent-shift-right (mark) (point)))))
(define-key python-mode-map [remap indent-for-tab-command] 'my-python-tab-command)

I use the following snippet. On tab when the selection is inactive, it indents the current line (as it normally does); when the selection is inactive, it indents the whole region to the right.

(defun my-python-tab-command (&optional _)
  "If the region is active, shift to the right; otherwise, indent current line."
  (interactive)
  (if (not (region-active-p))
      (indent-for-tab-command)
    (let ((lo (min (region-beginning) (region-end)))
          (hi (max (region-beginning) (region-end))))
      (goto-char lo)
      (beginning-of-line)
      (set-mark (point))
      (goto-char hi)
      (end-of-line)
      (python-indent-shift-right (mark) (point)))))
(define-key python-mode-map [remap indent-for-tab-command] 'my-python-tab-command)

回答 6

交互进行缩进。

  1. 选择要缩进的区域。
  2. Cx TAB
  3. 使用箭头(<-->)进行交互缩进。
  4. Esc完成所需的缩进后,按三次。

从我的文章中复制:在Emacs中缩进几行

Do indentation interactively.

  1. Select the region to be indented.
  2. C-x TAB.
  3. Use arrows (<- and ->) to indent interactively.
  4. Press Esc three times when you are done with the required indentation.

Copied from my post in: Indent several lines in Emacs


回答 7

我普遍做这样的事情

;; intent whole buffer 
(defun iwb ()
  "indent whole buffer"
  (interactive)
  ;;(delete-trailing-whitespace)
  (indent-region (point-min) (point-max) nil)
  (untabify (point-min) (point-max)))

I do something like this universally

;; intent whole buffer 
(defun iwb ()
  "indent whole buffer"
  (interactive)
  ;;(delete-trailing-whitespace)
  (indent-region (point-min) (point-max) nil)
  (untabify (point-min) (point-max)))

numpy,scipy,matplotlib和pylab之间的混淆

问题:numpy,scipy,matplotlib和pylab之间的混淆

Numpy,scipy,matplotlib和pylab是使用python进行科学计算的常用术语。

我只是学习了一些有关pylab的知识,而感到困惑。每当我要导入numpy时,我都可以执行以下操作:

import numpy as np

我只是认为,一旦我这样做

from pylab import *

numpy也将被导入(使用np别名)。所以基本上,第二个相比第一个做更多的事情。

我想问的几件事:

  1. pylab仅仅是numpy,scipy和matplotlib的包装吗?
  2. 由于NP是pylab中的numpy别名,因此pylab中的scipy和matplotlib别名是什么?(据我所知,plt是matplotlib.pyplot的别名,但我不知道matplotlib本身的别名)

Numpy, scipy, matplotlib, and pylab are common terms among they who use python for scientific computation.

I just learn a bit about pylab, and I got confused. Whenever I want to import numpy, I can always do:

import numpy as np

I just consider, that once I do

from pylab import *

the numpy will be imported as well (with np alias). So basically the second one does more things compared to the first one.

There are few things I want to ask:

  1. Is it right that pylab is just a wrapper for numpy, scipy and matplotlib?
  2. As np is the numpy alias in pylab, what is the scipy and matplotlib alias in pylab? (as far as I know, plt is alias of matplotlib.pyplot, but I don’t know the alias for the matplotlib itself)

回答 0

  1. 没有,pylab是的一部分matplotlib(在matplotlib.pylab),并试图给你喜欢的环境Matlab的。matplotlib有许多依赖项,其中有一些依赖项numpy以通用别名导入npscipy不是的依赖项matplotlib

  2. 如果运行ipython --pylab自动导入,则会将所有符号从中matplotlib.pylab放入全局范围。就像您写的一样numpy,在np别名下导入。别名matplotlib下的符号来自mpl

  1. No, pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MatLab like environment. matplotlib has a number of dependencies, among them numpy which it imports under the common alias np. scipy is not a dependency of matplotlib.

  2. If you run ipython --pylab an automatic import will put all symbols from matplotlib.pylab into global scope. Like you wrote numpy gets imported under the np alias. Symbols from matplotlib are available under the mpl alias.


回答 1

Scipy和numpy是科学项目,旨在为python带来高效,快速的数值计算。

Matplotlib是python绘图库的名称。

Pyplot是matplotlib的交互式api,主要用于jupyter之类的笔记本中。您通常会这样使用它:import matplotlib.pyplot as plt

Pylab与pyplot相同,但是具有额外的功能(目前不鼓励使用)。

  • pylab = pyplot + numpy的

在此处查看更多信息:Matplotlib,Pylab,Pyplot等:这些和何时使用它们有什么区别?

Scipy and numpy are scientific projects whose aim is to bring efficient and fast numeric computing to python.

Matplotlib is the name of the python plotting library.

Pyplot is an interactive api for matplotlib, mostly for use in notebooks like jupyter. You generally use it like this: import matplotlib.pyplot as plt.

Pylab is the same thing as pyplot, but with extra features (its use is currently discouraged).

  • pylab = pyplot + numpy

See more information here: Matplotlib, Pylab, Pyplot, etc: What’s the difference between these and when to use each?


回答 2

由于某些示例(例如我)可能仍然对pylab的使用感到困惑,因为pylab互联网上存在使用示例的示例,因此这里引用了官方matplotlib常见问题解答:

pylab是一个便捷模块,可在单个命名空间中批量导入matplotlib.pyplot(用于绘图)和numpy(用于数学以及使用数组)。尽管许多示例都使用pylab,但不再建议使用。

因此,TL; DR; 是不使用pylab,句点。根据需要分别使用pyplot和导入numpy

这是进一步阅读和其他有用示例的链接

Since some people (like me) may still be confused about usage of pylab since examples using pylab are out there on the internet, here is a quote from the official matplotlib FAQ:

pylab is a convenience module that bulk imports matplotlib.pyplot (for plotting) and numpy (for mathematics and working with arrays) in a single name space. Although many examples use pylab, it is no longer recommended.

So, TL;DR; is do not use pylab, period. Use pyplot and import numpy separately as needed.

Here is the link for further reading and other useful examples.


检查类是否已定义函数的最快方法是什么?

问题:检查类是否已定义函数的最快方法是什么?

我正在编写AI状态空间搜索算法,并且有一个通用类可以用于快速实现搜索算法。子类将定义必要的操作,然后算法执行其余操作。

这是我遇到的问题:我想避免一遍又一遍地重新生成父状态,所以我有以下函数,该函数返回可以合法地应用于任何状态的操作:

def get_operations(self, include_parent=True):
    ops = self._get_operations()
    if not include_parent and self.path.parent_op:
        try:
            parent_inverse = self.invert_op(self.path.parent_op)
            ops.remove(parent_inverse)
        except NotImplementedError:
            pass
    return ops

并且invert_op函数默认情况下抛出。

有没有比捕获异常更快的方法来检查函数是否未定义?

我在检查dir中是否存在内容时正在思考,但这似乎不正确。hasattr是通过调用getattr并检查它是否引发来实现的,这不是我想要的。

I’m writing an AI state space search algorithm, and I have a generic class which can be used to quickly implement a search algorithm. A subclass would define the necessary operations, and the algorithm does the rest.

Here is where I get stuck: I want to avoid regenerating the parent state over and over again, so I have the following function, which returns the operations that can be legally applied to any state:

def get_operations(self, include_parent=True):
    ops = self._get_operations()
    if not include_parent and self.path.parent_op:
        try:
            parent_inverse = self.invert_op(self.path.parent_op)
            ops.remove(parent_inverse)
        except NotImplementedError:
            pass
    return ops

And the invert_op function throws by default.

Is there a faster way to check to see if the function is not defined than catching an exception?

I was thinking something on the lines of checking for present in dir, but that doesn’t seem right. hasattr is implemented by calling getattr and checking if it raises, which is not what I want.


回答 0

是的,用于getattr()获取属性并callable()验证它是否为方法:

invert_op = getattr(self, "invert_op", None)
if callable(invert_op):
    invert_op(self.path.parent_op)

请注意,getattr()当属性不存在时,通常会引发异常。但是,如果您指定默认值(None在本例中为),它将返回该值。

Yes, use getattr() to get the attribute, and callable() to verify it is a method:

invert_op = getattr(self, "invert_op", None)
if callable(invert_op):
    invert_op(self.path.parent_op)

Note that getattr() normally throws exception when the attribute doesn’t exist. However, if you specify a default value (None, in this case), it will return that instead.


回答 1

它同时适用于Python 2和Python 3

hasattr(connection, 'invert_opt')

hasattrTrue如果连接对象已invert_opt定义函数,则返回。这是供您放牧的文档

https://docs.python.org/2/library/functions.html#hasattr https://docs.python.org/3/library/functions.html#hasattr

It works in both Python 2 and Python 3

hasattr(connection, 'invert_opt')

hasattr returns True if connection object has a function invert_opt defined. Here is the documentation for you to graze

https://docs.python.org/2/library/functions.html#hasattr https://docs.python.org/3/library/functions.html#hasattr


回答 2

有没有比捕获异常更快的方法来检查函数是否未定义?

你为什么反对那个?在大多数Pythonic情况下,最好是请求宽恕而不是允许。;-)

hasattr是通过调用getattr并检查它是否引发来实现的,这不是我想要的。

同样,为什么呢?以下是相当Pythonic的内容:

    try:
        invert_op = self.invert_op
    except AttributeError:
        pass
    else:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

要么,

    # if you supply the optional `default` parameter, no exception is thrown
    invert_op = getattr(self, 'invert_op', None)  
    if invert_op is not None:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

但是请注意,这getattr(obj, attr, default)基本上也是通过捕获异常来实现的。Python领域没有错!

Is there a faster way to check to see if the function is not defined than catching an exception?

Why are you against that? In most Pythonic cases, it’s better to ask forgiveness than permission. ;-)

hasattr is implemented by calling getattr and checking if it raises, which is not what I want.

Again, why is that? The following is quite Pythonic:

    try:
        invert_op = self.invert_op
    except AttributeError:
        pass
    else:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Or,

    # if you supply the optional `default` parameter, no exception is thrown
    invert_op = getattr(self, 'invert_op', None)  
    if invert_op is not None:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Note, however, that getattr(obj, attr, default) is basically implemented by catching an exception, too. There is nothing wrong with that in Python land!


回答 3

这里的响应检查字符串是否是对象的属性的名称。需要一个额外的步骤(使用callable)来检查属性是否为方法。

因此,可以归结为:检查对象obj是否具有属性attrib的最快方法是什么。答案是

'attrib' in obj.__dict__

之所以如此,是因为dict对其键进行了哈希处理,因此可以快速检查键的存在。

请参见下面的时序比较。

>>> class SomeClass():
...         pass
...
>>> obj = SomeClass()
>>>
>>> getattr(obj, "invert_op", None)
>>>
>>> %timeit getattr(obj, "invert_op", None)
1000000 loops, best of 3: 723 ns per loop
>>> %timeit hasattr(obj, "invert_op")
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 674 ns per loop
>>> %timeit "invert_op" in obj.__dict__
The slowest run took 12.19 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 176 ns per loop

The responses herein check if a string is the name of an attribute of the object. An extra step (using callable) is needed to check if the attribute is a method.

So it boils down to: what is the fastest way to check if an object obj has an attribute attrib. The answer is

'attrib' in obj.__dict__

This is so because a dict hashes its keys so checking for the key’s existence is fast.

See timing comparisons below.

>>> class SomeClass():
...         pass
...
>>> obj = SomeClass()
>>>
>>> getattr(obj, "invert_op", None)
>>>
>>> %timeit getattr(obj, "invert_op", None)
1000000 loops, best of 3: 723 ns per loop
>>> %timeit hasattr(obj, "invert_op")
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 674 ns per loop
>>> %timeit "invert_op" in obj.__dict__
The slowest run took 12.19 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 176 ns per loop

回答 4

我喜欢内森·奥斯特加德的回答,并对此进行了投票。但是解决问题的另一种方法是使用记忆修饰符,该修饰符将缓存函数调用的结果。因此,您可以继续使用具有昂贵功能的功能来解决某些问题,但是当您一遍又一遍地调用它时,后续调用很快。函数的记忆版本会在字典中查找参数,然后从实际函数计算结果时开始在字典中查找结果,然后立即返回结果。

这是雷蒙德·海廷格(Raymond Hettinger)称为“ lru_cache”的记忆修饰器的食谱。现在,此版本是Python 3.2中functools模块的标准版本。

http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators/

http://docs.python.org/release/3.2/library/functools.html

I like Nathan Ostgard’s answer and I up-voted it. But another way you could solve your problem would be to use a memoizing decorator, which would cache the result of the function call. So you can go ahead and have an expensive function that figures something out, but then when you call it over and over the subsequent calls are fast; the memoized version of the function looks up the arguments in a dict, finds the result in the dict from when the actual function computed the result, and returns the result right away.

Here is a recipe for a memoizing decorator called “lru_cache” by Raymond Hettinger. A version of this is now standard in the functools module in Python 3.2.

http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators/

http://docs.python.org/release/3.2/library/functools.html


回答 5

像Python中的任何东西一样,如果您尽力而为,那么您就可以直截了当地去做一些令人讨厌的事情。现在,这是令人讨厌的部分:

def invert_op(self, op):
    raise NotImplementedError

def is_invert_op_implemented(self):
    # Only works in CPython 2.x of course
    return self.invert_op.__code__.co_code == 't\x00\x00\x82\x01\x00d\x00\x00S'

请帮我们一个忙,只要继续解决您的问题,就不要使用它,除非您是PyPy团队的黑客,他们正在侵入Python解释器。您所拥有的是Pythonic,我在这里拥有的是纯EVIL

Like anything in Python, if you try hard enough, you can get at the guts and do something really nasty. Now, here’s the nasty part:

def invert_op(self, op):
    raise NotImplementedError

def is_invert_op_implemented(self):
    # Only works in CPython 2.x of course
    return self.invert_op.__code__.co_code == 't\x00\x00\x82\x01\x00d\x00\x00S'

Please do us a favor, just keep doing what you have in your question and DON’T ever use this unless you are on the PyPy team hacking into the Python interpreter. What you have up there is Pythonic, what I have here is pure EVIL.


回答 6

您也可以遍历类:

import inspect


def get_methods(cls_):
    methods = inspect.getmembers(cls_, inspect.isfunction)
    return dict(methods)

# Example
class A(object):
    pass

class B(object):
    def foo():
        print('B')


# If you only have an object, you can use `cls_ = obj.__class__`
if 'foo' in get_methods(A):
    print('A has foo')

if 'foo' in get_methods(B):
    print('B has foo')

You can also go over the class:

import inspect


def get_methods(cls_):
    methods = inspect.getmembers(cls_, inspect.isfunction)
    return dict(methods)

# Example
class A(object):
    pass

class B(object):
    def foo():
        print('B')


# If you only have an object, you can use `cls_ = obj.__class__`
if 'foo' in get_methods(A):
    print('A has foo')

if 'foo' in get_methods(B):
    print('B has foo')

回答 7

虽然在__dict__属性中检查属性确实非常快,但是您不能将其用于方法,因为它们不会出现在__dict__哈希中。但是,如果性能至关重要,则可以在课堂上采用棘手的解决方法:

class Test():
    def __init__():
        # redefine your method as attribute
        self.custom_method = self.custom_method

    def custom_method(self):
        pass

然后检查方法为:

t = Test()
'custom_method' in t.__dict__

时间比较getattr

>>%timeit 'custom_method' in t.__dict__
55.9 ns ± 0.626 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'custom_method', None)
116 ns ± 0.765 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

我并不是在鼓励这种方法,但是它似乎有效。

[EDIT]当方法名称不在给定的类中时,性能提升甚至更高:

>>%timeit 'rubbish' in t.__dict__
65.5 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'rubbish', None)
385 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

While checking for attributes in __dict__ property is really fast, you cannot use this for methods, since they do not appear in __dict__ hash. You could however resort to hackish workaround in your class, if performance is that critical:

class Test():
    def __init__():
        # redefine your method as attribute
        self.custom_method = self.custom_method

    def custom_method(self):
        pass

Then check for method as:

t = Test()
'custom_method' in t.__dict__

Time comparision with getattr:

>>%timeit 'custom_method' in t.__dict__
55.9 ns ± 0.626 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'custom_method', None)
116 ns ± 0.765 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Not that I’m encouraging this approach, but it seems to work.

[EDIT] Performance boost is even higher when method name is not in given class:

>>%timeit 'rubbish' in t.__dict__
65.5 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'rubbish', None)
385 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Python,创建对象

问题:Python,创建对象

我正在尝试学习python,现在我试图摆脱类的困扰,以及如何使用实例操纵它们。

我似乎无法理解这个练习问题:

创建并返回其名称,年龄和专业与输入相同的学生对象

def make_student(name, age, major)

我只是不了解对象的含义,是否意味着我应该在包含这些值的函数内创建一个数组?或创建一个类,让该函数位于其中,并分配实例?(在问这个问题之前,我被要求开设一个学生班,里面要说姓名,年龄和专业)

class Student:
    name = "Unknown name"
    age = 0
    major = "Unknown major"

I’m trying to learn python and I now I am trying to get the hang of classes and how to manipulate them with instances.

I can’t seem to understand this practice problem:

Create and return a student object whose name, age, and major are the same as those given as input

def make_student(name, age, major)

I just don’t get what it means by object, do they mean I should create an array inside the function that holds these values? or create a class and let this function be inside it, and assign instances? (before this question i was asked to set up a student class with name, age, and major inside)

class Student:
    name = "Unknown name"
    age = 0
    major = "Unknown major"

回答 0

class Student(object):
    name = ""
    age = 0
    major = ""

    # The class "constructor" - It's actually an initializer 
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

def make_student(name, age, major):
    student = Student(name, age, major)
    return student

请注意,即使Python哲学中的原则之一是“应该有一个,最好只有一个,这是显而易见的方式”,但仍然有多种方式可以做到这一点。您还可以使用以下两个代码段来利用Python的动态功能:

class Student(object):
    name = ""
    age = 0
    major = ""

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    # Note: I didn't need to create a variable in the class definition before doing this.
    student.gpa = float(4.0)
    return student

我更喜欢前者,但在某些情况下后者可能有用–一种是在使用文档数据库(如MongoDB)时。

class Student(object):
    name = ""
    age = 0
    major = ""

    # The class "constructor" - It's actually an initializer 
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

def make_student(name, age, major):
    student = Student(name, age, major)
    return student

Note that even though one of the principles in Python’s philosophy is “there should be one—and preferably only one—obvious way to do it”, there are still multiple ways to do this. You can also use the two following snippets of code to take advantage of Python’s dynamic capabilities:

class Student(object):
    name = ""
    age = 0
    major = ""

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    # Note: I didn't need to create a variable in the class definition before doing this.
    student.gpa = float(4.0)
    return student

I prefer the former, but there are instances where the latter can be useful – one being when working with document databases like MongoDB.


回答 1

创建一个类并为其提供__init__方法:

class Student:
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

    def is_old(self):
        return self.age > 100

现在,您可以初始化Student该类的实例:

>>> s = Student('John', 88, None)
>>> s.name
    'John'
>>> s.age
    88

尽管我不知道make_student如果做与相同的功能为什么为什么需要一个学生函数Student.__init__

Create a class and give it an __init__ method:

class Student:
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

    def is_old(self):
        return self.age > 100

Now, you can initialize an instance of the Student class:

>>> s = Student('John', 88, None)
>>> s.name
    'John'
>>> s.age
    88

Although I’m not sure why you need a make_student student function if it does the same thing as Student.__init__.


回答 2

对象是类的实例。类只是对象的蓝图。因此,根据您的类定义-

# Note the added (object) - this is the preferred way of creating new classes
class Student(object):
    name = "Unknown name"
    age = 0
    major = "Unknown major"

您可以make_student通过将属性明确分配给Student– 的新实例来创建函数

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    return student

但是在构造函数(__init__)中执行此操作可能更有意义-

class Student(object):
    def __init__(self, name="Unknown name", age=0, major="Unknown major"):
        self.name = name
        self.age = age
        self.major = major

使用时会调用构造函数Student()。它将采用__init__方法中定义的参数。现在,构造函数签名实际上将是Student(name, age, major)

如果使用该make_student函数,那么函数是微不足道的(并且是多余的)-

def make_student(name, age, major):
    return Student(name, age, major)

为了好玩,这里有一个示例,说明如何在make_student不定义类的情况下创建函数。请不要在家尝试。

def make_student(name, age, major):
    return type('Student', (object,),
                {'name': name, 'age': age, 'major': major})()

Objects are instances of classes. Classes are just the blueprints for objects. So given your class definition –

# Note the added (object) - this is the preferred way of creating new classes
class Student(object):
    name = "Unknown name"
    age = 0
    major = "Unknown major"

You can create a make_student function by explicitly assigning the attributes to a new instance of Student

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    return student

But it probably makes more sense to do this in a constructor (__init__) –

class Student(object):
    def __init__(self, name="Unknown name", age=0, major="Unknown major"):
        self.name = name
        self.age = age
        self.major = major

The constructor is called when you use Student(). It will take the arguments defined in the __init__ method. The constructor signature would now essentially be Student(name, age, major).

If you use that, then a make_student function is trivial (and superfluous) –

def make_student(name, age, major):
    return Student(name, age, major)

For fun, here is an example of how to create a make_student function without defining a class. Please do not try this at home.

def make_student(name, age, major):
    return type('Student', (object,),
                {'name': name, 'age': age, 'major': major})()

回答 3

使用predefine类创建对象时,首先要创建一个用于存储该对象的变量。然后,您可以创建对象并存储您创建的变量。

class Student:
     def __init__(self):

# creating an object....

   student1=Student()

实际上,此init方法是class的构造方法。您可以使用一些属性来初始化该方法。在这一点上,创建对象时,您将必须为特定属性传递一些值。

class Student:
      def __init__(self,name,age):
            self.name=value
            self.age=value

 # creating an object.......

     student2=Student("smith",25)

when you create an object using predefine class, at first you want to create a variable for storing that object. Then you can create object and store variable that you created.

class Student:
     def __init__(self):

# creating an object....

   student1=Student()

Actually this init method is the constructor of class.you can initialize that method using some attributes.. In that point , when you creating an object , you will have to pass some values for particular attributes..

class Student:
      def __init__(self,name,age):
            self.name=value
            self.age=value

 # creating an object.......

     student2=Student("smith",25)

脾气暴躁的地方有多个条件

问题:脾气暴躁的地方有多个条件

我有一组距离称为dists。我想选择两个值之间的距离。我编写了以下代码行:

 dists[(np.where(dists >= r)) and (np.where(dists <= r + dr))]

但是,这仅针对条件选择

 (np.where(dists <= r + dr))

如果我通过使用临时变量按顺序执行命令,则效果很好。为什么上面的代码不起作用,如何使它起作用?

干杯

I have an array of distances called dists. I want to select dists which are between two values. I wrote the following line of code to do that:

 dists[(np.where(dists >= r)) and (np.where(dists <= r + dr))]

However this selects only for the condition

 (np.where(dists <= r + dr))

If I do the commands sequentially by using a temporary variable it works fine. Why does the above code not work, and how do I get it to work?

Cheers


回答 0

您的特定情况下,最好的方法将两个条件更改为一个条件:

dists[abs(dists - r - dr/2.) <= dr/2.]

它仅创建一个布尔数组,在我看来是更易于阅读,因为它说,dist内部的dr还是r(尽管我将重新定义r为您感兴趣的区域的中心,而不是开始的位置,所以r = r + dr/2.)但这并不能回答您的问题。


问题的答案:如果您只是想过滤出不符合标准的元素,则
实际上并不需要:wheredists

dists[(dists >= r) & (dists <= r+dr)]

因为&将会为您提供基本元素and(括号是必需的)。

或者,如果您where出于某些原因要使用,可以执行以下操作:

 dists[(np.where((dists >= r) & (dists <= r + dr)))]

原因:
不起作用的原因是因为np.where返回的是索引列表,而不是布尔数组。您试图and在两个数字列表之间移动,这些数字当然没有您期望的True/ False值。如果ab都是两个True值,则a and b返回b。所以说这样的话[0,1,2] and [2,3,4]只会给你[2,3,4]。它在起作用:

In [230]: dists = np.arange(0,10,.5)
In [231]: r = 5
In [232]: dr = 1

In [233]: np.where(dists >= r)
Out[233]: (array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),)

In [234]: np.where(dists <= r+dr)
Out[234]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

In [235]: np.where(dists >= r) and np.where(dists <= r+dr)
Out[235]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

您期望比较的只是布尔数组,例如

In [236]: dists >= r
Out[236]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True], dtype=bool)

In [237]: dists <= r + dr
Out[237]: 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

In [238]: (dists >= r) & (dists <= r + dr)
Out[238]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

现在,您可以调用np.where组合的布尔数组:

In [239]: np.where((dists >= r) & (dists <= r + dr))
Out[239]: (array([10, 11, 12]),)

In [240]: dists[np.where((dists >= r) & (dists <= r + dr))]
Out[240]: array([ 5. ,  5.5,  6. ])

或者使用花式索引简单地用布尔数组对原始数组进行索引

In [241]: dists[(dists >= r) & (dists <= r + dr)]
Out[241]: array([ 5. ,  5.5,  6. ])

The best way in your particular case would just be to change your two criteria to one criterion:

dists[abs(dists - r - dr/2.) <= dr/2.]

It only creates one boolean array, and in my opinion is easier to read because it says, is dist within a dr or r? (Though I’d redefine r to be the center of your region of interest instead of the beginning, so r = r + dr/2.) But that doesn’t answer your question.


The answer to your question:
You don’t actually need where if you’re just trying to filter out the elements of dists that don’t fit your criteria:

dists[(dists >= r) & (dists <= r+dr)]

Because the & will give you an elementwise and (the parentheses are necessary).

Or, if you do want to use where for some reason, you can do:

 dists[(np.where((dists >= r) & (dists <= r + dr)))]

Why:
The reason it doesn’t work is because np.where returns a list of indices, not a boolean array. You’re trying to get and between two lists of numbers, which of course doesn’t have the True/False values that you expect. If a and b are both True values, then a and b returns b. So saying something like [0,1,2] and [2,3,4] will just give you [2,3,4]. Here it is in action:

In [230]: dists = np.arange(0,10,.5)
In [231]: r = 5
In [232]: dr = 1

In [233]: np.where(dists >= r)
Out[233]: (array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),)

In [234]: np.where(dists <= r+dr)
Out[234]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

In [235]: np.where(dists >= r) and np.where(dists <= r+dr)
Out[235]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

What you were expecting to compare was simply the boolean array, for example

In [236]: dists >= r
Out[236]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True], dtype=bool)

In [237]: dists <= r + dr
Out[237]: 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

In [238]: (dists >= r) & (dists <= r + dr)
Out[238]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

Now you can call np.where on the combined boolean array:

In [239]: np.where((dists >= r) & (dists <= r + dr))
Out[239]: (array([10, 11, 12]),)

In [240]: dists[np.where((dists >= r) & (dists <= r + dr))]
Out[240]: array([ 5. ,  5.5,  6. ])

Or simply index the original array with the boolean array using fancy indexing

In [241]: dists[(dists >= r) & (dists <= r + dr)]
Out[241]: array([ 5. ,  5.5,  6. ])

回答 1

公认的答案已经很好地解释了这个问题。但是,应用多个条件的Numpythonic方法更多是使用numpy逻辑函数。在这种情况下,您可以使用np.logical_and

np.where(np.logical_and(np.greater_equal(dists,r),np.greater_equal(dists,r + dr)))

The accepted answer explained the problem well enough. However, the the more Numpythonic approach for applying multiple conditions is to use numpy logical functions. In this ase you can use np.logical_and:

np.where(np.logical_and(np.greater_equal(dists,r),np.greater_equal(dists,r + dr)))

回答 2

这里要指出的一件有趣的事情是:在这种情况下,通常也可以使用ORAND的方式,但有一点点变化。代替“ and”和“ or”,而使用Ampersand(&)Pipe Operator(|),它将起作用。

当我们使用‘and’时

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) and (ar<6), 'yo', ar)

Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

当我们使用&符时

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) & (ar<6), 'yo', ar)

Output:
array(['3', 'yo', 'yo', '14', '2', 'yo', '3', '7'], dtype='<U11')

当我们尝试应用大熊猫Dataframe的多个过滤器时,情况也是如此。现在,其背后的原因必须与逻辑运算符和按位运算符有关,并且为了对它们有更多的了解,我建议在stackoverflow中仔细研究一下此答案或类似的Q / A。

更新

用户问,为什么需要在括号内给出(ar> 3)和(ar <6)。好吧,这就是事情。在我开始讨论这里发生的事情之前,需要了解Python中的运算符优先级。

类似于BODMAS所涉及的内容,python还优先执行应首先执行的操作。首先执行括号内的项目,然后按位运算符开始工作。我将在下面显示两种情况,当您确实使用和不使用“(”,“)”时会发生什么。

情况1:

np.where( ar>3 & ar<6, 'yo', ar)
np.where( np.array([3,4,5,14,2,4,3,7])>3 & np.array([3,4,5,14,2,4,3,7])<6, 'yo', ar)

由于这里没有括号,因此按位运算符(&)在这里变得困惑,您甚至要求它获得逻辑与,因为在运算符优先级表中(如果看到的话)&被赋予了优先于<>运算符。这是从最低优先级到最高优先级的表格。

在此处输入图片说明

它甚至不执行<>操作被要求执行逻辑与操作。这就是为什么它会导致该错误。

您可以查看以下链接以了解更多信息:运算符优先级

现在转到案例2:

如果您确实使用了支架,那么您会清楚地看到会发生什么。

np.where( (ar>3) & (ar<6), 'yo', ar)
np.where( (array([False,  True,  True,  True, False,  True, False,  True])) & (array([ True,  True,  True, False,  True,  True,  True, False])), 'yo', ar)

真假两个数组。而且,您可以轻松地对其执行逻辑AND操作。这给你:

np.where( array([False,  True,  True, False, False,  True, False, False]),  'yo', ar)

休息一下,np.where,对于给定的情况,在任何情况下,True都会分配第一个值(即“ yo”),如果为False,则分配另一个值(即在此保留原始值)。

就这样。我希望我能很好地解释查询。

One interesting thing to point here; the usual way of using OR and AND too will work in this case, but with a small change. Instead of “and” and instead of “or”, rather use Ampersand(&) and Pipe Operator(|) and it will work.

When we use ‘and’:

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) and (ar<6), 'yo', ar)

Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When we use Ampersand(&):

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) & (ar<6), 'yo', ar)

Output:
array(['3', 'yo', 'yo', '14', '2', 'yo', '3', '7'], dtype='<U11')

And this is same in the case when we are trying to apply multiple filters in case of pandas Dataframe. Now the reasoning behind this has to do something with Logical Operators and Bitwise Operators and for more understanding about same, I’d suggest to go through this answer or similar Q/A in stackoverflow.

UPDATE

A user asked, why is there a need for giving (ar>3) and (ar<6) inside the parenthesis. Well here’s the thing. Before I start talking about what’s happening here, one needs to know about Operator precedence in Python.

Similar to what BODMAS is about, python also gives precedence to what should be performed first. Items inside the parenthesis are performed first and then the bitwise operator comes to work. I’ll show below what happens in both the cases when you do use and not use “(“, “)”.

Case1:

np.where( ar>3 & ar<6, 'yo', ar)
np.where( np.array([3,4,5,14,2,4,3,7])>3 & np.array([3,4,5,14,2,4,3,7])<6, 'yo', ar)

Since there are no brackets here, the bitwise operator(&) is getting confused here that what are you even asking it to get logical AND of, because in the operator precedence table if you see, & is given precedence over < or > operators. Here’s the table from from lowest precedence to highest precedence.

enter image description here

It’s not even performing the < and > operation and being asked to perform a logical AND operation. So that’s why it gives that error.

One can check out the following link to learn more about: operator precedence

Now to Case 2:

If you do use the bracket, you clearly see what happens.

np.where( (ar>3) & (ar<6), 'yo', ar)
np.where( (array([False,  True,  True,  True, False,  True, False,  True])) & (array([ True,  True,  True, False,  True,  True,  True, False])), 'yo', ar)

Two arrays of True and False. And you can easily perform logical AND operation on them. Which gives you:

np.where( array([False,  True,  True, False, False,  True, False, False]),  'yo', ar)

And rest you know, np.where, for given cases, wherever True, assigns first value(i.e. here ‘yo’) and if False, the other(i.e. here, keeping the original).

That’s all. I hope I explained the query well.


回答 3

我喜欢np.vectorize用于此类任务。考虑以下:

>>> # function which returns True when constraints are satisfied.
>>> func = lambda d: d >= r and d<= (r+dr) 
>>>
>>> # Apply constraints element-wise to the dists array.
>>> result = np.vectorize(func)(dists) 
>>>
>>> result = np.where(result) # Get output.

您也可以使用np.argwhere代替以np.where获得清晰的输出。但这是您的电话:)

希望能帮助到你。

I like to use np.vectorize for such tasks. Consider the following:

>>> # function which returns True when constraints are satisfied.
>>> func = lambda d: d >= r and d<= (r+dr) 
>>>
>>> # Apply constraints element-wise to the dists array.
>>> result = np.vectorize(func)(dists) 
>>>
>>> result = np.where(result) # Get output.

You can also use np.argwhere instead of np.where for clear output. But that is your call :)

Hope it helps.


回答 4

尝试:

np.intersect1d(np.where(dists >= r)[0],np.where(dists <= r + dr)[0])

Try:

np.intersect1d(np.where(dists >= r)[0],np.where(dists <= r + dr)[0])

回答 5

这应该工作:

dists[((dists >= r) & (dists <= r+dr))]

最优雅的方式~~

This should work:

dists[((dists >= r) & (dists <= r+dr))]

The most elegant way~~


回答 6

尝试:

import numpy as np
dist = np.array([1,2,3,4,5])
r = 2
dr = 3
np.where(np.logical_and(dist> r, dist<=r+dr))

输出:(array([2,3]),)

您可以查看逻辑功能以获取更多详细信息。

Try:

import numpy as np
dist = np.array([1,2,3,4,5])
r = 2
dr = 3
np.where(np.logical_and(dist> r, dist<=r+dr))

Output: (array([2, 3]),)

You can see Logic functions for more details.


回答 7

我已经解决了这个简单的例子

import numpy as np

ar = np.array([3,4,5,14,2,4,3,7])

print [X for X in list(ar) if (X >= 3 and X <= 6)]

>>> 
[3, 4, 5, 4, 3]

I have worked out this simple example

import numpy as np

ar = np.array([3,4,5,14,2,4,3,7])

print [X for X in list(ar) if (X >= 3 and X <= 6)]

>>> 
[3, 4, 5, 4, 3]

在__init__内调用类函数

问题:在__init__内调用类函数

我正在编写一些使用文件名,打开文件并解析出一些数据的代码。我想在课堂上做到这一点。以下代码有效:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None

        def parse_file():
            #do some parsing
            self.stat1 = result_from_parse1
            self.stat2 = result_from_parse2
            self.stat3 = result_from_parse3
            self.stat4 = result_from_parse4
            self.stat5 = result_from_parse5

        parse_file()

但是,这涉及到我将所有解析机制置于__init__类的功能范围之内。现在,对于此简化的代码来说,这看起来还不错,但是该函数parse_file还具有许多缩进级别。我更喜欢将函数定义parse_file()为类函数,如下所示:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        parse_file()

    def parse_file():
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

当然,此代码不起作用,因为该函数parse_file()不在函数范围内__init__。有没有办法从该类内部调用类函数__init__?还是我想这是错误的方式?

I’m writing some code that takes a filename, opens the file, and parses out some data. I’d like to do this in a class. The following code works:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None

        def parse_file():
            #do some parsing
            self.stat1 = result_from_parse1
            self.stat2 = result_from_parse2
            self.stat3 = result_from_parse3
            self.stat4 = result_from_parse4
            self.stat5 = result_from_parse5

        parse_file()

But it involves me putting all of the parsing machinery in the scope of the __init__ function for my class. That looks fine now for this simplified code, but the function parse_file has quite a few levels of indention as well. I’d prefer to define the function parse_file() as a class function like below:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        parse_file()

    def parse_file():
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

Of course this code doesn’t work because the function parse_file() is not within the scope of the __init__ function. Is there a way to call a class function from within __init__ of that class? Or am I thinking about this the wrong way?


回答 0

以这种方式调用该函数:

self.parse_file()

您还需要像这样定义parse_file()函数:

def parse_file(self):

parse_file方法必须在调用时绑定到对象(因为它不是静态方法)。这是通过在对象的实例上调用函数来完成的(在您的情况下,实例是)self

Call the function in this way:

self.parse_file()

You also need to define your parse_file() function like this:

def parse_file(self):

The parse_file method has to be bound to an object upon calling it (because it’s not a static method). This is done by calling the function on an instance of the object, in your case the instance is self.


回答 1

如果我没记错的话,这两个函数都是您的类的一部分,则应像这样使用它:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        self.parse_file()

    def parse_file(self):
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

替换行:

parse_file() 

与:

self.parse_file()

If I’m not wrong, both functions are part of your class, you should use it like this:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        self.parse_file()

    def parse_file(self):
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

replace your line:

parse_file() 

with:

self.parse_file()

回答 2

怎么样:

class MyClass(object):
    def __init__(self, filename):
        self.filename = filename 
        self.stats = parse_file(filename)

def parse_file(filename):
    #do some parsing
    return results_from_parse

顺便说一句,如果有一个名为变量stat1stat2等等,情况正在乞求一个元组: stats = (...)

因此,让我们parse_file返回一个元组,并将其存储在中 self.stats

然后,例如,您可以访问曾经使用调用stat3的内容self.stats[2]

How about:

class MyClass(object):
    def __init__(self, filename):
        self.filename = filename 
        self.stats = parse_file(filename)

def parse_file(filename):
    #do some parsing
    return results_from_parse

By the way, if you have variables named stat1, stat2, etc., the situation is begging for a tuple: stats = (...).

So let parse_file return a tuple, and store the tuple in self.stats.

Then, for example, you can access what used to be called stat3 with self.stats[2].


回答 3

在中parse_file,接受self参数(与中一样__init__)。如果您需要任何其他上下文,则只需照常将其作为附加参数传递。

In parse_file, take the self argument (just like in __init__). If there’s any other context you need then just pass it as additional arguments as usual.


回答 4

您必须像这样声明parse_file; def parse_file(self)。在大多数语言中,“ self”参数是一个隐藏参数,但在python中则不是。您必须将其添加到属于一个类的所有方法的定义中。然后您可以使用以下方法从类中的任何方法调用该函数self.parse_file

您的最终程序将如下所示:

class MyClass():
  def __init__(self, filename):
      self.filename = filename 

      self.stat1 = None
      self.stat2 = None
      self.stat3 = None
      self.stat4 = None
      self.stat5 = None
      self.parse_file()

  def parse_file(self):
      #do some parsing
      self.stat1 = result_from_parse1
      self.stat2 = result_from_parse2
      self.stat3 = result_from_parse3
      self.stat4 = result_from_parse4
      self.stat5 = result_from_parse5

You must declare parse_file like this; def parse_file(self). The “self” parameter is a hidden parameter in most languages, but not in python. You must add it to the definition of all that methods that belong to a class. Then you can call the function from any method inside the class using self.parse_file

your final program is going to look like this:

class MyClass():
  def __init__(self, filename):
      self.filename = filename 

      self.stat1 = None
      self.stat2 = None
      self.stat3 = None
      self.stat4 = None
      self.stat5 = None
      self.parse_file()

  def parse_file(self):
      #do some parsing
      self.stat1 = result_from_parse1
      self.stat2 = result_from_parse2
      self.stat3 = result_from_parse3
      self.stat4 = result_from_parse4
      self.stat5 = result_from_parse5

回答 5

我认为您的问题实际上是没有正确缩进init函数,应该是这样的

class MyClass():
     def __init__(self, filename):
          pass

     def parse_file():
          pass

I think that your problem is actually with not correctly indenting init function.It should be like this

class MyClass():
     def __init__(self, filename):
          pass

     def parse_file():
          pass

有趣好用的Python教程