Python 实用宝典

Question 1

I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single character strings. Any explanation? It’s almost 1.35 times as much time.

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

What’s happening on a lower level that’s causing this?

Question 2

TL;DR

The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.
Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.
The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.
List indexing is remarkably fast.

>>> python3 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.388 usec per loop

>>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.436 usec per loop

This disagrees with what you’ve found…

You must be using Python 2, then.

>>> python2 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.212 usec per loop

Let’s explain the difference between the versions. I’ll examine the compiled code.

For Python 3:

import dis

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   4           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
#>>>               3 LOAD_CONST               2 ('list_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('a')
#>>>              12 LOAD_CONST               4 ('b')
#>>>              15 LOAD_CONST               5 ('c')
#>>>              18 BUILD_LIST               3
#>>>              21 GET_ITER
#>>>              22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              25 POP_TOP
#>>>              26 LOAD_CONST               0 (None)
#>>>              29 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>  21           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               3 ('abc')
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

You see here that the list variant is likely to be slower due to the building of the list each time.

This is the

 9 LOAD_CONST   3 ('a')
12 LOAD_CONST   4 ('b')
15 LOAD_CONST   5 ('c')
18 BUILD_LIST   3

part. The string variant only has

 9 LOAD_CONST   3 ('abc')

You can check that this does seem to make a difference:

def string_iterate():
    [item for item in ("a", "b", "c")]

dis.dis(string_iterate)
#>>>  35           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
#>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
#>>>               6 MAKE_FUNCTION            0
#>>>               9 LOAD_CONST               6 (('a', 'b', 'c'))
#>>>              12 GET_ITER
#>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

This produces just

 9 LOAD_CONST               6 (('a', 'b', 'c'))

as tuples are immutable. Test:

>>> python3 -m timeit '[x for x in ("a", "b", "c")]'
1000000 loops, best of 3: 0.369 usec per loop

Great, back up to speed.

For Python 2:

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('a')
#>>>               6 LOAD_CONST               2 ('b')
#>>>               9 LOAD_CONST               3 ('c')
#>>>              12 BUILD_LIST               3
#>>>              15 GET_ITER            
#>>>         >>   16 FOR_ITER                12 (to 31)
#>>>              19 STORE_FAST               0 (item)
#>>>              22 LOAD_FAST                0 (item)
#>>>              25 LIST_APPEND              2
#>>>              28 JUMP_ABSOLUTE           16
#>>>         >>   31 POP_TOP             
#>>>              32 LOAD_CONST               0 (None)
#>>>              35 RETURN_VALUE        

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>   2           0 BUILD_LIST               0
#>>>               3 LOAD_CONST               1 ('abc')
#>>>               6 GET_ITER            
#>>>         >>    7 FOR_ITER                12 (to 22)
#>>>              10 STORE_FAST               0 (item)
#>>>              13 LOAD_FAST                0 (item)
#>>>              16 LIST_APPEND              2
#>>>              19 JUMP_ABSOLUTE            7
#>>>         >>   22 POP_TOP             
#>>>              23 LOAD_CONST               0 (None)
#>>>              26 RETURN_VALUE

The odd thing is that we have the same building of the list, but it’s still faster for this. Python 2 is acting strangely fast.

Let’s remove the comprehensions and re-time. The _ = is to prevent it getting optimised out.

>>> python3 -m timeit '_ = ["a", "b", "c"]'
10000000 loops, best of 3: 0.0707 usec per loop

>>> python3 -m timeit '_ = "abc"'
100000000 loops, best of 3: 0.0171 usec per loop

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense as Python 3 changed comprehensions to have safer scoping.

Well, now improve the benchmark (I’m just removing overhead that isn’t iteration). This removes the building of the iterable by pre-assigning it:

>>> python3 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.387 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
1000000 loops, best of 3: 0.368 usec per loop

>>> python2 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
1000000 loops, best of 3: 0.309 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
10000000 loops, best of 3: 0.164 usec per loop

We can check if calling iter is the overhead:

>>> python3 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.099 usec per loop

>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.1 usec per loop

>>> python2 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
10000000 loops, best of 3: 0.0913 usec per loop

>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.0854 usec per loop

No. No it is not. The difference is too small, especially for Python 3.

So let’s remove yet more unwanted overhead… by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 3.12 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.77 msec per loop

>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 2.32 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.09 msec per loop

This hasn’t actually changed much, but it’s helped a little.

So remove the comprehension. It’s overhead that’s not part of the question:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.71 msec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 1.36 msec per loop

>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.27 msec per loop

>>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 935 usec per loop

That’s more like it! We can get slightly faster still by using deque to iterate. It’s basically the same, but it’s faster:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 805 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying bytes and unicode in both:

bytes

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)'                                                                    :(
1000 loops, best of 3: 571 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 394 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127))                 for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 757 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127))                 for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop

Here you see Python 3 actually faster than Python 2.

unicode

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(   chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 800 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [   chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 394 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 1.07 msec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 469 usec per loop

Again, Python 3 is faster, although this is to be expected (str has had a lot of attention in Python 3).

In fact, this unicode–bytes difference is very small, which is impressive.

So let’s analyse this one case, seeing as it’s fast and convenient for me:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

We can actually rule out Tim Peter’s 10-times-upvoted answer!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

These are not new objects!

But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

The difference seems small, but at least half of the cost is overhead:

>>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

so the speed difference is sufficient to decide to blame it. I think.

So why is indexing a list so much faster?

Well, I’ll come back to you on that, but my guess is that’s is down to the check for interned strings (or cached characters if it’s a separate mechanism). This will be less fast than optimal. But I’ll go check the source (although I’m not comfortable in C…) :).

So here’s the source:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

Walking from the top, we’ll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

ch = PyUnicode_READ(kind, data, index);

but we’d hope that is fast, as we’re reading from a contiguous C array by indexing it. The result, ch, will be less than 256 so we’ll return the cached character in get_latin1_char(ch).

So we’ll run (dropping the first checks)

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

Where

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)),            \
     ((PyASCIIObject *)(op))->state.kind)

(which is boring because asserts get ignored in debug [so I can check that they’re fast] and ((PyASCIIObject *)(op))->state.kind) is (I think) an indirection and a C-level cast);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) :   \
     _PyUnicode_NONCOMPACT_DATA(op))

(which is also boring for similar reasons, assuming the macros (Something_CAPITALIZED) are all fast),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
    ((kind) == PyUnicode_1BYTE_KIND ? \
        ((const Py_UCS1 *)(data))[(index)] : \
        ((kind) == PyUnicode_2BYTE_KIND ? \
            ((const Py_UCS2 *)(data))[(index)] : \
            ((const Py_UCS4 *)(data))[(index)] \
        ) \
    ))

(which involves indexes but really isn’t slow at all) and

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

Which confirms my suspicion that:

This is cached:

PyObject *unicode = unicode_latin1[ch];

This should be fast. The if (!unicode) is not run, so it’s literally equivalent in this case to
```
PyObject *unicode = unicode_latin1[ch];
Py_INCREF(unicode);
return unicode;
```

Honestly, after testing the asserts are fast (by disabling them [I think it works on the C-level asserts…]), the only plausibly-slow parts are:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

Which are:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(fast, as before),

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(fast if the macro IS_ASCII is fast), and

#define _PyUnicode_NONCOMPACT_DATA(op)                  \
    (assert(((PyUnicodeObject*)(op))->data.any),        \
     ((((PyUnicodeObject *)(op))->data.any)))

(also fast as it’s an assert plus an indirection plus a cast).

So we’re down (the rabbit hole) to:

PyUnicode_IS_ASCII

which is

#define PyUnicode_IS_ASCII(op)                   \
    (assert(PyUnicode_Check(op)),                \
     assert(PyUnicode_IS_READY(op)),             \
     ((PyASCIIObject*)op)->state.ascii)

Hmm… that seems fast too…

Well, OK, but let’s compare it to PyList_GetItem. (Yeah, thanks Tim Peters for giving me more work to do :P.)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

We can see that on non-error cases this is just going to run:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op) -> ob_item[i]

Where PyList_Check is

#define PyList_Check(op) \
     PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)

~~(TABS! TABS!!!) (issue21587)~~ That got fixed and merged in 5 minutes. Like… yeah. Damn. They put Skeet to shame.

#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)

#define PyType_FastSubclass(t,f)  PyType_HasFeature(t,f)

#ifdef Py_LIMITED_API
#define PyType_HasFeature(t,f)  ((PyType_GetFlags(t) & (f)) != 0)
#else
#define PyType_HasFeature(t,f)  (((t)->tp_flags & (f)) != 0)
#endif

So this is normally really trivial (two indirections and a couple of boolean checks) unless Py_LIMITED_API is on, in which case… ???

Then there’s the indexing and a cast (((PyListObject *)op) -> ob_item[i]) and we’re done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.

I think in general, there’s just more type-checking and indirection (->) for Unicode. It seems I’m missing a point, but what?

Question 3

When you iterate over most container objects (lists, tuples, dicts, …), the iterator delivers the objects in the container.

But when you iterate over a string, a new object has to be created for each character delivered – a string is not “a container” in the same sense a list is a container. The individual characters in a string don’t exist as distinct objects before iteration creates those objects.

Question 4

You could be incurring and overhead for creating the iterator for the string. Whereas the array already contains an iterator upon instantiation.

EDIT:

>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

This was ran using 2.7, but on my mac book pro i7. This could be the result of a system configuration difference.

Question 5

I’m looking for documents that describes in details how python garbage collection works.

I’m interested what is done in which step. What objects are in these 3 collections? What kinds of objects are deleted in each step? What algorithm is used for reference cycles finding?

Background: I’m implementing some searches that have to finish in small amount of time. When the garbage collector starts collecting the oldest generation, it is “much” slower than in other cases. It took more time than it is intended for searches. I’m looking how to predict when it will collect oldest generation and how long it will take.

It is easy to predict when it will collect oldest generation with get_count() and get_threshold(). That also can be manipulated with set_threshold(). But I don’t see how easy to decide is it better to make collect() by force or wait for scheduled collection.

Question 6

There’s no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments are actually very helpful. Skip down to the collect() function and the comments explain the process well (albeit in very technical terms).

Question 7

I’m used that in Objective-C I’ve got this construct:

- (void)init {
    if (self = [super init]) {
        // init class
    }
    return self;
}

Should Python also call the parent class’s implementation for __init__?

class NewClass(SomeOtherClass):
    def __init__(self):
        SomeOtherClass.__init__(self)
        # init class

Is this also true/false for __new__() and __del__()?

Edit: There’s a very similar question: Inheritance and Overriding __init__ in Python

Question 8

In Python, calling the super-class’ __init__ is optional. If you call it, it is then also optional whether to use the super identifier, or whether to explicitly name the super class:

object.__init__(self)

In case of object, calling the super method is not strictly necessary, since the super method is empty. Same for __del__.

On the other hand, for __new__, you should indeed call the super method, and use its return as the newly-created object – unless you explicitly want to return something different.

Question 9

If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__, you must call it yourself, since that will not happen automatically. But if you don’t need anything from super’s __init__, no need to call it. Example:

>>> class C(object):
        def __init__(self):
            self.b = 1


>>> class D(C):
        def __init__(self):
            super().__init__() # in Python 2 use super(D, self).__init__()
            self.a = 1


>>> class E(C):
        def __init__(self):
            self.a = 1


>>> d = D()
>>> d.a
1
>>> d.b  # This works because of the call to super's init
1
>>> e = E()
>>> e.a
1
>>> e.b  # This is going to fail since nothing in E initializes b...
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    e.b  # This is going to fail since nothing in E initializes b...
AttributeError: 'E' object has no attribute 'b'

__del__ is the same way, (but be wary of relying on __del__ for finalization – consider doing it via the with statement instead).

I rarely use __new__. I do all the initialization in __init__.

Question 10

In Anon’s answer:
“If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__ , you must call it yourself, since that will not happen automatically”

It’s incredible: he is wording exactly the contrary of the principle of inheritance.

It is not that “something from super’s __init__ (…) will not happen automatically” , it is that it WOULD happen automatically, but it doesn’t happen because the base-class’ __init__ is overriden by the definition of the derived-clas __init__

So then, WHY defining a derived_class’ __init__ , since it overrides what is aimed at when someone resorts to inheritance ??

It’s because one needs to define something that is NOT done in the base-class’ __init__ , and the only possibility to obtain that is to put its execution in a derived-class’ __init__ function.
In other words, one needs something in base-class’ __init__ in addition to what would be automatically done in the base-classe’ __init__ if this latter wasn’t overriden.
NOT the contrary.

Then, the problem is that the desired instructions present in the base-class’ __init__ are no more activated at the moment of instantiation. In order to offset this inactivation, something special is required: calling explicitly the base-class’ __init__ , in order to KEEP , NOT TO ADD, the initialization performed by the base-class’ __init__ . That’s exactly what is said in the official doc:

An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call BaseClassName.methodname(self, arguments).
http://docs.python.org/tutorial/classes.html#inheritance

That’s all the story:

when the aim is to KEEP the initialization performed by the base-class, that is pure inheritance, nothing special is needed, one must just avoid to define an __init__ function in the derived class
when the aim is to REPLACE the initialization performed by the base-class, __init__ must be defined in the derived-class
when the aim is to ADD processes to the initialization performed by the base-class, a derived-class’ __init__ must be defined , comprising an explicit call to the base-class __init__

What I feel astonishing in the post of Anon is not only that he expresses the contrary of the inheritance theory, but that there have been 5 guys passing by that upvoted without turning a hair, and moreover there have been nobody to react in 2 years in a thread whose interesting subject must be read relatively often.

Question 11

Edit: (after the code change)
There is no way for us to tell you whether you need or not to call your parent’s __init__ (or any other function). Inheritance obviously would work without such call. It all depends on the logic of your code: for example, if all your __init__ is done in parent class, you can just skip child-class __init__ altogether.

consider the following example:

>>> class A:
    def __init__(self, val):
        self.a = val


>>> class B(A):
    pass

>>> class C(A):
    def __init__(self, val):
        A.__init__(self, val)
        self.a += val


>>> A(4).a
4
>>> B(5).a
5
>>> C(6).a
12

Question 12

There’s no hard and fast rule. The documentation for a class should indicate whether subclasses should call the superclass method. Sometimes you want to completely replace superclass behaviour, and at other times augment it – i.e. call your own code before and/or after a superclass call.

Update: The same basic logic applies to any method call. Constructors sometimes need special consideration (as they often set up state which determines behaviour) and destructors because they parallel constructors (e.g. in the allocation of resources, e.g. database connections). But the same might apply, say, to the render() method of a widget.

Further update: What’s the OPP? Do you mean OOP? No – a subclass often needs to know something about the design of the superclass. Not the internal implementation details – but the basic contract that the superclass has with its clients (using classes). This does not violate OOP principles in any way. That’s why protected is a valid concept in OOP in general (though not, of course, in Python).

Question 13

IMO, you should call it. If your superclass is object, you should not, but in other cases I think it is exceptional not to call it. As already answered by others, it is very convenient if your class doesn’t even have to override __init__ itself, for example when it has no (additional) internal state to initialize.

Question 14

Yes, you should always call base class __init__ explicitly as a good coding practice. Forgetting to do this can cause subtle issues or run time errors. This is true even if __init__ doesn’t take any parameters. This is unlike other languages where compiler would implicitly call base class constructor for you. Python doesn’t do that!

The main reason for always calling base class _init__ is that base class may typically create member variable and initialize them to defaults. So if you don’t call base class init, none of that code would be executed and you would end up with base class that has no member variables.

Example:

class Base:
  def __init__(self):
    print('base init')

class Derived1(Base):
  def __init__(self):
    print('derived1 init')

class Derived2(Base):
  def __init__(self):
    super(Derived2, self).__init__()
    print('derived2 init')

print('Creating Derived1...')
d1 = Derived1()
print('Creating Derived2...')
d2 = Derived2()

This prints..

Creating Derived1...
derived1 init
Creating Derived2...
base init
derived2 init

Run this code.

Question 15

Working in Python 2.7. I have a dictionary with team names as the keys and the amount of runs scored and allowed for each team as the value list:

NL_East = {'Phillies': [645, 469], 'Braves': [599, 548], 'Mets': [653, 672]}

I would like to be able to feed the dictionary into a function and iterate over each team (the keys).

Here’s the code I’m using. Right now, I can only go team by team. How would I iterate over each team and print the expected win_percentage for each team?

def Pythag(league):
    runs_scored = float(league['Phillies'][0])
    runs_allowed = float(league['Phillies'][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Thanks for any help.

Question 16

You have several options for iterating over a dictionary.

If you iterate over the dictionary itself (for team in league), you will be iterating over the keys of the dictionary. When looping with a for loop, the behavior will be the same whether you loop over the dict (league) itself, or league.keys():

for team in league.keys():
    runs_scored, runs_allowed = map(float, league[team])

You can also iterate over both the keys and the values at once by iterating over league.items():

for team, runs in league.items():
    runs_scored, runs_allowed = map(float, runs)

You can even perform your tuple unpacking while iterating:

for team, (runs_scored, runs_allowed) in league.items():
    runs_scored = float(runs_scored)
    runs_allowed = float(runs_allowed)

Question 17

You can very easily iterate over dictionaries, too:

for team, scores in NL_East.iteritems():
    runs_scored = float(scores[0])
    runs_allowed = float(scores[1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print '%s: %.1f%%' % (team, win_percentage)

Question 18

Dictionaries have a built in function called iterkeys().

Try:

for team in league.iterkeys():
    runs_scored = float(league[team][0])
    runs_allowed = float(league[team][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Question 19

Dictionary objects allow you to iterate over their items. Also, with pattern matching and the division from __future__ you can do simplify things a bit.

Finally, you can separate your logic from your printing to make things a bit easier to refactor/debug later.

from __future__ import division

def Pythag(league):
    def win_percentages():
        for team, (runs_scored, runs_allowed) in league.iteritems():
            win_percentage = round((runs_scored**2) / ((runs_scored**2)+(runs_allowed**2))*1000)
            yield win_percentage

    for win_percentage in win_percentages():
        print win_percentage

Question 20

List comprehension can shorten things…

win_percentages = [m**2.0 / (m**2.0 + n**2.0) * 100 for m, n in [a[i] for i in NL_East]]

Question 21

Working with Python in Emacs if I want to add a try/except to a block of code, I often find that I am having to indent the whole block, line by line. In Emacs, how do you indent the whole block at once.

I am not an experienced Emacs user, but just find it is the best tool for working through ssh. I am using Emacs on the command line(Ubuntu), not as a gui, if that makes any difference.

Question 22

If you are programming Python using Emacs, then you should probably be using python-mode. With python-mode, after marking the block of code,

C-c > or C-c C-l shifts the region 4 spaces to the right

C-c < or C-c C-r shifts the region 4 spaces to the left

If you need to shift code by two levels of indention, or some arbitary amount you can prefix the command with an argument:

C-u 8 C-c > shifts the region 8 spaces to the right

C-u 8 C-c < shifts the region 8 spaces to the left

Another alternative is to use M-x indent-rigidly which is bound to C-x TAB:

C-u 8 C-x TAB shifts the region 8 spaces to the right

C-u -8 C-x TAB shifts the region 8 spaces to the left

Also useful are the rectangle commands that operate on rectangles of text instead of lines of text.

For example, after marking a rectangular region,

C-x r o inserts blank space to fill the rectangular region (effectively shifting code to the right)

C-x r k kills the rectangular region (effectively shifting code to the left)

C-x r t prompts for a string to replace the rectangle with. Entering C-u 8 <space> will then enter 8 spaces.

PS. With Ubuntu, to make python-mode the default mode for all .py files, simply install the python-mode package.

Question 23

In addition to indent-region, which is mapped to C-M-\ by default, the rectangle edit commands are very useful for Python. Mark a region as normal, then:

C-x r t (string-rectangle): will prompt you for characters you’d like to insert into each line; great for inserting a certain number of spaces
C-x r k (kill-rectangle): remove a rectangle region; great for removing indentation

You can also C-x r y (yank-rectangle), but that’s only rarely useful.

Question 24

indent-region mapped to C-M-\ should do the trick.

Question 25

I’ve been using this function to handle my indenting and unindenting:

(defun unindent-dwim (&optional count-arg)
  "Keeps relative spacing in the region.  Unindents to the next multiple of the current tab-width"
  (interactive)
  (let ((deactivate-mark nil)
        (beg (or (and mark-active (region-beginning)) (line-beginning-position)))
        (end (or (and mark-active (region-end)) (line-end-position)))
        (min-indentation)
        (count (or count-arg 1)))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (add-to-list 'min-indentation (current-indentation))
        (forward-line)))
    (if (< 0 count)
        (if (not (< 0 (apply 'min min-indentation)))
            (error "Can't indent any more.  Try `indent-rigidly` with a negative arg.")))
    (if (> 0 count)
        (indent-rigidly beg end (* (- 0 tab-width) count))
      (let (
            (indent-amount
             (apply 'min (mapcar (lambda (x) (- 0 (mod x tab-width))) min-indentation))))
        (indent-rigidly beg end (or
                                 (and (< indent-amount 0) indent-amount)
                                 (* (or count 1) (- 0 tab-width))))))))

And then I assign it to a keyboard shortcut:

(global-set-key (kbd "s-[") 'unindent-dwim)
(global-set-key (kbd "s-]") (lambda () (interactive) (unindent-dwim -1)))

Question 26

I’m an Emacs newb, so this answer it probably bordering on useless.

None of the answers mentioned so far cover re-indentation of literals like dict or list. E.g. M-x indent-region or M-x python-indent-shift-right and company aren’t going to help if you’ve cut-and-pasted the following literal and need it to be re-indented sensibly:

    foo = {
  'bar' : [
     1,
    2,
        3 ],
      'baz' : {
     'asdf' : {
        'banana' : 1,
        'apple' : 2 } } }

It feels like M-x indent-region should do something sensibly in python-mode, but that’s not (yet) the case.

For the specific case where your literals are bracketed, using TAB on the lines in question gets what you want (because whitespace doesn’t play a role).

So what I’ve been doing in such cases is quickly recording a keyboard macro like <f3> C-n TAB <f4> as in F3, Ctrl-n (or down arrow), TAB, F4, and then using F4 repeatedly to apply the macro can save a couple of keystrokes. Or you can do C-u 10 C-x e to apply it 10 times.

(I know it doesn’t sound like much, but try re-indenting 100 lines of garbage literal without missing down-arrow, and then having to go up 5 lines and repeat things ;) ).

Question 27

I use the following snippet. On tab when the selection is inactive, it indents the current line (as it normally does); when the selection is inactive, it indents the whole region to the right.

(defun my-python-tab-command (&optional _)
  "If the region is active, shift to the right; otherwise, indent current line."
  (interactive)
  (if (not (region-active-p))
      (indent-for-tab-command)
    (let ((lo (min (region-beginning) (region-end)))
          (hi (max (region-beginning) (region-end))))
      (goto-char lo)
      (beginning-of-line)
      (set-mark (point))
      (goto-char hi)
      (end-of-line)
      (python-indent-shift-right (mark) (point)))))
(define-key python-mode-map [remap indent-for-tab-command] 'my-python-tab-command)

Question 28

Do indentation interactively.

Select the region to be indented.
C-x TAB.
Use arrows (<- and ->) to indent interactively.
Press Esc three times when you are done with the required indentation.

Copied from my post in: Indent several lines in Emacs

Question 29

I do something like this universally

;; intent whole buffer 
(defun iwb ()
  "indent whole buffer"
  (interactive)
  ;;(delete-trailing-whitespace)
  (indent-region (point-min) (point-max) nil)
  (untabify (point-min) (point-max)))

Question 30

Numpy, scipy, matplotlib, and pylab are common terms among they who use python for scientific computation.

I just learn a bit about pylab, and I got confused. Whenever I want to import numpy, I can always do:

import numpy as np

I just consider, that once I do

from pylab import *

the numpy will be imported as well (with np alias). So basically the second one does more things compared to the first one.

There are few things I want to ask:

Is it right that pylab is just a wrapper for numpy, scipy and matplotlib?
As np is the numpy alias in pylab, what is the scipy and matplotlib alias in pylab? (as far as I know, plt is alias of matplotlib.pyplot, but I don’t know the alias for the matplotlib itself)

Question 31

No, pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MatLab like environment. matplotlib has a number of dependencies, among them numpy which it imports under the common alias np. scipy is not a dependency of matplotlib.
If you run ipython --pylab an automatic import will put all symbols from matplotlib.pylab into global scope. Like you wrote numpy gets imported under the np alias. Symbols from matplotlib are available under the mpl alias.

Question 32

Scipy and numpy are scientific projects whose aim is to bring efficient and fast numeric computing to python.

Matplotlib is the name of the python plotting library.

Pyplot is an interactive api for matplotlib, mostly for use in notebooks like jupyter. You generally use it like this: import matplotlib.pyplot as plt.

Pylab is the same thing as pyplot, but with extra features (its use is currently discouraged).

pylab = pyplot + numpy

See more information here: Matplotlib, Pylab, Pyplot, etc: What’s the difference between these and when to use each?

Question 33

Since some people (like me) may still be confused about usage of pylab since examples using pylab are out there on the internet, here is a quote from the official matplotlib FAQ:

pylab is a convenience module that bulk imports matplotlib.pyplot (for plotting) and numpy (for mathematics and working with arrays) in a single name space. Although many examples use pylab, it is no longer recommended.

So, TL;DR; is do not use pylab, period. Use pyplot and import numpy separately as needed.

Here is the link for further reading and other useful examples.

Question 34

I’m writing an AI state space search algorithm, and I have a generic class which can be used to quickly implement a search algorithm. A subclass would define the necessary operations, and the algorithm does the rest.

Here is where I get stuck: I want to avoid regenerating the parent state over and over again, so I have the following function, which returns the operations that can be legally applied to any state:

def get_operations(self, include_parent=True):
    ops = self._get_operations()
    if not include_parent and self.path.parent_op:
        try:
            parent_inverse = self.invert_op(self.path.parent_op)
            ops.remove(parent_inverse)
        except NotImplementedError:
            pass
    return ops

And the invert_op function throws by default.

Is there a faster way to check to see if the function is not defined than catching an exception?

I was thinking something on the lines of checking for present in dir, but that doesn’t seem right. hasattr is implemented by calling getattr and checking if it raises, which is not what I want.

Question 35

Yes, use getattr() to get the attribute, and callable() to verify it is a method:

invert_op = getattr(self, "invert_op", None)
if callable(invert_op):
    invert_op(self.path.parent_op)

Note that getattr() normally throws exception when the attribute doesn’t exist. However, if you specify a default value (None, in this case), it will return that instead.

Question 36

It works in both Python 2 and Python 3

hasattr(connection, 'invert_opt')

hasattr returns True if connection object has a function invert_opt defined. Here is the documentation for you to graze

https://docs.python.org/2/library/functions.html#hasattr https://docs.python.org/3/library/functions.html#hasattr

Question 37

Is there a faster way to check to see if the function is not defined than catching an exception?

Why are you against that? In most Pythonic cases, it’s better to ask forgiveness than permission. ;-)

hasattr is implemented by calling getattr and checking if it raises, which is not what I want.

Again, why is that? The following is quite Pythonic:

    try:
        invert_op = self.invert_op
    except AttributeError:
        pass
    else:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Or,

    # if you supply the optional `default` parameter, no exception is thrown
    invert_op = getattr(self, 'invert_op', None)  
    if invert_op is not None:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Note, however, that getattr(obj, attr, default) is basically implemented by catching an exception, too. There is nothing wrong with that in Python land!

Question 38

The responses herein check if a string is the name of an attribute of the object. An extra step (using callable) is needed to check if the attribute is a method.

So it boils down to: what is the fastest way to check if an object obj has an attribute attrib. The answer is

'attrib' in obj.__dict__

This is so because a dict hashes its keys so checking for the key’s existence is fast.

See timing comparisons below.

>>> class SomeClass():
...         pass
...
>>> obj = SomeClass()
>>>
>>> getattr(obj, "invert_op", None)
>>>
>>> %timeit getattr(obj, "invert_op", None)
1000000 loops, best of 3: 723 ns per loop
>>> %timeit hasattr(obj, "invert_op")
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 674 ns per loop
>>> %timeit "invert_op" in obj.__dict__
The slowest run took 12.19 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 176 ns per loop

Question 39

I like Nathan Ostgard’s answer and I up-voted it. But another way you could solve your problem would be to use a memoizing decorator, which would cache the result of the function call. So you can go ahead and have an expensive function that figures something out, but then when you call it over and over the subsequent calls are fast; the memoized version of the function looks up the arguments in a dict, finds the result in the dict from when the actual function computed the result, and returns the result right away.

Here is a recipe for a memoizing decorator called “lru_cache” by Raymond Hettinger. A version of this is now standard in the functools module in Python 3.2.

http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators/

http://docs.python.org/release/3.2/library/functools.html

Question 40

Like anything in Python, if you try hard enough, you can get at the guts and do something really nasty. Now, here’s the nasty part:

def invert_op(self, op):
    raise NotImplementedError

def is_invert_op_implemented(self):
    # Only works in CPython 2.x of course
    return self.invert_op.__code__.co_code == 't\x00\x00\x82\x01\x00d\x00\x00S'

Please do us a favor, just keep doing what you have in your question and DON’T ever use this unless you are on the PyPy team hacking into the Python interpreter. What you have up there is Pythonic, what I have here is pure EVIL.

Question 41

You can also go over the class:

import inspect


def get_methods(cls_):
    methods = inspect.getmembers(cls_, inspect.isfunction)
    return dict(methods)

# Example
class A(object):
    pass

class B(object):
    def foo():
        print('B')


# If you only have an object, you can use `cls_ = obj.__class__`
if 'foo' in get_methods(A):
    print('A has foo')

if 'foo' in get_methods(B):
    print('B has foo')

Question 42

While checking for attributes in __dict__ property is really fast, you cannot use this for methods, since they do not appear in __dict__ hash. You could however resort to hackish workaround in your class, if performance is that critical:

class Test():
    def __init__():
        # redefine your method as attribute
        self.custom_method = self.custom_method

    def custom_method(self):
        pass

Then check for method as:

t = Test()
'custom_method' in t.__dict__

Time comparision with getattr:

>>%timeit 'custom_method' in t.__dict__
55.9 ns ± 0.626 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'custom_method', None)
116 ns ± 0.765 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Not that I’m encouraging this approach, but it seems to work.

[EDIT] Performance boost is even higher when method name is not in given class:

>>%timeit 'rubbish' in t.__dict__
65.5 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'rubbish', None)
385 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Question 43

I’m trying to learn python and I now I am trying to get the hang of classes and how to manipulate them with instances.

I can’t seem to understand this practice problem:

Create and return a student object whose name, age, and major are the same as those given as input

def make_student(name, age, major)

I just don’t get what it means by object, do they mean I should create an array inside the function that holds these values? or create a class and let this function be inside it, and assign instances? (before this question i was asked to set up a student class with name, age, and major inside)

class Student:
    name = "Unknown name"
    age = 0
    major = "Unknown major"

Question 44

class Student(object):
    name = ""
    age = 0
    major = ""

    # The class "constructor" - It's actually an initializer 
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

def make_student(name, age, major):
    student = Student(name, age, major)
    return student

Note that even though one of the principles in Python’s philosophy is “there should be one—and preferably only one—obvious way to do it”, there are still multiple ways to do this. You can also use the two following snippets of code to take advantage of Python’s dynamic capabilities:

class Student(object):
    name = ""
    age = 0
    major = ""

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    # Note: I didn't need to create a variable in the class definition before doing this.
    student.gpa = float(4.0)
    return student

I prefer the former, but there are instances where the latter can be useful – one being when working with document databases like MongoDB.

Question 45

Create a class and give it an __init__ method:

class Student:
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

    def is_old(self):
        return self.age > 100

Now, you can initialize an instance of the Student class:

>>> s = Student('John', 88, None)
>>> s.name
    'John'
>>> s.age
    88

Although I’m not sure why you need a make_student student function if it does the same thing as Student.__init__.

Question 46

Objects are instances of classes. Classes are just the blueprints for objects. So given your class definition –

# Note the added (object) - this is the preferred way of creating new classes
class Student(object):
    name = "Unknown name"
    age = 0
    major = "Unknown major"

You can create a make_student function by explicitly assigning the attributes to a new instance of Student –

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    return student

But it probably makes more sense to do this in a constructor (__init__) –

class Student(object):
    def __init__(self, name="Unknown name", age=0, major="Unknown major"):
        self.name = name
        self.age = age
        self.major = major

The constructor is called when you use Student(). It will take the arguments defined in the __init__ method. The constructor signature would now essentially be Student(name, age, major).

If you use that, then a make_student function is trivial (and superfluous) –

def make_student(name, age, major):
    return Student(name, age, major)

For fun, here is an example of how to create a make_student function without defining a class. Please do not try this at home.

def make_student(name, age, major):
    return type('Student', (object,),
                {'name': name, 'age': age, 'major': major})()

Question 47

when you create an object using predefine class, at first you want to create a variable for storing that object. Then you can create object and store variable that you created.

class Student:
     def __init__(self):

# creating an object....

   student1=Student()

Actually this init method is the constructor of class.you can initialize that method using some attributes.. In that point , when you creating an object , you will have to pass some values for particular attributes..

class Student:
      def __init__(self,name,age):
            self.name=value
            self.age=value

 # creating an object.......

     student2=Student("smith",25)

Question 48

I have an array of distances called dists. I want to select dists which are between two values. I wrote the following line of code to do that:

 dists[(np.where(dists >= r)) and (np.where(dists <= r + dr))]

However this selects only for the condition

 (np.where(dists <= r + dr))

If I do the commands sequentially by using a temporary variable it works fine. Why does the above code not work, and how do I get it to work?

Cheers

Question 49

The best way in your particular case would just be to change your two criteria to one criterion:

dists[abs(dists - r - dr/2.) <= dr/2.]

It only creates one boolean array, and in my opinion is easier to read because it says, is dist within a dr or r? (Though I’d redefine r to be the center of your region of interest instead of the beginning, so r = r + dr/2.) But that doesn’t answer your question.

The answer to your question:
You don’t actually need where if you’re just trying to filter out the elements of dists that don’t fit your criteria:

dists[(dists >= r) & (dists <= r+dr)]

Because the & will give you an elementwise and (the parentheses are necessary).

Or, if you do want to use where for some reason, you can do:

 dists[(np.where((dists >= r) & (dists <= r + dr)))]

Why:
The reason it doesn’t work is because np.where returns a list of indices, not a boolean array. You’re trying to get and between two lists of numbers, which of course doesn’t have the True/False values that you expect. If a and b are both True values, then a and b returns b. So saying something like [0,1,2] and [2,3,4] will just give you [2,3,4]. Here it is in action:

In [230]: dists = np.arange(0,10,.5)
In [231]: r = 5
In [232]: dr = 1

In [233]: np.where(dists >= r)
Out[233]: (array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),)

In [234]: np.where(dists <= r+dr)
Out[234]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

In [235]: np.where(dists >= r) and np.where(dists <= r+dr)
Out[235]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

What you were expecting to compare was simply the boolean array, for example

In [236]: dists >= r
Out[236]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True], dtype=bool)

In [237]: dists <= r + dr
Out[237]: 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

In [238]: (dists >= r) & (dists <= r + dr)
Out[238]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

Now you can call np.where on the combined boolean array:

In [239]: np.where((dists >= r) & (dists <= r + dr))
Out[239]: (array([10, 11, 12]),)

In [240]: dists[np.where((dists >= r) & (dists <= r + dr))]
Out[240]: array([ 5. ,  5.5,  6. ])

Or simply index the original array with the boolean array using fancy indexing

In [241]: dists[(dists >= r) & (dists <= r + dr)]
Out[241]: array([ 5. ,  5.5,  6. ])

Question 50

The accepted answer explained the problem well enough. However, the the more Numpythonic approach for applying multiple conditions is to use numpy logical functions. In this ase you can use np.logical_and:

np.where(np.logical_and(np.greater_equal(dists,r),np.greater_equal(dists,r + dr)))

Question 51

One interesting thing to point here; the usual way of using OR and AND too will work in this case, but with a small change. Instead of “and” and instead of “or”, rather use Ampersand(&) and Pipe Operator(|) and it will work.

When we use ‘and’:

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) and (ar<6), 'yo', ar)

Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When we use Ampersand(&):

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) & (ar<6), 'yo', ar)

Output:
array(['3', 'yo', 'yo', '14', '2', 'yo', '3', '7'], dtype='<U11')

And this is same in the case when we are trying to apply multiple filters in case of pandas Dataframe. Now the reasoning behind this has to do something with Logical Operators and Bitwise Operators and for more understanding about same, I’d suggest to go through this answer or similar Q/A in stackoverflow.

UPDATE

A user asked, why is there a need for giving (ar>3) and (ar<6) inside the parenthesis. Well here’s the thing. Before I start talking about what’s happening here, one needs to know about Operator precedence in Python.

Similar to what BODMAS is about, python also gives precedence to what should be performed first. Items inside the parenthesis are performed first and then the bitwise operator comes to work. I’ll show below what happens in both the cases when you do use and not use “(“, “)”.

Case1:

np.where( ar>3 & ar<6, 'yo', ar)
np.where( np.array([3,4,5,14,2,4,3,7])>3 & np.array([3,4,5,14,2,4,3,7])<6, 'yo', ar)

Since there are no brackets here, the bitwise operator(&) is getting confused here that what are you even asking it to get logical AND of, because in the operator precedence table if you see, & is given precedence over < or > operators. Here’s the table from from lowest precedence to highest precedence.

It’s not even performing the < and > operation and being asked to perform a logical AND operation. So that’s why it gives that error.

One can check out the following link to learn more about: operator precedence

Now to Case 2:

If you do use the bracket, you clearly see what happens.

np.where( (ar>3) & (ar<6), 'yo', ar)
np.where( (array([False,  True,  True,  True, False,  True, False,  True])) & (array([ True,  True,  True, False,  True,  True,  True, False])), 'yo', ar)

Two arrays of True and False. And you can easily perform logical AND operation on them. Which gives you:

np.where( array([False,  True,  True, False, False,  True, False, False]),  'yo', ar)

And rest you know, np.where, for given cases, wherever True, assigns first value(i.e. here ‘yo’) and if False, the other(i.e. here, keeping the original).

That’s all. I hope I explained the query well.

Question 52

I like to use np.vectorize for such tasks. Consider the following:

>>> # function which returns True when constraints are satisfied.
>>> func = lambda d: d >= r and d<= (r+dr) 
>>>
>>> # Apply constraints element-wise to the dists array.
>>> result = np.vectorize(func)(dists) 
>>>
>>> result = np.where(result) # Get output.

You can also use np.argwhere instead of np.where for clear output. But that is your call :)

Hope it helps.

Question 53

Try:

np.intersect1d(np.where(dists >= r)[0],np.where(dists <= r + dr)[0])

Question 54

This should work:

dists[((dists >= r) & (dists <= r+dr))]

The most elegant way~~

Question 55

Try:

import numpy as np
dist = np.array([1,2,3,4,5])
r = 2
dr = 3
np.where(np.logical_and(dist> r, dist<=r+dr))

Output: (array([2, 3]),)

You can see Logic functions for more details.

Question 56

I have worked out this simple example

import numpy as np

ar = np.array([3,4,5,14,2,4,3,7])

print [X for X in list(ar) if (X >= 3 and X <= 6)]

>>> 
[3, 4, 5, 4, 3]

Question 57

I’m writing some code that takes a filename, opens the file, and parses out some data. I’d like to do this in a class. The following code works:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None

        def parse_file():
            #do some parsing
            self.stat1 = result_from_parse1
            self.stat2 = result_from_parse2
            self.stat3 = result_from_parse3
            self.stat4 = result_from_parse4
            self.stat5 = result_from_parse5

        parse_file()

But it involves me putting all of the parsing machinery in the scope of the __init__ function for my class. That looks fine now for this simplified code, but the function parse_file has quite a few levels of indention as well. I’d prefer to define the function parse_file() as a class function like below:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        parse_file()

    def parse_file():
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

Of course this code doesn’t work because the function parse_file() is not within the scope of the __init__ function. Is there a way to call a class function from within __init__ of that class? Or am I thinking about this the wrong way?

Question 58

Call the function in this way:

self.parse_file()

You also need to define your parse_file() function like this:

def parse_file(self):

The parse_file method has to be bound to an object upon calling it (because it’s not a static method). This is done by calling the function on an instance of the object, in your case the instance is self.

Question 59

If I’m not wrong, both functions are part of your class, you should use it like this:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        self.parse_file()

    def parse_file(self):
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

replace your line:

parse_file()

with:

self.parse_file()

Question 60

How about:

class MyClass(object):
    def __init__(self, filename):
        self.filename = filename 
        self.stats = parse_file(filename)

def parse_file(filename):
    #do some parsing
    return results_from_parse

By the way, if you have variables named stat1, stat2, etc., the situation is begging for a tuple: stats = (...).

So let parse_file return a tuple, and store the tuple in self.stats.

Then, for example, you can access what used to be called stat3 with self.stats[2].

Question 61

In parse_file, take the self argument (just like in __init__). If there’s any other context you need then just pass it as additional arguments as usual.

Question 62

You must declare parse_file like this; def parse_file(self). The “self” parameter is a hidden parameter in most languages, but not in python. You must add it to the definition of all that methods that belong to a class. Then you can call the function from any method inside the class using self.parse_file

your final program is going to look like this:

class MyClass():
  def __init__(self, filename):
      self.filename = filename 

      self.stat1 = None
      self.stat2 = None
      self.stat3 = None
      self.stat4 = None
      self.stat5 = None
      self.parse_file()

  def parse_file(self):
      #do some parsing
      self.stat1 = result_from_parse1
      self.stat2 = result_from_parse2
      self.stat3 = result_from_parse3
      self.stat4 = result_from_parse4
      self.stat5 = result_from_parse5

Question 63

I think that your problem is actually with not correctly indenting init function.It should be like this

class MyClass():
     def __init__(self, filename):
          pass

     def parse_file():
          pass

问题：为什么迭代一小串字符串比一小串列表慢？

回答 0

TL; DR

这些不是新对象！

TL;DR

These are not new objects!

回答 1

回答 2

问题：Python垃圾收集器文档

回答 0

更新资料

Update

问题：__init __（）是否应该调用父类的__init __（）？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：迭代对应于Python中列表的字典键值

回答 0

回答 1

回答 2

回答 3

回答 4

问题：Emacs适用于Python的批量缩进

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：numpy，scipy，matplotlib和pylab之间的混淆

回答 0

回答 1

回答 2

问题：检查类是否已定义函数的最快方法是什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：Python，创建对象

回答 0

回答 1

回答 2

回答 3

问题：脾气暴躁的地方有多个条件

回答 0

回答 1

回答 2

更新

UPDATE

回答 3

回答 4

回答 5

回答 6

回答 7

问题：在__init__内调用类函数

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

有趣好用的Python教程

问题：init （）是否应该调用父类的init （）？

问题：在init内调用类函数