Python 实用宝典

Question 1

I’m looking for documentation that describes in detail how Python garbage collection works.

I’m interested in what is done at each step. Which objects are in these three collections (generations)? What kinds of objects are deleted at each step? What algorithm is used to find reference cycles?

Background: I’m implementing some searches that have to finish within a small amount of time. When the garbage collector starts collecting the oldest generation, it is much slower than in other cases, and the searches take longer than intended. I’m looking for a way to predict when it will collect the oldest generation and how long that will take.

It is easy to predict when it will collect the oldest generation using get_count() and get_threshold(), and this can be manipulated with set_threshold(). But I don’t see an easy way to decide whether it is better to force a collect() or to wait for the scheduled collection.
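
As a rough illustration of the functions mentioned above, the sketch below reads the thresholds and counters, times a forced collection of the oldest generation, and runs a function with automatic collection switched off. The helper names timed_full_collection and run_without_gc, and the search argument, are hypothetical and only serve the example.

```python
import gc
import time


def timed_full_collection():
    """Force a collection of the oldest generation and measure how long it takes."""
    start = time.perf_counter()
    unreachable = gc.collect(2)  # 2 = oldest generation, i.e. a full collection
    return unreachable, time.perf_counter() - start


def run_without_gc(search, *args):
    """Run a (hypothetical) time-critical function with automatic GC disabled."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return search(*args)
    finally:
        if was_enabled:
            gc.enable()


thresholds = gc.get_threshold()  # (threshold0, threshold1, threshold2)
counts = gc.get_count()          # current counters for the three generations
print("thresholds:", thresholds)
print("counts:    ", counts)

# Collect at a moment of our own choosing, for example between two searches.
unreachable, seconds = timed_full_collection()
print(f"full collection: {unreachable} unreachable objects, {seconds:.6f} s")

# Stand-in for a real search: any callable works here.
total = run_without_gc(sum, [1, 2, 3])
```

Timing a forced collection between searches gives a measured cost for the current heap, which is one way to decide whether forcing it is cheaper than letting it interrupt a search.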

Question 2

There’s no definitive resource on how Python does its garbage collection (other than the source code itself), but those 3 links should give you a pretty good idea.

Update

The source is actually pretty helpful. How much you get out of it depends on how well you read C, but the comments are actually very helpful. Skip down to the collect() function and the comments explain the process well (albeit in very technical terms).

Question 3

I’m used to this construct in Objective-C:

- (id)init {
    if (self = [super init]) {
        // init class
    }
    return self;
}

In Python, should I also call the parent class’s implementation of __init__?

class NewClass(SomeOtherClass):
    def __init__(self):
        SomeOtherClass.__init__(self)
        # init class

Does the same apply to __new__() and __del__()?

Edit: There’s a very similar question: Inheritance and Overriding __init__ in Python

Question 4

In Python, calling the super-class’ __init__ is optional. If you call it, it is then also optional whether to use the super identifier, or whether to explicitly name the super class:

object.__init__(self)

In case of object, calling the super method is not strictly necessary, since the super method is empty. Same for __del__.

On the other hand, for __new__, you should indeed call the super method, and use its return as the newly-created object – unless you explicitly want to return something different.
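
A minimal sketch of that point about __new__, using a hypothetical float subclass named Celsius: __new__ delegates creation to the parent class and returns the result, while __init__ skips the super call because float simply inherits object.__init__.

```python
class Celsius(float):
    """Hypothetical float subclass that rounds its value on creation."""

    def __new__(cls, value):
        # Delegate the actual object creation to the parent class and
        # return the object it produced.
        return super().__new__(cls, round(value, 1))

    def __init__(self, value):
        # float inherits object.__init__, which does nothing,
        # so no super call is needed here.
        self.unit = "C"


t = Celsius(21.46)
print(t, t.unit)
```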

Question 5

If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__, you must call it yourself, since that will not happen automatically. But if you don’t need anything from super’s __init__, no need to call it. Example:

>>> class C(object):
        def __init__(self):
            self.b = 1


>>> class D(C):
        def __init__(self):
            super().__init__() # in Python 2 use super(D, self).__init__()
            self.a = 1


>>> class E(C):
        def __init__(self):
            self.a = 1


>>> d = D()
>>> d.a
1
>>> d.b  # This works because of the call to super's init
1
>>> e = E()
>>> e.a
1
>>> e.b  # This is going to fail since nothing in E initializes b...
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    e.b  # This is going to fail since nothing in E initializes b...
AttributeError: 'E' object has no attribute 'b'

__del__ works the same way (but be wary of relying on __del__ for finalization; consider using the with statement instead).
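
For the with-statement alternative, a small sketch might look like the following. The Resource class and its close() method are hypothetical; the point is only that __exit__ runs deterministically when the block ends, unlike __del__.

```python
class Resource:
    """Hypothetical resource that must be released deterministically."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if an exception was raised.
        self.close()
        return False  # do not suppress exceptions


with Resource("db") as res:
    opened_inside = res.is_open

print(opened_inside, res.is_open)
```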

I rarely use __new__. I do all the initialization in __init__.

Question 6

In Anon’s answer:
“If you need something from super’s __init__ to be done in addition to what is being done in the current class’s __init__ , you must call it yourself, since that will not happen automatically”

It’s incredible: he states exactly the opposite of the principle of inheritance.

It is not that “something from super’s __init__ (…) will not happen automatically”; it WOULD happen automatically, but it doesn’t because the base class’s __init__ is overridden by the definition of the derived class’s __init__.

So then, WHY define a derived class’s __init__, since it overrides what one is aiming for when resorting to inheritance?

It’s because one needs to define something that is NOT done in the base class’s __init__, and the only way to get that is to put it in a derived class’s __init__ function.
In other words, one needs something in addition to what the base class’s __init__ would do automatically if it were not overridden.
NOT the contrary.

Then, the problem is that the desired instructions in the base class’s __init__ no longer run at instantiation. To offset this, something special is required: explicitly calling the base class’s __init__, in order to KEEP, NOT TO ADD, the initialization performed by the base class’s __init__. That’s exactly what the official doc says:

An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call BaseClassName.methodname(self, arguments).
http://docs.python.org/tutorial/classes.html#inheritance

That’s the whole story:

when the aim is to KEEP the initialization performed by the base class, that is pure inheritance: nothing special is needed, one simply does not define an __init__ function in the derived class
when the aim is to REPLACE the initialization performed by the base class, __init__ must be defined in the derived class
when the aim is to ADD processing to the initialization performed by the base class, a derived class’s __init__ must be defined, including an explicit call to the base class’s __init__

What I find astonishing in Anon’s post is not only that he expresses the contrary of inheritance theory, but that 5 people passing by upvoted it without batting an eye, and moreover that nobody reacted in 2 years in a thread whose interesting subject must be read relatively often.

Question 7

Edit: (after the code change)
There is no way for us to tell you whether or not you need to call your parent’s __init__ (or any other function). Inheritance obviously works without such a call. It all depends on the logic of your code: for example, if all your initialization is done in the parent class, you can skip the child class’s __init__ altogether.

Consider the following example:

>>> class A:
    def __init__(self, val):
        self.a = val


>>> class B(A):
    pass

>>> class C(A):
    def __init__(self, val):
        A.__init__(self, val)
        self.a += val


>>> A(4).a
4
>>> B(5).a
5
>>> C(6).a
12

Question 8

There’s no hard and fast rule. The documentation for a class should indicate whether subclasses should call the superclass method. Sometimes you want to completely replace superclass behaviour, and at other times augment it – i.e. call your own code before and/or after a superclass call.

Update: The same basic logic applies to any method call. Constructors sometimes need special consideration (as they often set up state which determines behaviour), and so do destructors because they parallel constructors (e.g. in the allocation of resources such as database connections). But the same might apply, say, to the render() method of a widget.

Further update: What’s the OPP? Do you mean OOP? No – a subclass often needs to know something about the design of the superclass. Not the internal implementation details – but the basic contract that the superclass has with its clients (using classes). This does not violate OOP principles in any way. That’s why protected is a valid concept in OOP in general (though not, of course, in Python).

Question 9

IMO, you should call it. If your superclass is object, you should not, but in other cases I think it is exceptional not to call it. As already answered by others, it is very convenient if your class doesn’t even have to override __init__ itself, for example when it has no (additional) internal state to initialize.

Question 10

Yes, you should always call the base class __init__ explicitly as a good coding practice. Forgetting to do this can cause subtle issues or run-time errors. This is true even if __init__ doesn’t take any parameters. This is unlike other languages, where the compiler implicitly calls the base class constructor for you. Python doesn’t do that!

The main reason for always calling the base class __init__ is that the base class typically creates member variables and initializes them to defaults. So if you don’t call the base class __init__, none of that code is executed and you end up with an instance that lacks the base class’s member variables.

Example:

class Base:
  def __init__(self):
    print('base init')

class Derived1(Base):
  def __init__(self):
    print('derived1 init')

class Derived2(Base):
  def __init__(self):
    super(Derived2, self).__init__()
    print('derived2 init')

print('Creating Derived1...')
d1 = Derived1()
print('Creating Derived2...')
d2 = Derived2()

This prints:

Creating Derived1...
derived1 init
Creating Derived2...
base init
derived2 init


Question 11

Working in Python 2.7. I have a dictionary with team names as the keys and the amount of runs scored and allowed for each team as the value list:

NL_East = {'Phillies': [645, 469], 'Braves': [599, 548], 'Mets': [653, 672]}

I would like to be able to feed the dictionary into a function and iterate over each team (the keys).

Here’s the code I’m using. Right now, I can only go team by team. How would I iterate over each team and print the expected win_percentage for each team?

def Pythag(league):
    runs_scored = float(league['Phillies'][0])
    runs_allowed = float(league['Phillies'][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Thanks for any help.

Question 12

You have several options for iterating over a dictionary.

If you iterate over the dictionary itself (for team in league), you will be iterating over the keys of the dictionary. When looping with a for loop, the behavior will be the same whether you loop over the dict (league) itself, or league.keys():

for team in league.keys():
    runs_scored, runs_allowed = map(float, league[team])

You can also iterate over both the keys and the values at once by iterating over league.items():

for team, runs in league.items():
    runs_scored, runs_allowed = map(float, runs)

You can even perform your tuple unpacking while iterating:

for team, (runs_scored, runs_allowed) in league.items():
    runs_scored = float(runs_scored)
    runs_allowed = float(runs_allowed)
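
Putting the loop together with the formula from the question, a Python 3 version could be sketched as follows. The lowercase function name pythag and the returned dictionary are choices made for this example; in Python 3 the / operator is already true division, so the float() conversions are not needed.

```python
def pythag(league):
    """Return {team: expected wins per 1000 games} using the question's formula."""
    result = {}
    # Unpack each value list into its two numbers while iterating.
    for team, (runs_scored, runs_allowed) in league.items():
        ratio = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
        result[team] = round(ratio * 1000)
    return result


NL_East = {'Phillies': [645, 469], 'Braves': [599, 548], 'Mets': [653, 672]}

win_percentages = pythag(NL_East)
for team, win_percentage in win_percentages.items():
    print(team, win_percentage)
```

Returning the values instead of printing them inside the function keeps the calculation reusable.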

Question 13

You can very easily iterate over dictionaries, too:

for team, scores in NL_East.iteritems():
    runs_scored = float(scores[0])
    runs_allowed = float(scores[1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print '%s: %.1f%%' % (team, win_percentage)

Question 14

In Python 2, dictionaries have a built-in method called iterkeys() (it was removed in Python 3, where keys() returns a view instead).

Try:

for team in league.iterkeys():
    runs_scored = float(league[team][0])
    runs_allowed = float(league[team][1])
    win_percentage = round((runs_scored**2)/((runs_scored**2)+(runs_allowed**2))*1000)
    print win_percentage

Question 15

Dictionary objects allow you to iterate over their items. Also, with tuple unpacking and division from __future__, you can simplify things a bit.

Finally, you can separate your logic from your printing to make things a bit easier to refactor/debug later.

from __future__ import division

def Pythag(league):
    def win_percentages():
        for team, (runs_scored, runs_allowed) in league.iteritems():
            win_percentage = round((runs_scored**2) / ((runs_scored**2)+(runs_allowed**2))*1000)
            yield win_percentage

    for win_percentage in win_percentages():
        print win_percentage

Question 16

List comprehension can shorten things…

win_percentages = [m**2.0 / (m**2.0 + n**2.0) * 100 for m, n in [NL_East[i] for i in NL_East]]

Question 17

When working with Python in Emacs, if I want to add a try/except to a block of code, I often find that I have to indent the whole block line by line. In Emacs, how do you indent the whole block at once?

I am not an experienced Emacs user, but I find it is the best tool for working through ssh. I am using Emacs on the command line (Ubuntu), not as a GUI, if that makes any difference.

Question 18

If you are programming Python using Emacs, then you should probably be using python-mode. With python-mode, after marking the block of code,

C-c > or C-c C-l shifts the region 4 spaces to the right

C-c < or C-c C-r shifts the region 4 spaces to the left

If you need to shift code by two levels of indentation, or some arbitrary amount, you can prefix the command with an argument:

C-u 8 C-c > shifts the region 8 spaces to the right

C-u 8 C-c < shifts the region 8 spaces to the left

Another alternative is to use M-x indent-rigidly which is bound to C-x TAB:

C-u 8 C-x TAB shifts the region 8 spaces to the right

C-u -8 C-x TAB shifts the region 8 spaces to the left

Also useful are the rectangle commands that operate on rectangles of text instead of lines of text.

For example, after marking a rectangular region,

C-x r o inserts blank space to fill the rectangular region (effectively shifting code to the right)

C-x r k kills the rectangular region (effectively shifting code to the left)

C-x r t prompts for a string to replace the rectangle with. Entering C-u 8 <space> will then enter 8 spaces.

PS. With Ubuntu, to make python-mode the default mode for all .py files, simply install the python-mode package.

Question 19

In addition to indent-region, which is mapped to C-M-\ by default, the rectangle edit commands are very useful for Python. Mark a region as normal, then:

C-x r t (string-rectangle): will prompt you for characters you’d like to insert into each line; great for inserting a certain number of spaces
C-x r k (kill-rectangle): remove a rectangle region; great for removing indentation

You can also C-x r y (yank-rectangle), but that’s only rarely useful.

Question 20

indent-region mapped to C-M-\ should do the trick.

Question 21

I’ve been using this function to handle my indenting and unindenting:

(defun unindent-dwim (&optional count-arg)
  "Keeps relative spacing in the region.  Unindents to the next multiple of the current tab-width"
  (interactive)
  (let ((deactivate-mark nil)
        (beg (or (and mark-active (region-beginning)) (line-beginning-position)))
        (end (or (and mark-active (region-end)) (line-end-position)))
        (min-indentation)
        (count (or count-arg 1)))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (add-to-list 'min-indentation (current-indentation))
        (forward-line)))
    (if (< 0 count)
        (if (not (< 0 (apply 'min min-indentation)))
            (error "Can't indent any more.  Try `indent-rigidly` with a negative arg.")))
    (if (> 0 count)
        (indent-rigidly beg end (* (- 0 tab-width) count))
      (let (
            (indent-amount
             (apply 'min (mapcar (lambda (x) (- 0 (mod x tab-width))) min-indentation))))
        (indent-rigidly beg end (or
                                 (and (< indent-amount 0) indent-amount)
                                 (* (or count 1) (- 0 tab-width))))))))

And then I assign it to a keyboard shortcut:

(global-set-key (kbd "s-[") 'unindent-dwim)
(global-set-key (kbd "s-]") (lambda () (interactive) (unindent-dwim -1)))

Question 22

I’m an Emacs newb, so this answer is probably bordering on useless.

None of the answers mentioned so far cover re-indentation of literals like dict or list. E.g. M-x indent-region or M-x python-indent-shift-right and company aren’t going to help if you’ve cut-and-pasted the following literal and need it to be re-indented sensibly:

    foo = {
  'bar' : [
     1,
    2,
        3 ],
      'baz' : {
     'asdf' : {
        'banana' : 1,
        'apple' : 2 } } }

It feels like M-x indent-region should do something sensible in python-mode, but that’s not (yet) the case.

For the specific case where your literals are bracketed, using TAB on the lines in question gets what you want (because whitespace doesn’t play a role).

So what I’ve been doing in such cases is quickly recording a keyboard macro like <f3> C-n TAB <f4> as in F3, Ctrl-n (or down arrow), TAB, F4, and then using F4 repeatedly to apply the macro can save a couple of keystrokes. Or you can do C-u 10 C-x e to apply it 10 times.

(I know it doesn’t sound like much, but try re-indenting 100 lines of garbage literal without missing down-arrow, and then having to go up 5 lines and repeat things ;) ).

Question 23

I use the following snippet. On TAB, when the selection is inactive, it indents the current line (as it normally does); when the selection is active, it shifts the whole region to the right.

(defun my-python-tab-command (&optional _)
  "If the region is active, shift to the right; otherwise, indent current line."
  (interactive)
  (if (not (region-active-p))
      (indent-for-tab-command)
    (let ((lo (min (region-beginning) (region-end)))
          (hi (max (region-beginning) (region-end))))
      (goto-char lo)
      (beginning-of-line)
      (set-mark (point))
      (goto-char hi)
      (end-of-line)
      (python-indent-shift-right (mark) (point)))))
(define-key python-mode-map [remap indent-for-tab-command] 'my-python-tab-command)

Question 24

Do indentation interactively.

Select the region to be indented.
C-x TAB.
Use arrows (<- and ->) to indent interactively.
Press Esc three times when you are done with the required indentation.

Copied from my post in: Indent several lines in Emacs

Question 25

I do something like this universally

;; indent whole buffer
(defun iwb ()
  "indent whole buffer"
  (interactive)
  ;;(delete-trailing-whitespace)
  (indent-region (point-min) (point-max) nil)
  (untabify (point-min) (point-max)))

Question 26

Numpy, scipy, matplotlib, and pylab are common terms among those who use Python for scientific computation.

I have just learned a bit about pylab, and I got confused. Whenever I want to import numpy, I can always do:

import numpy as np

I assume that once I do

from pylab import *

numpy will be imported as well (with the np alias). So basically the second one does more than the first one.

There are a few things I want to ask:

Is it right that pylab is just a wrapper for numpy, scipy and matplotlib?
As np is the numpy alias in pylab, what are the scipy and matplotlib aliases in pylab? (As far as I know, plt is the alias of matplotlib.pyplot, but I don’t know the alias for matplotlib itself.)

Question 27

No, pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MATLAB-like environment. matplotlib has a number of dependencies, among them numpy, which it imports under the common alias np. scipy is not a dependency of matplotlib.
If you run ipython --pylab, an automatic import puts all symbols from matplotlib.pylab into the global scope. As you wrote, numpy gets imported under the np alias. Symbols from matplotlib are available under the mpl alias.

Question 28

Scipy and numpy are scientific projects whose aim is to bring efficient and fast numeric computing to python.

Matplotlib is the name of the python plotting library.

Pyplot is an interactive API for matplotlib, mostly for use in notebooks like Jupyter. You generally use it like this: import matplotlib.pyplot as plt.

Pylab is the same thing as pyplot, but with extra features (its use is currently discouraged).

pylab = pyplot + numpy

See more information here: Matplotlib, Pylab, Pyplot, etc: What’s the difference between these and when to use each?

Question 29

Since some people (like me) may still be confused about usage of pylab since examples using pylab are out there on the internet, here is a quote from the official matplotlib FAQ:

pylab is a convenience module that bulk imports matplotlib.pyplot (for plotting) and numpy (for mathematics and working with arrays) in a single name space. Although many examples use pylab, it is no longer recommended.

So, the TL;DR is: do not use pylab, period. Use pyplot and import numpy separately as needed.

Here is the link for further reading and other useful examples.

Question 30

I’m writing an AI state space search algorithm, and I have a generic class which can be used to quickly implement a search algorithm. A subclass would define the necessary operations, and the algorithm does the rest.

Here is where I get stuck: I want to avoid regenerating the parent state over and over again, so I have the following function, which returns the operations that can be legally applied to any state:

def get_operations(self, include_parent=True):
    ops = self._get_operations()
    if not include_parent and self.path.parent_op:
        try:
            parent_inverse = self.invert_op(self.path.parent_op)
            ops.remove(parent_inverse)
        except NotImplementedError:
            pass
    return ops

And the invert_op function raises NotImplementedError by default.

Is there a faster way to check to see if the function is not defined than catching an exception?

I was thinking of something along the lines of checking for its presence in dir(), but that doesn’t seem right. hasattr is implemented by calling getattr and checking if it raises, which is not what I want.

Question 31

Yes, use getattr() to get the attribute, and callable() to verify it is a method:

invert_op = getattr(self, "invert_op", None)
if callable(invert_op):
    invert_op(self.path.parent_op)

Note that getattr() normally raises an AttributeError when the attribute doesn’t exist. However, if you specify a default value (None, in this case), it will return that instead.
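
A self-contained sketch of the getattr()/callable() pattern is shown here. The classes BaseState and InvertibleState and the helper try_invert are hypothetical stand-ins for the asker’s search classes.

```python
class BaseState:
    """Hypothetical base class that does not define invert_op at all."""


class InvertibleState(BaseState):
    def invert_op(self, op):
        # Toy inverse: negate the operation.
        return -op


def try_invert(state, op):
    """Return the inverse of op if the state supports it, otherwise None."""
    invert_op = getattr(state, "invert_op", None)  # no exception if missing
    if callable(invert_op):
        return invert_op(op)
    return None


print(try_invert(BaseState(), 3))
print(try_invert(InvertibleState(), 3))
```

This works when the base class leaves the method undefined rather than defining it to raise NotImplementedError, since a method that raises is still callable.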

Question 32

This works in both Python 2 and Python 3:

hasattr(connection, 'invert_op')

hasattr returns True if the connection object has an attribute named invert_op. Here is the documentation:

https://docs.python.org/2/library/functions.html#hasattr https://docs.python.org/3/library/functions.html#hasattr

Question 33

Is there a faster way to check to see if the function is not defined than catching an exception?

Why are you against that? In most Pythonic cases, it’s better to ask forgiveness than permission. ;-)

hasattr is implemented by calling getattr and checking if it raises, which is not what I want.

Again, why is that? The following is quite Pythonic:

    try:
        invert_op = self.invert_op
    except AttributeError:
        pass
    else:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Or,

    # if you supply the optional `default` parameter, no exception is thrown
    invert_op = getattr(self, 'invert_op', None)  
    if invert_op is not None:
        parent_inverse = invert_op(self.path.parent_op)
        ops.remove(parent_inverse)

Note, however, that getattr(obj, attr, default) is basically implemented by catching an exception, too. There is nothing wrong with that in Python land!

Question 34

The responses herein check if a string is the name of an attribute of the object. An extra step (using callable) is needed to check if the attribute is a method.

So it boils down to: what is the fastest way to check if an object obj has an attribute attrib. The answer is

'attrib' in obj.__dict__

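This is because a dict hashes its keys, so checking for a key’s existence is fast. Note that obj.__dict__ holds only instance attributes; methods defined on the class live in the class’s __dict__, so this check will not find them.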

See timing comparisons below.

>>> class SomeClass():
...         pass
...
>>> obj = SomeClass()
>>>
>>> getattr(obj, "invert_op", None)
>>>
>>> %timeit getattr(obj, "invert_op", None)
1000000 loops, best of 3: 723 ns per loop
>>> %timeit hasattr(obj, "invert_op")
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 674 ns per loop
>>> %timeit "invert_op" in obj.__dict__
The slowest run took 12.19 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 176 ns per loop

Question 35

I like Nathan Ostgard’s answer and I up-voted it. But another way you could solve your problem would be to use a memoizing decorator, which would cache the result of the function call. So you can go ahead and have an expensive function that figures something out, but then when you call it over and over the subsequent calls are fast; the memoized version of the function looks up the arguments in a dict, finds the result in the dict from when the actual function computed the result, and returns the result right away.

Here is a recipe for a memoizing decorator called “lru_cache” by Raymond Hettinger. A version of this is now standard in the functools module in Python 3.2.

http://code.activestate.com/recipes/498245-lru-and-lfu-cache-decorators/

http://docs.python.org/release/3.2/library/functools.html
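
A short sketch of the memoizing approach with functools.lru_cache follows. The function expensive and the calls counter are made up for the example; the counter only exists to show that the body runs once.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def expensive(n):
    """Stand-in for a slow computation; results are cached per argument."""
    global calls
    calls += 1
    return sum(i * i for i in range(n))


first = expensive(1000)   # computed
second = expensive(1000)  # served from the cache
print(calls)
print(expensive.cache_info())
```

The arguments must be hashable, because the cache looks them up in a dictionary.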

Question 36

Like anything in Python, if you try hard enough, you can get at the guts and do something really nasty. Now, here’s the nasty part:

def invert_op(self, op):
    raise NotImplementedError

def is_invert_op_implemented(self):
    # Only works in CPython 2.x of course
    return self.invert_op.__code__.co_code == 't\x00\x00\x82\x01\x00d\x00\x00S'

Please do us a favor, just keep doing what you have in your question and DON’T ever use this unless you are on the PyPy team hacking into the Python interpreter. What you have up there is Pythonic, what I have here is pure EVIL.

Question 37

You can also go over the class:

import inspect


def get_methods(cls_):
    methods = inspect.getmembers(cls_, inspect.isfunction)
    return dict(methods)

# Example
class A(object):
    pass

class B(object):
    def foo(self):
        print('B')


# If you only have an object, you can use `cls_ = obj.__class__`
if 'foo' in get_methods(A):
    print('A has foo')

if 'foo' in get_methods(B):
    print('B has foo')

Question 38

While checking for attributes in the instance’s __dict__ is really fast, you cannot use this for methods, since they do not appear in the instance’s __dict__. You could, however, resort to a hackish workaround in your class if performance is that critical:

class Test():
    def __init__(self):
        # redefine your method as attribute
        self.custom_method = self.custom_method

    def custom_method(self):
        pass

Then check for method as:

t = Test()
'custom_method' in t.__dict__

Time comparison with getattr:

>>%timeit 'custom_method' in t.__dict__
55.9 ns ± 0.626 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'custom_method', None)
116 ns ± 0.765 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Not that I’m encouraging this approach, but it seems to work.

[EDIT] Performance boost is even higher when method name is not in given class:

>>%timeit 'rubbish' in t.__dict__
65.5 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>%timeit getattr(t, 'rubbish', None)
385 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Question 39

I’m trying to learn Python, and I am now trying to get the hang of classes and how to manipulate them with instances.

I can’t seem to understand this practice problem:

Create and return a student object whose name, age, and major are the same as those given as input

def make_student(name, age, major)

I just don’t get what it means by object. Do they mean I should create an array inside the function that holds these values? Or create a class, put this function inside it, and assign instances? (Before this question I was asked to set up a student class with name, age, and major inside.)

class Student:
    name = "Unknown name"
    age = 0
    major = "Unknown major"

Question 40

class Student(object):
    name = ""
    age = 0
    major = ""

    # The class "constructor" - It's actually an initializer 
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

def make_student(name, age, major):
    student = Student(name, age, major)
    return student

Note that even though one of the principles in Python’s philosophy is “there should be one—and preferably only one—obvious way to do it”, there are still multiple ways to do this. You can also use the following snippet to take advantage of Python’s dynamic capabilities:

class Student(object):
    name = ""
    age = 0
    major = ""

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    # Note: I didn't need to create a variable in the class definition before doing this.
    student.gpa = float(4.0)
    return student

I prefer the former, but there are instances where the latter can be useful – one being when working with document databases like MongoDB.

Question 41

Create a class and give it an __init__ method:

class Student:
    def __init__(self, name, age, major):
        self.name = name
        self.age = age
        self.major = major

    def is_old(self):
        return self.age > 100

Now, you can initialize an instance of the Student class:

>>> s = Student('John', 88, None)
>>> s.name
    'John'
>>> s.age
    88

Although I’m not sure why you need a make_student function if it does the same thing as Student.__init__.

Question 42

Objects are instances of classes. Classes are just the blueprints for objects. So given your class definition –

# Note the added (object) - in Python 2 this is the preferred way of creating new classes
class Student(object):
    name = "Unknown name"
    age = 0
    major = "Unknown major"

You can create a make_student function by explicitly assigning the attributes to a new instance of Student –

def make_student(name, age, major):
    student = Student()
    student.name = name
    student.age = age
    student.major = major
    return student

But it probably makes more sense to do this in a constructor (__init__) –

class Student(object):
    def __init__(self, name="Unknown name", age=0, major="Unknown major"):
        self.name = name
        self.age = age
        self.major = major

The constructor is called when you use Student(). It will take the arguments defined in the __init__ method. The constructor signature would now essentially be Student(name, age, major).

If you use that, then a make_student function is trivial (and superfluous) –

def make_student(name, age, major):
    return Student(name, age, major)

For fun, here is an example of how to create a make_student function without defining a class. Please do not try this at home.

def make_student(name, age, major):
    return type('Student', (object,),
                {'name': name, 'age': age, 'major': major})()

Question 43

When you create an object from a predefined class, you typically create the object and store it in a variable.

class Student:
    def __init__(self):
        pass

# creating an object
student1 = Student()

The __init__ method acts as the constructor of the class. You can give it parameters for the attributes you want to initialize; then, when creating an object, you have to pass values for those attributes.

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# creating an object
student2 = Student("smith", 25)

Question 44

I have an array of distances called dists. I want to select dists which are between two values. I wrote the following line of code to do that:

 dists[(np.where(dists >= r)) and (np.where(dists <= r + dr))]

However this selects only for the condition

 (np.where(dists <= r + dr))

If I do the commands sequentially by using a temporary variable it works fine. Why does the above code not work, and how do I get it to work?

Cheers

Question 45

The best way in your particular case would just be to change your two criteria to one criterion:

dists[abs(dists - r - dr/2.) <= dr/2.]

It only creates one boolean array, and in my opinion is easier to read because it asks a single question: is each distance within dr/2 of the center of the interval? (Though I’d redefine r to be the center of your region of interest instead of the beginning, so r = r + dr/2.) But that doesn’t answer your question.

The answer to your question:
You don’t actually need where if you’re just trying to filter out the elements of dists that don’t fit your criteria:

dists[(dists >= r) & (dists <= r+dr)]

Because the & will give you an elementwise and (the parentheses are necessary).

Or, if you do want to use where for some reason, you can do:

 dists[(np.where((dists >= r) & (dists <= r + dr)))]

Why:
The reason it doesn’t work is that np.where returns a tuple of index arrays, not a boolean array. You’re applying and to two collections of indices, which of course don’t carry the True/False values that you expect. If a and b are both truthy, then a and b returns b. So saying something like [0,1,2] and [2,3,4] will just give you [2,3,4]. Here it is in action:

In [230]: dists = np.arange(0,10,.5)
In [231]: r = 5
In [232]: dr = 1

In [233]: np.where(dists >= r)
Out[233]: (array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),)

In [234]: np.where(dists <= r+dr)
Out[234]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

In [235]: np.where(dists >= r) and np.where(dists <= r+dr)
Out[235]: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),)

What you were expecting to compare was simply the boolean array, for example

In [236]: dists >= r
Out[236]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True], dtype=bool)

In [237]: dists <= r + dr
Out[237]: 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

In [238]: (dists >= r) & (dists <= r + dr)
Out[238]: 
array([False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False], dtype=bool)

Now you can call np.where on the combined boolean array:

In [239]: np.where((dists >= r) & (dists <= r + dr))
Out[239]: (array([10, 11, 12]),)

In [240]: dists[np.where((dists >= r) & (dists <= r + dr))]
Out[240]: array([ 5. ,  5.5,  6. ])

Or simply index the original array with the boolean array directly (boolean mask indexing):

In [241]: dists[(dists >= r) & (dists <= r + dr)]
Out[241]: array([ 5. ,  5.5,  6. ])

Question 46

The accepted answer explained the problem well enough. However, the more Numpythonic approach for applying multiple conditions is to use the numpy logical functions. In this case you can use np.logical_and:

np.where(np.logical_and(np.greater_equal(dists, r), np.less_equal(dists, r + dr)))
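A minimal sketch of combining a lower and an upper bound with np.logical_and, assuming the sample values from the IPython session in the accepted answer (dists = np.arange(0, 10, .5), r = 5, dr = 1). Note that the upper bound needs np.less_equal. The names mask and selected are introduced here for illustration.

```python
import numpy as np

dists = np.arange(0, 10, .5)
r = 5
dr = 1

# Elementwise: r <= dists AND dists <= r + dr
mask = np.logical_and(np.greater_equal(dists, r),
                      np.less_equal(dists, r + dr))

indices = np.where(mask)[0]   # positions that satisfy both conditions
selected = dists[mask]        # the matching values themselves

print(indices)
print(selected)
```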

Question 47

One interesting thing to point out here: the usual AND and OR logic works in this case too, with a small change. Instead of the keywords “and” and “or”, use the ampersand (&) and pipe (|) operators and it will work.

When we use ‘and’:

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) and (ar<6), 'yo', ar)

Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When we use Ampersand(&):

ar = np.array([3,4,5,14,2,4,3,7])
np.where((ar>3) & (ar<6), 'yo', ar)

Output:
array(['3', 'yo', 'yo', '14', '2', 'yo', '3', '7'], dtype='<U11')

The same applies when filtering a pandas DataFrame on multiple conditions. The reasoning behind this has to do with the difference between logical operators and bitwise operators; for more on that, I’d suggest going through this answer or a similar Q/A on Stack Overflow.

UPDATE

A user asked why (ar>3) and (ar<6) need to be wrapped in parentheses. Before explaining what happens here, one needs to know about operator precedence in Python.

Similar to BODMAS in arithmetic, Python defines which operations are performed first. Expressions inside parentheses are evaluated first, and only then is the bitwise operator applied. Below I show what happens in both cases, with and without the parentheses.

Case1:

np.where( ar>3 & ar<6, 'yo', ar)
np.where( np.array([3,4,5,14,2,4,3,7])>3 & np.array([3,4,5,14,2,4,3,7])<6, 'yo', ar)

Since there are no parentheses here, the bitwise operator (&) is applied first, because in the operator precedence table & ranks above the < and > operators. The expression is therefore parsed as ar > (3 & ar) < 6, a chained comparison that needs the truth value of a whole array.

In other words, & is applied before either comparison has been evaluated, and the comparisons that remain cannot be reduced to a single True or False. That’s why it gives that error.

One can check out the following link to learn more about: operator precedence
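The parse described in Case 1 can be checked directly. This is a minimal sketch reusing the ar array from this answer; the names raised and inner are introduced here for illustration only.

```python
import numpy as np

ar = np.array([3, 4, 5, 14, 2, 4, 3, 7])

# Without parentheses Python reads the condition as ar > (3 & ar) < 6,
# i.e. (ar > (3 & ar)) and ((3 & ar) < 6), which needs bool(array).
try:
    np.where(ar > 3 & ar < 6, 'yo', ar)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# This is the sub-expression that & actually binds to.
inner = 3 & ar
print(inner)
```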

Now to Case 2:

If you do use the parentheses, you can clearly see what happens.

np.where( (ar>3) & (ar<6), 'yo', ar)
np.where( (array([False,  True,  True,  True, False,  True, False,  True])) & (array([ True,  True,  True, False,  True,  True,  True, False])), 'yo', ar)

This gives two arrays of True and False values, on which the elementwise AND is easy to perform. The result is:

np.where( array([False,  True,  True, False, False,  True, False, False]),  'yo', ar)

The rest you know: for each element, np.where picks the first value (here ‘yo’) where the condition is True and the other value (here the original element) where it is False.

That’s all. I hope I explained the query well.

Question 48

I like to use np.vectorize for such tasks. Consider the following:

>>> # function which returns True when constraints are satisfied.
>>> func = lambda d: d >= r and d<= (r+dr) 
>>>
>>> # Apply constraints element-wise to the dists array.
>>> result = np.vectorize(func)(dists) 
>>>
>>> result = np.where(result) # Get output.

You can also use np.argwhere instead of np.where for clear output. But that is your call :)

Hope it helps.

Question 49

Try:

np.intersect1d(np.where(dists >= r)[0],np.where(dists <= r + dr)[0])
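For a concrete run of this approach, here is a small sketch that assumes the sample values from the accepted answer's IPython session (dists = np.arange(0, 10, .5), r = 5, dr = 1). The name idx is introduced here for illustration.

```python
import numpy as np

dists = np.arange(0, 10, .5)
r = 5
dr = 1

# np.where(...)[0] extracts the index array from the returned tuple;
# intersect1d keeps only the indices present in both arrays.
idx = np.intersect1d(np.where(dists >= r)[0],
                     np.where(dists <= r + dr)[0])

print(idx)
print(dists[idx])
```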

Question 50

This should work:

dists[((dists >= r) & (dists <= r+dr))]

The most elegant way~~

Question 51

Try:

import numpy as np
dist = np.array([1,2,3,4,5])
r = 2
dr = 3
np.where(np.logical_and(dist> r, dist<=r+dr))

Output: (array([2, 3]),)

You can see Logic functions for more details.

Question 52

I have worked out this simple example

import numpy as np

ar = np.array([3,4,5,14,2,4,3,7])

print([X for X in ar.tolist() if (X >= 3 and X <= 6)])

>>> 
[3, 4, 5, 4, 3]

Question 53

I’m writing some code that takes a filename, opens the file, and parses out some data. I’d like to do this in a class. The following code works:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None

        def parse_file():
            #do some parsing
            self.stat1 = result_from_parse1
            self.stat2 = result_from_parse2
            self.stat3 = result_from_parse3
            self.stat4 = result_from_parse4
            self.stat5 = result_from_parse5

        parse_file()

But it involves me putting all of the parsing machinery in the scope of the __init__ function for my class. That looks fine now for this simplified code, but the function parse_file has quite a few levels of indentation as well. I’d prefer to define the function parse_file() as a class function like below:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        parse_file()

    def parse_file():
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

Of course this code doesn’t work because the function parse_file() is not within the scope of the __init__ function. Is there a way to call a class function from within __init__ of that class? Or am I thinking about this the wrong way?

Question 54

Call the function in this way:

self.parse_file()

You also need to define your parse_file() function like this:

def parse_file(self):

The parse_file method has to be bound to an object upon calling it (because it’s not a static method). This is done by calling the function on an instance of the object, in your case the instance is self.
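To make the binding concrete, here is a minimal sketch. The "parsing" is a placeholder (it just stores the length of the filename), and the file name data.txt is hypothetical.

```python
class MyClass(object):
    def __init__(self, filename):
        self.filename = filename
        self.stat1 = None
        # Calling through self binds the method to this instance.
        self.parse_file()

    def parse_file(self):
        # Placeholder for real parsing; a real version would read self.filename.
        self.stat1 = len(self.filename)

obj = MyClass("data.txt")
print(obj.stat1)

# self.parse_file() is shorthand for passing the instance explicitly:
MyClass.parse_file(obj)
```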

Question 55

If I’m not wrong, both functions are part of your class, so you should use them like this:

class MyClass():
    def __init__(self, filename):
        self.filename = filename 

        self.stat1 = None
        self.stat2 = None
        self.stat3 = None
        self.stat4 = None
        self.stat5 = None
        self.parse_file()

    def parse_file(self):
        #do some parsing
        self.stat1 = result_from_parse1
        self.stat2 = result_from_parse2
        self.stat3 = result_from_parse3
        self.stat4 = result_from_parse4
        self.stat5 = result_from_parse5

replace your line:

parse_file()

with:

self.parse_file()

Question 56

How about:

class MyClass(object):
    def __init__(self, filename):
        self.filename = filename 
        self.stats = parse_file(filename)

def parse_file(filename):
    #do some parsing
    return results_from_parse

By the way, if you have variables named stat1, stat2, etc., the situation is begging for a tuple: stats = (...).

So let parse_file return a tuple, and store the tuple in self.stats.

Then, for example, you can access what used to be called stat3 with self.stats[2].
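If index-based access feels too opaque, collections.namedtuple keeps the tuple while adding names. The following is only a sketch: the field names (count, minimum, maximum), the one-number-per-line file format, and the temporary file are all assumptions made for illustration.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical field names standing in for stat1, stat2, stat3.
Stats = namedtuple("Stats", ["count", "minimum", "maximum"])

def parse_file(filename):
    # Assumed format: one number per line.
    with open(filename) as fh:
        values = [float(line) for line in fh if line.strip()]
    return Stats(len(values), min(values), max(values))

class MyClass(object):
    def __init__(self, filename):
        self.filename = filename
        self.stats = parse_file(filename)

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3\n1\n2\n")
obj = MyClass(tmp.name)
os.remove(tmp.name)

# Both index and attribute access work on the same tuple.
print(obj.stats[2], obj.stats.maximum)
```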

Question 57

In parse_file, take the self argument (just like in __init__). If there’s any other context you need then just pass it as additional arguments as usual.

Question 58

You must declare parse_file like this: def parse_file(self). The “self” parameter is implicit in most languages, but not in Python; you must add it to the definition of every method that belongs to a class. Then you can call the method from any other method of the class using self.parse_file().

Your final program is going to look like this:

class MyClass():
  def __init__(self, filename):
      self.filename = filename 

      self.stat1 = None
      self.stat2 = None
      self.stat3 = None
      self.stat4 = None
      self.stat5 = None
      self.parse_file()

  def parse_file(self):
      #do some parsing
      self.stat1 = result_from_parse1
      self.stat2 = result_from_parse2
      self.stat3 = result_from_parse3
      self.stat4 = result_from_parse4
      self.stat5 = result_from_parse5

Question 59

I think your problem is actually that the __init__ function is not indented correctly. It should look like this:

class MyClass():
     def __init__(self, filename):
          pass

     def parse_file(self):
          pass

Question 60

I have a file with some probabilities for different values e.g.:

I would like to generate random numbers using this distribution. Does an existing module that handles this exist? It’s fairly simple to code on your own (build the cumulative distribution function, generate a random value in [0,1] and pick the corresponding value) but it seems like this should be a common problem and probably someone has created a function/module for it.

I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).

Question 61

scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.

As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.

numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

If you are using Python 3.6 or above, you can use random.choices() from the standard library – see the answer by Mark Dickinson.

Question 62

Since Python 3.6, there’s a solution for this in Python’s standard library, namely random.choices.

Example usage: let’s set up a population and weights matching those in the OP’s question:

>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

Now choices(population, weights) generates a single sample, returned as a one-element list:

>>> choices(population, weights)
[4]

The optional keyword-only argument k allows one to request more than one sample at once. This is valuable because there’s some preparatory work that random.choices has to do every time it’s called, prior to generating any samples; by generating many samples at once, we only have to do that preparatory work once. Here we generate a million samples, and use collections.Counter to check that the distribution we get roughly matches the weights we gave.

>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})

Question 63

An advantage of generating the list using the CDF is that you can use binary search. While you need O(n) time and space for preprocessing, you can then get k numbers in O(k log n). Since normal Python lists are inefficient, you can use the array module.

If you insist on constant space, you can do the following; O(n) time, O(1) space.

import random

def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # Might occur because of floating point inaccuracies
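The binary-search variant mentioned at the start of this answer could be sketched as follows, using only the standard library. The helper name make_sampler is hypothetical, the (item, probability) pairs reuse the weights from the other answers, and the seed is fixed only to make the run repeatable.

```python
import bisect
import itertools
import random

def make_sampler(pairs):
    """Precompute the cumulative sums once (O(n)); each draw is O(log n)."""
    items = [item for item, _ in pairs]
    cdf = list(itertools.accumulate(prob for _, prob in pairs))

    def sample():
        # Scale by the last cumulative value in case the probabilities
        # do not sum to exactly 1 because of floating point error.
        r = random.uniform(0, cdf[-1])
        idx = bisect.bisect_left(cdf, r)  # first position with cdf[idx] >= r
        return items[min(idx, len(items) - 1)]

    return sample

random.seed(0)
pairs = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
sample = make_sampler(pairs)
draws = [sample() for _ in range(10000)]
print(draws[:10])
```

The preprocessing cost is paid once in make_sampler, so this only pays off when many samples are drawn from the same distribution.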

Question 64

Maybe it is kind of late. But you can use numpy.random.choice(), passing the p parameter:

val = numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
