Python 实用宝典

Question 1

When do I use each ?

Also…is the NLTK lemmatization dependent upon Parts of Speech? Wouldn’t it be more accurate if it was?

Question 2

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma .

From the NLTK docs:

Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.

Question 3

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

For instance:

The word “better” has “good” as its lemma. This link is missed by stemming, as it requires a dictionary look-up.

The word “walk” is the base form for word “walking”, and hence this is matched in both stemming and lemmatisation.

The word “meeting” can be either the base form of a noun or a form of a verb (“to meet”) depending on the context, e.g., “in our last meeting” or “We are meeting again tomorrow”. Unlike stemming, lemmatisation can in principle select the appropriate lemma depending on the context.

Source: https://en.wikipedia.org/wiki/Lemmatisation

Question 4

There are two aspects to show their differences:

A stemmer will return the stem of a word, which needn’t be identical to the morphological root of the word. It usually sufficient that related words map to the same stem,even if the stem is not in itself a valid root, while in lemmatisation, it will return the dictionary form of a word, which must be a valid word.
In lemmatisation, the part of speech of a word should be first determined and the normalisation rules will be different for different part of speech, while the stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.

Reference http://textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization

Question 5

The purpose of both stemming and lemmatization is to reduce morphological variation. This is in contrast to the the more general “term conflation” procedures, which may also address lexico-semantic, syntactic, or orthographic variations.

The real difference between stemming and lemmatization is threefold:

Stemming reduces word-forms to (pseudo)stems, whereas lemmatization reduces the word-forms to linguistically valid lemmas. This difference is apparent in languages with more complex morphology, but may be irrelevant for many IR applications;
Lemmatization deals only with inflectional variance, whereas stemming may also deal with derivational variance;
In terms of implementation, lemmatization is usually more sophisticated (especially for morphologically complex languages) and usually requires some sort of lexica. Satisfatory stemming, on the other hand, can be achieved with rather simple rule-based approaches.

Lemmatization may also be backed up by a part-of-speech tagger in order to disambiguate homonyms.

Question 6

As MYYN pointed out, stemming is the process of removing inflectional and sometimes derivational affixes to a base form that all of the original words are probably related to. Lemmatization is concerned with obtaining the single word that allows you to group together a bunch of inflected forms. This is harder than stemming because it requires taking the context into account (and thus the meaning of the word), while stemming ignores context.

As for when you would use one or the other, it’s a matter of how much your application depends on getting the meaning of a word in context correct. If you’re doing machine translation, you probably want lemmatization to avoid mistranslating a word. If you’re doing information retrieval over a billion documents with 99% of your queries ranging from 1-3 words, you can settle for stemming.

As for NLTK, the WordNetLemmatizer does use the part of speech, though you have to provide it (otherwise it defaults to nouns). Passing it “dove” and “v” yields “dive” while “dove” and “n” yields “dove”.

Question 7

An example-driven explanation on the differenes between lemmatization and stemming:

Lemmatization handles matching “car” to “cars” along with matching “car” to “automobile”.

Stemming handles matching “car” to “cars” .

Lemmatization implies a broader scope of fuzzy word matching that is still handled by the same subsystems. It implies certain techniques for low level processing within the engine, and may also reflect an engineering preference for terminology.

[…] Taking FAST as an example, their lemmatization engine handles not only basic word variations like singular vs. plural, but also thesaurus operators like having “hot” match “warm”.

This is not to say that other engines don’t handle synonyms, of course they do, but the low level implementation may be in a different subsystem than those that handle base stemming.

http://www.ideaeng.com/stemming-lemmatization-0601

Question 8

ianacl
but i think Stemming is a rough hack people use to get all the different forms of the same word down to a base form which need not be a legit word on its own
Something like the Porter Stemmer can uses simple regexes to eliminate common word suffixes

Lemmatization brings a word down to its actual base form which, in the case of irregular verbs, might look nothing like the input word
Something like Morpha which uses FSTs to bring nouns and verbs to their base form

Question 9

Stemming just removes or stems the last few characters of a word, often leading to incorrect meanings and spelling. Lemmatization considers the context and converts the word to its meaningful base form, which is called Lemma. Sometimes, the same word can have multiple different Lemmas. We should identify the Part of Speech (POS) tag for the word in that specific context. Here are the examples to illustrate all the differences and use cases:

If you lemmatize the word ‘Caring‘, it would return ‘Care‘. If you stem, it would return ‘Car‘ and this is erroneous.
If you lemmatize the word ‘Stripes‘ in verb context, it would return ‘Strip‘. If you lemmatize it in noun context, it would return ‘Stripe‘. If you just stem it, it would just return ‘Strip‘.
You would get same results whether you lemmatize or stem words such as walking, running, swimming… to walk, run, swim etc.
Lemmatization is computationally expensive since it involves look-up tables and what not. If you have large dataset and performance is an issue, go with Stemming. Remember you can also add your own rules to Stemming. If accuracy is paramount and dataset isn’t humongous, go with Lemmatization.

Question 10

Stemming is the process of removing the last few characters of a given word, to obtain a shorter form, even if that form doesn’t have any meaning.

Examples,

"beautiful" -> "beauti"
"corpora" -> "corpora"

Stemming can be done very quickly.

Lemmatization on the other hand, is the process of converting the given word into it’s base form according to the dictionary meaning of the word.

Examples,

"beautiful" -> "beauty"
"corpora" -> "corpus"

Lemmatization takes more time than stemming.

Question 11

In a django online course, the instructor has us use the url() function to call views and utilize regular expressions in the urlpatterns list. I’ve seen other examples on youtube of this. e.g.

from django.contrib import admin
from django.urls import include
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^polls/', include('polls.urls')),
]


#and in polls/urls.py

urlpatterns = [        
    url(r'^$', views.index, name="index"),
]

However, in going through the Django tutorial, they use path() instead e.g.:

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name="index"),        
]

Furthermore regular expressions don’t seem to work with the path() function as using a path(r'^$', views.index, name="index") won’t find the mysite.com/polls/ view.

Is using path() without regex matching the proper way going forward? Is url() more powerful but more complicated so they’re using path() to start us out with? Or is it a case of different tools for different jobs?

Question 12

From Django documentation for url

url(regex, view, kwargs=None, name=None) This function is an alias to django.urls.re_path(). It’s likely to be deprecated in a future release.

Key difference between path and re_path is that path uses route without regex

You can use re_path for complex regex calls and use just path for simpler lookups

Question 13

The new django.urls.path() function allows a simpler, more readable URL routing syntax. For example, this example from previous Django releases:

url(r'^articles/(?P<year>[0-9]{4})/$', views.year_archive)

could be written as:

path('articles/<int:year>/', views.year_archive)

The django.conf.urls.url() function from previous versions is now available as django.urls.re_path(). The old location remains for backwards compatibility, without an imminent deprecation. The old django.conf.urls.include() function is now importable from django.urls so you can use:

from django.urls import include, path, re_path

in the URLconfs. For further reading django doc

Question 14

path is simply new in Django 2.0, which was only released a couple of weeks ago. Most tutorials won’t have been updated for the new syntax.

It was certainly supposed to be a simpler way of doing things; I wouldn’t say that URL is more powerful though, you should be able to express patterns in either format.

Question 15

Regular expressions don’t seem to work with the path() function with the following arguments: path(r'^$', views.index, name="index").

It should be like this: path('', views.index, name="index").

The 1st argument must be blank to enter a regular expression.

Question 16

Path is a new feature of Django 2.0. Explained here : https://docs.djangoproject.com/en/2.0/releases/2.0/#whats-new-2-0

Look like more pythonic way, and enable to not use regular expression in argument you pass to view… you can ue int() function for exemple.

Question 17

From v2.0 many users are using path, but we can use either path or url. For example in django 2.1.1 mapping to functions through url can be done as follows

from django.contrib import admin
from django.urls import path

from django.contrib.auth import login
from posts.views import post_home
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^posts/$', post_home, name='post_home'),

]

where posts is an application & post_home is a function in views.py

Question 18

this code is get the templates/blog1/page.html in b.py:

path = os.path.join(os.path.dirname(__file__), os.path.join('templates', 'blog1/page.html'))

but i want to get the parent dir location:

aParent
   |--a
   |  |---b.py
   |      |---templates
   |              |--------blog1
   |                         |-------page.html
   |--templates
          |--------blog1
                     |-------page.html

and how to get the aParent location

thanks

updated:

this is right:

dirname=os.path.dirname
path = os.path.join(dirname(dirname(__file__)), os.path.join('templates', 'blog1/page.html'))

or

path = os.path.abspath(os.path.join(os.path.dirname(__file__),".."))

Question 19

You can apply dirname repeatedly to climb higher: dirname(dirname(file)). This can only go as far as the root package, however. If this is a problem, use os.path.abspath: dirname(dirname(abspath(file))).

Question 20

os.path.abspath doesn’t validate anything, so if we’re already appending strings to __file__ there’s no need to bother with dirname or joining or any of that. Just treat __file__ as a directory and start climbing:

# climb to __file__'s parent's parent:
os.path.abspath(__file__ + "/../../")

That’s far less convoluted than os.path.abspath(os.path.join(os.path.dirname(__file__),"..")) and about as manageable as dirname(dirname(__file__)). Climbing more than two levels starts to get ridiculous.

But, since we know how many levels to climb, we could clean this up with a simple little function:

uppath = lambda _path, n: os.sep.join(_path.split(os.sep)[:-n])

# __file__ = "/aParent/templates/blog1/page.html"
>>> uppath(__file__, 1)
'/aParent/templates/blog1'
>>> uppath(__file__, 2)
'/aParent/templates'
>>> uppath(__file__, 3)
'/aParent'

Question 21

Use relative path with the pathlib module in Python 3.4+:

from pathlib import Path

Path(__file__).parent

You can use multiple calls to parent to go further in the path:

Path(__file__).parent.parent

As an alternative to specifying parent twice, you can use:

Path(__file__).parents[1]

Question 22

os.path.dirname(os.path.abspath(__file__))

Should give you the path to a.

But if b.py is the file that is currently executed, then you can achieve the same by just doing

os.path.abspath(os.path.join('templates', 'blog1', 'page.html'))

Question 23

os.pardir is a better way for ../ and more readable.

import os
print os.path.abspath(os.path.join(given_path, os.pardir))

This will return the parent path of the given_path

Question 24

A simple way can be:

import os
current_dir =  os.path.abspath(os.path.dirname(__file__))
parent_dir = os.path.abspath(current_dir + "/../")
print parent_dir

Question 25

May be join two .. folder, to get parent of the parent folder?

path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)),"..",".."))

Question 26

Use the following to jump to previous folder:

os.chdir(os.pardir)

If you need multiple jumps a good and easy solution will be to use a simple decorator in this case.

Question 27

Here is another relatively simple solution that:

does not use dirname() (which does not work as expected on one level arguments like “file.txt” or relative parents like “..”)
does not use abspath() (avoiding any assumptions about the current working directory) but instead preserves the relative character of paths

it just uses normpath and join:

def parent(p):
    return os.path.normpath(os.path.join(p, os.path.pardir))

# Example:
for p in ['foo', 'foo/bar/baz', 'with/trailing/slash/', 
        'dir/file.txt', '../up/', '/abs/path']:
    print parent(p)

Result:

.
foo/bar
with/trailing
dir
..
/abs

Question 28

I think use this is better:

os.path.realpath(__file__).rsplit('/', X)[0]


In [1]: __file__ = "/aParent/templates/blog1/page.html"

In [2]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[3]: '/aParent'

In [4]: __file__ = "/aParent/templates/blog1/page.html"

In [5]: os.path.realpath(__file__).rsplit('/', 1)[0]
Out[6]: '/aParent/templates/blog1'

In [7]: os.path.realpath(__file__).rsplit('/', 2)[0]
Out[8]: '/aParent/templates'

In [9]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[10]: '/aParent'

Question 29

I tried:

import os
os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))), os.pardir))

Question 30

I want to check if any of the items in one list are present in another list. I can do it simply with the code below, but I suspect there might be a library function to do this. If not, is there a more pythonic method of achieving the same result.

In [78]: a = [1, 2, 3, 4, 5]

In [79]: b = [8, 7, 6]

In [80]: c = [8, 7, 6, 5]

In [81]: def lists_overlap(a, b):
   ....:     for i in a:
   ....:         if i in b:
   ....:             return True
   ....:     return False
   ....: 

In [82]: lists_overlap(a, b)
Out[82]: False

In [83]: lists_overlap(a, c)
Out[83]: True

In [84]: def lists_overlap2(a, b):
   ....:     return len(set(a).intersection(set(b))) > 0
   ....:

Question 31

Short answer: use not set(a).isdisjoint(b), it’s generally the fastest.

There are four common ways to test if two lists a and b share any items. The first option is to convert both to sets and check their intersection, as such:

bool(set(a) & set(b))

Because sets are stored using a hash table in Python, searching them is O(1) (see here for more information about complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b. But 1) it must first create sets out of the lists, which can take a non-negligible amount of time, and 2) it supposes that hashing collisions are sparse among your data.

The second way to do it is using a generator expression performing iteration on the lists, such as:

any(i in a for i in b)

This allows to search in-place, so no new memory is allocated for intermediary variables. It also bails out on the first find. But the in operator is always O(n) on lists (see here).

Another proposed option is an hybridto iterate through one of the list, convert the other one in a set and test for membership on this set, like so:

a = set(a); any(i in a for i in b)

A fourth approach is to take advantage of the isdisjoint() method of the (frozen)sets (see here), for example:

not set(a).isdisjoint(b)

If the elements you search are near the beginning of an array (e.g. it is sorted), the generator expression is favored, as the sets intersection method have to allocate new memory for the intermediary variables:

from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974

Here’s a graph of the execution time for this example in function of list size:

Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the isdisjoint() method is better for very small list sizes, whereas the generator expression is better for larger list sizes.

On the other hand, as the search begins with the beginning for the hybrid and generator expression, if the shared element are systematically at the end of the array (or both lists does not share any values), the disjoint and set intersection approaches are then way faster than the generator expression and the hybrid approach.

>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
0.08102107048034668

It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well when when no elements are shared, and is the best case for the disjoint and set intersection approaches.

Here are two analysis using random numbers (instead of rigging the setup to favor one technique or another):

High chance of sharing: elements are randomly taken from [1, 2*len(a)]. Low chance of sharing: elements are randomly taken from [1, 1000*len(a)].

Up to now, this analysis supposed both lists are of the same size. In case of two lists of different sizes, for example a is much smaller, isdisjoint() is always faster:

Make sure that the a list is the smaller, otherwise the performance decreases. In this experiment, the a list size was set constant to 5.

In summary:

If the lists are very small (< 10 elements), not set(a).isdisjoint(b) is always the fastest.
If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression any(i in a for i in b) is the fastest on large list sizes;
Test the set intersection with not set(a).isdisjoint(b), which is always faster than bool(set(a) & set(b)).
The hybrid “iterate through list, test on set” a = set(a); any(i in a for i in b) is generally slower than other methods.
The generator expression and the hybrid are much slower than the two other approaches when it comes to lists without sharing elements.

In most cases, using the isdisjoint() method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.

Question 32

def lists_overlap3(a, b):
    return bool(set(a) & set(b))

Note: the above assumes that you want a boolean as the answer. If all you need is an expression to use in an if statement, just use if set(a) & set(b):

Question 33

def lists_overlap(a, b):
  sb = set(b)
  return any(el in sb for el in a)

This is asymptotically optimal (worst case O(n + m)), and might be better than the intersection approach due to any‘s short-circuiting.

E.g.:

lists_overlap([3,4,5], [1,2,3])

will return True as soon as it gets to 3 in sb

EDIT: Another variation (with thanks to Dave Kirby):

def lists_overlap(a, b):
  sb = set(b)
  return any(itertools.imap(sb.__contains__, a))

This relies on imap‘s iterator, which is implemented in C, rather than a generator comprehension. It also uses sb.__contains__ as the mapping function. I don’t know how much performance difference this makes. It will still short-circuit.

Question 34

You could also use any with list comprehension:

any([item in a for item in b])

Question 35

In python 2.6 or later you can do:

return not frozenset(a).isdisjoint(frozenset(b))

Question 36

You can use the any built in function /w a generator expression:

def list_overlap(a,b): 
     return any(i for i in a if i in b)

As John and Lie have pointed out this gives incorrect results when for every i shared by the two lists bool(i) == False. It should be:

return any(i in b for i in a)

Question 37

This question is pretty old, but I noticed that while people were arguing sets vs. lists, that no one thought of using them together. Following Soravux’s example,

Worst case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
100.91506409645081
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
19.746716022491455
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
0.092626094818115234

And the best case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=list(range(10000))", number=100000)
154.69790101051331
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
0.082653045654296875
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000)
0.08434605598449707

So even faster than iterating through two lists is iterating though a list to see if it’s in a set, which makes sense since checking if a number is in a set takes constant time while checking by iterating through a list takes time proportional to the length of the list.

Thus, my conclusion is that iterate through a list, and check if it’s in a set.

Question 38

if you don’t care what the overlapping element might be, you can simply check the len of the combined list vs. the lists combined as a set. If there are overlapping elements, the set will be shorter:

len(set(a+b+c))==len(a+b+c) returns True, if there is no overlap.

Question 39

I’ll throw another one in with a functional programming style:

any(map(lambda x: x in a, b))

Explanation:

map(lambda x: x in a, b)

returns a list of booleans where elements of b are found in a. That list is then passed to any, which simply returns True if any elements are True.

Question 40

If I do

conda info pandas

I can see all of the packages available.

I updated my pandas to the latest this morning, but I need to revert to a prior version now. I tried

conda update pandas 0.13.1

but that didn’t work. How do I specify which version to use?

Question 41

I had to use the install function instead:

conda install pandas=0.13.1

Question 42

For the case that you wish to revert a recently installed package that made several changes to dependencies (such as tensorflow), you can “roll back” to an earlier installation state via the following method:

conda list --revisions
conda install --revision [revision number]

The first command shows previous installation revisions (with dependencies) and the second reverts to whichever revision number you specify.

Note that if you wish to (re)install a later revision, you may have to sequentially reinstall all intermediate versions. If you had been at revision 23, reinstalled revision 20 and wish to return, you may have to run each:

conda install --revision 21
conda install --revision 22
conda install --revision 23

Question 43

I have been learning Python by following some pygame tutorials.

Therein I found extensive use of the keyword self, and coming from a primarily Java background, I find that I keep forgetting to type self. For example, instead of self.rect.centerx I would type rect.centerx, because, to me, rect is already a member variable of the class.

The Java parallel I can think of for this situation is having to prefix all references to member variables with this.

Am I stuck prefixing all member variables with self, or is there a way to declare them that would allow me to avoid having to do so?

Even if what I am suggesting isn’t pythonic, I’d still like to know if it is possible.

I have taken a look at these related SO questions, but they don’t quite answer what I am after:

Question 44

Python requires specifying self. The result is there’s never any confusion over what’s a member and what’s not, even without the full class definition visible. This leads to useful properties, such as: you can’t add members which accidentally shadow non-members and thereby break code.

One extreme example: you can write a class without any knowledge of what base classes it might have, and always know whether you are accessing a member or not:

class A(some_function()):
  def f(self):
    self.member = 42
    self.method()

That’s the complete code! (some_function returns the type used as a base.)

Another, where the methods of a class are dynamically composed:

class B(object):
  pass

print B()
# <__main__.B object at 0xb7e4082c>

def B_init(self):
  self.answer = 42
def B_str(self):
  return "<The answer is %s.>" % self.answer
# notice these functions require no knowledge of the actual class
# how hard are they to read and realize that "members" are used?

B.__init__ = B_init
B.__str__ = B_str

print B()
# <The answer is 42.>

Remember, both of these examples are extreme and you won’t see them every day, nor am I suggesting you should often write code like this, but they do clearly show aspects of self being explicitly required.

Question 45

Previous answers are all basically variants of “you can’t” or “you shouldn’t”. While I agree with the latter sentiment, the question is technically still unanswered.

Furthermore, there are legitimate reasons why someone might want to do something along the lines of what the actual question is asking. One thing I run into sometimes is lengthy math equations where using long names makes the equation unrecognizable. Here are a couple ways of how you could do this in a canned example:

import numpy as np
class MyFunkyGaussian() :
    def __init__(self, A, x0, w, s, y0) :
        self.A = float(A)
        self.x0 = x0
        self.w = w
        self.y0 = y0
        self.s = s

    # The correct way, but subjectively less readable to some (like me) 
    def calc1(self, x) :
        return (self.A/(self.w*np.sqrt(np.pi))/(1+self.s*self.w**2/2)
                * np.exp( -(x-self.x0)**2/self.w**2)
                * (1+self.s*(x-self.x0)**2) + self.y0 )

    # The correct way if you really don't want to use 'self' in the calculations
    def calc2(self, x) :
        # Explicity copy variables
        A, x0, w, y0, s = self.A, self.x0, self.w, self.y0, self.s
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

    # Probably a bad idea...
    def calc3(self, x) :
        # Automatically copy every class vairable
        for k in self.__dict__ : exec(k+'= self.'+k)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

g = MyFunkyGaussian(2.0, 1.5, 3.0, 5.0, 0.0)
print(g.calc1(0.5))
print(g.calc2(0.5))
print(g.calc3(0.5))

The third example – i.e. using for k in self.__dict__ : exec(k+'= self.'+k) is basically what the question is actually asking for, but let me be clear that I don’t think it is generally a good idea.

For more info, and ways to iterate through class variables, or even functions, see answers and discussion to this question. For a discussion of other ways to dynamically name variables, and why this is usually not a good idea see this blog post.

UPDATE: There appears to be no way to dynamically update or change locals in a function in Python3, so calc3 and similar variants are no longer possible. The only python3 compatible solution I can think of now is to use globals:

def calc4(self, x) :
        # Automatically copy every class variable in globals
        globals().update(self.__dict__)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

Which, again, would be a terrible practice in general.

Question 46

Actually self is not a keyword, it’s just the name conventionally given to the first parameter of instance methods in Python. And that first parameter can’t be skipped, as it’s the only mechanism a method has of knowing which instance of your class it’s being called on.

Question 47

You can use whatever name you want, for example

class test(object):
    def function(this, variable):
        this.variable = variable

or even

class test(object):
    def function(s, variable):
        s.variable = variable

but you are stuck with using a name for the scope.

I do not recommend you use something different to self unless you have a convincing reason, as it would make it alien for experienced pythonistas.

Question 48

yes, you must always specify self, because explicit is better than implicit, according to python philosophy.

You will also find out that the way you program in python is very different from the way you program in java, hence the use of self tends to decrease because you don’t project everything inside the object. Rather, you make larger use of module-level function, which can be better tested.

by the way. I hated it at first, now I hate the opposite. same for indented-driven flow control.

Question 49

The “self” is the conventional placeholder of the current object instance of a class. Its used when you want to refer to the object’s property or field or method inside a class as if you’re referring to “itself”. But to make it shorter someone in the Python programming realm started to use “self” , other realms use “this” but they make it as a keyword which cannot be replaced. I rather used “its” to increase the code readability. Its one of the good things in Python – you have a freedom to choose your own placeholder for the object’s instance other than “self”. Example for self:

class UserAccount():    
    def __init__(self, user_type, username, password):
        self.user_type = user_type
        self.username = username            
        self.password = encrypt(password)        

    def get_password(self):
        return decrypt(self.password)

    def set_password(self, password):
        self.password = encrypt(password)

Now we replace ‘self’ with ‘its’:

class UserAccount():    
    def __init__(its, user_type, username, password):
        its.user_type = user_type
        its.username = username            
        its.password = encrypt(password)        

    def get_password(its):
        return decrypt(its.password)

    def set_password(its, password):
        its.password = encrypt(password)

which is more readable now?

Question 50

self is part of the python syntax to access members of objects, so I’m afraid you’re stuck with it

Question 51

Actually you can use recipe “Implicit self” from Armin Ronacher presentation “5 years of bad ideas” ( google it).

It’s a very clever recipe, as almost everything from Armin Ronacher, but I don’t think this idea is very appealing. I think I’d prefer explicit this in C#/Java.

Update. Link to “bad idea recipe”: https://speakerdeck.com/mitsuhiko/5-years-of-bad-ideas?slide=58

Question 52

Yeah, self is tedious. But, is it better?

class Test:

    def __init__(_):
        _.test = 'test'

    def run(_):
        print _.test

Question 53

From: Self Hell – More stateful functions.

…a hybrid approach works best. All of your class methods that actually do computation should be moved into closures, and extensions to clean up syntax should be kept in classes. Stuff the closures into classes, treating the class much like a namespace. The closures are essentially static functions, and so do not require selfs*, even in the class…

Question 54

I think that it would be easier and more readable if there was a statement “member” just as there is “global” so you can tell the interpreter which are the objects members of the class.

Question 55

I searched in this official document to find difference between the json.dump() and json.dumps() in python. It is clear that they are related with file write option.
But what is the detailed difference between them and in what situations one has more advantage than other?

Question 56

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.

json.dump()

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object

If ensure_ascii is False, some chunks written to fp may be unicode instances

json.dumps()

Serialize obj to a JSON formatted str

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance

Question 57

The functions with an s take string parameters. The others take file streams.

Question 58

In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) — without ‘s’, new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.

Question 59

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8):

dumps on the other hand, with ensure_ascii=False can produce a str or unicode just depending on what types you used for strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into file without checking which format was returned and possibly playing with unicode.encode.

This of course is not valid concern in Python 3 any more, since there is no more this 8-bit/Unicode confusion.

As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline limited JSON documents from a single file.

Question 60

I’m trying to access a dict_key’s element by its index:

test = {'foo': 'bar', 'hello': 'world'}
keys = test.keys()  # dict_keys object

keys.index(0)
AttributeError: 'dict_keys' object has no attribute 'index'

I want to get foo.

same with:

keys[0]
TypeError: 'dict_keys' object does not support indexing

How can I do this?

Question 61

Call list() on the dictionary instead:

keys = list(test)

In Python 3, the dict.keys() method returns a dictionary view object, which acts as a set. Iterating over the dictionary directly also yields keys, so turning a dictionary into a list results in a list of all the keys:

>>> test = {'foo': 'bar', 'hello': 'world'}
>>> list(test)
['foo', 'hello']
>>> list(test)[0]
'foo'

Question 62

Not a full answer but perhaps a useful hint. If it is really the first item you want*, then

next(iter(q))

is much faster than

list(q)[0]

for large dicts, since the whole thing doesn’t have to be stored in memory.

For 10.000.000 items I found it to be almost 40.000 times faster.

*The first item in case of a dict being just a pseudo-random item before Python 3.6 (after that it’s ordered in the standard implementation, although it’s not advised to rely on it).

Question 63

I wanted “key” & “value” pair of a first dictionary item. I used the following code.

 key, val = next(iter(my_dict.items()))

Question 64

test = {'foo': 'bar', 'hello': 'world'}
ls = []
for key in test.keys():
    ls.append(key)
print(ls[0])

Conventional way of appending the keys to a statically defined list and then indexing it for same

问题：词形化与词干的区别是什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：对于Django 2.0，在urls.py中使用path（）或url（）更好吗？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何获取父目录位置

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：测试列表是否共享python中的任何项目

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：如何恢复到Anaconda中的先前软件包？

回答 0

回答 1

问题：如何避免在Python中显式的“自我”？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：python中的json.dump（）和json.dumps（）有什么区别？

回答 0

回答 1

回答 2

回答 3

问题：在Python3中按索引访问dict_keys元素

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：上个月的python日期

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：计算大熊猫数量的最有效方法是什么？