Tag Archives: Python

How does the class_weight parameter in scikit-learn work?

Question: How does the class_weight parameter in scikit-learn work?

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn’s Logistic Regression operates.

The Situation

I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in a ratio of about 19:1 with the majority of samples having negative outcome.

First Attempt: Manually Preparing Training Data

I split the data I had into disjoint sets for training and testing (about 80/20). Then I randomly sampled the training data by hand to get training data in different proportions than 19:1; from 2:1 -> 16:1.

I then trained logistic regression on these different training data subsets and plotted recall (= TP/(TP+FN)) as a function of the different training proportions. Of course, the recall was computed on the disjoint TEST samples which had the observed proportions of 19:1. Note, although I trained the different models on different training data, I computed recall for all of them on the same (disjoint) test data.

The results were as expected: the recall was about 60% at 2:1 training proportions and fell off rather fast by the time it got to 16:1. There were several proportions 2:1 -> 6:1 where the recall was decently above 5%.

Second Attempt: Grid Search

Next, I wanted to test different regularization parameters and so I used GridSearchCV and made a grid of several values of the C parameter as well as the class_weight parameter. To translate my n:m proportions of negative:positive training samples into the dictionary language of class_weight, I thought that I would just specify several dictionaries as follows:

{ 0:0.67, 1:0.33 } #expected 2:1
{ 0:0.75, 1:0.25 } #expected 3:1
{ 0:0.8, 1:0.2 }   #expected 4:1

and I also included None and auto.

This time the results were totally whacked. All my recalls came out tiny (< 0.05) for every value of class_weight except auto. So I can only assume that my understanding of how to set the class_weight dictionary is wrong. Interestingly, the recall for the class_weight value of ‘auto’ in the grid search was around 59% for all values of C, so I guessed it balances to 1:1?

My Questions

  1. How do you properly use class_weight to achieve different balances in training data from what you actually give it? Specifically, what dictionary do I pass to class_weight to use n:m proportions of negative:positive training samples?

  2. If you pass various class_weight dictionaries to GridSearchCV, during cross-validation will it rebalance the training fold data according to the dictionary but use the true given sample proportions for computing my scoring function on the test fold? This is critical since any metric is only useful to me if it comes from data in the observed proportions.

  3. What does the auto value of class_weight do as far as proportions? I read the documentation and I assume “balances the data inversely proportional to their frequency” just means it makes it 1:1. Is this correct? If not, can someone clarify?


Answer 0

First off, it might not be good to just go by recall alone. You can simply achieve a recall of 100% by classifying everything as the positive class. I usually suggest using AUC for selecting parameters, and then finding a threshold for the operating point (say a given precision level) that you are interested in.

For how class_weight works: It penalizes mistakes in samples of class[i] with class_weight[i] instead of 1. So higher class-weight means you want to put more emphasis on a class. From what you say it seems class 0 is 19 times more frequent than class 1. So you should increase the class_weight of class 1 relative to class 0, say {0:.1, 1:.9}. If the class_weight doesn’t sum to 1, it will basically change the regularization parameter.

For how class_weight="auto" works, you can have a look at this discussion. In the dev version you can use class_weight="balanced", which is easier to understand: it basically means replicating the smaller class until you have as many samples as in the larger one, but in an implicit way.


Answer 1

The first answer is good for understanding how it works. But I wanted to understand how I should be using it in practice.

SUMMARY

  • for moderately imbalanced data WITHOUT noise, there is not much of a difference in applying class weights
  • for moderately imbalanced data WITH noise, and for strongly imbalanced data, it is better to apply class weights
  • the class_weight="balanced" param works decently well if you don't want to optimize the weights manually
  • with class_weight="balanced" you capture more true events (higher TRUE recall), but you are also more likely to get false alerts (lower TRUE precision)
    • as a result, the total % TRUE might be higher than actual because of all the false positives
    • AUC might misguide you here if the false alarms are an issue
  • there is no need to change the decision threshold to the imbalance %; even for strong imbalance it is ok to keep 0.5 (or somewhere around that, depending on what you need)

NB

The result might differ when using RF or GBM. sklearn does not have class_weight="balanced" for GBM but lightgbm has LGBMClassifier(is_unbalance=False).

CODE

# scikit-learn==0.21.3
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report
import numpy as np
import pandas as pd

# case: moderate imbalance
X, y = datasets.make_classification(n_samples=50*15, n_features=5, n_informative=2, n_redundant=0, random_state=1, weights=[0.8]) #,flip_y=0.1,class_sep=0.5)
np.mean(y) # 0.2

LogisticRegression(C=1e9).fit(X,y).predict(X).mean() # 0.184
(LogisticRegression(C=1e9).fit(X,y).predict_proba(X)[:,1]>0.5).mean() # 0.184 => same as first
LogisticRegression(C=1e9,class_weight={0:0.5,1:0.5}).fit(X,y).predict(X).mean() # 0.184 => same as first
LogisticRegression(C=1e9,class_weight={0:2,1:8}).fit(X,y).predict(X).mean() # 0.296 => seems to make things worse?
LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X).mean() # 0.292 => seems to make things worse?

roc_auc_score(y,LogisticRegression(C=1e9).fit(X,y).predict(X)) # 0.83
roc_auc_score(y,LogisticRegression(C=1e9,class_weight={0:2,1:8}).fit(X,y).predict(X)) # 0.86 => about the same
roc_auc_score(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)) # 0.86 => about the same

# case: strong imbalance
X, y = datasets.make_classification(n_samples=50*15, n_features=5, n_informative=2, n_redundant=0, random_state=1, weights=[0.95])
np.mean(y) # 0.06

LogisticRegression(C=1e9).fit(X,y).predict(X).mean() # 0.02
(LogisticRegression(C=1e9).fit(X,y).predict_proba(X)[:,1]>0.5).mean() # 0.02 => same as first
LogisticRegression(C=1e9,class_weight={0:0.5,1:0.5}).fit(X,y).predict(X).mean() # 0.02 => same as first
LogisticRegression(C=1e9,class_weight={0:1,1:20}).fit(X,y).predict(X).mean() # 0.25 => huh??
LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X).mean() # 0.22 => huh??
(LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict_proba(X)[:,1]>0.5).mean() # same as last

roc_auc_score(y,LogisticRegression(C=1e9).fit(X,y).predict(X)) # 0.64
roc_auc_score(y,LogisticRegression(C=1e9,class_weight={0:1,1:20}).fit(X,y).predict(X)) # 0.84 => much better
roc_auc_score(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)) # 0.85 => similar to manual
roc_auc_score(y,(LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict_proba(X)[:,1]>0.5).astype(int)) # same as last

print(classification_report(y,LogisticRegression(C=1e9).fit(X,y).predict(X)))
pd.crosstab(y,LogisticRegression(C=1e9).fit(X,y).predict(X),margins=True)
pd.crosstab(y,LogisticRegression(C=1e9).fit(X,y).predict(X),margins=True,normalize='index') # few predicted TRUE with only 28% TRUE recall and 86% TRUE precision so 6%*28%~=2%

print(classification_report(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)))
pd.crosstab(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X),margins=True)
pd.crosstab(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X),margins=True,normalize='index') # 88% TRUE recall but also lot of false positives with only 23% TRUE precision, making total predicted % TRUE > actual % TRUE

How do I print Unicode characters in Python?

Question: How do I print Unicode characters in Python?

I want to make a dictionary where English words point to Russian and French translations.

How do I print out unicode characters in Python? Also, how do you store unicode chars in a variable?


Answer 0

To include Unicode characters in your Python source code, you can use Unicode escape characters in the form \u0123 in your string. In Python 2.x, you also need to prefix the string literal with ‘u’.

Here’s an example running in the Python 2.x interactive console:

>>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'
Россия

In Python 2, prefixing a string literal with ‘u’ declares it as a Unicode string, as described in the Python Unicode documentation.

In Python 3, the ‘u’ prefix is now optional:

>>> print('\u0420\u043e\u0441\u0441\u0438\u044f')
Россия

If running the above commands doesn’t display the text correctly for you, perhaps your terminal isn’t capable of displaying Unicode characters.

These examples use Unicode escapes (\u...), which allows you to print Unicode characters while keeping your source code as plain ASCII. This can help when working with the same source code on different systems. You can also use Unicode characters directly in your Python source code (e.g. print u'Россия' in Python 2), if you are confident all your systems handle Unicode files properly.

For information about reading Unicode data from a file, see this answer:

Character reading from file in Python
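
As a quick aside, here is a minimal sketch of writing and reading Unicode text with an explicit encoding via io.open, which behaves the same on Python 2 and 3 (the file name is made up for illustration):

# Minimal sketch: writing and reading Unicode text with an explicit encoding.
# io.open works on both Python 2 and 3 (on Python 3 it is simply open).
import io

with io.open('russia.txt', 'w', encoding='utf-8') as f:   # hypothetical file name
    f.write(u'\u0420\u043e\u0441\u0441\u0438\u044f\n')

with io.open('russia.txt', 'r', encoding='utf-8') as f:
    print(f.read())   # Россия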


Answer 1

Print a unicode character in Python:

Print a unicode character directly from python interpreter:

el@apollo:~$ python
Python 2.7.3
>>> print u'\u2713'
✓

Unicode character u'\u2713' is a checkmark. The interpreter prints the checkmark on the screen.

Print a unicode character from a python script:

Put this in test.py:

#!/usr/bin/python
print("here is your checkmark: " + u'\u2713');

Run it like this:

el@apollo:~$ python test.py
here is your checkmark: ✓

If it doesn’t show a checkmark for you, then the problem could be elsewhere, like the terminal settings or something you are doing with stream redirection.

Store unicode characters in a file:

Save this to file: foo.py:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys 
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
print(u'e with obfuscation: é')

Run it and pipe output to file:

python foo.py > tmp.txt

Open tmp.txt and look inside, you see this:

el@apollo:~$ cat tmp.txt 
e with obfuscation: é

Thus you have saved a unicode e with an obfuscation mark on it to a file.


Answer 2

If you’re trying to print() Unicode, and getting ascii codec errors, check out this page, the TLDR of which is do export PYTHONIOENCODING=UTF-8 before firing up python (this variable controls what sequence of bytes the console tries to encode your string data as). Internally, Python3 uses UTF-8 by default (see the Unicode HOWTO) so that’s not the problem; you can just put Unicode in strings, as seen in the other answers and comments. It’s when you try and get this data out to your console that the problem happens. Python thinks your console can only handle ascii. Some of the other answers say, “Write it to a file, first” but note they specify the encoding (UTF-8) for doing so (so, Python doesn’t change anything in writing), and then use a method for reading the file that just spits out the bytes without any regard for encoding, which is why that works.
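
As a quick way to see what this is talking about, a small sketch (the exact codec name depends on your terminal):

# Sketch: check which encoding Python will use for console output.
# If this prints an ASCII-only codec (e.g. 'ANSI_X3.4-1968'), printing non-ASCII
# text will fail; `export PYTHONIOENCODING=UTF-8` before starting Python fixes it.
import sys

print(sys.stdout.encoding)
print(u'\u0420\u043e\u0441\u0441\u0438\u044f')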


Answer 3

In Python 2, you declare unicode strings with a u, as in u"猫" and use decode() and encode() to translate to and from unicode, respectively.

It’s quite a bit easier in Python 3. A very good overview can be found here. That presentation clarified a lot of things for me.
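
For example, a minimal Python 2 sketch of converting back and forth (using an escape so no source-encoding declaration is needed):

# Python 2 sketch: converting between a unicode string and UTF-8 bytes.
s = u"\u732b"                       # the character 猫 as a unicode string
utf8_bytes = s.encode("utf-8")      # unicode -> byte string (str in Python 2)
back = utf8_bytes.decode("utf-8")   # byte string -> unicode
print(back == s)                    # True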


Answer 4

Considering that this is the first stack overflow result when google searching this topic, it bears mentioning that prefixing u to unicode strings is optional in Python 3. (Python 2 example was copied from the top answer)

Python 3 (both work):

print('\u0420\u043e\u0441\u0441\u0438\u044f')
print(u'\u0420\u043e\u0441\u0441\u0438\u044f')

Python 2:

print u'\u0420\u043e\u0441\u0441\u0438\u044f'

Answer 5

I use Portable WinPython on Windows; it includes the IPython QT console, and I could achieve the following.

>>>print ("結婚")
結婚

>>>print ("おはよう")
おはよう

>>>str = "結婚"


>>>print (str)
結婚

Your console interpreter should support unicode in order to show unicode characters.


Answer 6

Just one more thing that hasn’t been added yet.

In Python 2, if you want to print a variable that has unicode and use .format(), then do this (make the base string that is being formatted a unicode string with u''):

>>> text = u"Université de Montréal"
>>> print(u"This is unicode: {}".format(text))
This is unicode: Université de Montréal

Answer 7

This fixes UTF-8 printing in python:

import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

Answer 8

Replace the ‘+’ with ‘000’. For example, ‘U+1F600’ becomes ‘U0001F600’; then prepend the Unicode code with “\” and print it. Example:

>>> print("Learning : ", "\U0001F40D")
Learning :  🐍
>>> 

Check this, maybe it will help: python unicode emoji


python print end=' '

Question: python print end=' '

I have this python script where I need to run gdal_retile.py, but I get an exception on this line:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

The end=' ' is invalid syntax. I am curious as to why, and what the author probably meant to do.

I’m new to python if you haven’t already guessed.


I think the root cause of the problem is that these imports are failing, and therefore one must add this import: from __future__ import print_function

try: 
   from osgeo import gdal
   from osgeo import ogr
   from osgeo import osr
   from osgeo.gdalconst import *
except:
   import gdal
   import ogr
   import osr
   from gdalconst import *

Answer 0

Are you sure you are using Python 3.x? The syntax isn’t available in Python 2.x because print is still a statement.

print("foo" % bar, end=" ")

in Python 2.x is identical to

print ("foo" % bar, end=" ")

or

print "foo" % bar, end=" "

i.e. as a call to print with a tuple as argument.

That’s obviously bad syntax (literals don’t take keyword arguments). In Python 3.x print is an actual function, so it takes keyword arguments, too.

The correct idiom in Python 2.x for end=" " is:

print "foo" % bar,

(note the final comma, this makes it end the line with a space rather than a linebreak)

If you want more control over the output, consider using sys.stdout directly. This won’t do any special magic with the output.

Of course, in somewhat recent versions of Python 2.x (2.6 and later), you can use the __future__ module to enable it in your script file:

from __future__ import print_function

The same goes with unicode_literals and some other nice things (with_statement, for example). This won’t work in really old versions (i.e. created before the feature was introduced) of Python 2.x, though.
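
For completeness, here is a small sketch of the sys.stdout route mentioned above, which behaves identically on Python 2 and 3:

# Sketch: writing without a trailing newline; identical on Python 2 and 3.
import sys

sys.stdout.write("Building index for %d tile(s) ... " % 3)
sys.stdout.write("done\n")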


Answer 1

How about this:

#Only for use in Python 2.6.0a2 and later
from __future__ import print_function

This allows you to use the Python 3.0 style print function without having to hand-edit all occurrences of print :)


Answer 2

In python 2.7 here is how you do it

mantra = 'Always look on the bright side of life'
for c in mantra: print c,

#output
A l w a y s   l o o k   o n   t h e   b r i g h t   s i d e   o f   l i f e

In python 3.x

myjob= 'hacker'
for c in myjob: print (c, end=' ')
#output 
h a c k e r 

Answer 3

First of all, you’re missing a quote at the beginning but this is probably a copy/paste error.

In Python 3.x, the end=' ' part will place a space after the displayed string instead of a newline. To do the same thing in Python 2.x, you’d put a comma at the end:

print "Building internam Index for %d tile(s) ..." % len(inputTiles),

Answer 4

I think he’s using Python 3.0 and you’re using Python 2.6.


Answer 5

This is just a version thing. Since Python 3.x, print is actually a function, so it now takes keyword arguments like any normal function.

The end=' ' just says that you want a space after the printed text instead of a newline character. In Python 2.x you would have to do this by placing a comma at the end of the print statement.

For example, when in a Python 3.x environment:

i = 0
while i<5:
    print(i)
    i=i+1

Will give the following output:

0
1
2
3
4

Whereas:

i = 0
while i<5:
    print(i, end = ' ')
    i=i+1

Will give as output:

0 1 2 3 4

Answer 6

It looks like you’re just missing an opening double-quote. Try:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

Answer 7

I think the author probably meant:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

He’s missing an initial quote after print(.

Note that as of Python 3.0, print is a function as opposed to a statement. If you’re using older versions of Python, the equivalent would be:

print "Building internam Index for %d tile(s) ..." % len(inputTiles)

The end parameter means that the line gets ' ' at the end rather than a newline character. The equivalent in earlier versions of Python is:

print "Building internam Index for %d tile(s) ..." % len(inputTiles),

(thanks Ignacio).


Answer 8

USE :: python3 filename.py

I had such an error; it occurred because I have two versions of python installed on my drive, namely python2.7 and python3. The following was my code:

#!usr/bin/python

f = open('lines.txt')
for line in f.readlines():
        print(line,end ='')

When I ran it with the command python lines.py, I got the following error:

#!usr/bin/python

f = open('lines.txt')
for line in f.readlines():
        print(line,end ='')

When I ran it with the command python3 lines.py, it executed successfully.


Answer 9

For python 2.7 I had the same issue. Just use “from __future__ import print_function” (without the quotes) to resolve it. This ensures that Python 2.6 and later Python 2.x versions can use the Python 3.x print function.


Answer 10

Try this one if you are working with python 2.7:

from __future__ import print_function

Answer 11

Even I was getting that same error today, and I’ve experienced an interesting thing. If you’re using python 3.x and still getting the error, this might be the reason:

You have multiple python versions installed on the same drive, and when you press the F5 button, a python shell window (of version < 3.x) pops up.

I was getting the same error today and noticed that. Trust me, when I executed my code from the proper shell window (of version 3.x), I got satisfactory results.


Answer 12

We need to add an import before using end='', as it is not available in Python 2’s normal runtime:

from __future__ import print_function

It should work perfectly now.


Answer 13

Compatible with both Python 2 & 3:

import sys
sys.stdout.write('mytext')

Compatible with only Python 2

print 'mytext',

Compatible with only Python 3

print('mytext', end='')

while(1) vs. while(True): why is there a difference (in python 2 bytecode)?

Question: while(1) vs. while(True): why is there a difference (in python 2 bytecode)?

Intrigued by this question about infinite loops in perl: while (1) Vs. for (;;) Is there a speed difference?, I decided to run a similar comparison in python. I expected that the compiler would generate the same byte code for while(True): pass and while(1): pass, but this is actually not the case in python2.7.

The following script:

import dis

def while_one():
    while 1:
        pass

def while_true():
    while True:
        pass

print("while 1")
print("----------------------------")
dis.dis(while_one)

print("while True")
print("----------------------------")
dis.dis(while_true)

produces the following results:

while 1
----------------------------
  4           0 SETUP_LOOP               3 (to 6)

  5     >>    3 JUMP_ABSOLUTE            3
        >>    6 LOAD_CONST               0 (None)
              9 RETURN_VALUE        
while True
----------------------------
  8           0 SETUP_LOOP              12 (to 15)
        >>    3 LOAD_GLOBAL              0 (True)
              6 JUMP_IF_FALSE            4 (to 13)
              9 POP_TOP             

  9          10 JUMP_ABSOLUTE            3
        >>   13 POP_TOP             
             14 POP_BLOCK           
        >>   15 LOAD_CONST               0 (None)
             18 RETURN_VALUE        

Using while True is noticeably more complicated. Why is this?

In other contexts, python acts as though True equals 1:

>>> True == 1
True

>>> True + True
2

Why does while distinguish the two?

I noticed that python3 does evaluate the statements using identical operations:

while 1
----------------------------
  4           0 SETUP_LOOP               3 (to 6) 

  5     >>    3 JUMP_ABSOLUTE            3 
        >>    6 LOAD_CONST               0 (None) 
              9 RETURN_VALUE         
while True
----------------------------
  8           0 SETUP_LOOP               3 (to 6) 

  9     >>    3 JUMP_ABSOLUTE            3 
        >>    6 LOAD_CONST               0 (None) 
              9 RETURN_VALUE         

Is there a change in python3 to the way booleans are evaluated?


Answer 0

In Python 2.x, True is not a keyword, but just a built-in global constant that is defined to 1 in the bool type. Therefore the interpreter still has to load the contents of True. In other words, True is reassignable:

Python 2.7 (r27:82508, Jul  3 2010, 21:12:11) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> True = 4
>>> True
4

In Python 3.x it truly becomes a keyword and a real constant:

Python 3.1.2 (r312:79147, Jul 19 2010, 21:03:37) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> True = 4
  File "<stdin>", line 1
SyntaxError: assignment to keyword

thus the interpreter can replace the while True: loop with an infinite loop.


Answer 1

This isn’t quite right,

thus the interpreter can replace the while True: loop with an infinite loop.

as one can still break out of the loop. But it is true that such a loop’s else clause would never be accessed in Python 3. And it is also true that simplifying the value lookup makes it run just as quickly as while 1 in Python 2.

Performance Comparison

Demonstrating the difference in time for a somewhat nontrivial while loop:

Setup

def while1():
    x = 0
    while 1:
        x += 1
        if x == 10:
            break

def whileTrue():
    x = 0
    while True:
        x += 1
        if x == 10:
            break

Python 2

>>> import timeit
>>> min(timeit.repeat(while1))
0.49712109565734863
>>> min(timeit.repeat(whileTrue))
0.756627082824707

Python 3

>>> import timeit
>>> min(timeit.repeat(while1))
0.6462970309949014
>>> min(timeit.repeat(whileTrue))
0.6450748789939098

Explanation

To explain the difference, in Python 2:

>>> import keyword
>>> 'True' in keyword.kwlist
False

but in Python 3:

>>> import keyword
>>> 'True' in keyword.kwlist
True
>>> True = 'true?'
  File "<stdin>", line 1
SyntaxError: can't assign to keyword

Since True is a keyword in Python 3, the interpreter doesn’t have to look up the value to see if someone replaced it with some other value. But since one can assign True to another value, the interpreter has to look it up every time.

Conclusion for Python 2

If you have a tight, long-running loop in Python 2, you probably should use while 1: instead of while True:.

Conclusion for Python 3

Use while True: if you have no condition for breaking out of your loop.


Answer 2

This is a 7-year-old question that already has a great answer, but a misconception in the question, which isn’t addressed in any of the answers, makes it potentially confusing for some of the other questions marked as duplicates.

In other contexts, python acts as though True equals 1:

>>> True == 1
True

>>> True + True
2

Why does while distinguish the two?

In fact, while isn’t doing anything different here at all. It distinguishes 1 and True in exactly the same way that the + example does.


Here’s 2.7:

>>> dis.dis('True == 1')
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 RETURN_VALUE

>>> dis.dis('True + True')
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_GLOBAL              0 (True)
              6 BINARY_ADD
              9 RETURN_VALUE

Now compare:

>>> dis.dis('1 + 1')
  1           0 LOAD_CONST               1 (2)
              3 RETURN_VALUE

It’s emitting a LOAD_GLOBAL (True) for each True, and there’s nothing the optimizer can do with a global. So, while distinguishes 1 and True for the exact same reason that + does. (And == doesn’t distinguish them because the optimizer doesn’t optimize out comparisons.)


Now compare 3.6:

>>> dis.dis('True == 1')
  1           0 LOAD_CONST               0 (True)
              2 LOAD_CONST               1 (1)
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE

>>> dis.dis('True + True')
  1           0 LOAD_CONST               1 (2)
              2 RETURN_VALUE

Here, it’s emitting a LOAD_CONST (True) for the keyword, which the optimizer can take advantage of. So, True + 1 doesn’t distinguish, for exactly the same reason while True doesn’t. (And == still doesn’t distinguish them because the optimizer doesn’t optimize out comparisons.)


Meanwhile, if the code isn’t optimized out, the interpreter ends up treating True and 1 exactly the same in all three of these cases. bool is a subclass of int, and inherits most of its methods from int, and True has an internal integer value of 1. So, whether you’re doing a while test (__bool__ in 3.x, __nonzero__ in 2.x), a comparison (__eq__), or arithmetic (__add__), you’re calling the same method whether you use True or 1.
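
A quick illustration of that last point (the same on Python 2 and 3):

# bool is just a subclass of int, and True carries the integer value 1.
print(issubclass(bool, int))   # True
print(isinstance(True, int))   # True
print(True == 1, True + 1)     # True 2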


Create a Django model or update it if it exists

Question: Create a Django model or update it if it exists

I want to create a model object, like Person, if the person’s id doesn’t exist, or else I will get that person object.

The code to create a new person is as follows:

from django.db import models

class PersonManager(models.Manager):
    def create_person(self, identifier):
        person = self.create(identifier=identifier)
        return person

class Person(models.Model):
    identifier = models.CharField(max_length=10)
    name = models.CharField(max_length=20)
    objects = PersonManager()

But I don’t know where to check and get the existing person object.


Answer 0

If you’re looking for the “update if exists, else create” use case, please refer to @Zags’ excellent answer.


Django already has a get_or_create, https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create

For you it could be :

id = 'some identifier'
person, created = Person.objects.get_or_create(identifier=id)

if created:
    # means you have created a new person
    pass
else:
    # person just refers to the existing one
    pass

Answer 1

It’s unclear whether your question is asking for the get_or_create method (available from at least Django 1.3) or the update_or_create method (new in Django 1.7). It depends on how you want to update the user object.

Sample use is as follows:

# In both cases, the call will get a person object with matching
# identifier or create one if none exists; if a person is created,
# it will be created with name equal to the value in `name`.

# In this case, if the Person already exists, its existing name is preserved
person, created = Person.objects.get_or_create(
        identifier=identifier, defaults={"name": name}
)

# In this case, if the Person already exists, its name is updated
person, created = Person.objects.update_or_create(
        identifier=identifier, defaults={"name": name}
)

Answer 2

Django has support for this, check get_or_create

person, created = Person.objects.get_or_create(name='abc')
if created:
    # A new person object was created
    pass
else:
    # person object already exists
    pass

Answer 3

For only a small number of objects update_or_create works well, but if you’re doing this over a large collection it won’t scale well. update_or_create always first runs a SELECT and thereafter an UPDATE.

for the_bar in bars:
    updated_rows = SomeModel.objects.filter(bar=the_bar).update(foo=100)
    if not updated_rows:
        # if not exists, create new
        SomeModel.objects.create(bar=the_bar, foo=100)

This will at best only run the first update query, and only run another INSERT query if it matched zero rows, which will greatly improve your performance if you expect most of the rows to actually exist.

It all comes down to your use case though. If you are expecting mostly inserts then perhaps the bulk_create() command could be an option.
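
For the insert-heavy case, here is a minimal sketch of bulk_create (the model and field names follow the snippet above):

# Sketch: create many rows in a single query instead of one INSERT per object.
SomeModel.objects.bulk_create(
    [SomeModel(bar=the_bar, foo=100) for the_bar in bars]
)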


Answer 4

Thought I’d add an answer since your question title looks like it is asking how to create or update, rather than get or create as described in the question body.

If you did want to create or update an object, the .save() method already has this behaviour by default, from the docs:

Django abstracts the need to use INSERT or UPDATE SQL statements. Specifically, when you call save(), Django follows this algorithm:

If the object’s primary key attribute is set to a value that evaluates to True (i.e., a value other than None or the empty string), Django executes an UPDATE. If the object’s primary key attribute is not set or if the UPDATE didn’t update anything, Django executes an INSERT.

It’s worth noting that when they say ‘if the UPDATE didn’t update anything’ they are essentially referring to the case where the id you gave the object doesn’t already exist in the database.
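
As a small sketch of what that means in practice (using the Person model from the question; field values are illustrative):

# Sketch: save() picks INSERT or UPDATE based on the primary key.
p = Person(identifier='abc123', name='Alice')   # illustrative values
p.save()            # pk not set yet -> Django performs an INSERT

p.name = 'Alicia'
p.save()            # pk is now set -> Django performs an UPDATE on that row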


Answer 5

If one of the inputs when you create it is a primary key, this will be enough:

Person.objects.get_or_create(id=1)

It will automatically update if it exists, since two rows with the same primary key are not allowed.


Write a dictionary to a txt file and read it back?

Question: Write a dictionary to a txt file and read it back?

I am trying to write a dictionary to a txt file. Then read the dict values by typing the keys with raw_input. I feel like I am just missing one step but I have been looking for a while now.

I get this error

File "name.py", line 24, in reading
    print whip[name]
TypeError: string indices must be integers, not str

My code:

#!/usr/bin/env python
from sys import exit

class Person(object):
    def __init__(self):
        self.name = ""
        self.address = ""
        self.phone = ""
        self.age = ""
        self.whip = {}

    def writing(self):
        self.whip[p.name] = p.age, p.address, p.phone
        target = open('deed.txt', 'a')
        target.write(str(self.whip))
        print self.whip

    def reading(self):
        self.whip = open('deed.txt', 'r').read()
        name = raw_input("> ")
        if name in self.whip:
            print self.whip[name]

p = Person()

while True:
    print "Type:\n\t*read to read data base\n\t*write to write to data base\n\t*exit to exit"
    action = raw_input("\n> ")
    if "write" in action:
        p.name = raw_input("Name?\n> ")
        p.phone = raw_input("Phone Number?\n> ")
        p.age = raw_input("Age?\n> ")
        p.address = raw_input("Address?\n>")
        p.writing()
    elif "read" in action:
        p.reading()
    elif "exit" in action:
        exit(0)

Answer 0

Your code is almost right! You are right, you are just missing one step. When you read in the file, you are reading it as a string; but you want to turn the string back into a dictionary.

The error message you saw was because self.whip was a string, not a dictionary.

I first wrote that you could just feed the string into dict() but that doesn’t work! You need to do something else.

Example

Here is the simplest way: feed the string into eval(). Like so:

def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = eval(s)

You can do it in one line, but I think it looks messy this way:

def reading(self):
    self.whip = eval(open('deed.txt', 'r').read())

But eval() is sometimes not recommended. The problem is that eval() will evaluate any string, and if someone tricked you into running a really tricky string, something bad might happen. In this case, you are just running eval() on your own file, so it should be okay.

But because eval() is useful, someone made an alternative to it that is safer. This is called literal_eval and you get it from a Python module called ast.

import ast

def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = ast.literal_eval(s)

ast.literal_eval() will only evaluate strings that turn into the basic Python types, so there is no way that a tricky string can do something bad on your computer.

EDIT

Actually, best practice in Python is to use a with statement to make sure the file gets properly closed. Rewriting the above to use a with statement:

import ast

def reading(self):
    with open('deed.txt', 'r') as f:
        s = f.read()
        self.whip = ast.literal_eval(s)

In the most popular Python, known as “CPython”, you usually don’t need the with statement as the built-in “garbage collection” features will figure out that you are done with the file and will close it for you. But other Python implementations, like “Jython” (Python for the Java VM) or “PyPy” (a really cool experimental system with just-in-time code optimization) might not figure out to close the file for you. It’s good to get in the habit of using with, and I think it makes the code pretty easy to understand.


Answer 1

Have you tried the json module? The JSON format is very similar to a python dictionary, and it’s human readable/writable:

>>> import json
>>> d = {"one":1, "two":2}
>>> json.dump(d, open("text.txt",'w'))

This code dumps to a text file

$ cat text.txt 
{"two": 2, "one": 1}

Also you can load from a JSON file:

>>> d2 = json.load(open("text.txt"))
>>> print d2
{u'two': 2, u'one': 1}

Answer 2

To store Python objects in files, use the pickle module:

import pickle

a = {
  'a': 1,
  'b': 2
}

with open('file.txt', 'wb') as handle:
  pickle.dump(a, handle)

with open('file.txt', 'rb') as handle:
  b = pickle.loads(handle.read())

print a == b # True

Notice that I never set b = a, but instead pickled a to a file and then unpickled it into b.

As for your error:

self.whip = open('deed.txt', 'r').read()

self.whip was a dictionary object. deed.txt contains text, so when you load the contents of deed.txt into self.whip, self.whip becomes the string representation of itself.

You’d probably want to evaluate the string back into a Python object:

self.whip = eval(open('deed.txt', 'r').read())

Notice how eval sounds like evil. That’s intentional. Use the pickle module instead.


Answer 3

I created my own functions which work really nicely:

def writeDict(dict, filename, sep):
    with open(filename, "a") as f:
        for i in dict.keys():            
            f.write(i + " " + sep.join([str(x) for x in dict[i]]) + "\n")

It will store the keyname first, followed by all values. Note that in this case my dict contains integers so that’s why it converts to int. This is most likely the part you need to change for your situation.

def readDict(filename, sep):
    with open(filename, "r") as f:
        dict = {}
        for line in f:
            values = line.split(sep)
            dict[values[0]] = {int(x) for x in values[1:len(values)]}
        return(dict)
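
A small usage sketch under the answer's assumption that the values are collections of integers. Note that writeDict separates the key from its values with a space, so the round trip only works cleanly when sep is also a space, and readDict hands the values back as sets:

pets = {"cats": [1, 2, 3], "dogs": [4, 5]}

writeDict(pets, "pets.txt", " ")
restored = readDict("pets.txt", " ")
# restored == {"cats": {1, 2, 3}, "dogs": {4, 5}}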

回答 4

嗨,有一种方法可以读写字典到文件,您可以将字典转换为JSON格式并快速读写,只需执行以下操作:

写入数据：

 import json

 your_dictionary = {"some_date" : "date"}
 f = open('destFile.txt', 'w+')
 f.write(json.dumps(your_dictionary))

读取您的数据:

 import json

 f = open('destFile.txt', 'r')
 your_dictionary = json.loads(f.read())

Hi, there is a way to write and read a dictionary to a file: turn your dictionary into JSON format and you can read and write it quickly. Just do this:

To write your data:

 import json

 your_dictionary = {"some_date" : "date"}
 f = open('destFile.txt', 'w+')
 f.write(json.dumps(your_dictionary))

and to read your data:

 import json

 f = open('destFile.txt', 'r')
 your_dictionary = json.loads(f.read())

回答 5

您可以遍历键值对并将其写入文件

pair = {'name': name,'location': location}
with open('F:\\twitter.json', 'a') as f:
     f.writelines('{}:{}'.format(k,v) for k, v in pair.items())
     f.write('\n')

You can iterate through the key-value pairs and write them to a file:

pair = {'name': name,'location': location}
with open('F:\\twitter.json', 'a') as f:
     f.writelines('{}:{}'.format(k,v) for k, v in pair.items())
     f.write('\n')

函数式编程中的“ pythonic”等同于“ fold”函数是什么?

问题:函数式编程中的“ pythonic”等同于“ fold”函数是什么?

在Haskell中,实现以下目标的最惯用的方法是:

foldl (+) 0 [1,2,3,4,5]
--> 15

或等效的Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

显然,Python提供了reduce与fold完全相同的功能,但实际上是如上所述的fold的实现,但是,有人告诉我,“ pythonic”编程方式是避免使用lambda术语和高阶函数,并尽可能使用列表理解。因此,有没有一种首选的方式来折叠列表或不是Python reduce函数的类似列表的结构,或者是reduce惯用的方式来实现此目的?

What is the most idiomatic way to achieve something like the following, in Haskell:

foldl (+) 0 [1,2,3,4,5]
--> 15

Or its equivalent in Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

Obviously, Python provides the reduce function, which is an implementation of fold, exactly as above, however, I was told that the ‘pythonic’ way of programming was to avoid lambda terms and higher-order functions, preferring list-comprehensions where possible. Therefore, is there a preferred way of folding a list, or list-like structure in Python that isn’t the reduce function, or is reduce the idiomatic way of achieving this?


回答 0

用Python方式对数组求和的方法是使用sum。为了其他目的,有时可以使用reduce(来自functools模块)和operator模块的某种组合,例如:

def product(xs):
    return reduce(operator.mul, xs, 1)

请注意,reduce实际上这是一个foldl用Haskell表示的。没有执行折叠的特殊语法,没有内置函数foldr,实际上reduce与非关联运算符一起使用被认为是不良样式。

使用高阶函数是相当Python的;它很好地利用了Python的原理,即一切都是对象,包括函数和类。没错,lambda被某些Pythonista所反对,但这主要是因为它们变得复杂时往往不太可读。

The Pythonic way of summing an array is using sum. For other purposes, you can sometimes use some combination of reduce (from the functools module) and the operator module, e.g.:

def product(xs):
    return reduce(operator.mul, xs, 1)

Be aware that reduce is actually a foldl, in Haskell terms. There is no special syntax to perform folds, there’s no builtin foldr, and actually using reduce with non-associative operators is considered bad style.

Using higher-order functions is quite pythonic; it makes good use of Python’s principle that everything is an object, including functions and classes. You are right that lambdas are frowned upon by some Pythonistas, but mostly because they tend not to be very readable when they get complex.
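
For completeness, a runnable version of that sketch with the imports spelled out (on Python 3, reduce lives in functools):

import operator
from functools import reduce

def product(xs):
    return reduce(operator.mul, xs, 1)

print(sum([1, 2, 3, 4, 5]))      # 15 -- the Pythonic fold for addition
print(product([1, 2, 3, 4, 5]))  # 120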


回答 1

哈斯克尔

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

显然,这是一个说明问题的简单例子。在Python中,您只需要这样做sum([1,2,3,4,5]),甚至Haskell纯粹主义者通常也会更喜欢sum [1,2,3,4,5]

对于没有明显便利函数的非平凡场景,惯用的pythonic方法是显式写出for循环并使用可变变量分配,而不是使用reduceor fold

那根本不是功能样式,而是“ pythonic”方式。Python并非为功能纯正者设计。了解Python如何支持流控制的异常,以了解非功能性惯用python的情况。

Haskell

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

Obviously, that is a trivial example to illustrate a point. In Python you would just do sum([1,2,3,4,5]) and even Haskell purists would generally prefer sum [1,2,3,4,5].

For non-trivial scenarios when there is no obvious convenience function, the idiomatic pythonic approach is to explicitly write out the for loop and use mutable variable assignment instead of using reduce or a fold.

That is not at all the functional style, but that is the “pythonic” way. Python is not designed for functional purists. See how Python favors exceptions for flow control to see how non-functional idiomatic python is.
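
A small sketch of that "just write the loop" style for a fold with no obvious convenience function, here the longest word in a list (the data is made up for illustration):

words = ["fold", "reduce", "comprehension"]

longest = ""
for w in words:
    if len(w) > len(longest):
        longest = w

print(longest)  # comprehension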


回答 2

在Python 3中,reduce已被删除:版本说明。不过,您可以使用functools模块

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

另一方面,文档表达了对for-loop而不是的偏好reduce,因此:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result

In Python 3, reduce has been removed from the built-ins (see the Release notes). Nevertheless, you can still get it from the functools module:

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

On the other hand, the documentation expresses preference towards for-loop instead of reduce, hence:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result
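
A quick usage check, which gives the same result for either definition above:

print(product([1, 2, 3, 4, 5]))  # 120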

回答 3

您也可以重新发明轮子:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)

You can reinvent the wheel as well:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)
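
Because that fold recurses once per element, it will hit Python's recursion limit on long lists (roughly 1000 frames by default). A minimal iterative rewrite with the same signature, as a sketch:

def fold_iter(f, l, a):
    """Iterative left fold: f is the function, l the list, a the accumulator."""
    for x in l:
        a = f(a, x)
    return a

print(fold_iter(lambda x, y: x + y, [1, 2, 3, 4, 5], 0))  # 15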

回答 4

并不是真正回答这个问题,而是折叠和折叠的一线:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

Not really an answer to the question, but here are one-liners for foldl and foldr:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

回答 5

开始Python 3.8并引入赋值表达式(PEP 572):=运算符),这使您可以为表达式的结果命名,我们可以使用列表推导来复制其他语言称为fold / foldleft / reduce的操作:

给定一个列表,一个约简函数和一个累加器:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

我们可以折叠itemsf在为了获得所得accumulation

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

或以压缩形式形成:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

请注意,这实际上也是“ scanleft”操作,因为列表理解的结果表示每个步骤的累加状态:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), which make it possible to name the result of an expression, we can use a list comprehension to replicate what other languages call fold/foldleft/reduce operations:

Given a list, a reducing function and an accumulator:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

we can fold items with f in order to obtain the resulting accumulation:

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

or in a condensed formed:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

Note that this is actually also a “scanleft” operation as the result of the list comprehension represents the state of the accumulation at each step:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120
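
Worth noting alongside this: the standard library already ships the "scan" as itertools.accumulate, which yields the running accumulation and, given a function argument, covers the same example:

from itertools import accumulate
import operator

scanned = list(accumulate([1, 2, 3, 4, 5], operator.mul))
# scanned == [1, 2, 6, 24, 120]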

回答 6

对这个(减少)问题的实际答案是:只需使用一个循环!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

这将比reduce更快,而PyPy之类的东西可以优化循环。

顺便说一句,总和的情况应该用sum函数来解决

The actual answer to this (reduce) problem is: Just use a loop!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

This will be faster than a reduce and things like PyPy can optimize loops like that.

BTW, the sum case should be solved with the sum function


回答 7

我相信这个问题的一些回答者错过了该fold功能作为抽象工具的广泛含义。是的,sum可以对整数列表执行相同的操作,但这是不重要的情况。fold更通用。当您具有一系列形状各异的数据结构并想要清晰地表达聚合时,此功能很有用。因此,不必for每次都用一个聚合变量建立一个循环并手动重新计算它,fold函数(或reduce似乎对应的Python版本)允许程序员通过简单地提供以下内容来更清楚地表达聚合的意图:两件事情:

  • 聚合的默认起始值​​或“种子”值。
  • 该函数采用聚合的当前值(以“种子”开头)和列表中的下一个元素,并返回下一个聚合值。

I believe some of the respondents of this question have missed the broader implication of the fold function as an abstract tool. Yes, sum can do the same thing for a list of integers, but this is a trivial case. fold is more generic. It is useful when you have a sequence of data structures of varying shape and want to cleanly express an aggregation. So instead of having to build up a for loop with an aggregate variable and manually recompute it each time, a fold function (or the Python version, which reduce appears to correspond to) allows the programmer to express the intent of the aggregation much more plainly by simply providing two things:

  • A default starting or “seed” value for the aggregation.
  • A function that takes the current value of the aggregation (starting with the “seed”) and the next element in the list, and returns the next aggregation value.
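
A minimal sketch of that idea: the same reduce call aggregates differently shaped records just by swapping the seed and the combining function (the record structure here is invented for illustration):

from functools import reduce

orders = [
    {"item": "book", "qty": 2, "price": 15.0},
    {"item": "pen", "qty": 10, "price": 1.5},
]

# Seed 0.0; the combiner adds each order's line total to the running aggregate.
total = reduce(lambda acc, o: acc + o["qty"] * o["price"], orders, 0.0)
print(total)  # 45.0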

回答 8

我参加聚会可能已经很晚了,但是我们可以foldr使用简单的lambda演算和curried函数创建自定义项。这是我在python中实现的foldr。

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

即使实现是递归的(可能很慢),它也会分别打印值15120

I may be quite late to the party, but we can create a custom foldr using simple lambda calculus and curried functions. Here is my implementation of foldr in Python.

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

Even though the implementation is recursive (might be slow), it will print the values 15 and 120 respectively


如何为容器对象实现__iter __(self)(Python)

问题:如何为容器对象实现__iter __(self)(Python)

我已经编写了一个自定义容器对象。

根据此页面,我需要在我的对象上实现此方法:

__iter__(self)

但是,在跟踪Python参考手册中指向Iterator Types的链接时,没有给出如何实现自己的示例。

有人可以发布一个片段(或链接到资源)来显示如何执行此操作吗?

我正在写的容器是一个映射(即通过唯一键存储值)。dict可以这样迭代:

for k, v in mydict.items()

在这种情况下,我需要能够在迭代器中返回两个元素(一个元组?)。尚不清楚如何实现这样的迭代器(尽管已经提供了好几个答案)。有人可以进一步说明如何为类似地图的容器对象实现迭代器吗?(即充当字典的自定义类)?

I have written a custom container object.

According to this page, I need to implement this method on my object:

__iter__(self)

However, upon following up the link to Iterator Types in the Python reference manual, there are no examples given of how to implement your own.

Can someone post a snippet (or link to a resource), that shows how to do this?

The container I am writing, is a map (i.e. stores values by unique keys). dicts can be iterated like this:

for k, v in mydict.items()

In this case I need to be able to return two elements (a tuple?) in the iterator. It is still not clear how to implement such an iterator (despite the several answers that have been kindly provided). Could someone please shed some more light on how to implement an iterator for a map-like container object? (i.e. a custom class that acts like a dict)?


回答 0

我通常会使用一个生成器函数。每次使用yield语句时,都会在序列中添加一个项目。

下面的代码将创建一个迭代器,该迭代器生成五个,然后生成some_list中的每个项目。

def __iter__(self):
   yield 5
   yield from some_list

3.3 yield from之前的版本不存在,因此您必须执行以下操作:

def __iter__(self):
   yield 5
   for x in some_list:
      yield x

I normally would use a generator function. Each time you use a yield statement, it will add an item to the sequence.

The following will create an iterator that yields five, and then every item in some_list.

def __iter__(self):
   yield 5
   yield from some_list

Pre-3.3, yield from didn’t exist, so you would have to do:

def __iter__(self):
   yield 5
   for x in some_list:
      yield x
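
A minimal self-contained sketch of a container built around that generator-style __iter__ (the class and attribute names are made up):

class Bag:
    def __init__(self, *items):
        self._items = list(items)

    def __iter__(self):
        # A generator function: each yield produces the next item.
        yield 5
        yield from self._items

print(list(Bag(1, 2, 3)))  # [5, 1, 2, 3]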

回答 1

另一种选择是从 collections 模块中合适的抽象基类继承，相关文档见这里。

如果容器是其自己的迭代器,则可以从继承 collections.Iterator。您只需要实现该next方法即可。

一个例子是:

>>> from collections import Iterator
>>> class MyContainer(Iterator):
...     def __init__(self, *data):
...         self.data = list(data)
...     def next(self):
...         if not self.data:
...             raise StopIteration
...         return self.data.pop()
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
4.0
3
two
1

在查看collections模块时,请考虑从继承SequenceMapping或者如果更合适,则从另一个抽象基类继承。这是一个Sequence子类的示例:

>>> from collections import Sequence
>>> class MyContainer(Sequence):
...     def __init__(self, *data):
...         self.data = list(data)
...     def __getitem__(self, index):
...         return self.data[index]
...     def __len__(self):
...         return len(self.data)
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
1
two
3
4.0

注意:感谢Glenn Maynard提请我注意需要澄清一方面迭代器与另一方面是可迭代容器而不是迭代器的容器之间的区别。

Another option is to inherit from the appropriate abstract base class from the collections module, as documented here.

In case the container is its own iterator, you can inherit from collections.Iterator. You only need to implement the next method then.

An example is:

>>> from collections import Iterator
>>> class MyContainer(Iterator):
...     def __init__(self, *data):
...         self.data = list(data)
...     def next(self):
...         if not self.data:
...             raise StopIteration
...         return self.data.pop()
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
4.0
3
two
1

While you are looking at the collections module, consider inheriting from Sequence, Mapping or another abstract base class if that is more appropriate. Here is an example for a Sequence subclass:

>>> from collections import Sequence
>>> class MyContainer(Sequence):
...     def __init__(self, *data):
...         self.data = list(data)
...     def __getitem__(self, index):
...         return self.data[index]
...     def __len__(self):
...         return len(self.data)
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
1
two
3
4.0

NB: Thanks to Glenn Maynard for drawing my attention to the need to clarify the difference between iterators on the one hand and containers that are iterables rather than iterators on the other.
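
On Python 3 the same approach still works, but the ABCs live in collections.abc and the iterator protocol method is spelled __next__; a minimal sketch:

from collections.abc import Iterator

class MyContainer(Iterator):
    def __init__(self, *data):
        self.data = list(data)

    def __next__(self):
        if not self.data:
            raise StopIteration
        return self.data.pop()

print(list(MyContainer(1, "two", 3, 4.0)))  # [4.0, 3, 'two', 1]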


回答 2

__iter__()如果已经定义了next()方法(生成器对象),通常只是返回self:

这是生成器的虚拟示例:

class Test(object):

    def __init__(self, data):
       self.data = data

    def next(self):
        if not self.data:
           raise StopIteration
        return self.data.pop()

    def __iter__(self):
        return self

__iter__()也可以这样使用:http : //mail.python.org/pipermail/tutor/2006-January/044455.html

Usually __iter__() just returns self if you have already defined the next() method (i.e. when the object is its own iterator):

Here is a dummy example of such an iterator:

class Test(object):

    def __init__(self, data):
       self.data = data

    def next(self):
        if not self.data:
           raise StopIteration
        return self.data.pop()

    def __iter__(self):
        return self

but __iter__() can also be used like this: http://mail.python.org/pipermail/tutor/2006-January/044455.html


回答 3

如果您的对象包含一组要绑定对象迭代器的数据,则可以作弊并执行以下操作:

>>> class foo:
    def __init__(self, *params):
           self.data = params
    def __iter__(self):
        if hasattr(self.data[0], "__iter__"):
            return self.data[0].__iter__()
        return self.data.__iter__()
>>> d=foo(6,7,3,8, "ads", 6)
>>> for i in d:
    print i
6
7
3
8
ads
6

If your object contains a set of data you want to bind your object’s iter to, you can cheat and do this:

>>> class foo:
    def __init__(self, *params):
           self.data = params
    def __iter__(self):
        if hasattr(self.data[0], "__iter__"):
            return self.data[0].__iter__()
        return self.data.__iter__()
>>> d=foo(6,7,3,8, "ads", 6)
>>> for i in d:
    print i
6
7
3
8
ads
6
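
The simpler form of this delegation (dropping the data[0] special case above) is usually spelled with the built-in iter(); a sketch assuming the data sits in self.data:

class Foo:
    def __init__(self, *params):
        self.data = params

    def __iter__(self):
        return iter(self.data)

print(list(Foo(6, 7, 3, 8, "ads", 6)))  # [6, 7, 3, 8, 'ads', 6]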

回答 4

在“可迭代的接口”的python由两个方法__next__()__iter__()。该__next__函数是最重要的,因为它定义了迭代器的行为-也就是说,该函数确定下一步应返回什么值。该__iter__()方法用于重置迭代的起点。通常,您会发现,__iter__()__init__()用于设置起点时, 它只能返回自身。

请参阅以下代码,以定义实现Reitable接口的类反向,并定义任何序列类中任何实例的迭代器。该__next__()方法从序列的末尾开始,并以相反的顺序返回值。请注意,来自实现“序列接口”的类的实例必须定义__len__()__getitem__()方法。

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, seq):
        self.data = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> next(rev)   # note no need to call iter()
'm'
>>> nums = Reverse(range(1,10))
>>> next(nums)
9

The “iterable interface” in python consists of two methods __next__() and __iter__(). The __next__ function is the most important, as it defines the iterator behavior – that is, the function determines what value should be returned next. The __iter__() method is used to reset the starting point of the iteration. Often, you will find that __iter__() can just return self when __init__() is used to set the starting point.

See the following code for defining a Class Reverse which implements the “iterable interface” and defines an iterator over any instance from any sequence class. The __next__() method starts at the end of the sequence and returns values in reverse order of the sequence. Note that instances from a class implementing the “sequence interface” must define a __len__() and a __getitem__() method.

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, seq):
        self.data = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> next(rev)   # note no need to call iter()
'm'
>>> nums = Reverse(range(1,10))
>>> next(nums)
9

回答 5

要回答有关映射的问题：您提供的 __iter__ 应该迭代映射的键。以下是一个简单的示例，它创建了一个映射 x -> x * x，并在扩展 ABC Mapping 的 Python3 上工作。

import collections.abc

class MyMap(collections.abc.Mapping):
    def __init__(self, n):
        self.n = n

    def __getitem__(self, key): # given a key, return its value
        if 0 <= key < self.n:
            return key * key
        else:
            raise KeyError('Invalid key')

    def __iter__(self): # iterate over all keys
        for x in range(self.n):
            yield x

    def __len__(self):
        return self.n

m = MyMap(5)
for k, v in m.items():
    print(k, '->', v)
# 0 -> 0
# 1 -> 1
# 2 -> 4
# 3 -> 9
# 4 -> 16

To answer the question about mappings: your provided __iter__ should iterate over the keys of the mapping. The following is a simple example that creates a mapping x -> x * x and works on Python3 extending the ABC mapping.

import collections.abc

class MyMap(collections.abc.Mapping):
    def __init__(self, n):
        self.n = n

    def __getitem__(self, key): # given a key, return its value
        if 0 <= key < self.n:
            return key * key
        else:
            raise KeyError('Invalid key')

    def __iter__(self): # iterate over all keys
        for x in range(self.n):
            yield x

    def __len__(self):
        return self.n

m = MyMap(5)
for k, v in m.items():
    print(k, '->', v)
# 0 -> 0
# 1 -> 1
# 2 -> 4
# 3 -> 9
# 4 -> 16

回答 6

如果您不想像dict别人建议的那样继承,这是对如何实现__iter__自定义字典的粗略示例的问题的直接答案:

class Attribute:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Node(collections.Mapping):
    def __init__(self):
        self.type  = ""
        self.attrs = [] # List of Attributes

    def __iter__(self):
        for attr in self.attrs:
            yield attr.key

它使用了一个生成器,在这里对此进行了详细描述。

由于我们继承自Mapping,因此您还需要实现__getitem____len__

    def __getitem__(self, key):
        for attr in self.attrs:
            if key == attr.key:
                return attr.value
        raise KeyError

    def __len__(self):
        return len(self.attrs)

In case you don’t want to inherit from dict as others have suggested, here is a direct answer to the question of how to implement __iter__, using a crude example of a custom dict:

class Attribute:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Node(collections.Mapping):
    def __init__(self):
        self.type  = ""
        self.attrs = [] # List of Attributes

    def __iter__(self):
        for attr in self.attrs:
            yield attr.key

That uses a generator, which is well described here.

Since we’re inheriting from Mapping, you need to also implement __getitem__ and __len__:

    def __getitem__(self, key):
        for attr in self.attrs:
            if key == attr.key:
                return attr.value
        raise KeyError

    def __len__(self):
        return len(self.attrs)

回答 7

在某些情况下可能有效的一种选择是使您的自定义类继承dict。如果它像字典一样,这似乎是一个合理的选择。也许应该一个命令。这样,您可以免费获得类似dict的迭代。

class MyDict(dict):
    def __init__(self, custom_attribute):
        self.bar = custom_attribute

mydict = MyDict('Some name')
mydict['a'] = 1
mydict['b'] = 2

print mydict.bar
for k, v in mydict.items():
    print k, '=>', v

输出:

Some name
a => 1
b => 2

One option that might work for some cases is to make your custom class inherit from dict. This seems like a logical choice if it acts like a dict; maybe it should be a dict. This way, you get dict-like iteration for free.

class MyDict(dict):
    def __init__(self, custom_attribute):
        self.bar = custom_attribute

mydict = MyDict('Some name')
mydict['a'] = 1
mydict['b'] = 2

print mydict.bar
for k, v in mydict.items():
    print k, '=>', v

Output:

Some name
a => 1
b => 2

回答 8

dict继承的示例,修改其iter,例如,2在for循环中跳过键

# method 1
class Dict(dict):
    def __iter__(self):
        keys = self.keys()
        for i in keys:
            if i == 2:
                continue
            yield i

# method 2
class Dict(dict):
    def __iter__(self):
        for i in super(Dict, self).__iter__():
            if i == 2:
                continue
            yield i

An example of inheriting from dict and modifying its __iter__ to, for example, skip key 2 in a for loop:

# method 1
class Dict(dict):
    def __iter__(self):
        keys = self.keys()
        for i in keys:
            if i == 2:
                continue
            yield i

# method 2
class Dict(dict):
    def __iter__(self):
        for i in super(Dict, self).__iter__():
            if i == 2:
                continue
            yield i

TensorFlow中的tf.app.flags的目的是什么?

问题:TensorFlow中的tf.app.flags的目的是什么?

我在Tensorflow中阅读一些示例代码,发现以下代码

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size.  '
                 'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                 'for unit testing.')

tensorflow/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py

但我找不到有关的用法的任何文档tf.app.flags

我发现该标志的实现在 tensorflow/tensorflow/python/platform/default/_flags.py

显然,这tf.app.flags是以某种方式用于配置网络的,所以为什么在API文档中没有呢?谁能解释这是怎么回事?

I am reading some example codes in Tensorflow, I found following code

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size.  '
                 'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                 'for unit testing.')

in tensorflow/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py

But I can’t find any docs about this usage of tf.app.flags.

And I found the implementation of this flags is in the tensorflow/tensorflow/python/platform/default/_flags.py

Obviously, this tf.app.flags is somehow used to configure a network, so why is it not in the API docs? Can anyone explain what is going on here?


回答 0

tf.app.flags模块目前是python-gflags的 一个瘦包装,因此该项目文档是如何使用它的最佳资源argparse,它实现了一部分功能python-gflags

请注意,该模块当前已打包为方便编写演示应用程序使用,从技术上讲,它不是公共API的一部分,因此将来可能会更改。

我们建议您使用argparse或任何您喜欢的库来实现自己的标志解析。

编辑:tf.app.flags模块实际上并未使用实现python-gflags,但它使用了类似的API。

The tf.app.flags module is presently a thin wrapper around python-gflags, so the documentation for that project is the best resource for how to use it.

Note that this module is currently packaged as a convenience for writing demo apps, and is not technically part of the public API, so it may change in future.

We recommend that you implement your own flag parsing using argparse (which implements a subset of the functionality in python-gflags) or whatever library you prefer.

EDIT: The tf.app.flags module is not in fact implemented using python-gflags, but it uses a similar API.


回答 1

tf.app.flags模块是Tensorflow提供的功能,用于为Tensorflow程序实现命令行标志。例如,您遇到的代码将执行以下操作:

flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

第一个参数定义标志的名称,第二个参数定义默认值,以防执行文件时未指定标志。

因此,如果运行以下命令:

$ python fully_connected_feed.py --learning_rate 1.00

那么学习率将设置为1.00,如果未指定该标志,则将保持0.01。

本文所述,文档可能不存在,因为这可能是Google内部要求开发人员使用的文档。

此外,如文章中所述,使用Tensorflow标志比其他Python软件包提供的标志功能有多个优势,例如argparse在处理Tensorflow模型时尤其如此,最重要的是可以向代码提供Tensorflow特定信息,例如信息有关使用哪个GPU的信息。

The tf.app.flags module is a facility provided by TensorFlow to implement command-line flags for your TensorFlow program. As an example, the code you came across would do the following:

flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

The first parameter defines the name of the flag while the second defines the default value in case the flag is not specified while executing the file.

So if you run the following:

$ python fully_connected_feed.py --learning_rate 1.00

then the learning rate is set to 1.00 and will remain 0.01 if the flag is not specified.

As mentioned in this article, the docs are probably not present because this might be something that Google requires internally for its developers to use.

Also, as mentioned in the post, there are several advantages of using Tensorflow flags over flag functionality provided by other Python packages such as argparse especially when dealing with Tensorflow models, the most important being that you can supply Tensorflow specific information to the code such as information about which GPU to use.
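
For comparison, a minimal argparse sketch reproducing a couple of the flags from the example above (argument names are copied from the TensorFlow snippet, the rest are omitted):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01,
                    help='Initial learning rate.')
parser.add_argument('--max_steps', type=int, default=2000,
                    help='Number of steps to run trainer.')
FLAGS = parser.parse_args()

print(FLAGS.learning_rate)  # 1.0 if run with --learning_rate 1.00, else 0.01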


回答 2

在Google,他们使用标记系统来设置参数的默认值。它类似于argparse。他们使用自己的标记系统,而不是argparse或sys.argv。

资料来源:我以前在那里工作过。

At Google, they use flag systems to set default values for arguments. It’s similar to argparse. They use their own flag system instead of argparse or sys.argv.

Source: I worked there before.


回答 3

使用时tf.app.run(),可以使用方便地在线程之间传输变量tf.app.flags。请参阅此内容以进一步使用tf.app.flags

When you use tf.app.run(), you can transfer the variable very conveniently between threads using tf.app.flags. See this for further usage of tf.app.flags.


回答 4

经过多次尝试后,我发现它可以打印所有FLAGS键以及实际值-

for key in tf.app.flags.FLAGS.flag_values_dict():
    print(key, FLAGS[key].value)

After trying many times, I found this prints all the FLAGS keys as well as their actual values:

for key in tf.app.flags.FLAGS.flag_values_dict():
    print(key, FLAGS[key].value)

如何在Python中记录类属性?[关闭]

问题:如何在Python中记录类属性?[关闭]

我正在编写一个轻量级的类,其属性旨在可公开访问,并且有时仅在特定的实例中被覆盖。就此而言,Python语言中没有为类属性或任何类型的属性创建文档字符串的规定。记录这些属性的预期方式和受支持方式是什么?目前,我正在做这种事情:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
    """

    flight_speed = 691
    __doc__ += """
        flight_speed (691)
          The maximum speed that such a bird can attain.
    """

    nesting_grounds = "Raymond Luxury-Yacht"
    __doc__ += """
        nesting_grounds ("Raymond Luxury-Yacht")
          The locale where these birds congregate to reproduce.
    """

    def __init__(self, **keyargs):
        """Initialize the Albatross from the keyword arguments."""
        self.__dict__.update(keyargs)

这将导致该类的docstring包含初始的标准docstring部分,以及通过对的扩展分配为每个属性添加的行__doc__

尽管docstring样式指南中似乎并未明确禁止使用这种样式,但也没有提到它是一种选择。这样做的好处是,它提供了一种在定义时连同属性一起记录属性的方法,同时仍然创建了一个可显示的类docstring,并且避免了编写注释以重申该docstring中的信息。我仍然对必须两次写入属性感到恼火。我正在考虑使用文档字符串中值的字符串表示形式来至少避免重复默认值。

这是对特设社区惯例的严重违反吗?可以吗 有没有更好的办法?例如,可以创建一个包含属性值和文档字符串的字典,然后__dict__在类声明的末尾将内容添加到该类和文档字符串中。这样可以减少两次键入属性名称和值的需要。 编辑:我认为,最后一个想法实际上是不可能的,至少没有没有根据数据动态构建整个类的想法,除非有其他原因,否则这似乎是一个糟糕的主意。

我是python的新手,仍然在研究编码风格的细节,因此也欢迎无关的批评。

I’m writing a lightweight class whose attributes are intended to be publicly accessible, and only sometimes overridden in specific instantiations. There’s no provision in the Python language for creating docstrings for class attributes, or any sort of attributes, for that matter. What is the expected and supported way, should there be one, to document these attributes? Currently I’m doing this sort of thing:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
    """

    flight_speed = 691
    __doc__ += """
        flight_speed (691)
          The maximum speed that such a bird can attain.
    """

    nesting_grounds = "Raymond Luxury-Yacht"
    __doc__ += """
        nesting_grounds ("Raymond Luxury-Yacht")
          The locale where these birds congregate to reproduce.
    """

    def __init__(self, **keyargs):
        """Initialize the Albatross from the keyword arguments."""
        self.__dict__.update(keyargs)

This will result in the class’s docstring containing the initial standard docstring section, as well as the lines added for each attribute via augmented assignment to __doc__.

Although this style doesn’t seem to be expressly forbidden in the docstring style guidelines, it’s also not mentioned as an option. The advantage here is that it provides a way to document attributes alongside their definitions, while still creating a presentable class docstring, and avoiding having to write comments that reiterate the information from the docstring. I’m still kind of annoyed that I have to actually write the attributes twice; I’m considering using the string representations of the values in the docstring to at least avoid duplication of the default values.

Is this a heinous breach of the ad hoc community conventions? Is it okay? Is there a better way? For example, it’s possible to create a dictionary containing values and docstrings for the attributes and then add the contents to the class __dict__ and docstring towards the end of the class declaration; this would alleviate the need to type the attribute names and values twice. edit: this last idea is, I think, not actually possible, at least not without dynamically building the entire class from data, which seems like a really bad idea unless there’s some other reason to do that.

I’m pretty new to python and still working out the details of coding style, so unrelated critiques are also welcome.


回答 0

为避免混淆:术语property在python中具有特定含义。您所说的是所谓的类属性。由于始终在类中对它们进行操作,因此我发现将它们记录在类的文档字符串中是有意义的。像这样:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
        flight_speed     The maximum speed that such a bird can attain.
        nesting_grounds  The locale where these birds congregate to reproduce.
    """
    flight_speed = 691
    nesting_grounds = "Throatwarbler Man Grove"

我认为这比示例中的方法容易得多。如果我确实希望属性值的副本出现在doc字符串中,则可以将它们放在每个属性的描述的旁边或下方。

请记住,在Python中,文档字符串是其文档对象的实际成员,而不仅仅是源代码注释。由于类属性变量本身不是对象而是对象的引用,因此它们无法保存自己的文档字符串。我想您可以为引用中的doc字符串辩护,也许是描述“应该在这里做什么”而不是“实际在这里”,但是我发现在包含类的doc字符串中这样做很容易。

To avoid confusion: the term property has a specific meaning in python. What you’re talking about is what we call class attributes. Since they are always acted upon through their class, I find that it makes sense to document them within the class’ doc string. Something like this:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
        flight_speed     The maximum speed that such a bird can attain.
        nesting_grounds  The locale where these birds congregate to reproduce.
    """
    flight_speed = 691
    nesting_grounds = "Throatwarbler Man Grove"

I think that’s a lot easier on the eyes than the approach in your example. If I really wanted a copy of the attribute values to appear in the doc string, I would put them beside or below the description of each attribute.

Keep in mind that in Python, doc strings are actual members of the objects they document, not merely source code annotations. Since class attribute variables are not objects themselves but references to objects, they have no way of holding doc strings of their own. I guess you could make a case for doc strings on references, perhaps to describe “what should go here” instead of “what is actually here”, but I find it easy enough to do that in the containing class doc string.


回答 1

您在“ 什么是文档字符串 ”部分中引用了PEP257:文档字符串约定

Python代码其他地方出现的字符串文字也可以用作文档。它们无法被Python字节码编译器识别,并且不能作为运行时对象属性(即未分配给__doc__)访问,但是软件工具可以提取两种类型的额外docstring:

在模块,类或__init__方法的顶级进行简单赋值后立即出现的字符串文字称为“属性文档字符串”。

这在PEP 258:属性文档字符串中有更详细的说明。正如上面的解释。属性不是可以拥有__doc__的对象,因此它们不会出现在help()或pydoc中。这些文档字符串只能用于生成的文档。

它们在Sphinx中与指令autoattribute一起使用

Sphinx可以在赋值之前的一行上使用注释,或者在赋值之后的特殊注释或定义之后的文档字符串中使用这些注释,这些注释将自动记录在案。

You cite the PEP257: Docstring Conventions, in the section What is a docstring it is stated:

String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to __doc__), but two types of extra docstrings may be extracted by software tools:

String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called “attribute docstrings”.

And this is explained in more detail in PEP 258: Attribute docstrings. As ʇsәɹoɈ explains above, an attribute is not an object that can own a __doc__, so attribute docstrings won’t appear in help() or pydoc; they can only be used for generated documentation.

They are used in Sphinx with the directive autoattribute

Sphinx can pick up a comment on the line before an assignment, a special comment following an assignment, or a docstring placed after the definition, and include it in the generated documentation.
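
A minimal sketch of the conventions just mentioned, reusing the question's example names; the #: marker and the string literal after the assignment are what sphinx.ext.autodoc looks for, and none of this is visible to help():

class Albatross:
    """A bird with a flight speed exceeding that of an unladen swallow."""

    #: The maximum speed that such a bird can attain.
    flight_speed = 691

    nesting_grounds = "Raymond Luxury-Yacht"
    """The locale where these birds congregate to reproduce."""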


回答 2

您可以滥用此属性。属性包含getter,setter,deleter 和docstring。天真的,这会变得很冗长:

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """Docstring goes here."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

然后,您将拥有一个属于Cx的文档字符串:

In [24]: print(C.x.__doc__)
Docstring goes here.

要对许多属性执行此操作比较麻烦,但是您可以设想一个辅助函数myprop:

def myprop(x, doc):
    def getx(self):
        return getattr(self, '_' + x)

    def setx(self, val):
        setattr(self, '_' + x, val)

    def delx(self):
        delattr(self, '_' + x)

    return property(getx, setx, delx, doc)

class C:
    a = myprop("a", "Hi, I'm A!")
    b = myprop("b", "Hi, I'm B!")

In [44]: c = C()

In [46]: c.b = 42

In [47]: c.b
Out[47]: 42

In [49]: print(C.b.__doc__)
Hi, I'm B!

然后,以交互方式调用Python help将得到:

Help on class C in module __main__:

class C
 |  Data descriptors defined here:
 |  
 |  a
 |      Hi, I'm A!
 |  
 |  b
 |      Hi, I'm B!

我认为这应该是您所追求的。

编辑:现在我意识到,也许我们可以完全避免将第一个参数传递给它myprop,因为内部名称无关紧要。如果后续的调用myprop可以通过某种方式彼此通信,则它可以自动确定一个较长且不太可能的内部属性名称。我敢肯定有实现此目的的方法,但是我不确定他们是否值得。

You could abuse properties to this effect. Properties contain a getter, a setter, a deleter, and a docstring. Naively, this would get very verbose:

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """Docstring goes here."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

Then you will have a docstring belonging to C.x:

In [24]: print(C.x.__doc__)
Docstring goes here.

To do this for many attributes is cumbersome, but you could envision a helper function myprop:

def myprop(x, doc):
    def getx(self):
        return getattr(self, '_' + x)

    def setx(self, val):
        setattr(self, '_' + x, val)

    def delx(self):
        delattr(self, '_' + x)

    return property(getx, setx, delx, doc)

class C:
    a = myprop("a", "Hi, I'm A!")
    b = myprop("b", "Hi, I'm B!")

In [44]: c = C()

In [46]: c.b = 42

In [47]: c.b
Out[47]: 42

In [49]: print(C.b.__doc__)
Hi, I'm B!

Then, calling Python’s interactive help will give:

Help on class C in module __main__:

class C
 |  Data descriptors defined here:
 |  
 |  a
 |      Hi, I'm A!
 |  
 |  b
 |      Hi, I'm B!

which I think should be pretty much what you’re after.

Edit: I realise now that we could perhaps avoid needing to pass the first argument to myprop at all, because the internal name doesn’t matter. If subsequent calls of myprop could somehow communicate with each other, they could automatically decide upon a long and unlikely internal attribute name. I’m sure there are ways to implement this, but I’m not sure they’re worth it.