Tag Archives: Python

How can I get dict from sqlite query?

Question: How can I get dict from sqlite query?

import sqlite3

db = sqlite3.connect("test.sqlite")
res = db.execute("select * from table")

With iteration I get lists corresponding to the rows.

for row in res:
    print row

I can get the names of the columns

col_name_list = [tuple[0] for tuple in res.description]

But is there some function or setting to get dictionaries instead of lists?

{'col1': 'value', 'col2': 'value'}

Or do I have to do it myself?


Answer 0

You could use row_factory, as in the example in the docs:

import sqlite3

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

con = sqlite3.connect(":memory:")
con.row_factory = dict_factory
cur = con.cursor()
cur.execute("select 1 as a")
print cur.fetchone()["a"]

or follow the advice that’s given right after this example in the docs:

If returning a tuple doesn’t suffice and you want name-based access to columns, you should consider setting row_factory to the highly-optimized sqlite3.Row type. Row provides both index-based and case-insensitive name-based access to columns with almost no memory overhead. It will probably be better than your own custom dictionary-based approach or even a db_row based solution.
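
For illustration, here is a minimal sketch of the access styles sqlite3.Row provides (the one-row query is made up):

import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("select 1 as a, 2 as b")
row = cur.fetchone()

print(row[0])      # index-based access: 1
print(row["a"])    # name-based access: 1
print(row["A"])    # column names are case-insensitive: 1
print(row.keys())  # ['a', 'b']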


Answer 1

I thought I'd answer this question even though the answer is partly mentioned in both Adam Schmideg's and Alex Martelli's answers, so that others like me who have the same question can find the answer easily.

import sqlite3

conn = sqlite3.connect(":memory:")

#This is the important part, here we are setting row_factory property of
#connection object to sqlite3.Row(sqlite3.Row is an implementation of
#row_factory)
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from stocks')

result = c.fetchall()
#returns a list of sqlite3.Row objects; each item behaves like a
#dictionary and represents a row of the table

Answer 2

Even using the sqlite3.Row class, you still can't use string formatting in the form of:

print "%(id)i - %(name)s: %(value)s" % row

In order to get past this, I use a helper function that takes the row and converts to a dictionary. I only use this when the dictionary object is preferable to the Row object (e.g. for things like string formatting where the Row object doesn’t natively support the dictionary API as well). But use the Row object all other times.

def dict_from_row(row):
    return dict(zip(row.keys(), row))       
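
A quick usage sketch (assuming row is a sqlite3.Row fetched from a table with id, name and value columns):

row_dict = dict_from_row(row)
print "%(id)i - %(name)s: %(value)s" % row_dict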

Answer 3

After you connect to SQLite: con = sqlite3.connect(.....) it is sufficient to just run:

con.row_factory = sqlite3.Row

Voila!


Answer 4

From PEP 249:

Question: 

   How can I construct a dictionary out of the tuples returned by
   .fetch*():

Answer:

   There are several existing tools available which provide
   helpers for this task. Most of them use the approach of using
   the column names defined in the cursor attribute .description
   as basis for the keys in the row dictionary.

   Note that the reason for not extending the DB API specification
   to also support dictionary return values for the .fetch*()
   methods is that this approach has several drawbacks:

   * Some databases don't support case-sensitive column names or
     auto-convert them to all lowercase or all uppercase
     characters.

   * Columns in the result set which are generated by the query
     (e.g.  using SQL functions) don't map to table column names
     and databases usually generate names for these columns in a
     very database specific way.

   As a result, accessing the columns through dictionary keys
   varies between databases and makes writing portable code
   impossible.

So yes, do it yourself.
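
For completeness, a minimal sketch of the .description-based approach the FAQ describes (the helper name is made up; it should work with any DB-API cursor):

def rows_as_dicts(cursor):
    # column names come from cursor.description, per the DB API
    cols = [d[0] for d in cursor.description]
    return [dict(zip(cols, row)) for row in cursor.fetchall()]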


Answer 5

Shorter version:

db.row_factory = lambda c, r: dict([(col[0], r[idx]) for idx, col in enumerate(c.description)])

Answer 6

Fastest on my tests:

conn.row_factory = lambda c, r: dict(zip([col[0] for col in c.description], r))
c = conn.cursor()

%timeit c.execute('SELECT * FROM table').fetchall()
19.8 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

vs:

conn.row_factory = lambda c, r: dict([(col[0], r[idx]) for idx, col in enumerate(c.description)])
c = conn.cursor()

%timeit c.execute('SELECT * FROM table').fetchall()
19.4 µs ± 75.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You decide :)


Answer 7

Similar to the before-mentioned solutions, but the most compact:

db.row_factory = lambda C, R: { c[0]: R[i] for i, c in enumerate(C.description) }

Answer 8

As mentioned in @gandalf's answer, one has to use conn.row_factory = sqlite3.Row, but the results are not directly dictionaries. One has to add an additional "cast" to dict in the last loop:

import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute('create table t (a text, b text, c text)')
conn.execute('insert into t values ("aaa", "bbb", "ccc")')
conn.execute('insert into t values ("AAA", "BBB", "CCC")')
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('select * from t')
for r in c.fetchall():
    print(dict(r))

# {'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}
# {'a': 'AAA', 'b': 'BBB', 'c': 'CCC'}

Answer 9

I think you were on the right track. Let’s keep this very simple and complete what you were trying to do:

import sqlite3
db = sqlite3.connect("test.sqlite3")
cur = db.cursor()
res = cur.execute("select * from table").fetchall()
data = dict(zip([c[0] for c in cur.description], res[0]))

print(data)

The downside is .fetchall(), which is murder on your memory consumption if your table is very large. But for trivial applications dealing with a mere few thousand rows of text and numeric columns, this simple approach is good enough.

For serious stuff, you should look into row factories, as proposed in many other answers.


Answer 10

Or you could convert the sqlite3.Rows to a dictionary as follows. This will give a dictionary with a list for each row.

def from_sqlite_Row_to_dict(list_with_rows):
    ''' Turn a list with sqlite3.Row objects into a dictionary'''
    d = {}  # the dictionary to be filled with the row data and to be returned

    for i, row in enumerate(list_with_rows):  # iterate through the sqlite3.Row objects
        l = []  # for each Row use a separate list
        for col in range(0, len(row)):  # copy over the row data (i.e. column data) to a list
            l.append(row[col])
        d[i] = l  # add the list to the dictionary
    return d

Answer 11

A generic alternative, using just three lines:

def select_column_and_value(db, sql, parameters=()):
    execute = db.execute(sql, parameters)
    fetch = execute.fetchone()
    return {k[0]: v for k, v in list(zip(execute.description, fetch))}

con = sqlite3.connect('/mydatabase.db')
c = con.cursor()
print(select_column_and_value(c, 'SELECT * FROM things WHERE id=?', (id,)))

But if your query returns nothing, it will result in an error. In this case…

def select_column_and_value(self, sql, parameters=()):
    execute = self.execute(sql, parameters)
    fetch = execute.fetchone()

    if fetch is None:
        return {k[0]: None for k in execute.description}

    return {k[0]: v for k, v in list(zip(execute.description, fetch))}

or

def select_column_and_value(self, sql, parameters=()):
    execute = self.execute(sql, parameters)
    fetch = execute.fetchone()

    if fetch is None:
        return {}

    return {k[0]: v for k, v in list(zip(execute.description, fetch))}

Answer 12

import sqlite3

db = sqlite3.connect('mydatabase.db')
cursor = db.execute('SELECT * FROM students ORDER BY CREATE_AT')
studentList = cursor.fetchall()

columnNames = list(map(lambda x: x[0], cursor.description)) #students table column names list
studentsAssoc = {} #Assoc format is dictionary similarly


#THIS IS ASSOC PROCESS
for lineNumber, student in enumerate(studentList):
    studentsAssoc[lineNumber] = {}

    for columnNumber, value in enumerate(student):
        studentsAssoc[lineNumber][columnNames[columnNumber]] = value


print(studentsAssoc)

The result is definitely correct, but I do not know if it is the best approach.


Answer 13

Dictionaries in python provide arbitrary access to their elements. So any dictionary with "names", although it might be informative on one hand (i.e. what the field names are), "un-orders" the fields, which might be unwanted.

Best approach is to get the names in a separate list and then combine them with the results by yourself, if needed.

try:
    mycursor = self.memconn.cursor()
    mycursor.execute('''SELECT * FROM maintbl;''')
    # first get the names, because they will be lost after retrieval of rows
    names = list(map(lambda x: x[0], mycursor.description))
    manyrows = mycursor.fetchall()

    return manyrows, names

Also remember that the names, in all approaches, are the names you provided in the query, not the names in the database. The exception is SELECT * FROM.

If your only concern is to get the results using a dictionary, then definitely use the conn.row_factory = sqlite3.Row (already stated in another answer).


How can I save an image with PIL?

Question: How can I save an image with PIL?

I have just done some image processing using the Python Imaging Library (PIL), using a post I found earlier to perform Fourier transforms of images, and I can't get the save function to work. The whole code works fine but it just won't save the resulting image:

from PIL import Image
import numpy as np

i = Image.open("C:/Users/User/Desktop/mesh.bmp")
i = i.convert("L")
a = np.asarray(i)
b = np.abs(np.fft.rfft2(a))
j = Image.fromarray(b)
j.save("C:/Users/User/Desktop/mesh_trans",".bmp")

The error I get is the following:

save_handler = SAVE[string.upper(format)] # unknown format
    KeyError: '.BMP'

How can I save an image with Python's PIL?


Answer 0

The error regarding the file extension has been handled: you either use BMP (without the dot) or pass the output name with the extension already. Now, to handle the other error, you need to properly modify your data in the frequency domain so it can be saved as an integer image; PIL is telling you that it doesn't accept float data to save as BMP.

Here is a suggestion (with other minor modifications, like using fftshift and numpy.array instead of numpy.asarray) for doing the conversion for proper visualization:

import sys
import numpy
from PIL import Image

img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img)
fft_mag = numpy.abs(numpy.fft.fftshift(numpy.fft.fft2(im)))

visual = numpy.log(fft_mag)
visual = (visual - visual.min()) / (visual.max() - visual.min())

result = Image.fromarray((visual * 255).astype(numpy.uint8))
result.save('out.bmp')

Answer 1

You should be able to simply let PIL get the filetype from the extension, i.e. use:

j.save("C:/Users/User/Desktop/mesh_trans.bmp")

Answer 2

Try removing the . before the .bmp (it isn’t matching BMP as expected). As you can see from the error, the save_handler is upper-casing the format you provided and then looking for a match in SAVE. However the corresponding key in that object is BMP (instead of .BMP).

I don’t know a great deal about PIL, but from some quick searching around it seems that it is a problem with the mode of the image. Changing the definition of j to:

j = Image.fromarray(b, mode='RGB')

Seemed to work for me (however note that I have very little knowledge of PIL, so I would suggest using @mmgp’s solution as s/he clearly knows what they are doing :) ). For the types of mode, I used this page – hopefully one of the choices there will work for you.


Answer 3

I know that this is old, but I've found that (while using Pillow) opening the file in binary mode with open(fp, 'wb') and then saving the file will work. E.g:

with open(fp, 'wb') as f:
    result.save(f)

fp being the file path, of course.


What rules does Pandas use to generate a view versus a copy?

Question: What rules does Pandas use to generate a view versus a copy?

I’m confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.

If I have, for example,

df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))

I understand that a query returns a copy so that something like

foo = df.query('2 < index <= 5')
foo.loc[:,'E'] = 40

will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as

df.iloc[3] = 70

or

df.ix[1,'B':'E'] = 222

will change df. But I’m lost when it comes to more complicated cases. For example,

df[df.C <= df.B] = 7654321

changes df, but

df[df.C <= df.B].ix[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I’m just missing? What’s going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I’m attempting to do in the last example above)?


Note: This is not the same as this question; and I have read the documentation, but am not enlightened by it. I’ve also read through the “Related” questions on this topic, but I’m still missing the simple rule Pandas is using, and how I’d apply it to — for example — modify the values (or a subset of values) in a dataframe that satisfy a particular query.


Answer 0

Here are the rules, with subsequent ones overriding earlier ones:

  • All operations generate a copy

  • If inplace=True is provided, it will modify in-place; only some operations support this

  • An indexer that sets, e.g. .loc/.iloc/.iat/.at will set inplace.

  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy as it's evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.

Your example of chained indexing

df[df.C <= df.B].loc[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

df.loc[df.C <= df.B, 'B':'E']

as this is faster and will always work

The chained indexing is 2 separate python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
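
As a concrete sketch of that recommendation, applied to the dataframe from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 8), columns=list('ABCDEFGH'), index=range(1, 9))

# A single .loc call selects and sets in one operation, so it always
# hits the original frame rather than a possibly temporary copy:
df.loc[df.C <= df.B, 'B':'E'] = 7654321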


What exactly does += do in Python?

Question: What exactly does += do in Python?

I need to know what += does in Python. It's that simple. I would also appreciate links to definitions of other shorthand tools in Python.


Answer 0

In Python, += is syntactic sugar for the __iadd__ special method, or __add__ or __radd__ if __iadd__ isn't present. The __iadd__ method of a class can do anything it wants. The list object implements it and uses it to iterate over an iterable object, appending each element to itself in the same way that the list's extend method does.

Here’s a simple custom class that implements the __iadd__ special method. You initialize the object with an int, then can use the += operator to add a number. I’ve added a print statement in __iadd__ to show that it gets called. Also, __iadd__ is expected to return an object, so I returned the addition of itself plus the other number which makes sense in this case.

>>> class Adder(object):
        def __init__(self, num=0):
            self.num = num

        def __iadd__(self, other):
            print 'in __iadd__', other
            self.num = self.num + other
            return self.num

>>> a = Adder(2)
>>> a += 3
in __iadd__ 3
>>> a
5

Hope this helps.


Answer 1

+= adds another value with the variable’s value and assigns the new value to the variable.

>>> x = 3
>>> x += 2
>>> print x
5

-=, *=, /= do the same for subtraction, multiplication and division.
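
A quick illustration of those (Python 3 shown, where / is true division):

>>> x = 12
>>> x -= 2    # x = x - 2  -> 10
>>> x *= 3    # x = x * 3  -> 30
>>> x /= 5    # x = x / 5  -> 6.0
>>> x
6.0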


Answer 2

x += 5 is not exactly the same as saying x = x + 5 in Python.

Note here:

In [1]: x = [2,3,4]    
In [2]: y = x    
In [3]: x += 7,8,9    
In [4]: x
Out[4]: [2, 3, 4, 7, 8, 9]    
In [5]: y
Out[5]: [2, 3, 4, 7, 8, 9]    
In [6]: x += [44,55]    
In [7]: x
Out[7]: [2, 3, 4, 7, 8, 9, 44, 55]    
In [8]: y
Out[8]: [2, 3, 4, 7, 8, 9, 44, 55]    
In [9]: x = x + [33,22]    
In [10]: x
Out[10]: [2, 3, 4, 7, 8, 9, 44, 55, 33, 22]    
In [11]: y
Out[11]: [2, 3, 4, 7, 8, 9, 44, 55]

See for reference: Why does += behave unexpectedly on lists?


Answer 3

+= adds a number to a variable, changing the variable itself in the process (whereas + would not). Similar to this, there are the following that also modifies the variable:

  • -=, subtracts a value from variable, setting the variable to the result
  • *=, multiplies the variable and a value, making the outcome the variable
  • /=, divides the variable by the value, making the outcome the variable
  • %=, performs modulus on the variable, with the variable then being set to the result of it

There may be others. I am not a Python programmer.


Answer 4

It adds the right operand to the left. x += 2 means x = x + 2

It can also add elements to a list — see this SO thread.


Answer 5

It is not a mere syntactic shortcut. Try this:

x=[]                   # empty list
x += "something"       # iterates over the string and appends to list
print(x)               # ['s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g']

versus

x=[]                   # empty list
x = x + "something"    # TypeError: can only concatenate list (not "str") to list

This illustrates that += invokes the __iadd__ list method but + invokes __add__, and these do different things with lists.


Answer 6

Notionally, a += b "adds" b to a, storing the result in a. This simplistic description would describe the += operator in many languages.

However the simplistic description raises a couple of questions.

  1. What exactly do we mean by “adding”?
  2. What exactly do we mean by “storing the result in a”? python variables don’t store values directly they store references to objects.

In python the answers to both of these questions depend on the data type of a.


So what exactly does “adding” mean?

  • For numbers it means numeric addition.
  • For lists, tuples, strings etc it means concatenation.

Note that for lists += is more flexible than +, the + operator on a list requires another list, but the += operator will accept any iterable.


So what does “storing the value in a” mean?

If the object is mutable then it is encouraged (but not required) to perform the modification in-place. So a points to the same object it did before but that object now has different content.

If the object is immutable then it obviously can't perform the modification in-place. Some mutable objects may also not have an implementation of an in-place "add" operation. In this case the variable "a" will be updated to point to a new object containing the result of the addition operation.

Technically this is implemented by looking for __iadd__ first; if that is not implemented then __add__ is tried and finally __radd__.


Care is required when using += in python on variables where we are not certain of the exact type and in particular where we are not certain if the type is mutable or not. For example consider the following code.

def dostuff(a):
    b = a
    a += (3,4)
    print(repr(a)+' '+repr(b))

dostuff((1,2))
dostuff([1,2])

When we invoke dostuff with a tuple then the tuple is copied as part of the += operation and so b is unaffected. However when we invoke it with a list the list is modified in place, so both a and b are affected.

In python 3, similar behaviour is observed with the “bytes” and “bytearray” types.


Finally note that reassignment happens even if the object is not replaced. This doesn’t matter much if the left hand side is simply a variable but it can cause confusing behaviour when you have an immutable collection referring to mutable collections for example:

a = ([1,2],[3,4])
a[0] += [5]

In this case [5] will successfully be added to the list referred to by a[0] but then afterwards an exception will be raised when the code tries and fails to reassign a[0].
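
A short sketch of that last point: the in-place list extend succeeds before the tuple item assignment fails.

a = ([1, 2], [3, 4])
try:
    a[0] += [5]    # the list is extended in place, then reassigning a[0] raises
except TypeError as e:
    print(e)       # 'tuple' object does not support item assignment
print(a[0])        # [1, 2, 5] -- the list was modified anyway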


Answer 7

The short answer is += can be translated as “add whatever is to the right of the += to the variable on the left of the +=”.

Ex. If you have a = 10 then a += 5 would be: a = a + 5

So "a" is now equal to 15.


Answer 8

Note that x += y is not the same as x = x + y in some situations where an additional operator is included, because of operator precedence combined with the fact that the right-hand side is always evaluated first, e.g.

>>> x = 2
>>> x += 2 and 1
>>> x
3

>>> x = 2
>>> x = x + 2 and 1
>>> x
1

Note how the first case expands to:

>>> x = 2
>>> x = x + (2 and 1)
>>> x
3

You are more likely to encounter this in the ‘real world’ with other operators, e.g.

x *= 2 + 1 == x = x * (2 + 1) != x = x * 2 + 1


Answer 9

+= is just a shortcut for writing

number = 4
number = number + 1

So instead you would write

numbers = 4
numbers += 1

Both ways are correct, but example two helps you write a little less code.


Answer 10

As others also said, the += operator is a shortcut. An example:

var = 1
var = var + 1
#var = 2

It could also be written like so:

var = 1
var += 1
#var = 2

So instead of writing the first example, you can just write the second one, which would work just fine.


Answer 11

Remember when you used to sum on your old calculator, say 2 and 3: every time you hit =, you saw 3 added to the total. += does a similar job. Example:

>>> orange = 2
>>> orange += 3
>>> print(orange)
5
>>> orange +=3
>>> print(orange)
8

Answer 12

I’m seeing a lot of answers that don’t bring up using += with multiple integers.

One example:

x -= 1 + 3

This would be similar to:

x = x - (1 + 3)

and not:

x = (x - 1) + 3

Answer 13

According to the documentation

x += y is equivalent to x = operator.iadd(x, y). Another way to put it is to say that z = operator.iadd(x, y) is equivalent to the compound statement z = x; z += y.

So x += 3 is the same as x = x + 3.

x = 2
x += 3
print(x)

will output 5.

Notice that there’s also


Convert a 1D array into a 2D array in numpy

Question: Convert a 1D array into a 2D array in numpy

I want to convert a 1-dimensional array into a 2-dimensional array by specifying the number of columns in the 2D array. Something that would work like this:

> import numpy as np
> A = np.array([1,2,3,4,5,6])
> B = vec2matrix(A,ncol=2)
> B
array([[1, 2],
       [3, 4],
       [5, 6]])

Does numpy have a function that works like my made-up function “vec2matrix”? (I understand that you can index a 1D array like a 2D array, but that isn’t an option in the code I have – I need to make this conversion.)


Answer 0

You want to reshape the array.

B = np.reshape(A, (-1, 2))

where -1 infers the size of the new dimension from the size of the input array.


Answer 1

You have two options:

  • If you no longer want the original shape, the easiest is just to assign a new shape to the array

    a.shape = (a.size//ncols, ncols)
    

    You can replace the a.size//ncols with -1 to compute the proper shape automatically. Make sure that a.shape[0]*a.shape[1] == a.size, else you'll run into some problems.

  • You can get a new array with the np.reshape function, that works mostly like the version presented above

    new = np.reshape(a, (-1, ncols))
    

    When it's possible, new will be just a view of the initial array a, meaning that the data are shared. In some cases, though, the new array will be a copy instead. Note that np.reshape also accepts an optional keyword order that lets you switch from row-major C order to column-major Fortran order. np.reshape is the function version of the a.reshape method.

If you can't respect the requirement a.shape[0]*a.shape[1] == a.size, you're stuck with having to create a new array. You can use the np.resize function, mixing it with np.reshape, such as:

>>> a = np.arange(9)
>>> np.resize(a, 10).reshape(5, 2)

Answer 2

Try something like:

B = np.reshape(A,(-1,ncols))

You’ll need to make sure that you can divide the number of elements in your array by ncols though. You can also play with the order in which the numbers are pulled into B using the order keyword.


Answer 3

If your sole purpose is to convert a 1d array X to a 2d array just do:

X = np.reshape(X,(1, X.size))

Answer 4

import numpy as np
array = np.arange(8) 
print("Original array : \n", array)
array = np.arange(8).reshape(2, 4)
print("New array : \n", array)

Answer 5

some_array.shape = (1,)+some_array.shape

or get a new one

another_array = numpy.reshape(some_array, (1,)+some_array.shape)

This will make the dimensions +1, which is equivalent to adding a bracket at the outermost level.


Answer 6

You can use flatten() from the numpy package.

import numpy as np
a = np.array([[1, 2],
       [3, 4],
       [5, 6]])
a_flat = a.flatten()
print(f"original array: {a} \nflattened array = {a_flat}")

Output:

original array: [[1 2]
 [3 4]
 [5 6]] 
flattened array = [1 2 3 4 5 6]

Answer 7

Change 1D array into 2D array without using Numpy.

l = [i for i in range(1,21)]
part = 3
new = []
start, end = 0, part


while end <= len(l):
    temp = []
    for i in range(start, end):
        temp.append(l[i])
    new.append(temp)
    start += part
    end += part
print("new values:  ", new)


# for uneven cases
temp = []
while start < len(l):
    temp.append(l[start])
    start += 1
if temp:
    new.append(temp)  # append the leftover chunk once
print("new values for uneven cases:   ", new)

How to convert 'false' to 0 and 'true' to 1 in Python

Question: How to convert 'false' to 0 and 'true' to 1 in Python

Is there a way to convert true of type unicode to 1 and false of type unicode to 0 (in Python)?

For example: x == 'true' and type(x) == unicode

I want x = 1

PS: I don't want to use if-else.


Answer 0

Use int() on a boolean test:

x = int(x == 'true')

int() turns the boolean into 1 or 0. Note that any value not equal to 'true' will result in 0 being returned.
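
For example, checking both branches:

>>> int('true' == 'true')   # equal, so True -> 1
1
>>> int('True' == 'true')   # the comparison is case-sensitive, so False -> 0
0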


Answer 1

If B is a Boolean array, write

B = B*1

(A bit code golfy.)


Answer 2

You can use x.astype('uint8') where x is your Boolean array.
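
A small sketch of that:

import numpy as np

x = np.array([True, False, True])
print(x.astype('uint8'))   # [1 0 1]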


Answer 3

Here's yet another solution to your problem:

def to_bool(s):
    return 1 - sum(map(ord, s)) % 2
    # return 1 - sum(s.encode('ascii')) % 2  # Alternative for Python 3

It works because the sum of the ASCII codes of 'true' is 448, which is even, while the sum of the ASCII codes of 'false' is 523 which is odd.


The funny thing about this solution is that its result is pretty random if the input is not one of 'true' or 'false'. Half of the time it will return 0, and the other half 1. The variant using encode will raise an encoding error if the input is not ASCII (thus increasing the undefined-ness of the behaviour).


Seriously, I believe the most readable, and fastest, solution is to use an if:

def to_bool(s):
    return 1 if s == 'true' else 0

See some microbenchmarks:

In [14]: def most_readable(s):
    ...:     return 1 if s == 'true' else 0

In [15]: def int_cast(s):
    ...:     return int(s == 'true')

In [16]: def str2bool(s):
    ...:     try:
    ...:         return ['false', 'true'].index(s)
    ...:     except (ValueError, AttributeError):
    ...:         raise ValueError()

In [17]: def str2bool2(s):
    ...:     try:
    ...:         return ('false', 'true').index(s)
    ...:     except (ValueError, AttributeError):
    ...:         raise ValueError()

In [18]: def to_bool(s):
    ...:     return 1 - sum(s.encode('ascii')) % 2

In [19]: %timeit most_readable('true')
10000000 loops, best of 3: 112 ns per loop

In [20]: %timeit most_readable('false')
10000000 loops, best of 3: 109 ns per loop

In [21]: %timeit int_cast('true')
1000000 loops, best of 3: 259 ns per loop

In [22]: %timeit int_cast('false')
1000000 loops, best of 3: 262 ns per loop

In [23]: %timeit str2bool('true')
1000000 loops, best of 3: 343 ns per loop

In [24]: %timeit str2bool('false')
1000000 loops, best of 3: 325 ns per loop

In [25]: %timeit str2bool2('true')
1000000 loops, best of 3: 295 ns per loop

In [26]: %timeit str2bool2('false')
1000000 loops, best of 3: 277 ns per loop

In [27]: %timeit to_bool('true')
1000000 loops, best of 3: 607 ns per loop

In [28]: %timeit to_bool('false')
1000000 loops, best of 3: 612 ns per loop

Notice how the if solution is at least 2.5 times faster than all the other solutions. It does not make sense to make avoiding ifs a requirement, except if this is some kind of homework (in which case you shouldn't have asked this in the first place).


Answer 4

If you need a general-purpose conversion from a string which per se is not a bool, you had better write a routine similar to the one depicted below. In keeping with the spirit of duck typing, I have not silently passed the error but converted it as appropriate for the current scenario.

>>> def str2bool(st):
        try:
            return ['false', 'true'].index(st.lower())
        except (ValueError, AttributeError):
            raise ValueError('no Valid Conversion Possible')


>>> str2bool('garbaze')

Traceback (most recent call last):
  File "<pyshell#106>", line 1, in <module>
    str2bool('garbaze')
  File "<pyshell#105>", line 5, in str2bool
    raise ValueError('no Valid Conversion Possible')
ValueError: no Valid Conversion Possible
>>> str2bool('false')
0
>>> str2bool('True')
1

Answer 5

bool to int: x = (x == 'true') + 0

Now the x contains 1 if x == 'true' else 0.

Note: x == 'true' returns a bool, which is then typecast to an int (1 if the bool is True, else 0) when added to 0.


Answer 6

Only with this (JavaScript):

const a = true;
const b = false;

console.log(+a); // 1
console.log(+b); // 0


Argparse: required argument 'y' if 'x' is present

Question: Argparse: required argument 'y' if 'x' is present

I have a requirement as follows:

./xyifier --prox --lport lport --rport rport

For the argument prox, I use action='store_true' to check if it is present or not. I do not require any of the arguments. But, if --prox is set I require rport and lport as well. Is there an easy way of doing this with argparse without writing custom conditional coding?

More Code:

non_int.add_argument('--prox', action='store_true', help='Flag to turn on proxy')
non_int.add_argument('--lport', type=int, help='Listen Port.')
non_int.add_argument('--rport', type=int, help='Proxy port.')

Answer 0

No, there isn’t any option in argparse to make mutually inclusive sets of options.

The simplest way to deal with this would be:

if args.prox and (args.lport is None or args.rport is None):
    parser.error("--prox requires --lport and --rport.")

Answer 1

You’re talking about having conditionally required arguments. Like @borntyping said you could check for the error and do parser.error(), or you could just apply a requirement related to --prox when you add a new argument.

A simple solution for your example could be:

import sys

non_int.add_argument('--prox', action='store_true', help='Flag to turn on proxy')
non_int.add_argument('--lport', required='--prox' in sys.argv, type=int)
non_int.add_argument('--rport', required='--prox' in sys.argv, type=int)

This way required receives either True or False depending on whether the user has used --prox. This also guarantees that --lport and --rport have independent behavior from each other.


Answer 2

How about using the parser.parse_known_args() method and then adding the --lport and --rport args as required args if --prox is present?

# just add --prox arg now
non_int = argparse.ArgumentParser(description="stackoverflow question", 
                                  usage="%(prog)s [-h] [--prox --lport port --rport port]")
non_int.add_argument('--prox', action='store_true', 
                     help='Flag to turn on proxy, requires additional args lport and rport')
opts, rem_args = non_int.parse_known_args()
if opts.prox:
    non_int.add_argument('--lport', required=True, type=int, help='Listen Port.')
    non_int.add_argument('--rport', required=True, type=int, help='Proxy port.')
    # use options and namespace from first parsing
    non_int.parse_args(rem_args, namespace = opts)

Also keep in mind that you can supply the namespace opts generated after the first parsing while parsing the remaining arguments the second time. That way, in the end, after all the parsing is done, you'll have a single namespace with all the options.

Drawbacks:

  • If --prox is not present the other two dependent options aren’t even present in the namespace. Although based on your use-case, if --prox is not present, what happens to the other options is irrelevant.
  • Need to modify usage message as parser doesn’t know full structure
  • --lport and --rport don’t show up in help message

Answer 3

Do you use lport when prox is not set? If not, why not make lport and rport arguments of prox? e.g.

parser.add_argument('--prox', nargs=2, type=int, help='Prox: listen and proxy ports')

That saves your users typing. It is just as easy to test if args.prox is not None: as if args.prox:.


回答 4

接受的答案对我很有用!由于所有代码都未经测试就被破坏,这就是我测试接受答案的方式。parser.error()不会引发argparse.ArgumentError错误,而是退出该过程。您必须进行测试SystemExit

与pytest

import pytest
from . import parse_arguments  # code that rasises parse.error()


def test_args_parsed_raises_error():
    with pytest.raises(SystemExit):
        parse_arguments(["argument that raises error"])

有单元测试

from unittest import TestCase
from . import parse_arguments  # code that rasises parse.error()

class TestArgs(TestCase):

    def test_args_parsed_raises_error():
        with self.assertRaises(SystemExit) as cm:
            parse_arguments(["argument that raises error"])

启发自:使用unittest测试argparse-退出错误

The accepted answer worked great for me! Since all code is broken without tests, here is how I tested the accepted answer. parser.error() does not raise an argparse.ArgumentError; instead it exits the process. You have to test for SystemExit.

with pytest

import pytest
from . import parse_arguments  # code that raises parser.error()


def test_args_parsed_raises_error():
    with pytest.raises(SystemExit):
        parse_arguments(["argument that raises error"])

with unittest

from unittest import TestCase
from . import parse_arguments  # code that raises parser.error()

class TestArgs(TestCase):

    def test_args_parsed_raises_error(self):
        with self.assertRaises(SystemExit) as cm:
            parse_arguments(["argument that raises error"])

Inspired by: Using unittest to test argparse – exit errors


Python – what exactly is sklearn.pipeline.Pipeline?

Question: What exactly is sklearn.pipeline.Pipeline?

I can’t figure out how the sklearn.pipeline.Pipeline works exactly.

There are a few explanations in the docs. For example, what do they mean by:

Pipeline of transforms with a final estimator.

To make my question clearer, what are steps? How do they work?

Edit

Thanks to the answers I can make my question clearer:

When I call pipeline and pass, as steps, two transformers and one estimator, e.g:

pipln = Pipeline([("trsfm1",transformer_1),
                  ("trsfm2",transformer_2),
                  ("estmtr",estimator)])

What happens when I call this?

pipln.fit()
OR
pipln.fit_transform()

I can’t figure out how an estimator can be a transformer and how a transformer can be fitted.


Answer 0

Transformer in scikit-learn – some class that has fit and transform methods, or a fit_transform method.

Predictor – some class that has fit and predict methods, or a fit_predict method.

Pipeline is just an abstract notion, it's not some existing ml algorithm. Often in ML tasks you need to perform a sequence of different transformations (find a set of features, generate new features, select only some good features) of the raw dataset before applying a final estimator.

Here is a good example of Pipeline usage. It gives you a single interface for all 3 steps of transformation and the resulting estimator. It encapsulates transformers and predictors inside, and now you can do something like:

    vect = CountVectorizer()
    tfidf = TfidfTransformer()
    clf = SGDClassifier()

    vX = vect.fit_transform(Xtrain)
    tfidfX = tfidf.fit_transform(vX)
    clf.fit(tfidfX, ytrain)  # ytrain holds the training labels

    # Now evaluate all steps on test set (transform only, no refitting)
    vX = vect.transform(Xtest)
    tfidfX = tfidf.transform(vX)
    predicted = clf.predict(tfidfX)

With just:

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
predicted = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
# Now evaluate all steps on test set
predicted = pipeline.predict(Xtest)

With pipelines you can easily perform a grid-search over a set of parameters for each step of this meta-estimator, as described in the link above. All steps except the last one must be transforms; the last step can be a transformer or a predictor.

Answer to the edit: When you call pipln.fit(), each transformer inside the pipeline is fitted on the outputs of the previous transformer (the first transformer is learned on the raw dataset). The last estimator may be a transformer or a predictor. You can call fit_transform() on the pipeline only if your last estimator is a transformer (one that implements fit_transform, or transform and fit separately); you can call fit_predict() or predict() on the pipeline only if your last estimator is a predictor. So you just can’t call fit_transform or transform on a pipeline whose last step is a predictor.
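
To make those mechanics concrete, here is a simplified sketch of what a pipeline does internally; it is illustrative only, not sklearn’s actual implementation:

class SimplePipeline(object):
    """Illustrative sketch only -- sklearn's real Pipeline does much more."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def fit(self, X, y=None):
        for name, step in self.steps[:-1]:
            # each transformer is fitted on the output of the previous one
            X = step.fit_transform(X, y)
        self.steps[-1][1].fit(X, y)  # final estimator (transformer or predictor)
        return self

    def predict(self, X):
        for name, step in self.steps[:-1]:
            X = step.transform(X)  # no refitting at prediction time
        return self.steps[-1][1].predict(X)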


回答 1

我认为M0rkHaV的想法是正确的。Scikit-learn的Pipeline类是一个有用的工具,用于把多个不同的转换器连同一个估算器封装到同一个对象中,这样您只需调用一次那些重要的方法(fit()、predict()等)。让我们分解两个主要部分:

  1. 转换器(Transformer)是同时实现fit()和transform()的类。您可能熟悉一些sklearn预处理工具,例如TfidfVectorizer和Binarizer。如果查看这些预处理工具的文档,就会发现它们实现了这两种方法。我觉得很酷的是,有些估算器也可以用作转换步骤,例如LinearSVC!

  2. 估算器(Estimator)是同时实现fit()和predict()的类。您会发现许多分类器和回归模型都实现了这两种方法,因此您可以轻松地测试许多不同的模型。也可以使用另一个转换器作为最终估算器(即,它不一定实现predict(),但肯定实现fit())。这只意味着您将无法调用predict()。

至于您的编辑:让我们来看一个基于文本的示例。使用LabelBinarizer,我们希望将标签列表转换为二进制值列表。

bin = LabelBinarizer()  #first we initialize

vec = ['cat', 'dog', 'dog', 'dog'] #we have our label list we want binarized

现在,当二值化器在某些数据上完成拟合后,它将拥有一个名为classes_的结构,其中包含该转换器“知道”的唯一类别。如果不先调用fit(),二值化器就不知道数据是什么样子,因此调用transform()没有任何意义。在尝试拟合数据之前打印类别列表,就可以验证这一点。

print bin.classes_  

尝试此操作时出现以下错误:

AttributeError: 'LabelBinarizer' object has no attribute 'classes_'

但是,当您在vec列表上拟合二值化器之后:

bin.fit(vec)

然后再试一次

print bin.classes_

我得到以下内容:

['cat' 'dog']


print bin.transform(vec)

现在,在对vec对象调用transform之后,我们得到以下信息:

[[0]
 [1]
 [1]
 [1]]

至于用作转换器的估算器,让我们以DecisionTree分类器作为特征提取器的示例。决策树之所以出色有很多原因,但就我们的目的而言,重要的是它们能够对其认为对预测有用的特征进行排名。当您对决策树调用transform()时,它将接收您的输入数据,并找出它认为最重要的特征。因此,您可以把它理解为将数据矩阵(n行×m列)转换为较小的矩阵(n行×k列),其中k列是决策树找到的k个最重要的特征。

I think that M0rkHaV has the right idea. Scikit-learn’s pipeline class is a useful tool for encapsulating multiple different transformers alongside an estimator into one object, so that you only have to call your important methods once (fit(), predict(), etc). Let’s break down the two major components:

  1. Transformers are classes that implement both fit() and transform(). You might be familiar with some of the sklearn preprocessing tools, like TfidfVectorizer and Binarizer. If you look at the docs for these preprocessing tools, you’ll see that they implement both of these methods. What I find pretty cool is that some estimators can also be used as transformation steps, e.g. LinearSVC!

  2. Estimators are classes that implement both fit() and predict(). You’ll find that many of the classifiers and regression models implement both these methods, and as such you can readily test many different models. It is possible to use another transformer as the final estimator (i.e., it doesn’t necessarily implement predict(), but definitely implements fit()). All this means is that you wouldn’t be able to call predict().

As for your edit: let’s go through a text-based example. Using LabelBinarizer, we want to turn a list of labels into a list of binary values.

bin = LabelBinarizer()  #first we initialize

vec = ['cat', 'dog', 'dog', 'dog'] #we have our label list we want binarized

Now, when the binarizer is fitted on some data, it will have a structure called classes_ that contains the unique classes that the transformer ‘knows’ about. Without calling fit() the binarizer has no idea what the data looks like, so calling transform() wouldn’t make any sense. You can see this by printing out the list of classes before trying to fit the data.

print bin.classes_  

I get the following error when trying this:

AttributeError: 'LabelBinarizer' object has no attribute 'classes_'

But when you fit the binarizer on the vec list:

bin.fit(vec)

and try again

print bin.classes_

I get the following:

['cat' 'dog']


print bin.transform(vec)

And now, after calling transform on the vec object, we get the following:

[[0]
 [1]
 [1]
 [1]]

As for estimators being used as transformers, let us use the DecisionTree classifier as an example of a feature-extractor. Decision Trees are great for a lot of reasons, but for our purposes, what’s important is that they have the ability to rank features that the tree found useful for predicting. When you call transform() on a Decision Tree, it will take your input data and find what it thinks are the most important features. So you can think of it transforming your data matrix (n rows by m columns) into a smaller matrix (n rows by k columns), where the k columns are the k most important features that the Decision Tree found.
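
Note that recent scikit-learn versions removed the transform() method from tree-based estimators; the same idea is expressed today by wrapping the model in SelectFromModel. A sketch of a tree as feature extractor (the iris dataset is used purely for illustration):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# the fitted tree ranks features; SelectFromModel keeps only the important ones
selector = SelectFromModel(DecisionTreeClassifier())
X_reduced = selector.fit_transform(X, y)  # n rows by k columns, k <= m
print(X.shape, X_reduced.shape)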


回答 2

ML算法通常处理表格数据。您可能需要在ML算法之前和之后对该数据进行预处理和后处理。管道是链接这些数据处理步骤的一种方式。

什么是ML管道,它们如何工作?

管道是转换数据的一系列步骤。它源自古老的“管道和过滤器”设计模式(例如,您可以想到带有管道符“|”或重定向运算符“>”的unix bash命令)。但是,管道是代码中的对象。因此,您可能为每个过滤器(也就是每个管道步骤)编写一个类,再用另一个类将这些步骤组合成最终的管道。有些管道可能将其他管道串联或并联组合、具有多个输入或输出,等等。我们喜欢将机器学习管道视为:

  • 管道和过滤器。管道的各个步骤处理数据,并管理可从数据中学习的内部状态。
  • 组合(Composites)。管道可以嵌套:例如,整个管道可以被视为另一个管道中的单个管道步骤。管道步骤不一定是管道,但根据定义,管道本身至少是一个管道步骤。
  • 有向无环图(DAG)。管道步骤的输出可以发送到许多其他步骤,然后生成的输出可以重新组合,依此类推。旁注:尽管管道是非循环的,但它们可以逐个处理多个条目,并且如果它们的状态发生变化(例如:每次使用fit_transform方法),则它们可以被视为随时间循环展开、保持其状态(类似RNN)。在将管道投入生产并用更多数据训练它们时,这是一种看待在线学习的有趣方式。

Scikit-Learn管道的方法

管道(或管道中的步骤)必须具有以下两种方法:

  • “fit”:学习数据并获取状态(例如:神经网络的权重就是这种状态)
  • “transform”(或“predict”):实际处理数据并生成预测。

也可以调用以下方法将两者链接起来:

  • “fit_transform”:拟合然后转换数据,但只需一次遍历;当这两种方法必须直接先后执行时,这允许潜在的代码优化。

sklearn.pipeline.Pipeline类的问题

Scikit-Learn的“管道和过滤器”设计模式非常漂亮。但是如何将其用于深度学习,AutoML和复杂的生产级管道?

Scikit-Learn于2007年首次发布,那是一个前深度学习时代。然而,它是最著名、采用最广泛的机器学习库之一,并且仍在增长。最重要的是,它使用“管道和过滤器”设计模式作为软件体系结构风格——这正是Scikit-Learn如此出色的原因,此外它还提供了现成可用的算法。但是,在执行以下操作时它存在很多问题,而我们在2020年应该已经能够做到这些:

  • 自动机器学习(AutoML),
  • 深度学习管道,
  • 更复杂的机器学习管道。

我们为那些Scikit-Learn问题找到的解决方案

当然,Scikit-Learn非常方便且结构精良。但是,它需要一次刷新。以下是我们借助Neuraxle给出的解决方案,让Scikit-Learn在现代计算项目中保持新鲜和可用!

通过Neuraxle提供的其他管道方法和功能

注意:如果管道的某个步骤不需要使用fit或transform方法之一,则它可以从NonFittableMixinNonTransformableMixin继承,以提供这些方法之一的默认实现而不执行任何操作。

首先,管道或其步骤还可以选择性地定义以下方法:

  • “setup”:将在其每个步骤上调用“setup”方法。例如,如果某个步骤包含TensorFlow、PyTorch或Keras神经网络,这些步骤可以在拟合之前,在“setup”方法中创建它们的神经图并将其注册到GPU。出于多种原因,不建议直接在步骤的构造函数中创建图,例如,在自动为您搜索最佳超参数的自动机器学习算法中,这些步骤可能会在使用不同超参数多次运行之前被复制。
  • “teardown”:与“setup”方法相反,它清理资源。

默认提供以下方法,用于管理超参数:

  • “get_hyperparams”:返回超参数的字典。如果您的管道包含更多管道(嵌套管道),则超参数的键将用双下划线“__”分隔符链接。
  • “set_hyperparams”:允许您以与获取时相同的格式设置新的超参数。
  • “get_hyperparams_space”:允许您获取超参数空间;如果您定义过,它就不会为空。因此,这里与“get_hyperparams”的唯一区别是,您得到的值是统计分布而不是精确值。例如,层数这个超参数可以是RandInt(1, 3),表示1到3层。您可以对此字典调用.rvs()随机选择一个值,并将其发送给“set_hyperparams”来尝试用它训练。
  • “set_hyperparams_space”:可用于使用与“get_hyperparams_space”中相同的超参数分布类来设置新空间。

有关建议的解决方案的更多信息,请阅读上面带有链接的大列表中的条目。

ML algorithms typically process tabular data. You may want to do preprocessing and post-processing of this data before and after your ML algorithm. A pipeline is a way to chain those data processing steps.

What are ML pipelines and how do they work?

A pipeline is a series of steps in which data is transformed. It comes from the old “pipe and filter” design pattern (for instance, you could think of unix bash commands with pipes “|” or redirect operators “>”). However, pipelines are objects in the code. Thus, you may have a class for each filter (a.k.a. each pipeline step), and then another class to combine those steps into the final pipeline. Some pipelines may combine other pipelines in series or in parallel, have multiple inputs or outputs, and so on. We like to view Machine Learning pipelines as:

  • Pipe and filters. The pipeline’s steps process data, and they manage their inner state which can be learned from the data.
  • Composites. Pipelines can be nested: for example a whole pipeline can be treated as a single pipeline step in another pipeline. A pipeline step is not necessarily a pipeline, but a pipeline is itself at least a pipeline step by definition.
  • Directed Acyclic Graphs (DAG). A pipeline step’s output may be sent to many other steps, and then the resulting outputs can be recombined, and so on. Side note: although pipelines are acyclic, they can process multiple items one by one, and if their state changes (e.g.: using the fit_transform method each time), then they can be viewed as recurrently unfolding through time, keeping their states (think of an RNN). That’s an interesting way to see pipelines for doing online learning when putting them in production and training them on more data.

Methods of a Scikit-Learn Pipeline

Pipelines (or steps in the pipeline) must have those two methods:

  • “fit” to learn on the data and acquire state (e.g.: a neural network’s neural weights are such state)
  • “transform” (or “predict”) to actually process the data and generate a prediction.

It’s also possible to call this method to chain both:

  • “fit_transform” to fit and then transform the data, but in one pass, which allows for potential code optimizations when the two methods must be done one after the other directly (see the sketch below).
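
To make that two-method contract concrete, a minimal hand-rolled step might look like this generic sketch (not tied to scikit-learn or Neuraxle):

class StandardizeStep(object):
    """A minimal 'pipe and filter' step: learns state in fit, uses it in transform."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / float(len(X))  # inner state learned from the data
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]

    def fit_transform(self, X, y=None):
        # fit then transform in one pass, leaving room for optimization
        return self.fit(X, y).transform(X)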

Problems of the sklearn.pipeline.Pipeline class

Scikit-Learn’s “pipe and filter” design pattern is simply beautiful. But how to use it for Deep Learning, AutoML, and complex production-level pipelines?

Scikit-Learn had its first release in 2007, in the pre-deep-learning era. However, it’s one of the best-known and most widely adopted machine learning libraries, and it is still growing. On top of all that, it uses the Pipe and Filter design pattern as a software architectural style – it’s what makes Scikit-Learn so fabulous, added to the fact it provides algorithms ready for use. However, it has massive issues when it comes to doing the following, which we should be able to do in 2020 already:

  • Automatic Machine Learning (AutoML),
  • Deep Learning Pipelines,
  • More complex Machine Learning pipelines.

Solutions that we’ve Found to Those Scikit-Learn’s Problems

For sure, Scikit-Learn is very convenient and well-built. However, it needs a refresh. Here are our solutions with Neuraxle to make Scikit-Learn fresh and usable within modern computing projects!

Additional pipeline methods and features offered through Neuraxle

Note: if a step of a pipeline doesn’t need to have one of the fit or transform methods, it could inherit from NonFittableMixin or NonTransformableMixin to be provided a default implementation of one of those methods to do nothing.

As a starter, it is possible for pipelines or their steps to also optionally define those methods:

  • “setup”, which will call the “setup” method on each of its steps. For instance, if a step contains a TensorFlow, PyTorch, or Keras neural network, the steps could create their neural graphs and register them to the GPU in the “setup” method before fit. It is discouraged to create the graphs directly in the constructors of the steps for several reasons, for example if the steps are copied before running many times with different hyperparameters within an Automatic Machine Learning algorithm that searches for the best hyperparameters for you.
  • “teardown”, which is the opposite of the “setup” method: it clears resources.

The following methods are provided by default to allow for managing hyperparameters:

  • “get_hyperparams” will return you a dictionary of the hyperparameters. If your pipeline contains more pipelines (nested pipelines), then the hyperparameter keys are chained with double-underscore “__” separators (illustrated after this list).
  • “set_hyperparams” will allow you to set new hyperparameters in the same format as when you get them.
  • “get_hyperparams_space” allows you to get the space of hyperparameters, which will not be empty if you defined one. So, the only difference with “get_hyperparams” here is that you’ll get statistical distributions as values instead of precise values. For instance, one hyperparameter for the number of layers could be a RandInt(1, 3), which means 1 to 3 layers. You can call .rvs() on this dict to pick a value randomly and send it to “set_hyperparams” to try training on it.
  • “set_hyperparams_space” can be used to set a new space using the same hyperparameter distribution classes as in “get_hyperparams_space”.
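
This double-underscore convention for nested hyperparameters is the same one scikit-learn itself uses; for instance, a grid search over the text pipeline from Answer 0 could look like the following sketch (the step names ‘tfidf’ and ‘clf’ are reused from that answer):

from sklearn.model_selection import GridSearchCV

param_grid = {
    'tfidf__use_idf': (True, False),  # <step name>__<parameter name>
    'clf__alpha': (1e-2, 1e-3),
}
grid_search = GridSearchCV(pipeline, param_grid, cv=5)
# grid_search.fit(Xtrain, ytrain) would then try every parameter combination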

For more info on our suggested solutions, read the entries in the big list with links above.


使用熊猫将字符串前缀添加到字符串列中的每个值

问题:使用熊猫将字符串前缀添加到字符串列中的每个值

我想(优雅地)在pandas数据帧某一列的每个值的开头附加一个字符串。我已经大致弄清楚该怎么做,目前正在使用:

df.ix[(df['col'] != False), 'col'] = 'str' + df.ix[(df['col'] != False), 'col']

这看起来相当不优雅——您是否知道其他方法(最好还能把该字符串添加到该列为0或NaN的行中)?

如果还不够清楚,我想把:

    col 
1     a
2     0

变成:

       col 
1     stra
2     str0

I would like to append a string to the start of each value in a said column of a pandas dataframe (elegantly). I already figured out how to kind-of do this and I am currently using:

df.ix[(df['col'] != False), 'col'] = 'str' + df.ix[(df['col'] != False), 'col']

This seems one hell of an inelegant thing to do – do you know any other way (which maybe also adds the character to rows where that column is 0 or NaN)?

In case this is yet unclear, I would like to turn:

    col 
1     a
2     0

into:

       col 
1     stra
2     str0

回答 0

df['col'] = 'str' + df['col'].astype(str)

例:

>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0
df['col'] = 'str' + df['col'].astype(str)

Example:

>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0
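
One caveat: astype(str) also converts missing values, so a NaN cell becomes the literal string ‘strnan’. If you would rather leave NaN untouched, a small sketch using a mask (np is numpy, imported as usual):

>>> import numpy as np
>>> df = pd.DataFrame({'col': ['a', 0, np.nan]})
>>> mask = df['col'].notnull()
>>> df.loc[mask, 'col'] = 'str' + df.loc[mask, 'col'].astype(str)
>>> df
    col
0  stra
1  str0
2   NaN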

回答 1

另外,您也可以结合format使用apply(或者更好,使用f字符串)。如果还想添加后缀或操作元素本身,我觉得这样可读性更高:

df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

这也会产生所需的输出:

    col
0  stra
1  str0

如果您使用的是Python 3.6+,则还可以使用f字符串:

df['col'] = df['col'].apply(lambda x: f"str{x}")

产生相同的输出。

f字符串版本几乎与@RomanPekar的解决方案(python 3.6.4)一样快:

df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

但是,使用format的确要慢得多:

%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As an alternative, you can also use an apply combined with format (or better with f-strings) which I find slightly more readable if one e.g. also wants to add a suffix or manipulate the element itself:

df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

which also yields the desired output:

    col
0  stra
1  str0

If you are using Python 3.6+, you can also use f-strings:

df['col'] = df['col'].apply(lambda x: f"str{x}")

yielding the same output.

The f-string version is almost as fast as @RomanPekar’s solution (python 3.6.4):

df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using format, however, is indeed far slower:

%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

回答 2

您可以使用pandas.Series.map:

df['col'].map('str{}'.format)

它将在您的所有值之前加上“str”这个词。

You can use pandas.Series.map :

df['col'].map('str{}'.format)

It will prepend the word “str” to all your values.
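
Note that map returns a new Series instead of modifying the column in place, so you still assign the result back; a quick sketch:

>>> df = pd.DataFrame({'col': ['a', 0]})
>>> df['col'] = df['col'].map('str{}'.format)
>>> df
    col
0  stra
1  str0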


回答 3

如果使用dtype=str加载表文件,
或将列类型转换为字符串df['a'] = df['a'].astype(str),
则可以使用以下方法:

df['a'] = 'col' + df['a'].str[:]

这种方法允许对df中的字符串进行前缀、追加和取子串操作。
适用于Pandas v0.23.4、v0.24.1,不了解更早的版本。

If you load your table file with dtype=str
or convert the column type to string df['a'] = df['a'].astype(str)
then you can use such an approach:

df['a'] = 'col' + df['a'].str[:]

This approach allows prepending, appending, and taking substrings of strings in the df.
Works on Pandas v0.23.4, v0.24.1. Don’t know about earlier versions.


回答 4

.loc的另一种解决方案:

df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

这没有上述解决方案快(每个循环慢1ms以上),但在需要条件更改时可能有用,例如:

mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)

Another solution with .loc:

df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

This is not as quick as solutions above (>1ms per loop slower) but may be useful in case you need conditional change, like:

mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)

如何在python抽象类中创建抽象属性

问题:如何在python抽象类中创建抽象属性

在以下代码中,我创建了一个基础抽象类Base。我希望所有从Base继承的类都提供name属性,因此我把这个属性设为@abstractmethod。

然后,我创建了Base的一个子类,名为Base_1,它旨在提供一些功能,但仍保持抽象。Base_1中没有name属性,但是python在实例化该类的对象时没有报错。该如何创建抽象属性?

from abc import ABCMeta, abstractmethod
class Base(object):
    __metaclass__ = ABCMeta
    def __init__(self, strDirConfig):
        self.strDirConfig = strDirConfig

    @abstractmethod
    def _doStuff(self, signals):
        pass

    @property    
    @abstractmethod
    def name(self):
        #this property will be supplied by the inheriting classes
        #individually
        pass


class Base_1(Base):
    __metaclass__ = ABCMeta
    # this class does not provide the name property, should raise an error
    def __init__(self, strDirConfig):
        super(Base_1, self).__init__(strDirConfig)

    def _doStuff(self, signals):
        print 'Base_1 does stuff'


class C(Base_1):
    @property
    def name(self):
        return 'class C'


if __name__ == '__main__':
    b1 = Base_1('abc')  

In the following code, I create a base abstract class Base. I want all the classes that inherit from Base to provide the name property, so I made this property an @abstractmethod.

Then I created a subclass of Base, called Base_1, which is meant to supply some functionality, but still remain abstract. There is no name property in Base_1, but nevertheless python instantiates an object of that class without an error. How does one create abstract properties?

from abc import ABCMeta, abstractmethod
class Base(object):
    __metaclass__ = ABCMeta
    def __init__(self, strDirConfig):
        self.strDirConfig = strDirConfig

    @abstractmethod
    def _doStuff(self, signals):
        pass

    @property    
    @abstractmethod
    def name(self):
        #this property will be supplied by the inheriting classes
        #individually
        pass


class Base_1(Base):
    __metaclass__ = ABCMeta
    # this class does not provide the name property, should raise an error
    def __init__(self, strDirConfig):
        super(Base_1, self).__init__(strDirConfig)

    def _doStuff(self, signals):
        print 'Base_1 does stuff'


class C(Base_1):
    @property
    def name(self):
        return 'class C'


if __name__ == '__main__':
    b1 = Base_1('abc')  

回答 0

从Python 3.3开始,一个错误得到了修复,这意味着property()装饰器应用于抽象方法时,现在可以被正确地标识为抽象。

注意:顺序很重要,您必须在@abstractmethod之前使用@property。

Python 3.3以上版本:(python docs):

class C(ABC):
    @property
    @abstractmethod
    def my_abstract_property(self):
        ...

Python 2:(python docs)

class C(object):
    __metaclass__ = ABCMeta  # Python 2 has no abc.ABC; use the metaclass directly

    @abstractproperty
    def my_abstract_property(self):
        pass

Since Python 3.3, a bug has been fixed, meaning the property() decorator is now correctly identified as abstract when applied to an abstract method.

Note: Order matters, you have to use @property before @abstractmethod

Python 3.3+: (python docs):

class C(ABC):
    @property
    @abstractmethod
    def my_abstract_property(self):
        ...

Python 2: (python docs)

class C(object):
    __metaclass__ = ABCMeta  # Python 2 has no abc.ABC; use the metaclass directly

    @abstractproperty
    def my_abstract_property(self):
        pass
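
A quick sketch of what this buys you on Python 3.4+ (where abc.ABC is available): a subclass that forgets to define the property can no longer be instantiated. The class names here are illustrative:

from abc import ABC, abstractmethod

class Base(ABC):
    @property
    @abstractmethod
    def name(self):
        ...

class Incomplete(Base):
    pass  # forgot to define 'name'

Incomplete()  # raises TypeError: Can't instantiate abstract class Incomplete ...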

回答 1

在Python 3.3之前,您不能嵌套@abstractmethod和@property。

使用@abstractproperty创建抽象属性(文档)。

from abc import ABCMeta, abstractmethod, abstractproperty

class Base(object):
    # ...
    @abstractproperty
    def name(self):
        pass

该代码现在引发正确的异常:

Traceback (most recent call last):
  File "foo.py", line 36, in <module>
    b1 = Base_1('abc')
TypeError: Can't instantiate abstract class Base_1 with abstract methods name

Until Python 3.3, you cannot nest @abstractmethod and @property.

Use @abstractproperty to create abstract properties (docs).

from abc import ABCMeta, abstractmethod, abstractproperty

class Base(object):
    # ...
    @abstractproperty
    def name(self):
        pass

The code now raises the correct exception:

Traceback (most recent call last):
  File "foo.py", line 36, in 
    b1 = Base_1('abc')  
TypeError: Can't instantiate abstract class Base_1 with abstract methods name

回答 2

基于上面James的回答:

import sys
from abc import abstractmethod, abstractproperty

def compatibleabstractproperty(func):
    if sys.version_info > (3, 3):
        return property(abstractmethod(func))
    else:
        return abstractproperty(func)

并将其用作装饰器:

@compatibleabstractproperty
def env(self):
    raise NotImplementedError()

Based on James’ answer above:

import sys
from abc import abstractmethod, abstractproperty

def compatibleabstractproperty(func):
    if sys.version_info > (3, 3):
        return property(abstractmethod(func))
    else:
        return abstractproperty(func)

and use it as a decorator:

@compatibleabstractproperty
def env(self):
    raise NotImplementedError()
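
For context, a sketch of the decorator applied inside a class that works on both Python 2 and 3; the ABC base-class trick and the Environment class name are illustrative, and compatibleabstractproperty is the function defined above:

from abc import ABCMeta

ABC = ABCMeta(str('ABC'), (object,), {})  # cross-version abstract base class

class Environment(ABC):
    @compatibleabstractproperty
    def env(self):
        raise NotImplementedError()

# Environment() now raises TypeError until a subclass defines 'env'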