Python 实用宝典

Question 1

I’m trying to use TDD (test-driven development) with pytest. pytest will not print to the console when I use print.

I am using pytest my_tests.py to run it.

The documentation seems to say that it should work by default: http://pytest.org/latest/capture.html

But:

import myapplication as tum

class TestBlogger:

    @classmethod
    def setup_class(self):
        self.user = "alice"
        self.b = tum.Blogger(self.user)
        print "This should be printed, but it won't be!"

    def test_inherit(self):
        assert issubclass(tum.Blogger, tum.Site)
        links = self.b.get_links(posts)
        print len(links)   # This won't print either.

Nothing gets printed to my standard output console (just the normal progress and how many many tests passed/failed).

And the script that I’m testing contains print:

class Blogger(Site):
    get_links(self, posts):
        print len(posts)   # It won't get printed in the test.

In unittest module, everything gets printed by default, which is exactly what I need. However, I wish to use pytest for other reasons.

Does anyone know how to make the print statements get shown?

Question 2

By default, py.test captures the result of standard out so that it can control how it prints it out. If it didn’t do this, it would spew out a lot of text without the context of what test printed that text.

However, if a test fails, it will include a section in the resulting report that shows what was printed to standard out in that particular test.

For example,

def test_good():
    for i in range(1000):
        print(i)

def test_bad():
    print('this should fail!')
    assert False

Results in the following output:

>>> py.test tmp.py
============================= test session starts ==============================
platform darwin -- Python 2.7.6 -- py-1.4.20 -- pytest-2.5.2
plugins: cache, cov, pep8, xdist
collected 2 items

tmp.py .F

=================================== FAILURES ===================================
___________________________________ test_bad ___________________________________

    def test_bad():
        print('this should fail!')
>       assert False
E       assert False

tmp.py:7: AssertionError
------------------------------- Captured stdout --------------------------------
this should fail!
====================== 1 failed, 1 passed in 0.04 seconds ======================

Note the Captured stdout section.

If you would like to see print statements as they are executed, you can pass the -s flag to py.test. However, note that this can sometimes be difficult to parse.

>>> py.test tmp.py -s
============================= test session starts ==============================
platform darwin -- Python 2.7.6 -- py-1.4.20 -- pytest-2.5.2
plugins: cache, cov, pep8, xdist
collected 2 items

tmp.py 0
1
2
3
... and so on ...
997
998
999
.this should fail!
F

=================================== FAILURES ===================================
___________________________________ test_bad ___________________________________

    def test_bad():
        print('this should fail!')
>       assert False
E       assert False

tmp.py:7: AssertionError
====================== 1 failed, 1 passed in 0.02 seconds ======================

Question 3

Using -s option will print output of all functions, which may be too much.

If you need particular output, the doc page you mentioned offers few suggestions:

Insert assert False, "dumb assert to make PyTest print my stuff" at the end of your function, and you will see your output due to failed test.
You have special object passed to you by PyTest, and you can write the output into a file to inspect it later, like
```
def test_good1(capsys):
    for i in range(5):
        print i
    out, err = capsys.readouterr()
    open("err.txt", "w").write(err)
    open("out.txt", "w").write(out)
```
You can open the out and err files in a separate tab and let editor automatically refresh it for you, or do a simple py.test; cat out.txt shell command to run your test.

That is rather hackish way to do stuff, but may be it is the stuff you need: after all, TDD means you mess with stuff and leave it clean and silent when it’s ready :-).

Question 4

Short Answer

Use the -s option:

pytest -s

Detailed answer

From the docs:

During test execution any output sent to stdout and stderr is captured. If a test or a setup method fails its according captured output will usually be shown along with the failure traceback.

pytest has the option --capture=method in which method is per-test capturing method, and could be one of the following: fd, sys or no. pytest also has the option -s which is a shortcut for --capture=no, and this is the option that will allow you to see your print statements in the console.

pytest --capture=no     # show print statements in console
pytest -s               # equivalent to previous command

Setting capturing methods or disabling capturing

There are two ways in which pytest can perform capturing:

file descriptor (FD) level capturing (default): All writes going to the operating system file descriptors 1 and 2 will be captured.
sys level capturing: Only writes to Python files sys.stdout and sys.stderr will be captured. No capturing of writes to filedescriptors is performed.

pytest -s            # disable all capturing
pytest --capture=sys # replace sys.stdout/stderr with in-mem files
pytest --capture=fd  # also point filedescriptors 1 and 2 to temp file

Question 5

I needed to print important warning about skipped tests exactly when PyTest muted literally everything.

I didn’t want to fail a test to send a signal, so I did a hack as follow:

def test_2_YellAboutBrokenAndMutedTests():
    import atexit
    def report():
        print C_patch.tidy_text("""
In silent mode PyTest breaks low level stream structure I work with, so
I cannot test if my functionality work fine. I skipped corresponding tests.
Run `py.test -s` to make sure everything is tested.""")
    if sys.stdout != sys.__stdout__:
        atexit.register(report)

The atexit module allows me to print stuff after PyTest released the output streams. The output looks as follow:

============================= test session starts ==============================
platform linux2 -- Python 2.7.3, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /media/Storage/henaro/smyth/Alchemist2-git/sources/C_patch, inifile: 
collected 15 items 

test_C_patch.py .....ssss....s.

===================== 10 passed, 5 skipped in 0.15 seconds =====================
In silent mode PyTest breaks low level stream structure I work with, so
I cannot test if my functionality work fine. I skipped corresponding tests.
Run `py.test -s` to make sure everything is tested.
~/.../sources/C_patch$

Message is printed even when PyTest is in silent mode, and is not printed if you run stuff with py.test -s, so everything is tested nicely already.

Question 6

According to the pytest docs, pytest --capture=sys should work. If you want to capture standard out inside a test, refer to the capsys fixture.

Question 7

I originally came in here to find how to make PyTest print in VSCode’s console while running/debugging the unit test from there. This can be done with the following launch.json configuration. Given .venv the virtual environment folder.

    "version": "0.2.0",
    "configurations": [
        {
            "name": "PyTest",
            "type": "python",
            "request": "launch",
            "stopOnEntry": false,
            "pythonPath": "${config:python.pythonPath}",
            "module": "pytest",
            "args": [
                "-sv"
            ],
            "cwd": "${workspaceRoot}",
            "env": {},
            "envFile": "${workspaceRoot}/.venv",
            "debugOptions": [
                "WaitOnAbnormalExit",
                "WaitOnNormalExit",
                "RedirectOutput"
            ]
        }
    ]
}

Question 8

I have pandas dataframe df1 and df2 (df1 is vanila dataframe, df2 is indexed by ‘STK_ID’ & ‘RPT_Date’) :

>>> df1
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

>>> df2
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20060331    3.69   5.975       NaN      5.975   2.591
       20060630    9.14  10.143       NaN     10.143   4.363
       20060930    9.49  13.854       NaN     13.854   5.901
       20061231   15.84  19.262       NaN     19.262   8.407
       20070331   17.00   6.803       NaN      6.803   2.815
       20070630   26.31  12.940       NaN     12.940   5.418
       20070930   39.12  19.977       NaN     19.977   8.452
       20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

I can get the last 3 rows of df2 by:

>>> df2.ix[-3:]
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

while df1.ix[-3:] give all the rows:

>>> df1.ix[-3:]
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

Why ? How to get the last 3 rows of df1 (dataframe without index) ? Pandas 0.10.1

Question 9

Don’t forget DataFrame.tail! e.g. df1.tail(10)

Question 10

This is because of using integer indices (ix selects those by label over -3 rather than position, and this is by design: see integer indexing in pandas “gotchas”*).

*In newer versions of pandas prefer loc or iloc to remove the ambiguity of ix as position or label:

df.iloc[-3:]

see the docs.

As Wes points out, in this specific case you should just use tail!

Question 11

How to get the last N rows of a pandas DataFrame?

If you are slicing by position, __getitem__ (i.e., slicing with[]) works well, and is the most succinct solution I’ve found for this problem.

pd.__version__
# '0.24.2'

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df

   A  B
0  a  1
1  a  2
2  a  3
3  b  4
4  b  5
5  b  6
6  b  7
7  c  8

df[-3:]

   A  B
5  b  6
6  b  7
7  c  8

This is the same as calling df.iloc[-3:], for instance (iloc internally delegates to __getitem__).

As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:

df.groupby('A').tail(2)

   A  B
1  a  2
2  a  3
5  b  6
6  b  7
7  c  8

Question 12

This piece of code is giving me an error unhashable type: dict can anyone explain me what is the solution

negids = movie_reviews.fileids('neg')
def word_feats(words):
    return dict([(word, True) for word in words])

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
stopset = set(stopwords.words('english'))

def stopword_filtered_word_feats(words):
    return dict([(word, True) for word in words if word not in stopset])

result=stopword_filtered_word_feats(negfeats)

Question 13

You’re trying to use a dict as a key to another dict or in a set. That does not work because the keys have to be hashable. As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible). So this does not work:

>>> dict_key = {"a": "b"}
>>> some_dict[dict_key] = True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

To use a dict as a key you need to turn it into something that may be hashed first. If the dict you wish to use as key consists of only immutable values, you can create a hashable representation of it like this:

>>> key = frozenset(dict_key.items())

Now you may use key as a key in a dict or set:

>>> some_dict[key] = True
>>> some_dict
{frozenset([('a', 'b')]): True}

Of course you need to repeat the exercise whenever you want to look up something using a dict:

>>> some_dict[dict_key]                     # Doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> some_dict[frozenset(dict_key.items())]  # Works
True

If the dict you wish to use as key has values that are themselves dicts and/or lists, you need to recursively “freeze” the prospective key. Here’s a starting point:

def freeze(d):
    if isinstance(d, dict):
        return frozenset((key, freeze(value)) for key, value in d.items())
    elif isinstance(d, list):
        return tuple(freeze(value) for value in d)
    return d

Question 14

A possible solution might be to use the JSON dumps() method, so you can convert the dictionary to a string —

import json

a={"a":10, "b":20}
b={"b":20, "a":10}
c = [json.dumps(a), json.dumps(b)]


set(c)
json.dumps(a) in c

Output –

set(['{"a": 10, "b": 20}'])
True

Question 15

I am attempting to filter users by a custom field in each users profile called profile. This field is called level and is an integer between 0-3.

If I filter using equals, I get a list of users with the chosen level as expected:

user_list = User.objects.filter(userprofile__level = 0)

When I try to filter using less than:

user_list = User.objects.filter(userprofile__level < 3)

I get the error:

global name ‘userprofile__level’ is not defined

Is there a way to filter by < or >, or am I barking up the wrong tree.

Question 16

Less than or equal:

User.objects.filter(userprofile__level__lte=0)

Greater than or equal:

User.objects.filter(userprofile__level__gte=0)

Likewise, lt for less than and gt for greater than. You can find them all in the documentation.

Question 17

Suppose I have the following Button made with Tkinter in Python:

import Tkinter as Tk
win = Tk.Toplevel()
frame = Tk.Frame(master=win).grid(row=1, column=1)
button = Tk.Button(master=frame, text='press', command=action)

The method action is called when I press the button, but what if I wanted to pass some arguments to the method action?

I have tried with the following code:

button = Tk.Button(master=frame, text='press', command=action(someNumber))

This just invokes the method immediately, and pressing the button does nothing.

Question 18

I personally prefer to use lambdas in such a scenario, because imo it’s clearer and simpler and also doesn’t force you to write lots of wrapper methods if you don’t have control over the called method, but that’s certainly a matter of taste.

That’s how you’d do it with a lambda (note there’s also some implementation of currying in the functional module, so you can use that too):

button = Tk.Button(master=frame, text='press', command= lambda: action(someNumber))

Question 19

This can also be done by using partial from the standard library functools, like this:

from functools import partial
#(...)
action_with_arg = partial(action, arg)
button = Tk.Button(master=frame, text='press', command=action_with_arg)

Question 20

Example GUI:

Let’s say I have the GUI:

import tkinter as tk

root = tk.Tk()

btn = tk.Button(root, text="Press")
btn.pack()

root.mainloop()

What Happens When a Button Is Pressed

See that when btn is pressed it calls its own function which is very similar to button_press_handle in the following example:

def button_press_handle(callback=None):
    if callback:
        callback() # Where exactly the method assigned to btn['command'] is being callled

with:

button_press_handle(btn['command'])

You can simply think that command option should be set as, the reference to the method we want to be called, similar to callback in button_press_handle.

Calling a Method(Callback) When the Button is Pressed

Without arguments

So if I wanted to print something when the button is pressed I would need to set:

btn['command'] = print # default to print is new line

Pay close attention to the lack of () with the print method which is omitted in the meaning that: “This is the method’s name which I want you to call when pressed but don’t call it just this very instant.” However, I didn’t pass any arguments for the print so it printed whatever it prints when called without arguments.

With Argument(s)

Now If I wanted to also pass arguments to the method I want to be called when the button is pressed I could make use of the anonymous functions, which can be created with lambda statement, in this case for print built-in method, like the following:

btn['command'] = lambda arg1="Hello", arg2=" ", arg3="World!" : print(arg1 + arg2 + arg3)

Calling Multiple Methods when the Button Is Pressed

Without Arguments

You can also achieve that using lambda statement but it is considered bad practice and thus I won’t include it here. The good practice is to define a separate method, multiple_methods, that calls the methods wanted and then set it as the callback to the button press:

def multiple_methods():
    print("Vicariously") # the first inner callback
    print("I") # another inner callback

With Argument(s)

In order to pass argument(s) to method that calls other methods, again make use of lambda statement, but first:

def multiple_methods(*args, **kwargs):
    print(args[0]) # the first inner callback
    print(kwargs['opt1']) # another inner callback

and then set:

btn['command'] = lambda arg="live", kw="as the" : a_new_method(arg, opt1=kw)

Returning Object(s) From the Callback

Also further note that callback can’t really return because it’s only called inside button_press_handle with callback() as opposed to return callback(). It does return but not anywhere outside that function. Thus you should rather modify object(s) that are accessible in the current scope.

Complete Example with global Object Modification(s)

Below example will call a method that changes btn‘s text each time the button is pressed:

import tkinter as tk

i = 0
def text_mod():
    global i, btn           # btn can be omitted but not sure if should be
    txt = ("Vicariously", "I", "live", "as", "the", "whole", "world", "dies")
    btn['text'] = txt[i]    # the global object that is modified
    i = (i + 1) % len(txt)  # another global object that gets modified

root = tk.Tk()

btn = tk.Button(root, text="My Button")
btn['command'] = text_mod

btn.pack(fill='both', expand=True)

root.mainloop()

Mirror

Question 21

Python’s ability to provide default values for function arguments gives us a way out.

def fce(x=myX, y=myY):
    myFunction(x,y)
button = Tk.Button(mainWin, text='press', command=fce)

See: http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/extra-args.html

For more buttons you can create a function which returns a function:

def fce(myX, myY):
    def wrapper(x=myX, y=myY):
        pass
        pass
        pass
        return x+y
    return wrapper

button1 = Tk.Button(mainWin, text='press 1', command=fce(1,2))
button2 = Tk.Button(mainWin, text='press 2', command=fce(3,4))
button3 = Tk.Button(mainWin, text='press 3', command=fce(9,8))

Question 22

Building on Matt Thompsons answer : a class can be made callable so it can be used instead of a function:

import tkinter as tk

class Callback:
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs
    def __call__(self):
        self.func(*self.args, **self.kwargs)

def default_callback(t):
    print("Button '{}' pressed.".format(t))

root = tk.Tk()

buttons = ["A", "B", "C"]

for i, b in enumerate(buttons):
    tk.Button(root, text=b, command=Callback(default_callback, b)).grid(row=i, column=0)

tk.mainloop()

Question 23

The reason it invokes the method immediately and pressing the button does nothing is that action(somenumber) is evaluated and its return value is attributed as the command for the button. So if action prints something to tell you it has run and returns None, you just run action to evaluate its return value and given None as the command for the button.

To have buttons to call functions with different arguments you can use global variables, although I can’t recommend it:

import Tkinter as Tk

frame = Tk.Frame(width=5, height=2, bd=1, relief=Tk.SUNKEN)
frame.grid(row=2,column=2)
frame.pack(fill=Tk.X, padx=5, pady=5)
def action():
    global output
    global variable
    output.insert(Tk.END,variable.get())
button = Tk.Button(master=frame, text='press', command=action)
button.pack()
variable = Tk.Entry(master=frame)
variable.pack()
output = Tk.Text(master=frame)
output.pack()

if __name__ == '__main__':
    Tk.mainloop()

What I would do is make a class whose objects would contain every variable required and methods to change those as needed:

import Tkinter as Tk
class Window:
    def __init__(self):
        self.frame = Tk.Frame(width=5, height=2, bd=1, relief=Tk.SUNKEN)
        self.frame.grid(row=2,column=2)
        self.frame.pack(fill=Tk.X, padx=5, pady=5)

        self.button = Tk.Button(master=self.frame, text='press', command=self.action)
        self.button.pack()

        self.variable = Tk.Entry(master=self.frame)
        self.variable.pack()

        self.output = Tk.Text(master=self.frame)
        self.output.pack()

    def action(self):
        self.output.insert(Tk.END,self.variable.get())

if __name__ == '__main__':
    window = Window()
    Tk.mainloop()

Question 24

button = Tk.Button(master=frame, text='press', command=lambda: action(someNumber))

I believe should fix this

Question 25

The best thing to do is use lambda as follows:

button = Tk.Button(master=frame, text='press', command=lambda: action(someNumber))

Question 26

I am extremely late, but here is a very simple way of accomplishing it.

import tkinter as tk
def function1(param1, param2):
    print(str(param1) + str(param2))

var1 = "Hello "
var2 = "World!"
def function2():
    function1(var1, var2)

root = tk.Tk()

myButton = tk.Button(root, text="Button", command=function2)
root.mainloop()

You simply wrap the function you want to use in another function and call the second function on the button press.

Question 27

Lambdas are all well and good, but you can also try this (which works in a for loop btw):

root = Tk()

dct = {"1": [*args], "2": [*args]}
def keypress(event):
    *args = dct[event.char]
    for arg in args:
        pass
for i in range(10):
    root.bind(str(i), keypress)

This works because when the binding is set, a key press passes the event as an argument. You can then call attributes off the event like event.char to get “1” or “UP” ect. If you need an argument or multiple arguments other than the event attributes. just create a dictionary to store them.

Question 28

I have encountered this problem before, too. You can just use lambda:

button = Tk.Button(master=frame, text='press',command=lambda: action(someNumber))

Question 29

Use a lambda to pass the entry data to the command function if you have more actions to carry out, like this (I’ve tried to make it generic, so just adapt):

event1 = Entry(master)
button1 = Button(master, text="OK", command=lambda: test_event(event1.get()))

def test_event(event_text):
    if not event_text:
        print("Nothing entered")
    else:
        print(str(event_text))
        #  do stuff

This will pass the information in the event to the button function. There may be more Pythonesque ways of writing this, but it works for me.

Question 30

JasonPy – a few things…

if you stick a button in a loop it will be created over and over and over again… which is probably not what you want. (maybe it is)…

The reason it always gets the last index is lambda events run when you click them – not when the program starts. I’m not sure 100% what you are doing but maybe try storing the value when it’s made then call it later with the lambda button.

eg: (don’t use this code, just an example)

for entry in stuff_that_is_happening:
    value_store[entry] = stuff_that_is_happening

then you can say….

button... command: lambda: value_store[1]

hope this helps!

Question 31

One simple way would be to configure button with lambda like the following syntax:

button['command'] = lambda arg1 = local_var1, arg2 = local_var2 : function(arg1, arg2)

Question 32

For posterity: you can also use classes to achieve something similar. For instance:

class Function_Wrapper():
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
    def func(self):
        return self.x + self.y + self.z # execute function

Button can then be simply created by:

instance1 = Function_Wrapper(x, y, z)
button1  = Button(master, text = "press", command = instance1.func)

This approach also allows you to change the function arguments by i.e. setting instance1.x = 3.

Question 33

You need to use lambda:

button = Tk.Button(master=frame, text='press', command=lambda: action(someNumber))

Question 34

Use lambda

import tkinter as tk

root = tk.Tk()
def go(text):
    print(text)

b = tk.Button(root, text="Click", command=lambda: go("hello"))
b.pack()
root.mainloop()

output:

hello

Question 35

I am trying to figure out how to append multiple values to a list in Python. I know there are few methods to do so, such as manually input the values, or put the append operation in a for loop, or the append and extend functions.

However, I wonder if there is a more neat way to do so? Maybe a certain package or function?

Question 36

You can use the sequence method list.extend to extend the list by multiple values from any kind of iterable, being it another list or any other thing that provides a sequence of values.

>>> lst = [1, 2]
>>> lst.append(3)
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]

>>> lst.extend([5, 6, 7])
>>> lst.extend((8, 9, 10))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

>>> lst.extend(range(11, 14))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

So you can use list.append() to append a single value, and list.extend() to append multiple values.

Question 37

Other than the append function, if by “multiple values” you mean another list, you can simply concatenate them like so.

>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a + b
[1, 2, 3, 4, 5, 6]

Question 38

If you take a look at the official docs, you’ll see right below append, extend. That’s what your looking for.

There’s also itertools.chain if you are more interested in efficient iteration than ending up with a fully populated data structure.

Question 39

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The problem is the id series has missing/empty values.

When I try to cast the id column to integer while reading the .csv, I get:

df= pd.read_csv("data.csv", dtype={'id': int}) 
error: Integer column has NA values

Alternatively, I tried to convert the column type after reading as below, but this time I get:

df= pd.read_csv("data.csv") 
df[['id']] = df[['id']].astype(int)
error: Cannot convert NA to integer

How can I tackle this?

Question 40

The lack of NaN rep in integer columns is a pandas “gotcha”.

The usual workaround is to simply use floats.

Question 41

In version 0.24.+ pandas has gained the ability to hold integer dtypes with missing values.

Nullable Integer Data Type.

Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension types implemented within pandas. It is not the default dtype for integers, and will not be inferred; you must explicitly pass the dtype into array() or Series:

arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)

0      1
1      2
2    NaN
dtype: Int64

For convert column to nullable integers use:

df['myCol'] = df['myCol'].astype('Int64')

Question 42

My use case is munging data prior to loading into a DB table:

df[col] = df[col].fillna(-1)
df[col] = df[col].astype(int)
df[col] = df[col].astype(str)
df[col] = df[col].replace('-1', np.nan)

Remove NaNs, convert to int, convert to str and then reinsert NANs.

It’s not pretty but it gets the job done!

Question 43

It is now possible to create a pandas column containing NaNs as dtype int, since it is now officially added on pandas 0.24.0

pandas 0.24.x release notes Quote: “Pandas has gained the ability to hold integer dtypes with missing values

Question 44

If you absolutely want to combine integers and NaNs in a column, you can use the ‘object’ data type:

df['col'] = (
    df['col'].fillna(0)
    .astype(int)
    .astype(object)
    .where(df['col'].notnull())
)

This will replace NaNs with an integer (doesn’t matter which), convert to int, convert to object and finally reinsert NaNs.

Question 45

If you can modify your stored data, use a sentinel value for missing id. A common use case, inferred by the column name, being that id is an integer, strictly greater than zero, you could use 0 as a sentinel value so that you can write

if row['id']:
   regular_process(row)
else:
   special_process(row)

Question 46

You could use .dropna() if it is OK to drop the rows with the NaN values.

df = df.dropna(subset=['id'])

Alternatively, use .fillna() and .astype() to replace the NaN with values and convert them to int.

I ran into this problem when processing a CSV file with large integers, while some of them were missing (NaN). Using float as the type was not an option, because I might loose the precision.

My solution was to use str as the intermediate type. Then you can convert the string to int as you please later in the code. I replaced NaN with 0, but you could choose any value.

df = pd.read_csv(filename, dtype={'id':str})
df["id"] = df["id"].fillna("0").astype(int)

For the illustration, here is an example how floats may loose the precision:

s = "12345678901234567890"
f = float(s)
i = int(f)
i2 = int(s)
print (f, i, i2)

And the output is:

1.2345678901234567e+19 12345678901234567168 12345678901234567890

Question 47

Most solutions here tell you how to use a placeholder integer to represent nulls. That approach isn’t helpful if you’re uncertain that integer won’t show up in your source data though. My method with will format floats without their decimal values and convert nulls to None’s. The result is an object datatype that will look like an integer field with null values when loaded into a CSV.

keep_df[col] = keep_df[col].apply(lambda x: None if pandas.isnull(x) else '{0:.0f}'.format(pandas.to_numeric(x)))

Question 48

I ran into this issue working with pyspark. As this is a python frontend for code running on a jvm, it requires type safety and using float instead of int is not an option. I worked around the issue by wrapping the pandas pd.read_csv in a function that will fill user-defined columns with user-defined fill values before casting them to the required type. Here is what I ended up using:

def custom_read_csv(file_path, custom_dtype = None, fill_values = None, **kwargs):
    if custom_dtype is None:
        return pd.read_csv(file_path, **kwargs)
    else:
        assert 'dtype' not in kwargs.keys()
        df = pd.read_csv(file_path, dtype = {}, **kwargs)
        for col, typ in custom_dtype.items():
            if fill_values is None or col not in fill_values.keys():
                fill_val = -1
            else:
                fill_val = fill_values[col]
            df[col] = df[col].fillna(fill_val).astype(typ)
    return df

Question 49

First remove the rows which contain NaN. Then do Integer conversion on remaining rows. At Last insert the removed rows again. Hope it will work

Question 50

import pandas as pd

df= pd.read_csv("data.csv")
df['id'] = pd.to_numeric(df['id'])

Question 51

Assuming your DateColumn formatted 3312018.0 should be converted to 03/31/2018 as a string. And, some records are missing or 0.

df['DateColumn'] = df['DateColumn'].astype(int)
df['DateColumn'] = df['DateColumn'].astype(str)
df['DateColumn'] = df['DateColumn'].apply(lambda x: x.zfill(8))
df.loc[df['DateColumn'] == '00000000','DateColumn'] = '01011980'
df['DateColumn'] = pd.to_datetime(df['DateColumn'], format="%m%d%Y")
df['DateColumn'] = df['DateColumn'].apply(lambda x: x.strftime('%m/%d/%Y'))

Question 52

Thanks to some great folks on SO, I discovered the possibilities offered by collections.defaultdict, notably in readability and speed. I have put them to use with success.

Now I would like to implement three levels of dictionaries, the two top ones being defaultdict and the lowest one being int. I don’t find the appropriate way to do this. Here is my attempt:

from collections import defaultdict
d = defaultdict(defaultdict)
a = [("key1", {"a1":22, "a2":33}),
     ("key2", {"a1":32, "a2":55}),
     ("key3", {"a1":43, "a2":44})]
for i in a:
    d[i[0]] = i[1]

Now this works, but the following, which is the desired behavior, doesn’t:

d["key4"]["a1"] + 1

I suspect that I should have declared somewhere that the second level defaultdict is of type int, but I didn’t find where or how to do so.

The reason I am using defaultdict in the first place is to avoid having to initialize the dictionary for each new key.

Any more elegant suggestion?

Thanks pythoneers!

Question 53

Use:

from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))

This will create a new defaultdict(int) whenever a new key is accessed in d.

Question 54

Another way to make a pickleable, nested defaultdict is to use a partial object instead of a lambda:

from functools import partial
...
d = defaultdict(partial(defaultdict, int))

This will work because the defaultdict class is globally accessible at the module level:

“You can’t pickle a partial object unless the function [or in this case, class] it wraps is globally accessible … under its __name__ (within its __module__)” — Pickling wrapped partial functions

Question 55

Look at nosklo’s answer here for a more general solution.

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

Output:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

Question 56

As per @rschwieb’s request for D['key'] += 1, we can expand on previous by overriding addition by defining __add__ method, to make this behave more like a collections.Counter()

First __missing__ will be called to create a new empty value, which will be passed into __add__. We test the value, counting on empty values to be False.

See emulating numeric types for more information on overriding.

from numbers import Number


class autovivify(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def __add__(self, x):
        """ override addition for numeric types when self is empty """
        if not self and isinstance(x, Number):
            return x
        raise ValueError

    def __sub__(self, x):
        if not self and isinstance(x, Number):
            return -1 * x
        raise ValueError

Examples:

>>> import autovivify
>>> a = autovivify.autovivify()
>>> a
{}
>>> a[2]
{}
>>> a
{2: {}}
>>> a[4] += 1
>>> a[5][3][2] -= 1
>>> a
{2: {}, 4: 1, 5: {3: {2: -1}}}

Rather than checking argument is a Number (very non-python, amirite!) we could just provide a default 0 value and then attempt the operation:

class av2(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def __add__(self, x):
        """ override addition when self is empty """
        if not self:
            return 0 + x
        raise ValueError

    def __sub__(self, x):
        """ override subtraction when self is empty """
        if not self:
            return 0 - x
        raise ValueError

Question 57

Late to the party, but for arbitrary depth I just found myself doing something like this:

from collections import defaultdict

class DeepDict(defaultdict):
    def __call__(self):
        return DeepDict(self.default_factory)

The trick here is basically to make the DeepDict instance itself a valid factory for constructing missing values. Now we can do things like

dd = DeepDict(DeepDict(list))
dd[1][2].extend([3,4])
sum(dd[1][2])  # 7

ddd = DeepDict(DeepDict(DeepDict(list)))
ddd[1][2][3].extend([4,5])
sum(ddd[1][2][3])  # 9

Question 58

def _sub_getitem(self, k):
    try:
        # sub.__class__.__bases__[0]
        real_val = self.__class__.mro()[-2].__getitem__(self, k)
        val = '' if real_val is None else real_val
    except Exception:
        val = ''
        real_val = None
    # isinstance(Avoid,dict)也是true，会一直递归死
    if type(val) in (dict, list, str, tuple):
        val = type('Avoid', (type(val),), {'__getitem__': _sub_getitem, 'pop': _sub_pop})(val)
        # 重新赋值当前字典键为返回值，当对其赋值时可回溯
        if all([real_val is not None, isinstance(self, (dict, list)), type(k) is not slice]):
            self[k] = val
    return val


def _sub_pop(self, k=-1):
    try:
        val = self.__class__.mro()[-2].pop(self, k)
        val = '' if val is None else val
    except Exception:
        val = ''
    if type(val) in (dict, list, str, tuple):
        val = type('Avoid', (type(val),), {'__getitem__': _sub_getitem, 'pop': _sub_pop})(val)
    return val


class DefaultDict(dict):
    def __getitem__(self, k):
        return _sub_getitem(self, k)

    def pop(self, k):
        return _sub_pop(self, k)

In[8]: d=DefaultDict()
In[9]: d['a']['b']['c']['d']
Out[9]: ''
In[10]: d['a']="ggggggg"
In[11]: d['a']
Out[11]: 'ggggggg'
In[12]: d['a']['pp']
Out[12]: ''

No errors again. No matter how many levels nested. pop no error also

dd=DefaultDict({“1”:333333})

Question 59

I’m using Python, and would like to insert a string into a text file without deleting or copying the file. How can I do that?

Question 60

Unfortunately there is no way to insert into the middle of a file without re-writing it. As previous posters have indicated, you can append to a file or overwrite part of it using seek but if you want to add stuff at the beginning or the middle, you’ll have to rewrite it.

This is an operating system thing, not a Python thing. It is the same in all languages.

What I usually do is read from the file, make the modifications and write it out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory because the file may be too large for that. Once the temporary file is completed, I rename it the same as the original file.

This is a good, safe way to do it because if the file write crashes or aborts for any reason, you still have your untouched original file.

Question 61

我无法终生让python的相对导入工作。我创建了一个不起作用的简单示例：

目录结构为：

/__init__.py
/start.py
/parent.py
/sub/__init__.py
/sub/relative.py

/start.py 仅包含： import sub.relative

/sub/relative.py 仅包含 from .. import parent

所有其他文件均为空白。

在命令行上执行以下命令时：

$ cd /
$ python start.py

我得到：

Traceback (most recent call last):
  File "start.py", line 1, in <module>
    import sub.relative
  File "/home/cvondrick/sandbox/sub/relative.py", line 1, in <module>
    from .. import parent
ValueError: Attempted relative import beyond toplevel package

我正在使用Python 2.6。为什么会这样呢？如何使此沙盒示例正常工作？

Question 62

I can’t for the life of me get python’s relative imports to work. I have created a simple example of where it does not function:

The directory structure is:

/__init__.py
/start.py
/parent.py
/sub/__init__.py
/sub/relative.py

/start.py contains just: import sub.relative

/sub/relative.py contains just from .. import parent

All other files are blank.

When executing the following on the command line:

$ cd /
$ python start.py

I get:

Traceback (most recent call last):
  File "start.py", line 1, in <module>
    import sub.relative
  File "/home/cvondrick/sandbox/sub/relative.py", line 1, in <module>
    from .. import parent
ValueError: Attempted relative import beyond toplevel package

I am using Python 2.6. Why is this the case? How do I make this sandbox example work?

Question 63

您正在从“ sub”包中导入。start.py即使有__init__.py礼物，它本身也不在包装中。

您需要从以下目录中的一个目录启动程序parent.py：

./start.py

./pkg/__init__.py
./pkg/parent.py
./pkg/sub/__init__.py
./pkg/sub/relative.py

与start.py：

import pkg.sub.relative

现在pkg是顶层软件包，您的相对导入应该可以了。

如果您想坚持使用当前的布局，则可以使用import parent。因为您是start.py用来启动解释器的，所以该目录start.py位于python路径中。parent.py作为一个单独的模块住在那儿。

__init__.py如果您不将任何内容导入到目录树中更远的脚本中，也可以安全地删除顶层。

Question 64

You are importing from package “sub”. start.py is not itself in a package even if there is a __init__.py present.

You would need to start your program from one directory over parent.py:

./start.py

./pkg/__init__.py
./pkg/parent.py
./pkg/sub/__init__.py
./pkg/sub/relative.py

With start.py:

import pkg.sub.relative

Now pkg is the top level package and your relative import should work.

If you want to stick with your current layout you can just use import parent. Because you use start.py to launch your interpreter, the directory where start.py is located is in your python path. parent.py lives there as a separate module.

You can also safely delete the top level __init__.py, if you don’t import anything into a script further up the directory tree.

问题：如何在pytest中打印到控制台？

回答 0

回答 1

回答 2

简短答案

详细答案

设置捕获方法或禁用捕获

Short Answer

Detailed answer

Setting capturing methods or disabling capturing

回答 3

回答 4

回答 5

问题：如何获取熊猫DataFrame的最后N行？

回答 0

回答 1

回答 2

如何获取熊猫DataFrame的最后N行？

How to get the last N rows of a pandas DataFrame?

问题：TypeError：无法散列的类型：’dict’

回答 0

回答 1

问题：如何在Django queryset中执行小于或等于过滤器？

回答 0

问题：如何在Tkinter中将参数传递给Button命令？

回答 0

回答 1

回答 2

GUI示例：

按下按钮时会发生什么

按下按钮时调用方法（回调）

按下按钮时调用多种方法

从回调返回对象

具有全局对象修改的完整示例

Example GUI:

What Happens When a Button Is Pressed

Calling a Method(Callback) When the Button is Pressed

Calling Multiple Methods when the Button Is Pressed

Returning Object(s) From the Callback

Complete Example with global Object Modification(s)

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

问题：如何在Python中将多个值附加到列表

回答 0

回答 1

回答 2

问题：将包含NaN的Pandas列转换为dtype`int`

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：Python中的“ collection.defaultdict”多个级别

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何修改文本文件？

回答 0