Python 实用宝典

Question 1

I want to check if any of the items in one list are present in another list. I can do it simply with the code below, but I suspect there might be a library function to do this. If not, is there a more pythonic method of achieving the same result.

In [78]: a = [1, 2, 3, 4, 5]

In [79]: b = [8, 7, 6]

In [80]: c = [8, 7, 6, 5]

In [81]: def lists_overlap(a, b):
   ....:     for i in a:
   ....:         if i in b:
   ....:             return True
   ....:     return False
   ....: 

In [82]: lists_overlap(a, b)
Out[82]: False

In [83]: lists_overlap(a, c)
Out[83]: True

In [84]: def lists_overlap2(a, b):
   ....:     return len(set(a).intersection(set(b))) > 0
   ....:

Question 2

Short answer: use not set(a).isdisjoint(b), it’s generally the fastest.

There are four common ways to test if two lists a and b share any items. The first option is to convert both to sets and check their intersection, as such:

bool(set(a) & set(b))

Because sets are stored using a hash table in Python, searching them is O(1) (see here for more information about complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b. But 1) it must first create sets out of the lists, which can take a non-negligible amount of time, and 2) it supposes that hashing collisions are sparse among your data.

The second way to do it is using a generator expression performing iteration on the lists, such as:

any(i in a for i in b)

This allows to search in-place, so no new memory is allocated for intermediary variables. It also bails out on the first find. But the in operator is always O(n) on lists (see here).

Another proposed option is an hybridto iterate through one of the list, convert the other one in a set and test for membership on this set, like so:

a = set(a); any(i in a for i in b)

A fourth approach is to take advantage of the isdisjoint() method of the (frozen)sets (see here), for example:

not set(a).isdisjoint(b)

If the elements you search are near the beginning of an array (e.g. it is sorted), the generator expression is favored, as the sets intersection method have to allocate new memory for the intermediary variables:

from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974

Here’s a graph of the execution time for this example in function of list size:

Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the isdisjoint() method is better for very small list sizes, whereas the generator expression is better for larger list sizes.

On the other hand, as the search begins with the beginning for the hybrid and generator expression, if the shared element are systematically at the end of the array (or both lists does not share any values), the disjoint and set intersection approaches are then way faster than the generator expression and the hybrid approach.

>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
0.08102107048034668

It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well when when no elements are shared, and is the best case for the disjoint and set intersection approaches.

Here are two analysis using random numbers (instead of rigging the setup to favor one technique or another):

High chance of sharing: elements are randomly taken from [1, 2*len(a)]. Low chance of sharing: elements are randomly taken from [1, 1000*len(a)].

Up to now, this analysis supposed both lists are of the same size. In case of two lists of different sizes, for example a is much smaller, isdisjoint() is always faster:

Make sure that the a list is the smaller, otherwise the performance decreases. In this experiment, the a list size was set constant to 5.

In summary:

If the lists are very small (< 10 elements), not set(a).isdisjoint(b) is always the fastest.
If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression any(i in a for i in b) is the fastest on large list sizes;
Test the set intersection with not set(a).isdisjoint(b), which is always faster than bool(set(a) & set(b)).
The hybrid “iterate through list, test on set” a = set(a); any(i in a for i in b) is generally slower than other methods.
The generator expression and the hybrid are much slower than the two other approaches when it comes to lists without sharing elements.

In most cases, using the isdisjoint() method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.

Question 3

def lists_overlap3(a, b):
    return bool(set(a) & set(b))

Note: the above assumes that you want a boolean as the answer. If all you need is an expression to use in an if statement, just use if set(a) & set(b):

Question 4

def lists_overlap(a, b):
  sb = set(b)
  return any(el in sb for el in a)

This is asymptotically optimal (worst case O(n + m)), and might be better than the intersection approach due to any‘s short-circuiting.

E.g.:

lists_overlap([3,4,5], [1,2,3])

will return True as soon as it gets to 3 in sb

EDIT: Another variation (with thanks to Dave Kirby):

def lists_overlap(a, b):
  sb = set(b)
  return any(itertools.imap(sb.__contains__, a))

This relies on imap‘s iterator, which is implemented in C, rather than a generator comprehension. It also uses sb.__contains__ as the mapping function. I don’t know how much performance difference this makes. It will still short-circuit.

Question 5

You could also use any with list comprehension:

any([item in a for item in b])

Question 6

In python 2.6 or later you can do:

return not frozenset(a).isdisjoint(frozenset(b))

Question 7

You can use the any built in function /w a generator expression:

def list_overlap(a,b): 
     return any(i for i in a if i in b)

As John and Lie have pointed out this gives incorrect results when for every i shared by the two lists bool(i) == False. It should be:

return any(i in b for i in a)

Question 8

This question is pretty old, but I noticed that while people were arguing sets vs. lists, that no one thought of using them together. Following Soravux’s example,

Worst case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
100.91506409645081
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
19.746716022491455
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
0.092626094818115234

And the best case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=list(range(10000))", number=100000)
154.69790101051331
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
0.082653045654296875
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000)
0.08434605598449707

So even faster than iterating through two lists is iterating though a list to see if it’s in a set, which makes sense since checking if a number is in a set takes constant time while checking by iterating through a list takes time proportional to the length of the list.

Thus, my conclusion is that iterate through a list, and check if it’s in a set.

Question 9

if you don’t care what the overlapping element might be, you can simply check the len of the combined list vs. the lists combined as a set. If there are overlapping elements, the set will be shorter:

len(set(a+b+c))==len(a+b+c) returns True, if there is no overlap.

Question 10

I’ll throw another one in with a functional programming style:

any(map(lambda x: x in a, b))

Explanation:

map(lambda x: x in a, b)

returns a list of booleans where elements of b are found in a. That list is then passed to any, which simply returns True if any elements are True.

Question 11

If I do

conda info pandas

I can see all of the packages available.

I updated my pandas to the latest this morning, but I need to revert to a prior version now. I tried

conda update pandas 0.13.1

but that didn’t work. How do I specify which version to use?

Question 12

I had to use the install function instead:

conda install pandas=0.13.1

Question 13

For the case that you wish to revert a recently installed package that made several changes to dependencies (such as tensorflow), you can “roll back” to an earlier installation state via the following method:

conda list --revisions
conda install --revision [revision number]

The first command shows previous installation revisions (with dependencies) and the second reverts to whichever revision number you specify.

Note that if you wish to (re)install a later revision, you may have to sequentially reinstall all intermediate versions. If you had been at revision 23, reinstalled revision 20 and wish to return, you may have to run each:

conda install --revision 21
conda install --revision 22
conda install --revision 23

Question 14

I have been learning Python by following some pygame tutorials.

Therein I found extensive use of the keyword self, and coming from a primarily Java background, I find that I keep forgetting to type self. For example, instead of self.rect.centerx I would type rect.centerx, because, to me, rect is already a member variable of the class.

The Java parallel I can think of for this situation is having to prefix all references to member variables with this.

Am I stuck prefixing all member variables with self, or is there a way to declare them that would allow me to avoid having to do so?

Even if what I am suggesting isn’t pythonic, I’d still like to know if it is possible.

I have taken a look at these related SO questions, but they don’t quite answer what I am after:

Question 15

Python requires specifying self. The result is there’s never any confusion over what’s a member and what’s not, even without the full class definition visible. This leads to useful properties, such as: you can’t add members which accidentally shadow non-members and thereby break code.

One extreme example: you can write a class without any knowledge of what base classes it might have, and always know whether you are accessing a member or not:

class A(some_function()):
  def f(self):
    self.member = 42
    self.method()

That’s the complete code! (some_function returns the type used as a base.)

Another, where the methods of a class are dynamically composed:

class B(object):
  pass

print B()
# <__main__.B object at 0xb7e4082c>

def B_init(self):
  self.answer = 42
def B_str(self):
  return "<The answer is %s.>" % self.answer
# notice these functions require no knowledge of the actual class
# how hard are they to read and realize that "members" are used?

B.__init__ = B_init
B.__str__ = B_str

print B()
# <The answer is 42.>

Remember, both of these examples are extreme and you won’t see them every day, nor am I suggesting you should often write code like this, but they do clearly show aspects of self being explicitly required.

Question 16

Previous answers are all basically variants of “you can’t” or “you shouldn’t”. While I agree with the latter sentiment, the question is technically still unanswered.

Furthermore, there are legitimate reasons why someone might want to do something along the lines of what the actual question is asking. One thing I run into sometimes is lengthy math equations where using long names makes the equation unrecognizable. Here are a couple ways of how you could do this in a canned example:

import numpy as np
class MyFunkyGaussian() :
    def __init__(self, A, x0, w, s, y0) :
        self.A = float(A)
        self.x0 = x0
        self.w = w
        self.y0 = y0
        self.s = s

    # The correct way, but subjectively less readable to some (like me) 
    def calc1(self, x) :
        return (self.A/(self.w*np.sqrt(np.pi))/(1+self.s*self.w**2/2)
                * np.exp( -(x-self.x0)**2/self.w**2)
                * (1+self.s*(x-self.x0)**2) + self.y0 )

    # The correct way if you really don't want to use 'self' in the calculations
    def calc2(self, x) :
        # Explicity copy variables
        A, x0, w, y0, s = self.A, self.x0, self.w, self.y0, self.s
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

    # Probably a bad idea...
    def calc3(self, x) :
        # Automatically copy every class vairable
        for k in self.__dict__ : exec(k+'= self.'+k)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

g = MyFunkyGaussian(2.0, 1.5, 3.0, 5.0, 0.0)
print(g.calc1(0.5))
print(g.calc2(0.5))
print(g.calc3(0.5))

The third example – i.e. using for k in self.__dict__ : exec(k+'= self.'+k) is basically what the question is actually asking for, but let me be clear that I don’t think it is generally a good idea.

For more info, and ways to iterate through class variables, or even functions, see answers and discussion to this question. For a discussion of other ways to dynamically name variables, and why this is usually not a good idea see this blog post.

UPDATE: There appears to be no way to dynamically update or change locals in a function in Python3, so calc3 and similar variants are no longer possible. The only python3 compatible solution I can think of now is to use globals:

def calc4(self, x) :
        # Automatically copy every class variable in globals
        globals().update(self.__dict__)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

Which, again, would be a terrible practice in general.

Question 17

Actually self is not a keyword, it’s just the name conventionally given to the first parameter of instance methods in Python. And that first parameter can’t be skipped, as it’s the only mechanism a method has of knowing which instance of your class it’s being called on.

Question 18

You can use whatever name you want, for example

class test(object):
    def function(this, variable):
        this.variable = variable

or even

class test(object):
    def function(s, variable):
        s.variable = variable

but you are stuck with using a name for the scope.

I do not recommend you use something different to self unless you have a convincing reason, as it would make it alien for experienced pythonistas.

Question 19

yes, you must always specify self, because explicit is better than implicit, according to python philosophy.

You will also find out that the way you program in python is very different from the way you program in java, hence the use of self tends to decrease because you don’t project everything inside the object. Rather, you make larger use of module-level function, which can be better tested.

by the way. I hated it at first, now I hate the opposite. same for indented-driven flow control.

Question 20

The “self” is the conventional placeholder of the current object instance of a class. Its used when you want to refer to the object’s property or field or method inside a class as if you’re referring to “itself”. But to make it shorter someone in the Python programming realm started to use “self” , other realms use “this” but they make it as a keyword which cannot be replaced. I rather used “its” to increase the code readability. Its one of the good things in Python – you have a freedom to choose your own placeholder for the object’s instance other than “self”. Example for self:

class UserAccount():    
    def __init__(self, user_type, username, password):
        self.user_type = user_type
        self.username = username            
        self.password = encrypt(password)        

    def get_password(self):
        return decrypt(self.password)

    def set_password(self, password):
        self.password = encrypt(password)

Now we replace ‘self’ with ‘its’:

class UserAccount():    
    def __init__(its, user_type, username, password):
        its.user_type = user_type
        its.username = username            
        its.password = encrypt(password)        

    def get_password(its):
        return decrypt(its.password)

    def set_password(its, password):
        its.password = encrypt(password)

which is more readable now?

Question 21

self is part of the python syntax to access members of objects, so I’m afraid you’re stuck with it

Question 22

Actually you can use recipe “Implicit self” from Armin Ronacher presentation “5 years of bad ideas” ( google it).

It’s a very clever recipe, as almost everything from Armin Ronacher, but I don’t think this idea is very appealing. I think I’d prefer explicit this in C#/Java.

Update. Link to “bad idea recipe”: https://speakerdeck.com/mitsuhiko/5-years-of-bad-ideas?slide=58

Question 23

Yeah, self is tedious. But, is it better?

class Test:

    def __init__(_):
        _.test = 'test'

    def run(_):
        print _.test

Question 24

From: Self Hell – More stateful functions.

…a hybrid approach works best. All of your class methods that actually do computation should be moved into closures, and extensions to clean up syntax should be kept in classes. Stuff the closures into classes, treating the class much like a namespace. The closures are essentially static functions, and so do not require selfs*, even in the class…

Question 25

I think that it would be easier and more readable if there was a statement “member” just as there is “global” so you can tell the interpreter which are the objects members of the class.

Question 26

I searched in this official document to find difference between the json.dump() and json.dumps() in python. It is clear that they are related with file write option.
But what is the detailed difference between them and in what situations one has more advantage than other?

Question 27

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.

json.dump()

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object

If ensure_ascii is False, some chunks written to fp may be unicode instances

json.dumps()

Serialize obj to a JSON formatted str

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance

Question 28

The functions with an s take string parameters. The others take file streams.

Question 29

In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) — without ‘s’, new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.

Question 30

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8):

dumps on the other hand, with ensure_ascii=False can produce a str or unicode just depending on what types you used for strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into file without checking which format was returned and possibly playing with unicode.encode.

This of course is not valid concern in Python 3 any more, since there is no more this 8-bit/Unicode confusion.

As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline limited JSON documents from a single file.

Question 31

I’m trying to access a dict_key’s element by its index:

test = {'foo': 'bar', 'hello': 'world'}
keys = test.keys()  # dict_keys object

keys.index(0)
AttributeError: 'dict_keys' object has no attribute 'index'

I want to get foo.

same with:

keys[0]
TypeError: 'dict_keys' object does not support indexing

How can I do this?

Question 32

Call list() on the dictionary instead:

keys = list(test)

In Python 3, the dict.keys() method returns a dictionary view object, which acts as a set. Iterating over the dictionary directly also yields keys, so turning a dictionary into a list results in a list of all the keys:

>>> test = {'foo': 'bar', 'hello': 'world'}
>>> list(test)
['foo', 'hello']
>>> list(test)[0]
'foo'

Question 33

Not a full answer but perhaps a useful hint. If it is really the first item you want*, then

next(iter(q))

is much faster than

list(q)[0]

for large dicts, since the whole thing doesn’t have to be stored in memory.

For 10.000.000 items I found it to be almost 40.000 times faster.

*The first item in case of a dict being just a pseudo-random item before Python 3.6 (after that it’s ordered in the standard implementation, although it’s not advised to rely on it).

Question 34

I wanted “key” & “value” pair of a first dictionary item. I used the following code.

 key, val = next(iter(my_dict.items()))

Question 35

test = {'foo': 'bar', 'hello': 'world'}
ls = []
for key in test.keys():
    ls.append(key)
print(ls[0])

Conventional way of appending the keys to a statically defined list and then indexing it for same

Question 36

In many cases, this may be an XY Problem. Why are you indexing your dictionary keys by position? Do you really need to? Until recently, dictionaries were not even ordered in Python, so accessing the first element was arbitrary.

I just translated some Python 2 code to Python 3:

keys = d.keys()
for (i, res) in enumerate(some_list):
    k = keys[i]
    # ...

which is not pretty, but not very bad either. At first, I was about to replace it by the monstrous

    k = next(itertools.islice(iter(keys), i, None))

before I realised this is all much better written as

for (k, res) in zip(d.keys(), some_list):

which works just fine.

I believe that in many other cases, indexing dictionary keys by position can be avoided. Although dictionaries are ordered in Python 3.7, relying on that is not pretty. The code above only works because the contents of some_list had been recently produced from the contents of d.

Have a hard look at your code if you really need to access a disk_keys element by index. Perhaps you don’t need to.

Question 37

Try this

keys = [next(iter(x.keys())) for x in test]
print(list(keys))

The result looks like this. [‘foo’, ‘hello’]

You can find more possible solutions here.

Question 38

I am trying to get the date of the previous month with python. Here is what i’ve tried:

str( time.strftime('%Y') ) + str( int(time.strftime('%m'))-1 )

However, this way is bad for 2 reasons: First it returns 20122 for the February of 2012 (instead of 201202) and secondly it will return 0 instead of 12 on January.

I have solved this trouble in bash with

echo $(date -d"3 month ago" "+%G%m%d")

I think that if bash has a built-in way for this purpose, then python, much more equipped, should provide something better than forcing writing one’s own script to achieve this goal. Of course i could do something like:

if int(time.strftime('%m')) == 1:
    return '12'
else:
    if int(time.strftime('%m')) < 10:
        return '0'+str(time.strftime('%m')-1)
    else:
        return str(time.strftime('%m') -1)

I have not tested this code and i don’t want to use it anyway (unless I can’t find any other way:/)

Thanks for your help!

Question 39

datetime and the datetime.timedelta classes are your friend.

find today.
use that to find the first day of this month.
use timedelta to backup a single day, to the last day of the previous month.
print the YYYYMM string you’re looking for.

Like this:

 import datetime
 today = datetime.date.today()
 first = today.replace(day=1)
 lastMonth = first - datetime.timedelta(days=1)
 print(lastMonth.strftime("%Y%m"))

201202 is printed.

Question 40

You should use dateutil. With that, you can use relativedelta, it’s an improved version of timedelta.

>>> import datetime 
>>> import dateutil.relativedelta
>>> now = datetime.datetime.now()
>>> print now
2012-03-15 12:33:04.281248
>>> print now + dateutil.relativedelta.relativedelta(months=-1)
2012-02-15 12:33:04.281248

Question 41

from datetime import date, timedelta

first_day_of_current_month = date.today().replace(day=1)
last_day_of_previous_month = first_day_of_current_month - timedelta(days=1)

print "Previous month:", last_day_of_previous_month.month

Or:

from datetime import date, timedelta

prev = date.today().replace(day=1) - timedelta(days=1)
print prev.month

Question 42

Building on bgporter’s answer.

def prev_month_range(when = None): 
    """Return (previous month's start date, previous month's end date)."""
    if not when:
        # Default to today.
        when = datetime.datetime.today()
    # Find previous month: https://stackoverflow.com/a/9725093/564514
    # Find today.
    first = datetime.date(day=1, month=when.month, year=when.year)
    # Use that to find the first day of this month.
    prev_month_end = first - datetime.timedelta(days=1)
    prev_month_start = datetime.date(day=1, month= prev_month_end.month, year= prev_month_end.year)
    # Return previous month's start and end dates in YY-MM-DD format.
    return (prev_month_start.strftime('%Y-%m-%d'), prev_month_end.strftime('%Y-%m-%d'))

Question 43

Its very easy and simple. Do this

from dateutil.relativedelta import relativedelta
from datetime import datetime

today_date = datetime.today()
print "todays date time: %s" %today_date

one_month_ago = today_date - relativedelta(months=1)
print "one month ago date time: %s" % one_month_ago
print "one month ago date: %s" % one_month_ago.date()

Here is the output: $python2.7 main.py

todays date time: 2016-09-06 02:13:01.937121
one month ago date time: 2016-08-06 02:13:01.937121
one month ago date: 2016-08-06

Question 44

For someone who got here and looking to get both the first and last day of the previous month:

from datetime import date, timedelta

last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)

start_day_of_prev_month = date.today().replace(day=1) - timedelta(days=last_day_of_prev_month.day)

# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)

Output:

First day of prev month: 2019-02-01
Last day of prev month: 2019-02-28

Question 45

def prev_month(date=datetime.datetime.today()):
    if date.month == 1:
        return date.replace(month=12,year=date.year-1)
    else:
        try:
            return date.replace(month=date.month-1)
        except ValueError:
            return prev_month(date=date.replace(day=date.day-1))

Question 46

Just for fun, a pure math answer using divmod. Pretty inneficient because of the multiplication, could do just as well a simple check on the number of month (if equal to 12, increase year, etc)

year = today.year
month = today.month

nm = list(divmod(year * 12 + month + 1, 12))
if nm[1] == 0:
    nm[1] = 12
    nm[0] -= 1
pm = list(divmod(year * 12 + month - 1, 12))
if pm[1] == 0:
    pm[1] = 12
    pm[0] -= 1

next_month = nm
previous_month = pm

Question 47

With the Pendulum very complete library, we have the subtract method (and not “subStract”):

import pendulum
today = pendulum.datetime.today()  # 2020, january
lastmonth = today.subtract(months=1)
lastmonth.strftime('%Y%m')
# '201912'

We see that it handles jumping years.

The reverse equivalent is add.

https://pendulum.eustace.io/docs/#addition-and-subtraction

Question 48

Building off the comment of @J.F. Sebastian, you can chain the replace() function to go back one “month”. Since a month is not a constant time period, this solution tries to go back to the same date the previous month, which of course does not work for all months. In such a case, this algorithm defaults to the last day of the prior month.

from datetime import datetime, timedelta

d = datetime(2012, 3, 31) # A problem date as an example

# last day of last month
one_month_ago = (d.replace(day=1) - timedelta(days=1))
try:
    # try to go back to same day last month
    one_month_ago = one_month_ago.replace(day=d.day)
except ValueError:
    pass
print("one_month_ago: {0}".format(one_month_ago))

Output:

one_month_ago: 2012-02-29 00:00:00

Question 49

If you want to look at the ASCII letters in a EXE type file in a LINUX/UNIX Environment, try “od -c ‘filename’ |more”

You will likely get a lot of unrecognizable items, but they will all be presented, and the HEX representations will be displayed, and the ASCII equivalent characters (if appropriate) will follow the line of hex codes. Try it on a compiled piece of code that you know. You might see things in it you recognize.

Question 50

There is a high level library dateparser that can determine the past date given natural language, and return the corresponding Python datetime object

from dateparser import parse
parse('4 months ago')

Question 51

I have a large (about 12M rows) dataframe df with say:

df.columns = ['word','documents','frequency']

So the following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

However, this is taking an unexpected long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurences in a large dataframe?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words dataframe to take very long to build.

ps: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. thank you.

Question 52

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you’ll save some time. I’m not sure why count should be much slower than max. Both take some time to avoid missing values. (Compare with size.)

In any case, value_counts has been specifically optimized to handle object type, like your words, so I doubt you’ll do much better than that.

Question 53

When you want to count the frequency of categorical data in a column in pandas dataFrame use: df['Column_Name'].value_counts()

–Source.

Question 54

Just an addition to the previous answers. Let’s not forget that when dealing with real data there might be null values, so it’s useful to also include those in the counting by using the option dropna=False (default is True)

An example:

>>> df['Embarked'].value_counts(dropna=False)
S      644
C      168
Q       77
NaN      2

Question 55

How can I insert an element at the first index of a list ? If I use list.insert(0,elem), do elem modify the content of the first index? Or do I have to create a new list with the first elem and then copy the old list inside this new one?

Question 56

Use insert:

In [1]: ls = [1,2,3]

In [2]: ls.insert(0, "new")

In [3]: ls
Out[3]: ['new', 1, 2, 3]

Question 57

From the documentation:

list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a),x) is equivalent to a.append(x)

http://docs.python.org/2/tutorial/datastructures.html#more-on-lists

Question 58

sess = tf.Session()在Tensorflow 2.0环境中执行命令时，出现如下错误消息：

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'Session'

系统信息：

操作系统平台和发行版：Windows 10
python版本：3.7.1
Tensorflow版本：2.0.0-alpha0（随pip一起安装）

重现步骤：

安装：

点安装-升级点
pip install tensorflow == 2.0.0-alpha0
点安装keras
点安装numpy == 1.16.2

执行：

执行命令：将tensorflow导入为tf
执行命令：sess = tf.Session（）

Question 59

When I am executing the command sess = tf.Session() in Tensorflow 2.0 environment, I am getting an error message as below:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'Session'

System Information:

OS Platform and Distribution: Windows 10
Python Version: 3.7.1
Tensorflow Version: 2.0.0-alpha0 (installed with pip)

Steps to reproduce:

Installation:

pip install –upgrade pip
pip install tensorflow==2.0.0-alpha0
pip install keras
pip install numpy==1.16.2

Execution:

Execute command: import tensorflow as tf
Execute command: sess = tf.Session()

Question 60

根据TF 1:1 Symbols Map，在TF 2.0中，您应该使用tf.compat.v1.Session()而不是tf.Session()

https://docs.google.com/spreadsheets/d/1FLFJLzg7WNP6JHODX5q8BDgptKafq_slHpnHVbJIteQ/edit#gid=0

要获得TF 2.0中类似TF 1.x的行为，可以运行

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

但后来人们无法受益于TF 2.0所做的许多改进。有关更多详细信息，请参阅迁移指南 https://www.tensorflow.org/guide/migrate

Question 61

According to TF 1:1 Symbols Map, in TF 2.0 you should use tf.compat.v1.Session() instead of tf.Session()

https://docs.google.com/spreadsheets/d/1FLFJLzg7WNP6JHODX5q8BDgptKafq_slHpnHVbJIteQ/edit#gid=0

To get TF 1.x like behaviour in TF 2.0 one can run

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

but then one cannot benefit of many improvements made in TF 2.0. For more details please refer to the migration guide https://www.tensorflow.org/guide/migrate

Question 62

TF2默认情况下运行急切执行，因此无需会话。如果要运行静态图，则更正确的方法是tf.function()在TF2中使用。虽然仍然可以通过tf.compat.v1.Session()TF2访问Session ，但我不建议使用它。通过比较问候世界中的差异来证明这种差异可能会有所帮助：

TF1.x你好世界：

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x你好世界：

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

有关更多信息，请参见Effective TensorFlow 2

Question 63

TF2 runs Eager Execution by default, thus removing the need for Sessions. If you want to run static graphs, the more proper way is to use tf.function() in TF2. While Session can still be accessed via tf.compat.v1.Session() in TF2, I would discourage using it. It may be helpful to demonstrate this difference by comparing the difference in hello worlds:

TF1.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

For more info, see Effective TensorFlow 2

Question 64

安装后第一次尝试python时遇到了这个问题 windows10 + python3.7(64bit) + anacconda3 + jupyter notebook.

我通过参考“ https://vispud.blogspot.com/2019/05/tensorflow200a0-attributeerror-module.html ”解决了此问题

我同意

我相信TF 2.0已删除了“ Session（）”。

我插入了两行。一个是tf.compat.v1.disable_eager_execution()，另一个是sess = tf.compat.v1.Session()

我的Hello.py如下：

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

问题：测试列表是否共享python中的任何项目

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：如何恢复到Anaconda中的先前软件包？

回答 0

回答 1

问题：如何避免在Python中显式的“自我”？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：python中的json.dump（）和json.dumps（）有什么区别？

回答 0

回答 1

回答 2

回答 3

问题：在Python3中按索引访问dict_keys元素

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：上个月的python日期

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：计算大熊猫数量的最有效方法是什么？

回答 0

回答 1

回答 2

问题：在Python列表的第一位置插入[关闭]

回答 0

回答 1

问题：Tensorflow 2.0-AttributeError：模块’tensorflow’没有属性’Session’

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：在运行时确定带有upload_to的Django FileField

回答 0

回答 1

回答 2

回答 3

有趣好用的Python教程