Python 实用宝典

Question 1

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

On occation, I violate PEP 8. Some times I import stuff inside functions. As a general rule, I do this if there is an import that is only used within a single function.

Any opinions?

EDIT (the reason I feel importing in functions can be a good idea):

Main reason: It can make the code clearer.

When looking at the code of a function I might ask myself: “What is function/class xxx?” (xxx being used inside the function). If I have all my imports at the top of the module, I have to go look there to determine what xxx is. This is more of an issue when using from m import xxx. Seeing m.xxx in the function probably tells me more. Depending on what m is: Is it a well-known top-level module/package (import m)? Or is it a sub-module/package (from a.b.c import m)?
In some cases having that extra information (“What is xxx?”) close to where xxx is used can make the function easier to understand.

Question 2

In the long run I think you’ll appreciate having most of your imports at the top of the file, that way you can tell at a glance how complicated your module is by what it needs to import.

If I’m adding new code to an existing file I’ll usually do the import where it’s needed and then if the code stays I’ll make things more permanent by moving the import line to the top of the file.

One other point, I prefer to get an ImportError exception before any code is run — as a sanity check, so that’s another reason to import at the top.

I use pyChecker to check for unused modules.

Question 3

There are two occasions where I violate PEP 8 in this regard:

Circular imports: module A imports module B, but something in module B needs module A (though this is often a sign that I need to refactor the modules to eliminate the circular dependency)
Inserting a pdb breakpoint: import pdb; pdb.set_trace() This is handy b/c I don’t want to put import pdb at the top of every module I might want to debug, and it easy to remember to remove the import when I remove the breakpoint.

Outside of these two cases, it’s a good idea to put everything at the top. It makes the dependencies clearer.

Question 4

Here are the four import use cases that we use

import (and from x import y and import x as y) at the top

Choices for Import. At the top.

import settings
if setting.something:
    import this as foo
else:
    import that as foo

Conditional Import. Used with JSON, XML libraries and the like. At the top.
```
try:
    import this as foo
except ImportError:
    import that as foo
```
Dynamic Import. So far, we only have one example of this.
```
import settings
module_stuff = {}
module= __import__( settings.some_module, module_stuff )
x = module_stuff['x']
```
Note that this dynamic import doesn’t bring in code, but brings in complex data structures written in Python. It’s kind of like a pickled piece of data except we pickled it by hand.

This is also, more-or-less, at the top of a module

Here’s what we do to make the code clearer:

Keep the modules short.
If I have all my imports at the top of the module, I have to go look there to determine what a name is. If the module is short, that’s easy to do.
In some cases having that extra information close to where a name is used can make the function easier to understand. If the module is short, that’s easy to do.

Question 5

One thing to bear in mind: needless imports can cause performance problems. So if this is a function that will be called frequently, you’re better off just putting the import at the top. Of course this is an optimization, so if there’s a valid case to be made that importing inside a function is more clear than importing at the top of a file, that trumps performance in most cases.

If you’re doing IronPython, I’m told that it’s better to import inside functions (since compiling code in IronPython can be slow). Thus, you may be able to get a way with importing inside functions then. But other than that, I’d argue that it’s just not worth it to fight convention.

As a general rule, I do this if there is an import that is only used within a single function.

Another point I’d like to make is that this may be a potential maintenence problem. What happens if you add a function that uses a module that was previously used by only one function? Are you going to remember to add the import to the top of the file? Or are you going to scan each and every function for imports?

FWIW, there are cases where it makes sense to import inside a function. For example, if you want to set the language in cx_Oracle, you need to set an NLS_LANG environment variable before it is imported. Thus, you may see code like this:

import os

oracle = None

def InitializeOracle(lang):
    global oracle
    os.environ['NLS_LANG'] = lang
    import cx_Oracle
    oracle = cx_Oracle

Question 6

I’ve broken this rule before for modules that are self-testing. That is, they are normally just used for support, but I define a main for them so that if you run them by themselves you can test their functionality. In that case I sometimes import getopt and cmd just in main, because I want it to be clear to someone reading the code that these modules have nothing to do with the normal operation of the module and are only being included for testing.

Question 7

Coming from the question about loading the module twice – Why not both?

An import at the top of the script will indicate the dependencies and another import in the function with make this function more atomic, while seemingly not causing any performance disadvantage, since a consecutive import is cheap.

Question 8

As long as it’s import and not from x import *, you should put them at the top. It adds just one name to the global namespace, and you stick to PEP 8. Plus, if you later need it somewhere else, you don’t have to move anything around.

It’s no big deal, but since there’s almost no difference I’d suggest doing what PEP 8 says.

Question 9

Have a look at the alternative approach that’s used in sqlalchemy: dependency injection:

@util.dependencies("sqlalchemy.orm.query")
def merge_result(query, *args):
    #...
    query.Query(...)

Notice how the imported library is declared in a decorator, and passed as an argument to the function!

This approach makes the code cleaner, and also works 4.5 times faster than an import statement!

Benchmark: https://gist.github.com/kolypto/589e84fbcfb6312532658df2fabdb796

Question 10

In modules that are both ‘normal’ modules and can be executed (i.e. have a if __name__ == '__main__':-section), I usually import modules that are only used when executing the module inside the main section.

Example:

def really_useful_function(data):
    ...


def main():
    from pathlib import Path
    from argparse import ArgumentParser
    from dataloader import load_data_from_directory

    parser = ArgumentParser()
    parser.add_argument('directory')
    args = parser.parse_args()
    data = load_data_from_directory(Path(args.directory))
    print(really_useful_function(data)


if __name__ == '__main__':
    main()

Question 11

There’s another (probably “corner”) case where it may be beneficial to import inside rarely used functions: shorten startup time.

I hit that wall once with a rather complex program running on a small IoT server accepting commands from a serial line and performing operations, possibly very complex operations.

Placing import statements at top of files meant to have all imports processed before server start; since import list included jinja2, lxml, signxml and other “heavy weights” (and SoC was not very powerful) this meant minutes before the first instruction was actually executed.

OTOH placing most imports in functions I was able to have the server “alive” on the serial line in seconds. Of course when the modules were actually needed I had to pay the price (Note: also this can be mitigated by spawning a background task doing imports in idle time).

Question 12

I have gotten the following error:

type object ‘datetime.datetime’ has no attribute ‘datetime’

On the following line:

date = datetime.datetime(int(year), int(month), 1)

Does anybody know the reason for the error?

I imported datetime with from datetime import datetime if that helps

Thanks

Question 13

Datetime is a module that allows for handling of dates, times and datetimes (all of which are datatypes). This means that datetime is both a top-level module as well as being a type within that module. This is confusing.

Your error is probably based on the confusing naming of the module, and what either you or a module you’re using has already imported.

>>> import datetime
>>> datetime
<module 'datetime' from '/usr/lib/python2.6/lib-dynload/datetime.so'>
>>> datetime.datetime(2001,5,1)
datetime.datetime(2001, 5, 1, 0, 0)

But, if you import datetime.datetime:

>>> from datetime import datetime
>>> datetime
<type 'datetime.datetime'>
>>> datetime.datetime(2001,5,1) # You shouldn't expect this to work 
                                # as you imported the type, not the module
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
>>> datetime(2001,5,1)
datetime.datetime(2001, 5, 1, 0, 0)

I suspect you or one of the modules you’re using has imported like this: from datetime import datetime.

Question 14

For python 3.3

from datetime import datetime, timedelta
futuredate = datetime.now() + timedelta(days=10)

Question 15

You should use

date = datetime(int(year), int(month), 1)

Or change

from datetime import datetime

to

import datetime

Question 16

You should really import the module into its own alias.

import datetime as dt
my_datetime = dt.datetime(year, month, day)

The above has the following benefits over the other solutions:

Calling the variable my_datetime instead of date reduces confusion since there is already a date in the datetime module (datetime.date).
The module and the class (both called datetime) do not shadow each other.

Question 17

If you have used:

from datetime import datetime

Then simply write the code as:

date = datetime(int(year), int(month), 1)

But if you have used:

import datetime

then only you can write:

date = datetime.datetime(int(2005), int(5), 1)

Question 18

I found this to be a lot easier

from dateutil import relativedelta
relativedelta.relativedelta(end_time,start_time).seconds

Question 19

I run into the same error maybe you have already imported the module by using only import datetime so change form datetime import datetime to only import datetime. It worked for me after I changed it back.

Question 20

from datetime import datetime
import time
from calendar import timegm
d = datetime.utcnow()
d = d.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
utc_time = time.strptime(d,"%Y-%m-%dT%H:%M:%S.%fZ")
epoch_time = timegm(utc_time)

Question 21

I’m trying to convert a fairly simple Python program to an executable and couldn’t find what I was looking for, so I have a few questions (I’m running Python 3.6):

The methods of doing this that I have found so far are as follows

downloading an old version of Python and using pyinstaller/py2exe
setting up a virtual environment in Python 3.6 that will allow me to do 1.
downloading a Python to C++ converter and using that.

Here is what I’ve tried/what problems I’ve run into.

I installed pyinstaller before the required download before it (pypi-something) so it did not work. After downloading the prerequisite file, pyinstaller still does not recognize it.
If I’m setting up a virtualenv in Python 2.7, do I actually need to have Python 2.7 installed?
similarly, the only python to C++ converters I see work only up until Python 3.5 – do I need to download and use this version if attempting this?

Question 22

Steps to convert .py to .exe in Python 3.6

Install Python 3.6.
Install cx_Freeze, (open your command prompt and type pip install cx_Freeze.
Install idna, (open your command prompt and type pip install idna.
Write a .py program named myfirstprog.py.
Create a new python file named setup.py on the current directory of your script.
In the setup.py file, copy the code below and save it.
With shift pressed right click on the same directory, so you are able to open a command prompt window.
In the prompt, type python setup.py build
If your script is error free, then there will be no problem on creating application.
Check the newly created folder build. It has another folder in it. Within that folder you can find your application. Run it. Make yourself happy.

See the original script in my blog.

setup.py:

from cx_Freeze import setup, Executable

base = None    

executables = [Executable("myfirstprog.py", base=base)]

packages = ["idna"]
options = {
    'build_exe': {    
        'packages':packages,
    },    
}

setup(
    name = "<any name>",
    options = options,
    version = "<any number>",
    description = '<any description>',
    executables = executables
)

EDIT:

be sure that instead of myfirstprog.py you should put your .pyextension file name as created in step 4;
you should include each imported package in your .py into packages list (ex: packages = ["idna", "os","sys"])
any name, any number, any description in setup.py file should not remain the same, you should change it accordingly (ex:name = "<first_ever>", version = "0.11", description = '' )
the imported packages must be installed before you start step 8.

Question 23

Python 3.6 is supported by PyInstaller.

Open a cmd window in your Python folder (open a command window and use cd or while holding shift, right click it on Windows Explorer and choose ‘Open command window here’). Then just enter

pip install pyinstaller

And that’s it.

The simplest way to use it is by entering on your command prompt

pyinstaller file_name.py

For more details on how to use it, take a look at this question.

Question 24

There is an open source project called auto-py-to-exe on GitHub. Actually it also just uses PyInstaller internally but since it is has a simple GUI that controls PyInstaller it may be a comfortable alternative. It can also output a standalone file in contrast to other solutions. They also provide a video showing how to set it up.

GUI:

Output:

Question 25

I can’t tell you what’s best, but a tool I have used with success in the past was cx_Freeze. They recently updated (on Jan. 7, ’17) to version 5.0.1 and it supports Python 3.6.

Here’s the pypi https://pypi.python.org/pypi/cx_Freeze

The documentation shows that there is more than one way to do it, depending on your needs. http://cx-freeze.readthedocs.io/en/latest/overview.html

I have not tried it out yet, so I’m going to point to a post where the simple way of doing it was discussed. Some things may or may not have changed though.

How do I use cx_freeze?

Question 26

I’ve been using Nuitka and PyInstaller with my package, PySimpleGUI.

Nuitka There were issues getting tkinter to compile with Nuikta. One of the project contributors developed a script that fixed the problem.

If you’re not using tkinter it may “just work” for you. If you are using tkinter say so and I’ll try to get the script and instructions published.

PyInstaller I’m running 3.6 and PyInstaller is working great! The command I use to create my exe file is:

pyinstaller -wF myfile.py

The -wF will create a single EXE file. Because all of my programs have a GUI and I do not want to command window to show, the -w option will hide the command window.

This is as close to getting what looks like a Winforms program to run that was written in Python.

[Update 20-Jul-2019]

There is PySimpleGUI GUI based solution that uses PyInstaller. It uses PySimpleGUI. It’s called pysimplegui-exemaker and can be pip installed.

pip install PySimpleGUI-exemaker

To run it after installing:

python -m pysimplegui-exemaker.pysimplegui-exemaker

Question 27

Now you can convert it by using PyInstaller. It works with even Python 3.

Steps:

Fire up your PC
Open command prompt
Enter command pip install pyinstaller
When it is installed, use the command ‘cd’ to go to the working directory.
Run command pyinstall <filename>

Question 28

Is there any way to force install a pip python package ignoring all it’s dependencies that cannot be satisfied?

(I don’t care how “wrong” it is to do so, I just need to do it, any logic and reasoning aside…)

Question 29

pip has a --no-dependencies switch. You should use that.

For more information, run pip install -h, where you’ll see this line:

--no-deps, --no-dependencies
                        Ignore package dependencies

Question 30

When I were trying install librosa package with pip (pip install librosa), this error were appeared:

ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

I tried to remove llvmlite, but pip uninstall could not remove it. So, I used capability of ignore of pip by this code:

pip install librosa --ignore-installed llvmlite

Indeed, you can use this rule for ignoring a package you don’t want to consider:

pip install {package you want to install} --ignore-installed {installed package you don't want to consider}

Question 31

I would like to merge two DataFrames, and keep the index from the first frame as the index on the merged dataset. However, when I do the merge, the resulting DataFrame has integer index. How can I specify that I want to keep the index from the left data frame?

In [4]: a = pd.DataFrame({'col1': {'a': 1, 'b': 2, 'c': 3}, 
                          'to_merge_on': {'a': 1, 'b': 3, 'c': 4}})

In [5]: b = pd.DataFrame({'col2': {0: 1, 1: 2, 2: 3}, 
                          'to_merge_on': {0: 1, 1: 3, 2: 5}})

In [6]: a
Out[6]:
   col1  to_merge_on
a     1            1
b     2            3
c     3            4

In [7]: b
Out[7]:
   col2  to_merge_on
0     1            1
1     2            3
2     3            5

In [8]: a.merge(b, how='left')
Out[8]:
   col1  to_merge_on  col2
0     1            1   1.0
1     2            3   2.0
2     3            4   NaN

In [9]: _.index
Out[9]: Int64Index([0, 1, 2], dtype='int64')

EDIT: Switched to example code that can be easily reproduced

Question 32

In [5]: a.reset_index().merge(b, how="left").set_index('index')
Out[5]:
       col1  to_merge_on  col2
index
a         1            1     1
b         2            3     2
c         3            4   NaN

Note that for some left merge operations, you may end up with more rows than in a when there are multiple matches between a and b. In this case, you may need to drop duplicates.

Question 33

You can make a copy of index on left dataframe and do merge.

a['copy_index'] = a.index
a.merge(b, how='left')

I found this simple method very useful while working with large dataframe and using pd.merge_asof() (or dd.merge_asof()).

This approach would be superior when resetting index is expensive (large dataframe).

Question 34

There is a non-pd.merge solution using Series.map and DataFrame.set_index.

In: a['col2'] = a['to_merge_on'].map(b.set_index('to_merge_on')['col2']))
In: a['col2']
Out:
   col1  to_merge_on  col2
a     1            1   1.0
b     2            3   2.0
c     3            4   NaN

This doesn’t introduce a dummy index name for the index.

Note however that there is no DataFrame.map method, and so this approach is not for multiple columns.

Question 35

df1 = df1.merge(df2, how="inner", left_index=True, right_index=True)

This allows to preserve the index of df1

Question 36

Think I’ve come up with a different solution. I was joining the left table on index value and the right table on a column value based off index of left table. What I did was a normal merge:

First10ReviewsJoined = pd.merge(First10Reviews, df, left_index=True, right_on='Line Number')

Then I retrieved the new index numbers from the merged table and put them in a new column named Sentiment Line Number:

First10ReviewsJoined['Sentiment Line Number']= First10ReviewsJoined.index.tolist()

Then I manually set the index back to the original, left table index based off pre-existing column called Line Number (the column value I joined on from left table index):

First10ReviewsJoined.set_index('Line Number', inplace=True)

Then removed the index name of Line Number so that it remains blank:

First10ReviewsJoined.index.name = None

Maybe a bit of a hack but seems to work well and relatively simple. Also, guess it reduces risk of duplicates/messing up your data. Hopefully that all makes sense.

Question 37

another simple option is to rename the index to what was before:

a.merge(b, how="left").set_axis(a.index)

merge preserves the order at dataframe ‘a’, but just resets the index so it’s save to use set_axis

Question 38

So I’m trying to make this program that will ask the user for input and store the values in an array / list.
Then when a blank line is entered it will tell the user how many of those values are unique.
I’m building this for real life reasons and not as a problem set.

enter: happy
enter: rofl
enter: happy
enter: mpg8
enter: Cpp
enter: Cpp
enter:
There are 4 unique words!

My code is as follows:

# ask for input
ipta = raw_input("Word: ")

# create list 
uniquewords = [] 
counter = 0
uniquewords.append(ipta)

a = 0   # loop thingy
# while loop to ask for input and append in list
while ipta: 
  ipta = raw_input("Word: ")
  new_words.append(input1)
  counter = counter + 1

for p in uniquewords:

..and that’s about all I’ve gotten so far.
I’m not sure how to count the unique number of words in a list?
If someone can post the solution so I can learn from it, or at least show me how it would be great, thanks!

Question 39

In addition, use collections.Counter to refactor your code:

from collections import Counter

words = ['a', 'b', 'c', 'a']

Counter(words).keys() # equals to list(set(words))
Counter(words).values() # counts the elements' frequency

Output:

['a', 'c', 'b']
[2, 1, 1]

Question 40

You can use a set to remove duplicates, and then the len function to count the elements in the set:

len(set(new_words))

Question 41

values, counts = np.unique(words, return_counts=True)

Question 42

Use a set:

words = ['a', 'b', 'c', 'a']
unique_words = set(words)             # == set(['a', 'b', 'c'])
unique_word_count = len(unique_words) # == 3

Armed with this, your solution could be as simple as:

words = []
ipta = raw_input("Word: ")

while ipta:
  words.append(ipta)
  ipta = raw_input("Word: ")

unique_word_count = len(set(words))

print "There are %d unique words!" % unique_word_count

Question 43

aa="XXYYYSBAA"
bb=dict(zip(list(aa),[list(aa).count(i) for i in list(aa)]))
print(bb)
# output:
# {'X': 2, 'Y': 3, 'S': 1, 'B': 1, 'A': 2}

Question 44

For ndarray there is a numpy method called unique:

np.unique(array_name)

Examples:

>>> np.unique([1, 1, 2, 2, 3, 3])
array([1, 2, 3])
>>> a = np.array([[1, 1], [2, 3]])
>>> np.unique(a)
array([1, 2, 3])

For a Series there is a function call value_counts():

Series_name.value_counts()

Question 45

ipta = raw_input("Word: ") ## asks for input
words = [] ## creates list
unique_words = set(words)

Question 46

Although a set is the easiest way, you could also use a dict and use some_dict.has(key) to populate a dictionary with only unique keys and values.

Assuming you have already populated words[] with input from the user, create a dict mapping the unique words in the list to a number:

word_map = {}
i = 1
for j in range(len(words)):
    if not word_map.has_key(words[j]):
        word_map[words[j]] = i
        i += 1                                                             
num_unique_words = len(new_map) # or num_unique_words = i, however you prefer

Question 47

Other method by using pandas

import pandas as pd

LIST = ["a","a","c","a","a","v","d"]
counts,values = pd.Series(LIST).value_counts().values, pd.Series(LIST).value_counts().index
df_results = pd.DataFrame(list(zip(values,counts)),columns=["value","count"])

You can then export results in any format you want

Question 48

How about:

import pandas as pd
#List with all words
words=[]

#Code for adding words
words.append('test')


#When Input equals blank:
pd.Series(words).nunique()

It returns how many unique values are in a list

Question 49

The following should work. The lambda function filter out the duplicated words.

inputs=[]
input = raw_input("Word: ").strip()
while input:
    inputs.append(input)
    input = raw_input("Word: ").strip()
uniques=reduce(lambda x,y: ((y in x) and x) or x+[y], inputs, [])
print 'There are', len(uniques), 'unique words'

Question 50

I’d use a set myself, but here’s yet another way:

uniquewords = []
while True:
    ipta = raw_input("Word: ")
    if ipta == "":
        break
    if not ipta in uniquewords:
        uniquewords.append(ipta)
print "There are", len(uniquewords), "unique words!"

Question 51

ipta = raw_input("Word: ") ## asks for input
words = [] ## creates list

while ipta: ## while loop to ask for input and append in list
  words.append(ipta)
  ipta = raw_input("Word: ")
  words.append(ipta)
#Create a set, sets do not have repeats
unique_words = set(words)

print "There are " +  str(len(unique_words)) + " unique words!"

Question 52

I need to get the latest file of a folder using python. While using the code:

max(files, key = os.path.getctime)

I am getting the below error:

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'

Question 53

Whatever is assigned to the files variable is incorrect. Use the following code.

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getctime)
print latest_file

Question 54

max(files, key = os.path.getctime)

is quite incomplete code. What is files? It probably is a list of file names, coming out of os.listdir().

But this list lists only the filename parts (a. k. a. “basenames”), because their path is common. In order to use it correctly, you have to combine it with the path leading to it (and used to obtain it).

Such as (untested):

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)

Question 55

I would suggest using glob.iglob() instead of the glob.glob(), as it is more efficient.

glob.iglob() Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

Which means glob.iglob() will be more efficient.

I mostly use below code to find the latest file matching to my pattern:

LatestFile = max(glob.iglob(fileNamePattern),key=os.path.getctime)

NOTE: There are variants of max function, In case of finding the latest file we will be using below variant: max(iterable, *[, key, default])

which needs iterable so your first parameter should be iterable. In case of finding max of nums we can use beow variant : max (num1, num2, num3, *args[, key])

Question 56

Try to sort items by creation time. Example below sorts files in a folder and gets first element which is latest.

import glob
import os

files_path = os.path.join(folder, '*')
files = sorted(
    glob.iglob(files_path), key=os.path.getctime, reverse=True) 
print files[0]

Question 57

I lack the reputation to comment but ctime from Marlon Abeykoons response did not give the correct result for me. Using mtime does the trick though. (key=os.path.getmtime))

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getmtime)
print latest_file

I found two answers for that problem:

python os.path.getctime max does not return latest Difference between python – getmtime() and getctime() in unix system

Question 58

(Edited to improve answer)

First define a function get_latest_file

def get_latest_file(path, *paths):
    fullpath = os.path.join(path, paths)
    ...
get_latest_file('example', 'files','randomtext011.*.txt')

You may also use a docstring !

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)

If you use Python 3, you can use iglob instead.

Complete code to return the name of latest file:

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # You may use iglob in Python3
    if not files:                # I prefer using the negation
        return None                      # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename

Question 59

I have tried to use the above suggestions and my program crashed, than I figured out the file I’m trying to identify was used and when trying to use ‘os.path.getctime’ it crashed. what finally worked for me was:

    files_before = glob.glob(os.path.join(my_path,'*'))
    **code where new file is created**
    new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path,'*'))))

this codes gets the uncommon object between the two sets of file lists its not the most elegant, and if multiple files are created at the same time it would probably won’t be stable

Question 60

A much faster method on windows (0.05s), call a bat script that does this:

get_latest.bat

@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
%LAST%

where \\directory\in\question is the directory you want to investigate.

get_latest.py

from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)

if it finds a file stdout is the path and stderr is None.

Use stdout.decode("utf-8").rstrip() to get the usable string representation of the file name.

Question 61

I’ve been using this in Python 3, including pattern matching on the filename.

from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)

Question 62

I’m reading in a csv file with multiple datetime columns. I’d need to set the data types upon reading in the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

When run gives a error:

TypeError: data type “datetime” not understood

Converting columns after the fact, via pandas.to_datetime() isn’t an option I can’t know which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I’ve tried to load the csv file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.dataframe but it garbles the data. Any help is greatly appreciated!

Question 63

Why it does not work

There is no datetime dtype to be set for read_csv as csv files can only contain strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.

Pandas way of solving this

The pandas.read_csv() function has a keyword argument called parse_dates

Using this you can on the fly convert strings, floats or integers into datetimes using the default date_parser (dateutil.parser.parser)

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This will cause pandas to read col1 and col2 as strings, which they most likely are (“2016-05-05” etc.) and after having read the string, the date_parser for each column will act upon that string and give back whatever that function returns.

Defining your own date parsing function:

The pandas.read_csv() function also has a keyword argument called date_parser

Setting this to a lambda function will make that particular function be used for the parsing of the dates.

GOTCHA WARNING

You have to give it the function, not the execution of the function, thus this is Correct

date_parser = pd.datetools.to_datetime

This is incorrect:

date_parser = pd.datetools.to_datetime()

Pandas 0.22 Update

pd.datetools.to_datetime has been relocated to date_parser = pd.to_datetime

Thanks @stackoverYC

Question 64

There is a parse_dates parameter for read_csv which allows you to define the names of the columns you want treated as dates or datetimes:

date_cols = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=date_cols)

问题：导入函数内部是pythonic吗？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：类型对象“ datetime.datetime”没有属性“ datetime”

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：如何将Python的.py转换为.exe？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：python pip：强制安装忽略依赖项

回答 0

回答 1

问题：使用熊猫合并时如何保持索引

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何计算列表中的唯一值

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

问题：如何使用python获取文件夹中的最新文件

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：熊猫中的datetime dtypes read_csv

回答 0

为什么它不起作用

熊猫解决这个问题的方法

定义自己的日期解析功能：

GOTCHA警告

熊猫0.22更新

Why it does not work

Pandas way of solving this

Defining your own date parsing function:

GOTCHA WARNING

Pandas 0.22 Update

回答 1

回答 2

回答 3

问题：合并PDF文件

回答 0

回答 1

回答 2