



In Python I’m getting an error:

Exception:  (<type 'exceptions.AttributeError'>,
AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given python code:

def getEntries (self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request (url + 
        '.json', None, {'User-Agent' : 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen (request)
    jsonofabitch = response.read ()

    return json.load (jsonofabitch)['data']['children']

What does this error mean and what did I do to cause it?

Answer 0


The problem is that json.load expects a file-like object with a read method. So either use json.load(response) or json.loads(response.read()).
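Here is a minimal sketch of both options. It assumes Python 3 and uses io.StringIO as a stand-in for the HTTP response; the payload is made up for illustration:

```python
import io
import json

payload = '{"data": {"children": [{"kind": "t3"}]}}'  # made-up sample payload

# json.load expects a file-like object and calls its .read() itself.
response = io.StringIO(payload)  # stand-in for the object returned by urlopen()
children_from_file = json.load(response)['data']['children']

# json.loads expects the text that has already been read.
children_from_str = json.loads(payload)['data']['children']

print(children_from_file == children_from_str)  # True
```

On Python 3.6+, json.load also accepts a binary file-like object, so passing the response from urllib.request.urlopen straight to it should work as well.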

Answer 1

AttributeError("'str' object has no attribute 'read'",)

This means exactly what it says: something tried to find a .read attribute on the object that you gave it, and you gave it an object of type str (i.e., you gave it a string).

The error occurred here:

json.load (jsonofabitch)['data']['children']

Well, you aren’t looking for read anywhere, so it must happen in the json.load function that you called (as indicated by the full traceback). That is because json.load is trying to .read the thing that you gave it, but you gave it jsonofabitch, which currently names a string (which you created by calling .read on the response).

Solution: don’t call .read yourself; the function will do this, and is expecting you to give it the response directly so that it can do so.
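To see the failure on its own, here is a small Python 3 sketch that hands json.load a string, just like the question's code does:

```python
import json

text = '{"data": {"children": []}}'  # already a str, like the result of response.read()
message = None

try:
    json.load(text)  # json.load calls text.read(), and str has no such method
except AttributeError as exc:
    message = str(exc)

print(message)  # 'str' object has no attribute 'read'
```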

You could also have figured this out by reading the built-in Python documentation for the function (try help(json.load)) or for the entire module (try help(json)), or by checking the documentation for those functions on http://docs.python.org.

Answer 2


If you get a Python error like this:

AttributeError: 'str' object has no attribute 'some_method'

You probably poisoned your object accidentally by overwriting it with a string.

How to reproduce this error in Python with a few lines of code:

#!/usr/bin/env python
import json
def foobar(json):
    msg = json.loads(json)

foobar('{"batman": "yes"}')

Run it, which prints:

AttributeError: 'str' object has no attribute 'loads'

But change the name of the variable, and it works fine:

#!/usr/bin/env python
import json
def foobar(jsonstring):
    msg = json.loads(jsonstring)

foobar('{"batman": "yes"}')

This error occurs when you try to call a method on a string. Strings have many methods, but not the one you are invoking. So stop trying to invoke a method that str does not define, and start looking for the place where you poisoned your object.
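When the message is unexpected, check what the name actually refers to at the failing line. That usually reveals the overwrite. The sketch below reuses the same example; the function names shadowed and fixed are made up:

```python
import json

def shadowed(json):
    # Inside this function the name `json` is the str argument, not the module.
    return type(json).__name__

def fixed(jsonstring):
    # The module name is not shadowed here, so json.loads is reachable.
    return json.loads(jsonstring)

print(shadowed('{"batman": "yes"}'))  # str
print(fixed('{"batman": "yes"}'))     # {'batman': 'yes'}
```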

Answer 3






OK, this is an old thread, but I had the same issue: I used json.load instead of json.loads.

With json.loads, json has no problem loading any kind of dictionary from a string.

From the official documentation:

json.load – Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads – Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.
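As the quoted documentation says, both functions also accept binary input (Python 3.6+). Here is a small sketch that uses io.BytesIO as a stand-in for a binary file:

```python
import io
import json

raw = b'{"batman": "yes"}'  # UTF-8 encoded JSON document

from_bytes = json.loads(raw)                    # loads: str, bytes or bytearray
from_binary_file = json.load(io.BytesIO(raw))   # load: anything with .read()

print(from_bytes == from_binary_file)  # True
```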

Answer 4


You need to open the file first. This doesn’t work:

json_file = json.load('test.json')

But this works:

f = open('test.json')
json_file = json.load(f)
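A slightly more idiomatic variant uses a with block, so the file is closed automatically. This is a sketch, not the answer's code. A temporary file stands in for test.json to keep the example self-contained:

```python
import json
import os
import tempfile

# Create a throwaway JSON file so the example runs anywhere.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as tmp:
    tmp.write('{"name": "test"}')

with open(path) as f:  # the file is closed when the block ends
    json_file = json.load(f)

os.remove(path)
print(json_file)  # {'name': 'test'}
```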




So I’ve followed this tutorial but it doesn’t seem to do anything. Simply nothing. It waits a few seconds and closes the program. What is wrong with this code?

import cv2
vidcap = cv2.VideoCapture('Compton.mp4')
success,image = vidcap.read()
count = 0
success = True
while success:
  success,image = vidcap.read()
  cv2.imwrite("frame%d.jpg" % count, image)     # save frame as JPEG file
  if cv2.waitKey(10) == 27:                     # exit if Escape is hit
    break
  count += 1

Also, in the comments it says that this limits the frames to 1000? Why?

EDIT: I tried doing success = True first but that didn’t help. It only created one image that was 0 bytes.

Answer 0



Download this video from here so we have the same video file for the test. Make sure the mp4 file is in the same directory as your Python code, and run the Python interpreter from that same directory.

Then modify the code: ditch waitKey, which wastes time, and without a window it cannot capture keyboard events anyway. We also print the success value to make sure the frames are being read successfully.

import cv2
vidcap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4')
success,image = vidcap.read()
count = 0
while success:
  cv2.imwrite("frame%d.jpg" % count, image)     # save frame as JPEG file      
  success,image = vidcap.read()
  print('Read a new frame: ', success)
  count += 1

How does that go?

Answer 1

To extend this question (and the answer by @user2700065) to a slightly different case: if you don't want to extract every frame but only one frame per second, you can seek before each read. A 1-minute video will then give 60 frames (images).

import argparse
import os

import cv2

def extractImages(pathIn, pathOut):
    count = 0
    vidcap = cv2.VideoCapture(pathIn)
    success = True
    while success:
        vidcap.set(cv2.CAP_PROP_POS_MSEC,(count*1000))    # added this line: jump to second `count`
        success,image = vidcap.read()
        print ('Read a new frame: ', success)
        if success:  # don't try to save an empty frame once the video has ended
            cv2.imwrite(os.path.join(pathOut, "frame%d.jpg" % count), image)     # save frame as JPEG file
        count = count + 1
    vidcap.release()

if __name__=="__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--pathIn", help="path to video")
    a.add_argument("--pathOut", help="path to images")
    args = a.parse_args()
    extractImages(args.pathIn, args.pathOut)
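Seeking with CAP_PROP_POS_MSEC depends on the video backend and may be slow or imprecise for some codecs. An alternative, which is an assumption and not part of this answer, is to read every frame and keep only every n-th one, with n derived from the frame rate. The arithmetic is plain Python; the helper names are hypothetical:

```python
def frame_step(fps, interval_s):
    """Frames between two kept frames; never less than 1."""
    return max(1, round(fps * interval_s))

def kept_indices(total_frames, fps, interval_s):
    """Indices of the frames to keep when sampling every interval_s seconds."""
    return list(range(0, total_frames, frame_step(fps, interval_s)))

# A 5-second clip at 30 fps, sampled once per second:
print(kept_indices(total_frames=150, fps=30, interval_s=1))  # [0, 30, 60, 90, 120]
```

In the OpenCV loop, fps could come from vidcap.get(cv2.CAP_PROP_FPS), and a frame would be written only when index % frame_step(fps, 1) == 0. The max(1, ...) guard covers a reported frame rate of 0.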

Answer 2

This is a tweak of the previous answer from @GShocked for Python 3.x. I would have posted it as a comment, but I don't have enough reputation.

import argparse
import os

import cv2

def extractImages(pathIn, pathOut):
    vidcap = cv2.VideoCapture(pathIn)
    count = 0
    success = True
    while success:
        success,image = vidcap.read()
        print ('Read a new frame: ', success)
        if success:  # the last read fails at the end of the video; skip writing it
            cv2.imwrite(os.path.join(pathOut, "frame%d.jpg" % count), image)     # save frame as JPEG file
            count += 1
    vidcap.release()

if __name__=="__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--pathIn", help="path to video")
    a.add_argument("--pathOut", help="path to images")
    args = a.parse_args()
    extractImages(args.pathIn, args.pathOut)

Answer 3

This is a function that converts a video in most formats into its individual frames. It works on Python 3 with OpenCV 3+.

import cv2
import time
import os

def video_to_frames(input_loc, output_loc):
    """Function to extract frames from input video file
    and save them as separate frames in an output directory.
    Args:
        input_loc: Input video file.
        output_loc: Output directory to save the frames.
    Returns:
        None
    """
    # Create the output directory if it does not exist yet
    os.makedirs(output_loc, exist_ok=True)
    # Log the time
    time_start = time.time()
    # Start capturing the feed
    cap = cv2.VideoCapture(input_loc)
    # Find the number of frames
    video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
    print ("Number of frames: ", video_length)
    count = 0
    print ("Converting video..\n")
    # Start converting the video
    while cap.isOpened():
        # Extract the frame
        ret, frame = cap.read()
        # Stop if no frame could be read
        if not ret:
            break
        # Write the results back to output location.
        cv2.imwrite(os.path.join(output_loc, "%05d.jpg" % (count+1)), frame)
        count = count + 1
        # If there are no more frames left
        if (count > (video_length-1)):
            break
    # Log the time again
    time_end = time.time()
    # Release the feed
    cap.release()
    # Print stats
    print ("Done extracting frames.\n%d frames extracted" % count)
    print ("It took %d seconds for conversion." % (time_end-time_start))

if __name__=="__main__":

    input_loc = '/path/to/video/00009.MTS'
    output_loc = '/path/to/output/frames/'
    video_to_frames(input_loc, output_loc)

It supports .mts as well as common formats like .mp4 and .avi. Tried and tested on .mts files; works like a charm.
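The output names are padded with zeros to five digits, so a plain alphabetical sort lists the frames in order. Here is a quick illustration of why that matters, comparing unpadded and padded names:

```python
numbers = [1, 2, 10]

plain = sorted("frame%d.jpg" % n for n in numbers)
padded = sorted("%05d.jpg" % n for n in numbers)

print(plain)   # ['frame1.jpg', 'frame10.jpg', 'frame2.jpg']  (10 sorts before 2)
print(padded)  # ['00001.jpg', '00002.jpg', '00010.jpg']      (frame order)
```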

Answer 4


After a lot of research on how to convert frames to video, I created this function; I hope it helps. It requires OpenCV:

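import cv2
import os
from os.path import isfile, join

def frames_to_video(inputpath,outputpath,fps):
    image_array = []
    files = [f for f in os.listdir(inputpath) if isfile(join(inputpath, f))]
    # sort numerically by the number in names like "frame12.jpg"
    files.sort(key = lambda x: int(x[5:-4]))
    for i in range(len(files)):
        img = cv2.imread(join(inputpath, files[i]))
        if i == 0:
            # the first frame sets the video size (width, height)
            size = (img.shape[1],img.shape[0])
        # VideoWriter expects every frame to have the same size
        img = cv2.resize(img,size)
        image_array.append(img)
    fourcc = cv2.VideoWriter_fourcc('D', 'I', 'V', 'X')
    out = cv2.VideoWriter(outputpath,fourcc, fps, size)
    for i in range(len(image_array)):
        out.write(image_array[i])
    out.release()

inputpath = 'folder path'
outpath =  'video file path/video.mp4'
fps = 29
frames_to_video(inputpath,outpath,fps)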

Change the value of fps (frames per second), the input folder path and the output file path to match your own local setup.
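The sort key int(x[5:-4]) assumes names with a five-character prefix and a four-character extension, such as frame12.jpg. A slightly more tolerant key extracts the first run of digits with a regular expression. frame_number is a hypothetical helper, not part of the answer:

```python
import re

def frame_number(filename):
    """Return the first run of digits in filename as an int."""
    match = re.search(r"\d+", filename)
    if match is None:
        raise ValueError("no frame number in %r" % filename)
    return int(match.group())

files = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]
files.sort(key=frame_number)
print(files)  # ['frame1.jpg', 'frame2.jpg', 'frame10.jpg']
```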

Answer 5


The previous answers lose the first frame. It is also nicer to store the images in a folder.

# create a folder to store extracted images
import os
folder = 'test'
os.makedirs(folder, exist_ok=True)
# use opencv to do the job
import cv2
print(cv2.__version__)  # my version is 3.1.0
vidcap = cv2.VideoCapture('test_video.mp4')
count = 0
while True:
    success,image = vidcap.read()
    if not success:
        break
    cv2.imwrite(os.path.join(folder,"frame{:d}.jpg".format(count)), image)     # save frame as JPEG file
    count += 1
vidcap.release()
print("{} images are extracted in {}.".format(count,folder))

By the way, you can check the frame rate in VLC: go to Window -> Media Information -> Codec Details.

Answer 6

This code extracts frames from the video and saves them in .jpg format.

import cv2
import os

# set video file path of input video with name and extension
vid = cv2.VideoCapture('VideoPath')

if not os.path.exists('images'):
    os.makedirs('images')

#for frame identity
index = 0
while True:
    # Extract images
    ret, frame = vid.read()
    # end of frames
    if not ret:
        break
    # Saves images
    name = './images/frame' + str(index) + '.jpg'
    print ('Creating...' + name)
    cv2.imwrite(name, frame)

    # next frame
    index += 1

vid.release()

Answer 7

I am using Python via Anaconda’s Spyder software. Using the original code listed in the question of this thread by @Gshocked, the code does not work (Python won’t read the mp4 file). So I downloaded OpenCV 3.2 and copied “opencv_ffmpeg320.dll” and “opencv_ffmpeg320_64.dll” from the “bin” folder. I pasted both of these dll files into Anaconda’s “Dlls” folder.

Anaconda also has a “pckgs” folder; I copied the entire “OpenCV 3.2” folder that I downloaded into that “pckgs” folder.

Finally, Anaconda has a “Library” folder which has a “bin” subfolder. I pasted the “opencv_ffmpeg320.dll” and “opencv_ffmpeg320_64.dll” files to that folder.

After closing and restarting Spyder, the code worked. I’m not sure which of the three methods worked, and I’m too lazy to go back and figure it out. But it works so, cheers!

Answer 8

This function extracts images from a video at 1 fps; in addition, it identifies the last frame and stops reading:

import cv2
import numpy as np

def extract_image_one_fps(video_source_path):

    vidcap = cv2.VideoCapture(video_source_path)
    count = 0
    success = True
    while success:
        vidcap.set(cv2.CAP_PROP_POS_MSEC,(count*1000))    # jump to second `count`
        success,image = vidcap.read()
        if not success:
            break

        ## Stop when last frame is identified: if the read returns the same
        ## frame again, it equals the previously saved (lossless) PNG
        image_last = cv2.imread("frame{}.png".format(count-1))
        if np.array_equal(image,image_last):
            break

        cv2.imwrite("frame%d.png" % count, image)     # save frame as PNG file
        print('{}.sec reading a new frame: {} '.format(count,success))
        count += 1
    vidcap.release()

Answer 9

The following script extracts a frame every half second from all videos in a folder (works on Python 3.7).

import cv2
import os

def getFrame(vidcap, sec, count):
    vidcap.set(cv2.CAP_PROP_POS_MSEC, sec*1000)  # jump to `sec` seconds
    hasFrames,image = vidcap.read()
    if hasFrames:
        cv2.imwrite("D:/Images/Frames/image"+str(count)+".jpg", image) # Save frame as JPG file
    return hasFrames

listing = os.listdir(r'D:/Images/AllVideos')
frameRate = 0.5 # Change this number to 1 for each 1 second
count = 1  # shared across videos so image names do not collide
for vid in listing:
    vidcap = cv2.VideoCapture(os.path.join(r'D:/Images/AllVideos', vid))
    sec = 0
    success = getFrame(vidcap, sec, count)
    while success:
        count = count + 1
        sec = round(sec + frameRate, 2)
        success = getFrame(vidcap, sec, count)
    vidcap.release()
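The round(sec, 2) step is needed because repeatedly adding a float step accumulates binary rounding error. One alternative is to work in whole milliseconds, which is also the unit CAP_PROP_POS_MSEC expects. This is a sketch, and the helper name is hypothetical:

```python
acc = 0.0
for _ in range(3):
    acc += 0.1
print(acc)  # 0.30000000000000004, not 0.3

def seek_positions_ms(duration_s, interval_s):
    """Integer millisecond positions from 0 up to (but excluding) the duration."""
    step_ms = int(round(interval_s * 1000))
    return list(range(0, int(duration_s * 1000), step_ms))

print(seek_positions_ms(2, 0.5))  # [0, 500, 1000, 1500]
```

Each value could then be passed directly to vidcap.set(cv2.CAP_PROP_POS_MSEC, ms). The duration would have to be estimated, for example from the frame count divided by the frame rate.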




I searched the official documentation to find the difference between json.dump() and json.dumps() in Python. It is clear that they are related to writing files.
But what is the detailed difference between them, and in which situations does one have an advantage over the other?

Answer 0


There isn’t much to add beyond what the docs say. If you want to dump the JSON into a file, a socket or similar, go with dump(). If you only need it as a string (for printing, parsing or whatever), use dumps() (dump string).

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.


json.dump(): Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object).

If ensure_ascii is False, some chunks written to fp may be unicode instances.

json.dumps(): Serialize obj to a JSON formatted str.

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

Answer 1


The functions with an s work with strings (loads takes a string, dumps returns one). The others work with file streams.

Answer 2


In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) (without the ‘s’), no new memory is used for the whole string, as the data is dumped in chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.
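You can observe the chunked behaviour of json.dump with a small file-like object that records every write() call. This is a sketch, and ChunkRecorder is a hypothetical helper. The joined chunks are identical to what json.dumps returns:

```python
import json

class ChunkRecorder:
    """Minimal file-like object: json.dump only needs a .write() method."""
    def __init__(self):
        self.chunks = []

    def write(self, s):
        self.chunks.append(s)

data = {"numbers": [1, 2, 3], "name": "test"}
recorder = ChunkRecorder()
json.dump(data, recorder)

print(len(recorder.chunks) > 1)                      # True: written piece by piece
print("".join(recorder.chunks) == json.dumps(data))  # True: same text as dumps
```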

Answer 3

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8).

dumps, on the other hand, with ensure_ascii=False can produce a str or a unicode instance depending only on what types you used for strings, as the documentation says:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into a file without checking which type was returned and possibly playing with unicode.encode.

This of course is no longer a valid concern in Python 3, since the 8-bit/Unicode confusion no longer exists.

As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline-delimited JSON documents from a single file.
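A short sketch of that limitation with two newline-delimited documents (the data is made up). json.load fails on the second document, while calling json.loads line by line works:

```python
import io
import json

jsonl = '{"id": 1}\n{"id": 2}\n'  # two JSON documents, one per line

# json.load treats the whole input as a single document and rejects the extra data.
try:
    json.load(io.StringIO(jsonl))
    failed = False
except json.JSONDecodeError:
    failed = True

# Parsing each non-empty line separately with json.loads works.
records = [json.loads(line) for line in io.StringIO(jsonl) if line.strip()]
print(failed, records)  # True [{'id': 1}, {'id': 2}]
```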



I want to check my environment for the existence of a variable, say "FOO", in Python. For this purpose, I am using the os standard library. After reading the library’s documentation, I have figured out 2 ways to achieve my goal:

Method 1:

if "FOO" in os.environ:

Method 2:

if os.getenv("FOO") is not None:

I would like to know which method, if either, is a good/preferred conditional and why.

Answer 0


Use the first; it directly tries to check if something is defined in environ. Though the second form works equally well, it’s lacking semantically since you get a value back if it exists and only use it for a comparison.

You’re trying to see if something is present in environ, why would you get just to compare it and then toss it away?

That’s exactly what getenv does:

Get an environment variable, return None if it doesn’t exist. The optional second argument can specify an alternate default.

(this also means your check could just be if getenv("FOO"))

You don’t want to get it; you want to check for its existence.

Either way, getenv is just a wrapper around environ.get, but you don’t see people checking for membership in mappings with:

from os import environ
if environ.get('FOO') is not None:

To summarize, use:

if "FOO" in os.environ:

if you just want to check for existence, and use getenv("FOO") if you actually want to do something with the value you might get.
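Here are the forms side by side in a small sketch. It sets FOO for the demo and uses DEMO_UNSET_VAR, a hypothetical name that the sketch makes sure is unset:

```python
import os

# Demo setup: one variable set, one guaranteed unset.
os.environ["FOO"] = "value"
os.environ.pop("DEMO_UNSET_VAR", None)

exists = "FOO" in os.environ                            # existence check only
value = os.getenv("FOO")                                # you need the value itself
missing = os.getenv("DEMO_UNSET_VAR")                   # None when unset
fallback = os.environ.get("DEMO_UNSET_VAR", "default")  # same as getenv with a default

print(exists, value, missing, fallback)  # True value None default
```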

Answer 1




There is a case for either solution, depending on what you want to do conditional on the existence of the environment variable.

Case 1

When you want to take different actions purely based on the existence of the environment variable, without caring for its value, the first solution is the best practice. It succinctly describes what you test for: is ‘FOO’ in the list of environment variables.

if 'KITTEN_ALLERGY' in os.environ:

Case 2

When you want to set a default value if the variable is not defined in the environment, the second solution is actually useful, though not in the form you wrote it:

server = os.getenv('MY_CAT_STREAMS', 'youtube.com')

or perhaps

server = os.environ.get('MY_CAT_STREAMS', 'youtube.com')

Note that if you have several options for your application, you might want to look into ChainMap, which lets you merge multiple dicts based on keys. There is an example of this in the ChainMap documentation:

combined = ChainMap(command_line_args, os.environ, defaults)
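A self-contained sketch of that lookup order follows. The MYAPP_* names are hypothetical, and a plain dict stands in for parsed command-line options. Lookups search the maps from left to right:

```python
import os
from collections import ChainMap

defaults = {'MYAPP_COLOR': 'red', 'MYAPP_USER': 'guest', 'MYAPP_MODE': 'fast'}
command_line_args = {'MYAPP_USER': 'alice'}  # stand-in for parsed CLI options

# Simulate the environment for this demo.
os.environ['MYAPP_COLOR'] = 'blue'
os.environ.pop('MYAPP_USER', None)
os.environ.pop('MYAPP_MODE', None)

# Command line first, then the environment, then the defaults.
combined = ChainMap(command_line_args, os.environ, defaults)
print(combined['MYAPP_USER'])   # alice (from the command line)
print(combined['MYAPP_COLOR'])  # blue  (from the environment)
print(combined['MYAPP_MODE'])   # fast  (from the defaults)
```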

Answer 2


To be on the safe side, use

os.getenv('FOO') or 'bar'

A corner case with the above answers is when the environment variable is set but empty.

For this special case you get:

print(os.getenv('FOO', 'bar'))
# prints new line - though you expected `bar`


if "FOO" in os.environ:
    print("FOO is here")
# prints FOO is here - although its value is empty

To avoid this, just use or:

os.getenv('FOO') or 'bar'

Then you get

print(os.getenv('FOO') or 'bar')
# bar

When do we have empty environment variables?

You forgot to set the value in the .env file

# .env

or exported as

$ export FOO=

or forgot to set it in settings.py

# settings.py
os.environ['FOO'] = ''

Update: if in doubt, check out these one-liners:

>>> import os; os.environ['FOO'] = ''; print(os.getenv('FOO', 'bar'))

$ FOO= python -c "import os; print(os.getenv('FOO', 'bar'))"
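Here is the same corner case as a self-contained sketch. It sets FOO to an empty string, as `export FOO=` would:

```python
import os

os.environ["FOO"] = ""  # set but empty

with_default = os.getenv("FOO", "bar")  # '': the default is not used
present = "FOO" in os.environ           # True, although the value is empty
with_or = os.getenv("FOO") or "bar"     # 'bar': an empty value counts as missing

print(repr(with_default), present, with_or)  # '' True bar
```

Keep in mind that `or` treats an empty value as missing. That is exactly the point here, but it is the wrong choice if an empty string is a meaningful setting for your application.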

Answer 3


In case you want to check if multiple env variables are not set, you can do the following:

import os

for var in ['VAR_ONE', 'VAR_TWO']:  # example names: list the variables you require
    if var not in os.environ:
        raise EnvironmentError("Failed because {} is not set.".format(var))
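A variant that reports every missing variable at once, instead of stopping at the first one, might look like this. It is a sketch, and the DEMO_* names are hypothetical:

```python
import os

REQUIRED = ["DEMO_DB_HOST", "DEMO_DB_USER"]

def check_env(required):
    """Raise one error that lists every missing variable."""
    missing = [var for var in required if var not in os.environ]
    if missing:
        raise EnvironmentError("Missing environment variables: " + ", ".join(missing))

# Demo setup: one variable present, one absent.
os.environ["DEMO_DB_HOST"] = "localhost"
os.environ.pop("DEMO_DB_USER", None)

message = None
try:
    check_env(REQUIRED)
except EnvironmentError as exc:
    message = str(exc)
print(message)  # Missing environment variables: DEMO_DB_USER
```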

Answer 4


My comment might not be relevant to the tags given. However, my search led me to this page. I was looking for a similar check in R, and I came up with the following with the help of @hugovdbeg's post. I hope it will be helpful for anyone looking for a similar solution in R.

'USERNAME' %in% names(Sys.getenv())

Class method generates "TypeError: … got multiple values for keyword argument …"

If I define a class method with a keyword argument thus:

class foo(object):
  def foodo(thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

calling the method generates a TypeError:

myfoo = foo()
myfoo.foodo(thing="something")

TypeError: foodo() got multiple values for keyword argument 'thing'

What’s going on?

Answer 0


The problem is that the first argument passed to (instance) methods in Python is always the instance on which the method is called (a reference to it, not a copy), conventionally named self. If the class is declared thus:

class foo(object):
  def foodo(self, thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

it behaves as expected.


Without self as the first parameter, when myfoo.foodo(thing="something") is executed, the foodo method is called with arguments (myfoo, thing="something"). The instance myfoo is then assigned to thing (since thing is the first declared parameter), but Python also attempts to assign "something" to thing, hence the exception.

To demonstrate, try running this with the original code:

myfoo.foodo("something")
print
print myfoo

You’ll get output like:

<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>

You can see that ‘thing’ has been assigned a reference to the instance ‘myfoo’ of the class ‘foo’. This section of the docs explains how function arguments work a bit more.
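Here is the same behaviour as a Python 3 sketch with a renamed class, Foo. Note that the wording of Python 3's message differs slightly: it says "argument" instead of "keyword argument".

```python
class Foo:
    def broken(thing=None, thong='not underwear'):
        # No `self`: Python still passes the instance first, so it lands in `thing`.
        return thing, thong

    def fixed(self, thing=None, thong='not underwear'):
        return thing, thong

myfoo = Foo()
print(myfoo.broken()[0] is myfoo)      # True: the instance was bound to `thing`
print(myfoo.fixed(thing="something"))  # ('something', 'not underwear')

error = None
try:
    myfoo.broken(thing="something")
except TypeError as exc:
    error = str(exc)
print(error)  # ends with: got multiple values for argument 'thing'
```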

Answer 1

Thanks for the instructive posts. I’d just like to note that if you’re getting “TypeError: foodo() got multiple values for keyword argument ‘thing’”, it may also be that you’re mistakenly passing self as an argument when calling the method (probably because you copied the line from the class declaration; it’s a common error when one’s in a hurry).
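A minimal Python 3 sketch of that mistake (the class and method names are made up). Calling through self. already supplies the instance, so the explicit self lands in the next parameter:

```python
class Foo:
    def foodo(self, thing=None, thong='not underwear'):
        return thing, thong

    def call_wrong(self):
        # `self` is passed twice: implicitly via `self.` and explicitly as an
        # argument, so the explicit one is bound to `thing` and clashes with thing="x".
        return self.foodo(self, thing="x")

    def call_right(self):
        return self.foodo(thing="x")

obj = Foo()
print(obj.call_right())  # ('x', 'not underwear')

error = None
try:
    obj.call_wrong()
except TypeError as exc:
    error = str(exc)
print(error)  # ends with: got multiple values for argument 'thing'
```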

Answer 2


This might be obvious, but it might help someone who has never seen it before. This also happens for regular functions if you mistakenly assign a parameter by position and explicitly by name.

>>> def foodo(thing=None, thong='not underwear'):
...     print thing if thing else "nothing"
...     print 'a thong is',thong
>>> foodo('something', thing='everything')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foodo() got multiple values for keyword argument 'thing'
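A short Python 3 sketch of the ways to fix such a call (foodo returns its arguments here instead of printing, and 'silk' is just an illustrative value):

```python
def foodo(thing=None, thong='not underwear'):
    return thing, thong

# Fails: 'something' fills `thing` by position, then thing= supplies it again.
try:
    foodo('something', thing='everything')
except TypeError as exc:
    print('TypeError:', exc)

# Works: every parameter receives exactly one value.
print(foodo('something'))                 # ('something', 'not underwear')
print(foodo(thing='everything'))          # ('everything', 'not underwear')
print(foodo('something', thong='silk'))   # ('something', 'silk')
```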

Answer 3

Just add the staticmethod decorator to the method and the problem is fixed (a static method does not receive the instance at all):

class foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        print thing if thing else "nothing" 
        print 'a thong is',thong
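A Python 3 sketch of this fix, returning the values instead of printing them. Since a static method gets no access to the instance, this only fits when foodo does not need self:

```python
class Foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        # No instance is passed to a staticmethod, so `thing` is free.
        return (thing if thing else "nothing"), 'a thong is ' + thong


myfoo = Foo()
print(myfoo.foodo(thing="something"))  # ('something', 'a thong is not underwear')
print(Foo.foodo())                     # ('nothing', 'a thong is not underwear')
```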

Answer 4



I want to add one more answer:

It happens when a call passes a value for a parameter by position and then passes the same parameter again as a keyword argument.

There is a difference between parameters and arguments; you can read about it in detail here: Arguments and Parameter in python

def hello(a,b=1, *args):
   print(a, b, *args)

hello(1, 2, 3, 4,a=12)

Since we have three parameters:

a is a positional parameter

b=1 is a keyword (default) parameter

*args is a variable-length parameter

a is a positional parameter, so its value must be supplied in positional order; here order matters. In the call we pass the argument 1 in the position of a, and then we also provide a value for a as a keyword argument. Now a has two values:

one is the positional value: a=1

the second is the keyword value: a=12


We have to change hello(1, 2, 3, 4,a=12) to hello(1, 2, 3, 4,12), so that a gets only one value, the positional 1; b gets 2; and the remaining values (3, 4, 12) go into *args (the variable-length parameter).
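A Python 3 sketch of the failing call and the corrected one (hello returns its arguments here instead of printing them):

```python
def hello(a, b=1, *args):
    return a, b, args

try:
    hello(1, 2, 3, 4, a=12)      # 1 already fills `a` by position
except TypeError as exc:
    print('TypeError:', exc)     # ... got multiple values for argument 'a'

print(hello(1, 2, 3, 4, 12))     # (1, 2, (3, 4, 12))
```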

Additional information:

If we want *args to get 2, 3, 4, a to get 1, and b to get 12, we can make b keyword-only by placing it after *args (Python 3):

def hello(a, *args, b=1):
    pass

hello(1, 2, 3, 4, b=12)

Something more:

def hello(a,*c,b=1,**kwargs):


Output:


(2, 1, 2, 8, 9)


{'c': 12}
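The body of that last hello and the call that produced the output are not shown above. Here is a sketch that returns its four values instead of printing them. The call was chosen (an assumption) so that c and kwargs match the output above. Because *c only collects positional arguments, a keyword argument spelled c cannot go to it and ends up in **kwargs:

```python
def hello(a, *c, b=1, **kwargs):
    # a: first positional argument
    # c: tuple of the remaining positional arguments
    # b: keyword-only, because it comes after *c
    # kwargs: every other keyword argument, including one spelled "c"
    return a, c, b, kwargs


print(hello(1, 2, 1, 2, 8, 9, c=12))
# (1, (2, 1, 2, 8, 9), 1, {'c': 12})
```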

Answer 5


This error can also happen if you pass keyword arguments (for example by unpacking a dict with **) where one of the keys has the same name as a positional parameter.

>>> class Foo():
...     def bar(self, bar, **kwargs):
...             print(bar)
>>> kwgs = {"bar":"Barred", "jokes":"Another key word argument"}
>>> myfoo = Foo()
>>> myfoo.bar("fire", **kwgs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'bar'

"fire" is bound to the bar parameter by position, and yet kwgs contains another value for bar.

You would have to remove the keyword argument from the kwargs before passing it to the method.
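A Python 3 sketch of that fix. It filters the clashing key out of a copy of the dict (kwgs.pop("bar") would also work, but it modifies the original dict). Here bar returns its arguments instead of printing:

```python
class Foo(object):
    def bar(self, bar, **kwargs):
        return bar, kwargs


kwgs = {"bar": "Barred", "jokes": "Another key word argument"}
myfoo = Foo()

# Build a copy without the clashing key, so kwgs itself is left untouched.
safe_kwgs = {key: value for key, value in kwgs.items() if key != "bar"}
print(myfoo.bar("fire", **safe_kwgs))
# ('fire', {'jokes': 'Another key word argument'})
```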

Answer 6

This can also happen in Django if you use jQuery AJAX to call a URL that resolves to a view function without a request parameter:

  url: '{{ url_to_myfunc }}',

def myfunc(foo, bar):
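A plain-Python sketch of why this fails, with no Django involved. call_view is a hypothetical, simplified stand-in for the way Django calls a view (request first, then URL keyword parameters), and the sketch assumes foo and bar arrive as captured URL parameters. The fix is to make request the first parameter of the view:

```python
def call_view(view, request, **url_kwargs):
    # Hypothetical, simplified stand-in for how Django calls a view:
    # the request object is always passed first, positionally.
    return view(request, **url_kwargs)


def myfunc_broken(foo, bar):     # no `request` parameter
    return foo, bar


def myfunc(request, foo, bar):   # `request` first, as Django expects
    return foo, bar


fake_request = object()          # placeholder for a real HttpRequest

try:
    call_view(myfunc_broken, fake_request, foo="f", bar="b")
except TypeError as exc:
    # the request landed in `foo`, then foo="f" arrived as a keyword
    print("TypeError:", exc)

print(call_view(myfunc, fake_request, foo="f", bar="b"))   # ('f', 'b')
```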




I'm using NLTK to perform k-means clustering on my text file, in which each line is treated as a document. For example, my text file looks something like this:

belong finger death punch
hasty
mike hasty walls jericho
jägermeister rules
rules bands follow performing jägermeister stage

Now the demo code I’m trying to run is this:

import sys

import numpy
from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
import nltk.corpus
from nltk import decorators
import nltk.stem

stemmer_func = nltk.stem.EnglishStemmer().stem
stopwords = set(nltk.corpus.stopwords.words('english'))

@decorators.memoize
def normalize_word(word):
    return stemmer_func(word.lower())

def get_words(titles):
    words = set()
    for title in titles:
        for word in title.split():
            words.add(normalize_word(word))
    return list(words)

def vectorspaced(title):
    title_components = [normalize_word(word) for word in title.split()]
    return numpy.array([
        word in title_components and not word in stopwords
        for word in words], numpy.short)

if __name__ == '__main__':

    filename = 'example.txt'
    if len(sys.argv) == 2:
        filename = sys.argv[1]

    with open(filename) as title_file:

        job_titles = [line.strip() for line in title_file.readlines()]

        words = get_words(job_titles)

        # cluster = KMeansClusterer(5, euclidean_distance)
        cluster = GAAClusterer(5)
        cluster.cluster([vectorspaced(title) for title in job_titles if title])

        # NOTE: This is inefficient, cluster.classify should really just be
        # called when you are classifying previously unseen examples!
        classified_examples = [
                cluster.classify(vectorspaced(title)) for title in job_titles
        ]

        for cluster_id, title in sorted(zip(classified_examples, job_titles)):
            print cluster_id, title

(which can also be found here)

The error I receive is this:

Traceback (most recent call last):
  File "cluster_example.py", line 40, in <module>
    words = get_words(job_titles)
  File "cluster_example.py", line 20, in get_words
  File "", line 1, in
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
    result = func(*args)
  File "cluster_example.py", line 14, in normalize_word
    return stemmer_func(word.lower())
  File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.py", line 694, in stem
    word = (word.replace(u"\u2019", u"\x27")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

What is happening here?

Answer 0


The file is being read as a bunch of strs (byte strings), but it should be unicodes. Python 2 tries to convert them implicitly using the default ASCII codec, which fails on the first non-ASCII byte. Change:

job_titles = [line.strip() for line in title_file.readlines()]

to explicitly decode the strs to unicode (here assuming UTF-8):

job_titles = [line.decode('utf-8').strip() for line in title_file.readlines()]

It could also be solved by importing the codecs module and using codecs.open rather than the built-in open.
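A Python 3 sketch of decoding at open time. It uses io.open, which, like codecs.open, takes an encoding argument, works in Python 2.6+ and is the same as the built-in open in Python 3. The file and its contents are made up for the example:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    # Sample data: UTF-8 bytes that include a non-ASCII character.
    with open(path, "wb") as f:
        f.write("jägermeister rules\nhasty\n".encode("utf-8"))

    # Decode explicitly while reading instead of relying on a default codec.
    with io.open(path, "r", encoding="utf-8") as title_file:
        job_titles = [line.strip() for line in title_file]

print(ascii(job_titles))  # ['j\xe4germeister rules', 'hasty']
```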

Answer 1


This works fine for me.

f = open(file_path, 'r+', encoding="utf-8")

You can pass the encoding argument to make sure the file is decoded as UTF-8.

Note: this works in Python 3. I did not try it in Python 2.7, whose built-in open() does not accept an encoding argument.

Answer 2


For me, the problem was the terminal encoding. Adding this line to ~/.bashrc solved it:

export LC_CTYPE=en_US.UTF-8

Don’t forget to reload .bashrc afterwards:

source ~/.bashrc

Answer 3


You can try this also:

import sys

Answer 4

On Ubuntu 18.04 with Python 3.6, I solved the problem by doing both of the following:

with open(filename, encoding="utf-8") as lines:

and, if you are running the tool from the command line:

export LC_ALL=C.UTF-8

Note that if you are on Python 2.7 you have to handle this differently. First you have to set the default encoding:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Then, to load the file, you must use io.open to set the encoding:

import io
with io.open(filename, 'r', encoding='utf-8') as lines:

You still need to export the environment variable:

export LC_ALL=C.UTF-8

Answer 5


I got this error when trying to install a python package in a Docker container. For me, the issue was that the docker image did not have a locale configured. Adding the following code to the Dockerfile solved the problem for me.

# Avoid ascii errors when reading files in Python
RUN apt-get install -y locales && locale-gen en_US.UTF-8

Answer 6

To find any and all non-ASCII bytes that might be causing a Unicode error, use the following command:

grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx

I found mine in:

/etc/letsencrypt/options-ssl-nginx.conf:        # The following CSP directives don't use default-src as 

Using shed, I found the offending sequence: the bytes C2 A0, a UTF-8-encoded non-breaking space. It turned out to be an editor mistake.

00008099:     C2  194 302 11000010
00008100:     A0  160 240 10100000
00008101:  d  64  100 144 01100100
00008102:  e  65  101 145 01100101
00008103:  f  66  102 146 01100110
00008104:  a  61  097 141 01100001
00008105:  u  75  117 165 01110101
00008106:  l  6C  108 154 01101100
00008107:  t  74  116 164 01110100
00008108:  -  2D  045 055 00101101
00008109:  s  73  115 163 01110011
00008110:  r  72  114 162 01110010
00008111:  c  63  099 143 01100011
00008112:     C2  194 302 11000010
00008113:     A0  160 240 10100000
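A rough Python 3 equivalent of that grep for a single buffer of bytes. find_non_ascii is a hypothetical helper and the sample bytes are made up. For a real file, pass it open(path, 'rb').read():

```python
def find_non_ascii(data):
    """Yield (offset, byte) for every byte outside the 7-bit ASCII range."""
    for offset, byte in enumerate(data):   # iterating bytes yields ints
        if byte > 0x7F:
            yield offset, byte


# Sample: a UTF-8 non-breaking space (C2 A0) before "default-src".
sample = b"#\xc2\xa0default-src"
for offset, byte in find_non_ascii(sample):
    print("offset {}: 0x{:02X}".format(offset, byte))
# offset 1: 0xC2
# offset 2: 0xA0
```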

Answer 7


You can try this before using the job_titles strings (job_titles is a list, so decode each element):

job_titles = [unicode(title, 'utf-8') for title in job_titles]

Answer 8

For Python 3, the default encoding is "utf-8". For Python 2, the csv module documentation (https://docs.python.org/2/library/csv.html#csv-examples) suggests the following steps in case of any problem:

  1. Create a function

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
  2. Then use the function inside the reader, for example:

    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data))
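In Python 3 the csv module works on str, so the encoder wrapper above is not needed: decoding happens when the file is opened. A sketch with made-up sample data, where io.StringIO stands in for a file opened with encoding="utf-8" and newline="" (the Python 3 csv docs recommend newline=""):

```python
import csv
import io

# For a real file: open(path, newline="", encoding="utf-8")
f = io.StringIO("name,band\nmike,hasty walls jericho\njägermeister,rules\n",
                newline="")
rows = list(csv.reader(f))
print(ascii(rows[2]))  # ['j\xe4germeister', 'rules']
```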

Answer 9


Python 3.x or higher:

  1. Load the file as a byte stream and decode each line:

    def load_body(path='website/index.html'):  # wrapper function name is hypothetical
        body = ''
        with open(path, 'rb') as f:
            for lines in f:
                decodedLine = lines.decode('utf-8')
                body = body + decodedLine.strip()
        return body

  2. Use a global setting:

    import io
    import sys
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

Answer 10

Use open(fn, 'rb').read().decode('utf-8') instead of just open(fn).read()
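A Python 3 sketch of the difference: the file is opened in binary mode and the bytes are decoded explicitly, so the platform's default text encoding never comes into play. The temporary file and its contents are made up for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "index.html")
    with open(fn, "wb") as f:
        f.write("<p>jägermeister</p>".encode("utf-8"))

    # Read raw bytes, then decode them explicitly as UTF-8.
    with open(fn, "rb") as f:
        text = f.read().decode("utf-8")

print(ascii(text))  # '<p>j\xe4germeister</p>'
```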

Redirect stdout to "nothing" in python

I have a large project consisting of a large number of modules, each printing something to standard output. As the project has grown, there are now a large number of print statements writing a lot to stdout, which has made the program considerably slower.

So, I now want to decide at runtime whether or not to print anything to stdout. I cannot make changes in the modules as there are plenty of them. (I know I can redirect stdout to a file, but even this is considerably slow.)

So my question is: how do I redirect stdout to nothing, i.e., how do I make the print statement do nothing?

# I want to do something like this.
sys.stdout = None         # this obviously will give an error as a NoneType object does not have any write method.

Currently the only idea I have is to make a class which has a write method (which does nothing) and redirect the stdout to an instance of this class.

class DontPrint(object):
    def write(*args): pass

dp = DontPrint()
sys.stdout = dp

Is there an inbuilt mechanism in python for this? Or is there something better than this?

Answer 0


Cross-platform:

import os
import sys
f = open(os.devnull, 'w')
sys.stdout = f

On Windows:

f = open('nul', 'w')
sys.stdout = f

On Linux:

f = open('/dev/null', 'w')
sys.stdout = f

Answer 1


A nice way to do this is to create a small context manager that you wrap your prints in. You then just use it in a with-statement to silence all output.

Python 2:

import os
import sys
from contextlib import contextmanager

@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target

with silence_stdout():
    print("will not print")

print("this will print")

Python 3.4+:

Python 3.4 has a context manager like this built in, so you can simply use contextlib like this:

import contextlib

with contextlib.redirect_stdout(None):
    print("will not print")

print("this will print")

Running this code only prints the second line of output, not the first:

$ python test.py
this will print

This works cross-platform (Windows, Linux and macOS), and IMHO it is cleaner than the other answers.
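A Python 3 sketch that checks the context manager's behaviour. contextlib.redirect_stdout into an io.StringIO records what actually reaches stdout, and the second block shows that the try/finally restores the original stream even when the body raises:

```python
import io
import os
import sys
from contextlib import contextmanager, redirect_stdout


@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target  # restored even if the with-body raises


captured = io.StringIO()
with redirect_stdout(captured):          # record what really reaches stdout
    with silence_stdout():
        print("will not print")
    try:
        with silence_stdout():
            raise ValueError("boom")
    except ValueError:
        pass
    print("this will print")

print(repr(captured.getvalue()))  # 'this will print\n'
```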

Answer 2

If you’re in python 3.4 or higher, there’s a simple and safe solution using the standard library:

import contextlib

with contextlib.redirect_stdout(None):
  print("This won't print!")

Answer 3


At least on my system, it appears that writing to os.devnull is about 5x faster than writing to a DontPrint-style class. This benchmark:

import os
import sys
import datetime

ITER = 10000000
def printlots(out, it, st="abcdefghijklmnopqrstuvwxyz1234567890"):
   temp = sys.stdout
   sys.stdout = out
   i = 0
   start_t = datetime.datetime.now()
   while i < it:
      print st
      i = i+1
   end_t = datetime.datetime.now()
   sys.stdout = temp
   print out, "\n   took", end_t - start_t, "for", it, "iterations"

class devnull():
   def write(*args):
      pass

printlots(open(os.devnull, 'wb'), ITER)
printlots(devnull(), ITER)

gave the following output:

<open file '/dev/null', mode 'wb' at 0x7f2b747044b0> 
   took 0:00:02.074853 for 10000000 iterations
<__main__.devnull instance at 0x7f2b746bae18> 
   took 0:00:09.933056 for 10000000 iterations

Answer 4


If you’re in a Unix environment (Linux included), you can redirect output to /dev/null:

python myprogram.py > /dev/null

And for Windows:

python myprogram.py > nul

Answer 5


How about this:

from contextlib import ExitStack, redirect_stdout
import os

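with ExitStack() as stack:
    if should_hide_output():
        null_stream = open(os.devnull, "w")
        stack.enter_context(null_stream)  # closes the file when the block exits
        stack.enter_context(redirect_stdout(null_stream))
    noisy_function()  # hypothetical: the command whose output you want to hide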

This uses the features in the contextlib module to hide the output of whatever command you are trying to run, depending on the result of should_hide_output(), and then restores the output behavior after that function is done running.

If you want to hide standard error output, then import redirect_stderr from contextlib and add a line saying stack.enter_context(redirect_stderr(null_stream)).

The main downside is that this only works in Python 3.4 and later.
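A self-contained sketch of the same ExitStack pattern. run_maybe_quietly and noisy_function are hypothetical names, and the hide flag stands in for should_hide_output():

```python
import io
import os
from contextlib import ExitStack, redirect_stdout


def run_maybe_quietly(func, hide):
    """Call func(), discarding what it prints to stdout when hide is true."""
    with ExitStack() as stack:
        if hide:
            # enter_context returns the opened file and closes it on exit.
            null_stream = stack.enter_context(open(os.devnull, "w"))
            stack.enter_context(redirect_stdout(null_stream))
        return func()


def noisy_function():
    # Hypothetical stand-in for the command whose output you want to hide.
    print("noise")
    return 42


captured = io.StringIO()
with redirect_stdout(captured):  # record what actually reaches stdout
    run_maybe_quietly(noisy_function, hide=True)
    run_maybe_quietly(noisy_function, hide=False)

print(repr(captured.getvalue()))  # 'noise\n'
```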

Answer 6


Your class will work just fine, as long as the method is named write() in lowercase. Just make sure you save a copy of sys.stdout in another variable so you can restore it.

If you're on *NIX, you can do sys.stdout = open('/dev/null', 'w'), but this is less portable than rolling your own class.

Answer 7


You can just mock it.

import sys
import mock  # third-party package; on Python 3.3+ use: from unittest import mock

sys.stdout = mock.MagicMock()
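With the standard library's unittest.mock (Python 3.3+), mock.patch can swap sys.stdout only for the duration of a block and restore it automatically. A sketch:

```python
import sys
from unittest import mock

real_stdout = sys.stdout
with mock.patch("sys.stdout", new=mock.MagicMock()) as fake_stdout:
    print("swallowed")          # goes to the MagicMock's write() method

# patch() puts the original stream back when the block exits
print(sys.stdout is real_stdout)                                    # True
print(mock.call("swallowed") in fake_stdout.write.call_args_list)   # True
```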

Answer 8



Why don’t you try this?


Answer 9

sys.stdout = None

This is fine for print() calls, but it raises an error if you call a method of sys.stdout directly, e.g. sys.stdout.write().

There is a note in docs:

Under some conditions stdin, stdout and stderr as well as the original values __stdin__, __stdout__ and __stderr__ can be None. It is usually the case for Windows GUI apps that aren't connected to a console and Python apps started with pythonw.
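A Python 3 sketch of both behaviours: print() silently returns when sys.stdout is None, while a direct write() fails. The original stream is restored in a finally block:

```python
import sys

saved = sys.stdout
sys.stdout = None
try:
    print("silently dropped")        # print() does nothing when sys.stdout is None
    try:
        sys.stdout.write("boom")     # but None has no write() method
        write_error = None
    except AttributeError as exc:
        write_error = exc
finally:
    sys.stdout = saved               # always restore the real stream

print(type(write_error).__name__)    # AttributeError
```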

Answer 10

Supplement to iFreilicht's answer; it works for both Python 2 and 3.

import sys

class NonWritable:
    def write(self, *args, **kwargs):
        pass

class StdoutIgnore:
    def __enter__(self):
        self.stdout_saved = sys.stdout
        sys.stdout = NonWritable()
        return self

    def __exit__(self, *args):
        sys.stdout = self.stdout_saved

with StdoutIgnore():
    print("This won't print!")



In my program, user inputs number n, and then inputs n number of strings, which get stored in a list.

I need to code such that if a certain list index exists, then run a function.

This is made more complicated by the fact that I have nested if statements about len(my_list).

Here’s a simplified version of what I have now, which isn’t working:

n = input ("Define number of actors: ")

count = 0

nams = []

while count < n:
    count = count + 1
    print "Define name for actor ", count, ":"
    name = raw_input ()
    nams.append(name)

if nams[2]: #I am trying to say 'if nams[2] exists, do something depending on len(nams)
    if len(nams) > 3:
        pass  # do something
    if len(nams) > 4:
        pass  # do something else

if nams[3]:
    pass  # etc.

Answer 0


Could it be more useful for you to use the length of the list, len(nams), to inform your decision rather than checking nams[i] for each possible length?

Answer 1




I need to code such that if a certain list index exists, then run a function.

This is the perfect use for a try block:


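the_list = [1, 2, 3, 4, 5]  # hypothetical example list
try:
    print(the_list[5])      # hypothetical index to check
except IndexError:
    print('sorry, no 5')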

# Note: this only is a valid test in this context 
# with absolute (ie, positive) index
# a relative index is only showing you that a value can be returned
# from that relative index from the end of the list...

However, by definition, all items in a Python list between 0 and len(the_list)-1 exist (i.e., there is no need for try/except if you know that 0 <= index < len(the_list)).

You can use enumerate if you want the indexes between 0 and the last element:


for i, name in enumerate(names):
    print(str(i) + ' ' + name)
    if i in (3, 4):
        # do your thing with the index 'i' or value 'name' for each item...
        pass

If you are looking for some defined 'index' though, I think you are asking the wrong question. Perhaps you should consider using a mapping container (such as a dict) instead of a sequence container (such as a list). You could rewrite your code like this:

def do_something(name):
    print('some thing 1 done with ' + name)

def do_something_else(name):
    print('something 2 done with ' + name)        

def default(name):
    print('nothing done with ' + name)     

something_to_do = {
    3: do_something,
    4: do_something_else
}

n = input ("Define number of actors: ")
count = 0
names = []

for count in range(n):
    print("Define name for actor {}:".format(count+1))
    name = raw_input ()
    names.append(name)

for name in names:
    try:
        something_to_do[len(name)](name)
    except KeyError:
        default(name)

Runs like this:

Define number of actors: 3
Define name for actor 1: bob
Define name for actor 2: tony
Define name for actor 3: alice
some thing 1 done with bob
something 2 done with tony
nothing done with alice

You can also use .get method rather than try/except for a shorter version:

>>> something_to_do.get(3, default)('bob')
some thing 1 done with bob
>>> something_to_do.get(22, default)('alice')
nothing done with alice
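A Python 3 sketch of the same length-based dispatch without the interactive input. The names are hardcoded for illustration, and the handlers return strings instead of printing:

```python
def do_something(name):
    return 'some thing 1 done with ' + name


def do_something_else(name):
    return 'something 2 done with ' + name


def default(name):
    return 'nothing done with ' + name


# Map a name length to the handler for that length.
something_to_do = {
    3: do_something,
    4: do_something_else,
}


def handle(names):
    # .get() falls back to `default` for lengths with no handler.
    return [something_to_do.get(len(name), default)(name) for name in names]


for line in handle(['bob', 'tony', 'alice']):
    print(line)
```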

Answer 2

len(nams) should be equal to n in your code. All indexes 0 <= i < n “exist”.

Answer 3


It can be done simply using the following code:

if index < len(my_list):  # assumes index is non-negative
    print(index, 'exists in the list')
else:
    print(index, "doesn't exist in the list")

Answer 4




I need to code such that if a certain list index exists, then run a function.

You already know how to test for this and in fact are already performing such tests in your code.

The valid indices for a list of length n are 0 through n-1 inclusive.

Thus, a list has an index i if and only if the length of the list is at least i + 1.

Answer 5


Using the length of the list would be the fastest solution to check if an index exists:

def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)

This also handles negative indices, and works for most sequence types that have a length (like ranges and strs).

If you need to access the item at that index afterwards anyway, it is easier to ask forgiveness than permission, and it is also faster and more Pythonic. Use try: except:.

try:
    item = ls[i]
    # Do something with item
except IndexError:
    # Do something without the item
    pass

This would be as opposed to:

if index_exists(ls, i):
    item = ls[i]
    # Do something with item
else:
    # Do something without the item
    pass
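A runnable sketch of both styles side by side. get_or_default is a hypothetical helper name for the try/except version, and the list is sample data:

```python
def index_exists(ls, i):
    # Valid non-negative indices: 0 .. len(ls)-1; valid negative: -len(ls) .. -1.
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)


def get_or_default(ls, i, default=None):
    # EAFP variant: just try the lookup and handle the failure.
    try:
        return ls[i]
    except IndexError:
        return default


data = ["a", "b", "c"]
print([index_exists(data, i) for i in (-4, -3, 2, 3)])  # [False, True, True, False]
print(get_or_default(data, 5, "missing"))                # missing
```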

Answer 6


If you want to iterate over the entered actor names:

for i in range(n):
    if len(nams[i]) > 3:
        pass  # do something
    if len(nams[i]) > 4:
        pass  # do something else

Answer 7


OK, so I think it's actually possible (for the sake of argument; this is Python 2, where zip returns a list):

>>> your_list = [5,6,7]
>>> 2 in zip(*enumerate(your_list))[0]
True
>>> 3 in zip(*enumerate(your_list))[0]
False

Answer 8


You can try something like this:

list = ["a", "b", "C", "d", "e", "f", "r"]

for i in range(0, len(list), 2):
    print list[i]
    if len(list) % 2 == 1 and  i == len(list)-1:
        break
    print list[i+1]

Answer 9


do_X() if len(your_list) > your_index else do_something_else()

Full example:

In [10]: def do_X():
    ...:     print(1)

In [11]: def do_something_else():
    ...:     print(2)

In [12]: your_index = 2

In [13]: your_list = [1,2,3]

In [14]: do_X() if len(your_list) > your_index else do_something_else()
1

Just for info. IMHO, try ... except IndexError is the better solution.

Answer 10



Do not leave a space between a function name and its opening parenthesis, as in:


n = input ()

Tip: put comments above and/or below your code, not after it on the same line.

Have a nice day.

Answer 11


There are a lot of answers, but not the simple one.

To check whether a key 'id' exists in a dictionary dict:

dic = {}
dic['name'] = "joao"
dic['age']  = "39"

if 'age' in dic:
    print(dic['age'])

The condition 'age' in dic is True if the key 'age' exists.




I am using sklearn and having a problem with affinity propagation. I have built an input matrix and I keep getting the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

I tried using

mat[np.isfinite(mat) == True] = 0

to remove the infinite values but this did not work either. What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?

I am using anaconda and python 2.7.9.

Answer 0



This might happen inside scikit-learn, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that depends, e.g., on your matrix being positive definite, and your matrix might not fulfil that criterion.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, not whether the return value of the any function is a number…
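A small made-up matrix reproduces what the question observed, shows the element-wise checks, and fixes the mask from the question (which zeroed the finite values instead of the non-finite ones):

```python
import numpy as np

mat = np.array([[1.0, np.nan],
                [np.inf, 4.0]])

# Wrong: mat.any() / mat.all() collapse the matrix to one bool first,
# and np.isnan(True) is False, np.isfinite(True) is True
wrong_nan_check = np.isnan(mat.any())        # False
wrong_finite_check = np.isfinite(mat.all())  # True

# Right: test every element, then reduce
has_nan = np.any(np.isnan(mat))        # True
all_finite = np.all(np.isfinite(mat))  # False

# Replace the NON-finite entries by inverting the mask
cleaned = mat.copy()
cleaned[~np.isfinite(cleaned)] = 0
```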

Answer 1


I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']
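A minimal sketch of why this helps, with a made-up frame: filtering keeps the original index labels, so anything rebuilt from a plain NumPy array (which gets a fresh 0..n-1 index) no longer lines up, and pandas fills the gaps with NaN. scaled is a hypothetical stand-in for a sklearn output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": ["desired_one", "other", "desired_one"],
                   "x": [1.0, 2.0, 3.0]})

# Filtering keeps the old labels: the index is now [0, 2]
df = df[df.label == 'desired_one']

# Hypothetical sklearn-style result wrapped in a new frame: index [0, 1]
scaled = pd.DataFrame({"x_scaled": np.array([0.5, 1.5])})

# Outer alignment on [0, 1, 2] introduces NaN rows
misaligned = pd.concat([df, scaled], axis=1)

# After resetting the index both sides are labelled 0..n-1
aligned = pd.concat([df.reset_index(drop=True), scaled], axis=1)
```

drop=True discards the old labels; a plain reset_index() keeps them as a new 'index' column, which you may not want to pass to sklearn as a feature.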

Answer 2


This is my function (based on this) to clean the dataset of NaN, Inf, and missing cells (for skewed datasets):

import numpy as np
import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    # keep only the rows where no cell is NaN, +inf or -inf
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

Answer 3


The dimensions of my input array were skewed, as my input CSV had empty spaces.
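A quick way to see how blank fields turn into NaN, assuming the CSV is read with np.genfromtxt (the answer does not say which reader was used); csv_text is made-up content:

```python
import io
import numpy as np

# Hypothetical CSV with an empty last field in the second row
csv_text = "1.0,2.0\n3.0,\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
# The empty field is read as nan, which sklearn then rejects

# Row numbers that contain at least one missing value
rows_with_gaps = np.where(np.isnan(data).any(axis=1))[0]
```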

Answer 4



This is the check on which it fails; it says:

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that you have no NaN values in your input, and that all of those values are actually float values. None of the values should be Inf either.
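To see which entries trip this check, a small helper along the same lines can report their positions. This is a hypothetical sketch, not part of scikit-learn: it reuses the cheap X.sum() test and only runs the element-wise np.isfinite when the sum is not finite. Converting with dtype=np.float64 also surfaces values that are not really numbers.

```python
import numpy as np

def non_finite_positions(X):
    """Hypothetical helper: indices of NaN/inf entries in X (empty if none)."""
    X = np.asanyarray(X, dtype=np.float64)  # raises ValueError for non-numeric data
    if np.isfinite(X.sum()):
        # fast path: a finite sum means every element is finite
        return np.empty((0, X.ndim), dtype=np.intp)
    # slow path: one row of indices per offending element
    return np.argwhere(~np.isfinite(X))
```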

Answer 5

With this version of python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of code causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I was able to extract the correct way to test what was going on with my data, using the same test that fails in the error message: np.isfinite(X)

Then, with a quick and dirty loop, I was able to find that my data indeed contains NaNs:

index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan

Now all I have to do is remove the values at these indexes.
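The loop above can also be written with vectorised NumPy calls, and removing the offending rows is a single boolean mask. A sketch with made-up data standing in for p:

```python
import numpy as np

# Hypothetical data standing in for `p`
p = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0],
              [np.nan, 40.0]])

# Same information as the loop: positions of non-finite values in column 0
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))

# Keep only the rows where every column is finite
row_mask = np.isfinite(p).all(axis=1)
p_clean = p[row_mask]
```

If you also have a target array y, index it with the same row_mask so X and y stay aligned.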

Answer 6


I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

It turns out that my_index contained values that were not in df.index, so the reindex function inserted some new rows and filled them with NaN.
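A sketch of this effect with a made-up frame: reindex creates rows for labels it does not know, while selecting the intersection keeps only rows that exist. The labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
my_index = ["a", "c", "z"]  # "z" is not in df.index

# reindex adds a row for "z" and fills it with NaN
reindexed = df.reindex(index=my_index)

# Labels that would be created out of nothing
missing = [label for label in my_index if label not in df.index]

# Select only rows that actually exist
subset = df.loc[df.index.intersection(my_index)]
```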

Answer 7



In most cases, getting rid of infinite and null values solves this problem.

Get rid of infinite values:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Get rid of null values the way you like: a specific value such as 999, the mean, or your own function to impute missing values:

df.fillna(999, inplace=True)
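The same two steps on a small made-up frame, returning new frames instead of using inplace=True, and showing the column mean as an alternative fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.inf, np.nan],
                   "b": [-np.inf, 2.0, 3.0]})

# Step 1: turn +/-inf into NaN so all bad values are handled together
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2a: fill with a sentinel value
sentinel_filled = df.fillna(999)

# Step 2b: or fill each column with its own mean (NaN is skipped by mean())
mean_filled = df.fillna(df.mean())
```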

Answer 8


I had the same error, and in my case X and y were DataFrames, so I had to convert them to matrices first:

X = X.values.astype(np.float64)
y = y.values.astype(np.float64)

Edit: The originally suggested X.as_matrix() is deprecated.

Answer 9


I got the same error. It worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.

Answer 10


In my case the problem was that many scikit-learn functions return NumPy arrays, which have no pandas index. So there was an index mismatch when I used those arrays to build new DataFrames and then tried to mix them with the original data.
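One way to avoid the mismatch, sketched with made-up data: when wrapping a NumPy result back into a DataFrame, pass the original index explicitly. transformed is a hypothetical stand-in for what a scikit-learn transform returns.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Hypothetical bare NumPy output of an sklearn transform
transformed = np.array([[0.1], [0.2], [0.3]])

# Without index=..., the new frame is labelled 0, 1, 2 and the join gives NaN
wrong = original.join(pd.DataFrame(transformed, columns=["x_t"]))

# Reusing the original index keeps the rows aligned
right = original.join(
    pd.DataFrame(transformed, columns=["x_t"], index=original.index))
```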

Answer 11



Remove all infinite values (and replace them with the min or max for that column):

import numpy as np

# matrix is a 2-D float NumPy array
# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace + and -infinity
# with the max or min for that column
for i in range(matrix.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]
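A vectorised variant of the same idea, as a sketch: replace_inf_with_column_extremes is a hypothetical name, and it assumes every column has at least one finite value (otherwise np.nanmin/np.nanmax warn and return NaN). NaN entries are left untouched.

```python
import numpy as np

def replace_inf_with_column_extremes(matrix):
    """Return a copy with -inf -> column min and +inf -> column max (finite values only)."""
    finite_only = np.where(np.isfinite(matrix), matrix, np.nan)
    col_min = np.nanmin(finite_only, axis=0)  # one value per column
    col_max = np.nanmax(finite_only, axis=0)
    out = matrix.astype(np.float64, copy=True)
    # col_min / col_max broadcast across rows, so each column gets its own value
    out = np.where(np.isneginf(out), col_min, out)
    out = np.where(np.isposinf(out), col_max, out)
    return out
```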

Answer 12



If the sum of your data is infinity (greater than the max float value: 3.402823e+38 for float32, about 1.8e+308 for float64), you will get that error.

See the _assert_all_finite function in validation.py from the scikit-learn source code:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))
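A small check of the premise with made-up data: every element below is a finite float32, yet the float32 sum overflows to inf. In the version of the code quoted above, that only makes the fast X.sum() test fail; the ValueError itself is raised when the element-wise np.isfinite(X).all() test also fails. Treat this as a sketch against that particular version of the function.

```python
import numpy as np

# Every element is finite, but their float32 sum overflows
X = np.full(10, 3e38, dtype=np.float32)

with np.errstate(over="ignore"):  # silence the overflow RuntimeWarning
    total = X.sum()

sum_is_finite = bool(np.isfinite(total))          # False: fast path fails
all_elements_finite = bool(np.isfinite(X).all())  # True: element-wise check passes
```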




I need to get the latest file in a folder using Python. When I use the code:

max(files, key = os.path.getctime)

I get the following error:

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'

Answer 0


Whatever is assigned to the files variable is incorrect. Use the following code.

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)

Answer 1

max(files, key = os.path.getctime)

is quite incomplete code. What is files? It probably is a list of file names, coming out of os.listdir().

But this list contains only the filename parts (a.k.a. “basenames”), because their path is common. In order to use them correctly, you have to combine them with the path leading to them (the one used to obtain the list).

Such as (untested):

import os

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
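To illustrate the point about basenames, a small self-contained sketch that creates two hypothetical files in a temporary folder: os.listdir gives back bare names, and only the joined paths can be passed to os.path.getctime reliably.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Two hypothetical files to look at
    for name in ("a", "b"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("x")

    # Bare names such as 'a': os.path.getctime('a') would look for 'a'
    # in the current working directory, not in `folder`
    basenames = os.listdir(folder)

    # Joined with their folder, the names become paths getctime can find
    full_paths = [os.path.join(folder, name) for name in basenames]
    newest_path = max(full_paths, key=os.path.getctime)
```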

Answer 2



I would suggest using glob.iglob() instead of glob.glob(), as it is more efficient.

glob.iglob() returns an iterator which yields the same values as glob() without actually storing them all simultaneously.

This means glob.iglob() will be more memory-efficient.

I mostly use the code below to find the latest file matching my pattern:

LatestFile = max(glob.iglob(fileNamePattern), key=os.path.getctime)

NOTE: There are several variants of the max function. To find the latest file, we use this variant: max(iterable, *[, key, default])

It needs an iterable, so your first argument should be iterable. To find the max of several numbers, we can use this variant: max(num1, num2, num3, *args[, key])
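Because iglob yields nothing when no file matches, max would raise ValueError on the empty iterator; the default argument of the iterable form avoids that. A sketch (latest_file is a hypothetical helper; default requires Python 3.4+):

```python
import glob
import os
import tempfile

def latest_file(pattern):
    """Hypothetical helper: newest match for a glob pattern, or None if nothing matches."""
    # iglob is lazy; max(iterable, *, key, default) consumes it in one pass
    return max(glob.iglob(pattern), key=os.path.getctime, default=None)

with tempfile.TemporaryDirectory() as folder:
    pattern = os.path.join(folder, "*.csv")
    empty_result = latest_file(pattern)  # None instead of ValueError
    open(os.path.join(folder, "data.csv"), "w").close()
    one_result = latest_file(pattern)
```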

Answer 3


Try sorting the items by creation time. The example below sorts the files in a folder and takes the first element, which is the latest.

import glob
import os

files_path = os.path.join(folder, '*')
files = sorted(
    glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])

Answer 4

I lack the reputation to comment, but ctime from Marlon Abeykoon's response did not give the correct result for me. Using mtime does the trick, though (key=os.path.getmtime):

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)

I found two answers for that problem:

python os.path.getctime max does not return latest
Difference between python - getmtime() and getctime() in unix system
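A self-contained sketch of the mtime approach with two made-up files. Their modification times are set explicitly with os.utime; on Unix that call itself updates the ctime (inode change time) of both files, which is one way getctime can pick the "wrong" file.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as folder:
    older = os.path.join(folder, "older.txt")
    newer = os.path.join(folder, "newer.txt")
    for path in (older, newer):
        open(path, "w").close()

    # Set clearly different modification times: (atime, mtime)
    now = time.time()
    os.utime(older, (now - 3600, now - 3600))
    os.utime(newer, (now, now))

    latest = max([older, newer], key=os.path.getmtime)
```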

Answer 5



(Edited to improve answer)

First define a function get_latest_file:

def get_latest_file(path, *paths):
    fullpath = os.path.join(path, *paths)
    ...

get_latest_file('example', 'files', 'randomtext011.*.txt')

You may also use a docstring!

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)

If you use Python 3, you can use iglob instead.

Complete code to return the name of the latest file:

import glob
import os

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    # glob returns a list, so the emptiness test below works; with iglob,
    # `not files` is always False, so use max(..., default=None) instead
    files = glob.glob(fullpath)
    if not files:                # I prefer using the negation
        return None              # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename

Answer 6

I tried the suggestions above and my program crashed. Then I figured out that the file I was trying to identify was in use, and os.path.getctime crashed on it. What finally worked for me was:

files_before = glob.glob(os.path.join(my_path, '*'))
# ... code where the new file is created ...
new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path, '*'))))

This code gets the entries that are not common to the two file lists. It's not the most elegant, and if multiple files are created at the same time it probably won't be stable.
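A sketch of the same before/after idea in a temporary folder, using a plain set difference (after minus before) instead of the symmetric difference, so files deleted in the meantime are not reported as new. report.txt is a hypothetical stand-in for whatever your code creates.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as my_path:
    open(os.path.join(my_path, "existing.txt"), "w").close()

    files_before = set(glob.glob(os.path.join(my_path, '*')))
    # Hypothetical stand-in for "code where the new file is created"
    open(os.path.join(my_path, "report.txt"), "w").close()
    files_after = set(glob.glob(os.path.join(my_path, '*')))

    # Only files that appeared; no timestamps are read at all
    new_files = files_after - files_before
```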

Answer 7



A much faster method on Windows (0.05 s) is to call a .bat script that does this:

@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
echo %LAST%

Here \\directory\in\question is the directory you want to investigate.

from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)

If it finds a file, stdout holds its name and stderr is None.

Use stdout.decode("utf-8").rstrip() to get a usable string representation of the file name.

Answer 8

I’ve been using this in Python 3, including pattern matching on the filename.

from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)