Tag Archives: python-2.7

AttributeError("'str' object has no attribute 'read'")

Question: AttributeError("'str' object has no attribute 'read'")

In Python I’m getting an error:

Exception:  (<type 'exceptions.AttributeError'>,
AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given python code:

def getEntries (self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request (url + 
        '.json', None, {'User-Agent' : 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen (request)
    jsonofabitch = response.read ()

    return json.load (jsonofabitch)['data']['children']

What does this error mean and what did I do to cause it?


Answer 0

The problem is that for json.load you should pass a file-like object with a read function defined. So either use json.load(response) or json.loads(response.read()).
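To illustrate the two options, here is a small self-contained sketch that uses io.StringIO as a stand-in for the HTTP response (so it runs without any network access):

```python
import io
import json

# A file-like object with a .read() method, standing in for the
# response returned by urllib2.urlopen.
response = io.StringIO('{"data": {"children": ["a", "b"]}}')

# Option 1: pass the file-like object and let json.load call .read()
parsed = json.load(response)
print(parsed['data']['children'])  # ['a', 'b']

# Option 2: read the body yourself, then parse the resulting string
response.seek(0)
parsed = json.loads(response.read())
print(parsed['data']['children'])  # ['a', 'b']
```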


Answer 1

AttributeError("'str' object has no attribute 'read'",)

This means exactly what it says: something tried to find a .read attribute on the object that you gave it, and you gave it an object of type str (i.e., you gave it a string).

The error occurred here:

json.load (jsonofabitch)['data']['children']

Well, you aren’t looking for read anywhere, so it must happen in the json.load function that you called (as indicated by the full traceback). That is because json.load is trying to .read the thing that you gave it, but you gave it jsonofabitch, which currently names a string (which you created by calling .read on the response).

Solution: don’t call .read yourself; the function will do this, and is expecting you to give it the response directly so that it can do so.

You could also have figured this out by reading the built-in Python documentation for the function (try help(json.load)), or for the entire module (try help(json)), or by checking the documentation for those functions on http://docs.python.org .


Answer 2

If you get a python error like this:

AttributeError: 'str' object has no attribute 'some_method'

You probably poisoned your object accidentally by overwriting your object with a string.

How to reproduce this error in python with a few lines of code:

#!/usr/bin/env python
import json
def foobar(json):
    msg = json.loads(json)

foobar('{"batman": "yes"}')

Run it, which prints:

AttributeError: 'str' object has no attribute 'loads'

But change the variable name, and it works fine:

#!/usr/bin/env python
import json
def foobar(jsonstring):
    msg = json.loads(jsonstring)

foobar('{"batman": "yes"}')

This error occurs when you try to call a method on a string. Strings have a number of methods, but not the one you are invoking. So stop trying to invoke a method that str does not define, and start looking for where you poisoned your object.


Answer 3

OK, this is an old thread, but I had the same issue: my problem was that I used json.load instead of json.loads.

This way, json has no problem with loading any kind of dictionary.

Official documentation

json.load – Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads – Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.


Answer 4

You need to open the file first. This doesn’t work:

json_file = json.load('test.json')

But this works:

f = open('test.json')
json_file = json.load(f)
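When working with real files, it is also worth using a with statement so the file is closed automatically; a minimal sketch (the file is created first so the example is self-contained):

```python
import json

# Create a small JSON file so the example can run on its own
with open('test.json', 'w') as f:
    json.dump({"ok": True}, f)

# Preferred: the with statement closes the file automatically
with open('test.json') as f:
    json_file = json.load(f)

print(json_file)  # {'ok': True}
```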

Python - Extract and save video frames

Question: Python - Extract and save video frames

So I’ve followed this tutorial but it doesn’t seem to do anything. Simply nothing. It waits a few seconds and closes the program. What is wrong with this code?

import cv2
vidcap = cv2.VideoCapture('Compton.mp4')
success,image = vidcap.read()
count = 0
success = True
while success:
  success,image = vidcap.read()
  cv2.imwrite("frame%d.jpg" % count, image)     # save frame as JPEG file
  if cv2.waitKey(10) == 27:                     # exit if Escape is hit
      break
  count += 1

Also, in the comments it says that this limits the frames to 1000? Why?

EDIT: I tried doing success = True first but that didn’t help. It only created one image that was 0 bytes.


Answer 0

From here download this video so we have the same video file for the test. Make sure to have that mp4 file in the same directory of your python code. Then also make sure to run the python interpreter from the same directory.

Then modify the code: ditch waitKey, which wastes time and, without a window, cannot capture keyboard events anyway. We also print the success value to make sure the frames are being read successfully.

import cv2
vidcap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4')
success,image = vidcap.read()
count = 0
while success:
  cv2.imwrite("frame%d.jpg" % count, image)     # save frame as JPEG file      
  success,image = vidcap.read()
  print('Read a new frame: ', success)
  count += 1

How does that go?


Answer 1

To extend this question (and @user2700065's answer) to a slightly different case: if you do not want to extract every frame, but want to extract one frame per second. A 1-minute video will then yield 60 frames (images).

import sys
import argparse

import cv2
print(cv2.__version__)

def extractImages(pathIn, pathOut):
    count = 0
    vidcap = cv2.VideoCapture(pathIn)
    success,image = vidcap.read()
    success = True
    while success:
        vidcap.set(cv2.CAP_PROP_POS_MSEC,(count*1000))    # added this line 
        success,image = vidcap.read()
        print ('Read a new frame: ', success)
        cv2.imwrite( pathOut + "\\frame%d.jpg" % count, image)     # save frame as JPEG file
        count = count + 1

if __name__=="__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--pathIn", help="path to video")
    a.add_argument("--pathOut", help="path to images")
    args = a.parse_args()
    print(args)
    extractImages(args.pathIn, args.pathOut)

Answer 2

This is a tweak of the previous answer for Python 3.x from @GShocked. I would post it as a comment, but I don't have enough reputation.

import sys
import argparse

import cv2
print(cv2.__version__)

def extractImages(pathIn, pathOut):
    vidcap = cv2.VideoCapture(pathIn)
    success,image = vidcap.read()
    count = 0
    success = True
    while success:
      success,image = vidcap.read()
      print ('Read a new frame: ', success)
      cv2.imwrite( pathOut + "\\frame%d.jpg" % count, image)     # save frame as JPEG file
      count += 1

if __name__=="__main__":
    print("aba")
    a = argparse.ArgumentParser()
    a.add_argument("--pathIn", help="path to video")
    a.add_argument("--pathOut", help="path to images")
    args = a.parse_args()
    print(args)
    extractImages(args.pathIn, args.pathOut)

Answer 3

This function converts most video formats into the individual frames they contain. It works on Python 3 with OpenCV 3+.

import cv2
import time
import os

def video_to_frames(input_loc, output_loc):
    """Function to extract frames from input video file
    and save them as separate frames in an output directory.
    Args:
        input_loc: Input video file.
        output_loc: Output directory to save the frames.
    Returns:
        None
    """
    try:
        os.mkdir(output_loc)
    except OSError:
        pass
    # Log the time
    time_start = time.time()
    # Start capturing the feed
    cap = cv2.VideoCapture(input_loc)
    # Find the number of frames
    video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
    print ("Number of frames: ", video_length)
    count = 0
    print ("Converting video..\n")
    # Start converting the video
    while cap.isOpened():
        # Extract the frame
        ret, frame = cap.read()
        # Write the results back to output location.
        cv2.imwrite(output_loc + "/%#05d.jpg" % (count+1), frame)
        count = count + 1
        # If there are no more frames left
        if (count > (video_length-1)):
            # Log the time again
            time_end = time.time()
            # Release the feed
            cap.release()
            # Print stats
            print ("Done extracting frames.\n%d frames extracted" % count)
            print ("It took %d seconds for conversion." % (time_end-time_start))
            break

if __name__=="__main__":

    input_loc = '/path/to/video/00009.MTS'
    output_loc = '/path/to/output/frames/'
    video_to_frames(input_loc, output_loc)

It supports .mts as well as common formats like .mp4 and .avi. Tried and tested on .mts files; works like a charm.


Answer 4

After a lot of research on how to convert frames to video, I created this function; I hope it helps. We require OpenCV for this:

import cv2
import numpy as np
import os
from os.path import isfile, join

def frames_to_video(inputpath,outputpath,fps):
   image_array = []
   files = [f for f in os.listdir(inputpath) if isfile(join(inputpath, f))]
   files.sort(key = lambda x: int(x[5:-4]))
   for i in range(len(files)):
       img = cv2.imread(inputpath + files[i])
       size =  (img.shape[1],img.shape[0])
       img = cv2.resize(img,size)
       image_array.append(img)
   fourcc = cv2.VideoWriter_fourcc('D', 'I', 'V', 'X')
   out = cv2.VideoWriter(outputpath,fourcc, fps, size)
   for i in range(len(image_array)):
       out.write(image_array[i])
   out.release()


inputpath = 'folder path'
outpath =  'video file path/video.mp4'
fps = 29
frames_to_video(inputpath,outpath,fps)

Change the values of fps (frames per second), the input folder path, and the output folder path according to your own local setup.


Answer 5

The previous answers have lost the first frame. And it will be nice to store the images in a folder.

# create a folder to store extracted images
import os
folder = 'test'  
os.mkdir(folder)
# use opencv to do the job
import cv2
print(cv2.__version__)  # my version is 3.1.0
vidcap = cv2.VideoCapture('test_video.mp4')
count = 0
while True:
    success,image = vidcap.read()
    if not success:
        break
    cv2.imwrite(os.path.join(folder,"frame{:d}.jpg".format(count)), image)     # save frame as JPEG file
    count += 1
print("{} images are extracted in {}.".format(count,folder))

By the way, you can check the frame rate by VLC. Go to windows -> media information -> codec details


Answer 6

This code extracts frames from the video and saves them in .jpg format:

import cv2
import numpy as np
import os

# set video file path of input video with name and extension
vid = cv2.VideoCapture('VideoPath')


if not os.path.exists('images'):
    os.makedirs('images')

#for frame identity
index = 0
while(True):
    # Extract images
    ret, frame = vid.read()
    # end of frames
    if not ret: 
        break
    # Saves images
    name = './images/frame' + str(index) + '.jpg'
    print ('Creating...' + name)
    cv2.imwrite(name, frame)

    # next frame
    index += 1

Answer 7

I am using Python via Anaconda’s Spyder software. Using the original code listed in the question of this thread by @Gshocked, the code does not work (the python won’t read the mp4 file). So I downloaded OpenCV 3.2 and copied “opencv_ffmpeg320.dll” and “opencv_ffmpeg320_64.dll” from the “bin” folder. I pasted both of these dll files to Anaconda’s “Dlls” folder.

Anaconda also has a “pckgs” folder…I copied and pasted the entire “OpenCV 3.2” folder that I downloaded to the Anaconda “pckgs” folder.

Finally, Anaconda has a “Library” folder which has a “bin” subfolder. I pasted the “opencv_ffmpeg320.dll” and “opencv_ffmpeg320_64.dll” files to that folder.

After closing and restarting Spyder, the code worked. I'm not sure which of the three methods worked, and I'm too lazy to go back and figure it out. But it works, so cheers!


Answer 8

This function extracts images from the video at 1 fps; in addition, it identifies the last frame and stops reading:

import cv2
import numpy as np

def extract_image_one_fps(video_source_path):

    vidcap = cv2.VideoCapture(video_source_path)
    count = 0
    success = True
    while success:
      vidcap.set(cv2.CAP_PROP_POS_MSEC,(count*1000))      
      success,image = vidcap.read()

      ## Stop when last frame is identified
      image_last = cv2.imread("frame{}.png".format(count-1))
      if np.array_equal(image,image_last):
          break

      cv2.imwrite("frame%d.png" % count, image)     # save frame as PNG file
      print('{}.sec reading a new frame: {} '.format(count,success))
      count += 1

Answer 9

The following script extracts a frame every half second from all videos in a folder. (Works on Python 3.7)

import cv2
import os
listing = os.listdir(r'D:/Images/AllVideos')
count=1
for vid in listing:
    vid = r"D:/Images/AllVideos/"+vid
    vidcap = cv2.VideoCapture(vid)
    def getFrame(sec):
        vidcap.set(cv2.CAP_PROP_POS_MSEC,sec*1000)
        hasFrames,image = vidcap.read()
        if hasFrames:
            cv2.imwrite("D:/Images/Frames/image"+str(count)+".jpg", image) # Save frame as JPG file
        return hasFrames
    sec = 0
    frameRate = 0.5 # Change this number to 1 for each 1 second
    
    success = getFrame(sec)
    while success:
        count = count + 1
        sec = sec + frameRate
        sec = round(sec, 2)
        success = getFrame(sec)

What is the difference between json.dump() and json.dumps() in Python?

Question: What is the difference between json.dump() and json.dumps() in Python?

I searched the official documentation to find the difference between json.dump() and json.dumps() in Python. It is clear that they are related to writing to files.
But what is the detailed difference between them, and in what situations does one have an advantage over the other?


Answer 0

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.

json.dump()

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object)

If ensure_ascii is False, some chunks written to fp may be unicode instances

json.dumps()

Serialize obj to a JSON formatted str

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance
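A side-by-side sketch of the two calls, writing to an io.StringIO buffer here instead of a real file:

```python
import io
import json

data = {"name": "json", "versions": [1, 2]}

# json.dumps: serialize to a str and return it
text = json.dumps(data)
print(text)  # {"name": "json", "versions": [1, 2]}

# json.dump: serialize directly into a .write()-supporting object
buf = io.StringIO()
json.dump(data, buf)
print(buf.getvalue() == text)  # True: same serialized output
```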


Answer 1

The functions with an s take string parameters. The others take file streams.


Answer 2

They differ in memory usage and speed.

When you call jsonstr = json.dumps(mydata), it first creates the complete JSON string for your data in memory, and only then do you write it to disk with file.write(jsonstr). So this is a faster method, but it can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) — without ‘s’, new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.


Answer 3

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8):

dumps, on the other hand, with ensure_ascii=False can produce a str or unicode depending on what types you used for your strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into file without checking which format was returned and possibly playing with unicode.encode.

This of course is no longer a valid concern in Python 3, since the 8-bit/Unicode confusion is gone.


As for load vs loads: load considers the whole file to be one JSON document, so you cannot use it to read multiple newline-delimited JSON documents from a single file.
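For example, newline-delimited JSON can be handled by applying json.loads to each line; a minimal sketch:

```python
import json

# Two JSON documents separated by newlines: json.load would reject
# this as a whole, but per-line json.loads handles it fine.
ndjson = '{"id": 1}\n{"id": 2}\n'

records = [json.loads(line) for line in ndjson.splitlines() if line]
print(records)  # [{'id': 1}, {'id': 2}]
```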


What is good practice for checking whether an environment variable exists?

Question: What is good practice for checking whether an environment variable exists?

I want to check my environment for the existence of a variable, say "FOO", in Python. For this purpose, I am using the os standard library. After reading the library’s documentation, I have figured out 2 ways to achieve my goal:

Method 1:

if "FOO" in os.environ:
    pass

Method 2:

if os.getenv("FOO") is not None:
    pass

I would like to know which method, if either, is a good/preferred conditional and why.


Answer 0

Use the first; it directly tries to check if something is defined in environ. Though the second form works equally well, it’s lacking semantically since you get a value back if it exists and only use it for a comparison.

You're trying to see if something is present in environ; why would you get its value just to compare it and then toss it away?

That’s exactly what getenv does:

Get an environment variable, return None if it doesn’t exist. The optional second argument can specify an alternate default.

(this also means your check could just be if getenv("FOO"))

You don't want to get it; you want to check for its existence.

Either way, getenv is just a wrapper around environ.get but you don’t see people checking for membership in mappings with:

from os import environ
if environ.get('Foo') is not None:

To summarize, use:

if "FOO" in os.environ:
    pass

if you just want to check for existence; use getenv("FOO") if you actually want to do something with the value you might get.
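A quick sketch of both patterns side by side (the variable is set in-process purely for illustration):

```python
import os

os.environ["FOO"] = "1"  # set in-process for the demo

# Existence check: the membership test states the intent directly
if "FOO" in os.environ:
    print("FOO exists")

# Fetch the value (with a default) when you actually need it
value = os.getenv("FOO", "fallback")
print(value)  # 1
```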


Answer 1

There is a case for either solution, depending on what you want to do conditional on the existence of the environment variable.

Case 1

When you want to take different actions purely based on the existence of the environment variable, without caring for its value, the first solution is the best practice. It succinctly describes what you test for: is ‘FOO’ in the list of environment variables.

if 'KITTEN_ALLERGY' in os.environ:
    buy_puppy()
else:
    buy_kitten()

Case 2

When you want to set a default value if the value is not defined in the environment variables the second solution is actually useful, though not in the form you wrote it:

server = os.getenv('MY_CAT_STREAMS', 'youtube.com')

or perhaps

server = os.environ.get('MY_CAT_STREAMS', 'youtube.com')

Note that if you have several configuration sources for your application, you might want to look into ChainMap, which lets you merge multiple dicts based on keys. There is an example of this in the ChainMap documentation:

[...]
combined = ChainMap(command_line_args, os.environ, defaults)
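A runnable sketch of that lookup order (the command_line_args dict here is a hypothetical stand-in for parsed CLI options):

```python
from collections import ChainMap
import os

defaults = {"MY_CAT_STREAMS": "youtube.com", "DEBUG": "0"}
command_line_args = {"DEBUG": "1"}  # hypothetical parsed CLI options

# Lookup order: CLI args first, then the environment, then defaults
combined = ChainMap(command_line_args, os.environ, defaults)

print(combined["DEBUG"])  # "1": the command-line value wins
# MY_CAT_STREAMS falls through to defaults, unless it happens to be
# set in the environment:
print(combined["MY_CAT_STREAMS"])
```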

Answer 2

To be on the safe side use

os.getenv('FOO') or 'bar'

A corner case with the above answers is when the environment variable is set but is empty

For this special case you get

print(os.getenv('FOO', 'bar'))
# prints new line - though you expected `bar`

or

if "FOO" in os.environ:
    print("FOO is here")
# prints FOO is here - however it's not

To avoid this just use or

os.getenv('FOO') or 'bar'

Then you get

print(os.getenv('FOO') or 'bar')
# bar

When do we have empty environment variables?

You forgot to set the value in the .env file

# .env
FOO=

or exported as

$ export FOO=

or forgot to set it in settings.py

# settings.py
os.environ['FOO'] = ''

Update: if in doubt, check out these one-liners

>>> import os; os.environ['FOO'] = ''; print(os.getenv('FOO', 'bar'))

$ FOO= python -c "import os; print(os.getenv('FOO', 'bar'))"
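The whole corner case can be demonstrated in a few self-contained lines (`DEMO_FOO` is a throwaway variable name used only for this sketch):

```python
import os

os.environ['DEMO_FOO'] = ''                  # set, but empty

with_default = os.getenv('DEMO_FOO', 'bar')  # default NOT used: the key exists
with_or = os.getenv('DEMO_FOO') or 'bar'     # `or` treats '' as falsy and falls back
present = 'DEMO_FOO' in os.environ           # True, even though the value is empty

del os.environ['DEMO_FOO']                   # clean up
```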

回答 3

如果您要检查是否未设置多个环境变量,可以执行以下操作:

import os

MANDATORY_ENV_VARS = ["FOO", "BAR"]

for var in MANDATORY_ENV_VARS:
    if var not in os.environ:
        raise EnvironmentError("Failed because {} is not set.".format(var))

In case you want to check if multiple env variables are not set, you can do the following:

import os

MANDATORY_ENV_VARS = ["FOO", "BAR"]

for var in MANDATORY_ENV_VARS:
    if var not in os.environ:
        raise EnvironmentError("Failed because {} is not set.".format(var))

回答 4

我的评论可能与给定的标签无关。但是,我是从搜索中来到此页面的。我一直在寻找 R 中的类似检查方法,并在 @hugovdbeg 帖子的帮助下写出了以下代码。希望这能帮助在 R 中寻找类似解决方案的人

'USERNAME' %in% names(Sys.getenv())

My comment might not be relevant to the tags given. However, I was lead to this page from my search. I was looking for similar check in R and I came up the following with the help of @hugovdbeg post. I hope it would be helpful for someone who is looking for similar solution in R

'USERNAME' %in% names(Sys.getenv())

类方法生成“ TypeError:…为关键字参数获得了多个值……”

问题:类方法生成“ TypeError:…为关键字参数获得了多个值……”

如果我用关键字参数定义一个类方法,则:

class foo(object):
  def foodo(thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

调用该方法将生成TypeError

myfoo = foo()
myfoo.foodo(thing="something")

...
TypeError: foodo() got multiple values for keyword argument 'thing'

这是怎么回事?

If I define a class method with a keyword argument thus:

class foo(object):
  def foodo(thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

calling the method generates a TypeError:

myfoo = foo()
myfoo.foodo(thing="something")

...
TypeError: foodo() got multiple values for keyword argument 'thing'

What’s going on?


回答 0

问题在于,传递给 python 中类方法的第一个参数始终是对调用该方法的那个类实例的引用,通常命名为 self。如果这样声明该类:

class foo(object):
  def foodo(self, thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

它的行为符合预期。

说明:

如果没有 self 作为第一个参数,那么在执行 myfoo.foodo(thing="something") 时,foodo 方法会以参数 (myfoo, thing="something") 被调用。实例 myfoo 随后被赋给 thing(因为 thing 是第一个声明的参数),但 python 还会尝试把 "something" 也赋给 thing,因此引发了 Exception。

为了演示,请尝试使用原始代码运行它:

myfoo.foodo("something")
print
print myfoo

您将输出如下:

<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>

您可以看到“事物”已被分配对类“ foo”的实例“ myfoo”的引用。文档的此部分说明了函数参数的工作原理。

The problem is that the first argument passed to class methods in python is always a reference to the class instance on which the method is called, typically labelled self. If the class is declared thus:

class foo(object):
  def foodo(self, thing=None, thong='not underwear'):
    print thing if thing else "nothing" 
    print 'a thong is',thong

it behaves as expected.

Explanation:

Without self as the first parameter, when myfoo.foodo(thing="something") is executed, the foodo method is called with arguments (myfoo, thing="something"). The instance myfoo is then assigned to thing (since thing is the first declared parameter), but python also attempts to assign "something" to thing, hence the Exception.

To demonstrate, try running this with the original code:

myfoo.foodo("something")
print
print myfoo

You’ll get output like:

<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>

You can see that ‘thing’ has been assigned a reference to the instance ‘myfoo’ of the class ‘foo’. This section of the docs explains how function arguments work a bit more.
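A minimal Python 3 sketch of the corrected class (returning values instead of printing, so the behaviour is easy to check):

```python
class Foo(object):
    def foodo(self, thing=None, thong='not underwear'):
        # `self` absorbs the instance, so `thing` only receives the keyword value
        return (thing if thing else 'nothing', thong)

result = Foo().foodo(thing='something')  # no TypeError anymore
```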


回答 1

感谢您的指导性帖子。我只想说明一下,如果您收到 “TypeError: foodo() 为关键字参数 ’thing’ 获得了多个值”,也可能是您在调用该函数时错误地把 ’self’ 作为参数传入了(可能是因为您从类声明中复制了该行——匆忙时这是一个常见错误)。

Thanks for the instructive posts. I’d just like to keep a note that if you’re getting “TypeError: foodo() got multiple values for keyword argument ‘thing'”, it may also be that you’re mistakenly passing the ‘self’ as a parameter when calling the function (probably because you copied the line from the class declaration – it’s a common error when one’s in a hurry).


回答 2

这可能很明显,但可能会对从未见过的人有所帮助。如果您错误地通过位置和名称显式地分配了参数,则对于常规函数也会发生这种情况。

>>> def foodo(thing=None, thong='not underwear'):
...     print thing if thing else "nothing"
...     print 'a thong is',thong
...
>>> foodo('something', thing='everything')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foodo() got multiple values for keyword argument 'thing'

This might be obvious, but it might help someone who has never seen it before. This also happens for regular functions if you mistakenly assign a parameter by position and explicitly by name.

>>> def foodo(thing=None, thong='not underwear'):
...     print thing if thing else "nothing"
...     print 'a thong is',thong
...
>>> foodo('something', thing='everything')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foodo() got multiple values for keyword argument 'thing'
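A small sketch of that failure mode in isolation — the call supplies `thing` twice, once by position and once by name:

```python
def foodo(thing=None, thong='not underwear'):
    return thing, thong

got_type_error = False
try:
    # 'something' lands in `thing` positionally, then thing='everything'
    # tries to assign it a second time
    foodo('something', thing='everything')
except TypeError:
    got_type_error = True
```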

回答 3

只需向功能添加“ staticmethod”装饰器即可解决问题

class foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        print thing if thing else "nothing" 
        print 'a thong is',thong

just add ‘staticmethod’ decorator to function and problem is fixed

class foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        print thing if thing else "nothing" 
        print 'a thong is',thong
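A quick check that the decorator really removes the implicit instance argument (a sketch returning values rather than printing, for testability):

```python
class Foo(object):
    @staticmethod
    def foodo(thing=None, thong='not underwear'):
        return (thing if thing else 'nothing', thong)

# Works both through an instance and through the class itself,
# because no `self` is passed in either case.
via_instance = Foo().foodo(thing='something')
via_class = Foo.foodo(thing='something')
```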

回答 4

我想再添加一个答案:

当您尝试在调用函数中尝试传递位置顺序错误的位置参数以及关键字参数时,就会发生这种情况。

参数(parameter)和实参(argument)之间是有区别的,您可以在此处详细了解 python 中的实参和参数

def hello(a,b=1, *args):
   print(a, b, *args)


hello(1, 2, 3, 4,a=12)

因为我们有三个参数:

a是位置参数

b = 1是关键字和默认参数

* args是可变长度参数

因此我们首先把 a 声明为位置参数,这意味着必须按位置顺序为位置参数提供值,这里顺序很重要。但我们在调用函数时把实参 1 传到了 a 的位置上,随后又以关键字参数的形式再次给 a 赋值。现在 a 有两个值:

一个是位置值:a = 1

第二个是关键字值,a = 12

解决方案

我们必须把 hello(1, 2, 3, 4, a=12) 改为 hello(1, 2, 3, 4, 12),这样 a 将只获得一个位置值 1,b 获得值 2,其余的值则归 *args(可变长度参数)所有

附加信息

如果我们希望* args应该得到2,3,4而a应该得到1和b应该得到12

那么我们可以这样做:

def hello(a, *args, b=1):
    pass

hello(1, 2, 3, 4, b=12)

还有更多:

def hello(a,*c,b=1,**kwargs):
    print(b)
    print(c)
    print(a)
    print(kwargs)

hello(1,2,1,2,8,9,c=12)

输出:

1

(2, 1, 2, 8, 9)

1

{'c': 12}

I want to add one more answer :

It happens when you try to pass positional parameter with wrong position order along with keyword argument in calling function.

There is a difference between a parameter and an argument; you can read about it in detail here: Arguments and Parameters in python

def hello(a,b=1, *args):
   print(a, b, *args)


hello(1, 2, 3, 4,a=12)

since we have three parameters :

a is positional parameter

b=1 is keyword and default parameter

*args is variable length parameter

So we first declare a as a positional parameter, which means we have to provide a value for it by position; order matters here. But we pass the argument 1 in a's place in the calling function, and then we also provide a value for a, treating it as a keyword argument. Now a has two values:

one is positional value: a=1

second is keyworded value which is a=12

Solution

We have to change hello(1, 2, 3, 4, a=12) to hello(1, 2, 3, 4, 12), so now a will get only one positional value, which is 1, b will get the value 2, and the rest of the values will go to *args (the variable-length parameter)

additional information

if we want that *args should get 2,3,4 and a should get 1 and b should get 12

then we can do it like this:

def hello(a, *args, b=1):
    pass

hello(1, 2, 3, 4, b=12)

Something more :

def hello(a,*c,b=1,**kwargs):
    print(b)
    print(c)
    print(a)
    print(kwargs)

hello(1,2,1,2,8,9,c=12)

output :

1

(2, 1, 2, 8, 9)

1

{'c': 12}
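The `def hello(a, *args, b=1)` variant described above can be sketched and checked like this (Python 3 keyword-only syntax):

```python
def hello(a, *args, b=1):
    # `a` is positional, `*args` swallows the remaining positionals,
    # and `b` can only be supplied by keyword
    return a, args, b

result = hello(1, 2, 3, 4, b=12)
```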

回答 5

如果您传递的关键字自变量的键之一与位置自变量相似(具有相同的字符串名称),则也会发生此错误。

>>> class Foo():
...     def bar(self, bar, **kwargs):
...             print(bar)
... 
>>> kwgs = {"bar":"Barred", "jokes":"Another key word argument"}
>>> myfoo = Foo()
>>> myfoo.bar("fire", **kwgs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'bar'
>>> 

"fire" 已被接受为 ’bar’ 参数的值。然而 kwargs 中还存在另一个 ’bar’ 参数。

您必须先将关键字参数从kwargs中删除,然后再将其传递给方法。

This error can also happen if you pass a key word argument for which one of the keys is similar (has same string name) to a positional argument.

>>> class Foo():
...     def bar(self, bar, **kwargs):
...             print(bar)
... 
>>> kwgs = {"bar":"Barred", "jokes":"Another key word argument"}
>>> myfoo = Foo()
>>> myfoo.bar("fire", **kwgs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'bar'
>>> 

“fire” has been accepted into the ‘bar’ argument. And yet there is another ‘bar’ argument present in kwargs.

You would have to remove the keyword argument from the kwargs before passing it to the method.
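One way to "remove the keyword argument from the kwargs" as suggested — a sketch using `dict.pop`:

```python
class Foo(object):
    def bar(self, bar, **kwargs):
        return bar, kwargs

kwgs = {"bar": "Barred", "jokes": "Another key word argument"}

# Pull the clashing key out of the dict before unpacking the rest
bar_value = kwgs.pop("bar")
result = Foo().bar(bar_value, **kwgs)  # no "multiple values" error now
```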


回答 6

如果您使用jquery ajax的URL反向到不包含’request’参数的函数,则这也可能在Django中发生

$.ajax({
  url: '{{ url_to_myfunc }}',
});


def myfunc(foo, bar):
    ...

Also this can happen in Django if you are using jquery ajax to url that reverses to a function that doesn’t contain ‘request’ parameter

$.ajax({
  url: '{{ url_to_myfunc }}',
});


def myfunc(foo, bar):
    ...

UnicodeDecodeError:’ascii’编解码器无法解码位置13的字节0xe2:序数不在范围内(128)

问题:UnicodeDecodeError:’ascii’编解码器无法解码位置13的字节0xe2:序数不在范围内(128)

我正在使用NLTK在我的文本文件中执行kmeans聚类,其中每一行都被视为文档。例如,我的文本文件是这样的:

belong finger death punch <br>
hasty <br>
mike hasty walls jericho <br>
jägermeister rules <br>
rules bands follow performing jägermeister stage <br>
approach 

现在我要运行的演示代码是这样的:

import sys

import numpy
from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
import nltk.corpus
from nltk import decorators
import nltk.stem

stemmer_func = nltk.stem.EnglishStemmer().stem
stopwords = set(nltk.corpus.stopwords.words('english'))

@decorators.memoize
def normalize_word(word):
    return stemmer_func(word.lower())

def get_words(titles):
    words = set()
    for title in job_titles:
        for word in title.split():
            words.add(normalize_word(word))
    return list(words)

@decorators.memoize
def vectorspaced(title):
    title_components = [normalize_word(word) for word in title.split()]
    return numpy.array([
        word in title_components and not word in stopwords
        for word in words], numpy.short)

if __name__ == '__main__':

    filename = 'example.txt'
    if len(sys.argv) == 2:
        filename = sys.argv[1]

    with open(filename) as title_file:

        job_titles = [line.strip() for line in title_file.readlines()]

        words = get_words(job_titles)

        # cluster = KMeansClusterer(5, euclidean_distance)
        cluster = GAAClusterer(5)
        cluster.cluster([vectorspaced(title) for title in job_titles if title])

        # NOTE: This is inefficient, cluster.classify should really just be
        # called when you are classifying previously unseen examples!
        classified_examples = [
                cluster.classify(vectorspaced(title)) for title in job_titles
            ]

        for cluster_id, title in sorted(zip(classified_examples, job_titles)):
            print cluster_id, title

(也可以在这里找到)

我收到的错误是这样的:

Traceback (most recent call last):
File "cluster_example.py", line 40, in <module>
words = get_words(job_titles)
File "cluster_example.py", line 20, in get_words
words.add(normalize_word(word))
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
result = func(*args)
File "cluster_example.py", line 14, in normalize_word
return stemmer_func(word.lower())
File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.py", line 694, in stem
word = (word.replace(u"\u2019", u"\x27")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

这是怎么回事

I’m using NLTK to perform kmeans clustering on my text file in which each line is considered as a document. So for example, my text file is something like this:

belong finger death punch <br>
hasty <br>
mike hasty walls jericho <br>
jägermeister rules <br>
rules bands follow performing jägermeister stage <br>
approach 

Now the demo code I’m trying to run is this:

import sys

import numpy
from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
import nltk.corpus
from nltk import decorators
import nltk.stem

stemmer_func = nltk.stem.EnglishStemmer().stem
stopwords = set(nltk.corpus.stopwords.words('english'))

@decorators.memoize
def normalize_word(word):
    return stemmer_func(word.lower())

def get_words(titles):
    words = set()
    for title in job_titles:
        for word in title.split():
            words.add(normalize_word(word))
    return list(words)

@decorators.memoize
def vectorspaced(title):
    title_components = [normalize_word(word) for word in title.split()]
    return numpy.array([
        word in title_components and not word in stopwords
        for word in words], numpy.short)

if __name__ == '__main__':

    filename = 'example.txt'
    if len(sys.argv) == 2:
        filename = sys.argv[1]

    with open(filename) as title_file:

        job_titles = [line.strip() for line in title_file.readlines()]

        words = get_words(job_titles)

        # cluster = KMeansClusterer(5, euclidean_distance)
        cluster = GAAClusterer(5)
        cluster.cluster([vectorspaced(title) for title in job_titles if title])

        # NOTE: This is inefficient, cluster.classify should really just be
        # called when you are classifying previously unseen examples!
        classified_examples = [
                cluster.classify(vectorspaced(title)) for title in job_titles
            ]

        for cluster_id, title in sorted(zip(classified_examples, job_titles)):
            print cluster_id, title

(which can also be found here)

The error I receive is this:

Traceback (most recent call last):
File "cluster_example.py", line 40, in <module>
words = get_words(job_titles)
File "cluster_example.py", line 20, in get_words
words.add(normalize_word(word))
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
result = func(*args)
File "cluster_example.py", line 14, in normalize_word
return stemmer_func(word.lower())
File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.py", line 694, in stem
word = (word.replace(u"\u2019", u"\x27")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

What is happening here?


回答 0

该文件被读为一堆str,但应该为unicode。Python尝试隐式转换,但失败。更改:

job_titles = [line.strip() for line in title_file.readlines()]

将 strs 显式解码为 unicode(此处假定为 UTF-8):

job_titles = [line.decode('utf-8').strip() for line in title_file.readlines()]

也可以通过导入 codecs 模块并使用 codecs.open(而不是内置的 open)来解决。

The file is being read as a bunch of strs, but it should be unicodes. Python tries to implicitly convert, but fails. Change:

job_titles = [line.strip() for line in title_file.readlines()]

to explicitly decode the strs to unicode (here assuming UTF-8):

job_titles = [line.decode('utf-8').strip() for line in title_file.readlines()]

It could also be solved by importing the codecs module and using codecs.open rather than the built-in open.
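The same fix in isolation — decoding bytes to text explicitly instead of letting Python fall back to ASCII (Python 3 syntax used for the sketch):

```python
# Simulate a raw line read from a UTF-8 file in binary mode
raw = 'jägermeister rules\n'.encode('utf-8')

# Explicit decode: no implicit ASCII guessing, so no UnicodeDecodeError
line = raw.decode('utf-8').strip()
```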


回答 1

这对我来说很好。

f = open(file_path, 'r+', encoding="utf-8")

您可以添加第三个参数编码,以确保编码类型为’utf-8′

注意:此方法在Python3中工作正常,我没有在Python2.7中尝试过。

This works fine for me.

f = open(file_path, 'r+', encoding="utf-8")

You can add a third parameter encoding to ensure the encoding type is ‘utf-8’

Note: this method works fine in Python3, I did not try it in Python2.7.


回答 2

对我来说,终端编码有问题。将UTF-8添加到.bashrc解决了该问题:

export LC_CTYPE=en_US.UTF-8

不要忘了之后重新加载.bashrc:

source ~/.bashrc

For me there was a problem with the terminal encoding. Adding UTF-8 to .bashrc solved the problem:

export LC_CTYPE=en_US.UTF-8

Don’t forget to reload .bashrc afterwards:

source ~/.bashrc

回答 3

您也可以尝试以下操作:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

You can try this also:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

回答 4

在使用 Python3.6 的 Ubuntu 18.04 上,我通过同时执行以下两项操作解决了该问题:

with open(filename, encoding="utf-8") as lines:

并且如果您以命令行方式运行该工具:

export LC_ALL=C.UTF-8

请注意,如果您使用的是Python2.7,则必须以不同的方式进行处理。首先,您必须设置默认编码:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

然后要加载文件,您必须使用 io.open 来设置编码:

import io
with io.open(filename, 'r', encoding='utf-8') as lines:

您仍然需要导出环境

export LC_ALL=C.UTF-8

When on Ubuntu 18.04 using Python3.6 I have solved the problem doing both:

with open(filename, encoding="utf-8") as lines:

and if you are running the tool as command line:

export LC_ALL=C.UTF-8

Note that if you are in Python2.7 you have do to handle this differently. First you have to set the default encoding:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

and then to load the file you must use io.open to set the encoding:

import io
with io.open(filename, 'r', encoding='utf-8') as lines:

You still need to export the env

export LC_ALL=C.UTF-8

回答 5

尝试在Docker容器中安装python软件包时出现此错误。对我来说,问题是Docker映像没有locale配置。将以下代码添加到Dockerfile中为我解决了这个问题。

# Avoid ascii errors when reading files in Python
RUN apt-get install -y \
  locales && \
  locale-gen en_US.UTF-8
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'

I got this error when trying to install a python package in a Docker container. For me, the issue was that the docker image did not have a locale configured. Adding the following code to the Dockerfile solved the problem for me.

# Avoid ascii errors when reading files in Python
RUN apt-get install -y locales && locale-gen en_US.UTF-8
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'

回答 6

要查找与ANY和ALL unicode错误相关的信息,请使用以下命令:

grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx

我在这里发现了我的问题:

/etc/letsencrypt/options-ssl-nginx.conf:        # The following CSP directives don't use default-src as 

使用 shed,我找到了这段有问题的字节序列。原来是编辑器造成的错误。

00008099:     C2  194 302 11000010
00008100:     A0  160 240 10100000
00008101:  d  64  100 144 01100100
00008102:  e  65  101 145 01100101
00008103:  f  66  102 146 01100110
00008104:  a  61  097 141 01100001
00008105:  u  75  117 165 01110101
00008106:  l  6C  108 154 01101100
00008107:  t  74  116 164 01110100
00008108:  -  2D  045 055 00101101
00008109:  s  73  115 163 01110011
00008110:  r  72  114 162 01110010
00008111:  c  63  099 143 01100011
00008112:     C2  194 302 11000010
00008113:     A0  160 240 10100000

To find ANY and ALL unicode error related… Using the following command:

grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx

Found mine in

/etc/letsencrypt/options-ssl-nginx.conf:        # The following CSP directives don't use default-src as 

Using shed, I found the offending sequence. It turned out to be an editor mistake.

00008099:     C2  194 302 11000010
00008100:     A0  160 240 10100000
00008101:  d  64  100 144 01100100
00008102:  e  65  101 145 01100101
00008103:  f  66  102 146 01100110
00008104:  a  61  097 141 01100001
00008105:  u  75  117 165 01110101
00008106:  l  6C  108 154 01101100
00008107:  t  74  116 164 01110100
00008108:  -  2D  045 055 00101101
00008109:  s  73  115 163 01110011
00008110:  r  72  114 162 01110010
00008111:  c  63  099 143 01100011
00008112:     C2  194 302 11000010
00008113:     A0  160 240 10100000

回答 7

您可以在使用job_titles字符串之前尝试以下操作:

source = unicode(job_titles, 'utf-8')

You can try this before using job_titles string:

source = unicode(job_titles, 'utf-8')

回答 8

对于 python 3,默认编码为 “utf-8”。如遇到问题,基础文档 https://docs.python.org/2/library/csv.html#csv-examples 中建议采取以下步骤:

  1. 创建一个功能

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
  2. 然后使用阅读器内部的功能,例如

    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data))

For python 3, the default encoding would be “utf-8”. In case of any problem, the following steps are suggested in the base documentation: https://docs.python.org/2/library/csv.html#csv-examples

  1. Create a function

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
    
  2. Then use the function inside the reader, for e.g.

    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data))
    

回答 9

python3x或更高版本

  1. 以字节流加载文件:

    body = ''
    for lines in open('website/index.html', 'rb'):
        decodedLine = lines.decode('utf-8')
        body = body + decodedLine.strip()
    return body

  2. 使用全局设置:

    import io
    import sys
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

python3x or higher

  1. load file in byte stream:

    body = ''
    for lines in open('website/index.html', 'rb'):
        decodedLine = lines.decode('utf-8')
        body = body + decodedLine.strip()
    return body

  2. use global setting:

    import io
    import sys
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')


回答 10

使用open(fn, 'rb').read().decode('utf-8')而不是仅仅open(fn).read()

Use open(fn, 'rb').read().decode('utf-8') instead of just open(fn).read()
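A self-contained sketch of this pattern using a temporary file (Python 3):

```python
import os
import tempfile

# Write some UTF-8 text, then read it back in binary mode and decode explicitly
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write('jägermeister'.encode('utf-8'))

text = open(path, 'rb').read().decode('utf-8')
os.remove(path)
```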


在python中将stdout重定向为“ nothing”

问题:在python中将stdout重定向为“ nothing”

我有一个大型项目,其中包含相当多的模块,每个模块都会向标准输出打印一些内容。随着项目规模的扩大,大量的 print 语句向标准输出打印了很多内容,这使得程序明显变慢。

因此,我现在想在运行时决定是否将任何内容打印到标准输出。我无法对模块进行更改,因为其中有很多更改。(我知道我可以将标准输出重定向到文件,但即使这样也很慢。)

所以我的问题是如何将stdout重定向为空,即如何使print语句不执行任何操作?

# I want to do something like this.
sys.stdout = None         # this obviously will give an error as Nonetype object does not have any write method.

目前,我唯一的想法是制作一个具有write方法的类(不执行任何操作),然后将stdout重定向到该类的实例。

class DontPrint(object):
    def write(*args): pass

dp = DontPrint()
sys.stdout = dp

在python中有内置的机制吗?还是有比这更好的东西?

I have a large project consisting of sufficiently large number of modules, each printing something to the standard output. Now as the project has grown in size, there are large no. of print statements printing a lot on the std out which has made the program considerably slower.

So, I now want to decide at runtime whether or not to print anything to the stdout. I cannot make changes in the modules as there are plenty of them. (I know I can redirect the stdout to a file but even this is considerably slow.)

So my question is how do I redirect the stdout to nothing ie how do I make the print statement do nothing?

# I want to do something like this.
sys.stdout = None         # this obviously will give an error as Nonetype object does not have any write method.

Currently the only idea I have is to make a class which has a write method (which does nothing) and redirect the stdout to an instance of this class.

class DontPrint(object):
    def write(*args): pass

dp = DontPrint()
sys.stdout = dp

Is there an inbuilt mechanism in python for this? Or is there something better than this?


回答 0

跨平台:

import os
import sys
f = open(os.devnull, 'w')
sys.stdout = f

在Windows上:

f = open('nul', 'w')
sys.stdout = f

在Linux上:

f = open('/dev/null', 'w')
sys.stdout = f

Cross-platform:

import os
import sys
f = open(os.devnull, 'w')
sys.stdout = f

On Windows:

f = open('nul', 'w')
sys.stdout = f

On Linux:

f = open('/dev/null', 'w')
sys.stdout = f

回答 1

这样做的一种好方法是创建一个用于包装打印内容的小型上下文处理器。然后,您可以使用with-statement来使所有输出静音。

Python 2:

import os
import sys
from contextlib import contextmanager

@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target

with silence_stdout():
    print("will not print")

print("this will print")

Python 3.4+:

Python 3.4具有内置的上下文处理器,因此您可以像这样简单地使用contextlib:

import contextlib

with contextlib.redirect_stdout(None):
    print("will not print")

print("this will print")

运行此代码仅显示输出的第二行,而不输出第一行:

$ python test.py
this will print

这可以跨平台(Windows + Linux + Mac OSX)运行,并且比其他解决方案更干净。

A nice way to do this is to create a small context processor that you wrap your prints in. You then just use is in a with-statement to silence all output.

Python 2:

import os
import sys
from contextlib import contextmanager

@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target

with silence_stdout():
    print("will not print")

print("this will print")

Python 3.4+:

Python 3.4 has a context processor like this built-in, so you can simply use contextlib like this:

import contextlib

with contextlib.redirect_stdout(None):
    print("will not print")

print("this will print")

Running this code only prints the second line of output, not the first:

$ python test.py
this will print

This works cross-platform (Windows + Linux + Mac OSX), and is cleaner than the other answers imho.


回答 2

如果您使用的是python 3.4或更高版本,则可以使用标准库提供一种简单安全的解决方案:

import contextlib

with contextlib.redirect_stdout(None):
  print("This won't print!")

If you’re in python 3.4 or higher, there’s a simple and safe solution using the standard library:

import contextlib

with contextlib.redirect_stdout(None):
  print("This won't print!")
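Note that redirect_stdout accepts any file-like target, not just None — for example an io.StringIO, if you want to capture the output instead of discarding it:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("hello")            # goes into buf, not the terminal

captured = buf.getvalue()     # retrieve what was printed
```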

回答 3

(至少在我的系统上)似乎写os.devnull比写DontPrint类快大约5倍,即

#!/usr/bin/python
import os
import sys
import datetime

ITER = 10000000
def printlots(out, it, st="abcdefghijklmnopqrstuvwxyz1234567890"):
   temp = sys.stdout
   sys.stdout = out
   i = 0
   start_t = datetime.datetime.now()
   while i < it:
      print st
      i = i+1
   end_t = datetime.datetime.now()
   sys.stdout = temp
   print out, "\n   took", end_t - start_t, "for", it, "iterations"

class devnull():
   def write(*args):
      pass


printlots(open(os.devnull, 'wb'), ITER)
printlots(devnull(), ITER)

给出以下输出:

<open file '/dev/null', mode 'wb' at 0x7f2b747044b0> 
   took 0:00:02.074853 for 10000000 iterations
<__main__.devnull instance at 0x7f2b746bae18> 
   took 0:00:09.933056 for 10000000 iterations

(at least on my system) it appears that writing to os.devnull is about 5x faster than writing to a DontPrint class, i.e.

#!/usr/bin/python
import os
import sys
import datetime

ITER = 10000000
def printlots(out, it, st="abcdefghijklmnopqrstuvwxyz1234567890"):
   temp = sys.stdout
   sys.stdout = out
   i = 0
   start_t = datetime.datetime.now()
   while i < it:
      print st
      i = i+1
   end_t = datetime.datetime.now()
   sys.stdout = temp
   print out, "\n   took", end_t - start_t, "for", it, "iterations"

class devnull():
   def write(*args):
      pass


printlots(open(os.devnull, 'wb'), ITER)
printlots(devnull(), ITER)

gave the following output:

<open file '/dev/null', mode 'wb' at 0x7f2b747044b0> 
   took 0:00:02.074853 for 10000000 iterations
<__main__.devnull instance at 0x7f2b746bae18> 
   took 0:00:09.933056 for 10000000 iterations

回答 4

如果您在Unix环境(包括Linux)中,则可以将输出重定向到/dev/null

python myprogram.py > /dev/null

对于Windows:

python myprogram.py > nul

If you’re in a Unix environment (Linux included), you can redirect output to /dev/null:

python myprogram.py > /dev/null

And for Windows:

python myprogram.py > nul

回答 5

这个怎么样:

from contextlib import ExitStack, redirect_stdout
import os

with ExitStack() as stack:
    if should_hide_output():
        null_stream = open(os.devnull, "w")
        stack.enter_context(null_stream)
        stack.enter_context(redirect_stdout(null_stream))
    noisy_function()

这使用了 contextlib 模块中的功能,根据 should_hide_output() 的结果来隐藏您要运行的任何命令的输出,并在该函数运行完毕后恢复输出行为。

如果您想隐藏标准错误输出,请从 contextlib 导入 redirect_stderr,并添加一行 stack.enter_context(redirect_stderr(null_stream))。

主要缺点是,这仅适用于Python 3.4和更高版本。

How about this:

from contextlib import ExitStack, redirect_stdout
import os

with ExitStack() as stack:
    if should_hide_output():
        null_stream = open(os.devnull, "w")
        stack.enter_context(null_stream)
        stack.enter_context(redirect_stdout(null_stream))
    noisy_function()

This uses the features in the contextlib module to hide the output of whatever command you are trying to run, depending on the result of should_hide_output(), and then restores the output behavior after that function is done running.

If you want to hide standard error output, then import redirect_stderr from contextlib and add a line saying stack.enter_context(redirect_stderr(null_stream)).

The main downside is that this only works in Python 3.4 and later versions.


回答 6

您的类可以正常工作(只是 write() 方法的名称需要是小写的 write())。只要确保把 sys.stdout 的副本保存在另一个变量中即可。

如果您使用的是 *NIX,则可以执行 sys.stdout = open('/dev/null'),但这不如自己编写一个类可移植。

Your class will work just fine (with the exception of the write() method name — it needs to be called write(), lowercase). Just make sure you save a copy of sys.stdout in another variable.

If you’re on a *NIX, you can do sys.stdout = open('/dev/null'), but this is less portable than rolling your own class.


回答 7

您可以直接用 mock 模拟它。

import mock

sys.stdout = mock.MagicMock()

You can just mock it.

import mock

sys.stdout = mock.MagicMock()

回答 8

你为什么不试试这个?

sys.stdout.close()
sys.stderr.close()

Why don’t you try this?

sys.stdout.close()
sys.stderr.close()

回答 9

sys.stdout = None

对 print() 的情况来说这是可以的。但如果您调用 sys.stdout 的任何方法(例如 sys.stdout.write()),则可能会导致错误。

在文档中有一个注释

在某些情况下,stdin、stdout 和 stderr 以及原始值 __stdin__、__stdout__ 和 __stderr__ 可以为 None。对于未连接到控制台的 Windows GUI 应用程序以及以 pythonw 启动的 Python 应用程序,通常是这种情况。

sys.stdout = None

It is OK for print() case. But it can cause an error if you call any method of sys.stdout, e.g. sys.stdout.write().

There is a note in docs:

Under some conditions stdin, stdout and stderr as well as the original values __stdin__, __stdout__ and __stderr__ can be None. It is usually the case for Windows GUI apps that aren’t connected to a console and Python apps started with pythonw.
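This asymmetry can be seen directly — print() tolerates a None stdout, while calling a method on it does not (a sketch that restores stdout afterwards):

```python
import sys

saved = sys.stdout
sys.stdout = None

print("silently dropped")     # no error: print() is a no-op when stdout is None

method_failed = False
try:
    sys.stdout.write("boom")  # AttributeError: None has no write()
except AttributeError:
    method_failed = True

sys.stdout = saved            # restore the real stdout
```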


回答 10

补充iFreilicht的答案 -适用于python 2和3。

import sys

class NonWritable:
    def write(self, *args, **kwargs):
        pass

class StdoutIgnore:
    def __enter__(self):
        self.stdout_saved = sys.stdout
        sys.stdout = NonWritable()
        return self

    def __exit__(self, *args):
        sys.stdout = self.stdout_saved

with StdoutIgnore():
    print("This won't print!")

Supplement to iFreilicht’s answer – it works for both python 2 & 3.

import sys

class NonWritable:
    def write(self, *args, **kwargs):
        pass

class StdoutIgnore:
    def __enter__(self):
        self.stdout_saved = sys.stdout
        sys.stdout = NonWritable()
        return self

    def __exit__(self, *args):
        sys.stdout = self.stdout_saved

with StdoutIgnore():
    print("This won't print!")

如果存在列表索引,请执行X

问题:如果存在列表索引,请执行X

在我的程序中,用户输入number n,然后输入n字符串数,这些字符串存储在列表中。

我需要进行编码,以便如果存在某个列表索引,然后运行一个函数。

我已经嵌套了if语句,这使情况变得更加复杂len(my_list)

这是我现在所拥有的简化版本,无法使用:

n = input ("Define number of actors: ")

count = 0

nams = []

while count < n:
    count = count + 1
    print "Define name for actor ", count, ":"
    name = raw_input ()
    nams.append(name)

if nams[2]: #I am trying to say 'if nams[2] exists, do something depending on len(nams)
    if len(nams) > 3:
        do_something
    if len(nams) > 4
        do_something_else

if nams[3]: #etc.

In my program, user inputs number n, and then inputs n number of strings, which get stored in a list.

I need to code such that if a certain list index exists, then run a function.

This is made more complicated by the fact that I have nested if statements about len(my_list).

Here’s a simplified version of what I have now, which isn’t working:

n = input ("Define number of actors: ")

count = 0

nams = []

while count < n:
    count = count + 1
    print "Define name for actor ", count, ":"
    name = raw_input ()
    nams.append(name)

if nams[2]: #I am trying to say 'if nams[2] exists, do something depending on len(nams)
    if len(nams) > 3:
        do_something
    if len(nams) > 4
        do_something_else

if nams[3]: #etc.

回答 0

使用列表的长度 len(n) 来辅助您的决策,而不是针对每个可能的长度去检查 n[i],这样是否对您更有用?

Could it be more useful for you to use the length of the list len(n) to inform your decision rather than checking n[i] for each possible length?


回答 1

我需要进行编码,以便如果存在某个列表索引,然后运行一个函数。

这是try块的完美用法:

ar=[1,2,3]

try:
    t=ar[5]
except IndexError:
    print('sorry, no 5')   

# Note: this only is a valid test in this context 
# with absolute (ie, positive) index
# a relative index is only showing you that a value can be returned
# from that relative index from the end of the list...

但是,根据定义,Python列表中0到len(the_list)-1之间的所有项都存在(即,只要已知0 <= index < len(the_list),就无需try)。

如果您想要0到最后一个元素之间的索引,可以使用enumerate:

names=['barney','fred','dino']

for i, name in enumerate(names):
    print(str(i) + ' ' + name)
    if i in (3,4):
        pass  # do your thing with the index 'i' or value 'name' for each item...

如果您正在寻找某种明确的“索引”概念,那么我认为您问错了问题。也许您应该考虑使用映射容器(例如dict)而不是序列容器(例如列表)。您可以这样重写代码:

def do_something(name):
    print('some thing 1 done with ' + name)

def do_something_else(name):
    print('something 2 done with ' + name)        

def default(name):
    print('nothing done with ' + name)     

something_to_do={  
    3: do_something,        
    4: do_something_else
    }        

n = input ("Define number of actors: ")
count = 0
names = []

for count in range(n):
    print("Define name for actor {}:".format(count+1))
    name = raw_input ()
    names.append(name)

for name in names:
    try:
        something_to_do[len(name)](name)
    except KeyError:
        default(name)

像这样运行:

Define number of actors: 3
Define name for actor 1: bob
Define name for actor 2: tony
Define name for actor 3: alice
some thing 1 done with bob
something 2 done with tony
nothing done with alice

您也可以使用.get方法而不是try / except来获得较短的版本:

>>> something_to_do.get(3, default)('bob')
some thing 1 done with bob
>>> something_to_do.get(22, default)('alice')
nothing done with alice

I need to code such that if a certain list index exists, then run a function.

This is the perfect use for a try block:

ar=[1,2,3]

try:
    t=ar[5]
except IndexError:
    print('sorry, no 5')   

# Note: this only is a valid test in this context 
# with absolute (ie, positive) index
# a relative index is only showing you that a value can be returned
# from that relative index from the end of the list...

However, by definition, all items in a Python list between 0 and len(the_list)-1 exist (i.e., there is no need for a try, except if you know 0 <= index < len(the_list)).

You can use enumerate if you want the indexes between 0 and the last element:

names=['barney','fred','dino']

for i, name in enumerate(names):
    print(str(i) + ' ' + name)
    if i in (3,4):
        pass  # do your thing with the index 'i' or value 'name' for each item...

If you are looking for some defined ‘index’ thought, I think you are asking the wrong question. Perhaps you should consider using a mapping container (such as a dict) versus a sequence container (such as a list). You could rewrite your code like this:

def do_something(name):
    print('some thing 1 done with ' + name)

def do_something_else(name):
    print('something 2 done with ' + name)        

def default(name):
    print('nothing done with ' + name)     

something_to_do={  
    3: do_something,        
    4: do_something_else
    }        

n = input ("Define number of actors: ")
count = 0
names = []

for count in range(n):
    print("Define name for actor {}:".format(count+1))
    name = raw_input ()
    names.append(name)

for name in names:
    try:
        something_to_do[len(name)](name)
    except KeyError:
        default(name)

Runs like this:

Define number of actors: 3
Define name for actor 1: bob
Define name for actor 2: tony
Define name for actor 3: alice
some thing 1 done with bob
something 2 done with tony
nothing done with alice

You can also use .get method rather than try/except for a shorter version:

>>> something_to_do.get(3, default)('bob')
some thing 1 done with bob
>>> something_to_do.get(22, default)('alice')
nothing done with alice

回答 2

在您的代码中,len(nams)应该等于n。所有满足0 <= i < n的索引都“存在”。

len(nams) should be equal to n in your code. All indexes 0 <= i < n “exist”.


回答 3

只需使用以下代码即可完成:

if index < len(my_list):
    print(index, 'exists in the list')
else:
    print(index, "doesn't exist in the list")

It can be done simply using the following code:

if index < len(my_list):
    print(index, 'exists in the list')
else:
    print(index, "doesn't exist in the list")

回答 4

我需要编写代码,使得如果某个列表索引存在,就运行一个函数。

您已经知道如何对此进行测试,并且实际上已经在代码中执行了这种测试

对于长度为n的列表,有效索引为0到n-1(含两端)。

因此,当且仅当列表的长度至少为i + 1时,列表才具有索引i。

I need to code such that if a certain list index exists, then run a function.

You already know how to test for this and in fact are already performing such tests in your code.

The valid indices for a list of length n are 0 through n-1 inclusive.

Thus, a list has an index i if and only if the length of the list is at least i + 1.


回答 5

使用列表的长度是检查索引是否存在的最快解决方案:

def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)

这还会测试负索引,并且适用于大多数具有长度的序列类型(如range和str)。

如果您之后无论如何都需要访问该索引处的项目,那么“请求宽恕比请求许可更容易”(EAFP),而且这样做也更快、更符合Python风格。使用try: except:

try:
    item = ls[i]
    # Do something with item
except IndexError:
    # Do something without the item

这与以下情况相反:

if index_exists(ls, i):
    item = ls[i]
    # Do something with item
else:
    # Do something without the item

Using the length of the list would be the fastest solution to check if an index exists:

def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)

This also tests for negative indices, and most sequence types (Like ranges and strs) that have a length.

If you need to access the item at that index afterwards anyways, it is easier to ask forgiveness than permission, and it is also faster and more Pythonic. Use try: except:.

try:
    item = ls[i]
    # Do something with item
except IndexError:
    # Do something without the item

This would be as opposed to:

if index_exists(ls, i):
    item = ls[i]
    # Do something with item
else:
    # Do something without the item

回答 6

如果要迭代插入的actor数据:

for i in range(n):
    if len(nams[i]) > 3:
        do_something
    if len(nams[i]) > 4:
        do_something_else

If you want to iterate the inserted actors data:

for i in range(n):
    if len(nams[i]) > 3:
        do_something
    if len(nams[i]) > 4:
        do_something_else

回答 7

好的,所以我认为这实际上是可能的(仅作论证之用):

>>> your_list = [5,6,7]
>>> 2 in zip(*enumerate(your_list))[0]
True
>>> 3 in zip(*enumerate(your_list))[0]
False

ok, so I think it’s actually possible (for the sake of argument):

>>> your_list = [5,6,7]
>>> 2 in zip(*enumerate(your_list))[0]
True
>>> 3 in zip(*enumerate(your_list))[0]
False
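
As a side note, the subscripted zip(...) above only works in Python 2, where zip returns a list; a rough Python 3 equivalent (still just for the sake of argument) is:

```python
your_list = [5, 6, 7]

# enumerate yields (index, value) pairs; dict() keeps the indexes as keys,
# so a membership test on the dict checks whether the index exists.
print(2 in dict(enumerate(your_list)))  # True
print(3 in dict(enumerate(your_list)))  # False
```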

回答 8

您可以尝试这样的事情

list = ["a", "b", "C", "d", "e", "f", "r"]

for i in range(0, len(list), 2):
    print list[i]
    if len(list) % 2 == 1 and  i == len(list)-1:
        break
    print list[i+1];

You can try something like this

list = ["a", "b", "C", "d", "e", "f", "r"]

for i in range(0, len(list), 2):
    print list[i]
    if len(list) % 2 == 1 and  i == len(list)-1:
        break
    print list[i+1];

回答 9

Oneliner:

do_X() if len(your_list) > your_index else do_something_else()  

完整示例:

In [10]: def do_X(): 
    ...:     print(1) 
    ...:                                                                                                                                                                                                                                      

In [11]: def do_something_else(): 
    ...:     print(2) 
    ...:                                                                                                                                                                                                                                      

In [12]: your_index = 2                                                                                                                                                                                                                       

In [13]: your_list = [1,2,3]                                                                                                                                                                                                                  

In [14]: do_X() if len(your_list) > your_index else do_something_else()                                                                                                                                                                      
1

仅用于信息。恕我直言,try ... except IndexError是更好的解决方案。

Oneliner:

do_X() if len(your_list) > your_index else do_something_else()  

Full example:

In [10]: def do_X(): 
    ...:     print(1) 
    ...:                                                                                                                                                                                                                                      

In [11]: def do_something_else(): 
    ...:     print(2) 
    ...:                                                                                                                                                                                                                                      

In [12]: your_index = 2                                                                                                                                                                                                                       

In [13]: your_list = [1,2,3]                                                                                                                                                                                                                  

In [14]: do_X() if len(your_list) > your_index else do_something_else()                                                                                                                                                                      
1

Just for info. Imho, try ... except IndexError is better solution.


回答 10

不要在括号前留任何空格。

例:

n = input ()
         ^

提示:您应该在代码上方和/或下方添加注释,而不是写在代码后面。


祝你今天愉快。

Do not let any space in front of your brackets.

Example:

n = input ()
         ^

Tip: You should add comments over and/or under your code. Not behind your code.


Have a nice day.


回答 11

很多答案,而不是简单的答案。

要检查字典dict是否存在索引“ id”:

dic = {}
dic['name'] = "joao"
dic['age']  = "39"

'age' in dic  # -> True

如果存在“年龄”,则返回true。

A lot of answers, not the simple one.

To check if a index ‘id’ exists at dictionary dict:

dic = {}
dic['name'] = "joao"
dic['age']  = "39"

'age' in dic  # -> True

returns true if ‘age’ exists.


sklearn错误ValueError:输入包含NaN,无穷大或对于dtype(’float64’)而言太大的值

问题:sklearn错误ValueError:输入包含NaN,无穷大或对于dtype(’float64’)而言太大的值

我正在使用sklearn,并且亲和力传播存在问题。我建立了一个输入矩阵,并且不断收到以下错误。

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

我跑了

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

我尝试使用

mat[np.isfinite(mat) == True] = 0

删除无限值,但这也不起作用。我该怎么做才能摆脱矩阵中的无限值,以便可以使用亲和力传播算法?

我正在使用anaconda和python 2.7.9。

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

I tried using

mat[np.isfinite(mat) == True] = 0

to remove the infinite values but this did not work either. What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?

I am using anaconda and python 2.7.9.


回答 0

这可能发生在scikit内部,具体取决于您在做什么。我建议您阅读所用函数的文档。例如,您可能正在使用一个要求矩阵为正定的方法,而您的矩阵不满足该条件。

编辑:我怎么会错过:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

显然是错误的。正确的是:

np.any(np.isnan(mat))

np.all(np.isfinite(mat))

您想检查的是是否有任何元素是NaN,而不是any函数的返回值是否为数字…

This might happen inside scikit, and it depends on what you’re doing. I recommend reading the documentation for the functions you’re using. You might be using one which depends e.g. on your matrix being positive definite and not fulfilling that criteria.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, and not whether the return value of the any function is a number…
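
The difference can be seen on a small array containing a NaN (a minimal sketch):

```python
import numpy as np

mat = np.array([1.0, np.nan, 3.0])

# The original order reduces the array to a single bool first,
# so isnan/isfinite only ever see that bool, not the elements:
print(np.isnan(mat.any()))       # False
print(np.isfinite(mat.all()))    # True

# The corrected order inspects every element:
print(np.any(np.isnan(mat)))     # True
print(np.all(np.isfinite(mat)))  # False
```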


回答 1

将sklearn与pandas一起使用时,我得到了相同的错误消息。我的解决方案是在运行任何sklearn代码之前重置数据帧df的索引:

df = df.reset_index()

当我从df中删除某些条目后,我多次遇到此问题,例如

df = df[df.label=='desired_one']

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']
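
A minimal sketch of why this helps (column names are made up): after filtering, the frame keeps its old, gapped index, and mixing it with freshly indexed data introduces NaN through alignment:

```python
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "a"], "x": [1.0, 2.0, 3.0]})
df = df[df.label == "a"]           # index is now [0, 2], with a gap

other = pd.Series([10.0, 20.0])    # fresh index [0, 1]
print((df["x"] + other).isna().any())   # True -- misalignment creates NaN

# drop=True is a variant of the answer's reset_index(): it discards
# the old index instead of keeping it as a column.
df = df.reset_index(drop=True)
print((df["x"] + other).isna().any())   # False
```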

回答 2

这是我的函数(基于此),用于清理数据集中的nan、Inf和缺失单元格(适用于偏斜数据集):

import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1)
    return df[indices_to_keep].astype(np.float64)

This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):

import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1)
    return df[indices_to_keep].astype(np.float64)

回答 3

输入数组的维度倾斜,因为我的输入csv有空白。

The Dimensions of my input array were skewed, as my input csv had empty spaces.


回答 4

这是失败的检查:

其内容为:

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

因此,请确保输入中没有NaN值,并且所有这些值实际上都是浮点值。任何值也都不应是Inf。

This is the check on which it fails:

Which says

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that you have no NaN values in your input, and that all those values are actually float values. None of the values should be Inf either.


回答 5

使用此版本的python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

查看错误的详细信息,我发现导致失败的代码行:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

由此,我得以用错误消息中给出的同一个失败检查来测试我的数据到底出了什么问题:np.isfinite(X)

然后通过快速而肮脏的循环,我发现我的数据确实包含nans

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

现在,我要做的就是删除这些索引中的值。

With this version of python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of codes causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I was able to extract the correct way to test what was going on with my data using the same test which fails given by the error message: np.isfinite(X)

Then with a quick and dirty loop, I was able to find that my data indeed contains nans:

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

Now all I have to do is remove the values at these indexes.


回答 6

尝试选择行的子集后出现错误:

df = df.reindex(index=my_index)

原来my_index包含了一些不在df.index中的值,因此reindex函数插入了一些新行并用nan填充了它们。

I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.


回答 7

在大多数情况下,消除无穷和空值可以解决此问题。

摆脱无限的价值。

df.replace([np.inf, -np.inf], np.nan, inplace=True)

以您喜欢的方式消除空值,特定值(例如999),均值,或创建自己的函数来估算缺失值

df.fillna(999, inplace=True)

In most cases getting rid of infinite and null values solve this problem.

get rid of infinite values.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

get rid of null values the way you like, specific value such as 999, mean, or create your own function to impute missing values

df.fillna(999, inplace=True)
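
Putting the two steps together on a toy column (with 999 as the sentinel, as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.inf, -np.inf, np.nan, 5.0]})

df.replace([np.inf, -np.inf], np.nan, inplace=True)  # infinities -> NaN
df.fillna(999, inplace=True)                          # NaN -> sentinel

print(df["x"].tolist())  # [1.0, 999.0, 999.0, 999.0, 5.0]
```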

回答 8

我有同样的错误,在我的案例中,X和y是数据帧,因此我必须先将它们转换为矩阵:

X = X.values.astype(np.float)
y = y.values.astype(np.float)

编辑:最初建议的X.as_matrix()已被弃用

I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:

X = X.values.astype(np.float)
y = y.values.astype(np.float)

Edit: The originally suggested X.as_matrix() is Deprecated


回答 9

我遇到了同样的错误。在进行任何替换、置换等操作之前先执行df.fillna(-99999, inplace=True),问题就解决了。

i got the same error. it worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution etc


回答 10

就我而言,问题是许多scikit函数返回的numpy数组没有熊猫索引。因此,当我使用这些numpy数组构建新的DataFrames,然后尝试将它们与原始数据混合时,索引不匹配。

In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.
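
A minimal illustration with hypothetical data: passing index=df.index when rebuilding a DataFrame from a plain numpy array keeps the rows aligned with the original frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 11, 12])
arr = df[["x"]].to_numpy() * 2      # plain ndarray: the pandas index is lost

bad = pd.DataFrame(arr, columns=["x2"])                   # index [0, 1, 2]
print((df["x"] + bad["x2"]).isna().all())                 # True -- nothing lines up

good = pd.DataFrame(arr, columns=["x2"], index=df.index)  # index preserved
print((df["x"] + good["x2"]).tolist())                    # [3.0, 6.0, 9.0]
```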


回答 11

删除所有无限值:

(并用该列的min或max代替)

# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace  + and -infinity 
# with the max or min for that column
for i in range(matrix.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

Remove all infinite values:

(and replace with min or max for that column)

# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace  + and -infinity 
# with the max or min for that column
for i in range(matrix.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

回答 12

尝试

mat.sum()

如果您的数据总和为无穷大(即大于最大浮点值3.402823e+38),则会收到该错误。

请参阅scikit源代码中validation.py中的_assert_all_finite函数:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))

try

mat.sum()

If the sum of your data is infinity (greater than the max float value, which is 3.402823e+38) you will get that error.

see the _assert_all_finite function in validation.py from the scikit source code:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))
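
A sketch of why the sum check alone can mislead: every element can be finite while the sum overflows, which is exactly why the validation code falls back to the elementwise np.isfinite check:

```python
import numpy as np

big = np.full(3, np.finfo(np.float64).max)  # three finite elements

print(np.isfinite(big).all())   # True  -- elementwise check passes
print(np.isfinite(big.sum()))   # False -- the sum overflows to inf
```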

如何使用python获取文件夹中的最新文件

问题:如何使用python获取文件夹中的最新文件

我需要使用python获取文件夹的最新文件。使用代码时:

max(files, key = os.path.getctime)

我收到以下错误:

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'

I need to get the latest file of a folder using python. While using the code:

max(files, key = os.path.getctime)

I am getting the below error:

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'


回答 0

分配给files变量的任何内容均不正确。使用以下代码。

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getctime)
print latest_file

Whatever is assigned to the files variable is incorrect. Use the following code.

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getctime)
print latest_file

回答 1

max(files, key = os.path.getctime)

是一段很不完整的代码。files是什么?它可能是一个来自os.listdir()的文件名列表。

但是该列表仅包含文件名部分(也称为“基本名称”),因为它们的路径是相同的。为了正确使用它,您必须将其与对应的路径(即用来获得该列表的路径)结合起来。

如(未测试):

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
max(files, key = os.path.getctime)

is quite incomplete code. What is files? It probably is a list of file names, coming out of os.listdir().

But this list lists only the filename parts (a. k. a. “basenames”), because their path is common. In order to use it correctly, you have to combine it with the path leading to it (and used to obtain it).

Such as (untested):

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
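
The untested helper above can be exercised in a throwaway directory like this:

```python
import os
import tempfile

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)

# Quick self-check: create two files and ask for the newest one.
d = tempfile.mkdtemp()
for name in ("first.txt", "second.txt"):
    open(os.path.join(d, name), "w").close()

# Which one wins can depend on ctime resolution; both are valid candidates.
print(os.path.basename(newest(d)))
```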

回答 2

我建议使用glob.iglob()代替glob.glob(),因为它效率更高。

glob.iglob()返回一个迭代器,该迭代器产生的值与glob()相同,而实际上并没有同时存储所有值。

意思是 glob.iglob()效率更高。

我主要使用以下代码查找与我的模式匹配的最新文件:

LatestFile = max(glob.iglob(fileNamePattern),key=os.path.getctime)


注意:max函数有多种变体。在查找最新文件时,我们使用以下变体:max(iterable, *[, key, default])

它需要一个可迭代对象,因此第一个参数应该是可迭代的。在查找若干数字的最大值时,可以使用以下变体:max(num1, num2, num3, *args[, key])

I would suggest using glob.iglob() instead of the glob.glob(), as it is more efficient.

glob.iglob() Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

Which means glob.iglob() will be more efficient.

I mostly use below code to find the latest file matching to my pattern:

LatestFile = max(glob.iglob(fileNamePattern),key=os.path.getctime)


NOTE: There are variants of max function, In case of finding the latest file we will be using below variant: max(iterable, *[, key, default])

which needs an iterable, so your first parameter should be iterable. In case of finding the max of numbers we can use the below variant: max(num1, num2, num3, *args[, key])
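
One detail worth hedging: if the pattern matches nothing, max() raises ValueError on an empty iterator. On Python 3.4+ the default keyword avoids that (the path here is a placeholder):

```python
import glob
import os

# default=None is returned when the iterator is empty, instead of raising.
latest = max(glob.iglob("/path/to/folder/*.csv"),
             key=os.path.getctime, default=None)
if latest is None:
    print("no matching files")
```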


回答 3

尝试按创建时间对条目排序。以下示例对文件夹中的文件进行排序,并获取第一个元素,即最新的文件。

import glob
import os

files_path = os.path.join(folder, '*')
files = sorted(
    glob.iglob(files_path), key=os.path.getctime, reverse=True) 
print files[0]

Try to sort items by creation time. Example below sorts files in a folder and gets first element which is latest.

import glob
import os

files_path = os.path.join(folder, '*')
files = sorted(
    glob.iglob(files_path), key=os.path.getctime, reverse=True) 
print files[0]

回答 4

我没有足够的声誉来发表评论,但是Marlon Abeykoons回答中的ctime并没有给我正确的结果。改用mtime就可以了(key=os.path.getmtime)。

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getmtime)
print latest_file

对于该问题,我找到了两个相关的回答:

- python os.path.getctime max不返回最新文件
- python中getmtime()和getctime()在unix系统上的区别

I lack the reputation to comment but ctime from Marlon Abeykoons response did not give the correct result for me. Using mtime does the trick though. (key=os.path.getmtime))

import glob
import os

list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getmtime)
print latest_file

I found two answers for that problem:

- python os.path.getctime max does not return latest
- Difference between python getmtime() and getctime() in unix system


回答 5

(编辑以改善答案)

首先定义一个函数get_latest_file

def get_latest_file(path, *paths):
    fullpath = os.path.join(path, paths)
    ...
get_latest_file('example', 'files','randomtext011.*.txt')

您也可以使用文档字符串!

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)

如果使用Python 3,则可以改用iglob

完成代码以返回最新文件的名称:

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # You may use iglob in Python3
    if not files:                # I prefer using the negation
        return None                      # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename

(Edited to improve answer)

First define a function get_latest_file

def get_latest_file(path, *paths):
    fullpath = os.path.join(path, *paths)
    ...
get_latest_file('example', 'files','randomtext011.*.txt')

You may also use a docstring !

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)

If you use Python 3, you can use iglob instead.

Complete code to return the name of latest file:

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # You may use iglob in Python3
    if not files:                # I prefer using the negation
        return None                      # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename

回答 6

我尝试使用以上建议,但程序崩溃了。后来我发现,我想识别的那个文件正在被使用,在对它调用os.path.getctime时程序就崩溃了。最终对我有用的是:

    files_before = glob.glob(os.path.join(my_path,'*'))
    **code where new file is created**
    new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path,'*'))))

此代码获取两个文件列表集合之间互不相同的对象。这不是最优雅的方法,而且如果同时创建了多个文件,它可能不稳定。

I tried to use the above suggestions and my program crashed. Then I figured out that the file I was trying to identify was in use, and it crashed when trying to call os.path.getctime on it. What finally worked for me was:

    files_before = glob.glob(os.path.join(my_path,'*'))
    **code where new file is created**
    new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path,'*'))))

This code gets the uncommon object between the two sets of file lists. It is not the most elegant, and if multiple files are created at the same time it probably won't be stable.


回答 7

在Windows(0.05s)上更快的方法是,调用执行此操作的bat脚本:

get_latest.bat

@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
%LAST%

其中\\directory\in\question是您要检查的目录。

get_latest.py

from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)

如果找到了文件,stdout就是其路径,stderr则为None。

使用stdout.decode("utf-8").rstrip()来获取文件名的可用字符串表示形式。

A much faster method on windows (0.05s), call a bat script that does this:

get_latest.bat

@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
%LAST%

where \\directory\in\question is the directory you want to investigate.

get_latest.py

from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)

if it finds a file stdout is the path and stderr is None.

Use stdout.decode("utf-8").rstrip() to get the usable string representation of the file name.


回答 8

我在Python 3中一直在使用它,包括在文件名上进行模式匹配。

from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)

I’ve been using this in Python 3, including pattern matching on the filename.

from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)