
Question: Let JSON object accept bytes or let urlopen output strings

With Python 3 I am requesting a json document from a URL.

response = urllib.request.urlopen(request)

The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text mode.

obj = json.load(fp)

What I would like to do is:

obj = json.load(response)

This however does not work as urlopen returns a file object in binary mode.
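On newer interpreters this limitation is gone: since Python 3.6, `json.loads()` also accepts `bytes` and `bytearray` and detects UTF-8, UTF-16 or UTF-32 from the data itself, and `json.load()` just calls `fp.read()` before parsing. So passing a binary file object works there. The sketch below uses `io.BytesIO` as an offline stand-in for the response object; that substitution is an assumption for the demo, since a real `urlopen` response is likewise a binary file-like object.

```python
import io
import json

# io.BytesIO stands in for the binary file object returned by urlopen.
response = io.BytesIO(b'{"name": "example", "values": [1, 2, 3]}')

# On Python 3.6+ json.load() accepts a binary file: it reads bytes and
# detects the UTF-8/16/32 encoding itself. On 3.5 and earlier this
# raises TypeError because json.loads() required a str.
obj = json.load(response)
print(obj["values"])  # [1, 2, 3]
```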

A work around is of course:

str_response = response.read().decode('utf-8')
obj = json.loads(str_response)

but this feels bad…

Is there a better way to transform a bytes file object into a string file object? Or am I missing a parameter to either urlopen or json.load that specifies an encoding?
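One way to turn the binary file object into a text file object is to wrap it in `io.TextIOWrapper` with an explicit encoding; `codecs.getreader("utf-8")` gives a `StreamReader` that does the same job. A minimal sketch, again with `io.BytesIO` standing in for the response and UTF-8 assumed as the encoding (with a real response it should come from the `Content-Type` header):

```python
import codecs
import io
import json

# UTF-8 bytes containing a non-ASCII character ("Zürich").
raw = '{"city": "Z\u00fcrich"}'.encode("utf-8")

# Option 1: wrap the binary stream in a text layer that decodes as it reads.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
obj = json.load(text_stream)

# Option 2: a codecs StreamReader wrapped around the binary stream.
reader = codecs.getreader("utf-8")
obj2 = json.load(reader(io.BytesIO(raw)))

print(obj == obj2)  # True
```

With a real request the call would look like `json.load(io.TextIOWrapper(response, encoding="utf-8"))`; note that closing the wrapper also closes the underlying response.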


Answer 0

HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv,…).
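Rather than hard-coding UTF-8, the workaround can read the charset from the `Content-Type` header. A urlopen response exposes its headers as `response.headers`, an `http.client.HTTPMessage`, which subclasses `email.message.Message` and so provides `get_content_charset()`. The helper name `decode_json_body` below is hypothetical, and the headers are built by hand so the sketch runs offline. Strictly, RFC 8259 requires UTF-8 for JSON exchanged between systems, so the header mostly matters for servers that do not follow it.

```python
import json
from email.message import Message


def decode_json_body(body: bytes, headers: Message, default: str = "utf-8"):
    """Hypothetical helper: decode an HTTP body using the charset from
    Content-Type (falling back to `default`), then parse it as JSON."""
    # get_content_charset() returns the charset parameter lower-cased,
    # or the fallback value when the header carries no charset.
    charset = headers.get_content_charset(default)
    return json.loads(body.decode(charset))


# Offline demo: construct the headers instead of making a real request.
headers = Message()
headers["Content-Type"] = "application/json; charset=ISO-8859-1"
body = '{"city": "Z\u00fcrich"}'.encode("iso-8859-1")

result = decode_json_body(body, headers)
print(result == {"city": "Z\u00fcrich"})  # True
```

With a real response the call would be `decode_json_body(response.read(), response.headers)`.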

urllib should know how to decode the bytes into a string, but it’s too naïve; it’s a horribly underpowered and un-Pythonic library.

Dive Into Python 3 provides an overview of the situation.

Your “work-around” is fine—although it feels wrong, it’s the correct way to do it.