标签归档:json

如何将字典转储到json文件?

问题:如何将字典转储到json文件?

我有这样的命令:

sample = {'ObjectInterpolator': 1629,  'PointInterpolator': 1675, 'RectangleInterpolator': 2042}

我不知道如何将字典转储到json文件,如下所示:

{      
    "name": "interpolator",
    "children": [
      {"name": "ObjectInterpolator", "size": 1629},
      {"name": "PointInterpolator", "size": 1675},
      {"name": "RectangleInterpolator", "size": 2042}
     ]
}

有python方式可以做到这一点吗?

您可能会猜到我想生成一个d3树形图。

I have a dict like this:

sample = {'ObjectInterpolator': 1629,  'PointInterpolator': 1675, 'RectangleInterpolator': 2042}

I can’t figure out how to dump the dict to a json file as showed below:

{      
    "name": "interpolator",
    "children": [
      {"name": "ObjectInterpolator", "size": 1629},
      {"name": "PointInterpolator", "size": 1675},
      {"name": "RectangleInterpolator", "size": 2042}
     ]
}

Is there a pythonic way to do this?

You may guess that I want to generate a d3 treemap.


回答 0

import json
with open('result.json', 'w') as fp:
    json.dump(sample, fp)

这是一种更简单的方法。

在第二行代码中,文件result.json被创建并作为变量打开fp

在第三行中,您的字典sample被写入result.json

import json
with open('result.json', 'w') as fp:
    json.dump(sample, fp)

This is an easier way to do it.

In the second line of code the file result.json gets created and opened as the variable fp.

In the third line your dict sample gets written into the result.json!


回答 1

结合@mgilson和@gnibbler的答案,我发现我需要的是:


d = {"name":"interpolator",
     "children":[{'name':key,"size":value} for key,value in sample.items()]}
j = json.dumps(d, indent=4)
f = open('sample.json', 'w')
print >> f, j
f.close()

这样,我得到了一个漂亮的json文件。print >> f, j从这里可以找到技巧:http : //www.anthonydebarros.com/2012/03/11/generate-json-from-sql-using-python/

Combine the answer of @mgilson and @gnibbler, I found what I need was this:


d = {"name":"interpolator",
     "children":[{'name':key,"size":value} for key,value in sample.items()]}
j = json.dumps(d, indent=4)
f = open('sample.json', 'w')
print >> f, j
f.close()

It this way, I got a pretty-print json file. The tricks print >> f, j is found from here: http://www.anthonydebarros.com/2012/03/11/generate-json-from-sql-using-python/


回答 2

d = {"name":"interpolator",
     "children":[{'name':key,"size":value} for key,value in sample.items()]}
json_string = json.dumps(d)

当然,不可能完全保留顺序…但这只是字典的性质…

d = {"name":"interpolator",
     "children":[{'name':key,"size":value} for key,value in sample.items()]}
json_string = json.dumps(d)

Of course, it’s unlikely that the order will be exactly preserved … But that’s just the nature of dictionaries …


回答 3

这应该给你一个开始

>>> import json
>>> print json.dumps([{'name': k, 'size': v} for k,v in sample.items()], indent=4)
[
    {
        "name": "PointInterpolator",
        "size": 1675
    },
    {
        "name": "ObjectInterpolator",
        "size": 1629
    },
    {
        "name": "RectangleInterpolator",
        "size": 2042
    }
]

This should give you a start

>>> import json
>>> print json.dumps([{'name': k, 'size': v} for k,v in sample.items()], indent=4)
[
    {
        "name": "PointInterpolator",
        "size": 1675
    },
    {
        "name": "ObjectInterpolator",
        "size": 1629
    },
    {
        "name": "RectangleInterpolator",
        "size": 2042
    }
]

回答 4

具有漂亮的打印格式:

import json

with open(path_to_file, 'w') as file:
    json_string = json.dumps(sample, default=lambda o: o.__dict__, sort_keys=True, indent=2)
    file.write(json_string)

with pretty-print format:

import json

with open(path_to_file, 'w') as file:
    json_string = json.dumps(sample, default=lambda o: o.__dict__, sort_keys=True, indent=2)
    file.write(json_string)

如何将JSON数据转换为Python对象

问题:如何将JSON数据转换为Python对象

我想使用Python将JSON数据转换成Python对象。

我从Facebook API接收了JSON数据对象,我想将其存储在数据库中。

我当前在Django(Python)中的视图(request.POST包含JSON):

response = request.POST
user = FbApiUser(user_id = response['id'])
user.name = response['name']
user.username = response['username']
user.save()
  • 这可以正常工作,但是如何处理复杂的JSON数据对象?

  • 如果我能以某种方式将这个JSON对象转换为Python对象以便于使用,会不会更好呢?

I want to use Python to convert JSON data into a Python object.

I receive JSON data objects from the Facebook API, which I want to store in my database.

My current View in Django (Python) (request.POST contains the JSON):

response = request.POST
user = FbApiUser(user_id = response['id'])
user.name = response['name']
user.username = response['username']
user.save()
  • This works fine, but how do I handle complex JSON data objects?

  • Wouldn’t it be much better if I could somehow convert this JSON object into a Python object for easy use?


回答 0

您可以使用namedtuple和在一行中完成操作object_hook

import json
from collections import namedtuple

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print x.name, x.hometown.name, x.hometown.id

或者,轻松重用此方法:

def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def json2obj(data): return json.loads(data, object_hook=_json_object_hook)

x = json2obj(data)

如果您希望它处理不是好的属性名称的键,请查看namedtuplerenameparameter

UPDATE

With Python3, you can do it in one line, using SimpleNamespace and object_hook:

import json
from types import SimpleNamespace

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: SimpleNamespace(**d))
print(x.name, x.hometown.name, x.hometown.id)

OLD ANSWER (Python2)

In Python2, you can do it in one line, using namedtuple and object_hook (but it’s very slow with many nested objects):

import json
from collections import namedtuple

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print x.name, x.hometown.name, x.hometown.id

or, to reuse this easily:

def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def json2obj(data): return json.loads(data, object_hook=_json_object_hook)

x = json2obj(data)

If you want it to handle keys that aren’t good attribute names, check out namedtuple‘s rename parameter.


回答 1

检查出标题为专业JSON对象解码json 模块文档。您可以使用它来将JSON对象解码为特定的Python类型。

这是一个例子:

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
def object_decoder(obj):
    if '__type__' in obj and obj['__type__'] == 'User':
        return User(obj['name'], obj['username'])
    return obj

json.loads('{"__type__": "User", "name": "John Smith", "username": "jsmith"}',
           object_hook=object_decoder)

print type(User)  # -> <type 'type'>

更新资料

如果要通过json模块访问字典中的数据,请执行以下操作:

user = json.loads('{"__type__": "User", "name": "John Smith", "username": "jsmith"}')
print user['name']
print user['username']

就像普通字典一样。

Check out the section titled Specializing JSON object decoding in the json module documentation. You can use that to decode a JSON object into a specific Python type.

Here’s an example:

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
def object_decoder(obj):
    if '__type__' in obj and obj['__type__'] == 'User':
        return User(obj['name'], obj['username'])
    return obj

json.loads('{"__type__": "User", "name": "John Smith", "username": "jsmith"}',
           object_hook=object_decoder)

print type(User)  # -> <type 'type'>

Update

If you want to access data in a dictionary via the json module do this:

user = json.loads('{"__type__": "User", "name": "John Smith", "username": "jsmith"}')
print user['name']
print user['username']

Just like a regular dictionary.


回答 2

这不是代码编程,但这是我最短的技巧,types.SimpleNamespace用作JSON对象的容器。

与领先的namedtuple解决方案相比,它是:

  • 可能更快/更小,因为它不会为每个对象创建一个类
  • 短一点
  • 没有rename选项,并且对于无效标识符的密钥可能有相同的限制(在幕后使用setattr

例:

from __future__ import print_function
import json

try:
    from types import SimpleNamespace as Namespace
except ImportError:
    # Python 2.x fallback
    from argparse import Namespace

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

x = json.loads(data, object_hook=lambda d: Namespace(**d))

print (x.name, x.hometown.name, x.hometown.id)

This is not code golf, but here is my shortest trick, using types.SimpleNamespace as the container for JSON objects.

Compared to the leading namedtuple solution, it is:

  • probably faster/smaller as it does not create a class for each object
  • shorter
  • no rename option, and probably the same limitation on keys that are not valid identifiers (uses setattr under the covers)

Example:

from __future__ import print_function
import json

try:
    from types import SimpleNamespace as Namespace
except ImportError:
    # Python 2.x fallback
    from argparse import Namespace

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

x = json.loads(data, object_hook=lambda d: Namespace(**d))

print (x.name, x.hometown.name, x.hometown.id)

回答 3

您可以尝试以下方法:

class User(object):
    def __init__(self, name, username, *args, **kwargs):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)

只需创建一个新的对象,然后将参数作为映射传递即可。

You could try this:

class User(object):
    def __init__(self, name, username, *args, **kwargs):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)

Just create a new Object, and pass the parameters as a map.


回答 4

这是一种快速而肮脏的JSON泡菜替代品

import json

class User:
    def __init__(self, name, username):
        self.name = name
        self.username = username

    def to_json(self):
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, json_str):
        json_dict = json.loads(json_str)
        return cls(**json_dict)

# example usage
User("tbrown", "Tom Brown").to_json()
User.from_json(User("tbrown", "Tom Brown").to_json()).to_json()

Here’s a quick and dirty json pickle alternative

import json

class User:
    def __init__(self, name, username):
        self.name = name
        self.username = username

    def to_json(self):
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, json_str):
        json_dict = json.loads(json_str)
        return cls(**json_dict)

# example usage
User("tbrown", "Tom Brown").to_json()
User.from_json(User("tbrown", "Tom Brown").to_json()).to_json()

回答 5

对于复杂的对象,可以使用JSON Pickle

用于将任意对象图序列化为JSON的Python库。它几乎可以使用任何Python对象并将该对象转换为JSON。此外,它可以将对象重新构造回Python。

For complex objects, you can use JSON Pickle

Python library for serializing any arbitrary object graph into JSON. It can take almost any Python object and turn the object into JSON. Additionally, it can reconstitute the object back into Python.


回答 6

如果您使用的是Python 3.5+,则可以用于jsons序列化和反序列化为普通的旧Python对象:

import jsons

response = request.POST

# You'll need your class attributes to match your dict keys, so in your case do:
response['id'] = response.pop('user_id')

# Then you can load that dict into your class:
user = jsons.load(response, FbApiUser)

user.save()

您也可以FbApiUser从继承继承,jsons.JsonSerializable以获得更多的优雅:

user = FbApiUser.from_json(response)

如果您的类由Python默认类型(例如字符串,整数,列表,日期时间等)组成,则这些示例将起作用。但是jsonslib将需要自定义类型的类型提示。

If you’re using Python 3.5+, you can use jsons to serialize and deserialize to plain old Python objects:

import jsons

response = request.POST

# You'll need your class attributes to match your dict keys, so in your case do:
response['id'] = response.pop('user_id')

# Then you can load that dict into your class:
user = jsons.load(response, FbApiUser)

user.save()

You could also make FbApiUser inherit from jsons.JsonSerializable for more elegance:

user = FbApiUser.from_json(response)

These examples will work if your class consists of Python default types, like strings, integers, lists, datetimes, etc. The jsons lib will require type hints for custom types though.


回答 7

如果您使用的是Python 3.6+,则可以使用marshmallow-dataclass。与上面列出的所有解决方案相反,它既简单又可安全输入:

from marshmallow_dataclass import dataclass

@dataclass
class User:
    name: str

user, err = User.Schema().load({"name": "Ramirez"})

If you are using python 3.6+, you can use marshmallow-dataclass. Contrarily to all the solutions listed above, it is both simple, and type safe:

from marshmallow_dataclass import dataclass

@dataclass
class User:
    name: str

user = User.Schema().load({"name": "Ramirez"})

回答 8

我编写了一个名为any2any的小型(反序列化)框架,该有助于在两种Python类型之间进行复杂的转换。

在您的情况下,我想您想将字典(由获取json.loads)转换为response.education ; response.name具有嵌套结构response.education.id等的复杂对象……所以这正是该框架的目的。该文档尚不完善,但是通过使用any2any.simple.MappingToObject,您应该可以很轻松地做到这一点。请询问是否需要帮助。

I have written a small (de)serialization framework called any2any that helps doing complex transformations between two Python types.

In your case, I guess you want to transform from a dictionary (obtained with json.loads) to an complex object response.education ; response.name, with a nested structure response.education.id, etc … So that’s exactly what this framework is made for. The documentation is not great yet, but by using any2any.simple.MappingToObject, you should be able to do that very easily. Please ask if you need help.


回答 9

改善lovasoa的很好答案。

如果您使用的是python 3.6+,则可以使用:
pip install marshmallow-enum
pip install marshmallow-dataclass

它简单且类型安全。

您可以使用string-json转换类,反之亦然:

从对象到字符串Json:

    from marshmallow_dataclass import dataclass
    user = User("Danilo","50","RedBull",15,OrderStatus.CREATED)
    user_json = User.Schema().dumps(user)
    user_json_str = user_json.data

从String Json到Object:

    json_str = '{"name":"Danilo", "orderId":"50", "productName":"RedBull", "quantity":15, "status":"Created"}'
    user, err = User.Schema().loads(json_str)
    print(user,flush=True)

类定义:

class OrderStatus(Enum):
    CREATED = 'Created'
    PENDING = 'Pending'
    CONFIRMED = 'Confirmed'
    FAILED = 'Failed'

@dataclass
class User:
    def __init__(self, name, orderId, productName, quantity, status):
        self.name = name
        self.orderId = orderId
        self.productName = productName
        self.quantity = quantity
        self.status = status

    name: str
    orderId: str
    productName: str
    quantity: int
    status: OrderStatus

Improving the lovasoa’s very good answer.

If you are using python 3.6+, you can use:
pip install marshmallow-enum and
pip install marshmallow-dataclass

Its simple and type safe.

You can transform your class in a string-json and vice-versa:

From Object to String Json:

    from marshmallow_dataclass import dataclass
    user = User("Danilo","50","RedBull",15,OrderStatus.CREATED)
    user_json = User.Schema().dumps(user)
    user_json_str = user_json.data

From String Json to Object:

    json_str = '{"name":"Danilo", "orderId":"50", "productName":"RedBull", "quantity":15, "status":"Created"}'
    user, err = User.Schema().loads(json_str)
    print(user,flush=True)

Class definitions:

class OrderStatus(Enum):
    CREATED = 'Created'
    PENDING = 'Pending'
    CONFIRMED = 'Confirmed'
    FAILED = 'Failed'

@dataclass
class User:
    def __init__(self, name, orderId, productName, quantity, status):
        self.name = name
        self.orderId = orderId
        self.productName = productName
        self.quantity = quantity
        self.status = status

    name: str
    orderId: str
    productName: str
    quantity: int
    status: OrderStatus

回答 10

由于没有人像我一样提供答案,因此我将在此处发布。

这是一个强大的类,它可以很容易地来回转换JSON之间strdict我从复制我的回答另一个问题

import json

class PyJSON(object):
    def __init__(self, d):
        if type(d) is str:
            d = json.loads(d)

        self.from_dict(d)

    def from_dict(self, d):
        self.__dict__ = {}
        for key, value in d.items():
            if type(value) is dict:
                value = PyJSON(value)
            self.__dict__[key] = value

    def to_dict(self):
        d = {}
        for key, value in self.__dict__.items():
            if type(value) is PyJSON:
                value = value.to_dict()
            d[key] = value
        return d

    def __repr__(self):
        return str(self.to_dict())

    def __setitem__(self, key, value):
        self.__dict__[key] = value

    def __getitem__(self, key):
        return self.__dict__[key]

json_str = """... json string ..."""

py_json = PyJSON(json_str)

Since noone provided an answer quite like mine, I am going to post it here.

It is a robust class that can easily convert back and forth between json str and dict that I have copied from my answer to another question:

import json

class PyJSON(object):
    def __init__(self, d):
        if type(d) is str:
            d = json.loads(d)

        self.from_dict(d)

    def from_dict(self, d):
        self.__dict__ = {}
        for key, value in d.items():
            if type(value) is dict:
                value = PyJSON(value)
            self.__dict__[key] = value

    def to_dict(self):
        d = {}
        for key, value in self.__dict__.items():
            if type(value) is PyJSON:
                value = value.to_dict()
            d[key] = value
        return d

    def __repr__(self):
        return str(self.to_dict())

    def __setitem__(self, key, value):
        self.__dict__[key] = value

    def __getitem__(self, key):
        return self.__dict__[key]

json_str = """... json string ..."""

py_json = PyJSON(json_str)

回答 11

稍微修改@DS响应以从文件加载:

def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def load_data(file_name):
  with open(file_name, 'r') as file_data:
    return file_data.read().replace('\n', '')
def json2obj(file_name): return json.loads(load_data(file_name), object_hook=_json_object_hook)

一件事:这无法加载前面带有数字的项目。像这样:

{
  "1_first_item": {
    "A": "1",
    "B": "2"
  }
}

因为“ 1_first_item”不是有效的python字段名称。

Modifying @DS response a bit, to load from a file:

def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def load_data(file_name):
  with open(file_name, 'r') as file_data:
    return file_data.read().replace('\n', '')
def json2obj(file_name): return json.loads(load_data(file_name), object_hook=_json_object_hook)

One thing: this cannot load items with numbers ahead. Like this:

{
  "1_first_item": {
    "A": "1",
    "B": "2"
  }
}

Because “1_first_item” is not a valid python field name.


回答 12

在寻找解决方案时,我偶然发现了此博客文章:https : //blog.mosthege.net/2016/11/12/json-deserialization-of-nested-objects/

它使用与先前答案相同的技术,但使用了装饰器。我发现有用的另一件事是事实,它在反序列化结束时返回一个类型化的对象

class JsonConvert(object):
    class_mappings = {}

    @classmethod
    def class_mapper(cls, d):
        for keys, cls in clsself.mappings.items():
            if keys.issuperset(d.keys()):   # are all required arguments present?
                return cls(**d)
        else:
            # Raise exception instead of silently returning None
            raise ValueError('Unable to find a matching class for object: {!s}'.format(d))

    @classmethod
    def complex_handler(cls, Obj):
        if hasattr(Obj, '__dict__'):
            return Obj.__dict__
        else:
            raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(Obj), repr(Obj)))

    @classmethod
    def register(cls, claz):
        clsself.mappings[frozenset(tuple([attr for attr,val in cls().__dict__.items()]))] = cls
        return cls

    @classmethod
    def to_json(cls, obj):
        return json.dumps(obj.__dict__, default=cls.complex_handler, indent=4)

    @classmethod
    def from_json(cls, json_str):
        return json.loads(json_str, object_hook=cls.class_mapper)

用法:

@JsonConvert.register
class Employee(object):
    def __init__(self, Name:int=None, Age:int=None):
        self.Name = Name
        self.Age = Age
        return

@JsonConvert.register
class Company(object):
    def __init__(self, Name:str="", Employees:[Employee]=None):
        self.Name = Name
        self.Employees = [] if Employees is None else Employees
        return

company = Company("Contonso")
company.Employees.append(Employee("Werner", 38))
company.Employees.append(Employee("Mary"))

as_json = JsonConvert.to_json(company)
from_json = JsonConvert.from_json(as_json)
as_json_from_json = JsonConvert.to_json(from_json)

assert(as_json_from_json == as_json)

print(as_json_from_json)

While searching for a solution, I’ve stumbled upon this blog post: https://blog.mosthege.net/2016/11/12/json-deserialization-of-nested-objects/

It uses the same technique as stated in previous answers but with a usage of decorators. Another thing I found useful is the fact that it returns a typed object at the end of deserialisation

class JsonConvert(object):
    class_mappings = {}

    @classmethod
    def class_mapper(cls, d):
        for keys, cls in clsself.mappings.items():
            if keys.issuperset(d.keys()):   # are all required arguments present?
                return cls(**d)
        else:
            # Raise exception instead of silently returning None
            raise ValueError('Unable to find a matching class for object: {!s}'.format(d))

    @classmethod
    def complex_handler(cls, Obj):
        if hasattr(Obj, '__dict__'):
            return Obj.__dict__
        else:
            raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(Obj), repr(Obj)))

    @classmethod
    def register(cls, claz):
        clsself.mappings[frozenset(tuple([attr for attr,val in cls().__dict__.items()]))] = cls
        return cls

    @classmethod
    def to_json(cls, obj):
        return json.dumps(obj.__dict__, default=cls.complex_handler, indent=4)

    @classmethod
    def from_json(cls, json_str):
        return json.loads(json_str, object_hook=cls.class_mapper)

Usage:

@JsonConvert.register
class Employee(object):
    def __init__(self, Name:int=None, Age:int=None):
        self.Name = Name
        self.Age = Age
        return

@JsonConvert.register
class Company(object):
    def __init__(self, Name:str="", Employees:[Employee]=None):
        self.Name = Name
        self.Employees = [] if Employees is None else Employees
        return

company = Company("Contonso")
company.Employees.append(Employee("Werner", 38))
company.Employees.append(Employee("Mary"))

as_json = JsonConvert.to_json(company)
from_json = JsonConvert.from_json(as_json)
as_json_from_json = JsonConvert.to_json(from_json)

assert(as_json_from_json == as_json)

print(as_json_from_json)

回答 13

稍微扩展一下DS的答案,如果您需要对象可变(不包括namedtuple),则可以使用recordclass库而不是namedtuple

import json
from recordclass import recordclass

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse into a mutable object
x = json.loads(data, object_hook=lambda d: recordclass('X', d.keys())(*d.values()))

然后可以使用simplejson轻松地将修改后的对象转换回json :

x.name = "John Doe"
new_json = simplejson.dumps(x)

Expanding on DS’s answer a bit, if you need the object to be mutable (which namedtuple is not), you can use the recordclass library instead of namedtuple:

import json
from recordclass import recordclass

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse into a mutable object
x = json.loads(data, object_hook=lambda d: recordclass('X', d.keys())(*d.values()))

The modified object can then be converted back to json very easily using simplejson:

x.name = "John Doe"
new_json = simplejson.dumps(x)

回答 14

如果您使用的是Python 3.6或更高版本,则可以看看squema-一种用于静态类型数据结构的轻量级模块。它使您的代码易于阅读,同时无需任何额外工作即可提供简单的数据验证,转换和序列化。您可以将其视为命名元组和数据类的更复杂,更自以为是的替代方案。使用方法如下:

from uuid import UUID
from squema import Squema


class FbApiUser(Squema):
    id: UUID
    age: int
    name: str

    def save(self):
        pass


user = FbApiUser(**json.loads(response))
user.save()

If you’re using Python 3.6 or newer, you could have a look at squema – a lightweight module for statically typed data structures. It makes your code easy to read while at the same time providing simple data validation, conversion and serialization without extra work. You can think of it as a more sophisticated and opinionated alternative to namedtuples and dataclasses. Here’s how you could use it:

from uuid import UUID
from squema import Squema


class FbApiUser(Squema):
    id: UUID
    age: int
    name: str

    def save(self):
        pass


user = FbApiUser(**json.loads(response))
user.save()

回答 15

我正在寻找一种可与recordclass.RecordClass,支持嵌套对象同时适用于json序列化和json反序列化的解决方案。

扩展DS的答案,并扩展BeneStr的解决方案,我想到了以下似乎可行的方法:

码:

import json
import recordclass

class NestedRec(recordclass.RecordClass):
    a : int = 0
    b : int = 0

class ExampleRec(recordclass.RecordClass):
    x : int       = None
    y : int       = None
    nested : NestedRec = NestedRec()

class JsonSerializer:
    @staticmethod
    def dumps(obj, ensure_ascii=True, indent=None, sort_keys=False):
        return json.dumps(obj, default=JsonSerializer.__obj_to_dict, ensure_ascii=ensure_ascii, indent=indent, sort_keys=sort_keys)

    @staticmethod
    def loads(s, klass):
        return JsonSerializer.__dict_to_obj(klass, json.loads(s))

    @staticmethod
    def __obj_to_dict(obj):
        if hasattr(obj, "_asdict"):
            return obj._asdict()
        else:
            return json.JSONEncoder().default(obj)

    @staticmethod
    def __dict_to_obj(klass, s_dict):
        kwargs = {
            key : JsonSerializer.__dict_to_obj(cls, s_dict[key]) if hasattr(cls,'_asdict') else s_dict[key] \
                for key,cls in klass.__annotations__.items() \
                    if s_dict is not None and key in s_dict
        }
        return klass(**kwargs)

用法:

example_0 = ExampleRec(x = 10, y = 20, nested = NestedRec( a = 30, b = 40 ) )

#Serialize to JSON

json_str = JsonSerializer.dumps(example_0)
print(json_str)
#{
#  "x": 10,
#  "y": 20,
#  "nested": {
#    "a": 30,
#    "b": 40
#  }
#}

# Deserialize from JSON
example_1 = JsonSerializer.loads(json_str, ExampleRec)
example_1.x += 1
example_1.y += 1
example_1.nested.a += 1
example_1.nested.b += 1

json_str = JsonSerializer.dumps(example_1)
print(json_str)
#{
#  "x": 11,
#  "y": 21,
#  "nested": {
#    "a": 31,
#    "b": 41
#  }
#}

I was searching for a solution that worked with recordclass.RecordClass, supports nested objects and works for both json serialization and json deserialization.

Expanding on DS’s answer, and expanding on solution from BeneStr, I came up with the following that seems to work:

Code:

import json
import recordclass

class NestedRec(recordclass.RecordClass):
    a : int = 0
    b : int = 0

class ExampleRec(recordclass.RecordClass):
    x : int       = None
    y : int       = None
    nested : NestedRec = NestedRec()

class JsonSerializer:
    @staticmethod
    def dumps(obj, ensure_ascii=True, indent=None, sort_keys=False):
        return json.dumps(obj, default=JsonSerializer.__obj_to_dict, ensure_ascii=ensure_ascii, indent=indent, sort_keys=sort_keys)

    @staticmethod
    def loads(s, klass):
        return JsonSerializer.__dict_to_obj(klass, json.loads(s))

    @staticmethod
    def __obj_to_dict(obj):
        if hasattr(obj, "_asdict"):
            return obj._asdict()
        else:
            return json.JSONEncoder().default(obj)

    @staticmethod
    def __dict_to_obj(klass, s_dict):
        kwargs = {
            key : JsonSerializer.__dict_to_obj(cls, s_dict[key]) if hasattr(cls,'_asdict') else s_dict[key] \
                for key,cls in klass.__annotations__.items() \
                    if s_dict is not None and key in s_dict
        }
        return klass(**kwargs)

Usage:

example_0 = ExampleRec(x = 10, y = 20, nested = NestedRec( a = 30, b = 40 ) )

#Serialize to JSON

json_str = JsonSerializer.dumps(example_0)
print(json_str)
#{
#  "x": 10,
#  "y": 20,
#  "nested": {
#    "a": 30,
#    "b": 40
#  }
#}

# Deserialize from JSON
example_1 = JsonSerializer.loads(json_str, ExampleRec)
example_1.x += 1
example_1.y += 1
example_1.nested.a += 1
example_1.nested.b += 1

json_str = JsonSerializer.dumps(example_1)
print(json_str)
#{
#  "x": 11,
#  "y": 21,
#  "nested": {
#    "a": 31,
#    "b": 41
#  }
#}

回答 16

此处给出的答案未返回正确的对象类型,因此我在下面创建了这些方法。如果您尝试将更多字段添加到给定JSON中不存在的类,它们也会失败:

def dict_to_class(class_name: Any, dictionary: dict) -> Any:
    instance = class_name()
    for key in dictionary.keys():
        setattr(instance, key, dictionary[key])
    return instance


def json_to_class(class_name: Any, json_string: str) -> Any:
    dict_object = json.loads(json_string)
    return dict_to_class(class_name, dict_object)

The answers given here does not return the correct object type, hence I created these methods below. They also fail if you try to add more fields to the class that does not exist in the given JSON:

def dict_to_class(class_name: Any, dictionary: dict) -> Any:
    instance = class_name()
    for key in dictionary.keys():
        setattr(instance, key, dictionary[key])
    return instance


def json_to_class(class_name: Any, json_string: str) -> Any:
    dict_object = json.loads(json_string)
    return dict_to_class(class_name, dict_object)

回答 17

Python3.x

我所能达到的最好的方法就是这个。
注意,此代码也对待set()。
这种方法是通用的,只需要扩展类(在第二个示例中)。
请注意,我只是对文件进行处理,但是可以根据自己的喜好修改行为。

但是,这是CoDec。

通过更多的工作,您可以用其他方式构造您的类。我假定使用默认的构造函数来实例化它,然后更新类dict。

import json
import collections


class JsonClassSerializable(json.JSONEncoder):

    REGISTERED_CLASS = {}

    def register(ctype):
        JsonClassSerializable.REGISTERED_CLASS[ctype.__name__] = ctype

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in self.REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = self.REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


JsonClassSerializable.register(C)


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


JsonClassSerializable.register(B)


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()

JsonClassSerializable.register(A)

A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
print(b.b)
print(b.c.a)

编辑

通过更多的研究,我找到了一种使用元类进行泛化而无需SUPERCLASS寄存器方法调用的方法。

import json
import collections

REGISTERED_CLASS = {}

class MetaSerializable(type):

    def __call__(cls, *args, **kwargs):
        if cls.__name__ not in REGISTERED_CLASS:
            REGISTERED_CLASS[cls.__name__] = cls
        return super(MetaSerializable, cls).__call__(*args, **kwargs)


class JsonClassSerializable(json.JSONEncoder, metaclass=MetaSerializable):

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()


A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
# 1
print(b.b)
# {1, 2}
print(b.c.a)
# 1230
print(b.c.c.mill)
# s

Python3.x

The best aproach I could reach with my knowledge was this.
Note that this code treat set() too.
This approach is generic just needing the extension of class (in the second example).
Note that I’m just doing it to files, but it’s easy to modify the behavior to your taste.

However this is a CoDec.

With a little more work you can construct your class in other ways. I assume a default constructor to instance it, then I update the class dict.

import json
import collections


class JsonClassSerializable(json.JSONEncoder):

    REGISTERED_CLASS = {}

    def register(ctype):
        JsonClassSerializable.REGISTERED_CLASS[ctype.__name__] = ctype

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in self.REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = self.REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


JsonClassSerializable.register(C)


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


JsonClassSerializable.register(B)


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()

JsonClassSerializable.register(A)

A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
print(b.b)
print(b.c.a)

Edit

With some more of research I found a way to generalize without the need of the SUPERCLASS register method call, using a metaclass

import json
import collections

REGISTERED_CLASS = {}

class MetaSerializable(type):

    def __call__(cls, *args, **kwargs):
        if cls.__name__ not in REGISTERED_CLASS:
            REGISTERED_CLASS[cls.__name__] = cls
        return super(MetaSerializable, cls).__call__(*args, **kwargs)


class JsonClassSerializable(json.JSONEncoder, metaclass=MetaSerializable):

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()


A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
# 1
print(b.b)
# {1, 2}
print(b.c.a)
# 1230
print(b.c.c.mill)
# s

回答 18

您可以使用

x = Map(json.loads(response))
x.__class__ = MyClass

哪里

class Map(dict):
    def __init__(self, *args, **kwargs):
        super(Map, self).__init__(*args, **kwargs)
        for arg in args:
            if isinstance(arg, dict):
                for k, v in arg.iteritems():
                    self[k] = v
                    if isinstance(v, dict):
                        self[k] = Map(v)

        if kwargs:
            # for python 3 use kwargs.items()
            for k, v in kwargs.iteritems():
                self[k] = v
                if isinstance(v, dict):
                    self[k] = Map(v)

    def __getattr__(self, attr):
        return self.get(attr)

    def __setattr__(self, key, value):
        self.__setitem__(key, value)

    def __setitem__(self, key, value):
        super(Map, self).__setitem__(key, value)
        self.__dict__.update({key: value})

    def __delattr__(self, item):
        self.__delitem__(item)

    def __delitem__(self, key):
        super(Map, self).__delitem__(key)
        del self.__dict__[key]

一个通用的,面向未来的解决方案。

You can use

x = Map(json.loads(response))
x.__class__ = MyClass

where

class Map(dict):
    def __init__(self, *args, **kwargs):
        super(Map, self).__init__(*args, **kwargs)
        for arg in args:
            if isinstance(arg, dict):
                for k, v in arg.iteritems():
                    self[k] = v
                    if isinstance(v, dict):
                        self[k] = Map(v)

        if kwargs:
            # for python 3 use kwargs.items()
            for k, v in kwargs.iteritems():
                self[k] = v
                if isinstance(v, dict):
                    self[k] = Map(v)

    def __getattr__(self, attr):
        return self.get(attr)

    def __setattr__(self, key, value):
        self.__setitem__(key, value)

    def __setitem__(self, key, value):
        super(Map, self).__setitem__(key, value)
        self.__dict__.update({key: value})

    def __delattr__(self, item):
        self.__delitem__(item)

    def __delitem__(self, key):
        super(Map, self).__delitem__(key)
        del self.__dict__[key]

For a generic, future-proof solution.


回答 19

使用json模块Python 2.6中的新增功能)或simplejson几乎始终安装的模块。

Use the json module (new in Python 2.6) or the simplejson module which is almost always installed.


如何在Python中解析JSON?

问题:如何在Python中解析JSON?

我的项目目前正在python中接收JSON消息,我需要从中获取一些信息。为此,我们将其设置为字符串中的一些简单JSON:

jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'

到目前为止,我一直在使用列表生成JSON请求json.dumps,但是与此相反,我认为我需要使用json.loads。但是我没有那么幸运。谁能为我提供一个片段,该片段将在上述示例"2"的输入中返回"two"

My project is currently receiving a JSON message in python which I need to get bits of information out of. For the purposes of this, let’s set it to some simple JSON in a string:

jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'

So far I’ve been generating JSON requests using a list and then json.dumps, but to do the opposite of this I think I need to use json.loads. However I haven’t had much luck with it. Could anyone provide me a snippet that would return "2" with the input of "two" in the above example?


回答 0

很简单:

import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print data['two']

Very simple:

import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print data['two']

回答 1

有时,您的json不是字符串。例如,如果您从这样的网址获取json:

j = urllib2.urlopen('http://site.com/data.json')

您将需要使用json.load,而不是json.loads:

j_obj = json.load(j)

(很容易忘记:“ s”代表“字符串”)

Sometimes your json is not a string. For example if you are getting a json from a url like this:

j = urllib2.urlopen('http://site.com/data.json')

you will need to use json.load, not json.loads:

j_obj = json.load(j)

(it is easy to forget: the ‘s’ is for ‘string’)


回答 2

对于URL或文件,请使用json.load()。对于具有.json内容的字符串,请使用json.loads()

#! /usr/bin/python

import json
# from pprint import pprint

json_file = 'my_cube.json'
cube = '1'

with open(json_file) as json_data:
    data = json.load(json_data)

# pprint(data)

print "Dimension: ", data['cubes'][cube]['dim']
print "Measures:  ", data['cubes'][cube]['meas']

For URL or file, use json.load(). For string with .json content, use json.loads().

#! /usr/bin/python

import json
# from pprint import pprint

json_file = 'my_cube.json'
cube = '1'

with open(json_file) as json_data:
    data = json.load(json_data)

# pprint(data)

print "Dimension: ", data['cubes'][cube]['dim']
print "Measures:  ", data['cubes'][cube]['meas']

回答 3

以下是可能帮助您的简单示例:

json_string = """
{
    "pk": 1, 
    "fa": "cc.ee", 
    "fb": {
        "fc": "", 
        "fd_id": "12345"
    }
}"""

import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
    data["fb"]["new_key"] = "cc.ee was present!"

print json.dumps(data)

上面代码的输出将是:

{"pk": 1, "fb": {"new_key": "cc.ee was present!", "fd_id": "12345", 
 "fc": ""}, "fa": "cc.ee"}

请注意,您可以设置dump的ident参数来像这样打印它(例如,当使用print json.dumps(data,indent = 4)时):

{
    "pk": 1, 
    "fb": {
        "new_key": "cc.ee was present!", 
        "fd_id": "12345", 
        "fc": ""
    }, 
    "fa": "cc.ee"
}

Following is simple example that may help you:

json_string = """
{
    "pk": 1, 
    "fa": "cc.ee", 
    "fb": {
        "fc": "", 
        "fd_id": "12345"
    }
}"""

import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
    data["fb"]["new_key"] = "cc.ee was present!"

print json.dumps(data)

The output for the above code will be:

{"pk": 1, "fb": {"new_key": "cc.ee was present!", "fd_id": "12345", 
 "fc": ""}, "fa": "cc.ee"}

Note that you can set the ident argument of dump to print it like so (for example,when using print json.dumps(data , indent=4)):

{
    "pk": 1, 
    "fb": {
        "new_key": "cc.ee was present!", 
        "fd_id": "12345", 
        "fc": ""
    }, 
    "fa": "cc.ee"
}

回答 4

可以使用json或ast python模块:

Using json :
=============

import json
jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'
json_data = json.loads(jsonStr)
print(f"json_data: {json_data}")
print(f"json_data['two']: {json_data['two']}")

Output:
json_data: {'one': '1', 'two': '2', 'three': '3'}
json_data['two']: 2




Using ast:
==========

import ast
jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'
json_dict = ast.literal_eval(jsonStr)
print(f"json_dict: {json_dict}")
print(f"json_dict['two']: {json_dict['two']}")

Output:
json_dict: {'one': '1', 'two': '2', 'three': '3'}
json_dict['two']: 2

Can use either json or ast python modules:

Using json :
=============

import json
jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'
json_data = json.loads(jsonStr)
print(f"json_data: {json_data}")
print(f"json_data['two']: {json_data['two']}")

Output:
json_data: {'one': '1', 'two': '2', 'three': '3'}
json_data['two']: 2




Using ast:
==========

import ast
jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'
json_dict = ast.literal_eval(jsonStr)
print(f"json_dict: {json_dict}")
print(f"json_dict['two']: {json_dict['two']}")

Output:
json_dict: {'one': '1', 'two': '2', 'three': '3'}
json_dict['two']: 2

如何从JSON获取字符串对象而不是Unicode?

问题:如何从JSON获取字符串对象而不是Unicode?

我正在使用Python 2ASCII编码的文本文件中解析JSON 。

使用json或 加载这些文件时simplejson,我所有的字符串值都转换为Unicode对象而不是字符串对象。问题是,我必须将数据与仅接受字符串对象的某些库一起使用。我无法更改库,无法更新它们。

是否可以获取字符串对象而不是Unicode对象?

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

更新资料

很久以前,当我坚持使用Python 2时就问这个问题。今天一种简单易用的解决方案是使用最新版本的Python,即Python 3及更高版本。

I’m using Python 2 to parse JSON from ASCII encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is, I have to use the data with some libraries that only accept string objects. I can’t change the libraries nor update them.

Is it possible to get string objects instead of Unicode ones?

Example

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. One easy and clean solution for today is to use a recent version of Python — i.e. Python 3 and forward.


回答 0

一个解决方案 object_hook

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data

用法示例:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

它是如何工作的,我为什么要使用它?

Mark Amery的功能比这些功能更短更清晰,那么它们的意义何在?您为什么要使用它们?

纯粹是为了 表现。Mark的答案首先使用unicode字符串完全解码JSON文本,然后遍历整个解码值以将所有字符串转换为字节字符串。这有一些不良影响:

  • 整个解码结构的副本在内存中创建
  • 如果您的JSON对象确实是深度嵌套(500级或更多),那么您将达到Python的最大递归深度

这个答案通过缓解这两方面的性能问题object_hook的参数json.loadjson.loads。从文档

object_hook是一个可选函数,它将被解码的任何对象文字(a dict)的结果调用。将使用object_hook的返回值代替dict。此功能可用于实现自定义解码器

由于嵌套在其他字典中许多层次的字典object_hook 在解码时会传递给我们,因此我们可以在那时将其中的任何字符串或列表字节化,并避免以后再进行深度递归。

Mark的答案不适合按object_hook现状使用,因为它会递归为嵌套词典。我们阻止这个答案与该递归ignore_dicts参数_byteify,它被传递给它在任何时候都只是object_hook它传递一个新dict来byteify。该ignore_dicts标志指示_byteify忽略dicts,因为它们已经被字节化了。

最后,我们的实现json_load_byteified和在从或返回的结果上json_loads_byteified调用_byteify(with ignore_dicts=True)来处理被解码的JSON文本在顶层没有的情况。json.loadjson.loadsdict

A solution with object_hook

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery’s function is shorter and clearer than these ones, so what’s the point of them? Why would you want to use them?

Purely for performance. Mark’s answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you’ll hit Python’s maximum recursion depth

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they’re decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark’s answer isn’t suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts since they already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn’t have a dict at the top level.


回答 1

尽管这里有一些不错的答案,但是我最终还是使用PyYAML来解析我的JSON文件,因为它将键和值提供为str类型字符串而不是unicode类型。由于JSON是YAML的子集,因此效果很好:

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

笔记

不过要注意一些事项:

  • 我得到字符串对象,因为我的所有条目都是ASCII编码的。如果我要使用unicode编码的条目,我会将它们作为unicode对象取回-没有转换!

  • 您应该(可能总是)使用PyYAML的safe_load功能;如果使用它来加载JSON文件,则无论如何都不需要该load函数的“附加功能” 。

  • 如果你想拥有的1.2版规范更多的支持(和YAML解析器正确地解析非常低的数字)尝试Ruamel YAMLpip install ruamel.yamlimport ruamel.yaml as yaml我在测试时所需要的所有我。

转换次数

如前所述,没有转换!如果不能确定只处理ASCII值(并且不能确定大多数时间),最好使用转换函数

我现在几次使用Mark Amery的产品,效果很好,而且非常易于使用。您也可以使用类似的功能object_hook来代替它,因为它可以提高大文件的性能。对此,请参见Mirec Miskuf稍有涉及的答案

While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str type strings instead of unicode type. Because JSON is a subset of YAML it works nicely:

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

Notes

Some things to note though:

  • I get string objects because all my entries are ASCII encoded. If I would use unicode encoded entries, I would get them back as unicode objects — there is no conversion!

  • You should (probably always) use PyYAML’s safe_load function; if you use it to load JSON files, you don’t need the “additional power” of the load function anyway.

  • If you want a YAML parser that has more support for the 1.2 version of the spec (and correctly parses very low numbers) try Ruamel YAML: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

Conversion

As stated, there is no conversion! If you can’t be sure to only deal with ASCII values (and you can’t be sure most of the time), better use a conversion function:

I used the one from Mark Amery a couple of times now, it works great and is very easy to use. You can also use a similar function as an object_hook instead, as it might gain you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.


回答 2

没有内置选项可以使json模块函数返回字节字符串而不是unicode字符串。但是,此简短的简单递归函数会将所有已解码的JSON对象从使用unicode字符串转换为UTF-8编码的字节字符串:

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

只需在从json.load或获得的输出上调用它json.loads call调用它即可。

一些注意事项:

  • 要支持Python 2.6或更早版本,请替换return {byteify(key): byteify(value) for key, value in input.iteritems()}return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]),因为直到Python 2.7才支持字典解析。
  • 由于此答案遍历整个解码对象,因此它具有一些不良的性能特征,可以通过非常小心地使用object_hookobject_pairs_hook参数来避免。Mirec Miskuf的答案是迄今为止唯一能够正确实现这一目标的答案,尽管因此,它的答案比我的方法要复杂得多。

There’s no built-in option to make the json module functions return byte strings instead of unicode strings. However, this short and simple recursive function will convert any decoded JSON object from using unicode strings to UTF-8-encoded byte strings:

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Just call this on the output you get from a json.load or json.loads call.

A couple of notes:

  • To support Python 2.6 or earlier, replace return {byteify(key): byteify(value) for key, value in input.iteritems()} with return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]), since dictionary comprehensions weren’t supported until Python 2.7.
  • Since this answer recurses through the entire decoded object, it has a couple of undesirable performance characteristics that can be avoided with very careful use of the object_hook or object_pairs_hook parameters. Mirec Miskuf’s answer is so far the only one that manages to pull this off correctly, although as a consequence, it’s significantly more complicated than my approach.

回答 3

您可以使用该object_hook参数json.loads来传递转换器。事实发生后,您不必进行转换。该json模块将始终object_hook仅传递字典,并且将递归传递嵌套字典,因此您不必自己递归到嵌套字典。我认为我不会将unicode字符串转换为Wells show之类的数字。如果它是unicode字符串,则在JSON文件中被引为字符串,因此应该是字符串(或文件错误)。

另外,我会尽量避免str(val)unicode对象执行类似操作。您应该使用value.encode(encoding)有效的编码,具体取决于外部库的期望。

因此,例如:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv

def _decode_dict(data):
    rv = {}
    for key, value in data.iteritems():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

obj = json.loads(s, object_hook=_decode_dict)

You can use the object_hook parameter for json.loads to pass in a converter. You don’t have to do the conversion after the fact. The json module will always pass the object_hook dicts only, and it will recursively pass in nested dicts, so you don’t have to recurse into nested dicts yourself. I don’t think I would convert unicode strings to numbers like Wells shows. If it’s a unicode string, it was quoted as a string in the JSON file, so it is supposed to be a string (or the file is bad).

Also, I’d try to avoid doing something like str(val) on a unicode object. You should use value.encode(encoding) with a valid encoding, depending on what your external lib expects.

So, for example:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv

def _decode_dict(data):
    rv = {}
    for key, value in data.iteritems():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

obj = json.loads(s, object_hook=_decode_dict)

回答 4

这是因为json在字符串对象和unicode对象之间没有区别。它们都是javascript中的字符串。

我认为JSON返回Unicode对象是正确的。实际上,我不会接受任何东西,因为javascript字符串实际上是unicode对象(即JSON(javascript)字符串可以存储任何类型的unicode字符),因此unicode在从JSON转换字符串时创建对象是有意义的。普通字符串不适合使用,因为库必须猜测您想要的编码。

最好在unicode任何地方使用字符串对象。因此,最好的选择是更新库,以便它们可以处理unicode对象。

但是,如果您真的想要字节串,只需将结果编码为您选择的编码即可:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

That’s because json has no difference between string objects and unicode objects. They’re all strings in javascript.

I think JSON is right to return unicode objects. In fact, I wouldn’t accept anything less, since javascript strings are in fact unicode objects (i.e. JSON (javascript) strings can store any kind of unicode character) so it makes sense to create unicode objects when translating strings from JSON. Plain strings just wouldn’t fit since the library would have to guess the encoding you want.

It’s better to use unicode string objects everywhere. So your best option is to update your libraries so they can deal with unicode objects.

But if you really want bytestrings, just encode the results to the encoding of your choice:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

回答 5

存在一个简单的解决方法。

TL; DR-使用ast.literal_eval()代替json.loads()。双方astjson在标准库。

虽然这不是一个“完美”的答案,但是如果您的计划是完全忽略Unicode,那么答案就很远了。在Python 2.7中

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

给出:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

当某些对象实际上是Unicode字符串时,这会变得更加冗长。完整的答案很快就会出现。

There exists an easy work-around.

TL;DR – Use ast.literal_eval() instead of json.loads(). Both ast and json are in the standard library.

While not a ‘perfect’ answer, it gets one pretty far if your plan is to ignore Unicode altogether. In Python 2.7

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

gives:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

This gets more hairy when some objects are really Unicode strings. The full answer gets hairy quickly.


回答 6

Mike Brennan的答案很接近,但是没有理由重新遍历整个结构。如果使用object_hook_pairs(Python 2.7+)参数:

object_pairs_hook是一个可选函数,将使用对的有序列表解码的任何对象文字的结果调用该函数。的返回值object_pairs_hook将代替dict。此功能可用于实现依赖于键和值对的解码顺序的自定义解码器(例如,collections.OrderedDict将记住插入顺序)。如果object_hook也定义,object_pairs_hook则优先。

使用它,您可以获得每个JSON对象,因此无需进行递归即可进行解码:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

请注意,由于您使用时,每个对象都将移交给该钩子,因此我不必递归调用该钩子object_pairs_hook。您确实需要关心列表,但是如您所见,列表中的对象将被正确转换,并且您无需递归即可实现它。

编辑:一位同事指出Python2.6没有object_hook_pairs。您仍然可以通过做一个很小的更改来使用Python2.6。在上方的挂钩中,更改:

for key, value in pairs:

for key, value in pairs.iteritems():

然后使用object_hook代替object_pairs_hook

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

使用 object_pairs_hook结果,可以为JSON对象中的每个对象实例化一个更少的字典,如果您正在解析一个巨大的文档,那可能值得一试。

Mike Brennan’s answer is close, but there is no reason to re-traverse the entire structure. If you use the object_hook_pairs (Python 2.7+) parameter:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.

With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Notice that I never have to call the hook recursively since every object will get handed to the hook when you use the object_pairs_hook. You do have to care about lists, but as you can see, an object within a list will be properly converted, and you don’t have to recurse to make it happen.

EDIT: A coworker pointed out that Python2.6 doesn’t have object_hook_pairs. You can still use this will Python2.6 by making a very small change. In the hook above, change:

for key, value in pairs:

to

for key, value in pairs.iteritems():

Then use object_hook instead of object_pairs_hook:

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Using object_pairs_hook results in one less dictionary being instantiated for each object in the JSON object, which, if you were parsing a huge document, might be worth while.


回答 7

恐怕在simplejson库中无法自动实现此目的。

simplejson中的扫描器和解码器旨在生成unicode文本。为此,该库使用了一个称为c_scanstring(如果可用,为了提高速度)或py_scanstringC版本不可用的函数。该scanstring函数几乎被simplejson用来解码可能包含文本的结构的每个例程多次调用。您将不得不scanstring在simplejson.decoder中对值进行Monkey修补,或者在子类中JSONDecoder提供几乎所有可能包含文本的您自己的完整实现。

但是,simplejson输出unicode的原因是json规范中特别提到“字符串是零个或多个Unicode字符的集合” …对unicode的支持被认为是格式本身的一部分。Simplejson的scanstring实现范围甚至可以扫描和解释Unicode转义(甚至对格式错误的多字节字符集表示形式进行错误检查),因此唯一能够可靠地将值返回给您的方法就是Unicode。

如果您有一个需要使用的老化库,str建议您在解析后费力地搜索嵌套的数据结构(我承认这是您明确表示要避免的内容…对不起),或者将您的库包装成某种形式外观,您可以在其中更细化输入参数。如果您的数据结构确实深度嵌套,则第二种方法可能比第一种方法更易于管理。

I’m afraid there’s no way to achieve this automatically within the simplejson library.

The scanner and decoder in simplejson are designed to produce unicode text. To do this, the library uses a function called c_scanstring (if it’s available, for speed), or py_scanstring if the C version is not available. The scanstring function is called several times by nearly every routine that simplejson has for decoding a structure that might contain text. You’d have to either monkeypatch the scanstring value in simplejson.decoder, or subclass JSONDecoder and provide pretty much your own entire implementation of anything that might contain text.

The reason that simplejson outputs unicode, however, is that the json spec specifically mentions that “A string is a collection of zero or more Unicode characters”… support for unicode is assumed as part of the format itself. Simplejson’s scanstring implementation goes so far as to scan and interpret unicode escapes (even error-checking for malformed multi-byte charset representations), so the only way it can reliably return the value to you is as unicode.

If you have an aged library that needs an str, I recommend you either laboriously search the nested data structure after parsing (which I acknowledge is what you explicitly said you wanted to avoid… sorry), or perhaps wrap your libraries in some sort of facade where you can massage the input parameters at a more granular level. The second approach might be more manageable than the first if your data structures are indeed deeply nested.


回答 8

正如Mark(Amery)正确指出的那样:仅当您只有ASCII时,才可以在json转储上使用PyYaml的反序列化器。至少开箱即用。

关于PyYaml方法的两个简短评论:

  1. 切勿对字段中的数据使用yaml.load。yaml的功能(!)执行隐藏在结构中的任意代码。

  2. 可以通过以下方法使其也适用于非ASCII:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)

但是从性能上来说,它与马克·阿默里的答案没有可比性:

将一些深层嵌套的示例字典扔到这两种方法上,我得到了这一点(dt [j] = json.loads(json.dumps(m))的时间增量):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

因此,反序列化包括完全遍历树编码,完全在json基于C的实现的数量级之内。我发现这非常快,并且比深层嵌套结构中的yaml加载还要健壮。而且,查看yaml.load会减少安全性错误的发生。

=>虽然我希望使用指向仅基于C的转换器的指针,但byteify函数应该是默认答案。

如果您的json结构来自包含用户输入的字段,则尤其如此。因为那样的话,您可能无论如何都要遍历您的结构-独立于所需的内部数据结构(仅“ unicode三明治”或字节字符串)。

为什么?

Unicode 规范化。对于不知道:吃片止痛片和阅读

因此,使用字节化递归可以用一块石头杀死两只鸟:

  1. 从嵌套的json转储中获取字节串
  2. 使用户输入值标准化,以便您在存储中查找内容。

在我的测试中,结果证明,用unicodedata.normalize(’NFC’,input).encode(’utf-8’)替换input.encode(’utf-8’)甚至比不使用NFC还要快-多数民众赞成在很大程度上取决于样本数据。

As Mark (Amery) correctly notes: Using PyYaml‘s deserializer on a json dump works only if you have ASCII only. At least out of the box.

Two quick comments on the PyYaml approach:

  1. NEVER use yaml.load on data from the field. Its a feature(!) of yaml to execute arbitrary code hidden within the structure.

  2. You can make it work also for non ASCII via this:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)
    

But performance wise its of no comparison to Mark Amery’s answer:

Throwing some deeply nested sample dicts onto the two methods, I get this (with dt[j] = time delta of json.loads(json.dumps(m))):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

So deserialization including fully walking the tree and encoding, well within the order of magnitude of json’s C based implementation. I find this remarkably fast and its also more robust than the yaml load at deeply nested structures. And less security error prone, looking at yaml.load.

=> While I would appreciate a pointer to a C only based converter the byteify function should be the default answer.

This holds especially true if your json structure is from the field, containing user input. Because then you probably need to walk anyway over your structure – independent on your desired internal data structures (‘unicode sandwich’ or byte strings only).

Why?

Unicode normalisation. For the unaware: Take a painkiller and read this.

So using the byteify recursion you kill two birds with one stone:

  1. get your bytestrings from nested json dumps
  2. get user input values normalised, so that you find the stuff in your storage.

In my tests it turned out that replacing the input.encode(‘utf-8’) with a unicodedata.normalize(‘NFC’, input).encode(‘utf-8’) was even faster than w/o NFC – but thats heavily dependent on the sample data I guess.


回答 9

的疑难杂症的是,simplejsonjson是两个不同的模块,至少在它们的方式处理的unicode。您使用的json是py 2.6+,它为您提供unicode值,而simplejson返回字符串对象。只需在您的环境中尝试easy_install-ing simplejson,看看是否可行。它对我有用。

The gotcha is that simplejson and json are two different modules, at least in the manner they deal with unicode. You have json in py 2.6+, and this gives you unicode values, whereas simplejson returns string objects. Just try easy_install-ing simplejson in your environment and see if that works. It did for me.


回答 10

只需使用pickle而不是json进行转储和加载,如下所示:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

它产生的输出是(正确处理字符串和整数):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

Just use pickle instead of json for dump and load, like so:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

The output it produces is (strings and integers are handled correctly):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

回答 11

因此,我遇到了同样的问题。猜猜Google的第一个结果是什么。

因为我需要将所有数据传递给PyGTK,所以Unicode字符串对我也不是很有用。所以我有另一种递归转换方法。实际上,类型安全JSON转换也需要使用它-json.dump()会在所有非文字类(例如Python对象)上保释。但是不转换字典索引。

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

So, I’ve run into the same problem. Guess what was the first Google result.

Because I need to pass all data to PyGTK, unicode strings aren’t very useful to me either. So I have another recursive conversion method. It’s actually also needed for typesafe JSON conversion – json.dump() would bail on any non-literals, like Python objects. Doesn’t convert dict indexes though.

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

回答 12

我有一个JSON dict作为字符串。键和值是unicode对象,如以下示例所示:

myStringDict = "{u'key':u'value'}"

我可以通过使用byteify将字符串转换为dict对象来使用上面建议的功能ast.literal_eval(myStringDict)

I had a JSON dict as a string. The keys and values were unicode objects like in the following example:

myStringDict = "{u'key':u'value'}"

I could use the byteify function suggested above by converting the string to a dict object using ast.literal_eval(myStringDict).


回答 13

使用钩子支持Python2&3(来自https://stackoverflow.com/a/33571117/558397

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

返回值:

 {'three': '', 'key': 'value', 'one': 'two'}

Support Python2&3 using hook (from https://stackoverflow.com/a/33571117/558397)

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

Returns:

 {'three': '', 'key': 'value', 'one': 'two'}

回答 14

这对游戏来说太晚了,但是我建立了这个递归脚轮。它可以满足我的需求,而且我认为它比较完整。它可能会帮助您。

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

只需将其传递给JSON对象,如下所示:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

我将其作为类的私有成员,但是您可以根据需要重新调整方法的用途。

This is late to the game, but I built this recursive caster. It works for my needs and I think it’s relatively complete. It may help you.

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

Just pass it a JSON object like so:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

I have it as a private member of a class, but you can repurpose the method as you see fit.


回答 15

我重写了Wells的_parse_json()来处理json对象本身是数组的情况(我的用例)。

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

I rewrote Wells’s _parse_json() to handle cases where the json object itself is an array (my use case).

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

回答 16

这是用C语言编写的递归编码器:https : //github.com/axiros/nested_encode

与json.loads相比,“平均”结构的性能开销约为10%。

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

使用以下测试结构:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

here is a recursive encoder written in C: https://github.com/axiros/nested_encode

Performance overhead for “average” structures around 10% compared to json.loads.

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

using this teststructure:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

回答 17

使用Python 3.6,有时我仍然遇到这个问题。例如,当从REST API获取响应并将响应文本加载到JSON时,我仍然会获得unicode字符串。使用json.dumps()找到了一个简单的解决方案。

response_message = json.loads(json.dumps(response.text))
print(response_message)

With Python 3.6, sometimes I still run into this problem. For example, when getting response from a REST API and loading the response text to JSON, I still get the unicode strings. Found a simple solution using json.dumps().

response_message = json.loads(json.dumps(response.text))
print(response_message)

回答 18

我也遇到了这个问题,不得不处理JSON,我想出了一个小循环将Unicode键转换为字符串。(simplejson在GAE上不返回字符串键。)

obj 是从JSON解码的对象:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs是我传递给GAE应用程序的构造函数的内容(它不喜欢其中的unicode**kwargs

不像Wells的解决方案那样强大,但是要小得多。

I ran into this problem too, and having to deal with JSON, I came up with a small loop that converts the unicode keys to strings. (simplejson on GAE does not return string keys.)

obj is the object decoded from JSON:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs is what I pass to the constructor of the GAE application (which does not like unicode keys in **kwargs)

Not as robust as the solution from Wells, but much smaller.


回答 19

我从Mark Amery答案中改编了代码,尤其是为了摆脱isinstance鸭蛋式游戏的优点。

编码是手动完成的,ensure_ascii已被禁用。的python文档json.dump

如果suresure_ascii为True(默认值),则输出中的所有非ASCII字符均以\ uXXXX序列转义

免责声明:在doctest中,我使用了匈牙利语。一些与匈牙利人相关的著名字符编码是:使用cp852的IBM / OEM编码,例如。在DOS中(有时不正确地称为ascii,我认为这取决于代码页设置),cp1250例如。在Windows中(有时称为ansi,取决于语言环境设置),并且iso-8859-2有时在http服务器上使用。测试文本Tüskéshátú kígyóbűvölő归因于Wikipedia的KoltaiLászló(本机人名)。

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

我还想强调Jarret Hardie答案,该答案引用了JSON规范,并引用:

字符串是零个或多个Unicode字符的集合

在我的用例中,我有带有json的文件。它们是utf-8编码文件。ensure_ascii会导致正确转义但可读性不强的json文件,这就是为什么我调整了Mark Amery的答案来满足自己的需求的原因。

doctest并不是特别周到,但是我分享了代码,希望它对某人有用。

I’ve adapted the code from the answer of Mark Amery, particularly in order to get rid of isinstance for the pros of duck-typing.

The encoding is done manually and ensure_ascii is disabled. The python docs for json.dump says that

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences

Disclaimer: in the doctest I used the Hungarian language. Some notable Hungarian-related character encodings are: cp852 the IBM/OEM encoding used eg. in DOS (sometimes referred as ascii, incorrectly I think, it is dependent on the codepage setting), cp1250 used eg. in Windows (sometimes referred as ansi, dependent on the locale settings), and iso-8859-2, sometimes used on http servers. The test text Tüskéshátú kígyóbűvölő is attributed to Koltai László (native personal name form) and is from wikipedia.

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

I’d also like to highlight the answer of Jarret Hardie which references the JSON spec, quoting:

A string is a collection of zero or more Unicode characters

In my use-case I had files with json. They are utf-8 encoded files. ensure_ascii results in properly escaped but not very readable json files, that is why I’ve adapted Mark Amery’s answer to fit my needs.

The doctest is not particularly thoughtful but I share the code in the hope that it will useful for someone.


回答 20

看看这个类似问题的答案,该问题指出

u-前缀仅表示您具有Unicode字符串。当您真正使用字符串时,它不会出现在您的数据中。不要被打印输出扔掉。

例如,尝试以下操作:

print mail_accounts[0]["i"]

你不会看到你。

Check out this answer to a similar question like this which states that

The u- prefix just means that you have a Unicode string. When you really use the string, it won’t appear in your data. Don’t be thrown by the printed output.

For example, try this:

print mail_accounts[0]["i"]

You won’t see a u.


json.dumps与flask.jsonify

问题:json.dumps与flask.jsonify

我不确定我是否了解该flask.jsonify方法的目的。我尝试从中制作一个JSON字符串:

data = {"id": str(album.id), "title": album.title}

但是我得到的与我得到的json.dumps有所不同flask.jsonify

json.dumps(data): [{"id": "4ea856fd6506ae0db42702dd", "title": "Business"}]
flask.jsonify(data): {"id":…, "title":…}

显然,我需要得到一个看起来更像json.dumps返回结果的结果。我究竟做错了什么?

I am not sure I understand the purpose of the flask.jsonify method. I try to make a JSON string from this:

data = {"id": str(album.id), "title": album.title}

but what I get with json.dumps differs from what I get with flask.jsonify.

json.dumps(data): [{"id": "4ea856fd6506ae0db42702dd", "title": "Business"}]
flask.jsonify(data): {"id":…, "title":…}

Obviously I need to get a result that looks more like what json.dumps returns. What am I doing wrong?


回答 0

jsonify()flask中的函数返回一个flask.Response()对象,该对象已经具有用于json响应的适当的内容类型标头’application / json’。而该json.dumps()方法将仅返回编码后的字符串,这将需要手动添加MIME类型标头。

查看更多有关该jsonify()功能在这里完全参考。

编辑:另外,我注意到它可以jsonify()处理kwarg或字典,同时json.dumps()还支持列表和其他列表。

The jsonify() function in flask returns a flask.Response() object that already has the appropriate content-type header ‘application/json’ for use with json responses. Whereas, the json.dumps() method will just return an encoded string, which would require manually adding the MIME type header.

See more about the jsonify() function here for full reference.

Edit: Also, I’ve noticed that jsonify() handles kwargs or dictionaries, while json.dumps() additionally supports lists and others.


回答 1

你可以做:

flask.jsonify(**data)

要么

flask.jsonify(id=str(album.id), title=album.title)

You can do:

flask.jsonify(**data)

or

flask.jsonify(id=str(album.id), title=album.title)

回答 2

这是 flask.jsonify()

def jsonify(*args, **kwargs):
    if __debug__:
        _assert_have_json()
    return current_app.response_class(json.dumps(dict(*args, **kwargs),
        indent=None if request.is_xhr else 2), mimetype='application/json')

json使用的模块是simplejsonjson顺序相同。current_app是对Flask()对象(即您的应用程序)的引用。response_class()是对该Response()类的引用。

This is flask.jsonify()

def jsonify(*args, **kwargs):
    if __debug__:
        _assert_have_json()
    return current_app.response_class(json.dumps(dict(*args, **kwargs),
        indent=None if request.is_xhr else 2), mimetype='application/json')

The json module used is either simplejson or json in that order. current_app is a reference to the Flask() object i.e. your application. response_class() is a reference to the Response() class.


回答 3

一个或另一个的选择取决于您打算做什么。据我了解:

  • 当您构建一个有人查询并期望json作为回报的API时,jsonify会很有用。例如:REST github API可以使用此方法来回答您的请求。

  • dumps,更多关于将数据/ python对象格式化为json并在您的应用程序中进行处理。例如,我需要将一个对象传递给我的表示层,其中一些javascript将显示图形。您将使用转储生成的Json喂javascript。

The choice of one or another depends on what you intend to do. From what I do understand:

  • jsonify would be useful when you are building an API someone would query and expect json in return. E.g: The REST github API could use this method to answer your request.

  • dumps, is more about formating data/python object into json and work on it inside your application. For instance, I need to pass an object to my representation layer where some javascript will display graph. You’ll feed javascript with the Json generated by dumps.


回答 4

考虑

data={'fld':'hello'}

现在

jsonify(data)

将产生{‘fld’:’hello’}并且

json.dumps(data)

"<html><body><p>{'fld':'hello'}</p></body></html>"

consider

data={'fld':'hello'}

now

jsonify(data)

will yield {‘fld’:’hello’} and

json.dumps(data)

gives

"<html><body><p>{'fld':'hello'}</p></body></html>"

将python字典转换为字符串并返回

问题:将python字典转换为字符串并返回

我正在编写一个将数据存储在字典对象中的程序,但是该数据需要在程序执行过程中的某个时候保存,并在再次运行该程序时重新加载到字典对象中。如何将字典对象转换为可以写入文件并可以加载回字典对象的字符串?希望这将支持包含词典的词典。

I am writing a program that stores data in a dictionary object, but this data needs to be saved at some point during the program execution and loaded back into the dictionary object when the program is run again. How would I convert a dictionary object into a string that can be written to a file and loaded back into a dictionary object? This will hopefully support dictionaries containing dictionaries.


回答 0

json模块是一个很好的解决方案。与pickle相比,它的优势在于它仅生成纯文本输出,并且是跨平台和跨版本的。

import json
json.dumps(dict)

The json module is a good solution here. It has the advantages over pickle that it only produces plain text output, and is cross-platform and cross-version.

import json
json.dumps(dict)

回答 1

如果您的字典不太大,也许str + eval可以完成工作:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

如果源不受信任,则可以使用ast.literal_eval而不是eval来提高安全性。

If your dictionary isn’t too big maybe str + eval can do the work:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

You can use ast.literal_eval instead of eval for additional security if the source is untrusted.


回答 2

我用json

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

I use json:

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

回答 3

使用该pickle模块将其保存到磁盘并稍后加载。

Use the pickle module to save it to disk and load later on.


回答 4

为什么不使用Python 3的内置ast库的函数literal_eval。最好使用literal_eval而不是eval

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

将输出作为实际字典

{'key1': 'key1value', 'key2': 'key2value'}

而且,如果您要求将Dictionary转换为String,那么如何使用str() Python的方法。

假设字典是:

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

这将像这样完成:

str(my_dict)

将打印:

"{'key1': 'key1value', 'key2': 'key2value'}"

这是您想要的简单操作。

Why not to use Python 3’s inbuilt ast library’s function literal_eval. It is better to use literal_eval instead of eval

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

will give output as actual Dictionary

{'key1': 'key1value', 'key2': 'key2value'}

And If you are asking to convert a Dictionary to a String then, How about using str() method of Python.

Suppose the dictionary is :

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

And this will be done like this :

str(my_dict)

Will Print :

"{'key1': 'key1value', 'key2': 'key2value'}"

This is the easy as you like.


回答 5

如果在中国

import codecs
fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"},ensure_ascii=False,indent=2)
fout.write(dict_to_json + '\n')

If in Chinses

import codecs
fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"},ensure_ascii=False,indent=2)
fout.write(dict_to_json + '\n')

回答 6

将字典转换成JSON(字符串)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

将为您提供:

{“ name”:“ Don”,“ sur

将字符串转换成字典

back_to_mydict = json.loads(result) 

Convert dictionary into JSON (string)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

will get you:

{“name”: “Don”, “sur

Convert string into dictionary

back_to_mydict = json.loads(result) 

回答 7

我认为您应该考虑使用shelve提供持久性文件支持的字典式对象的模块。它很容易代替“真实”字典使用,因为它几乎透明地为您的程序提供了可以像字典一样使用的东西,而无需将其显式转换为字符串然后写入文件(或反之,反之亦然。

主要区别是需要open()先使用close()它,然后再使用(完成时可能要使用sync()它,具体取决于writeback要使用所使用选项)。创建的任何“架子”文件对象都可以包含常规字典作为值,从而使它们可以逻辑嵌套。

这是一个简单的例子:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

输出:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

I think you should consider using the shelve module which provides persistent file-backed dictionary-like objects. It’s easy to use in place of a “real” dictionary because it almost transparently provides your program with something that can be used just like a dictionary, without the need to explicitly convert it to a string and then write to a file (or vice-versa).

The main difference is needing to initially open() it before first use and then close() it when you’re done (and possibly sync()ing it, depending on the writeback option being used). Any “shelf” file objects create can contain regular dictionaries as values, allowing them to be logically nested.

Here’s a trivial example:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

Output:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

回答 8

如果您关心速度,请使用与json相同的API的ujson(UltraJSON):

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

If you care about the speed use ujson (UltraJSON), which has the same API as json:

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

回答 9

如果需要可读性(不是IMHO的JSON或XML),或者如果不需要阅读的话,我就使用yaml。

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

回过头再读

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev

I use yaml for that if needs to be readable (neither JSON nor XML are that IMHO), or if reading is not necessary I use pickle.

Write

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

Read back

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev

如何使用Python动态构建JSON对象?

问题:如何使用Python动态构建JSON对象?

我是Python的新手,并且正在使用JSON数据。我想通过向现有JSON对象添加一些键值来动态构建JSON对象。

我尝试了以下方法,但得到了TypeError: 'str' object does not support item assignment

import json

json_data = json.dumps({})
json_data["key"] = "value"

print 'JSON: ', json_data

I am new to Python and I am playing with JSON data. I would like to dynamically build a JSON object by adding some key-value to an existing JSON object.

I tried the following but I get TypeError: 'str' object does not support item assignment:

import json

json_data = json.dumps({})
json_data["key"] = "value"

print 'JSON: ', json_data

回答 0

您在将对象编码为JSON字符串之前先对其进行构建:

import json

data = {}
data['key'] = 'value'
json_data = json.dumps(data)

JSON是序列化格式,文本数据表示结构。它本身不是那个结构。

You build the object before encoding it to a JSON string:

import json

data = {}
data['key'] = 'value'
json_data = json.dumps(data)

JSON is a serialization format, textual data representing a structure. It is not, itself, that structure.


回答 1

您可以创建Python字典并将其序列化为JSON,并且一行都不丑。

my_json_string = json.dumps({'key1': val1, 'key2': val2})

You can create the Python dictionary and serialize it to JSON in one line and it’s not even ugly.

my_json_string = json.dumps({'key1': val1, 'key2': val2})

回答 2

已经提供了一种解决方案,可以构建字典(或嵌套字典以获取更复杂的数据),但是如果您要构建对象,则可以尝试使用“ ObjDict”。这样可以更好地控制要创建的json,例如保留顺序,并允许将其构建为对象,这可能是概念的首选表示形式。

pip首先安装objdict。

from objdict import ObjDict

data = ObjDict()
data.key = 'value'
json_data = data.dumps()

There is already a solution provided which allows building a dictionary, (or nested dictionary for more complex data), but if you wish to build an object, then perhaps try ‘ObjDict’. This gives much more control over the json to be created, for example retaining order, and allows building as an object which may be a preferred representation of your concept.

pip install objdict first.

from objdict import ObjDict

data = ObjDict()
data.key = 'value'
json_data = data.dumps()

回答 3

您可以使用EasyDictdoc

EasyDict允许将字典值作为属性访问(递归工作​​)。python dict的类似于Javascript的属性点表示法。

使用方法

>>> from easydict import EasyDict as edict
>>> d = edict({'foo':3, 'bar':{'x':1, 'y':2}})
>>> d.foo
3
>>> d.bar.x
1

>>> d = edict(foo=3)
>>> d.foo
3

[ 安装 ]:

  • pip install easydict

You can use EasyDict library (doc):

EasyDict allows to access dict values as attributes (works recursively). A Javascript-like properties dot notation for python dicts.

USEAGE

>>> from easydict import EasyDict as edict
>>> d = edict({'foo':3, 'bar':{'x':1, 'y':2}})
>>> d.foo
3
>>> d.bar.x
1

>>> d = edict(foo=3)
>>> d.foo
3

[INSTALLATION]:

  • pip install easydict

回答 4

以前的所有答案都是正确的,这是一种更简单的方法。例如,创建一个Dict数据结构来序列化和反序列化一个对象

请注意,Python中None是Null,我有意使用它来演示如何存储null并将其转换为json null)

import json
print('serialization')
myDictObj = { "name":"John", "age":30, "car":None }
##convert object to json
serialized= json.dumps(myDictObj, sort_keys=True, indent=3)
print(serialized)
## now we are gonna convert json to object
deserialization=json.loads(serialized)
print(deserialization)

在此处输入图片说明

All previous answers are correct, here is one more and easy way to do it. For example, create a Dict data structure to serialize and deserialize an object

(Notice None is Null in python and I’m intentionally using this to demonstrate how you can store null and convert it to json null)

import json
print('serialization')
myDictObj = { "name":"John", "age":30, "car":None }
##convert object to json
serialized= json.dumps(myDictObj, sort_keys=True, indent=3)
print(serialized)
## now we are gonna convert json to object
deserialization=json.loads(serialized)
print(deserialization)

enter image description here


从请求库解析JSON响应的最佳方法是什么?

问题:从请求库解析JSON响应的最佳方法是什么?

我正在使用python requests模块将RESTful GET发送到服务器,对此我得到了JSON响应。JSON响应基本上只是列表的列表。

强制对本地Python对象进行响应的最佳方法是什么,以便我可以使用进行迭代或打印出来pprint

I’m using the python requests module to send a RESTful GET to a server, for which I get a response in JSON. The JSON response is basically just a list of lists.

What’s the best way to coerce the response to a native Python object so I can either iterate or print it out using pprint?


回答 0

您可以使用json.loads

import json
import requests

response = requests.get(...)
json_data = json.loads(response.text)

这会将给定的字符串转换为字典,从而使您可以轻松地在代码中访问JSON数据。

或者,您可以使用@Martijn的有用建议以及投票较高的答案response.json()

You can use json.loads:

import json
import requests

response = requests.get(...)
json_data = json.loads(response.text)

This converts a given string into a dictionary which allows you to access your JSON data easily within your code.

Or you can use @Martijn’s helpful suggestion, and the higher voted answer, response.json().


回答 1

由于您正在使用requests,因此您应该使用响应的json方法。

import requests

response = requests.get(...)
data = response.json()

自动检测要使用的解码器

Since you’re using requests, you should use the response’s json method.

import requests

response = requests.get(...)
data = response.json()

It autodetects which decoder to use.


JSONDecodeError:预期值:第1行第1列(字符0)

问题:JSONDecodeError:预期值:第1行第1列(字符0)

Expecting value: line 1 column 1 (char 0)尝试解码JSON 时出现错误。

我用于API调用的URL在浏览器中可以正常工作,但是通过curl请求完成时会出现此错误。以下是我用于curl请求的代码。

错误发生在 return simplejson.loads(response_json)

    response_json = self.web_fetch(url)
    response_json = response_json.decode('utf-8')
    return json.loads(response_json)


def web_fetch(self, url):
        buffer = StringIO()
        curl = pycurl.Curl()
        curl.setopt(curl.URL, url)
        curl.setopt(curl.TIMEOUT, self.timeout)
        curl.setopt(curl.WRITEFUNCTION, buffer.write)
        curl.perform()
        curl.close()
        response = buffer.getvalue().strip()
        return response

完整回溯:

追溯:

File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111.                         response = callback(request, *callback_args, **callback_kwargs)
File "/Users/nab/Desktop/pricestore/pricemodels/views.py" in view_category
  620.     apicall=api.API().search_parts(category_id= str(categoryofpart.api_id), manufacturer = manufacturer, filter = filters, start=(catpage-1)*20, limit=20, sort_by='[["mpn","asc"]]')
File "/Users/nab/Desktop/pricestore/pricemodels/api.py" in search_parts
  176.         return simplejson.loads(response_json)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/__init__.py" in loads
  455.         return _default_decoder.decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in decode
  374.         obj, end = self.raw_decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in raw_decode
  393.         return self.scan_once(s, idx=_w(s, idx).end())

Exception Type: JSONDecodeError at /pricemodels/2/dir/
Exception Value: Expecting value: line 1 column 1 (char 0)

I am getting error Expecting value: line 1 column 1 (char 0) when trying to decode JSON.

The URL I use for the API call works fine in the browser, but gives this error when done through a curl request. The following is the code I use for the curl request.

The error happens at return simplejson.loads(response_json)

    response_json = self.web_fetch(url)
    response_json = response_json.decode('utf-8')
    return json.loads(response_json)


def web_fetch(self, url):
        buffer = StringIO()
        curl = pycurl.Curl()
        curl.setopt(curl.URL, url)
        curl.setopt(curl.TIMEOUT, self.timeout)
        curl.setopt(curl.WRITEFUNCTION, buffer.write)
        curl.perform()
        curl.close()
        response = buffer.getvalue().strip()
        return response

Full Traceback:

Traceback:

File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111.                         response = callback(request, *callback_args, **callback_kwargs)
File "/Users/nab/Desktop/pricestore/pricemodels/views.py" in view_category
  620.     apicall=api.API().search_parts(category_id= str(categoryofpart.api_id), manufacturer = manufacturer, filter = filters, start=(catpage-1)*20, limit=20, sort_by='[["mpn","asc"]]')
File "/Users/nab/Desktop/pricestore/pricemodels/api.py" in search_parts
  176.         return simplejson.loads(response_json)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/__init__.py" in loads
  455.         return _default_decoder.decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in decode
  374.         obj, end = self.raw_decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in raw_decode
  393.         return self.scan_once(s, idx=_w(s, idx).end())

Exception Type: JSONDecodeError at /pricemodels/2/dir/
Exception Value: Expecting value: line 1 column 1 (char 0)

回答 0

总结评论中的对话:

  • 无需使用simplejson库,Python作为json模块包含了相同的库。

  • 无需解码UTF8对unicode的响应,simplejson/ json .loads()方法可以本地处理UTF8编码的数据。

  • pycurl有一个非常古老的API。除非您有特定的使用要求,否则会有更好的选择。

requests提供最友好的API,包括JSON支持。如果可以,将您的通话替换为:

import requests

return requests.get(url).json()

To summarize the conversation in the comments:

  • There is no need to use simplejson library, the same library is included with Python as the json module.

  • There is no need to decode a response from UTF8 to unicode, the simplejson / json .loads() method can handle UTF8 encoded data natively.

  • pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.

requests offers the most friendly API, including JSON support. If you can, replace your call with:

import requests

return requests.get(url).json()

回答 1

检查响应数据主体,是否存在实际数据并且数据转储的格式是否正确。

在大多数情况下,您的json.loadsJSONDecodeError: Expecting value: line 1 column 1 (char 0)错误是由于:

  • 非JSON引用
  • XML / HTML输出(即以<开头的字符串),或
  • 不兼容的字符编码

最终,错误告诉您字符串在第一位置已经不符合JSON。

这样,如果尽管乍一看具有看起来像JSON的数据主体,但解析仍然失败,请尝试替换数据主体的引号:

import sys, json
struct = {}
try:
  try: #try parsing to dict
    dataform = str(response_json).strip("'<>() ").replace('\'', '\"')
    struct = json.loads(dataform)
  except:
    print repr(resonse_json)
    print sys.exc_info()

注意:数据中的引号必须正确转义

Check the response data-body, whether actual data is present and a data-dump appears to be well-formatted.

In most cases your json.loadsJSONDecodeError: Expecting value: line 1 column 1 (char 0) error is due to :

  • non-JSON conforming quoting
  • XML/HTML output (that is, a string starting with <), or
  • incompatible character encoding

Ultimately the error tells you that at the very first position the string already doesn’t conform to JSON.

As such, if parsing fails despite having a data-body that looks JSON like at first glance, try replacing the quotes of the data-body:

import sys, json
struct = {}
try:
  try: #try parsing to dict
    dataform = str(response_json).strip("'<>() ").replace('\'', '\"')
    struct = json.loads(dataform)
  except:
    print repr(resonse_json)
    print sys.exc_info()

Note: Quotes within the data must be properly escaped


回答 2

当您有一个HTTP错误代码(例如404)并尝试将响应解析为JSON时,使用requestslib JSONDecodeError可能会发生!

您必须首先检查200(OK)或让它出现错误以避免这种情况。我希望它以较少的错误消息失败。

注意:就像注释中提到的Martijn Pieters一样,如果发生错误(取决于实现),服务器可以使用JSON进行响应,因此检查Content-Type标头更加可靠。

With the requests lib JSONDecodeError can happen when you have an http error code like 404 and try to parse the response as JSON !

You must first check for 200 (OK) or let it raise on error to avoid this case. I wish it failed with a less cryptic error message.

NOTE: as Martijn Pieters stated in the comments servers can respond with JSON in case of errors (it depends on the implementation), so checking the Content-Type header is more reliable.


回答 3

我认为值得一提的是,在您解析JSON文件本身的内容时-健全性检查对于确保您实际上在调用文件json.loads()内容(而不是该JSON 的文件路径)非常有用。:

json_file_path = "/path/to/example.json"

with open(json_file_path, 'r') as j:
     contents = json.loads(j.read())

我有些尴尬地承认有时会发生这种情况:

contents = json.loads(json_file_path)

I think it’s worth mentioning that in cases where you’re parsing the contents of a JSON file itself – sanity checks can be useful to ensure that you’re actually invoking json.loads() on the contents of the file, as opposed to the file path of that JSON:

json_file_path = "/path/to/example.json"

with open(json_file_path, 'r') as j:
     contents = json.loads(j.read())

I’m a little embarrassed to admit that this can happen sometimes:

contents = json.loads(json_file_path)

回答 4

检查文件的编码格式,并在读取文件时使用相应的编码格式。它将解决您的问题。

with open("AB.json", encoding='utf-8', errors='ignore') as json_data:
     data = json.load(json_data, strict=False)

Check encoding format of your file and use corresponding encoding format while reading file. It will solve your problem.

with open("AB.json", encoding='utf-8', errors='ignore') as json_data:
     data = json.load(json_data, strict=False)

回答 5

很多时候,这是因为您要解析的字符串为空:

>>> import json
>>> x = json.loads("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

您可以通过json_string事先检查是否为空来进行补救:

import json

if json_string:
    x = json.loads(json_string)
else:
    // Your logic here
    x = {}

A lot of times, this will be because the string you’re trying to parse is blank:

>>> import json
>>> x = json.loads("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

You can remedy by checking whether json_string is empty beforehand:

import json

if json_string:
    x = json.loads(json_string)
else:
    // Your logic here
    x = {}

回答 6

即使调用了decode(),也可能嵌入了0。使用replace():

import json
struct = {}
try:
    response_json = response_json.decode('utf-8').replace('\0', '')
    struct = json.loads(response_json)
except:
    print('bad json: ', response_json)
return struct

There may be embedded 0’s, even after calling decode(). Use replace():

import json
struct = {}
try:
    response_json = response_json.decode('utf-8').replace('\0', '')
    struct = json.loads(response_json)
except:
    print('bad json: ', response_json)
return struct

回答 7

使用请求,我遇到了这个问题。感谢Christophe Roussy的解释。

为了调试,我使用了:

response = requests.get(url)
logger.info(type(response))

我从API返回了404响应。

I had exactly this issue using requests. Thanks to Christophe Roussy for his explanation.

To debug, I used:

response = requests.get(url)
logger.info(type(response))

I was getting a 404 response back from the API.


回答 8

我在请求(python库)时遇到了同样的问题。它恰好是accept-encoding标题。

它是这样设置的: 'accept-encoding': 'gzip, deflate, br'

我只是将其从请求中删除,并停止获取错误。

I was having the same problem with requests (the python library). It happened to be the accept-encoding header.

It was set this way: 'accept-encoding': 'gzip, deflate, br'

I simply removed it from the request and stopped getting the error.


回答 9

对我来说,它没有在请求中使用身份验证。

For me, it was not using authentication in the request.


回答 10

对我来说,这是服务器响应而不是200响应,并且响应不是json格式的。我最终在json解析之前执行了此操作:

# this is the https request for data in json format
response_json = requests.get() 

# only proceed if I have a 200 response which is saved in status_code
if (response_json.status_code == 200):  
     response = response_json.json() #converting from json to dictionary using json library

For me it was server responding with something other than 200 and the response was not json formatted. I ended up doing this before the json parse:

# this is the https request for data in json format
response_json = requests.get() 

# only proceed if I have a 200 response which is saved in status_code
if (response_json.status_code == 200):  
     response = response_json.json() #converting from json to dictionary using json library

回答 11

如果您是Windows用户,则Tweepy API可以在数据对象之间生成一个空行。由于这种情况,您会收到“ JSONDecodeError:预期值:第1行第1列(字符0)”错误。为避免此错误,您可以删除空行。

例如:

 def on_data(self, data):
        try:
            with open('sentiment.json', 'a', newline='\n') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

参考: Twitter流API从None给出JSONDecodeError(“ Expecting value”,s,err.value)

If you are a Windows user, Tweepy API can generate an empty line between data objects. Because of this situation, you can get “JSONDecodeError: Expecting value: line 1 column 1 (char 0)” error. To avoid this error, you can delete empty lines.

For example:

 def on_data(self, data):
        try:
            with open('sentiment.json', 'a', newline='\n') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

Reference: Twitter stream API gives JSONDecodeError(“Expecting value”, s, err.value) from None


回答 12

只需检查请求的状态码是否为200。例如:

if status != 200:
    print("An error has occured. [Status code", status, "]")
else:
    data = response.json() #Only convert to Json when status is OK.
    if not data["elements"]:
        print("Empty JSON")
    else:
        "You can extract data here"

Just check if the request has a status code 200. So for example:

if status != 200:
    print("An error has occured. [Status code", status, "]")
else:
    data = response.json() #Only convert to Json when status is OK.
    if not data["elements"]:
        print("Empty JSON")
    else:
        "You can extract data here"

回答 13

我在基于Python的Web API的响应中收到了这样的错误.text,但是它导致了我的到来,因此这可能会帮助遇到类似问题的其他人(使用requests.. 过滤响应并在搜索中请求问题非常困难)

使用json.dumps()要求 data ARG创建JSON的正确转义的字符串前发布解决了该问题对我来说

requests.post(url, data=json.dumps(data))

I received such an error in a Python-based web API’s response .text, but it led me here, so this may help others with a similar issue (it’s very difficult to filter response and request issues in a search when using requests..)

Using json.dumps() on the request data arg to create a correctly-escaped string of JSON before POSTing fixed the issue for me

requests.post(url, data=json.dumps(data))

从文件读取JSON?

问题:从文件读取JSON?

只是因为一个简单,易于表达的陈述使我的脸上有些错误,所以我有点头疼。

我有一个名为strings.json的json文件,如下所示:

"strings": [{"-name": "city", "#text": "City"}, {"-name": "phone", "#text": "Phone"}, ...,
            {"-name": "address", "#text": "Address"}]

我现在想读取json文件。我发现了以下这些语句,但是不起作用:

import json
from pprint import pprint

with open('strings.json') as json_data:
    d = json.loads(json_data)
    json_data.close()
    pprint(d)

控制台上显示的错误是这样的:

Traceback (most recent call last):
  File "/home/.../android/values/manipulate_json.py", line 5, in <module>
    d = json.loads(json_data)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
[Finished in 0.1s with exit code 1]

已编辑

从更改json.loadsjson.load

并得到了:

Traceback (most recent call last):
  File "/home/.../android/values/manipulate_json.py", line 5, in <module>
    d = json.load(json_data)
  File "/usr/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 829 column 1 - line 829 column 2 (char 18476 - 18477)
[Finished in 0.1s with exit code 1]

I am getting a bit of headache just because a simple looking, easy statement is throwing some errors in my face.

I have a json file called strings.json like this:

"strings": [{"-name": "city", "#text": "City"}, {"-name": "phone", "#text": "Phone"}, ...,
            {"-name": "address", "#text": "Address"}]

I want to read the json file, just that for now. I have these statements which I found out, but it’s not working:

import json
from pprint import pprint

with open('strings.json') as json_data:
    d = json.loads(json_data)
    json_data.close()
    pprint(d)

The error displayed on the console was this:

Traceback (most recent call last):
  File "/home/.../android/values/manipulate_json.py", line 5, in <module>
    d = json.loads(json_data)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
[Finished in 0.1s with exit code 1]

Edited

Changed from json.loads to json.load

and got this:

Traceback (most recent call last):
  File "/home/.../android/values/manipulate_json.py", line 5, in <module>
    d = json.load(json_data)
  File "/usr/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 829 column 1 - line 829 column 2 (char 18476 - 18477)
[Finished in 0.1s with exit code 1]

回答 0

json.load()方法(“ load”中没有“ s”)可以直接读取文件:

import json

with open('strings.json') as f:
    d = json.load(f)
    print(d)

您正在使用json.loads()方法,该方法仅用于字符串参数。

编辑:新消息是一个完全不同的问题。在这种情况下,该文件中存在一些无效的json。为此,我建议通过json验证器运行文件。

还有一些修复json的解决方案,例如,如何自动修复无效的JSON字符串?

The json.load() method (without “s” in “load”) can read a file directly:

import json

with open('strings.json') as f:
    d = json.load(f)
    print(d)

You were using the json.loads() method, which is used for string arguments only.

Edit: The new message is a totally different problem. In that case, there is some invalid json in that file. For that, I would recommend running the file through a json validator.

There are also solutions for fixing json like for example How do I automatically fix an invalid JSON string?.


回答 1

这是一段代码,对我来说很好

import json

with open("test.json") as json_file:
    json_data = json.load(json_file)
    print(json_data)

与数据

{
    "a": [1,3,"asdf",true],
    "b": {
        "Hello": "world"
    }
}

您可能想用try catch包装json.load行,因为无效的JSON会导致stacktrace错误消息。

Here is a copy of code which works fine for me

import json

with open("test.json") as json_file:
    json_data = json.load(json_file)
    print(json_data)

with the data

{
    "a": [1,3,"asdf",true],
    "b": {
        "Hello": "world"
    }
}

you may want to wrap your json.load line with a try catch because invalid JSON will cause a stacktrace error message.


回答 2

问题是使用with语句:

with open('strings.json') as json_data:
    d = json.load(json_data)
    pprint(d)

该文件将已经隐式关闭。无需json_data.close()再次调用。

The problem is using with statement:

with open('strings.json') as json_data:
    d = json.load(json_data)
    pprint(d)

The file is going to be implicitly closed already. There is no need to call json_data.close() again.


回答 3

在python 3中,我们可以使用以下方法。

从文件读取并转换为JSON

import json
from pprint import pprint

# Considering "json_list.json" is a json file

with open('json_list.json') as fd:
     json_data = json.load(fd)
     pprint(json_data)

with语句自动关闭打开的文件描述符。


字符串到JSON

import json
from pprint import pprint

json_data = json.loads('{"name" : "myName", "age":24}')
pprint(json_data)

In python 3, we can use below method.

Read from file and convert to JSON

import json
from pprint import pprint

# Considering "json_list.json" is a json file

with open('json_list.json') as fd:
     json_data = json.load(fd)
     pprint(json_data)

with statement automatically close the opened file descriptor.


String to JSON

import json
from pprint import pprint

json_data = json.loads('{"name" : "myName", "age":24}')
pprint(json_data)

回答 4

作为补充,今天您可以使用熊猫来导入json:
https : //pandas.pydata.org/pandas-docs/stable/genic/pandas.read_json.html 您可能要仔细使用Orient参数。

To add on this, today you are able to use pandas to import json:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html You may want to do a careful use of the orient parameter.


回答 5

您可以使用pandas库读取JSON文件。

import pandas as pd
df = pd.read_json('strings.json',lines=True)
print(df)

You can use pandas library to read the JSON file.

import pandas as pd
df = pd.read_json('strings.json',lines=True)
print(df)

回答 6

这对我有用。

json.load()接受文件对象,解析JSON数据,使用该数据填充Python字典并将其返回给您。

假设JSON文件是这样的:

{
   "emp_details":[
                 {
                "emp_name":"John",
                "emp_emailId":"john@gmail.com"  
                  },
                {
                 "emp_name":"Aditya",
                 "emp_emailId":"adityatest@yahoo.com"
                }
              ] 
}

import json 

# Opening JSON file 
f = open('data.json',) 

# returns JSON object as  
# a dictionary 
data = json.load(f) 

# Iterating through the json 
# list 
for i in data['emp_details']: 
    print(i) 

# Closing file 
f.close()

#Output:
{'emp_name':'John','emp_emailId':'john@gmail.com'}
{'emp_name':'Aditya','emp_emailId':'adityatest@yahoo.com'}

This works for me.

json.load() accepts file object, parses the JSON data, populates a Python dictionary with the data and returns it back to you.

Suppose JSON file is like this:

{
   "emp_details":[
                 {
                "emp_name":"John",
                "emp_emailId":"john@gmail.com"  
                  },
                {
                 "emp_name":"Aditya",
                 "emp_emailId":"adityatest@yahoo.com"
                }
              ] 
}


import json 

# Opening JSON file 
f = open('data.json',) 

# returns JSON object as  
# a dictionary 
data = json.load(f) 

# Iterating through the json 
# list 
for i in data['emp_details']: 
    print(i) 

# Closing file 
f.close()

#Output:
{'emp_name':'John','emp_emailId':'john@gmail.com'}
{'emp_name':'Aditya','emp_emailId':'adityatest@yahoo.com'}