问题:如何对集合进行JSON序列化?

我有一个Python set,其中包含带有__hash____eq__方法的对象,以确保该集合中没有重复项。

我需要对该结果进行json编码set,但是即使将一个空值传递set给该json.dumps方法也会引发TypeError

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

我知道我可以为json.JSONEncoder具有自定义default方法的类创建扩展,但是我什至不知道从哪里开始转换set。是否应该set使用默认方法中的值创建字典,然后返回该方法的编码?理想情况下,我想使默认方法能够处理原始编码器阻塞的所有数据类型(我将Mongo用作数据源,因此日期似乎也引发了此错误)

正确方向的任何提示将不胜感激。

编辑:

感谢你的回答!也许我应该更精确一些。

我利用(并赞成)这里的答案来解决set翻译的局限性,但是内部键也是一个问题。

中的set对象是转换为的复杂对象__dict__,但它们本身也可以包含其属性值,这些值可能不符合json编码器中的基本类型。

涉及到很多不同的类型set,并且哈希基本上为实体计算了唯一的ID,但是按照NoSQL的真正精神,没有确切说明子对象包含什么。

一个对象可能包含的日期值starts,而另一个对象可能具有一些其他模式,该模式不包含包含“非原始”对象的键。

这就是为什么我唯一能想到的解决方案是扩展JSONEncoder替换default方法以打开不同情况的方法-但我不确定如何进行此操作,并且文档不明确。在嵌套对象中,是default按键返回go 的值,还是只是查看整个对象的通用包含/丢弃?该方法如何容纳嵌套值?我已经看过先前的问题,但似乎找不到最佳的针对特定情况的编码的方法(不幸的是,这似乎是我在这里需要做的事情)。

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I’m not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I’d like to make the default method able to handle all the datatypes that the original encoder chokes on (I’m using Mongo as a data source so dates seem to raise this error too)

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There’s a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there’s no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing “non-primitive” objects.

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases – but I’m not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I’ve looked through previous questions and can’t seem to find the best approach to case-specific encoding (which unfortunately seems like what I’m going to need to do here).


回答 0

JSON表示法只有少数本机数据类型(对象,数组,字符串,数字,布尔值和null),因此以JSON序列化的任何内容都必须表示为这些类型之一。

json模块docs所示,此转换可以由JSONEncoderJSONDecoder自动完成,但随后您将放弃可能需要的其他一些结构(如果将集转换为列表,则将失去恢复常规数据的能力。列表;如果使用将集转换为字典,dict.fromkeys(s)则将失去恢复字典的能力)。

一个更复杂的解决方案是构建可以与其他本机JSON类型共存的自定义类型。这使您可以存储嵌套结构,其中包括列表,集合,字典,小数,日期时间对象等:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

这是一个示例会话,显示它可以处理列表,字典和集合:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

另外,使用更通用的序列化技术(例如YAMLTwisted Jelly或Python的pickle模块)可能很有用。它们每个都支持更大范围的数据类型。

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).

A more sophisticated solution is to build-out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python’s pickle module. These each support a much greater range of datatypes.


回答 1

您可以创建一个自定义编码器,list当遇到时将返回set。这是一个例子:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

您也可以通过这种方式检测其他类型。如果需要保留列表实际上是一个集合,则可以使用自定义编码。类似的东西return {'type':'set', 'list':list(obj)}可能会起作用。

要说明嵌套类型,请考虑将其序列化:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

这将引发以下错误:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

这表明编码器将获取list返回的结果,并对其子代递归调用序列化器。要为多种类型添加自定义序列化程序,可以执行以下操作:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

You can create a custom encoder that returns a list when it encounters a set. Here’s an example:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.

To illustrated nested types, consider serializing this:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

回答 2

我将Raymond Hettinger的解决方案调整为适用于python 3。

这是发生了什么变化:

  • unicode 消失了
  • 更新调用父母defaultsuper()
  • 使用base64序列化bytes型成str(因为它似乎bytes在python 3不能被转换为JSON)
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

I adapted Raymond Hettinger’s solution to python 3.

Here is what has changed:

  • unicode disappeared
  • updated the call to the parents’ default with super()
  • using base64 to serialize the bytes type into str (because it seems that bytes in python 3 can’t be converted to JSON)
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

回答 3

JSON中仅字典,列表和原始对象类型(int,字符串,布尔)可用。

Only dictionaries, Lists and primitive object types (int, string, bool) are available in JSON.


回答 4

您无需创建自定义编码器类即可提供default方法-可以将其作为关键字参数传递:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

会生成[1, 2, 3]所有受支持的Python版本。

You don’t need to make a custom encoder class to supply the default method – it can be passed in as a keyword argument:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

results in [1, 2, 3] in all supported Python versions.


回答 5

如果您只需要编码集合,而不是一般的Python对象,并且想要使其易于阅读,则可以使用Raymond Hettinger答案的简化版本:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger’s answer can be used:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

回答 6

如果您只需要快速转储并且不想实现自定义编码器。您可以使用以下内容:

json_string = json.dumps(data, iterable_as_array=True)

这会将所有集合(和其他可迭代对象)转换为数组。请注意,当您解析json时,这些字段将保留为数组。如果要保留类型,则需要编写自定义编码器。

If you need just quick dump and don’t want to implement custom encoder. You can use the following:

json_string = json.dumps(data, iterable_as_array=True)

This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the json back. If you want to preserve the types, you need to write custom encoder.


回答 7

公认的解决方案的一个缺点是它的输出是非常特定于python的。也就是说,人类无法观察到其原始json输出,也无法通过其他语言(例如javascript)加载该输出。例:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

会给你:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

我可以提出一种解决方案,将集合降级为包含输出列表的字典,并在使用相同的编码器加载到python中时将其降级为集合,从而保留可观察性和语言不可知性:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

这使您:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

请注意,序列化包含具有键的元素的字典"__set__"将破坏此机制。因此__set__现在已成为保留dict键。显然,可以随意使用另一个更加模糊的密钥。

One shortcoming of the accepted solution is that its output is very python specific. I.e. its raw json output cannot be observed by a human or loaded by another language (e.g. javascript). example:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

I can propose a solution which downgrades the set to a dict containing a list on the way out, and back to a set when loaded into python using the same encoder, therefore preserving observability and language agnosticism:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。