Tag Archives: serialization

Serializing a class instance to JSON

Question: Serializing a class instance to JSON

I am trying to create a JSON string representation of a class instance and having difficulty. Let’s say the class is built like this:

class testclass:
    value1 = "a"
    value2 = "b"

A call to the json.dumps is made like this:

t = testclass()
json.dumps(t)

It is failing and telling me that the testclass is not JSON serializable.

TypeError: <__main__.testclass object at 0x000000000227A400> is not JSON serializable

I have also tried using the pickle module:

t = testclass()
print(pickle.dumps(t, pickle.HIGHEST_PROTOCOL))

And it gives class instance information but not a serialized content of the class instance.

b'\x80\x03c__main__\ntestclass\nq\x00)\x81q\x01}q\x02b.'

What am I doing wrong?


Answer 0

The basic problem is that the JSON encoder json.dumps() only knows how to serialize a limited set of object types by default, namely the built-in types. They are listed here: https://docs.python.org/3.3/library/json.html#encoders-and-decoders

One good solution would be to make your class inherit from JSONEncoder and then implement the JSONEncoder.default() function, and make that function emit the correct JSON for your class.
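
One common way to apply that idea is a small JSONEncoder subclass whose default() emits a JSON-friendly dict, passed to json.dumps via cls. This is only a sketch; ObjectDictEncoder is an illustrative name, not part of the original answer:

import json

class ObjectDictEncoder(json.JSONEncoder):
    def default(self, obj):
        # Serialize unknown objects via their __dict__; anything without one
        # falls back to the base class, which raises TypeError as usual.
        try:
            return obj.__dict__
        except AttributeError:
            return json.JSONEncoder.default(self, obj)

# s = json.dumps(some_instance, cls=ObjectDictEncoder)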

A simple solution would be to call json.dumps() on the .__dict__ member of that instance. That is a standard Python dict and if your class is simple it will be JSON serializable.

class Foo(object):
    def __init__(self):
        self.x = 1
        self.y = 2

foo = Foo()
s = json.dumps(foo) # raises TypeError with "is not JSON serializable"

s = json.dumps(foo.__dict__) # s set to: {"x":1, "y":2}

The above approach is discussed in this blog posting:

    Serializing arbitrary Python objects to JSON using __dict__


Answer 1

There’s one way that works great for me that you can try out:

json.dumps() can take an optional parameter default where you can specify a custom serializer function for unknown types, which in my case looks like

from datetime import date, time  # needed for the isinstance checks below

def serialize(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, date):
        serial = obj.isoformat()
        return serial

    if isinstance(obj, time):
        serial = obj.isoformat()
        return serial

    return obj.__dict__

The first two ifs handle date and time serialization, and then obj.__dict__ is returned for any other object.

The final call looks like this:

json.dumps(myObj, default=serialize)

It’s especially good when you are serializing a collection and you don’t want to call __dict__ explicitly for every object. Here it’s done for you automatically.
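
For illustration, a sketch of that collection case, reusing the serialize function above (Event is a made-up class, not part of the original answer):

from datetime import date
import json

class Event(object):
    def __init__(self, name, when):
        self.name = name
        self.when = when

events = [Event('release', date(2020, 1, 1)), Event('retro', date(2020, 1, 15))]
print(json.dumps(events, default=serialize))
# [{"name": "release", "when": "2020-01-01"}, {"name": "retro", "when": "2020-01-15"}]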

So far it has worked well for me; looking forward to your thoughts.


Answer 2

You can specify the default named parameter in the json.dumps() function:

json.dumps(obj, default=lambda x: x.__dict__)

Explanation:

From the docs (2.7, 3.6):

``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.

(Works on Python 2.7 and Python 3.x)

Note: In this case you need instance variables and not class variables, unlike the example in the question, which uses class variables. (I am assuming the asker meant a class instance, i.e. an object of a class.)
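
Applied to the question's class, rewritten with instance variables, a quick sketch:

import json

class testclass:
    def __init__(self):
        self.value1 = "a"
        self.value2 = "b"

print(json.dumps(testclass(), default=lambda x: x.__dict__))
# {"value1": "a", "value2": "b"}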

I learned this first from @phihag’s answer here. Found it to be the simplest and cleanest way to do the job.


Answer 3

I just do:

data=json.dumps(myobject.__dict__)

This is not the full answer, and if you have some sort of complicated object class you certainly will not get everything. However I use this for some of my simple objects.

One that it works really well on is the “options” object that you get from optparse's OptionParser. Here it is along with the JSON request itself.

def executeJson(self, url, options):
    data = json.dumps(options.__dict__)
    if options.verbose:
        print(data)
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    return requests.post(url, data, headers=headers)

Answer 4

Using jsonpickle

import jsonpickle

object = YourClass()
json_object = jsonpickle.encode(object)
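
For completeness, jsonpickle can also rebuild the instance from that string with its decode function, continuing the snippet above:

restored_object = jsonpickle.decode(json_object)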

Answer 5

JSON is not really meant for serializing arbitrary Python objects. It’s great for serializing dict objects, but the pickle module is really what you should be using in general. Output from pickle is not really human-readable, but it should unpickle just fine. If you insist on using JSON, you could check out the jsonpickle module, which is an interesting hybrid approach.

https://github.com/jsonpickle/jsonpickle


Answer 6

Here are two simple functions for serializing any unsophisticated class; nothing fancy, as explained before.

I use this for configuration type stuff because I can add new members to the classes with no code adjustments.

import json

class SimpleClass:
    def __init__(self, a=None, b=None, c=None):
        self.a = a
        self.b = b
        self.c = c

def serialize_json(instance=None, path=None):
    dt = {}
    dt.update(vars(instance))

    with open(path, "w") as file:
        json.dump(dt, file)

def deserialize_json(cls=None, path=None):
    def read_json(_path):
        with open(_path, "r") as file:
            return json.load(file)

    data = read_json(path)

    instance = object.__new__(cls)

    for key, value in data.items():
        setattr(instance, key, value)

    return instance

# Usage: Create class and serialize under Windows file system.
write_settings = SimpleClass(a=1, b=2, c=3)
serialize_json(write_settings, r"c:\temp\test.json")

# Read back and rehydrate.
read_settings = deserialize_json(SimpleClass, r"c:\temp\test.json")

# results are the same.
print(vars(write_settings))
print(vars(read_settings))

# output:
# {'c': 3, 'b': 2, 'a': 1}
# {'c': 3, 'b': 2, 'a': 1}

Answer 7

There are some good answers on how to get started on doing this. But there are some things to keep in mind:

  • What if the instance is nested inside a large data structure?
  • What if you also want the class name?
  • What if you want to deserialize the instance?
  • What if you’re using __slots__ instead of __dict__?
  • What if you just don’t want to do it yourself?

json-tricks is a library (that I made and others contributed to) which has been able to do this for quite a while. For example:

from json_tricks import dumps, loads

class MyTestCls:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

cls_instance = MyTestCls(s='ub', dct={'7': 7})

json = dumps(cls_instance, indent=4)
instance = loads(json)

You’ll get your instance back. Here the json looks like this:

{
    "__instance_type__": [
        "json_tricks.test_class",
        "MyTestCls"
    ],
    "attributes": {
        "s": "ub",
        "dct": {
            "7": 7
        }
    }
}

If you'd like to make your own solution, you might look at the source of json-tricks so as not to forget some special cases (like __slots__).

It also handles other types such as numpy arrays, datetimes and complex numbers, and it allows for comments.


Answer 8

Python3.x

The best approach I could come up with was this. Note that this code treats set() too. The approach is generic, requiring only that you extend the class (as in the second example). Note that I'm only doing it with files, but it's easy to modify the behavior to your taste.

However, this is a codec: it both encodes and decodes.

With a little more work you can construct your class in other ways. I assume a default constructor to instantiate it, and then I update the instance's __dict__.

import json
import collections.abc


class JsonClassSerializable(json.JSONEncoder):

    REGISTERED_CLASS = {}

    def register(ctype):
        JsonClassSerializable.REGISTERED_CLASS[ctype.__name__] = ctype

    def default(self, obj):
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in self.REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = self.REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


JsonClassSerializable.register(C)


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


JsonClassSerializable.register(B)


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()

JsonClassSerializable.register(A)

A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
print(b.b)
print(b.c.a)

Edit

With some more research I found a way to generalize this without needing the superclass register() method call, by using a metaclass:

import json
import collections.abc

REGISTERED_CLASS = {}

class MetaSerializable(type):

    def __call__(cls, *args, **kwargs):
        if cls.__name__ not in REGISTERED_CLASS:
            REGISTERED_CLASS[cls.__name__] = cls
        return super(MetaSerializable, cls).__call__(*args, **kwargs)


class JsonClassSerializable(json.JSONEncoder, metaclass=MetaSerializable):

    def default(self, obj):
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()


A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
# 1
print(b.b)
# {1, 2}
print(b.c.a)
# 1230
print(b.c.c.mill)
# s

Answer 9

I believe that instead of inheritance, as suggested in the accepted answer, it's better to use polymorphism. Otherwise you have to have a big if/else statement to customize the encoding of every object. That means creating a generic default encoder for JSON like this:

def jsonDefEncoder(obj):
   if hasattr(obj, 'jsonEnc'):
      return obj.jsonEnc()
   else: #some default behavior
      return obj.__dict__

and then have a jsonEnc() function in each class you want to serialize. For example:

class A(object):
   def __init__(self, lengthInFeet):
      self.lengthInFeet = lengthInFeet
   def jsonEnc(self):
      return {'lengthInMeters': self.lengthInFeet * 0.3} # each foot is roughly 0.3 meter

Then you call json.dumps(classInstance, default=jsonDefEncoder).
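
Putting it together with the A class above, a small usage sketch (it assumes json has been imported):

a = A(lengthInFeet=10)
print(json.dumps(a, default=jsonDefEncoder))
# {"lengthInMeters": 3.0}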


Saving an object (data persistence)

Question: Saving an object (data persistence)

I’ve created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?


Answer 0

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')

Update

Since this is such a popular answer, I’d like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

It’s almost always preferable to actually use the cPickle module rather than pickle because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version will provide greatly superior performance. Switching to it couldn’t be easier, just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is you could use something like the following to ensure that your code will always use the C version when it’s available in both Python 2 and 3:

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on what version of Python is being used. The default also depends on the Python version: in Python 2 the default was Protocol version 0, but in Python 3.8.1 it's Protocol version 4. In Python 3.x the module had pickle.DEFAULT_PROTOCOL added to it, which doesn't exist in Python 2.

Fortunately, there's a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)

Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc.

Note: If you’re in an environment running different versions of Python, then you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
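
For example, pinning the protocol explicitly might look like this (protocol 2 is a reasonable lowest common denominator, since it can be read by Python 2.3+ as well as Python 3):

PICKLE_PROTOCOL = 2  # hard-coded so every interpreter in use can read the files
pickle.dump(obj, output, PICKLE_PROTOCOL)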

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there’s an unknown number of them, it’s often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

The major advantage is you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally, I like @Lutz Prechelt's answer the best. Here it is, adapted to the examples here:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

Answer 1

I think it’s a pretty strong assumption to assume that the object is a class. What if it’s not a class? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added — because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can’t handle it. Let’s try dill. We’ll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn’t, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.

Actually, there’s an easier way to save all these objects, especially if you have a lot of objects you’ve created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can’t install a python package on your server, then you can’t use it.

However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.


Answer 2

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as backend, which extends the python pickle module to handle lambda and all the nice python features.
  • It stores different objects to different files and reloads them properly.
  • Limits cache size
  • Allows cache clearing
  • Allows sharing of objects between multiple runs
  • Allows respect of input files which influence the result

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value):
    return Company(name, value)

Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.
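
A usage sketch under those assumptions:

company1 = myfunc('banana', 40)  # first call: runs the function and caches the result
company1 = myfunc('banana', 40)  # later calls with the same arguments: loaded from the cache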

For any further details, see the documentation.


Answer 3

A quick example using company1 from your question, with Python 3.

import pickle

# Save the file
pickle.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = pickle.load(open("company1.pickle", "rb"))

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the file
dill.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = dill.load(open("company1.pickle", "rb"))

How to get string objects instead of Unicode from JSON?

Question: How to get string objects instead of Unicode from JSON?

I’m using Python 2 to parse JSON from ASCII encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is, I have to use the data with some libraries that only accept string objects. I can’t change the libraries nor update them.

Is it possible to get string objects instead of Unicode ones?

Example

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. One easy and clean solution for today is to use a recent version of Python — i.e. Python 3 and forward.


Answer 0

A solution with object_hook

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery’s function is shorter and clearer than these ones, so what’s the point of them? Why would you want to use them?

Purely for performance. Mark’s answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you’ll hit Python’s maximum recursion depth

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they’re decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark’s answer isn’t suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn’t have a dict at the top level.


Answer 1

While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str type strings instead of unicode type. Because JSON is a subset of YAML it works nicely:

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

Notes

Some things to note though:

  • I get string objects because all my entries are ASCII encoded. If I would use unicode encoded entries, I would get them back as unicode objects — there is no conversion!

  • You should (probably always) use PyYAML’s safe_load function; if you use it to load JSON files, you don’t need the “additional power” of the load function anyway.

  • If you want a YAML parser that has more support for the 1.2 version of the spec (and correctly parses very low numbers) try Ruamel YAML: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

Conversion

As stated, there is no conversion! If you can’t be sure to only deal with ASCII values (and you can’t be sure most of the time), better use a conversion function:

I have used the one from Mark Amery a couple of times now; it works great and is very easy to use. You can also use a similar function as an object_hook instead, as that might gain you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.


Answer 2

There’s no built-in option to make the json module functions return byte strings instead of unicode strings. However, this short and simple recursive function will convert any decoded JSON object from using unicode strings to UTF-8-encoded byte strings:

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Just call this on the output you get from a json.load or json.loads call.
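
For example, a minimal sketch (Python 2):

import json

new_list = byteify(json.loads('["a", "b"]'))
# new_list == ['a', 'b']   (str objects rather than unicode)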

A couple of notes:

  • To support Python 2.6 or earlier, replace return {byteify(key): byteify(value) for key, value in input.iteritems()} with return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]), since dictionary comprehensions weren’t supported until Python 2.7.
  • Since this answer recurses through the entire decoded object, it has a couple of undesirable performance characteristics that can be avoided with very careful use of the object_hook or object_pairs_hook parameters. Mirec Miskuf’s answer is so far the only one that manages to pull this off correctly, although as a consequence, it’s significantly more complicated than my approach.

Answer 3

You can use the object_hook parameter for json.loads to pass in a converter. You don’t have to do the conversion after the fact. The json module will always pass the object_hook dicts only, and it will recursively pass in nested dicts, so you don’t have to recurse into nested dicts yourself. I don’t think I would convert unicode strings to numbers like Wells shows. If it’s a unicode string, it was quoted as a string in the JSON file, so it is supposed to be a string (or the file is bad).

Also, I’d try to avoid doing something like str(val) on a unicode object. You should use value.encode(encoding) with a valid encoding, depending on what your external lib expects.

So, for example:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv

def _decode_dict(data):
    rv = {}
    for key, value in data.iteritems():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

obj = json.loads(s, object_hook=_decode_dict)

回答 4

这是因为json在字符串对象和unicode对象之间没有区别。它们都是javascript中的字符串。

我认为 JSON 返回 unicode 对象是正确的。实际上,我不会接受更低的要求,因为 javascript 字符串本质上就是 unicode 对象(也就是说 JSON(javascript)字符串可以存储任何 unicode 字符),所以在从 JSON 转换字符串时创建 unicode 对象是合理的。普通字符串则不合适,因为库将不得不去猜测您想要的编码。

最好在任何地方都使用 unicode 字符串对象。因此,最好的选择是更新您的库,让它们能处理 unicode 对象。

但是,如果您真的想要字节串,只需将结果编码为您选择的编码即可:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

That’s because json has no difference between string objects and unicode objects. They’re all strings in javascript.

I think JSON is right to return unicode objects. In fact, I wouldn’t accept anything less, since javascript strings are in fact unicode objects (i.e. JSON (javascript) strings can store any kind of unicode character) so it makes sense to create unicode objects when translating strings from JSON. Plain strings just wouldn’t fit since the library would have to guess the encoding you want.

It’s better to use unicode string objects everywhere. So your best option is to update your libraries so they can deal with unicode objects.

But if you really want bytestrings, just encode the results to the encoding of your choice:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

回答 5

存在一个简单的解决方法。

TL;DR - 使用 ast.literal_eval() 代替 json.loads()。ast 和 json 都在标准库中。

虽然这不是一个“完美”的答案,但如果您的计划是完全忽略 Unicode,它能帮您走得很远。在 Python 2.7 中:

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

给出:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

当某些对象确实是 Unicode 字符串时,情况会变得更加棘手;完整的答案很快就会变得很复杂。

There exists an easy work-around.

TL;DR – Use ast.literal_eval() instead of json.loads(). Both ast and json are in the standard library.

While not a ‘perfect’ answer, it gets one pretty far if your plan is to ignore Unicode altogether. In Python 2.7

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

gives:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

This gets more hairy when some objects are really Unicode strings. The full answer gets hairy quickly.


回答 6

Mike Brennan 的答案已经很接近,但是没有理由重新遍历整个结构。如果使用 object_pairs_hook(Python 2.7+)参数:

object_pairs_hook 是一个可选函数,对于任何以有序的键值对列表解码出的对象字面量,都会用该结果调用它。object_pairs_hook 的返回值将取代 dict。此功能可用于实现依赖键值对解码顺序的自定义解码器(例如,collections.OrderedDict 会记住插入顺序)。如果同时定义了 object_hook,则 object_pairs_hook 优先。

使用它,您可以获得每个JSON对象,因此无需进行递归即可进行解码:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

请注意,由于使用 object_pairs_hook 时每个对象都会被交给这个钩子,因此我从来不需要递归调用它。您确实需要留意列表,但正如您所见,列表中的对象也会被正确转换,而且不需要递归就能做到。

编辑:一位同事指出 Python 2.6 没有 object_pairs_hook。您仍然可以通过一个很小的改动在 Python 2.6 中使用这个方法。把上面钩子中的:

for key, value in pairs:

改为

for key, value in pairs.iteritems():

然后使用object_hook代替object_pairs_hook

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

使用 object_pairs_hook 的结果是,JSON 中的每个对象都可以少实例化一个字典;如果您在解析一个巨大的文档,这可能是值得的。

Mike Brennan’s answer is close, but there is no reason to re-traverse the entire structure. If you use the object_pairs_hook (Python 2.7+) parameter:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.

With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Notice that I never have to call the hook recursively since every object will get handed to the hook when you use the object_pairs_hook. You do have to care about lists, but as you can see, an object within a list will be properly converted, and you don’t have to recurse to make it happen.

EDIT: A coworker pointed out that Python 2.6 doesn’t have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

for key, value in pairs:

to

for key, value in pairs.iteritems():

Then use object_hook instead of object_pairs_hook:

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Using object_pairs_hook results in one less dictionary being instantiated for each object in the JSON object, which, if you were parsing a huge document, might be worth while.


回答 7

恐怕在simplejson库中无法自动实现此目的。

simplejson 中的扫描器和解码器旨在生成 unicode 文本。为此,该库使用了一个名为 c_scanstring 的函数(如果可用,出于速度考虑),在 C 版本不可用时则使用 py_scanstring。simplejson 中几乎每个用于解码可能包含文本的结构的例程,都会多次调用 scanstring 函数。您要么得对 simplejson.decoder 中的 scanstring 值打 monkey patch,要么得子类化 JSONDecoder,并为几乎所有可能包含文本的内容提供您自己的完整实现。

但是,simplejson 输出 unicode 的原因在于 json 规范中特别提到“字符串是零个或多个 Unicode 字符的集合”……对 unicode 的支持被视为格式本身的一部分。simplejson 的 scanstring 实现甚至会扫描并解释 unicode 转义(还会对格式错误的多字节字符集表示做错误检查),因此它唯一能可靠地把值返回给您的方式就是 unicode。

如果您有一个老旧的、需要 str 的库,我建议您要么在解析后费力地遍历嵌套的数据结构(我承认这正是您明确表示想避免的……抱歉),要么把您的库包装进某种外观(facade)中,在更细的粒度上调整输入参数。如果您的数据结构确实嵌套很深,第二种方法可能比第一种更容易管理。

I’m afraid there’s no way to achieve this automatically within the simplejson library.

The scanner and decoder in simplejson are designed to produce unicode text. To do this, the library uses a function called c_scanstring (if it’s available, for speed), or py_scanstring if the C version is not available. The scanstring function is called several times by nearly every routine that simplejson has for decoding a structure that might contain text. You’d have to either monkeypatch the scanstring value in simplejson.decoder, or subclass JSONDecoder and provide pretty much your own entire implementation of anything that might contain text.

The reason that simplejson outputs unicode, however, is that the json spec specifically mentions that “A string is a collection of zero or more Unicode characters”… support for unicode is assumed as part of the format itself. Simplejson’s scanstring implementation goes so far as to scan and interpret unicode escapes (even error-checking for malformed multi-byte charset representations), so the only way it can reliably return the value to you is as unicode.

If you have an aged library that needs an str, I recommend you either laboriously search the nested data structure after parsing (which I acknowledge is what you explicitly said you wanted to avoid… sorry), or perhaps wrap your libraries in some sort of facade where you can massage the input parameters at a more granular level. The second approach might be more manageable than the first if your data structures are indeed deeply nested.
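
下面是答案最后提到的“外观(facade)”思路的一个最小示意(基于 Python 2;legacy_call 是为演示假设的函数名,并非任何真实库的 API):在把参数交给只接受 str 的旧库之前,集中在一处把 unicode 编码为 UTF-8。

# -*- coding: utf-8 -*-
import json

def legacy_call(**kwargs):
    # 假设的旧式入口,只接受 str(字节串)参数,这里仅作演示
    for k, v in kwargs.iteritems():
        assert isinstance(k, str) and isinstance(v, str)
    return kwargs

def call_legacy_with_json(json_text):
    """外观函数:解析 JSON 后,把 unicode 键和值编码成 UTF-8 再交给旧库。"""
    data = json.loads(json_text)
    encoded = dict((k.encode('utf-8'),
                    v.encode('utf-8') if isinstance(v, unicode) else v)
                   for k, v in data.iteritems())
    return legacy_call(**encoded)

print call_legacy_with_json('{"name": "value"}')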


回答 8

正如Mark(Amery)正确指出的那样:仅当您只有ASCII时,才可以在json转储上使用PyYaml的反序列化器。至少开箱即用。

关于PyYaml方法的两个简短评论:

  1. 切勿对来自外部环境(不可信来源)的数据使用 yaml.load。执行隐藏在结构中的任意代码是 yaml 的一个特性(!)。

  2. 可以通过以下方法使其也适用于非ASCII:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)

但是从性能上来说,它与马克·阿默里的答案没有可比性:

将一些深层嵌套的示例字典扔给这两种方法,我得到如下结果(dt[j] = json.loads(json.dumps(m)) 的耗时):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

因此,包括完整遍历树和编码在内的反序列化,仍然处于 json 基于 C 实现的同一数量级之内。我发现这相当快,而且在深层嵌套结构上也比 yaml 加载更健壮。再考虑到 yaml.load 的问题,它也更不容易出安全错误。

=> 虽然我很希望有人能指出一个纯 C 实现的转换器,但 byteify 函数应该是默认答案。

如果您的 json 结构来自外部、包含用户输入,则尤其如此。因为那样的话,您很可能无论如何都要遍历您的结构,这与您想要的内部数据结构(只用“unicode 三明治”还是只用字节串)无关。

为什么?

Unicode 规范化。不了解的读者:先吃片止痛药,然后去读一读这方面的资料。

因此,使用 byteify 递归可以一石二鸟:

  1. 从嵌套的json转储中获取字节串
  2. 使用户输入值标准化,以便您在存储中查找内容。

在我的测试中,结果证明,用 unicodedata.normalize('NFC', input).encode('utf-8') 替换 input.encode('utf-8') 甚至比不做 NFC 还要快,不过我猜这在很大程度上取决于样本数据。

As Mark (Amery) correctly notes: Using PyYaml‘s deserializer on a json dump works only if you have ASCII only. At least out of the box.

Two quick comments on the PyYaml approach:

  1. NEVER use yaml.load on data from the field. It’s a feature(!) of yaml to execute arbitrary code hidden within the structure.

  2. You can make it work also for non ASCII via this:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)
    

But performance wise its of no comparison to Mark Amery’s answer:

Throwing some deeply nested sample dicts onto the two methods, I get this (with dt[j] = time delta of json.loads(json.dumps(m))):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

So deserialization including fully walking the tree and encoding is well within the order of magnitude of json’s C based implementation. I find this remarkably fast and it’s also more robust than the yaml load at deeply nested structures. And less security error prone, looking at yaml.load.

=> While I would appreciate a pointer to a C only based converter the byteify function should be the default answer.

This holds especially true if your json structure is from the field, containing user input. Because then you probably need to walk anyway over your structure – independent on your desired internal data structures (‘unicode sandwich’ or byte strings only).

Why?

Unicode normalisation. For the unaware: Take a painkiller and read this.

So using the byteify recursion you kill two birds with one stone:

  1. get your bytestrings from nested json dumps
  2. get user input values normalised, so that you find the stuff in your storage.

In my tests it turned out that replacing the input.encode(‘utf-8’) with a unicodedata.normalize(‘NFC’, input).encode(‘utf-8’) was even faster than w/o NFC – but thats heavily dependent on the sample data I guess.
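
一个最小的示意(Python 2;函数名 byteify_nfc 是为说明而起的),把上文的 byteify 递归与答案提到的 NFC 规范化结合在一起:

# -*- coding: utf-8 -*-
import json
import unicodedata

def byteify_nfc(input):
    # 与上文的 byteify 相同的递归,只是在编码前先做 NFC 规范化
    if isinstance(input, dict):
        return dict((byteify_nfc(key), byteify_nfc(value))
                    for key, value in input.iteritems())
    elif isinstance(input, list):
        return [byteify_nfc(element) for element in input]
    elif isinstance(input, unicode):
        return unicodedata.normalize('NFC', input).encode('utf-8')
    else:
        return input

print byteify_nfc(json.loads('{"caf\\u00e9": "Caf\\u00e9"}'))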


回答 9

这里的坑在于,simplejson 和 json 是两个不同的模块,至少在处理 unicode 的方式上不同。Python 2.6+ 自带的是 json,它返回 unicode 值,而 simplejson 返回字符串对象。只需在您的环境中尝试用 easy_install 安装 simplejson,看看是否可行。它对我有用。

The gotcha is that simplejson and json are two different modules, at least in the manner they deal with unicode. You have json in py 2.6+, and this gives you unicode values, whereas simplejson returns string objects. Just try easy_install-ing simplejson in your environment and see if that works. It did for me.


回答 10

只需使用pickle而不是json进行转储和加载,如下所示:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

它产生的输出是(正确处理字符串和整数):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

Just use pickle instead of json for dump and load, like so:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

The output it produces is (strings and integers are handled correctly):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

回答 11

因此,我遇到了同样的问题。猜猜Google的第一个结果是什么。

因为我需要把所有数据传给 PyGTK,所以 unicode 字符串对我也不太有用。所以我写了另一种递归转换方法。实际上,做类型安全的 JSON 转换也需要它:json.dump() 遇到任何非字面量(例如 Python 对象)都会直接报错。不过它不会转换字典的键。

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

So, I’ve run into the same problem. Guess what was the first Google result.

Because I need to pass all data to PyGTK, unicode strings aren’t very useful to me either. So I have another recursive conversion method. It’s actually also needed for typesafe JSON conversion – json.dump() would bail on any non-literals, like Python objects. Doesn’t convert dict indexes though.

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

回答 12

我有一个JSON dict作为字符串。键和值是unicode对象,如以下示例所示:

myStringDict = "{u'key':u'value'}"

我可以先用 ast.literal_eval(myStringDict) 把这个字符串转换成 dict 对象,然后再对其使用上面建议的 byteify 函数。

I had a JSON dict as a string. The keys and values were unicode objects like in the following example:

myStringDict = "{u'key':u'value'}"

I could use the byteify function suggested above by converting the string to a dict object using ast.literal_eval(myStringDict).
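
一个最小的示意(Python 2),演示这个两步做法;其中 byteify 假定就是前文 Mark Amery 答案里定义的那个函数:

import ast

myStringDict = "{u'key': u'value'}"
parsed = ast.literal_eval(myStringDict)   # 得到 {u'key': u'value'}
print byteify(parsed)                     # 得到 {'key': 'value'}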


回答 13

使用钩子支持Python2&3(来自https://stackoverflow.com/a/33571117/558397

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

返回值:

 {'three': '', 'key': 'value', 'one': 'two'}

Support Python2&3 using hook (from https://stackoverflow.com/a/33571117/558397)

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

Returns:

 {'three': '', 'key': 'value', 'one': 'two'}

回答 14

虽然来晚了,但我写了这个递归类型转换器。它满足了我的需求,而且我认为它相对完整,也许能帮到您。

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

只需像这样把 JSON 对象传给它:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

我将其作为类的私有成员,但是您可以根据需要重新调整方法的用途。

This is late to the game, but I built this recursive caster. It works for my needs and I think it’s relatively complete. It may help you.

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

Just pass it a JSON object like so:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

I have it as a private member of a class, but you can repurpose the method as you see fit.


回答 15

我重写了Wells的_parse_json()来处理json对象本身是数组的情况(我的用例)。

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

I rewrote Wells’s _parse_json() to handle cases where the json object itself is an array (my use case).

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

回答 16

这是用C语言编写的递归编码器:https : //github.com/axiros/nested_encode

与json.loads相比,“平均”结构的性能开销约为10%。

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

使用以下测试结构:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

here is a recursive encoder written in C: https://github.com/axiros/nested_encode

Performance overhead for “average” structures around 10% compared to json.loads.

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

using this teststructure:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

回答 17

在 Python 3.6 下,有时我仍会遇到这个问题。例如,从 REST API 获取响应并把响应文本加载为 JSON 时,我仍然会得到 unicode 字符串。我用 json.dumps() 找到了一个简单的解决方案。

response_message = json.loads(json.dumps(response.text))
print(response_message)

With Python 3.6, sometimes I still run into this problem. For example, when getting response from a REST API and loading the response text to JSON, I still get the unicode strings. Found a simple solution using json.dumps().

response_message = json.loads(json.dumps(response.text))
print(response_message)

回答 18

我也遇到了这个问题,不得不处理JSON,我想出了一个小循环将Unicode键转换为字符串。(simplejson在GAE上不返回字符串键。)

obj 是从JSON解码的对象:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs 是我传给 GAE 应用程序构造函数的内容(它不喜欢 **kwargs 中出现 unicode 键)

不像Wells的解决方案那样强大,但是要小得多。

I ran into this problem too, and having to deal with JSON, I came up with a small loop that converts the unicode keys to strings. (simplejson on GAE does not return string keys.)

obj is the object decoded from JSON:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs is what I pass to the constructor of the GAE application (which does not like unicode keys in **kwargs)

Not as robust as the solution from Wells, but much smaller.


回答 19

我改编了 Mark Amery 答案中的代码,主要是为了去掉 isinstance,享受鸭子类型(duck typing)的好处。

编码是手动完成的,并且禁用了 ensure_ascii。json.dump 的 Python 文档中写道:

如果 ensure_ascii 为 True(默认值),则输出中的所有非 ASCII 字符均以 \uXXXX 序列转义

免责声明:在 doctest 中我使用了匈牙利语。与匈牙利语相关的一些著名字符编码有:cp852,IBM/OEM 编码,例如用于 DOS(有时被不恰当地称为 ascii,我认为这取决于代码页设置);cp1250,例如用于 Windows(有时被称为 ansi,取决于区域设置);以及有时用于 http 服务器的 iso-8859-2。测试文本 Tüskéshátú kígyóbűvölő 取自维基百科,署名 Koltai László(本地人名形式)。

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

我还想强调 Jarret Hardie 的答案,它引用了 JSON 规范:

字符串是零个或多个Unicode字符的集合

在我的用例中,我有带有json的文件。它们是utf-8编码文件。ensure_ascii会导致正确转义但可读性不强的json文件,这就是为什么我调整了Mark Amery的答案来满足自己的需求的原因。

doctest并不是特别周到,但是我分享了代码,希望它对某人有用。

I’ve adapted the code from the answer of Mark Amery, particularly in order to get rid of isinstance for the pros of duck-typing.

The encoding is done manually and ensure_ascii is disabled. The python docs for json.dump says that

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences

Disclaimer: in the doctest I used the Hungarian language. Some notable Hungarian-related character encodings are: cp852 the IBM/OEM encoding used eg. in DOS (sometimes referred as ascii, incorrectly I think, it is dependent on the codepage setting), cp1250 used eg. in Windows (sometimes referred as ansi, dependent on the locale settings), and iso-8859-2, sometimes used on http servers. The test text Tüskéshátú kígyóbűvölő is attributed to Koltai László (native personal name form) and is from wikipedia.

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

I’d also like to highlight the answer of Jarret Hardie which references the JSON spec, quoting:

A string is a collection of zero or more Unicode characters

In my use-case I had files with json. They are utf-8 encoded files. ensure_ascii results in properly escaped but not very readable json files, that is why I’ve adapted Mark Amery’s answer to fit my needs.

The doctest is not particularly thoughtful, but I share the code in the hope that it will be useful for someone.


回答 20

请看这个类似问题的答案,其中指出:

u 前缀仅表示您拥有的是 Unicode 字符串。当您真正使用这个字符串时,它并不会出现在您的数据中。不要被打印出来的形式误导。

例如,尝试以下操作:

print mail_accounts[0]["i"]

你不会看到 u。

Check out this answer to a similar question like this which states that

The u- prefix just means that you have a Unicode string. When you really use the string, it won’t appear in your data. Don’t be thrown by the printed output.

For example, try this:

print mail_accounts[0]["i"]

You won’t see a u.
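
一个简短的 Python 2 交互示例,说明 u 前缀只出现在 repr 形式里,真正使用字符串时并不存在:

>>> s = u'hello'
>>> s
u'hello'
>>> print s
hello
>>> s == 'hello'
True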


将 python 字典转换为字符串再转换回来

问题:将 python 字典转换为字符串再转换回来

我正在编写一个将数据存储在字典对象中的程序,但是该数据需要在程序执行过程中的某个时候保存,并在再次运行该程序时重新加载到字典对象中。如何将字典对象转换为可以写入文件并可以加载回字典对象的字符串?希望这将支持包含词典的词典。

I am writing a program that stores data in a dictionary object, but this data needs to be saved at some point during the program execution and loaded back into the dictionary object when the program is run again. How would I convert a dictionary object into a string that can be written to a file and loaded back into a dictionary object? This will hopefully support dictionaries containing dictionaries.


回答 0

json模块是一个很好的解决方案。与pickle相比,它的优势在于它仅生成纯文本输出,并且是跨平台和跨版本的。

import json
json.dumps(dict)

The json module is a good solution here. It has the advantages over pickle that it only produces plain text output, and is cross-platform and cross-version.

import json
json.dumps(dict)
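
下面是一个最小的示意,补全与原问题对应的“写入文件再读回”的完整往返(文件名 data.json 只是示例):

import json

data = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}

# 序列化成字符串并写入文件
with open('data.json', 'w') as f:
    f.write(json.dumps(data))

# 从文件读回并解析为字典
with open('data.json') as f:
    restored = json.loads(f.read())

print restored == data  # True(注意:JSON 的键总是字符串)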

回答 1

如果您的字典不太大,也许str + eval可以完成工作:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

如果源不受信任,则可以使用ast.literal_eval而不是eval来提高安全性。

If your dictionary isn’t too big maybe str + eval can do the work:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

You can use ast.literal_eval instead of eval for additional security if the source is untrusted.
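
如答案末尾所说,对不可信的来源应改用 ast.literal_eval;下面是一个最小的示意:

import ast

dict1 = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}
str1 = str(dict1)

# literal_eval 只解析 Python 字面量,不会执行任意表达式
dict2 = ast.literal_eval(str1)
print dict1 == dict2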


回答 2

我用json

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

I use json:

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

回答 3

使用 pickle 模块把它保存到磁盘,稍后再加载。

Use the pickle module to save it to disk and load later on.
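
这个答案没有附代码;下面是一个最小的示意(文件名 data.pkl 只是示例):

import pickle

data = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}

# 保存到磁盘
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# 之后再加载回来
with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)

print restored == data  # True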


回答 4

为什么不使用 Python 3 内置 ast 库中的 literal_eval 函数呢?使用 literal_eval 比 eval 更好:

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

会输出真正的字典:

{'key1': 'key1value', 'key2': 'key2value'}

而且,如果您想把字典转换成字符串,不妨使用 Python 的 str() 方法。

假设字典是:

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

这将像这样完成:

str(my_dict)

将打印:

"{'key1': 'key1value', 'key2': 'key2value'}"

就是这么简单。

Why not to use Python 3’s inbuilt ast library’s function literal_eval. It is better to use literal_eval instead of eval

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

will give output as actual Dictionary

{'key1': 'key1value', 'key2': 'key2value'}

And If you are asking to convert a Dictionary to a String then, How about using str() method of Python.

Suppose the dictionary is :

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

And this will be done like this :

str(my_dict)

Will Print :

"{'key1': 'key1value', 'key2': 'key2value'}"

This is as easy as you like.


回答 5

如果内容是中文:

import codecs
import json

fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"}, ensure_ascii=False, indent=2)
fout.write(dict_to_json + '\n')

If in Chinese:

import codecs
import json

fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"}, ensure_ascii=False, indent=2)
fout.write(dict_to_json + '\n')

回答 6

将字典转换成JSON(字符串)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

将为您提供:

{"name": "Don", "sur

将字符串转换成字典

back_to_mydict = json.loads(result) 

Convert dictionary into JSON (string)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

will get you:

{"name": "Don", "sur

Convert string into dictionary

back_to_mydict = json.loads(result) 

回答 7

我认为您应该考虑使用 shelve 模块,它提供由文件持久化支撑的类字典对象。它很容易用来替代“真正的”字典,因为它几乎透明地为您的程序提供了一个可以像字典一样使用的东西,而不需要显式地把它转换成字符串再写入文件(或者反过来从文件读回)。

主要区别在于:首次使用前需要先 open() 它,用完后要 close() 它(视所用的 writeback 选项,可能还需要 sync())。创建出来的任何“shelf”文件对象都可以把常规字典作为值,从而实现逻辑上的嵌套。

这是一个简单的例子:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

输出:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

I think you should consider using the shelve module which provides persistent file-backed dictionary-like objects. It’s easy to use in place of a “real” dictionary because it almost transparently provides your program with something that can be used just like a dictionary, without the need to explicitly convert it to a string and then write to a file (or vice-versa).

The main difference is needing to initially open() it before first use and then close() it when you’re done (and possibly sync()ing it, depending on the writeback option being used). Any “shelf” file objects created can contain regular dictionaries as values, allowing them to be logically nested.

Here’s a trivial example:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

Output:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

回答 8

如果您在意速度,请使用 ujson(UltraJSON),它与 json 拥有相同的 API:

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

If you care about the speed use ujson (UltraJSON), which has the same API as json:

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

回答 9

如果需要可读性,我会用 yaml(在我看来 JSON 和 XML 都谈不上可读);如果不需要人来阅读,我就用 pickle。

写入

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

回过头再读

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev

I use yaml for that if it needs to be readable (neither JSON nor XML are that IMHO), or if reading is not necessary I use pickle.

Write

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

Read back

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev
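
下面是答案提到的 yaml 方案的一个最小示意(需要安装 PyYAML;文件路径只是示例):

import yaml

x = dict(a=1, b=2)
y = dict(c=x, z=3)

# 写入(safe_dump 生成可读的纯文本)
with open('/var/tmp/dump.yaml', 'w') as f:
    f.write(yaml.safe_dump(y))

# 读回
with open('/var/tmp/dump.yaml') as f:
    rev = yaml.safe_load(f.read())

print rev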

如何使类可序列化为 JSON

问题:如何使类可序列化为 JSON

如何使Python类可序列化?

一个简单的类:

class FileItem:
    def __init__(self, fname):
        self.fname = fname

我应该怎么做才能获得输出:

>>> import json

>>> my_file = FileItem('/foo/bar')
>>> json.dumps(my_file)
TypeError: Object of type 'FileItem' is not JSON serializable

没有错误

How to make a Python class serializable?

A simple class:

class FileItem:
    def __init__(self, fname):
        self.fname = fname

What should I do to be able to get output of:

>>> import json

>>> my_file = FileItem('/foo/bar')
>>> json.dumps(my_file)
TypeError: Object of type 'FileItem' is not JSON serializable

Without the error


回答 0

您对预期的输出有想法吗?例如,这样可以吗?

>>> f  = FileItem("/foo/bar")
>>> magic(f)
'{"fname": "/foo/bar"}'

在这种情况下,您只需调用 json.dumps(f.__dict__) 即可。

如果您想要更多的自定义输出,则必须继承JSONEncoder并实现自己的自定义序列化。

有关一个简单的示例,请参见下文。

>>> from json import JSONEncoder
>>> class MyEncoder(JSONEncoder):
        def default(self, o):
            return o.__dict__    

>>> MyEncoder().encode(f)
'{"fname": "/foo/bar"}'

然后,将该类作为 cls 关键字参数传给 json.dumps() 方法:

json.dumps(f, cls=MyEncoder)

如果你也想解码,那么你需要给 JSONDecoder 类提供一个自定义的 object_hook。例如:

>>> def from_json(json_object):
        if 'fname' in json_object:
            return FileItem(json_object['fname'])
>>> f = JSONDecoder(object_hook = from_json).decode('{"fname": "/foo/bar"}')
>>> f
<__main__.FileItem object at 0x9337fac>
>>> 

Do you have an idea about the expected output? For e.g. will this do?

>>> f  = FileItem("/foo/bar")
>>> magic(f)
'{"fname": "/foo/bar"}'

In that case you can merely call json.dumps(f.__dict__).

If you want more customized output then you will have to subclass JSONEncoder and implement your own custom serialization.

For a trivial example, see below.

>>> from json import JSONEncoder
>>> class MyEncoder(JSONEncoder):
        def default(self, o):
            return o.__dict__    

>>> MyEncoder().encode(f)
'{"fname": "/foo/bar"}'

Then you pass this class into the json.dumps() method as cls kwarg:

json.dumps(f, cls=MyEncoder)

If you also want to decode then you’ll have to supply a custom object_hook to the JSONDecoder class. For e.g.

>>> def from_json(json_object):
        if 'fname' in json_object:
            return FileItem(json_object['fname'])
>>> f = JSONDecoder(object_hook = from_json).decode('{"fname": "/foo/bar"}')
>>> f
<__main__.FileItem object at 0x9337fac>
>>> 

回答 1

这是一个简单功能的简单解决方案:

.toJSON() 方法

与其让类本身可被 JSON 序列化,不如实现一个序列化方法:

import json

class Object:
    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__, 
            sort_keys=True, indent=4)

因此,您只需调用它即可序列化:

me = Object()
me.name = "Onur"
me.age = 35
me.dog = Object()
me.dog.name = "Apollo"

print(me.toJSON())

将输出:

{
    "age": 35,
    "dog": {
        "name": "Apollo"
    },
    "name": "Onur"
}

Here is a simple solution for a simple feature:

.toJSON() Method

Instead of a JSON serializable class, implement a serializer method:

import json

class Object:
    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__, 
            sort_keys=True, indent=4)

So you just call it to serialize:

me = Object()
me.name = "Onur"
me.age = 35
me.dog = Object()
me.dog.name = "Apollo"

print(me.toJSON())

will output:

{
    "age": 35,
    "dog": {
        "name": "Apollo"
    },
    "name": "Onur"
}

回答 2

对于更复杂的类,您可以考虑使用jsonpickle工具:

jsonpickle是一个Python库,用于将复杂的Python对象与JSON进行序列化和反序列化。

用于将Python编码为JSON的标准Python库(例如stdlib的json,simplejson和demjson)只能处理具有直接JSON等效项的Python原语(例如,字典,列表,字符串,整数等)。jsonpickle建立在这些库之上,并允许将更复杂的数据结构序列化为JSON。jsonpickle具有高度的可配置性和可扩展性,允许用户选择JSON后端并添加其他后端。

(链接到PyPi上的jsonpickle)

For more complex classes you could consider the tool jsonpickle:

jsonpickle is a Python library for serialization and deserialization of complex Python objects to and from JSON.

The standard Python libraries for encoding Python into JSON, such as the stdlib’s json, simplejson, and demjson, can only handle Python primitives that have a direct JSON equivalent (e.g. dicts, lists, strings, ints, etc.). jsonpickle builds on top of these libraries and allows more complex data structures to be serialized to JSON. jsonpickle is highly configurable and extendable–allowing the user to choose the JSON backend and add additional backends.

(link to jsonpickle on PyPi)
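
下面是 jsonpickle 常见用法的一个最小示意(需要 pip install jsonpickle;FileItem 沿用本问题中的示例类):

import jsonpickle

class FileItem(object):
    def __init__(self, fname):
        self.fname = fname

frozen = jsonpickle.encode(FileItem('/foo/bar'))
print(frozen)                       # 带有 "py/object" 类型信息的 JSON 字符串
thawed = jsonpickle.decode(frozen)  # 还原为 FileItem 实例
print(thawed.fname)                 # /foo/bar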


回答 3

大多数答案都需要修改对 json.dumps() 的调用,而这并不总是可行或可取的(例如,这个调用可能发生在框架组件内部)。

如果您希望能够原样调用 json.dumps(obj),那么一个简单的解决方案就是从dict继承:

class FileItem(dict):
    def __init__(self, fname):
        dict.__init__(self, fname=fname)

f = FileItem('tasks.txt')
json.dumps(f)  #No need to change anything here

如果您的类只是基本数据表示形式,则此方法有效,对于棘手的事情,您始终可以显式设置键。

Most of the answers involve changing the call to json.dumps(), which is not always possible or desirable (it may happen inside a framework component for example).

If you want to be able to call json.dumps(obj) as is, then a simple solution is inheriting from dict:

class FileItem(dict):
    def __init__(self, fname):
        dict.__init__(self, fname=fname)

f = FileItem('tasks.txt')
json.dumps(f)  #No need to change anything here

This works if your class is just basic data representation, for trickier things you can always set keys explicitly.
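
关于“显式设置键”以及从 JSON 读回,下面是一个最小的示意(from_json 是为说明而加的辅助方法,并非原答案的一部分):

import json

class FileItem(dict):
    def __init__(self, fname, size=0):
        dict.__init__(self, fname=fname, size=size)

    @classmethod
    def from_json(cls, s):
        d = json.loads(s)
        return cls(fname=d['fname'], size=d.get('size', 0))

f = FileItem('tasks.txt', size=120)
s = json.dumps(f)            # 例如 '{"fname": "tasks.txt", "size": 120}'
f2 = FileItem.from_json(s)
print(f2['fname'])           # tasks.txt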


回答 4

我喜欢Onur的答案,但会扩展为包括一个可选toJSON()方法,以使对象自行序列化:

def dumper(obj):
    try:
        return obj.toJSON()
    except:
        return obj.__dict__
print json.dumps(some_big_object, default=dumper, indent=2)

I like Onur’s answer but would expand to include an optional toJSON() method for objects to serialize themselves:

def dumper(obj):
    try:
        return obj.toJSON()
    except:
        return obj.__dict__
print json.dumps(some_big_object, default=dumper, indent=2)

回答 5

另一个选择是将JSON转储包装在其自己的类中:

import json

class FileItem:
    def __init__(self, fname):
        self.fname = fname

    def __repr__(self):
        return json.dumps(self.__dict__)

或者,更好的是,让 FileItem 类继承自 JsonSerializable 类:

import json

class JsonSerializable(object):
    def toJson(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.toJson()


class FileItem(JsonSerializable):
    def __init__(self, fname):
        self.fname = fname

测试:

>>> f = FileItem('/foo/bar')
>>> f.toJson()
'{"fname": "/foo/bar"}'
>>> f
'{"fname": "/foo/bar"}'
>>> str(f) # string coercion
'{"fname": "/foo/bar"}'

Another option is to wrap JSON dumping in its own class:

import json

class FileItem:
    def __init__(self, fname):
        self.fname = fname

    def __repr__(self):
        return json.dumps(self.__dict__)

Or, even better, subclassing FileItem class from a JsonSerializable class:

import json

class JsonSerializable(object):
    def toJson(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.toJson()


class FileItem(JsonSerializable):
    def __init__(self, fname):
        self.fname = fname

Testing:

>>> f = FileItem('/foo/bar')
>>> f.toJson()
'{"fname": "/foo/bar"}'
>>> f
'{"fname": "/foo/bar"}'
>>> str(f) # string coercion
'{"fname": "/foo/bar"}'

回答 6

只需将to_json方法添加到您的类中,如下所示:

def to_json(self):
  return self.message # or how you want it to be serialized

并将这段代码(来自此答案)添加到所有代码的最前面:

from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder().default
JSONEncoder.default = _default

它会在导入时对 json 模块打 monkey patch,使 JSONEncoder.default() 自动检查特殊的 to_json() 方法,并在找到时用它来编码对象。

就像 Onur 说的那样,但这次您不必更新项目中的每一处 json.dumps()。

Just add to_json method to your class like this:

def to_json(self):
  return self.message # or how you want it to be serialized

And add this code (from this answer), to somewhere at the top of everything:

from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder().default
JSONEncoder.default = _default

This will monkey-patch json module when it’s imported so JSONEncoder.default() automatically checks for a special “to_json()” method and uses it to encode the object if found.

Just like Onur said, but this time you don’t have to update every json.dumps() in your project.


回答 7

前几天我遇到了这个问题,于是为 Python 对象实现了一个更通用的 Encoder 版本,它可以处理嵌套对象和继承的字段:

import json
import inspect

class ObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "to_json"):
            return self.default(obj.to_json())
        elif hasattr(obj, "__dict__"):
            d = dict(
                (key, value)
                for key, value in inspect.getmembers(obj)
                if not key.startswith("__")
                and not inspect.isabstract(value)
                and not inspect.isbuiltin(value)
                and not inspect.isfunction(value)
                and not inspect.isgenerator(value)
                and not inspect.isgeneratorfunction(value)
                and not inspect.ismethod(value)
                and not inspect.ismethoddescriptor(value)
                and not inspect.isroutine(value)
            )
            return self.default(d)
        return obj

例:

class C(object):
    c = "NO"
    def to_json(self):
        return {"c": "YES"}

class B(object):
    b = "B"
    i = "I"
    def __init__(self, y):
        self.y = y

    def f(self):
        print "f"

class A(B):
    a = "A"
    def __init__(self):
        self.b = [{"ab": B("y")}]
        self.c = C()

print json.dumps(A(), cls=ObjectEncoder, indent=2, sort_keys=True)

结果:

{
  "a": "A", 
  "b": [
    {
      "ab": {
        "b": "B", 
        "i": "I", 
        "y": "y"
      }
    }
  ], 
  "c": {
    "c": "YES"
  }, 
  "i": "I"
}

I came across this problem the other day and implemented a more general version of an Encoder for Python objects that can handle nested objects and inherited fields:

import json
import inspect

class ObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "to_json"):
            return self.default(obj.to_json())
        elif hasattr(obj, "__dict__"):
            d = dict(
                (key, value)
                for key, value in inspect.getmembers(obj)
                if not key.startswith("__")
                and not inspect.isabstract(value)
                and not inspect.isbuiltin(value)
                and not inspect.isfunction(value)
                and not inspect.isgenerator(value)
                and not inspect.isgeneratorfunction(value)
                and not inspect.ismethod(value)
                and not inspect.ismethoddescriptor(value)
                and not inspect.isroutine(value)
            )
            return self.default(d)
        return obj

Example:

class C(object):
    c = "NO"
    def to_json(self):
        return {"c": "YES"}

class B(object):
    b = "B"
    i = "I"
    def __init__(self, y):
        self.y = y

    def f(self):
        print "f"

class A(B):
    a = "A"
    def __init__(self):
        self.b = [{"ab": B("y")}]
        self.c = C()

print json.dumps(A(), cls=ObjectEncoder, indent=2, sort_keys=True)

Result:

{
  "a": "A", 
  "b": [
    {
      "ab": {
        "b": "B", 
        "i": "I", 
        "y": "y"
      }
    }
  ], 
  "c": {
    "c": "YES"
  }, 
  "i": "I"
}

回答 8

如果您使用的是Python3.5 +,则可以使用jsons。它将把您的对象(及其所有属性递归地)转换成字典。

import jsons

a_dict = jsons.dump(your_object)

或者,如果您想要一个字符串:

a_str = jsons.dumps(your_object)

或者,如果您的类实现了 jsons.JsonSerializable:

a_dict = your_object.json

If you’re using Python3.5+, you could use jsons. It will convert your object (and all its attributes recursively) to a dict.

import jsons

a_dict = jsons.dump(your_object)

Or if you wanted a string:

a_str = jsons.dumps(your_object)

Or if your class implemented jsons.JsonSerializable:

a_dict = your_object.json

回答 9

import simplejson

class User(object):
    def __init__(self, name, mail):
        self.name = name
        self.mail = mail

    def _asdict(self):
        return self.__dict__

print(simplejson.dumps(User('alice', 'alice@mail.com')))

如果使用标准json,则需要定义一个default函数

import json
def default(o):
    return o._asdict()

print(json.dumps(User('alice', 'alice@mail.com'), default=default))
import simplejson

class User(object):
    def __init__(self, name, mail):
        self.name = name
        self.mail = mail

    def _asdict(self):
        return self.__dict__

print(simplejson.dumps(User('alice', 'alice@mail.com')))

if use standard json, u need to define a default function

import json
def default(o):
    return o._asdict()

print(json.dumps(User('alice', 'alice@mail.com'), default=default))

回答 10

json 能打印的对象类型有限,而 jsonpickle(可能需要 pip install jsonpickle)的限制则是不能缩进文本。如果您想查看一个其类无法修改的对象的内容,我还没找到比下面更直接的方法:

 import json
 import jsonpickle
 ...
 print  json.dumps(json.loads(jsonpickle.encode(object)), indent=2)

注意:即便如此,它们仍然无法打印对象的方法。

json is limited in terms of objects it can print, and jsonpickle (you may need a pip install jsonpickle) is limited in terms it can’t indent text. If you would like to inspect the contents of an object whose class you can’t change, I still couldn’t find a straighter way than:

 import json
 import jsonpickle
 ...
 print  json.dumps(json.loads(jsonpickle.encode(object)), indent=2)

Note: that still they can’t print the object methods.


回答 11

此类可以解决问题,它将对象转换为标准json。

import json


class Serializer(object):
    @staticmethod
    def serialize(object):
        return json.dumps(object, default=lambda o: o.__dict__.values()[0])

用法:

Serializer.serialize(my_object)

适用于 python2.7 和 python3。

This class can do the trick, it converts object to standard json .

import json


class Serializer(object):
    @staticmethod
    def serialize(object):
        return json.dumps(object, default=lambda o: o.__dict__.values()[0])

usage:

Serializer.serialize(my_object)

working in python2.7 and python3.


回答 12

import json

class Foo(object):
    def __init__(self):
        self.bar = 'baz'
        self._qux = 'flub'

    def somemethod(self):
        pass

def default(instance):
    return {k: v
            for k, v in vars(instance).items()
            if not str(k).startswith('_')}

json_foo = json.dumps(Foo(), default=default)
assert '{"bar": "baz"}' == json_foo

print(json_foo)
import json

class Foo(object):
    def __init__(self):
        self.bar = 'baz'
        self._qux = 'flub'

    def somemethod(self):
        pass

def default(instance):
    return {k: v
            for k, v in vars(instance).items()
            if not str(k).startswith('_')}

json_foo = json.dumps(Foo(), default=default)
assert '{"bar": "baz"}' == json_foo

print(json_foo)

回答 13

jaraco给出了一个非常简洁的答案。我需要修复一些小问题,但这可行:

# Your custom class
class MyCustom(object):
    def __json__(self):
        return {
            'a': self.a,
            'b': self.b,
            '__python__': 'mymodule.submodule:MyCustom.from_json',
        }

    to_json = __json__  # supported by simplejson

    @classmethod
    def from_json(cls, json):
        obj = cls()
        obj.a = json['a']
        obj.b = json['b']
        return obj

# Dumping and loading
import simplejson

obj = MyCustom()
obj.a = 3
obj.b = 4

json = simplejson.dumps(obj, for_json=True)

# Two-step loading
obj2_dict = simplejson.loads(json)
obj2 = MyCustom.from_json(obj2_dict)

# Make sure we have the correct thing
assert isinstance(obj2, MyCustom)
assert obj2.__dict__ == obj.__dict__

请注意,我们需要两个步骤进行加载。目前,该__python__属性尚未使用。

这有多普遍?

使用AlJohri的方法,我检查了方法的普及程度:

序列化(Python-> JSON):

反序列化(JSON-> Python):

jaraco gave a pretty neat answer. I needed to fix some minor things, but this works:

Code

# Your custom class
class MyCustom(object):
    def __json__(self):
        return {
            'a': self.a,
            'b': self.b,
            '__python__': 'mymodule.submodule:MyCustom.from_json',
        }

    to_json = __json__  # supported by simplejson

    @classmethod
    def from_json(cls, json):
        obj = cls()
        obj.a = json['a']
        obj.b = json['b']
        return obj

# Dumping and loading
import simplejson

obj = MyCustom()
obj.a = 3
obj.b = 4

json = simplejson.dumps(obj, for_json=True)

# Two-step loading
obj2_dict = simplejson.loads(json)
obj2 = MyCustom.from_json(obj2_dict)

# Make sure we have the correct thing
assert isinstance(obj2, MyCustom)
assert obj2.__dict__ == obj.__dict__

Note that we need two steps for loading. For now, the __python__ property is not used.

How common is this?

Using the method of AlJohri, I check popularity of approaches:

Serialization (Python -> JSON):

Deserialization (JSON -> Python):


回答 14

这对我来说效果很好:

class JsonSerializable(object):

    def serialize(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.serialize()

    @staticmethod
    def dumper(obj):
        if "serialize" in dir(obj):
            return obj.serialize()

        return obj.__dict__

然后

class FileItem(JsonSerializable):
    ...

log.debug(json.dumps(<my object>, default=JsonSerializable.dumper, indent=2))

This has worked well for me:

class JsonSerializable(object):

    def serialize(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.serialize()

    @staticmethod
    def dumper(obj):
        if "serialize" in dir(obj):
            return obj.serialize()

        return obj.__dict__

and then

class FileItem(JsonSerializable):
    ...

and

log.debug(json.dumps(<my object>, default=JsonSerializable.dumper, indent=2))

回答 15

如果您不介意为其安装软件包,则可以使用json-tricks

pip install json-tricks

在此之后，你只需要从json_tricks（而不是json）导入dump(s)，它通常就能直接工作：

from json_tricks import dumps
json_str = dumps(cls_instance, indent=4)

这会给出

{
        "__instance_type__": [
                "module_name.test_class",
                "MyTestCls"
        ],
        "attributes": {
                "attr": "val",
                "dct_attr": {
                        "hello": 42
                }
        }
}

基本上就是这样!


总的来说，这种方法效果很好。也有一些例外情况，例如__new__中发生了特殊操作，或者用到了更多元类魔法时。

显然,加载也可以(否则有什么意义):

from json_tricks import loads
json_str = loads(json_str)

这确实假定module_name.test_class.MyTestCls可以导入并且没有以不兼容的方式进行更改。您将获得一个实例,而不是字典或其他内容,它应该与您转储的副本相同。

如果要自定义某些东西的序列化方法,可以向类中添加特殊方法,如下所示:

class CustomEncodeCls:
        def __init__(self):
                self.relevant = 42
                self.irrelevant = 37

        def __json_encode__(self):
                # should return primitive, serializable types like dict, list, int, string, float...
                return {'relevant': self.relevant}

        def __json_decode__(self, **attrs):
                # should initialize all properties; note that __init__ is not called implicitly
                self.relevant = attrs['relevant']
                self.irrelevant = 12

在这个例子中，它只序列化了一部分属性。

作为额外的好处，您还可以获得numpy数组、日期和时间、有序映射的（反）序列化，以及在json中包含注释的能力。

免责声明:我创建了json_tricks,因为我和您有同样的问题。

If you don’t mind installing a package for it, you can use json-tricks:

pip install json-tricks

After that you just need to import dump(s) from json_tricks instead of json, and it’ll usually work:

from json_tricks import dumps
json_str = dumps(cls_instance, indent=4)

which’ll give

{
        "__instance_type__": [
                "module_name.test_class",
                "MyTestCls"
        ],
        "attributes": {
                "attr": "val",
                "dct_attr": {
                        "hello": 42
                }
        }
}

And that’s basically it!


This will work great in general. There are some exceptions, e.g. if special things happen in __new__, or more metaclass magic is going on.

Obviously loading also works (otherwise what’s the point):

from json_tricks import loads
json_str = loads(json_str)

This does assume that module_name.test_class.MyTestCls can be imported and hasn’t changed in non-compatible ways. You’ll get back an instance, not some dictionary or something, and it should be an identical copy to the one you dumped.

If you want to customize how something gets (de)serialized, you can add special methods to your class, like so:

class CustomEncodeCls:
        def __init__(self):
                self.relevant = 42
                self.irrelevant = 37

        def __json_encode__(self):
                # should return primitive, serializable types like dict, list, int, string, float...
                return {'relevant': self.relevant}

        def __json_decode__(self, **attrs):
                # should initialize all properties; note that __init__ is not called implicitly
                self.relevant = attrs['relevant']
                self.irrelevant = 12

which serializes only part of the attributes, as an example.

And as a free bonus, you get (de)serialization of numpy arrays, date & times, ordered maps, as well as the ability to include comments in json.

Disclaimer: I created json_tricks, because I had the same problem as you.
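
A minimal round-trip sketch for the custom-encode class above; this assumes json_tricks is installed and that CustomEncodeCls lives in an importable module:

from json_tricks import dumps, loads

original = CustomEncodeCls()
json_str = dumps(original)            # only 'relevant' is written, per __json_encode__
restored = loads(json_str)

assert restored.relevant == 42
assert restored.irrelevant == 12      # reset inside __json_decode__, not round-tripped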


回答 16

jsonweb似乎是对我最好的解决方案。参见http://www.jsonweb.info/en/latest/

from jsonweb.encode import to_object, dumper

@to_object()
class DataModel(object):
    def __init__(self, id, value):
        self.id = id
        self.value = value

>>> data = DataModel(5, "foo")
>>> dumper(data)
'{"__type__": "DataModel", "id": 5, "value": "foo"}'

jsonweb seems to be the best solution for me. See http://www.jsonweb.info/en/latest/

from jsonweb.encode import to_object, dumper

@to_object()
class DataModel(object):
    def __init__(self, id, value):
        self.id = id
        self.value = value

>>> data = DataModel(5, "foo")
>>> dumper(data)
'{"__type__": "DataModel", "id": 5, "value": "foo"}'

回答 17

这是我的3美分……
这演示了对一个树状Python对象进行显式JSON序列化。
注意：如果您真的需要这样的代码，可以使用Twisted的FilePath类。

import json, sys, os

class File:
    def __init__(self, path):
        self.path = path

    def isdir(self):
        return os.path.isdir(self.path)

    def isfile(self):
        return os.path.isfile(self.path)

    def children(self):        
        return [File(os.path.join(self.path, f)) 
                for f in os.listdir(self.path)]

    def getsize(self):        
        return os.path.getsize(self.path)

    def getModificationTime(self):
        return os.path.getmtime(self.path)

def _default(o):
    d = {}
    d['path'] = o.path
    d['isFile'] = o.isfile()
    d['isDir'] = o.isdir()
    d['mtime'] = int(o.getModificationTime())
    d['size'] = o.getsize() if o.isfile() else 0
    if o.isdir(): d['children'] = o.children()
    return d

folder = os.path.abspath('.')
json.dump(File(folder), sys.stdout, default=_default)

Here is my 3 cents …
This demonstrates explicit json serialization for a tree-like python object.
Note: If you actually wanted some code like this you could use the twisted FilePath class.

import json, sys, os

class File:
    def __init__(self, path):
        self.path = path

    def isdir(self):
        return os.path.isdir(self.path)

    def isfile(self):
        return os.path.isfile(self.path)

    def children(self):        
        return [File(os.path.join(self.path, f)) 
                for f in os.listdir(self.path)]

    def getsize(self):        
        return os.path.getsize(self.path)

    def getModificationTime(self):
        return os.path.getmtime(self.path)

def _default(o):
    d = {}
    d['path'] = o.path
    d['isFile'] = o.isfile()
    d['isDir'] = o.isdir()
    d['mtime'] = int(o.getModificationTime())
    d['size'] = o.getsize() if o.isfile() else 0
    if o.isdir(): d['children'] = o.children()
    return d

folder = os.path.abspath('.')
json.dump(File(folder), sys.stdout, default=_default)

回答 18

当我尝试将Peewee的模型存储到PostgreSQL的JSONField中时，我遇到了这个问题。

经过一段时间的努力,这是一般的解决方案。

我的解决方案的关键是浏览Python的源代码,并意识到代码文档(在此描述)已经解释了如何扩展现有的json.dumps以支持其他数据类型。

假设您当前有一个模型,其中包含一些无法序列化为JSON的字段,并且包含JSON字段的模型最初看起来像这样:

class SomeClass(Model):
    json_field = JSONField()

只需这样定义一个自定义JSONEncoder

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SomeTypeUnsupportedByJsonDumps):
            return < whatever value you want >
        return json.JSONEncoder.default(self, obj)

    @staticmethod
    def json_dumper(obj):
        return json.dumps(obj, cls=CustomJsonEncoder)

然后像下面这样在JSONField中使用它：

class SomeClass(Model):
    json_field = JSONField(dumps=CustomJsonEncoder.json_dumper)

关键在于上面的default(self, obj)方法。对于从Python收到的每一个... is not JSON serializable报错，只需添加相应代码来处理这种无法序列化为JSON的类型（例如Enum或datetime）。

例如，下面是我如何支持一个继承自Enum的类：

class TransactionType(Enum):
    CURRENT = 1
    STACKED = 2

class CustomJsonEncoder(json.JSONEncoder):
    # as above, plus a branch for the Enum subclass
    def default(self, obj):
        if isinstance(obj, TransactionType):
            return obj.value
        return json.JSONEncoder.default(self, obj)

最后,使用上述实现的代码,您可以将任何Peewee模型转换为JSON可序列化的对象,如下所示:

peewee_model = WhateverPeeweeModel()
new_model = SomeClass()
new_model.json_field = model_to_dict(peewee_model)

尽管上面的代码（在某种程度上）特定于Peewee，但我认为：

  1. 它通常也适用于其他ORM（例如Django等）
  2. 另外，如果您理解json.dumps的工作原理，该解决方案通常也适用于不使用ORM的普通Python

如有任何疑问,请发表在评论部分。谢谢!

I ran into this problem when I tried to store Peewee’s model into PostgreSQL JSONField.

After struggling for a while, here’s the general solution.

The key to my solution is going through Python’s source code and realizing that the code documentation (described here) already explains how to extend the existing json.dumps to support other data types.

Suppose you current have a model that contains some fields that are not serializable to JSON and the model that contains the JSON field originally looks like this:

class SomeClass(Model):
    json_field = JSONField()

Just define a custom JSONEncoder like this:

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SomeTypeUnsupportedByJsonDumps):
            return < whatever value you want >
        return json.JSONEncoder.default(self, obj)

    @staticmethod
    def json_dumper(obj):
        return json.dumps(obj, cls=CustomJsonEncoder)

And then just use it in your JSONField like below:

class SomeClass(Model):
    json_field = JSONField(dumps=CustomJsonEncoder.json_dumper)

The key is the default(self, obj) method above. For every single ... is not JSON serializable complaint you receive from Python, just add code to handle the unserializable-to-JSON type (such as Enum or datetime)

For example, here’s how I support a class inheriting from Enum:

class TransactionType(Enum):
    CURRENT = 1
    STACKED = 2

class CustomJsonEncoder(json.JSONEncoder):
    # as above, plus a branch for the Enum subclass
    def default(self, obj):
        if isinstance(obj, TransactionType):
            return obj.value
        return json.JSONEncoder.default(self, obj)

Finally, with the code implemented like above, you can just convert any Peewee models to be a JSON-seriazable object like below:

peewee_model = WhateverPeeweeModel()
new_model = SomeClass()
new_model.json_field = model_to_dict(peewee_model)

Though the code above is (somewhat) specific to Peewee, I think:

  1. It's applicable to other ORMs (Django, etc.) in general
  2. Also, if you understand how json.dumps works, this solution works with plain Python (sans ORM) in general too

Any questions, please post in the comments section. Thanks!
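
For completeness, a hedged sketch of how the same default() hook extends to datetime, and how the encoder plugs straight into json.dumps (the example values are mine, not from the answer):

import json
from datetime import datetime

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()            # datetime -> ISO 8601 string
        return json.JSONEncoder.default(self, obj)

record = {"created": datetime(2020, 1, 1, 12, 0), "greet": "Hello"}
print(json.dumps(record, cls=CustomJsonEncoder))
# {"created": "2020-01-01T12:00:00", "greet": "Hello"}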


回答 19

此函数使用递归遍历字典的每个部分，然后对非内置类型的对象调用其repr()方法。

def sterilize(obj):
    object_type = type(obj)
    if isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif object_type in (list, tuple):
        return [sterilize(v) for v in obj]
    elif object_type in (str, int, bool):
        return obj
    else:
        return obj.__repr__()

This function uses recursion to iterate over every part of the dictionary and then calls the repr() methods of classes that are not built-in types.

def sterilize(obj):
    object_type = type(obj)
    if isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif object_type in (list, tuple):
        return [sterilize(v) for v in obj]
    elif object_type in (str, int, bool):
        return obj
    else:
        return obj.__repr__()
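
A short usage sketch with my own example class, showing that values of non-built-in types end up as their repr() strings in the dumped JSON:

import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

data = {"points": [Point(1, 2), Point(3, 4)], "label": "demo"}
print(json.dumps(sterilize(data)))
# {"points": ["Point(1, 2)", "Point(3, 4)"], "label": "demo"}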

回答 20

这是一个小型库，它可以将一个对象连同其所有子对象序列化为JSON，并能将其解析回来：

https://github.com/Toubs/PyJSONSerialization/

This is a small library that serializes an object with all its children to JSON and also parses it back:

https://github.com/Toubs/PyJSONSerialization/


回答 21

我想出了自己的解决方案。使用此方法，传递任何文档（dict、list、ObjectId等）进行序列化。

def getSerializable(doc):
    # check if it's a list
    if isinstance(doc, list):
        for i, val in enumerate(doc):
            doc[i] = getSerializable(doc[i])
        return doc

    # check if it's a dict
    if isinstance(doc, dict):
        for key in doc.keys():
            doc[key] = getSerializable(doc[key])
        return doc

    # Process ObjectId
    if isinstance(doc, ObjectId):
        doc = str(doc)
        return doc

    # Use any other custom serializting stuff here...

    # For the rest of stuff
    return doc

I came up with my own solution. Use this method, pass any document (dict,list, ObjectId etc) to serialize.

def getSerializable(doc):
    # check if it's a list
    if isinstance(doc, list):
        for i, val in enumerate(doc):
            doc[i] = getSerializable(doc[i])
        return doc

    # check if it's a dict
    if isinstance(doc, dict):
        for key in doc.keys():
            doc[key] = getSerializable(doc[key])
        return doc

    # Process ObjectId
    if isinstance(doc, ObjectId):
        doc = str(doc)
        return doc

    # Use any other custom serializting stuff here...

    # For the rest of stuff
    return doc
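
A usage sketch with a MongoDB-style document; it assumes bson.ObjectId (shipped with pymongo), and any other value simply passes through unchanged:

import json
from bson import ObjectId

doc = {
    "_id": ObjectId(),
    "tags": ["a", "b"],
    "nested": {"ref": ObjectId()},
}
print(json.dumps(getSerializable(doc)))   # both ObjectIds are now plain strings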

回答 22

我选择使用装饰器解决datetime对象序列化问题。这是我的代码:

#myjson.py
#Author: jmooremcc 7/16/2017

import json
from datetime import datetime, date, time, timedelta
"""
This module uses decorators to serialize date objects using json
The filename is myjson.py
In another module you simply add the following import statement:
    from myjson import json

json.dumps and json.dump will then correctly serialize datetime and date 
objects
"""

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, (datetime, date)):
        serial = str(obj)
        return serial
    raise TypeError ("Type %s not serializable" % type(obj))


def FixDumps(fn):
    def hook(obj):
        return fn(obj, default=json_serial)

    return hook

def FixDump(fn):
    def hook(obj, fp):
        return fn(obj,fp, default=json_serial)

    return hook


json.dumps=FixDumps(json.dumps)
json.dump=FixDump(json.dump)


if __name__=="__main__":
    today=datetime.now()
    data={'atime':today, 'greet':'Hello'}
    json_str = json.dumps(data)
    print(json_str)

通过导入上述模块，我的其他模块就能以常规方式（无需指定default关键字参数）使用json来序列化包含datetime对象的数据。datetime序列化代码会被json.dumps和json.dump自动调用。

I chose to use decorators to solve the datetime object serialization problem. Here is my code:

#myjson.py
#Author: jmooremcc 7/16/2017

import json
from datetime import datetime, date, time, timedelta
"""
This module uses decorators to serialize date objects using json
The filename is myjson.py
In another module you simply add the following import statement:
    from myjson import json

json.dumps and json.dump will then correctly serialize datetime and date 
objects
"""

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, (datetime, date)):
        serial = str(obj)
        return serial
    raise TypeError ("Type %s not serializable" % type(obj))


def FixDumps(fn):
    def hook(obj):
        return fn(obj, default=json_serial)

    return hook

def FixDump(fn):
    def hook(obj, fp):
        return fn(obj,fp, default=json_serial)

    return hook


json.dumps=FixDumps(json.dumps)
json.dump=FixDump(json.dump)


if __name__=="__main__":
    today=datetime.now()
    data={'atime':today, 'greet':'Hello'}
    json_str = json.dumps(data)
    print(json_str)

By importing the above module, my other modules use json in a normal way (without specifying the default keyword) to serialize data that contains date time objects. The datetime serializer code is automatically called for json.dumps and json.dump.
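
A brief usage sketch from another module (assuming the file above is saved as myjson.py somewhere on the import path):

# another_module.py (hypothetical)
from myjson import json              # the patched json from myjson.py
from datetime import datetime

payload = {"atime": datetime.now(), "greet": "Hello"}
print(json.dumps(payload))           # the datetime is handled by json_serial automatically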


回答 23

我最喜欢Lost Koder的方法。尝试序列化其成员/方法无法序列化的更复杂对象时,我遇到了问题。这是适用于更多对象的实现:

class Serializer(object):
    @staticmethod
    def serialize(obj):
        def check(o):
            for k, v in o.__dict__.items():
                try:
                    _ = json.dumps(v)
                    o.__dict__[k] = v
                except TypeError:
                    o.__dict__[k] = str(v)
            return o
        return json.dumps(check(obj).__dict__, indent=2)

I liked Lost Koder's method the most. I ran into issues when trying to serialize more complex objects whose members/methods aren't serializable. Here's my implementation that works on more objects:

class Serializer(object):
    @staticmethod
    def serialize(obj):
        def check(o):
            for k, v in o.__dict__.items():
                try:
                    _ = json.dumps(v)
                    o.__dict__[k] = v
                except TypeError:
                    o.__dict__[k] = str(v)
            return o
        return json.dumps(check(obj).__dict__, indent=2)

回答 24

如果您能够安装软件包，我建议您尝试dill，它在我的项目中运行良好。这个包的优点是它具有与pickle相同的接口，因此如果您的项目中已经在使用pickle，只需简单地换成dill，看看脚本是否还能运行，而无需更改任何其他代码。所以这是一个尝试成本非常低的解决方案！

（完全的“反披露”：我与dill项目毫无关系，也从未为其做过贡献。）

安装软件包:

pip install dill

然后，编辑您的代码，改为导入dill而不是pickle：

# import pickle
import dill as pickle

运行您的脚本，看看是否有效。（如果有效，您之后可能需要清理一下代码，不要再用dill遮蔽pickle这个模块名！）

以下是来自dill项目页面的说明，列出了dill可以和不能序列化的数据类型的一些细节：

dill 可以序列化（pickle）以下标准类型：

none、type、bool、int、long、float、complex、str、unicode、tuple、list、dict、file、buffer、builtin、新式和旧式类、新式和旧式类的实例、set、frozenset、array、函数、异常

dill 还可以序列化一些更“另类”的标准类型：

带有yield的函数、嵌套函数、lambda、cell、method、unboundmethod、module、code、methodwrapper、dictproxy、methoddescriptor、getsetdescriptor、memberdescriptor、wrapperdescriptor、xrange、slice、notimplemented、ellipsis、quit

dill 尚不能序列化这些标准类型：

frame、generator、traceback

If you are able to install a package, I’d recommend trying dill, which worked just fine for my project. A nice thing about this package is that it has the same interface as pickle, so if you have already been using pickle in your project you can simply substitute in dill and see if the script runs, without changing any code. So it is a very cheap solution to try!

(Full anti-disclosure: I am in no way affiliated with and have never contributed to the dill project.)

Install the package:

pip install dill

Then edit your code to import dill instead of pickle:

# import pickle
import dill as pickle

Run your script and see if it works. (If it does you may want to clean up your code so that you are no longer shadowing the pickle module name!)

Some specifics on datatypes that dill can and cannot serialize, from the project page:

dill can pickle the following standard types:

none, type, bool, int, long, float, complex, str, unicode, tuple, list, dict, file, buffer, builtin, both old and new style classes, instances of old and new style classes, set, frozenset, array, functions, exceptions

dill can also pickle more ‘exotic’ standard types:

functions with yields, nested functions, lambdas, cell, method, unboundmethod, module, code, methodwrapper, dictproxy, methoddescriptor, getsetdescriptor, memberdescriptor, wrapperdescriptor, xrange, slice, notimplemented, ellipsis, quit

dill cannot yet pickle these standard types:

frame, generator, traceback
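
A quick round-trip sketch with the kind of object dill handles but plain pickle typically rejects (a lambda), assuming dill is installed:

import dill as pickle

square = lambda x: x * x
blob = pickle.dumps(square)           # the standard pickle module would fail on a lambda
restored = pickle.loads(blob)
assert restored(4) == 16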


回答 25

我发现这里没有人提到序列化的版本控制或向后兼容，所以我发布一下我已经使用了一段时间的解决方案。我可能还有很多要学的地方，尤其是Java和Javascript社区在这方面可能比我更成熟，但还是分享如下：

https://gist.github.com/andy-d/b7878d0044a4242c0498ed6d67fd50fe

I see no mention here of serial versioning or backcompat, so I will post my solution which I’ve been using for a bit. I probably have a lot more to learn; specifically, Java and Javascript are probably more mature than me here, but here goes:

https://gist.github.com/andy-d/b7878d0044a4242c0498ed6d67fd50fe


回答 26

要添加另一个选项:您可以使用attrs包和asdict方法。

class ObjectEncoder(JSONEncoder):
    def default(self, o):
        return attr.asdict(o)

json.dumps(objects, cls=ObjectEncoder)

并转换回来

def from_json(o):
    if '_obj_name' in o:
        type_ = o['_obj_name']
        del o['_obj_name']
        return globals()[type_](**o)
    else:
        return o

data = JSONDecoder(object_hook=from_json).decode(data)

类看起来像这样

@attr.s
class Foo(object):
    x = attr.ib()
    _obj_name = attr.ib(init=False, default='Foo')

To add another option: You can use the attrs package and the asdict method.

class ObjectEncoder(JSONEncoder):
    def default(self, o):
        return attr.asdict(o)

json.dumps(objects, cls=ObjectEncoder)

and to convert back

def from_json(o):
    if '_obj_name' in o:
        type_ = o['_obj_name']
        del o['_obj_name']
        return globals()[type_](**o)
    else:
        return o

data = JSONDecoder(object_hook=from_json).decode(data)

class looks like this

@attr.s
class Foo(object):
    x = attr.ib()
    _obj_name = attr.ib(init=False, default='Foo')
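
A hedged round-trip sketch tying the three snippets above together (imports added; Foo, ObjectEncoder and from_json are as defined above and must live at module scope so globals() can find the class):

import json
from json import JSONDecoder

foo = Foo(42)
data = json.dumps(foo, cls=ObjectEncoder)              # '{"x": 42, "_obj_name": "Foo"}'
restored = JSONDecoder(object_hook=from_json).decode(data)
assert restored == Foo(42)                             # attrs generates __eq__ for us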

回答 27

除了Onur的答案外，您可能还需要像下面这样处理datetime类型
（以处理 'datetime.datetime' object has no attribute '__dict__' 这一异常）。

def datetime_option(value):
    if isinstance(value, datetime.date):
        return value.timestamp()
    else:
        return value.__dict__

用法:

def toJSON(self):
    return json.dumps(self, default=datetime_option, sort_keys=True, indent=4)

In addition to the Onur’s answer, You possibly want to deal with datetime type like below.
(in order to handle the ‘datetime.datetime’ object has no attribute ‘__dict__’ exception.)

def datetime_option(value):
    if isinstance(value, datetime.date):
        return value.timestamp()
    else:
        return value.__dict__

Usage:

def toJSON(self):
    return json.dumps(self, default=datetime_option, sort_keys=True, indent=4)

回答 28

首先，我们需要让对象符合JSON规范，这样才能用标准json模块将其转储。我是这样做的：

def serialize(o):
    if isinstance(o, dict):
        return {k:serialize(v) for k,v in o.items()}
    if isinstance(o, list):
        return [serialize(e) for e in o]
    if isinstance(o, bytes):
        return o.decode("utf-8")
    return o

First we need to make our object JSON-compliant, so we can dump it using the standard JSON module. I did it this way:

def serialize(o):
    if isinstance(o, dict):
        return {k:serialize(v) for k,v in o.items()}
    if isinstance(o, list):
        return [serialize(e) for e in o]
    if isinstance(o, bytes):
        return o.decode("utf-8")
    return o
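
A brief usage sketch (my own sample data) showing the bytes-to-str pass before handing the result to json.dumps:

import json

raw = {"name": b"payload", "chunks": [b"ab", "plain"]}
print(json.dumps(serialize(raw)))
# {"name": "payload", "chunks": ["ab", "plain"]}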

回答 29

基于Quinten Cabo答案

def sterilize(obj):
    if type(obj) in (str, float, int, bool, type(None)):
        return obj
    elif isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return [sterilize(v) for v in obj]
    elif hasattr(obj, '__dict__'):
        return {k: sterilize(v) for k, v in obj.__dict__.items() if k not in ['__module__', '__dict__', '__weakref__', '__doc__']}
    else:
        return repr(obj)

区别在于：

  1. 适用于任何可迭代对象，而不仅仅是list和tuple（例如也适用于NumPy数组等）。
  2. 适用于动态类型（即包含__dict__的对象）。
  3. 保留原生类型float和None，因此它们不会被转换成字符串。

处理__slots__、既可迭代又带有成员的类、既是字典又带有成员的类等情况，留作读者练习。

Building on Quinten Cabo‘s answer:

def sterilize(obj):
    if type(obj) in (str, float, int, bool, type(None)):
        return obj
    elif isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return [sterilize(v) for v in obj]
    elif hasattr(obj, '__dict__'):
        return {k: sterilize(v) for k, v in obj.__dict__.items() if k not in ['__module__', '__dict__', '__weakref__', '__doc__']}
    else:
        return repr(obj)

The differences are

  1. Works for any iterable instead of just list and tuple (it works for NumPy arrays, etc.)
  2. Works for dynamic types (ones that contain a __dict__).
  3. Includes native types float and None so they don’t get converted to string.

Left as an exercise to the reader is to handle __slots__, classes that are both iterable and have members, classes that are dictionaries and also have members, etc.


Flatbuffers-FlatBuffers:内存效率高的序列化库

Flatbuffers

Flatbuffers是一个跨平台的序列化库，旨在实现最高的内存效率。它允许您直接访问序列化数据，而无需先对其进行解析/解包，同时仍具有很好的向前/向后兼容性。

请访问我们的landing page来浏览我们的文档。

支持的操作系统

  • Windows
  • MacOS X
  • Linux
  • Android
  • 以及任何带有较新C++编译器的其他操作系统

支持的编程语言

  • C++
  • C#
  • C
  • Go
  • Java
  • JavaScript
  • PHP
  • Python
  • Rust

还有更多语言的支持正在开发中。

贡献

要为这个项目做贡献，请参见CONTRIBUTING。

安全性

要报告漏洞，请参阅我们的Security Policy。

许可

FlatBuffers按照Apache License 2.0版进行许可。完整的许可证文本请参见LICENSE。