Tag Archives: serialization

Serializing a class instance to JSON

Question: Serializing a class instance to JSON

I am trying to create a JSON string representation of a class instance and having difficulty. Let’s say the class is built like this:

class testclass:
    value1 = "a"
    value2 = "b"

A call to the json.dumps is made like this:

t = testclass()
json.dumps(t)

It is failing and telling me that the testclass is not JSON serializable.

TypeError: <__main__.testclass object at 0x000000000227A400> is not JSON serializable

I have also tried using the pickle module:

t = testclass()
print(pickle.dumps(t, pickle.HIGHEST_PROTOCOL))

And it gives class instance information but not a serialized content of the class instance.

b'\x80\x03c__main__\ntestclass\nq\x00)\x81q\x01}q\x02b.'

What am I doing wrong?


Answer 0

The basic problem is that the JSON encoder json.dumps() only knows how to serialize a limited set of object types by default, namely the built-in types. They are listed here: https://docs.python.org/3.3/library/json.html#encoders-and-decoders

One good solution would be to make your class inherit from JSONEncoder and then implement the JSONEncoder.default() function, and make that function emit the correct JSON for your class.
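
One common way to apply that idea is a small JSONEncoder subclass whose default() emits a JSON-friendly dict, passed to json.dumps via cls. This is only a sketch; ObjectDictEncoder is an illustrative name, not part of the original answer:

import json

class ObjectDictEncoder(json.JSONEncoder):
    def default(self, obj):
        # Serialize unknown objects via their __dict__; anything without one
        # falls back to the base class, which raises TypeError as usual.
        try:
            return obj.__dict__
        except AttributeError:
            return json.JSONEncoder.default(self, obj)

# s = json.dumps(some_instance, cls=ObjectDictEncoder)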

A simple solution would be to call json.dumps() on the .__dict__ member of that instance. That is a standard Python dict and if your class is simple it will be JSON serializable.

class Foo(object):
    def __init__(self):
        self.x = 1
        self.y = 2

foo = Foo()
s = json.dumps(foo) # raises TypeError with "is not JSON serializable"

s = json.dumps(foo.__dict__) # s set to: {"x":1, "y":2}

The above approach is discussed in this blog posting:

    Serializing arbitrary Python objects to JSON using __dict__


Answer 1

There’s one way that works great for me that you can try out:

json.dumps() can take an optional parameter default where you can specify a custom serializer function for unknown types, which in my case looks like

from datetime import date, time  # needed for the isinstance checks below

def serialize(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, date):
        serial = obj.isoformat()
        return serial

    if isinstance(obj, time):
        serial = obj.isoformat()
        return serial

    return obj.__dict__

The first two ifs handle date and time serialization, and then obj.__dict__ is returned for any other object.

The final call looks like this:

json.dumps(myObj, default=serialize)

It’s especially good when you are serializing a collection and you don’t want to call __dict__ explicitly for every object. Here it’s done for you automatically.
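
For illustration, a sketch of that collection case, reusing the serialize function above (Event is a made-up class, not part of the original answer):

from datetime import date
import json

class Event(object):
    def __init__(self, name, when):
        self.name = name
        self.when = when

events = [Event('release', date(2020, 1, 1)), Event('retro', date(2020, 1, 15))]
print(json.dumps(events, default=serialize))
# [{"name": "release", "when": "2020-01-01"}, {"name": "retro", "when": "2020-01-15"}]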

So far it has worked well for me; looking forward to your thoughts.


Answer 2

You can specify the default named parameter in the json.dumps() function:

json.dumps(obj, default=lambda x: x.__dict__)

Explanation:

From the docs (2.7, 3.6):

``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.

(Works on Python 2.7 and Python 3.x)

Note: In this case you need instance variables and not class variables, unlike the example in the question, which uses class variables. (I am assuming the asker meant a class instance, i.e. an object of a class.)
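
Applied to the question's class, rewritten with instance variables, a quick sketch:

import json

class testclass:
    def __init__(self):
        self.value1 = "a"
        self.value2 = "b"

print(json.dumps(testclass(), default=lambda x: x.__dict__))
# {"value1": "a", "value2": "b"}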

I learned this first from @phihag’s answer here. Found it to be the simplest and cleanest way to do the job.


Answer 3

I just do:

data=json.dumps(myobject.__dict__)

This is not the full answer, and if you have some sort of complicated object class you certainly will not get everything. However I use this for some of my simple objects.

One that it works really well on is the “options” object that you get from optparse's OptionParser. Here it is along with the JSON request itself.

def executeJson(self, url, options):
    data = json.dumps(options.__dict__)
    if options.verbose:
        print(data)
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    return requests.post(url, data, headers=headers)

Answer 4

Using jsonpickle

import jsonpickle

object = YourClass()
json_object = jsonpickle.encode(object)
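
For completeness, jsonpickle can also rebuild the instance from that string with its decode function, continuing the snippet above:

restored_object = jsonpickle.decode(json_object)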

Answer 5

JSON is not really meant for serializing arbitrary Python objects. It’s great for serializing dict objects, but the pickle module is really what you should be using in general. Output from pickle is not really human-readable, but it should unpickle just fine. If you insist on using JSON, you could check out the jsonpickle module, which is an interesting hybrid approach.

https://github.com/jsonpickle/jsonpickle


Answer 6

Here are two simple functions for serializing any unsophisticated class; nothing fancy, as explained before.

I use this for configuration type stuff because I can add new members to the classes with no code adjustments.

import json

class SimpleClass:
    def __init__(self, a=None, b=None, c=None):
        self.a = a
        self.b = b
        self.c = c

def serialize_json(instance=None, path=None):
    dt = {}
    dt.update(vars(instance))

    with open(path, "w") as file:
        json.dump(dt, file)

def deserialize_json(cls=None, path=None):
    def read_json(_path):
        with open(_path, "r") as file:
            return json.load(file)

    data = read_json(path)

    instance = object.__new__(cls)

    for key, value in data.items():
        setattr(instance, key, value)

    return instance

# Usage: Create class and serialize under Windows file system.
write_settings = SimpleClass(a=1, b=2, c=3)
serialize_json(write_settings, r"c:\temp\test.json")

# Read back and rehydrate.
read_settings = deserialize_json(SimpleClass, r"c:\temp\test.json")

# results are the same.
print(vars(write_settings))
print(vars(read_settings))

# output:
# {'c': 3, 'b': 2, 'a': 1}
# {'c': 3, 'b': 2, 'a': 1}

Answer 7

There are some good answers on how to get started on doing this. But there are some things to keep in mind:

  • What if the instance is nested inside a large data structure?
  • What if you also want the class name?
  • What if you want to deserialize the instance?
  • What if you’re using __slots__ instead of __dict__?
  • What if you just don’t want to do it yourself?

json-tricks is a library (that I made and others contributed to) which has been able to do this for quite a while. For example:

from json_tricks import dumps, loads

class MyTestCls:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

cls_instance = MyTestCls(s='ub', dct={'7': 7})

json = dumps(cls_instance, indent=4)
instance = loads(json)

You’ll get your instance back. Here the json looks like this:

{
    "__instance_type__": [
        "json_tricks.test_class",
        "MyTestCls"
    ],
    "attributes": {
        "s": "ub",
        "dct": {
            "7": 7
        }
    }
}

If you'd like to make your own solution, you might look at the source of json-tricks so as not to forget some special cases (like __slots__).

It also handles other types such as numpy arrays, datetimes and complex numbers, and it allows for comments.


Answer 8

Python3.x

The best approach I could come up with was this. Note that this code treats set() too. The approach is generic, requiring only that you extend the class (as in the second example). Note that I'm only doing it with files, but it's easy to modify the behavior to your taste.

However, this is a codec: it both encodes and decodes.

With a little more work you can construct your class in other ways. I assume a default constructor to instantiate it, and then I update the instance's __dict__.

import json
import collections.abc


class JsonClassSerializable(json.JSONEncoder):

    REGISTERED_CLASS = {}

    def register(ctype):
        JsonClassSerializable.REGISTERED_CLASS[ctype.__name__] = ctype

    def default(self, obj):
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in self.REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = self.REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


JsonClassSerializable.register(C)


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


JsonClassSerializable.register(B)


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()

JsonClassSerializable.register(A)

A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
print(b.b)
print(b.c.a)

Edit

With some more research I found a way to generalize this without needing the superclass register() method call, by using a metaclass:

import json
import collections.abc

REGISTERED_CLASS = {}

class MetaSerializable(type):

    def __call__(cls, *args, **kwargs):
        if cls.__name__ not in REGISTERED_CLASS:
            REGISTERED_CLASS[cls.__name__] = cls
        return super(MetaSerializable, cls).__call__(*args, **kwargs)


class JsonClassSerializable(json.JSONEncoder, metaclass=MetaSerializable):

    def default(self, obj):
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        if isinstance(obj, JsonClassSerializable):
            jclass = {}
            jclass["name"] = type(obj).__name__
            jclass["dict"] = obj.__dict__
            return dict(_class_object=jclass)
        else:
            return json.JSONEncoder.default(self, obj)

    def json_to_class(self, dct):
        if '_set_object' in dct:
            return set(dct['_set_object'])
        elif '_class_object' in dct:
            cclass = dct['_class_object']
            cclass_name = cclass["name"]
            if cclass_name not in REGISTERED_CLASS:
                raise RuntimeError(
                    "Class {} not registered in JSON Parser"
                    .format(cclass["name"])
                )
            instance = REGISTERED_CLASS[cclass_name]()
            instance.__dict__ = cclass["dict"]
            return instance
        return dct

    def encode_(self, file):
        with open(file, 'w') as outfile:
            json.dump(
                self.__dict__, outfile,
                cls=JsonClassSerializable,
                indent=4,
                sort_keys=True
            )

    def decode_(self, file):
        try:
            with open(file, 'r') as infile:
                self.__dict__ = json.load(
                    infile,
                    object_hook=self.json_to_class
                )
        except FileNotFoundError:
            print("Persistence load failed "
                  "'{}' do not exists".format(file)
                  )


class C(JsonClassSerializable):

    def __init__(self):
        self.mill = "s"


class B(JsonClassSerializable):

    def __init__(self):
        self.a = 1230
        self.c = C()


class A(JsonClassSerializable):

    def __init__(self):
        self.a = 1
        self.b = {1, 2}
        self.c = B()


A().encode_("test")
b = A()
b.decode_("test")
print(b.a)
# 1
print(b.b)
# {1, 2}
print(b.c.a)
# 1230
print(b.c.c.mill)
# s

Answer 9

I believe that instead of inheritance, as suggested in the accepted answer, it's better to use polymorphism. Otherwise you have to have a big if/else statement to customize the encoding of every object. That means creating a generic default encoder for JSON like this:

def jsonDefEncoder(obj):
   if hasattr(obj, 'jsonEnc'):
      return obj.jsonEnc()
   else: #some default behavior
      return obj.__dict__

and then have a jsonEnc() function in each class you want to serialize. For example:

class A(object):
   def __init__(self, lengthInFeet):
      self.lengthInFeet = lengthInFeet
   def jsonEnc(self):
      return {'lengthInMeters': self.lengthInFeet * 0.3} # each foot is roughly 0.3 meter

Then you call json.dumps(classInstance, default=jsonDefEncoder).
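
Putting it together with the A class above, a small usage sketch (it assumes json has been imported):

a = A(lengthInFeet=10)
print(json.dumps(a, default=jsonDefEncoder))
# {"lengthInMeters": 3.0}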


Saving an object (data persistence)

Question: Saving an object (data persistence)

I’ve created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?


Answer 0

You could use the pickle module in the standard library. Here’s an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')

Update

Since this is such a popular answer, I’d like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

It’s almost always preferable to actually use the cPickle module rather than pickle because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they’re equivalent and the C version will provide greatly superior performance. Switching to it couldn’t be easier, just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is you could use something like the following to ensure that your code will always use the C version when it’s available in both Python 2 and 3:

try:
    import cPickle as pickle
except ModuleNotFoundError:
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on what version of Python is being used. The default also depends on the Python version: in Python 2 the default was Protocol version 0, but in Python 3.8.1 it's Protocol version 4. In Python 3.x the module had pickle.DEFAULT_PROTOCOL added to it, which doesn't exist in Python 2.

Fortunately, there's a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)

Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc.

Note: If you’re in an environment running different versions of Python, then you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
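
For example, pinning the protocol explicitly might look like this (protocol 2 is a reasonable lowest common denominator, since it can be read by Python 2.3+ as well as Python 3):

PICKLE_PROTOCOL = 2  # hard-coded so every interpreter in use can read the files
pickle.dump(obj, output, PICKLE_PROTOCOL)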

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there’s an unknown number of them, it’s often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

The major advantage is you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally, I like @Lutz Prechelt's answer the best. Here it is, adapted to the examples here:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

Answer 1

I think it’s a pretty strong assumption to assume that the object is a class. What if it’s not a class? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added — because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can’t handle it. Let’s try dill. We’ll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn’t, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.

Actually, there’s an easier way to save all these objects, especially if you have a lot of objects you’ve created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can’t install a python package on your server, then you can’t use it.

However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.


Answer 2

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as backend, which extends the python pickle module to handle lambda and all the nice python features.
  • It stores different objects to different files and reloads them properly.
  • Limits cache size
  • Allows cache clearing
  • Allows sharing of objects between multiple runs
  • Allows respect of input files which influence the result

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value):
    return Company(name, value)

Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.
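
A usage sketch under those assumptions:

company1 = myfunc('banana', 40)  # first call: runs the function and caches the result
company1 = myfunc('banana', 40)  # later calls with the same arguments: loaded from the cache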

For any further details, see the documentation.


Answer 3

A quick example using company1 from your question, with Python 3.

import pickle

# Save the file
pickle.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = pickle.load(open("company1.pickle", "rb"))

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the file
dill.dump(company1, file = open("company1.pickle", "wb"))

# Reload the file
company1_reloaded = dill.load(open("company1.pickle", "rb"))

How to get string objects instead of Unicode from JSON?

Question: How to get string objects instead of Unicode from JSON?

I’m using Python 2 to parse JSON from ASCII encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is, I have to use the data with some libraries that only accept string objects. I can’t change the libraries nor update them.

Is it possible to get string objects instead of Unicode ones?

Example

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. One easy and clean solution for today is to use a recent version of Python — i.e. Python 3 and forward.


Answer 0

A solution with object_hook

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery’s function is shorter and clearer than these ones, so what’s the point of them? Why would you want to use them?

Purely for performance. Mark’s answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you’ll hit Python’s maximum recursion depth

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they’re decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark’s answer isn’t suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn’t have a dict at the top level.


Answer 1

While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str type strings instead of unicode type. Because JSON is a subset of YAML it works nicely:

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

Notes

Some things to note though:

  • I get string objects because all my entries are ASCII encoded. If I would use unicode encoded entries, I would get them back as unicode objects — there is no conversion!

  • You should (probably always) use PyYAML’s safe_load function; if you use it to load JSON files, you don’t need the “additional power” of the load function anyway.

  • If you want a YAML parser that has more support for the 1.2 version of the spec (and correctly parses very low numbers) try Ruamel YAML: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

Conversion

As stated, there is no conversion! If you can’t be sure to only deal with ASCII values (and you can’t be sure most of the time), better use a conversion function:

I have used the one from Mark Amery a couple of times now; it works great and is very easy to use. You can also use a similar function as an object_hook instead, as that might gain you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.


Answer 2

There’s no built-in option to make the json module functions return byte strings instead of unicode strings. However, this short and simple recursive function will convert any decoded JSON object from using unicode strings to UTF-8-encoded byte strings:

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Just call this on the output you get from a json.load or json.loads call.
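
For example, a minimal sketch (Python 2):

import json

new_list = byteify(json.loads('["a", "b"]'))
# new_list == ['a', 'b']   (str objects rather than unicode)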

A couple of notes:

  • To support Python 2.6 or earlier, replace return {byteify(key): byteify(value) for key, value in input.iteritems()} with return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]), since dictionary comprehensions weren’t supported until Python 2.7.
  • Since this answer recurses through the entire decoded object, it has a couple of undesirable performance characteristics that can be avoided with very careful use of the object_hook or object_pairs_hook parameters. Mirec Miskuf’s answer is so far the only one that manages to pull this off correctly, although as a consequence, it’s significantly more complicated than my approach.

Answer 3

You can use the object_hook parameter for json.loads to pass in a converter. You don’t have to do the conversion after the fact. The json module will always pass the object_hook dicts only, and it will recursively pass in nested dicts, so you don’t have to recurse into nested dicts yourself. I don’t think I would convert unicode strings to numbers like Wells shows. If it’s a unicode string, it was quoted as a string in the JSON file, so it is supposed to be a string (or the file is bad).

Also, I’d try to avoid doing something like str(val) on a unicode object. You should use value.encode(encoding) with a valid encoding, depending on what your external lib expects.

So, for example:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv

def _decode_dict(data):
    rv = {}
    for key, value in data.iteritems():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

obj = json.loads(s, object_hook=_decode_dict)

回答 4

这是因为json在字符串对象和unicode对象之间没有区别。它们都是javascript中的字符串。

我认为 JSON 返回 unicode 对象是正确的。实际上,我不会接受更低的要求,因为 javascript 字符串本质上就是 unicode 对象(也就是说 JSON(javascript)字符串可以存储任何 unicode 字符),所以在从 JSON 转换字符串时创建 unicode 对象是合理的。普通字符串则不合适,因为库将不得不去猜测您想要的编码。

最好在任何地方都使用 unicode 字符串对象。因此,最好的选择是更新您的库,让它们能处理 unicode 对象。

但是,如果您真的想要字节串,只需将结果编码为您选择的编码即可:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

That’s because json has no difference between string objects and unicode objects. They’re all strings in javascript.

I think JSON is right to return unicode objects. In fact, I wouldn’t accept anything less, since javascript strings are in fact unicode objects (i.e. JSON (javascript) strings can store any kind of unicode character) so it makes sense to create unicode objects when translating strings from JSON. Plain strings just wouldn’t fit since the library would have to guess the encoding you want.

It’s better to use unicode string objects everywhere. So your best option is to update your libraries so they can deal with unicode objects.

But if you really want bytestrings, just encode the results to the encoding of your choice:

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

回答 5

存在一个简单的解决方法。

TL;DR - 使用 ast.literal_eval() 代替 json.loads()。ast 和 json 都在标准库中。

虽然这不是一个“完美”的答案,但如果您的计划是完全忽略 Unicode,它能帮您走得很远。在 Python 2.7 中:

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

给出:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

当某些对象确实是 Unicode 字符串时,情况会变得更加棘手;完整的答案很快就会变得很复杂。

There exists an easy work-around.

TL;DR – Use ast.literal_eval() instead of json.loads(). Both ast and json are in the standard library.

While not a ‘perfect’ answer, it gets one pretty far if your plan is to ignore Unicode altogether. In Python 2.7

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

gives:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

This gets more hairy when some objects are really Unicode strings. The full answer gets hairy quickly.


回答 6

Mike Brennan 的答案已经很接近,但是没有理由重新遍历整个结构。如果使用 object_pairs_hook(Python 2.7+)参数:

object_pairs_hook 是一个可选函数,对于任何以有序的键值对列表解码出的对象字面量,都会用该结果调用它。object_pairs_hook 的返回值将取代 dict。此功能可用于实现依赖键值对解码顺序的自定义解码器(例如,collections.OrderedDict 会记住插入顺序)。如果同时定义了 object_hook,则 object_pairs_hook 优先。

使用它,您可以获得每个JSON对象,因此无需进行递归即可进行解码:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

请注意,由于使用 object_pairs_hook 时每个对象都会被交给这个钩子,因此我从来不需要递归调用它。您确实需要留意列表,但正如您所见,列表中的对象也会被正确转换,而且不需要递归就能做到。

编辑:一位同事指出 Python 2.6 没有 object_pairs_hook。您仍然可以通过一个很小的改动在 Python 2.6 中使用这个方法。把上面钩子中的:

for key, value in pairs:

改为

for key, value in pairs.iteritems():

然后使用object_hook代替object_pairs_hook

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

使用 object_pairs_hook 的结果是,JSON 中的每个对象都可以少实例化一个字典;如果您在解析一个巨大的文档,这可能是值得的。

Mike Brennan’s answer is close, but there is no reason to re-traverse the entire structure. If you use the object_pairs_hook (Python 2.7+) parameter:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.

With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        

In [53]: json.load(open('test.json'))
Out[53]: 
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Notice that I never have to call the hook recursively since every object will get handed to the hook when you use the object_pairs_hook. You do have to care about lists, but as you can see, an object within a list will be properly converted, and you don’t have to recurse to make it happen.

EDIT: A coworker pointed out that Python 2.6 doesn’t have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

for key, value in pairs:

to

for key, value in pairs.iteritems():

Then use object_hook instead of object_pairs_hook:

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]: 
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

Using object_pairs_hook results in one less dictionary being instantiated for each object in the JSON object, which, if you were parsing a huge document, might be worth while.


回答 7

恐怕在simplejson库中无法自动实现此目的。

simplejson 中的扫描器和解码器旨在生成 unicode 文本。为此,该库使用了一个名为 c_scanstring 的函数(如果可用,出于速度考虑),在 C 版本不可用时则使用 py_scanstring。simplejson 中几乎每个用于解码可能包含文本的结构的例程,都会多次调用 scanstring 函数。您要么得对 simplejson.decoder 中的 scanstring 值打 monkey patch,要么得子类化 JSONDecoder,并为几乎所有可能包含文本的内容提供您自己的完整实现。

但是,simplejson 输出 unicode 的原因在于 json 规范中特别提到“字符串是零个或多个 Unicode 字符的集合”……对 unicode 的支持被视为格式本身的一部分。simplejson 的 scanstring 实现甚至会扫描并解释 unicode 转义(还会对格式错误的多字节字符集表示做错误检查),因此它唯一能可靠地把值返回给您的方式就是 unicode。

如果您有一个老旧的、需要 str 的库,我建议您要么在解析后费力地遍历嵌套的数据结构(我承认这正是您明确表示想避免的……抱歉),要么把您的库包装进某种外观(facade)中,在更细的粒度上调整输入参数。如果您的数据结构确实嵌套很深,第二种方法可能比第一种更容易管理。

I’m afraid there’s no way to achieve this automatically within the simplejson library.

The scanner and decoder in simplejson are designed to produce unicode text. To do this, the library uses a function called c_scanstring (if it’s available, for speed), or py_scanstring if the C version is not available. The scanstring function is called several times by nearly every routine that simplejson has for decoding a structure that might contain text. You’d have to either monkeypatch the scanstring value in simplejson.decoder, or subclass JSONDecoder and provide pretty much your own entire implementation of anything that might contain text.

The reason that simplejson outputs unicode, however, is that the json spec specifically mentions that “A string is a collection of zero or more Unicode characters”… support for unicode is assumed as part of the format itself. Simplejson’s scanstring implementation goes so far as to scan and interpret unicode escapes (even error-checking for malformed multi-byte charset representations), so the only way it can reliably return the value to you is as unicode.

If you have an aged library that needs an str, I recommend you either laboriously search the nested data structure after parsing (which I acknowledge is what you explicitly said you wanted to avoid… sorry), or perhaps wrap your libraries in some sort of facade where you can massage the input parameters at a more granular level. The second approach might be more manageable than the first if your data structures are indeed deeply nested.
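
下面是答案最后提到的“外观(facade)”思路的一个最小示意(基于 Python 2;legacy_call 是为演示假设的函数名,并非任何真实库的 API):在把参数交给只接受 str 的旧库之前,集中在一处把 unicode 编码为 UTF-8。

# -*- coding: utf-8 -*-
import json

def legacy_call(**kwargs):
    # 假设的旧式入口,只接受 str(字节串)参数,这里仅作演示
    for k, v in kwargs.iteritems():
        assert isinstance(k, str) and isinstance(v, str)
    return kwargs

def call_legacy_with_json(json_text):
    """外观函数:解析 JSON 后,把 unicode 键和值编码成 UTF-8 再交给旧库。"""
    data = json.loads(json_text)
    encoded = dict((k.encode('utf-8'),
                    v.encode('utf-8') if isinstance(v, unicode) else v)
                   for k, v in data.iteritems())
    return legacy_call(**encoded)

print call_legacy_with_json('{"name": "value"}')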


回答 8

正如Mark(Amery)正确指出的那样:仅当您只有ASCII时,才可以在json转储上使用PyYaml的反序列化器。至少开箱即用。

关于PyYaml方法的两个简短评论:

  1. 切勿对来自外部环境(不可信来源)的数据使用 yaml.load。执行隐藏在结构中的任意代码是 yaml 的一个特性(!)。

  2. 可以通过以下方法使其也适用于非ASCII:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)

但是从性能上来说,它与马克·阿默里的答案没有可比性:

将一些深层嵌套的示例字典扔给这两种方法,我得到如下结果(dt[j] = json.loads(json.dumps(m)) 的耗时):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

因此,包括完整遍历树和编码在内的反序列化,仍然处于 json 基于 C 实现的同一数量级之内。我发现这相当快,而且在深层嵌套结构上也比 yaml 加载更健壮。再考虑到 yaml.load 的问题,它也更不容易出安全错误。

=> 虽然我很希望有人能指出一个纯 C 实现的转换器,但 byteify 函数应该是默认答案。

如果您的 json 结构来自外部、包含用户输入,则尤其如此。因为那样的话,您很可能无论如何都要遍历您的结构,这与您想要的内部数据结构(只用“unicode 三明治”还是只用字节串)无关。

为什么?

Unicode 规范化。不了解的读者:先吃片止痛药,然后去读一读这方面的资料。

因此,使用 byteify 递归可以一石二鸟:

  1. 从嵌套的json转储中获取字节串
  2. 使用户输入值标准化,以便您在存储中查找内容。

在我的测试中,结果证明,用 unicodedata.normalize('NFC', input).encode('utf-8') 替换 input.encode('utf-8') 甚至比不做 NFC 还要快,不过我猜这在很大程度上取决于样本数据。

As Mark (Amery) correctly notes: Using PyYaml‘s deserializer on a json dump works only if you have ASCII only. At least out of the box.

Two quick comments on the PyYaml approach:

  1. NEVER use yaml.load on data from the field. It’s a feature(!) of yaml to execute arbitrary code hidden within the structure.

  2. You can make it work also for non ASCII via this:

    def to_utf8(loader, node):
        return loader.construct_scalar(node).encode('utf-8')
    yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)
    

But performance wise its of no comparison to Mark Amery’s answer:

Throwing some deeply nested sample dicts onto the two methods, I get this (with dt[j] = time delta of json.loads(json.dumps(m))):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

So deserialization including fully walking the tree and encoding is well within the order of magnitude of json’s C based implementation. I find this remarkably fast and it’s also more robust than the yaml load at deeply nested structures. And less security error prone, looking at yaml.load.

=> While I would appreciate a pointer to a C only based converter the byteify function should be the default answer.

This holds especially true if your json structure is from the field, containing user input. Because then you probably need to walk anyway over your structure – independent on your desired internal data structures (‘unicode sandwich’ or byte strings only).

Why?

Unicode normalisation. For the unaware: Take a painkiller and read this.

So using the byteify recursion you kill two birds with one stone:

  1. get your bytestrings from nested json dumps
  2. get user input values normalised, so that you find the stuff in your storage.

In my tests it turned out that replacing the input.encode(‘utf-8’) with a unicodedata.normalize(‘NFC’, input).encode(‘utf-8’) was even faster than w/o NFC – but thats heavily dependent on the sample data I guess.
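
一个最小的示意(Python 2;函数名 byteify_nfc 是为说明而起的),把上文的 byteify 递归与答案提到的 NFC 规范化结合在一起:

# -*- coding: utf-8 -*-
import json
import unicodedata

def byteify_nfc(input):
    # 与上文的 byteify 相同的递归,只是在编码前先做 NFC 规范化
    if isinstance(input, dict):
        return dict((byteify_nfc(key), byteify_nfc(value))
                    for key, value in input.iteritems())
    elif isinstance(input, list):
        return [byteify_nfc(element) for element in input]
    elif isinstance(input, unicode):
        return unicodedata.normalize('NFC', input).encode('utf-8')
    else:
        return input

print byteify_nfc(json.loads('{"caf\\u00e9": "Caf\\u00e9"}'))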


回答 9

这里的坑在于,simplejson 和 json 是两个不同的模块,至少在处理 unicode 的方式上不同。Python 2.6+ 自带的是 json,它返回 unicode 值,而 simplejson 返回字符串对象。只需在您的环境中尝试用 easy_install 安装 simplejson,看看是否可行。它对我有用。

The gotcha is that simplejson and json are two different modules, at least in the manner they deal with unicode. You have json in py 2.6+, and this gives you unicode values, whereas simplejson returns string objects. Just try easy_install-ing simplejson in your environment and see if that works. It did for me.


回答 10

只需使用pickle而不是json进行转储和加载,如下所示:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

它产生的输出是(正确处理字符串和整数):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

Just use pickle instead of json for dump and load, like so:

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

The output it produces is (strings and integers are handled correctly):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

回答 11

因此,我遇到了同样的问题。猜猜Google的第一个结果是什么。

因为我需要把所有数据传给 PyGTK,所以 unicode 字符串对我也不太有用。所以我写了另一种递归转换方法。实际上,做类型安全的 JSON 转换也需要它:json.dump() 遇到任何非字面量(例如 Python 对象)都会直接报错。不过它不会转换字典的键。

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

So, I’ve run into the same problem. Guess what was the first Google result.

Because I need to pass all data to PyGTK, unicode strings aren’t very useful to me either. So I have another recursive conversion method. It’s actually also needed for typesafe JSON conversion – json.dump() would bail on any non-literals, like Python objects. Doesn’t convert dict indexes though.

# removes any objects, turns unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj) 
        return obj

回答 12

我有一个JSON dict作为字符串。键和值是unicode对象,如以下示例所示:

myStringDict = "{u'key':u'value'}"

我可以先用 ast.literal_eval(myStringDict) 把这个字符串转换成 dict 对象,然后再对其使用上面建议的 byteify 函数。

I had a JSON dict as a string. The keys and values were unicode objects like in the following example:

myStringDict = "{u'key':u'value'}"

I could use the byteify function suggested above by converting the string to a dict object using ast.literal_eval(myStringDict).
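
一个最小的示意(Python 2),演示这个两步做法;其中 byteify 假定就是前文 Mark Amery 答案里定义的那个函数:

import ast

myStringDict = "{u'key': u'value'}"
parsed = ast.literal_eval(myStringDict)   # 得到 {u'key': u'value'}
print byteify(parsed)                     # 得到 {'key': 'value'}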


回答 13

使用钩子支持Python2&3(来自https://stackoverflow.com/a/33571117/558397

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

返回值:

 {'three': '', 'key': 'value', 'one': 'two'}

Support Python2&3 using hook (from https://stackoverflow.com/a/33571117/558397)

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # if this is a unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # if it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

Returns:

 {'three': '', 'key': 'value', 'one': 'two'}

回答 14

虽然来晚了,但我写了这个递归类型转换器。它满足了我的需求,而且我认为它相对完整,也许能帮到您。

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

只需像这样把 JSON 对象传给它:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

我将其作为类的私有成员,但是您可以根据需要重新调整方法的用途。

This is late to the game, but I built this recursive caster. It works for my needs and I think it’s relatively complete. It may help you.

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

Just pass it a JSON object like so:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

I have it as a private member of a class, but you can repurpose the method as you see fit.


回答 15

我重写了Wells的_parse_json()来处理json对象本身是数组的情况(我的用例)。

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

I rewrote Wells’s _parse_json() to handle cases where the json object itself is an array (my use case).

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

回答 16

这是用C语言编写的递归编码器:https : //github.com/axiros/nested_encode

与json.loads相比,“平均”结构的性能开销约为10%。

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

使用以下测试结构:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

here is a recursive encoder written in C: https://github.com/axiros/nested_encode

Performance overhead for “average” structures around 10% compared to json.loads.

python speed.py                                                                                            
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

using this teststructure:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

回答 17

在 Python 3.6 下,有时我仍会遇到这个问题。例如,从 REST API 获取响应并把响应文本加载为 JSON 时,我仍然会得到 unicode 字符串。我用 json.dumps() 找到了一个简单的解决方案。

response_message = json.loads(json.dumps(response.text))
print(response_message)

With Python 3.6, sometimes I still run into this problem. For example, when getting response from a REST API and loading the response text to JSON, I still get the unicode strings. Found a simple solution using json.dumps().

response_message = json.loads(json.dumps(response.text))
print(response_message)

回答 18

我也遇到了这个问题,不得不处理JSON,我想出了一个小循环将Unicode键转换为字符串。(simplejson在GAE上不返回字符串键。)

obj 是从JSON解码的对象:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs 是我传给 GAE 应用程序构造函数的内容(它不喜欢 **kwargs 中出现 unicode 键)

不像Wells的解决方案那样强大,但是要小得多。

I ran into this problem too, and having to deal with JSON, I came up with a small loop that converts the unicode keys to strings. (simplejson on GAE does not return string keys.)

obj is the object decoded from JSON:

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

kwargs is what I pass to the constructor of the GAE application (which does not like unicode keys in **kwargs)

Not as robust as the solution from Wells, but much smaller.


回答 19

我改编了 Mark Amery 答案中的代码,主要是为了去掉 isinstance,享受鸭子类型(duck typing)的好处。

编码是手动完成的,并且禁用了 ensure_ascii。json.dump 的 Python 文档中写道:

如果 ensure_ascii 为 True(默认值),则输出中的所有非 ASCII 字符均以 \uXXXX 序列转义

免责声明:在 doctest 中我使用了匈牙利语。与匈牙利语相关的一些著名字符编码有:cp852,IBM/OEM 编码,例如用于 DOS(有时被不恰当地称为 ascii,我认为这取决于代码页设置);cp1250,例如用于 Windows(有时被称为 ansi,取决于区域设置);以及有时用于 http 服务器的 iso-8859-2。测试文本 Tüskéshátú kígyóbűvölő 取自维基百科,署名 Koltai László(本地人名形式)。

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

我还想强调 Jarret Hardie 的答案,它引用了 JSON 规范:

字符串是零个或多个Unicode字符的集合

在我的用例中,我有带有json的文件。它们是utf-8编码文件。ensure_ascii会导致正确转义但可读性不强的json文件,这就是为什么我调整了Mark Amery的答案来满足自己的需求的原因。

doctest并不是特别周到,但是我分享了代码,希望它对某人有用。

I’ve adapted the code from the answer of Mark Amery, particularly in order to get rid of isinstance for the pros of duck-typing.

The encoding is done manually and ensure_ascii is disabled. The python docs for json.dump says that

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences

Disclaimer: in the doctest I used the Hungarian language. Some notable Hungarian-related character encodings are: cp852 the IBM/OEM encoding used eg. in DOS (sometimes referred as ascii, incorrectly I think, it is dependent on the codepage setting), cp1250 used eg. in Windows (sometimes referred as ansi, dependent on the locale settings), and iso-8859-2, sometimes used on http servers. The test text Tüskéshátú kígyóbűvölő is attributed to Koltai László (native personal name form) and is from wikipedia.

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>> 
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>> 
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

I’d also like to highlight the answer of Jarret Hardie which references the JSON spec, quoting:

A string is a collection of zero or more Unicode characters

In my use-case I had files with json. They are utf-8 encoded files. ensure_ascii results in properly escaped but not very readable json files, that is why I’ve adapted Mark Amery’s answer to fit my needs.

The doctest is not particularly thoughtful, but I share the code in the hope that it will be useful for someone.


回答 20

请看这个类似问题的答案,其中指出:

u 前缀仅表示您拥有的是 Unicode 字符串。当您真正使用这个字符串时,它并不会出现在您的数据中。不要被打印出来的形式误导。

例如,尝试以下操作:

print mail_accounts[0]["i"]

你不会看到 u。

Check out this answer to a similar question like this which states that

The u- prefix just means that you have a Unicode string. When you really use the string, it won’t appear in your data. Don’t be thrown by the printed output.

For example, try this:

print mail_accounts[0]["i"]

You won’t see a u.
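
一个简短的 Python 2 交互示例,说明 u 前缀只出现在 repr 形式里,真正使用字符串时并不存在:

>>> s = u'hello'
>>> s
u'hello'
>>> print s
hello
>>> s == 'hello'
True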


将 python 字典转换为字符串再转换回来

问题:将 python 字典转换为字符串再转换回来

我正在编写一个将数据存储在字典对象中的程序,但是该数据需要在程序执行过程中的某个时候保存,并在再次运行该程序时重新加载到字典对象中。如何将字典对象转换为可以写入文件并可以加载回字典对象的字符串?希望这将支持包含词典的词典。

I am writing a program that stores data in a dictionary object, but this data needs to be saved at some point during the program execution and loaded back into the dictionary object when the program is run again. How would I convert a dictionary object into a string that can be written to a file and loaded back into a dictionary object? This will hopefully support dictionaries containing dictionaries.


回答 0

json模块是一个很好的解决方案。与pickle相比,它的优势在于它仅生成纯文本输出,并且是跨平台和跨版本的。

import json
json.dumps(dict)

The json module is a good solution here. It has the advantages over pickle that it only produces plain text output, and is cross-platform and cross-version.

import json
json.dumps(dict)
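
下面是一个最小的示意,补全与原问题对应的“写入文件再读回”的完整往返(文件名 data.json 只是示例):

import json

data = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}

# 序列化成字符串并写入文件
with open('data.json', 'w') as f:
    f.write(json.dumps(data))

# 从文件读回并解析为字典
with open('data.json') as f:
    restored = json.loads(f.read())

print restored == data  # True(注意:JSON 的键总是字符串)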

回答 1

如果您的字典不太大,也许str + eval可以完成工作:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

如果源不受信任,则可以使用ast.literal_eval而不是eval来提高安全性。

If your dictionary isn’t too big maybe str + eval can do the work:

dict1 = {'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }}
str1 = str(dict1)

dict2 = eval(str1)

print dict1==dict2

You can use ast.literal_eval instead of eval for additional security if the source is untrusted.
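
如答案末尾所说,对不可信的来源应改用 ast.literal_eval;下面是一个最小的示意:

import ast

dict1 = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}
str1 = str(dict1)

# literal_eval 只解析 Python 字面量,不会执行任意表达式
dict2 = ast.literal_eval(str1)
print dict1 == dict2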


回答 2

我用json

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

I use json:

import json

# convert to string
input = json.dumps({'id': id })

# load to dict
my_dict = json.loads(input) 

回答 3

使用 pickle 模块把它保存到磁盘,稍后再加载。

Use the pickle module to save it to disk and load later on.
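
这个答案没有附代码;下面是一个最小的示意(文件名 data.pkl 只是示例):

import pickle

data = {'one': 1, 'two': 2, 'three': {'three.1': 3.1, 'three.2': 3.2}}

# 保存到磁盘
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# 之后再加载回来
with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)

print restored == data  # True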


回答 4

为什么不使用 Python 3 内置 ast 库中的 literal_eval 函数呢?使用 literal_eval 比 eval 更好:

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

会输出真正的字典:

{'key1': 'key1value', 'key2': 'key2value'}

而且,如果您想把字典转换成字符串,不妨使用 Python 的 str() 方法。

假设字典是:

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

这将像这样完成:

str(my_dict)

将打印:

"{'key1': 'key1value', 'key2': 'key2value'}"

就是这么简单。

Why not to use Python 3’s inbuilt ast library’s function literal_eval. It is better to use literal_eval instead of eval

import ast
str_of_dict = "{'key1': 'key1value', 'key2': 'key2value'}"
ast.literal_eval(str_of_dict)

will give output as actual Dictionary

{'key1': 'key1value', 'key2': 'key2value'}

And If you are asking to convert a Dictionary to a String then, How about using str() method of Python.

Suppose the dictionary is :

my_dict = {'key1': 'key1value', 'key2': 'key2value'}

And this will be done like this :

str(my_dict)

Will Print :

"{'key1': 'key1value', 'key2': 'key2value'}"

This is as easy as you like.


回答 5

如果内容是中文:

import codecs
import json

fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"}, ensure_ascii=False, indent=2)
fout.write(dict_to_json + '\n')

If in Chinese:

import codecs
import json

fout = codecs.open("xxx.json", "w", "utf-8")
dict_to_json = json.dumps({'text':"中文"}, ensure_ascii=False, indent=2)
fout.write(dict_to_json + '\n')

回答 6

将字典转换成JSON(字符串)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

将为您提供:

{"name": "Don", "sur

将字符串转换成字典

back_to_mydict = json.loads(result) 

Convert dictionary into JSON (string)

import json 

mydict = { "name" : "Don", 
          "surname" : "Mandol", 
          "age" : 43} 

result = json.dumps(mydict)

print(result[0:20])

will get you:

{"name": "Don", "sur

Convert string into dictionary

back_to_mydict = json.loads(result) 

回答 7

我认为您应该考虑使用 shelve 模块,它提供由文件持久化支撑的类字典对象。它很容易用来替代“真正的”字典,因为它几乎透明地为您的程序提供了一个可以像字典一样使用的东西,而不需要显式地把它转换成字符串再写入文件(或者反过来从文件读回)。

主要区别在于:首次使用前需要先 open() 它,用完后要 close() 它(视所用的 writeback 选项,可能还需要 sync())。创建出来的任何“shelf”文件对象都可以把常规字典作为值,从而实现逻辑上的嵌套。

这是一个简单的例子:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

输出:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

I think you should consider using the shelve module which provides persistent file-backed dictionary-like objects. It’s easy to use in place of a “real” dictionary because it almost transparently provides your program with something that can be used just like a dictionary, without the need to explicitly convert it to a string and then write to a file (or vice-versa).

The main difference is needing to initially open() it before first use and then close() it when you’re done (and possibly sync()ing it, depending on the writeback option being used). Any “shelf” file objects created can contain regular dictionaries as values, allowing them to be logically nested.

Here’s a trivial example:

import shelve

shelf = shelve.open('mydata')  # open for reading and writing, creating if nec
shelf.update({'one':1, 'two':2, 'three': {'three.1': 3.1, 'three.2': 3.2 }})
shelf.close()

shelf = shelve.open('mydata')
print shelf
shelf.close()

Output:

{'three': {'three.1': 3.1, 'three.2': 3.2}, 'two': 2, 'one': 1}

回答 8

如果您在意速度,请使用 ujson(UltraJSON),它与 json 拥有相同的 API:

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

If you care about the speed use ujson (UltraJSON), which has the same API as json:

import ujson
ujson.dumps([{"key": "value"}, 81, True])
# '[{"key":"value"},81,true]'
ujson.loads("""[{"key": "value"}, 81, true]""")
# [{u'key': u'value'}, 81, True]

回答 9

如果需要可读性,我会用 yaml(在我看来 JSON 和 XML 都谈不上可读);如果不需要人来阅读,我就用 pickle。

写入

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

回过头再读

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev

I use yaml for that if it needs to be readable (neither JSON nor XML are that IMHO), or if reading is not necessary I use pickle.

Write

from pickle import dumps, loads
x = dict(a=1, b=2)
y = dict(c = x, z=3)
res = dumps(y)
open('/var/tmp/dump.txt', 'w').write(res)

Read back

from pickle import dumps, loads
rev = loads(open('/var/tmp/dump.txt').read())
print rev
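
下面是答案提到的 yaml 方案的一个最小示意(需要安装 PyYAML;文件路径只是示例):

import yaml

x = dict(a=1, b=2)
y = dict(c=x, z=3)

# 写入(safe_dump 生成可读的纯文本)
with open('/var/tmp/dump.yaml', 'w') as f:
    f.write(yaml.safe_dump(y))

# 读回
with open('/var/tmp/dump.yaml') as f:
    rev = yaml.safe_load(f.read())

print rev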

如何使类可序列化为 JSON

问题:如何使类可序列化为 JSON

如何使Python类可序列化?

一个简单的类:

class FileItem:
    def __init__(self, fname):
        self.fname = fname

我应该怎么做才能获得输出:

>>> import json

>>> my_file = FileItem('/foo/bar')
>>> json.dumps(my_file)
TypeError: Object of type 'FileItem' is not JSON serializable

没有错误

How to make a Python class serializable?

A simple class:

class FileItem:
    def __init__(self, fname):
        self.fname = fname

What should I do to be able to get output of:

>>> import json

>>> my_file = FileItem('/foo/bar')
>>> json.dumps(my_file)
TypeError: Object of type 'FileItem' is not JSON serializable

Without the error


回答 0

您对预期的输出有想法吗?例如,这样可以吗?

>>> f  = FileItem("/foo/bar")
>>> magic(f)
'{"fname": "/foo/bar"}'

在这种情况下,您只需调用 json.dumps(f.__dict__) 即可。

如果您想要更多的自定义输出,则必须继承JSONEncoder并实现自己的自定义序列化。

有关一个简单的示例,请参见下文。

>>> from json import JSONEncoder
>>> class MyEncoder(JSONEncoder):
        def default(self, o):
            return o.__dict__    

>>> MyEncoder().encode(f)
'{"fname": "/foo/bar"}'

然后,将该类作为 cls 关键字参数传给 json.dumps() 方法:

json.dumps(f, cls=MyEncoder)

如果你也想解码,那么你需要给 JSONDecoder 类提供一个自定义的 object_hook。例如:

>>> def from_json(json_object):
        if 'fname' in json_object:
            return FileItem(json_object['fname'])
>>> f = JSONDecoder(object_hook = from_json).decode('{"fname": "/foo/bar"}')
>>> f
<__main__.FileItem object at 0x9337fac>
>>> 

Do you have an idea about the expected output? For e.g. will this do?

>>> f  = FileItem("/foo/bar")
>>> magic(f)
'{"fname": "/foo/bar"}'

In that case you can merely call json.dumps(f.__dict__).

If you want more customized output then you will have to subclass JSONEncoder and implement your own custom serialization.

For a trivial example, see below.

>>> from json import JSONEncoder
>>> class MyEncoder(JSONEncoder):
        def default(self, o):
            return o.__dict__    

>>> MyEncoder().encode(f)
'{"fname": "/foo/bar"}'

Then you pass this class into the json.dumps() method as cls kwarg:

json.dumps(f, cls=MyEncoder)

If you also want to decode then you’ll have to supply a custom object_hook to the JSONDecoder class. For e.g.

>>> def from_json(json_object):
        if 'fname' in json_object:
            return FileItem(json_object['fname'])
>>> f = JSONDecoder(object_hook = from_json).decode('{"fname": "/foo/bar"}')
>>> f
<__main__.FileItem object at 0x9337fac>
>>> 

回答 1

这是一个简单功能的简单解决方案:

.toJSON() 方法

与其让类本身可被 JSON 序列化,不如实现一个序列化方法:

import json

class Object:
    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__, 
            sort_keys=True, indent=4)

因此,您只需调用它即可序列化:

me = Object()
me.name = "Onur"
me.age = 35
me.dog = Object()
me.dog.name = "Apollo"

print(me.toJSON())

将输出:

{
    "age": 35,
    "dog": {
        "name": "Apollo"
    },
    "name": "Onur"
}

Here is a simple solution for a simple feature:

.toJSON() Method

Instead of a JSON serializable class, implement a serializer method:

import json

class Object:
    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__, 
            sort_keys=True, indent=4)

So you just call it to serialize:

me = Object()
me.name = "Onur"
me.age = 35
me.dog = Object()
me.dog.name = "Apollo"

print(me.toJSON())

will output:

{
    "age": 35,
    "dog": {
        "name": "Apollo"
    },
    "name": "Onur"
}

回答 2

对于更复杂的类,您可以考虑使用jsonpickle工具:

jsonpickle是一个Python库,用于将复杂的Python对象与JSON进行序列化和反序列化。

用于将Python编码为JSON的标准Python库(例如stdlib的json,simplejson和demjson)只能处理具有直接JSON等效项的Python原语(例如,字典,列表,字符串,整数等)。jsonpickle建立在这些库之上,并允许将更复杂的数据结构序列化为JSON。jsonpickle具有高度的可配置性和可扩展性,允许用户选择JSON后端并添加其他后端。

(链接到PyPi上的jsonpickle)

For more complex classes you could consider the tool jsonpickle:

jsonpickle is a Python library for serialization and deserialization of complex Python objects to and from JSON.

The standard Python libraries for encoding Python into JSON, such as the stdlib’s json, simplejson, and demjson, can only handle Python primitives that have a direct JSON equivalent (e.g. dicts, lists, strings, ints, etc.). jsonpickle builds on top of these libraries and allows more complex data structures to be serialized to JSON. jsonpickle is highly configurable and extendable–allowing the user to choose the JSON backend and add additional backends.

(link to jsonpickle on PyPi)
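
下面是 jsonpickle 常见用法的一个最小示意(需要 pip install jsonpickle;FileItem 沿用本问题中的示例类):

import jsonpickle

class FileItem(object):
    def __init__(self, fname):
        self.fname = fname

frozen = jsonpickle.encode(FileItem('/foo/bar'))
print(frozen)                       # 带有 "py/object" 类型信息的 JSON 字符串
thawed = jsonpickle.decode(frozen)  # 还原为 FileItem 实例
print(thawed.fname)                 # /foo/bar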


回答 3

大多数答案都需要修改对 json.dumps() 的调用,而这并不总是可行或可取的(例如,这个调用可能发生在框架组件内部)。

如果您希望能够原样调用 json.dumps(obj),那么一个简单的解决方案就是从dict继承:

class FileItem(dict):
    def __init__(self, fname):
        dict.__init__(self, fname=fname)

f = FileItem('tasks.txt')
json.dumps(f)  #No need to change anything here

如果您的类只是基本数据表示形式,则此方法有效,对于棘手的事情,您始终可以显式设置键。

Most of the answers involve changing the call to json.dumps(), which is not always possible or desirable (it may happen inside a framework component for example).

If you want to be able to call json.dumps(obj) as is, then a simple solution is inheriting from dict:

class FileItem(dict):
    def __init__(self, fname):
        dict.__init__(self, fname=fname)

f = FileItem('tasks.txt')
json.dumps(f)  #No need to change anything here

This works if your class is just basic data representation, for trickier things you can always set keys explicitly.
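
关于“显式设置键”以及从 JSON 读回,下面是一个最小的示意(from_json 是为说明而加的辅助方法,并非原答案的一部分):

import json

class FileItem(dict):
    def __init__(self, fname, size=0):
        dict.__init__(self, fname=fname, size=size)

    @classmethod
    def from_json(cls, s):
        d = json.loads(s)
        return cls(fname=d['fname'], size=d.get('size', 0))

f = FileItem('tasks.txt', size=120)
s = json.dumps(f)            # 例如 '{"fname": "tasks.txt", "size": 120}'
f2 = FileItem.from_json(s)
print(f2['fname'])           # tasks.txt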


回答 4

我喜欢Onur的答案,但会扩展为包括一个可选toJSON()方法,以使对象自行序列化:

def dumper(obj):
    try:
        return obj.toJSON()
    except:
        return obj.__dict__
print json.dumps(some_big_object, default=dumper, indent=2)

I like Onur’s answer but would expand to include an optional toJSON() method for objects to serialize themselves:

def dumper(obj):
    try:
        return obj.toJSON()
    except:
        return obj.__dict__
print json.dumps(some_big_object, default=dumper, indent=2)

回答 5

另一个选择是将JSON转储包装在其自己的类中:

import json

class FileItem:
    def __init__(self, fname):
        self.fname = fname

    def __repr__(self):
        return json.dumps(self.__dict__)

或者,更好的是,让 FileItem 类继承自 JsonSerializable 类:

import json

class JsonSerializable(object):
    def toJson(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.toJson()


class FileItem(JsonSerializable):
    def __init__(self, fname):
        self.fname = fname

测试:

>>> f = FileItem('/foo/bar')
>>> f.toJson()
'{"fname": "/foo/bar"}'
>>> f
'{"fname": "/foo/bar"}'
>>> str(f) # string coercion
'{"fname": "/foo/bar"}'

Another option is to wrap JSON dumping in its own class:

import json

class FileItem:
    def __init__(self, fname):
        self.fname = fname

    def __repr__(self):
        return json.dumps(self.__dict__)

Or, even better, subclassing FileItem class from a JsonSerializable class:

import json

class JsonSerializable(object):
    def toJson(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.toJson()


class FileItem(JsonSerializable):
    def __init__(self, fname):
        self.fname = fname

Testing:

>>> f = FileItem('/foo/bar')
>>> f.toJson()
'{"fname": "/foo/bar"}'
>>> f
'{"fname": "/foo/bar"}'
>>> str(f) # string coercion
'{"fname": "/foo/bar"}'

回答 6

只需将to_json方法添加到您的类中,如下所示:

def to_json(self):
  return self.message # or how you want it to be serialized

并将这段代码(来自此答案)添加到所有代码的最前面:

from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder().default
JSONEncoder.default = _default

它会在导入时对 json 模块打 monkey patch,使 JSONEncoder.default() 自动检查特殊的 to_json() 方法,并在找到时用它来编码对象。

就像 Onur 说的那样,但这次您不必更新项目中的每一处 json.dumps()。

Just add to_json method to your class like this:

def to_json(self):
  return self.message # or how you want it to be serialized

And add this code (from this answer), to somewhere at the top of everything:

from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder().default
JSONEncoder.default = _default

This will monkey-patch json module when it’s imported so JSONEncoder.default() automatically checks for a special “to_json()” method and uses it to encode the object if found.

Just like Onur said, but this time you don’t have to update every json.dumps() in your project.


回答 7

前几天我遇到了这个问题,于是为 Python 对象实现了一个更通用的 Encoder 版本,它可以处理嵌套对象和继承的字段:

import json
import inspect

class ObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "to_json"):
            return self.default(obj.to_json())
        elif hasattr(obj, "__dict__"):
            d = dict(
                (key, value)
                for key, value in inspect.getmembers(obj)
                if not key.startswith("__")
                and not inspect.isabstract(value)
                and not inspect.isbuiltin(value)
                and not inspect.isfunction(value)
                and not inspect.isgenerator(value)
                and not inspect.isgeneratorfunction(value)
                and not inspect.ismethod(value)
                and not inspect.ismethoddescriptor(value)
                and not inspect.isroutine(value)
            )
            return self.default(d)
        return obj

例:

class C(object):
    c = "NO"
    def to_json(self):
        return {"c": "YES"}

class B(object):
    b = "B"
    i = "I"
    def __init__(self, y):
        self.y = y

    def f(self):
        print "f"

class A(B):
    a = "A"
    def __init__(self):
        self.b = [{"ab": B("y")}]
        self.c = C()

print json.dumps(A(), cls=ObjectEncoder, indent=2, sort_keys=True)

结果:

{
  "a": "A", 
  "b": [
    {
      "ab": {
        "b": "B", 
        "i": "I", 
        "y": "y"
      }
    }
  ], 
  "c": {
    "c": "YES"
  }, 
  "i": "I"
}

I came across this problem the other day and implemented a more general version of an Encoder for Python objects that can handle nested objects and inherited fields:

import json
import inspect

class ObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "to_json"):
            return self.default(obj.to_json())
        elif hasattr(obj, "__dict__"):
            d = dict(
                (key, value)
                for key, value in inspect.getmembers(obj)
                if not key.startswith("__")
                and not inspect.isabstract(value)
                and not inspect.isbuiltin(value)
                and not inspect.isfunction(value)
                and not inspect.isgenerator(value)
                and not inspect.isgeneratorfunction(value)
                and not inspect.ismethod(value)
                and not inspect.ismethoddescriptor(value)
                and not inspect.isroutine(value)
            )
            return self.default(d)
        return obj

Example:

class C(object):
    c = "NO"
    def to_json(self):
        return {"c": "YES"}

class B(object):
    b = "B"
    i = "I"
    def __init__(self, y):
        self.y = y

    def f(self):
        print "f"

class A(B):
    a = "A"
    def __init__(self):
        self.b = [{"ab": B("y")}]
        self.c = C()

print json.dumps(A(), cls=ObjectEncoder, indent=2, sort_keys=True)

Result:

{
  "a": "A", 
  "b": [
    {
      "ab": {
        "b": "B", 
        "i": "I", 
        "y": "y"
      }
    }
  ], 
  "c": {
    "c": "YES"
  }, 
  "i": "I"
}

回答 8

如果您使用的是Python3.5 +,则可以使用jsons。它将把您的对象(及其所有属性递归地)转换成字典。

import jsons

a_dict = jsons.dump(your_object)

或者,如果您想要一个字符串:

a_str = jsons.dumps(your_object)

或者,如果您的类实现了 jsons.JsonSerializable:

a_dict = your_object.json

If you’re using Python3.5+, you could use jsons. It will convert your object (and all its attributes recursively) to a dict.

import jsons

a_dict = jsons.dump(your_object)

Or if you wanted a string:

a_str = jsons.dumps(your_object)

Or if your class implemented jsons.JsonSerializable:

a_dict = your_object.json

回答 9

import simplejson

class User(object):
    def __init__(self, name, mail):
        self.name = name
        self.mail = mail

    def _asdict(self):
        return self.__dict__

print(simplejson.dumps(User('alice', 'alice@mail.com')))

如果使用标准json,则需要定义一个default函数

import json
def default(o):
    return o._asdict()

print(json.dumps(User('alice', 'alice@mail.com'), default=default))
import simplejson

class User(object):
    def __init__(self, name, mail):
        self.name = name
        self.mail = mail

    def _asdict(self):
        return self.__dict__

print(simplejson.dumps(User('alice', 'alice@mail.com')))

if use standard json, u need to define a default function

import json
def default(o):
    return o._asdict()

print(json.dumps(User('alice', 'alice@mail.com'), default=default))

回答 10

json 能打印的对象类型有限,而 jsonpickle(可能需要 pip install jsonpickle)的限制则是不能缩进文本。如果您想查看一个其类无法修改的对象的内容,我还没找到比下面更直接的方法:

 import json
 import jsonpickle
 ...
 print  json.dumps(json.loads(jsonpickle.encode(object)), indent=2)

注意:即便如此,它们仍然无法打印对象的方法。

json is limited in terms of objects it can print, and jsonpickle (you may need a pip install jsonpickle) is limited in terms it can’t indent text. If you would like to inspect the contents of an object whose class you can’t change, I still couldn’t find a straighter way than:

 import json
 import jsonpickle
 ...
 print  json.dumps(json.loads(jsonpickle.encode(object)), indent=2)

Note: that still they can’t print the object methods.


回答 11

此类可以解决问题,它将对象转换为标准json。

import json


class Serializer(object):
    @staticmethod
    def serialize(object):
        return json.dumps(object, default=lambda o: o.__dict__.values()[0])

用法:

Serializer.serialize(my_object)

适用于 python2.7 和 python3。

This class can do the trick, it converts object to standard json .

import json


class Serializer(object):
    @staticmethod
    def serialize(object):
        return json.dumps(object, default=lambda o: o.__dict__.values()[0])

usage:

Serializer.serialize(my_object)

working in python2.7 and python3.


回答 12

import json

class Foo(object):
    def __init__(self):
        self.bar = 'baz'
        self._qux = 'flub'

    def somemethod(self):
        pass

def default(instance):
    return {k: v
            for k, v in vars(instance).items()
            if not str(k).startswith('_')}

json_foo = json.dumps(Foo(), default=default)
assert '{"bar": "baz"}' == json_foo

print(json_foo)
import json

class Foo(object):
    def __init__(self):
        self.bar = 'baz'
        self._qux = 'flub'

    def somemethod(self):
        pass

def default(instance):
    return {k: v
            for k, v in vars(instance).items()
            if not str(k).startswith('_')}

json_foo = json.dumps(Foo(), default=default)
assert '{"bar": "baz"}' == json_foo

print(json_foo)

回答 13

jaraco给出了一个非常简洁的答案。我需要修复一些小问题,但这可行:

# Your custom class
class MyCustom(object):
    def __json__(self):
        return {
            'a': self.a,
            'b': self.b,
            '__python__': 'mymodule.submodule:MyCustom.from_json',
        }

    to_json = __json__  # supported by simplejson

    @classmethod
    def from_json(cls, json):
        obj = cls()
        obj.a = json['a']
        obj.b = json['b']
        return obj

# Dumping and loading
import simplejson

obj = MyCustom()
obj.a = 3
obj.b = 4

json = simplejson.dumps(obj, for_json=True)

# Two-step loading
obj2_dict = simplejson.loads(json)
obj2 = MyCustom.from_json(obj2_dict)

# Make sure we have the correct thing
assert isinstance(obj2, MyCustom)
assert obj2.__dict__ == obj.__dict__

请注意,我们需要两个步骤进行加载。目前,该__python__属性尚未使用。

这有多普遍?

使用AlJohri的方法,我检查了方法的普及程度:

序列化(Python-> JSON):

反序列化(JSON-> Python):

jaraco gave a pretty neat answer. I needed to fix some minor things, but this works:

Code

# Your custom class
class MyCustom(object):
    def __json__(self):
        return {
            'a': self.a,
            'b': self.b,
            '__python__': 'mymodule.submodule:MyCustom.from_json',
        }

    to_json = __json__  # supported by simplejson

    @classmethod
    def from_json(cls, json):
        obj = cls()
        obj.a = json['a']
        obj.b = json['b']
        return obj

# Dumping and loading
import simplejson

obj = MyCustom()
obj.a = 3
obj.b = 4

json = simplejson.dumps(obj, for_json=True)

# Two-step loading
obj2_dict = simplejson.loads(json)
obj2 = MyCustom.from_json(obj2_dict)

# Make sure we have the correct thing
assert isinstance(obj2, MyCustom)
assert obj2.__dict__ == obj.__dict__

Note that we need two steps for loading. For now, the __python__ property is not used.

How common is this?

Using the method of AlJohri, I check popularity of approaches:

Serialization (Python -> JSON):

Deserialization (JSON -> Python):


回答 14

这对我来说效果很好:

class JsonSerializable(object):

    def serialize(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.serialize()

    @staticmethod
    def dumper(obj):
        if "serialize" in dir(obj):
            return obj.serialize()

        return obj.__dict__

然后

class FileItem(JsonSerializable):
    ...

log.debug(json.dumps(<my object>, default=JsonSerializable.dumper, indent=2))

This has worked well for me:

class JsonSerializable(object):

    def serialize(self):
        return json.dumps(self.__dict__)

    def __repr__(self):
        return self.serialize()

    @staticmethod
    def dumper(obj):
        if "serialize" in dir(obj):
            return obj.serialize()

        return obj.__dict__

and then

class FileItem(JsonSerializable):
    ...

and

log.debug(json.dumps(<my object>, default=JsonSerializable.dumper, indent=2))

回答 15

如果您不介意为其安装软件包,则可以使用json-tricks

pip install json-tricks

在此之后，你只需要从json_tricks（而不是json）导入dump(s)，它通常就能直接工作：

from json_tricks import dumps
json_str = dumps(cls_instance, indent=4)

这会给出

{
        "__instance_type__": [
                "module_name.test_class",
                "MyTestCls"
        ],
        "attributes": {
                "attr": "val",
                "dct_attr": {
                        "hello": 42
                }
        }
}

基本上就是这样!


总的来说，这种方法效果很好。也有一些例外情况，例如__new__中发生了特殊操作，或者用到了更多元类魔法时。

显然,加载也可以(否则有什么意义):

from json_tricks import loads
json_str = loads(json_str)

这确实假定module_name.test_class.MyTestCls可以导入并且没有以不兼容的方式进行更改。您将获得一个实例,而不是字典或其他内容,它应该与您转储的副本相同。

如果要自定义某些东西的序列化方法,可以向类中添加特殊方法,如下所示:

class CustomEncodeCls:
        def __init__(self):
                self.relevant = 42
                self.irrelevant = 37

        def __json_encode__(self):
                # should return primitive, serializable types like dict, list, int, string, float...
                return {'relevant': self.relevant}

        def __json_decode__(self, **attrs):
                # should initialize all properties; note that __init__ is not called implicitly
                self.relevant = attrs['relevant']
                self.irrelevant = 12

在这个例子中，它只序列化了一部分属性。

作为额外的好处，您还可以获得numpy数组、日期和时间、有序映射的（反）序列化，以及在json中包含注释的能力。

免责声明:我创建了json_tricks,因为我和您有同样的问题。

If you don’t mind installing a package for it, you can use json-tricks:

pip install json-tricks

After that you just need to import dump(s) from json_tricks instead of json, and it’ll usually work:

from json_tricks import dumps
json_str = dumps(cls_instance, indent=4)

which’ll give

{
        "__instance_type__": [
                "module_name.test_class",
                "MyTestCls"
        ],
        "attributes": {
                "attr": "val",
                "dct_attr": {
                        "hello": 42
                }
        }
}

And that’s basically it!


This will work great in general. There are some exceptions, e.g. if special things happen in __new__, or more metaclass magic is going on.

Obviously loading also works (otherwise what’s the point):

from json_tricks import loads
json_str = loads(json_str)

This does assume that module_name.test_class.MyTestCls can be imported and hasn’t changed in non-compatible ways. You’ll get back an instance, not some dictionary or something, and it should be an identical copy to the one you dumped.

If you want to customize how something gets (de)serialized, you can add special methods to your class, like so:

class CustomEncodeCls:
        def __init__(self):
                self.relevant = 42
                self.irrelevant = 37

        def __json_encode__(self):
                # should return primitive, serializable types like dict, list, int, string, float...
                return {'relevant': self.relevant}

        def __json_decode__(self, **attrs):
                # should initialize all properties; note that __init__ is not called implicitly
                self.relevant = attrs['relevant']
                self.irrelevant = 12

which serializes only part of the attributes, as an example.

And as a free bonus, you get (de)serialization of numpy arrays, date & times, ordered maps, as well as the ability to include comments in json.

Disclaimer: I created json_tricks, because I had the same problem as you.
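
A minimal round-trip sketch for the custom-encode class above; this assumes json_tricks is installed and that CustomEncodeCls lives in an importable module:

from json_tricks import dumps, loads

original = CustomEncodeCls()
json_str = dumps(original)            # only 'relevant' is written, per __json_encode__
restored = loads(json_str)

assert restored.relevant == 42
assert restored.irrelevant == 12      # reset inside __json_decode__, not round-tripped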


回答 16

jsonweb似乎是对我最好的解决方案。参见http://www.jsonweb.info/en/latest/

from jsonweb.encode import to_object, dumper

@to_object()
class DataModel(object):
    def __init__(self, id, value):
        self.id = id
        self.value = value

>>> data = DataModel(5, "foo")
>>> dumper(data)
'{"__type__": "DataModel", "id": 5, "value": "foo"}'

jsonweb seems to be the best solution for me. See http://www.jsonweb.info/en/latest/

from jsonweb.encode import to_object, dumper

@to_object()
class DataModel(object):
    def __init__(self, id, value):
        self.id = id
        self.value = value

>>> data = DataModel(5, "foo")
>>> dumper(data)
'{"__type__": "DataModel", "id": 5, "value": "foo"}'

回答 17

这是我的3美分……
这演示了对一个树状Python对象进行显式JSON序列化。
注意：如果您真的需要这样的代码，可以使用Twisted的FilePath类。

import json, sys, os

class File:
    def __init__(self, path):
        self.path = path

    def isdir(self):
        return os.path.isdir(self.path)

    def isfile(self):
        return os.path.isfile(self.path)

    def children(self):        
        return [File(os.path.join(self.path, f)) 
                for f in os.listdir(self.path)]

    def getsize(self):        
        return os.path.getsize(self.path)

    def getModificationTime(self):
        return os.path.getmtime(self.path)

def _default(o):
    d = {}
    d['path'] = o.path
    d['isFile'] = o.isfile()
    d['isDir'] = o.isdir()
    d['mtime'] = int(o.getModificationTime())
    d['size'] = o.getsize() if o.isfile() else 0
    if o.isdir(): d['children'] = o.children()
    return d

folder = os.path.abspath('.')
json.dump(File(folder), sys.stdout, default=_default)

Here is my 3 cents …
This demonstrates explicit json serialization for a tree-like python object.
Note: If you actually wanted some code like this you could use the twisted FilePath class.

import json, sys, os

class File:
    def __init__(self, path):
        self.path = path

    def isdir(self):
        return os.path.isdir(self.path)

    def isfile(self):
        return os.path.isfile(self.path)

    def children(self):        
        return [File(os.path.join(self.path, f)) 
                for f in os.listdir(self.path)]

    def getsize(self):        
        return os.path.getsize(self.path)

    def getModificationTime(self):
        return os.path.getmtime(self.path)

def _default(o):
    d = {}
    d['path'] = o.path
    d['isFile'] = o.isfile()
    d['isDir'] = o.isdir()
    d['mtime'] = int(o.getModificationTime())
    d['size'] = o.getsize() if o.isfile() else 0
    if o.isdir(): d['children'] = o.children()
    return d

folder = os.path.abspath('.')
json.dump(File(folder), sys.stdout, default=_default)

回答 18

当我尝试将Peewee的模型存储到PostgreSQL的JSONField中时，我遇到了这个问题。

经过一段时间的努力,这是一般的解决方案。

我的解决方案的关键是浏览Python的源代码,并意识到代码文档(在此描述)已经解释了如何扩展现有的json.dumps以支持其他数据类型。

假设您当前有一个模型,其中包含一些无法序列化为JSON的字段,并且包含JSON字段的模型最初看起来像这样:

class SomeClass(Model):
    json_field = JSONField()

只需这样定义一个自定义JSONEncoder

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SomeTypeUnsupportedByJsonDumps):
            return < whatever value you want >
        return json.JSONEncoder.default(self, obj)

    @staticmethod
    def json_dumper(obj):
        return json.dumps(obj, cls=CustomJsonEncoder)

然后像下面这样在JSONField中使用它：

class SomeClass(Model):
    json_field = JSONField(dumps=CustomJsonEncoder.json_dumper)

关键在于上面的default(self, obj)方法。对于从Python收到的每一个... is not JSON serializable报错，只需添加相应代码来处理这种无法序列化为JSON的类型（例如Enum或datetime）。

例如，下面是我如何支持一个继承自Enum的类：

class TransactionType(Enum):
    CURRENT = 1
    STACKED = 2

class CustomJsonEncoder(json.JSONEncoder):
    # as above, plus a branch for the Enum subclass
    def default(self, obj):
        if isinstance(obj, TransactionType):
            return obj.value
        return json.JSONEncoder.default(self, obj)

最后,使用上述实现的代码,您可以将任何Peewee模型转换为JSON可序列化的对象,如下所示:

peewee_model = WhateverPeeweeModel()
new_model = SomeClass()
new_model.json_field = model_to_dict(peewee_model)

尽管上面的代码（在某种程度上）特定于Peewee，但我认为：

  1. 它通常也适用于其他ORM（例如Django等）
  2. 另外，如果您理解json.dumps的工作原理，该解决方案通常也适用于不使用ORM的普通Python

如有任何疑问,请发表在评论部分。谢谢!

I ran into this problem when I tried to store Peewee’s model into PostgreSQL JSONField.

After struggling for a while, here’s the general solution.

The key to my solution is going through Python’s source code and realizing that the code documentation (described here) already explains how to extend the existing json.dumps to support other data types.

Suppose you current have a model that contains some fields that are not serializable to JSON and the model that contains the JSON field originally looks like this:

class SomeClass(Model):
    json_field = JSONField()

Just define a custom JSONEncoder like this:

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SomeTypeUnsupportedByJsonDumps):
            return < whatever value you want >
        return json.JSONEncoder.default(self, obj)

    @staticmethod
    def json_dumper(obj):
        return json.dumps(obj, cls=CustomJsonEncoder)

And then just use it in your JSONField like below:

class SomeClass(Model):
    json_field = JSONField(dumps=CustomJsonEncoder.json_dumper)

The key is the default(self, obj) method above. For every single ... is not JSON serializable complaint you receive from Python, just add code to handle the unserializable-to-JSON type (such as Enum or datetime)

For example, here’s how I support a class inheriting from Enum:

class TransactionType(Enum):
    CURRENT = 1
    STACKED = 2

class CustomJsonEncoder(json.JSONEncoder):
    # as above, plus a branch for the Enum subclass
    def default(self, obj):
        if isinstance(obj, TransactionType):
            return obj.value
        return json.JSONEncoder.default(self, obj)

Finally, with the code implemented like above, you can just convert any Peewee models to be a JSON-seriazable object like below:

peewee_model = WhateverPeeweeModel()
new_model = SomeClass()
new_model.json_field = model_to_dict(peewee_model)

Though the code above is (somewhat) specific to Peewee, I think:

  1. It's applicable to other ORMs (Django, etc.) in general
  2. Also, if you understand how json.dumps works, this solution works with plain Python (sans ORM) in general too

Any questions, please post in the comments section. Thanks!
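
For completeness, a hedged sketch of how the same default() hook extends to datetime, and how the encoder plugs straight into json.dumps (the example values are mine, not from the answer):

import json
from datetime import datetime

class CustomJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()            # datetime -> ISO 8601 string
        return json.JSONEncoder.default(self, obj)

record = {"created": datetime(2020, 1, 1, 12, 0), "greet": "Hello"}
print(json.dumps(record, cls=CustomJsonEncoder))
# {"created": "2020-01-01T12:00:00", "greet": "Hello"}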


回答 19

此函数使用递归遍历字典的每个部分，然后对非内置类型的对象调用其repr()方法。

def sterilize(obj):
    object_type = type(obj)
    if isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif object_type in (list, tuple):
        return [sterilize(v) for v in obj]
    elif object_type in (str, int, bool):
        return obj
    else:
        return obj.__repr__()

This function uses recursion to iterate over every part of the dictionary and then calls the repr() methods of classes that are not built-in types.

def sterilize(obj):
    object_type = type(obj)
    if isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif object_type in (list, tuple):
        return [sterilize(v) for v in obj]
    elif object_type in (str, int, bool):
        return obj
    else:
        return obj.__repr__()
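
A short usage sketch with my own example class, showing that values of non-built-in types end up as their repr() strings in the dumped JSON:

import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

data = {"points": [Point(1, 2), Point(3, 4)], "label": "demo"}
print(json.dumps(sterilize(data)))
# {"points": ["Point(1, 2)", "Point(3, 4)"], "label": "demo"}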

回答 20

这是一个小型库，它可以将一个对象连同其所有子对象序列化为JSON，并能将其解析回来：

https://github.com/Toubs/PyJSONSerialization/

This is a small library that serializes an object with all its children to JSON and also parses it back:

https://github.com/Toubs/PyJSONSerialization/


回答 21

我想出了自己的解决方案。使用此方法，传递任何文档（dict、list、ObjectId等）进行序列化。

def getSerializable(doc):
    # check if it's a list
    if isinstance(doc, list):
        for i, val in enumerate(doc):
            doc[i] = getSerializable(doc[i])
        return doc

    # check if it's a dict
    if isinstance(doc, dict):
        for key in doc.keys():
            doc[key] = getSerializable(doc[key])
        return doc

    # Process ObjectId
    if isinstance(doc, ObjectId):
        doc = str(doc)
        return doc

    # Use any other custom serializting stuff here...

    # For the rest of stuff
    return doc

I came up with my own solution. Use this method, pass any document (dict,list, ObjectId etc) to serialize.

def getSerializable(doc):
    # check if it's a list
    if isinstance(doc, list):
        for i, val in enumerate(doc):
            doc[i] = getSerializable(doc[i])
        return doc

    # check if it's a dict
    if isinstance(doc, dict):
        for key in doc.keys():
            doc[key] = getSerializable(doc[key])
        return doc

    # Process ObjectId
    if isinstance(doc, ObjectId):
        doc = str(doc)
        return doc

    # Use any other custom serializting stuff here...

    # For the rest of stuff
    return doc
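
A usage sketch with a MongoDB-style document; it assumes bson.ObjectId (shipped with pymongo), and any other value simply passes through unchanged:

import json
from bson import ObjectId

doc = {
    "_id": ObjectId(),
    "tags": ["a", "b"],
    "nested": {"ref": ObjectId()},
}
print(json.dumps(getSerializable(doc)))   # both ObjectIds are now plain strings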

回答 22

我选择使用装饰器解决datetime对象序列化问题。这是我的代码:

#myjson.py
#Author: jmooremcc 7/16/2017

import json
from datetime import datetime, date, time, timedelta
"""
This module uses decorators to serialize date objects using json
The filename is myjson.py
In another module you simply add the following import statement:
    from myjson import json

json.dumps and json.dump will then correctly serialize datetime and date 
objects
"""

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, (datetime, date)):
        serial = str(obj)
        return serial
    raise TypeError ("Type %s not serializable" % type(obj))


def FixDumps(fn):
    def hook(obj):
        return fn(obj, default=json_serial)

    return hook

def FixDump(fn):
    def hook(obj, fp):
        return fn(obj,fp, default=json_serial)

    return hook


json.dumps=FixDumps(json.dumps)
json.dump=FixDump(json.dump)


if __name__=="__main__":
    today=datetime.now()
    data={'atime':today, 'greet':'Hello'}
    json_str = json.dumps(data)
    print(json_str)

通过导入上述模块，我的其他模块就能以常规方式（无需指定default关键字参数）使用json来序列化包含datetime对象的数据。datetime序列化代码会被json.dumps和json.dump自动调用。

I chose to use decorators to solve the datetime object serialization problem. Here is my code:

#myjson.py
#Author: jmooremcc 7/16/2017

import json
from datetime import datetime, date, time, timedelta
"""
This module uses decorators to serialize date objects using json
The filename is myjson.py
In another module you simply add the following import statement:
    from myjson import json

json.dumps and json.dump will then correctly serialize datetime and date 
objects
"""

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""

    if isinstance(obj, (datetime, date)):
        serial = str(obj)
        return serial
    raise TypeError ("Type %s not serializable" % type(obj))


def FixDumps(fn):
    def hook(obj):
        return fn(obj, default=json_serial)

    return hook

def FixDump(fn):
    def hook(obj, fp):
        return fn(obj,fp, default=json_serial)

    return hook


json.dumps=FixDumps(json.dumps)
json.dump=FixDump(json.dump)


if __name__=="__main__":
    today=datetime.now()
    data={'atime':today, 'greet':'Hello'}
    json_str = json.dumps(data)
    print(json_str)

By importing the above module, my other modules use json in a normal way (without specifying the default keyword) to serialize data that contains date time objects. The datetime serializer code is automatically called for json.dumps and json.dump.
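
A brief usage sketch from another module (assuming the file above is saved as myjson.py somewhere on the import path):

# another_module.py (hypothetical)
from myjson import json              # the patched json from myjson.py
from datetime import datetime

payload = {"atime": datetime.now(), "greet": "Hello"}
print(json.dumps(payload))           # the datetime is handled by json_serial automatically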


回答 23

我最喜欢Lost Koder的方法。尝试序列化其成员/方法无法序列化的更复杂对象时,我遇到了问题。这是适用于更多对象的实现:

class Serializer(object):
    @staticmethod
    def serialize(obj):
        def check(o):
            for k, v in o.__dict__.items():
                try:
                    _ = json.dumps(v)
                    o.__dict__[k] = v
                except TypeError:
                    o.__dict__[k] = str(v)
            return o
        return json.dumps(check(obj).__dict__, indent=2)

I liked Lost Koder's method the most. I ran into issues when trying to serialize more complex objects whose members/methods aren't serializable. Here's my implementation that works on more objects:

class Serializer(object):
    @staticmethod
    def serialize(obj):
        def check(o):
            for k, v in o.__dict__.items():
                try:
                    _ = json.dumps(v)
                    o.__dict__[k] = v
                except TypeError:
                    o.__dict__[k] = str(v)
            return o
        return json.dumps(check(obj).__dict__, indent=2)

回答 24

如果您能够安装软件包，我建议您尝试dill，它在我的项目中运行良好。这个包的优点是它具有与pickle相同的接口，因此如果您的项目中已经在使用pickle，只需简单地换成dill，看看脚本是否还能运行，而无需更改任何其他代码。所以这是一个尝试成本非常低的解决方案！

（完全的“反披露”：我与dill项目毫无关系，也从未为其做过贡献。）

安装软件包:

pip install dill

然后，编辑您的代码，改为导入dill而不是pickle：

# import pickle
import dill as pickle

运行您的脚本，看看是否有效。（如果有效，您之后可能需要清理一下代码，不要再用dill遮蔽pickle这个模块名！）

以下是来自dill项目页面的说明，列出了dill可以和不能序列化的数据类型的一些细节：

dill 可以序列化（pickle）以下标准类型：

none、type、bool、int、long、float、complex、str、unicode、tuple、list、dict、file、buffer、builtin、新式和旧式类、新式和旧式类的实例、set、frozenset、array、函数、异常

dill 还可以序列化一些更“另类”的标准类型：

带有yield的函数、嵌套函数、lambda、cell、method、unboundmethod、module、code、methodwrapper、dictproxy、methoddescriptor、getsetdescriptor、memberdescriptor、wrapperdescriptor、xrange、slice、notimplemented、ellipsis、quit

dill 尚不能序列化这些标准类型：

frame、generator、traceback

If you are able to install a package, I’d recommend trying dill, which worked just fine for my project. A nice thing about this package is that it has the same interface as pickle, so if you have already been using pickle in your project you can simply substitute in dill and see if the script runs, without changing any code. So it is a very cheap solution to try!

(Full anti-disclosure: I am in no way affiliated with and have never contributed to the dill project.)

Install the package:

pip install dill

Then edit your code to import dill instead of pickle:

# import pickle
import dill as pickle

Run your script and see if it works. (If it does you may want to clean up your code so that you are no longer shadowing the pickle module name!)

Some specifics on datatypes that dill can and cannot serialize, from the project page:

dill can pickle the following standard types:

none, type, bool, int, long, float, complex, str, unicode, tuple, list, dict, file, buffer, builtin, both old and new style classes, instances of old and new style classes, set, frozenset, array, functions, exceptions

dill can also pickle more ‘exotic’ standard types:

functions with yields, nested functions, lambdas, cell, method, unboundmethod, module, code, methodwrapper, dictproxy, methoddescriptor, getsetdescriptor, memberdescriptor, wrapperdescriptor, xrange, slice, notimplemented, ellipsis, quit

dill cannot yet pickle these standard types:

frame, generator, traceback
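
A quick round-trip sketch with the kind of object dill handles but plain pickle typically rejects (a lambda), assuming dill is installed:

import dill as pickle

square = lambda x: x * x
blob = pickle.dumps(square)           # the standard pickle module would fail on a lambda
restored = pickle.loads(blob)
assert restored(4) == 16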


回答 25

我发现这里没有人提到序列化的版本控制或向后兼容，所以我发布一下我已经使用了一段时间的解决方案。我可能还有很多要学的地方，尤其是Java和Javascript社区在这方面可能比我更成熟，但还是分享如下：

https://gist.github.com/andy-d/b7878d0044a4242c0498ed6d67fd50fe

I see no mention here of serial versioning or backcompat, so I will post my solution which I’ve been using for a bit. I probably have a lot more to learn; specifically, Java and Javascript are probably more mature than me here, but here goes:

https://gist.github.com/andy-d/b7878d0044a4242c0498ed6d67fd50fe


回答 26

要添加另一个选项:您可以使用attrs包和asdict方法。

class ObjectEncoder(JSONEncoder):
    def default(self, o):
        return attr.asdict(o)

json.dumps(objects, cls=ObjectEncoder)

并转换回来

def from_json(o):
    if '_obj_name' in o:
        type_ = o['_obj_name']
        del o['_obj_name']
        return globals()[type_](**o)
    else:
        return o

data = JSONDecoder(object_hook=from_json).decode(data)

类看起来像这样

@attr.s
class Foo(object):
    x = attr.ib()
    _obj_name = attr.ib(init=False, default='Foo')

To add another option: You can use the attrs package and the asdict method.

class ObjectEncoder(JSONEncoder):
    def default(self, o):
        return attr.asdict(o)

json.dumps(objects, cls=ObjectEncoder)

and to convert back

def from_json(o):
    if '_obj_name' in o:
        type_ = o['_obj_name']
        del o['_obj_name']
        return globals()[type_](**o)
    else:
        return o

data = JSONDecoder(object_hook=from_json).decode(data)

class looks like this

@attr.s
class Foo(object):
    x = attr.ib()
    _obj_name = attr.ib(init=False, default='Foo')
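
A hedged round-trip sketch tying the three snippets above together (imports added; Foo, ObjectEncoder and from_json are as defined above and must live at module scope so globals() can find the class):

import json
from json import JSONDecoder

foo = Foo(42)
data = json.dumps(foo, cls=ObjectEncoder)              # '{"x": 42, "_obj_name": "Foo"}'
restored = JSONDecoder(object_hook=from_json).decode(data)
assert restored == Foo(42)                             # attrs generates __eq__ for us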

回答 27

除了Onur的答案外，您可能还需要像下面这样处理datetime类型
（以处理 'datetime.datetime' object has no attribute '__dict__' 这一异常）。

def datetime_option(value):
    if isinstance(value, datetime.date):
        return value.timestamp()
    else:
        return value.__dict__

用法:

def toJSON(self):
    return json.dumps(self, default=datetime_option, sort_keys=True, indent=4)

In addition to the Onur’s answer, You possibly want to deal with datetime type like below.
(in order to handle the ‘datetime.datetime’ object has no attribute ‘__dict__’ exception.)

def datetime_option(value):
    if isinstance(value, datetime.date):
        return value.timestamp()
    else:
        return value.__dict__

Usage:

def toJSON(self):
    return json.dumps(self, default=datetime_option, sort_keys=True, indent=4)

回答 28

首先，我们需要让对象符合JSON规范，这样才能用标准json模块将其转储。我是这样做的：

def serialize(o):
    if isinstance(o, dict):
        return {k:serialize(v) for k,v in o.items()}
    if isinstance(o, list):
        return [serialize(e) for e in o]
    if isinstance(o, bytes):
        return o.decode("utf-8")
    return o

First we need to make our object JSON-compliant, so we can dump it using the standard JSON module. I did it this way:

def serialize(o):
    if isinstance(o, dict):
        return {k:serialize(v) for k,v in o.items()}
    if isinstance(o, list):
        return [serialize(e) for e in o]
    if isinstance(o, bytes):
        return o.decode("utf-8")
    return o
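
A brief usage sketch (my own sample data) showing the bytes-to-str pass before handing the result to json.dumps:

import json

raw = {"name": b"payload", "chunks": [b"ab", "plain"]}
print(json.dumps(serialize(raw)))
# {"name": "payload", "chunks": ["ab", "plain"]}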

回答 29

基于Quinten Cabo答案

def sterilize(obj):
    if type(obj) in (str, float, int, bool, type(None)):
        return obj
    elif isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return [sterilize(v) for v in obj]
    elif hasattr(obj, '__dict__'):
        return {k: sterilize(v) for k, v in obj.__dict__.items() if k not in ['__module__', '__dict__', '__weakref__', '__doc__']}
    else:
        return repr(obj)

区别在于：

  1. 适用于任何可迭代对象，而不仅仅是list和tuple（例如也适用于NumPy数组等）。
  2. 适用于动态类型（即包含__dict__的对象）。
  3. 保留原生类型float和None，因此它们不会被转换成字符串。

处理__slots__、既可迭代又带有成员的类、既是字典又带有成员的类等情况，留作读者练习。

Building on Quinten Cabo‘s answer:

def sterilize(obj):
    if type(obj) in (str, float, int, bool, type(None)):
        return obj
    elif isinstance(obj, dict):
        return {k: sterilize(v) for k, v in obj.items()}
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return [sterilize(v) for v in obj]
    elif hasattr(obj, '__dict__'):
        return {k: sterilize(v) for k, v in obj.__dict__.items() if k not in ['__module__', '__dict__', '__weakref__', '__doc__']}
    else:
        return repr(obj)

The differences are

  1. Works for any iterable instead of just list and tuple (it works for NumPy arrays, etc.)
  2. Works for dynamic types (ones that contain a __dict__).
  3. Includes native types float and None so they don’t get converted to string.

Left as an exercise to the reader is to handle __slots__, classes that are both iterable and have members, classes that are dictionaries and also have members, etc.


Flatbuffers-FlatBuffers:内存效率高的序列化库

Flatbuffers

Flatbuffers是一个跨平台的序列化库，旨在实现最高的内存效率。它允许您直接访问序列化数据，而无需先对其进行解析/解包，同时仍具有很好的向前/向后兼容性。

请访问我们的landing page来浏览我们的文档。

支持的操作系统

  • Windows
  • MacOS X
  • Linux
  • Android
  • 以及任何带有较新C++编译器的其他操作系统

支持的编程语言

  • C++
  • C#
  • C
  • Go
  • Java
  • JavaScript
  • PHP
  • Python
  • Rust

还有更多语言的支持正在开发中。

贡献

要为这个项目做贡献，请参见CONTRIBUTING。

安全性

要报告漏洞，请参阅我们的Security Policy。

许可

FlatBuffers按照Apache License 2.0版进行许可。完整的许可证文本请参见LICENSE。