In Python, what is the difference between json.load() and json.loads()?

I guess that the load() function must be used with a file object (so I need to use a context manager), while the loads() function takes the path to the file as a string. It is a bit confusing.

Does the letter “s” in json.loads() stand for string?

Thanks a lot for your answers!

Yes, s stands for string. The json.loads function does not take the file path, but the file contents as a string. Look at the documentation at https://docs.python.org/2/library/json.html!

Just going to add a simple example to what everyone has explained,


json.load can deserialize a file itself i.e. it accepts a file object, for example,

# open a json file for reading and print content using json.load
with open("/xyz/json_data.json", "r") as content:
    print(json.load(content))

will output,

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I use json.loads to open a file instead,

# you cannot use json.loads on file object
with open("json_data.json", "r") as content:
    print(json.loads(content))

I would get this error:

TypeError: expected string or buffer


json.loads() deserializes a string.

So in order to use json.loads, I have to pass the content of the file using the read() function. Using content.read() with json.loads() returns the content of the file:

with open("json_data.json", "r") as content:
    print(json.loads(content.read()))


{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

That's because the type of content.read() is string, i.e. <type 'str'>.

If I use json.load() with content.read(), I get an error:

with open("json_data.json", "r") as content:
    print(json.load(content.read()))

AttributeError: 'str' object has no attribute 'read'

So now you know: json.load deserializes a file and json.loads deserializes a string.

Another example:

sys.stdin returns a file object, so if I do print(json.load(sys.stdin)), I get the actual JSON data:

cat json_data.json | ./test.py

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.
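For anyone on Python 3, here is a minimal self-contained sketch of the same distinction. io.StringIO stands in for an opened file (an assumption, so the snippet runs without a file on disk), and the sample document is the one printed above.

```python
import io
import json

text = '{"event": {"id": "5206c7e2-da67-42da-9341-6ea403c632c7", "name": "Sufiyan Ghori"}}'

# io.StringIO behaves like an open text file: it has a .read() method.
fake_file = io.StringIO(text)

from_file = json.load(fake_file)   # file-like object -> json.load
from_string = json.loads(text)     # str -> json.loads

print(from_file == from_string)    # both calls give the same dict
print(from_string["event"]["name"])
```

Swapping the two arguments reproduces the errors shown above: json.loads rejects the file-like object, and json.load fails because a str has no read() method.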

Documentation is quite clear: https://docs.python.org/2/library/json.html

json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object using this conversion table.

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize s (a str or unicode instance containing a JSON document) to a Python object using this conversion table.

So load is for a file, and loads is for a string.

QUICK ANSWER (very simplified!)

json.load() takes a FILE

json.load() expects a file (file object) – e.g. a file you opened before given by filepath like 'files/example.json'.

json.loads() takes a STRING

json.loads() expects a (valid) JSON string – i.e. {"foo": "bar"}


Assuming you have a file example.json with this content: { "key_1": 1, "key_2": "foo", "Key_3": null }

>>> import json
>>> file = open("example.json")

>>> type(file)
<class '_io.TextIOWrapper'>

>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>

>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}

>>> json.loads(file)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper

>>> string = '{"foo": "bar"}'

>>> type(string)
<class 'str'>

>>> string
'{"foo": "bar"}'

>>> json.loads(string)
{'foo': 'bar'}

>>> json.load(string)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

The json.load() method (without “s” in “load”) can read a file directly:

import json
with open('strings.json') as f:
    d = json.load(f)

The json.loads() method is used for string arguments only:

import json

person = '{"name": "Bob", "languages": ["English", "French"]}'
print(type(person))
# Output: <class 'str'>

person_dict = json.loads(person)
print(person_dict)
# Output: {'name': 'Bob', 'languages': ['English', 'French']}

print(type(person_dict))
# Output: <class 'dict'>

Here we can see that loads() takes a string (type str) as input and returns a dictionary.

In Python 3.7.7, json.load is defined as follows in the CPython source code:

def load(fp, *, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):

    return loads(fp.read(),
        cls=cls, object_hook=object_hook,
        parse_float=parse_float, parse_int=parse_int,
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

json.load actually calls json.loads, using fp.read() as the first argument.

So if your code is:

with open(file) as fp:
    s = fp.read()
    json.loads(s)

it is the same as doing this:

with open(file) as fp:
    json.load(fp)

But if you need to limit the number of bytes read from the file, as in fp.read(10), or the string/bytes you want to deserialize does not come from a file, you should use json.loads().

As for json.loads(), it deserializes not only strings but also bytes. If s is bytes or bytearray, it is decoded to a string first. You can also find this in the source code:

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.
    ...
    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')
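Both points can be checked with a short sketch: json.loads accepts bytes directly, and json.load gives the same result as calling json.loads on fp.read(). The sample bytes and the io.StringIO stand-in for a file are assumptions made for illustration.

```python
import io
import json

raw = b'{"name": "Bob", "id": 1}'   # bytes, e.g. as received from a socket

# json.loads accepts bytes on Python 3.6+ and decodes them first.
from_bytes = json.loads(raw)

# json.load(fp) is json.loads(fp.read()), so the two results agree.
fp = io.StringIO(raw.decode("utf-8"))
from_file = json.load(fp)

print(from_bytes == from_file)
```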




I searched the official documentation to find the difference between json.dump() and json.dumps() in Python. It is clear that they are related to the file-writing option.
But what is the detailed difference between them, and in what situations does one have an advantage over the other?

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.


json.dump():

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object).

If ensure_ascii is False, some chunks written to fp may be unicode instances.

json.dumps():

Serialize obj to a JSON formatted str.

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

The functions with an s take string parameters. The others take file streams.
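A minimal sketch of the two calls side by side. The sample dict is made up, and io.StringIO stands in for a file opened for writing.

```python
import io
import json

data = {"name": "Bob", "languages": ["English", "French"]}

# dumps returns the JSON text as a str.
as_string = json.dumps(data)

# dump writes the JSON text to any object with a .write() method.
buffer = io.StringIO()
json.dump(data, buffer)

print(as_string)
print(buffer.getvalue() == as_string)   # same text either way
```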

In memory usage and speed.

When you call jsonstr = json.dumps(mydata), it first builds the complete JSON string in memory, and only then do you file.write(jsonstr) it to disk. So this is the faster method, but it can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) (without the 's'), no such full copy is held in memory, as the data is dumped in chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.
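If you want to repeat the measurement yourself, something like the sketch below should do. The sample data is hypothetical, timeit is used here in place of time.time(), and io.StringIO replaces a real file, so the numbers will differ from a run that writes to disk.

```python
import io
import json
import timeit

# Hypothetical sample data, sized only so that the timing is measurable.
mydata = [{"id": i, "name": "item %d" % i} for i in range(2000)]

def with_dumps():
    out = io.StringIO()
    out.write(json.dumps(mydata))   # whole JSON string is built first
    return out.getvalue()

def with_dump():
    out = io.StringIO()
    json.dump(mydata, out)          # written to the stream chunk by chunk
    return out.getvalue()

print("dumps + write:", timeit.timeit(with_dumps, number=5))
print("dump:         ", timeit.timeit(with_dump, number=5))
```

Both functions produce the same text; only the timing and peak memory differ.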

One notable difference in Python 2 is that if you're using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8).

dumps on the other hand, with ensure_ascii=False can produce a str or unicode just depending on what types you used for strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into a file without checking which type was returned and possibly playing with unicode.encode.

This is of course no longer a valid concern in Python 3, since this 8-bit/Unicode confusion no longer exists.

As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline-delimited JSON documents from a single file.
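For newline-delimited input, one common workaround is to call json.loads on each line. A small sketch, with io.StringIO standing in for the file and two made-up documents:

```python
import io
import json

text = '{"id": 1}\n{"id": 2}\n'   # two JSON documents, one per line

# json.load treats the whole stream as one document and fails here.
try:
    json.load(io.StringIO(text))
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing line by line with json.loads works.
documents = []
for line in io.StringIO(text):
    line = line.strip()
    if line:                       # skip blank lines
        documents.append(json.loads(line))

print(whole_file_ok)
print(documents)
```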




I have a bunch of JSON data from Facebook posts like the one below:

{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}

The JSON data is semi-structured, and not every post has the same fields. Below is my code:

import json 

str = '{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}'
data = json.loads(str)

post_id = data['id']
post_type = data['type']

created_time = data['created_time']
updated_time = data['updated_time']

if data.get('application'):
    app_id = data['application'].get('id', 0)

#if data.get('to'):
#... This is the part I am not sure how to do
# Since it is in the form "to": {"data":[{"id":...}]}

I want the code to print the to_id as 1543 if it is present, and otherwise print 'null'.

I am not sure how to do this.

import json

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    if 'to' not in data:
        raise ValueError("No target in given data")
    if 'data' not in data['to']:
        raise ValueError("No data for target")

    for dest in data['to']['data']:
        if 'id' not in dest:
            continue
        targetId = dest['id']
        print("to_id:", targetId)


In [9]: getTargetIds(jsonData)
to_id: 1543

If all you want is to check if key exists or not

h = {'a': 1}
'b' in h # returns False

If you want to check if there is a value for key

h.get('b') # returns None

Return a default value if actual value is missing

h.get('b', 'Default value')

It is a good practice to create helper utility methods for things like that so that whenever you need to change the logic of attribute validation it would be in one place, and the code will be more readable for the followers.

For example create a helper method (or class JsonUtils with static methods) in json_utils.py:

def get_attribute(data, attribute, default_value):
    return data.get(attribute) or default_value

and then use it in your project:

from json_utils import get_attribute

def my_cool_iteration_func(data):

    data_to = get_attribute(data, 'to', None)
    if not data_to:
        return

    data_to_data = get_attribute(data_to, 'data', [])
    for item in data_to_data:
        print('The id is: %s' % get_attribute(item, 'id', 'null'))


There is a reason I am using data.get(attribute) or default_value instead of simply data.get(attribute, default_value):

{'my_key': None}.get('my_key', 'nothing') # returns None
{'my_key': None}.get('my_key') or 'nothing' # returns 'nothing'

In my applications getting attribute with value ‘null’ is the same as not getting the attribute at all. If your usage is different, you need to change this.
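One thing to watch with the "or" form: it also replaces other falsy values such as 0, '' or an empty list, not just None. The sketch below compares it with a stricter variant; get_attribute_strict and the sample post are hypothetical names made up for this illustration.

```python
def get_attribute(data, attribute, default_value):
    return data.get(attribute) or default_value

def get_attribute_strict(data, attribute, default_value):
    """Hypothetical variant: only None or a missing key triggers the default."""
    value = data.get(attribute)
    return default_value if value is None else value

post = {"comments": 0, "message": None}

print(get_attribute(post, "comments", "n/a"))         # 0 is falsy, so the default wins
print(get_attribute_strict(post, "comments", "n/a"))  # keeps the 0
print(get_attribute_strict(post, "message", "n/a"))   # None still gets the default
```

Which behaviour is right depends on whether 0 or an empty string is meaningful in your data.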

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}, {"name": "Joe Schmoe"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        print("to_id:", dest.get('id', 'null'))

Try it:

>>> getTargetIds(jsonData)
to_id: 1543
to_id: null

Or, if you just want to skip over values missing ids instead of printing 'null':

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        if 'id' in dest:
            print("to_id:", dest['id'])


>>> getTargetIds(jsonData)
to_id: 1543

Of course in real life, you probably don’t want to print each id, but to store them and do something with them, but that’s another issue.
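As a rough sketch of the "store them" variant: the function below returns the ids as a list instead of printing them. The name collect_target_ids and the shortened sample document are made up for illustration, and the .get defaults are an added assumption so that posts without "to" or "data" give an empty list.

```python
import json

jsonData = """{"to": {"data": [{"id": "1543", "name": "Honey Pinter"}, {"name": "Joe Schmoe"}]}, "type": "status", "id": "id_7"}"""

def collect_target_ids(json_text):
    """Return the ids found under to -> data, skipping entries without one."""
    data = json.loads(json_text)
    # .get with a default keeps this safe when "to" or "data" is missing.
    recipients = data.get("to", {}).get("data", [])
    return [dest["id"] for dest in recipients if "id" in dest]

print(collect_target_ids(jsonData))
print(collect_target_ids('{"type": "status"}'))
```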

if "my_data" in my_json_data:
    print(json.dumps(my_json_data["my_data"]))

I wrote a tiny function for this purpose. Feel free to repurpose it:

def is_json_key_present(json, key):
    try:
        buf = json[key]
    except KeyError:
        return False

    return True

I have found that when the following is run, python’s json module (included since 2.6) converts int dictionary keys to strings.

>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases)
'{"1": "foo-v0.1"}'

Is there any easy way to preserve the key as an int, without needing to parse the string on dump and load? I believe it would be possible using the hooks provided by the json module, but again this still requires parsing. Is there possibly an argument I have overlooked? cheers, chaz

Sub-question: Thanks for the answers. Seeing as json works as I feared, is there an easy way to convey key type by maybe parsing the output of dumps? Also I should note the code doing the dumping and the code downloading the json object from a server and loading it, are both written by me.

This is one of those subtle differences among various mapping collections that can bite you. JSON treats keys as strings; Python supports distinct keys differing only in type.

In Python (and apparently in Lua) the keys to a mapping (dictionary or table, respectively) are object references. In Python they must be immutable types, or they must be objects which implement a __hash__ method. (The Lua docs suggest that it automatically uses the object’s ID as a hash/key even for mutable objects and relies on string interning to ensure that equivalent strings map to the same objects).

In Perl, Javascript, awk and many other languages the keys for hashes, associative arrays or whatever they’re called for the given language, are strings (or “scalars” in Perl). In perl $foo{1}, $foo{1.0}, and $foo{"1"} are all references to the same mapping in %foo — the key is evaluated as a scalar!

JSON started as a Javascript serialization technology. (JSON stands for JavaScript Object Notation.) Naturally it implements semantics for its mapping notation which are consistent with its mapping semantics.

If both ends of your serialization are going to be Python then you'd be better off using pickles. If you really need to convert these back from JSON into native Python objects I guess you have a couple of choices. First, you could try (try: ... except: ...) to convert any key to a number in the event of a dictionary look-up failure. Alternatively, if you add code to the other end (the serializer or generator of this JSON data), then you could have it perform a JSON serialization on each of the key values, providing those as a list of keys. (Then your Python code would first iterate over the list of keys, instantiating/deserializing them into native Python objects, and then use those to access the values out of the mapping.)
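To see how this can bite, here is a small sketch of a round trip through the json module. The mixed dict is a made-up example of two keys that differ only in type.

```python
import json

releases = {1: "foo-v0.1"}
round_tripped = json.loads(json.dumps(releases))
print(round_tripped)            # the key comes back as the string "1"

# Two Python keys that differ only in type end up as the same JSON key.
mixed = {1: "int key", "1": "str key"}
print(json.dumps(mixed))
print(json.loads(json.dumps(mixed)))
```

After the round trip only one of the two entries in mixed survives, because both keys are written as "1".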

No, there is no such thing as a Number key in JavaScript. All object properties are converted to String.

var a= {1: 'a'};
for (k in a)
    alert(typeof k); // 'string'

This can lead to some curious-seeming behaviours:

a[999999999999999999999]= 'a'; // this even works on Array
alert(a[1000000000000000000000]); // 'a'
alert(a['999999999999999999999']); // fail
alert(a['1e+21']); // 'a'

JavaScript Objects aren’t really proper mappings as you’d understand it in languages like Python, and using keys that aren’t String results in weirdness. This is why JSON always explicitly writes keys as strings, even where it doesn’t look necessary.

Alternatively, you can convert the dictionary to a list in [(k1, v1), (k2, v2)] format before encoding it as JSON, and convert it back to a dictionary after decoding it.

>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(list(releases.items()))
'[[1, "foo-v0.1"]]'
>>> releases == dict(json.loads(json.dumps(list(releases.items()))))
True

I believe this will need some more work, like having some sort of flag to identify which parameters are to be converted back to a dictionary after decoding from JSON.

Answering your subquestion:

It can be accomplished by using json.loads(jsonDict, object_hook=jsonKeys2int):

def jsonKeys2int(x):
    if isinstance(x, dict):
        return {int(k): v for k, v in x.items()}
    return x

This function will also work for nested dicts and uses a dict comprehension.

If you want to cast the values too, use:

def jsonKV2int(x):
    if isinstance(x, dict):
        return {int(k): (int(v) if isinstance(v, unicode) else v) for k, v in x.items()}
    return x

This tests the type of each value and casts it only if it is a string object (unicode, to be exact, so this version is Python 2 only; on Python 3 the check would be against str).

Both functions assume the keys (and values) to be integers.

Thanks to:

How to use if/else in a dictionary comprehension?

Convert a string key to int in a Dictionary

I've gotten bitten by the same problem. As others have pointed out, in JSON, the mapping keys must be strings. You can do one of two things. You can use a less strict JSON library, like demjson, which allows integer keys. If no other programs (or none written in other languages) are going to read it, then you should be okay. Or you can use a different serialization language. I wouldn't suggest pickle. It's hard to read, and is not designed to be secure. Instead, I'd suggest YAML, which is (nearly) a superset of JSON, and does allow integer keys. (At least PyYAML does.)

Answer 5

Convert the dictionary to a string by using str(dict), and then convert it back to a dict by doing this:

import ast
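The snippet above stops at the import. Presumably the rest uses ast.literal_eval, so here is a minimal sketch under that assumption; the sample dict is made up.

```python
import ast

releases = {1: "foo-v0.1", 2: {3: "nested"}}

as_text = str(releases)               # Python literal syntax, not JSON
restored = ast.literal_eval(as_text)  # evaluates literals only, never arbitrary code

print(restored == releases)
print(type(next(iter(restored))))     # the first key is still an int
```

Note that the text produced by str() is not valid JSON, so this only helps when both ends are Python.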

Here is my solution! I used object_hook, it is useful when you have nested json

>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})

>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}

The filter here only converts JSON keys to int. You can apply the same filter, int(v) if v.lstrip('-').isdigit() else v, to JSON string values too.

I made a very simple extension of Murmel’s answer which I think will work on a pretty arbitrary dictionary (including nested) assuming it can be dumped by JSON in the first place. Any keys which can be interpreted as integers will be cast to int. No doubt this is not very efficient, but it works for my purposes of storing to and loading from json strings.

def convert_keys_to_int(d: dict):
    new_dict = {}
    for k, v in d.items():
        try:
            new_key = int(k)
        except ValueError:
            new_key = k
        if type(v) == dict:
            v = convert_keys_to_int(v)
        new_dict[new_key] = v
    return new_dict

Assuming that every key in the original dict that can be cast to int is in fact an integer, this will return the original dictionary after a round trip through JSON, e.g.

>>> d = {1: 3, 2: 'a', 3: {1: 'a', 2: 10}, 4: {'a': 2, 'b': 10}}
>>> convert_keys_to_int(json.loads(json.dumps(d))) == d
True

You can write your own json.dumps; here is an example from djson: encoder.py. You can use it like this:

assert dumps({1: "abc"}) == '{1: "abc"}'




Python code to load data from some long complicated JSON file:

with open(filename, "r") as f:
  data = json.loads(f.read())

(note: the best code version should be:

with open(filename, "r") as f:
  data = json.load(f)

but both exhibit similar behavior)

For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.

However, for other types of JSON error (including the classic “using comma on the last item in a list”, but also other things like capitalising true/false), Python’s output is just:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

For that type of ValueError, how do you get Python to tell you where the error is in the JSON file?

Answer 0

I’ve found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:

ValueError: No JSON object could be decoded

which is not very descriptive. The same operation with simplejson:

simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

Much better! Likewise for other common errors like capitalizing True.
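For readers on Python 3.5 or later, the built-in json module raises json.JSONDecodeError (a subclass of ValueError), which carries the position of the problem. A minimal sketch, using a made-up two-line input with a trailing comma:

```python
import json

bad_json = '[1, 2,\n 3,]'   # trailing comma on the second line
location = None

try:
    json.loads(bad_json)
except json.JSONDecodeError as err:
    # msg, lineno, colno and pos are attributes of JSONDecodeError.
    location = (err.lineno, err.colno)
    print("%s: line %d, column %d (char %d)"
          % (err.msg, err.lineno, err.colno, err.pos))
```

The exact message text and column vary between Python versions, so no expected output is shown here.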

Answer 1

You won't be able to get Python to tell you where the JSON is incorrect. You will need to use an online JSON linter.

This will show you the error in the JSON you are trying to decode.


You could try the rson library found here: http://code.google.com/p/rson/ . It is also up on PyPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.

For the example given by tom:

>>> rson.loads('[1,2,]')
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON is designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.

As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.

>>> rson.loads('[true,False]')
[True, u'False']

Answer 3


I had a similar problem, and it was due to single quotes. The JSON standard (http://json.org) only talks about double quotes, so the Python json library must support only double quotes.
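
A small sketch of the single-quote issue, assuming Python 3 (the sample strings are made up for illustration). Text written with single quotes is a Python literal rather than JSON, so json.loads rejects it, while ast.literal_eval can read it:

```python
import ast
import json

python_style = "{'name': 'john', 'age': 20}"   # single quotes: a Python literal, not JSON
json_style = '{"name": "john", "age": 20}'     # double quotes: valid JSON

try:
    json.loads(python_style)
    single_quotes_ok = True
except ValueError:            # json.JSONDecodeError subclasses ValueError
    single_quotes_ok = False

from_json = json.loads(json_style)
from_literal = ast.literal_eval(python_style)  # evaluates Python literals only

print(single_quotes_ok, from_json == from_literal)
```

If the data is supposed to be JSON, fixing the producer to emit double quotes is usually the better option than parsing it as a Python literal.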

Answer 4



For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:

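def load_json_file(path):
    data = open(path, 'r').read()
    print data  # temporary debug line: show the raw file contents
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        # the original snippet was cut off after str(e); `path` is assumed here
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e), path))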

That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn’t read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)

Just mentioning this because, while maybe not a good answer to the OP’s specific problem, this was a rather quick method for determining the source of a very annoying bug. And I bet that many people who stumble upon this article are searching for a more verbose explanation of a MalformedJsonFileError: No JSON object could be decoded when reading …. So that might help them.


In my case the JSON file was very large, and the standard json module in Python raised the above error.

After installing simplejson with sudo pip install simplejson, the problem was solved:

import json
import simplejson

def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]

if __name__ == '__main__':
    test_parse_json()

Answer 6



I had a similar problem. This was my code:

    file = open("list.json",'w')

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

The problem was that I had forgotten to call file.close() before reopening the file. Adding that call fixed the problem.
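
One way to make the forgotten close() impossible is to let a with-block handle it. A minimal sketch, assuming Python 3; the temporary directory and the sample list are only there to keep the demo self-contained:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "list.json")  # throwaway location for the demo

# The with-block closes (and therefore flushes) the file on exit.
with open(path, "w") as out_file:
    json.dump(["a", "b", "c"], out_file)

# Only now is it safe to reopen the same path for reading.
with open(path, "r") as json_file:
    json_decoded = json.load(json_file)

print(json_decoded)
```

Without the close (or the with-block), the written data may still sit in a buffer, so the reader sees an empty or partial file and the decoder fails.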


  1. Create a child class “JSONLintCheck” that inherits from “JSONDecoder” and override the init method of “JSONDecoder” as shown below:

    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None,
                 parse_constant=None, strict=True, object_pairs_hook=None):
        # pass the caller's arguments through instead of resetting them to the defaults
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook,
            parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant,
            strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)
  2. make_scanner is a new function used to override the ‘scan_once’ method of the above class. Here is the code for it:
#!/usr/bin/env python
from json import JSONDecoder
from json import decoder
import re

NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))

def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    match_number = NUMBER_RE.match
    encoding = context.encoding
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook

    def _scan_once(string, idx):
        try:
            nextchar = string[idx]
        except IndexError:
            raise ValueError(decoder.errmsg("Could not get the next character",string,idx))
            #raise StopIteration

        if nextchar == '"':
            return parse_string(string, idx + 1, encoding, strict)
        elif nextchar == '{':
            return parse_object((string, idx + 1), encoding, strict,
                _scan_once, object_hook, object_pairs_hook)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            #raise StopIteration   # Here is where needs modification
            raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes",string,idx))
    return _scan_once

make_scanner = py_make_scanner
  3. It is best to put the ‘make_scanner’ function in the same file as the new child class.



Just hit the same issue and in my case the problem was related to BOM (byte order mark) at the beginning of the file.

json.tool would refuse to process even an empty file (just curly braces) until I removed the UTF BOM mark.

What I did was:

  • opened my json file with vim,
  • removed the byte order mark (set nobomb),
  • saved the file.

This resolved the problem with json.tool. Hope this helps!
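
If editing the file is not an option, the BOM can also be dropped while reading. A minimal sketch, assuming Python 3, where the standard utf-8-sig codec strips a leading byte order mark if one is present; the file name is made up for the demo:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_bom.json")  # throwaway demo file

# Write "{}" preceded by a UTF-8 byte order mark, as some editors do.
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf{}")

# Reading as plain utf-8 keeps the BOM as a leading U+FEFF character.
with open(path, encoding="utf-8") as f:
    raw = f.read()
try:
    json.loads(raw)
    plain_utf8_ok = True
except ValueError:
    plain_utf8_ok = False

# The utf-8-sig codec removes the BOM before the decoder sees the text.
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)

print(plain_utf8_ok, data)
```

Files without a BOM read the same way under utf-8-sig, so it is a reasonable default when the origin of a file is unknown.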

When you create the file, do not leave it empty. Instead, initialise it with:

json.dump({}, file)
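
To illustrate why this helps, here is a rough sketch, assuming Python 3 and a made-up file name: a zero-byte file is not valid JSON, whereas a file containing {} loads as an empty dict.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state.json")  # hypothetical file name

# An empty file is not valid JSON, so loading it fails.
open(path, "w").close()
try:
    with open(path) as f:
        json.load(f)
    empty_ok = True
except ValueError:
    empty_ok = False

# Initialise the file with an empty JSON object instead.
with open(path, "w") as f:
    json.dump({}, f)
with open(path) as f:
    state = json.load(f)

print(empty_ok, state)
```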

You could use cjson, that claims to be up to 250 times faster than pure-python implementations, given that you have “some long complicated JSON file” and you will probably need to run it several times (decoders fail and report the first error they encounter only).








So I’ve spent way too much time on this, and it seems to me like it should be a simple fix. I’m trying to use Facebook’s Authentication to register users on my site, and I’m trying to do it server side. I’ve gotten to the point where I get my access token, and when I go to:


I get the information I’m looking for as a string that’s like this:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

It seems like I should just be able to use dict(string) on this but I’m getting this error:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

So I tried using Pickle, but got this error:

KeyError: '{'

I tried using django.serializers to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I’m just being stupid. Thanks for any help!

Answer 0


This data is JSON! You can deserialize it using the built-in json module if you’re on Python 2.6+, otherwise you can use the excellent third-party simplejson module.

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

Answer 1


Use ast.literal_eval to evaluate Python literals. However, what you have is JSON (note “true” for example), so use a JSON deserializer.

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}
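
The three approaches can be compared side by side. This is only a sketch, assuming Python 3 and a shortened version of the string from the question:

```python
import ast
import json

s = '{"id": "123456789", "timezone": -7, "verified": true}'

dict_error = None
try:
    dict(s)                      # dict() expects key/value pairs, not JSON text
except ValueError as exc:
    dict_error = str(exc)

literal_error = None
try:
    ast.literal_eval(s)          # `true` is not a Python literal (Python spells it True)
except (ValueError, SyntaxError) as exc:
    literal_error = type(exc).__name__

obj = json.loads(s)              # JSON true becomes Python True
print(dict_error)
print(literal_error)
print(obj["verified"], obj["timezone"])
```

dict(s) iterates over the characters of the string, which is why the error in the question complains about an element of length 1.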






I have a program that reads an xml document from a socket. I have the xml document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django’s simplejson library.

Take as an example:

str = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
dic_xml = convert_to_dic(str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }

Answer 0


Here is the code from the website just in case the link goes bad.

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)

class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

Answer 1



xmltodict (full disclosure: I wrote it) does exactly that:

import xmltodict
xmltodict.parse('<?xml version="1.0" ?><person><name>john</name><age>20</age></person>')
# {u'person': {u'age': u'20', u'name': u'john'}}

Answer 2




The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON “specification”. It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        # group the converted children by tag name
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used like this:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked “specification”) should be:

{'root': {'e': [None,
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)


If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)



This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid – meaning the results are the same regardless of the existence of attributes.

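import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)          # start from the element's attributes
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]       # children are always collected in a list
        d[x.tag].append(dictify(x,False))
    return d


root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)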


Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}


The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don’t handle). In general, though, the converters are easy to use. If an ‘example.xml’ contains:


Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expectXML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python versions do identical conversions, but the C++ one is about 60x faster.



You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>

print xml_to_dict(xml_string)

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = xml_objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

Answer 6

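Disclaimer: this modified XML parser was inspired by Adam Clark. The original XML parser works for most simple cases. However, it did not work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find any bugs, please let me know. I am glad to fix them.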


Answer 7

def xml_to_dict(node):
    """
    @param node: lxml node
    @return: dict
    """
    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}
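
To see what this function returns, here is a small sketch that swaps in the standard library's xml.etree.ElementTree for lxml (an assumption made only so the example runs without extra packages; both expose tag, text, attrib and iteration over children). The sample XML is made up for illustration. It also shows a limitation worth knowing: because children are keyed by tag, repeated sibling tags overwrite one another.

```python
import xml.etree.ElementTree as ET


def xml_to_dict(node):
    """Same shape as the function above: tag, text, attrib, children."""
    return {
        'tag': node.tag,
        'text': node.text,
        'attrib': dict(node.attrib),
        'children': {child.tag: xml_to_dict(child) for child in node},
    }


# Hypothetical sample document, not taken from the question.
root = ET.fromstring(
    '<order id="7"><item>pen</item><item>ink</item><note>rush</note></order>'
)
result = xml_to_dict(root)

print(result['attrib'])                      # attributes of <order>
print(sorted(result['children']))            # only one 'item' key survives
print(result['children']['item']['text'])    # the later <item> replaced the earlier one
```

If repeated tags matter, collecting children in a list instead of a dict would be one way around this.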

Answer 8

The easiest-to-use XML parser for Python is ElementTree (as of Python 2.5 it is in the standard library as xml.etree.ElementTree). I don’t think there is anything that does exactly what you want out of the box. It would be pretty trivial to write something to do what you want using ElementTree, but why convert to a dictionary rather than just use ElementTree directly?
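
As a rough illustration of using ElementTree directly, the sketch below reads an attribute, a child's text, and a set of repeated children without building an intermediate dictionary. The XML document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical config document used only for illustration.
xml_text = """
<config version="2">
    <name>demo</name>
    <item key="a">1</item>
    <item key="b">2</item>
</config>
"""

root = ET.fromstring(xml_text)

version = root.get("version")        # attribute lookup on the root element
name = root.findtext("name")         # text of the first <name> child
# Repeated children are handled naturally by findall().
items = {el.get("key"): int(el.text) for el in root.findall("item")}

print(version, name, items)
```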

I added a shim that checks whether the key already exists before calling self.update(). If it does, the shim pops the existing entry and creates a list out of the existing and the new value. Any subsequent duplicates are appended to that list.

Not sure if this can be handled more gracefully, but it works:

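import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                # element.text is None for empty elements such as <e/>
                self.updateShim({element.tag: (element.text or '').strip()})

    def updateShim(self, aDict):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    # second occurrence of this key: start a list
                    listOfDicts = [value, aDict[key]]
                    self.update({key: listOfDicts})
                else:
                    # later occurrences: extend the existing list
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key: aDict[key]})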
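
The shim idea can also be shown in isolation. This is a minimal sketch with a hypothetical helper name (add_or_append), not part of the class above: the first value for a key is stored as-is, the second turns the entry into a list, and later ones are appended.

```python
def add_or_append(target, key, value):
    """Store value under key; turn repeated keys into a list of values."""
    if key not in target:
        target[key] = value
    elif isinstance(target[key], list):
        target[key].append(value)
    else:
        target[key] = [target[key], value]


d = {}
add_or_append(d, "item", "first")
add_or_append(d, "item", "second")
add_or_append(d, "item", "third")
add_or_append(d, "name", "only one")
print(d)
```

One caveat of this design: a key that occurs once holds a plain value while a key that occurs several times holds a list, so callers have to check the type before iterating.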

Answer 10

Building on @K3---rnc's response (the best for me), I’ve added a few small modifications to get an OrderedDict from an XML text (sometimes order matters):

from collections import OrderedDict


def etree_to_ordereddict(t):
    d = OrderedDict()
    d[t.tag] = OrderedDict() if t.attrib else None
    children = list(t)
    if children:
        # group the converted children by tag
        dd = OrderedDict()
        for dc in map(etree_to_ordereddict, children):
            for k, v in dc.items():
                if k not in dd:
                    dd[k] = list()
                dd[k].append(v)
        d = OrderedDict()
        d[t.tag] = OrderedDict()
        for k, v in dd.items():
            if len(v) == 1:
                d[t.tag][k] = v[0]
            else:
                d[t.tag][k] = v
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

Following @K3---rnc's example, you can use it like this:

from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
from pprint import pprint

e = ET.XML('''
<root>
  <e />
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

pprint(etree_to_ordereddict(e))

Hope it helps ;)

from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    pass


def getTextFromNode(node):
    """
    Scans through all children of node and gathers the
    text. If node has non-text child nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    Three cases are differentiated:
    - if the node contains no other nodes, it is a text node
    and {nodeName: text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
    then its children will be appended to a list and this
    list is merged into the dictionary in the form {nodeName: list}.
    - else, nodeToDic() will call itself recursively on
    the node's children (merging {nodeName: nodeToDic()} into
    the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)


def test():
    dic = readConfig("sample.xml")

    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])
        print()


test()

<?xml version="1.0" encoding="UTF-8"?>
<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>
</Config>

My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

At one point I had to parse and write XML that consisted only of elements without attributes, so a 1:1 mapping from XML to dict was easily possible. This is what I came up with, in case someone else also doesn't need attributes:

from xml.etree import ElementTree


def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if element.tag in result:
                if isinstance(result[element.tag], list):
                    result[element.tag].append(obj)
                else:
                    # second occurrence of the tag: switch to a list
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, str):  # basestring on Python 2
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, (int, float)):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            # nested dict: recurse into a new child element
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    root_key = next(iter(element))  # the single root key
    result = ElementTree.Element(root_key)
    for key, value in element[root_key].items():
        dicttoxml_handler(result, key, value)
    return result


def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())


def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)


def xmlstringtodict(xmlstring):
    # fromstring() already returns the root Element
    return xmltodict(ElementTree.fromstring(xmlstring))


def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

@dibrovsd: The solution will not work if the XML has more than one tag with the same name.

On your line of thought, I have modified the code a bit and written it for general node instead of root:

from collections import defaultdict

def xml2dict(node):
    d, count = defaultdict(dict), 1
    for i in node:
        key = i.tag + "_" + str(count)   # the counter keeps same-named tags apart
        d[key]['text'] = i.findtext('.')
        d[key]['attrib'] = i.attrib      # attrib is a dict of the attributes
        d[key]['children'] = xml2dict(i) # nested dict for the child elements
        count += 1
    return d

and in python

import xml.etree.ElementTree as ET

class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text is not None) else ' '
            if len(child) == 0:
                # leaf element: store its stripped text
                self.update(self._addToDict(key=child.tag, value=child.text.strip(), dict=self))
            else:
                # element with children: recurse
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if key not in dict:
                self.update({key: value})
            else:
                # repeated key: collect the values in a list
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})

tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

the output is

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),



I need to save to disk a little dict object whose keys are of the type str and values are ints and then recover it. Something like this:

{'juanjo': 2, 'pedro':99, 'other': 333}

What is the best option and why? Serialize it with pickle or with simplejson?

I am using Python 2.6.

If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
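
For the small dict in the question, both routes look almost the same in code. The following is a minimal sketch written for Python 3 (where cPickle has been folded into pickle); the temporary directory and file names are placeholders chosen for the example.

```python
import json
import os
import pickle
import tempfile

data = {'juanjo': 2, 'pedro': 99, 'other': 333}
workdir = tempfile.mkdtemp()  # scratch directory; the file names are placeholders

# Text format: readable in any editor and by other languages.
json_path = os.path.join(workdir, 'data.json')
with open(json_path, 'w') as fh:
    json.dump(data, fh)
with open(json_path) as fh:
    from_json = json.load(fh)

# Binary format: Python-only, so the files are opened in binary mode.
pickle_path = os.path.join(workdir, 'data.pkl')
with open(pickle_path, 'wb') as fh:
    pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
with open(pickle_path, 'rb') as fh:
    from_pickle = pickle.load(fh)

print(from_json == data, from_pickle == data)
```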

Answer 1


I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.
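
To make the security point concrete, here is a deliberately harmless sketch. The class name Sneaky is invented for the example, and eval("6 * 7") stands in for whatever an attacker would really run. Unpickling calls the callable named by __reduce__, whereas json.loads only builds plain data.

```python
import json
import pickle


class Sneaky(object):
    """On unpickling, pickle calls whatever callable __reduce__ names."""

    def __reduce__(self):
        # Harmless stand-in: a real attacker could name os.system instead.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)   # runs eval("6 * 7") while loading
print(result)

# json.loads only ever builds dicts, lists, strings, numbers, booleans and None.
decoded = json.loads('{"expr": "6 * 7"}')
print(decoded)
```

This is why pickle data should only be loaded from sources you trust.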

Answer 2

You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/

Answer 3



If you are more concerned with interoperability, security, and/or human readability, then use JSON.

The test results referenced in other answers were recorded in 2010, and the updated tests in 2016 with cPickle protocol 2 show:

  • cPickle 3.8x faster loading
  • cPickle 1.5x faster dumping
  • cPickle slightly smaller encoding

Reproduce this yourself with this gist, which is based on Konstantin’s benchmark referenced in other answers, but uses cPickle with protocol 2 instead of pickle, and json instead of simplejson (since json is faster than simplejson), e.g.

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

Results with python 2.7 on a decent 2015 Xeon processor:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

Python 3.4 with pickle protocol 3 is even faster.

Answer 4

JSON or pickle? How about JSON and pickle! You can use jsonpickle. It is easy to use and the file on disk is readable because it’s JSON.


Answer 5

I tried several methods and found that using cPickle with the protocol argument of the dumps method set, as in cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest way to dump.

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)

command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)

command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)


pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds

Personally, I generally prefer JSON because the data is human-readable. Definitely, if you need to serialize something that JSON won’t take, then use pickle.

But for most data storage, you won’t need to serialize anything weird and JSON is much easier and always allows you to pop it open in a text editor and check out the data yourself.

The speed is nice, but for most datasets the difference is negligible; Python generally isn’t too fast anyways.

JSON ValueError: Expecting property name: line 1 column 2 (char 1)

Question: JSON ValueError: Expecting property name: line 1 column 2 (char 1)


ValueError: Expecting property name: line 1 column 2 (char 1)

Here is my code:

from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer, KeyedProducer
import pymongo
from pymongo import MongoClient
import json

c = MongoClient("")
db = c.test_database3
collection = db.tweet_col

kafka = KafkaClient("")

consumer = SimpleConsumer(kafka,"myconsumer","test")
for tweet in consumer:
    print tweet.message.value
    jsonTweet=json.loads(({u'favorited': False, u'contributors': None})

I’m pretty sure that the error is occurring at the second-to-last line

jsonTweet=json.loads({u'favorited': False, u'contributors': None})

but I do not know what to do to fix it. Any advice would be appreciated.

Answer 0


json.loads will load a json string into a python dict, json.dumps will dump a python dict to a json string, for example:

>>> import json
>>> json_string = '{"favorited": false, "contributors": null}'
>>> value = json.loads(json_string)
>>> value
{u'favorited': False, u'contributors': None}
>>> json_dump = json.dumps(value)
>>> json_dump
'{"favorited": false, "contributors": null}'

So that line is incorrect since you are trying to load a python dict, and json.loads is expecting a valid json string which should have <type 'str'>.

So if you are trying to load the json, you should change what you are loading to look like the json_string above, or you should be dumping it. This is just my best guess from the given information. What is it that you are trying to accomplish?
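
A small Python 3 sketch of the two directions. The raw string here is a stand-in for JSON text; in the question it would presumably be the Kafka message value, which is an assumption on my part.

```python
import json

raw = '{"favorited": false, "contributors": null}'   # JSON text, e.g. a message body

tweet = json.loads(raw)            # str -> dict
print(tweet)

error = None
try:
    json.loads(tweet)              # passing the dict back in is a type error
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

round_trip = json.dumps(tweet)     # dict -> str
print(round_trip)
```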

Also you don’t need to specify the u before your strings, as @Cld mentioned in the comments.

Single quote issue

I used a JSON string with single quotes:

{
    'property': 1
}

But json.loads accepts only double quotes for JSON properties:

{
    "property": 1
}

Final comma issue

json.loads doesn’t accept a final comma:

{
  "property": "text",
  "property2": "text2",
}

Solution: ast to solve single quote and final comma issues

You can use ast (part of standard library for both Python 2 and 3) for this processing. Here is an example :

import ast
# ast.literal_eval() returns a dict object; use json.dumps to get a JSON string
import json

# Single quotes to double quotes with ast.literal_eval()
json_data = "{'property': 'text'}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with double quotes
json_data = '{"property": "text"}'
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with a final comma
json_data = "{'property': 'text', 'property2': 'text2',}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays (Python 3.7+) : {"property": "text", "property2": "text2"}

Using ast avoids the single quote and final comma issues by interpreting the text as a Python dictionary (so the text must follow Python dictionary syntax). It is a good, safe alternative to the eval() function for literal structures.

The Python documentation warns about using large or complex strings:

Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
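
One thing to keep in mind, sketched below with made-up sample strings: the trade-off goes both ways. json.loads rejects Python-style text, while ast.literal_eval rejects genuine JSON that contains true, false or null, because those are not Python literals.

```python
import ast
import json

python_style = "{'property': 'text', 'property2': 'text2',}"

try:
    json.loads(python_style)
    json_ok = True
except ValueError:               # json.JSONDecodeError subclasses ValueError
    json_ok = False

as_dict = ast.literal_eval(python_style)
print(json_ok, json.dumps(as_dict))

# The reverse also holds: real JSON literals are not Python literals.
try:
    ast.literal_eval('{"favorited": false, "contributors": null}')
    literal_ok = True
except ValueError:
    literal_ok = False
print(literal_ok)
```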

json.dumps with single quotes

To use json.dumps with single quotes easily you can use this code:

import ast
import json

data = json.dumps(ast.literal_eval(json_data_single_quote))

ast documentation

ast Python 3 doc

ast Python 2 doc


I hope it helps.

Answer 2

  1. Replace all single quotes with double quotes.
  2. Replace u" with " in your strings, so basically convert the internal unicode prefixes to plain strings before loading the string into json.

>>> import json
>>> strs = "{u'key':u'val'}"
>>> strs = strs.replace("'",'"')
>>> json.loads(strs.replace('u"','"'))

All the other answers may answer your query, but I faced the same issue because of a stray , which I had added at the end of my JSON string like this:


I finally got it working when I removed the extra , like this:


Hope this helps! Cheers.
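
A minimal reproduction of the stray comma problem, using made-up sample strings rather than the ones from this answer. The exact wording of the error message varies between Python versions, so the sketch only prints it.

```python
import json

with_comma = '{"property": "text", "property2": "text2",}'   # made-up sample
without_comma = '{"property": "text", "property2": "text2"}'

try:
    json.loads(with_comma)
    message = None
except ValueError as exc:        # JSONDecodeError on Python 3.5+, plain ValueError before
    message = str(exc)

print(message)
parsed = json.loads(without_comma)
print(parsed)
```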

Answer 4


In [15]: a = "[{'start_city': '1', 'end_city': 'aaa', 'number': 1},\
...:      {'start_city': '2', 'end_city': 'bbb', 'number': 1},\
...:      {'start_city': '3', 'end_city': 'ccc', 'number': 1}]"
In [16]: import ast
In [17]: ast.literal_eval(a)
[{'end_city': 'aaa', 'number': 1, 'start_city': '1'},
 {'end_city': 'bbb', 'number': 1, 'start_city': '2'},
 {'end_city': 'ccc', 'number': 1, 'start_city': '3'}]

Answer 5


A different case in which I ran into this error was when I used echo to pipe the JSON into my Python script and carelessly wrapped the JSON string in double quotes:

echo "{"thumbnailWidth": 640}" | myscript.py

Note that the JSON string itself contains double quotes, so I should have written:

echo '{"thumbnailWidth": 640}' | myscript.py

As it was, this is what the Python script received: {thumbnailWidth: 640}; the double quotes had effectively been stripped.
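
To see why the quoting matters, the standard-library shlex module can be used as an approximation of how a POSIX shell splits the two command lines (a sketch only; shlex does not reproduce every shell feature, and the variable names are invented for this example):

```python
import json
import shlex

# The two echo commands from above, without the pipe into the script.
double_quoted = 'echo "{"thumbnailWidth": 640}"'
single_quoted = "echo '{\"thumbnailWidth\": 640}'"

# With outer double quotes, the inner quotes end and restart quoting,
# so they never reach the program as characters.
broken_args = shlex.split(double_quoted)

# With outer single quotes, the inner double quotes are kept literally.
intact_args = shlex.split(single_quoted)

# Only the single-quoted variant still carries valid JSON.
parsed = json.loads(intact_args[1])
try:
    json.loads(broken_args[1])
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False
```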

I have code in a Flask application that reads JSON from the request, and I can get the JSON object like so:

Request = request.get_json()

This has been working fine; however, I am trying to create unit tests using Python’s unittest module and am having difficulty finding a way to send JSON with the request.

                       data=json.dumps(dict(foo = 'bar')))

This gives me:

>>> request.get_data()
'{"foo": "bar"}'
>>> request.get_json()

Flask seems to have a json argument where you can set json=dict(foo='bar') in the post request, but I don’t know how to do that with the unittest module.

Changing the post to


fixed it.

Thanks to user3012759.

Answer 1

Update: Since Flask 1.0, the flask.testing.FlaskClient methods accept a json argument and a Response.get_json method has been added; see example.
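
A minimal sketch of what this can look like, assuming Flask 1.0 or later is installed. The /echo route and the echo view are hypothetical and exist only to make the example self-contained; get_json(silent=True) is used so that a non-JSON request yields None on any recent Flask version:

```python
import json

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/echo", methods=["POST"])  # hypothetical route for illustration
def echo():
    # silent=True returns None instead of raising when the body is not JSON
    return jsonify(received=request.get_json(silent=True))


client = app.test_client()

# Body sent with data= only: no JSON content type, so get_json() sees nothing.
plain = client.post("/echo", data=json.dumps({"foo": "bar"}))

# Same body with an explicit JSON content type.
typed = client.post(
    "/echo",
    data=json.dumps({"foo": "bar"}),
    content_type="application/json",
)

# Flask >= 1.0: the json argument serializes and sets the content type.
short = client.post("/echo", json={"foo": "bar"})
```

The last two requests should be equivalent from the view's point of view; the json argument is simply the shorter spelling.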

For Flask 0.x you may use the recipe below:

from flask import Flask, Response as BaseResponse, json
from flask.testing import FlaskClient
from werkzeug.utils import cached_property

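class Response(BaseResponse):
    # Parse the body once and cache it; this is what the cached_property import is for
    @cached_property
    def json(self):
        return json.loads(self.data)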

class TestClient(FlaskClient):
    def open(self, *args, **kwargs):
        if 'json' in kwargs:
            kwargs['data'] = json.dumps(kwargs.pop('json'))
            kwargs['content_type'] = 'application/json'
        return super(TestClient, self).open(*args, **kwargs)

app = Flask(__name__)
app.response_class = Response
app.test_client_class = TestClient
app.testing = True