标签归档:json

json.load()和json.loads()函数有什么区别

问题:json.load()和json.loads()函数有什么区别

在Python中,json.load()和之间有什么区别json.loads()

我猜想load()函数必须与文件对象一起使用(因此,我需要使用上下文管理器),而load()函数将文件路径作为字符串。这有点令人困惑。

字母“ sjson.loads()代表字符串吗?

非常感谢你的回答!

In Python, what is the difference between json.load() and json.loads()?

I guess that the load() function must be used with a file object (I need thus to use a context manager) while the loads() function take the path to the file as a string. It is a bit confusing.

Does the letter “s” in json.loads() stand for string?

Thanks a lot for your answers!


回答 0

是的,s代表字符串。该json.loads函数不采用文件路径,而是将文件内容作为字符串。查看位于https://docs.python.org/2/library/json.html的文档!

Yes, s stands for string. The json.loads function does not take the file path, but the file contents as a string. Look at the documentation at https://docs.python.org/2/library/json.html!


回答 1

只是在每个人的解释中添加一个简单的例子,

json.load()

json.load可以反序列化文件本身,即它接受一个file对象,例如,

# open a json file for reading and print content using json.load
with open("/xyz/json_data.json", "r") as content:
  print(json.load(content))

将输出

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

如果我改用json.loads打开文件,

# you cannot use json.loads on file object
with open("json_data.json", "r") as content:
  print(json.loads(content))

我会收到此错误:

TypeError:预期的字符串或缓冲区

json.loads()

json.loads() 反串化字符串。

因此,要使用json.loads该文件read(),我将不得不使用函数传递文件的内容,例如,

content.read()json.loads()文件的返回内容一起使用,

with open("json_data.json", "r") as content:
  print(json.loads(content.read()))

输出,

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

那是因为类型content.read()是字符串,即<type 'str'>

如果json.load()与配合使用content.read(),则会出现错误,

with open("json_data.json", "r") as content:
  print(json.load(content.read()))

给,

AttributeError:’str’对象没有属性’read’

因此,现在您知道json.load反序列化文件并json.loads反序列化一个字符串。

另一个例子,

sys.stdin返回file对象,所以如果我这样做print(json.load(sys.stdin)),我将获得实际的json数据,

cat json_data.json | ./test.py

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

如果要使用json.loads(),我会print(json.loads(sys.stdin.read()))改为使用。

Just going to add a simple example to what everyone has explained,

json.load()

json.load can deserialize a file itself i.e. it accepts a file object, for example,

# open a json file for reading and print content using json.load
with open("/xyz/json_data.json", "r") as content:
  print(json.load(content))

will output,

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I use json.loads to open a file instead,

# you cannot use json.loads on file object
with open("json_data.json", "r") as content:
  print(json.loads(content))

I would get this error:

TypeError: expected string or buffer

json.loads()

json.loads() deserialize string.

So in order to use json.loads I will have to pass the content of the file using read() function, for example,

using content.read() with json.loads() return content of the file,

with open("json_data.json", "r") as content:
  print(json.loads(content.read()))

Output,

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

That’s because type of content.read() is string, i.e. <type 'str'>

If I use json.load() with content.read(), I will get error,

with open("json_data.json", "r") as content:
  print(json.load(content.read()))

Gives,

AttributeError: ‘str’ object has no attribute ‘read’

So, now you know json.load deserialze file and json.loads deserialize a string.

Another example,

sys.stdin return file object, so if i do print(json.load(sys.stdin)), I will get actual json data,

cat json_data.json | ./test.py

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.


回答 2

文档非常清晰:https//docs.python.org/2/library/json.html

json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

使用此转换表将fp(支持.read()的包含JSON文档的类似文件的对象)反序列化为Python对象。

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

使用此转换表将s(包含JSON文档的str或unicode实例)反序列化为Python对象。

所以load是一个文件,loads一个string

Documentation is quite clear: https://docs.python.org/2/library/json.html

json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object using this conversion table.

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize s (a str or unicode instance containing a JSON document) to a Python object using this conversion table.

So load is for a file, loads for a string


回答 3

快速解答(非常简化!)

json.load()需要一个文件

json.load()需要一个文件(文件对象),例如,您在文件路径(如)给定之前打开的文件'files/example.json'


json.loads()需要一个STRING

json.loads()需要一个(有效)JSON字符串-即 {"foo": "bar"}


例子

假设您有一个文件example.json,其内容如下:{“ key_1”:1,1,“ key_2”:“ foo”,“ Key_3”:null}

>>> import json
>>> file = open("example.json")

>>> type(file)
<class '_io.TextIOWrapper'>

>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>

>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}

>>> json.loads(file)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper


>>> string = '{"foo": "bar"}'

>>> type(string)
<class 'str'>

>>> string
'{"foo": "bar"}'

>>> json.loads(string)
{'foo': 'bar'}

>>> json.load(string)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

QUICK ANSWER (very simplified!)

json.load() takes a FILE

json.load() expects a file (file object) – e.g. a file you opened before given by filepath like 'files/example.json'.


json.loads() takes a STRING

json.loads() expects a (valid) JSON string – i.e. {"foo": "bar"}


EXAMPLES

Assuming you have a file example.json with this content: { “key_1”: 1, “key_2”: “foo”, “Key_3”: null }

>>> import json
>>> file = open("example.json")

>>> type(file)
<class '_io.TextIOWrapper'>

>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>

>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}

>>> json.loads(file)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper


>>> string = '{"foo": "bar"}'

>>> type(string)
<class 'str'>

>>> string
'{"foo": "bar"}'

>>> json.loads(string)
{'foo': 'bar'}

>>> json.load(string)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

回答 4

所述json.load()方法(无“S”中的“负荷”)可直接读取的文件:

import json
with open('strings.json') as f:
    d = json.load(f)
    print(d)

json.loads()方法,仅用于字符串参数。

import json

person = '{"name": "Bob", "languages": ["English", "Fench"]}'
print(type(person))
# Output : <type 'str'>

person_dict = json.loads(person)
print( person_dict)
# Output: {'name': 'Bob', 'languages': ['English', 'Fench']}

print(type(person_dict))
# Output : <type 'dict'>

在这里,我们可以看到在使用load()将字符串(type(str))作为输入并返回字典之后

The json.load() method (without “s” in “load”) can read a file directly:

import json
with open('strings.json') as f:
    d = json.load(f)
    print(d)

json.loads() method, which is used for string arguments only.

import json

person = '{"name": "Bob", "languages": ["English", "Fench"]}'
print(type(person))
# Output : <type 'str'>

person_dict = json.loads(person)
print( person_dict)
# Output: {'name': 'Bob', 'languages': ['English', 'Fench']}

print(type(person_dict))
# Output : <type 'dict'>

Here , we can see after using loads() takes a string ( type(str) ) as a input and return dictionary.


回答 5

在python3.7.7中,根据cpython源代码,json.load的定义如下:

def load(fp, *, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):

    return loads(fp.read(),
        cls=cls, object_hook=object_hook,
        parse_float=parse_float, parse_int=parse_int,
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

json.load实际上调用json.loads并fp.read()用作第一个参数。

因此,如果您的代码是:

with open (file) as fp:
    s = fp.read()
    json.loads(s)

这样做是一样的:

with open (file) as fp:
    json.load(fp)

但是,如果您需要指定从文件中读取的字节,例如,fp.read(10)或者您要反序列化的字符串/字节不是从文件中读取,则应使用json.loads()

至于json.loads(),它不仅反序列化字符串,而且还反序列化字节。如果s为bytes或bytearray,则将其首先解码为字符串。您也可以在源代码中找到它。

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ...

    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')

In python3.7.7, the definition of json.load is as below according to cpython source code:

def load(fp, *, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):

    return loads(fp.read(),
        cls=cls, object_hook=object_hook,
        parse_float=parse_float, parse_int=parse_int,
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

json.load actually calls json.loads and use fp.read() as the first argument.

So if your code is:

with open (file) as fp:
    s = fp.read()
    json.loads(s)

It’s the same to do this:

with open (file) as fp:
    json.load(fp)

But if you need to specify the bytes reading from the file as like fp.read(10) or the string/bytes you want to deserialize is not from file, you should use json.loads()

As for json.loads(), it not only deserialize string but also bytes. If s is bytes or bytearray, it will be decoded to string first. You can also find it in the source code.

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ...

    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')


python中的json.dump()和json.dumps()有什么区别?

问题:python中的json.dump()和json.dumps()有什么区别?

我在官方文档中进行了搜索,以查找python中的json.dump()和json.dumps()之间的区别。显然,它们与文件写入选项有关。
但是,它们之间的详细区别是什么?在什么情况下,一个比另一个具有更多的优势?

I searched in this official document to find difference between the json.dump() and json.dumps() in python. It is clear that they are related with file write option.
But what is the detailed difference between them and in what situations one has more advantage than other?


回答 0

除了文档所说的内容外,没有什么可添加的。如果要将JSON转储到文件/套接字或其他文件中,则应使用dump()。如果只需要它作为字符串(用于打印,解析或其他操作),则使用dumps()(转储字符串)

正如Antii Haapala在此答案中提到的,在ensure_ascii行为上有一些细微的差异。这主要是由于底层write()函数是如何工作的,因为它是对块而不是整个字符串进行操作。检查他的答案以获取更多详细信息。

json.dump()

将obj作为JSON格式的流序列化到fp(支持.write()的类似文件的对象

如果ensure_ascii为False,则写入fp的某些块可能是unicode实例

json.dumps()

将obj序列化为JSON格式的str

如果sure_ascii为False,则结果可能包含非ASCII字符,并且返回值可能是unicode实例

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.

json.dump()

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object

If ensure_ascii is False, some chunks written to fp may be unicode instances

json.dumps()

Serialize obj to a JSON formatted str

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance


回答 1

与功能s取字符串参数。其他则采用文件流。

The functions with an s take string parameters. The others take file streams.


回答 2

在内存使用和速度上。

调用时,jsonstr = json.dumps(mydata)它首先在内存中创建数据的完整副本,然后才将file.write(jsonstr)其复制到磁盘。因此,这是一种更快的方法,但是如果要保存大量数据,则可能会成为问题。

当调用json.dump(mydata, file)-不带’s’时,不使用新的内存,因为数据是按块转储的。但是整个过程要慢大约2倍。

来源:我检查了json.dump()和的源代码,json.dumps()还测试了两个变量,它们测量了time.time()htop中的时间并观察了它们的内存使用情况。

In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) — without ‘s’, new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.


回答 3

Python 2的一个显着差异是,如果您使用ensure_ascii=False,则dump可以将UTF-8编码的数据正确写入文件中(除非您使用的扩展名不是UTF-8的8位字符串):

dumps另一方面,with ensure_ascii=False可以产生a strunicode仅取决于您用于字符串的类型:

使用此转换表将obj序列化为JSON格式的str。如果sure_ascii为False,则结果可能包含非ASCII字符,并且返回值可能是unicodeinstance

(强调我的)。请注意,它可能仍然是一个str实例。

因此,如果不检查返回的格式以及可能使用的格式,就无法使用其返回值将结构保存到文件中unicode.encode

当然,这在Python 3中不再是有效的问题,因为不再存在这种8位/ Unicode的混淆。


至于loadVS loadsload认为整个文件是一个JSON文件,所以你不能用它来从单个文件读取多个新行限制JSON文件。

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8):

dumps on the other hand, with ensure_ascii=False can produce a str or unicode just depending on what types you used for strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into file without checking which format was returned and possibly playing with unicode.encode.

This of course is not valid concern in Python 3 any more, since there is no more this 8-bit/Unicode confusion.


As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline limited JSON documents from a single file.


检查密钥是否存在,并使用Python迭代JSON数组

问题:检查密钥是否存在,并使用Python迭代JSON数组

我从Facebook帖子中获得了一堆JSON数据,如下所示:

{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}

JSON数据是半结构化的,并且所有数据都不相同。下面是我的代码:

import json 

str = '{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}'
data = json.loads(str)

post_id = data['id']
post_type = data['type']
print(post_id)
print(post_type)

created_time = data['created_time']
updated_time = data['updated_time']
print(created_time)
print(updated_time)

if data.get('application'):
    app_id = data['application'].get('id', 0)
    print(app_id)
else:
    print('null')

#if data.get('to'):
#... This is the part I am not sure how to do
# Since it is in the form "to": {"data":[{"id":...}]}

我希望代码将to_id打印为1543,否则打印’null’

我不确定该怎么做。

I have a bunch of JSON data from Facebook posts like the one below:

{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}

The JSON data is semi-structured and all is not the same. Below is my code:

import json 

str = '{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}'
data = json.loads(str)

post_id = data['id']
post_type = data['type']
print(post_id)
print(post_type)

created_time = data['created_time']
updated_time = data['updated_time']
print(created_time)
print(updated_time)

if data.get('application'):
    app_id = data['application'].get('id', 0)
    print(app_id)
else:
    print('null')

#if data.get('to'):
#... This is the part I am not sure how to do
# Since it is in the form "to": {"data":[{"id":...}]}

I want the code to print the to_id as 1543 else print ‘null’

I am not sure how to do this.


回答 0

import json

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    if 'to' not in data:
        raise ValueError("No target in given data")
    if 'data' not in data['to']:
        raise ValueError("No data for target")

    for dest in data['to']['data']:
        if 'id' not in dest:
            continue
        targetId = dest['id']
        print("to_id:", targetId)

输出:

In [9]: getTargetIds(s)
to_id: 1543
import json

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    if 'to' not in data:
        raise ValueError("No target in given data")
    if 'data' not in data['to']:
        raise ValueError("No data for target")

    for dest in data['to']['data']:
        if 'id' not in dest:
            continue
        targetId = dest['id']
        print("to_id:", targetId)

Output:

In [9]: getTargetIds(s)
to_id: 1543

回答 1

如果您只想检查密钥是否存在

h = {'a': 1}
'b' in h # returns False

如果要检查是否有密钥值

h.get('b') # returns None

如果缺少实际值,则返回默认值

h.get('b', 'Default value')

If all you want is to check if key exists or not

h = {'a': 1}
'b' in h # returns False

If you want to check if there is a value for key

h.get('b') # returns None

Return a default value if actual value is missing

h.get('b', 'Default value')

回答 2

为此类事件创建助手实用程序方法是一个好习惯,这样,每当您需要更改属性验证的逻辑时,它就会放在一个位置,并且对于跟随者而言,代码将更具可读性。

例如,在以下位置创建一个辅助方法(或JsonUtils带有静态方法的类)json_utils.py

def get_attribute(data, attribute, default_value):
    return data.get(attribute) or default_value

然后在您的项目中使用它:

from json_utils import get_attribute

def my_cool_iteration_func(data):

    data_to = get_attribute(data, 'to', None)
    if not data_to:
        return

    data_to_data = get_attribute(data_to, 'data', [])
    for item in data_to_data:
        print('The id is: %s' % get_attribute(item, 'id', 'null'))

重要的提示:

我使用的原因data.get(attribute) or default_value不是简单的data.get(attribute, default_value)

{'my_key': None}.get('my_key', 'nothing') # returns None
{'my_key': None}.get('my_key') or 'nothing' # returns 'nothing'

在我的应用程序中,获取属性值为“ null”与根本不获取属性相同。如果您的用法不同,则需要进行更改。

It is a good practice to create helper utility methods for things like that so that whenever you need to change the logic of attribute validation it would be in one place, and the code will be more readable for the followers.

For example create a helper method (or class JsonUtils with static methods) in json_utils.py:

def get_attribute(data, attribute, default_value):
    return data.get(attribute) or default_value

and then use it in your project:

from json_utils import get_attribute

def my_cool_iteration_func(data):

    data_to = get_attribute(data, 'to', None)
    if not data_to:
        return

    data_to_data = get_attribute(data_to, 'data', [])
    for item in data_to_data:
        print('The id is: %s' % get_attribute(item, 'id', 'null'))

IMPORTANT NOTE:

There is a reason I am using data.get(attribute) or default_value instead of simply data.get(attribute, default_value):

{'my_key': None}.get('my_key', 'nothing') # returns None
{'my_key': None}.get('my_key') or 'nothing' # returns 'nothing'

In my applications getting attribute with value ‘null’ is the same as not getting the attribute at all. If your usage is different, you need to change this.


回答 3

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}, {"name": "Joe Schmoe"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        print("to_id:", dest.get('id', 'null'))

试试吧:

>>> getTargetIds(jsonData)
to_id: 1543
to_id: null

或者,如果您只想跳过缺少ID的值,而不是打印'null'

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        if 'id' in to_id:
            print("to_id:", dest['id'])

所以:

>>> getTargetIds(jsonData)
to_id: 1543

当然,在现实生活中,您可能不想使用print每个id,而是要存储它们并对其进行操作,但这是另一个问题。

jsonData = """{"from": {"id": "8", "name": "Mary Pinter"}, "message": "How ARE you?", "comments": {"count": 0}, "updated_time": "2012-05-01", "created_time": "2012-05-01", "to": {"data": [{"id": "1543", "name": "Honey Pinter"}, {"name": "Joe Schmoe"}]}, "type": "status", "id": "id_7"}"""

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        print("to_id:", dest.get('id', 'null'))

Try it:

>>> getTargetIds(jsonData)
to_id: 1543
to_id: null

Or, if you just want to skip over values missing ids instead of printing 'null':

def getTargetIds(jsonData):
    data = json.loads(jsonData)
    for dest in data['to']['data']:
        if 'id' in to_id:
            print("to_id:", dest['id'])

So:

>>> getTargetIds(jsonData)
to_id: 1543

Of course in real life, you probably don’t want to print each id, but to store them and do something with them, but that’s another issue.


回答 4

if "my_data" in my_json_data:
         print json.dumps(my_json_data["my_data"])
if "my_data" in my_json_data:
         print json.dumps(my_json_data["my_data"])

回答 5

为此,我编写了一个小函数。随时调整用途,

def is_json_key_present(json, key):
    try:
        buf = json[key]
    except KeyError:
        return False

    return True

I wrote a tiny function for this purpose. Feel free to repurpose,

def is_json_key_present(json, key):
    try:
        buf = json[key]
    except KeyError:
        return False

    return True

Python的json模块,将int字典键转换为字符串

问题:Python的json模块,将int字典键转换为字符串

我发现运行以下命令时,python的json模块(自2.6起包含)将int字典键转换为字符串。

>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases)
'{"1": "foo-v0.1"}'

有什么简单的方法可以将键保留为int,而无需在转储和加载时解析字符串。我相信可以使用json模块提供的钩子,但这仍然需要解析。我可能会忽略一个论点吗?欢呼声,查兹

子问题:感谢您的回答。看到j​​son像我所担心的那样工作,是否有一种简单的方法可以通过解析转储的输出来传达密钥类型?我还要注意执行转储的代码以及从服务器下载json对象并加载它的代码均由我编写。

I have found that when the following is run, python’s json module (included since 2.6) converts int dictionary keys to strings.

>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases)
'{"1": "foo-v0.1"}'

Is there any easy way to preserve the key as an int, without needing to parse the string on dump and load. I believe it would be possible using the hooks provided by the json module, but again this still requires parsing. Is there possibly an argument I have overlooked? cheers, chaz

Sub-question: Thanks for the answers. Seeing as json works as I feared, is there an easy way to convey key type by maybe parsing the output of dumps? Also I should note the code doing the dumping and the code downloading the json object from a server and loading it, are both written by me.


回答 0

这是可能困扰您的各种映射集合之间的细微差别之一。JSON将键视为字符串;Python支持仅在类型上不同的独特键。

在Python中(显然在Lua中),映射的键(分别是字典或表)是对象引用。在Python中,它们必须是不可变的类型,或者它们必须是实现__hash__方法的对象。(Lua的文档建议即使对于可变对象,它也会自动将对象的ID用作哈希/键,并依赖于字符串插入以确保等效的字符串映射到相同的对象)。

在Perl,Javascript,awk和许多其他语言中,哈希,关联数组或给定语言所调用的名称的键是字符串(或Perl中的“标量”)。在Perl $foo{1}, $foo{1.0}, and $foo{"1"}是在相同的对应的所有引用%foo—关键是评估作为标!

JSON是从Javascript序列化技术开始的。(JSON代表Ĵ AVA 小号 CRIPT ö bject Ñ浮选。)当然它实现为它的映射符号的语义这与它的映射语义一致。

如果序列化的两端都将是Python,那么最好使用咸菜。如果您真的需要将这些从JSON转换回本机Python对象,我想您有两种选择。首先try: ... except: ...,如果字典查找失败,您可以尝试()将任何键转换为数字。或者,如果将代码添加到另一端(此JSON数据的序列化器或生成器),则可以让它对每个键值执行JSON序列化—将其作为键列表提供。(然后,您的Python代码将首先在键列表上进行迭代,将它们实例化/反序列化为本地Python对象…,然后使用那些键来访问映射中的值)。

This is one of those subtle differences among various mapping collections that can bite you. JSON treats keys as strings; Python supports distinct keys differing only in type.

In Python (and apparently in Lua) the keys to a mapping (dictionary or table, respectively) are object references. In Python they must be immutable types, or they must be objects which implement a __hash__ method. (The Lua docs suggest that it automatically uses the object’s ID as a hash/key even for mutable objects and relies on string interning to ensure that equivalent strings map to the same objects).

In Perl, Javascript, awk and many other languages the keys for hashes, associative arrays or whatever they’re called for the given language, are strings (or “scalars” in Perl). In perl $foo{1}, $foo{1.0}, and $foo{"1"} are all references to the same mapping in %foo — the key is evaluated as a scalar!

JSON started as a Javascript serialization technology. (JSON stands for JavaScript Object Notation.) Naturally it implements semantics for its mapping notation which are consistent with its mapping semantics.

If both ends of your serialization are going to be Python then you’d be better off using pickles. If you really need to convert these back from JSON into native Python objects I guess you have a couple of choices. First you could try (try: ... except: ...) to convert any key to a number in the event of a dictionary look-up failure. Alternatively, if you add code to the other end (the serializer or generator of this JSON data) then you could have it perform a JSON serialization on each of the key values — providing those as a list of keys. (Then your Python code would first iterate over the list of keys, instantiating/deserializing them into native Python objects … and then use those for access the values out of the mapping).


回答 1

不,JavaScript中没有数字键之类的东西。所有对象属性都将转换为String。

var a= {1: 'a'};
for (k in a)
    alert(typeof k); // 'string'

这可能会导致一些奇怪的行为:

a[999999999999999999999]= 'a'; // this even works on Array
alert(a[1000000000000000000000]); // 'a'
alert(a['999999999999999999999']); // fail
alert(a['1e+21']); // 'a'

JavaScript对象并不是真正正确的映射,因为您会在Python之类的语言中理解它,并且使用非String的键会导致怪异。这就是为什么JSON总是显式地将键写为字符串的原因,即使在不需要的地方也是如此。

No, there is no such thing as a Number key in JavaScript. All object properties are converted to String.

var a= {1: 'a'};
for (k in a)
    alert(typeof k); // 'string'

This can lead to some curious-seeming behaviours:

a[999999999999999999999]= 'a'; // this even works on Array
alert(a[1000000000000000000000]); // 'a'
alert(a['999999999999999999999']); // fail
alert(a['1e+21']); // 'a'

JavaScript Objects aren’t really proper mappings as you’d understand it in languages like Python, and using keys that aren’t String results in weirdness. This is why JSON always explicitly writes keys as strings, even where it doesn’t look necessary.


回答 2

或者,您也可以尝试在使用json进行编码的同时将字典转换为[(k1,v1),(k2,v2)]格式的列表,并在将其解码后将其转换回字典。


>>>> import json
>>>> json.dumps(releases.items())
    '[[1, "foo-v0.1"]]'
>>>> releases = {1: "foo-v0.1"}
>>>> releases == dict(json.loads(json.dumps(releases.items())))
     True
我相信这将需要更多的工作,例如具有某种标志,以识别从json解码回去后将要转换为字典的所有参数。

Alternatively you can also try converting dictionary to a list of [(k1,v1),(k2,v2)] format while encoding it using json, and converting it back to dictionary after decoding it back.


>>>> import json
>>>> json.dumps(releases.items())
    '[[1, "foo-v0.1"]]'
>>>> releases = {1: "foo-v0.1"}
>>>> releases == dict(json.loads(json.dumps(releases.items())))
     True
I believe this will need some more work like having some sort of flag to identify what all parameters to be converted to dictionary after decoding it back from json.

回答 3

回答您的子问题:

可以通过使用 json.loads(jsonDict, object_hook=jsonKeys2int)

def jsonKeys2int(x):
    if isinstance(x, dict):
            return {int(k):v for k,v in x.items()}
    return x

此功能也适用于嵌套词典,并使用词典理解。

如果您也想强制转换值,请使用:

def jsonKV2int(x):
    if isinstance(x, dict):
            return {int(k):(int(v) if isinstance(v, unicode) else v) for k,v in x.items()}
    return x

它测试值的实例并仅在它们是字符串对象(确切地说是unicode)时才将其强制转换。

这两个函数均假定键(和值)为整数。

谢谢:

如何在字典理解中使用if / else?

在字典中将字符串键转换为int

Answering your subquestion:

It can be accomplished by using json.loads(jsonDict, object_hook=jsonKeys2int)

def jsonKeys2int(x):
    if isinstance(x, dict):
            return {int(k):v for k,v in x.items()}
    return x

This function will also work for nested dicts and uses a dict comprehension.

If you want to to cast the values too, use:

def jsonKV2int(x):
    if isinstance(x, dict):
            return {int(k):(int(v) if isinstance(v, unicode) else v) for k,v in x.items()}
    return x

Which tests the instance of the values and casts them only if they are strings objects (unicode to be exact).

Both functions assumes keys (and values) to be integers.

Thanks to:

How to use if/else in a dictionary comprehension?

Convert a string key to int in a Dictionary


回答 4

我被同样的问题咬了。正如其他人指出的那样,在JSON中,映射键必须是字符串。您可以做两件事之一。您可以使用不太严格的JSON库,例如demjson,它允许整数字符串。如果没有其他程序(或其他语言的其他语言)无法读取它,那么您应该可以。或者,您可以使用其他序列化语言。我不建议泡菜。它很难阅读,并非旨在确保安全。相反,我建议使用YAML,它几乎是JSON的超集,并且确实允许整数键。(至少PyYAML这样做。)

I’ve gotten bitten by the same problem. As others have pointed out, in JSON, the mapping keys must be strings. You can do one of two things. You can use a less strict JSON library, like demjson, which allows integer strings. If no other programs (or no other in other languages) are going to read it, then you should be okay. Or you can use a different serialization language. I wouldn’t suggest pickle. It’s hard to read, and is not designed to be secure. Instead, I’d suggest YAML, which is (nearly) a superset of JSON, and does allow integer keys. (At least PyYAML does.)


回答 5

使用将字典转换为字符串str(dict),然后执行以下操作将其转换回dict:

import ast
ast.literal_eval(string)

Convert the dictionary to be string by using str(dict) and then convert it back to dict by doing this:

import ast
ast.literal_eval(string)

回答 6

这是我的解决方案!我用过object_hook,当您嵌套时很有用json

>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})

>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}

仅用于将json键解析为int的过滤器。您也可以将int(v) if v.lstrip('-').isdigit() else v过滤器用于json值。

Here is my solution! I used object_hook, it is useful when you have nested json

>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})

>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}

There is filter only for parsing json key to int. You can use int(v) if v.lstrip('-').isdigit() else v filter for json value too.


回答 7

我对Murmel的答案做了一个非常简单的扩展,我认为它可以在相当随意的字典(包括嵌套字典)上工作,前提是它首先可以被JSON转储。任何可以解释为整数的键都将转换为int。毫无疑问,这不是很有效,但是它可以实现我存储到json字符串和从json字符串加载的目的。

def convert_keys_to_int(d: dict):
    new_dict = {}
    for k, v in d.items():
        try:
            new_key = int(k)
        except ValueError:
            new_key = k
        if type(v) == dict:
            v = _convert_keys_to_int(v)
        new_dict[new_key] = v
    return new_dict

假设原始字典中的所有键都是整数(如果可以将它们强制转换为int),则在将其存储为json后将返回原始字典。例如

>>>d = {1: 3, 2: 'a', 3: {1: 'a', 2: 10}, 4: {'a': 2, 'b': 10}}
>>>convert_keys_to_int(json.loads(json.dumps(d)))  == d
True

I made a very simple extension of Murmel’s answer which I think will work on a pretty arbitrary dictionary (including nested) assuming it can be dumped by JSON in the first place. Any keys which can be interpreted as integers will be cast to int. No doubt this is not very efficient, but it works for my purposes of storing to and loading from json strings.

def convert_keys_to_int(d: dict):
    new_dict = {}
    for k, v in d.items():
        try:
            new_key = int(k)
        except ValueError:
            new_key = k
        if type(v) == dict:
            v = _convert_keys_to_int(v)
        new_dict[new_key] = v
    return new_dict

Assuming that all keys in the original dict are integers if they can be cast to int, then this will return the original dictionary after storing as a json. e.g.

>>>d = {1: 3, 2: 'a', 3: {1: 'a', 2: 10}, 4: {'a': 2, 'b': 10}}
>>>convert_keys_to_int(json.loads(json.dumps(d)))  == d
True

回答 8

你可以写你json.dumps自己,这里是从例如djsonencoder.py。您可以像这样使用它:

assert dumps({1: "abc"}) == '{1: "abc"}'

You can write your json.dumps by yourself, here is a example from djson: encoder.py. You can use it like this:

assert dumps({1: "abc"}) == '{1: "abc"}'

比“无法解码JSON对象”显示更好的错误消息

问题:比“无法解码JSON对象”显示更好的错误消息

Python代码可从一些冗长而复杂的JSON文件加载数据:

with open(filename, "r") as f:
  data = json.loads(f.read())

(注意:最佳代码版本应为:

with open(filename, "r") as f:
  data = json.load(f)

但两者都表现出相似的行为)

对于许多类型的JSON错误(缺少分隔符,字符串中不正确的反斜杠等),这会打印出一条非常有用的消息,其中包含找到JSON错误的行号和列号。

但是,对于其他类型的JSON错误(包括经典的“在列表中的最后一项上使用逗号”,以及其他诸如大写true / false的大写字母),Python的输出仅为:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

对于这种类型的ValueError,如何让Python告诉您JSON文件中的错误在哪里?

Python code to load data from some long complicated JSON file:

with open(filename, "r") as f:
  data = json.loads(f.read())

(note: the best code version should be:

with open(filename, "r") as f:
  data = json.load(f)

but both exhibit similar behavior)

For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.

However, for other types of JSON error (including the classic “using comma on the last item in a list”, but also other things like capitalising true/false), Python’s output is just:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

For that type of ValueError, how do you get Python to tell you where is the error in the JSON file?


回答 0

我发现,在simplejson内置json模块含糊不清的许多情况下,该模块会给出更多的描述性错误。例如,对于列表中最后一项之后的逗号:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

这不是很描述。与以下操作相同simplejson

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

好多了!同样适用于其他常见错误,例如大写True

I’ve found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

which is not very descriptive. The same operation with simplejson:

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

Much better! Likewise for other common errors like capitalizing True.


回答 1

您将无法获得python来告诉您JSON不正确的地方。您将需要在这样的地方在线使用棉绒

这将向您显示您尝试解码的JSON错误。

You wont be able to get python to tell you where the JSON is incorrect. You will need to use a linter online somewhere like this

This will show you error in the JSON you are trying to decode.


回答 2

您可以尝试在以下位置找到rson库:http : //code.google.com/p/rson/。我还在PYPI上:https ://pypi.python.org/pypi/rson/0.9,所以您可以使用easy_install或pip来获取它。

对于tom给出的示例:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON被设计为JSON的超集,因此它可以解析JSON文件。它还具有一种替代语法,对于人类来说,查看和编辑它好得多。我在输入文件中使用了很多。

至于布尔值的大写:rson似乎错误地将大写的布尔值读取为字符串。

>>> rson.loads('[true,False]')
[True, u'False']

You could try the rson library found here: http://code.google.com/p/rson/ . I it also up on PYPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.

for the example given by tom:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON is a designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.

As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.

>>> rson.loads('[true,False]')
[True, u'False']

回答 3

我有一个类似的问题,这是由于单引号引起的。JSON标准(http://json.org)仅讨论使用双引号,因此必须是python json库仅支持双引号。

I had a similar problem and it was due to singlequotes. The JSON standard(http://json.org) talks only about using double quotes so it must be that the python json library supports only double quotes.


回答 4

对于这个问题的特定版本,我继续搜索load_json_file(path)packaging.py文件中的函数声明,然后在其中走私了print一行:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

这样,它将在进入try-catch之前打印json文件的内容,并且即使我几乎不具备Python知识,我也能够迅速弄清楚为什么我的配置无法读取json文件。
(这是因为我设置了文本编辑器来编写UTF-8 BOM …愚蠢)

仅仅提及这一点是因为,虽然可能不能很好地解决OP的特定问题,但这是一种确定非常令人讨厌的bug的来源的相当快捷的方法。我敢打赌,很多人会偶然发现这篇文章,他们正在寻找更详细的解决方案MalformedJsonFileError: No JSON object could be decoded when reading …。这样可能对他们有帮助。

For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn’t read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)

Just mentioning this because, while maybe not a good answer to the OP’s specific problem, this was a rather quick method in determining the source of a very oppressing bug. And I bet that many people will stumble upon this article who are searching a more verbose solution for a MalformedJsonFileError: No JSON object could be decoded when reading …. So that might help them.


回答 5

对我来说,我的json文件很大,json在python中使用common 时会出现上述错误。

安装后simplejson通过sudo pip install simplejson

然后我解决了。

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

As to me, my json file is very large, when use common json in python it gets the above error.

After install simplejson by sudo pip install simplejson.

And then I solved it.

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

回答 6

我有一个类似的问题,这是我的代码:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

问题是我忘了file.close() 做到这一点并解决了问题。

I had a similar problem this was my code:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

the problem was i had forgotten to file.close() I did it and fixed the problem.


回答 7

可接受的答案是解决问题的最简单方法。但是,如果由于公司政策而不允许您安装simplejson,我建议采用以下解决方案来解决“在列表中的最后一项上使用逗号”这一特定问题:

  1. 创建一个子类“ JSONLintCheck”以从类“ JSONDecoder”继承,并覆盖类“ JSONDecoder”的init方法,如下所示:

    def __init__(self, encoding=None, object_hook=None, parse_float=None,parse_int=None, parse_constant=None, strict=True,object_pairs_hook=None)        
            super(JSONLintCheck,self).__init__(encoding=None, object_hook=None,      parse_float=None,parse_int=None, parse_constant=None, strict=True,object_pairs_hook=None)
            self.scan_once = make_scanner(self)
  1. make_scanner是一个新函数,用于覆盖上述类的’scan_once’方法。这是它的代码:
  1 #!/usr/bin/env python
  2 from json import JSONDecoder
  3 from json import decoder
  4 import re
  5
  6 NUMBER_RE = re.compile(
  7     r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
  8     (re.VERBOSE | re.MULTILINE | re.DOTALL))
  9
 10 def py_make_scanner(context):
 11     parse_object = context.parse_object
 12     parse_array = context.parse_array
 13     parse_string = context.parse_string
 14     match_number = NUMBER_RE.match
 15     encoding = context.encoding
 16     strict = context.strict
 17     parse_float = context.parse_float
 18     parse_int = context.parse_int
 19     parse_constant = context.parse_constant
 20     object_hook = context.object_hook
 21     object_pairs_hook = context.object_pairs_hook
 22
 23     def _scan_once(string, idx):
 24         try:
 25             nextchar = string[idx]
 26         except IndexError:
 27             raise ValueError(decoder.errmsg("Could not get the next character",string,idx))
 28             #raise StopIteration
 29
 30         if nextchar == '"':
 31             return parse_string(string, idx + 1, encoding, strict)
 32         elif nextchar == '{':
 33             return parse_object((string, idx + 1), encoding, strict,
 34                 _scan_once, object_hook, object_pairs_hook)
 35         elif nextchar == '[':
 36             return parse_array((string, idx + 1), _scan_once)
 37         elif nextchar == 'n' and string[idx:idx + 4] == 'null':
 38             return None, idx + 4
 39         elif nextchar == 't' and string[idx:idx + 4] == 'true':
 40             return True, idx + 4
 41         elif nextchar == 'f' and string[idx:idx + 5] == 'false':
 42             return False, idx + 5
 43
 44         m = match_number(string, idx)
 45         if m is not None:
 46             integer, frac, exp = m.groups()
 47             if frac or exp:
 48                 res = parse_float(integer + (frac or '') + (exp or ''))
 49             else:
 50                 res = parse_int(integer)
 51             return res, m.end()
 52         elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
 53             return parse_constant('NaN'), idx + 3
 54         elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
 55             return parse_constant('Infinity'), idx + 8
 56         elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
 57             return parse_constant('-Infinity'), idx + 9
 58         else:
 59             #raise StopIteration   # Here is where needs modification
 60             raise ValueError(decoder.errmsg("Expecting propert name enclosed in double quotes",string,idx))
 61     return _scan_once
 62
 63 make_scanner = py_make_scanner
  1. 最好将“ make_scanner”功能与新的子类一起放入同一文件中。

The accepted answer is the easiest one to fix the problem. But in case you are not allowed to install the simplejson due to your company policy, I propose below solution to fix the particular issue of “using comma on the last item in a list”:

  1. Create a child class “JSONLintCheck” to inherite from class “JSONDecoder” and override the init method of the class “JSONDecoder” like below:

    def __init__(self, encoding=None, object_hook=None, parse_float=None,parse_int=None, parse_constant=None, strict=True,object_pairs_hook=None)        
            super(JSONLintCheck,self).__init__(encoding=None, object_hook=None,      parse_float=None,parse_int=None, parse_constant=None, strict=True,object_pairs_hook=None)
            self.scan_once = make_scanner(self)
    
  1. make_scanner is a new function that used to override the ‘scan_once’ method of the above class. And here is code for it:
  1 #!/usr/bin/env python
  2 from json import JSONDecoder
  3 from json import decoder
  4 import re
  5
  6 NUMBER_RE = re.compile(
  7     r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
  8     (re.VERBOSE | re.MULTILINE | re.DOTALL))
  9
 10 def py_make_scanner(context):
 11     parse_object = context.parse_object
 12     parse_array = context.parse_array
 13     parse_string = context.parse_string
 14     match_number = NUMBER_RE.match
 15     encoding = context.encoding
 16     strict = context.strict
 17     parse_float = context.parse_float
 18     parse_int = context.parse_int
 19     parse_constant = context.parse_constant
 20     object_hook = context.object_hook
 21     object_pairs_hook = context.object_pairs_hook
 22
 23     def _scan_once(string, idx):
 24         try:
 25             nextchar = string[idx]
 26         except IndexError:
 27             raise ValueError(decoder.errmsg("Could not get the next character",string,idx))
 28             #raise StopIteration
 29
 30         if nextchar == '"':
 31             return parse_string(string, idx + 1, encoding, strict)
 32         elif nextchar == '{':
 33             return parse_object((string, idx + 1), encoding, strict,
 34                 _scan_once, object_hook, object_pairs_hook)
 35         elif nextchar == '[':
 36             return parse_array((string, idx + 1), _scan_once)
 37         elif nextchar == 'n' and string[idx:idx + 4] == 'null':
 38             return None, idx + 4
 39         elif nextchar == 't' and string[idx:idx + 4] == 'true':
 40             return True, idx + 4
 41         elif nextchar == 'f' and string[idx:idx + 5] == 'false':
 42             return False, idx + 5
 43
 44         m = match_number(string, idx)
 45         if m is not None:
 46             integer, frac, exp = m.groups()
 47             if frac or exp:
 48                 res = parse_float(integer + (frac or '') + (exp or ''))
 49             else:
 50                 res = parse_int(integer)
 51             return res, m.end()
 52         elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
 53             return parse_constant('NaN'), idx + 3
 54         elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
 55             return parse_constant('Infinity'), idx + 8
 56         elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
 57             return parse_constant('-Infinity'), idx + 9
 58         else:
 59             #raise StopIteration   # Here is where needs modification
 60             raise ValueError(decoder.errmsg("Expecting propert name enclosed in double quotes",string,idx))
 61     return _scan_once
 62
 63 make_scanner = py_make_scanner
  1. Better put the ‘make_scanner’ function together with the new child class into a same file.

回答 8

只是遇到了同样的问题,在我的情况下,问题与BOM文件开头的(字节顺序标记)有关。

json.tool 直到我删除了UTF BOM标记,都拒绝处理甚至是空文件(只是花括号)。

我所做的是:

  • 用vim打开我的json文件,
  • 删除了字节顺序标记(set nobomb
  • 保存存档

这就解决了json.tool的问题。希望这可以帮助!

Just hit the same issue and in my case the problem was related to BOM (byte order mark) at the beginning of the file.

json.tool would refuse to process even empty file (just curly braces) until i removed the UTF BOM mark.

What I have done is:

  • opened my json file with vim,
  • removed byte order mark (set nobomb)
  • save file

This resolved the problem with json.tool. Hope this helps!


回答 9

创建文件时。而不是创建内容为空的文件。用。。。来代替:

json.dump({}, file)

When your file is created. Instead of creating a file with content is empty. Replace with:

json.dump({}, file)

回答 10

您可以使用cjson,它声称比纯python实现快250倍,因为您有“某个较长且复杂的JSON文件”,并且可能需要运行几次(解码器失败并报告第一个错误)仅遇到)。

You could use cjson, that claims to be up to 250 times faster than pure-python implementations, given that you have “some long complicated JSON file” and you will probably need to run it several times (decoders fail and report the first error they encounter only).


Python中的字符串到字典

问题:Python中的字符串到字典

所以我花了很多时间在此上,在我看来,这应该是一个简单的修复。我正在尝试使用Facebook的身份验证在我的网站上注册用户,并且正在服务器端进行操作。我已经到了获取访问令牌的地步,并且当我去:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

我得到的信息就是这样的字符串:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

似乎我应该可以使用dict(string)它,但出现此错误:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

所以我尝试使用Pickle,但收到此错误:

KeyError: '{'

我尝试使用django.serializers反序列化它,但结果相似。有什么想法吗?我觉得答案必须很简单,而且我很愚蠢。谢谢你的帮助!

So I’ve spent way to much time on this, and it seems to me like it should be a simple fix. I’m trying to use Facebook’s Authentication to register users on my site, and I’m trying to do it server side. I’ve gotten to the point where I get my access token, and when I go to:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

I get the information I’m looking for as a string that’s like this:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

It seems like I should just be able to use dict(string) on this but I’m getting this error:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

So I tried using Pickle, but got this error:

KeyError: '{'

I tried using django.serializers to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I’m just being stupid. Thanks for any help!


回答 0

此数据为JSON!如果您使用的是Python 2.6+,则可以使用内置json模块反序列化它,否则可以使用出色的第三方simplejson模块

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

This data is JSON! You can deserialize it using the built-in json module if you’re on Python 2.6+, otherwise you can use the excellent third-party simplejson module.

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

回答 1

使用ast.literal_eval评估Python文字。但是,您拥有的是JSON(例如,请注意“ true”),因此请使用JSON解串器。

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}

Use ast.literal_eval to evaluate Python literals. However, what you have is JSON (note “true” for example), so use a JSON deserializer.

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}

如何将xml字符串转换为字典?

问题:如何将xml字符串转换为字典?

我有一个程序可以从套接字读取xml文档。我将xml文档存储在一个字符串中,我想将其直接转换为Python字典,就像在Django的simplejson库中一样。

举个例子:

str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person"
dic_xml = convert_to_dic(str)

然后dic_xml看起来像{'person' : { 'name' : 'john', 'age' : 20 } }

I have a program that reads an xml document from a socket. I have the xml document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django’s simplejson library.

Take as an example:

str ="<?xml version="1.0" ?><person><name>john</name><age>20</age></person"
dic_xml = convert_to_dic(str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }


回答 0

这是某人创建的一个很棒的模块。我已经使用过几次了。 http://code.activestate.com/recipes/410469-xml-as-dictionary/

这是网站上的代码,以防链接损坏。

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

用法示例:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

//或者,如果要使用XML字符串:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

This is a great module that someone created. I’ve used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

from xml.etree import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

//Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

回答 1

xmltodict(完全公开:我写了它)确实做到了:

xmltodict.parse("""
<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

xmltodict (full disclosure: I wrote it) does exactly that:

xmltodict.parse("""
<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

回答 2

以下XML-to-Python-dict片段分析了此XML-to-JSON“规范”之后的实体以及属性。这是处理XML所有情况的最通用的解决方案。

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

它用于:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

此示例的输出(根据上面链接的“规范”)应为:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

不一定很漂亮,但它是明确的,而更简单的XML输入会导致更简单的JSON。:)


更新资料

如果要进行相反的操作从JSON / dict发出XML字符串,则可以使用:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(d))

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON “specification”. It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
              d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked “specification”) should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)


Update

If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(d))

回答 3

这个轻量级的版本虽然不可配置,但是很容易根据需要进行定制,并且可以在旧的python中工作。它也是严格的-意味着无论属性是否存在,结果都是相同的。

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

所以:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

结果是:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid – meaning the results are the same regardless of the existence of attributes.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

回答 4

PicklingTools库的最新版本(1.3.0和1.3.1)支持将XML转换为Python dict的工具。

可从此处下载文件: PicklingTools 1.3.1

没有为转换颇有几分文档在这里:文档中详细的所有XML和Python字典之间转换时将产生的决定和问题描述(也有一些边缘情况:属性,列表,匿名列表,匿名多数转换器无法处理的dict,eval等)。通常,这些转换器易于使用。如果“ example.xml”包含:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

然后将其转换为字典:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expect XML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

有一些可以在C ++和Python中进行转换的工具:C ++和Python可以进行相同的转换,但是C ++的速度要快60倍左右

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don’t handle). In general, though, the converters are easy to use. If an ‘example.xml’ contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = file('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expect XML()
>>> print result
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python do indentical conversion, but the C++ is about 60x faster


回答 5

您可以使用lxml轻松完成此操作。首先安装它:

[sudo] pip install lxml

这是我编写的递归函数,可以为您完成繁重的工作:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

以下变体保留了父键/元素:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

如果只想返回一个子树并将其转换为dict,则可以使用Element.find()获取该子树,然后对其进行转换:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

请在此处查看lxml文档。我希望这有帮助!

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print xml_to_dict(xml_string)

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

See the lxml docs here. I hope this helps!


回答 6

免责声明:此经过修改的XML解析器受到Adam Clark 的启发。原始XML解析器适用于大多数简单情况。但是,它不适用于某些复杂的XML文件。我逐行调试了代码,最后解决了一些问题。如果您发现一些错误,请告诉我。我很高兴修复它。

class XmlDictConfig(dict):  
    '''   
    Note: need to add a root into if no exising    
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specialy for attribtes
                elementattrib= element.items()
                if element.text:           
                    elementattrib.append((element.tag,element.text ))     # add tag:text if there exist
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)    

Disclaimer: This modified XML parser was inspired by Adam Clark The original XML parser works for most of simple cases. However, it didn’t work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find some bugs, please let me know. I am glad to fix it.

class XmlDictConfig(dict):  
    '''   
    Note: need to add a root into if no exising    
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specialy for attribtes
                elementattrib= element.items()
                if element.text:           
                    elementattrib.append((element.tag,element.text ))     # add tag:text if there exist
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)    

回答 7

def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}
def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}

回答 8

最容易使用的XML XML解析器是ElementTree(从2.5x开始,在标准库xml.etree.ElementTree中)。我认为没有什么可以完全满足您的要求。使用ElementTree编写某些内容来完成您想要的事情,这很简单,但是为什么要转换为字典,为什么不直接使用ElementTree。

The easiest to use XML parser for Python is ElementTree (as of 2.5x and above it is in the standard library xml.etree.ElementTree). I don’t think there is anything that does exactly what you want out of the box. It would be pretty trivial to write something to do what you want using ElementTree, but why convert to a dictionary, and why not just use ElementTree directly.


回答 9

来自http://code.activestate.com/recipes/410469-xml-as-dictionary/的代码效果很好,但是,如果在层次结构中的给定位置存在多个相同的元素,它将覆盖它们。

我在两者之间添加了一个垫片,以查看在self.update()之前该元素是否已经存在。如果是这样,则弹出现有条目并从现有条目和新条目中创建一个列表。随后的所有重复项都将添加到列表中。

不知道是否可以更妥善地处理此问题,但它的工作原理是:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip()})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements that are the same at a given place in the hierarchy it just overrides them.

I added a shim between that looks to see if the element already exists before self.update(). If so, pops the existing entry and creates a lists out of the existing and the new. Any subsequent duplicates are added to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip()})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update(aDict)

回答 10

从@ K3 — rnc 响应(最适合我),我添加了一些小修改以从XML文本中获得OrderedDict(有时顺序很重要):

def etree_to_ordereddict(t):
d = OrderedDict()
d[t.tag] = OrderedDict() if t.attrib else None
children = list(t)
if children:
    dd = OrderedDict()
    for dc in map(etree_to_ordereddict, children):
        for k, v in dc.iteritems():
            if k not in dd:
                dd[k] = list()
            dd[k].append(v)
    d = OrderedDict()
    d[t.tag] = OrderedDict()
    for k, v in dd.iteritems():
        if len(v) == 1:
            d[t.tag][k] = v[0]
        else:
            d[t.tag][k] = v
if t.attrib:
    d[t.tag].update(('@' + k, v) for k, v in t.attrib.iteritems())
if t.text:
    text = t.text.strip()
    if children or t.attrib:
        if text:
            d[t.tag]['#text'] = text
    else:
        d[t.tag] = text
return d

在@ K3 — rnc示例中,可以使用它:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

希望能帮助到你 ;)

From @K3—rnc response (the best for me) I’ve added a small modifications to get an OrderedDict from an XML text (some times order matters):

def etree_to_ordereddict(t):
d = OrderedDict()
d[t.tag] = OrderedDict() if t.attrib else None
children = list(t)
if children:
    dd = OrderedDict()
    for dc in map(etree_to_ordereddict, children):
        for k, v in dc.iteritems():
            if k not in dd:
                dd[k] = list()
            dd[k].append(v)
    d = OrderedDict()
    d[t.tag] = OrderedDict()
    for k, v in dd.iteritems():
        if len(v) == 1:
            d[t.tag][k] = v[0]
        else:
            d[t.tag][k] = v
if t.attrib:
    d[t.tag].update(('@' + k, v) for k, v in t.attrib.iteritems())
if t.text:
    text = t.text.strip()
    if children or t.attrib:
        if text:
            d[t.tag]['#text'] = text
    else:
        d[t.tag] = text
return d

Following @K3—rnc example, you can use it:

from xml.etree import cElementTree as ET
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

Hope it helps ;)


回答 11

这是ActiveState解决方案的链接-以及代码再次消失的代码。

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError:
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
    if n.nodeType == n.TEXT_NODE:
        t += n.nodeValue
    else:
        raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "method" set to "true",
    then it's children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the nodes children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {} 
    for n in node.childNodes:
    if n.nodeType != n.ELEMENT_NODE:
        continue
    if n.getAttribute("multiple") == "true":
        # node with multiple children:
        # put them in a list
        l = []
        for c in n.childNodes:
            if c.nodeType != n.ELEMENT_NODE:
            continue
        l.append(nodeToDic(c))
            dic.update({n.nodeName:l})
        continue

    try:
        text = getTextFromNode(n)
    except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName:nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName:text})
    continue
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print dic["Config"]["Name"]
    print
    for item in dic["Config"]["Items"]:
    print "Item's Name:", item["Name"]
    print "Item's Value:", item["Value"]

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

Here’s a link to an ActiveState solution – and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError:
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
    if n.nodeType == n.TEXT_NODE:
        t += n.nodeValue
    else:
        raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "method" set to "true",
    then it's children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the nodes children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {} 
    for n in node.childNodes:
    if n.nodeType != n.ELEMENT_NODE:
        continue
    if n.getAttribute("multiple") == "true":
        # node with multiple children:
        # put them in a list
        l = []
        for c in n.childNodes:
            if c.nodeType != n.ELEMENT_NODE:
            continue
        l.append(nodeToDic(c))
            dic.update({n.nodeName:l})
        continue

    try:
        text = getTextFromNode(n)
    except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName:nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName:text})
    continue
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print dic["Config"]["Name"]
    print
    for item in dic["Config"]["Items"]:
    print "Item's Name:", item["Name"]
    print "Item's Value:", item["Value"]

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

回答 12

在某一时刻,我不得不解析和编写仅包含没有属性的元素的XML,因此从XML到dict的1:1映射很容易。如果别人也不需要属性,这就是我想出的:

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if result.get(element.tag):
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    result = ElementTree.Element(element.keys()[0])
    for key, value in element[element.keys()[0]].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring).getroot())

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

At one point I had to parse and write XML that only consisted of elements without attributes so a 1:1 mapping from XML to dict was possible easily. This is what I came up with in case someone else also doesnt need attributes:

def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if result.get(element.tag):
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, basestring):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    result = ElementTree.Element(element.keys()[0])
    for key, value in element[element.keys()[0]].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring).getroot())

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))

回答 13

@dibrovsd:如果xml具有多个具有相同名称的标签,则解决方案将不起作用

根据您的想法,我对代码进行了一些修改,并将其编写为常规节点而不是root用户:

from collections import defaultdict
def xml2dict(node):
    d, count = defaultdict(list), 1
    for i in node:
        d[i.tag + "_" + str(count)]['text'] = i.findtext('.')[0]
        d[i.tag + "_" + str(count)]['attrib'] = i.attrib # attrib gives the list
        d[i.tag + "_" + str(count)]['children'] = xml2dict(i) # it gives dict
     return d

@dibrovsd: Solution will not work if the xml have more than one tag with same name

On your line of thought, I have modified the code a bit and written it for general node instead of root:

from collections import defaultdict
def xml2dict(node):
    d, count = defaultdict(list), 1
    for i in node:
        d[i.tag + "_" + str(count)]['text'] = i.findtext('.')[0]
        d[i.tag + "_" + str(count)]['attrib'] = i.attrib # attrib gives the list
        d[i.tag + "_" + str(count)]['children'] = xml2dict(i) # it gives dict
     return d

回答 14

我修改了我的口味的答案之一,并使用同一标签处理多个值,例如考虑以下保存在XML.xml文件中的xml代码。

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

和在python中

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text != None) else  ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

输出是

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

I have modified one of the answers to my taste and to work with multiple values with the same tag for example consider the following xml code saved in XML.xml file

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

and in python

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if (child.text != None) else  ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

the output is

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

回答 15

我有一个递归方法,可从lxml元素获取字典

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))

I have a recursive method to get a dictionary from a lxml element

    def recursive_dict(element):
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))

泡菜还是json?

问题:泡菜还是json?

我需要将一个dict密钥为type str且值为ints 的小对象保存到磁盘,然后将其恢复。像这样:

{'juanjo': 2, 'pedro':99, 'other': 333}

最佳选择是什么,为什么?使用pickle或使用序列化它simplejson

我正在使用Python 2.6。

I need to save to disk a little dict object whose keys are of the type str and values are ints and then recover it. Something like this:

{'juanjo': 2, 'pedro':99, 'other': 333}

What is the best option and why? Serialize it with pickle or with simplejson?

I am using Python 2.6.


回答 0

如果您没有任何互操作性要求(例如,您将仅使用Python使用数据)并且二进制格式很好,请使用cPickle,它将为您提供真正快速的Python对象序列化。

如果您希望互操作性或想要一种文本格式来存储数据,请使用JSON(或其他一些适当的格式,具体取决于您的约束)。

If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).


回答 1

对于序列化,我更喜欢JSON而不是pickle。取消选择可以运行任意代码,并且pickle用于在程序之间传输数据或在会话之间存储数据是一个安全漏洞。JSON不会引入安全漏洞,并且已标准化,因此,您可以根据需要使用不同语言的程序访问数据。

I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.


回答 2

您可能还会发现一些有趣的图表,可以进行比较:http : //kovshenin.com/archives/pickle-vs-json-which-is-faster/

You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/


回答 3

如果您主要关注速度和空间,请使用cPickle,因为cPickle比JSON快。

如果您更关注互操作性,安全性和/或人类可读性,请使用JSON。


其他答案中引用的测试结果记录在2010年,2016年使用cPickle 协议2更新的测试显示:

  • cPickle 3.8倍更快的加载速度
  • cPickle 1.5倍读取速度更快
  • cPickle编码稍小

使用这个gist可以自己重现这一点,它基于康斯坦丁在其他答案中引用的基准,但是使用协议2而不是pickle的cPickle,并且使用pickle的json(因为json比simplejson快)来使用json ,例如

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

在不错的2015 Xeon处理器上使用python 2.7的结果:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

带有pickle协议3的Python 3.4甚至更快。

If you are primarily concerned with speed and space, use cPickle because cPickle is faster than JSON.

If you are more concerned with interoperability, security, and/or human readability, then use JSON.


The tests results referenced in other answers were recorded in 2010, and the updated tests in 2016 with cPickle protocol 2 show:

  • cPickle 3.8x faster loading
  • cPickle 1.5x faster reading
  • cPickle slightly smaller encoding

Reproduce this yourself with this gist, which is based on the Konstantin’s benchmark referenced in other answers, but using cPickle with protocol 2 instead of pickle, and using json instead of simplejson (since json is faster than simplejson), e.g.

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

Results with python 2.7 on a decent 2015 Xeon processor:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

Python 3.4 with pickle protocol 3 is even faster.


回答 4

JSON还是泡菜?JSON 泡菜怎么样!您可以使用jsonpickle。它易于使用,并且磁盘上的文件是JSON,因此可读。

http://jsonpickle.github.com/

JSON or pickle? How about JSON and pickle! You can use jsonpickle. It easy to use and the file on disk is readable because it’s JSON.

http://jsonpickle.github.com/


回答 5

我尝试了几种方法,发现使用cPickle并将dumps方法的协议参数设置为:cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)是最快的转储方法。

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

输出:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds

I have tried several methods and found out that using cPickle with setting the protocol argument of the dumps method as: cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL) is the fastest dump method.

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

Output:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds

回答 6

就个人而言,我通常更喜欢JSON,因为数据是人类可读的。当然,如果您需要序列化JSON不会接受的内容,则可以使用pickle。

但是对于大多数数据存储而言,您不需要序列化任何奇怪的东西,而JSON则容易得多,并且始终允许您在文本编辑器中将其弹出并自行检查数据。

速度不错,但是对于大多数数据集而言,差异可以忽略不计;无论如何,Python通常并不太快。

Personally, I generally prefer JSON because the data is human-readable. Definitely, if you need to serialize something that JSON won’t take, than use pickle.

But for most data storage, you won’t need to serialize anything weird and JSON is much easier and always allows you to pop it open in a text editor and check out the data yourself.

The speed is nice, but for most datasets the difference is negligible; Python generally isn’t too fast anyways.


JSON ValueError:期望的属性名称:第1行第2列(字符1)

问题:JSON ValueError:期望的属性名称:第1行第2列(字符1)

我在使用json.loads转换为dict对象时遇到麻烦,我无法弄清楚我在做什么错。我得到的确切错误是

ValueError: Expecting property name: line 1 column 2 (char 1)

这是我的代码:

from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer, KeyedProducer
import pymongo
from pymongo import MongoClient
import json

c = MongoClient("54.210.157.57")
db = c.test_database3
collection = db.tweet_col

kafka = KafkaClient("54.210.157.57:9092")

consumer = SimpleConsumer(kafka,"myconsumer","test")
for tweet in consumer:
    print tweet.message.value
    jsonTweet=json.loads(({u'favorited': False, u'contributors': None})
    collection.insert(jsonTweet)

我很确定错误发生在第二行到最后一行

jsonTweet=json.loads({u'favorited': False, u'contributors': None})

但我不知道该如何解决。任何意见,将不胜感激。

I am having trouble using json.loads to convert to a dict object and I can’t figure out what I’m doing wrong.The exact error I get running this is

ValueError: Expecting property name: line 1 column 2 (char 1)

Here is my code:

from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer, KeyedProducer
import pymongo
from pymongo import MongoClient
import json

c = MongoClient("54.210.157.57")
db = c.test_database3
collection = db.tweet_col

kafka = KafkaClient("54.210.157.57:9092")

consumer = SimpleConsumer(kafka,"myconsumer","test")
for tweet in consumer:
    print tweet.message.value
    jsonTweet=json.loads(({u'favorited': False, u'contributors': None})
    collection.insert(jsonTweet)

I’m pretty sure that the error is occuring at the 2nd to last line

jsonTweet=json.loads({u'favorited': False, u'contributors': None})

but I do not know what to do to fix it. Any advice would be appreciated.


回答 0

json.loads将json字符串加载到python中dictjson.dumps将python转储dict到json字符串中,例如:

>>> json_string = '{"favorited": false, "contributors": null}'
'{"favorited": false, "contributors": null}'
>>> value = json.loads(json_string)
{u'favorited': False, u'contributors': None}
>>> json_dump = json.dumps(value)
'{"favorited": false, "contributors": null}'

所以那行是不正确的,因为您正在尝试load使用python dict,并json.loads期望json string应该有一个有效的python <type 'str'>

因此,如果您尝试加载json,则应更改要加载的内容,使其类似于json_string上面的内容,否则应将其转储。根据给定的信息,这只是我的最佳猜测。您要完成什么?

另外,您也不需要u在字符串前指定,就像注释中提到的@Cld一样。

json.loads will load a json string into a python dict, json.dumps will dump a python dict to a json string, for example:

>>> json_string = '{"favorited": false, "contributors": null}'
'{"favorited": false, "contributors": null}'
>>> value = json.loads(json_string)
{u'favorited': False, u'contributors': None}
>>> json_dump = json.dumps(value)
'{"favorited": false, "contributors": null}'

So that line is incorrect since you are trying to load a python dict, and json.loads is expecting a valid json string which should have <type 'str'>.

So if you are trying to load the json, you should change what you are loading to look like the json_string above, or you should be dumping it. This is just my best guess from the given information. What is it that you are trying to accomplish?

Also you don’t need to specify the u before your strings, as @Cld mentioned in the comments.


回答 1

我遇到另一个返回相同错误的问题。

单引号

我使用带单引号的json字符串:

{
    'property': 1
}

但是json.loads只接受json属性的双引号

{
    "property": 1
}

最终逗号问题

json.loads 不接受最终逗号:

{
  "property": "text", 
  "property2": "text2",
}

解决方案:ast解决单引号和最终逗号问题

您可以使用ast(Python 2和3的标准库的一部分)进行此处理。这是一个例子:

import ast
# ast.literal_eval() return a dict object, we must use json.dumps to get JSON string
import json

# Single quote to double with ast.literal_eval()
json_data = "{'property': 'text'}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with double quotes
json_data = '{"property": "text"}'
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with final coma
json_data = "{'property': 'text', 'property2': 'text2',}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property2": "text2", "property": "text"}

使用ast会像Python字典一样插入JSON,从而避免出现单引号和最终逗号问题(因此,您必须遵循Python字典语法)。它是eval()文字结构函数的一种非常不错且安全的替代方法。

Python文档警告我们使用大/复杂字符串:

警告由于Python AST编译器中的堆栈深度限制,使用足够大/复杂的字符串可能会使Python解释器崩溃。

带有单引号的json.dumps

json.dumps轻松使用单引号,可以使用以下代码:

import ast
import json

data = json.dumps(ast.literal_eval(json_data_single_quote))

ast 文件资料

ast Python 3 doc

ast Python 2文档

工具

如果您经常编辑JSON,则可以使用CodeBeautify。它可以帮助您修复语法错误并缩小/美化JSON。

希望对您有所帮助。

I encountered another problem that returns the same error.

Single quote issue

I used a json string with single quotes :

{
    'property': 1
}

But json.loads accepts only double quotes for json properties :

{
    "property": 1
}

Final comma issue

json.loads doesn’t accept a final comma:

{
  "property": "text", 
  "property2": "text2",
}

Solution: ast to solve single quote and final comma issues

You can use ast (part of standard library for both Python 2 and 3) for this processing. Here is an example :

import ast
# ast.literal_eval() return a dict object, we must use json.dumps to get JSON string
import json

# Single quote to double with ast.literal_eval()
json_data = "{'property': 'text'}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with double quotes
json_data = '{"property": "text"}'
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property": "text"}

# ast.literal_eval() with final coma
json_data = "{'property': 'text', 'property2': 'text2',}"
json_data = ast.literal_eval(json_data)
print(json.dumps(json_data))
# Displays : {"property2": "text2", "property": "text"}

Using ast will prevent you from single quote and final comma issues by interpet the JSON like Python dictionnary (so you must follow the Python dictionnary syntax). It’s a pretty good and safely alternative of eval() function for literal structures.

Python documentation warned us of using large/complex string :

Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

json.dumps with single quotes

To use json.dumps with single quotes easily you can use this code:

import ast
import json

data = json.dumps(ast.literal_eval(json_data_single_quote))

ast documentation

ast Python 3 doc

ast Python 2 doc

Tool

If you frequently edit JSON, you may use CodeBeautify. It helps you to fix syntax error and minify/beautify JSON.

I hope it helps.


回答 2

  1. 用双引号替换所有单引号
  2. 将字符串中的’u“’替换为’”’…因此,在将字符串加载到json之前,基本上将内部unicodes转换为字符串
>> strs = "{u'key':u'val'}"
>> strs = strs.replace("'",'"')
>> json.loads(strs.replace('u"','"'))
  1. replace all single quotes with double quotes
  2. replace ‘u”‘ from your strings to ‘”‘ … so basically convert internal unicodes to strings before loading the string into json
>> strs = "{u'key':u'val'}"
>> strs = strs.replace("'",'"')
>> json.loads(strs.replace('u"','"'))

回答 3

所有其他答案都可以回答您的查询,但是我遇到了同样的问题,这是由于,我在json字符串的末尾添加了这样的杂散:

{
 "key":"123sdf",
 "bus_number":"asd234sdf",
}

当我,像这样删除多余的东西时,我终于使它工作了:

{
 "key":"123sdf",
 "bus_number":"asd234sdf"
}

希望有帮助!干杯。

All other answers may answer your query, but I faced same issue which was due to stray , which I added at the end of my json string like this:

{
 "key":"123sdf",
 "bus_number":"asd234sdf",
}

I finally got it working when I removed extra , like this:

{
 "key":"123sdf",
 "bus_number":"asd234sdf"
}

Hope this help! cheers.


回答 4

使用ast,例如

In [15]: a = "[{'start_city': '1', 'end_city': 'aaa', 'number': 1},\
...:      {'start_city': '2', 'end_city': 'bbb', 'number': 1},\
...:      {'start_city': '3', 'end_city': 'ccc', 'number': 1}]"
In [16]: import ast
In [17]: ast.literal_eval(a)
Out[17]:
[{'end_city': 'aaa', 'number': 1, 'start_city': '1'},
 {'end_city': 'bbb', 'number': 1, 'start_city': '2'},
 {'end_city': 'ccc', 'number': 1, 'start_city': '3'}]

used ast, example

In [15]: a = "[{'start_city': '1', 'end_city': 'aaa', 'number': 1},\
...:      {'start_city': '2', 'end_city': 'bbb', 'number': 1},\
...:      {'start_city': '3', 'end_city': 'ccc', 'number': 1}]"
In [16]: import ast
In [17]: ast.literal_eval(a)
Out[17]:
[{'end_city': 'aaa', 'number': 1, 'start_city': '1'},
 {'end_city': 'bbb', 'number': 1, 'start_city': '2'},
 {'end_city': 'ccc', 'number': 1, 'start_city': '3'}]

回答 5

我遇到的另一种情况是,当我使用echoJSON将其通过管道传递到python脚本中,并且不小心将JSON字符串用双引号引起来时:

echo "{"thumbnailWidth": 640}" | myscript.py

请注意,JSON字符串本身带有引号,我应该这样做:

echo '{"thumbnailWidth": 640}' | myscript.py

由于这是,这是获得什么python脚本:{thumbnailWidth: 640}; 双引号已被有效删除。

A different case in which I encountered this was when I was using echo to pipe the JSON into my python script and carelessly wrapped the JSON string in double quotes:

echo "{"thumbnailWidth": 640}" | myscript.py

Note that the JSON string itself has quotes and I should have done:

echo '{"thumbnailWidth": 640}' | myscript.py

As it was, this is what the python script received: {thumbnailWidth: 640}; the double quotes were effectively stripped.


如何在单元测试中使用JSON发送请求

问题:如何在单元测试中使用JSON发送请求

我在Flask应用程序中包含在请求中使用JSON的代码,并且可以像这样获取JSON对象:

Request = request.get_json()

一切正常,但是我试图使用Python的unittest模块创建单元测试,并且很难找到一种发送带有请求的JSON的方法。

response=self.app.post('/test_function', 
                       data=json.dumps(dict(foo = 'bar')))

这给了我:

>>> request.get_data()
'{"foo": "bar"}'
>>> request.get_json()
None

Flask似乎有一个JSON参数,您可以在其中发布请求中设置json = dict(foo =’bar’),但我不知道如何使用unittest模块来做到这一点。

I have code within a Flask application that uses JSONs in the request, and I can get the JSON object like so:

Request = request.get_json()

This has been working fine, however I am trying to create unit tests using Python’s unittest module and I’m having difficulty finding a way to send a JSON with the request.

response=self.app.post('/test_function', 
                       data=json.dumps(dict(foo = 'bar')))

This gives me:

>>> request.get_data()
'{"foo": "bar"}'
>>> request.get_json()
None

Flask seems to have a JSON argument where you can set json=dict(foo=’bar’) within the post request, but I don’t know how to do that with the unittest module.


回答 0

将帖子更改为

response=self.app.post('/test_function', 
                       data=json.dumps(dict(foo='bar')),
                       content_type='application/json')

解决它。

感谢user3012759。

Changing the post to

response=self.app.post('/test_function', 
                       data=json.dumps(dict(foo='bar')),
                       content_type='application/json')

fixed it.

Thanks to user3012759.


回答 1

更新:由于Flask 1.0发布的flask.testing.FlaskClient方法接受json参数和Response.get_json添加的方法,请参见example

对于Flask 0.x,您可以使用以下收据:

from flask import Flask, Response as BaseResponse, json
from flask.testing import FlaskClient
from werkzeug.utils import cached_property


class Response(BaseResponse):
    @cached_property
    def json(self):
        return json.loads(self.data)


class TestClient(FlaskClient):
    def open(self, *args, **kwargs):
        if 'json' in kwargs:
            kwargs['data'] = json.dumps(kwargs.pop('json'))
            kwargs['content_type'] = 'application/json'
        return super(TestClient, self).open(*args, **kwargs)


app = Flask(__name__)
app.response_class = Response
app.test_client_class = TestClient
app.testing = True

UPDATE: Since Flask 1.0 released flask.testing.FlaskClient methods accepts json argument and Response.get_json method added, see example.

for Flask 0.x you may use receipt below:

from flask import Flask, Response as BaseResponse, json
from flask.testing import FlaskClient
from werkzeug.utils import cached_property


class Response(BaseResponse):
    @cached_property
    def json(self):
        return json.loads(self.data)


class TestClient(FlaskClient):
    def open(self, *args, **kwargs):
        if 'json' in kwargs:
            kwargs['data'] = json.dumps(kwargs.pop('json'))
            kwargs['content_type'] = 'application/json'
        return super(TestClient, self).open(*args, **kwargs)


app = Flask(__name__)
app.response_class = Response
app.test_client_class = TestClient
app.testing = True