Python 实用宝典

Question 1

I’d like to read multiple JSON objects from a file/stream in Python, one at a time. Unfortunately json.load() just .read()s until end-of-file; there doesn’t seem to be any way to use it to read a single object or to lazily iterate over the objects.

Is there any way to do this? Using the standard library would be ideal, but if there’s a third-party library I’d use that instead.

At the moment I’m putting each object on a separate line and using json.loads(f.readline()), but I would really prefer not to need to do this.

Example Use

example.py

import my_json as json
import sys

for o in json.iterload(sys.stdin):
    print("Working on a", type(o))

in.txt

{"foo": ["bar", "baz"]} 1 2 [] 4 5 6

example session

$ python3.2 example.py < in.txt
Working on a dict
Working on a int
Working on a int
Working on a list
Working on a int
Working on a int
Working on a int

Question 2

Here’s a much, much simpler solution. The secret is to try, fail, and use the information in the exception to parse correctly. The only limitation is the file must be seekable.

def stream_read_json(fn):
    import json
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except json.JSONDecodeError as e:
                f.seek(start_pos)
                json_str = f.read(e.pos)
                obj = json.loads(json_str)
                start_pos += e.pos
                yield obj

Edit: just noticed that this will only work for Python >=3.5. For earlier, failures return a ValueError, and you have to parse out the position from the string, e.g.

def stream_read_json(fn):
    import json
    import re
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except ValueError as e:
                f.seek(start_pos)
                end_pos = int(re.match('Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
                                    e.args[0]).groups()[0])
                json_str = f.read(end_pos)
                obj = json.loads(json_str)
                start_pos += end_pos
                yield obj

Question 3

JSON generally isn’t very good for this sort of incremental use; there’s no standard way to serialise multiple objects so that they can easily be loaded one at a time, without parsing the whole lot.

The object per line solution that you’re using is seen elsewhere too. Scrapy calls it ‘JSON lines’:

You can do it slightly more Pythonically:

for jsonline in f:
    yield json.loads(jsonline)   # or do the processing in this loop

I think this is about the best way – it doesn’t rely on any third party libraries, and it’s easy to understand what’s going on. I’ve used it in some of my own code as well.

Question 4

A little late maybe, but I had this exact problem (well, more or less). My standard solution for these problems is usually to just do a regex split on some well-known root object, but in my case it was impossible. The only feasible way to do this generically is to implement a proper tokenizer.

After not finding a generic-enough and reasonably well-performing solution, I ended doing this myself, writing the splitstream module. It is a pre-tokenizer that understands JSON and XML and splits a continuous stream into multiple chunks for parsing (it leaves the actual parsing up to you though). To get some kind of performance out of it, it is written as a C module.

Example:

from splitstream import splitfile

for jsonstr in splitfile(sys.stdin, format="json")):
    yield json.loads(jsonstr)

Question 5

Sure you can do this. You just have to take to raw_decode directly. This implementation loads the whole file into memory and operates on that string (much as json.load does); if you have large files you can modify it to only read from the file as necessary without much difficulty.

import json
from json.decoder import WHITESPACE

def iterload(string_or_fp, cls=json.JSONDecoder, **kwargs):
    if isinstance(string_or_fp, file):
        string = string_or_fp.read()
    else:
        string = str(string_or_fp)

    decoder = cls(**kwargs)
    idx = WHITESPACE.match(string, 0).end()
    while idx < len(string):
        obj, end = decoder.raw_decode(string, idx)
        yield obj
        idx = WHITESPACE.match(string, end).end()

Usage: just as you requested, it’s a generator.

Question 6

This is a pretty nasty problem actually because you have to stream in lines, but pattern match across multiple lines against braces, but also pattern match json. It’s a sort of json-preparse followed by a json parse. Json is, in comparison to other formats, easy to parse so it’s not always necessary to go for a parsing library, nevertheless, how to should we solve these conflicting issues?

Generators to the rescue!

The beauty of generators for a problem like this is you can stack them on top of each other gradually abstracting away the difficulty of the problem whilst maintaining laziness. I also considered using the mechanism for passing back values into a generator (send()) but fortunately found I didn’t need to use that.

To solve the first of the problems you need some sort of streamingfinditer, as a streaming version of re.finditer. My attempt at this below pulls in lines as needed (uncomment the debug statement to see) whilst still returning matches. I actually then modified it slightly to yield non-matched lines as well as matches (marked as 0 or 1 in the first part of the yielded tuple).

import re

def streamingfinditer(pat,stream):
  for s in stream:
#    print "Read next line: " + s
    while 1:
      m = re.search(pat,s)
      if not m:
        yield (0,s)
        break
      yield (1,m.group())
      s = re.split(pat,s,1)[1]

With that, it’s then possible to match up until braces, account each time for whether the braces are balanced, and then return either simple or compound objects as appropriate.

braces='{}[]'
whitespaceesc=' \t'
bracesesc='\\'+'\\'.join(braces)
balancemap=dict(zip(braces,[1,-1,1,-1]))
bracespat='['+bracesesc+']'
nobracespat='[^'+bracesesc+']*'
untilbracespat=nobracespat+bracespat

def simpleorcompoundobjects(stream):
  obj = ""
  unbalanced = 0
  for (c,m) in streamingfinditer(re.compile(untilbracespat),stream):
    if (c == 0): # remainder of line returned, nothing interesting
      if (unbalanced == 0):
        yield (0,m)
      else:
        obj += m
    if (c == 1): # match returned
      if (unbalanced == 0):
        yield (0,m[:-1])
        obj += m[-1]
      else:
        obj += m
      unbalanced += balancemap[m[-1]]
      if (unbalanced == 0):
        yield (1,obj)
        obj=""

This returns tuples as follows:

(0,"String of simple non-braced objects easy to parse")
(1,"{ 'Compound' : 'objects' }")

Basically that’s the nasty part done. We now just have to do the final level of parsing as we see fit. For example we can use Jeremy Roman’s iterload function (Thanks!) to do parsing for a single line:

def streamingiterload(stream):
  for c,o in simpleorcompoundobjects(stream):
    for x in iterload(o):
      yield x

Test it:

of = open("test.json","w") 
of.write("""[ "hello" ] { "goodbye" : 1 } 1 2 {
} 2
9 78
 4 5 { "animals" : [ "dog" , "lots of mice" ,
 "cat" ] }
""")
of.close()
// open & stream the json
f = open("test.json","r")
for o in streamingiterload(f.readlines()):
  print o
f.close()

I get these results (and if you turn on that debug line, you’ll see it pulls in the lines as needed):

[u'hello']
{u'goodbye': 1}
1
2
{}
2
9
78
4
5
{u'animals': [u'dog', u'lots of mice', u'cat']}

This won’t work for all situations. Due to the implementation of the json library, it is impossible to work entirely correctly without reimplementing the parser yourself.

Question 7

I believe a better way of doing it would be to use a state machine. Below is a sample code that I worked out by converting a NodeJS code on below link to Python ~~3 (used nonlocal keyword only available in Python 3, code won’t work on Python 2)~~

Edit-1: Updated and made code compatible with Python 2

Edit-2: Updated and added a Python3 only version as well

https://gist.github.com/creationix/5992451

Python 3 only version

# A streaming byte oriented JSON parser.  Feed it a single byte at a time and
# it will emit complete objects as it comes across them.  Whitespace within and
# between objects is ignored.  This means it can parse newline delimited JSON.
import math


def json_machine(emit, next_func=None):
    def _value(byte_data):
        if not byte_data:
            return

        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _value  # Ignore whitespace

        if byte_data == 0x22:  # "
            return string_machine(on_value)

        if byte_data == 0x2d or (0x30 <= byte_data < 0x40):  # - or 0-9
            return number_machine(byte_data, on_number)

        if byte_data == 0x7b:  #:
            return object_machine(on_value)

        if byte_data == 0x5b:  # [
            return array_machine(on_value)

        if byte_data == 0x74:  # t
            return constant_machine(TRUE, True, on_value)

        if byte_data == 0x66:  # f
            return constant_machine(FALSE, False, on_value)

        if byte_data == 0x6e:  # n
            return constant_machine(NULL, None, on_value)

        if next_func == _value:
            raise Exception("Unexpected 0x" + str(byte_data))

        return next_func(byte_data)

    def on_value(value):
        emit(value)
        return next_func

    def on_number(number, byte):
        emit(number)
        return _value(byte)

    next_func = next_func or _value
    return _value


TRUE = [0x72, 0x75, 0x65]
FALSE = [0x61, 0x6c, 0x73, 0x65]
NULL = [0x75, 0x6c, 0x6c]


def constant_machine(bytes_data, value, emit):
    i = 0
    length = len(bytes_data)

    def _constant(byte_data):
        nonlocal i
        if byte_data != bytes_data[i]:
            i += 1
            raise Exception("Unexpected 0x" + str(byte_data))

        i += 1
        if i < length:
            return _constant
        return emit(value)

    return _constant


def string_machine(emit):
    string = ""

    def _string(byte_data):
        nonlocal string

        if byte_data == 0x22:  # "
            return emit(string)

        if byte_data == 0x5c:  # \
            return _escaped_string

        if byte_data & 0x80:  # UTF-8 handling
            return utf8_machine(byte_data, on_char_code)

        if byte_data < 0x20:  # ASCII control character
            raise Exception("Unexpected control character: 0x" + str(byte_data))

        string += chr(byte_data)
        return _string

    def _escaped_string(byte_data):
        nonlocal string

        if byte_data == 0x22 or byte_data == 0x5c or byte_data == 0x2f:  # " \ /
            string += chr(byte_data)
            return _string

        if byte_data == 0x62:  # b
            string += "\b"
            return _string

        if byte_data == 0x66:  # f
            string += "\f"
            return _string

        if byte_data == 0x6e:  # n
            string += "\n"
            return _string

        if byte_data == 0x72:  # r
            string += "\r"
            return _string

        if byte_data == 0x74:  # t
            string += "\t"
            return _string

        if byte_data == 0x75:  # u
            return hex_machine(on_char_code)

    def on_char_code(char_code):
        nonlocal string
        string += chr(char_code)
        return _string

    return _string


# Nestable state machine for UTF-8 Decoding.
def utf8_machine(byte_data, emit):
    left = 0
    num = 0

    def _utf8(byte_data):
        nonlocal num, left
        if (byte_data & 0xc0) != 0x80:
            raise Exception("Invalid byte in UTF-8 character: 0x" + byte_data.toString(16))

        left = left - 1

        num |= (byte_data & 0x3f) << (left * 6)
        if left:
            return _utf8
        return emit(num)

    if 0xc0 <= byte_data < 0xe0:  # 2-byte UTF-8 Character
        left = 1
        num = (byte_data & 0x1f) << 6
        return _utf8

    if 0xe0 <= byte_data < 0xf0:  # 3-byte UTF-8 Character
        left = 2
        num = (byte_data & 0xf) << 12
        return _utf8

    if 0xf0 <= byte_data < 0xf8:  # 4-byte UTF-8 Character
        left = 3
        num = (byte_data & 0x07) << 18
        return _utf8

    raise Exception("Invalid byte in UTF-8 string: 0x" + str(byte_data))


# Nestable state machine for hex escaped characters
def hex_machine(emit):
    left = 4
    num = 0

    def _hex(byte_data):
        nonlocal num, left

        if 0x30 <= byte_data < 0x40:
            i = byte_data - 0x30
        elif 0x61 <= byte_data <= 0x66:
            i = byte_data - 0x57
        elif 0x41 <= byte_data <= 0x46:
            i = byte_data - 0x37
        else:
            raise Exception("Expected hex char in string hex escape")

        left -= 1
        num |= i << (left * 4)

        if left:
            return _hex
        return emit(num)

    return _hex


def number_machine(byte_data, emit):
    sign = 1
    number = 0
    decimal = 0
    esign = 1
    exponent = 0

    def _mid(byte_data):
        if byte_data == 0x2e:  # .
            return _decimal

        return _later(byte_data)

    def _number(byte_data):
        nonlocal number
        if 0x30 <= byte_data < 0x40:
            number = number * 10 + (byte_data - 0x30)
            return _number

        return _mid(byte_data)

    def _start(byte_data):
        if byte_data == 0x30:
            return _mid

        if 0x30 < byte_data < 0x40:
            return _number(byte_data)

        raise Exception("Invalid number: 0x" + str(byte_data))

    if byte_data == 0x2d:  # -
        sign = -1
        return _start

    def _decimal(byte_data):
        nonlocal decimal
        if 0x30 <= byte_data < 0x40:
            decimal = (decimal + byte_data - 0x30) / 10
            return _decimal

        return _later(byte_data)

    def _later(byte_data):
        if byte_data == 0x45 or byte_data == 0x65:  # E e
            return _esign

        return _done(byte_data)

    def _esign(byte_data):
        nonlocal esign
        if byte_data == 0x2b:  # +
            return _exponent

        if byte_data == 0x2d:  # -
            esign = -1
            return _exponent

        return _exponent(byte_data)

    def _exponent(byte_data):
        nonlocal exponent
        if 0x30 <= byte_data < 0x40:
            exponent = exponent * 10 + (byte_data - 0x30)
            return _exponent

        return _done(byte_data)

    def _done(byte_data):
        value = sign * (number + decimal)
        if exponent:
            value *= math.pow(10, esign * exponent)

        return emit(value, byte_data)

    return _start(byte_data)


def array_machine(emit):
    array_data = []

    def _array(byte_data):
        if byte_data == 0x5d:  # ]
            return emit(array_data)

        return json_machine(on_value, _comma)(byte_data)

    def on_value(value):
        array_data.append(value)

    def _comma(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _comma  # Ignore whitespace

        if byte_data == 0x2c:  # ,
            return json_machine(on_value, _comma)

        if byte_data == 0x5d:  # ]
            return emit(array_data)

        raise Exception("Unexpected byte: 0x" + str(byte_data) + " in array body")

    return _array


def object_machine(emit):
    object_data = {}
    key = None

    def _object(byte_data):
        if byte_data == 0x7d:  #
            return emit(object_data)

        return _key(byte_data)

    def _key(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _object  # Ignore whitespace

        if byte_data == 0x22:
            return string_machine(on_key)

        raise Exception("Unexpected byte: 0x" + str(byte_data))

    def on_key(result):
        nonlocal key
        key = result
        return _colon

    def _colon(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _colon  # Ignore whitespace

        if byte_data == 0x3a:  # :
            return json_machine(on_value, _comma)

        raise Exception("Unexpected byte: 0x" + str(byte_data))

    def on_value(value):
        object_data[key] = value

    def _comma(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _comma  # Ignore whitespace

        if byte_data == 0x2c:  # ,
            return _key

        if byte_data == 0x7d:  #
            return emit(object_data)

        raise Exception("Unexpected byte: 0x" + str(byte_data))

    return _object

Python 2 compatible version

# A streaming byte oriented JSON parser.  Feed it a single byte at a time and
# it will emit complete objects as it comes across them.  Whitespace within and
# between objects is ignored.  This means it can parse newline delimited JSON.
import math


def json_machine(emit, next_func=None):
    def _value(byte_data):
        if not byte_data:
            return

        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _value  # Ignore whitespace

        if byte_data == 0x22:  # "
            return string_machine(on_value)

        if byte_data == 0x2d or (0x30 <= byte_data < 0x40):  # - or 0-9
            return number_machine(byte_data, on_number)

        if byte_data == 0x7b:  #:
            return object_machine(on_value)

        if byte_data == 0x5b:  # [
            return array_machine(on_value)

        if byte_data == 0x74:  # t
            return constant_machine(TRUE, True, on_value)

        if byte_data == 0x66:  # f
            return constant_machine(FALSE, False, on_value)

        if byte_data == 0x6e:  # n
            return constant_machine(NULL, None, on_value)

        if next_func == _value:
            raise Exception("Unexpected 0x" + str(byte_data))

        return next_func(byte_data)

    def on_value(value):
        emit(value)
        return next_func

    def on_number(number, byte):
        emit(number)
        return _value(byte)

    next_func = next_func or _value
    return _value


TRUE = [0x72, 0x75, 0x65]
FALSE = [0x61, 0x6c, 0x73, 0x65]
NULL = [0x75, 0x6c, 0x6c]


def constant_machine(bytes_data, value, emit):
    local_data = {"i": 0, "length": len(bytes_data)}

    def _constant(byte_data):
        # nonlocal i, length
        if byte_data != bytes_data[local_data["i"]]:
            local_data["i"] += 1
            raise Exception("Unexpected 0x" + byte_data.toString(16))

        local_data["i"] += 1

        if local_data["i"] < local_data["length"]:
            return _constant
        return emit(value)

    return _constant


def string_machine(emit):
    local_data = {"string": ""}

    def _string(byte_data):
        # nonlocal string

        if byte_data == 0x22:  # "
            return emit(local_data["string"])

        if byte_data == 0x5c:  # \
            return _escaped_string

        if byte_data & 0x80:  # UTF-8 handling
            return utf8_machine(byte_data, on_char_code)

        if byte_data < 0x20:  # ASCII control character
            raise Exception("Unexpected control character: 0x" + byte_data.toString(16))

        local_data["string"] += chr(byte_data)
        return _string

    def _escaped_string(byte_data):
        # nonlocal string

        if byte_data == 0x22 or byte_data == 0x5c or byte_data == 0x2f:  # " \ /
            local_data["string"] += chr(byte_data)
            return _string

        if byte_data == 0x62:  # b
            local_data["string"] += "\b"
            return _string

        if byte_data == 0x66:  # f
            local_data["string"] += "\f"
            return _string

        if byte_data == 0x6e:  # n
            local_data["string"] += "\n"
            return _string

        if byte_data == 0x72:  # r
            local_data["string"] += "\r"
            return _string

        if byte_data == 0x74:  # t
            local_data["string"] += "\t"
            return _string

        if byte_data == 0x75:  # u
            return hex_machine(on_char_code)

    def on_char_code(char_code):
        # nonlocal string
        local_data["string"] += chr(char_code)
        return _string

    return _string


# Nestable state machine for UTF-8 Decoding.
def utf8_machine(byte_data, emit):
    local_data = {"left": 0, "num": 0}

    def _utf8(byte_data):
        # nonlocal num, left
        if (byte_data & 0xc0) != 0x80:
            raise Exception("Invalid byte in UTF-8 character: 0x" + byte_data.toString(16))

        local_data["left"] -= 1

        local_data["num"] |= (byte_data & 0x3f) << (local_data["left"] * 6)
        if local_data["left"]:
            return _utf8
        return emit(local_data["num"])

    if 0xc0 <= byte_data < 0xe0:  # 2-byte UTF-8 Character
        local_data["left"] = 1
        local_data["num"] = (byte_data & 0x1f) << 6
        return _utf8

    if 0xe0 <= byte_data < 0xf0:  # 3-byte UTF-8 Character
        local_data["left"] = 2
        local_data["num"] = (byte_data & 0xf) << 12
        return _utf8

    if 0xf0 <= byte_data < 0xf8:  # 4-byte UTF-8 Character
        local_data["left"] = 3
        local_data["num"] = (byte_data & 0x07) << 18
        return _utf8

    raise Exception("Invalid byte in UTF-8 string: 0x" + str(byte_data))


# Nestable state machine for hex escaped characters
def hex_machine(emit):
    local_data = {"left": 4, "num": 0}

    def _hex(byte_data):
        # nonlocal num, left
        i = 0  # Parse the hex byte
        if 0x30 <= byte_data < 0x40:
            i = byte_data - 0x30
        elif 0x61 <= byte_data <= 0x66:
            i = byte_data - 0x57
        elif 0x41 <= byte_data <= 0x46:
            i = byte_data - 0x37
        else:
            raise Exception("Expected hex char in string hex escape")

        local_data["left"] -= 1
        local_data["num"] |= i << (local_data["left"] * 4)

        if local_data["left"]:
            return _hex
        return emit(local_data["num"])

    return _hex


def number_machine(byte_data, emit):
    local_data = {"sign": 1, "number": 0, "decimal": 0, "esign": 1, "exponent": 0}

    def _mid(byte_data):
        if byte_data == 0x2e:  # .
            return _decimal

        return _later(byte_data)

    def _number(byte_data):
        # nonlocal number
        if 0x30 <= byte_data < 0x40:
            local_data["number"] = local_data["number"] * 10 + (byte_data - 0x30)
            return _number

        return _mid(byte_data)

    def _start(byte_data):
        if byte_data == 0x30:
            return _mid

        if 0x30 < byte_data < 0x40:
            return _number(byte_data)

        raise Exception("Invalid number: 0x" + byte_data.toString(16))

    if byte_data == 0x2d:  # -
        local_data["sign"] = -1
        return _start

    def _decimal(byte_data):
        # nonlocal decimal
        if 0x30 <= byte_data < 0x40:
            local_data["decimal"] = (local_data["decimal"] + byte_data - 0x30) / 10
            return _decimal

        return _later(byte_data)

    def _later(byte_data):
        if byte_data == 0x45 or byte_data == 0x65:  # E e
            return _esign

        return _done(byte_data)

    def _esign(byte_data):
        # nonlocal esign
        if byte_data == 0x2b:  # +
            return _exponent

        if byte_data == 0x2d:  # -
            local_data["esign"] = -1
            return _exponent

        return _exponent(byte_data)

    def _exponent(byte_data):
        # nonlocal exponent
        if 0x30 <= byte_data < 0x40:
            local_data["exponent"] = local_data["exponent"] * 10 + (byte_data - 0x30)
            return _exponent

        return _done(byte_data)

    def _done(byte_data):
        value = local_data["sign"] * (local_data["number"] + local_data["decimal"])
        if local_data["exponent"]:
            value *= math.pow(10, local_data["esign"] * local_data["exponent"])

        return emit(value, byte_data)

    return _start(byte_data)


def array_machine(emit):
    local_data = {"array_data": []}

    def _array(byte_data):
        if byte_data == 0x5d:  # ]
            return emit(local_data["array_data"])

        return json_machine(on_value, _comma)(byte_data)

    def on_value(value):
        # nonlocal array_data
        local_data["array_data"].append(value)

    def _comma(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _comma  # Ignore whitespace

        if byte_data == 0x2c:  # ,
            return json_machine(on_value, _comma)

        if byte_data == 0x5d:  # ]
            return emit(local_data["array_data"])

        raise Exception("Unexpected byte: 0x" + str(byte_data) + " in array body")

    return _array


def object_machine(emit):
    local_data = {"object_data": {}, "key": ""}

    def _object(byte_data):
        # nonlocal object_data, key
        if byte_data == 0x7d:  #
            return emit(local_data["object_data"])

        return _key(byte_data)

    def _key(byte_data):
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _object  # Ignore whitespace

        if byte_data == 0x22:
            return string_machine(on_key)

        raise Exception("Unexpected byte: 0x" + byte_data.toString(16))

    def on_key(result):
        # nonlocal object_data, key
        local_data["key"] = result
        return _colon

    def _colon(byte_data):
        # nonlocal object_data, key
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _colon  # Ignore whitespace

        if byte_data == 0x3a:  # :
            return json_machine(on_value, _comma)

        raise Exception("Unexpected byte: 0x" + str(byte_data))

    def on_value(value):
        # nonlocal object_data, key
        local_data["object_data"][local_data["key"]] = value

    def _comma(byte_data):
        # nonlocal object_data
        if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
            return _comma  # Ignore whitespace

        if byte_data == 0x2c:  # ,
            return _key

        if byte_data == 0x7d:  #
            return emit(local_data["object_data"])

        raise Exception("Unexpected byte: 0x" + str(byte_data))

    return _object

Testing it

if __name__ == "__main__":
    test_json = """[1,2,"3"] {"name": 
    "tarun"} 1 2 
    3 [{"name":"a", 
    "data": [1,
    null,2]}]
"""
    def found_json(data):
        print(data)

    state = json_machine(found_json)

    for char in test_json:
        state = state(ord(char))

The output of the same is

[1, 2, '3']
{'name': 'tarun'}
1
2
3
[{'name': 'a', 'data': [1, None, 2]}]

Question 8

I’d like to provide a solution. The key thought is to “try” to decode: if it fails, give it more feed, otherwise use the offset information to prepare next decoding.

However the current json module can’t tolerate SPACE in head of string to be decoded, so I have to strip them off.

import sys
import json

def iterload(file):
    buffer = ""
    dec = json.JSONDecoder()
    for line in file:         
        buffer = buffer.strip(" \n\r\t") + line.strip(" \n\r\t")
        while(True):
            try:
                r = dec.raw_decode(buffer)
            except:
                break
            yield r[0]
            buffer = buffer[r[1]:].strip(" \n\r\t")


for o in iterload(sys.stdin):
    print("Working on a", type(o),  o)

========================= I have tested for several txt files, and it works fine. (in1.txt)

{"foo": ["bar", "baz"]
}
 1 2 [
  ]  4
{"foo1": ["bar1", {"foo2":{"A":1, "B":3}, "DDD":4}]
}
 5   6

(in2.txt)

{"foo"
: ["bar",
  "baz"]
  } 
1 2 [
] 4 5 6

(in.txt, your initial)

{"foo": ["bar", "baz"]} 1 2 [] 4 5 6

(output for Benedict’s testcase)

python test.py < in.txt
('Working on a', <type 'list'>, [u'hello'])
('Working on a', <type 'dict'>, {u'goodbye': 1})
('Working on a', <type 'int'>, 1)
('Working on a', <type 'int'>, 2)
('Working on a', <type 'dict'>, {})
('Working on a', <type 'int'>, 2)
('Working on a', <type 'int'>, 9)
('Working on a', <type 'int'>, 78)
('Working on a', <type 'int'>, 4)
('Working on a', <type 'int'>, 5)
('Working on a', <type 'dict'>, {u'animals': [u'dog', u'lots of mice', u'cat']})

Question 9

Here’s mine:

import simplejson as json
from simplejson import JSONDecodeError
class StreamJsonListLoader():
    """
    When you have a big JSON file containint a list, such as

    [{
        ...
    },
    {
        ...
    },
    {
        ...
    },
    ...
    ]

    And it's too big to be practically loaded into memory and parsed by json.load,
    This class comes to the rescue. It lets you lazy-load the large json list.
    """

    def __init__(self, filename_or_stream):
        if type(filename_or_stream) == str:
            self.stream = open(filename_or_stream)
        else:
            self.stream = filename_or_stream

        if not self.stream.read(1) == '[':
            raise NotImplementedError('Only JSON-streams of lists (that start with a [) are supported.')

    def __iter__(self):
        return self

    def next(self):
        read_buffer = self.stream.read(1)
        while True:
            try:
                json_obj = json.loads(read_buffer)

                if not self.stream.read(1) in [',',']']:
                    raise Exception('JSON seems to be malformed: object is not followed by comma (,) or end of list (]).')
                return json_obj
            except JSONDecodeError:
                next_char = self.stream.read(1)
                read_buffer += next_char
                while next_char != '}':
                    next_char = self.stream.read(1)
                    if next_char == '':
                        raise StopIteration
                    read_buffer += next_char

Question 10

I used @wuilang’s elegant solution. The simple approach — read a byte, try to decode, read a byte, try to decode, … — worked, but unfortunately it was very slow.

In my case, I was trying to read “pretty-printed” JSON objects of the same object type from a file. This allowed me to optimize the approach; I could read the file line-by-line, only decoding when I found a line that contained exactly “}”:

def iterload(stream):
    buf = ""
    dec = json.JSONDecoder()
    for line in stream:
        line = line.rstrip()
        buf = buf + line
        if line == "}":
            yield dec.raw_decode(buf)
            buf = ""

If you happen to be working with one-per-line compact JSON that escapes newlines in string literals, then you can safely simplify this approach even more:

def iterload(stream):
    dec = json.JSONDecoder()
    for line in stream:
        yield dec.raw_decode(line)

Obviously, these simple approaches only work for very specific kinds of JSON. However, if these assumptions hold, these solutions work correctly and quickly.

Question 11

If you use a json.JSONDecoder instance you can use raw_decode member function. It returns a tuple of python representation of the JSON value and an index to where the parsing stopped. This makes it easy to slice (or seek in a stream object) the remaining JSON values. I’m not so happy about the extra while loop to skip over the white space between the different JSON values in the input but it gets the job done in my opinion.

import json

def yield_multiple_value(f):
    '''
    parses multiple JSON values from a file.
    '''
    vals_str = f.read()
    decoder = json.JSONDecoder()
    try:
        nread = 0
        while nread < len(vals_str):
            val, n = decoder.raw_decode(vals_str[nread:])
            nread += n
            # Skip over whitespace because of bug, below.
            while nread < len(vals_str) and vals_str[nread].isspace():
                nread += 1
            yield val
    except json.JSONDecodeError as e:
        pass
    return

The next version is much shorter and eats the part of the string that is already parsed. It seems that for some reason a second call json.JSONDecoder.raw_decode() seems to fail when the first character in the string is a whitespace, that is also the reason why I skip over the whitespace in the whileloop above …

def yield_multiple_value(f):
    '''
    parses multiple JSON values from a file.
    '''
    vals_str = f.read()
    decoder = json.JSONDecoder()
    while vals_str:
        val, n = decoder.raw_decode(vals_str)
        #remove the read characters from the start.
        vals_str = vals_str[n:]
        # remove leading white space because a second call to decoder.raw_decode()
        # fails when the string starts with whitespace, and
        # I don't understand why...
        vals_str = vals_str.lstrip()
        yield val
    return

In the documentation about the json.JSONDecoder class the method raw_decode https://docs.python.org/3/library/json.html#encoders-and-decoders contains the following:

This can be used to decode a JSON document from a string that may have extraneous data at the end.

And this extraneous data can easily be another JSON value. In other words the method might be written with this purpose in mind.

With the input.txt using the upper function I obtain the example output as presented in the original question.

Question 12

You can use https://pypi.org/project/json-stream-parser/ for exactly that purpose.

import sys
from json_stream_parser import load_iter
for obj in load_iter(sys.stdin):
    print(obj)

output

{'foo': ['bar', 'baz']}
1
2
[]
4
5
6

Question 13

I don’t care if it’s JSON, pickle, YAML, or whatever.

All other implementations I have seen are not forwards compatible, so if I have a config file, add a new key in the code, then load that config file, it’ll just crash.

Are there any simple way to do this?

Question 14

Configuration files in python

There are several ways to do this depending on the file format required.

ConfigParser [.ini format]

I would use the standard configparser approach unless there were compelling reasons to use a different format.

Write a file like so:

# python 2.x
# from ConfigParser import SafeConfigParser
# config = SafeConfigParser()

# python 3.x
from configparser import ConfigParser
config = ConfigParser()

config.read('config.ini')
config.add_section('main')
config.set('main', 'key1', 'value1')
config.set('main', 'key2', 'value2')
config.set('main', 'key3', 'value3')

with open('config.ini', 'w') as f:
    config.write(f)

The file format is very simple with sections marked out in square brackets:

[main]
key1 = value1
key2 = value2
key3 = value3

Values can be extracted from the file like so:

# python 2.x
# from ConfigParser import SafeConfigParser
# config = SafeConfigParser()

# python 3.x
from configparser import ConfigParser
config = ConfigParser()

config.read('config.ini')

print config.get('main', 'key1') # -> "value1"
print config.get('main', 'key2') # -> "value2"
print config.get('main', 'key3') # -> "value3"

# getfloat() raises an exception if the value is not a float
a_float = config.getfloat('main', 'a_float')

# getint() and getboolean() also do this for their respective types
an_int = config.getint('main', 'an_int')

JSON [.json format]

JSON data can be very complex and has the advantage of being highly portable.

Write data to a file:

import json

config = {'key1': 'value1', 'key2': 'value2'}

with open('config.json', 'w') as f:
    json.dump(config, f)

Read data from a file:

import json

with open('config.json', 'r') as f:
    config = json.load(f)

#edit the data
config['key3'] = 'value3'

#write it back to the file
with open('config.json', 'w') as f:
    json.dump(config, f)

YAML

A basic YAML example is provided in this answer. More details can be found on the pyYAML website.

Question 15

ConfigParser Basic example

The file can be loaded and used like this:

#!/usr/bin/env python

import ConfigParser
import io

# Load the configuration file
with open("config.yml") as f:
    sample_config = f.read()
config = ConfigParser.RawConfigParser(allow_no_value=True)
config.readfp(io.BytesIO(sample_config))

# List all contents
print("List all contents")
for section in config.sections():
    print("Section: %s" % section)
    for options in config.options(section):
        print("x %s:::%s:::%s" % (options,
                                  config.get(section, options),
                                  str(type(options))))

# Print some contents
print("\nPrint some contents")
print(config.get('other', 'use_anonymous'))  # Just get the value
print(config.getboolean('other', 'use_anonymous'))  # You know the datatype?

which outputs

List all contents
Section: mysql
x host:::localhost:::<type 'str'>
x user:::root:::<type 'str'>
x passwd:::my secret password:::<type 'str'>
x db:::write-math:::<type 'str'>
Section: other
x preprocessing_queue:::["preprocessing.scale_and_center",
"preprocessing.dot_reduction",
"preprocessing.connect_lines"]:::<type 'str'>
x use_anonymous:::yes:::<type 'str'>

Print some contents
yes
True

As you can see, you can use a standard data format that is easy to read and write. Methods like getboolean and getint allow you to get the datatype instead of a simple string.

Writing configuration

import os
configfile_name = "config.yaml"

# Check if there is already a configurtion file
if not os.path.isfile(configfile_name):
    # Create the configuration file as it doesn't exist yet
    cfgfile = open(configfile_name, 'w')

    # Add content to the file
    Config = ConfigParser.ConfigParser()
    Config.add_section('mysql')
    Config.set('mysql', 'host', 'localhost')
    Config.set('mysql', 'user', 'root')
    Config.set('mysql', 'passwd', 'my secret password')
    Config.set('mysql', 'db', 'write-math')
    Config.add_section('other')
    Config.set('other',
               'preprocessing_queue',
               ['preprocessing.scale_and_center',
                'preprocessing.dot_reduction',
                'preprocessing.connect_lines'])
    Config.set('other', 'use_anonymous', True)
    Config.write(cfgfile)
    cfgfile.close()

results in

[mysql]
host = localhost
user = root
passwd = my secret password
db = write-math

[other]
preprocessing_queue = ['preprocessing.scale_and_center', 'preprocessing.dot_reduction', 'preprocessing.connect_lines']
use_anonymous = True

XML Basic example

Seems not to be used at all for configuration files by the Python community. However, parsing / writing XML is easy and there are plenty of possibilities to do so with Python. One is BeautifulSoup:

from BeautifulSoup import BeautifulSoup

with open("config.xml") as f:
    content = f.read()

y = BeautifulSoup(content)
print(y.mysql.host.contents[0])
for tag in y.other.preprocessing_queue:
    print(tag)

where the config.xml might look like this

<config>
    <mysql>
        <host>localhost</host>
        <user>root</user>
        <passwd>my secret password</passwd>
        <db>write-math</db>
    </mysql>
    <other>
        <preprocessing_queue>
            <li>preprocessing.scale_and_center</li>
            <li>preprocessing.dot_reduction</li>
            <li>preprocessing.connect_lines</li>
        </preprocessing_queue>
        <use_anonymous value="true" />
    </other>
</config>

Question 16

If you want to use something like an INI file to hold settings, consider using configparser which loads key value pairs from a text file, and can easily write back to the file.

INI file has the format:

[Section]
key = value
key with spaces = somevalue

Question 17

Save and load a dictionary. You will have arbitrary keys, values and arbitrary number of key, values pairs.

Question 18

Try using ReadSettings:

from readsettings import ReadSettings
data = ReadSettings("settings.json") # Load or create any json, yml, yaml or toml file
data["name"] = "value" # Set "name" to "value"
data["name"] # Returns: "value"

Question 19

try using cfg4py:

Hierarchichal design, mulitiple env supported, so never mess up dev settings with production site settings.
Code completion. Cfg4py will convert your yaml into a python class, then code completion is available while you typing your code.
many more..

DISCLAIMER: I’m the author of this module

Question 20

I am using the standard json module in python 2.6 to serialize a list of floats. However, I’m getting results like this:

>>> import json
>>> json.dumps([23.67, 23.97, 23.87])
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'

I want the floats to be formated with only two decimal digits. The output should look like this:

>>> json.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'

I have tried defining my own JSON Encoder class:

class MyEncoder(json.JSONEncoder):
    def encode(self, obj):
        if isinstance(obj, float):
            return format(obj, '.2f')
        return json.JSONEncoder.encode(self, obj)

This works for a sole float object:

>>> json.dumps(23.67, cls=MyEncoder)
'23.67'

But fails for nested objects:

>>> json.dumps([23.67, 23.97, 23.87])
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'

I don’t want to have external dependencies, so I prefer to stick with the standard json module.

How can I achieve this?

Question 21

Note: This does not work in any recent version of Python.

Unfortunately, I believe you have to do this by monkey-patching (which, to my opinion, indicates a design defect in the standard library json package). E.g., this code:

import json
from json import encoder
encoder.FLOAT_REPR = lambda o: format(o, '.2f')
    
print(json.dumps(23.67))
print(json.dumps([23.67, 23.97, 23.87]))

emits:

23.67
[23.67, 23.97, 23.87]

as you desire. Obviously, there should be an architected way to override FLOAT_REPR so that EVERY representation of a float is under your control if you wish it to be; but unfortunately that’s not how the json package was designed:-(.

Question 22

import simplejson
    
class PrettyFloat(float):
    def __repr__(self):
        return '%.15g' % self
    
def pretty_floats(obj):
    if isinstance(obj, float):
        return PrettyFloat(obj)
    elif isinstance(obj, dict):
        return dict((k, pretty_floats(v)) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        return list(map(pretty_floats, obj))
    return obj
    
print(simplejson.dumps(pretty_floats([23.67, 23.97, 23.87])))

emits

[23.67, 23.97, 23.87]

No monkeypatching necessary.

Question 23

If you’re using Python 2.7, a simple solution is to simply round your floats explicitly to the desired precision.

>>> sys.version
'2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
>>> json.dumps(1.0/3.0)
'0.3333333333333333'
>>> json.dumps(round(1.0/3.0, 2))
'0.33'

This works because Python 2.7 made float rounding more consistent. Unfortunately this does not work in Python 2.6:

>>> sys.version
'2.6.6 (r266:84292, Dec 27 2010, 00:02:40) \n[GCC 4.4.5]'
>>> json.dumps(round(1.0/3.0, 2))
'0.33000000000000002'

The solutions mentioned above are workarounds for 2.6, but none are entirely adequate. Monkey patching json.encoder.FLOAT_REPR does not work if your Python runtime uses a C version of the JSON module. The PrettyFloat class in Tom Wuttke’s answer works, but only if %g encoding works globally for your application. The %.15g is a bit magic, it works because float precision is 17 significant digits and %g does not print trailing zeroes.

I spent some time trying to make a PrettyFloat that allowed customization of precision for each number. Ie, a syntax like

>>> json.dumps(PrettyFloat(1.0 / 3.0, 4))
'0.3333'

It’s not easy to get this right. Inheriting from float is awkward. Inheriting from Object and using a JSONEncoder subclass with its own default() method should work, except the json module seems to assume all custom types should be serialized as strings. Ie: you end up with the Javascript string “0.33” in the output, not the number 0.33. There may be a way yet to make this work, but it’s harder than it looks.

Question 24

Really unfortunate that dumps doesn’t allow you to do anything to floats. However loads does. So if you don’t mind the extra CPU load, you could throw it through the encoder/decoder/encoder and get the right result:

>>> json.dumps(json.loads(json.dumps([.333333333333, .432432]), parse_float=lambda x: round(float(x), 3)))
'[0.333, 0.432]'

Question 25

Here’s a solution that worked for me in Python 3 and does not require monkey patching:

import json

def round_floats(o):
    if isinstance(o, float): return round(o, 2)
    if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
    if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
    return o


json.dumps(round_floats([23.63437, 23.93437, 23.842347]))

Output is:

[23.63, 23.93, 23.84]

It copies the data but with rounded floats.

Question 26

If you’re stuck with Python 2.5 or earlier versions: The monkey-patch trick does not seem to work with the original simplejson module if the C speedups are installed:

$ python
Python 2.5.4 (r254:67916, Jan 20 2009, 11:06:13) 
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplejson
>>> simplejson.__version__
'2.0.9'
>>> simplejson._speedups
<module 'simplejson._speedups' from '/home/carlos/.python-eggs/simplejson-2.0.9-py2.5-linux-i686.egg-tmp/simplejson/_speedups.so'>
>>> simplejson.encoder.FLOAT_REPR = lambda f: ("%.2f" % f)
>>> simplejson.dumps([23.67, 23.97, 23.87])
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'
>>> simplejson.encoder.c_make_encoder = None
>>> simplejson.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'
>>>

Question 27

You can do what you need to do, but it isn’t documented:

>>> import json
>>> json.encoder.FLOAT_REPR = lambda f: ("%.2f" % f)
>>> json.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'

Question 28

Alex Martelli’s solution will work for single threaded apps, but may not work for multi-threaded apps that need to control the number of decimal places per thread. Here is a solution that should work in multi threaded apps:

import threading
from json import encoder

def FLOAT_REPR(f):
    """
    Serialize a float to a string, with a given number of digits
    """
    decimal_places = getattr(encoder.thread_local, 'decimal_places', 0)
    format_str = '%%.%df' % decimal_places
    return format_str % f

encoder.thread_local = threading.local()
encoder.FLOAT_REPR = FLOAT_REPR     

#As an example, call like this:
import json

encoder.thread_local.decimal_places = 1
json.dumps([1.56, 1.54]) #Should result in '[1.6, 1.5]'

You can merely set encoder.thread_local.decimal_places to the number of decimal places you want, and the next call to json.dumps() in that thread will use that number of decimal places

Question 29

If you need to do this in python 2.7 without overriding the global json.encoder.FLOAT_REPR, here’s one way.

import json
import math

class MyEncoder(json.JSONEncoder):
    "JSON encoder that renders floats to two decimal places"

    FLOAT_FRMT = '{0:.2f}'

    def floatstr(self, obj):
        return self.FLOAT_FRMT.format(obj)

    def _iterencode(self, obj, markers=None):
        # stl JSON lame override #1
        new_obj = obj
        if isinstance(obj, float):
            if not math.isnan(obj) and not math.isinf(obj):
                new_obj = self.floatstr(obj)
        return super(MyEncoder, self)._iterencode(new_obj, markers=markers)

    def _iterencode_dict(self, dct, markers=None):
        # stl JSON lame override #2
        new_dct = {}
        for key, value in dct.iteritems():
            if isinstance(key, float):
                if not math.isnan(key) and not math.isinf(key):
                    key = self.floatstr(key)
            new_dct[key] = value
        return super(MyEncoder, self)._iterencode_dict(new_dct, markers=markers)

Then, in python 2.7:

>>> from tmp import MyEncoder
>>> enc = MyEncoder()
>>> enc.encode([23.67, 23.98, 23.87])
'[23.67, 23.98, 23.87]'

In python 2.6, it doesn’t quite work as Matthew Schinckel points out below:

>>> import MyEncoder
>>> enc = MyEncoder()  
>>> enc.encode([23.67, 23.97, 23.87])
'["23.67", "23.97", "23.87"]'

Question 30

Pros:

Works with any JSON encoder, or even python’s repr.
Short(ish), seems to work.

Cons:

Ugly regexp hack, barely tested.

Quadratic complexity.

def fix_floats(json, decimals=2, quote='"'):
    pattern = r'^((?:(?:"(?:\\.|[^\\"])*?")|[^"])*?)(-?\d+\.\d{'+str(decimals)+'}\d+)'
    pattern = re.sub('"', quote, pattern) 
    fmt = "%%.%df" % decimals
    n = 1
    while n:
        json, n = re.subn(pattern, lambda m: m.group(1)+(fmt % float(m.group(2)).rstrip('0')), json)
    return json

Question 31

When importing the standard json module, it is enough to change the default encoder FLOAT_REPR. There isn’t really the need to import or create Encoder instances.

import json
json.encoder.FLOAT_REPR = lambda o: format(o, '.2f')

json.dumps([23.67, 23.97, 23.87]) #returns  '[23.67, 23.97, 23.87]'

Sometimes is also very useful to output as json the best representation python can guess with str. This will make sure signifficant digits are not ignored.

import json
json.dumps([23.67, 23.9779, 23.87489])
# output is'[23.670000000000002, 23.977900000000002, 23.874890000000001]'

json.encoder.FLOAT_REPR = str
json.dumps([23.67, 23.9779, 23.87489])
# output is '[23.67, 23.9779, 23.87489]'

Question 32

I agree with @Nelson that inheriting from float is awkward, but perhaps a solution that only touches the __repr__ function might be forgiveable. I ended up using the decimal package for this to reformat floats when needed. The upside is that this works in all contexts where repr() is being called, so also when simply printing lists to stdout for example. Also, the precision is runtime configurable, after the data has been created. Downside is of course that your data needs to be converted to this special float class (as unfortunately you cannot seem to monkey patch float.__repr__). For that I provide a brief conversion function.

The code:

import decimal
C = decimal.getcontext()

class decimal_formatted_float(float):
   def __repr__(self):
       s = str(C.create_decimal_from_float(self))
       if '.' in s: s = s.rstrip('0')
       return s

def convert_to_dff(elem):
    try:
        return elem.__class__(map(convert_to_dff, elem))
    except:
        if isinstance(elem, float):
            return decimal_formatted_float(elem)
        else:
            return elem

Usage example:

>>> import json
>>> li = [(1.2345,),(7.890123,4.567,890,890.)]
>>>
>>> decimal.getcontext().prec = 15
>>> dff_li = convert_to_dff(li)
>>> dff_li
[(1.2345,), (7.890123, 4.567, 890, 890)]
>>> json.dumps(dff_li)
'[[1.2345], [7.890123, 4.567, 890, 890]]'
>>>
>>> decimal.getcontext().prec = 3
>>> dff_li = convert_to_dff(li)
>>> dff_li
[(1.23,), (7.89, 4.57, 890, 890)]
>>> json.dumps(dff_li)
'[[1.23], [7.89, 4.57, 890, 890]]'

Question 33

Using numpy

If you actually have really long floats you can round them up/down correctly with numpy:

import json 

import numpy as np

data = np.array([23.671234, 23.97432, 23.870123])

json.dumps(np.around(data, decimals=2).tolist())

'[23.67, 23.97, 23.87]'

Question 34

I just released fjson, a small Python library to fix this issue. Install with

pip install fjson

and use just like json, with the addition of the float_format parameter:

import math
import fjson


data = {"a": 1, "b": math.pi}
print(fjson.dumps(data, float_format=".6e", indent=2))

{
  "a": 1,
  "b": 3.141593e+00
}

Question 35

Can someone suggest how I can beautify JSON in Python or through the command line?

The only online based JSON beautifier which could do it was: http://jsonviewer.stack.hu/.

I need to use it from within Python, however.

This is my dataset:

{ "head": {"vars": [ "address" , "description" ,"listprice" ]} , "results": { "bindings": [ 
    {
        "address" : { "type":"string", "value" : " Dyne Road, London NW6"},
            "description" :{ "type":"string", "value" : "6 bed semi detached house"},
            "listprice" : { "type":"string", "value" : "1,150,000"}
    }
    ,
        {
            "address" : { "type":"string", "value" : " Tweedy Road, Bromley BR1"},
            "description" :{ "type":"string", "value" : "5 bed terraced house"},
            "listprice" : { "type":"string", "value" : "550,000"}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Vera Avenue, London N21"},
            "description" :{ "type":"string", "value" : "4 bed detached house"},
            "listprice" : { "type":"string", "value" : "

                995,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Wimbledon Park Side, London SW19"},
            "description" :{ "type":"string", "value" : "3 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Westbere Road, West Hampstead, London NW2"},
            "description" :{ "type":"string", "value" : "5 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " The Avenue, Hatch End, Pinner HA5"},
            "description" :{ "type":"string", "value" : "5 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Princes Park Avenue, London NW11"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Canons Drive, Edgware HA8"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Westbere Road, West Hampstead NW2"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Haymills Estate, Ealing, London"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Dene Terrace Woodclyffe Drive, Chislehurst, Kent BR7"},
            "description" :{ "type":"string", "value" : "5 bedroom  terraced house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Dene Terrace Woodclyffe Drive, Chislehurst, Kent BR7"},
            "description" :{ "type":"string", "value" : "5 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Northwick Close, St John's Wood NW8"},
            "description" :{ "type":"string", "value" : "3 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Claremont Gardens, Surbiton KT6"},
            "description" :{ "type":"string", "value" : "13 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Dene Terrace Woodclyffe Drive, Chislehurst, Kent BR7"},
            "description" :{ "type":"string", "value" : "5 bedroom  end terrace house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Stamford Road, London N1"},
            "description" :{ "type":"string", "value" : "4 bedroom  terraced house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Stanhope Avenue, London N3"},
            "description" :{ "type":"string", "value" : "6 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Haymills Estate, Ealing, London"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Elms Crescent, London SW4"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Princes Park Avenue, London NW11"},
            "description" :{ "type":"string", "value" : "4 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Abbeville Road, London SW4"},
            "description" :{ "type":"string", "value" : "4 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Canons Drive, Edgware HA8"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Henson Avenue, Willesdon Green NW2"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Woodstock Road, London NW11"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Tamworth Street, London SW6"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Stanhope Avenue, Finchley, London"},
            "description" :{ "type":"string", "value" : "5 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " The Old Burlington, Church Street, London W4"},
            "description" :{ "type":"string", "value" : "3 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Ebury Close, Northwood HA6"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Middleton Road, London NW11"},
            "description" :{ "type":"string", "value" : "4 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Henson Avenue, Willesden Green NW2"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Huron Road, London SW17"},
            "description" :{ "type":"string", "value" : "6 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Corringway, Ealing W5"},
            "description" :{ "type":"string", "value" : "5 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Woodlands Avenue, New Malden KT3"},
            "description" :{ "type":"string", "value" : "5 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Gunnersbury Park Area, Ealing, London"},
            "description" :{ "type":"string", "value" : "6 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Blenheim Gardens, London, Brent NW2"},
            "description" :{ "type":"string", "value" : "6 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Creighton Road, London NW6"},
            "description" :{ "type":"string", "value" : "4 bedroom  terraced house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Plaistow Lane, Bromley BR1"},
            "description" :{ "type":"string", "value" : "7 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Greenfield Gardens, London NW2"},
            "description" :{ "type":"string", "value" : "4 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Hendon Avenue, London N3"},
            "description" :{ "type":"string", "value" : "3 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Peckham Park Road, London SE15"},
            "description" :{ "type":"string", "value" : "6 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Woodclyffe Drive, Chislehurst BR7"},
            "description" :{ "type":"string", "value" : "5 bedroom  house for sale"},
            "listprice" : { "type":"string", "value" : "

                From 1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Highwood Hill, Mill Hill, London"},
            "description" :{ "type":"string", "value" : "5 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Stanhope Avenue, London N3"},
            "description" :{ "type":"string", "value" : "5 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Kersley Mews, London SW11"},
            "description" :{ "type":"string", "value" : "3 bedroom  mews for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Ebury Close, Northwood HA6"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Ellesmere Road, Chiswick W4"},
            "description" :{ "type":"string", "value" : "6 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " The Avenue, Hatch End, Pinner, Middlesex"},
            "description" :{ "type":"string", "value" : "5 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Wandsworth, London SW18"},
            "description" :{ "type":"string", "value" : "6 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Carlton Road, New Malden KT3"},
            "description" :{ "type":"string", "value" : "4 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " St Mary's Mews, Ealing W5"},
            "description" :{ "type":"string", "value" : "3 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Ritherdon Road, Balham, London SW17"},
            "description" :{ "type":"string", "value" : "5 bedroom  semi detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Goldsmith Avenue, London W3"},
            "description" :{ "type":"string", "value" : "5 bedroom  property for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ,
        {
            "address" : { "type":"string", "value" : " Plaistow Lane, Bromley, Kent BR1"},
            "description" :{ "type":"string", "value" : "7 bedroom  detached house for sale"},
            "listprice" : { "type":"string", "value" : "

                1,250,000


                    "}
        }
    ] } }

Question 36

From the command-line:

echo '{"one":1,"two":2}' | python -mjson.tool

which outputs:

{
    "one": 1, 
    "two": 2
}

Programmtically, the Python manual describes pretty-printing JSON:

>>> import json
>>> print json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4)
{
    "4": 5,
    "6": 7
}

Question 37

Use the indent argument of the dumps function in the json module.

From the docs:

>>> import json
>>> print json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4)
{
    "4": 5,
    "6": 7
}

Question 38

A minimal in-python solution that colors json data supplied via the command line:

import sys
import json
from pygments import highlight, lexers, formatters

formatted_json = json.dumps(json.loads(sys.argv[1]), indent=4)
colorful_json = highlight(unicode(formatted_json, 'UTF-8'), lexers.JsonLexer(), formatters.TerminalFormatter())
print(colorful_json)

Inspired by pjson mentioned above. This code needs pygments to be installed.

Output example:

Question 39

Try underscore-cli:

cat myfile.json | underscore print --color

It’s a pretty nifty tool that can elegantly do a lot of manipulation of structured data, execute js snippets, fill templates, etc. It’s ridiculously well documented, polished, and ready for serious use. And I wrote it. :)

Question 40

The cli command I’ve used with python for this is:

cat myfile.json | python -mjson.tool

You should be able to find more info here:

http://docs.python.org/library/json.html

Question 41

It looks like jsbeautifier open sourced their tools and packaged them as Python and JS libs, and as CLI tools. It doesn’t look like they call out to a web service, but I didn’t check too closely. See the github repo with install instructions.

From their docs for Python CLI and library usage:

To beautify using python:

$ pip install jsbeautifier
$ js-beautify file.js

Beautified output goes to stdout.

To use jsbeautifier as a library is simple:

import jsbeautifier
res = jsbeautifier.beautify('your javascript string')
res = jsbeautifier.beautify_file('some_file.js')

…or, to specify some options:

opts = jsbeautifier.default_options()
opts.indent_size = 2
res = jsbeautifier.beautify('some javascript', opts)

If you want to pass a string instead of a filename, and you are using bash, then you can use process substitution like so:

$ js-beautify <(echo '{"some": "json"}')

Question 42

I didn’t like the output of json.dumps(…) -> For my taste way too much newlines. And I didn’t want to use a command line tool or install something. I finally found Pythons pprint (= pretty print). Unfortunately it doesn’t generate proper JSON but I think it is useful to have a user friendly glympse at the stored data.

Output of json.dumps(json_dict, indent=4)

{
    "hyperspace": {
        "constraints": [],
        "design": [
            [
                "windFarm.windparkSize.k",
                "continuous",
                [
                    0,
                    0,
                    5
                ]
            ],
            [
                "hydroPlant.primaryControlMax",
                "continuous",
                [
                    100,
                    300
                ]
            ]
        ],
        "kpis": [
            "frequency.y",
            "city.load.p[2]"
        ]
    },
    "lhc_size": 10,
    "number_of_runs": 10
}

Usage of pprint:

import pprint

json_dict = {"hyperspace": {"constraints": [], "design": [["windFarm.windparkSize.k", "continuous", [0, 0, 5]], ["hydroPlant.primaryControlMax", "continuous", [100, 300]]], "kpis": ["frequency.y", "city.load.p[2]"]}, "lhc_size": 10, "number_of_runs": 10}

formatted_json_str = pprint.pformat(json_dict)
print(formatted_json_str)
pprint.pprint(json_dict)

Result of pprint.pformat(...) or pprint.pprint(...):

{'hyperspace': {'constraints': [],
                'design': [['windFarm.windparkSize.k', 'continuous', [0, 0, 5]],
                           ['hydroPlant.primaryControlMax',
                            'continuous',
                            [100, 300]]],
                'kpis': ['frequency.y', 'city.load.p[2]']},
 'lhc_size': 10,
 'number_of_runs': 10}

Question 43

alias jsonp='pbpaste | python -m json.tool'

This will pretty print JSON that’s on the clipboard in OSX. Just Copy it then call the alias from a Bash prompt.

Question 44

You could pipe the output to jq. If you python script contains something like

print json.dumps(data)

then you can fire:

python foo.py | jq '.'

Question 45

Use the python tool library

Command line: python -mjson.tool

In code: http://docs.python.org/library/json.html

Question 46

First install pygments

then

echo '<some json>' | python -m json.tool | pygmentize -l json

Question 47

Your data is poorly formed. The value fields in particular have numerous spaces and new lines. Automated formatters won’t work on this, as they will not modify the actual data. As you generate the data for output, filter it as needed to avoid the spaces.

Question 48

With jsonlint (like xmllint):

aptitude install python-demjson
jsonlint -f foo.json

Question 49

How can I test whether two JSON objects are equal in python, disregarding the order of lists?

For example …

JSON document a:

{
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": false
}

JSON document b:

{
    "success": false,
    "errors": [
        {"error": "required", "field": "name"},
        {"error": "invalid", "field": "email"}
    ]
}

a and b should compare equal, even though the order of the "errors" lists are different.

Question 50

If you want two objects with the same elements but in a different order to compare equal, then the obvious thing to do is compare sorted copies of them – for instance, for the dictionaries represented by your JSON strings a and b:

import json

a = json.loads("""
{
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": false
}
""")

b = json.loads("""
{
    "success": false,
    "errors": [
        {"error": "required", "field": "name"},
        {"error": "invalid", "field": "email"}
    ]
}
""")

>>> sorted(a.items()) == sorted(b.items())
False

… but that doesn’t work, because in each case, the "errors" item of the top-level dict is a list with the same elements in a different order, and sorted() doesn’t try to sort anything except the “top” level of an iterable.

To fix that, we can define an ordered function which will recursively sort any lists it finds (and convert dictionaries to lists of (key, value) pairs so that they’re orderable):

def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj

If we apply this function to a and b, the results compare equal:

>>> ordered(a) == ordered(b)
True

Question 51

Another way could be to use json.dumps(X, sort_keys=True) option:

import json
a, b = json.dumps(a, sort_keys=True), json.dumps(b, sort_keys=True)
a == b # a normal string comparison

This works for nested dictionaries and lists.

Question 52

Decode them and compare them as mgilson comment.

Order does not matter for dictionary as long as the keys, and values matches. (Dictionary has no order in Python)

>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True

But order is important in list; sorting will solve the problem for the lists.

>>> [1, 2] == [2, 1]
False
>>> [1, 2] == sorted([2, 1])
True

>>> a = '{"errors": [{"error": "invalid", "field": "email"}, {"error": "required", "field": "name"}], "success": false}'
>>> b = '{"errors": [{"error": "required", "field": "name"}, {"error": "invalid", "field": "email"}], "success": false}'
>>> a, b = json.loads(a), json.loads(b)
>>> a['errors'].sort()
>>> b['errors'].sort()
>>> a == b
True

Above example will work for the JSON in the question. For general solution, see Zero Piraeus’s answer.

Question 53

For the following two dicts ‘dictWithListsInValue’ and ‘reorderedDictWithReorderedListsInValue’ which are simply reordered versions of each other

dictObj = {"foo": "bar", "john": "doe"}
reorderedDictObj = {"john": "doe", "foo": "bar"}
dictObj2 = {"abc": "def"}
dictWithListsInValue = {'A': [{'X': [dictObj2, dictObj]}, {'Y': 2}], 'B': dictObj2}
reorderedDictWithReorderedListsInValue = {'B': dictObj2, 'A': [{'Y': 2}, {'X': [reorderedDictObj, dictObj2]}]}
a = {"L": "M", "N": dictWithListsInValue}
b = {"L": "M", "N": reorderedDictWithReorderedListsInValue}

print(sorted(a.items()) == sorted(b.items()))  # gives false

gave me wrong result i.e. false .

So I created my own cutstom ObjectComparator like this:

def my_list_cmp(list1, list2):
    if (list1.__len__() != list2.__len__()):
        return False

    for l in list1:
        found = False
        for m in list2:
            res = my_obj_cmp(l, m)
            if (res):
                found = True
                break

        if (not found):
            return False

    return True


def my_obj_cmp(obj1, obj2):
    if isinstance(obj1, list):
        if (not isinstance(obj2, list)):
            return False
        return my_list_cmp(obj1, obj2)
    elif (isinstance(obj1, dict)):
        if (not isinstance(obj2, dict)):
            return False
        exp = set(obj2.keys()) == set(obj1.keys())
        if (not exp):
            # print(obj1.keys(), obj2.keys())
            return False
        for k in obj1.keys():
            val1 = obj1.get(k)
            val2 = obj2.get(k)
            if isinstance(val1, list):
                if (not my_list_cmp(val1, val2)):
                    return False
            elif isinstance(val1, dict):
                if (not my_obj_cmp(val1, val2)):
                    return False
            else:
                if val2 != val1:
                    return False
    else:
        return obj1 == obj2

    return True


dictObj = {"foo": "bar", "john": "doe"}
reorderedDictObj = {"john": "doe", "foo": "bar"}
dictObj2 = {"abc": "def"}
dictWithListsInValue = {'A': [{'X': [dictObj2, dictObj]}, {'Y': 2}], 'B': dictObj2}
reorderedDictWithReorderedListsInValue = {'B': dictObj2, 'A': [{'Y': 2}, {'X': [reorderedDictObj, dictObj2]}]}
a = {"L": "M", "N": dictWithListsInValue}
b = {"L": "M", "N": reorderedDictWithReorderedListsInValue}

print(my_obj_cmp(a, b))  # gives true

which gave me the correct expected output!

Logic is pretty simple:

If the objects are of type ‘list’ then compare each item of the first list with the items of the second list until found , and if the item is not found after going through the second list , then ‘found’ would be = false. ‘found’ value is returned

Else if the objects to be compared are of type ‘dict’ then compare the values present for all the respective keys in both the objects. (Recursive comparison is performed)

Else simply call obj1 == obj2 . It by default works fine for the object of strings and numbers and for those eq() is defined appropriately .

(Note that the algorithm can further be improved by removing the items found in object2, so that the next item of object1 would not compare itself with the items already found in the object2)

Question 54

You can write your own equals function:

dicts are equal if: 1) all keys are equal, 2) all values are equal
lists are equal if: all items are equal and in the same order
primitives are equal if a == b

Because you’re dealing with json, you’ll have standard python types: dict, list, etc., so you can do hard type checking if type(obj) == 'dict':, etc.

Rough example (not tested):

def json_equals(jsonA, jsonB):
    if type(jsonA) != type(jsonB):
        # not equal
        return False
    if type(jsonA) == dict:
        if len(jsonA) != len(jsonB):
            return False
        for keyA in jsonA:
            if keyA not in jsonB or not json_equal(jsonA[keyA], jsonB[keyA]):
                return False
    elif type(jsonA) == list:
        if len(jsonA) != len(jsonB):
            return False
        for itemA, itemB in zip(jsonA, jsonB):
            if not json_equal(itemA, itemB):
                return False
    else:
        return jsonA == jsonB

Question 55

For others who’d like to debug the two JSON objects (usually, there is a reference and a target), here is a solution you may use. It will list the “path” of different/mismatched ones from target to the reference.

level option is used for selecting how deep you would like to look into.

show_variables option can be turned on to show the relevant variable.

def compareJson(example_json, target_json, level=-1, show_variables=False):
  _different_variables = _parseJSON(example_json, target_json, level=level, show_variables=show_variables)
  return len(_different_variables) == 0, _different_variables

def _parseJSON(reference, target, path=[], level=-1, show_variables=False):  
  if level > 0 and len(path) == level:
    return []
  
  _different_variables = list()
  # the case that the inputs is a dict (i.e. json dict)  
  if isinstance(reference, dict):
    for _key in reference:      
      _path = path+[_key]
      try:
        _different_variables += _parseJSON(reference[_key], target[_key], _path, level, show_variables)
      except KeyError:
        _record = ''.join(['[%s]'%str(p) for p in _path])
        if show_variables:
          _record += ': %s <--> MISSING!!'%str(reference[_key])
        _different_variables.append(_record)
  # the case that the inputs is a list/tuple
  elif isinstance(reference, list) or isinstance(reference, tuple):
    for index, v in enumerate(reference):
      _path = path+[index]
      try:
        _target_v = target[index]
        _different_variables += _parseJSON(v, _target_v, _path, level, show_variables)
      except IndexError:
        _record = ''.join(['[%s]'%str(p) for p in _path])
        if show_variables:
          _record += ': %s <--> MISSING!!'%str(v)
        _different_variables.append(_record)
  # the actual comparison about the value, if they are not the same, record it
  elif reference != target:
    _record = ''.join(['[%s]'%str(p) for p in path])
    if show_variables:
      _record += ': %s <--> %s'%(str(reference), str(target))
    _different_variables.append(_record)

  return _different_variables

Question 56

I’m trying to process incoming JSON/Ajax requests with Django/Python.

request.is_ajax() is True on the request, but I have no idea where the payload is with the JSON data.

request.POST.dir contains this:

['__class__', '__cmp__', '__contains__', '__copy__', '__deepcopy__', '__delattr__',
 '__delitem__', '__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__',
 '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', 
'__setattr__', '__setitem__', '__str__', '__weakref__', '_assert_mutable', '_encoding', 
'_get_encoding', '_mutable', '_set_encoding', 'appendlist', 'clear', 'copy', 'encoding', 
'fromkeys', 'get', 'getlist', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 
'keys', 'lists', 'pop', 'popitem', 'setdefault', 'setlist', 'setlistdefault', 'update', 
'urlencode', 'values']

There are apparently no keys in the request post keys.

When I look at the POST in Firebug, there is JSON data being sent up in the request.

Question 57

If you are posting JSON to Django, I think you want request.body (request.raw_post_data on Django < 1.4). This will give you the raw JSON data sent via the post. From there you can process it further.

Here is an example using JavaScript, jQuery, jquery-json and Django.

JavaScript:

var myEvent = {id: calEvent.id, start: calEvent.start, end: calEvent.end,
               allDay: calEvent.allDay };
$.ajax({
    url: '/event/save-json/',
    type: 'POST',
    contentType: 'application/json; charset=utf-8',
    data: $.toJSON(myEvent),
    dataType: 'text',
    success: function(result) {
        alert(result.Result);
    }
});

Django:

def save_events_json(request):
    if request.is_ajax():
        if request.method == 'POST':
            print 'Raw Data: "%s"' % request.body   
    return HttpResponse("OK")

Django < 1.4:

  def save_events_json(request):
    if request.is_ajax():
        if request.method == 'POST':
            print 'Raw Data: "%s"' % request.raw_post_data
    return HttpResponse("OK")

Question 58

I had the same problem. I had been posting a complex JSON response, and I couldn’t read my data using the request.POST dictionary.

My JSON POST data was:

//JavaScript code:
//Requires json2.js and jQuery.
var response = {data:[{"a":1, "b":2},{"a":2, "b":2}]}
json_response = JSON.stringify(response); // proper serialization method, read 
                                          // http://ejohn.org/blog/ecmascript-5-strict-mode-json-and-more/
$.post('url',json_response);

In this case you need to use method provided by aurealus. Read the request.body and deserialize it with the json stdlib.

#Django code:
import json
def save_data(request):
  if request.method == 'POST':
    json_data = json.loads(request.body) # request.raw_post_data w/ Django < 1.4
    try:
      data = json_data['data']
    except KeyError:
      HttpResponseServerError("Malformed data!")
    HttpResponse("Got json data")

Question 59

Method 1

Client : Send as JSON

$.ajax({
    url: 'example.com/ajax/',
    type: 'POST',
    contentType: 'application/json; charset=utf-8',
    processData: false,
    data: JSON.stringify({'name':'John', 'age': 42}),
    ...
});

//Sent as a JSON object {'name':'John', 'age': 42}

Server :

data = json.loads(request.body) # {'name':'John', 'age': 42}

Method 2

Client : Send as x-www-form-urlencoded
(Note: contentType & processData have changed, JSON.stringify is not needed)

$.ajax({
    url: 'example.com/ajax/',
    type: 'POST',    
    data: {'name':'John', 'age': 42},
    contentType: 'application/x-www-form-urlencoded; charset=utf-8',  //Default
    processData: true,       
});

//Sent as a query string name=John&age=42

Server :

data = request.POST # will be <QueryDict: {u'name':u'John', u'age': 42}>

Changed in 1.5+ : https://docs.djangoproject.com/en/dev/releases/1.5/#non-form-data-in-http-requests

Non-form data in HTTP requests :
request.POST will no longer include data posted via HTTP requests with non form-specific content-types in the header. In prior versions, data posted with content-types other than multipart/form-data or application/x-www-form-urlencoded would still end up represented in the request.POST attribute. Developers wishing to access the raw POST data for these cases, should use the request.body attribute instead.

Probably related

Question 60

Its important to remember Python 3 has a different way to represent strings – they are byte arrays.

Using Django 1.9 and Python 2.7 and sending the JSON data in the main body (not a header) you would use something like:

mydata = json.loads(request.body)

But for Django 1.9 and Python 3.4 you would use:

mydata = json.loads(request.body.decode("utf-8"))

I just went through this learning curve making my first Py3 Django app!

Question 61

request.raw_response is now deprecated. Use request.body instead to process non-conventional form data such as XML payloads, binary images, etc.

Django documentation on the issue.

Question 62

on django 1.6 python 3.3

client

$.ajax({
    url: '/urll/',
    type: 'POST',
    contentType: 'application/json; charset=utf-8',
    data: JSON.stringify(json_object),
    dataType: 'json',
    success: function(result) {
        alert(result.Result);
    }
});

server

def urll(request):

if request.is_ajax():
    if request.method == 'POST':
        print ('Raw Data:', request.body) 

        print ('type(request.body):', type(request.body)) # this type is bytes

        print(json.loads(request.body.decode("utf-8")))

Question 63

The HTTP POST payload is just a flat bunch of bytes. Django (like most frameworks) decodes it into a dictionary from either URL encoded parameters, or MIME-multipart encoding. If you just dump the JSON data in the POST content, Django won’t decode it. Either do the JSON decoding from the full POST content (not the dictionary); or put the JSON data into a MIME-multipart wrapper.

In short, show the JavaScript code. The problem seems to be there.

Question 64

request.raw_post_data has been deprecated. Use request.body instead

问题：如何在Python中从文件/流中懒惰地读取多个JSON值？

使用范例

example.py

in.txt

示例会话

Example Use

example.py

in.txt

example session

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

仅限Python 3版本

Python 2兼容版本

测试它

Python 3 only version

Python 2 compatible version

Testing it

回答 6

回答 7

回答 8

回答 9

回答 10

问题：Python：您将如何保存一个简单的设置/配置文件？

回答 0

python中的配置文件

ConfigParser [.ini格式]

JSON [.json格式]

YAML

Configuration files in python

ConfigParser [.ini format]

JSON [.json format]

YAML

回答 1

ConfigParser Basic示例

XML基本示例

ConfigParser Basic example

XML Basic example

回答 2

回答 3

回答 4

回答 5

问题：格式使用标准json模块浮动

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

使用numpy

Using numpy

回答 13

问题：如何在Python中美化JSON？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

问题：如何以相同的顺序比较两个具有相同元素的JSON对象相等？

回答 0

回答 1

回答 2

只需将数字从`int64`（从numpy）转换为`int`。

Just convert numbers from `int64` (from numpy) to `int`.

使用Django序列化器的`python`格式，