Python 实用宝典

Question 1

I’m using json.dumps to convert into json like

countries.append({"id":row.id,"name":row.name,"timezone":row.timezone})
print json.dumps(countries)

The result i have is:

[
   {"timezone": 4, "id": 1, "name": "Mauritius"}, 
   {"timezone": 2, "id": 2, "name": "France"}, 
   {"timezone": 1, "id": 3, "name": "England"}, 
   {"timezone": -4, "id": 4, "name": "USA"}
]

I want to have the keys in the following order: id, name, timezone – but instead I have timezone, id, name.

How should I fix this?

Question 2

Both Python dict (before Python 3.7) and JSON object are unordered collections. You could pass sort_keys parameter, to sort the keys:

>>> import json
>>> json.dumps({'a': 1, 'b': 2})
'{"b": 2, "a": 1}'
>>> json.dumps({'a': 1, 'b': 2}, sort_keys=True)
'{"a": 1, "b": 2}'

If you need a particular order; you could use collections.OrderedDict:

>>> from collections import OrderedDict
>>> json.dumps(OrderedDict([("a", 1), ("b", 2)]))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict([("b", 2), ("a", 1)]))
'{"b": 2, "a": 1}'

Since Python 3.6, the keyword argument order is preserved and the above can be rewritten using a nicer syntax:

>>> json.dumps(OrderedDict(a=1, b=2))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict(b=2, a=1))
'{"b": 2, "a": 1}'

See PEP 468 – Preserving Keyword Argument Order.

If your input is given as JSON then to preserve the order (to get OrderedDict), you could pass object_pair_hook, as suggested by @Fred Yankowski:

>>> json.loads('{"a": 1, "b": 2}', object_pairs_hook=OrderedDict)
OrderedDict([('a', 1), ('b', 2)])
>>> json.loads('{"b": 2, "a": 1}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 2), ('a', 1)])

Question 3

As others have mentioned the underlying dict is unordered. However there are OrderedDict objects in python. ( They’re built in in recent pythons, or you can use this: http://code.activestate.com/recipes/576693/ ).

I believe that newer pythons json implementations correctly handle the built in OrderedDicts, but I’m not sure (and I don’t have easy access to test).

Old pythons simplejson implementations dont handle the OrderedDict objects nicely .. and convert them to regular dicts before outputting them.. but you can overcome this by doing the following:

class OrderedJsonEncoder( simplejson.JSONEncoder ):
   def encode(self,o):
      if isinstance(o,OrderedDict.OrderedDict):
         return "{" + ",".join( [ self.encode(k)+":"+self.encode(v) for (k,v) in o.iteritems() ] ) + "}"
      else:
         return simplejson.JSONEncoder.encode(self, o)

now using this we get:

>>> import OrderedDict
>>> unordered={"id":123,"name":"a_name","timezone":"tz"}
>>> ordered = OrderedDict.OrderedDict( [("id",123), ("name","a_name"), ("timezone","tz")] )
>>> e = OrderedJsonEncoder()
>>> print e.encode( unordered )
{"timezone": "tz", "id": 123, "name": "a_name"}
>>> print e.encode( ordered )
{"id":123,"name":"a_name","timezone":"tz"}

Which is pretty much as desired.

Another alternative would be to specialise the encoder to directly use your row class, and then you’d not need any intermediate dict or UnorderedDict.

Question 4

The order of a dictionary doesn’t have any relationship to the order it was defined in. This is true of all dictionaries, not just those turned into JSON.

>>> {"b": 1, "a": 2}
{'a': 2, 'b': 1}

Indeed, the dictionary was turned “upside down” before it even reached json.dumps:

>>> {"id":1,"name":"David","timezone":3}
{'timezone': 3, 'id': 1, 'name': 'David'}

Question 5

hey i know it is so late for this answer but add sort_keys and assign false to it as follows :

json.dumps({'****': ***},sort_keys=False)

this worked for me

Question 6

json.dump() will preserve the ordder of your dictionary. Open the file in a text editor and you will see. It will preserve the order regardless of whether you send it an OrderedDict.

But json.load() will lose the order of the saved object unless you tell it to load into an OrderedDict(), which is done with the object_pairs_hook parameter as J.F.Sebastian instructed above.

It would otherwise lose the order because under usual operation, it loads the saved dictionary object into a regular dict and a regular dict does not preserve the oder of the items it is given.

Question 7

in JSON, as in Javascript, order of object keys is meaningless, so it really doesn’t matter what order they’re displayed in, it is the same object.

Question 8

I just realized that json.dumps() adds spaces in the JSON object

e.g.

{'duration': '02:55', 'name': 'flower', 'chg': 0}

how can remove the spaces in order to make the JSON more compact and save bytes to be sent via HTTP?

such as:

{'duration':'02:55','name':'flower','chg':0}

Question 9

json.dumps(separators=(',', ':'))

Question 10

In some cases you may want to get rid of the trailing whitespaces only. You can then use

json.dumps(separators=(',', ': '))

There is a space after : but not after ,.

This is useful for diff’ing your JSON files (in version control such as git diff), where some editors will get rid of the trailing whitespace but python json.dump will add it back.

Note: This does not exactly answers the question on top, but I came here looking for this answer specifically. I don’t think that it deserves its own QA, so I’m adding it here.

Question 11

I am getting some data from a JSON file “new.json”, and I want to filter some data and store it into a new JSON file. Here is my code:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get["id"]
    a = item.get["a"]
    b = item.get["b"]
    c = item.get["c"]
    if c == 'XYZ' or  "XYZ" in data["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

and I am getting an error, the traceback is:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

Can someone help me?

Here is a sample of the data in new.json, there are about 1500 more such dictionaries in the file

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
}

Question 12

As you can see in the following example, json.loads (and json.load) does not decode multiple json object.

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list, dump the list (instead of dumping dictionaries multiple times)

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]

Question 13

You can just read from a file, jsonifying each line as you go:

tweets = []
for line in open('tweets.json', 'r'):
    tweets.append(json.loads(line))

This avoids storing intermediate python objects. As long as your write one full tweet per append() call, this should work.

Question 14

I came across this because I was trying to load a JSON file dumped from MongoDB. It was giving me an error

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
data = [json.loads(line) for line in open('data.json', 'r')]

Question 15

This may also happen if your JSON file is not just 1 JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with a bracket [ ], within the brackets are the braces { }. There can be many pairs of braces, but it all ends with a close bracket ]. If your json file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json

guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)

This works because 1_guests.json has one record []. The original file I was using all_guests.json had 6 records separated by newline. I deleted 5 records, (which I already checked to be bookended by brackets) and saved the file under a new name. Then the loads statement worked.

Error was

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS. I use the word record, but that’s not the official name. Also, if your file has newline characters like mine, you can loop through it to loads() one record at a time into a json variable.

Question 16

Well , it might help someone. i just got the same error while my json file is like this

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

and i found it malformed, so i changed it into somekind of

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}

Question 17

One-liner for your problem:

data = [json.loads(line) for line in open('tweets.json', 'r')]

Question 18

If you want to solve it in a two-liner you can do it like this:

with open('data.json') as f:
    data = [json.loads(line) for line in f]

Question 19

I think saving dicts in a list is not an ideal solution here proposed by @falsetru.

Better way is, iterating through dicts and saving them to .json by adding a new line.

our 2 dictionaries are

d1 = {'a':1}

d2 = {'b':2}

you can write them to .json

import json
with open('sample.json','a') as sample:
    for dict in [d1,d2]:
        sample.write('{}\n'.format(json.dumps(dict)))

and you can read json file without any issues

with open('sample.json','r') as sample:
    for line in sample:
        line = json.loads(line.strip())

simple and efficient

Question 20

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I’m not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I’d like to make the default method able to handle all the datatypes that the original encoder chokes on (I’m using Mongo as a data source so dates seem to raise this error too)

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There’s a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there’s no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing “non-primitive” objects.

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases – but I’m not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I’ve looked through previous questions and can’t seem to find the best approach to case-specific encoding (which unfortunately seems like what I’m going to need to do here).

Question 21

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).

A more sophisticated solution is to build-out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python’s pickle module. These each support a much greater range of datatypes.

Question 22

You can create a custom encoder that returns a list when it encounters a set. Here’s an example:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.

To illustrated nested types, consider serializing this:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

Question 23

I adapted Raymond Hettinger’s solution to python 3.

Here is what has changed:

unicode disappeared
updated the call to the parents’ default with super()
using base64 to serialize the bytes type into str (because it seems that bytes in python 3 can’t be converted to JSON)

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

Question 24

Only dictionaries, Lists and primitive object types (int, string, bool) are available in JSON.

Question 25

You don’t need to make a custom encoder class to supply the default method – it can be passed in as a keyword argument:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

results in [1, 2, 3] in all supported Python versions.

Question 26

If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger’s answer can be used:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

Question 27

If you need just quick dump and don’t want to implement custom encoder. You can use the following:

json_string = json.dumps(data, iterable_as_array=True)

This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the json back. If you want to preserve the types, you need to write custom encoder.

Question 28

One shortcoming of the accepted solution is that its output is very python specific. I.e. its raw json output cannot be observed by a human or loaded by another language (e.g. javascript). example:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

I can propose a solution which downgrades the set to a dict containing a list on the way out, and back to a set when loaded into python using the same encoder, therefore preserving observability and language agnosticism:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.

Question 29

What is the difference between json.dumps and json.load?

From my understanding, one loads JSON into a dictionary and another loads into objects.

Question 30

dumps takes an object and produces a string:

>>> a = {'foo': 3}
>>> json.dumps(a)
'{"foo": 3}'

load would take a file-like object, read the data from that object, and use that string to create an object:

with open('file.json') as fh:
    a = json.load(fh)

Note that dump and load convert between files and objects, while dumps and loads convert between strings and objects. You can think of the s-less functions as wrappers around the s functions:

def dump(obj, fh):
    fh.write(dumps(obj))

def load(fh):
    return loads(fh.read())

Question 31

json loads -> returns an object from a string representing a json object.

json dumps -> returns a string representing a json object from an object.

load and dump -> read/write from/to file instead of string

Question 32

What I am trying to do is extract elevation data from a google maps API along a path specified by latitude and longitude coordinates as follows:

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()

This gives me a data that looks like this:

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

when putting into as DataFrame here is what I get:

pd.read_json(elevations)

and here is what I want:

I’m not sure if this is possible, but mainly what I am looking for is a way to be able to put the elevation, latitude and longitude data together in a pandas dataframe (doesn’t have to have fancy mutiline headers).

If any one can help or give some advice on working with this data that would be great! If you can’t tell I haven’t worked much with json data before…

EDIT:

This method isn’t all that attractive but seems to work:

data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
    lat.append(result[u'location'][u'lat'])
    lng.append(result[u'location'][u'lng'])
    el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T

ends up dataframe having columns latitude, longitude, elevation

Question 33

I found a quick and easy solution to what I wanted using json_normalize() included in pandas 1.01.

from urllib2 import Request, urlopen
import json

import pandas as pd    

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])

This gives a nice flattened dataframe with the json data that I got from the Google Maps API.

Question 34

Check this snip out.

# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Hope it helps :)

Question 35

You could first import your json data in a Python dictionnary :

data = json.loads(elevations)

Then modify data on the fly :

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

Rebuild json string :

elevations = json.dumps(data)

Finally :

pd.read_json(elevations)

You can, also, probably avoid to dump data back to a string, I assume Panda can directly create a DataFrame from a dictionnary (I haven’t used it since a long time :p)

Question 36

Just a new version of the accepted answer, as python3.x does not support urllib2

from requests import request
import json
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])

Question 37

The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful Json is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.

for row in range(len(data)):
    #First I load the dict (one at a time)
    n = data.loc[row,'dict_column']
    #Now I make a new column that pulls out the data that I want.
    data.loc[row,'new_column'] = n.get('key')

Question 38

Optimization of the accepted answer:

The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:

import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'

response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

Output:

        _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
..      ...                        ...                  ...  ...                  ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2

PS: API is for Danish electricity prices

Question 39

Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.

# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize

class DFConverter:

    #Converts the input JSON to a DataFrame
    def convertToDF(self,dfJSON):
        return(json_normalize(dfJSON))

    #Converts the input DataFrame to JSON 
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return(resultJSON)

Question 40

billmanH’s solution helped me but didn’t work until i switched from:

n = data.loc[row,'json_column']

to:

n = data.iloc[[row]]['json_column']

here’s the rest of it, converting to a dictionary is helpful for working with json data.

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])

Question 41

#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}

Question 42

Once you have the flattened DataFrame obtained by the accepted answer, you can make the columns a MultiIndex (“fancy multiline header”) like this:

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])

Question 43

I’m playing with both learning python and trying to get github issues into a readable form. Using the advice on How can I convert JSON to CSV? I came up with this:

import json
import csv

f=open('issues.json')
data = json.load(f)
f.close()

f=open("issues.csv","wb+")
csv_file=csv.writer(f)

csv_file.writerow(["gravatar_id","position","number","votes","created_at","comments","body","title","updated_at","html_url","user","labels","state"])

for item in data:
        csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

Where “issues.json” is the json file containing my github issues. When I try to run that, I get

File "foo.py", line 14, in <module>
csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

TypeError: string indices must be integers

What am I missing here? Which are the “string indices”? I’m sure that once I get this working I’ll have more issues, but for now , I’d just love for this to work!

When I tweak the for statement to simply

for item in data:
    print item

what I get is … “issues” — so I’m doing something more basic wrong. Here’s a bit of my json:

{"issues":[{"gravatar_id":"44230311a3dcd684b6c5f81bf2ec9f60","position":2.0,"number":263,"votes":0,"created_at":"2010/09/17 16:06:50 -0700","comments":11,"body":"Add missing paging (Older>>) links...

when I print data it looks like it is getting munged really oddly:

{u'issues': [{u'body': u'Add missing paging (Older>>) lin...

Question 44

item is most likely a string in your code; the string indices are the ones in the square brackets, e.g., gravatar_id. So I’d first check your data variable to see what you received there; I guess that data is a list of strings (or at least a list containing at least one string) while it should be a list of dictionaries.

Question 45

The variable item is a string. An index looks like this:

>>> mystring = 'helloworld'
>>> print mystring[0]
'h'

The above example uses the 0 index of the string to refer to the first character.

Strings can’t have string indices (like dictionaries can). So this won’t work:

>>> mystring = 'helloworld'
>>> print mystring['stringindex']
TypeError: string indices must be integers

Question 46

data is a dict object. So, iterate over it like this:

Python 2

for key, value in data.iteritems():
    print key, value

Python 3

for key, value in data.items():
    print(key, value)

Question 47

TypeError for Slice Notation `str[a:b]`

tl;dr: use a colon : instead of a comma in between the two indices a and b in str[a:b]

When working with strings and slice notation (a common sequence operation), it can happen that a TypeError is raised, pointing out that the indices must be integers, even if they obviously are.

Example

>>> my_string = "hello world"
>>> my_string[0,5]
TypeError: string indices must be integers

We obviously passed two integers for the indices to the slice notation, right? So what is the problem here?

This error can be very frustrating – especially at the beginning of learning Python – because the error message is a little bit misleading.

Explanation

We implicitly passed a tuple of two integers (0 and 5) to the slice notation when we called my_string[0,5] because 0,5 (even without the parentheses) evaluates to the same tuple as (0,5) would do.

A comma , is actually enough for Python to evaluate something as a tuple:

>>> my_variable = 0,
>>> type(my_variable)
<class 'tuple'>

So what we did there, this time explicitly:

>>> my_string = "hello world"
>>> my_tuple = 0, 5
>>> my_string[my_tuple]
TypeError: string indices must be integers

Now, at least, the error message makes sense.

Solution

We need to replace the comma , with a colon : to separate the two integers correctly:

>>> my_string = "hello world"
>>> my_string[0:5]
'hello'

A clearer and more helpful error message could have been something like:

TypeError: string indices must be integers (not tuple)

A good error message shows the user directly what they did wrong and it would have been more obvious how to solve the problem.

[So the next time when you find yourself responsible for writing an error description message, think of this example and add the reason or other useful information to error message to let you and maybe other people understand what went wrong.]

Lessons learned

slice notation uses colons : to separate its indices (and step range, e.g. str[from:to:step])
tuples are defined by commas , (e.g. t = 1,)
add some information to error messages for users to understand what went wrong

Cheers and happy programming
winklerrr

[I know this question was already answered and this wasn’t exactly the question the thread starter asked, but I came here because of the above problem which leads to the same error message. At least it took me quite some time to find that little typo.

So I hope that this will help someone else who stumbled upon the same error and saves them some time finding that tiny mistake.]

Question 48

This can happen if a comma is missing. I ran into it when I had a list of two-tuples, each of which consisted of a string in the first position, and a list in the second. I erroneously omitted the comma after the first component of a tuple in one case, and the interpreter thought I was trying to index the first component.

Question 49

I had a similar issue with Pandas, you need to use the iterrows() function to iterate through a Pandas dataset Pandas documentation for iterrows

data = pd.read_csv('foo.csv')
for index,item in data.iterrows():
    print('{} {}'.format(item["gravatar_id"], item["position"]))

note that you need to handle the index in the dataset that is also returned by the function.

Question 50

I am trying to pass in a JSON file and convert the data into a dictionary.

So far, this is what I have done:

import json
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)

I’m expecting json1_data to be a dict type but it actually comes out as a list type when I check it with type(json1_data).

What am I missing? I need this to be a dictionary so I can access one of the keys.

Question 51

Your JSON is an array with a single object inside, so when you read it in you get a list with a dictionary inside. You can access your dictionary by accessing item 0 in the list, as shown below:

json1_data = json.loads(json1_str)[0]

Now you can access the data stored in datapoints just as you were expecting:

datapoints = json1_data['datapoints']

I have one more question if anyone can bite: I am trying to take the average of the first elements in these datapoints(i.e. datapoints[0][0]). Just to list them, I tried doing datapoints[0:5][0] but all I get is the first datapoint with both elements as opposed to wanting to get the first 5 datapoints containing only the first element. Is there a way to do this?

datapoints[0:5][0] doesn’t do what you’re expecting. datapoints[0:5] returns a new list slice containing just the first 5 elements, and then adding [0] on the end of it will take just the first element from that resulting list slice. What you need to use to get the result you want is a list comprehension:

[p[0] for p in datapoints[0:5]]

Here’s a simple way to calculate the mean:

sum(p[0] for p in datapoints[0:5])/5. # Result is 35.8

If you’re willing to install NumPy, then it’s even easier:

import numpy
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)[0]
datapoints = numpy.array(json1_data['datapoints'])
avg = datapoints[0:5,0].mean()
# avg is now 35.8

Using the , operator with the slicing syntax for NumPy’s arrays has the behavior you were originally expecting with the list slices.

Question 52

Here is a simple snippet that read’s in a json text file from a dictionary. Note that your json file must follow the json standard, so it has to have " double quotes rather then ' single quotes.

Your JSON dump.txt File:

{"test":"1", "test2":123}

Python Script:

import json
with open('/your/path/to/a/dict/dump.txt') as handle:
    dictdump = json.loads(handle.read())

Question 53

You can use the following:

import json

 with open('<yourFile>.json', 'r') as JSON:
       json_dict = json.load(JSON)

 # Now you can use it like dictionary
 # For example:

 print(json_dict["username"])

Question 54

The best way to Load JSON Data into Dictionary is You can user the inbuilt json loader.

Below is the sample snippet that can be used.

import json
f = open("data.json")
data = json.load(f))
f.close()
type(data)
print(data[<keyFromTheJsonFile>])

Question 55

I am working with a Python code for a REST API, so this is for those who are working on similar projects.

I extract data from an URL using a POST request and the raw output is JSON. For some reason the output is already a dictionary, not a list, and I’m able to refer to the nested dictionary keys right away, like this:

datapoint_1 = json1_data['datapoints']['datapoint_1']

where datapoint_1 is inside the datapoints dictionary.

Question 56

pass the data using javascript ajax from get methods

    **//javascript function    
    function addnewcustomer(){ 
    //This function run when button click
    //get the value from input box using getElementById
            var new_cust_name = document.getElementById("new_customer").value;
            var new_cust_cont = document.getElementById("new_contact_number").value;
            var new_cust_email = document.getElementById("new_email").value;
            var new_cust_gender = document.getElementById("new_gender").value;
            var new_cust_cityname = document.getElementById("new_cityname").value;
            var new_cust_pincode = document.getElementById("new_pincode").value;
            var new_cust_state = document.getElementById("new_state").value;
            var new_cust_contry = document.getElementById("new_contry").value;
    //create json or if we know python that is call dictionary.        
    var data = {"cust_name":new_cust_name, "cust_cont":new_cust_cont, "cust_email":new_cust_email, "cust_gender":new_cust_gender, "cust_cityname":new_cust_cityname, "cust_pincode":new_cust_pincode, "cust_state":new_cust_state, "cust_contry":new_cust_contry};
    //apply stringfy method on json
            data = JSON.stringify(data);
    //insert data into database using javascript ajax
            var send_data = new XMLHttpRequest();
            send_data.open("GET", "http://localhost:8000/invoice_system/addnewcustomer/?customerinfo="+data,true);
            send_data.send();

            send_data.onreadystatechange = function(){
              if(send_data.readyState==4 && send_data.status==200){
                alert(send_data.responseText);
              }
            }
          }

django views

    def addNewCustomer(request):
    #if method is get then condition is true and controller check the further line
        if request.method == "GET":
    #this line catch the json from the javascript ajax.
            cust_info = request.GET.get("customerinfo")
    #fill the value in variable which is coming from ajax.
    #it is a json so first we will get the value from using json.loads method.
    #cust_name is a key which is pass by javascript json. 
    #as we know json is a key value pair. the cust_name is a key which pass by javascript json
            cust_name = json.loads(cust_info)['cust_name']
            cust_cont = json.loads(cust_info)['cust_cont']
            cust_email = json.loads(cust_info)['cust_email']
            cust_gender = json.loads(cust_info)['cust_gender']
            cust_cityname = json.loads(cust_info)['cust_cityname']
            cust_pincode = json.loads(cust_info)['cust_pincode']
            cust_state = json.loads(cust_info)['cust_state']
            cust_contry = json.loads(cust_info)['cust_contry']
    #it print the value of cust_name variable on server
            print(cust_name)
            print(cust_cont)
            print(cust_email)
            print(cust_gender)
            print(cust_cityname)
            print(cust_pincode)
            print(cust_state)
            print(cust_contry)
            return HttpResponse("Yes I am reach here.")**

Question 57

我正在使用Python-2.6 CGI脚本，但在执行此操作时在服务器日志中发现了此错误json.dumps()，

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

“这里

__getdata()函数返回dictionary {}。

张贴这个问题之前我已经提到这个问题，操作系统，所以的。

更新

下一行损害了JSON编码器，

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

我有一个临时解决方案

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

但是我不确定这样做是否正确。

Question 58

I am using Python-2.6 CGI scripts but found this error in server log while doing json.dumps(),

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Here ,

__getdata() function returns dictionary {} .

Before posting this question I have referred this of question os SO.

UPDATES

Following line is hurting JSON encoder,

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

I got a temporary fix for it

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

But I am not sure is it correct way to do it.

Question 59

该错误是因为字典中存在一些非ASCII字符，并且无法对其进行编码/解码。避免此错误的一种简单方法是使用encode()如下功能对此类字符串进行编码（如果a字符串为非ascii字符）：

a.encode('utf-8').strip()

Question 60

The error is because there is some non-ascii character in the dictionary and it can’t be encoded/decoded. One simple way to avoid this error is to encode such strings with encode() function as follows (if a is the string with non-ascii character):

a.encode('utf-8').strip()

Question 61

我仅通过在read_csv()命令中定义其他编解码器包来切换此设置：

encoding = 'unicode_escape'

例如：

import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')

Question 62

I switched this simply by defining a different codec package in the read_csv() command:

encoding = 'unicode_escape'

Eg:

import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')

Question 63

试试下面的代码片段：

with open(path, 'rb') as f:
  text = f.read()

Question 64

Try the below code snippet:

with open(path, 'rb') as f:
  text = f.read()

问题：JSON对象中的项目使用“ json.dumps”乱序了吗？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：Python-没有空格的json

回答 0

回答 1

问题：Python json.loads显示ValueError：额外数据

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：如何对集合进行JSON序列化？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：json.dumps和json.load有什么区别？[关闭]

回答 0

回答 1

问题：JSON转换为Pandas DataFrame

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：为什么会看到“ TypeError：字符串索引必须为整数”？

回答 0

回答 1

回答 2

Python 2

Python 3

Python 2

Python 3

回答 3

切片符号的TypeError str[a:b]

例

说明

解

得到教训

TypeError for Slice Notation str[a:b]

Example

Explanation

Solution

Lessons learned

回答 4

回答 5

问题：将JSON字符串转换为字典未列出

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：UnicodeDecodeError：“ utf8”编解码器无法解码位置0的字节0xa5：无效的起始字节

更新

UPDATES

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

切片符号的TypeError `str[a:b]`

TypeError for Slice Notation `str[a:b]`