标签归档:json

JSON对象中的项目使用“ json.dumps”乱序了吗?

问题:JSON对象中的项目使用“ json.dumps”乱序了吗?

json.dumps用来转换成json之类的

countries.append({"id":row.id,"name":row.name,"timezone":row.timezone})
print json.dumps(countries)

我得到的结果是:

[
   {"timezone": 4, "id": 1, "name": "Mauritius"}, 
   {"timezone": 2, "id": 2, "name": "France"}, 
   {"timezone": 1, "id": 3, "name": "England"}, 
   {"timezone": -4, "id": 4, "name": "USA"}
]

我想按以下顺序获取键:ID,名称,时区-但我却拥有时区,ID,名称。

我该如何解决?

I’m using json.dumps to convert into json like

countries.append({"id":row.id,"name":row.name,"timezone":row.timezone})
print json.dumps(countries)

The result i have is:

[
   {"timezone": 4, "id": 1, "name": "Mauritius"}, 
   {"timezone": 2, "id": 2, "name": "France"}, 
   {"timezone": 1, "id": 3, "name": "England"}, 
   {"timezone": -4, "id": 4, "name": "USA"}
]

I want to have the keys in the following order: id, name, timezone – but instead I have timezone, id, name.

How should I fix this?


回答 0

Python dict(在Python 3.7之前)和JSON对象都是无序集合。您可以传递sort_keys参数来对键进行排序:

>>> import json
>>> json.dumps({'a': 1, 'b': 2})
'{"b": 2, "a": 1}'
>>> json.dumps({'a': 1, 'b': 2}, sort_keys=True)
'{"a": 1, "b": 2}'

如果您需要特定的订单;您可以使用collections.OrderedDict

>>> from collections import OrderedDict
>>> json.dumps(OrderedDict([("a", 1), ("b", 2)]))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict([("b", 2), ("a", 1)]))
'{"b": 2, "a": 1}'

从Python 3.6开始,关键字参数顺序得以保留,并且可以使用更好的语法重写以上内容:

>>> json.dumps(OrderedDict(a=1, b=2))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict(b=2, a=1))
'{"b": 2, "a": 1}'

请参阅PEP 468 –保留关键字参数顺序

如果你的输入给出JSON然后保存顺序(得到OrderedDict),你可以传递object_pair_hook通过@Fred Yankowski的建议

>>> json.loads('{"a": 1, "b": 2}', object_pairs_hook=OrderedDict)
OrderedDict([('a', 1), ('b', 2)])
>>> json.loads('{"b": 2, "a": 1}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 2), ('a', 1)])

Both Python dict (before Python 3.7) and JSON object are unordered collections. You could pass sort_keys parameter, to sort the keys:

>>> import json
>>> json.dumps({'a': 1, 'b': 2})
'{"b": 2, "a": 1}'
>>> json.dumps({'a': 1, 'b': 2}, sort_keys=True)
'{"a": 1, "b": 2}'

If you need a particular order; you could use collections.OrderedDict:

>>> from collections import OrderedDict
>>> json.dumps(OrderedDict([("a", 1), ("b", 2)]))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict([("b", 2), ("a", 1)]))
'{"b": 2, "a": 1}'

Since Python 3.6, the keyword argument order is preserved and the above can be rewritten using a nicer syntax:

>>> json.dumps(OrderedDict(a=1, b=2))
'{"a": 1, "b": 2}'
>>> json.dumps(OrderedDict(b=2, a=1))
'{"b": 2, "a": 1}'

See PEP 468 – Preserving Keyword Argument Order.

If your input is given as JSON then to preserve the order (to get OrderedDict), you could pass object_pair_hook, as suggested by @Fred Yankowski:

>>> json.loads('{"a": 1, "b": 2}', object_pairs_hook=OrderedDict)
OrderedDict([('a', 1), ('b', 2)])
>>> json.loads('{"b": 2, "a": 1}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 2), ('a', 1)])

回答 1

正如其他人提到的那样,基本的命令是无序的。但是在python中有OrderedDict对象。(它们内置在最新的python中,或者您可以使用此代码:http : //code.activestate.com/recipes/576693/)。

我相信较新的pythons json实现可以正确处理内置的OrderedDicts,但是我不确定(而且我无法轻松访问测试)。

旧的python simplejson实现无法很好地处理OrderedDict对象..并在输出它们之前将它们转换为常规dict ..但您可以通过执行以下操作来克服这一点:

class OrderedJsonEncoder( simplejson.JSONEncoder ):
   def encode(self,o):
      if isinstance(o,OrderedDict.OrderedDict):
         return "{" + ",".join( [ self.encode(k)+":"+self.encode(v) for (k,v) in o.iteritems() ] ) + "}"
      else:
         return simplejson.JSONEncoder.encode(self, o)

现在使用这个我们得到:

>>> import OrderedDict
>>> unordered={"id":123,"name":"a_name","timezone":"tz"}
>>> ordered = OrderedDict.OrderedDict( [("id",123), ("name","a_name"), ("timezone","tz")] )
>>> e = OrderedJsonEncoder()
>>> print e.encode( unordered )
{"timezone": "tz", "id": 123, "name": "a_name"}
>>> print e.encode( ordered )
{"id":123,"name":"a_name","timezone":"tz"}

这几乎是所需要的。

另一种选择是专门使编码器直接使用您的行类,这样就不需要任何中间dict或UnorderedDict。

As others have mentioned the underlying dict is unordered. However there are OrderedDict objects in python. ( They’re built in in recent pythons, or you can use this: http://code.activestate.com/recipes/576693/ ).

I believe that newer pythons json implementations correctly handle the built in OrderedDicts, but I’m not sure (and I don’t have easy access to test).

Old pythons simplejson implementations dont handle the OrderedDict objects nicely .. and convert them to regular dicts before outputting them.. but you can overcome this by doing the following:

class OrderedJsonEncoder( simplejson.JSONEncoder ):
   def encode(self,o):
      if isinstance(o,OrderedDict.OrderedDict):
         return "{" + ",".join( [ self.encode(k)+":"+self.encode(v) for (k,v) in o.iteritems() ] ) + "}"
      else:
         return simplejson.JSONEncoder.encode(self, o)

now using this we get:

>>> import OrderedDict
>>> unordered={"id":123,"name":"a_name","timezone":"tz"}
>>> ordered = OrderedDict.OrderedDict( [("id",123), ("name","a_name"), ("timezone","tz")] )
>>> e = OrderedJsonEncoder()
>>> print e.encode( unordered )
{"timezone": "tz", "id": 123, "name": "a_name"}
>>> print e.encode( ordered )
{"id":123,"name":"a_name","timezone":"tz"}

Which is pretty much as desired.

Another alternative would be to specialise the encoder to directly use your row class, and then you’d not need any intermediate dict or UnorderedDict.


回答 2

字典的顺序与其定义的顺序没有任何关系。这对所有字典都是正确的,不仅是那些变成JSON的字典。

>>> {"b": 1, "a": 2}
{'a': 2, 'b': 1}

实际上,该字典甚至在到达之前就被“颠倒了” json.dumps

>>> {"id":1,"name":"David","timezone":3}
{'timezone': 3, 'id': 1, 'name': 'David'}

The order of a dictionary doesn’t have any relationship to the order it was defined in. This is true of all dictionaries, not just those turned into JSON.

>>> {"b": 1, "a": 2}
{'a': 2, 'b': 1}

Indeed, the dictionary was turned “upside down” before it even reached json.dumps:

>>> {"id":1,"name":"David","timezone":3}
{'timezone': 3, 'id': 1, 'name': 'David'}

回答 3

嘿,我知道这个答案太迟了,但是添加sort_keys并为它赋false,如下所示:

json.dumps({'****': ***},sort_keys=False)

这对我有用

hey i know it is so late for this answer but add sort_keys and assign false to it as follows :

json.dumps({'****': ***},sort_keys=False)

this worked for me


回答 4

json.dump()将保留字典的序号。在文本编辑器中打开文件,您将看到。无论您是否发送OrderedDict,它都会保留订单。

但是json.load()会丢失已保存对象的顺序,除非您告诉它加载到OrderedDict()中,这是通过上述JFSebastian的object_pairs_hook参数完成的。

否则,它将失去顺序,因为在常规操作下,它将保存的字典对象加载到常规字典中,而常规字典不保留给出的项目的价格。

json.dump() will preserve the ordder of your dictionary. Open the file in a text editor and you will see. It will preserve the order regardless of whether you send it an OrderedDict.

But json.load() will lose the order of the saved object unless you tell it to load into an OrderedDict(), which is done with the object_pairs_hook parameter as J.F.Sebastian instructed above.

It would otherwise lose the order because under usual operation, it loads the saved dictionary object into a regular dict and a regular dict does not preserve the oder of the items it is given.


回答 5

在JSON中,就像在Javascript中一样,对象键的顺序是没有意义的,因此它们按什么顺序显示实际上并不重要,它是同一对象。

in JSON, as in Javascript, order of object keys is meaningless, so it really doesn’t matter what order they’re displayed in, it is the same object.


Python-没有空格的json

问题:Python-没有空格的json

我刚刚意识到json.dumps()在JSON对象中添加了空格

例如

{'duration': '02:55', 'name': 'flower', 'chg': 0}

如何删除空格以使JSON更紧凑并保存要通过HTTP发送的字节?

如:

{'duration':'02:55','name':'flower','chg':0}

I just realized that json.dumps() adds spaces in the JSON object

e.g.

{'duration': '02:55', 'name': 'flower', 'chg': 0}

how can remove the spaces in order to make the JSON more compact and save bytes to be sent via HTTP?

such as:

{'duration':'02:55','name':'flower','chg':0}

回答 0

json.dumps(separators=(',', ':'))
json.dumps(separators=(',', ':'))

回答 1

在某些情况下,您可能只希望摆脱尾随空格。然后,您可以使用

json.dumps(separators=(',', ': '))

后面有空格,:但后面没有空格,

这对区分JSON文件(在诸如的版本控制中git diff)很有用,在该版本中,某些编辑器将摆脱尾随的空白,而python json.dump将其重新添加回去。

注意:这并不能完全回答上面的问题,但是我来这里是为了寻找该答案。我认为它不应该进行自己的质量检查,因此在此添加它。

In some cases you may want to get rid of the trailing whitespaces only. You can then use

json.dumps(separators=(',', ': '))

There is a space after : but not after ,.

This is useful for diff’ing your JSON files (in version control such as git diff), where some editors will get rid of the trailing whitespace but python json.dump will add it back.

Note: This does not exactly answers the question on top, but I came here looking for this answer specifically. I don’t think that it deserves its own QA, so I’m adding it here.


Python json.loads显示ValueError:额外数据

问题:Python json.loads显示ValueError:额外数据

我正在从JSON文件“ new.json”中获取一些数据,我想过滤一些数据并将其存储到新的JSON文件中。这是我的代码:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get["id"]
    a = item.get["a"]
    b = item.get["b"]
    c = item.get["c"]
    if c == 'XYZ' or  "XYZ" in data["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

我收到一个错误,回溯是:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

有人能帮我吗?

这是new.json中数据的示例,文件中还有约1500种这样的词典

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
} 

I am getting some data from a JSON file “new.json”, and I want to filter some data and store it into a new JSON file. Here is my code:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get["id"]
    a = item.get["a"]
    b = item.get["b"]
    c = item.get["c"]
    if c == 'XYZ' or  "XYZ" in data["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

and I am getting an error, the traceback is:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

Can someone help me?

Here is a sample of the data in new.json, there are about 1500 more such dictionaries in the file

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
} 

回答 0

如以下示例所示,json.loads(和json.load)不会解码多个json对象。

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

如果要转储多个字典,请将它们包装在列表中,然后转储列表(而不是多次转储字典)

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]

As you can see in the following example, json.loads (and json.load) does not decode multiple json object.

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list, dump the list (instead of dumping dictionaries multiple times)

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]

回答 1

您可以从文件中jsonifying逐行读取:

tweets = []
for line in open('tweets.json', 'r'):
    tweets.append(json.loads(line))

这样可以避免存储中间的python对象。只要您在每次append()通话中写一条完整的推文,它就可以工作。

You can just read from a file, jsonifying each line as you go:

tweets = []
for line in open('tweets.json', 'r'):
    tweets.append(json.loads(line))

This avoids storing intermediate python objects. As long as your write one full tweet per append() call, this should work.


回答 2

我碰到这个是因为我试图加载从MongoDB转储的JSON文件。这给我一个错误

JSONDecodeError: Extra data: line 2 column 1

MongoDB JSON转储每行有一个对象,因此对我有用的是:

import json
data = [json.loads(line) for line in open('data.json', 'r')]

I came across this because I was trying to load a JSON file dumped from MongoDB. It was giving me an error

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
data = [json.loads(line) for line in open('data.json', 'r')]

回答 3

如果您的JSON文件不只是1个JSON记录,也会发生这种情况。JSON记录如下所示:

[{"some data": value, "next key": "another value"}]

它用方括号[]打开和关闭,方括号内是括号{}。可以有许多对括号,但它们的结尾都带有一个括号[]。如果您的json文件包含多个以下文件之一:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

然后load()将失败。

我用自己的失败文件验证了这一点。

import json

guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)

之所以有效,是因为1_guests.json有一条记录[]。我正在使用all_guests.json的原始文件有6条记录,以换行符分隔。我删除了5条记录(我已经检查了它们是否包含在方括号中)并以新名称保存了该文件。然后,loads语句起作用了。

错误原为

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS。我使用记录一词,但这不是官方名称。另外,如果您的文件包含像我这样的换行符,则可以遍历整个文件,一次将一条记录加载到json变量中。

This may also happen if your JSON file is not just 1 JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with a bracket [ ], within the brackets are the braces { }. There can be many pairs of braces, but it all ends with a close bracket ]. If your json file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json

guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)

This works because 1_guests.json has one record []. The original file I was using all_guests.json had 6 records separated by newline. I deleted 5 records, (which I already checked to be bookended by brackets) and saved the file under a new name. Then the loads statement worked.

Error was

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS. I use the word record, but that’s not the official name. Also, if your file has newline characters like mine, you can loop through it to loads() one record at a time into a json variable.


回答 4

好吧,这可能会对某人有所帮助。我的json文件像这样时我只是遇到了相同的错误

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

我发现它的格式不正确,所以我将其更改为

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}

Well , it might help someone. i just got the same error while my json file is like this

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

and i found it malformed, so i changed it into somekind of

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}

回答 5

一线解决您的问题:

data = [json.loads(line) for line in open('tweets.json', 'r')]

One-liner for your problem:

data = [json.loads(line) for line in open('tweets.json', 'r')]

回答 6

如果您想用两种方法解决它,可以这样做:

with open('data.json') as f:
    data = [json.loads(line) for line in f]

If you want to solve it in a two-liner you can do it like this:

with open('data.json') as f:
    data = [json.loads(line) for line in f]

回答 7

我认为将字典保存在列表中并不是@falsetru提出的理想解决方案。

更好的方法是遍历字典并通过添加新行将其保存到.json。

我们的2本字典是

d1 = {'a':1}

d2 = {'b':2}

您可以将它们写入.json

import json
with open('sample.json','a') as sample:
    for dict in [d1,d2]:
        sample.write('{}\n'.format(json.dumps(dict)))

而且您可以读取json文件而没有任何问题

with open('sample.json','r') as sample:
    for line in sample:
        line = json.loads(line.strip())

简单高效

I think saving dicts in a list is not an ideal solution here proposed by @falsetru.

Better way is, iterating through dicts and saving them to .json by adding a new line.

our 2 dictionaries are

d1 = {'a':1}

d2 = {'b':2}

you can write them to .json

import json
with open('sample.json','a') as sample:
    for dict in [d1,d2]:
        sample.write('{}\n'.format(json.dumps(dict)))

and you can read json file without any issues

with open('sample.json','r') as sample:
    for line in sample:
        line = json.loads(line.strip())

simple and efficient


如何对集合进行JSON序列化?

问题:如何对集合进行JSON序列化?

我有一个Python set,其中包含带有__hash____eq__方法的对象,以确保该集合中没有重复项。

我需要对该结果进行json编码set,但是即使将一个空值传递set给该json.dumps方法也会引发TypeError

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

我知道我可以为json.JSONEncoder具有自定义default方法的类创建扩展,但是我什至不知道从哪里开始转换set。是否应该set使用默认方法中的值创建字典,然后返回该方法的编码?理想情况下,我想使默认方法能够处理原始编码器阻塞的所有数据类型(我将Mongo用作数据源,因此日期似乎也引发了此错误)

正确方向的任何提示将不胜感激。

编辑:

感谢你的回答!也许我应该更精确一些。

我利用(并赞成)这里的答案来解决set翻译的局限性,但是内部键也是一个问题。

中的set对象是转换为的复杂对象__dict__,但它们本身也可以包含其属性值,这些值可能不符合json编码器中的基本类型。

涉及到很多不同的类型set,并且哈希基本上为实体计算了唯一的ID,但是按照NoSQL的真正精神,没有确切说明子对象包含什么。

一个对象可能包含的日期值starts,而另一个对象可能具有一些其他模式,该模式不包含包含“非原始”对象的键。

这就是为什么我唯一能想到的解决方案是扩展JSONEncoder替换default方法以打开不同情况的方法-但我不确定如何进行此操作,并且文档不明确。在嵌套对象中,是default按键返回go 的值,还是只是查看整个对象的通用包含/丢弃?该方法如何容纳嵌套值?我已经看过先前的问题,但似乎找不到最佳的针对特定情况的编码的方法(不幸的是,这似乎是我在这里需要做的事情)。

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I’m not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I’d like to make the default method able to handle all the datatypes that the original encoder chokes on (I’m using Mongo as a data source so dates seem to raise this error too)

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There’s a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there’s no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing “non-primitive” objects.

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases – but I’m not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I’ve looked through previous questions and can’t seem to find the best approach to case-specific encoding (which unfortunately seems like what I’m going to need to do here).


回答 0

JSON表示法只有少数本机数据类型(对象,数组,字符串,数字,布尔值和null),因此以JSON序列化的任何内容都必须表示为这些类型之一。

json模块docs所示,此转换可以由JSONEncoderJSONDecoder自动完成,但随后您将放弃可能需要的其他一些结构(如果将集转换为列表,则将失去恢复常规数据的能力。列表;如果使用将集转换为字典,dict.fromkeys(s)则将失去恢复字典的能力)。

一个更复杂的解决方案是构建可以与其他本机JSON类型共存的自定义类型。这使您可以存储嵌套结构,其中包括列表,集合,字典,小数,日期时间对象等:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

这是一个示例会话,显示它可以处理列表,字典和集合:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

另外,使用更通用的序列化技术(例如YAMLTwisted Jelly或Python的pickle模块)可能很有用。它们每个都支持更大范围的数据类型。

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).

A more sophisticated solution is to build-out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, unicode, int, float, bool, type(None))):
            return JSONEncoder.default(self, obj)
        return {'_python_object': pickle.dumps(obj)}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(str(dct['_python_object']))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {u'key': u'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python’s pickle module. These each support a much greater range of datatypes.


回答 1

您可以创建一个自定义编码器,list当遇到时将返回set。这是一个例子:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

您也可以通过这种方式检测其他类型。如果需要保留列表实际上是一个集合,则可以使用自定义编码。类似的东西return {'type':'set', 'list':list(obj)}可能会起作用。

要说明嵌套类型,请考虑将其序列化:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

这将引发以下错误:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

这表明编码器将获取list返回的结果,并对其子代递归调用序列化器。要为多种类型添加自定义序列化程序,可以执行以下操作:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

You can create a custom encoder that returns a list when it encounters a set. Here’s an example:

>>> import json
>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'

You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.

To illustrated nested types, consider serializing this:

>>> class Something(object):
...    pass
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:

>>> class SetEncoder(json.JSONEncoder):
...    def default(self, obj):
...       if isinstance(obj, set):
...          return list(obj)
...       if isinstance(obj, Something):
...          return 'CustomSomethingRepresentation'
...       return json.JSONEncoder.default(self, obj)
... 
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

回答 2

我将Raymond Hettinger的解决方案调整为适用于python 3。

这是发生了什么变化:

  • unicode 消失了
  • 更新调用父母defaultsuper()
  • 使用base64序列化bytes型成str(因为它似乎bytes在python 3不能被转换为JSON)
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

I adapted Raymond Hettinger’s solution to python 3.

Here is what has changed:

  • unicode disappeared
  • updated the call to the parents’ default with super()
  • using base64 to serialize the bytes type into str (because it seems that bytes in python 3 can’t be converted to JSON)
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

回答 3

JSON中仅字典,列表和原始对象类型(int,字符串,布尔)可用。

Only dictionaries, Lists and primitive object types (int, string, bool) are available in JSON.


回答 4

您无需创建自定义编码器类即可提供default方法-可以将其作为关键字参数传递:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

会生成[1, 2, 3]所有受支持的Python版本。

You don’t need to make a custom encoder class to supply the default method – it can be passed in as a keyword argument:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

results in [1, 2, 3] in all supported Python versions.


回答 5

如果您只需要编码集合,而不是一般的Python对象,并且想要使其易于阅读,则可以使用Raymond Hettinger答案的简化版本:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger’s answer can be used:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

回答 6

如果您只需要快速转储并且不想实现自定义编码器。您可以使用以下内容:

json_string = json.dumps(data, iterable_as_array=True)

这会将所有集合(和其他可迭代对象)转换为数组。请注意,当您解析json时,这些字段将保留为数组。如果要保留类型,则需要编写自定义编码器。

If you need just quick dump and don’t want to implement custom encoder. You can use the following:

json_string = json.dumps(data, iterable_as_array=True)

This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the json back. If you want to preserve the types, you need to write custom encoder.


回答 7

公认的解决方案的一个缺点是它的输出是非常特定于python的。也就是说,人类无法观察到其原始json输出,也无法通过其他语言(例如javascript)加载该输出。例:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

会给你:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

我可以提出一种解决方案,将集合降级为包含输出列表的字典,并在使用相同的编码器加载到python中时将其降级为集合,从而保留可观察性和语言不可知性:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

这使您:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

请注意,序列化包含具有键的元素的字典"__set__"将破坏此机制。因此__set__现在已成为保留dict键。显然,可以随意使用另一个更加模糊的密钥。

One shortcoming of the accepted solution is that its output is very python specific. I.e. its raw json output cannot be observed by a human or loaded by another language (e.g. javascript). example:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

I can propose a solution which downgrades the set to a dict containing a list on the way out, and back to a set when loaded into python using the same encoder, therefore preserving observability and language agnosticism:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.


json.dumps和json.load有什么区别?[关闭]

问题:json.dumps和json.load有什么区别?[关闭]

json.dumps和之间有什么区别json.load

据我了解,一个将JSON加载到字典中,另一个则加载到对象中。

What is the difference between json.dumps and json.load?

From my understanding, one loads JSON into a dictionary and another loads into objects.


回答 0

dumps 接受一个对象并产生一个字符串:

>>> a = {'foo': 3}
>>> json.dumps(a)
'{"foo": 3}'

load 将采用类似文件的对象,从该对象读取数据,然后使用该字符串创建一个对象:

with open('file.json') as fh:
    a = json.load(fh)

需要注意的是dumpload文件和对象,而之间的转换dumpsloads相互转换的字符串和对象。您可以将s-less函数视为函数的包装器s

def dump(obj, fh):
    fh.write(dumps(obj))

def load(fh):
    return loads(fh.read())

dumps takes an object and produces a string:

>>> a = {'foo': 3}
>>> json.dumps(a)
'{"foo": 3}'

load would take a file-like object, read the data from that object, and use that string to create an object:

with open('file.json') as fh:
    a = json.load(fh)

Note that dump and load convert between files and objects, while dumps and loads convert between strings and objects. You can think of the s-less functions as wrappers around the s functions:

def dump(obj, fh):
    fh.write(dumps(obj))

def load(fh):
    return loads(fh.read())

回答 1

json加载->从代表json对象的字符串中返回一个对象。

json dumps->从对象返回代表json对象的字符串。

加载和转储->从文件读/写到文件而不是字符串

json loads -> returns an object from a string representing a json object.

json dumps -> returns a string representing a json object from an object.

load and dump -> read/write from/to file instead of string


JSON转换为Pandas DataFrame

问题:JSON转换为Pandas DataFrame

我想做的是沿着经纬度坐标指定的路径从Google Maps API中提取海拔数据,如下所示:

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()

这给了我一个看起来像这样的数据:

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

当放入DataFrame时,我得到的是:

pd.read_json(elevations)

这是我想要的:

我不确定这是否可行,但主要是我想寻找的是一种将海拔,纬度和经度数据放到pandas数据框中的方式(不必具有花哨的mutiline标头)。

如果有人可以帮助或提出一些使用此数据的建议,那就太好了!如果您不能告诉我之前我并没有对JSON数据做太多工作…

编辑:

这种方法并不是很吸引人,但似乎可以起作用:

data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
    lat.append(result[u'location'][u'lat'])
    lng.append(result[u'location'][u'lng'])
    el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T

结束具有经度,纬度,海拔列的数据框

What I am trying to do is extract elevation data from a google maps API along a path specified by latitude and longitude coordinates as follows:

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()

This gives me a data that looks like this:

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

when putting into as DataFrame here is what I get:

pd.read_json(elevations)

and here is what I want:

I’m not sure if this is possible, but mainly what I am looking for is a way to be able to put the elevation, latitude and longitude data together in a pandas dataframe (doesn’t have to have fancy mutiline headers).

If any one can help or give some advice on working with this data that would be great! If you can’t tell I haven’t worked much with json data before…

EDIT:

This method isn’t all that attractive but seems to work:

data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
    lat.append(result[u'location'][u'lat'])
    lng.append(result[u'location'][u'lng'])
    el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T

ends up dataframe having columns latitude, longitude, elevation


回答 0

我找到了一个快速简便的解决方案,以解决我想要使用的json_normalize()问题pandas 1.01

from urllib2 import Request, urlopen
import json

import pandas as pd    

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])

这提供了一个很好的扁平化数据框架,其中包含我从Google Maps API获得的json数据。

I found a quick and easy solution to what I wanted using json_normalize() included in pandas 1.01.

from urllib2 import Request, urlopen
import json

import pandas as pd    

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])

This gives a nice flattened dataframe with the json data that I got from the Google Maps API.


回答 1

检查此片段。

# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

希望能帮助到你 :)

Check this snip out.

# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Hope it helps :)


回答 2

您可以先将json数据导入Python字典中:

data = json.loads(elevations)

然后动态修改数据:

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

重建json字符串:

elevations = json.dumps(data)

最后:

pd.read_json(elevations)

您也可以避免将数据转储到字符串中,我假设Panda可以直接从字典创建DataFrame(很长时间以来我就没有使用过它:p)

You could first import your json data in a Python dictionnary :

data = json.loads(elevations)

Then modify data on the fly :

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

Rebuild json string :

elevations = json.dumps(data)

Finally :

pd.read_json(elevations)

You can, also, probably avoid to dump data back to a string, I assume Panda can directly create a DataFrame from a dictionnary (I haven’t used it since a long time :p)


回答 3

只是接受答案的新版本,因为python3.x不支持urllib2

from requests import request
import json
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])

Just a new version of the accepted answer, as python3.x does not support urllib2

from requests import request
import json
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])

回答 4

问题是您在数据框中有几列,其中包含较小的dict。有用的Json通常是大量嵌套的。我一直在编写一些小的函数,这些函数将我想要的信息拉到新的列中。这样,我就可以使用想要的格式了。

for row in range(len(data)):
    #First I load the dict (one at a time)
    n = data.loc[row,'dict_column']
    #Now I make a new column that pulls out the data that I want.
    data.loc[row,'new_column'] = n.get('key')

The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful Json is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.

for row in range(len(data)):
    #First I load the dict (one at a time)
    n = data.loc[row,'dict_column']
    #Now I make a new column that pulls out the data that I want.
    data.loc[row,'new_column'] = n.get('key')

回答 5

优化可接受的答案:

可接受的答案存在一些功能上的问题,因此我想共享不依赖urllib2的代码:

import requests
from pandas.io.json import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'

r = requests.get(url)
dictr = r.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

输出:

        _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
..      ...                        ...                  ...  ...                  ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2 

PS:API用于丹麦电价

Optimization of the accepted answer:

The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:

import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'

response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

Output:

        _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
..      ...                        ...                  ...  ...                  ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2 

PS: API is for Danish electricity prices


回答 6

这是将JSON转换为DataFrame并返回的小型实用程序类:希望对您有所帮助。

# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize

class DFConverter:

    #Converts the input JSON to a DataFrame
    def convertToDF(self,dfJSON):
        return(json_normalize(dfJSON))

    #Converts the input DataFrame to JSON 
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return(resultJSON)

Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.

# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize

class DFConverter:

    #Converts the input JSON to a DataFrame
    def convertToDF(self,dfJSON):
        return(json_normalize(dfJSON))

    #Converts the input DataFrame to JSON 
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return(resultJSON)

回答 7

billmanH的解决方案对我有所帮助,但是直到我从以下位置切换后才起作用:

n = data.loc[row,'json_column']

至:

n = data.iloc[[row]]['json_column']

这就是其余的内容,转换为字典对于使用json数据很有帮助。

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])

billmanH’s solution helped me but didn’t work until i switched from:

n = data.loc[row,'json_column']

to:

n = data.iloc[[row]]['json_column']

here’s the rest of it, converting to a dictionary is helpful for working with json data.

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])

回答 8

#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}

#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}



回答 9

DataFrame通过接受的答案获得展平后,可以将列设置为“ MultiIndex(花式多行标题)”,如下所示:

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])

Once you have the flattened DataFrame obtained by the accepted answer, you can make the columns a MultiIndex (“fancy multiline header”) like this:

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])

为什么会看到“ TypeError:字符串索引必须为整数”?

问题:为什么会看到“ TypeError:字符串索引必须为整数”?

我正在学习python并试图将github问题转换为可读形式。使用有关如何将JSON转换为CSV的建议?我想出了这个:

import json
import csv

f=open('issues.json')
data = json.load(f)
f.close()

f=open("issues.csv","wb+")
csv_file=csv.writer(f)

csv_file.writerow(["gravatar_id","position","number","votes","created_at","comments","body","title","updated_at","html_url","user","labels","state"])

for item in data:
        csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

其中“ issues.json”是包含我的github问题的json文件。当我尝试运行它时,我得到

File "foo.py", line 14, in <module>
csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

TypeError: string indices must be integers

我在这里想念什么?哪些是“字符串索引”?我确定一旦完成这项工作,我还会遇到更多问题,但是就目前而言,我只是希望它能正常工作!

当我将for陈述调整为

for item in data:
    print item

我得到的是…“问题”-所以我在做一些更基本的错误。这是我的json:

{"issues":[{"gravatar_id":"44230311a3dcd684b6c5f81bf2ec9f60","position":2.0,"number":263,"votes":0,"created_at":"2010/09/17 16:06:50 -0700","comments":11,"body":"Add missing paging (Older>>) links...

当我打印时data,看起来真的很奇怪:

{u'issues': [{u'body': u'Add missing paging (Older>>) lin...

I’m playing with both learning python and trying to get github issues into a readable form. Using the advice on How can I convert JSON to CSV? I came up with this:

import json
import csv

f=open('issues.json')
data = json.load(f)
f.close()

f=open("issues.csv","wb+")
csv_file=csv.writer(f)

csv_file.writerow(["gravatar_id","position","number","votes","created_at","comments","body","title","updated_at","html_url","user","labels","state"])

for item in data:
        csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

Where “issues.json” is the json file containing my github issues. When I try to run that, I get

File "foo.py", line 14, in <module>
csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])

TypeError: string indices must be integers

What am I missing here? Which are the “string indices”? I’m sure that once I get this working I’ll have more issues, but for now , I’d just love for this to work!

When I tweak the for statement to simply

for item in data:
    print item

what I get is … “issues” — so I’m doing something more basic wrong. Here’s a bit of my json:

{"issues":[{"gravatar_id":"44230311a3dcd684b6c5f81bf2ec9f60","position":2.0,"number":263,"votes":0,"created_at":"2010/09/17 16:06:50 -0700","comments":11,"body":"Add missing paging (Older>>) links...

when I print data it looks like it is getting munged really oddly:

{u'issues': [{u'body': u'Add missing paging (Older>>) lin...

回答 0

item您的代码中很可能是字符串;字符串索引是方括号中的那些,例如gravatar_id。因此,我首先要检查您的data变量以查看您在那里收到了什么;我猜这data是一个字符串列表(或至少一个包含至少一个字符串的列表),而它应该是字典列表。

item is most likely a string in your code; the string indices are the ones in the square brackets, e.g., gravatar_id. So I’d first check your data variable to see what you received there; I guess that data is a list of strings (or at least a list containing at least one string) while it should be a list of dictionaries.


回答 1

该变量item是一个字符串。索引如下所示:

>>> mystring = 'helloworld'
>>> print mystring[0]
'h'

上面的示例使用0字符串的索引来引用第一个字符。

字符串不能具有字符串索引(就像字典一样)。所以这行不通:

>>> mystring = 'helloworld'
>>> print mystring['stringindex']
TypeError: string indices must be integers

The variable item is a string. An index looks like this:

>>> mystring = 'helloworld'
>>> print mystring[0]
'h'

The above example uses the 0 index of the string to refer to the first character.

Strings can’t have string indices (like dictionaries can). So this won’t work:

>>> mystring = 'helloworld'
>>> print mystring['stringindex']
TypeError: string indices must be integers

回答 2

data是一个dict对象。因此,像这样迭代它:

Python 2

for key, value in data.iteritems():
    print key, value

Python 3

for key, value in data.items():
    print(key, value)

data is a dict object. So, iterate over it like this:

Python 2

for key, value in data.iteritems():
    print key, value

Python 3

for key, value in data.items():
    print(key, value)

回答 3

切片符号的TypeError str[a:b]

tl; dr:在两个索引之间和中使用冒号 :而不是逗号abstr[a:b]


使用字符串切片表示法常见的序列操作)时,可能会TypeError引发a 的增加,指出索引必须是整数,即使它们显然是整数。

>>> my_string = "hello world"
>>> my_string[0,5]
TypeError: string indices must be integers

我们显然将两个整数作为索引传递给切片符号,对吗?那么这是什么问题呢?

该错误可能非常令人沮丧-尤其是在学习Python初期-因为错误消息有点误导。

说明

调用时,我们隐式地将两个整数(0和5)的元组传递给切片符号,my_string[0,5]因为0,5(即使没有括号)求值结果也与该元组相同(0,5)

,实际上,逗号足以使Python将某些内容评估为元组:

>>> my_variable = 0,
>>> type(my_variable)
<class 'tuple'>

因此,我们这次在这里做了什么:

>>> my_string = "hello world"
>>> my_tuple = 0, 5
>>> my_string[my_tuple]
TypeError: string indices must be integers

现在,至少,该错误消息有意义。

我们需要用冒号替换逗号 ,,以正确分隔两个整数: :

>>> my_string = "hello world"
>>> my_string[0:5]
'hello'

更清晰,更有用的错误消息可能是:

TypeError: string indices must be integers (not tuple)

一条良好的错误消息会直接向用户显示他们的错误行为,而解决问题的方式将更加明显。

[因此,下次当您发现自己负责编写错误描述消息时,请考虑以下示例,并在错误消息中添加原因或其他有用的信息,以使您甚至其他人可以理解出了什么问题。

得到教训

  • 切片符号使用冒号:分隔其索引(以及步进范围,例如str[from:to:step]
  • 元组由逗号定义,(例如t = 1,
  • 在错误消息中添加一些信息,以使用户了解出了什么问题

欢呼和快乐编程
winklerrr


[我知道这个问题已经回答了,这不完全是线程启动程序提出的问题,但是由于上述问题导致出现相同的错误消息,所以我来到这里。至少我花了很多时间才找到那个小错字。

因此,我希望这会帮助那些偶然发现相同错误的人,并节省他们发现细微错误的时间。

TypeError for Slice Notation str[a:b]

tl;dr: use a colon : instead of a comma in between the two indices a and b in str[a:b]


When working with strings and slice notation (a common sequence operation), it can happen that a TypeError is raised, pointing out that the indices must be integers, even if they obviously are.

Example

>>> my_string = "hello world"
>>> my_string[0,5]
TypeError: string indices must be integers

We obviously passed two integers for the indices to the slice notation, right? So what is the problem here?

This error can be very frustrating – especially at the beginning of learning Python – because the error message is a little bit misleading.

Explanation

We implicitly passed a tuple of two integers (0 and 5) to the slice notation when we called my_string[0,5] because 0,5 (even without the parentheses) evaluates to the same tuple as (0,5) would do.

A comma , is actually enough for Python to evaluate something as a tuple:

>>> my_variable = 0,
>>> type(my_variable)
<class 'tuple'>

So what we did there, this time explicitly:

>>> my_string = "hello world"
>>> my_tuple = 0, 5
>>> my_string[my_tuple]
TypeError: string indices must be integers

Now, at least, the error message makes sense.

Solution

We need to replace the comma , with a colon : to separate the two integers correctly:

>>> my_string = "hello world"
>>> my_string[0:5]
'hello'

A clearer and more helpful error message could have been something like:

TypeError: string indices must be integers (not tuple)

A good error message shows the user directly what they did wrong and it would have been more obvious how to solve the problem.

[So the next time when you find yourself responsible for writing an error description message, think of this example and add the reason or other useful information to error message to let you and maybe other people understand what went wrong.]

Lessons learned

  • slice notation uses colons : to separate its indices (and step range, e.g. str[from:to:step])
  • tuples are defined by commas , (e.g. t = 1,)
  • add some information to error messages for users to understand what went wrong

Cheers and happy programming
winklerrr


[I know this question was already answered and this wasn’t exactly the question the thread starter asked, but I came here because of the above problem which leads to the same error message. At least it took me quite some time to find that little typo.

So I hope that this will help someone else who stumbled upon the same error and saves them some time finding that tiny mistake.]


回答 4

如果缺少逗号,可能会发生这种情况。当我有一个包含两个元组的列表时,我遇到了它,每个列表中的第一个位置包含一个字符串,第二个位置包含一个字符串。在一种情况下,我错误地省略了元组的第一部分之后的逗号,并且解释器认为我正在尝试索引第一部分。

This can happen if a comma is missing. I ran into it when I had a list of two-tuples, each of which consisted of a string in the first position, and a list in the second. I erroneously omitted the comma after the first component of a tuple in one case, and the interpreter thought I was trying to index the first component.


回答 5

我在Pandas上遇到过类似的问题,您需要使用iterrows()函数来遍历Pandas数据集Pandas文档以进行迭代

data = pd.read_csv('foo.csv')
for index,item in data.iterrows():
    print('{} {}'.format(item["gravatar_id"], item["position"]))

请注意,您需要处理该函数还返回的数据集中的索引。

I had a similar issue with Pandas, you need to use the iterrows() function to iterate through a Pandas dataset Pandas documentation for iterrows

data = pd.read_csv('foo.csv')
for index,item in data.iterrows():
    print('{} {}'.format(item["gravatar_id"], item["position"]))

note that you need to handle the index in the dataset that is also returned by the function.


将JSON字符串转换为字典未列出

问题:将JSON字符串转换为字典未列出

我正在尝试传递JSON文件并将数据转换成字典。

到目前为止,这是我所做的:

import json
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)

我期望json1_data是一种dict类型,但是list当我使用进行检查时,它实际上是作为一种类型出现的type(json1_data)

我想念什么?我需要将它作为字典,以便可以访问其中一个键。

I am trying to pass in a JSON file and convert the data into a dictionary.

So far, this is what I have done:

import json
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)

I’m expecting json1_data to be a dict type but it actually comes out as a list type when I check it with type(json1_data).

What am I missing? I need this to be a dictionary so I can access one of the keys.


回答 0

JSON是一个内部有单个对象的数组,因此当您读入JSON时,会得到一个内部带有字典的列表。您可以通过访问列表中的项目0来访问字典,如下所示:

json1_data = json.loads(json1_str)[0]

现在,您可以按预期访问存储在数据点中的数据

datapoints = json1_data['datapoints']

我还有一个问题,是否有人可以咬:我正在尝试获取这些数据点(即datapoints [0] [0])中第一个元素的平均值。只是列出它们,我尝试做datapoints [0:5] [0],但我得到的只是两个元素的第一个数据点,而不是想要获取仅包含第一个元素的前5个数据点。有没有办法做到这一点?

datapoints[0:5][0]并没有达到您的期望。datapoints[0:5]返回仅包含前5个元素的新列表切片,然后[0]在其末尾添加将仅从结果列表切片中获取第一个元素。您需要使用以获得列表结果的方法:

[p[0] for p in datapoints[0:5]]

这是一种计算均值的简单方法:

sum(p[0] for p in datapoints[0:5])/5. # Result is 35.8

如果您愿意安装NumPy,那么它甚至更容易:

import numpy
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)[0]
datapoints = numpy.array(json1_data['datapoints'])
avg = datapoints[0:5,0].mean()
# avg is now 35.8

,运算符与NumPy数组的切片语法一起使用会产生您最初期望的与列表切片相同的行为。

Your JSON is an array with a single object inside, so when you read it in you get a list with a dictionary inside. You can access your dictionary by accessing item 0 in the list, as shown below:

json1_data = json.loads(json1_str)[0]

Now you can access the data stored in datapoints just as you were expecting:

datapoints = json1_data['datapoints']

I have one more question if anyone can bite: I am trying to take the average of the first elements in these datapoints(i.e. datapoints[0][0]). Just to list them, I tried doing datapoints[0:5][0] but all I get is the first datapoint with both elements as opposed to wanting to get the first 5 datapoints containing only the first element. Is there a way to do this?

datapoints[0:5][0] doesn’t do what you’re expecting. datapoints[0:5] returns a new list slice containing just the first 5 elements, and then adding [0] on the end of it will take just the first element from that resulting list slice. What you need to use to get the result you want is a list comprehension:

[p[0] for p in datapoints[0:5]]

Here’s a simple way to calculate the mean:

sum(p[0] for p in datapoints[0:5])/5. # Result is 35.8

If you’re willing to install NumPy, then it’s even easier:

import numpy
json1_file = open('json1')
json1_str = json1_file.read()
json1_data = json.loads(json1_str)[0]
datapoints = numpy.array(json1_data['datapoints'])
avg = datapoints[0:5,0].mean()
# avg is now 35.8

Using the , operator with the slicing syntax for NumPy’s arrays has the behavior you were originally expecting with the list slices.


回答 1

这是一个简单的代码片段,可json从字典中读取文本文件。请注意,您的json文件必须遵循json标准,因此它必须具有"双引号而不是'单引号。

您的JSON dump.txt文件:

{"test":"1", "test2":123}

Python脚本:

import json
with open('/your/path/to/a/dict/dump.txt') as handle:
    dictdump = json.loads(handle.read())

Here is a simple snippet that read’s in a json text file from a dictionary. Note that your json file must follow the json standard, so it has to have " double quotes rather then ' single quotes.

Your JSON dump.txt File:

{"test":"1", "test2":123}

Python Script:

import json
with open('/your/path/to/a/dict/dump.txt') as handle:
    dictdump = json.loads(handle.read())

回答 2

您可以使用以下内容:

import json

 with open('<yourFile>.json', 'r') as JSON:
       json_dict = json.load(JSON)

 # Now you can use it like dictionary
 # For example:

 print(json_dict["username"])

You can use the following:

import json

 with open('<yourFile>.json', 'r') as JSON:
       json_dict = json.load(JSON)

 # Now you can use it like dictionary
 # For example:

 print(json_dict["username"])

回答 3

将JSON数据加载到Dictionary中的最好方法是可以使用内置的json加载器。

以下是可以使用的示例代码段。

import json
f = open("data.json")
data = json.load(f))
f.close()
type(data)
print(data[<keyFromTheJsonFile>])

The best way to Load JSON Data into Dictionary is You can user the inbuilt json loader.

Below is the sample snippet that can be used.

import json
f = open("data.json")
data = json.load(f))
f.close()
type(data)
print(data[<keyFromTheJsonFile>])

回答 4

我正在使用针对REST API的Python代码,因此这是针对从事类似项目的人员的。

我使用POST请求从URL提取数据,原始输出为JSON。由于某种原因,输出已经是字典,而不是列表,并且我能够立即引用嵌套的字典键,如下所示:

datapoint_1 = json1_data['datapoints']['datapoint_1']

其中datapoint_1在数据点字典中。

I am working with a Python code for a REST API, so this is for those who are working on similar projects.

I extract data from an URL using a POST request and the raw output is JSON. For some reason the output is already a dictionary, not a list, and I’m able to refer to the nested dictionary keys right away, like this:

datapoint_1 = json1_data['datapoints']['datapoint_1']

where datapoint_1 is inside the datapoints dictionary.


回答 5

从get方法使用javascript ajax传递数据

    **//javascript function    
    function addnewcustomer(){ 
    //This function run when button click
    //get the value from input box using getElementById
            var new_cust_name = document.getElementById("new_customer").value;
            var new_cust_cont = document.getElementById("new_contact_number").value;
            var new_cust_email = document.getElementById("new_email").value;
            var new_cust_gender = document.getElementById("new_gender").value;
            var new_cust_cityname = document.getElementById("new_cityname").value;
            var new_cust_pincode = document.getElementById("new_pincode").value;
            var new_cust_state = document.getElementById("new_state").value;
            var new_cust_contry = document.getElementById("new_contry").value;
    //create json or if we know python that is call dictionary.        
    var data = {"cust_name":new_cust_name, "cust_cont":new_cust_cont, "cust_email":new_cust_email, "cust_gender":new_cust_gender, "cust_cityname":new_cust_cityname, "cust_pincode":new_cust_pincode, "cust_state":new_cust_state, "cust_contry":new_cust_contry};
    //apply stringfy method on json
            data = JSON.stringify(data);
    //insert data into database using javascript ajax
            var send_data = new XMLHttpRequest();
            send_data.open("GET", "http://localhost:8000/invoice_system/addnewcustomer/?customerinfo="+data,true);
            send_data.send();

            send_data.onreadystatechange = function(){
              if(send_data.readyState==4 && send_data.status==200){
                alert(send_data.responseText);
              }
            }
          }

django意见

    def addNewCustomer(request):
    #if method is get then condition is true and controller check the further line
        if request.method == "GET":
    #this line catch the json from the javascript ajax.
            cust_info = request.GET.get("customerinfo")
    #fill the value in variable which is coming from ajax.
    #it is a json so first we will get the value from using json.loads method.
    #cust_name is a key which is pass by javascript json. 
    #as we know json is a key value pair. the cust_name is a key which pass by javascript json
            cust_name = json.loads(cust_info)['cust_name']
            cust_cont = json.loads(cust_info)['cust_cont']
            cust_email = json.loads(cust_info)['cust_email']
            cust_gender = json.loads(cust_info)['cust_gender']
            cust_cityname = json.loads(cust_info)['cust_cityname']
            cust_pincode = json.loads(cust_info)['cust_pincode']
            cust_state = json.loads(cust_info)['cust_state']
            cust_contry = json.loads(cust_info)['cust_contry']
    #it print the value of cust_name variable on server
            print(cust_name)
            print(cust_cont)
            print(cust_email)
            print(cust_gender)
            print(cust_cityname)
            print(cust_pincode)
            print(cust_state)
            print(cust_contry)
            return HttpResponse("Yes I am reach here.")**

pass the data using javascript ajax from get methods

    **//javascript function    
    function addnewcustomer(){ 
    //This function run when button click
    //get the value from input box using getElementById
            var new_cust_name = document.getElementById("new_customer").value;
            var new_cust_cont = document.getElementById("new_contact_number").value;
            var new_cust_email = document.getElementById("new_email").value;
            var new_cust_gender = document.getElementById("new_gender").value;
            var new_cust_cityname = document.getElementById("new_cityname").value;
            var new_cust_pincode = document.getElementById("new_pincode").value;
            var new_cust_state = document.getElementById("new_state").value;
            var new_cust_contry = document.getElementById("new_contry").value;
    //create json or if we know python that is call dictionary.        
    var data = {"cust_name":new_cust_name, "cust_cont":new_cust_cont, "cust_email":new_cust_email, "cust_gender":new_cust_gender, "cust_cityname":new_cust_cityname, "cust_pincode":new_cust_pincode, "cust_state":new_cust_state, "cust_contry":new_cust_contry};
    //apply stringfy method on json
            data = JSON.stringify(data);
    //insert data into database using javascript ajax
            var send_data = new XMLHttpRequest();
            send_data.open("GET", "http://localhost:8000/invoice_system/addnewcustomer/?customerinfo="+data,true);
            send_data.send();

            send_data.onreadystatechange = function(){
              if(send_data.readyState==4 && send_data.status==200){
                alert(send_data.responseText);
              }
            }
          }

django views

    def addNewCustomer(request):
    #if method is get then condition is true and controller check the further line
        if request.method == "GET":
    #this line catch the json from the javascript ajax.
            cust_info = request.GET.get("customerinfo")
    #fill the value in variable which is coming from ajax.
    #it is a json so first we will get the value from using json.loads method.
    #cust_name is a key which is pass by javascript json. 
    #as we know json is a key value pair. the cust_name is a key which pass by javascript json
            cust_name = json.loads(cust_info)['cust_name']
            cust_cont = json.loads(cust_info)['cust_cont']
            cust_email = json.loads(cust_info)['cust_email']
            cust_gender = json.loads(cust_info)['cust_gender']
            cust_cityname = json.loads(cust_info)['cust_cityname']
            cust_pincode = json.loads(cust_info)['cust_pincode']
            cust_state = json.loads(cust_info)['cust_state']
            cust_contry = json.loads(cust_info)['cust_contry']
    #it print the value of cust_name variable on server
            print(cust_name)
            print(cust_cont)
            print(cust_email)
            print(cust_gender)
            print(cust_cityname)
            print(cust_pincode)
            print(cust_state)
            print(cust_contry)
            return HttpResponse("Yes I am reach here.")**

UnicodeDecodeError:“ utf8”编解码器无法解码位置0的字节0xa5:无效的起始字节

问题:UnicodeDecodeError:“ utf8”编解码器无法解码位置0的字节0xa5:无效的起始字节

我正在使用Python-2.6 CGI脚本,但在执行此操作时在服务器日志中发现了此错误json.dumps()

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(​​__get​data())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

“这里

​__get​data()函数返回dictionary {}

张贴这个问题之前我已经提到这个问题,操作系统,所以的。


更新

下一行损害了JSON编码器,

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

我有一个临时解决方案

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

但是我不确定这样做是否正确。

I am using Python-2.6 CGI scripts but found this error in server log while doing json.dumps(),

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(​​__get​data())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

​Here ,

​__get​data() function returns dictionary {} .

Before posting this question I have referred this of question os SO.


UPDATES

Following line is hurting JSON encoder,

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

I got a temporary fix for it

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

But I am not sure is it correct way to do it.


回答 0

该错误是因为字典中存在一些非ASCII字符,并且无法对其进行编码/解码。避免此错误的一种简单方法是使用encode()如下功能对此类字符串进行编码(如果a字符串为非ascii字符):

a.encode('utf-8').strip()

The error is because there is some non-ascii character in the dictionary and it can’t be encoded/decoded. One simple way to avoid this error is to encode such strings with encode() function as follows (if a is the string with non-ascii character):

a.encode('utf-8').strip()

回答 1

我仅通过在read_csv()命令中定义其他编解码器包来切换此设置:

encoding = 'unicode_escape'

例如:

import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')

I switched this simply by defining a different codec package in the read_csv() command:

encoding = 'unicode_escape'

Eg:

import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')

回答 2

试试下面的代码片段:

with open(path, 'rb') as f:
  text = f.read()

Try the below code snippet:

with open(path, 'rb') as f:
  text = f.read()

回答 3

您的字符串中包含一个非ascii字符编码。

utf-8如果您需要在代码中使用其他编码,则可能无法解码。例如:

>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

在这种情况下,编码是windows-1252必须要做的:

>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'

现在Unicode,您可以安全地编码为了utf-8

Your string has a non ascii character encoded in it.

Not being able to decode with utf-8 may happen if you’ve needed to use other encodings in your code. For example:

>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

In this case, the encoding is windows-1252 so you have to do:

>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'

Now that you have Unicode, you can safely encode into utf-8.


回答 4

阅读时csv,我添加了一种编码方法:

import pandas as pd
dataset = pd.read_csv('sample_data.csv', header= 0,
                        encoding= 'unicode_escape')

On read csv, I added an encoding method:

import pandas as pd
dataset = pd.read_csv('sample_data.csv', header= 0,
                        encoding= 'unicode_escape')

回答 5

此解决方案为我工作:

import pandas as pd
data = pd.read_csv("training.csv", encoding = 'unicode_escape')

This solution worked for me:

import pandas as pd
data = pd.read_csv("training.csv", encoding = 'unicode_escape')

回答 6

在代码顶部设置默认编码器

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

Set default encoder at the top of your code

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

回答 7

从2018-05开始decode,至少可以直接使用Python 3直接处理此问题。

我正在使用以下代码段输入invalid start byteinvalid continuation byte输入错误。添加errors='ignore'为我修复。

with open(out_file, 'rb') as f:
    for line in f:
        print(line.decode(errors='ignore'))

As of 2018-05 this is handled directly with decode, at least for Python 3.

I’m using the below snippet for invalid start byte and invalid continuation byte type errors. Adding errors='ignore' fixed it for me.

with open(out_file, 'rb') as f:
    for line in f:
        print(line.decode(errors='ignore'))

回答 8

灵感来自@aaronpenne和@Soumyaansh

f = open("file.txt", "rb")
text = f.read().decode(errors='replace')

Inspired by @aaronpenne and @Soumyaansh

f = open("file.txt", "rb")
text = f.read().decode(errors='replace')

回答 9

简单的解决方案:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')

Simple Solution:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')

回答 10

下一行损害了JSON编码器,

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

我有一个临时解决方案

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

将其标记为正确(作为临时解决方案)(不确定)。

Following line is hurting JSON encoder,

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now}) // this is the culprit

I got a temporary fix for it

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

Marking this as correct as a temporary fix (Not sure so).


回答 11

如果上述方法不适合您,则可能需要研究更改encodingcsv file本身。

使用Excel:

  1. csv使用打开文件Excel
  2. 导航到“文件”菜单选项,然后单击“另存为”
  3. 单击浏览以选择保存文件的位置
  4. 输入想要的文件名
  5. 选择CSV (Comma delimited) (*.csv)选项
  6. 单击工具下拉框,然后单击Web选项。
  7. 在“编码”选项卡下,Unicode (UTF-8)从“将此文档另存为”下拉列表中选择选项。
  8. 保存文件

使用记事本:

  1. csv file使用记事本打开
  2. 导航到文件>另存为选项
  3. 接下来,选择文件的位置
  4. 选择“保存类型”选项作为“所有文件”(
  5. 指定带.csv扩展名的文件名
  6. 编码下拉列表中,选择UTF-8选项。
  7. 单击保存以保存文件

这样,您应该能够在import csv不遇到的情况下进行归档UnicodeCodeError

If the above methods are not working for you, you may want to look into changing the encoding of the csv file itself.

Using Excel:

  1. Open csv file using Excel
  2. Navigate to File menu option and click Save As
  3. Click Browse to select a location to save the file
  4. Enter intended filename
  5. Select CSV (Comma delimited) (*.csv) option
  6. Click Tools drop-down box and click Web Options
  7. Under Encoding tab, select the option Unicode (UTF-8) from Save this document as drop-down list
  8. Save the file

Using Notepad:

  1. Open csv file using notepad
  2. Navigate to File > Save As option
  3. Next, select the location to the file
  4. Select the Save as type option as All Files(.)
  5. Specify the file name with .csv extension
  6. From Encoding drop-down list, select UTF-8 option.
  7. Click Save to save the file

By doing this, you should be able to import csv files without encountering the UnicodeCodeError.


回答 12

您可以使用任何特定用法和输入的标准编码。

utf-8 是默认值。

iso8859-1 西欧也很受欢迎。

例如: bytes_obj.decode('iso8859-1')

请参阅:文档

You may use any standard encoding of your specific usage and input.

utf-8 is the default.

iso8859-1 is also popular for Western Europe.

e.g: bytes_obj.decode('iso8859-1')

see: docs


回答 13

在尝试了所有上述解决方法后,如果仍然抛出相同的错误,则可以尝试将文件导出为CSV(如果已有的话,第二次导出)。特别是如果您正在使用scikit learn,最好import将数据集作为CSV file

我在一起度过了几个小时,而解决方案就是这么简单。将文件以CSV格式导出到Anaconda安装了分类器工具的目录,然后尝试。

After trying all the aforementioned workarounds, if it still throws the same error, you can try exporting the file as CSV (a second time if you already have). Especially if you’re using scikit learn, it is best to import the dataset as a CSV file.

I spent hours together, whereas the solution was this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed and try.


回答 14

而不是寻找解码a5(Yen ¥)或96(en-dash )的方法,而是告诉MySQL您的客户端已编码为“ latin1”,但您希望在数据库中使用“ utf8”。

使用UTF-8字符查看“问题”中的详细信息;我看到的不是我存储的

Instead of looking for ways to decode a5 (Yen ¥) or 96 (en-dash ), tell MySQL that your client is encoded “latin1”, but you want “utf8” in the database.

See details in Trouble with UTF-8 characters; what I see is not what I stored


回答 15

就我而言,我不得不将文件另存为带有BOM的UTF8,而不仅仅是UTF8 utf8这个错误消失了。

In my case, i had to save the file as UTF8 with BOM not just as UTF8 utf8 then this error was gone.


Python中的HTTP请求和JSON解析

问题:Python中的HTTP请求和JSON解析

我想通过Google Directions API动态查询Google Maps。例如,此请求计算通过伊利诺伊州芝加哥市到密苏里州乔普林市和俄克拉荷马州俄克拉荷马市的两个航路点的路线:

http://maps.googleapis.com/maps/api/directions/json?origin = Chicago,IL&destination = Los + Angeles,CA&waypoints = Joplin,MO |俄克拉荷马州+市,OK&sensor = false

以JSON格式返回结果。

如何在Python中执行此操作?我想发送这样的请求,接收结果并解析它。

I want to dynamically query Google Maps through the Google Directions API. As an example, this request calculates the route from Chicago, IL to Los Angeles, CA via two waypoints in Joplin, MO and Oklahoma City, OK:

http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false

It returns a result in the JSON format.

How can I do this in Python? I want to send such a request, receive the result and parse it.


回答 0

我建议使用很棒的请求库:

import requests

url = 'http://maps.googleapis.com/maps/api/directions/json'

params = dict(
    origin='Chicago,IL',
    destination='Los+Angeles,CA',
    waypoints='Joplin,MO|Oklahoma+City,OK',
    sensor='false'
)

resp = requests.get(url=url, params=params)
data = resp.json() # Check the JSON Response Content documentation below

JSON响应内容:https : //requests.readthedocs.io/en/master/user/quickstart/#json-response-content

I recommend using the awesome requests library:

import requests

url = 'http://maps.googleapis.com/maps/api/directions/json'

params = dict(
    origin='Chicago,IL',
    destination='Los+Angeles,CA',
    waypoints='Joplin,MO|Oklahoma+City,OK',
    sensor='false'
)

resp = requests.get(url=url, params=params)
data = resp.json() # Check the JSON Response Content documentation below

JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content


回答 1

requestsPython模块负责的两个检索JSON数据和对其进行解码,由于其内建JSON解码器。这是来自模块文档的示例:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

因此,无需使用一些单独的模块来解码JSON。

The requests Python module takes care of both retrieving JSON data and decoding it, due to its builtin JSON decoder. Here is an example taken from the module’s documentation:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

So there is no use of having to use some separate module for decoding JSON.


回答 2

requests具有内置.json()方法

import requests
requests.get(url).json()

requests has built-in .json() method

import requests
requests.get(url).json()

回答 3

import urllib
import json

url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
result = json.load(urllib.urlopen(url))
import urllib
import json

url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
result = json.load(urllib.urlopen(url))

回答 4

使用请求库,漂亮地打印结果,以便您可以更好地定位要提取的键/值,然后使用嵌套的for循环来解析数据。在该示例中,我逐步提取了行车路线。

import json, requests, pprint

url = 'http://maps.googleapis.com/maps/api/directions/json?'

params = dict(
    origin='Chicago,IL',
    destination='Los+Angeles,CA',
    waypoints='Joplin,MO|Oklahoma+City,OK',
    sensor='false'
)


data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)

# test to see if the request was valid
#print output['status']

# output all of the results
#pprint.pprint(output)

# step-by-step directions
for route in output['routes']:
        for leg in route['legs']:
            for step in leg['steps']:
                print step['html_instructions']

Use the requests library, pretty print the results so you can better locate the keys/values you want to extract, and then use nested for loops to parse the data. In the example I extract step by step driving directions.

import json, requests, pprint

url = 'http://maps.googleapis.com/maps/api/directions/json?'

params = dict(
    origin='Chicago,IL',
    destination='Los+Angeles,CA',
    waypoints='Joplin,MO|Oklahoma+City,OK',
    sensor='false'
)


data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)

# test to see if the request was valid
#print output['status']

# output all of the results
#pprint.pprint(output)

# step-by-step directions
for route in output['routes']:
        for leg in route['legs']:
            for step in leg['steps']:
                print step['html_instructions']

回答 5

试试这个:

import requests
import json

# Goole Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'

# Request data from link as 'str'
data = requests.get(link).text

# convert 'str' to Json
data = json.loads(data)

# Now you can access Json 
for i in data['routes'][0]['legs'][0]['steps']:
    lattitude = i['start_location']['lat']
    longitude = i['start_location']['lng']
    print('{}, {}'.format(lattitude, longitude))

Try this:

import requests
import json

# Goole Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'

# Request data from link as 'str'
data = requests.get(link).text

# convert 'str' to Json
data = json.loads(data)

# Now you can access Json 
for i in data['routes'][0]['legs'][0]['steps']:
    lattitude = i['start_location']['lat']
    longitude = i['start_location']['lng']
    print('{}, {}'.format(lattitude, longitude))

回答 6

同样适用于控制台上漂亮的Json:

 json.dumps(response.json(), indent=2)

可以使用带有缩进的转储。(请导入json

Also for pretty Json on console:

 json.dumps(response.json(), indent=2)

possible to use dumps with indent. (Please import json)