How to get JSON from a webpage into a Python script

Question: How to get JSON from a webpage into a Python script

Got the following code in one of my scripts:

#
# url is defined above.
#
jsonurl = urlopen(url)

#
# While trying to debug, I put this in:
#
print jsonurl

#
# Was hoping text would contain the actual json crap from the URL, but seems not...
#
text = json.loads(jsonurl)
print text

What I want to do is get the {{.....etc.....}} stuff that I see on the URL when I load it in Firefox into my script so I can parse a value out of it. I’ve Googled a ton but I haven’t found a good answer as to how to actually get the {{...}} stuff from a URL ending in .json into an object in a Python script.


Answer 0

Get the data from the URL and then call json.loads on it, e.g.:

Python3 example:

import urllib.request, json 
with urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?address=google") as url:
    data = json.loads(url.read().decode())
    print(data)

Python2 example:

import urllib, json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=google"
response = urllib.urlopen(url)
data = json.loads(response.read())
print data

The output would look something like this:

{
"results" : [
    {
    "address_components" : [
        {
            "long_name" : "Charleston and Huff",
            "short_name" : "Charleston and Huff",
            "types" : [ "establishment", "point_of_interest" ]
        },
        {
            "long_name" : "Mountain View",
            "short_name" : "Mountain View",
            "types" : [ "locality", "political" ]
        },
        {
...

Answer 1

I’ll take a guess that you actually want to get data from the URL:

jsonurl = urlopen(url)
text = json.loads(jsonurl.read()) # <-- read from it

Or, check out the JSON decoder in the requests library:

import requests
r = requests.get('someurl')
print r.json() # if response type was set to JSON, then you'll automatically have a JSON response here...

Answer 2

This fetches JSON from a webpage and parses it into a dictionary, and it works with both Python 2.x and Python 3.x:

#!/usr/bin/env python

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

import json


def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)


url = ("http://maps.googleapis.com/maps/api/geocode/json?"
       "address=googleplex&sensor=false")
print(get_jsonparsed_data(url))

See also: Read and write example for JSON


Answer 3

I have found this to be the easiest and most efficient way to get JSON from a webpage when using Python 3:

import json
import urllib.request

data = urllib.request.urlopen("https://api.github.com/users?since=100").read()
output = json.loads(data)
print(output)

Answer 4

All that the call to urlopen() does (according to the docs) is return a file-like object. Once you have that, you need to call its read() method to actually pull the JSON data across the network.

Something like:

jsonurl = urlopen(url)

text = json.loads(jsonurl.read())
print text
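For Python 3, the same idea can be sketched as follows. This is only an illustration under stated assumptions: the `data:` URL and the sample payload are stand-ins (not from the question) so that the snippet runs without network access; replace `url` with your own JSON endpoint.

```python
import base64
import json
from urllib.request import urlopen

# Assumption: a data: URL stands in for the real JSON endpoint so this
# sketch runs offline. The payload below is made-up sample data.
payload = b'{"results": [{"short_name": "Mountain View"}]}'
url = "data:application/json;base64," + base64.b64encode(payload).decode("ascii")

with urlopen(url) as jsonurl:
    # jsonurl is a file-like response object, not the JSON itself.
    raw = jsonurl.read()  # read() returns the body as bytes

# Decode the bytes to str, then parse the JSON text.
text = json.loads(raw.decode("utf-8"))
short_name = text["results"][0]["short_name"]
print(short_name)
```

Passing `jsonurl` itself to json.loads fails because it is the response object; only the result of read() holds the JSON text.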

Answer 5

In Python 2, json.load() will work instead of json.loads():

import json
import urllib

url = 'https://api.github.com/users?since=100'
output = json.load(urllib.urlopen(url))
print(output)

Unfortunately, that doesn’t work in Python 3 before 3.6. json.load is just a wrapper around json.loads that calls read() on a file-like object. Before Python 3.6, json.loads required a string object, and the output of urlopen(url).read() is a bytes object, so one has to get the encoding and decode the bytes first. (From Python 3.6 on, json.loads also accepts bytes encoded as UTF-8, UTF-16 or UTF-32. Note also that in Python 3 urlopen lives in urllib.request rather than urllib.)

In this example we query the headers for the encoding and fall back to utf-8 if we don’t get one. The headers object is different between Python 2 and 3, so it has to be done in different ways. Using requests would avoid all this, but sometimes you need to stick to the standard library.

import json
from six.moves.urllib.request import urlopen

DEFAULT_ENCODING = 'utf-8'
url = 'https://api.github.com/users?since=100'
urlResponse = urlopen(url)

if hasattr(urlResponse.headers, 'get_content_charset'):
    encoding = urlResponse.headers.get_content_charset(DEFAULT_ENCODING)
else:
    encoding = urlResponse.headers.getparam('charset') or DEFAULT_ENCODING

output = json.loads(urlResponse.read().decode(encoding))
print(output)
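If only Python 3 needs to be supported, the six dependency and the hasattr branch can be dropped. The following is a minimal sketch under that assumption; `fetch_json` is a hypothetical helper name, and the `data:` URL with an ISO-8859-1 payload is made-up test data that stands in for a real server response so the snippet runs offline.

```python
import base64
import json
from urllib.request import urlopen

DEFAULT_ENCODING = "utf-8"


def fetch_json(url):
    """Hypothetical helper: fetch url and parse the body as JSON."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header, or fall back
        # to DEFAULT_ENCODING when the header does not declare one.
        encoding = response.headers.get_content_charset(DEFAULT_ENCODING)
        return json.loads(response.read().decode(encoding))


# Assumption: made-up payload, deliberately not UTF-8, served via a data: URL.
payload = '{"city": "Zürich"}'.encode("iso-8859-1")
url = ("data:application/json;charset=iso-8859-1;base64,"
       + base64.b64encode(payload).decode("ascii"))

output = fetch_json(url)
print(output["city"])
```

Decoding with the declared charset matters here: the same bytes decoded as utf-8 would raise a UnicodeDecodeError.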

Answer 6

There’s no need to use an extra library to parse the json…

json.loads() returns a dictionary when the top-level JSON value is an object (and a list when it is an array).

So in your case, just do text["someValueKey"]
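As a small illustration of pulling a value out of the parsed result, here is a sketch. The JSON string is made-up sample data shaped like the geocoding output shown in an earlier answer, and "someValueKey" is the placeholder key from above, not a real field.

```python
import json

# Assumption: sample JSON text standing in for what was read from the URL.
raw = (
    '{"results": [{"address_components": ['
    '{"long_name": "Mountain View", "types": ["locality", "political"]}'
    ']}]}'
)

text = json.loads(raw)  # top-level JSON object -> Python dict

# Nested objects become dicts and arrays become lists, so index step by step.
component = text["results"][0]["address_components"][0]
long_name = component["long_name"]
print(long_name)

# dict.get returns None instead of raising KeyError when a key is absent.
missing = text.get("someValueKey")

# A top-level JSON array parses to a list rather than a dict.
users = json.loads('[{"login": "example"}]')
```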


Answer 7

Late answer, but for Python >= 3.6 you can use the third-party dload package:

import dload
j = dload.json(url)

Install dload with:

pip3 install dload

Answer 8

You can use json.dumps:

import json

# `response` is the data you received, already parsed into Python objects
data = json.dumps(response)

print(data)

For loading JSON and writing it to a file, the following code is useful:

# Round-trip through json.dumps/json.loads, then write the result to a file
data = json.loads(json.dumps(response, sort_keys=False, indent=4))
with open('data.json', 'w') as outfile:
    json.dump(data, outfile, sort_keys=False, indent=4)
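A self-contained sketch of the write-then-read round trip might look like this. The `response` dictionary is made-up sample data standing in for whatever was parsed from the URL, and a temporary directory is used (an assumption for the example) so nothing is written to the working directory.

```python
import json
import os
import tempfile

# Assumption: sample data standing in for the parsed response.
response = {"results": [{"short_name": "Mountain View"}]}

# Write to a throwaway location instead of ./data.json.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# json.dump serializes Python objects straight to an open text file.
with open(path, "w") as outfile:
    json.dump(response, outfile, sort_keys=False, indent=4)

# json.load is the counterpart that parses JSON from an open file.
with open(path) as infile:
    restored = json.load(infile)

print(restored == response)
```

Note the naming convention: dump/load work on file objects, while dumps/loads work on strings.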