Question: Formatting floats with the standard json module
I am using the standard json module in python 2.6 to serialize a list of floats. However, I’m getting results like this:
>>> import json
>>> json.dumps([23.67, 23.97, 23.87])
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'
I want the floats to be formatted with only two decimal digits. The output should look like this:
>>> json.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'
I have tried defining my own JSON Encoder class:
class MyEncoder(json.JSONEncoder):
    def encode(self, obj):
        if isinstance(obj, float):
            return format(obj, '.2f')
        return json.JSONEncoder.encode(self, obj)
This works for a sole float object:
>>> json.dumps(23.67, cls=MyEncoder)
'23.67'
But fails for nested objects:
>>> json.dumps([23.67, 23.97, 23.87], cls=MyEncoder)
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'
I don’t want to have external dependencies, so I prefer to stick with the standard json module.
How can I achieve this?
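A possible explanation for the failure, as a minimal sketch assuming a current Python 3 interpreter: json.dumps calls encode() only once, on the top-level object, and the elements of a container are serialized by the encoder's internal iteration code, so the isinstance check never sees the nested floats. The class name TwoDecimalEncoder is hypothetical and only mirrors the MyEncoder attempt above.

```python
import json


class TwoDecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder mirroring the MyEncoder attempt from the question."""

    def encode(self, obj):
        # encode() receives only the top-level object passed to dumps().
        if isinstance(obj, float):
            return format(obj, '.2f')
        return json.JSONEncoder.encode(self, obj)


# A bare float is the top-level object, so the override applies.
top_level = json.dumps(23.6789, cls=TwoDecimalEncoder)

# Here the top-level object is a list; its floats never reach the override.
nested = json.dumps([23.6789], cls=TwoDecimalEncoder)

print(top_level)
print(nested)
```

The second call leaves the float untouched, which matches the behaviour described in the question.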
Answer 0
Note: This does not work in any recent version of Python.
Unfortunately, I believe you have to do this by monkey-patching (which, in my opinion, indicates a design defect in the standard library json package). E.g., this code:
import json
from json import encoder
encoder.FLOAT_REPR = lambda o: format(o, '.2f')
print(json.dumps(23.67))
print(json.dumps([23.67, 23.97, 23.87]))
emits:
23.67
[23.67, 23.97, 23.87]
as you desire. Obviously, there should be an architected way to override FLOAT_REPR so that EVERY representation of a float is under your control if you wish it to be; but unfortunately that’s not how the json package was designed :-(
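To illustrate the note at the top of this answer, here is a small check, assuming a recent CPython 3 release in which the encoder no longer consults a module-level FLOAT_REPR. On such an interpreter the assignment merely creates an unused attribute and the output keeps the full float representation.

```python
import json
from json import encoder

# Assumption: a recent CPython 3 release, where the encoder formats floats
# with float.__repr__ directly and ignores encoder.FLOAT_REPR.
encoder.FLOAT_REPR = lambda o: format(o, '.2f')

patched_output = json.dumps([23.6789, 23.9712])
print(patched_output)
```

If the patch were still honoured, the printed values would be cut to two decimals; on the interpreters assumed here they are not.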
Answer 1
import simplejson

class PrettyFloat(float):
    def __repr__(self):
        return '%.15g' % self

def pretty_floats(obj):
    if isinstance(obj, float):
        return PrettyFloat(obj)
    elif isinstance(obj, dict):
        return dict((k, pretty_floats(v)) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        return list(map(pretty_floats, obj))
    return obj

print(simplejson.dumps(pretty_floats([23.67, 23.97, 23.87])))
emits
[23.67, 23.97, 23.87]
No monkeypatching necessary.
Answer 2
If you’re using Python 2.7, a simple solution is to round your floats explicitly to the desired precision.
>>> sys.version
'2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
>>> json.dumps(1.0/3.0)
'0.3333333333333333'
>>> json.dumps(round(1.0/3.0, 2))
'0.33'
This works because Python 2.7 made float rounding more consistent. Unfortunately this does not work in Python 2.6:
>>> sys.version
'2.6.6 (r266:84292, Dec 27 2010, 00:02:40) \n[GCC 4.4.5]'
>>> json.dumps(round(1.0/3.0, 2))
'0.33000000000000002'
The solutions mentioned above are workarounds for 2.6, but none is entirely adequate. Monkey-patching json.encoder.FLOAT_REPR does not work if your Python runtime uses a C version of the JSON module. The PrettyFloat class in Tom Wuttke’s answer works, but only if %g encoding is acceptable everywhere in your application. The %.15g is a bit magic: it works because a float needs up to 17 significant digits to round-trip while 15 digits are always preserved, so formatting with 15 digits drops the representation noise, and %g does not print trailing zeroes.
I spent some time trying to make a PrettyFloat that allowed customization of precision for each number, i.e., a syntax like
>>> json.dumps(PrettyFloat(1.0 / 3.0, 4))
'0.3333'
It’s not easy to get this right. Inheriting from float is awkward. Inheriting from object and using a JSONEncoder subclass with its own default() method should work, except that the json module seems to assume all custom types should be serialized as strings. I.e., you end up with the JavaScript string "0.33" in the output, not the number 0.33. There may yet be a way to make this work, but it’s harder than it looks.
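The string-versus-number problem can be reproduced with a short sketch. It assumes Python 2.7 or later (including Python 3), where repr of a rounded float is short; on 2.6 the rounding variant would not help. RoundedFloat, StringEncoder and NumberEncoder are hypothetical names: the first encoder returns a formatted string from default() and ends up with a JSON string, the second returns a rounded float and keeps a JSON number.

```python
import json


class RoundedFloat(object):
    """Hypothetical wrapper pairing a float with its own precision."""

    def __init__(self, value, ndigits):
        self.value = value
        self.ndigits = ndigits


class StringEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, RoundedFloat):
            # A formatted string is serialized as a JSON string, quotes included.
            return format(obj.value, '.%df' % obj.ndigits)
        return json.JSONEncoder.default(self, obj)


class NumberEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, RoundedFloat):
            # A rounded float is serialized as a JSON number.
            return round(obj.value, obj.ndigits)
        return json.JSONEncoder.default(self, obj)


as_string = json.dumps(RoundedFloat(1.0 / 3.0, 2), cls=StringEncoder)
as_number = json.dumps([RoundedFloat(1.0 / 3.0, 4)], cls=NumberEncoder)

print(as_string)
print(as_number)
```

The rounding variant cannot pad with trailing zeroes, so it controls the maximum number of decimals rather than a fixed width.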
Answer 3
It’s really unfortunate that dumps doesn’t allow you to do anything to floats. However, loads does. So if you don’t mind the extra CPU load, you could throw it through the encoder/decoder/encoder and get the right result:
>>> json.dumps(json.loads(json.dumps([.333333333333, .432432]), parse_float=lambda x: round(float(x), 3)))
'[0.333, 0.432]'
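The round trip above can be wrapped in a small helper. This is only a sketch, assuming Python 3; dumps_rounded is a hypothetical name. It relies on the documented parse_float hook of json.loads, which is called with the text of every JSON float.

```python
import json


def dumps_rounded(obj, ndigits=3):
    """Hypothetical helper: serialize, re-parse with rounding, serialize again."""
    raw = json.dumps(obj)
    # parse_float receives each JSON float as a string; integers are untouched.
    rounded = json.loads(raw, parse_float=lambda text: round(float(text), ndigits))
    return json.dumps(rounded)


result = dumps_rounded({"a": [0.333333333333, 2], "b": 0.432432})
print(result)
```

The data is encoded twice and decoded once, so this trades CPU time for not having to walk the structure by hand.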
Answer 4
Here’s a solution that worked for me in Python 3 and does not require monkey patching:
import json

def round_floats(o):
    if isinstance(o, float): return round(o, 2)
    if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
    if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
    return o

json.dumps(round_floats([23.63437, 23.93437, 23.842347]))
Output is:
[23.63, 23.93, 23.84]
It copies the data but with rounded floats.
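A variant with a configurable number of digits might look like the sketch below. The ndigits parameter and the sample data are assumptions added for illustration; the traversal is the same as in the function above.

```python
import json


def round_floats(obj, ndigits=2):
    """Return a copy of obj with every float rounded to ndigits places."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {key: round_floats(value, ndigits) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is how json serializes them anyway.
        return [round_floats(item, ndigits) for item in obj]
    # Integers, strings, None and everything else pass through unchanged.
    return obj


data = {"readings": (23.63437, 23.93437), "count": 3, "label": "run-1"}
encoded = json.dumps(round_floats(data, ndigits=1))
print(encoded)
```

Rounding limits the number of decimals but does not pad: a value such as 23.0 is still written as 23.0, not 23.00.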
Answer 5
If you’re stuck with Python 2.5 or earlier versions: The monkey-patch trick does not seem to work with the original simplejson module if the C speedups are installed:
$ python
Python 2.5.4 (r254:67916, Jan 20 2009, 11:06:13)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplejson
>>> simplejson.__version__
'2.0.9'
>>> simplejson._speedups
<module 'simplejson._speedups' from '/home/carlos/.python-eggs/simplejson-2.0.9-py2.5-linux-i686.egg-tmp/simplejson/_speedups.so'>
>>> simplejson.encoder.FLOAT_REPR = lambda f: ("%.2f" % f)
>>> simplejson.dumps([23.67, 23.97, 23.87])
'[23.670000000000002, 23.969999999999999, 23.870000000000001]'
>>> simplejson.encoder.c_make_encoder = None
>>> simplejson.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'
>>>
Answer 6
You can do what you need to do, but it isn’t documented:
>>> import json
>>> json.encoder.FLOAT_REPR = lambda f: ("%.2f" % f)
>>> json.dumps([23.67, 23.97, 23.87])
'[23.67, 23.97, 23.87]'
Answer 7
Alex Martelli’s solution will work for single-threaded apps, but may not work for multi-threaded apps that need to control the number of decimal places per thread. Here is a solution that should work in multi-threaded apps:
import threading
from json import encoder

def FLOAT_REPR(f):
    """
    Serialize a float to a string, with a given number of digits
    """
    decimal_places = getattr(encoder.thread_local, 'decimal_places', 0)
    format_str = '%%.%df' % decimal_places
    return format_str % f

encoder.thread_local = threading.local()
encoder.FLOAT_REPR = FLOAT_REPR

# As an example, call like this:
import json

encoder.thread_local.decimal_places = 1
json.dumps([1.56, 1.54])  # Should result in '[1.6, 1.5]'
You can simply set encoder.thread_local.decimal_places to the number of decimal places you want, and the next call to json.dumps() in that thread will use that number of decimal places.
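Because the FLOAT_REPR patch depends on the interpreter honouring it, the same per-thread idea can also be sketched without monkey-patching. The version below is not the method of this answer: it swaps in pre-rounding of the data and keeps only the threading.local part. The names _settings, set_decimal_places and dumps are hypothetical.

```python
import json
import threading

_settings = threading.local()  # hypothetical per-thread storage


def set_decimal_places(places):
    """Remember the precision for the calling thread only."""
    _settings.decimal_places = places


def _round(obj, places):
    # Walk lists, tuples and dicts, rounding every float found.
    if isinstance(obj, float):
        return round(obj, places)
    if isinstance(obj, dict):
        return {k: _round(v, places) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_round(v, places) for v in obj]
    return obj


def dumps(obj):
    """Serialize obj, rounding floats to the calling thread's precision."""
    places = getattr(_settings, 'decimal_places', 0)
    return json.dumps(_round(obj, places))


results = {}


def worker(name, places):
    set_decimal_places(places)
    results[name] = dumps([1.56, 1.54])


threads = [threading.Thread(target=worker, args=('one', 1)),
           threading.Thread(target=worker, args=('two', 2))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Each thread sees only its own decimal_places value, so concurrent calls with different precisions do not interfere.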
Answer 8
If you need to do this in python 2.7 without overriding the global json.encoder.FLOAT_REPR, here’s one way.
import json
import math

class MyEncoder(json.JSONEncoder):
    "JSON encoder that renders floats to two decimal places"

    FLOAT_FRMT = '{0:.2f}'

    def floatstr(self, obj):
        return self.FLOAT_FRMT.format(obj)

    def _iterencode(self, obj, markers=None):
        # stl JSON lame override #1
        new_obj = obj
        if isinstance(obj, float):
            if not math.isnan(obj) and not math.isinf(obj):
                new_obj = self.floatstr(obj)
        return super(MyEncoder, self)._iterencode(new_obj, markers=markers)

    def _iterencode_dict(self, dct, markers=None):
        # stl JSON lame override #2
        new_dct = {}
        for key, value in dct.iteritems():
            if isinstance(key, float):
                if not math.isnan(key) and not math.isinf(key):
                    key = self.floatstr(key)
            new_dct[key] = value
        return super(MyEncoder, self)._iterencode_dict(new_dct, markers=markers)
Then, in python 2.7:
>>> from tmp import MyEncoder
>>> enc = MyEncoder()
>>> enc.encode([23.67, 23.98, 23.87])
'[23.67, 23.98, 23.87]'
In Python 2.6, it doesn’t quite work, as Matthew Schinckel points out below:
>>> from tmp import MyEncoder
>>> enc = MyEncoder()
>>> enc.encode([23.67, 23.97, 23.87])
'["23.67", "23.97", "23.87"]'
Answer 9
Pros:
- Works with any JSON encoder, or even python’s repr.
- Short(ish), seems to work.
Cons:
- Ugly regexp hack, barely tested.
- Quadratic complexity.
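import re

def fix_floats(json, decimals=2, quote='"'):
    # Group 1: everything before the number, skipping over quoted strings.
    # Group 2: a number with more than `decimals` fractional digits.
    pattern = r'^((?:(?:"(?:\\.|[^\\"])*?")|[^"])*?)(-?\d+\.\d{' + str(decimals) + r'}\d+)'
    pattern = re.sub('"', quote, pattern)
    fmt = "%%.%df" % decimals
    n = 1
    while n:
        # Format first, then strip zeroes from the resulting string (not from the float).
        json, n = re.subn(pattern, lambda m: m.group(1) + (fmt % float(m.group(2))).rstrip('0'), json)
    return json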
Answer 10
When importing the standard json module, it is enough to change the default encoder FLOAT_REPR. There isn’t really the need to import or create Encoder instances.
import json
json.encoder.FLOAT_REPR = lambda o: format(o, '.2f')
json.dumps([23.67, 23.97, 23.87]) #returns '[23.67, 23.97, 23.87]'
Sometimes it is also very useful to output as JSON the best representation Python can guess with str. This makes sure significant digits are not dropped.
import json
json.dumps([23.67, 23.9779, 23.87489])
# output is '[23.670000000000002, 23.977900000000002, 23.874890000000001]'
json.encoder.FLOAT_REPR = str
json.dumps([23.67, 23.9779, 23.87489])
# output is '[23.67, 23.9779, 23.87489]'
Answer 11
I agree with @Nelson that inheriting from float is awkward, but perhaps a solution that only touches the __repr__ function might be forgivable. I ended up using the decimal package to reformat floats when needed. The upside is that this works in all contexts where repr() is called, for example when simply printing lists to stdout. Also, the precision is configurable at runtime, after the data has been created. The downside is of course that your data needs to be converted to this special float class (as unfortunately you cannot seem to monkey-patch float.__repr__). For that I provide a brief conversion function.
The code:
import decimal
C = decimal.getcontext()

class decimal_formatted_float(float):
    def __repr__(self):
        s = str(C.create_decimal_from_float(self))
        if '.' in s: s = s.rstrip('0')
        return s

def convert_to_dff(elem):
    try:
        return elem.__class__(map(convert_to_dff, elem))
    except:
        if isinstance(elem, float):
            return decimal_formatted_float(elem)
        else:
            return elem
Usage example:
>>> import json
>>> li = [(1.2345,),(7.890123,4.567,890,890.)]
>>>
>>> decimal.getcontext().prec = 15
>>> dff_li = convert_to_dff(li)
>>> dff_li
[(1.2345,), (7.890123, 4.567, 890, 890)]
>>> json.dumps(dff_li)
'[[1.2345], [7.890123, 4.567, 890, 890]]'
>>>
>>> decimal.getcontext().prec = 3
>>> dff_li = convert_to_dff(li)
>>> dff_li
[(1.23,), (7.89, 4.57, 890, 890)]
>>> json.dumps(dff_li)
'[[1.23], [7.89, 4.57, 890, 890]]'
Answer 12
Using numpy
If you actually have really long floats you can round them up/down correctly with numpy:
import json
import numpy as np
data = np.array([23.671234, 23.97432, 23.870123])
json.dumps(np.around(data, decimals=2).tolist())
'[23.67, 23.97, 23.87]'
Answer 13
I just released fjson, a small Python library to fix this issue. Install with
pip install fjson
and use just like json, with the addition of the float_format parameter:
import math
import fjson
data = {"a": 1, "b": math.pi}
print(fjson.dumps(data, float_format=".6e", indent=2))
{
  "a": 1,
  "b": 3.141593e+00
}