Tag Archives: floating-accuracy

Why does the floating-point value of 4 * 0.1 look nice in Python 3 but 3 * 0.1 doesn't?

Question: Why does the floating-point value of 4 * 0.1 look nice in Python 3 but 3 * 0.1 doesn't?

I know that most decimals don’t have an exact floating point representation (Is floating point math broken?).

But I don’t see why 4*0.1 is printed nicely as 0.4, but 3*0.1 isn’t, when both values actually have ugly decimal representations:

>>> 3*0.1
0.30000000000000004
>>> 4*0.1
0.4
>>> from decimal import Decimal
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')

Answer 0

The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an “exact” operation). Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 as these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.
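
One way to see that shortest-string behaviour directly (a quick illustrative check, not part of the original answer):

>>> float('0.4') == 4*0.1   # '0.4' parses back to the exact result of 4*0.1
True
>>> float('0.3') == 3*0.1   # '0.3' parses to a different float, so more digits are needed
False
>>> repr(3*0.1)
'0.30000000000000004'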

You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what’s going on under the hood.

>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'

0.1 is 0x1.999999999999a times 2^-4. The “a” at the end means the digit 10 – in other words, 0.1 in binary floating point is very slightly larger than the “exact” value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.
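
If you prefer not to read hex, math.frexp splits a float into a mantissa and a power-of-two exponent. A small illustrative check (not from the original answer) showing that multiplying by 4 only shifts the exponent, while multiplying by 3 also disturbs the mantissa:

>>> import math
>>> math.frexp(0.1)     # (mantissa, exponent) with mantissa in [0.5, 1)
(0.8, -3)
>>> math.frexp(4*0.1)   # same mantissa, exponent shifted up by 2
(0.8, -1)
>>> math.frexp(3*0.1)   # the mantissa itself has changed
(0.6000000000000001, -1)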

However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.

Python 3’s float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3’s repr engine chooses to display one with a slight apparent error.
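
Both properties can be seen in one quick illustrative check:

>>> x = 3*0.1
>>> float(repr(x)) == x   # the repr round-trips to exactly the same float
True
>>> x == 0.3              # but the float itself is not equal to 0.3
False
>>> repr(0.3), repr(x)    # so the two floats must get different strings
('0.3', '0.30000000000000004')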


Answer 1

repr (and str in Python 3) will put out as many digits as required to make the value unambiguous. In this case the result of the multiplication 3*0.1 isn’t the closest value to 0.3 (0x1.3333333333333p-2 in hex), it’s actually one LSB higher (0x1.3333333333334p-2) so it needs more digits to distinguish it from 0.3.

On the other hand, the multiplication 4*0.1 does get the closest value to 0.4 (0x1.999999999999ap-2 in hex), so it doesn’t need any additional digits.

You can verify this quite easily:

>>> 3*0.1 == 0.3
False
>>> 4*0.1 == 0.4
True

I used hex notation above because it’s nice and compact and shows the bit difference between the two values. You can do this yourself using e.g. (3*0.1).hex(). If you’d rather see them in all their decimal glory, here you go:

>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(0.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')
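
To confirm the "one LSB higher" claim directly, math.nextafter (available since Python 3.9) steps to the adjacent float; this check is illustrative and not part of the original answer:

>>> import math
>>> math.nextafter(0.3, 1.0) == 3*0.1   # 3*0.1 is exactly one ULP above 0.3
True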

Answer 2

Here’s a simplified conclusion from other answers.

If you inspect a float at Python's command line or print it, it goes through the repr function, which creates its string representation.

Starting with version 3.2, Python's str and repr use a complex rounding scheme which prefers nice-looking decimals where possible, but uses more digits where necessary to guarantee a bijective (one-to-one) mapping between floats and their string representations.

This scheme guarantees that the value of repr(float(s)) looks nice for simple decimals, even if they can't be represented precisely as floats (e.g. when s = "0.1").

At the same time, it guarantees that float(repr(x)) == x holds for every float x.
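
A short illustration of both guarantees (assuming any Python version from 3.2 on):

>>> repr(float("0.1"))   # simple decimals still look nice
'0.1'
>>> 0.1 + 0.2            # a result that is not the nearest float to 0.3 needs more digits
0.30000000000000004
>>> x = 0.1 + 0.2
>>> float(repr(x)) == x  # and every repr round-trips exactly
True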


Answer 3

This is not really specific to Python's implementation; it should apply to any float-to-decimal-string conversion function.

A floating point number is essentially a binary number, but in scientific notation with a fixed limit of significant figures.

The inverse of any number that has a prime factor not shared with the base will always have a recurring representation in that base. For example, 1/7 has the prime factor 7, which is not shared with 10, so it has a recurring decimal representation; the same applies to 1/10 in binary, because the denominator 10 has prime factors 2 and 5 and the factor 5 is not shared with base 2. This means that 0.1 cannot be exactly represented by a finite number of bits after the radix point.

Since 0.1 has no exact representation, a function that converts the approximation to a decimal string will usually round to a nearby short decimal so that it doesn't produce unintuitive results like 0.1000000000004121.

Since a floating-point number is stored in scientific notation, any multiplication by a power of the base only affects the exponent part of the number. For example, 1.231e+2 * 100 = 1.231e+4 in decimal notation, and likewise 1.00101010e11 * 100 = 1.00101010e101 in binary notation. Multiplying by something that is not a power of the base affects the significant digits as well; for example, 1.2e1 * 3 = 3.6e1.
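
Here is a sketch (using the struct module, not part of the original answer) that makes the exponent-only shift visible at the bit level:

>>> import struct
>>> def bits(x):
...     # 64-bit IEEE 754 layout: 1 sign bit, 11 exponent bits, 52 mantissa bits
...     return format(struct.unpack('<Q', struct.pack('<d', x))[0], '064b')
...
>>> bits(0.1)[12:] == bits(4*0.1)[12:]   # identical 52-bit mantissas, only the exponent differs
True
>>> bits(0.1)[12:] == bits(3*0.1)[12:]   # multiplying by 3 changes the mantissa too
False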

Depending on the algorithm used, it may try to guess common decimals based on the significant figures only. Both 0.1 and 0.4 have the same significant figures in binary, because their floats are essentially truncations of (8/5)(2^-4) and (8/5)(2^-2) respectively. If the algorithm identifies the 8/5 sigfig pattern as the decimal 1.6, then it will work on 0.1, 0.2, 0.4, 0.8, etc. It may also have magic sigfig patterns for other combinations, such as the float 3 divided by the float 10, and other magic patterns statistically likely to be formed by division by 10.

In the case of 3*0.1, the last few significant figures will likely be different from dividing a float 3 by float 10, causing the algorithm to fail to recognize the magic number for the 0.3 constant depending on its tolerance for precision loss.

Edit: https://docs.python.org/3.1/tutorial/floatingpoint.html

Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.

There is no tolerance for precision loss: if float x (0.3) is not exactly equal to float y (0.1*3), then repr(x) is not exactly equal to repr(y).
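
As a final illustrative sanity check of the invariant quoted above, using random samples:

>>> import random
>>> xs = [random.uniform(-1e6, 1e6) for _ in range(10000)]
>>> all(float(repr(x)) == x for x in xs)   # repr never loses information
True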