标签归档:math

为什么Python的math.ceil()和math.floor()操作返回浮点数而不是整数?

问题:为什么Python的math.ceil()和math.floor()操作返回浮点数而不是整数?

有人可以解释一下吗(直接来自文档 -我的重点):

math.ceil(x)以浮点数形式返回x的上限,最小数值大于或等于x。

math.floor(x)以浮点数形式返回x的下限,即小于或等于x 的最大数值。

为什么会.ceil.floor回报花车当它们被定义应该算整数?


编辑:

好吧,这有一些很好的论据来说明为什么它们应该返回浮点数,而当@jcollado指出它们实际上确实在Python 3中返回整数时,我才刚刚习惯这个想法。

Can someone explain this (straight from the docs– emphasis mine):

math.ceil(x) Return the ceiling of x as a float, the smallest integer value greater than or equal to x.

math.floor(x) Return the floor of x as a float, the largest integer value less than or equal to x.

Why would .ceil and .floor return floats when they are by definition supposed to calculate integers?


EDIT:

Well this got some very good arguments as to why they should return floats, and I was just getting used to the idea, when @jcollado pointed out that they in fact do return ints in Python 3…


回答 0

浮点数的范围通常超过整数的范围。通过返回浮点值,函数可以为超出可表示的整数范围的输入值返回有意义的值。

考虑:如果floor()返回整数,应该floor(1.0e30)返回什么?

现在,尽管Python的整数现在具有任意精度,但并不总是这样。标准库函数是等效C库函数的精简包装。

The range of floating point numbers usually exceeds the range of integers. By returning a floating point value, the functions can return a sensible value for input values that lie outside the representable range of integers.

Consider: If floor() returned an integer, what should floor(1.0e30) return?

Now, while Python’s integers are now arbitrary precision, it wasn’t always this way. The standard library functions are thin wrappers around the equivalent C library functions.


回答 1

正如其他答案所指出的那样,在python中,它们返回浮点数的原因可能是出于防止溢出问题的历史原因。但是,它们在python 3中返回整数。

>>> import math
>>> type(math.floor(3.1))
<class 'int'>
>>> type(math.ceil(3.1))
<class 'int'>

您可以在PEP 3141中找到更多信息。

As pointed out by other answers, in python they return floats probably because of historical reasons to prevent overflow problems. However, they return integers in python 3.

>>> import math
>>> type(math.floor(3.1))
<class 'int'>
>>> type(math.ceil(3.1))
<class 'int'>

You can find more information in PEP 3141.


回答 2

您的评论明显说明了您的困惑:

ceil / floor操作的重点是将浮点数转换为整数!

ceil和floor操作的重点是将浮点数据舍入为整数值。不做类型转换。需要获取数值的用户可以在操作之后进行显式转换。

请注意,如果所有可用的都是返回整数的ceil或float运算,则不可能简单地实现舍入到整数值。您需要首先检查输入是否在可表示的整数范围内,然后调用该函数;您将需要在单独的代码路径中处理NaN和无穷大。

此外,如果要符合IEEE 754,则必须具有返回浮点数的ceil和floor版本。

The source of your confusion is evident in your comment:

The whole point of ceil/floor operations is to convert floats to integers!

The point of the ceil and floor operations is to round floating-point data to integral values. Not to do a type conversion. Users who need to get integer values can do an explicit conversion following the operation.

Note that it would not be possible to implement a round to integral value as trivially if all you had available were a ceil or float operation that returned an integer. You would need to first check that the input is within the representable integer range, then call the function; you would need to handle NaN and infinities in a separate code path.

Additionally, you must have versions of ceil and floor which return floating-point numbers if you want to conform to IEEE 754.


回答 3

因为python的数学库是对返回浮点数的C数学库的精简包装。

Because python’s math library is a thin wrapper around the C math library which returns floats.


回答 4

在Python 2.4之前的版本中,整数不能容纳所有范围的截断实数。

http://docs.python.org/whatsnew/2.4.html#pep-237-unifying-long-integers-and-integers

Before Python 2.4, an integer couldn’t hold the full range of truncated real numbers.

http://docs.python.org/whatsnew/2.4.html#pep-237-unifying-long-integers-and-integers


回答 5

因为float的范围大于整数的范围-返回整数可能会溢出

Because the range for floats is greater than that of integers — returning an integer could overflow


回答 6

这是一个非常有趣的问题!因为浮点数需要一些位来存储指数(= bits_for_exponent),所以任何大于2**(float_size - bits_for_exponent)整数的浮点数始终是整数!在另一个极端与负指数的浮动会给之一10-1。这使整数范围浮点范围的讨论变得毫无意义,因为只要数字超出整数类型的范围,这些函数将简单地返回原始数字。python函数是函数的包装器,C因此这确实是C函数的不足之处,它们应该返回整数并强制程序员执行范围NaN//Inf在调用ceil / floor之前检查。

因此,逻辑上的答案是这些函数唯一有用的函数,它们将返回整数范围内的值,因此,它们返回浮点数的事实是一个错误,您非常聪明地意识到这一点!

This is a very interesting question! As a float requires some bits to store the exponent (=bits_for_exponent) any floating point number greater than 2**(float_size - bits_for_exponent) will always be an integral value! At the other extreme a float with a negative exponent will give one of 1, 0 or -1. This makes the discussion of integer range versus float range moot because these functions will simply return the original number whenever the number is outside the range of the integer type. The python functions are wrappers of the C function and so this is really a deficiency of the C functions where they should have returned an integer and forced the programer to do the range/NaN/Inf check before calling ceil/floor.

Thus the logical answer is the only time these functions are useful they would return a value within integer range and so the fact they return a float is a mistake and you are very smart for realizing this!


回答 7

也许因为其他语言也这样做,所以它是普遍接受的行为。(有充分的理由,如其他答案所示)

Maybe because other languages do this as well, so it is generally-accepted behavior. (For good reasons, as shown in the other answers)


Python部门

问题:Python部门

我试图将一组从-100到0的数字归一化到10-100的范围,并且遇到了问题,只是注意到即使根本没有任何变量,这也无法评估我期望的方式:

>>> (20-10) / (100-10)
0

浮动划分也不起作用:

>>> float((20-10) / (100-10))
0.0

如果除法的任一侧都转换为浮点数,它将起作用:

>>> (20-10) / float((100-10))
0.1111111111111111

第一个示例中的每一边都被评估为一个int,这意味着最终答案将被强制转换为int。由于0.111小于.5,因此将其舍入为0。在我看来,这不是透明的,但我想是这样的。

有什么解释?

I was trying to normalize a set of numbers from -100 to 0 to a range of 10-100 and was having problems only to notice that even with no variables at all, this does not evaluate the way I would expect it to:

>>> (20-10) / (100-10)
0

Float division doesn’t work either:

>>> float((20-10) / (100-10))
0.0

If either side of the division is cast to a float it will work:

>>> (20-10) / float((100-10))
0.1111111111111111

Each side in the first example is evaluating as an int which means the final answer will be cast to an int. Since 0.111 is less than .5, it rounds to 0. It is not transparent in my opinion, but I guess that’s the way it is.

What is the explanation?


回答 0

您使用的是Python 2.x,其中整数除法将被截断而不是变成浮点数。

>>> 1 / 2
0

您应该将其中一个设为float

>>> float(10 - 20) / (100 - 10)
-0.1111111111111111

from __future__ import division,强制/采用总是返回float的Python 3.x行为。

>>> from __future__ import division
>>> (10 - 20) / (100 - 10)
-0.1111111111111111

You’re using Python 2.x, where integer divisions will truncate instead of becoming a floating point number.

>>> 1 / 2
0

You should make one of them a float:

>>> float(10 - 20) / (100 - 10)
-0.1111111111111111

or from __future__ import division, which the forces / to adopt Python 3.x’s behavior that always returns a float.

>>> from __future__ import division
>>> (10 - 20) / (100 - 10)
-0.1111111111111111

回答 1

将Integers放入其中,因此Python给了您一个整数

>>> 10 / 90
0

如果此后将其强制转换为浮点数,则四舍五入将已经完成,换句话说,0整数将始终变为0浮点数。

如果您在除法的任一侧使用浮点数,那么Python将为您提供您所期望的答案。

>>> 10 / 90.0
0.1111111111111111

因此,在您的情况下:

>>> float(20-10) / (100-10)
0.1111111111111111
>>> (20-10) / float(100-10)
0.1111111111111111

You’re putting Integers in so Python is giving you an integer back:

>>> 10 / 90
0

If if you cast this to a float afterwards the rounding will have already been done, in other words, 0 integer will always become 0 float.

If you use floats on either side of the division then Python will give you the answer you expect.

>>> 10 / 90.0
0.1111111111111111

So in your case:

>>> float(20-10) / (100-10)
0.1111111111111111
>>> (20-10) / float(100-10)
0.1111111111111111

回答 2

在进行除法之前,需要将其更改为浮点数。那是:

float(20 - 10) / (100 - 10)

You need to change it to a float BEFORE you do the division. That is:

float(20 - 10) / (100 - 10)

回答 3

在Python 2.7中,/如果输入为整数,则运算符为整数除法:

>>>20/15
1

>>>20.0/15.0
1.33333333333

>>>20.0/15
1.33333333333

在Python 3.3中,/即使输入是整数,运算符也是浮点除法。

>>> 20/15
1.33333333333

>>>20.0/15
1.33333333333

对于Python 3中的整数除法,我们将使用//运算符。

//运算符在Python 2.7和Python 3.3中都是整数除法运算符。

在Python 2.7和Python 3.3中:

>>>20//15
1

现在,看比较

>>>a = 7.0/4.0
>>>b = 7/4
>>>print a == b

对于上述程序,输出在Python 2.7中为False,在Python 3.3中为True。

在Python 2.7中a = 1.75和b = 1。

在Python 3.3中,a = 1.75和b = 1.75,仅因为/是一个浮点除法。

In Python 2.7, the / operator is an integer division if inputs are integers:

>>>20/15
1

>>>20.0/15.0
1.33333333333

>>>20.0/15
1.33333333333

In Python 3.3, the / operator is a float division even if the inputs are integer.

>>> 20/15
1.33333333333

>>>20.0/15
1.33333333333

For integer division in Python 3, we will use the // operator.

The // operator is an integer division operator in both Python 2.7 and Python 3.3.

In Python 2.7 and Python 3.3:

>>>20//15
1

Now, see the comparison

>>>a = 7.0/4.0
>>>b = 7/4
>>>print a == b

For the above program, the output will be False in Python 2.7 and True in Python 3.3.

In Python 2.7 a = 1.75 and b = 1.

In Python 3.3 a = 1.75 and b = 1.75, just because / is a float division.


回答 4

它与您使用的python版本有关。基本上,它采用C行为:如果将两个整数相除,结果将四舍五入为一个整数。还请记住,Python从左到右执行操作,这在您键入时发挥作用。

示例:由于这是我进行算术运算时总是会浮现的一个问题(我应该转换为浮点数和哪个数字),因此提供了一个来自该方面的示例:

>>> a = 1/2/3/4/5/4/3
>>> a
0

当我们将整数相除时,毫不奇怪的是,它会降低四舍五入。

>>> a = 1/2/3/4/5/4/float(3)
>>> a
0.0

如果我们强制转换要浮点数的最后一个整数,我们仍然会得到零,因为到我们的数字除以浮点数时,由于整数除法,该数字已经变为0。

>>> a = 1/2/3/float(4)/5/4/3
>>> a
0.0

与上述相同,但将float类型转换向左侧稍微移近。

>>> a = float(1)/2/3/4/5/4/3
>>> a
0.0006944444444444445

最后,当我们将第一个整数转换为浮点数时,结果是所需的整数,因为从第一个除法(即最左边的整数)开始,我们使用浮点数。

额外1:如果您尝试回答该问题以改善算术评估,则应检查此内容

附加2:请注意以下情况:

>>> a = float(1/2/3/4/5/4/3)
>>> a
0.0

It has to do with the version of python that you use. Basically it adopts the C behavior: if you divide two integers, the results will be rounded down to an integer. Also keep in mind that Python does the operations from left to right, which plays a role when you typecast.

Example: Since this is a question that always pops in my head when I am doing arithmetic operations (should I convert to float and which number), an example from that aspect is presented:

>>> a = 1/2/3/4/5/4/3
>>> a
0

When we divide integers, not surprisingly it gets lower rounded.

>>> a = 1/2/3/4/5/4/float(3)
>>> a
0.0

If we typecast the last integer to float, we will still get zero, since by the time our number gets divided by the float has already become 0 because of the integer division.

>>> a = 1/2/3/float(4)/5/4/3
>>> a
0.0

Same scenario as above but shifting the float typecast a little closer to the left side.

>>> a = float(1)/2/3/4/5/4/3
>>> a
0.0006944444444444445

Finally, when we typecast the first integer to float, the result is the desired one, since beginning from the first division, i.e. the leftmost one, we use floats.

Extra 1: If you are trying to answer that to improve arithmetic evaluation, you should check this

Extra 2: Please be careful of the following scenario:

>>> a = float(1/2/3/4/5/4/3)
>>> a
0.0

回答 5

通过放置“。”来指定浮点数 该数字之后也会导致其默认浮动。

>>> 1 / 2
0

>>> 1. / 2.
0.5

Specifying a float by placing a ‘.’ after the number will also cause it to default to float.

>>> 1 / 2
0

>>> 1. / 2.
0.5

回答 6

使其中至少一个浮点,然后将是浮点除法,而不是整数:

>>> (20.0-10) / (100-10)
0.1111111111111111

将结果强制转换为浮点型为时已晚。

Make at least one of them float, then it will be float division, not integer:

>>> (20.0-10) / (100-10)
0.1111111111111111

Casting the result to float is too late.


回答 7

在python中cv2未更新除法计算。因此,您必须将其包括from __future__ import division 在程序的第一行中。

In python cv2 not updated the division calculation. so, you must include from __future__ import division in first line of the program.


回答 8

无论哪种方式,它都是整数除法。10/90 =0。在第二种情况下,您只是将0强制转换为浮点数。

尝试将“ /”的操作数之一强制转换为浮点数:

float(20-10) / (100-10)

Either way, it’s integer division. 10/90 = 0. In the second case, you’re merely casting 0 to a float.

Try casting one of the operands of “/” to be a float:

float(20-10) / (100-10)

回答 9

在第二个示例中,除法已经发生,您正在转换为浮动。试试这个:

float(20-10) / float(100-10)

You’re casting to float after the division has already happened in your second example. Try this:

float(20-10) / float(100-10)

回答 10

令我惊讶的是,没有人提到原始海报可能喜欢有理数。如果您对此感兴趣,基于Python的程序Sage会为您服务。(尽管3.x正在开发中,但目前仍基于Python2.x。)

sage: (20-10) / (100-10)
1/9

这不是每个人的解决方案,因为它会做一些准备工作,所以这些数字不是intSage Integer类元素。尽管如此,值得一提的是Python生态系统的一部分。

I’m somewhat surprised that no one has mentioned that the original poster might have liked rational numbers to result. Should you be interested in this, the Python-based program Sage has your back. (Currently still based on Python 2.x, though 3.x is under way.)

sage: (20-10) / (100-10)
1/9

This isn’t a solution for everyone, because it does do some preparsing so these numbers aren’t ints, but Sage Integer class elements. Still, worth mentioning as a part of the Python ecosystem.


回答 11

我个人更喜欢1. *在开始时插入一个。所以表达式变成了这样的东西:

1. * (20-10) / (100-10)

因为我总是对某些公式进行除法,例如:

accuracy = 1. * (len(y_val) - sum(y_val)) / len(y_val)

因此不可能简单地添加一个.0like 20.0。就我而言,用a换行float()可能会失去一点可读性。

Personally I preferred to insert a 1. * at the very beginning. So the expression become something like this:

1. * (20-10) / (100-10)

As I always do a division for some formula like:

accuracy = 1. * (len(y_val) - sum(y_val)) / len(y_val)

so it is impossible to simply add a .0 like 20.0. And in my case, wrapping with a float() may lose a little bit readability.


numpy max vs amax vs maximum

问题:numpy max vs amax vs maximum

numpy的具有看起来他们可被用于同样的东西三个不同的函数—不同之处在于numpy.maximum被用于逐元素,而numpy.maxnumpy.amax可以在特定轴,或所有元件一起使用。为什么不仅仅存在numpy.max?在性能上有一些微妙之处吗?

(类似minvs. aminvs. minimum

numpy has three different functions which seem like they can be used for the same things — except that numpy.maximum can only be used element-wise, while numpy.max and numpy.amax can be used on particular axes, or all elements. Why is there more than just numpy.max? Is there some subtlety to this in performance?

(Similarly for min vs. amin vs. minimum)


回答 0

np.max只是的别名np.amax。此函数仅在单个输入数组上起作用,并在整个数组中找到最大元素的值(返回标量)。或者,它接受一个axis参数,并沿输入数组的轴找到最大值(返回一个新数组)。

>>> a = np.array([[0, 1, 6],
                  [2, 4, 1]])
>>> np.max(a)
6
>>> np.max(a, axis=0) # max of each column
array([2, 4, 6])

的默认行为np.maximum是采用两个数组并计算其按元素的最大值。在这里,“兼容”意味着可以将一个阵列广播到另一个阵列。例如:

>>> b = np.array([3, 6, 1])
>>> c = np.array([4, 2, 9])
>>> np.maximum(b, c)
array([4, 6, 9])

但是np.maximum它也是一个通用函数,这意味着它具有使用多维数组时有用的其他功能和方法。例如,您可以计算数组(或数组的特定轴)上的累积最大值:

>>> d = np.array([2, 0, 3, -4, -2, 7, 9])
>>> np.maximum.accumulate(d)
array([2, 2, 3, 3, 3, 7, 9])

无法使用np.max

您可以在使用时在一定程度上进行np.maximum模仿:np.maxnp.maximum.reduce

>>> np.maximum.reduce(d)
9
>>> np.max(d)
9

基本测试表明这两种方法在性能上是可比的。它们应该是np.max()实际需要np.maximum.reduce执行的计算。

np.max is just an alias for np.amax. This function only works on a single input array and finds the value of maximum element in that entire array (returning a scalar). Alternatively, it takes an axis argument and will find the maximum value along an axis of the input array (returning a new array).

>>> a = np.array([[0, 1, 6],
                  [2, 4, 1]])
>>> np.max(a)
6
>>> np.max(a, axis=0) # max of each column
array([2, 4, 6])

The default behaviour of np.maximum is to take two arrays and compute their element-wise maximum. Here, ‘compatible’ means that one array can be broadcast to the other. For example:

>>> b = np.array([3, 6, 1])
>>> c = np.array([4, 2, 9])
>>> np.maximum(b, c)
array([4, 6, 9])

But np.maximum is also a universal function which means that it has other features and methods which come in useful when working with multidimensional arrays. For example you can compute the cumulative maximum over an array (or a particular axis of the array):

>>> d = np.array([2, 0, 3, -4, -2, 7, 9])
>>> np.maximum.accumulate(d)
array([2, 2, 3, 3, 3, 7, 9])

This is not possible with np.max.

You can make np.maximum imitate np.max to a certain extent when using np.maximum.reduce:

>>> np.maximum.reduce(d)
9
>>> np.max(d)
9

Basic testing suggests the two approaches are comparable in performance; and they should be, as np.max() actually calls np.maximum.reduce to do the computation.


回答 1

您已经说明了为什么np.maximum不同的地方-它返回的数组是两个数组之间按元素的最大值。

至于np.amaxnp.max:它们都调用相同的函数- np.max只是的别名np.amax,它们计算数组中或沿数组轴上所有元素的最大值。

In [1]: import numpy as np

In [2]: np.amax
Out[2]: <function numpy.core.fromnumeric.amax>

In [3]: np.max
Out[3]: <function numpy.core.fromnumeric.amax>

You’ve already stated why np.maximum is different – it returns an array that is the element-wise maximum between two arrays.

As for np.amax and np.max: they both call the same function – np.max is just an alias for np.amax, and they compute the maximum of all elements in an array, or along an axis of an array.

In [1]: import numpy as np

In [2]: np.amax
Out[2]: <function numpy.core.fromnumeric.amax>

In [3]: np.max
Out[3]: <function numpy.core.fromnumeric.amax>

回答 2

为了完整起见,在Numpy中有四个最大相关函数。它们分为两个不同的类别:

  • np.amax/np.maxnp.nanmax::用于单阵列订单统计
  • np.maximumnp.fmax:用于两个数组的元素比较

单阵列订单统计

NaNs传播者np.amax/np.max及其NaN无知对应物np.nanmax

  • np.max只是的别名np.amax,因此它们被视为一个函数。

    >>> np.max.__name__
    'amax'
    >>> np.max is np.amax
    True
  • np.max传播NaN,而np.nanmax忽略NaN。

    >>> np.max([np.nan, 3.14, -1])
    nan
    >>> np.nanmax([np.nan, 3.14, -1])
    3.14

二。用于两个数组的元素比较

NaNs传播者np.maximum及其NaNs无知对应物np.fmax

  • 这两个函数都需要两个数组作为要比较的前两个位置args。

    # x1 and x2 must be the same shape or can be broadcast
    np.maximum(x1, x2, /, ...);
    np.fmax(x1, x2, /, ...)
  • np.maximum传播NaN,而np.fmax忽略NaN。

    >>> np.maximum([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
    array([ nan,  nan, 2.72])
    >>> np.fmax([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
    array([-inf, 3.14, 2.72])
  • 逐个元素的函数是np.ufuncUniversal Function,这意味着它们具有正常Numpy函数所不具备的一些特殊属性。

    >>> type(np.maximum)
    <class 'numpy.ufunc'>
    >>> type(np.fmax)
    <class 'numpy.ufunc'>
    >>> #---------------#
    >>> type(np.max)
    <class 'function'>
    >>> type(np.nanmax)
    <class 'function'>

最后,相同的规则适用于四个最小相关功能:

  • np.amin/np.minnp.nanmin;
  • 并且np.minimumnp.fmin

For completeness, in Numpy there are four maximum related functions. They fall into two different categories:

  • np.amax/np.max, np.nanmax: for single array order statistics
  • and np.maximum, np.fmax: for element-wise comparison of two arrays

I. For single array order statistics

NaNs propagator np.amax/np.max and its NaN ignorant counterpart np.nanmax.

  • np.max is just an alias of np.amax, so they are considered as one function.

    >>> np.max.__name__
    'amax'
    >>> np.max is np.amax
    True
    
  • np.max propagates NaNs while np.nanmax ignores NaNs.

    >>> np.max([np.nan, 3.14, -1])
    nan
    >>> np.nanmax([np.nan, 3.14, -1])
    3.14
    

II. For element-wise comparison of two arrays

NaNs propagator np.maximum and its NaNs ignorant counterpart np.fmax.

  • Both functions require two arrays as the first two positional args to compare with.

    # x1 and x2 must be the same shape or can be broadcast
    np.maximum(x1, x2, /, ...);
    np.fmax(x1, x2, /, ...)
    
  • np.maximum propagates NaNs while np.fmax ignores NaNs.

    >>> np.maximum([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
    array([ nan,  nan, 2.72])
    >>> np.fmax([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
    array([-inf, 3.14, 2.72])
    
  • The element-wise functions are np.ufunc(Universal Function), which means they have some special properties that normal Numpy function don’t have.

    >>> type(np.maximum)
    <class 'numpy.ufunc'>
    >>> type(np.fmax)
    <class 'numpy.ufunc'>
    >>> #---------------#
    >>> type(np.max)
    <class 'function'>
    >>> type(np.nanmax)
    <class 'function'>
    

And finally, the same rules apply to the four minimum related functions:

  • np.amin/np.min, np.nanmin;
  • and np.minimum, np.fmin.

回答 3

np.maximum 不仅按元素进行比较,而且将数组与单个值进行比较

>>>np.maximum([23, 14, 16, 20, 25], 18)
array([23, 18, 18, 20, 25])

np.maximum not only compares elementwise but also compares array elementwise with single value

>>>np.maximum([23, 14, 16, 20, 25], 18)
array([23, 18, 18, 20, 25])

在球体上平均分配n个点

问题:在球体上平均分配n个点

我需要一种算法,该算法可以使我在球体上的位置保持N个点(可能少于20个),并模糊地将它们分散开。不需要“完美”,但是我只需要它,所以它们都不会聚在一起。

  • 这个问题提供了很好的代码,但是我找不到统一的方法,因为这似乎是100%随机的。
  • 推荐的这篇博客文章有两种方法可以输入球体上的点数,但是Saff和Kuijlaars算法恰好是我可以转录的伪代码,而我发现的代码示例包含“ node [k]”,而我无法看到解释并破坏了这种可能性。第二个博客示例是“黄金分割螺旋”,它给了我奇怪的,成堆的结果,但没有明确的方法来定义恒定半径。
  • 这种算法这个问题好像它可能工作,但我不能拼凑那是什么网页上成伪代码或任何东西。

我遇到的其他一些问题线程涉及随机均匀分布,这增加了我不关心的复杂程度。我很抱歉这是一个愚蠢的问题,但是我想表明我确实看上去很努力,但仍然表现不佳。

因此,我要寻找的是简单的伪代码,以将N个点均匀地分布在一个单位球体上,该点以球坐标或笛卡尔坐标返回。如果它甚至可以通过一点随机分布就更好了(想想围绕一颗恒星的行星,适当散开,但还有回旋余地)。

I need an algorithm that can give me positions around a sphere for N points (less than 20, probably) that vaguely spreads them out. There’s no need for “perfection”, but I just need it so none of them are bunched together.

  • This question provided good code, but I couldn’t find a way to make this uniform, as this seemed 100% randomized.
  • This blog post recommended had two ways allowing input of number of points on the sphere, but the Saff and Kuijlaars algorithm is exactly in psuedocode I could transcribe, and the code example I found contained “node[k]”, which I couldn’t see explained and ruined that possibility. The second blog example was the Golden Section Spiral, which gave me strange, bunched up results, with no clear way to define a constant radius.
  • This algorithm from this question seems like it could possibly work, but I can’t piece together what’s on that page into psuedocode or anything.

A few other question threads I came across spoke of randomized uniform distribution, which adds a level of complexity I’m not concerned about. I apologize that this is such a silly question, but I wanted to show that I’ve truly looked hard and still come up short.

So, what I’m looking for is simple pseudocode to evenly distribute N points around a unit sphere, that either returns in spherical or Cartesian coordinates. Even better if it can even distribute with a bit of randomization (think planets around a star, decently spread out, but with room for leeway).


回答 0

此示例中,代码 node[k]只是第k个节点。您正在生成一个数组N个点,它node[k]是第k个(从0到N-1)。如果这一切使您感到困惑,希望您现在就可以使用它。

(换句话说,k是大小为N的数组,该数组在代码片段开始之前定义,并且包含点列表)。

或者,在此处建立另一个答案(并使用Python):

> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
    lon = 360 * ((x+0.5) / nx)
    for y in range(ny):                                                         
        midpt = (y+0.5) / ny                                                    
        lat = 180 * asin(2*((y+0.5)/ny-0.5))                                    
        print lon,lat                                                           
> python2.7 ll.py                                                      
45.0 -166.91313924                                                              
45.0 -74.0730322921                                                             
45.0 0.0                                                                        
45.0 74.0730322921                                                              
45.0 166.91313924                                                               
135.0 -166.91313924                                                             
135.0 -74.0730322921                                                            
135.0 0.0                                                                       
135.0 74.0730322921                                                             
135.0 166.91313924                                                              
225.0 -166.91313924                                                             
225.0 -74.0730322921                                                            
225.0 0.0                                                                       
225.0 74.0730322921                                                             
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924

如果进行绘制,您会发现两极附近的垂直间距较大,因此每个点都位于大约相同的总面积内空间中(在两极附近,“水平”空间较小,因此“垂直”空间更大) )。

这与所有点到邻居的距离都差不多(这是我认为您的链接所要讨论的)不同,但这可能足以满足您的需求,并且只需制作一个统一的经纬度网格即可进行改进。

In this example code node[k] is just the kth node. You are generating an array N points and node[k] is the kth (from 0 to N-1). If that is all that is confusing you, hopefully you can use that now.

(in other words, k is an array of size N that is defined before the code fragment starts, and which contains a list of the points).

Alternatively, building on the other answer here (and using Python):

> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
    lon = 360 * ((x+0.5) / nx)
    for y in range(ny):                                                         
        midpt = (y+0.5) / ny                                                    
        lat = 180 * asin(2*((y+0.5)/ny-0.5))                                    
        print lon,lat                                                           
> python2.7 ll.py                                                      
45.0 -166.91313924                                                              
45.0 -74.0730322921                                                             
45.0 0.0                                                                        
45.0 74.0730322921                                                              
45.0 166.91313924                                                               
135.0 -166.91313924                                                             
135.0 -74.0730322921                                                            
135.0 0.0                                                                       
135.0 74.0730322921                                                             
135.0 166.91313924                                                              
225.0 -166.91313924                                                             
225.0 -74.0730322921                                                            
225.0 0.0                                                                       
225.0 74.0730322921                                                             
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924

If you plot that, you’ll see that the vertical spacing is larger near the poles so that each point is situated in about the same total area of space (near the poles there’s less space “horizontally”, so it gives more “vertically”).

This isn’t the same as all points having about the same distance to their neighbours (which is what I think your links are talking about), but it may be sufficient for what you want and improves on simply making a uniform lat/lon grid.


回答 1

斐波那契球算法对此非常有用。它速度快,并且结果一目了然,很容易使人眼蒙蔽。您可以看到一个处理完成的示例,该处理将随着时间的增加显示结果。这是 @gman制作的另一个出色的交互式示例。这是python中的一个简单实现。

import math


def fibonacci_sphere(samples=1):

    points = []
    phi = math.pi * (3. - math.sqrt(5.))  # golden angle in radians

    for i in range(samples):
        y = 1 - (i / float(samples - 1)) * 2  # y goes from 1 to -1
        radius = math.sqrt(1 - y * y)  # radius at y

        theta = phi * i  # golden angle increment

        x = math.cos(theta) * radius
        z = math.sin(theta) * radius

        points.append((x, y, z))

    return points

1000个样本可为您提供:

在此处输入图片说明

The Fibonacci sphere algorithm is great for this. It is fast and gives results that at a glance will easily fool the human eye. You can see an example done with processing which will show the result over time as points are added. Here’s another great interactive example made by @gman. And here’s a simple implementation in python.

import math


def fibonacci_sphere(samples=1):

    points = []
    phi = math.pi * (3. - math.sqrt(5.))  # golden angle in radians

    for i in range(samples):
        y = 1 - (i / float(samples - 1)) * 2  # y goes from 1 to -1
        radius = math.sqrt(1 - y * y)  # radius at y

        theta = phi * i  # golden angle increment

        x = math.cos(theta) * radius
        z = math.sin(theta) * radius

        points.append((x, y, z))

    return points

1000 samples gives you this:

enter image description here


回答 2

黄金螺旋法

您说您无法使用金色螺旋方法,但这很遗憾,因为它确实非常好。我想给您一个完整的了解,以便您也许可以理解如何避免这一问题。

因此,这是一种快速,非随机的方式来创建近似正确的晶格。如上所述,没有一个晶格会是完美的,但这可能就足够了。它与其他方法(例如BendWavy.org中的方法)进行了比较,但它的外观漂亮美观,并且可以保证极限间距的均匀性。

底漆:单位盘上的向日葵螺旋

为了理解该算法,我首先邀请您看一下2D向日葵螺旋算法。这是基于这样一个事实,即最不合理的数字是黄金分割率(1 + sqrt(5))/2,如果一个人通过“站在中心,转动整个黄金分割率,然后在那个方向发射另一个点”的方法来发射点,则自然会构造一个螺旋线,随着您获得越来越多的点数,尽管如此,仍然拒绝拥有明确排列的“条形”,使这些点排成一行。(注1)

磁盘上均匀间距的算法是

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

并产生如下结果(n = 100和n = 1000):

在此处输入图片说明

径向间隔点

关键的怪事是公式r = sqrt(indices / num_pts); 我怎么来的那个?(笔记2。)

好吧,我在这里使用平方根,因为我希望它们在磁盘周围具有均匀的区域间距。这是相同的话说,在大的限度Ñ我想一点区域ř ∈([R – [R + d – [R ),Θ ∈(θθ + d θ)来包含正比于它的区域的多个点,这是– [R d – [R d θ。现在,如果我们假装我们在这里谈论一个随机变量,这可以直接解释为说(RΘ)仅为cr对于某些常数的联合概率密度 c。然后在单位磁盘上进行归一化将迫使c = 1 /π。

现在让我介绍一个技巧。它来自概率论在那里它被称为采样逆CDF:假设你想生成的概率密度的随机变量˚FZ ^),你有一个随机变量û〜制服(0,1),就像来的出random()在大多数编程语言中。你怎么做到这一点?

  1. 首先,将您的密度转换为累积分布函数或CDF,我们将其称为Fz)。请记住,CDF随导数fz)。
  2. 然后计算CDF的反函数F -1z)。
  3. 您会发现Z = F -1U)根据目标密度分布。(注3)。

现在,黄金比例螺旋技巧将点以θ的均匀分布方式隔开,因此我们将其积分;对于单位磁盘,我们剩下Fr)= r 2。因此反函数为F -1u)= u 1/2,因此我们将在磁盘上生成极坐标为的随机点r = sqrt(random()); theta = 2 * pi * random()

现在,我们不再对这个反函数进行随机采样,而是对其进行均匀采样,而关于均匀采样的好处是,关于点如何在大N的限制内扩展的结果将表现得就像我们对随机函数采样一样。这种组合是诀窍。而不是random()使用(arange(0, num_pts, dtype=float) + 0.5)/num_pts,也就是说,如果我们要采样10个点,则为r = 0.05, 0.15, 0.25, ... 0.95。我们统一采样r以获得相等的区域间距,并使用向日葵增量来避免输出中的点“棒”。

现在在球上做向日葵

我们需要对球点进行点更改仅涉及将极坐标切换为球坐标。径向坐标当然不会进入此范围,因为我们位于单位球体上。为了让事情变得更加一致这里,尽管我是一位训练有素的物理学家,我会用数学家的坐标,其中0≤ φ ≤π就是北纬从极点和0≤下来θ ≤2π东经。因此与上面的区别在于,我们基本上是用φ代替变量r

我们的区域元素,这是[R d [R d θ,现在变成了没有,备受更复杂的罪孽(φ)d φ d θ。因此,我们对统一的间距联合密度是罪(φ)/4π。积分出θ,我们发现˚Fφ)= SIN(φ)/ 2,从而˚Fφ)=(1 – COS(φ))/ 2。反相此我们可以看到,一个均匀随机变量看起来像ACOS(1 – 2 ü),但我们采样均匀,而不是随机的,所以我们改为使用φ ķ = ACOS(1 – 2( ķ+ 0.5)/ N)。算法的其余部分只是将其投影到x,y和z坐标上:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()

再次对于n = 100和n = 1000,结果如下所示: 在此处输入图片说明 在此处输入图片说明

进一步的研究

我想大呼马丁·罗伯茨(Martin Roberts)的博客。请注意,上面我通过向每个索引添加0.5来创建索引的偏移量。这只是视觉上吸引我,但是事实证明,偏移量的选择很重要,并且在整个间隔内不是恒定不变的,并且如果选择正确,可能意味着包装精度提高了8%。还应该有一种方法可以使他的R 2序列覆盖一个球体,很有趣的是,看看它是否也产生了很好的均匀覆盖,也许是原样,但也许只需要从一半单位正方形沿对角线方向左右切开,然后拉伸得到一个圆。

笔记

  1. 这些“条”是由对数字的有理逼近形成的,而对数字的最佳有理逼近来自其连续的分数表达式,z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...)))其中z是整数,并且n_1, n_2, n_3, ...是正整数的有限或无限序列:

    def continued_fraction(r):
        while r != 0:
            n = floor(r)
            yield n
            r = 1/(r - n)

    由于分数部分1/(...)始终在零和一之间,因此连续分数中的大整数可以提供特别好的有理近似值:“一个除以100和101之间的值”比“一个除以1-2之间的值”要好。因此,最不合理的数字是一个,1 + 1/(1 + 1/(1 + ...))并且没有特别好的有理近似值。通过乘以φ可以得出黄金分割率的公式,从而可以求解φ = 1 + 1 / φ

  2. 对于不太熟悉NumPy的人们-所有功能都是“矢量化的”,因此sqrt(array)与其他语言可能会写的相同map(sqrt, array)。因此,这是一个逐个组件的sqrt应用程序。标量除法或标量加法也是如此-并行适用于所有组件。

  3. 一旦您知道这是结果,证明就很简单。如果您问z < Z < z + d z的概率是什么,这与问z < F -1U)< z + d z的概率是什么,将F应用于所有三个表达式表示它是一个单调递增的函数,因此Fz)< U < Fz + d z),向外扩展右侧以找到Fz)+ fz)d z,并且由于U是均匀的,因此如所承诺的,该概率仅为fz)d z

The golden spiral method

You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”

So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.

Primer: sunflower spirals on the unit disk

To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2 and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on.(Note 1.)

The algorithm for even spacing on a disk is,

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

and it produces results that look like (n=100 and n=1000):

enter image description here

Spacing the points radially

The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? (Note 2.)

Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.

Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?

  1. First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
  2. Then calculate the CDF’s inverse function F-1(z).
  3. You will find that Z = F-1(U) is distributed according to the target density. (Note 3).

Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ so let’s integrate that out; for the unit disk we are left with F(r) = r2. So the inverse function is F-1(u) = u1/2, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().

Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95. We uniformly sample r to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.

Now doing the sunflower on a sphere

The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn’t enter into this because we’re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I’ll use mathematicians’ coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.

Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 – 2 u), but we sample uniformly instead of randomly, so we instead use φk = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()

Again for n=100 and n=1000 the results look like: enter image description here enter image description here

Further research

I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R2 sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps as-is but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.

Notes

  1. Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:

    def continued_fraction(r):
        while r != 0:
            n = floor(r)
            yield n
            r = 1/(r - n)
    

    Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.

  2. For folks who are not so familiar with NumPy — all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars — those apply to all components in parallel.

  3. The proof is simple once you know that this is the result. If you ask what’s the probability that z < Z < z + dz, this is the same as asking what’s the probability that z < F-1(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.


回答 3

这被称为球体上的堆积点,并且没有(已知)一般的完美解决方案。但是,有许多不完善的解决方案。最受欢迎的三个似乎是:

  1. 创建一个模拟。将每个点视为约束在球体上的电子,然后运行一定数量的步骤进行仿真。电子的排斥力自然会使系统趋于更稳定的状态,在这些状态下,点之间的距离尽可能远。
  2. 超立方体排斥。这种花哨的方法实际上非常简单:您可以在围绕球体的立方体内部统一选择点(远远超过n它们),然后拒绝球体外部的点。将其余点视为向量,并将其标准化。这些是您的“样本”- n使用某种方法(随机,贪婪等)选择它们。
  3. 螺旋近似。您围绕球体跟踪螺旋,并在螺旋周围均匀分布点。由于涉及数学,因此与模拟相比,它们的理解更为复杂,但速度更快(并且可能涉及的代码更少)。最受欢迎的似乎是Saff等人

一个很多关于这个问题的更多信息,可以发现这里

This is known as packing points on a sphere, and there is no (known) general, perfect solution. However, there are plenty of imperfect solutions. The three most popular seem to be:

  1. Create a simulation. Treat each point as an electron constrained to a sphere, then run a simulation for a certain number of steps. The electrons’ repulsion will naturally tend the system to a more stable state, where the points are about as far away from each other as they can get.
  2. Hypercube rejection. This fancy-sounding method is actually really simple: you uniformly choose points (much more than n of them) inside of the cube surrounding the sphere, then reject the points outside of the sphere. Treat the remaining points as vectors, and normalize them. These are your “samples” – choose n of them using some method (randomly, greedy, etc).
  3. Spiral approximations. You trace a spiral around a sphere, and evenly-distribute the points around the spiral. Because of the mathematics involved, these are more complicated to understand than the simulation, but much faster (and probably involving less code). The most popular seems to be by Saff, et al.

A lot more information about this problem can be found here


回答 4

您要寻找的是球形覆盖物。球形覆盖问题非常棘手,除了少数点之外,其他解决方案都是未知的。可以肯定知道的一件事是,给定一个球体上的n个点,总是存在两个距离d = (4-csc^2(\pi n/6(n-2)))^(1/2)或更近的点。

如果您想要一种概率方法来生成均匀分布在球体上的点,则很简单:通过高斯分布在空间中均匀生成点(它内置于Java中,不难找到其他语言的代码)。因此,在3维空间中,您需要

Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };

然后通过将其与原点的距离归一化来将点投影到球体上

double norm = Math.sqrt( (p[0])^2 + (p[1])^2 + (p[2])^2 ); 
double[] sphereRandomPoint = { p[0]/norm, p[1]/norm, p[2]/norm };

n维上的高斯分布是球对称的,因此到球上的投影是均匀的。

当然,不能保证在统一生成的点的集合中任意两个点之间的距离都将限制在下面,因此您可以使用拒绝来强制执行您可能具有的任何此类条件:可能最好先生成整个集合,然后再生成如有必要,拒绝整个收藏。(或者使用“早期拒绝”来拒绝您到目前为止生成的整个集合;只是不要保留某些要点,而要丢弃其他要点。)您可以使用d上面给出的公式减去一些懈怠来确定之间的最小距离点以下,您将拒绝一组点。您必须计算n选择2个距离,拒绝的概率取决于松弛度;很难说是怎么回事,所以运行模拟以了解相关的统计信息。

What you are looking for is called a spherical covering. The spherical covering problem is very hard and solutions are unknown except for small numbers of points. One thing that is known for sure is that given n points on a sphere, there always exist two points of distance d = (4-csc^2(\pi n/6(n-2)))^(1/2) or closer.

If you want a probabilistic method for generating points uniformly distributed on a sphere, it’s easy: generate points in space uniformly by Gaussian distribution (it’s built into Java, not hard to find the code for other languages). So in 3-dimensional space, you need something like

Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };

Then project the point onto the sphere by normalizing its distance from the origin

double norm = Math.sqrt( (p[0])^2 + (p[1])^2 + (p[2])^2 ); 
double[] sphereRandomPoint = { p[0]/norm, p[1]/norm, p[2]/norm };

The Gaussian distribution in n dimensions is spherically symmetric so the projection onto the sphere is uniform.

Of course, there’s no guarantee that the distance between any two points in a collection of uniformly generated points will be bounded below, so you can use rejection to enforce any such conditions that you might have: probably it’s best to generate the whole collection and then reject the whole collection if necessary. (Or use “early rejection” to reject the whole collection you’ve generated so far; just don’t keep some points and drop others.) You can use the formula for d given above, minus some slack, to determine the min distance between points below which you will reject a set of points. You’ll have to calculate n choose 2 distances, and the probability of rejection will depend on the slack; it’s hard to say how, so run a simulation to get a feel for the relevant statistics.


回答 5

该答案基于该答案很好概述的相同“理论”

我将这个答案添加为: -此外,很难“摸索”如何在没有图像的情况下区分其他选项,因此,这是此选项的外观(如下)以及可立即运行的实现随之而来。
-其他选项均不能满足“均匀性”需求(即显然不是这样)。(注意在原始问题中特别希望获得类似行星分布的行为,您只是从有限的列表中随机拒绝了k个均匀创建的点(随机返回k个项目中的索引计数)。)
-最接近其他暗示迫使您通过“角轴”来决定“ N”,而跨两个角轴值仅是“ N的一个值”(在N较小的情况下,要知道可能会是什么还是可能不会很重要(例如,您想获得“ 5分”-尽享乐趣))

N为20时:

在此处输入图片说明
然后N在80: 在此处输入图片说明


这是现成的python3代码,其仿真是相同的来源:“ http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ” 。(我包含的绘图在以“ main”运行时会触发,取自:http : //www.scipy.org/Cookbook/Matplotlib/mplot3D

from math import cos, sin, pi, sqrt

def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
    """ each point you get will be of form 'x, y, z'; in cartesian coordinates
        eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0 
        ------------
        converted from:  http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ) 
    """
    dlong = pi*(3.0-sqrt(5.0))  # ~2.39996323 
    dz   =  2.0/numberOfPoints
    long =  0.0
    z    =  1.0 - dz/2.0
    ptsOnSphere =[]
    for k in range( 0, numberOfPoints): 
        r    = sqrt(1.0-z*z)
        ptNew = (cos(long)*r, sin(long)*r, z)
        ptsOnSphere.append( ptNew )
        z    = z - dz
        long = long + dlong
    return ptsOnSphere

if __name__ == '__main__':                
    ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)    

    #toggle True/False to print them
    if( True ):    
        for pt in ptsOnSphere:  print( pt)

    #toggle True/False to plot them
    if(True):
        from numpy import *
        import pylab as p
        import mpl_toolkits.mplot3d.axes3d as p3

        fig=p.figure()
        ax = p3.Axes3D(fig)

        x_s=[];y_s=[]; z_s=[]

        for pt in ptsOnSphere:
            x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])

        ax.scatter3D( array( x_s), array( y_s), array( z_s) )                
        ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
        p.show()
        #end

经过低计数测试(N以2、5、7、13等表示),并且看起来“不错”

This answer is based on the same ‘theory’ that is outlined well by this answer

I’m adding this answer as:
— None of the other options fit the ‘uniformity’ need ‘spot-on’ (or not obviously-clearly so). (Noting to get the planet like distribution looking behavior particurally wanted in the original ask, you just reject from the finite list of the k uniformly created points at random (random wrt the index count in the k items back).)
–The closest other impl forced you to decide the ‘N’ by ‘angular axis’, vs. just ‘one value of N’ across both angular axis values ( which at low counts of N is very tricky to know what may, or may not matter (e.g. you want ‘5’ points — have fun ) )
–Furthermore, it’s very hard to ‘grok’ how to differentiate between the other options without any imagery, so here’s what this option looks like (below), and the ready-to-run implementation that goes with it.

with N at 20:

enter image description here
and then N at 80: enter image description here


here’s the ready-to-run python3 code, where the emulation is that same source: ” http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ” found by others. ( The plotting I’ve included, that fires when run as ‘main,’ is taken from: http://www.scipy.org/Cookbook/Matplotlib/mplot3D )

from math import cos, sin, pi, sqrt

def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
    """ each point you get will be of form 'x, y, z'; in cartesian coordinates
        eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0 
        ------------
        converted from:  http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ) 
    """
    dlong = pi*(3.0-sqrt(5.0))  # ~2.39996323 
    dz   =  2.0/numberOfPoints
    long =  0.0
    z    =  1.0 - dz/2.0
    ptsOnSphere =[]
    for k in range( 0, numberOfPoints): 
        r    = sqrt(1.0-z*z)
        ptNew = (cos(long)*r, sin(long)*r, z)
        ptsOnSphere.append( ptNew )
        z    = z - dz
        long = long + dlong
    return ptsOnSphere

if __name__ == '__main__':                
    ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)    

    #toggle True/False to print them
    if( True ):    
        for pt in ptsOnSphere:  print( pt)

    #toggle True/False to plot them
    if(True):
        from numpy import *
        import pylab as p
        import mpl_toolkits.mplot3d.axes3d as p3

        fig=p.figure()
        ax = p3.Axes3D(fig)

        x_s=[];y_s=[]; z_s=[]

        for pt in ptsOnSphere:
            x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])

        ax.scatter3D( array( x_s), array( y_s), array( z_s) )                
        ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
        p.show()
        #end

tested at low counts (N in 2, 5, 7, 13, etc) and seems to work ‘nice’


回答 6

尝试:

function sphere ( N:float,k:int):Vector3 {
    var inc =  Mathf.PI  * (3 - Mathf.Sqrt(5));
    var off = 2 / N;
    var y = k * off - 1 + (off / 2);
    var r = Mathf.Sqrt(1 - y*y);
    var phi = k * inc;
    return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r); 
};

上面的函数应该以N个循环总数和k个循环电流迭代的形式循环运行。

它基于向日葵种子模式,除了将向日葵种子弯曲成半个圆顶,再弯曲成一个球体。

这是一张照片,除了我将相机放置在球体内的一半位置之外,所以看起来是2d而不是3d,因为相机到所有点的距离都相同。 http://3.bp.blogspot.com/-9lbPHLccQHA/USXf88_bvVI/AAAAAAAAADY/j7qhQsSZsA8/s640/sphere.jpg

Try:

function sphere ( N:float,k:int):Vector3 {
    var inc =  Mathf.PI  * (3 - Mathf.Sqrt(5));
    var off = 2 / N;
    var y = k * off - 1 + (off / 2);
    var r = Mathf.Sqrt(1 - y*y);
    var phi = k * inc;
    return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r); 
};

The above function should run in loop with N loop total and k loop current iteration.

It is based on a sunflower seeds pattern, except the sunflower seeds are curved around into a half dome, and again into a sphere.

Here is a picture, except I put the camera half way inside the sphere so it looks 2d instead of 3d because the camera is same distance from all points. http://3.bp.blogspot.com/-9lbPHLccQHA/USXf88_bvVI/AAAAAAAAADY/j7qhQsSZsA8/s640/sphere.jpg


回答 7

Healpix解决了一个密切相关的问题(用相等面积的像素对球体进行像素化):

http://healpix.sourceforge.net/

这可能是过大了,但是也许看了之后,您就会意识到它的其他一些不错的特性对您很有趣。它不仅仅是输出点云的函数。

我降落在这里试图再次找到它。名称“ healpix”并不完全引起球体…

Healpix solves a closely related problem (pixelating the sphere with equal area pixels):

http://healpix.sourceforge.net/

It’s probably overkill, but maybe after looking at it you’ll realize some of it’s other nice properties are interesting to you. It’s way more than just a function that outputs a point cloud.

I landed here trying to find it again; the name “healpix” doesn’t exactly evoke spheres…


回答 8

仅需少量点就可以运行模拟:

from random import random,randint
r = 10
n = 20
best_closest_d = 0
best_points = []
points = [(r,0,0) for i in range(n)]
for simulation in range(10000):
    x = random()*r
    y = random()*r
    z = r-(x**2+y**2)**0.5
    if randint(0,1):
        x = -x
    if randint(0,1):
        y = -y
    if randint(0,1):
        z = -z
    closest_dist = (2*r)**2
    closest_index = None
    for i in range(n):
        for j in range(n):
            if i==j:
                continue
            p1,p2 = points[i],points[j]
            x1,y1,z1 = p1
            x2,y2,z2 = p2
            d = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
            if d < closest_dist:
                closest_dist = d
                closest_index = i
    if simulation % 100 == 0:
        print simulation,closest_dist
    if closest_dist > best_closest_d:
        best_closest_d = closest_dist
        best_points = points[:]
    points[closest_index]=(x,y,z)


print best_points
>>> best_points
[(9.921692138442777, -9.930808529773849, 4.037839326088124),
 (5.141893371460546, 1.7274947332807744, -4.575674650522637),
 (-4.917695758662436, -1.090127967097737, -4.9629263893193745),
 (3.6164803265540666, 7.004158551438312, -2.1172868271109184),
 (-9.550655088997003, -9.580386054762917, 3.5277052594769422),
 (-0.062238110294250415, 6.803105171979587, 3.1966101417463655),
 (-9.600996012203195, 9.488067284474834, -3.498242301168819),
 (-8.601522086624803, 4.519484132245867, -0.2834204048792728),
 (-1.1198210500791472, -2.2916581379035694, 7.44937337008726),
 (7.981831370440529, 8.539378431788634, 1.6889099589074377),
 (0.513546008372332, -2.974333486904779, -6.981657873262494),
 (-4.13615438946178, -6.707488383678717, 2.1197605651446807),
 (2.2859494919024326, -8.14336582650039, 1.5418694699275672),
 (-7.241410895247996, 9.907335206038226, 2.271647103735541),
 (-9.433349952523232, -7.999106443463781, -2.3682575660694347),
 (3.704772125650199, 1.0526567864085812, 6.148581714099761),
 (-3.5710511242327048, 5.512552040316693, -3.4318468250897647),
 (-7.483466337225052, -1.506434920354559, 2.36641535124918),
 (7.73363824231576, -8.460241422163824, -1.4623228616326003),
 (10, 0, 0)]

with small numbers of points you could run a simulation:

from random import random,randint
r = 10
n = 20
best_closest_d = 0
best_points = []
points = [(r,0,0) for i in range(n)]
for simulation in range(10000):
    x = random()*r
    y = random()*r
    z = r-(x**2+y**2)**0.5
    if randint(0,1):
        x = -x
    if randint(0,1):
        y = -y
    if randint(0,1):
        z = -z
    closest_dist = (2*r)**2
    closest_index = None
    for i in range(n):
        for j in range(n):
            if i==j:
                continue
            p1,p2 = points[i],points[j]
            x1,y1,z1 = p1
            x2,y2,z2 = p2
            d = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
            if d < closest_dist:
                closest_dist = d
                closest_index = i
    if simulation % 100 == 0:
        print simulation,closest_dist
    if closest_dist > best_closest_d:
        best_closest_d = closest_dist
        best_points = points[:]
    points[closest_index]=(x,y,z)


print best_points
>>> best_points
[(9.921692138442777, -9.930808529773849, 4.037839326088124),
 (5.141893371460546, 1.7274947332807744, -4.575674650522637),
 (-4.917695758662436, -1.090127967097737, -4.9629263893193745),
 (3.6164803265540666, 7.004158551438312, -2.1172868271109184),
 (-9.550655088997003, -9.580386054762917, 3.5277052594769422),
 (-0.062238110294250415, 6.803105171979587, 3.1966101417463655),
 (-9.600996012203195, 9.488067284474834, -3.498242301168819),
 (-8.601522086624803, 4.519484132245867, -0.2834204048792728),
 (-1.1198210500791472, -2.2916581379035694, 7.44937337008726),
 (7.981831370440529, 8.539378431788634, 1.6889099589074377),
 (0.513546008372332, -2.974333486904779, -6.981657873262494),
 (-4.13615438946178, -6.707488383678717, 2.1197605651446807),
 (2.2859494919024326, -8.14336582650039, 1.5418694699275672),
 (-7.241410895247996, 9.907335206038226, 2.271647103735541),
 (-9.433349952523232, -7.999106443463781, -2.3682575660694347),
 (3.704772125650199, 1.0526567864085812, 6.148581714099761),
 (-3.5710511242327048, 5.512552040316693, -3.4318468250897647),
 (-7.483466337225052, -1.506434920354559, 2.36641535124918),
 (7.73363824231576, -8.460241422163824, -1.4623228616326003),
 (10, 0, 0)]

回答 9

以您的两个最大因素为准N,如果N==20这两个最大因素是{5,4}或更普遍的话{a,b}。计算

dlat  = 180/(a+1)
dlong = 360/(b+1})

把你的第一个点{90-dlat/2,(dlong/2)-180},第二个在{90-dlat/2,(3*dlong/2)-180},你在第3次{90-dlat/2,(5*dlong/2)-180},直到你绊倒环游世界一次,此时你一定要了解{75,150},当你去旁边{90-3*dlat/2,(dlong/2)-180}

显然,我正在按球形地球表面上的度数进行此操作,使用了将+/-转换为N / S或E / W的常规约定。显然,这给了您一个完全非随机的分布,但是它是均匀的,并且这些点不会聚集在一起。

要增加一定程度的随机性,您可以生成2个正态分布(均值0和std dev分别为{dlat / 3,dlong / 3})并将它们添加到均匀分布的点上。

Take the two largest factors of your N, if N==20 then the two largest factors are {5,4}, or, more generally {a,b}. Calculate

dlat  = 180/(a+1)
dlong = 360/(b+1})

Put your first point at {90-dlat/2,(dlong/2)-180}, your second at {90-dlat/2,(3*dlong/2)-180}, your 3rd at {90-dlat/2,(5*dlong/2)-180}, until you’ve tripped round the world once, by which time you’ve got to about {75,150} when you go next to {90-3*dlat/2,(dlong/2)-180}.

Obviously I’m working this in degrees on the surface of the spherical earth, with the usual conventions for translating +/- to N/S or E/W. And obviously this gives you a completely non-random distribution, but it is uniform and the points are not bunched together.

To add some degree of randomness, you could generate 2 normally-distributed (with mean 0 and std dev of {dlat/3, dlong/3} as appropriate) and add them to your uniformly distributed points.


回答 10

编辑:这不能回答OP想要问的问题,请留在这里,以防人们发现它有用。

我们使用概率的乘法规则,并结合无穷小。这将导致两行代码来实现所需的结果:

longitude: φ = uniform([0,2pi))
azimuth:   θ = -arcsin(1 - 2*uniform([0,1]))

(在以下坐标系中定义:)

在此处输入图片说明

您的语言通常具有统一的随机数基元。例如,在python中,您可以使用random.random()返回范围内的数字[0,1)。您可以将此数字乘以k以得到范围内的随机数[0,k)。因此在python中,uniform([0,2pi))将表示random.random()*2*math.pi


证明

现在我们不能均匀地分配θ,否则我们将陷入困境。我们希望分配与球面楔形的表面积成比例的概率(此图中的θ实际上为φ):

在此处输入图片说明

在赤道的角位移dφ将导致dφ* r的位移。在任意方位角θ处的位移将是什么?好吧,距z轴的半径为r*sin(θ),因此与楔形相交的“纬度”的弧长为dφ * r*sin(θ)。因此我们计算累积分布,通过对从南极到北极的切片面积进行积分,要采样的区域。

在此处输入图片说明(其中stuff = dφ*r

现在,我们将尝试从中获取CDF的逆样本:http : //en.wikipedia.org/wiki/Inverse_transform_sampling

首先,我们将几乎CDF除以最大值进行归一化。这具有抵消dφ和r的副作用。

azimuthalCDF: cumProb = (sin(θ)+1)/2 from -pi/2 to pi/2

inverseCDF: θ = -sin^(-1)(1 - 2*cumProb)

从而:

let x by a random float in range [0,1]
θ = -arcsin(1-2*x)

edit: This does not answer the question the OP meant to ask, leaving it here in case people find it useful somehow.

We use the multiplication rule of probability, combined with infinitessimals. This results in 2 lines of code to achieve your desired result:

longitude: φ = uniform([0,2pi))
azimuth:   θ = -arcsin(1 - 2*uniform([0,1]))

(defined in the following coordinate system:)

enter image description here

Your language typically has a uniform random number primitive. For example in python you can use random.random() to return a number in the range [0,1). You can multiply this number by k to get a random number in the range [0,k). Thus in python, uniform([0,2pi)) would mean random.random()*2*math.pi.


Proof

Now we can’t assign θ uniformly, otherwise we’d get clumping at the poles. We wish to assign probabilities proportional to the surface area of the spherical wedge (the θ in this diagram is actually φ):

enter image description here

An angular displacement dφ at the equator will result in a displacement of dφ*r. What will that displacement be at an arbitrary azimuth θ? Well, the radius from the z-axis is r*sin(θ), so the arclength of that “latitude” intersecting the wedge is dφ * r*sin(θ). Thus we calculate the cumulative distribution of the area to sample from it, by integrating the area of the slice from the south pole to the north pole.

enter image description here (where stuff=dφ*r)

We will now attempt to get the inverse of the CDF to sample from it: http://en.wikipedia.org/wiki/Inverse_transform_sampling

First we normalize by dividing our almost-CDF by its maximum value. This has the side-effect of cancelling out the dφ and r.

azimuthalCDF: cumProb = (sin(θ)+1)/2 from -pi/2 to pi/2

inverseCDF: θ = -sin^(-1)(1 - 2*cumProb)

Thus:

let x by a random float in range [0,1]
θ = -arcsin(1-2*x)

回答 11

或…放置20个点,计算二十面体面的中心。对于12点,找到二十面体的顶点。对于30点,是二十面体边缘的中点。您可以对四面体,立方体,十二面体和八面体执行相同的操作:一组点位于顶点上,另一组点位于面的中心,另一组点位于边的中心。但是,不能将它们混合在一起。

OR… to place 20 points, compute the centers of the icosahedronal faces. For 12 points, find the vertices of the icosahedron. For 30 points, the mid point of the edges of the icosahedron. you can do the same thing with the tetrahedron, cube, dodecahedron and octahedrons: one set of points is on the vertices, another on the center of the face and another on the center of the edges. They cannot be mixed, however.


回答 12

# create uniform spiral grid
numOfPoints = varargin[0]
vxyz = zeros((numOfPoints,3),dtype=float)
sq0 = 0.00033333333**2
sq2 = 0.9999998**2
sumsq = 2*sq0 + sq2
vxyz[numOfPoints -1] = array([(sqrt(sq0/sumsq)), 
                              (sqrt(sq0/sumsq)), 
                              (-sqrt(sq2/sumsq))])
vxyz[0] = -vxyz[numOfPoints -1] 
phi2 = sqrt(5)*0.5 + 2.5
rootCnt = sqrt(numOfPoints)
prevLongitude = 0
for index in arange(1, (numOfPoints -1), 1, dtype=float):
  zInc = (2*index)/(numOfPoints) -1
  radius = sqrt(1-zInc**2)

  longitude = phi2/(rootCnt*radius)
  longitude = longitude + prevLongitude
  while (longitude > 2*pi): 
    longitude = longitude - 2*pi

  prevLongitude = longitude
  if (longitude > pi):
    longitude = longitude - 2*pi

  latitude = arccos(zInc) - pi/2
  vxyz[index] = array([ (cos(latitude) * cos(longitude)) ,
                        (cos(latitude) * sin(longitude)), 
                        sin(latitude)])
# create uniform spiral grid
numOfPoints = varargin[0]
vxyz = zeros((numOfPoints,3),dtype=float)
sq0 = 0.00033333333**2
sq2 = 0.9999998**2
sumsq = 2*sq0 + sq2
vxyz[numOfPoints -1] = array([(sqrt(sq0/sumsq)), 
                              (sqrt(sq0/sumsq)), 
                              (-sqrt(sq2/sumsq))])
vxyz[0] = -vxyz[numOfPoints -1] 
phi2 = sqrt(5)*0.5 + 2.5
rootCnt = sqrt(numOfPoints)
prevLongitude = 0
for index in arange(1, (numOfPoints -1), 1, dtype=float):
  zInc = (2*index)/(numOfPoints) -1
  radius = sqrt(1-zInc**2)

  longitude = phi2/(rootCnt*radius)
  longitude = longitude + prevLongitude
  while (longitude > 2*pi): 
    longitude = longitude - 2*pi

  prevLongitude = longitude
  if (longitude > pi):
    longitude = longitude - 2*pi

  latitude = arccos(zInc) - pi/2
  vxyz[index] = array([ (cos(latitude) * cos(longitude)) ,
                        (cos(latitude) * sin(longitude)), 
                        sin(latitude)])

回答 13

@robert king这是一个非常不错的解决方案,但其中包含一些草率的错误。我知道它对我有很大帮助,所以不要介意草率。:)这是一个清理的版本。

from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2

lat_dist = lambda lat, R=1.0: R*(1-sin(lat))

#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
    if -halfpi > lat or lat > halfpi:
        raise ValueError("lat must be between -halfpi and halfpi")
    return 2 * pi * R ** 2 * (1-sin(lat))

sphere_lonarea = lambda lon, R=1.0: \
        4 * pi * R ** 2 * lon / twopi

#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
#    = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
        (sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi


def test_sphere(n_lats=10, n_lons=19, radius=540.0):
    total_area = 0.0
    for i_lons in range(n_lons):
        lon0 = twopi * float(i_lons) / n_lons
        lon1 = twopi * float(i_lons+1) / n_lons
        for i_lats in range(n_lats):
            lat0 = asin(2 * float(i_lats) / n_lats - 1)
            lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
            area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
            print("{:} {:}: {:9.4f} to  {:9.4f}, {:9.4f} to  {:9.4f} => area {:10.4f}"
                    .format(i_lats, i_lons
                    , degrees(lat0), degrees(lat1)
                    , degrees(lon0), degrees(lon1)
                    , area))
            total_area += area
    print("total_area = {:10.4f} (difference of {:10.4f})"
            .format(total_area, abs(total_area) - sphere_area(radius)))

test_sphere()

@robert king It’s a really nice solution but has some sloppy bugs in it. I know it helped me a lot though, so never mind the sloppiness. :) Here is a cleaned up version….

from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2

lat_dist = lambda lat, R=1.0: R*(1-sin(lat))

#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
    if -halfpi > lat or lat > halfpi:
        raise ValueError("lat must be between -halfpi and halfpi")
    return 2 * pi * R ** 2 * (1-sin(lat))

sphere_lonarea = lambda lon, R=1.0: \
        4 * pi * R ** 2 * lon / twopi

#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
#    = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
        (sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi


def test_sphere(n_lats=10, n_lons=19, radius=540.0):
    total_area = 0.0
    for i_lons in range(n_lons):
        lon0 = twopi * float(i_lons) / n_lons
        lon1 = twopi * float(i_lons+1) / n_lons
        for i_lats in range(n_lats):
            lat0 = asin(2 * float(i_lats) / n_lats - 1)
            lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
            area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
            print("{:} {:}: {:9.4f} to  {:9.4f}, {:9.4f} to  {:9.4f} => area {:10.4f}"
                    .format(i_lats, i_lons
                    , degrees(lat0), degrees(lat1)
                    , degrees(lon0), degrees(lon1)
                    , area))
            total_area += area
    print("total_area = {:10.4f} (difference of {:10.4f})"
            .format(total_area, abs(total_area) - sphere_area(radius)))

test_sphere()

回答 14

这行得通,而且非常简单。您想要的点数:

    private function moveTweets():void {


        var newScale:Number=Scale(meshes.length,50,500,6,2);
        trace("new scale:"+newScale);


        var l:Number=this.meshes.length;
        var tweetMeshInstance:TweetMesh;
        var destx:Number;
        var desty:Number;
        var destz:Number;
        for (var i:Number=0;i<this.meshes.length;i++){

            tweetMeshInstance=meshes[i];

            var phi:Number = Math.acos( -1 + ( 2 * i ) / l );
            var theta:Number = Math.sqrt( l * Math.PI ) * phi;

            tweetMeshInstance.origX = (sphereRadius+5) * Math.cos( theta ) * Math.sin( phi );
            tweetMeshInstance.origY= (sphereRadius+5) * Math.sin( theta ) * Math.sin( phi );
            tweetMeshInstance.origZ = (sphereRadius+5) * Math.cos( phi );

            destx=sphereRadius * Math.cos( theta ) * Math.sin( phi );
            desty=sphereRadius * Math.sin( theta ) * Math.sin( phi );
            destz=sphereRadius * Math.cos( phi );

            tweetMeshInstance.lookAt(new Vector3D());


            TweenMax.to(tweetMeshInstance, 1, {scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});

        }

    }
    private function onLookAtTween(theMesh:TweetMesh):void {
        theMesh.lookAt(new Vector3D());
    }

This works and it’s deadly simple. As many points as you want:

    private function moveTweets():void {


        var newScale:Number=Scale(meshes.length,50,500,6,2);
        trace("new scale:"+newScale);


        var l:Number=this.meshes.length;
        var tweetMeshInstance:TweetMesh;
        var destx:Number;
        var desty:Number;
        var destz:Number;
        for (var i:Number=0;i<this.meshes.length;i++){

            tweetMeshInstance=meshes[i];

            var phi:Number = Math.acos( -1 + ( 2 * i ) / l );
            var theta:Number = Math.sqrt( l * Math.PI ) * phi;

            tweetMeshInstance.origX = (sphereRadius+5) * Math.cos( theta ) * Math.sin( phi );
            tweetMeshInstance.origY= (sphereRadius+5) * Math.sin( theta ) * Math.sin( phi );
            tweetMeshInstance.origZ = (sphereRadius+5) * Math.cos( phi );

            destx=sphereRadius * Math.cos( theta ) * Math.sin( phi );
            desty=sphereRadius * Math.sin( theta ) * Math.sin( phi );
            destz=sphereRadius * Math.cos( phi );

            tweetMeshInstance.lookAt(new Vector3D());


            TweenMax.to(tweetMeshInstance, 1, {scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});

        }

    }
    private function onLookAtTween(theMesh:TweetMesh):void {
        theMesh.lookAt(new Vector3D());
    }

评估字符串中的数学表达式

问题:评估字符串中的数学表达式

stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16

这将返回以下错误:

Traceback (most recent call last):  
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'

我知道eval可以解决此问题,但是难道没有更好,更重要的是更安全的方法来评估存储在字符串中的数学表达式吗?

stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16

This returns the following error:

Traceback (most recent call last):  
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'

I know that eval can work around this, but isn’t there a better and – more importantly – safer method to evaluate a mathematical expression that is being stored in a string?


回答 0

Pyparsing可用于解析数学表达式。特别是,fourFn.py 显示了如何解析基本算术表达式。在下面,我将fourFn重新包装为一个数值解析器类,以便于重用。

from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
                       ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator

__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''


class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example

    '''

    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')

    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr <<  general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   "sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}

    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)

    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val

你可以这样使用

nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0

result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872

Pyparsing can be used to parse mathematical expressions. In particular, fourFn.py shows how to parse basic arithmetic expressions. Below, I’ve rewrapped fourFn into a numeric parser class for easier reuse.

from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
                       ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator

__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''


class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example

    '''

    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')

    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr <<  general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   "sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}

    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)

    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val

You can use it like this

nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0

result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872

回答 1

eval 是邪恶的

eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory

注意:即使您使用set __builtins__None使用内省也可能爆发:

eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})

使用以下方法评估算术表达式 ast

import ast
import operator as op

# supported operators
operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
             ast.USub: op.neg}

def eval_expr(expr):
    """
    >>> eval_expr('2^6')
    4
    >>> eval_expr('2**6')
    64
    >>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
    -5.0
    """
    return eval_(ast.parse(expr, mode='eval').body)

def eval_(node):
    if isinstance(node, ast.Num): # <number>
        return node.n
    elif isinstance(node, ast.BinOp): # <left> <operator> <right>
        return operators[type(node.op)](eval_(node.left), eval_(node.right))
    elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
        return operators[type(node.op)](eval_(node.operand))
    else:
        raise TypeError(node)

您可以轻松限制每个操作或任何中间结果的允许范围,例如,限制以下项的输入参数a**b

def power(a, b):
    if any(abs(n) > 100 for n in [a, b]):
        raise ValueError((a,b))
    return op.pow(a, b)
operators[ast.Pow] = power

或限制中间结果的大小:

import functools

def limit(max_=None):
    """Return decorator that limits allowed returned values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ret = func(*args, **kwargs)
            try:
                mag = abs(ret)
            except TypeError:
                pass # not applicable
            else:
                if mag > max_:
                    raise ValueError(ret)
            return ret
        return wrapper
    return decorator

eval_ = limit(max_=10**100)(eval_)

>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:

eval is evil

eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory

Note: even if you use set __builtins__ to None it still might be possible to break out using introspection:

eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})

Evaluate arithmetic expression using ast

import ast
import operator as op

# supported operators
operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
             ast.USub: op.neg}

def eval_expr(expr):
    """
    >>> eval_expr('2^6')
    4
    >>> eval_expr('2**6')
    64
    >>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
    -5.0
    """
    return eval_(ast.parse(expr, mode='eval').body)

def eval_(node):
    if isinstance(node, ast.Num): # <number>
        return node.n
    elif isinstance(node, ast.BinOp): # <left> <operator> <right>
        return operators[type(node.op)](eval_(node.left), eval_(node.right))
    elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
        return operators[type(node.op)](eval_(node.operand))
    else:
        raise TypeError(node)

You can easily limit allowed range for each operation or any intermediate result, e.g., to limit input arguments for a**b:

def power(a, b):
    if any(abs(n) > 100 for n in [a, b]):
        raise ValueError((a,b))
    return op.pow(a, b)
operators[ast.Pow] = power

Or to limit magnitude of intermediate results:

import functools

def limit(max_=None):
    """Return decorator that limits allowed returned values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ret = func(*args, **kwargs)
            try:
                mag = abs(ret)
            except TypeError:
                pass # not applicable
            else:
                if mag > max_:
                    raise ValueError(ret)
            return ret
        return wrapper
    return decorator

eval_ = limit(max_=10**100)(eval_)

Example

>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:

回答 2

eval()*的一些更安全的选择:sympy.sympify().evalf()

*sympify根据文档中的以下警告,SymPy 也不安全。

警告:请注意,此函数使用eval,因此不应该在未过滤的输入上使用。

Some safer alternatives to eval() and sympy.sympify().evalf()*:

*SymPy sympify is also unsafe according to the following warning from the documentation.

Warning: Note that this function uses eval, and thus shouldn’t be used on unsanitized input.


回答 3

好的,所以eval的问题在于,即使您摆脱,它也很容易逃脱其沙箱__builtins__。逃避沙箱的所有方法归结为使用getattrobject.__getattribute__(通过.操作员)通过某个允许的对象(''.__class__.__bases__[0].__subclasses__或类似对象)获得对某个危险对象的引用。 getattr通过设置消除__builtins__Noneobject.__getattribute__之所以困难,是因为它不能简单地删除,这既是因为它object是不可变的,又是因为删除它会破坏一切。但是,__getattribute__只能通过.操作员进行访问,因此清除您的输入内容就足以确保eval无法逃脱其沙箱。
在处理公式中,十进制的唯一有效用法是在十进制之前或之后[0-9],因此我们只删除的所有其他实例.

import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})

请注意,虽然python通常将1 + 1.视为1 + 1.0,但这会删除尾部.并留下1 + 1。您可以将)和添加EOF到允许遵循的事物列表中.,但是为什么要麻烦呢?

Okay, so the problem with eval is that it can escape its sandbox too easily, even if you get rid of __builtins__. All the methods for escaping the sandbox come down to using getattr or object.__getattribute__ (via the . operator) to obtain a reference to some dangerous object via some allowed object (''.__class__.__bases__[0].__subclasses__ or similar). getattr is eliminated by setting __builtins__ to None. object.__getattribute__ is the difficult one, since it cannot simply be removed, both because object is immutable and because removing it would break everything. However, __getattribute__ is only accessible via the . operator, so purging that from your input is sufficient to ensure eval cannot escape its sandbox.
In processing formulas, the only valid use of a decimal is when it is preceded or followed by [0-9], so we just remove all other instances of ..

import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})

Note that while python normally treats 1 + 1. as 1 + 1.0, this will remove the trailing . and leave you with 1 + 1. You could add ),, and EOF to the list of things allowed to follow ., but why bother?


回答 4

您可以使用ast模块并编写一个NodeVisitor,以验证每个节点的类型是否属于白名单。

import ast, math

locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})

class Visitor(ast.NodeVisitor):
    def visit(self, node):
       if not isinstance(node, self.whitelist):
           raise ValueError(node)
       return super().visit(node)

    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Num, ast.BinOp,
            ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
            ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)

def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#') : raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception: raise ValueError(expr)

因为它通过白名单而不是黑名单工作,所以它是安全的。它可以访问的唯一函数和变量是您明确为其提供访问权限的函数和变量。我在字典中填充了与数学相关的函数,因此您可以根据需要轻松访问这些函数,但是必须显式使用它。

如果字符串尝试调用未提供的函数或调用任何方法,则将引发异常,并且该异常将不会执行。

因为它使用Python内置的解析器和评估器,所以它也继承了Python的优先级和升级规则。

>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0

上面的代码仅在Python 3上进行了测试。

如果需要,可以在此函数上添加超时装饰器。

You can use the ast module and write a NodeVisitor that verifies that the type of each node is part of a whitelist.

import ast, math

locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})

class Visitor(ast.NodeVisitor):
    def visit(self, node):
       if not isinstance(node, self.whitelist):
           raise ValueError(node)
       return super().visit(node)

    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Num, ast.BinOp,
            ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
            ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)

def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#') : raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception: raise ValueError(expr)

Because it works via a whitelist rather than a blacklist, it is safe. The only functions and variables it can access are those you explicitly give it access to. I populated a dict with math-related functions so you can easily provide access to those if you want, but you have to explicitly use it.

If the string attempts to call functions that haven’t been provided, or invoke any methods, an exception will be raised, and it will not be executed.

Because this uses Python’s built in parser and evaluator, it also inherits Python’s precedence and promotion rules as well.

>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0

The above code has only been tested on Python 3.

If desired, you can add a timeout decorator on this function.


回答 5

究其原因evalexec这么危险的是,默认的compile功能会产生字节码的任何有效的Python表达式,而默认evalexec将执行任何有效的Python字节码。迄今为止,所有答案都集中在限制可以生成的字节码(通过清理输入)或使用AST构建自己的域特定语言。

相反,您可以轻松创建一个简单的eval函数,该函数不能执行任何有害的操作,并且可以轻松地对内存或所用时间进行运行时检查。当然,如果这是简单的数学运算,那将是捷径。

c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]

它的工作方式很简单,在编译过程中可以安全地评估任何常数数学表达式并将其存储为常数。由compile返回的代码对象由组成d,它是的字节码LOAD_CONST,其后是要加载的常量的编号(通常是列表中的最后一个),然后是S,它是用于加载的常量的字节码。RETURN_VALUE。如果此快捷方式不起作用,则表示用户输入不是常量表达式(包含变量或函数调用或类似内容)。

这也为一些更复杂的输入格式打开了大门。例如:

stringExp = "1 + cos(2)"

这实际上需要评估字节码,这仍然非常简单。Python字节码是一种面向堆栈的语言,因此所有事情都是简单的事情TOS=stack.pop(); op(TOS); stack.put(TOS)或类似的事情。关键是只能实现安全的操作码(加载/存储值,数学运算,返回值),而不是不安全的操作码(属性查找)。如果您希望用户能够调用函数(不使用上面的快捷方式的全部原因),可以简单地使您的实现CALL_FUNCTION仅允许“安全”列表中的函数。

from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator

globs = {'sin':sin, 'cos':cos}
safe = globs.values()

stack = LifoQueue()

class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))

class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))


def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r now allowed"%func)
    stack.put(func(*args))

def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])

def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])

def RETURN_VALUE(context):
    return stack.get()

opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME
    opmap['RETURN_VALUE']: RETURN_VALUE,
}

def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret

def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))

显然,此版本的实际版本会更长一些(有119个操作码,其中24个与数学相关)。添加STORE_FAST和另外两个将允许'x=5;return x+x轻松地进行类似或类似的输入。它甚至可以用来执行用户创建的函数,只要用户创建的函数本身是通过VMeval执行的(不要使它们可调用!!!否则它们可能被用作某个地方的回调)。处理循环需要对goto字节码的支持,这意味着从for迭代器更改为最明显的)。while并维护指向当前指令的指针,但这并不难。为了抵制DOS,主循环应检查自计算开始以来经过了多少时间,并且某些运算符应拒绝超出合理范围的输入(BINARY_POWER

尽管此方法比简单表达式的简单语法解析器要长一些(请参见上文,仅获取编译的常量),但它可以轻松扩展到更复杂的输入,并且不需要处理语法(compile将任意复杂的事物简化为一系列简单的说明)。

The reason eval and exec are so dangerous is that the default compile function will generate bytecode for any valid python expression, and the default eval or exec will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.

Instead, you can easily create a simple eval function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, than there is a shortcut.

c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]

The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of d, which is the bytecode for LOAD_CONST, followed by the number of the constant to load (usually the last one in the list), followed by S, which is the bytecode for RETURN_VALUE. If this shortcut doesn’t work, it means that the user input isn’t a constant expression (contains a variable or function call or similar).

This also opens the door to some more sophisticated input formats. For example:

stringExp = "1 + cos(2)"

This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack oriented language, so everything is a simple matter of TOS=stack.pop(); op(TOS); stack.put(TOS) or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simple make your implementation of CALL_FUNCTION only allow functions in a ‘safe’ list.

from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator

globs = {'sin':sin, 'cos':cos}
safe = globs.values()

stack = LifoQueue()

class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))

class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))


def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r now allowed"%func)
    stack.put(func(*args))

def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])

def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])

def RETURN_VALUE(context):
    return stack.get()

opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME
    opmap['RETURN_VALUE']: RETURN_VALUE,
}

def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret

def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))

Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding STORE_FAST and a couple others would allow for input like 'x=5;return x+x or similar, trivially easily. It can even be used to execute user-created functions, so long as the user created functions are themselves executed via VMeval (don’t make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the goto bytecodes, which means changing from a for iterator to while and maintaining a pointer to the current instruction, but isn’t too hard. For resistance to DOS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (BINARY_POWER being the most obvious).

While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn’t require dealing with grammar (compile take anything arbitrarily complicated and reduces it to a sequence of simple instructions).


回答 6

我想我会使用eval(),但首先要检查以确保该字符串是有效的数学表达式,而不是恶意的表达式。您可以使用正则表达式进行验证。

eval() 还采用了其他参数,您可以使用这些参数来限制其操作的命名空间,以提高安全性。

I think I would use eval(), but would first check to make sure the string is a valid mathematical expression, as opposed to something malicious. You could use a regex for the validation.

eval() also takes additional arguments which you can use to restrict the namespace it operates in for greater security.


回答 7

这是一个很晚的答复,但我认为对将来的参考很有用。您可以使用SymPy,而不是编写自己的数学解析器(尽管上面的pyparsing示例很棒)。我没有很多经验,但是它包含的数学引擎比任何人都可能为特定应用程序编写的函数强大得多,并且基本表达式评估非常简单:

>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133

确实很酷!A from sympy import *带来了更多的功能支持,例如触发功能,特殊功能等,但是我在这里避免了显示来自何处的信息。

This is a massively late reply, but I think useful for future reference. Rather than write your own math parser (although the pyparsing example above is great) you could use SymPy. I don’t have a lot of experience with it, but it contains a much more powerful math engine than anyone is likely to write for a specific application and the basic expression evaluation is very easy:

>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133

Very cool indeed! A from sympy import * brings in a lot more function support, such as trig functions, special functions, etc., but I’ve avoided that here to show what’s coming from where.


回答 8

[我知道这是一个老问题,但是当它们弹出时,有必要指出新的有用的解决方案]

自python3.6起,此功能现已内置于语言中,即“ f-strings”

请参阅:PEP 498-文字字符串插值

例如(注意f前缀):

f'{2**4}'
=> '16'

[I know this is an old question, but it is worth pointing out new useful solutions as they pop up]

Since python3.6, this capability is now built into the language, coined “f-strings”.

See: PEP 498 — Literal String Interpolation

For example (note the f prefix):

f'{2**4}'
=> '16'

回答 9

使用eval在干净的命名空间:

>>> ns = {'__builtins__': None}
>>> eval('2 ** 4', ns)
16

干净的命名空间应防止注入。例如:

>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'

否则,您将获得:

>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0

您可能要授予对math模块的访问权限:

>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011

Use eval in a clean namespace:

>>> ns = {'__builtins__': None}
>>> eval('2 ** 4', ns)
16

The clean namespace should prevent injection. For instance:

>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'

Otherwise you would get:

>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0

You might want to give access to the math module:

>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011

回答 10

这是我不使用eval即可解决问题的方法。适用于Python2和Python3。它不适用于负数。

$ python -m pytest test.py

test.py

from solution import Solutions

class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )

solution.py

class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res
        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul
        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))

        return format(exp)

Here’s my solution to the problem without using eval. Works with Python2 and Python3. It doesn’t work with negative numbers.

$ python -m pytest test.py

test.py

from solution import Solutions

class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )

solution.py

class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res
        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul
        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))

        return format(exp)

如何使用numpy.correlate进行自相关?

问题:如何使用numpy.correlate进行自相关?

我需要对一组数字进行自相关,据我了解,这只是一组与自身之间的相关性。

我已经使用numpy的相关函数进行了尝试,但是我不相信结果,因为它几乎总是给出一个向量,其中第一个数字不是应该的最大值。

因此,这个问题实际上是两个问题:

  1. 到底在numpy.correlate做什么?
  2. 如何使用它(或其他方法)进行自相关?

I need to do auto-correlation of a set of numbers, which as I understand it is just the correlation of the set with itself.

I’ve tried it using numpy’s correlate function, but I don’t believe the result, as it almost always gives a vector where the first number is not the largest, as it ought to be.

So, this question is really two questions:

  1. What exactly is numpy.correlate doing?
  2. How can I use it (or something else) to do auto-correlation?

回答 0

要回答您的第一个问题,numpy.correlate(a, v, mode)是对进行反卷积av并给出指定模式限制的结果。的卷积的定义,C(T)=&Sigma; -∞<I <∞一个 v 吨+ I其中-∞<T <∞,允许从结果-∞〜∞,但显然不能存储无限长的数组。因此必须对其进行裁剪,这就是该模式的用处。共有3种不同的模式:完全,相同和有效:

  • “全”模式返回结果为每一个t地方都av有一定的重叠。
  • “相同”模式返回的结果与最短向量(av)的长度相同。
  • 仅当av完全重叠时,“有效”模式才返回结果。该文件numpy.convolve提供了有关模式的更多细节。

关于第二个问题,我想numpy.correlate 是在给您自相关,也给您更多的相关性。自相关用于确定在某个时间差处信号或功能与自身的相似程度。在时间差为0时,自相关应该是最高的,因为信号与其自身相同,因此您希望自相关结果数组中的第一个元素最大。但是,相关不是在时间差为0时开始的。它以负的时间差开始,接近0,然后变为正值。也就是说,您期望:

自相关(A)=&Sigma; -∞<I <∞一个 v 吨+ I其中0 <= T <∞

但是您得到的是:

自相关(A)=&Sigma; -∞<I <∞一个 v 吨+ I其中-∞<T <∞

您需要做的是获取相关结果的后半部分,这应该是您要寻找的自相关。一个简单的python函数可以做到:

def autocorr(x):
    result = numpy.correlate(x, x, mode='full')
    return result[result.size/2:]

当然,您将需要进行错误检查以确保它x实际上是一维数组。另外,这种解释可能并不是最严格的数学解释。我一直在讨论无限性,因为卷积的定义使用了无限性,但这不一定适用于自相关。因此,这种解释的理论部分可能有点儿怪异,但希望实际结果会有所帮助。这些 有关自相关的页面非常有用,如果您不介意使用符号和繁琐的概念,可以为您提供更好的理论背景。

To answer your first question, numpy.correlate(a, v, mode) is performing the convolution of a with the reverse of v and giving the results clipped by the specified mode. The definition of convolution, C(t)=∑ -∞ < i < ∞ aivt+i where -∞ < t < ∞, allows for results from -∞ to ∞, but you obviously can’t store an infinitely long array. So it has to be clipped, and that is where the mode comes in. There are 3 different modes: full, same, & valid:

  • “full” mode returns results for every t where both a and v have some overlap.
  • “same” mode returns a result with the same length as the shortest vector (a or v).
  • “valid” mode returns results only when a and v completely overlap each other. The documentation for numpy.convolve gives more detail on the modes.

For your second question, I think numpy.correlate is giving you the autocorrelation, it is just giving you a little more as well. The autocorrelation is used to find how similar a signal, or function, is to itself at a certain time difference. At a time difference of 0, the auto-correlation should be the highest because the signal is identical to itself, so you expected that the first element in the autocorrelation result array would be the greatest. However, the correlation is not starting at a time difference of 0. It starts at a negative time difference, closes to 0, and then goes positive. That is, you were expecting:

autocorrelation(a) = ∑ -∞ < i < ∞ aivt+i where 0 <= t < ∞

But what you got was:

autocorrelation(a) = ∑ -∞ < i < ∞ aivt+i where -∞ < t < ∞

What you need to do is take the last half of your correlation result, and that should be the autocorrelation you are looking for. A simple python function to do that would be:

def autocorr(x):
    result = numpy.correlate(x, x, mode='full')
    return result[result.size/2:]

You will, of course, need error checking to make sure that x is actually a 1-d array. Also, this explanation probably isn’t the most mathematically rigorous. I’ve been throwing around infinities because the definition of convolution uses them, but that doesn’t necessarily apply for autocorrelation. So, the theoretical portion of this explanation may be slightly wonky, but hopefully the practical results are helpful. These pages on autocorrelation are pretty helpful, and can give you a much better theoretical background if you don’t mind wading through the notation and heavy concepts.


回答 1

自相关有两个版本:统计和卷积。它们都做相同的事情,只是有一点点细节:统计版本被标准化为间隔[-1,1]。这是如何进行统计的示例:

def acf(x, length=20):
    return numpy.array([1]+[numpy.corrcoef(x[:-i], x[i:])[0,1]  \
        for i in range(1, length)])

Auto-correlation comes in two versions: statistical and convolution. They both do the same, except for a little detail: The statistical version is normalized to be on the interval [-1,1]. Here is an example of how you do the statistical one:

def acf(x, length=20):
    return numpy.array([1]+[numpy.corrcoef(x[:-i], x[i:])[0,1]  \
        for i in range(1, length)])

回答 2

使用numpy.corrcoef函数而不是numpy.correlate计算t的滞后量的统计相关性:

def autocorr(x, t=1):
    return numpy.corrcoef(numpy.array([x[:-t], x[t:]]))

Use the numpy.corrcoef function instead of numpy.correlate to calculate the statistical correlation for a lag of t:

def autocorr(x, t=1):
    return numpy.corrcoef(numpy.array([x[:-t], x[t:]]))

回答 3

我认为有两件事使该主题更加混乱:

  1. 统计与信号处理的定义:正如其他人指出的那样,在统计中,我们将自相关归一化为[-1,1]。
  2. 部分与非部分均值/方差:当时间序列在滞后> 0时移动时,它们的重叠大小将始终<原始长度。我们使用原始(非局部)的均值和标准差,还是始终使用不断变化的重叠(局部)计算新的均值和标准差有所不同。(可能对此有一个正式的术语,但现在我要使用“部分”)。

我创建了5个函数来计算1d数组的自相关,具有部分与非部分的区别。一些使用统计中的公式,一些使用在信号处理意义上的相关性,这也可以通过FFT完成。但是所有结果都是统计信息定义中的自相关,因此它们说明了它们如何相互链接。代码如下:

import numpy
import matplotlib.pyplot as plt

def autocorr1(x,lags):
    '''numpy.corrcoef, partial'''

    corr=[1. if l==0 else numpy.corrcoef(x[l:],x[:-l])[0][1] for l in lags]
    return numpy.array(corr)

def autocorr2(x,lags):
    '''manualy compute, non partial'''

    mean=numpy.mean(x)
    var=numpy.var(x)
    xp=x-mean
    corr=[1. if l==0 else numpy.sum(xp[l:]*xp[:-l])/len(x)/var for l in lags]

    return numpy.array(corr)

def autocorr3(x,lags):
    '''fft, pad 0s, non partial'''

    n=len(x)
    # pad 0s to 2n-1
    ext_size=2*n-1
    # nearest power of 2
    fsize=2**numpy.ceil(numpy.log2(ext_size)).astype('int')

    xp=x-numpy.mean(x)
    var=numpy.var(x)

    # do fft and ifft
    cf=numpy.fft.fft(xp,fsize)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real
    corr=corr/var/n

    return corr[:len(lags)]

def autocorr4(x,lags):
    '''fft, don't pad 0s, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean

    cf=numpy.fft.fft(xp)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real/var/len(x)

    return corr[:len(lags)]

def autocorr5(x,lags):
    '''numpy.correlate, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean
    corr=numpy.correlate(xp,xp,'full')[len(x)-1:]/var/len(x)

    return corr[:len(lags)]


if __name__=='__main__':

    y=[28,28,26,19,16,24,26,24,24,29,29,27,31,26,38,23,13,14,28,19,19,\
            17,22,2,4,5,7,8,14,14,23]
    y=numpy.array(y).astype('float')

    lags=range(15)
    fig,ax=plt.subplots()

    for funcii, labelii in zip([autocorr1, autocorr2, autocorr3, autocorr4,
        autocorr5], ['np.corrcoef, partial', 'manual, non-partial',
            'fft, pad 0s, non-partial', 'fft, no padding, non-partial',
            'np.correlate, non-partial']):

        cii=funcii(y,lags)
        print(labelii)
        print(cii)
        ax.plot(lags,cii,label=labelii)

    ax.set_xlabel('lag')
    ax.set_ylabel('correlation coefficient')
    ax.legend()
    plt.show()

这是输出图:

在此处输入图片说明

我们看不到全部5条线,因为其中3条线重叠(在紫色处)。重叠都是非局部自相关。这是因为来自信号处理方法(np.correlateFFT)的计算不会为每个重叠计算出不同的均值/标准差。

另请注意,fft, no padding, non-partial(红线)结果是不同的,因为在执行FFT之前,时间序列未填充0s,因此是循环FFT。我无法详细解释原因,这就是我从其他地方学到的。

I think there are 2 things that add confusion to this topic:

  1. statistical v.s. signal processing definition: as others have pointed out, in statistics we normalize auto-correlation into [-1,1].
  2. partial v.s. non-partial mean/variance: when the timeseries shifts at a lag>0, their overlap size will always < original length. Do we use the mean and std of the original (non-partial), or always compute a new mean and std using the ever changing overlap (partial) makes a difference. (There’s probably a formal term for this, but I’m gonna use “partial” for now).

I’ve created 5 functions that compute auto-correlation of a 1d array, with partial v.s. non-partial distinctions. Some use formula from statistics, some use correlate in the signal processing sense, which can also be done via FFT. But all results are auto-correlations in the statistics definition, so they illustrate how they are linked to each other. Code below:

import numpy
import matplotlib.pyplot as plt

def autocorr1(x,lags):
    '''numpy.corrcoef, partial'''

    corr=[1. if l==0 else numpy.corrcoef(x[l:],x[:-l])[0][1] for l in lags]
    return numpy.array(corr)

def autocorr2(x,lags):
    '''manualy compute, non partial'''

    mean=numpy.mean(x)
    var=numpy.var(x)
    xp=x-mean
    corr=[1. if l==0 else numpy.sum(xp[l:]*xp[:-l])/len(x)/var for l in lags]

    return numpy.array(corr)

def autocorr3(x,lags):
    '''fft, pad 0s, non partial'''

    n=len(x)
    # pad 0s to 2n-1
    ext_size=2*n-1
    # nearest power of 2
    fsize=2**numpy.ceil(numpy.log2(ext_size)).astype('int')

    xp=x-numpy.mean(x)
    var=numpy.var(x)

    # do fft and ifft
    cf=numpy.fft.fft(xp,fsize)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real
    corr=corr/var/n

    return corr[:len(lags)]

def autocorr4(x,lags):
    '''fft, don't pad 0s, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean

    cf=numpy.fft.fft(xp)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real/var/len(x)

    return corr[:len(lags)]

def autocorr5(x,lags):
    '''numpy.correlate, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean
    corr=numpy.correlate(xp,xp,'full')[len(x)-1:]/var/len(x)

    return corr[:len(lags)]


if __name__=='__main__':

    y=[28,28,26,19,16,24,26,24,24,29,29,27,31,26,38,23,13,14,28,19,19,\
            17,22,2,4,5,7,8,14,14,23]
    y=numpy.array(y).astype('float')

    lags=range(15)
    fig,ax=plt.subplots()

    for funcii, labelii in zip([autocorr1, autocorr2, autocorr3, autocorr4,
        autocorr5], ['np.corrcoef, partial', 'manual, non-partial',
            'fft, pad 0s, non-partial', 'fft, no padding, non-partial',
            'np.correlate, non-partial']):

        cii=funcii(y,lags)
        print(labelii)
        print(cii)
        ax.plot(lags,cii,label=labelii)

    ax.set_xlabel('lag')
    ax.set_ylabel('correlation coefficient')
    ax.legend()
    plt.show()

Here is the output figure:

enter image description here

We don’t see all 5 lines because 3 of them overlap (at the purple). The overlaps are all non-partial auto-correlations. This is because computations from the signal processing methods (np.correlate, FFT) don’t compute a different mean/std for each overlap.

Also note that the fft, no padding, non-partial (red line) result is different, because it didn’t pad the timeseries with 0s before doing FFT, so it’s circular FFT. I can’t explain in detail why, that’s what I learned from elsewhere.


回答 4

当我遇到相同的问题时,我想与您分享几行代码。实际上,到目前为止,关于stackoverflow中的自相关的文章非常多。如果将自相关定义为a(x, L) = sum(k=0,N-L-1)((xk-xbar)*(x(k+L)-xbar))/sum(k=0,N-1)((xk-xbar)**2)[这是IDL的a_correlate函数中给出的定义,并且与我在问题#12269834的答案2中看到的一致 ],那么以下内容似乎给出了正确的结果:

import numpy as np
import matplotlib.pyplot as plt

# generate some data
x = np.arange(0.,6.12,0.01)
y = np.sin(x)
# y = np.random.uniform(size=300)
yunbiased = y-np.mean(y)
ynorm = np.sum(yunbiased**2)
acor = np.correlate(yunbiased, yunbiased, "same")/ynorm
# use only second half
acor = acor[len(acor)/2:]

plt.plot(acor)
plt.show()

如您所见,我已经用正弦曲线和均匀的随机分布对其进行了测试,两个结果看起来都与我期望的一样。请注意,我mode="same"代替mode="full"其他人使用了。

As I just ran into the same problem, I would like to share a few lines of code with you. In fact there are several rather similar posts about autocorrelation in stackoverflow by now. If you define the autocorrelation as a(x, L) = sum(k=0,N-L-1)((xk-xbar)*(x(k+L)-xbar))/sum(k=0,N-1)((xk-xbar)**2) [this is the definition given in IDL’s a_correlate function and it agrees with what I see in answer 2 of question #12269834], then the following seems to give the correct results:

import numpy as np
import matplotlib.pyplot as plt

# generate some data
x = np.arange(0.,6.12,0.01)
y = np.sin(x)
# y = np.random.uniform(size=300)
yunbiased = y-np.mean(y)
ynorm = np.sum(yunbiased**2)
acor = np.correlate(yunbiased, yunbiased, "same")/ynorm
# use only second half
acor = acor[len(acor)/2:]

plt.plot(acor)
plt.show()

As you see I have tested this with a sin curve and a uniform random distribution, and both results look like I would expect them. Note that I used mode="same" instead of mode="full" as the others did.


回答 5

您的问题1已在此处几个出色的答案中进行了广泛讨论。

我想与您分享几行代码,这些代码仅允许您根据自相关的数学属性来计算信号的自相关。也就是说,可以通过以下方式计算自相关:

  1. 从信号中减去平均值并获得无偏信号

  2. 计算无偏信号的傅立叶变换

  3. 通过采用无偏信号的傅立叶变换的每个值的平方范数来计算信号的功率谱密度

  4. 计算功率谱密度的傅立叶逆变换

  5. 通过无偏信号的平方和归一化功率谱密度的傅立叶逆变换,并且仅取所得矢量的一半

执行此操作的代码如下:

def autocorrelation (x) :
    """
    Compute the autocorrelation of the signal, based on the properties of the
    power spectral density of the signal.
    """
    xp = x-np.mean(x)
    f = np.fft.fft(xp)
    p = np.array([np.real(v)**2+np.imag(v)**2 for v in f])
    pi = np.fft.ifft(p)
    return np.real(pi)[:x.size/2]/np.sum(xp**2)

Your question 1 has been already extensively discussed in several excellent answers here.

I thought to share with you a few lines of code that allow you to compute the autocorrelation of a signal based only on the mathematical properties of the autocorrelation. That is, the autocorrelation may be computed in the following way:

  1. subtract the mean from the signal and obtain an unbiased signal

  2. compute the Fourier transform of the unbiased signal

  3. compute the power spectral density of the signal, by taking the square norm of each value of the Fourier transform of the unbiased signal

  4. compute the inverse Fourier transform of the power spectral density

  5. normalize the inverse Fourier transform of the power spectral density by the sum of the squares of the unbiased signal, and take only half of the resulting vector

The code to do this is the following:

def autocorrelation (x) :
    """
    Compute the autocorrelation of the signal, based on the properties of the
    power spectral density of the signal.
    """
    xp = x-np.mean(x)
    f = np.fft.fft(xp)
    p = np.array([np.real(v)**2+np.imag(v)**2 for v in f])
    pi = np.fft.ifft(p)
    return np.real(pi)[:x.size/2]/np.sum(xp**2)

回答 6

我是一名计算生物学家,当我不得不计算几个随机过程的时间序列之间的自相关/互相关性时,我意识到自己np.correlate没有做我需要的工作。

确实,似乎缺少的np.correlate是在距离𝜏 上所有可能的几个时间点上求平均值

这是我定义函数的方式,以完成所需的工作:

def autocross(x, y):
    c = np.correlate(x, y, "same")
    v = [c[i]/( len(x)-abs( i - (len(x)/2)  ) ) for i in range(len(c))]
    return v

在我看来,以前的答案都没有涉及这种自相关/互相关的情况:希望这个答案对像我这样从事随机过程的人可能有用。

I’m a computational biologist, and when I had to compute the auto/cross-correlations between couples of time series of stochastic processes I realized that np.correlate was not doing the job I needed.

Indeed, what seems to be missing from np.correlate is the averaging over all the possible couples of time points at distance 𝜏.

Here is how I defined a function doing what I needed:

def autocross(x, y):
    c = np.correlate(x, y, "same")
    v = [c[i]/( len(x)-abs( i - (len(x)/2)  ) ) for i in range(len(c))]
    return v

It seems to me none of the previous answers cover this instance of auto/cross-correlation: hope this answer may be useful to somebody working on stochastic processes like me.


回答 7

我使用talib.CORREL进行这种自相关,我怀疑您可以对其他软件包进行同样的操作:

def autocorrelate(x, period):

    # x is a deep indicator array 
    # period of sample and slices of comparison

    # oldest data (period of input array) may be nan; remove it
    x = x[-np.count_nonzero(~np.isnan(x)):]
    # subtract mean to normalize indicator
    x -= np.mean(x)
    # isolate the recent sample to be autocorrelated
    sample = x[-period:]
    # create slices of indicator data
    correls = []
    for n in range((len(x)-1), period, -1):
        alpha = period + n
        slices = (x[-alpha:])[:period]
        # compare each slice to the recent sample
        correls.append(ta.CORREL(slices, sample, period)[-1])
    # fill in zeros for sample overlap period of recent correlations    
    for n in range(period,0,-1):
        correls.append(0)
    # oldest data (autocorrelation period) will be nan; remove it
    correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])      

    return correls

# CORRELATION OF BEST FIT
# the highest value correlation    
max_value = np.max(correls)
# index of the best correlation
max_index = np.argmax(correls)

I use talib.CORREL for autocorrelation like this, I suspect you could do the same with other packages:

def autocorrelate(x, period):

    # x is a deep indicator array 
    # period of sample and slices of comparison

    # oldest data (period of input array) may be nan; remove it
    x = x[-np.count_nonzero(~np.isnan(x)):]
    # subtract mean to normalize indicator
    x -= np.mean(x)
    # isolate the recent sample to be autocorrelated
    sample = x[-period:]
    # create slices of indicator data
    correls = []
    for n in range((len(x)-1), period, -1):
        alpha = period + n
        slices = (x[-alpha:])[:period]
        # compare each slice to the recent sample
        correls.append(ta.CORREL(slices, sample, period)[-1])
    # fill in zeros for sample overlap period of recent correlations    
    for n in range(period,0,-1):
        correls.append(0)
    # oldest data (autocorrelation period) will be nan; remove it
    correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])      

    return correls

# CORRELATION OF BEST FIT
# the highest value correlation    
max_value = np.max(correls)
# index of the best correlation
max_index = np.argmax(correls)

回答 8

使用傅立叶变换和卷积定理

时间复杂度为 N * log(N)

def autocorr1(x):
    r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
    return r2[:len(x)//2]

这是一个标准化且无偏见的版本,它也是 N * log(N)

def autocorr2(x):
    r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
    c=(r2/x.shape-np.mean(x)**2)/np.std(x)**2
    return c[:len(x)//2]

A. Levy提供的方法有效,但是我在PC上对其进行了测试,其时间复杂度似乎为N * N

def autocorr(x):
    result = numpy.correlate(x, x, mode='full')
    return result[result.size/2:]

Using Fourier transformation and the convolution theorem

The time complexicity is N*log(N)

def autocorr1(x):
    r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
    return r2[:len(x)//2]

Here is a normalized and unbiased version, it is also N*log(N)

def autocorr2(x):
    r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
    c=(r2/x.shape-np.mean(x)**2)/np.std(x)**2
    return c[:len(x)//2]

The method provided by A. Levy works, but I tested it in my PC, its time complexicity seems to be N*N

def autocorr(x):
    result = numpy.correlate(x, x, mode='full')
    return result[result.size/2:]

回答 9

statsmodels.tsa.stattools.acf()中提供了numpy.correlate的替代方法。这就产生了不断降低的自相关函数,如OP所述。实现起来非常简单:

from statsmodels.tsa import stattools
# x = 1-D array
# Yield normalized autocorrelation function of number lags
autocorr = stattools.acf( x )

# Get autocorrelation coefficient at lag = 1
autocorr_coeff = autocorr[1]

默认行为是停滞40次,但是可以使用nlag=针对特定应用程序的选项进行调整。该页面底部提供了该功能背后的统计信息

An alternative to numpy.correlate is available in statsmodels.tsa.stattools.acf(). This yields a continuously decreasing autocorrelation function like the one described by OP. Implementing it is fairly simple:

from statsmodels.tsa import stattools
# x = 1-D array
# Yield normalized autocorrelation function of number lags
autocorr = stattools.acf( x )

# Get autocorrelation coefficient at lag = 1
autocorr_coeff = autocorr[1]

The default behavior is to stop at 40 nlags, but this can be adjusted with the nlag= option for your specific application. There is a citation at the bottom of the page for the statistics behind the function.


回答 10

我认为对OP问题的真正答案简明地包含在Numpy.correlate文档的以下摘录中:

mode : {'valid', 'same', 'full'}, optional
    Refer to the `convolve` docstring.  Note that the default
    is `valid`, unlike `convolve`, which uses `full`.

这意味着,当不使用’mode’定义时,Numpy.correlate函数在为其两个输入参数赋予相同的矢量时(即-用于执行自相关时)将返回标量。

I think the real answer to the OP’s question is succinctly contained in this excerpt from the Numpy.correlate documentation:

mode : {'valid', 'same', 'full'}, optional
    Refer to the `convolve` docstring.  Note that the default
    is `valid`, unlike `convolve`, which uses `full`.

This implies that, when used with no ‘mode’ definition, the Numpy.correlate function will return a scalar, when given the same vector for its two input arguments (i.e. – when used to perform autocorrelation).


回答 11

一个没有熊猫的简单解决方案:

import numpy as np

def auto_corrcoef(x):
   return np.corrcoef(x[1:-1], x[2:])[0,1]

A simple solution without pandas:

import numpy as np

def auto_corrcoef(x):
   return np.corrcoef(x[1:-1], x[2:])[0,1]

回答 12

给定pandas datatime系列收益,绘制统计自相关图:

import matplotlib.pyplot as plt

def plot_autocorr(returns, lags):
    autocorrelation = []
    for lag in range(lags+1):
        corr_lag = returns.corr(returns.shift(-lag)) 
        autocorrelation.append(corr_lag)
    plt.plot(range(lags+1), autocorrelation, '--o')
    plt.xticks(range(lags+1))
    return np.array(autocorrelation)

Plot the statistical autocorrelation given a pandas datatime Series of returns:

import matplotlib.pyplot as plt

def plot_autocorr(returns, lags):
    autocorrelation = []
    for lag in range(lags+1):
        corr_lag = returns.corr(returns.shift(-lag)) 
        autocorrelation.append(corr_lag)
    plt.plot(range(lags+1), autocorrelation, '--o')
    plt.xticks(range(lags+1))
    return np.array(autocorrelation)

获取数的所有除数的最佳方法是什么?

问题:获取数的所有除数的最佳方法是什么?

这是非常愚蠢的方式:

def divisorGenerator(n):
    for i in xrange(1,n/2+1):
        if n%i == 0: yield i
    yield n

我想要得到的结果与此类似,但是我想要一个更智能的算法(这个算法太慢而且太笨了:-)

我可以很快找到主要因素及其多样性。我有一个生成器以这种方式生成因子:

(因数1,多重性1)(因数
2,多重性2)
(因数3,多重性3)
等等…

即输出

for i in factorGenerator(100):
    print i

是:

(2, 2)
(5, 2)

我不知道这对我想做的事情有多大帮助(我为其他问题编写了代码),无论如何,我都希望有一种更聪明的制作方法

for i in divisorGen(100):
    print i

输出:

1
2
4
5
10
20
25
50
100

更新:非常感谢Greg Hewgill和他的“智能方式” :)计算100000000的所有除数,而用39s的方式计算了我的机器上愚蠢的方式花费了0.01s,这很酷:D

更新2:别说这是这篇文章的重复。计算给定数的除数无需计算所有除数。这是一个不同的问题,如果您认为不是这样,那么请在Wikipedia上查找“除数函数”。在发布之前,请先阅读问题和答案,如果您不明白主题是什么,请不要添加无用且已经给出答案的内容。

Here’s the very dumb way:

def divisorGenerator(n):
    for i in xrange(1,n/2+1):
        if n%i == 0: yield i
    yield n

The result I’d like to get is similar to this one, but I’d like a smarter algorithm (this one it’s too much slow and dumb :-)

I can find prime factors and their multiplicity fast enough. I’ve an generator that generates factor in this way:

(factor1, multiplicity1)
(factor2, multiplicity2)
(factor3, multiplicity3)
and so on…

i.e. the output of

for i in factorGenerator(100):
    print i

is:

(2, 2)
(5, 2)

I don’t know how much is this useful for what I want to do (I coded it for other problems), anyway I’d like a smarter way to make

for i in divisorGen(100):
    print i

output this:

1
2
4
5
10
20
25
50
100

UPDATE: Many thanks to Greg Hewgill and his “smart way” :) Calculating all divisors of 100000000 took 0.01s with his way against the 39s that the dumb way took on my machine, very cool :D

UPDATE 2: Stop saying this is a duplicate of this post. Calculating the number of divisor of a given number doesn’t need to calculate all the divisors. It’s a different problem, if you think it’s not then look for “Divisor function” on wikipedia. Read the questions and the answer before posting, if you do not understand what is the topic just don’t add not useful and already given answers.


回答 0

给定您的factorGenerator功能,这里divisorGen应该可以工作:

def divisorGen(n):
    factors = list(factorGenerator(n))
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return

该算法的整体效率将完全取决于的效率factorGenerator

Given your factorGenerator function, here is a divisorGen that should work:

def divisorGen(n):
    factors = list(factorGenerator(n))
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return

The overall efficiency of this algorithm will depend entirely on the efficiency of the factorGenerator.


回答 1

要扩展Shimi所说的话,您应该只在1到n的平方根之间运行循环。然后找到对,执行n / i,这将覆盖整个问题空间。

还要指出的是,这是一个NP或“困难”的问题。穷举搜索(您正在执行的方式)与保证答案的效果差不多。加密算法等使用此事实来帮助保护它们。如果有人要解决这个问题,那么我们目前大多数的“安全”通信,即使不是全部,也会变得不安全。

Python代码:

import math

def divisorGenerator(n):
    large_divisors = []
    for i in xrange(1, int(math.sqrt(n) + 1)):
        if n % i == 0:
            yield i
            if i*i != n:
                large_divisors.append(n / i)
    for divisor in reversed(large_divisors):
        yield divisor

print list(divisorGenerator(100))

哪个应该输出类似以下的列表:

[1、2、4、5、10、20、25、50、100]

To expand on what Shimi has said, you should only be running your loop from 1 to the square root of n. Then to find the pair, do n / i, and this will cover the whole problem space.

As was also noted, this is a NP, or ‘difficult’ problem. Exhaustive search, the way you are doing it, is about as good as it gets for guaranteed answers. This fact is used by encryption algorithms and the like to help secure them. If someone were to solve this problem, most if not all of our current ‘secure’ communication would be rendered insecure.

Python code:

import math

def divisorGenerator(n):
    large_divisors = []
    for i in xrange(1, int(math.sqrt(n) + 1)):
        if n % i == 0:
            yield i
            if i*i != n:
                large_divisors.append(n / i)
    for divisor in reversed(large_divisors):
        yield divisor

print list(divisorGenerator(100))

Which should output a list like:

[1, 2, 4, 5, 10, 20, 25, 50, 100]

回答 2

尽管已经有很多解决方案,但我确实必须发布此内容:)

这是:

  • 可读的
  • 自包含,可复制并粘贴
  • 快速(在有很多主要因素和因数的情况下,比公认的解决方案快10倍以上)
  • 符合python3,python2和pypy

码:

def divisors(n):
    # get factors and their counts
    factors = {}
    nn = n
    i = 2
    while i*i <= nn:
        while nn % i == 0:
            factors[i] = factors.get(i, 0) + 1
            nn //= i
        i += 1
    if nn > 1:
        factors[nn] = factors.get(nn, 0) + 1

    primes = list(factors.keys())

    # generates factors from primes[k:] subset
    def generate(k):
        if k == len(primes):
            yield 1
        else:
            rest = generate(k+1)
            prime = primes[k]
            for factor in rest:
                prime_to_i = 1
                # prime_to_i iterates prime**i values, i being all possible exponents
                for _ in range(factors[prime] + 1):
                    yield factor * prime_to_i
                    prime_to_i *= prime

    # in python3, `yield from generate(0)` would also work
    for factor in generate(0):
        yield factor

Although there are already many solutions to this, I really have to post this :)

This one is:

  • readable
  • short
  • self contained, copy & paste ready
  • quick (in cases with a lot of prime factors and divisors, > 10 times faster than the accepted solution)
  • python3, python2 and pypy compliant

Code:

def divisors(n):
    # get factors and their counts
    factors = {}
    nn = n
    i = 2
    while i*i <= nn:
        while nn % i == 0:
            factors[i] = factors.get(i, 0) + 1
            nn //= i
        i += 1
    if nn > 1:
        factors[nn] = factors.get(nn, 0) + 1

    primes = list(factors.keys())

    # generates factors from primes[k:] subset
    def generate(k):
        if k == len(primes):
            yield 1
        else:
            rest = generate(k+1)
            prime = primes[k]
            for factor in rest:
                prime_to_i = 1
                # prime_to_i iterates prime**i values, i being all possible exponents
                for _ in range(factors[prime] + 1):
                    yield factor * prime_to_i
                    prime_to_i *= prime

    # in python3, `yield from generate(0)` would also work
    for factor in generate(0):
        yield factor

回答 3

我认为您可以停在,math.sqrt(n)而不是n / 2。

我会举一个例子,以便您容易理解。现在sqrt(28)5.29这样ceil(5.29)将为6所以我,如果我将在6停止那么我将可以得到所有的除数。怎么样?

首先查看代码,然后查看图片:

import math
def divisors(n):
    divs = [1]
    for i in xrange(2,int(math.sqrt(n))+1):
        if n%i == 0:
            divs.extend([i,n/i])
    divs.extend([n])
    return list(set(divs))

现在,请参见下图:

可以说我已经添加1到除数列表中,i=2所以我从

28的除数

因此,在所有迭代的末尾,因为我将商和除数添加到列表中,所以填充了28的所有除数。

资料来源:如何确定数字的除数

I think you can stop at math.sqrt(n) instead of n/2.

I will give you example so that you can understand it easily. Now the sqrt(28) is 5.29 so ceil(5.29) will be 6. So I if I will stop at 6 then I will can get all the divisors. How?

First see the code and then see image:

import math
def divisors(n):
    divs = [1]
    for i in xrange(2,int(math.sqrt(n))+1):
        if n%i == 0:
            divs.extend([i,n/i])
    divs.extend([n])
    return list(set(divs))

Now, See the image below:

Lets say I have already added 1 to my divisors list and I start with i=2 so

Divisors of a 28

So at the end of all the iterations as I have added the quotient and the divisor to my list all the divisors of 28 are populated.

Source: How to determine the divisors of a number


回答 4

我喜欢Greg解决方案,但我希望它更像python。我觉得它会更快,更易读。所以经过一段时间的编码后,我想到了这一点。

要创建列表的笛卡尔积,需要前两个功能。一旦出现此问题,便可以重复使用。顺便说一下,我必须自己编写程序,如果有人知道该问题的标准解决方案,请随时与我联系。

现在,“ Factorgenerator”将返回一个字典。然后将字典放入“除数”中,后者使用字典首先生成一个列表列表,其中每个列表都是具有p素数的p ^ n形式的因子的列表。然后,我们生成这些列表的笛卡尔乘积,最后使用Greg的解决方案生成除数。我们对它们进行排序,然后将其退回。

我测试了它,它似乎比以前的版本要快一些。我将它作为一个更大的程序的一部分进行了测试,所以我不能真正说出它快多少。

彼得罗·斯佩罗尼(Pietrosperoni点它)

from math import sqrt


##############################################################
### cartesian product of lists ##################################
##############################################################

def appendEs2Sequences(sequences,es):
    result=[]
    if not sequences:
        for e in es:
            result.append([e])
    else:
        for e in es:
            result+=[seq+[e] for seq in sequences]
    return result


def cartesianproduct(lists):
    """
    given a list of lists,
    returns all the possible combinations taking one element from each list
    The list does not have to be of equal length
    """
    return reduce(appendEs2Sequences,lists,[])

##############################################################
### prime factors of a natural ##################################
##############################################################

def primefactors(n):
    '''lists prime factors, from greatest to smallest'''  
    i = 2
    while i<=sqrt(n):
        if n%i==0:
            l = primefactors(n/i)
            l.append(i)
            return l
        i+=1
    return [n]      # n is prime


##############################################################
### factorization of a natural ##################################
##############################################################

def factorGenerator(n):
    p = primefactors(n)
    factors={}
    for p1 in p:
        try:
            factors[p1]+=1
        except KeyError:
            factors[p1]=1
    return factors

def divisors(n):
    factors = factorGenerator(n)
    divisors=[]
    listexponents=[map(lambda x:k**x,range(0,factors[k]+1)) for k in factors.keys()]
    listfactors=cartesianproduct(listexponents)
    for f in listfactors:
        divisors.append(reduce(lambda x, y: x*y, f, 1))
    divisors.sort()
    return divisors



print divisors(60668796879)

PS这是我第一次发布到stackoverflow。我期待任何反馈。

I like Greg solution, but I wish it was more python like. I feel it would be faster and more readable; so after some time of coding I came out with this.

The first two functions are needed to make the cartesian product of lists. And can be reused whnever this problem arises. By the way, I had to program this myself, if anyone knows of a standard solution for this problem, please feel free to contact me.

“Factorgenerator” now returns a dictionary. And then the dictionary is fed into “divisors”, who uses it to generate first a list of lists, where each list is the list of the factors of the form p^n with p prime. Then we make the cartesian product of those lists, and we finally use Greg’ solution to generate the divisor. We sort them, and return them.

I tested it and it seem to be a bit faster than the previous version. I tested it as part of a bigger program, so I can’t really say how much is it faster though.

Pietro Speroni (pietrosperoni dot it)

from math import sqrt


##############################################################
### cartesian product of lists ##################################
##############################################################

def appendEs2Sequences(sequences,es):
    result=[]
    if not sequences:
        for e in es:
            result.append([e])
    else:
        for e in es:
            result+=[seq+[e] for seq in sequences]
    return result


def cartesianproduct(lists):
    """
    given a list of lists,
    returns all the possible combinations taking one element from each list
    The list does not have to be of equal length
    """
    return reduce(appendEs2Sequences,lists,[])

##############################################################
### prime factors of a natural ##################################
##############################################################

def primefactors(n):
    '''lists prime factors, from greatest to smallest'''  
    i = 2
    while i<=sqrt(n):
        if n%i==0:
            l = primefactors(n/i)
            l.append(i)
            return l
        i+=1
    return [n]      # n is prime


##############################################################
### factorization of a natural ##################################
##############################################################

def factorGenerator(n):
    p = primefactors(n)
    factors={}
    for p1 in p:
        try:
            factors[p1]+=1
        except KeyError:
            factors[p1]=1
    return factors

def divisors(n):
    factors = factorGenerator(n)
    divisors=[]
    listexponents=[map(lambda x:k**x,range(0,factors[k]+1)) for k in factors.keys()]
    listfactors=cartesianproduct(listexponents)
    for f in listfactors:
        divisors.append(reduce(lambda x, y: x*y, f, 1))
    divisors.sort()
    return divisors



print divisors(60668796879)

P.S. it is the first time I am posting to stackoverflow. I am looking forward for any feedback.


回答 5

这是在纯Python 3.6中对10 ** 16左右的数字进行处理的一种智能,快速的方法,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

Here is a smart and fast way to do it for numbers up to and around 10**16 in pure Python 3.6,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

回答 6

改编自CodeReview,这是一个与num=1!一起使用的变体!

from itertools import product
import operator

def prod(ls):
   return reduce(operator.mul, ls, 1)

def powered(factors, powers):
   return prod(f**p for (f,p) in zip(factors, powers))


def divisors(num) :

   pf = dict(prime_factors(num))
   primes = pf.keys()
   #For each prime, possible exponents
   exponents = [range(i+1) for i in pf.values()]
   return (powered(primes,es) for es in product(*exponents))

Adapted from CodeReview, here is a variant which works with num=1 !

from itertools import product
import operator

def prod(ls):
   return reduce(operator.mul, ls, 1)

def powered(factors, powers):
   return prod(f**p for (f,p) in zip(factors, powers))


def divisors(num) :

   pf = dict(prime_factors(num))
   primes = pf.keys()
   #For each prime, possible exponents
   exponents = [range(i+1) for i in pf.values()]
   return (powered(primes,es) for es in product(*exponents))

回答 7

我将添加一个稍微修改过的Anivarth版本(因为我认为它是最Python的)以供将来参考。

from math import sqrt

def divisors(n):
    divs = {1,n}
    for i in range(2,int(sqrt(n))+1):
        if n%i == 0:
            divs.update((i,n//i))
    return divs

I’m just going to add a slightly revised version of Anivarth’s (as I believe it’s the most pythonic) for future reference.

from math import sqrt

def divisors(n):
    divs = {1,n}
    for i in range(2,int(sqrt(n))+1):
        if n%i == 0:
            divs.update((i,n//i))
    return divs

回答 8

旧问题,但这是我的看法:

def divs(n, m):
    if m == 1: return [1]
    if n % m == 0: return [m] + divs(n, m - 1)
    return divs(n, m - 1)

您可以代理:

def divisorGenerator(n):
    for x in reversed(divs(n, n)):
        yield x

注意:对于支持的语言,这可能是尾递归。

Old question, but here is my take:

def divs(n, m):
    if m == 1: return [1]
    if n % m == 0: return [m] + divs(n, m - 1)
    return divs(n, m - 1)

You can proxy with:

def divisorGenerator(n):
    for x in reversed(divs(n, n)):
        yield x

NOTE: For languages that support, this could be tail recursive.


回答 9

假设factors函数返回n的因数(例如,factors(60)返回列表[2,2,3,5]),这是一个计算n除数的函数

function divisors(n)
    divs := [1]
    for fact in factors(n)
        temp := []
        for div in divs
            if fact * div not in divs
                append fact * div to temp
        divs := divs + temp
    return divs

Assuming that the factors function returns the factors of n (for instance, factors(60) returns the list [2, 2, 3, 5]), here is a function to compute the divisors of n:

function divisors(n)
    divs := [1]
    for fact in factors(n)
        temp := []
        for div in divs
            if fact * div not in divs
                append fact * div to temp
        divs := divs + temp
    return divs

回答 10

这是我的解决方案。它似乎很愚蠢,但效果很好…而且我试图找到所有合适的除数,所以循环从i = 2开始。

import math as m 

def findfac(n):
    faclist = [1]
    for i in range(2, int(m.sqrt(n) + 2)):
        if n%i == 0:
            if i not in faclist:
                faclist.append(i)
                if n/i not in faclist:
                    faclist.append(n/i)
    return facts

Here’s my solution. It seems to be dumb but works well…and I was trying to find all proper divisors so the loop started from i = 2.

import math as m 

def findfac(n):
    faclist = [1]
    for i in range(2, int(m.sqrt(n) + 2)):
        if n%i == 0:
            if i not in faclist:
                faclist.append(i)
                if n/i not in faclist:
                    faclist.append(n/i)
    return facts

回答 11

如果您只在乎使用列表推导,对您而言别无其他!

from itertools import combinations
from functools import reduce

def get_devisors(n):
    f = [f for f,e in list(factorGenerator(n)) for i in range(e)]
    fc = [x for l in range(len(f)+1) for x in combinations(f, l)]
    devisors = [1 if c==() else reduce((lambda x, y: x * y), c) for c in set(fc)]
    return sorted(devisors)

If you only care about using list comprehensions and nothing else matters to you!

from itertools import combinations
from functools import reduce

def get_devisors(n):
    f = [f for f,e in list(factorGenerator(n)) for i in range(e)]
    fc = [x for l in range(len(f)+1) for x in combinations(f, l)]
    devisors = [1 if c==() else reduce((lambda x, y: x * y), c) for c in set(fc)]
    return sorted(devisors)

回答 12

如果您的PC拥有大量内存,那么使用numpy可以使单个行足够快:

N = 10000000; tst = np.arange(1, N); tst[np.mod(N, tst) == 0]
Out: 
array([      1,       2,       4,       5,       8,      10,      16,
            20,      25,      32,      40,      50,      64,      80,
           100,     125,     128,     160,     200,     250,     320,
           400,     500,     625,     640,     800,    1000,    1250,
          1600,    2000,    2500,    3125,    3200,    4000,    5000,
          6250,    8000,   10000,   12500,   15625,   16000,   20000,
         25000,   31250,   40000,   50000,   62500,   78125,   80000,
        100000,  125000,  156250,  200000,  250000,  312500,  400000,
        500000,  625000, 1000000, 1250000, 2000000, 2500000, 5000000])

在我的慢速PC上花费不到1秒。

If your PC has tons of memory, a brute single line can be fast enough with numpy:

N = 10000000; tst = np.arange(1, N); tst[np.mod(N, tst) == 0]
Out: 
array([      1,       2,       4,       5,       8,      10,      16,
            20,      25,      32,      40,      50,      64,      80,
           100,     125,     128,     160,     200,     250,     320,
           400,     500,     625,     640,     800,    1000,    1250,
          1600,    2000,    2500,    3125,    3200,    4000,    5000,
          6250,    8000,   10000,   12500,   15625,   16000,   20000,
         25000,   31250,   40000,   50000,   62500,   78125,   80000,
        100000,  125000,  156250,  200000,  250000,  312500,  400000,
        500000,  625000, 1000000, 1250000, 2000000, 2500000, 5000000])

Takes less than 1s on my slow PC.


回答 13

我通过生成器函数的解决方案是:

def divisor(num):
    for x in range(1, num + 1):
        if num % x == 0:
            yield x
    while True:
        yield None

My solution via generator function is:

def divisor(num):
    for x in range(1, num + 1):
        if num % x == 0:
            yield x
    while True:
        yield None

回答 14

return [x for x in range(n+1) if n/x==int(n/x)]
return [x for x in range(n+1) if n/x==int(n/x)]

回答 15

对我来说,这很好,也很干净(Python 3)

def divisors(number):
    n = 1
    while(n<number):
        if(number%n==0):
            print(n)
        else:
            pass
        n += 1
    print(number)

速度不是很快,但是可以按需逐行返回除数,如果您确实想要,也可以执行list.append(n)和list.append(number)

For me this works fine and is also clean (Python 3)

def divisors(number):
    n = 1
    while(n<number):
        if(number%n==0):
            print(n)
        else:
            pass
        n += 1
    print(number)

Not very fast but returns divisors line by line as you wanted, also you can do list.append(n) and list.append(number) if you really want to


如何在Python中将数字四舍五入为有效数字

问题:如何在Python中将数字四舍五入为有效数字

我需要四舍五入才能在UI中显示。例如,一个重要的数字:

1234-> 1000

0.12-> 0.1

0.012-> 0.01

0.062-> 0.06

6253-> 6000

1999-> 2000

是否有使用Python库执行此操作的好方法,还是我必须自己编写它?

I need to round a float to be displayed in a UI. E.g, to one significant figure:

1234 -> 1000

0.12 -> 0.1

0.012 -> 0.01

0.062 -> 0.06

6253 -> 6000

1999 -> 2000

Is there a nice way to do this using the Python library, or do I have to write it myself?


回答 0

您可以使用负数舍入整数:

>>> round(1234, -3)
1000.0

因此,如果您只需要最高有效数字:

>>> from math import log10, floor
>>> def round_to_1(x):
...   return round(x, -int(floor(log10(abs(x)))))
... 
>>> round_to_1(0.0232)
0.02
>>> round_to_1(1234243)
1000000.0
>>> round_to_1(13)
10.0
>>> round_to_1(4)
4.0
>>> round_to_1(19)
20.0

如果大于1,则可能需要将float转换为整数。

You can use negative numbers to round integers:

>>> round(1234, -3)
1000.0

Thus if you need only most significant digit:

>>> from math import log10, floor
>>> def round_to_1(x):
...   return round(x, -int(floor(log10(abs(x)))))
... 
>>> round_to_1(0.0232)
0.02
>>> round_to_1(1234243)
1000000.0
>>> round_to_1(13)
10.0
>>> round_to_1(4)
4.0
>>> round_to_1(19)
20.0

You’ll probably have to take care of turning float to integer if it’s bigger than 1.


回答 1

字符串格式的%g将格式化浮点数,并四舍五入到有效数字。有时会使用科学的“ e”表示法,因此将舍入的字符串转换回浮点数,然后通过%s字符串格式进行格式化。

>>> '%s' % float('%.1g' % 1234)
'1000'
>>> '%s' % float('%.1g' % 0.12)
'0.1'
>>> '%s' % float('%.1g' % 0.012)
'0.01'
>>> '%s' % float('%.1g' % 0.062)
'0.06'
>>> '%s' % float('%.1g' % 6253)
'6000.0'
>>> '%s' % float('%.1g' % 1999)
'2000.0'

%g in string formatting will format a float rounded to some number of significant figures. It will sometimes use ‘e’ scientific notation, so convert the rounded string back to a float then through %s string formatting.

>>> '%s' % float('%.1g' % 1234)
'1000'
>>> '%s' % float('%.1g' % 0.12)
'0.1'
>>> '%s' % float('%.1g' % 0.012)
'0.01'
>>> '%s' % float('%.1g' % 0.062)
'0.06'
>>> '%s' % float('%.1g' % 6253)
'6000.0'
>>> '%s' % float('%.1g' % 1999)
'2000.0'

回答 2

如果要使用除1个有效小数之外的其他数字(否则与Evgeny相同):

>>> from math import log10, floor
>>> def round_sig(x, sig=2):
...   return round(x, sig-int(floor(log10(abs(x))))-1)
... 
>>> round_sig(0.0232)
0.023
>>> round_sig(0.0232, 1)
0.02
>>> round_sig(1234243, 3)
1230000.0

If you want to have other than 1 significant decimal (otherwise the same as Evgeny):

>>> from math import log10, floor
>>> def round_sig(x, sig=2):
...   return round(x, sig-int(floor(log10(abs(x))))-1)
... 
>>> round_sig(0.0232)
0.023
>>> round_sig(0.0232, 1)
0.02
>>> round_sig(1234243, 3)
1230000.0

回答 3

f'{float(f"{i:.1g}"):g}'
# Or with Python <3.6,
'{:g}'.format(float('{:.1g}'.format(i)))

该解决方案与所有其他解决方案不同,因为:

  1. 恰好解决了OP问题
  2. 它并没有需要任何额外的包
  3. 需要任何用户定义的辅助功能数学运算

对于任意数量n的有效数字,可以使用:

print('{:g}'.format(float('{:.{p}g}'.format(i, p=n))))

测试:

a = [1234, 0.12, 0.012, 0.062, 6253, 1999, -3.14, 0., -48.01, 0.75]
b = ['{:g}'.format(float('{:.1g}'.format(i))) for i in a]
# b == ['1000', '0.1', '0.01', '0.06', '6000', '2000', '-3', '0', '-50', '0.8']

注意:使用此解决方案,不可能从输入动态地调整有效数字的数量,因为没有标准的方法来区分具有不同尾随零的数字(3.14 == 3.1400)。如果需要这样做,则需要非标准功能,例如精确度软件包中提供的功能。

f'{float(f"{i:.1g}"):g}'
# Or with Python <3.6,
'{:g}'.format(float('{:.1g}'.format(i)))

This solution is different from all of the others because:

  1. it exactly solves the OP question
  2. it does not need any extra package
  3. it does not need any user-defined auxiliary function or mathematical operation

For an arbitrary number n of significant figures, you can use:

print('{:g}'.format(float('{:.{p}g}'.format(i, p=n))))

Test:

a = [1234, 0.12, 0.012, 0.062, 6253, 1999, -3.14, 0., -48.01, 0.75]
b = ['{:g}'.format(float('{:.1g}'.format(i))) for i in a]
# b == ['1000', '0.1', '0.01', '0.06', '6000', '2000', '-3', '0', '-50', '0.8']

Note: with this solution, it is not possible to adapt the number of significant figures dynamically from the input because there is no standard way to distinguish numbers with different numbers of trailing zeros (3.14 == 3.1400). If you need to do so, then non-standard functions like the ones provided in the to-precision package are needed.


回答 4

我已经创建了可以满足您需求的高精度软件包。它使您可以为数字赋予或多或少的有效数字。

它还输出带有指定数量有效数字的标准,科学和工程符号。

在接受的答案中有一行

>>> round_to_1(1234243)
1000000.0

实际上指定了8个无花果。对于数字1234243,我的图书馆仅显示一个有效数字:

>>> from to_precision import to_precision
>>> to_precision(1234243, 1, 'std')
'1000000'
>>> to_precision(1234243, 1, 'sci')
'1e6'
>>> to_precision(1234243, 1, 'eng')
'1e6'

还将舍入最后一个有效数字,如果未指定符号,则可以自动选择要使用的符号:

>>> to_precision(599, 2)
'600'
>>> to_precision(1164, 2)
'1.2e3'

I have created the package to-precision that does what you want. It allows you to give your numbers more or less significant figures.

It also outputs standard, scientific, and engineering notation with a specified number of significant figures.

In the accepted answer there is the line

>>> round_to_1(1234243)
1000000.0

That actually specifies 8 sig figs. For the number 1234243 my library only displays one significant figure:

>>> from to_precision import to_precision
>>> to_precision(1234243, 1, 'std')
'1000000'
>>> to_precision(1234243, 1, 'sci')
'1e6'
>>> to_precision(1234243, 1, 'eng')
'1e6'

It will also round the last significant figure and can automatically choose what notation to use if a notation isn’t specified:

>>> to_precision(599, 2)
'600'
>>> to_precision(1164, 2)
'1.2e3'

回答 5

将整数四舍五入到1位有效数字的基本思想是将其转换为浮点数,该浮点数在该点之前1位数并四舍五入,然后将其转换回其原始整数大小。

为此,我们需要知道小于整数10的最大幂。为此,我们可以使用log 10功能的下限。

from math import log10, floor
def round_int(i,places):
    if i == 0:
        return 0
    isign = i/abs(i)
    i = abs(i)
    if i < 1:
        return 0
    max10exp = floor(log10(i))
    if max10exp+1 < places:
        return i
    sig10pow = 10**(max10exp-places+1)
    floated = i*1.0/sig10pow
    defloated = round(floated)*sig10pow
    return int(defloated*isign)

To round an integer to 1 significant figure the basic idea is to convert it to a floating point with 1 digit before the point and round that, then convert it back to its original integer size.

To do this we need to know the largest power of 10 less than the integer. We can use floor of the log 10 function for this.

from math import log10, floor
def round_int(i,places):
    if i == 0:
        return 0
    isign = i/abs(i)
    i = abs(i)
    if i < 1:
        return 0
    max10exp = floor(log10(i))
    if max10exp+1 < places:
        return i
    sig10pow = 10**(max10exp-places+1)
    floated = i*1.0/sig10pow
    defloated = round(floated)*sig10pow
    return int(defloated*isign)

回答 6

为了直接回答这个问题,这是我使用R函数命名的版本:

import math

def signif(x, digits=6):
    if x == 0 or not math.isfinite(x):
        return x
    digits -= math.ceil(math.log10(abs(x)))
    return round(x, digits)

我发布此答案的主要原因是评论抱怨“ 0.075”舍入为0.07而不是0.08。如“新手C”所指出的,这是由于具有有限精度和以2为底的表示形式的浮点运算的组合。实际上可以表示的最接近0.075的数字要小一些,因此舍入的结果与您可能天真地期望的不同。

还要注意,这适用于任何非十进制浮点算法的使用,例如C和Java都有相同的问题。

为了更详细地显示,我们要求Python将数字格式化为“十六进制”格式:

0.075.hex()

这给了我们:0x1.3333333333333p-4。这样做的原因是正常的十进制表示形式通常涉及舍入,因此不是计算机实际“看到”数字的方式。如果您不习惯这种格式,那么Python docsC standard是两个有用的参考。

为了说明这些数字是如何工作的,我们可以通过以下操作回到起点:

0x13333333333333 / 16**13 * 2**-4

应该打印出来0.07516**13是因为小数点后有13个十六进制数字,并且2**-4是因为十六进制指数是以2为底的。

现在我们有了关于浮点数表示方式的一些想法,可以使用decimal模块为我们提供更多的精度,向我们展示发生了什么事情:

from decimal import Decimal

Decimal(0x13333333333333) / 16**13 / 2**4

给:0.07499999999999999722444243844并希望解释为什么round(0.075, 2)评估0.07

To directly answer the question, here’s my version using naming from the R function:

import math

def signif(x, digits=6):
    if x == 0 or not math.isfinite(x):
        return x
    digits -= math.ceil(math.log10(abs(x)))
    return round(x, digits)

My main reason for posting this answer are the comments complaining that “0.075” rounds to 0.07 rather than 0.08. This is due, as pointed out by “Novice C”, to a combination of floating point arithmetic having both finite precision and a base-2 representation. The nearest number to 0.075 that can actually be represented is slightly smaller, hence rounding comes out differently than you might naively expect.

Also note that this applies to any use of non-decimal floating point arithmetic, e.g. C and Java both have the same issue.

To show in more detail, we ask Python to format the number in “hex” format:

0.075.hex()

which gives us: 0x1.3333333333333p-4. The reason for doing this is that the normal decimal representation often involves rounding and hence is not how the computer actually “sees” the number. If you’re not used to this format, a couple of useful references are the Python docs and the C standard.

To show how these numbers work a bit, we can get back to our starting point by doing:

0x13333333333333 / 16**13 * 2**-4

which should should print out 0.075. 16**13 is because there are 13 hexadecimal digits after the decimal point, and 2**-4 is because hex exponents are base-2.

Now we have some idea of how floats are represented we can use the decimal module to give us some more precision, showing us what’s going on:

from decimal import Decimal

Decimal(0x13333333333333) / 16**13 / 2**4

giving: 0.07499999999999999722444243844 and hopefully explaining why round(0.075, 2) evaluates to 0.07


回答 7

def round_to_n(x, n):
    if not x: return 0
    power = -int(math.floor(math.log10(abs(x)))) + (n - 1)
    factor = (10 ** power)
    return round(x * factor) / factor

round_to_n(0.075, 1)      # 0.08
round_to_n(0, 1)          # 0
round_to_n(-1e15 - 1, 16) # 1000000000000001.0

希望能充分利用以上所有答案中的最佳答案(减去能够将其表示为一行lambda的意思;)。尚未探索,请随时编辑此答案:

round_to_n(1e15 + 1, 11)  # 999999999999999.9
def round_to_n(x, n):
    if not x: return 0
    power = -int(math.floor(math.log10(abs(x)))) + (n - 1)
    factor = (10 ** power)
    return round(x * factor) / factor

round_to_n(0.075, 1)      # 0.08
round_to_n(0, 1)          # 0
round_to_n(-1e15 - 1, 16) # 1000000000000001.0

Hopefully taking the best of all the answers above (minus being able to put it as a one line lambda ;) ). Haven’t explored yet, feel free to edit this answer:

round_to_n(1e15 + 1, 11)  # 999999999999999.9

回答 8

我修改了indgar的解决方案,以处理负数和小数(包括零)。

from math import log10, floor
def round_sig(x, sig=6, small_value=1.0e-9):
    return round(x, sig - int(floor(log10(max(abs(x), abs(small_value))))) - 1)

I modified indgar’s solution to handle negative numbers and small numbers (including zero).

from math import log10, floor
def round_sig(x, sig=6, small_value=1.0e-9):
    return round(x, sig - int(floor(log10(max(abs(x), abs(small_value))))) - 1)

回答 9

如果您想不涉及字符串而四舍五入,我发现上面的链接中隐藏了该链接:

http://code.activestate.com/lists/python-tutor/70739/

给我最好的印象 然后,当您使用任何字符串格式的描述符进行打印时,您将获得一个合理的输出,并且可以将数字表示形式用于其他计算目的。

链接上的代码由三部分组成:def,doc和return。它有一个错误:您需要检查对数是否爆炸。那很容易。将输入与进行比较sys.float_info.min。完整的解决方案是:

import sys,math

def tidy(x, n):
"""Return 'x' rounded to 'n' significant digits."""
y=abs(x)
if y <= sys.float_info.min: return 0.0
return round( x, int( n-math.ceil(math.log10(y)) ) )

它适用于任何标量数值,float如果由于某种原因需要移动响应,n可以为a 。您实际上可以将限制提高到:

sys.float_info.min*sys.float_info.epsilon

如果您出于某些原因正在使用极小的值,则不会引发错误。

If you want to round without involving strings, the link I found buried in the comments above:

http://code.activestate.com/lists/python-tutor/70739/

strikes me as best. Then when you print with any string formatting descriptors, you get a reasonable output, and you can use the numeric representation for other calculation purposes.

The code at the link is a three liner: def, doc, and return. It has a bug: you need to check for exploding logarithms. That is easy. Compare the input to sys.float_info.min. The complete solution is:

import sys,math

def tidy(x, n):
"""Return 'x' rounded to 'n' significant digits."""
y=abs(x)
if y <= sys.float_info.min: return 0.0
return round( x, int( n-math.ceil(math.log10(y)) ) )

It works for any scalar numeric value, and n can be a float if you need to shift the response for some reason. You can actually push the limit to:

sys.float_info.min*sys.float_info.epsilon

without provoking an error, if for some reason you are working with miniscule values.


回答 10

我想不出任何能够立即解决的问题。但是对于浮点数来说,它处理得很好。

>>> round(1.2322, 2)
1.23

整数比较棘手。它们不会以10为基数存储在内存中,因此重要的地方并不是自然而然的事情。但是,一旦成为字符串,实现起来就很简单了。

或对于整数:

>>> def intround(n, sigfigs):
...   n = str(n)
...   return n[:sigfigs] + ('0' * (len(n)-(sigfigs)))

>>> intround(1234, 1)
'1000'
>>> intround(1234, 2)

如果您想创建一个可以处理任何数字的函数,我的偏好是将它们都转换为字符串,并寻找一个小数位来决定要做什么:

>>> def roundall1(n, sigfigs):
...   n = str(n)
...   try:
...     sigfigs = n.index('.')
...   except ValueError:
...     pass
...   return intround(n, sigfigs)

另一种选择是检查类型。这将不太灵活,并且可能无法与其他数字(例如Decimal对象)很好地配合使用:

>>> def roundall2(n, sigfigs):
...   if type(n) is int: return intround(n, sigfigs)
...   else: return round(n, sigfigs)

I can’t think of anything that would be able to handle this out of the box. But it’s fairly well handled for floating point numbers.

>>> round(1.2322, 2)
1.23

Integers are trickier. They’re not stored as base 10 in memory, so significant places isn’t a natural thing to do. It’s fairly trivial to implement once they’re a string though.

Or for integers:

>>> def intround(n, sigfigs):
...   n = str(n)
...   return n[:sigfigs] + ('0' * (len(n)-(sigfigs)))

>>> intround(1234, 1)
'1000'
>>> intround(1234, 2)

If you would like to create a function that handles any number, my preference would be to convert them both to strings and look for a decimal place to decide what to do:

>>> def roundall1(n, sigfigs):
...   n = str(n)
...   try:
...     sigfigs = n.index('.')
...   except ValueError:
...     pass
...   return intround(n, sigfigs)

Another option is to check for type. This will be far less flexible, and will probably not play nicely with other numbers such as Decimal objects:

>>> def roundall2(n, sigfigs):
...   if type(n) is int: return intround(n, sigfigs)
...   else: return round(n, sigfigs)

回答 11

给出的答案是给出的最好的答案,但是它有很多限制,并且在技术上没有正确的有效数字。

numpy.format_float_positional直接支持所需的行为。以下片段将浮点数x格式化为4个有效数字,并取消了科学计数法。

import numpy as np
x=12345.6
np.format_float_positional(x, precision=4, unique=False, fractional=False, trim='k')
> 12340.

The posted answer was the best available when given, but it has a number of limitations and does not produce technically correct significant figures.

numpy.format_float_positional supports the desired behaviour directly. The following fragment returns the float x formatted to 4 significant figures, with scientific notation suppressed.

import numpy as np
x=12345.6
np.format_float_positional(x, precision=4, unique=False, fractional=False, trim='k')
> 12340.

回答 12

我也遇到了这个问题,但是我需要控制舍入类型。因此,我编写了一个快速函数(请参见下面的代码),该函数可以将值,舍入类型和所需的有效数字考虑在内。

import decimal
from math import log10, floor

def myrounding(value , roundstyle='ROUND_HALF_UP',sig = 3):
    roundstyles = [ 'ROUND_05UP','ROUND_DOWN','ROUND_HALF_DOWN','ROUND_HALF_UP','ROUND_CEILING','ROUND_FLOOR','ROUND_HALF_EVEN','ROUND_UP']

    power =  -1 * floor(log10(abs(value)))
    value = '{0:f}'.format(value) #format value to string to prevent float conversion issues
    divided = Decimal(value) * (Decimal('10.0')**power) 
    roundto = Decimal('10.0')**(-sig+1)
    if roundstyle not in roundstyles:
        print('roundstyle must be in list:', roundstyles) ## Could thrown an exception here if you want.
    return_val = decimal.Decimal(divided).quantize(roundto,rounding=roundstyle)*(decimal.Decimal(10.0)**-power)
    nozero = ('{0:f}'.format(return_val)).rstrip('0').rstrip('.') # strips out trailing 0 and .
    return decimal.Decimal(nozero)


for x in list(map(float, '-1.234 1.2345 0.03 -90.25 90.34543 9123.3 111'.split())):
    print (x, 'rounded UP: ',myrounding(x,'ROUND_UP',3))
    print (x, 'rounded normal: ',myrounding(x,sig=3))

I ran into this as well but I needed control over the rounding type. Thus, I wrote a quick function (see code below) that can take value, rounding type, and desired significant digits into account.

import decimal
from math import log10, floor

def myrounding(value , roundstyle='ROUND_HALF_UP',sig = 3):
    roundstyles = [ 'ROUND_05UP','ROUND_DOWN','ROUND_HALF_DOWN','ROUND_HALF_UP','ROUND_CEILING','ROUND_FLOOR','ROUND_HALF_EVEN','ROUND_UP']

    power =  -1 * floor(log10(abs(value)))
    value = '{0:f}'.format(value) #format value to string to prevent float conversion issues
    divided = Decimal(value) * (Decimal('10.0')**power) 
    roundto = Decimal('10.0')**(-sig+1)
    if roundstyle not in roundstyles:
        print('roundstyle must be in list:', roundstyles) ## Could thrown an exception here if you want.
    return_val = decimal.Decimal(divided).quantize(roundto,rounding=roundstyle)*(decimal.Decimal(10.0)**-power)
    nozero = ('{0:f}'.format(return_val)).rstrip('0').rstrip('.') # strips out trailing 0 and .
    return decimal.Decimal(nozero)


for x in list(map(float, '-1.234 1.2345 0.03 -90.25 90.34543 9123.3 111'.split())):
    print (x, 'rounded UP: ',myrounding(x,'ROUND_UP',3))
    print (x, 'rounded normal: ',myrounding(x,sig=3))

回答 13

使用python 2.6+ 新样式格式(不建议使用%-style):

>>> "{0}".format(float("{0:.1g}".format(1216)))
'1000.0'
>>> "{0}".format(float("{0:.1g}".format(0.00356)))
'0.004'

在python 2.7+中,您可以省略前导0s。

Using python 2.6+ new-style formatting (as %-style is deprecated):

>>> "{0}".format(float("{0:.1g}".format(1216)))
'1000.0'
>>> "{0}".format(float("{0:.1g}".format(0.00356)))
'0.004'

In python 2.7+ you can omit the leading 0s.


回答 14

如果数字大于10 **(-decimal_positions),则此函数进行正常的回合,否则增加更多的小数,直到达到有意义的小数位数为止:

def smart_round(x, decimal_positions):
    dp = - int(math.log10(abs(x))) if x != 0.0 else int(0)
    return round(float(x), decimal_positions + dp if dp > 0 else decimal_positions)

希望能帮助到你。

This function does a normal round if the number is bigger than 10**(-decimal_positions), otherwise adds more decimal until the number of meaningful decimal positions is reached:

def smart_round(x, decimal_positions):
    dp = - int(math.log10(abs(x))) if x != 0.0 else int(0)
    return round(float(x), decimal_positions + dp if dp > 0 else decimal_positions)

Hope it helps.


回答 15

https://stackoverflow.com/users/1391441/gabriel,以下内容是否解决了您对rnd(.075,1)的担忧?警告:以浮点数形式返回值

def round_to_n(x, n):
    fmt = '{:1.' + str(n) + 'e}'    # gives 1.n figures
    p = fmt.format(x).split('e')    # get mantissa and exponent
                                    # round "extra" figure off mantissa
    p[0] = str(round(float(p[0]) * 10**(n-1)) / 10**(n-1))
    return float(p[0] + 'e' + p[1]) # convert str to float

>>> round_to_n(750, 2)
750.0
>>> round_to_n(750, 1)
800.0
>>> round_to_n(.0750, 2)
0.075
>>> round_to_n(.0750, 1)
0.08
>>> math.pi
3.141592653589793
>>> round_to_n(math.pi, 7)
3.141593

https://stackoverflow.com/users/1391441/gabriel, does the following address your concern about rnd(.075, 1)? Caveat: returns value as a float

def round_to_n(x, n):
    fmt = '{:1.' + str(n) + 'e}'    # gives 1.n figures
    p = fmt.format(x).split('e')    # get mantissa and exponent
                                    # round "extra" figure off mantissa
    p[0] = str(round(float(p[0]) * 10**(n-1)) / 10**(n-1))
    return float(p[0] + 'e' + p[1]) # convert str to float

>>> round_to_n(750, 2)
750.0
>>> round_to_n(750, 1)
800.0
>>> round_to_n(.0750, 2)
0.075
>>> round_to_n(.0750, 1)
0.08
>>> math.pi
3.141592653589793
>>> round_to_n(math.pi, 7)
3.141593

回答 16

这将返回一个字符串,以便结果不包含小数部分,并且正确显示了E表示法中否则会出现的较小值:

def sigfig(x, num_sigfig):
    num_decplace = num_sigfig - int(math.floor(math.log10(abs(x)))) - 1
    return '%.*f' % (num_decplace, round(x, num_decplace))

This returns a string, so that results without fractional parts, and small values which would otherwise appear in E notation are shown correctly:

def sigfig(x, num_sigfig):
    num_decplace = num_sigfig - int(math.floor(math.log10(abs(x)))) - 1
    return '%.*f' % (num_decplace, round(x, num_decplace))

回答 17

给定一个如此彻底回答的问题,为什么不添加另一个

尽管上面的许多内容是可比的,但这更适合我的审美观

import numpy as np

number=-456.789
significantFigures=4

roundingFactor=significantFigures - int(np.floor(np.log10(np.abs(number)))) - 1
rounded=np.round(number, roundingFactor)

string=rounded.astype(str)

print(string)

这适用于单个数字和numpy数组,对于负数应该可以正常工作。

我们可能还要增加一个附加步骤-即使四舍五入为整数,np.round()也会返回一个十进制数(即,对于ificantFigures = 2,我们可能期望返回-460,但相反会得到-460.0)。我们可以添加此步骤以更正此问题:

if roundingFactor<=0:
    rounded=rounded.astype(int)

不幸的是,最后一步不适用于数字数组-亲爱的读者,我会把这个留给您看看是否需要。

Given a question so thoroughly answered why not add another

This suits my aesthetic a little better, though many of the above are comparable

import numpy as np

number=-456.789
significantFigures=4

roundingFactor=significantFigures - int(np.floor(np.log10(np.abs(number)))) - 1
rounded=np.round(number, roundingFactor)

string=rounded.astype(str)

print(string)

This works for individual numbers and numpy arrays, and should function fine for negative numbers.

There’s one additional step we might add – np.round() returns a decimal number even if rounded is an integer (i.e. for significantFigures=2 we might expect to get back -460 but instead we get -460.0). We can add this step to correct for that:

if roundingFactor<=0:
    rounded=rounded.astype(int)

Unfortunately, this final step won’t work for an array of numbers – I’ll leave that to you dear reader to figure out if you need.


回答 18

sigfig包/库盖这一点。后安装,你可以做到以下几点:

>>> from sigfig import round
>>> round(1234, 1)
1000
>>> round(0.12, 1)
0.1
>>> round(0.012, 1)
0.01
>>> round(0.062, 1)
0.06
>>> round(6253, 1)
6000
>>> round(1999, 1)
2000

The sigfig package/library covers this. After installing you can do the following:

>>> from sigfig import round
>>> round(1234, 1)
1000
>>> round(0.12, 1)
0.1
>>> round(0.012, 1)
0.01
>>> round(0.062, 1)
0.06
>>> round(6253, 1)
6000
>>> round(1999, 1)
2000

回答 19

import math

  def sig_dig(x, n_sig_dig):
      num_of_digits = len(str(x).replace(".", ""))
      if n_sig_dig >= num_of_digits:
          return x
      n = math.floor(math.log10(x) + 1 - n_sig_dig)
      result = round(10 ** -n * x) * 10 ** n
      return float(str(result)[: n_sig_dig + 1])


    >>> sig_dig(1234243, 3)
    >>> sig_dig(243.3576, 5)

        1230.0
        243.36
import math

  def sig_dig(x, n_sig_dig):
      num_of_digits = len(str(x).replace(".", ""))
      if n_sig_dig >= num_of_digits:
          return x
      n = math.floor(math.log10(x) + 1 - n_sig_dig)
      result = round(10 ** -n * x) * 10 ** n
      return float(str(result)[: n_sig_dig + 1])


    >>> sig_dig(1234243, 3)
    >>> sig_dig(243.3576, 5)

        1230.0
        243.36

如何在Python中计算平方根?

问题:如何在Python中计算平方根?

为什么Python会给出“错误”的答案?

x = 16

sqrt = x**(.5)  #returns 4
sqrt = x**(1/2) #returns 1

是的,我知道import math并使用sqrt。但我正在寻找以上答案。

Why does Python give the “wrong” answer?

x = 16

sqrt = x**(.5)  #returns 4
sqrt = x**(1/2) #returns 1

Yes, I know import math and use sqrt. But I’m looking for an answer to the above.


回答 0

sqrt=x**(1/2)在做整数除法。1/2 == 0

因此,您在第一个实例中计算x (1/2),在第二个实例中计算x (0)

没错,这是对其他问题的正确答案。

sqrt=x**(1/2) is doing integer division. 1/2 == 0.

So you’re computing x(1/2) in the first instance, x(0) in the second.

So it’s not wrong, it’s the right answer to a different question.


回答 1

您必须编写:sqrt = x**(1/2.0),否则将执行整数除法并1/2返回表达式0

此行为是在Python 2.x的“正常”,而在Python 3.x的1/2计算结果为0.5。如果您希望Python 2.x代码的行为类似于3.x wrt除法写入from __future__ import division-那么1/2将评估0.5为并向后兼容,1//2评估为0

作为记录,计算平方根的首选方法是:

import math
math.sqrt(x)

You have to write: sqrt = x**(1/2.0), otherwise an integer division is performed and the expression 1/2 returns 0.

This behavior is “normal” in Python 2.x, whereas in Python 3.x 1/2 evaluates to 0.5. If you want your Python 2.x code to behave like 3.x w.r.t. division write from __future__ import division – then 1/2 will evaluate to 0.5 and for backwards compatibility, 1//2 will evaluate to 0.

And for the record, the preferred way to calculate a square root is this:

import math
math.sqrt(x)

回答 2

import math
math.sqrt( x )

这是对答案链的琐碎补充。但是,由于该主题在Google上非常普遍,因此我认为值得添加。

import math
math.sqrt( x )

It is a trivial addition to the answer chain. However since the Subject is very common google hit, this deserves to be added, I believe.


回答 3

/ 在Python 2中执行整数除法:

>>> 1/2
0

如果其中一个数字是浮点数,则按预期方式工作:

>>> 1.0/2
0.5
>>> 16**(1.0/2)
4.0

/ performs an integer division in Python 2:

>>> 1/2
0

If one of the numbers is a float, it works as expected:

>>> 1.0/2
0.5
>>> 16**(1.0/2)
4.0

回答 4

您所看到的是整数除法。要默认获得浮点除法,

from __future__ import division

或者,您可以将1/2的1或2转换为浮点值。

sqrt = x**(1.0/2)

What you’re seeing is integer division. To get floating point division by default,

from __future__ import division

Or, you could convert 1 or 2 of 1/2 into a floating point value.

sqrt = x**(1.0/2)

回答 5

这可能有点迟了,但是计算平方根的最简单,最准确的方法是牛顿法。

您有一个要计算其平方根的数字,(num)并且猜出了其平方根(estimate)。估计可以是大于0的任何数字,但是有意义的数字会大大缩短递归调用的深度。

new_estimate = (estimate + num / estimate) / 2

该行使用这两个参数计算出更准确的估算值。您可以将new_estimate值传递给该函数,然后计算另一个比上一个更准确的new_estimate,也可以像这样进行递归函数定义。

def newtons_method(num, estimate):
    # Computing a new_estimate
    new_estimate = (estimate + num / estimate) / 2
    print(new_estimate)
    # Base Case: Comparing our estimate with built-in functions value
    if new_estimate == math.sqrt(num):
        return True
    else:
        return newtons_method(num, new_estimate)

例如,我们需要找到30的平方根。我们知道结果在5到6之间。

newtons_method(30,5)

数字是30,估计是5。每个递归调用的结果是:

5.5
5.477272727272727
5.4772255752546215
5.477225575051661

最后的结果是最精确的数字平方根计算。它与内置函数math.sqrt()的值相同。

This might be a little late to answer but most simple and accurate way to compute square root is newton’s method.

You have a number which you want to compute its square root (num) and you have a guess of its square root (estimate). Estimate can be any number bigger than 0, but a number that makes sense shortens the recursive call depth significantly.

new_estimate = (estimate + num / estimate) / 2

This line computes a more accurate estimate with those 2 parameters. You can pass new_estimate value to the function and compute another new_estimate which is more accurate than the previous one or you can make a recursive function definition like this.

def newtons_method(num, estimate):
    # Computing a new_estimate
    new_estimate = (estimate + num / estimate) / 2
    print(new_estimate)
    # Base Case: Comparing our estimate with built-in functions value
    if new_estimate == math.sqrt(num):
        return True
    else:
        return newtons_method(num, new_estimate)

For example we need to find 30’s square root. We know that the result is between 5 and 6.

newtons_method(30,5)

number is 30 and estimate is 5. The result from each recursive calls are:

5.5
5.477272727272727
5.4772255752546215
5.477225575051661

The last result is the most accurate computation of the square root of number. It is the same value as the built-in function math.sqrt().


回答 6

可能是一种简单的记住方式:在分子(或分母)后添加点

16 ** (1. / 2)   # 4
289 ** (1. / 2)  # 17
27 ** (1. / 3)   # 3

Perhaps a simple way to remember: add a dot after the numerator (or denominator)

16 ** (1. / 2)   # 4
289 ** (1. / 2)  # 17
27 ** (1. / 3)   # 3

回答 7

您可以使用NumPy计算数组的平方根:

 import numpy as np
 np.sqrt([1, 4, 9])

You can use NumPy to calculate square roots of arrays:

 import numpy as np
 np.sqrt([1, 4, 9])

回答 8

我希望下面提到的代码能够回答您的问题。

def root(x,a):
    y = 1 / a
    y = float(y)
    print y
    z = x ** y
    print z

base = input("Please input the base value:")
power = float(input("Please input the root value:"))


root(base,power) 

I hope the below mentioned code will answer your question.

def root(x,a):
    y = 1 / a
    y = float(y)
    print y
    z = x ** y
    print z

base = input("Please input the base value:")
power = float(input("Please input the root value:"))


root(base,power) 

在Python中将float转换为整数的最安全方法?

问题:在Python中将float转换为整数的最安全方法?

Python的math模块包含诸如floor&的便捷函数ceil。这些函数采用浮点数,并在其下或上返回最接近的整数。但是,这些函数将答案作为浮点数返回。例如:

import math
f=math.floor(2.3)

现在f返回:

2.0

从该浮点数中获取整数而不冒取舍入错误风险的最安全方法是什么(例如,如果浮点数等于1.99999),或者我应该完全使用另一个函数?

Python’s math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:

import math
f=math.floor(2.3)

Now f returns:

2.0

What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?


回答 0

可以用浮点数表示的所有整数均具有精确的表示形式。这样您就可以安全地使用int结果了。仅当您尝试使用非2的幂的分母来表示有理数时,才会出现不精确的表示。

这项工作一点都不小!IEEE浮点表示的一个属性是int∘floor=⌊⋅⌋,如果所讨论的数字的大小足够小,但是int(floor(2.3))可能为1的情况下,可能会有不同的表示形式。

要引用维基百科

绝对值小于或等于2 24的任何整数都可以用单精度格式准确表示,绝对值小于或等于2 53的任何整数都可以用双精度格式准确表示。

All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.

That this works is not trivial at all! It’s a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.

To quote from Wikipedia,

Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.


回答 1

使用int(your non integer number)将打钉。

print int(2.3) # "2"
print int(math.sqrt(5)) # "2"

Use int(your non integer number) will nail it.

print int(2.3) # "2"
print int(math.sqrt(5)) # "2"

回答 2

您可以使用舍入功能。如果您不使用第二个参数(有效数字位数),那么我认为您将获得想要的行为。

空闲输出。

>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2

You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.

IDLE output.

>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2

回答 3

结合之前的两个结果,我们得到:

int(round(some_float))

这可以相当可靠地将浮点数转换为整数。

Combining two of the previous results, we have:

int(round(some_float))

This converts a float to an integer fairly dependably.


回答 4

这项工作一点都不小!IEEE浮点表示的一个属性是int∘floor=⌊⋅⌋,如果所讨论的数字的大小足够小,但是int(floor(2.3))可能为1的情况下,可能会有不同的表示形式。

这篇文章解释了为什么它可以在这个范围内工作

在double中,您可以毫无问题地表示32位整数。有不能是任何四舍五入问题。更精确地,双精度数可以表示2 53-2 53之间(包括2 53-2 53)的所有整数。

简短说明:一个double最多可以存储53个二进制数字。当您需要更多时,该数字将在右边填充零。

由此可见,53个数字是无需填充即可存储的最大数字。自然,所有需要较少数字的(整数)数字都可以准确存储。

111加1(省略)111(53个)将产生100 … 000,(53个零)。众所周知,我们可以存储53位数字,即最右边的零填充。

这是2 53的来源。


详细信息:我们需要考虑IEEE-754浮点如何工作。

  1 bit    11 / 8     52 / 23      # bits double/single precision
[ sign |  exponent | mantissa ]

然后,该数字的计算方式如下(不包括此处无关的特殊情况):

-1 ×1.尾数×2 指数-偏差

其中偏压= 2 指数- 1 1 –分别,即,1023和127,用于双/单精度。

明知乘以2 X根本改变所有位X位的左侧,可以很容易地看到,任何整数必须具备的所有位尾数为此右上小数点零。

除零以外的任何整数都具有以下二进制形式:

1x … x,其中x -es表示MSB右侧的位(最高有效位)。

因为我们排除了零,所以总会有一个MSB为1,这就是为什么不存储它的原因。要存储整数,我们必须将其转换为上述形式:-1 符号 ×1.尾数×2 指数偏差

就是说,将这些位移到小数点后直到只有MSB朝MSB的左侧移动。然后,小数点右边的所有位都存储在尾数中。

由此可见,除MSB外,我们最多可以存储52个二进制数字。

因此,显式存储所有位的最高编号为

111(omitted)111.   that's 53 ones (52 + implicit 1) in the case of doubles.

为此,我们需要设置指数,以使小数点后移52位。如果我们将指数增加一,我们将无法知道小数点后左边的数字。

111(omitted)111x.

按照惯例,它是0。将整个尾数设置为零,我们收到以下数字:

100(omitted)00x. = 100(omitted)000.

这是一个1,后跟53个零,已存储52个,并且由于指数而加了1。

它代表2 53,它标志着我们可以准确表示所有整数的边界(负向和正向)。如果要将1加到2 53,则必须将隐式零(由表示x)设置为1,但这是不可能的。

That this works is not trivial at all! It’s a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.

This post explains why it works in that range.

In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.

Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.

It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.

Adding one to 111(omitted)111 (53 ones) yields 100…000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.

This is where 253 comes from.


More detail: We need to consider how IEEE-754 floating point works.

  1 bit    11 / 8     52 / 23      # bits double/single precision
[ sign |  exponent | mantissa ]

The number is then calculated as follows (excluding special cases that are irrelevant here):

-1sign × 1.mantissa ×2exponent – bias

where bias = 2exponent – 1 – 1, i.e. 1023 and 127 for double/single precision respectively.

Knowing that multiplying by 2X simply shifts all bits X places to the left, it’s easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.

Any integer except zero has the following form in binary:

1x…x where the x-es represent the bits to the right of the MSB (most significant bit).

Because we excluded zero, there will always be a MSB that is one—which is why it’s not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent – bias.

That’s saying the same as shifting the bits over the decimal point until there’s only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.

From this, we can see that we can store at most 52 binary digits apart from the MSB.

It follows that the highest number where all bits are explicitly stored is

111(omitted)111.   that's 53 ones (52 + implicit 1) in the case of doubles.

For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.

111(omitted)111x.

By convention, it’s 0. Setting the entire mantissa to zero, we receive the following number:

100(omitted)00x. = 100(omitted)000.

That’s a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.

It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that’s impossible.


回答 5

math.floor将始终返回整数,因此int(math.floor(some_float))永远不会引入舍入错误。

但是,舍入错误可能已经引入了math.floor(some_large_float),或者甚至当首先将大量存储在float中时也已引入。(存储在浮点数中的大数字可能会失去精度。)

math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.

The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)


回答 6

如果需要将字符串float转换为int,则可以使用此方法。

例如:'38.0'38

为了将其转换为int,可以将其转换为float,然后转换为int。这也适用于浮点字符串或整数字符串。

>>> int(float('38.0'))
38
>>> int(float('38'))
38

注意:这将删除小数点后的所有数字。

>>> int(float('38.2'))
38

If you need to convert a string float to an int you can use this method.

Example: '38.0' to 38

In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.

>>> int(float('38.0'))
38
>>> int(float('38'))
38

Note: This will strip any numbers after the decimal.

>>> int(float('38.2'))
38

回答 7

另一个代码示例使用变量将实数/浮点数转换为整数。“ vel”是一个实数/浮点数,并转换为第二高的整数“ newvel”。

import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
 for row in cursor:
    curvel = float(row[1])
    newvel = int(math.ceil(curvel))

Another code sample to convert a real/float to an integer using variables. “vel” is a real/float number and converted to the next highest INTEGER, “newvel”.

import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
 for row in cursor:
    curvel = float(row[1])
    newvel = int(math.ceil(curvel))

回答 8

由于您要求的是“最安全”的方式,因此我将提供除最佳答案之外的另一个答案。

确保您不损失任何精度的一种简单方法是检查转换后的值是否相等。

if int(some_value) == some_value:
     some_value = int(some_value)

例如,如果float为1.0,则1.0等于1。因此将执行向int的转换。如果float为1.1,则int(1.1)等于1,并且1.1!=1。因此,该值将保持为float值,并且不会损失任何精度。

Since you’re asking for the ‘safest’ way, I’ll provide another answer other than the top answer.

An easy way to make sure you don’t lose any precision is to check if the values would be equal after you convert them.

if int(some_value) == some_value:
     some_value = int(some_value)

If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won’t lose any precision.


回答 9

df [‘Column_Name’] = df [‘Column_Name’]。astype(int)

df[‘Column_Name’]=df[‘Column_Name’].astype(int)