Python 实用宝典

Question 1

Can someone explain this (straight from the docs– emphasis mine):

math.ceil(x) Return the ceiling of x as a float, the smallest integer value greater than or equal to x.

math.floor(x) Return the floor of x as a float, the largest integer value less than or equal to x.

Why would .ceil and .floor return floats when they are by definition supposed to calculate integers?

EDIT:

Well this got some very good arguments as to why they should return floats, and I was just getting used to the idea, when @jcollado pointed out that they in fact do return ints in Python 3…

Question 2

The range of floating point numbers usually exceeds the range of integers. By returning a floating point value, the functions can return a sensible value for input values that lie outside the representable range of integers.

Consider: If floor() returned an integer, what should floor(1.0e30) return?

Now, while Python’s integers are now arbitrary precision, it wasn’t always this way. The standard library functions are thin wrappers around the equivalent C library functions.

Question 3

As pointed out by other answers, in python they return floats probably because of historical reasons to prevent overflow problems. However, they return integers in python 3.

>>> import math
>>> type(math.floor(3.1))
<class 'int'>
>>> type(math.ceil(3.1))
<class 'int'>

You can find more information in PEP 3141.

Question 4

The source of your confusion is evident in your comment:

The whole point of ceil/floor operations is to convert floats to integers!

The point of the ceil and floor operations is to round floating-point data to integral values. Not to do a type conversion. Users who need to get integer values can do an explicit conversion following the operation.

Note that it would not be possible to implement a round to integral value as trivially if all you had available were a ceil or float operation that returned an integer. You would need to first check that the input is within the representable integer range, then call the function; you would need to handle NaN and infinities in a separate code path.

Additionally, you must have versions of ceil and floor which return floating-point numbers if you want to conform to IEEE 754.

Question 5

Because python’s math library is a thin wrapper around the C math library which returns floats.

Question 6

Before Python 2.4, an integer couldn’t hold the full range of truncated real numbers.

http://docs.python.org/whatsnew/2.4.html#pep-237-unifying-long-integers-and-integers

Question 7

Because the range for floats is greater than that of integers — returning an integer could overflow

Question 8

This is a very interesting question! As a float requires some bits to store the exponent (=bits_for_exponent) any floating point number greater than 2**(float_size - bits_for_exponent) will always be an integral value! At the other extreme a float with a negative exponent will give one of 1, 0 or -1. This makes the discussion of integer range versus float range moot because these functions will simply return the original number whenever the number is outside the range of the integer type. The python functions are wrappers of the C function and so this is really a deficiency of the C functions where they should have returned an integer and forced the programer to do the range/NaN/Inf check before calling ceil/floor.

Thus the logical answer is the only time these functions are useful they would return a value within integer range and so the fact they return a float is a mistake and you are very smart for realizing this!

Question 9

Maybe because other languages do this as well, so it is generally-accepted behavior. (For good reasons, as shown in the other answers)

Question 10

I was trying to normalize a set of numbers from -100 to 0 to a range of 10-100 and was having problems only to notice that even with no variables at all, this does not evaluate the way I would expect it to:

>>> (20-10) / (100-10)
0

Float division doesn’t work either:

>>> float((20-10) / (100-10))
0.0

If either side of the division is cast to a float it will work:

>>> (20-10) / float((100-10))
0.1111111111111111

Each side in the first example is evaluating as an int which means the final answer will be cast to an int. Since 0.111 is less than .5, it rounds to 0. It is not transparent in my opinion, but I guess that’s the way it is.

What is the explanation?

Question 11

You’re using Python 2.x, where integer divisions will truncate instead of becoming a floating point number.

>>> 1 / 2
0

You should make one of them a float:

>>> float(10 - 20) / (100 - 10)
-0.1111111111111111

or from __future__ import division, which the forces / to adopt Python 3.x’s behavior that always returns a float.

>>> from __future__ import division
>>> (10 - 20) / (100 - 10)
-0.1111111111111111

Question 12

You’re putting Integers in so Python is giving you an integer back:

>>> 10 / 90
0

If if you cast this to a float afterwards the rounding will have already been done, in other words, 0 integer will always become 0 float.

If you use floats on either side of the division then Python will give you the answer you expect.

>>> 10 / 90.0
0.1111111111111111

So in your case:

>>> float(20-10) / (100-10)
0.1111111111111111
>>> (20-10) / float(100-10)
0.1111111111111111

Question 13

You need to change it to a float BEFORE you do the division. That is:

float(20 - 10) / (100 - 10)

Question 14

In Python 2.7, the / operator is an integer division if inputs are integers:

>>>20/15
1

>>>20.0/15.0
1.33333333333

>>>20.0/15
1.33333333333

In Python 3.3, the / operator is a float division even if the inputs are integer.

>>> 20/15
1.33333333333

>>>20.0/15
1.33333333333

For integer division in Python 3, we will use the // operator.

The // operator is an integer division operator in both Python 2.7 and Python 3.3.

In Python 2.7 and Python 3.3:

>>>20//15
1

Now, see the comparison

>>>a = 7.0/4.0
>>>b = 7/4
>>>print a == b

For the above program, the output will be False in Python 2.7 and True in Python 3.3.

In Python 2.7 a = 1.75 and b = 1.

In Python 3.3 a = 1.75 and b = 1.75, just because / is a float division.

Question 15

It has to do with the version of python that you use. Basically it adopts the C behavior: if you divide two integers, the results will be rounded down to an integer. Also keep in mind that Python does the operations from left to right, which plays a role when you typecast.

Example: Since this is a question that always pops in my head when I am doing arithmetic operations (should I convert to float and which number), an example from that aspect is presented:

>>> a = 1/2/3/4/5/4/3
>>> a
0

When we divide integers, not surprisingly it gets lower rounded.

>>> a = 1/2/3/4/5/4/float(3)
>>> a
0.0

If we typecast the last integer to float, we will still get zero, since by the time our number gets divided by the float has already become 0 because of the integer division.

>>> a = 1/2/3/float(4)/5/4/3
>>> a
0.0

Same scenario as above but shifting the float typecast a little closer to the left side.

>>> a = float(1)/2/3/4/5/4/3
>>> a
0.0006944444444444445

Finally, when we typecast the first integer to float, the result is the desired one, since beginning from the first division, i.e. the leftmost one, we use floats.

Extra 1: If you are trying to answer that to improve arithmetic evaluation, you should check this

Extra 2: Please be careful of the following scenario:

>>> a = float(1/2/3/4/5/4/3)
>>> a
0.0

Question 16

Specifying a float by placing a ‘.’ after the number will also cause it to default to float.

>>> 1 / 2
0

>>> 1. / 2.
0.5

Question 17

Make at least one of them float, then it will be float division, not integer:

>>> (20.0-10) / (100-10)
0.1111111111111111

Casting the result to float is too late.

Question 18

In python cv2 not updated the division calculation. so, you must include from __future__ import division in first line of the program.

Question 19

Either way, it’s integer division. 10/90 = 0. In the second case, you’re merely casting 0 to a float.

Try casting one of the operands of “/” to be a float:

float(20-10) / (100-10)

Question 20

You’re casting to float after the division has already happened in your second example. Try this:

float(20-10) / float(100-10)

Question 21

I’m somewhat surprised that no one has mentioned that the original poster might have liked rational numbers to result. Should you be interested in this, the Python-based program Sage has your back. (Currently still based on Python 2.x, though 3.x is under way.)

sage: (20-10) / (100-10)
1/9

This isn’t a solution for everyone, because it does do some preparsing so these numbers aren’t ints, but Sage Integer class elements. Still, worth mentioning as a part of the Python ecosystem.

Question 22

Personally I preferred to insert a 1. * at the very beginning. So the expression become something like this:

1. * (20-10) / (100-10)

As I always do a division for some formula like:

accuracy = 1. * (len(y_val) - sum(y_val)) / len(y_val)

so it is impossible to simply add a .0 like 20.0. And in my case, wrapping with a float() may lose a little bit readability.

Question 23

numpy has three different functions which seem like they can be used for the same things — except that numpy.maximum can only be used element-wise, while numpy.max and numpy.amax can be used on particular axes, or all elements. Why is there more than just numpy.max? Is there some subtlety to this in performance?

(Similarly for min vs. amin vs. minimum)

Question 24

np.max is just an alias for np.amax. This function only works on a single input array and finds the value of maximum element in that entire array (returning a scalar). Alternatively, it takes an axis argument and will find the maximum value along an axis of the input array (returning a new array).

>>> a = np.array([[0, 1, 6],
                  [2, 4, 1]])
>>> np.max(a)
6
>>> np.max(a, axis=0) # max of each column
array([2, 4, 6])

The default behaviour of np.maximum is to take two arrays and compute their element-wise maximum. Here, ‘compatible’ means that one array can be broadcast to the other. For example:

>>> b = np.array([3, 6, 1])
>>> c = np.array([4, 2, 9])
>>> np.maximum(b, c)
array([4, 6, 9])

But np.maximum is also a universal function which means that it has other features and methods which come in useful when working with multidimensional arrays. For example you can compute the cumulative maximum over an array (or a particular axis of the array):

>>> d = np.array([2, 0, 3, -4, -2, 7, 9])
>>> np.maximum.accumulate(d)
array([2, 2, 3, 3, 3, 7, 9])

This is not possible with np.max.

You can make np.maximum imitate np.max to a certain extent when using np.maximum.reduce:

>>> np.maximum.reduce(d)
9
>>> np.max(d)
9

Basic testing suggests the two approaches are comparable in performance; and they should be, as np.max() actually calls np.maximum.reduce to do the computation.

Question 25

You’ve already stated why np.maximum is different – it returns an array that is the element-wise maximum between two arrays.

As for np.amax and np.max: they both call the same function – np.max is just an alias for np.amax, and they compute the maximum of all elements in an array, or along an axis of an array.

In [1]: import numpy as np

In [2]: np.amax
Out[2]: <function numpy.core.fromnumeric.amax>

In [3]: np.max
Out[3]: <function numpy.core.fromnumeric.amax>

Question 26

For completeness, in Numpy there are four maximum related functions. They fall into two different categories:

np.amax/np.max, np.nanmax: for single array order statistics
and np.maximum, np.fmax: for element-wise comparison of two arrays

I. For single array order statistics

NaNs propagator np.amax/np.max and its NaN ignorant counterpart np.nanmax.

np.max is just an alias of np.amax, so they are considered as one function.
```
>>> np.max.__name__
'amax'
>>> np.max is np.amax
True
```

np.max propagates NaNs while np.nanmax ignores NaNs.

>>> np.max([np.nan, 3.14, -1])
nan
>>> np.nanmax([np.nan, 3.14, -1])
3.14

II. For element-wise comparison of two arrays

NaNs propagator np.maximum and its NaNs ignorant counterpart np.fmax.

Both functions require two arrays as the first two positional args to compare with.

# x1 and x2 must be the same shape or can be broadcast
np.maximum(x1, x2, /, ...);
np.fmax(x1, x2, /, ...)

np.maximum propagates NaNs while np.fmax ignores NaNs.

>>> np.maximum([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
array([ nan,  nan, 2.72])
>>> np.fmax([np.nan, 3.14, 0], [np.NINF, np.nan, 2.72])
array([-inf, 3.14, 2.72])

The element-wise functions are np.ufunc(Universal Function), which means they have some special properties that normal Numpy function don’t have.

>>> type(np.maximum)
<class 'numpy.ufunc'>
>>> type(np.fmax)
<class 'numpy.ufunc'>
>>> #---------------#
>>> type(np.max)
<class 'function'>
>>> type(np.nanmax)
<class 'function'>

And finally, the same rules apply to the four minimum related functions:

np.amin/np.min, np.nanmin;
and np.minimum, np.fmin.

Question 27

np.maximum not only compares elementwise but also compares array elementwise with single value

>>>np.maximum([23, 14, 16, 20, 25], 18)
array([23, 18, 18, 20, 25])

Question 28

I need an algorithm that can give me positions around a sphere for N points (less than 20, probably) that vaguely spreads them out. There’s no need for “perfection”, but I just need it so none of them are bunched together.

This question provided good code, but I couldn’t find a way to make this uniform, as this seemed 100% randomized.
This blog post recommended had two ways allowing input of number of points on the sphere, but the Saff and Kuijlaars algorithm is exactly in psuedocode I could transcribe, and the code example I found contained “node[k]”, which I couldn’t see explained and ruined that possibility. The second blog example was the Golden Section Spiral, which gave me strange, bunched up results, with no clear way to define a constant radius.
This algorithm from this question seems like it could possibly work, but I can’t piece together what’s on that page into psuedocode or anything.

A few other question threads I came across spoke of randomized uniform distribution, which adds a level of complexity I’m not concerned about. I apologize that this is such a silly question, but I wanted to show that I’ve truly looked hard and still come up short.

So, what I’m looking for is simple pseudocode to evenly distribute N points around a unit sphere, that either returns in spherical or Cartesian coordinates. Even better if it can even distribute with a bit of randomization (think planets around a star, decently spread out, but with room for leeway).

Question 29

In this example code node[k] is just the kth node. You are generating an array N points and node[k] is the kth (from 0 to N-1). If that is all that is confusing you, hopefully you can use that now.

(in other words, k is an array of size N that is defined before the code fragment starts, and which contains a list of the points).

Alternatively, building on the other answer here (and using Python):

> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
    lon = 360 * ((x+0.5) / nx)
    for y in range(ny):                                                         
        midpt = (y+0.5) / ny                                                    
        lat = 180 * asin(2*((y+0.5)/ny-0.5))                                    
        print lon,lat                                                           
> python2.7 ll.py                                                      
45.0 -166.91313924                                                              
45.0 -74.0730322921                                                             
45.0 0.0                                                                        
45.0 74.0730322921                                                              
45.0 166.91313924                                                               
135.0 -166.91313924                                                             
135.0 -74.0730322921                                                            
135.0 0.0                                                                       
135.0 74.0730322921                                                             
135.0 166.91313924                                                              
225.0 -166.91313924                                                             
225.0 -74.0730322921                                                            
225.0 0.0                                                                       
225.0 74.0730322921                                                             
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924

If you plot that, you’ll see that the vertical spacing is larger near the poles so that each point is situated in about the same total area of space (near the poles there’s less space “horizontally”, so it gives more “vertically”).

This isn’t the same as all points having about the same distance to their neighbours (which is what I think your links are talking about), but it may be sufficient for what you want and improves on simply making a uniform lat/lon grid.

Question 30

The Fibonacci sphere algorithm is great for this. It is fast and gives results that at a glance will easily fool the human eye. You can see an example done with processing which will show the result over time as points are added. Here’s another great interactive example made by @gman. And here’s a simple implementation in python.

import math


def fibonacci_sphere(samples=1):

    points = []
    phi = math.pi * (3. - math.sqrt(5.))  # golden angle in radians

    for i in range(samples):
        y = 1 - (i / float(samples - 1)) * 2  # y goes from 1 to -1
        radius = math.sqrt(1 - y * y)  # radius at y

        theta = phi * i  # golden angle increment

        x = math.cos(theta) * radius
        z = math.sin(theta) * radius

        points.append((x, y, z))

    return points

1000 samples gives you this:

enter image description here

Question 31

The golden spiral method

You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”

So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.

Primer: sunflower spirals on the unit disk

To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2 and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on.^{(Note 1.)}

The algorithm for even spacing on a disk is,

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

and it produces results that look like (n=100 and n=1000):

Spacing the points radially

The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? ^{(Note 2.)}

Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.

Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?

First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
Then calculate the CDF’s inverse function F^-1(z).
You will find that Z = F^-1(U) is distributed according to the target density. ^{(Note 3).}

Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ so let’s integrate that out; for the unit disk we are left with F(r) = r². So the inverse function is F^-1(u) = u^1/2, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().

Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95. We uniformly sample r to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.

Now doing the sunflower on a sphere

The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn’t enter into this because we’re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I’ll use mathematicians’ coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.

Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 – 2 u), but we sample uniformly instead of randomly, so we instead use φ_k = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()

Again for n=100 and n=1000 the results look like:

Further research

I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R₂ sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps as-is but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.

Notes

Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:
```
def continued_fraction(r):
    while r != 0:
        n = floor(r)
        yield n
        r = 1/(r - n)
```
Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.
For folks who are not so familiar with NumPy — all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars — those apply to all components in parallel.
The proof is simple once you know that this is the result. If you ask what’s the probability that z < Z < z + dz, this is the same as asking what’s the probability that z < F^-1(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.

Question 32

This is known as packing points on a sphere, and there is no (known) general, perfect solution. However, there are plenty of imperfect solutions. The three most popular seem to be:

Create a simulation. Treat each point as an electron constrained to a sphere, then run a simulation for a certain number of steps. The electrons’ repulsion will naturally tend the system to a more stable state, where the points are about as far away from each other as they can get.
Hypercube rejection. This fancy-sounding method is actually really simple: you uniformly choose points (much more than n of them) inside of the cube surrounding the sphere, then reject the points outside of the sphere. Treat the remaining points as vectors, and normalize them. These are your “samples” – choose n of them using some method (randomly, greedy, etc).
Spiral approximations. You trace a spiral around a sphere, and evenly-distribute the points around the spiral. Because of the mathematics involved, these are more complicated to understand than the simulation, but much faster (and probably involving less code). The most popular seems to be by Saff, et al.

A lot more information about this problem can be found here

Question 33

What you are looking for is called a spherical covering. The spherical covering problem is very hard and solutions are unknown except for small numbers of points. One thing that is known for sure is that given n points on a sphere, there always exist two points of distance d = (4-csc^2(\pi n/6(n-2)))^(1/2) or closer.

If you want a probabilistic method for generating points uniformly distributed on a sphere, it’s easy: generate points in space uniformly by Gaussian distribution (it’s built into Java, not hard to find the code for other languages). So in 3-dimensional space, you need something like

Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };

Then project the point onto the sphere by normalizing its distance from the origin

double norm = Math.sqrt( (p[0])^2 + (p[1])^2 + (p[2])^2 ); 
double[] sphereRandomPoint = { p[0]/norm, p[1]/norm, p[2]/norm };

The Gaussian distribution in n dimensions is spherically symmetric so the projection onto the sphere is uniform.

Of course, there’s no guarantee that the distance between any two points in a collection of uniformly generated points will be bounded below, so you can use rejection to enforce any such conditions that you might have: probably it’s best to generate the whole collection and then reject the whole collection if necessary. (Or use “early rejection” to reject the whole collection you’ve generated so far; just don’t keep some points and drop others.) You can use the formula for d given above, minus some slack, to determine the min distance between points below which you will reject a set of points. You’ll have to calculate n choose 2 distances, and the probability of rejection will depend on the slack; it’s hard to say how, so run a simulation to get a feel for the relevant statistics.

Question 34

This answer is based on the same ‘theory’ that is outlined well by this answer

I’m adding this answer as:
— None of the other options fit the ‘uniformity’ need ‘spot-on’ (or not obviously-clearly so). (Noting to get the planet like distribution looking behavior particurally wanted in the original ask, you just reject from the finite list of the k uniformly created points at random (random wrt the index count in the k items back).)
–The closest other impl forced you to decide the ‘N’ by ‘angular axis’, vs. just ‘one value of N’ across both angular axis values ( which at low counts of N is very tricky to know what may, or may not matter (e.g. you want ‘5’ points — have fun ) )
–Furthermore, it’s very hard to ‘grok’ how to differentiate between the other options without any imagery, so here’s what this option looks like (below), and the ready-to-run implementation that goes with it.

with N at 20:

enter image description here
and then N at 80:

here’s the ready-to-run python3 code, where the emulation is that same source: ” http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ” found by others. ( The plotting I’ve included, that fires when run as ‘main,’ is taken from: http://www.scipy.org/Cookbook/Matplotlib/mplot3D )

from math import cos, sin, pi, sqrt

def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
    """ each point you get will be of form 'x, y, z'; in cartesian coordinates
        eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0 
        ------------
        converted from:  http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ) 
    """
    dlong = pi*(3.0-sqrt(5.0))  # ~2.39996323 
    dz   =  2.0/numberOfPoints
    long =  0.0
    z    =  1.0 - dz/2.0
    ptsOnSphere =[]
    for k in range( 0, numberOfPoints): 
        r    = sqrt(1.0-z*z)
        ptNew = (cos(long)*r, sin(long)*r, z)
        ptsOnSphere.append( ptNew )
        z    = z - dz
        long = long + dlong
    return ptsOnSphere

if __name__ == '__main__':                
    ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)    

    #toggle True/False to print them
    if( True ):    
        for pt in ptsOnSphere:  print( pt)

    #toggle True/False to plot them
    if(True):
        from numpy import *
        import pylab as p
        import mpl_toolkits.mplot3d.axes3d as p3

        fig=p.figure()
        ax = p3.Axes3D(fig)

        x_s=[];y_s=[]; z_s=[]

        for pt in ptsOnSphere:
            x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])

        ax.scatter3D( array( x_s), array( y_s), array( z_s) )                
        ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
        p.show()
        #end

tested at low counts (N in 2, 5, 7, 13, etc) and seems to work ‘nice’

Question 35

Try:

function sphere ( N:float,k:int):Vector3 {
    var inc =  Mathf.PI  * (3 - Mathf.Sqrt(5));
    var off = 2 / N;
    var y = k * off - 1 + (off / 2);
    var r = Mathf.Sqrt(1 - y*y);
    var phi = k * inc;
    return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r); 
};

The above function should run in loop with N loop total and k loop current iteration.

It is based on a sunflower seeds pattern, except the sunflower seeds are curved around into a half dome, and again into a sphere.

Here is a picture, except I put the camera half way inside the sphere so it looks 2d instead of 3d because the camera is same distance from all points. http://3.bp.blogspot.com/-9lbPHLccQHA/USXf88_bvVI/AAAAAAAAADY/j7qhQsSZsA8/s640/sphere.jpg

Question 36

Healpix solves a closely related problem (pixelating the sphere with equal area pixels):

http://healpix.sourceforge.net/

It’s probably overkill, but maybe after looking at it you’ll realize some of it’s other nice properties are interesting to you. It’s way more than just a function that outputs a point cloud.

I landed here trying to find it again; the name “healpix” doesn’t exactly evoke spheres…

Question 37

with small numbers of points you could run a simulation:

from random import random,randint
r = 10
n = 20
best_closest_d = 0
best_points = []
points = [(r,0,0) for i in range(n)]
for simulation in range(10000):
    x = random()*r
    y = random()*r
    z = r-(x**2+y**2)**0.5
    if randint(0,1):
        x = -x
    if randint(0,1):
        y = -y
    if randint(0,1):
        z = -z
    closest_dist = (2*r)**2
    closest_index = None
    for i in range(n):
        for j in range(n):
            if i==j:
                continue
            p1,p2 = points[i],points[j]
            x1,y1,z1 = p1
            x2,y2,z2 = p2
            d = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
            if d < closest_dist:
                closest_dist = d
                closest_index = i
    if simulation % 100 == 0:
        print simulation,closest_dist
    if closest_dist > best_closest_d:
        best_closest_d = closest_dist
        best_points = points[:]
    points[closest_index]=(x,y,z)


print best_points
>>> best_points
[(9.921692138442777, -9.930808529773849, 4.037839326088124),
 (5.141893371460546, 1.7274947332807744, -4.575674650522637),
 (-4.917695758662436, -1.090127967097737, -4.9629263893193745),
 (3.6164803265540666, 7.004158551438312, -2.1172868271109184),
 (-9.550655088997003, -9.580386054762917, 3.5277052594769422),
 (-0.062238110294250415, 6.803105171979587, 3.1966101417463655),
 (-9.600996012203195, 9.488067284474834, -3.498242301168819),
 (-8.601522086624803, 4.519484132245867, -0.2834204048792728),
 (-1.1198210500791472, -2.2916581379035694, 7.44937337008726),
 (7.981831370440529, 8.539378431788634, 1.6889099589074377),
 (0.513546008372332, -2.974333486904779, -6.981657873262494),
 (-4.13615438946178, -6.707488383678717, 2.1197605651446807),
 (2.2859494919024326, -8.14336582650039, 1.5418694699275672),
 (-7.241410895247996, 9.907335206038226, 2.271647103735541),
 (-9.433349952523232, -7.999106443463781, -2.3682575660694347),
 (3.704772125650199, 1.0526567864085812, 6.148581714099761),
 (-3.5710511242327048, 5.512552040316693, -3.4318468250897647),
 (-7.483466337225052, -1.506434920354559, 2.36641535124918),
 (7.73363824231576, -8.460241422163824, -1.4623228616326003),
 (10, 0, 0)]

Question 38

Take the two largest factors of your N, if N==20 then the two largest factors are {5,4}, or, more generally {a,b}. Calculate

dlat  = 180/(a+1)
dlong = 360/(b+1})

Put your first point at {90-dlat/2,(dlong/2)-180}, your second at {90-dlat/2,(3*dlong/2)-180}, your 3rd at {90-dlat/2,(5*dlong/2)-180}, until you’ve tripped round the world once, by which time you’ve got to about {75,150} when you go next to {90-3*dlat/2,(dlong/2)-180}.

Obviously I’m working this in degrees on the surface of the spherical earth, with the usual conventions for translating +/- to N/S or E/W. And obviously this gives you a completely non-random distribution, but it is uniform and the points are not bunched together.

To add some degree of randomness, you could generate 2 normally-distributed (with mean 0 and std dev of {dlat/3, dlong/3} as appropriate) and add them to your uniformly distributed points.

Question 39

edit: This does not answer the question the OP meant to ask, leaving it here in case people find it useful somehow.

We use the multiplication rule of probability, combined with infinitessimals. This results in 2 lines of code to achieve your desired result:

longitude: φ = uniform([0,2pi))
azimuth:   θ = -arcsin(1 - 2*uniform([0,1]))

(defined in the following coordinate system:)

enter image description here

Your language typically has a uniform random number primitive. For example in python you can use random.random() to return a number in the range [0,1). You can multiply this number by k to get a random number in the range [0,k). Thus in python, uniform([0,2pi)) would mean random.random()*2*math.pi.

Proof

Now we can’t assign θ uniformly, otherwise we’d get clumping at the poles. We wish to assign probabilities proportional to the surface area of the spherical wedge (the θ in this diagram is actually φ):

enter image description here

An angular displacement dφ at the equator will result in a displacement of dφ*r. What will that displacement be at an arbitrary azimuth θ? Well, the radius from the z-axis is r*sin(θ), so the arclength of that “latitude” intersecting the wedge is dφ * r*sin(θ). Thus we calculate the cumulative distribution of the area to sample from it, by integrating the area of the slice from the south pole to the north pole.

enter image description here (where stuff=dφ*r)

We will now attempt to get the inverse of the CDF to sample from it: http://en.wikipedia.org/wiki/Inverse_transform_sampling

First we normalize by dividing our almost-CDF by its maximum value. This has the side-effect of cancelling out the dφ and r.

azimuthalCDF: cumProb = (sin(θ)+1)/2 from -pi/2 to pi/2

inverseCDF: θ = -sin^(-1)(1 - 2*cumProb)

Thus:

let x by a random float in range [0,1]
θ = -arcsin(1-2*x)

Question 40

OR… to place 20 points, compute the centers of the icosahedronal faces. For 12 points, find the vertices of the icosahedron. For 30 points, the mid point of the edges of the icosahedron. you can do the same thing with the tetrahedron, cube, dodecahedron and octahedrons: one set of points is on the vertices, another on the center of the face and another on the center of the edges. They cannot be mixed, however.

Question 41

# create uniform spiral grid
numOfPoints = varargin[0]
vxyz = zeros((numOfPoints,3),dtype=float)
sq0 = 0.00033333333**2
sq2 = 0.9999998**2
sumsq = 2*sq0 + sq2
vxyz[numOfPoints -1] = array([(sqrt(sq0/sumsq)), 
                              (sqrt(sq0/sumsq)), 
                              (-sqrt(sq2/sumsq))])
vxyz[0] = -vxyz[numOfPoints -1] 
phi2 = sqrt(5)*0.5 + 2.5
rootCnt = sqrt(numOfPoints)
prevLongitude = 0
for index in arange(1, (numOfPoints -1), 1, dtype=float):
  zInc = (2*index)/(numOfPoints) -1
  radius = sqrt(1-zInc**2)

  longitude = phi2/(rootCnt*radius)
  longitude = longitude + prevLongitude
  while (longitude > 2*pi): 
    longitude = longitude - 2*pi

  prevLongitude = longitude
  if (longitude > pi):
    longitude = longitude - 2*pi

  latitude = arccos(zInc) - pi/2
  vxyz[index] = array([ (cos(latitude) * cos(longitude)) ,
                        (cos(latitude) * sin(longitude)), 
                        sin(latitude)])

Question 42

@robert king It’s a really nice solution but has some sloppy bugs in it. I know it helped me a lot though, so never mind the sloppiness. :) Here is a cleaned up version….

from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2

lat_dist = lambda lat, R=1.0: R*(1-sin(lat))

#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
    if -halfpi > lat or lat > halfpi:
        raise ValueError("lat must be between -halfpi and halfpi")
    return 2 * pi * R ** 2 * (1-sin(lat))

sphere_lonarea = lambda lon, R=1.0: \
        4 * pi * R ** 2 * lon / twopi

#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
#    = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
        (sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi


def test_sphere(n_lats=10, n_lons=19, radius=540.0):
    total_area = 0.0
    for i_lons in range(n_lons):
        lon0 = twopi * float(i_lons) / n_lons
        lon1 = twopi * float(i_lons+1) / n_lons
        for i_lats in range(n_lats):
            lat0 = asin(2 * float(i_lats) / n_lats - 1)
            lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
            area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
            print("{:} {:}: {:9.4f} to  {:9.4f}, {:9.4f} to  {:9.4f} => area {:10.4f}"
                    .format(i_lats, i_lons
                    , degrees(lat0), degrees(lat1)
                    , degrees(lon0), degrees(lon1)
                    , area))
            total_area += area
    print("total_area = {:10.4f} (difference of {:10.4f})"
            .format(total_area, abs(total_area) - sphere_area(radius)))

test_sphere()

Question 43

This works and it’s deadly simple. As many points as you want:

    private function moveTweets():void {


        var newScale:Number=Scale(meshes.length,50,500,6,2);
        trace("new scale:"+newScale);


        var l:Number=this.meshes.length;
        var tweetMeshInstance:TweetMesh;
        var destx:Number;
        var desty:Number;
        var destz:Number;
        for (var i:Number=0;i<this.meshes.length;i++){

            tweetMeshInstance=meshes[i];

            var phi:Number = Math.acos( -1 + ( 2 * i ) / l );
            var theta:Number = Math.sqrt( l * Math.PI ) * phi;

            tweetMeshInstance.origX = (sphereRadius+5) * Math.cos( theta ) * Math.sin( phi );
            tweetMeshInstance.origY= (sphereRadius+5) * Math.sin( theta ) * Math.sin( phi );
            tweetMeshInstance.origZ = (sphereRadius+5) * Math.cos( phi );

            destx=sphereRadius * Math.cos( theta ) * Math.sin( phi );
            desty=sphereRadius * Math.sin( theta ) * Math.sin( phi );
            destz=sphereRadius * Math.cos( phi );

            tweetMeshInstance.lookAt(new Vector3D());


            TweenMax.to(tweetMeshInstance, 1, {scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});

        }

    }
    private function onLookAtTween(theMesh:TweetMesh):void {
        theMesh.lookAt(new Vector3D());
    }

Question 44

stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16

This returns the following error:

Traceback (most recent call last):  
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'

I know that eval can work around this, but isn’t there a better and – more importantly – safer method to evaluate a mathematical expression that is being stored in a string?

Question 45

Pyparsing can be used to parse mathematical expressions. In particular, fourFn.py shows how to parse basic arithmetic expressions. Below, I’ve rewrapped fourFn into a numeric parser class for easier reuse.

from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
                       ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator

__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''


class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example

    '''

    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')

    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr <<  general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   "sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}

    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)

    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val

You can use it like this

nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0

result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872

Question 46

`eval` is evil

eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory

Note: even if you use set __builtins__ to None it still might be possible to break out using introspection:

eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})

Evaluate arithmetic expression using `ast`

import ast
import operator as op

# supported operators
operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
             ast.USub: op.neg}

def eval_expr(expr):
    """
    >>> eval_expr('2^6')
    4
    >>> eval_expr('2**6')
    64
    >>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
    -5.0
    """
    return eval_(ast.parse(expr, mode='eval').body)

def eval_(node):
    if isinstance(node, ast.Num): # <number>
        return node.n
    elif isinstance(node, ast.BinOp): # <left> <operator> <right>
        return operators[type(node.op)](eval_(node.left), eval_(node.right))
    elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
        return operators[type(node.op)](eval_(node.operand))
    else:
        raise TypeError(node)

You can easily limit allowed range for each operation or any intermediate result, e.g., to limit input arguments for a**b:

def power(a, b):
    if any(abs(n) > 100 for n in [a, b]):
        raise ValueError((a,b))
    return op.pow(a, b)
operators[ast.Pow] = power

Or to limit magnitude of intermediate results:

import functools

def limit(max_=None):
    """Return decorator that limits allowed returned values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ret = func(*args, **kwargs)
            try:
                mag = abs(ret)
            except TypeError:
                pass # not applicable
            else:
                if mag > max_:
                    raise ValueError(ret)
            return ret
        return wrapper
    return decorator

eval_ = limit(max_=10**100)(eval_)

Example

>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:

Question 47

Some safer alternatives to eval() and sympy.sympify().evalf()^*:

^*SymPy sympify is also unsafe according to the following warning from the documentation.

Warning: Note that this function uses eval, and thus shouldn’t be used on unsanitized input.

Question 48

Okay, so the problem with eval is that it can escape its sandbox too easily, even if you get rid of __builtins__. All the methods for escaping the sandbox come down to using getattr or object.__getattribute__ (via the . operator) to obtain a reference to some dangerous object via some allowed object (''.__class__.__bases__[0].__subclasses__ or similar). getattr is eliminated by setting __builtins__ to None. object.__getattribute__ is the difficult one, since it cannot simply be removed, both because object is immutable and because removing it would break everything. However, __getattribute__ is only accessible via the . operator, so purging that from your input is sufficient to ensure eval cannot escape its sandbox.
In processing formulas, the only valid use of a decimal is when it is preceded or followed by [0-9], so we just remove all other instances of ..

import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})

Note that while python normally treats 1 + 1. as 1 + 1.0, this will remove the trailing . and leave you with 1 + 1. You could add ),, and EOF to the list of things allowed to follow ., but why bother?

Question 49

You can use the ast module and write a NodeVisitor that verifies that the type of each node is part of a whitelist.

import ast, math

locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})

class Visitor(ast.NodeVisitor):
    def visit(self, node):
       if not isinstance(node, self.whitelist):
           raise ValueError(node)
       return super().visit(node)

    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Num, ast.BinOp,
            ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
            ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)

def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#') : raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception: raise ValueError(expr)

Because it works via a whitelist rather than a blacklist, it is safe. The only functions and variables it can access are those you explicitly give it access to. I populated a dict with math-related functions so you can easily provide access to those if you want, but you have to explicitly use it.

If the string attempts to call functions that haven’t been provided, or invoke any methods, an exception will be raised, and it will not be executed.

Because this uses Python’s built in parser and evaluator, it also inherits Python’s precedence and promotion rules as well.

>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0

The above code has only been tested on Python 3.

If desired, you can add a timeout decorator on this function.

Question 50

The reason eval and exec are so dangerous is that the default compile function will generate bytecode for any valid python expression, and the default eval or exec will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.

Instead, you can easily create a simple eval function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, than there is a shortcut.

c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]

The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of d, which is the bytecode for LOAD_CONST, followed by the number of the constant to load (usually the last one in the list), followed by S, which is the bytecode for RETURN_VALUE. If this shortcut doesn’t work, it means that the user input isn’t a constant expression (contains a variable or function call or similar).

This also opens the door to some more sophisticated input formats. For example:

stringExp = "1 + cos(2)"

This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack oriented language, so everything is a simple matter of TOS=stack.pop(); op(TOS); stack.put(TOS) or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simple make your implementation of CALL_FUNCTION only allow functions in a ‘safe’ list.

from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator

globs = {'sin':sin, 'cos':cos}
safe = globs.values()

stack = LifoQueue()

class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))

class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))


def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r now allowed"%func)
    stack.put(func(*args))

def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])

def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])

def RETURN_VALUE(context):
    return stack.get()

opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME
    opmap['RETURN_VALUE']: RETURN_VALUE,
}

def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret

def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))

Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding STORE_FAST and a couple others would allow for input like 'x=5;return x+x or similar, trivially easily. It can even be used to execute user-created functions, so long as the user created functions are themselves executed via VMeval (don’t make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the goto bytecodes, which means changing from a for iterator to while and maintaining a pointer to the current instruction, but isn’t too hard. For resistance to DOS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (BINARY_POWER being the most obvious).

While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn’t require dealing with grammar (compile take anything arbitrarily complicated and reduces it to a sequence of simple instructions).

Question 51

I think I would use eval(), but would first check to make sure the string is a valid mathematical expression, as opposed to something malicious. You could use a regex for the validation.

eval() also takes additional arguments which you can use to restrict the namespace it operates in for greater security.

Question 52

This is a massively late reply, but I think useful for future reference. Rather than write your own math parser (although the pyparsing example above is great) you could use SymPy. I don’t have a lot of experience with it, but it contains a much more powerful math engine than anyone is likely to write for a specific application and the basic expression evaluation is very easy:

>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133

Very cool indeed! A from sympy import * brings in a lot more function support, such as trig functions, special functions, etc., but I’ve avoided that here to show what’s coming from where.

Question 53

[I know this is an old question, but it is worth pointing out new useful solutions as they pop up]

Since python3.6, this capability is now built into the language, coined “f-strings”.

See: PEP 498 — Literal String Interpolation

For example (note the f prefix):

f'{2**4}'
=> '16'

Question 54

Use eval in a clean namespace:

>>> ns = {'__builtins__': None}
>>> eval('2 ** 4', ns)
16

The clean namespace should prevent injection. For instance:

>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'

Otherwise you would get:

>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0

You might want to give access to the math module:

>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011

Question 55

Here’s my solution to the problem without using eval. Works with Python2 and Python3. It doesn’t work with negative numbers.

$ python -m pytest test.py

test.py

from solution import Solutions

class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )

solution.py

class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res
        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul
        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))

        return format(exp)

Question 56

I need to do auto-correlation of a set of numbers, which as I understand it is just the correlation of the set with itself.

I’ve tried it using numpy’s correlate function, but I don’t believe the result, as it almost always gives a vector where the first number is not the largest, as it ought to be.

So, this question is really two questions:

What exactly is numpy.correlate doing?
How can I use it (or something else) to do auto-correlation?

Question 57

To answer your first question, numpy.correlate(a, v, mode) is performing the convolution of a with the reverse of v and giving the results clipped by the specified mode. The definition of convolution, C(t)=∑ _{-∞ < i < ∞} a_iv_t+i where -∞ < t < ∞, allows for results from -∞ to ∞, but you obviously can’t store an infinitely long array. So it has to be clipped, and that is where the mode comes in. There are 3 different modes: full, same, & valid:

“full” mode returns results for every t where both a and v have some overlap.
“same” mode returns a result with the same length as the shortest vector (a or v).
“valid” mode returns results only when a and v completely overlap each other. The documentation for numpy.convolve gives more detail on the modes.

For your second question, I think numpy.correlate is giving you the autocorrelation, it is just giving you a little more as well. The autocorrelation is used to find how similar a signal, or function, is to itself at a certain time difference. At a time difference of 0, the auto-correlation should be the highest because the signal is identical to itself, so you expected that the first element in the autocorrelation result array would be the greatest. However, the correlation is not starting at a time difference of 0. It starts at a negative time difference, closes to 0, and then goes positive. That is, you were expecting:

autocorrelation(a) = ∑ _{-∞ < i < ∞} a_iv_t+i where 0 <= t < ∞

But what you got was:

autocorrelation(a) = ∑ _{-∞ < i < ∞} a_iv_t+i where -∞ < t < ∞

What you need to do is take the last half of your correlation result, and that should be the autocorrelation you are looking for. A simple python function to do that would be:

def autocorr(x):
    result = numpy.correlate(x, x, mode='full')
    return result[result.size/2:]

You will, of course, need error checking to make sure that x is actually a 1-d array. Also, this explanation probably isn’t the most mathematically rigorous. I’ve been throwing around infinities because the definition of convolution uses them, but that doesn’t necessarily apply for autocorrelation. So, the theoretical portion of this explanation may be slightly wonky, but hopefully the practical results are helpful. These pages on autocorrelation are pretty helpful, and can give you a much better theoretical background if you don’t mind wading through the notation and heavy concepts.

Question 58

Auto-correlation comes in two versions: statistical and convolution. They both do the same, except for a little detail: The statistical version is normalized to be on the interval [-1,1]. Here is an example of how you do the statistical one:

def acf(x, length=20):
    return numpy.array([1]+[numpy.corrcoef(x[:-i], x[i:])[0,1]  \
        for i in range(1, length)])

Question 59

Use the numpy.corrcoef function instead of numpy.correlate to calculate the statistical correlation for a lag of t:

def autocorr(x, t=1):
    return numpy.corrcoef(numpy.array([x[:-t], x[t:]]))

Question 60

I think there are 2 things that add confusion to this topic:

statistical v.s. signal processing definition: as others have pointed out, in statistics we normalize auto-correlation into [-1,1].
partial v.s. non-partial mean/variance: when the timeseries shifts at a lag>0, their overlap size will always < original length. Do we use the mean and std of the original (non-partial), or always compute a new mean and std using the ever changing overlap (partial) makes a difference. (There’s probably a formal term for this, but I’m gonna use “partial” for now).

I’ve created 5 functions that compute auto-correlation of a 1d array, with partial v.s. non-partial distinctions. Some use formula from statistics, some use correlate in the signal processing sense, which can also be done via FFT. But all results are auto-correlations in the statistics definition, so they illustrate how they are linked to each other. Code below:

import numpy
import matplotlib.pyplot as plt

def autocorr1(x,lags):
    '''numpy.corrcoef, partial'''

    corr=[1. if l==0 else numpy.corrcoef(x[l:],x[:-l])[0][1] for l in lags]
    return numpy.array(corr)

def autocorr2(x,lags):
    '''manualy compute, non partial'''

    mean=numpy.mean(x)
    var=numpy.var(x)
    xp=x-mean
    corr=[1. if l==0 else numpy.sum(xp[l:]*xp[:-l])/len(x)/var for l in lags]

    return numpy.array(corr)

def autocorr3(x,lags):
    '''fft, pad 0s, non partial'''

    n=len(x)
    # pad 0s to 2n-1
    ext_size=2*n-1
    # nearest power of 2
    fsize=2**numpy.ceil(numpy.log2(ext_size)).astype('int')

    xp=x-numpy.mean(x)
    var=numpy.var(x)

    # do fft and ifft
    cf=numpy.fft.fft(xp,fsize)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real
    corr=corr/var/n

    return corr[:len(lags)]

def autocorr4(x,lags):
    '''fft, don't pad 0s, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean

    cf=numpy.fft.fft(xp)
    sf=cf.conjugate()*cf
    corr=numpy.fft.ifft(sf).real/var/len(x)

    return corr[:len(lags)]

def autocorr5(x,lags):
    '''numpy.correlate, non partial'''
    mean=x.mean()
    var=numpy.var(x)
    xp=x-mean
    corr=numpy.correlate(xp,xp,'full')[len(x)-1:]/var/len(x)

    return corr[:len(lags)]


if __name__=='__main__':

    y=[28,28,26,19,16,24,26,24,24,29,29,27,31,26,38,23,13,14,28,19,19,\
            17,22,2,4,5,7,8,14,14,23]
    y=numpy.array(y).astype('float')

    lags=range(15)
    fig,ax=plt.subplots()

    for funcii, labelii in zip([autocorr1, autocorr2, autocorr3, autocorr4,
        autocorr5], ['np.corrcoef, partial', 'manual, non-partial',
            'fft, pad 0s, non-partial', 'fft, no padding, non-partial',
            'np.correlate, non-partial']):

        cii=funcii(y,lags)
        print(labelii)
        print(cii)
        ax.plot(lags,cii,label=labelii)

    ax.set_xlabel('lag')
    ax.set_ylabel('correlation coefficient')
    ax.legend()
    plt.show()

Here is the output figure:

We don’t see all 5 lines because 3 of them overlap (at the purple). The overlaps are all non-partial auto-correlations. This is because computations from the signal processing methods (np.correlate, FFT) don’t compute a different mean/std for each overlap.

Also note that the fft, no padding, non-partial (red line) result is different, because it didn’t pad the timeseries with 0s before doing FFT, so it’s circular FFT. I can’t explain in detail why, that’s what I learned from elsewhere.

Question 61

As I just ran into the same problem, I would like to share a few lines of code with you. In fact there are several rather similar posts about autocorrelation in stackoverflow by now. If you define the autocorrelation as a(x, L) = sum(k=0,N-L-1)((xk-xbar)*(x(k+L)-xbar))/sum(k=0,N-1)((xk-xbar)**2) [this is the definition given in IDL’s a_correlate function and it agrees with what I see in answer 2 of question #12269834], then the following seems to give the correct results:

import numpy as np
import matplotlib.pyplot as plt

# generate some data
x = np.arange(0.,6.12,0.01)
y = np.sin(x)
# y = np.random.uniform(size=300)
yunbiased = y-np.mean(y)
ynorm = np.sum(yunbiased**2)
acor = np.correlate(yunbiased, yunbiased, "same")/ynorm
# use only second half
acor = acor[len(acor)/2:]

plt.plot(acor)
plt.show()

As you see I have tested this with a sin curve and a uniform random distribution, and both results look like I would expect them. Note that I used mode="same" instead of mode="full" as the others did.

Question 62

Your question 1 has been already extensively discussed in several excellent answers here.

I thought to share with you a few lines of code that allow you to compute the autocorrelation of a signal based only on the mathematical properties of the autocorrelation. That is, the autocorrelation may be computed in the following way:

subtract the mean from the signal and obtain an unbiased signal
compute the Fourier transform of the unbiased signal
compute the power spectral density of the signal, by taking the square norm of each value of the Fourier transform of the unbiased signal
compute the inverse Fourier transform of the power spectral density
normalize the inverse Fourier transform of the power spectral density by the sum of the squares of the unbiased signal, and take only half of the resulting vector

The code to do this is the following:

def autocorrelation (x) :
    """
    Compute the autocorrelation of the signal, based on the properties of the
    power spectral density of the signal.
    """
    xp = x-np.mean(x)
    f = np.fft.fft(xp)
    p = np.array([np.real(v)**2+np.imag(v)**2 for v in f])
    pi = np.fft.ifft(p)
    return np.real(pi)[:x.size/2]/np.sum(xp**2)

Question 63

I’m a computational biologist, and when I had to compute the auto/cross-correlations between couples of time series of stochastic processes I realized that np.correlate was not doing the job I needed.

Indeed, what seems to be missing from np.correlate is the averaging over all the possible couples of time points at distance 𝜏.

Here is how I defined a function doing what I needed:

def autocross(x, y):
    c = np.correlate(x, y, "same")
    v = [c[i]/( len(x)-abs( i - (len(x)/2)  ) ) for i in range(len(c))]
    return v

It seems to me none of the previous answers cover this instance of auto/cross-correlation: hope this answer may be useful to somebody working on stochastic processes like me.

Question 64

I use talib.CORREL for autocorrelation like this, I suspect you could do the same with other packages:

def autocorrelate(x, period):

    # x is a deep indicator array 
    # period of sample and slices of comparison

    # oldest data (period of input array) may be nan; remove it
    x = x[-np.count_nonzero(~np.isnan(x)):]
    # subtract mean to normalize indicator
    x -= np.mean(x)
    # isolate the recent sample to be autocorrelated
    sample = x[-period:]
    # create slices of indicator data
    correls = []
    for n in range((len(x)-1), period, -1):
        alpha = period + n
        slices = (x[-alpha:])[:period]
        # compare each slice to the recent sample
        correls.append(ta.CORREL(slices, sample, period)[-1])
    # fill in zeros for sample overlap period of recent correlations    
    for n in range(period,0,-1):
        correls.append(0)
    # oldest data (autocorrelation period) will be nan; remove it
    correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])      

    return correls

# CORRELATION OF BEST FIT
# the highest value correlation    
max_value = np.max(correls)
# index of the best correlation
max_index = np.argmax(correls)

问题：为什么Python的math.ceil（）和math.floor（）操作返回浮点数而不是整数？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：Python部门

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：numpy max vs amax vs maximum

回答 0

回答 1

回答 2

一，单阵列订单统计

二。用于两个数组的元素比较

I. For single array order statistics

II. For element-wise comparison of two arrays

回答 3

问题：在球体上平均分配n个点

回答 0

回答 1

回答 2

黄金螺旋法

底漆：单位盘上的向日葵螺旋

径向间隔点

现在在球上做向日葵

进一步的研究

笔记

The golden spiral method

Primer: sunflower spirals on the unit disk

Spacing the points radially

Now doing the sunflower on a sphere

Further research

Notes

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

问题：评估字符串中的数学表达式

回答 0

回答 1

eval 是邪恶的

使用以下方法评估算术表达式 ast

例

eval is evil

Evaluate arithmetic expression using ast

Example

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：如何使用numpy.correlate进行自相关？

回答 0

回答 1

`eval` 是邪恶的

使用以下方法评估算术表达式 `ast`

`eval` is evil

Evaluate arithmetic expression using `ast`