Python 实用宝典

Question 1

I have a list of 3-tuples representing a set of points in 3D space. I want to plot a surface that covers all these points.

The plot_surface function in the mplot3d package requires as arguments X,Y and Z to be 2d arrays. Is plot_surface the right function to plot surface and how do I transform my data into the required format?

data = [(x1,y1,z1),(x2,y2,z2),.....,(xn,yn,zn)]

Question 2

For surfaces it’s a bit different than a list of 3-tuples, you should pass in a grid for the domain in 2d arrays.

If all you have is a list of 3d points, rather than some function f(x, y) -> z, then you will have a problem because there are multiple ways to triangulate that 3d point cloud into a surface.

Here’s a smooth surface example:

import numpy as np
from mpl_toolkits.mplot3d import Axes3D  
# Axes3D import has side effects, it enables using projection='3d' in add_subplot
import matplotlib.pyplot as plt
import random

def fun(x, y):
    return x**2 + y

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = y = np.arange(-3.0, 3.0, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array(fun(np.ravel(X), np.ravel(Y)))
Z = zs.reshape(X.shape)

ax.plot_surface(X, Y, Z)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

plt.show()

Question 3

You can read data direct from some file and plot

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
from sys import argv

x,y,z = np.loadtxt('your_file', unpack=True)

fig = plt.figure()
ax = Axes3D(fig)
surf = ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.1)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.savefig('teste.pdf')
plt.show()

If necessary you can pass vmin and vmax to define the colorbar range, e.g.

surf = ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.1, vmin=0, vmax=2000)

Bonus Section

I was wondering how to do some interactive plots, in this case with artificial data

from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import Image

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

def plot(i):

    fig = plt.figure()
    ax = plt.axes(projection='3d')

    theta = 2 * np.pi * np.random.random(1000)
    r = i * np.random.random(1000)
    x = np.ravel(r * np.sin(theta))
    y = np.ravel(r * np.cos(theta))
    z = f(x, y)

    ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
    fig.tight_layout()

interactive_plot = interactive(plot, i=(2, 10))
interactive_plot

Question 4

I just came across this same problem. I have evenly spaced data that is in 3 1-D arrays instead of the 2-D arrays that matplotlib‘s plot_surface wants. My data happened to be in a pandas.DataFrame so here is the matplotlib.plot_surface example with the modifications to plot 3 1-D arrays.

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
    linewidth=0, antialiased=False)
ax.set_zlim(-1.01, 1.01)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)
plt.title('Original Code')

That is the original example. Adding this next bit on creates the same plot from 3 1-D arrays.

# ~~~~ MODIFICATION TO EXAMPLE BEGINS HERE ~~~~ #
import pandas as pd
from scipy.interpolate import griddata
# create 1D-arrays from the 2D-arrays
x = X.reshape(1600)
y = Y.reshape(1600)
z = Z.reshape(1600)
xyz = {'x': x, 'y': y, 'z': z}

# put the data into a pandas DataFrame (this is what my data looks like)
df = pd.DataFrame(xyz, index=range(len(xyz['x']))) 

# re-create the 2D-arrays
x1 = np.linspace(df['x'].min(), df['x'].max(), len(df['x'].unique()))
y1 = np.linspace(df['y'].min(), df['y'].max(), len(df['y'].unique()))
x2, y2 = np.meshgrid(x1, y1)
z2 = griddata((df['x'], df['y']), df['z'], (x2, y2), method='cubic')

fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.coolwarm,
    linewidth=0, antialiased=False)
ax.set_zlim(-1.01, 1.01)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)
plt.title('Meshgrid Created from 3 1D Arrays')
# ~~~~ MODIFICATION TO EXAMPLE ENDS HERE ~~~~ #

plt.show()

Here are the resulting figures:

Question 5

Just to chime in, Emanuel had the answer that I (and probably many others) are looking for. If you have 3d scattered data in 3 separate arrays, pandas is an incredible help and works much better than the other options. To elaborate, suppose your x,y,z are some arbitrary variables. In my case these were c,gamma, and errors because I was testing a support vector machine. There are many potential choices to plot the data:

scatter3D(cParams, gammas, avg_errors_array) – this works but is overly simplistic
plot_wireframe(cParams, gammas, avg_errors_array) – this works, but will look ugly if your data isn’t sorted nicely, as is potentially the case with massive chunks of real scientific data
ax.plot3D(cParams, gammas, avg_errors_array) – similar to wireframe

Wireframe plot of the data

3d scatter of the data

The code looks like this:

    fig = plt.figure()
    ax = fig.gca(projection='3d')
    ax.set_xlabel('c parameter')
    ax.set_ylabel('gamma parameter')
    ax.set_zlabel('Error rate')
    #ax.plot_wireframe(cParams, gammas, avg_errors_array)
    #ax.plot3D(cParams, gammas, avg_errors_array)
    #ax.scatter3D(cParams, gammas, avg_errors_array, zdir='z',cmap='viridis')

    df = pd.DataFrame({'x': cParams, 'y': gammas, 'z': avg_errors_array})
    surf = ax.plot_trisurf(df.x, df.y, df.z, cmap=cm.jet, linewidth=0.1)
    fig.colorbar(surf, shrink=0.5, aspect=5)    
    plt.savefig('./plots/avgErrs_vs_C_andgamma_type_%s.png'%(k))
    plt.show()

Here is the final output:

Question 6

check the official example. X,Y and Z are indeed 2d arrays, numpy.meshgrid() is a simple way to get 2d x,y mesh out of 1d x and y values.

http://matplotlib.sourceforge.net/mpl_examples/mplot3d/surface3d_demo.py

here’s pythonic way to convert your 3-tuples to 3 1d arrays.

data = [(1,2,3), (10,20,30), (11, 22, 33), (110, 220, 330)]
X,Y,Z = zip(*data)
In [7]: X
Out[7]: (1, 10, 11, 110)
In [8]: Y
Out[8]: (2, 20, 22, 220)
In [9]: Z
Out[9]: (3, 30, 33, 330)

Here’s mtaplotlib delaunay triangulation (interpolation), it converts 1d x,y,z into something compliant (?):

http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.griddata

Question 7

In Matlab I did something similar using the delaunay function on the x, y coords only (not the z), then plotting with trimesh or trisurf, using z as the height.

SciPy has the Delaunay class, which is based on the same underlying QHull library that the Matlab’s delaunay function is, so you should get identical results.

From there, it should be a few lines of code to convert this Plotting 3D Polygons in python-matplotlib example into what you wish to achieve, as Delaunay gives you the specification of each triangular polygon.

Question 8

Just to add some further thoughts which may help others with irregular domain type problems. For a situation where the user has three vectors/lists, x,y,z representing a 2D solution where z is to be plotted on a rectangular grid as a surface, the ‘plot_trisurf()’ comments by ArtifixR are applicable. A similar example but with non rectangular domain is:

import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D 

# problem parameters
nu = 50; nv = 50
u = np.linspace(0, 2*np.pi, nu,) 
v = np.linspace(0, np.pi, nv,)

xx = np.zeros((nu,nv),dtype='d')
yy = np.zeros((nu,nv),dtype='d')
zz = np.zeros((nu,nv),dtype='d')

# populate x,y,z arrays
for i in range(nu):
  for j in range(nv):
    xx[i,j] = np.sin(v[j])*np.cos(u[i])
    yy[i,j] = np.sin(v[j])*np.sin(u[i])
    zz[i,j] = np.exp(-4*(xx[i,j]**2 + yy[i,j]**2)) # bell curve

# convert arrays to vectors
x = xx.flatten()
y = yy.flatten()
z = zz.flatten()

# Plot solution surface
fig = plt.figure(figsize=(6,6))
ax = Axes3D(fig)
ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0,
                antialiased=False)
ax.set_title(r'trisurf example',fontsize=16, color='k')
ax.view_init(60, 35)
fig.tight_layout()
plt.show()

The above code produces:

However, this may not solve all problems, particular where the problem is defined on an irregular domain. Also, in the case where the domain has one or more concave areas, the delaunay triangulation may result in generating spurious triangles exterior to the domain. In such cases, these rogue triangles have to be removed from the triangulation in order to achieve the correct surface representation. For these situations, the user may have to explicitly include the delaunay triangulation calculation so that these triangles can be removed programmatically. Under these circumstances, the following code could replace the previous plot code:


import matplotlib.tri as mtri 
import scipy.spatial
# plot final solution
pts = np.vstack([x, y]).T
tess = scipy.spatial.Delaunay(pts) # tessilation

# Create the matplotlib Triangulation object
xx = tess.points[:, 0]
yy = tess.points[:, 1]
tri = tess.vertices # or tess.simplices depending on scipy version

#############################################################
# NOTE: If 2D domain has concave properties one has to
#       remove delaunay triangles that are exterior to the domain.
#       This operation is problem specific!
#       For simple situations create a polygon of the
#       domain from boundary nodes and identify triangles
#       in 'tri' outside the polygon. Then delete them from
#       'tri'.
#       <ADD THE CODE HERE>
#############################################################

triDat = mtri.Triangulation(x=pts[:, 0], y=pts[:, 1], triangles=tri)

# Plot solution surface
fig = plt.figure(figsize=(6,6))
ax = fig.gca(projection='3d')
ax.plot_trisurf(triDat, z, linewidth=0, edgecolor='none',
                antialiased=False, cmap=cm.jet)
ax.set_title(r'trisurf with delaunay triangulation', 
          fontsize=16, color='k')
plt.show()

Example plots are given below illustrating solution 1) with spurious triangles, and 2) where they have been removed:

I hope the above may be of help to people with concavity situations in the solution data.

Question 9

It is not possible to directly make a 3d surface using your data. I would recommend you to build an interpolation model using some tools like pykridge. The process will include three steps:

Train an interpolation model using pykridge
Build a grid from X and Y using meshgrid
Interpolate values for Z

Having created your grid and the corresponding Z values, now you’re ready to go with plot_surface. Note that depending on the size of your data, the meshgrid function can run for a while. The workaround is to create evenly spaced samples using np.linspace for X and Y axes, then apply interpolation to infer the necessary Z values. If so, the interpolated values might different from the original Z because X and Y have changed.

Question 10

I’d like to check if variable is None or numpy.array. I’ve implemented check_a function to do this.

def check_a(a):
    if not a:
        print "please initialize a"

a = None
check_a(a)
a = np.array([1,2])
check_a(a)

But, this code raises ValueError. What is the straight forward way?

ValueError                                Traceback (most recent call last)
<ipython-input-41-0201c81c185e> in <module>()
      6 check_a(a)
      7 a = np.array([1,2])
----> 8 check_a(a)

<ipython-input-41-0201c81c185e> in check_a(a)
      1 def check_a(a):
----> 2     if not a:
      3         print "please initialize a"
      4 
      5 a = None

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Question 11

Using not a to test whether a is None assumes that the other possible values of a have a truth value of True. However, most NumPy arrays don’t have a truth value at all, and not cannot be applied to them.

If you want to test whether an object is None, the most general, reliable way is to literally use an is check against None:

if a is None:
    ...
else:
    ...

This doesn’t depend on objects having a truth value, so it works with NumPy arrays.

Note that the test has to be is, not ==. is is an object identity test. == is whatever the arguments say it is, and NumPy arrays say it’s a broadcasted elementwise equality comparison, producing a boolean array:

>>> a = numpy.arange(5)
>>> a == None
array([False, False, False, False, False])
>>> if a == None:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
 Use a.any() or a.all()

On the other side of things, if you want to test whether an object is a NumPy array, you can test its type:

# Careful - the type is np.ndarray, not np.array. np.array is a factory function.
if type(a) is np.ndarray:
    ...
else:
    ...

You can also use isinstance, which will also return True for subclasses of that type (if that is what you want). Considering how terrible and incompatible np.matrix is, you may not actually want this:

# Again, ndarray, not array, because array is a factory function.
if isinstance(a, np.ndarray):
    ...
else:
    ...

Question 12

If you are trying to do something very similar: a is not None, the same issue comes up. That is, Numpy complains that one must use a.any or a.all.

A workaround is to do:

if not (a is None):
    pass

Not too pretty, but it does the job.

Question 13

You can see if object has shape or not

def check_array(x):
    try:
        x.shape
        return True
    except:
        return False

Question 14

I would like to execute the equivalent of the following MATLAB code using NumPy: repmat([1; 1], [1 1 1]). How would I accomplish this?

Question 15

Here is a much better (official) NumPy for Matlab Users link – I’m afraid the mathesaurus one is quite out of date.

The numpy equivalent of repmat(a, m, n) is tile(a, (m, n)).

This works with multiple dimensions and gives a similar result to matlab. (Numpy gives a 3d output array as you would expect – matlab for some reason gives 2d output – but the content is the same).

Matlab:

>> repmat([1;1],[1,1,1])

ans =
     1
     1

Python:

In [46]: a = np.array([[1],[1]])
In [47]: np.tile(a, [1,1,1])
Out[47]: 
array([[[1],
        [1]]])

Question 16

Note that some of the reasons you’d need to use MATLAB’s repmat are taken care of by NumPy’s broadcasting mechanism, which allows you to do various types of math with arrays of similar shape. So if you had, say, a 1600x1400x3 array representing a 3-color image, you could (elementwise) multiply it by [1.0 0.25 0.25] to reduce the amount of green and blue at each pixel. See the above link for more information.

Question 17

See NumPy for Matlab users.

Matlab:

repmat(a, 2, 3)

Numpy:

numpy.kron(numpy.ones((2,3)), a)

Matlib in Numpy (numpy.matlib.repmat()):

numpy.matlib.repmat(a, 2, 3)

Question 18

This is how I understood it out of a bit of fiddling around. Happy to be corrected and hope this helps.

Say you have a matrix M of 2×3 elements. This has two dimensions, obviously.

I could see no difference between Matlab and Python while asking to manipulate the input matrix along the dimensions the matrix already has. Thus the two commands

repmat(M,m,n) % matlab

np.tile(M,(m,n)) # python

are really equivalent for a matrix of rank 2 (two dimensions).

The matters goes counter-intuitive when you ask for repetition/tiling over more dimensions than the input matrix has. Going back to the matrix M of rank two and shape 2×3, it is sufficient to look at what happens to the size/shape of the output matrix. Say the sequence for manipulation is now 1,1,2.

In Matlab

> size(repmat(M,1,1,2))
ans =

    2   3   2

it has copied the first two dimensions (rows and columns) of the input matrix and has repeated that once into a new third dimension (copied twice, that is). True to the naming repmat for repeat matrix.

In Python

>>> np.tile(M,(1,1,2)).shape
(1, 2, 6)

it has applied a different procedure since, I presume, the sequence (1,1,2) is read differently than in Matlab. The number of copies in the direction of columns, rows and out-of-plane dimension are being read from right to left. The resulting object has a different shape from Matlab. One can no longer assert that repmat and tile are equivalent instructions.

In order to get tile to behave like repmat, in Python one has to make sure that the input matrix has as many dimensions as the elements are in the sequence. This is done, for example, by a little preconditioning and creating a related object N

N = M[:,:,np.newaxis]

Then, at the input side one has N.shape = (2,3,1) rather than M.shape = (2,3) and at the output side

>>> np.tile(N,(1,1,2)).shape
(2, 3, 2)

which was the answer of size(repmat(M,1,1,2)). I presume this is because we have guided Python to add the third dimension to the right of (2,3) rather than to its left, so that Python works out the sequence (1,1,2) as it was intended in the Matlab way of reading it.

The element in [:,:,0] in the Python answer for N will contain the same values as the element (:,:,1) the Matlab answer for M.

Finally, I can’t seem to find an equivalent for repmat when one uses the Kronecker product out of

>>> np.kron(np.ones((1,1,2)),M).shape
(1, 2, 6)

unless I then precondition M into N as above. So I would argue that the most general way to move on is to use the ways of np.newaxis.

The game gets trickier when we consider a matrix L of rank 3 (three dimensions) and the simple case of no new dimensions being added in the output matrix. These two seemingly equivalent instructions will not produce the same results

repmat(L,p,q,r) % matlab

np.tile(L,(p,q,r)) # python

because the row, column, out-of-plane directions are (p,q,r) in Matlab and (q,r,p) in Python, which was not visible with rank-2 arrays. There, one has to be careful and obtaining the same results with the two languages would require more preconditioning.

I am aware that this reasoning may well not be general, but I could work it out only this far. Hopefully this invites other fellows to put it to a harder test.

Question 19

Know both tile and repeat.

x = numpy.arange(5)
print numpy.tile(x, 2)
print x.repeat(2)

Question 20

numpy.matlib has a repmat function with a similar interface as the matlab function

from numpy.matlib import repmat
repmat( np.array([[1],[1]]) , 1, 1)

Question 21

>>> import numpy as np

>>> np.repeat(['a','b'], [2,5])

array(['a', 'a', 'b', 'b', 'b', 'b', 'b'], dtype='<U1')

>>> np.repeat([1,2], [2,5])

array([1, 1, 2, 2, 2, 2, 2])

>>> np.repeat(np.array([1,2]), [3]).reshape(2,3)

array([[1, 1, 1],
       [2, 2, 2]])

>>> np.repeat(np.array([1,2]), [2,4]).reshape(3,2)

array([[1, 1],
       [2, 2],
       [2, 2]])

>>> np.repeat(np.matrix('1 2; 3 4'), [2]).reshape(4,2)

matrix([[1, 1],
        [2, 2],
        [3, 3],
        [4, 4]])

Question 22

I am trying to understand what is machine epsilon. According to the Wikipedia, it can be calculated as follows:

def machineEpsilon(func=float):
    machine_epsilon = func(1)
    while func(1)+func(machine_epsilon) != func(1):
        machine_epsilon_last = machine_epsilon
        machine_epsilon = func(machine_epsilon) / func(2)
    return machine_epsilon_last

However, it is suitable only for double precision numbers. I am interested in modifying it to support also single precision numbers. I read that numpy can be used, particularly numpy.float32 class. Can anybody help with modifying the function?

Question 23

An easier way to get the machine epsilon for a given float type is to use np.finfo():

print(np.finfo(float).eps)
# 2.22044604925e-16

print(np.finfo(np.float32).eps)
# 1.19209e-07

Question 24

Another easy way to get epsilon is:

In [1]: 7./3 - 4./3 -1
Out[1]: 2.220446049250313e-16

Question 25

It will already work, as David pointed out!

>>> def machineEpsilon(func=float):
...     machine_epsilon = func(1)
...     while func(1)+func(machine_epsilon) != func(1):
...         machine_epsilon_last = machine_epsilon
...         machine_epsilon = func(machine_epsilon) / func(2)
...     return machine_epsilon_last
... 
>>> machineEpsilon(float)
2.220446049250313e-16
>>> import numpy
>>> machineEpsilon(numpy.float64)
2.2204460492503131e-16
>>> machineEpsilon(numpy.float32)
1.1920929e-07

Question 26

I want to make some unit-tests for my app, and I need to compare two arrays. Since array.__eq__ returns a new array (so TestCase.assertEqual fails), what is the best way to assert for equality?

Currently I’m using

self.assertTrue((arr1 == arr2).all())

but I don’t really like it

Question 27

check out the assert functions in numpy.testing, e.g.

assert_array_equal

for floating point arrays equality test might fail and assert_almost_equal is more reliable.

update

A few versions ago numpy obtained assert_allclose which is now my favorite since it allows us to specify both absolute and relative error and doesn’t require decimal rounding as the closeness criterion.

Question 28

I think (arr1 == arr2).all() looks pretty nice. But you could use:

numpy.allclose(arr1, arr2)

but it’s not quite the same.

An alternative, almost the same as your example is:

numpy.alltrue(arr1 == arr2)

Note that scipy.array is actually a reference numpy.array. That makes it easier to find the documentation.

Question 29

I find that using self.assertEqual(arr1.tolist(), arr2.tolist()) is the easiest way of comparing arrays with unittest.

I agree it’s not the prettiest solution and it’s probably not the fastest but it’s probably more uniform with the rest of your test cases, you get all the unittest error description and it’s really simple to implement.

Question 30

Since Python 3.2 you can use assertSequenceEqual(array1.tolist(), array2.tolist()).

This has the added value of showing you the exact items in which the arrays differ.

Question 31

In my tests I use this:

numpy.testing.assert_array_equal(arr1, arr2)

Question 32

np.linalg.norm(arr1 - arr2) < 1e-6

Question 33

I need to write a function which will detect if the input contains at least one value which is non-numeric. If a non-numeric value is found I will raise an error (because the calculation should only return a numeric value). The number of dimensions of the input array is not known in advance – the function should give the correct value regardless of ndim. As an extra complication the input could be a single float or numpy.float64 or even something oddball like a zero-dimensional array.

The obvious way to solve this is to write a recursive function which iterates over every iterable object in the array until it finds a non-iterabe. It will apply the numpy.isnan() function over every non-iterable object. If at least one non-numeric value is found then the function will return False immediately. Otherwise if all the values in the iterable are numeric it will eventually return True.

That works just fine, but it’s pretty slow and I expect that NumPy has a much better way to do it. What is an alternative that is faster and more numpyish?

Here’s my mockup:

def contains_nan( myarray ):
    """
    @param myarray : An n-dimensional array or a single float
    @type myarray : numpy.ndarray, numpy.array, float
    @returns: bool
    Returns true if myarray is numeric or only contains numeric values.
    Returns false if at least one non-numeric value exists
    Not-A-Number is given by the numpy.isnan() function.
    """
    return True

Question 34

This should be faster than iterating and will work regardless of shape.

numpy.isnan(myarray).any()

Edit: 30x faster:

import timeit
s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print "  %.2f s" % timeit.Timer(m, s).timeit(1000), m

Results:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True

Question 35

If infinity is a possible value, I would use numpy.isfinite

numpy.isfinite(myarray).all()

If the above evaluates to True, then myarray contains no, numpy.nan, numpy.inf or -numpy.inf values.

numpy.nan will be OK with numpy.inf values, for example:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)

Question 36

Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

iterates over the whole data, regardless of whether a nan is found
creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when NAN is found:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    for x in a:
        if np.isnan(x): return True
    return False

@numba.jit
def any_nans(a):
    if not a.dtype.kind=='f': return False
    return _any_nans(a.flat)

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

and works for n-dimensions:

array1M_nd = array1M.reshape((len(array1M)/2, 2))
assert any_nans(array1M_nd)==True
%timeit any_nans(array1M_nd)  # 774ns

Compare this to the numpy native solution:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

The early-exit method is 3 orders or magnitude speedup (in some cases). Not too shabby for a simple annotation.

Question 37

With numpy 1.3 or svn you can do this

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.

Question 38

(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one element of nan, A could be an n x m matrix.

Example:

import numpy as np

A = np.array([1,2,4,np.nan])

if (np.where(np.isnan(A)))[0].shape[0]: 
    print "A contains nan"
else:
    print "A does not contain nan"

Question 39

For example, if we have a numpy array A, and we want a numpy array B with the same elements.

What is the difference between the following (see below) methods? When is additional memory allocated, and when is it not?

B = A
B[:] = A (same as B[:]=A[:]?)
numpy.copy(B, A)

Question 40

All three versions do different things:

B = A

This binds a new name B to the existing object already named A. Afterwards they refer to the same object, so if you modify one in place, you’ll see the change through the other one too.
B[:] = A (same as B[:]=A[:]?)

This copies the values from A into an existing array B. The two arrays must have the same shape for this to work. B[:] = A[:] does the same thing (but B = A[:] would do something more like 1).
numpy.copy(B, A)

This is not legal syntax. You probably meant B = numpy.copy(A). This is almost the same as 2, but it creates a new array, rather than reusing the B array. If there were no other references to the previous B value, the end result would be the same as 2, but it will use more memory temporarily during the copy.

Or maybe you meant numpy.copyto(B, A), which is legal, and is equivalent to 2?

Question 41

B=A creates a reference
B[:]=A makes a copy
numpy.copy(B,A) makes a copy

the last two need additional memory.

To make a deep copy you need to use B = copy.deepcopy(A)

Question 42

This is the only working answer for me:

B=numpy.array(A)

Question 43

I wonder, how to save and load numpy.array data properly. Currently I’m using the numpy.savetxt() method. For example, if I got an array markers, which looks like this:

I try to save it by the use of:

numpy.savetxt('markers.txt', markers)

In other script I try to open previously saved file:

markers = np.fromfile("markers.txt")

And that’s what I get…

Saved data first looks like this:

0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00

But when I save just loaded data by the use of the same method, ie. numpy.savetxt() it looks like this:

1.398043286095131769e-76
1.398043286095288860e-76
1.396426376485745879e-76
1.398043286055061908e-76
1.398043286095288860e-76
1.182950697433698368e-76
1.398043275797188953e-76
1.398043286095288860e-76
1.210894289234927752e-99
1.398040649781712473e-76

What am I doing wrong? PS there are no other “backstage” operation which I perform. Just saving and loading, and that’s what I get. Thank you in advance.

Question 44

The most reliable way I have found to do this is to use np.savetxt with np.loadtxt and not np.fromfile which is better suited to binary files written with tofile. The np.fromfile and np.tofile methods write and read binary files whereas np.savetxt writes a text file. So, for example:

In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.savetxt('test1.txt', a, fmt='%d')
In [3]: b = np.loadtxt('test1.txt', dtype=int)
In [4]: a == b
Out[4]: array([ True,  True,  True,  True], dtype=bool)

Or:

In [5]: a.tofile('test2.dat')
In [6]: c = np.fromfile('test2.dat', dtype=int)
In [7]: c == a
Out[7]: array([ True,  True,  True,  True], dtype=bool)

I use the former method even if it is slower and creates bigger files (sometimes): the binary format can be platform dependent (for example, the file format depends on the endianness of your system).

There is a platform independent format for NumPy arrays, which can be saved and read with np.save and np.load:

In  [8]: np.save('test3.npy', a)    # .npy extension is added if not given
In  [9]: d = np.load('test3.npy')
In [10]: a == d
Out[10]: array([ True,  True,  True,  True], dtype=bool)

Question 45

np.save('data.npy', num_arr) # save
new_num_arr = np.load('data.npy') # load

Question 46

np.fromfile() has a sep= keyword argument:

Separator between items if file is a text file. Empty (“”) separator means the file should be treated as binary. Spaces (” ”) in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace.

The default value of sep="" means that np.fromfile() tries to read it as a binary file rather than a space-separated text file, so you get nonsense values back. If you use np.fromfile('markers.txt', sep=" ") you will get the result you are looking for.

However, as others have pointed out, np.loadtxt() is the preferred way to convert text files to numpy arrays, and unless the file needs to be human-readable it is usually better to use binary formats instead (e.g. np.load()/np.save()).

Question 47

For a short answer you should use np.save and np.load. The advantages of these is that they are made by developers of the numpy library and they already work (plus are likely already optimized nicely) e.g.

import numpy as np
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

np.save(path/'x', x)
np.save(path/'y', y)

x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')

print(x is x_loaded) # False
print(x == x_loaded) # [[ True  True  True  True  True]]

Expanded answer:

In the end it really depends in your needs because you can also save it human readable format (see this Dump a NumPy array into a csv file) or even with other libraries if your files are extremely large (see this best way to preserve numpy arrays on disk for an expanded discussion).

However, (making an expansion since you use the word “properly” in your question) I still think using the numpy function out of the box (and most code!) most likely satisfy most user needs. The most important reason is that it already works. Trying to use something else for any other reason might take you on an unexpectedly LONG rabbit hole to figure out why it doesn’t work and force it work.

Take for example trying to save it with pickle. I tried that just for fun and it took me at least 30 minutes to realize that pickle wouldn’t save my stuff unless I opened & read the file in bytes mode with wb. Took time to google, try thing, understand the error message etc… Small detail but the fact that it already required me to open a file complicated things in unexpected ways. To add that it required me to re-read this (which btw is sort of confusing) Difference between modes a, a+, w, w+, and r+ in built-in open function?.

So if there is an interface that meets your needs use it unless you have a (very) good reason (e.g. compatibility with matlab or for some reason your really want to read the file and printing in python really doesn’t meet your needs, which might be questionable). Furthermore, most likely if you need to optimize it you’ll find out later down the line (rather than spend ages debugging useless stuff like opening a simple numpy file).

So use the interface/numpy provide. It might not be perfect it’s most likely fine, especially for a library that’s been around as long as numpy.

I already spent the saving and loading data with numpy in a bunch of way so have fun with it, hope it helps!

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

## using loading npy, npz files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

Some comments on what I learned:

np.save as expected, this already compresses it well (see https://stackoverflow.com/a/55750128/1601580), works out of the box without any file opening. Clean. Easy. Efficient. Use it.
np.savez uses a uncompressed format (see docs) Save several arrays into a single file in uncompressed .npz format. If you decide to use this (you were warned to go away from the standard solution so expect bugs!) you might discover that you need to use argument names to save it, unless you want to use the default names. So don’t use this if the first already works (or any works use that!)
Pickle also allows for arbitrary code execution. Some people might not want to use this for security reasons.
human readable files are expensive to make etc. Probably not worth it.
there is something called hdf5 for large files. Cool! https://stackoverflow.com/a/9619713/1601580

Note this is not an exhaustive answer. But for other resources check this:

For pickle (guess the top answer is don’t use pickle us np.save): Save Numpy Array using Pickle
For large files (great answer! compares storage size, loading save and more!): https://stackoverflow.com/a/41425878/1601580
For matlab (we have to accept matlab has some freakin’ nice plots!): “Converting” Numpy arrays to Matlab and vice versa
For saving in human readable format: Dump a NumPy array into a csv file

Question 48

I have two matrices

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

and I want to get the element-wise product, [[1*5,2*6], [3*7,4*8]], equaling

[[5,12], [21,32]]

I have tried

print(np.dot(a,b))

and

print(a*b)

but both give the result

[[19 22], [43 50]]

which is the matrix product, not the element-wise product. How can I get the the element-wise product (aka Hadamard product) using built-in functions?

Question 49

For elementwise multiplication of matrix objects, you can use numpy.multiply:

import numpy as np
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.multiply(a,b)

Result

array([[ 5, 12],
       [21, 32]])

However, you should really use array instead of matrix. matrix objects have all sorts of horrible incompatibilities with regular ndarrays. With ndarrays, you can just use * for elementwise multiplication:

a * b

If you’re on Python 3.5+, you don’t even lose the ability to perform matrix multiplication with an operator, because @ does matrix multiplication now:

a @ b  # matrix multiplication

Question 50

just do this:

import numpy as np

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

a * b

Question 51

import numpy as np
x = np.array([[1,2,3], [4,5,6]])
y = np.array([[-1, 2, 0], [-2, 5, 1]])

x*y
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit x*y
1000000 loops, best of 3: 421 ns per loop

np.multiply(x,y)
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit np.multiply(x, y)
1000000 loops, best of 3: 457 ns per loop

Both np.multiply and * would yield element wise multiplication known as the Hadamard Product

%timeit is ipython magic

Question 52

Try this:

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

#This would result a 'numpy.ndarray'
result = np.array(a) * np.array(b)

Here, np.array(a) returns a 2D array of type ndarray and multiplication of two ndarray would result element wise multiplication. So the result would be:

result = [[5, 12], [21, 32]]

If you wanna get a matrix, the do it with this:

result = np.mat(result)

Question 53

My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.

Naively I used numpy.isnan(val), which works well unless val isn’t among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:

>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?

Question 54

pandas.isnull() (also pd.isna(), in newer versions) checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:

NaN in numeric arrays, None/NaN in object arrays

Quick example:

import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]: 
0    False
1     True
2    False
dtype: bool

The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.

Datetimes too (if you use pd.NaT you won’t need to specify the dtype)

In [24]: s = Series([Timestamp('20130101'),np.nan,Timestamp('20130102 9:30')],dtype='M8[ns]')

In [25]: s
Out[25]: 
0   2013-01-01 00:00:00
1                   NaT
2   2013-01-02 09:30:00
dtype: datetime64[ns]``

In [26]: pd.isnull(s)
Out[26]: 
0    False
1     True
2    False
dtype: bool

Question 55

Is your type really arbitrary? If you know it is just going to be a int float or string you could just do

 if val.dtype == float and np.isnan(val):

assuming it is wrapped in numpy , it will always have a dtype and only float and complex can be NaN

问题：matplotlib中的曲面图

回答 0

回答 1

奖金部分

Bonus Section

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：检查变量是否为None或numpy.array时发生ValueError

回答 0

回答 1

回答 2

问题：在NumPy中相当于MATLAB的repmat

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：python numpy机器epsilon

回答 0

回答 1

回答 2

问题：断言numpy.array相等的最佳方法？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：检测NumPy数组是否包含至少一个非数值？

回答 0

回答 1

回答 2

回答 3

回答 4

问题：带副本的numpy数组分配

回答 0

回答 1

回答 2

问题：如何正确保存和加载numpy.array（）数据？

回答 0

回答 1

回答 2

回答 3

问题：如何在numpy中获得按元素矩阵乘法（Hadamard积）？

回答 0

回答 1

回答 2

回答 3

问题：高效地检查Python / numpy / pandas中的任意对象是否为NaN？

回答 0

回答 1

有趣好用的Python教程