Tag archive: numpy

Surface plots in matplotlib

Question: Surface plots in matplotlib

I have a list of 3-tuples representing a set of points in 3D space. I want to plot a surface that covers all these points.

The plot_surface function in the mplot3d package requires as arguments X,Y and Z to be 2d arrays. Is plot_surface the right function to plot surface and how do I transform my data into the required format?

data = [(x1,y1,z1),(x2,y2,z2),.....,(xn,yn,zn)]

Answer 0

Surfaces need something a bit different from a list of 3-tuples: you should pass in a grid for the domain as 2D arrays.

If all you have is a list of 3d points, rather than some function f(x, y) -> z, then you will have a problem because there are multiple ways to triangulate that 3d point cloud into a surface.

Here’s a smooth surface example:

import numpy as np
from mpl_toolkits.mplot3d import Axes3D
# Axes3D import has side effects, it enables using projection='3d' in add_subplot
import matplotlib.pyplot as plt

def fun(x, y):
    return x**2 + y

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = y = np.arange(-3.0, 3.0, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array(fun(np.ravel(X), np.ravel(Y)))
Z = zs.reshape(X.shape)

ax.plot_surface(X, Y, Z)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

plt.show()
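
If all you have is the list of 3-tuples from the question, one possible route (the one later answers take) is to split it into three 1D arrays and let plot_trisurf triangulate the (x, y) points. This is only a sketch: the sample `data` is made up, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Hypothetical sample data: a list of (x, y, z) tuples, as in the question.
rng = np.random.default_rng(0)
xy = rng.uniform(-3.0, 3.0, size=(200, 2))
data = [(px, py, px**2 + py) for px, py in xy]

# Split the list of 3-tuples into three 1D arrays.
points = np.array(data)   # shape (n, 3)
x, y, z = points.T        # each has shape (n,)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# plot_trisurf builds a Delaunay triangulation of the (x, y) points itself.
surf = ax.plot_trisurf(x, y, z)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_zlabel("Z Label")
```

The triangulation is only one of the many possible surfaces through the points, which is the ambiguity mentioned above.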


Answer 1

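You can read the data directly from a file and plot it:

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np

# 'your_file' is a whitespace-delimited text file with three columns: x y z
x,y,z = np.loadtxt('your_file', unpack=True)

fig = plt.figure()
# add_subplot attaches the 3D axes to the figure
ax = fig.add_subplot(111, projection='3d')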
surf = ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.1)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.savefig('teste.pdf')
plt.show()

If necessary you can pass vmin and vmax to define the colorbar range, e.g.

surf = ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.1, vmin=0, vmax=2000)

Bonus Section

I was wondering how to do some interactive plots, in this case with artificial data

from ipywidgets import interactive

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

def plot(i):

    fig = plt.figure()
    ax = plt.axes(projection='3d')

    theta = 2 * np.pi * np.random.random(1000)
    r = i * np.random.random(1000)
    x = np.ravel(r * np.sin(theta))
    y = np.ravel(r * np.cos(theta))
    z = f(x, y)

    ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
    fig.tight_layout()

interactive_plot = interactive(plot, i=(2, 10))
interactive_plot

Answer 2

I just came across this same problem. I have evenly spaced data in three 1-D arrays instead of the 2-D arrays that matplotlib's plot_surface wants. My data happened to be in a pandas.DataFrame, so here is the matplotlib plot_surface example with the modifications needed to plot three 1-D arrays.

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
    linewidth=0, antialiased=False)
ax.set_zlim(-1.01, 1.01)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)
plt.title('Original Code')

That is the original example. Adding this next bit on creates the same plot from 3 1-D arrays.

# ~~~~ MODIFICATION TO EXAMPLE BEGINS HERE ~~~~ #
import pandas as pd
from scipy.interpolate import griddata
# create 1D-arrays from the 2D-arrays
x = X.ravel()
y = Y.ravel()
z = Z.ravel()
xyz = {'x': x, 'y': y, 'z': z}

# put the data into a pandas DataFrame (this is what my data looks like)
df = pd.DataFrame(xyz, index=range(len(xyz['x']))) 

# re-create the 2D-arrays
x1 = np.linspace(df['x'].min(), df['x'].max(), len(df['x'].unique()))
y1 = np.linspace(df['y'].min(), df['y'].max(), len(df['y'].unique()))
x2, y2 = np.meshgrid(x1, y1)
z2 = griddata((df['x'], df['y']), df['z'], (x2, y2), method='cubic')

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.coolwarm,
    linewidth=0, antialiased=False)
ax.set_zlim(-1.01, 1.01)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)
plt.title('Meshgrid Created from 3 1D Arrays')
# ~~~~ MODIFICATION TO EXAMPLE ENDS HERE ~~~~ #

plt.show()

Here are the resulting figures:
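
When the three 1-D arrays really do hold a complete, evenly spaced grid, interpolation may not be needed at all: sorting and reshaping can be enough. The sketch below assumes exactly that (every (x, y) combination present once, rows in arbitrary order); the shuffled input is made up for illustration.

```python
import numpy as np

# Hypothetical input: a complete regular grid flattened into three 1D arrays
# whose rows are in arbitrary order.
X, Y = np.meshgrid(np.arange(-5, 5, 0.25), np.arange(-5, 5, 0.25))
Z = np.sin(np.sqrt(X**2 + Y**2))
rng = np.random.default_rng(0)
order = rng.permutation(X.size)
x, y, z = X.ravel()[order], Y.ravel()[order], Z.ravel()[order]

# Sort by y first and by x within equal y, so each row has constant y.
idx = np.lexsort((x, y))
nx = np.unique(x).size
ny = np.unique(y).size

# Reshape the sorted 1D arrays back into the 2D layout plot_surface expects.
x2 = x[idx].reshape(ny, nx)
y2 = y[idx].reshape(ny, nx)
z2 = z[idx].reshape(ny, nx)
```

The resulting x2, y2, z2 can be passed to plot_surface in place of the griddata output. If grid points are missing or the spacing is irregular, the reshape fails or gives a wrong layout, and interpolation as above is the safer choice.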


Answer 3

Just to chime in, Emanuel had the answer that I (and probably many others) was looking for. If you have 3D scattered data in three separate arrays, pandas is an incredible help and works much better than the other options. To elaborate, suppose your x, y, z are some arbitrary variables. In my case these were c, gamma, and errors because I was testing a support vector machine. There are many potential choices to plot the data:

  • scatter3D(cParams, gammas, avg_errors_array) – this works but is overly simplistic
  • plot_wireframe(cParams, gammas, avg_errors_array) – this works, but will look ugly if your data isn’t sorted nicely, as is potentially the case with massive chunks of real scientific data
  • ax.plot3D(cParams, gammas, avg_errors_array) – similar to wireframe

Wireframe plot of the data

3d scatter of the data

The code looks like this:

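    import matplotlib.pyplot as plt
    import pandas as pd
    from matplotlib import cm

    # cParams, gammas and avg_errors_array are 1D sequences of equal length; k labels the run
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')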
    ax.set_xlabel('c parameter')
    ax.set_ylabel('gamma parameter')
    ax.set_zlabel('Error rate')
    #ax.plot_wireframe(cParams, gammas, avg_errors_array)
    #ax.plot3D(cParams, gammas, avg_errors_array)
    #ax.scatter3D(cParams, gammas, avg_errors_array, zdir='z',cmap='viridis')

    df = pd.DataFrame({'x': cParams, 'y': gammas, 'z': avg_errors_array})
    surf = ax.plot_trisurf(df.x, df.y, df.z, cmap=cm.jet, linewidth=0.1)
    fig.colorbar(surf, shrink=0.5, aspect=5)    
    plt.savefig('./plots/avgErrs_vs_C_andgamma_type_%s.png'%(k))
    plt.show()

Here is the final output:


Answer 4

Check the official example. X, Y and Z are indeed 2D arrays; numpy.meshgrid() is a simple way to get a 2D x,y mesh out of 1D x and y values.

http://matplotlib.sourceforge.net/mpl_examples/mplot3d/surface3d_demo.py

Here's a Pythonic way to convert your 3-tuples to three 1D arrays.

data = [(1,2,3), (10,20,30), (11, 22, 33), (110, 220, 330)]
X,Y,Z = zip(*data)
In [7]: X
Out[7]: (1, 10, 11, 110)
In [8]: Y
Out[8]: (2, 20, 22, 220)
In [9]: Z
Out[9]: (3, 30, 33, 330)

Here's matplotlib's Delaunay triangulation (interpolation), which converts 1D x, y, z into something compliant (?):

http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.griddata


Answer 5

In Matlab I did something similar using the delaunay function on the x, y coords only (not the z), then plotting with trimesh or trisurf, using z as the height.

SciPy has the Delaunay class, which is based on the same underlying QHull library as Matlab's delaunay function, so you should get identical results.

From there, it should take only a few lines of code to convert the "Plotting 3D Polygons in python-matplotlib" example into what you wish to achieve, as Delaunay gives you the specification of each triangular polygon.
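
A rough sketch of that idea: triangulate the x, y coordinates only with scipy.spatial.Delaunay, then use z as the height. Instead of building 3D polygons by hand, this version hands the triangle indices to plot_trisurf through its `triangles` argument, which is a swapped-in shortcut rather than the polygon approach described above. The sample points and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from scipy.spatial import Delaunay

# Hypothetical scattered points.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 300)
y = rng.uniform(-2.0, 2.0, 300)
z = np.exp(-(x**2 + y**2))

# Triangulate in the x-y plane only; z is not involved.
tri = Delaunay(np.column_stack([x, y]))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# Each row of tri.simplices holds the three point indices of one triangle.
ax.plot_trisurf(x, y, z, triangles=tri.simplices, cmap="viridis", linewidth=0.1)
```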


Answer 6

Just to add some further thoughts that may help others with irregular-domain problems. For a situation where the user has three vectors/lists x, y, z representing a 2D solution where z is to be plotted on a rectangular grid as a surface, the 'plot_trisurf()' comments by ArtifixR are applicable. A similar example, but with a non-rectangular domain, is:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

# problem parameters
nu = 50; nv = 50
u = np.linspace(0, 2*np.pi, nu,) 
v = np.linspace(0, np.pi, nv,)

xx = np.zeros((nu,nv),dtype='d')
yy = np.zeros((nu,nv),dtype='d')
zz = np.zeros((nu,nv),dtype='d')

# populate x,y,z arrays
for i in range(nu):
  for j in range(nv):
    xx[i,j] = np.sin(v[j])*np.cos(u[i])
    yy[i,j] = np.sin(v[j])*np.sin(u[i])
    zz[i,j] = np.exp(-4*(xx[i,j]**2 + yy[i,j]**2)) # bell curve

# convert arrays to vectors
x = xx.flatten()
y = yy.flatten()
z = zz.flatten()

# Plot solution surface
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0,
                antialiased=False)
ax.set_title(r'trisurf example',fontsize=16, color='k')
ax.view_init(60, 35)
fig.tight_layout()
plt.show()

The above code produces:

However, this may not solve all problems, particularly where the problem is defined on an irregular domain. Also, where the domain has one or more concave areas, the Delaunay triangulation may generate spurious triangles exterior to the domain. In such cases, these rogue triangles have to be removed from the triangulation to achieve the correct surface representation. For these situations, the user may have to include the Delaunay triangulation calculation explicitly so that these triangles can be removed programmatically. Under these circumstances, the following code could replace the previous plot code:


import matplotlib.tri as mtri 
import scipy.spatial
# plot final solution
pts = np.vstack([x, y]).T
tess = scipy.spatial.Delaunay(pts) # tessellation

# Triangle vertex indices for the matplotlib Triangulation object
# (older SciPy versions exposed these as tess.vertices)
tri = tess.simplices

#############################################################
# NOTE: If 2D domain has concave properties one has to
#       remove delaunay triangles that are exterior to the domain.
#       This operation is problem specific!
#       For simple situations create a polygon of the
#       domain from boundary nodes and identify triangles
#       in 'tri' outside the polygon. Then delete them from
#       'tri'.
#       <ADD THE CODE HERE>
#############################################################

triDat = mtri.Triangulation(x=pts[:, 0], y=pts[:, 1], triangles=tri)

# Plot solution surface
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(triDat, z, linewidth=0, edgecolor='none',
                antialiased=False, cmap=cm.jet)
ax.set_title(r'trisurf with delaunay triangulation', 
          fontsize=16, color='k')
plt.show()

Example plots are given below illustrating solution 1) with spurious triangles, and 2) where they have been removed:

I hope the above may be of help to people with concavity situations in the solution data.
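
The removal step is problem specific, but for a simple case it can be sketched with a triangle mask. The example below is an assumption-laden illustration, not the code behind the plots mentioned above: the domain is a made-up annulus, matplotlib.tri.Triangulation computes the Delaunay triangulation itself, and a triangle counts as "outside" when its centroid falls inside the hole.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Hypothetical concave domain: an annulus with inner radius 0.5.
inner_radius = 0.5
r = np.linspace(inner_radius, 1.0, 8)
theta = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
R, T = np.meshgrid(r, theta)
x = (R * np.cos(T)).ravel()
y = (R * np.sin(T)).ravel()
z = (R * np.sin(3.0 * T)).ravel()

# Delaunay triangulation of (x, y); it also fills the hole with triangles.
triang = mtri.Triangulation(x, y)

# Centroid of every triangle, used to decide whether it lies in the hole.
xc = x[triang.triangles].mean(axis=1)
yc = y[triang.triangles].mean(axis=1)
outside = np.hypot(xc, yc) < inner_radius

# Masked triangles are skipped when plotting.
triang.set_mask(outside)

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection="3d")
ax.plot_trisurf(triang, z, cmap="jet", linewidth=0, antialiased=False)
```

For a general polygonal boundary, the centroid test would be replaced by a point-in-polygon check against the boundary nodes.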


Answer 7

It is not possible to make a 3D surface directly from your data. I would recommend building an interpolation model with a tool such as PyKrige. The process has three steps:

  1. Train an interpolation model using PyKrige
  2. Build a grid from X and Y using meshgrid
  3. Interpolate values for Z

Once you have the grid and the corresponding Z values, you are ready to call plot_surface. Note that, depending on the size of your data, the meshgrid function can run for a while. The workaround is to create evenly spaced samples with np.linspace for the X and Y axes, then apply interpolation to infer the necessary Z values. In that case the interpolated values might differ from the original Z because X and Y have changed.
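
The grid-then-interpolate workflow can be sketched as follows. To keep the example short it uses scipy.interpolate.griddata with linear interpolation as a stand-in for a kriging model, so it shows the shape of the process rather than kriging itself; the scattered sample points are made up.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered samples.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 500)
y = rng.uniform(0.0, 1.0, 500)
z = np.sin(2.0 * np.pi * x) * np.cos(2.0 * np.pi * y)

# Evenly spaced axes over the data range, then a 2D grid.
xi = np.linspace(x.min(), x.max(), 50)
yi = np.linspace(y.min(), y.max(), 50)
X, Y = np.meshgrid(xi, yi)

# Interpolate Z on the grid; points outside the convex hull become NaN.
Z = griddata((x, y), z, (X, Y), method="linear")
```

X, Y and Z are now 2D arrays of the same shape and can be passed to plot_surface.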


ValueError when checking if variable is None or numpy.array

Question: ValueError when checking if variable is None or numpy.array

I'd like to check whether a variable is None or a numpy.array. I've implemented a check_a function to do this.

def check_a(a):
    if not a:
        print "please initialize a"

a = None
check_a(a)
a = np.array([1,2])
check_a(a)

However, this code raises ValueError. What is the straightforward way to do this?

ValueError                                Traceback (most recent call last)
<ipython-input-41-0201c81c185e> in <module>()
      6 check_a(a)
      7 a = np.array([1,2])
----> 8 check_a(a)

<ipython-input-41-0201c81c185e> in check_a(a)
      1 def check_a(a):
----> 2     if not a:
      3         print "please initialize a"
      4 
      5 a = None

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Answer 0

Using not a to test whether a is None assumes that the other possible values of a have a truth value of True. However, most NumPy arrays don’t have a truth value at all, and not cannot be applied to them.

If you want to test whether an object is None, the most general, reliable way is to literally use an is check against None:

if a is None:
    ...
else:
    ...

This doesn’t depend on objects having a truth value, so it works with NumPy arrays.

Note that the test has to be is, not ==. is is an object identity test. == is whatever the arguments say it is, and NumPy arrays say it’s a broadcasted elementwise equality comparison, producing a boolean array:

>>> a = numpy.arange(5)
>>> a == None
array([False, False, False, False, False])
>>> if a == None:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
 Use a.any() or a.all()

On the other side of things, if you want to test whether an object is a NumPy array, you can test its type:

# Careful - the type is np.ndarray, not np.array. np.array is a factory function.
if type(a) is np.ndarray:
    ...
else:
    ...

You can also use isinstance, which will also return True for subclasses of that type (if that is what you want). Considering how terrible and incompatible np.matrix is, you may not actually want this:

# Again, ndarray, not array, because array is a factory function.
if isinstance(a, np.ndarray):
    ...
else:
    ...    
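
Putting the two checks together, a version of the question's check_a might look like the sketch below. The return strings are illustrative, and the function returns a message instead of printing so it is easy to test.

```python
import numpy as np

def check_a(a):
    """Describe `a` without ever evaluating its truth value."""
    if a is None:
        return "please initialize a"
    if isinstance(a, np.ndarray):
        return "a is a NumPy array with shape {}".format(a.shape)
    return "a is something else: {}".format(type(a).__name__)

print(check_a(None))
print(check_a(np.array([1, 2])))
```

Neither branch calls bool() on the array, so the ValueError from the question cannot occur.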

Answer 1

If you are trying to do something very similar: a is not None, the same issue comes up. That is, Numpy complains that one must use a.any or a.all.

A workaround is to do:

if not (a is None):
    pass

Not too pretty, but it does the job.


Answer 2

You can check whether the object has a shape attribute:

def check_array(x):
    try:
        x.shape
        return True
    except AttributeError:
        # None and other objects without a shape attribute end up here
        return False

Equivalent of MATLAB's repmat in NumPy

Question: Equivalent of MATLAB's repmat in NumPy

I would like to execute the equivalent of the following MATLAB code using NumPy: repmat([1; 1], [1 1 1]). How would I accomplish this?


Answer 0

Here is a much better (official) NumPy for Matlab Users link – I’m afraid the mathesaurus one is quite out of date.

The numpy equivalent of repmat(a, m, n) is tile(a, (m, n)).

This works with multiple dimensions and gives a similar result to matlab. (Numpy gives a 3d output array as you would expect – matlab for some reason gives 2d output – but the content is the same).

Matlab:

>> repmat([1;1],[1,1,1])

ans =
     1
     1

Python:

In [46]: a = np.array([[1],[1]])
In [47]: np.tile(a, [1,1,1])
Out[47]: 
array([[[1],
        [1]]])

Answer 1

Note that some of the reasons you’d need to use MATLAB’s repmat are taken care of by NumPy’s broadcasting mechanism, which allows you to do various types of math with arrays of similar shape. So if you had, say, a 1600x1400x3 array representing a 3-color image, you could (elementwise) multiply it by [1.0 0.25 0.25] to reduce the amount of green and blue at each pixel. See the above link for more information.
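
A small sketch of that image example, using a tiny made-up array in place of a real 1600x1400x3 image. It compares broadcasting with the explicit tiling a repmat-style approach would need.

```python
import numpy as np

# Hypothetical stand-in for an RGB image: height x width x 3 channels.
image = np.ones((4, 5, 3))
scale = np.array([1.0, 0.25, 0.25])

# Broadcasting: the length-3 vector is applied along the last axis.
tinted = image * scale

# The explicit equivalent: tile the vector to the full image shape first.
explicit = image * np.tile(scale, (4, 5, 1))

print(tinted[0, 0])
```

Both give the same array, but broadcasting never materialises the tiled copy of `scale`.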


Answer 2

See NumPy for Matlab users.

Matlab:

repmat(a, 2, 3)

Numpy:

numpy.kron(numpy.ones((2,3)), a)

Matlib in Numpy (numpy.matlib.repmat()):

numpy.matlib.repmat(a, 2, 3)
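
For a 2D input, the Kronecker-product form and np.tile give the same values, which can be checked quickly. The array `a` here is an arbitrary example.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# Tile a in a 2 x 3 block layout.
tiled = np.tile(a, (2, 3))

# Kronecker product with a 2 x 3 block of ones; dtype is matched to avoid a float result.
kron = np.kron(np.ones((2, 3), dtype=a.dtype), a)

print(tiled.shape)
```

Without the dtype argument, np.ones defaults to float, so the kron result would be a float array even for integer input.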

Answer 3

This is how I understood it out of a bit of fiddling around. Happy to be corrected and hope this helps.

Say you have a matrix M of 2×3 elements. This has two dimensions, obviously.


I could see no difference between Matlab and Python while asking to manipulate the input matrix along the dimensions the matrix already has. Thus the two commands

repmat(M,m,n) % matlab

np.tile(M,(m,n)) # python

are really equivalent for a matrix of rank 2 (two dimensions).


Matters become counter-intuitive when you ask for repetition/tiling over more dimensions than the input matrix has. Going back to the matrix M of rank two and shape 2×3, it is sufficient to look at what happens to the size/shape of the output matrix. Say the sequence for manipulation is now 1,1,2.

In Matlab

> size(repmat(M,1,1,2))
ans =

    2   3   2

it has copied the first two dimensions (rows and columns) of the input matrix and has repeated that once into a new third dimension (copied twice, that is). True to the naming repmat for repeat matrix.

In Python

>>> np.tile(M,(1,1,2)).shape
(1, 2, 6)

it has applied a different procedure since, I presume, the sequence (1,1,2) is read differently than in Matlab. The number of copies in the direction of columns, rows and out-of-plane dimension are being read from right to left. The resulting object has a different shape from Matlab. One can no longer assert that repmat and tile are equivalent instructions.


In order to get tile to behave like repmat, in Python one has to make sure that the input matrix has as many dimensions as there are elements in the sequence. This is done, for example, by a little preconditioning and creating a related object N

N = M[:,:,np.newaxis]

Then, at the input side one has N.shape = (2,3,1) rather than M.shape = (2,3) and at the output side

>>> np.tile(N,(1,1,2)).shape
(2, 3, 2)

which was the answer of size(repmat(M,1,1,2)). I presume this is because we have guided Python to add the third dimension to the right of (2,3) rather than to its left, so that Python works out the sequence (1,1,2) as it was intended in the Matlab way of reading it.

The element [:,:,0] in the Python answer for N will contain the same values as the element (:,:,1) in the Matlab answer for M.


Finally, I can't seem to get an equivalent of repmat out of the Kronecker product:

>>> np.kron(np.ones((1,1,2)),M).shape
(1, 2, 6)

unless I then precondition M into N as above. So I would argue that the most general way to move on is to use the ways of np.newaxis.


The game gets trickier when we consider a matrix L of rank 3 (three dimensions) and the simple case of no new dimensions being added in the output matrix. These two seemingly equivalent instructions will not produce the same results

repmat(L,p,q,r) % matlab

np.tile(L,(p,q,r)) # python

because the row, column, out-of-plane directions are (p,q,r) in Matlab and (q,r,p) in Python, which was not visible with rank-2 arrays. There, one has to be careful and obtaining the same results with the two languages would require more preconditioning.


I am aware that this reasoning may well not be general, but I could work it out only this far. Hopefully this invites other fellows to put it to a harder test.
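
The np.newaxis preconditioning can be wrapped in a small helper. `repmat_like` is a hypothetical name, not a NumPy function, and the sketch only covers the case discussed here (one repetition count per dimension); it does not try to reproduce every calling form of Matlab's repmat.

```python
import numpy as np

def repmat_like(a, *reps):
    """Tile `a`, adding any missing dimensions on the right, as Matlab does."""
    a = np.asarray(a)
    ndim = max(a.ndim, len(reps))
    # Append trailing length-1 axes to the array (np.tile would prepend them).
    a = a.reshape(a.shape + (1,) * (ndim - a.ndim))
    # Pad the repetition counts with ones on the right as well.
    reps = tuple(reps) + (1,) * (ndim - len(reps))
    return np.tile(a, reps)

M = np.arange(6).reshape(2, 3)
out = repmat_like(M, 1, 1, 2)
print(out.shape)
```

For the 2×3 matrix M this gives the shape (2, 3, 2) that size(repmat(M,1,1,2)) reports, while for two repetition counts it reduces to plain np.tile.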


Answer 4

Know both tile and repeat.

import numpy

x = numpy.arange(5)
print(numpy.tile(x, 2))   # repeats the whole array
print(x.repeat(2))        # repeats each element

Answer 5

numpy.matlib has a repmat function with an interface similar to that of the Matlab function:

import numpy as np
from numpy.matlib import repmat
repmat( np.array([[1],[1]]) , 1, 1)

Answer 6

>>> import numpy as np

>>> np.repeat(['a','b'], [2,5])

array(['a', 'a', 'b', 'b', 'b', 'b', 'b'], dtype='<U1')

>>> np.repeat([1,2], [2,5])

array([1, 1, 2, 2, 2, 2, 2])

>>> np.repeat(np.array([1,2]), [3]).reshape(2,3)

array([[1, 1, 1],
       [2, 2, 2]])

>>> np.repeat(np.array([1,2]), [2,4]).reshape(3,2)

array([[1, 1],
       [2, 2],
       [2, 2]])

>>> np.repeat(np.matrix('1 2; 3 4'), [2]).reshape(4,2)

matrix([[1, 1],
        [2, 2],
        [3, 3],
        [4, 4]])

python numpy机器epsilon

问题:python numpy机器epsilon

我试图了解什么是机器epsilon。根据维基百科,可以如下计算:

def machineEpsilon(func=float):
    machine_epsilon = func(1)
    while func(1)+func(machine_epsilon) != func(1):
        machine_epsilon_last = machine_epsilon
        machine_epsilon = func(machine_epsilon) / func(2)
    return machine_epsilon_last

但是,它仅适用于双精度数字。我有兴趣修改它以支持单精度数字。我读到可以使用numpy,尤其是numpy.float32类。有人可以帮助您修改功能吗?

I am trying to understand what is machine epsilon. According to the Wikipedia, it can be calculated as follows:

def machineEpsilon(func=float):
    machine_epsilon = func(1)
    while func(1)+func(machine_epsilon) != func(1):
        machine_epsilon_last = machine_epsilon
        machine_epsilon = func(machine_epsilon) / func(2)
    return machine_epsilon_last

However, it is suitable only for double precision numbers. I am interested in modifying it to support also single precision numbers. I read that numpy can be used, particularly numpy.float32 class. Can anybody help with modifying the function?


回答 0

对于给定的float类型,获取机器epsilon的更简单方法是使用np.finfo()

import numpy as np

print(np.finfo(float).eps)
# 2.220446049250313e-16

print(np.finfo(np.float32).eps)
# 1.1920929e-07

An easier way to get the machine epsilon for a given float type is to use np.finfo():

import numpy as np

print(np.finfo(float).eps)
# 2.220446049250313e-16

print(np.finfo(np.float32).eps)
# 1.1920929e-07
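
As a rough illustration of what np.finfo reports, the sketch below loops over three float types and checks the result against the standard library. The variable names are just for this example.

```python
import sys
import numpy as np

for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    # eps is the gap between 1.0 and the next representable value
    print(dtype.__name__, info.bits, info.eps)

eps16 = np.finfo(np.float16).eps
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps

# Python's float is a C double, so its epsilon matches float64.
print(eps64 == sys.float_info.epsilon)
```

For IEEE types these values are powers of two: 2**-10, 2**-23 and 2**-52 respectively.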

回答 1

获得epsilon的另一种简单方法是:

In [1]: 7./3 - 4./3 -1
Out[1]: 2.220446049250313e-16

Another easy way to get epsilon is:

In [1]: 7./3 - 4./3 -1
Out[1]: 2.220446049250313e-16

回答 2

正如David指出的那样,它已经可以工作了!

>>> def machineEpsilon(func=float):
...     machine_epsilon = func(1)
...     while func(1)+func(machine_epsilon) != func(1):
...         machine_epsilon_last = machine_epsilon
...         machine_epsilon = func(machine_epsilon) / func(2)
...     return machine_epsilon_last
... 
>>> machineEpsilon(float)
2.220446049250313e-16
>>> import numpy
>>> machineEpsilon(numpy.float64)
2.2204460492503131e-16
>>> machineEpsilon(numpy.float32)
1.1920929e-07

It will already work, as David pointed out!

>>> def machineEpsilon(func=float):
...     machine_epsilon = func(1)
...     while func(1)+func(machine_epsilon) != func(1):
...         machine_epsilon_last = machine_epsilon
...         machine_epsilon = func(machine_epsilon) / func(2)
...     return machine_epsilon_last
... 
>>> machineEpsilon(float)
2.220446049250313e-16
>>> import numpy
>>> machineEpsilon(numpy.float64)
2.2204460492503131e-16
>>> machineEpsilon(numpy.float32)
1.1920929e-07

断言numpy.array相等的最佳方法?

问题:断言numpy.array相等的最佳方法?

我想为我的应用程序做一些单元测试,并且需要比较两个数组。由于array.__eq__返回一个新数组(因此TestCase.assertEqual失败),为相等性断言的最佳方法是什么?

目前我正在使用

self.assertTrue((arr1 == arr2).all())

但我不是很喜欢

I want to make some unit-tests for my app, and I need to compare two arrays. Since array.__eq__ returns a new array (so TestCase.assertEqual fails), what is the best way to assert for equality?

Currently I’m using

self.assertTrue((arr1 == arr2).all())

but I don’t really like it


回答 0

在中检出assert函数numpy.testing,例如

assert_array_equal

对于浮点数组,相等性测试可能会失败,并且 assert_almost_equal更加可靠。

更新

之前获得了一些版本的numpy assert_allclose,现在它是我的最爱,因为它允许我们指定绝对误差和相对误差,并且不需要十进制舍入作为接近度标准。

Check out the assert functions in numpy.testing, e.g.

assert_array_equal

For floating point arrays, an exact equality test might fail, and assert_almost_equal is more reliable.

Update

A few versions ago numpy obtained assert_allclose, which is now my favorite since it allows us to specify both absolute and relative error and doesn’t require decimal rounding as the closeness criterion.
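
A small sketch of how assert_allclose might be used with explicit tolerances. The arrays and the tolerance values are arbitrary choices for illustration.

```python
import numpy as np
from numpy.testing import assert_allclose

expected = np.array([1.0, 2.0, 3.0])
actual = expected + 1e-9  # tiny floating point noise

# Passes: every |actual - expected| <= atol + rtol * |expected|
assert_allclose(actual, expected, rtol=1e-6, atol=0)

# A real discrepancy raises AssertionError with a readable report.
try:
    assert_allclose(expected + 0.1, expected, rtol=1e-6, atol=0)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)
```

Inside a unittest.TestCase the call can be used directly, since a failed comparison surfaces as an ordinary AssertionError.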


回答 1

我觉得(arr1 == arr2).all()很好看。但是您可以使用:

numpy.allclose(arr1, arr2)

但是不完全一样。

与您的示例几乎相同的替代方法是:

numpy.alltrue(arr1 == arr2)

请注意,scipy.array实际上是一个引用numpy.array。这样可以更轻松地找到文档。

I think (arr1 == arr2).all() looks pretty nice. But you could use:

numpy.allclose(arr1, arr2)

but it’s not quite the same.

An alternative, almost the same as your example is:

numpy.alltrue(arr1 == arr2)

Note that scipy.array is actually a reference to numpy.array. That makes it easier to find the documentation.
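
One caveat worth sketching: an element-wise comparison followed by all() can pass for arrays of different shapes because of broadcasting, whereas numpy.array_equal also compares shapes. Also, as far as I know numpy.alltrue was removed in NumPy 2.0, so numpy.all is the spelling used below.

```python
import numpy as np

a = np.array([1, 1, 1])
b = np.array([1])

# Broadcasting makes the element-wise test pass even though shapes differ.
broadcast_result = bool((a == b).all())

# np.array_equal compares shape as well as values.
strict_result = bool(np.array_equal(a, b))

# np.all is the function form of the .all() reduction.
same_values = bool(np.all(a == a.copy()))

print(broadcast_result, strict_result, same_values)
```

Which of the two behaviours is wanted depends on the test; for unit tests the stricter check is usually the safer default.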


回答 2

我发现使用 self.assertEqual(arr1.tolist(), arr2.tolist()) 是比较数组与unittest的最简单方法。

我同意这不是最漂亮的解决方案,并且可能不是最快的解决方案,但是与其余测试用例相比,它可能更统一,您可以获得所有unittest错误描述,并且实现起来非常简单。

I find that using self.assertEqual(arr1.tolist(), arr2.tolist()) is the easiest way of comparing arrays with unittest.

I agree it’s not the prettiest solution, and it’s probably not the fastest, but it’s probably more uniform with the rest of your test cases, you get the full unittest error description, and it’s really simple to implement.


回答 3

从Python 3.2开始,您可以使用assertSequenceEqual(array1.tolist(), array2.tolist())

这具有向您显示数组不同的确切项目的附加价值。

Since Python 3.2 you can use assertSequenceEqual(array1.tolist(), array2.tolist()).

This has the added value of showing you the exact items in which the arrays differ.


回答 4

在我的测试中,我使用以下代码:

try:
    numpy.testing.assert_array_equal(arr1, arr2)
    res = True
except AssertionError as err:
    res = False
    print (err)
self.assertTrue(res)

In my tests I use this:

numpy.testing.assert_array_equal(arr1, arr2)
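
For context, a sketch of how that call might sit inside a unittest test case. The class name, test name and array values are hypothetical.

```python
import unittest
import numpy as np

class ArrayComparisonTest(unittest.TestCase):
    def test_arrays_match(self):
        arr1 = np.array([1, 2, 3])
        arr2 = np.array([1, 2, 3])
        # Raises AssertionError (reported by unittest as a failure)
        # with a summary of the mismatched elements if they differ.
        np.testing.assert_array_equal(arr1, arr2)

# Run the case programmatically instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArrayComparisonTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Because the numpy helper raises AssertionError itself, no wrapping in self.assertTrue is needed for the test to fail properly.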

回答 5

np.linalg.norm(arr1 - arr2) < 1e-6


检测NumPy数组是否包含至少一个非数值?

问题:检测NumPy数组是否包含至少一个非数值?

我需要编写一个函数来检测输入是否包含至少一个非数字值。如果找到一个非数字值,我将引发一个错误(因为该计算应仅返回一个数字值)。预先不知道输入数组的维数-无论ndim如何,函数都应给出正确的值。更为复杂的是,输入可能是单个浮点数,numpy.float64或者甚至是零维数组之类的奇数。

解决此问题的明显方法是编写一个递归函数,该函数对数组中的每个可迭代对象进行迭代,直到找到一个非iterabe。它将numpy.isnan()对每个不可迭代的对象应用该函数。如果找到至少一个非数字值,则该函数将立即返回False。否则,如果iterable中的所有值都是数字,它将最终返回True。

效果很好,但是速度很慢,我希望NumPy有更好的方法。什么是更快更麻木的替代品?

这是我的样机:

def contains_nan( myarray ):
    """
    @param myarray : An n-dimensional array or a single float
    @type myarray : numpy.ndarray, numpy.array, float
    @returns: bool
    Returns true if myarray is numeric or only contains numeric values.
    Returns false if at least one non-numeric value exists
    Not-A-Number is given by the numpy.isnan() function.
    """
    return True

I need to write a function which will detect if the input contains at least one value which is non-numeric. If a non-numeric value is found I will raise an error (because the calculation should only return a numeric value). The number of dimensions of the input array is not known in advance – the function should give the correct value regardless of ndim. As an extra complication the input could be a single float or numpy.float64 or even something oddball like a zero-dimensional array.

The obvious way to solve this is to write a recursive function which iterates over every iterable object in the array until it finds a non-iterable. It will apply the numpy.isnan() function over every non-iterable object. If at least one non-numeric value is found then the function will return False immediately. Otherwise if all the values in the iterable are numeric it will eventually return True.

That works just fine, but it’s pretty slow and I expect that NumPy has a much better way to do it. What is an alternative that is faster and more numpyish?

Here’s my mockup:

def contains_nan( myarray ):
    """
    @param myarray : An n-dimensional array or a single float
    @type myarray : numpy.ndarray, numpy.array, float
    @returns: bool
    Returns true if myarray is numeric or only contains numeric values.
    Returns false if at least one non-numeric value exists
    Not-A-Number is given by the numpy.isnan() function.
    """
    return True

回答 0

这应该比迭代更快,并且无论形状如何都可以工作。

numpy.isnan(myarray).any()

编辑:快30倍:

import timeit
s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print("  %.2f s" % timeit.Timer(m, s).timeit(1000), m)

结果:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

奖励:它适用于非数组NumPy类型:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True

This should be faster than iterating and will work regardless of shape.

numpy.isnan(myarray).any()

Edit: 30x faster:

import timeit
s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print("  %.2f s" % timeit.Timer(m, s).timeit(1000), m)

Results:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True
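
Putting this together for the inputs listed in the question, a minimal sketch could look like the following. The name has_nan is hypothetical, and note that it returns True when a NaN is present, which is the opposite sense of the mockup's docstring.

```python
import numpy as np

def has_nan(value):
    """Return True if value (scalar or array of any ndim) contains a NaN."""
    # np.asarray turns floats, NumPy scalars and 0-d arrays into ndarrays,
    # so the same expression covers every input shape.
    return bool(np.isnan(np.asarray(value, dtype=float)).any())

print(has_nan(3.0))
print(has_nan(np.float64("nan")))
print(has_nan(np.array(np.nan)))      # zero-dimensional array
print(has_nan(np.zeros((2, 3, 4))))
```

The dtype=float conversion is an assumption of this sketch: inputs that cannot be converted to float (for example arbitrary strings) raise an error instead of returning a result.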

回答 1

如果无穷是一个可能的值,我将使用numpy.isfinite

numpy.isfinite(myarray).all()

如果以上计算结果为True,则不myarray包含numpy.nannumpy.inf-numpy.inf值。

numpy.nan可以使用numpy.inf值,例如:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)

If infinity is a possible value, I would use numpy.isfinite

numpy.isfinite(myarray).all()

If the above evaluates to True, then myarray contains no numpy.nan, numpy.inf or -numpy.inf values.

numpy.isnan, by contrast, does not flag numpy.inf values, for example:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)
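
The same idea wrapped in a helper, as a sketch. The name all_finite is hypothetical and the float conversion is an assumption of this example.

```python
import numpy as np

def all_finite(value):
    """Return True only if value holds no NaN, +inf or -inf."""
    return bool(np.isfinite(np.asarray(value, dtype=float)).all())

b = np.array([[4, np.inf], [np.nan, -np.inf]])

print(all_finite(b))           # False
print(all_finite([1.0, 2.0]))  # True

# isnan and isinf can still be used to tell the two cases apart.
print(bool(np.isnan(b).any()), bool(np.isinf(b).any()))
```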

回答 2

ff!微秒!永远不要在几微秒内解决可以在十亿分之一秒内解决的问题。

注意接受的答案:

  • 遍历整个数据,无论是否找到nan
  • 创建一个大小为N的临时数组,该数组是多余的。

更好的解决方案是在找到NAN时立即返回True:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    for x in a:
        if np.isnan(x): return True
    return False

@numba.jit
def any_nans(a):
    if not a.dtype.kind=='f': return False
    return _any_nans(a.flat)

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

并适用于n维:

array1M_nd = array1M.reshape((len(array1M)//2, 2))
assert any_nans(array1M_nd)==True
%timeit any_nans(array1M_nd)  # 774ns

将此与numpy本机解决方案进行比较:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

提前退出方法是3阶或幅度加速(在某些情况下)。对于简单的注解,不要太破旧。

Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

  • iterates over the whole data, regardless of whether a nan is found
  • creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when NAN is found:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    for x in a:
        if np.isnan(x): return True
    return False

@numba.jit
def any_nans(a):
    if not a.dtype.kind=='f': return False
    return _any_nans(a.flat)

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

and works for n-dimensions:

array1M_nd = array1M.reshape((len(array1M)//2, 2))
assert any_nans(array1M_nd)==True
%timeit any_nans(array1M_nd)  # 774ns

Compare this to the numpy native solution:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

The early-exit method gives a speedup of three orders of magnitude (in some cases). Not too shabby for a simple annotation.
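
If numba is not available, a similar early-exit idea can be approximated in plain NumPy by scanning the data in chunks. This is a different technique from the numba version above, not a reimplementation of it; the function name and the chunk size are arbitrary assumptions, and no timings are claimed.

```python
import numpy as np

def any_nans_chunked(a, chunk_size=65536):
    """Early-exit NaN check that only needs NumPy."""
    a = np.asarray(a)
    if a.dtype.kind != "f":
        return False          # integer/bool arrays cannot hold NaN
    flat = a.reshape(-1)      # a view when possible, a copy otherwise
    for start in range(0, flat.size, chunk_size):
        # The temporary boolean array is at most chunk_size long.
        if np.isnan(flat[start:start + chunk_size]).any():
            return True       # stop at the first chunk containing a NaN
    return False

data = np.random.rand(1_000_000)
print(any_nans_chunked(data))
data[0] = np.nan
print(any_nans_chunked(data.reshape(-1, 2)))
```

The trade-off is a Python-level loop over chunks in exchange for a bounded temporary array and the chance to stop early.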


回答 3

使用numpy 1.3或svn,您可以执行此操作

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

比较中对Nans的处理在早期版本中不一致。

With numpy 1.3 or svn you can do this

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.
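
The same session written with explicit np. prefixes, as a sketch for a current NumPy install. It relies on max() propagating NaN, while np.nanmax ignores it.

```python
import numpy as np

a = np.arange(10000.).reshape(100, 100)
before = bool(np.isnan(a.max()))

a[50, 50] = np.nan
# max() propagates NaN, so one reduction answers the question
# without building a boolean array the size of a.
after = bool(np.isnan(a.max()))

# np.nanmax skips NaN and returns the largest remaining value.
largest_real = np.nanmax(a)

print(before, after, largest_real)
```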


回答 4

(np.where(np.isnan(A)))[0].shape[0]将大于0如果A含有的至少一种元素nanA可以是一个n x m矩阵。

例:

import numpy as np

A = np.array([1,2,4,np.nan])

if (np.where(np.isnan(A)))[0].shape[0]: 
    print("A contains nan")
else:
    print("A does not contain nan")

(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one NaN element. A can also be an n x m matrix.

Example:

import numpy as np

A = np.array([1,2,4,np.nan])

if (np.where(np.isnan(A)))[0].shape[0]: 
    print("A contains nan")
else:
    print("A does not contain nan")

带副本的numpy数组分配

问题:带副本的numpy数组分配

例如,如果我们有一个numpyarray A,我们想要一个numpy数组B具有相同元素。

以下(见下文)方法之间的区别是什么?什么时候分配额外的内存,什么时候不分配?

  1. B = A
  2. B[:] = A(与B[:]=A[:]?相同)
  3. numpy.copy(B, A)

For example, if we have a numpy array A, and we want a numpy array B with the same elements.

What is the difference between the following (see below) methods? When is additional memory allocated, and when is it not?

  1. B = A
  2. B[:] = A (same as B[:]=A[:]?)
  3. numpy.copy(B, A)

回答 0

这三个版本都做不同的事情:

  1. B = A

    这会将新名称绑定B到已经命名的现有对象A。之后,它们引用同一个对象,因此,如果您就地修改一个对象,那么您也会在另一个对象中看到更改。

  2. B[:] = A(与B[:]=A[:]?相同)

    这会将值从中复制A到现有数组中B。两个数组必须具有相同的形状才能起作用。B[:] = A[:]做同样的事情(但B = A[:]会做更多类似1的事情)。

  3. numpy.copy(B, A)

    这不是合法的语法。你可能是说B = numpy.copy(A)。这几乎与2相同,但是它创建了一个新数组,而不是重用该B数组。如果没有其他对先前B值的引用,则最终结果将与2相同,但是在复制期间它将临时使用更多内存。

    也许您是说numpy.copyto(B, A),这是合法的,等于2?

All three versions do different things:

  1. B = A

    This binds a new name B to the existing object already named A. Afterwards they refer to the same object, so if you modify one in place, you’ll see the change through the other one too.

  2. B[:] = A (same as B[:]=A[:]?)

    This copies the values from A into an existing array B. The two arrays must have the same shape for this to work. B[:] = A[:] does the same thing (but B = A[:] would do something more like 1).

  3. numpy.copy(B, A)

    This is not legal syntax. You probably meant B = numpy.copy(A). This is almost the same as 2, but it creates a new array, rather than reusing the B array. If there were no other references to the previous B value, the end result would be the same as 2, but it will use more memory temporarily during the copy.

    Or maybe you meant numpy.copyto(B, A), which is legal, and is equivalent to 2?
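
The differences can be checked directly, as in this sketch. The variable names B1 to B4 are just labels for this example.

```python
import numpy as np

A = np.arange(5)

B1 = A                         # 1. second name for the same object
B2 = np.zeros_like(A)
buffer_before = B2.ctypes.data
B2[:] = A                      # 2. copies values into B2's existing buffer
B3 = np.copy(A)                # 3. allocates a new array
B4 = A[:]                      # a view: new object, same memory

A[0] = 99                      # modify A in place

print(B1 is A)                                       # True
print(np.shares_memory(A, B2), np.shares_memory(A, B3))
print(np.shares_memory(A, B4))                       # True
print(B1[0], B2[0], B3[0], B4[0])                    # 99 0 0 99
```

np.shares_memory is handy here because identity (is) alone does not reveal that a slice such as A[:] still points at the original data.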


回答 1

  1. B=A 创建参考
  2. B[:]=A 复制
  3. numpy.copy(B,A) 复制

后两个需要额外的内存。

要制作深拷贝,您需要使用 B = copy.deepcopy(A)

  1. B=A creates a reference
  2. B[:]=A makes a copy
  3. numpy.copy(B,A) makes a copy

the last two need additional memory.

To make a deep copy you need to use B = copy.deepcopy(A)
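
A sketch of where copy.deepcopy actually makes a difference: for plain numeric arrays np.copy already yields independent data, but for arrays with dtype=object the copied array still refers to the same Python objects. The example data is made up.

```python
import copy
import numpy as np

# Object array whose elements are mutable Python lists.
A = np.empty(2, dtype=object)
A[0] = [1, 2]
A[1] = [3]

shallow = np.copy(A)       # new array, but it holds the same list objects
deep = copy.deepcopy(A)    # new array holding new lists

A[0].append(100)

print(shallow[0])  # [1, 2, 100]
print(deep[0])     # [1, 2]
```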


回答 2

这是我唯一的工作答案:

B=numpy.array(A)

This is the only working answer for me:

B=numpy.array(A)

如何正确保存和加载numpy.array()数据?

问题:如何正确保存和加载numpy.array()数据?

我想知道如何numpy.array正确保存和加载数据。目前,我正在使用该numpy.savetxt()方法。例如,如果我有一个array markers,它看起来像这样:

我尝试通过使用以下方式保存它:

numpy.savetxt('markers.txt', markers)

在其他脚本中,我尝试打开以前保存的文件:

markers = np.fromfile("markers.txt")

这就是我得到的…

首先保存的数据如下所示:

0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00

但是,当我使用相同的方法保存刚刚加载的数据时,即 numpy.savetxt()它看起来像这样:

1.398043286095131769e-76
1.398043286095288860e-76
1.396426376485745879e-76
1.398043286055061908e-76
1.398043286095288860e-76
1.182950697433698368e-76
1.398043275797188953e-76
1.398043286095288860e-76
1.210894289234927752e-99
1.398040649781712473e-76

我究竟做错了什么?PS没有执行其他“后台”操作。只需保存和加载,这就是我得到的。先感谢您。

I wonder, how to save and load numpy.array data properly. Currently I’m using the numpy.savetxt() method. For example, if I got an array markers, which looks like this:

I try to save it by the use of:

numpy.savetxt('markers.txt', markers)

In other script I try to open previously saved file:

markers = np.fromfile("markers.txt")

And that’s what I get…

Saved data first looks like this:

0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00

But when I save the just-loaded data using the same method, i.e. numpy.savetxt(), it looks like this:

1.398043286095131769e-76
1.398043286095288860e-76
1.396426376485745879e-76
1.398043286055061908e-76
1.398043286095288860e-76
1.182950697433698368e-76
1.398043275797188953e-76
1.398043286095288860e-76
1.210894289234927752e-99
1.398040649781712473e-76

What am I doing wrong? PS there are no other “backstage” operation which I perform. Just saving and loading, and that’s what I get. Thank you in advance.


回答 0

我发现执行此操作的最可靠方法是与一起使用np.savetxtnp.loadtxt而不是np.fromfile更适合用编写的二进制文件tofile。该np.fromfilenp.tofile方法写入和读取二进制文件,而np.savetxt写入一个文本文件。因此,例如:

In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.savetxt('test1.txt', a, fmt='%d')
In [3]: b = np.loadtxt('test1.txt', dtype=int)
In [4]: a == b
Out[4]: array([ True,  True,  True,  True], dtype=bool)

要么:

In [5]: a.tofile('test2.dat')
In [6]: c = np.fromfile('test2.dat', dtype=int)
In [7]: c == a
Out[7]: array([ True,  True,  True,  True], dtype=bool)

我使用前一种方法,即使它速度较慢并且有时会创建更大的文件:二进制格式也可能取决于平台(例如,文件格式取决于系统的字节序)。

NumPy数组有与平台无关的格式,可以使用np.save和保存和读取np.load

In  [8]: np.save('test3.npy', a)    # .npy extension is added if not given
In  [9]: d = np.load('test3.npy')
In [10]: a == d
Out[10]: array([ True,  True,  True,  True], dtype=bool)

The most reliable way I have found to do this is to use np.savetxt with np.loadtxt, not np.fromfile, which is better suited to binary files written with tofile. The np.fromfile and np.tofile methods read and write binary files, whereas np.savetxt writes a text file. So, for example:

In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.savetxt('test1.txt', a, fmt='%d')
In [3]: b = np.loadtxt('test1.txt', dtype=int)
In [4]: a == b
Out[4]: array([ True,  True,  True,  True], dtype=bool)

Or:

In [5]: a.tofile('test2.dat')
In [6]: c = np.fromfile('test2.dat', dtype=int)
In [7]: c == a
Out[7]: array([ True,  True,  True,  True], dtype=bool)

I use the former method even if it is slower and creates bigger files (sometimes): the binary format can be platform dependent (for example, the file format depends on the endianness of your system).

There is a platform independent format for NumPy arrays, which can be saved and read with np.save and np.load:

In  [8]: np.save('test3.npy', a)    # .npy extension is added if not given
In  [9]: d = np.load('test3.npy')
In [10]: a == d
Out[10]: array([ True,  True,  True,  True], dtype=bool)
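
A self-contained round trip of both approaches, as a sketch. The temporary directory and file names are assumptions made so the example leaves nothing behind.

```python
import os
import tempfile
import numpy as np

a = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "a.txt")
    npy_path = os.path.join(tmp, "a.npy")

    np.savetxt(txt_path, a)        # human-readable text
    from_txt = np.loadtxt(txt_path)

    np.save(npy_path, a)           # binary .npy, stores dtype and shape
    from_npy = np.load(npy_path)

print(np.array_equal(a, from_txt), np.array_equal(a, from_npy))
print(from_npy.dtype, from_npy.shape)
```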

回答 1

np.save('data.npy', num_arr) # save
new_num_arr = np.load('data.npy') # load

回答 2

np.fromfile()有一个sep=关键字参数:

如果文件是文本文件,则项目之间的分隔符。空(“”)分隔符表示文件应被视为二进制文件。分隔符中的空格(“”)匹配零个或多个空格字符。仅由空格组成的分隔符必须至少匹配一个空格。

默认值sep=""意味着np.fromfile()试图将其读取为二进制文件而不是以空格分隔的文本文件,因此您会得到无意义的值。如果使用np.fromfile('markers.txt', sep=" "),将得到您想要的结果。

但是,正如其他人指出的那样,这np.loadtxt()是将文本文件转换为numpy数组的首选方法,除非该文件需要人类可读,否则通常最好使用二进制格式(例如np.load()/ np.save())。

np.fromfile() has a sep= keyword argument:

Separator between items if file is a text file. Empty ("") separator means the file should be treated as binary. Spaces (" ") in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace.

The default value of sep="" means that np.fromfile() tries to read it as a binary file rather than a space-separated text file, so you get nonsense values back. If you use np.fromfile('markers.txt', sep=" ") you will get the result you are looking for.

However, as others have pointed out, np.loadtxt() is the preferred way to convert text files to numpy arrays, and unless the file needs to be human-readable it is usually better to use binary formats instead (e.g. np.load()/np.save()).
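
The effect of sep can be reproduced with a small sketch like the one below. It assumes a one-dimensional array of zeros similar to the one in the question and writes to a temporary directory.

```python
import os
import tempfile
import numpy as np

markers = np.zeros(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "markers.txt")
    np.savetxt(path, markers)

    as_text = np.fromfile(path, sep=" ")  # parse as whitespace-separated text
    as_binary = np.fromfile(path)         # default sep="": raw bytes as float64
    via_loadtxt = np.loadtxt(path)

print(np.array_equal(as_text, markers))
print(np.array_equal(via_loadtxt, markers))
print(as_binary[:3])   # values from reinterpreting ASCII bytes, not the data
```

np.fromfile returns a flat array in either mode, so for multi-dimensional data np.loadtxt or np.load is the more convenient choice.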


回答 3

对于简短的答案,您应该使用np.savenp.load。这些方法的优点是它们是由numpy库的开发人员制作的,并且已经可以工作(加上可能已经很好地进行了优化),例如

import numpy as np
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

np.save(path/'x', x)
np.save(path/'y', y)

x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')

print(x is x_loaded) # False
print(x == x_loaded) # [[ True  True  True  True  True]]

扩展答案:

最后,它确实取决于您的需求,因为您还可以将其保存为人类可读的格式(请参见将此NumPy数组转储到csv文件中),或者如果文件非常大,甚至可以与其他库一起使用(请参见保存numpy数组的最佳方法)在磁盘上进行扩展讨论)。

但是,(由于您在问题中使用“正确”一词,因此进行了扩展)我仍然认为开箱即用(和大多数代码!)使用numpy函数最有可能满足大多数用户需求。最重要的原因是它已经起作用。出于其他原因尝试使用其他东西可能会使您出乎意料的长兔子洞,弄清楚为什么它不起作用并迫使它起作用。

以尝试用泡菜保存为例。我只是为了好玩而尝试,花了至少30分钟的时间才意识到,除非我用字节模式打开并读取文件,否则泡菜不会保存我的东西wb。花时间去Google,试一试,理解错误消息等。小细节,但事实是它已经需要我打开文件,从而以意想不到的方式使事情变得复杂。补充一点,它要求我重新阅读此内容(哪个btw有点令人困惑)内置开放功能中的模式a,a +,w,w +和r +之间的区别?

所以,如果有符合您需要使用它,除非你有一个(的接口非常)充分的理由(如与MATLAB或由于某种原因,你真的要读取的文件和打印Python真的不能满足您的需求,它的兼容性可能有问题)。此外,最有可能的是,如果您需要对其进行优化,则可以在以后找到答案(而不是花很多时间调试无用的东西,例如打开一个简单的numpy文件)。

因此,请使用interface / numpy提供。它可能并不完美,这很可能很好,尤其是对于已经存在numpy的库而言。

我已经花了很多时间用numpy来保存和加载数据,所以请乐在其中,希望对您有所帮助!

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

## using loading npy, npz files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

关于我学到的一些评论:

  • np.save如预期的那样,它已经很好地进行了压缩(请参阅https://stackoverflow.com/a/55750128/1601580),开箱即用,无需打开任何文件。清洁。简单。高效。用它。
  • np.savez使用未压缩的格式(请参阅docsSave several arrays into a single file in uncompressed 。npz format.如果决定使用此格式(警告您不要使用标准解决方案,因此请注意错误!),您可能会发现您需要使用参数名称来保存它,除非您想要使用默认名称。因此,如果第一个已经使用(或任何作品都使用该功能!),请勿使用此功能。
  • Pickle还允许执行任意代码。出于安全原因,某些人可能不想使用此功能。
  • 可读文件的制作成本很高,可能不值得。
  • 有一些所谓hdf5的大文件。凉!https://stackoverflow.com/a/9619713/1601580

请注意,这不是详尽的答案。但是对于其他资源,请检查以下内容:

For a short answer: use np.save and np.load. The advantage of these is that they are made by the developers of the numpy library and they already work (plus they are likely already nicely optimized), e.g.

import numpy as np
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

np.save(path/'x', x)
np.save(path/'y', y)

x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')

print(x is x_loaded) # False
print(x == x_loaded) # [[ True  True  True  True  True]]

Expanded answer:

In the end it really depends on your needs, because you can also save it in a human-readable format (see Dump a NumPy array into a csv file) or even with other libraries if your files are extremely large (see best way to preserve numpy arrays on disk for an expanded discussion).

However (expanding on this, since you use the word “properly” in your question), I still think that using the numpy functions out of the box (and most code!) most likely satisfies most users' needs. The most important reason is that they already work. Trying to use something else for any other reason might take you down an unexpectedly LONG rabbit hole to figure out why it doesn’t work and force it to work.

Take for example trying to save it with pickle. I tried that just for fun, and it took me at least 30 minutes to realize that pickle wouldn’t save my stuff unless I opened the file in binary mode with wb. It took time to google, try things, understand the error message, etc. A small detail, but the fact that it already required me to open a file complicated things in unexpected ways. On top of that, it required me to re-read this (which, by the way, is sort of confusing): Difference between modes a, a+, w, w+, and r+ in built-in open function?

So if there is an interface that meets your needs, use it unless you have a (very) good reason not to (e.g. compatibility with matlab, or for some reason you really want to read the file and printing in python really doesn’t meet your needs, which might be questionable). Furthermore, if you need to optimize it, you’ll most likely find out later down the line (rather than spending ages debugging useless stuff like opening a simple numpy file).

So use the interface numpy provides. It might not be perfect, but it’s most likely fine, especially for a library that’s been around as long as numpy.

I already spent the time saving and loading data with numpy in a bunch of ways, so have fun with it; hope it helps!

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

## using loading npy, npz files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

Some comments on what I learned:

  • np.save: as expected, this already works well (see https://stackoverflow.com/a/55750128/1601580; note that the .npy format itself is uncompressed) and works out of the box without any file opening. Clean. Easy. Efficient. Use it.
  • np.savez uses an uncompressed format (see docs): “Save several arrays into a single file in uncompressed .npz format.” If you decide to use this (you were warned about moving away from the standard solution, so expect bugs!), you might discover that you need to use argument names to save it, unless you want to use the default names. So don’t use this if the first already works (or, if anything already works, use that!)
  • Pickle also allows for arbitrary code execution. Some people might not want to use this for security reasons.
  • Human-readable files are expensive to make, etc. Probably not worth it.
  • There is something called hdf5 for large files. Cool! https://stackoverflow.com/a/9619713/1601580
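
For the .npz point in the list above, a minimal sketch of saving several arrays under explicit names. It uses np.savez_compressed, the compressed counterpart of np.savez; the file name and the arrays are made up for this example.

```python
import os
import tempfile
import numpy as np

x = np.linspace(-1, 1, 5)
y = x**2 + x + 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.npz")
    # Keyword names become the keys of the archive; positional arrays
    # would be stored under arr_0, arr_1, ...
    np.savez_compressed(path, x=x, y=y)

    with np.load(path) as db:      # NpzFile, reads arrays on access
        keys = sorted(db.files)
        x_loaded = db["x"]
        y_loaded = db["y"]

print(keys)
print(np.array_equal(x, x_loaded), np.array_equal(y, y_loaded))
```

Opening the archive in a with block closes the underlying file once the arrays have been read.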

Note this is not an exhaustive answer. But for other resources check this:


如何在numpy中获得按元素矩阵乘法(Hadamard积)?

问题:如何在numpy中获得按元素矩阵乘法(Hadamard积)?

我有两个矩阵

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

我想得到元素乘积[[1*5,2*6], [3*7,4*8]],等于

[[5,12], [21,32]]

我努力了

print(np.dot(a,b)) 

print(a*b)

但两者都给出结果

[[19 22], [43 50]]

这是矩阵乘积,而不是元素乘积。如何使用内置函数获取按元素分类的产品(又名Hadamard产品)?

I have two matrices

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

and I want to get the element-wise product, [[1*5,2*6], [3*7,4*8]], equaling

[[5,12], [21,32]]

I have tried

print(np.dot(a,b)) 

and

print(a*b)

but both give the result

[[19 22], [43 50]]

which is the matrix product, not the element-wise product. How can I get the element-wise product (aka Hadamard product) using built-in functions?


回答 0

对于matrix对象的元素乘法,可以使用numpy.multiply

import numpy as np
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.multiply(a,b)

结果

array([[ 5, 12],
       [21, 32]])

但是,您应该真正使用array而不是matrixmatrix对象与常规ndarray具有各种可怕的不兼容性。使用ndarrays时,您可以仅使用*元素级乘法:

a * b

如果您使用的是Python 3.5+,则您甚至都不会失去使用运算符执行矩阵乘法的能力,因为@矩阵乘法现在可以

a @ b  # matrix multiplication

For elementwise multiplication of matrix objects, you can use numpy.multiply:

import numpy as np
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.multiply(a,b)

Result

array([[ 5, 12],
       [21, 32]])

However, you should really use array instead of matrix. matrix objects have all sorts of horrible incompatibilities with regular ndarrays. With ndarrays, you can just use * for elementwise multiplication:

a * b

If you’re on Python 3.5+, you don’t even lose the ability to perform matrix multiplication with an operator, because @ does matrix multiplication now:

a @ b  # matrix multiplication
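
Side by side, using the numbers from the question, the two products look like this. This is only a sketch with plain ndarrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

hadamard = a * b              # element-wise (Hadamard) product
also_hadamard = np.multiply(a, b)
matmul = a @ b                # matrix product

print(hadamard)   # [[ 5 12] [21 32]]
print(matmul)     # [[19 22] [43 50]]
```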

回答 1

只是这样做:

import numpy as np

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

a * b

just do this:

import numpy as np

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

a * b

回答 2

import numpy as np
x = np.array([[1,2,3], [4,5,6]])
y = np.array([[-1, 2, 0], [-2, 5, 1]])

x*y
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit x*y
1000000 loops, best of 3: 421 ns per loop

np.multiply(x,y)
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit np.multiply(x, y)
1000000 loops, best of 3: 457 ns per loop

两者np.multiply*都会产生元素明智的乘法,称为Hadamard积

%timeit 是ipython的魔力

import numpy as np
x = np.array([[1,2,3], [4,5,6]])
y = np.array([[-1, 2, 0], [-2, 5, 1]])

x*y
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit x*y
1000000 loops, best of 3: 421 ns per loop

np.multiply(x,y)
Out: 
array([[-1,  4,  0],
       [-8, 25,  6]])

%timeit np.multiply(x, y)
1000000 loops, best of 3: 457 ns per loop

Both np.multiply and * yield element-wise multiplication, known as the Hadamard product.

%timeit is an IPython magic command.


回答 3

试试这个:

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

#This would result a 'numpy.ndarray'
result = np.array(a) * np.array(b)

在此,np.array(a)返回类型为2D的2D数组,ndarray并且ndarray将导致元素相乘。因此结果将是:

result = [[5, 12], [21, 32]]

如果您想获取矩阵,请执行以下操作:

result = np.mat(result)

Try this:

a = np.matrix([[1,2], [3,4]])
b = np.matrix([[5,6], [7,8]])

#This would result a 'numpy.ndarray'
result = np.array(a) * np.array(b)

Here, np.array(a) returns a 2D array of type ndarray, and multiplying two ndarrays results in element-wise multiplication. So the result would be:

result = [[5, 12], [21, 32]]

If you want to get a matrix, do it with this:

result = np.mat(result)

高效地检查Python / numpy / pandas中的任意对象是否为NaN?

问题:高效地检查Python / numpy / pandas中的任意对象是否为NaN?

我的numpy数组用于np.nan指定缺失值。当我遍历数据集时,我需要检测这些缺失值并以特殊方式处理它们。

我天真地使用过numpy.isnan(val),除非val不在所支持的类型子集中,numpy.isnan()。例如,字符串字段中可能会丢失数据,在这种情况下,我得到:

>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

除了编写昂贵的包装程序以捕获异常并返回之外 False,还有没有办法优雅而有效地处理此问题?

My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.

Naively I used numpy.isnan(val), which works well unless val isn’t among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:

>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?


Answer 0

pandas.isnull() (also pd.isna(), in newer versions) checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:

NaN in numeric arrays, None/NaN in object arrays

Quick example:

import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]: 
0    False
1     True
2    False
dtype: bool

The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.

The same works for datetimes (if you use pd.NaT instead of np.nan, you won't need to specify the dtype):

In [24]: s = pd.Series([pd.Timestamp('20130101'), np.nan, pd.Timestamp('20130102 9:30')], dtype='M8[ns]')

In [25]: s
Out[25]: 
0   2013-01-01 00:00:00
1                   NaT
2   2013-01-02 09:30:00
dtype: datetime64[ns]

In [26]: pd.isnull(s)
Out[26]: 
0    False
1     True
2    False
dtype: bool
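
The question iterates over individual values rather than whole Series, so it may help to see that pd.isna also accepts scalars of arbitrary type. This is a minimal sketch; the values list is made-up sample data.

```python
import numpy as np
import pandas as pd

# Made-up sample of mixed scalar values, as might appear while iterating a data set.
values = ['some_string', np.nan, None, 1.5, pd.NaT, 42]

# pd.isna returns a plain boolean for scalars and does not raise on strings.
flags = [pd.isna(v) for v in values]

print(flags)  # [False, True, True, False, True, False]
```

Note that pd.isna returns an array rather than a single boolean when it is given a list or array, so a check like this is only meaningful for scalar values.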

Answer 1

Is your type really arbitrary? If you know the value will only ever be an int, a float, or a string, you can check the dtype first:

 if val.dtype == float and np.isnan(val):

Assuming the value is wrapped in a numpy type, it will always have a dtype, and only float and complex dtypes can hold NaN.
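
One way to turn that idea into a reusable check is sketched below. The helper name is_nan is hypothetical, and the sketch deliberately differs from the one-liner above in two ways: it tests dtype.kind so that float32 and complex values are covered as well (val.dtype == float only matches float64), and it falls back to the val != val test for plain Python floats, which have no dtype. It is meant for scalars, not arrays.

```python
import numpy as np

def is_nan(val):
    """Hypothetical helper: True only if a scalar holds a float or complex NaN."""
    dtype = getattr(val, "dtype", None)
    if dtype is None:
        # Plain Python object: only float/complex can be NaN, and NaN != NaN.
        return isinstance(val, (float, complex)) and val != val
    # numpy scalar: kind 'f' is floating point, 'c' is complex.
    return dtype.kind in "fc" and bool(np.isnan(val))

print(is_nan(np.float64("nan")))        # True
print(is_nan(np.float32(1.0)))          # False
print(is_nan(np.str_("some_string")))   # False
print(is_nan("some_string"))            # False
```

Because the dtype test short-circuits before np.isnan is called, string values never reach np.isnan and no exception handling is needed.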