问题:如何使用scipy执行二维插值?

此Q&A旨在作为有关使用scipy进行二维(和多维)插值的规范(-ish)。经常有关于各种多维插值方法的基本语法的问题,我希望也将这些问题弄清楚。

我有一组散二维数据点,我想他们绘制作为一个很好的表面,最好使用类似contourfplot_surfacematplotlib.pyplot。如何使用scipy将二维或多维数据插值到网格?

我已经找到了该scipy.interpolate子软件包,但是在使用interp2dor bisplrepgriddataor 时仍然会出错rbf。这些方法的正确语法是什么?

This Q&A is intended as a canonical(-ish) concerning two-dimensional (and multi-dimensional) interpolation using scipy. There are often questions concerning the basic syntax of various multidimensional interpolation methods, I hope to set these straight too.

I have a set of scattered two-dimensional data points, and I would like to plot them as a nice surface, preferably using something like contourf or plot_surface in matplotlib.pyplot. How can I interpolate my two-dimensional or multidimensional data to a mesh using scipy?

I’ve found the scipy.interpolate sub-package, but I keep getting errors when using interp2d or bisplrep or griddata or rbf. What is the proper syntax of these methods?


回答 0

免责声明:我在写这篇文章时主要是出于句法考虑和一般行为。我不熟悉所描述方法的内存和CPU方面,因此,我的答案是针对那些数据量较小的用户,因此,插值的质量可能是要考虑的主要方面。我知道,在处理非常大的数据集时,性能更好的方法(即griddataRbf)可能不可行。

我将比较三种多维插值方法(interp2d/ splines griddataRbf)。我将对它们执行两种插值任务和两种基础功能(要从中插值的点)。具体示例将演示二维插值,但可行的方法适用于任意尺寸。每种方法都提供各种插值;在所有情况下,我都会使用三次插值(或接近1的东西)。重要的是要注意,每当使用插值时,都会引入与原始数据相比的偏差,并且所使用的特定方法会影响最终的工件。始终意识到这一点,并负责任地插值。

两个插值任务将是

  1. 上采样(输入数据在矩形网格上,输出数据在较密集的网格上)
  2. 将分散的数据插值到常规网格上

这两个函数(在上[x,y] in [-1,1]x[-1,1])将是

  1. 平稳,友好的功能:cos(pi*x)*sin(pi*y); 范围[-1, 1]
  2. 邪恶的(尤其是非连续的)函数:x*y/(x^2+y^2)在原点附近具有0.5的值;范围[-0.5, 0.5]

它们的外观如下:

图1:测试功能

我将首先演示这三个方法在这四个测试下的行为,然后详细介绍这三个方法的语法。如果您知道对某个方法应该有什么期望,那么您可能不想浪费时间学习其语法(看着您interp2d)。

测试数据

为了清楚起见,这是我用来生成输入数据的代码。虽然在这种特定情况下,我显然知道数据的基础功能,但我只会使用它为插值方法生成输入。为了方便起见,我使用了numpy(并且主要是用于生成数据),但仅scipy也足够。

import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval,maxval,n), np.linspace(minval,maxval,n+1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x)*np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2+y**2>1e-10, x*y/(x**2+y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse,y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
x_scattered,y_scattered = np.random.rand(2,N_scattered**2)*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense,y_dense = gimme_mesh(N_dense)

平滑的功能和上采样

让我们从最简单的任务开始。这是从形状网格[6,7]到其中之一的向上采样如何[20,21]实现平滑测试功能的方法:

图2:平滑上采样

尽管这是一项简单的任务,但输出之间已经存在细微的差异。乍一看,这三个输出都是合理的。根据我们对基础功能的先验知识,有两个功能需要注意:中间情况griddata会使数据失真最大。注意图的y==-1边界(最靠近x标签):该函数应严格为零(因为y==-1该平滑函数是节点线),但是并非如此griddata。还要注意图的x==-1边界(在左边,在后面):基础函数在处具有局部最大值(隐含边界附近的零梯度)[-1, -0.5],但griddata输出在该区域中明显显示出非零梯度。效果是微妙的,但仍然存在偏见。(保真度Rbf如果使用默认的径向函数,则更好multiquadratic

邪恶功能和上采样

更加艰巨的任务是对我们邪恶的函数执行升采样:

图3:邪恶的向上采样

三种方法之间开始出现明显的差异。查看表面图,从的输出中会出现明显的虚假极值interp2d(请注意所绘制表面的右侧的两个驼峰)。虽然griddataRbf似乎产生乍看类似的结果,后者似乎产生更深的最低附近[0.4, -0.4]也就是从底层功能缺失。

但是,有一个至关重要的方面Rbf要优越得多:它尊重基础函数的对称性(当然,这也可以通过样本网格的对称性来实现)。的输出griddata破坏了采样点的对称性,在对称情况下,该对称性已经微弱可见。

功能流畅且数据分散

最常见的是要对分散的数据执行插值。因此,我希望这些测试更加重要。如上所示,在感兴趣的域中伪均匀地选择了采样点。在实际情况下,每次测量都可能会产生额外的噪声,因此您应该考虑对原始数据进行插值是否有意义。

平滑功能的输出:

图4:平滑分散插值

现在已经有一些恐怖表演在继续。为了保留至少最少的信息,我将输出从剪辑interp2d到两者之间,[-1, 1]专门用于绘图。显然,虽然存在一些潜在的形状,但是在巨大的嘈杂区域中,该方法完全失效了。第二种情况griddata很好地再现了形状,但请注意轮廓图边界处的白色区域。这是由于这样的事实,它griddata仅在输入数据点的凸包内部有效(换句话说,它不执行任何外推)。我保留了凸包外部的输出点的默认NaN值。2考虑到这些功能,Rbf似乎表现最佳。

邪恶功能和分散数据

我们一直在等待的时刻:

图5:邪恶分散插值

interp2d放弃并不奇怪。实际上,在调用期间,interp2d您应该期望一些友善RuntimeWarning的人抱怨无法构造样条线。至于其他两种方法,Rbf即使在推断结果的域边界附近,似乎也能产生最佳输出。


因此,让我以优先顺序从高到低的顺序对这三种方法说几句话(这样一来,最坏的情况就是每个人阅读的可能性最小)。

scipy.interpolate.Rbf

Rbf级代表“径向基函数”。老实说,直到开始研究这篇文章之前,我从未考虑过这种方法,但是我敢肯定,将来我会使用这些方法。

就像基于样条的方法(请参阅后面)一样,用法分为两个步骤:首先Rbf根据输入数据创建一个可调用的类实例,然后为给定的输出网格调用此对象以获得插值结果。平滑升采样测试的示例:

import scipy.interpolate as interp
zfun_smooth_rbf = interp.Rbf(x_sparse, y_sparse, z_sparse_smooth, function='cubic', smooth=0)  # default smooth=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(x_dense, y_dense)  # not really a function, but a callable class instance

请注意,在这种情况下,输入点和输出点均为2d数组,并且输出z_dense_smooth_rbf具有相同的形状,x_densey_dense无需付出任何努力。另请注意,它Rbf支持插值的任意尺寸。

所以, scipy.interpolate.Rbf

  • 即使对于疯狂的输入数据也能产生良好的输出
  • 支持更大尺寸的插值
  • 在输入点的凸包之外进行外推(当然,外推总是一场赌博,您通常根本不应该依赖它)
  • 首先创建一个插值器,因此在各个输出点对其进行评估会减少额外的工作量
  • 可以具有任意形状的输出点(而不是局限于矩形网格,请参阅下文)
  • 倾向于保持输入数据的对称性
  • 支持多种径向函数关键字functionmultiquadricinversegaussianlinearcubicquinticthin_plate和用户自定义的任意

scipy.interpolate.griddata

我以前最喜欢的,griddata是用于任意尺寸插值的通用工具。除了为节点点的凸包之外的点设置单个预设值外,它不会执行外推,但是由于外推是非常易变且危险的事情,因此不一定是不利条件。用法示例:

z_dense_smooth_griddata = interp.griddata(np.array([x_sparse.ravel(),y_sparse.ravel()]).T,
                                          z_sparse_smooth.ravel(),
                                          (x_dense,y_dense), method='cubic')   # default method is linear

请注意略显晦涩的语法。输入点具有在形状上的阵列,以指定[N, D]D尺寸。为此,我们首先必须展平2D坐标数组(使用ravel),然后将数组连接起来并转置结果。有多种方法可以执行此操作,但是它们似乎都很庞大。输入z数据也必须展平。关于输出点,我们有更多的自由度:由于某些原因,它们也可以指定为多维数组的元组。请注意,helpof griddata具误导性,因为这表明输入点也是如此(至少对于版本0.17.0):

griddata(points, values, xi, method='linear', fill_value=nan, rescale=False)
    Interpolate unstructured D-dimensional data.

    Parameters
    ----------
    points : ndarray of floats, shape (n, D)
        Data point coordinates. Can either be an array of
        shape (n, D), or a tuple of `ndim` arrays.
    values : ndarray of float or complex, shape (n,)
        Data values.
    xi : ndarray of float, shape (M, D)
        Points at which to interpolate data.

简而言之, scipy.interpolate.griddata

  • 即使对于疯狂的输入数据也能产生良好的输出
  • 支持更大尺寸的插值
  • 不执行外推,可以为输入点的凸包外部的输出设置单个值(请参阅参考资料fill_value
  • 在一次调用中计算插值,因此从头开始探测多组输出点
  • 可以具有任意形状的输出点
  • 支持任意尺寸的最近邻和线性插值,立方在1d和2d中。最近邻和线性内插使用NearestNDInterpolatorLinearNDInterpolator引擎盖下,分别。一维三次插值使用样条曲线,二维三次插值用于CloughTocher2DInterpolator构造连续可微分分段三次插值器。
  • 可能会违反输入数据的对称性

scipy.interpolate.interp2d/scipy.interpolate.bisplrep

我正在interp2d与其亲戚讨论的唯一原因是它的名称具有欺骗性,人们可能会尝试使用它。剧透警告:请勿使用(自scipy版本0.17.0起)。它已经比以前的主题更特别了,因为它专门用于二维插值,但是我怀疑这是迄今为止多元插值最常见的情况。

就语法而言,interp2d这类似于Rbf它首先需要构造一个插值实例,可以调用该插值实例以提供实际的插值。但是有一个问题:输出点必须位于矩形网格上,因此输入到内插器的输入的输入必须是跨输出网格的一维向量,就像来自numpy.meshgrid

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic')   # default kind is 'linear'
# reminder: x_dense and y_dense are of shape [20, 21] from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec,yvec)   # output is [20, 21]-shaped array

使用时最常见的错误之一interp2d是将完整的2d网格放入插值调用中,这会导致大量的内存消耗,并希望仓促行事MemoryError

现在,最大的问题interp2d是它通常不起作用。为了理解这一点,我们必须深入研究。事实证明,这interp2d是较低级函数bisplrep+的bisplev包装,而后者又是FITPACK例程(用Fortran编写)的包装。对上一个示例的等效调用为

kind = 'cubic'
if kind=='linear':
    kx=ky=1
elif kind=='cubic':
    kx=ky=3
elif kind=='quintic':
    kx=ky=5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(),y_sparse.ravel(),z_sparse_smooth.ravel(),kx=kx,ky=ky,s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec,yvec,bisp_smooth).T  # note the transpose

现在,这里的事儿interp2d:(在SciPy的版本0.17.0或更新版本)有一个很好的注释interpolate/interpolate.pyinterp2d

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

确实在中interpolate/fitpack.pybisplrep有一些设置,最终

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)                 

就是这样。底层的例程interp2d并不是真正要执行插值。它们可能足以满足足够良好的数据需求,但是在实际情况下,您可能需要使用其他内容。

总结一下, interpolate.interp2d

  • 即使数据经过良好调整也会导致伪像
  • 专为双变量问题(尽管interpn在网格上定义的输入点有限)
  • 执行外推
  • 首先创建一个插值器,因此在各个输出点对其进行评估会减少额外的工作量
  • 只能在矩形网格上产生输出,对于分散的输出,您将不得不在循环中调用内插器
  • 支持线性,三次和五次插值
  • 可能会违反输入数据的对称性

1我相当确定,的基函数cubiclinear种类Rbf不完全对应于同名的其他插值器。
2这些NaN也是表面图看起来如此奇怪的原因:从历史上看,matplotlib难以绘制具有适当深度信息的复杂3d对象。数据中的NaN值会混淆渲染器,因此应将应在背面的表面部分绘制在正面。这是可视化的问题,而不是插值。

Disclaimer: I’m mostly writing this post with syntactical considerations and general behaviour in mind. I’m not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata and Rbf) might not be feasible.

I’m going to compare three kinds of multi-dimensional interpolation methods (interp2d/splines, griddata and Rbf). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close1). It’s important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be

  1. upsampling (input data is on a rectangular grid, output data is on a denser grid)
  2. interpolation of scattered data onto a regular grid

The two functions (over the domain [x,y] in [-1,1]x[-1,1]) will be

  1. a smooth and friendly function: cos(pi*x)*sin(pi*y); range in [-1, 1]
  2. an evil (and in particular, non-continuous) function: x*y/(x^2+y^2) with a value of 0.5 near the origin; range in [-0.5, 0.5]

Here’s how they look:

fig1: test functions

I will first demonstrate how the three methods behave under these four tests, then I’ll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d).

Test data

For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I’m obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.

import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval,maxval,n), np.linspace(minval,maxval,n+1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x)*np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2+y**2>1e-10, x*y/(x**2+y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse,y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
x_scattered,y_scattered = np.random.rand(2,N_scattered**2)*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense,y_dense = gimme_mesh(N_dense)

Smooth function and upsampling

Let’s start with the easiest task. Here’s how an upsampling from a mesh of shape [6,7] to one of [20,21] works out for the smooth test function:

fig2: smooth upsampling

Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata distorts the data most. Note the y==-1 boundary of the plot (nearest the x label): the function should be strictly zero (since y==-1 is a nodal line for the smooth function), yet this is not the case for griddata. Also note the x==-1 boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5], yet the griddata output shows clearly non-zero gradient in this region. The effect is subtle, but it’s a bias none the less. (The fidelity of Rbf is even better with the default choice of radial functions, dubbed multiquadratic.)

Evil function and upsampling

A bit harder task is to perform upsampling on our evil function:

fig3: evil upsampling

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d (note the two humps on the right side of the plotted surface). While griddata and Rbf seem to produce similar results at first glance, the latter seems to produce a deeper minimum near [0.4, -0.4] that is absent from the underlying function.

However, there is one crucial aspect in which Rbf is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Smooth function and scattered data

Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

fig4: smooth scattered interpolation

Now there’s already a bit of a horror show going on. I clipped the output from interp2d to between [-1, 1] exclusively for plotting, in order to preserve at least a minimal amount of information. It’s clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata only works inside the convex hull of the input data points (in other words, it doesn’t perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.2 Considering these features, Rbf seems to perform best.

Evil function and scattered data

And the moment we’ve all been waiting for:

fig5: evil scattered interpolation

It’s no huge surprise that interp2d gives up. In fact, during the call to interp2d you should expect some friendly RuntimeWarnings complaining about the impossibility of the spline to be constructed. As for the other two methods, Rbf seems to produce the best output, even near the borders of the domain where the result is extrapolated.


So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).

scipy.interpolate.Rbf

The Rbf class stands for “radial basis functions”. To be honest I’ve never considered this approach until I started researching for this post, but I’m pretty sure I’ll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable Rbf class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

import scipy.interpolate as interp
zfun_smooth_rbf = interp.Rbf(x_sparse, y_sparse, z_sparse_smooth, function='cubic', smooth=0)  # default smooth=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(x_dense, y_dense)  # not really a function, but a callable class instance

Note that both input and output points were 2d arrays in this case, and the output z_dense_smooth_rbf has the same shape as x_dense and y_dense without any effort. Also note that Rbf supports arbitrary dimensions for interpolation.

So, scipy.interpolate.Rbf

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can have output points of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
  • prone to preserving the symmetry of the input data
  • supports multiple kinds of radial functions for keyword function: multiquadric, inverse, gaussian, linear, cubic, quintic, thin_plate and user-defined arbitrary

scipy.interpolate.griddata

My former favourite, griddata, is a general workhorse for interpolation in arbitrary dimensions. It doesn’t perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

z_dense_smooth_griddata = interp.griddata(np.array([x_sparse.ravel(),y_sparse.ravel()]).T,
                                          z_sparse_smooth.ravel(),
                                          (x_dense,y_dense), method='cubic')   # default method is linear

Note the slightly kludgy syntax. The input points have to be specified in an array of shape [N, D] in D dimensions. For this we first have to flatten our 2d coordinate arrays (using ravel), then concatenate the arrays and transpose the result. There are multiple ways to do this, but all of them seem to be bulky. The input z data also have to be flattened. We have a bit more freedom when it comes to the output points: for some reason these can also be specified as a tuple of multidimensional arrays. Note that the help of griddata is misleading, as it suggests that the same is true for the input points (at least for version 0.17.0):

griddata(points, values, xi, method='linear', fill_value=nan, rescale=False)
    Interpolate unstructured D-dimensional data.

    Parameters
    ----------
    points : ndarray of floats, shape (n, D)
        Data point coordinates. Can either be an array of
        shape (n, D), or a tuple of `ndim` arrays.
    values : ndarray of float or complex, shape (n,)
        Data values.
    xi : ndarray of float, shape (M, D)
        Points at which to interpolate data.

In a nutshell, scipy.interpolate.griddata

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see fill_value)
  • computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
  • can have output points of arbitrary shape
  • supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use NearestNDInterpolator and LinearNDInterpolator under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses CloughTocher2DInterpolator to construct a continuously differentiable piecewise-cubic interpolator.
  • might violate the symmetry of the input data

scipy.interpolate.interp2d/scipy.interpolate.bisplrep

The only reason I’m discussing interp2d and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don’t use it (as of scipy version 0.17.0). It’s already more special than the previous subjects in that it’s specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, interp2d is similar to Rbf in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There’s a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid:

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic')   # default kind is 'linear'
# reminder: x_dense and y_dense are of shape [20, 21] from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec,yvec)   # output is [20, 21]-shaped array

One of the most common mistakes when using interp2d is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError.

Now, the greatest problem with interp2d is that it often doesn’t work. In order to understand this, we have to look under the hood. It turns out that interp2d is a wrapper for the lower-level functions bisplrep+bisplev, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

kind = 'cubic'
if kind=='linear':
    kx=ky=1
elif kind=='cubic':
    kx=ky=3
elif kind=='quintic':
    kx=ky=5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(),y_sparse.ravel(),z_sparse_smooth.ravel(),kx=kx,ky=ky,s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec,yvec,bisp_smooth).T  # note the transpose

Now, here’s the thing about interp2d: (in scipy version 0.17.0) there is a nice comment in interpolate/interpolate.py for interp2d:

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

and indeed in interpolate/fitpack.py, in bisplrep there’s some setup and ultimately

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)                 

And that’s it. The routines underlying interp2d are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, interpolate.interp2d

  • can lead to artifacts even with well-tempered data
  • is specifically for bivariate problems (although there’s the limited interpn for input points defined on a grid)
  • performs extrapolation
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
  • supports linear, cubic and quintic interpolation
  • might violate the symmetry of the input data

1I’m fairly certain that the cubic and linear kind of basis functions of Rbf do not exactly correspond to the other interpolators of the same name.
2These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。